Results 51 to 60 of 102 (0.259s).

Re: Canopies and RowSimilarity - Mahout - [mail # user]
...Uploaded a patch that only deletes the temp output if -ow has been specified. From: Sebastian Schelter To: [EMAIL PROTECTED] S...
Author: Suneel Marthi, 2012-05-07, 12:39

Re: Canopies and RowSimilarity - Mahout - [mail # user]
...1. Please take a look at MAHOUT-834 for the -ow option; a patch is available and pending review. 2. Please take a look at MAHOUT-979 for calculating the number of columns fr...
Author: Suneel Marthi, 2012-05-07, 12:02

Re: Recommended way to consume Nutch data in Mahout - Mahout - [mail # user]
...You may want to look at Tika's HtmlParser to strip out all the HTML tags and return only the raw text content from the crawled pages. This could then be written out to sequence files with t...
Author: Suneel Marthi, 2012-04-16, 15:04

Re: RowSimilarityJob - Mahout - [mail # user]
...I should have been more elaborate in my previous reply. RowId job creates a matrix which is of type and a docIndex docIndex is a map of the rowId to the keys genera...
Author: Suneel Marthi, 2012-03-20, 18:52

Re: RowSimilarityJob - Mahout - [mail # user]
...docIndex is your answer. On Mar 20, 2012, at 12:28 PM, Pat Ferrel wrote: ...
Author: Suneel Marthi, 2012-03-20, 17:41

Re: How to find the k most similar docs - Mahout - [mail # user]
...Pat, MatrixDump expects an input file of . The matrix that gets created from RowIdJob is and you cannot run MatrixDump to see the contents of the matrix. You need to use...
Author: Suneel Marthi, 2012-03-09, 12:26

Re: Minhash review - Mahout - [mail # dev]
...That's correct. From: Frank Scholten To: [EMAIL PROTECTED] Sent: Thursday, March 8, 2012 4:17 AM Subject: Re: Minhash review ...
Author: Suneel Marthi, 2012-03-08, 12:44

Re: Minhash review - Mahout - [mail # dev]
...Frank, I modified the present MinHash to hash on the index as opposed to the present tf-idf weights, but the change had no impact on the output and I still get bad clusters. I di...
Author: Suneel Marthi, 2012-03-08, 07:22

Re: How to find the k most similar docs - Mahout - [mail # user]
...Did the RowSimilarityJob execute successfully? Your output should have been one or more part-r-* files (depending on the number of reducers you have configured in your environment). ...
Author: Suneel Marthi, 2012-03-07, 02:25

Re: How to find the k most similar docs - Mahout - [mail # user]
...Pat, Your input to RowSimilarity seems to be the tfidf-vectors directory which is . Before executing the RowSimilarity job you need to run the RowIdJob which creates a matrix of . ...
Author: Suneel Marthi, 2012-03-05, 19:48