Mahout, mail # user - Using TFIDF instead of TF Vectors in LDA


Using TFIDF instead of TF Vectors in LDA
ivan obeso 2012-06-19, 08:06
Hi all,

I'm using version 0.6 of Mahout, and I have read that the LDA
implementation in this version can work with TFIDF vectors as well as
TF vectors. The problem is that DocumentProcessor.tokenizeDocuments and
DictionaryVectorizer.createTermFrequencyVectors use SequenceFiles with
Text as key and Text as value.

Now I want to use TFIDFConverter.calculateDF and
TFIDFConverter.processTfIdf, but these methods expect VectorWritable as
the value in the SequenceFile. Am I going about this the right way?
How can I transform the Text SequenceFile into a VectorWritable SequenceFile?

I get the following exception:
java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to
org.apache.mahout.math.VectorWritable
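
For context on why the TF-IDF step wants vectors rather than Text: TF-IDF is a reweighting of term-frequency vectors that already exist, so its input has to be numeric TF data, not raw or tokenized text. Below is a minimal sketch of that relationship in plain Java. It does not use the Mahout API. The class and method names are hypothetical, the maps stand in for sparse vectors, and the weight is the textbook tf * ln(N / df), which is an assumption and may differ from the weighting Mahout applies.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not part of Mahout.
class TfIdfSketch {

    // Count, for each term, how many documents contain it (document frequency).
    static Map<String, Integer> documentFrequencies(List<Map<String, Integer>> tfVectors) {
        Map<String, Integer> df = new HashMap<>();
        for (Map<String, Integer> tf : tfVectors) {
            for (String term : tf.keySet()) {
                df.merge(term, 1, Integer::sum);
            }
        }
        return df;
    }

    // Reweight one TF vector using the textbook formula tf * ln(N / df).
    // Assumes every term in tf also appears in df.
    static Map<String, Double> toTfIdf(Map<String, Integer> tf,
                                       Map<String, Integer> df,
                                       int numDocs) {
        Map<String, Double> weighted = new HashMap<>();
        for (Map.Entry<String, Integer> e : tf.entrySet()) {
            double idf = Math.log((double) numDocs / df.get(e.getKey()));
            weighted.put(e.getKey(), e.getValue() * idf);
        }
        return weighted;
    }
}
```

Both methods take TF vectors as input: the first pass collects document frequencies over all of them, and the second pass rescales each vector. Neither pass can start from a Text value, which fits the kind of type mismatch the exception reports.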