Using TFIDF instead of TF Vectors in LDA
ivan obeso 2012-06-19, 08:06
Hi all,
I'm using version 0.6 of Mahout, and I have read that the LDA implementation in this version can work with TFIDF vectors as well as TF vectors. The problem is that DocumentProcessor.tokenizeDocuments and DictionaryVectorizer.createTermFrequencyVectors use sequence files with Text as the key and Text as the value. Now I want to use TFIDFConverter.calculateDF and TFIDFConverter.processTfIdf, but these methods expect VectorWritable as the value in the sequence file. Am I doing this the right way? How can I transform the Text sequence file into a VectorWritable sequence file? I get the following exception:
java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.mahout.math.VectorWritable