Mahout, mail # user - question on VectorWritable convertor in elephant-bird.


Re: question on VectorWritable convertor in elephant-bird.
Andy Schlaikjer 2012-05-15, 14:01
Yohan, that's a typo in the VectorWritableConverter javadoc. I'll update it today.

The SequenceFileStorage and ...Loader classes are in separate packages:

com.twitter.elephantbird.pig.*load*.SequenceFileLoader
<https://github.com/kevinweil/elephant-bird/blob/master/src/java/com/twitter/elephantbird/pig/load/SequenceFileLoader.java>
com.twitter.elephantbird.pig.*store*.SequenceFileStorage
<https://github.com/kevinweil/elephant-bird/blob/master/src/java/com/twitter/elephantbird/pig/store/SequenceFileStorage.java>

Both of these classes rely on the WritableConverter
<https://github.com/kevinweil/elephant-bird/blob/master/src/java/com/twitter/elephantbird/pig/util/WritableConverter.java>
interface. They classload converters at runtime, given the class names of
the converters you'd like to use for key and value Writable instances. When
dealing with SequenceFile<IntWritable, VectorWritable> data, do this:

{{{

%declare SEQFILE_LOADER 'com.twitter.elephantbird.pig.load.SequenceFileLoader';
%declare INT_CONVERTER 'com.twitter.elephantbird.pig.util.IntWritableConverter';
%declare VECTOR_CONVERTER 'com.twitter.elephantbird.pig.mahout.VectorWritableConverter';

pair = LOAD '$INPUT_PATH' USING $SEQFILE_LOADER (
  '-c $INT_CONVERTER',
  '-c $VECTOR_CONVERTER -- -sparse'
);

}}}
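
To illustrate what "classload converters at runtime, given the class name" means, here is a minimal plain-Java sketch of the general technique. It is not elephant-bird's actual code: ValueConverter, IntConverter and loadConverter are hypothetical names made up for this example, and the real WritableConverter interface has a different shape.

```java
public class ConverterLoadingSketch {

    /** Hypothetical stand-in for a converter interface (NOT elephant-bird's WritableConverter). */
    public interface ValueConverter {
        Object convert(String raw);
    }

    /** Hypothetical converter that parses a string into an Integer. */
    public static class IntConverter implements ValueConverter {
        @Override
        public Object convert(String raw) {
            return Integer.valueOf(raw.trim());
        }
    }

    /**
     * Look the class up by name on the class path and instantiate it
     * through its public no-argument constructor.
     */
    public static ValueConverter loadConverter(String className) {
        try {
            Class<?> cls = Class.forName(className);
            return cls.asSubclass(ValueConverter.class)
                      .getDeclaredConstructor()
                      .newInstance();
        } catch (ReflectiveOperationException | ClassCastException e) {
            // Class missing from the class path, no usable constructor,
            // or the class does not implement ValueConverter.
            throw new IllegalArgumentException("Cannot load converter: " + className, e);
        }
    }

    public static void main(String[] args) {
        // Only the class name is passed around, much like '-c $INT_CONVERTER' above.
        ValueConverter converter = loadConverter(IntConverter.class.getName());
        System.out.println(converter.convert(" 42 ")); // prints 42
    }
}
```

With this pattern the loading class never needs a compile-time import of the converter; the converter only has to be on the class path when the lookup runs.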

Hope this helps!

Andy
On Mon, May 14, 2012 at 11:57 PM, Ted Dunning <[EMAIL PROTECTED]> wrote:
> Sounds like a class path issue.
>
> Sent from my iPhone
>
> On May 15, 2012, at 2:43 AM, Yohan Chin <[EMAIL PROTECTED]> wrote:
>
>> Hi,
>> Recently, I've tried to use elephant-bird to load Mahout results into Pig.
>> I was able to install elephant-bird and build the .jar file,
>> and followed the instructions below (written by Andy Schlaikjer):
>> https://github.com/kevinweil/elephant-bird/blob/master/src/java/com/twitter/elephantbird/pig/mahout/VectorWritableConverter.java
>> ex)
>> pair = LOAD '$data' USING com.twitter.elephantbird.pig.store.SequenceFileLoader (
>> '-c $INT_CONVERTER',
>> '-c $VECTOR_CONVERTER -- -dense -cardinality 2'
>> );
>> However, there is no SequenceFileLoader in the store folder, and
>> load/SequenceFileLoader.java doesn't import
>> "com.twitter.elephantbird.pig.mahout.VectorWritableConverter".
>>
>> Are there any points I've missed?
>>
>> Thanks a lot for this awesome API!
>>