Mahout, mail # user - How to train naive bayes classifier using text files


Re: How to train naive bayes classifier using text files
Lance Norskog 2012-02-16, 03:55
I take that back. Follow the Naive Bayes example in
examples/bin/asf-examples.sh. You'll have to examine the files in each
stage to figure out exactly what to pass in.
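
For the conversion step described in the quoted message below (one record per text file, keyed by filename, with the file contents as the value), a rough sketch of collecting those key/value pairs is shown here. It is only an illustration, not code from asf-examples.sh: the function name iter_records, the "/<category>/<filename>" key layout, and the UTF-8 default are all assumptions, and actually writing the pairs into a Hadoop SequenceFile is left to whatever Hadoop tooling you use.

```python
import os
from typing import Iterator, Tuple


def iter_records(root: str, encoding: str = "utf-8") -> Iterator[Tuple[str, str]]:
    """Yield (key, text) pairs for every file under root/<category>/.

    Hypothetical helper: the key layout "/<category>/<filename>" is an
    assumption, chosen so the category label can be recovered from the key.
    """
    for category in sorted(os.listdir(root)):
        category_dir = os.path.join(root, category)
        if not os.path.isdir(category_dir):
            continue  # skip stray files at the top level
        for name in sorted(os.listdir(category_dir)):
            path = os.path.join(category_dir, name)
            if not os.path.isfile(path):
                continue  # skip nested directories
            # Undecodable bytes are replaced rather than raising an error.
            with open(path, encoding=encoding, errors="replace") as handle:
                text = handle.read()
            yield "/{}/{}".format(category, name), text
```

Calling list(iter_records("train")) on a directory laid out like the training set in the original question would give one pair per text file; each pair would then become one SequenceFile record with a Text value.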

On Wed, Feb 15, 2012 at 7:50 PM, Lance Norskog <[EMAIL PROTECTED]> wrote:
> Look at the Naive Bayes example in examples/bin/asf-examples.sh.
>
> You would skip the first stage, where it parses email files, and
> instead just create Hadoop SequenceFiles where the key is the filename
> and the value is a Text field with the contents of the file.
>
> On Wed, Feb 15, 2012 at 7:38 AM, Felipe Ferreira
> <[EMAIL PROTECTED]> wrote:
>> Hi All,
>>
>>
>> I am new to Apache Mahout and I don't know how to train and test the
>> Mahout naive Bayes classifier with a set of text files. In my scenario I
>> have five directories, each named after a text category and containing
>> that category's files. They are organized as follows:
>>
>> *Training dataSet:*
>>
>> - Category1
>>    - file1.txt
>>    - file2.txt
>>    ...
>>
>> - Category2
>>    - file10.txt
>>    - file11.txt
>>    ....
>>
>> *Test dataset:*
>>
>>  - Category1
>>    - file1.txt
>>    - file2.txt
>>    ...
>>
>> - Category2
>>    - file10.txt
>>    - file11.txt
>>    ....
>>
>> I would like to train and test a Naive Bayes classifier using these two
>> datasets (train and test).
>>
>>
>> Questions:
>>
>> 1- What are the necessary steps to do that?
>> 2- How can I collect statistics like those Weka shows?
>>
>>
>> Cheers,
>>
>>
>> Felipe Ferreira.
>
>
>
> --
> Lance Norskog
> [EMAIL PROTECTED]

--
Lance Norskog
[EMAIL PROTECTED]