Mahout, mail # user - How to train naive bayes classifier using text files


Re: How to train naive bayes classifier using text files
Lance Norskog 2012-02-16, 03:50
Look at the Naive Bayes example in examples/bin/asf-examples.sh.

You would skip the first stage, where it parses email files, and
instead just create Hadoop SequenceFiles in which the key is the
filename and the value is a Text field with the contents of that file.
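
As a rough sketch of the records that would go into those SequenceFiles
(this is not Mahout's own code), the Python below walks a root directory
laid out as one subdirectory per category and yields one (key, value)
pair per file. The function name iter_records and the key format
"/<category>/<filename>" are assumptions made for illustration; the
category is kept in the key only so the label stays recoverable. Writing
the pairs into an actual SequenceFile would still go through Hadoop (for
example SequenceFile.Writer with Text keys and values), which is not
shown here.

```python
import os
from typing import Iterator, Tuple


def iter_records(root: str, encoding: str = "utf-8") -> Iterator[Tuple[str, str]]:
    """Yield (key, value) pairs for every file under root/<category>/.

    Hypothetical helper: the key is "/<category>/<filename>" (an assumed
    format) and the value is the full text of the file.
    """
    for category in sorted(os.listdir(root)):
        category_dir = os.path.join(root, category)
        if not os.path.isdir(category_dir):
            continue  # skip stray files at the top level
        for name in sorted(os.listdir(category_dir)):
            path = os.path.join(category_dir, name)
            if not os.path.isfile(path):
                continue  # skip nested directories
            # Undecodable bytes are replaced rather than raising an error.
            with open(path, encoding=encoding, errors="replace") as handle:
                text = handle.read()
            yield f"/{category}/{name}", text
```

Running it once over the training directory and once over the test
directory would give two separate record sets, one per dataset.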

On Wed, Feb 15, 2012 at 7:38 AM, Felipe Ferreira
<[EMAIL PROTECTED]> wrote:
> Hi All,
>
>
> I am new to Apache Mahout and I don't know how to train and test the
> Mahout Naive Bayes classifier with a set of text files. In my scenario I
> have five directories, each named after a text category and containing
> that category's files. They are organized as follows:
>
> *Training dataset:*
>
> - Category1
>    - file1.txt
>    - file2.txt
>    ...
>
> - Category2
>    - file10.txt
>    - file11.txt
>    ....
>
> *Test dataset:*
>
>  - Category1
>    - file1.txt
>    - file2.txt
>    ...
>
> - Category2
>    - file10.txt
>    - file11.txt
>    ....
>
> I would like to train and test a Naive Bayes classifier using these two
> datasets (train and test).
>
>
> Questions:
>
> 1- What are the necessary steps to do that?
> 2- How can I collect statistics like the ones Weka shows?
>
>
> Cheers,
>
>
> Felipe Ferreira.

--
Lance Norskog
[EMAIL PROTECTED]