-
How to train naive bayes classifier using text files
Felipe Ferreira 2012-02-15, 15:38
Hi All, I am a newbie to Apache Mahout and I don't know how to train and test the Mahout naive Bayes classifier with a set of text files. In my scenario I have five directories, each named after a text category and containing that category's files. They are organized as follows:
*Training dataset:*
- Category1
  - file1.txt
  - file2.txt
  - ...
- Category2
  - file10.txt
  - file11.txt
  - ...
*Test dataset:*
- Category1
  - file1.txt
  - file2.txt
  - ...
- Category2
  - file10.txt
  - file11.txt
  - ...
I would like to train and test the naive Bayes classifier using these two datasets (train and test).
Questions:
1- What are the necessary steps to do that?
2- How can I collect statistics like the ones Weka reports?
Cheers, Felipe Ferreira.
-
Re: How to train naive bayes classifier using text files
Lance Norskog 2012-02-16, 03:50
Look at the Naive Bayes example in examples/bin/asf-examples.sh.
You would skip the first stage, where it parses email files, and instead just create Hadoop SequenceFiles in which the key is the filename and the value is a Text field with the contents of the file.
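The key/value layout described above can be sketched in plain Python. This is only an illustration of which pairs would go into the SequenceFile, not a SequenceFile writer: `labeled_documents` is a hypothetical helper, and the `/Category/filename` key shape is an assumption (it keeps the category label alongside the filename), so check it against what the example script actually expects. Mahout's `seqdirectory` command may already do this conversion for a directory tree.

```python
import os
import tempfile


def labeled_documents(root):
    """Yield (key, text) pairs for every file under root/<category>/.

    The key shape "/<category>/<filename>" is an assumption made for this
    sketch; it is not taken from the Mahout example script.
    """
    for category in sorted(os.listdir(root)):
        category_dir = os.path.join(root, category)
        if not os.path.isdir(category_dir):
            continue  # ignore stray files next to the category directories
        for name in sorted(os.listdir(category_dir)):
            path = os.path.join(category_dir, name)
            if os.path.isfile(path):
                with open(path, encoding="utf-8") as handle:
                    yield "/" + category + "/" + name, handle.read()


# Demo on a throwaway tree that mimics the layout from the question.
demo_root = tempfile.mkdtemp()
for category, name, text in [
    ("Category1", "file1.txt", "first document"),
    ("Category1", "file2.txt", "second document"),
    ("Category2", "file10.txt", "third document"),
]:
    os.makedirs(os.path.join(demo_root, category), exist_ok=True)
    target = os.path.join(demo_root, category, name)
    with open(target, "w", encoding="utf-8") as handle:
        handle.write(text)

pairs = list(labeled_documents(demo_root))
for key, value in pairs:
    print(key, "->", value)
```

Each pair corresponds to one record: the key identifies the document (and here its category), and the value is the raw text that later stages would tokenize.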
-- Lance Norskog [EMAIL PROTECTED]
-
Re: How to train naive bayes classifier using text files
Lance Norskog 2012-02-16, 03:55
I take that back. Follow the Naive Bayes example in examples/bin/asf-examples.sh. You'll have to examine the files in each stage to figure out exactly what to pass in.
-
Re: How to train naive bayes classifier using text files
Felipe Ferreira 2012-02-16, 17:44
Thank you, Lance Norskog. I am going to try to understand this example and follow the same steps in my project. After that, I would like to compare Mahout's test results with Weka's, because I already ran a training session on the same datasets in Weka and got 89% correctly classified instances. Do you think I would get the same percentage in Mahout?
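For comparing the two tools, the "correctly classified instances" percentage and a confusion matrix can be computed by hand from (actual, predicted) label pairs, whichever tool produced the predictions. A minimal sketch follows; `evaluate` is a hypothetical helper and the four sample pairs are made up for illustration, not results from either Mahout or Weka.

```python
from collections import Counter


def evaluate(pairs):
    """Return (accuracy, confusion) for a list of (actual, predicted) pairs.

    confusion maps each (actual, predicted) pair to how often it occurred.
    """
    confusion = Counter(pairs)
    total = sum(confusion.values())
    if total == 0:
        raise ValueError("no test instances to evaluate")
    correct = sum(
        count
        for (actual, predicted), count in confusion.items()
        if actual == predicted
    )
    return correct / total, confusion


# Made-up predictions for four test documents (illustration only).
results = [
    ("Category1", "Category1"),
    ("Category1", "Category2"),
    ("Category2", "Category2"),
    ("Category2", "Category2"),
]

accuracy, confusion = evaluate(results)
print("Correctly classified: {:.1%}".format(accuracy))

# Print the confusion matrix: one row per actual label,
# one column per predicted label.
labels = sorted({label for pair in confusion for label in pair})
for actual in labels:
    print(actual, [confusion[(actual, predicted)] for predicted in labels])
```

Running the same calculation on the test-set predictions from both tools gives figures that are directly comparable, as long as both were trained and tested on the same split.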
-
Re: How to train naive bayes classifier using text files
Lance Norskog 2012-02-17, 00:20
I have not used the Weka tools for this.