Mahout, mail # dev - Options in TrainClassifier.java


Re: Options in TrainClassifier.java
deneche abdelhakim 2010-09-20, 05:45
I don't know if it's related, but I remember getting a similar
exception a year ago when I was working on the implementation of
Random Forests. In my case it was caused by
SequenceFile.Sorter.merge(). I ended up writing my own merge function
because I really didn't need to sort the output.
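
To illustrate what an unsorted merge amounts to, here is a minimal sketch. It is not the Random Forests code and makes some assumptions: it uses plain text files and java.nio instead of Hadoop's SequenceFile reader and writer, one line stands in for one key/value record, and the names UnsortedMerge and concatenate are hypothetical.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper: appends the records of each part file to a single
// output file, in the order the parts are given, with no sorting step.
public class UnsortedMerge {

  // Returns the number of records copied to the output file.
  public static long concatenate(List<Path> parts, Path output) throws IOException {
    long records = 0;
    try (BufferedWriter writer = Files.newBufferedWriter(output, StandardCharsets.UTF_8)) {
      for (Path part : parts) {
        // Assumption: one text line stands in for one key/value record.
        for (String record : Files.readAllLines(part, StandardCharsets.UTF_8)) {
          writer.write(record);
          writer.newLine();
          records++;
        }
      }
    }
    return records;
  }
}
```

Since nothing is compared or reordered, the output simply keeps the order of the inputs.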

On Mon, Sep 20, 2010 at 6:14 AM, Joe Kumar <[EMAIL PROTECTED]> wrote:
> Gangadhar,
>
> Just to eliminate the usual suspects, I am using Mac OS X 10.5.8, Mahout 0.4
> (revision 986659), Hadoop 0.20.2, 2 GB of memory for Hadoop, and 80 GB of free space.
> These are the commands that I executed.
>
> I had issues with my namenode, so I formatted it using hadoop namenode
> -format.
> $MAHOUT_HOME/examples/src/test/resources/country.txt had just 1 entry
> (spain). I haven't tried with multiple entries.
>
> $> hadoop jar $MAHOUT_HOME/examples/target/mahout-examples-0.4-SNAPSHOT.job
> org.apache.mahout.classifier.bayes.WikipediaXmlSplitter -d
> $MAHOUT_HOME/examples/temp/enwiki-latest-pages-articles10.xml -o
> wikipedia/chunks -c 64
>
> $> hadoop jar $MAHOUT_HOME/examples/target/mahout-examples-0.4-SNAPSHOT.job
> org.apache.mahout.classifier.bayes.WikipediaDatasetCreatorDriver -i
> wikipedia/chunks -o wikipediainput -c
> $MAHOUT_HOME/examples/src/test/resources/country.txt
>
> $> hadoop jar $MAHOUT_HOME/examples/target/mahout-examples-0.4-SNAPSHOT.job
> org.apache.mahout.classifier.bayes.TrainClassifier -i wikipediainput -o
> wikipediamodel  -type bayes -source hdfs
>
> $> hadoop jar $MAHOUT_HOME/examples/target/mahout-examples-0.4-SNAPSHOT.job
> org.apache.mahout.classifier.bayes.TestClassifier -m wikipediamodel -d
>  wikipediainput  -ng 3 -type bayes -source hdfs
>
> Please try the above and let me know. We'll try to find out what is going
> wrong.
> Regards,
> Joe.
>
> On Sun, Sep 19, 2010 at 11:13 PM, Gangadhar Nittala <[EMAIL PROTECTED]> wrote:
>
>> Joe,
>> I also tried reducing the number of countries in country.txt.
>> That didn't help. In my case, I was monitoring the disk space, and
>> at no time did it reach 0%, so I am not sure that is the cause. To
>> remove the dependency on the number of countries, I even tried
>> subjects.txt as the classification; that did not help either.
>> I think this problem is due to the type of data being processed,
>> but I am not sure what I need to change to get the data
>> processed successfully.
>>
>> I guess the experienced folks on Mahout will be able to tell us what is missing.
>>
>> Thank you
>> Gangadhar
>>
>> On Sun, Sep 19, 2010 at 8:06 AM, Joe Kumar <[EMAIL PROTECTED]> wrote:
>> > Gangadhar,
>> >
>> > I modified $MAHOUT_HOME/examples/src/test/resources/country.txt to have just
>> > 1 entry (spain) and used WikipediaDatasetCreatorDriver to create the
>> > wikipediainput data set, and then ran TrainClassifier and it worked. When I
>> > ran TestClassifier as below, I got blank results in the output.
>> >
>> > $MAHOUT_HOME/examples/target/mahout-examples-0.4-SNAPSHOT.job
>> > org.apache.mahout.classifier.bayes.TestClassifier -m wikipediamodel -d
>> >  wikipediainput  -ng 3 -type bayes -source hdfs
>> >
>> > Summary
>> > -------------------------------------------------------
>> > Correctly Classified Instances          :          0         ?%
>> > Incorrectly Classified Instances        :          0         ?%
>> > Total Classified Instances              :          0
>> >
>> > ======================================================>> > Confusion Matrix
>> > -------------------------------------------------------
>> > a     <--Classified as
>> > 0     |  0     a     = spain
>> > Default Category: unknown: 1
>> >
>> > I am not sure if I am doing something wrong. I have to figure out why my
>> > output is so blank.
>> > I'll document these steps and mention country.txt in the wiki.
>> >
>> > Question to all:
>> > Should we have 2 country.txt files?
>> >
>> >   1. country_full_list.txt - this is the existing list
>> >   2. country_sample_list.txt - a list with 2 or 3 countries