Re: Text categorization / classification
Lance Norskog 2010-10-27, 23:59
There are tools for this in the Mahout project (http://mahout.apache.org).
These are oriented toward large-scale work. There is a steep learning curve,
and you have to learn some Hadoop. For small-scale experiments, the book
'Programming Collective Intelligence' includes a suite of Python tools.

On Wed, Oct 27, 2010 at 1:12 PM, Maria Vazquez <[EMAIL PROTECTED]> wrote:
> I need to auto-categorize a large number of documents. They are basically news articles from major news sources (nytimes, npr, abcnews, etc).
> I'd like to categorize them automatically. Any suggestions?
> Lucene in Action suggests using a set of documents to build category vectors, then comparing each document to each of those vectors and picking the closest one.
> The approach seems pretty simple (from other papers I read on text categorization), but maybe you guys know of something out there that already does this using Lucene/Solr.
> Thanks!
> Maria

--
Lance Norskog
[EMAIL PROTECTED]
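
For a small-scale experiment, the category-vector approach quoted above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not what Lucene in Action or Mahout actually implement: the function names (tokenize, build_category_vectors, cosine, classify) and the tiny training set are made up for the example, terms are weighted by raw counts rather than TF-IDF, and "closest" is taken to mean highest cosine similarity.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_category_vectors(labeled_docs):
    """Sum the term counts of all training documents in each category."""
    vectors = defaultdict(Counter)
    for category, text in labeled_docs:
        vectors[category].update(tokenize(text))
    return dict(vectors)


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify(text, category_vectors):
    """Return the category whose vector is most similar to the document."""
    doc = Counter(tokenize(text))
    return max(category_vectors, key=lambda c: cosine(doc, category_vectors[c]))


# Hypothetical training data, invented purely for illustration.
training = [
    ("sports", "the team won the match after a late goal"),
    ("sports", "the coach praised the team and the players"),
    ("politics", "the senate passed the budget bill after a vote"),
    ("politics", "the president signed the bill into law"),
]
category_vectors = build_category_vectors(training)
print(classify("players scored a goal in the final match", category_vectors))
```

With raw counts, common words such as "the" dominate the similarity score, so on real news articles you would probably want stop-word removal or TF-IDF weighting before comparing vectors.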