Mahout, mail # user - Relevance score - Classification


Re: Relevance score - Classification
Lance Norskog 2011-11-27, 03:15
Solr is an application-level wrapper for Lucene. Carrot2 is a fine
clustering system, and Solr has an integration for it. You can do a lot of
research quickly using this combination of tools.
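Carrot2 itself is a Java library (and the engine behind Solr's clustering integration), so it is not reproduced here. As a rough stand-in for what a clustering step contributes, the sketch below groups documents by content using plain spherical k-means over term-frequency vectors. This is a different and much simpler algorithm than Carrot2's own (such as Lingo), and every function name is hypothetical. The resulting cluster ids are the kind of labels that could then be indexed as extra terms, as Tanton suggests in the quoted thread.

```python
import math
from collections import Counter


def tf_vector(text: str) -> dict[str, float]:
    """Unit-length term-frequency vector (assumes a non-empty document)."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {term: c / norm for term, c in counts.items()}


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Dot product, which equals cosine similarity because vectors are unit length."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller vector
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def normalise(vec: dict[str, float]) -> dict[str, float]:
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()}


def seed_centroids(vectors: list[dict[str, float]], k: int) -> list[dict[str, float]]:
    """Farthest-first seeding: start with document 0, then keep adding the
    document least similar to every seed chosen so far."""
    seeds = [0]
    while len(seeds) < k:
        candidates = [i for i in range(len(vectors)) if i not in seeds]
        seeds.append(min(candidates,
                         key=lambda i: max(cosine(vectors[i], vectors[s]) for s in seeds)))
    return [dict(vectors[i]) for i in seeds]


def spherical_kmeans(docs: list[str], k: int, iterations: int = 10) -> list[int]:
    """Return one cluster id (0..k-1) per document. Requires 1 <= k <= len(docs)."""
    vectors = [tf_vector(d) for d in docs]
    centroids = seed_centroids(vectors, k)
    labels: list[int] = []
    for _ in range(iterations):
        # Assignment step: each document joins its most similar centroid.
        new_labels = [max(range(k), key=lambda c: cosine(v, centroids[c])) for v in vectors]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: each centroid becomes the normalised sum of its members.
        for c in range(k):
            total: Counter = Counter()
            for v, lab in zip(vectors, labels):
                if lab == c:
                    total.update(v)
            if total:
                centroids[c] = normalise(dict(total))
    return labels
```

For example, `spherical_kmeans(["baseball pitcher inning", "baseball bat pitcher", "hockey puck goal", "hockey goal ice"], 2)` puts the two baseball posts in one cluster and the two hockey posts in the other. Farthest-first seeding is used so that two near-duplicate documents are unlikely to seed both clusters.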

On Thu, Nov 24, 2011 at 10:50 AM, Ted Dunning <[EMAIL PROTECTED]> wrote:

> +1 to Tanton's wise words.
>
> On Thu, Nov 24, 2011 at 9:56 AM, Tanton Gibbs <[EMAIL PROTECTED]> wrote:
>
> > Hi Faizan,
> >
> > It seems like you have an IR problem where the query is a document (and
> > the documents are documents, too).
> >
> > Have you looked at Lucene? It seems like that would be a good starting
> > point. After that, I would come back to clustering (which seems to be what
> > you want to do here). You could add the generated cluster ids as unique
> > terms in your index and then ensure you always match the cluster term;
> > the IR features would then help rank the documents that share that
> > cluster term correctly.
> >
> > On Wednesday, November 23, 2011, Faizan(Aroha) <[EMAIL PROTECTED]> wrote:
> > > We are trying to implement relevant search (using machine learning) on a
> > > website that has 3 million visitors a week and 150k blog posts a day.
> > >
> > > We are currently in the planning phase, so we are trying several
> > > different approaches.
> > >
> > > I will use the newsgroup dataset example to explain my situation:
> > >
> > > Let's say we apply the classifier to a new document X that may belong to
> > > "rec.sport.baseball", and we know that there are 397 documents in our
> > > collection that have been correctly identified by the classifier.
> > >
> > > When we apply the classifier to X, it should return a list of documents
> > > sorted so that the top document is the most relevant to the query
> > > (document X) and the last document is the least relevant.
> > >
> > > To do this, we need to devise a way to use these classifiers for
> > > information retrieval.
> > >
> > > The classifier should act as a retrieval algorithm: it first computes
> > > relevance scores for all the documents and produces a ranking. When that
> > > retrieval algorithm is applied to an individual query document, it
> > > returns a set of documents sorted so that the top documents are the most
> > > relevant to the blog post and the last document is the least relevant.
> > >
> > > This is a little background.
> > >
> > > Thanks.
> > >
> > >
> > > -----Original Message-----
> > > From: Isabel Drost [mailto:[EMAIL PROTECTED]]
> > > Sent: Thursday, November 24, 2011 1:47 AM
> > > To: [EMAIL PROTECTED]
> > > Subject: Re: Relevance score - Classification
> > >
> > > On 23.11.2011 Faizan(Aroha) wrote:
> > >> We are working on using classification as search.
> > >>
> > >> I want to compute a relevance score for the output generated by the
> > >> Naive Bayes classifier or some other classifier.
> > >>
> > >> Please give any guideline/hint!
> > >
> > > Can you please provide some more background to your use case? Which
> > > documents do you want to search? How is relevance defined in your setting?
> > >
> > > Isabel
> > >
> > >
> >
>
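Tanton's suggestion in the quoted thread (treat document X as the query, store each document's cluster id as an extra unique term, require that term to match, and let ordinary IR scoring order the matches) can be prototyped before setting up Lucene. The sketch below is a minimal, hypothetical illustration: it assumes whitespace tokenisation and uses a smoothed TF-IDF weighting with cosine similarity in place of Lucene's actual scoring. The `__cluster_<id>` term naming and all function names are made up for the example, and the query's cluster id is assumed to come from the classifier or clustering step.

```python
import math
from collections import Counter

CLUSTER_PREFIX = "__cluster_"  # hypothetical naming scheme for the synthetic term


def cluster_term(cluster_id: int) -> str:
    return f"{CLUSTER_PREFIX}{cluster_id}"


def build_index(corpus: dict[str, tuple[str, int]]) -> dict[str, Counter]:
    """Map doc_id -> term counts, adding the document's cluster id as one extra term."""
    return {
        doc_id: Counter(text.lower().split() + [cluster_term(cid)])
        for doc_id, (text, cid) in corpus.items()
    }


def rank(query_text: str, query_cluster: int,
         index: dict[str, Counter]) -> list[tuple[str, float]]:
    """Return (doc_id, score) pairs, most relevant first.

    Only documents carrying the query's cluster term are candidates; within that
    set the order comes from TF-IDF cosine similarity over the ordinary terms.
    """
    required = cluster_term(query_cluster)
    n = len(index)
    # Document frequency: iterating a Counter yields each distinct term once.
    df = Counter(term for counts in index.values() for term in counts)

    def idf(term: str) -> float:
        # Smoothed IDF; stays positive even for terms found in every document.
        return math.log((1 + n) / (1 + df[term])) + 1.0

    def weigh(counts: Counter) -> dict[str, float]:
        # The cluster term only filters; it is left out of the similarity score.
        vec = {t: c * idf(t) for t, c in counts.items()
               if not t.startswith(CLUSTER_PREFIX)}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        return {t: w / norm for t, w in vec.items()}

    query_vec = weigh(Counter(query_text.lower().split()))
    results = []
    for doc_id, counts in index.items():
        if required not in counts:
            continue  # hard requirement: the cluster term must match
        doc_vec = weigh(counts)
        score = sum(w * doc_vec.get(t, 0.0) for t, w in query_vec.items())
        results.append((doc_id, score))
    results.sort(key=lambda pair: pair[1], reverse=True)
    return results
```

With a query such as `rank("fastball pitcher inning", 0, build_index(corpus))`, documents in other clusters never appear, however many terms they share with X. The remaining documents come back ordered from most to least similar, which is the ordering Faizan describes. In Lucene the same effect would come from a required clause on the cluster term combined with an ordinary scored query.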

--
Lance Norskog
[EMAIL PROTECTED]