Lucene, mail # user - filter by term frequency


Re: filter by term frequency
Jack Krupansky 2012-06-16, 21:26
If you were a *Solr* user, I could say "try the 'termfreq' function query":

    termfreq(field,term) returns the number of times the term appears in
    the field for that document.
    Example Syntax: termfreq(text,'memory')

See:
http://wiki.apache.org/solr/FunctionQuery#tf
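
To make the semantics concrete, here is a minimal plain-Java sketch of what a
term-frequency function computes and how an exact-count filter would use it.
It does not use Lucene or Solr at all: the class name TermFreqFilter, the
methods termFreq and docsWithExactFreq, and the lowercase whitespace
tokenization are all assumptions made for illustration. A real index would
read the stored frequency from its postings instead of re-scanning text.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration only; not part of Lucene or Solr.
public class TermFreqFilter {

    // Counts how many tokens of `text` equal `term`.
    // Assumption: tokens are lowercase, whitespace-separated words.
    static int termFreq(String text, String term) {
        String needle = term.toLowerCase(Locale.ROOT);
        int count = 0;
        for (String token : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (token.equals(needle)) {
                count++;
            }
        }
        return count;
    }

    // Returns the positions of the documents in which `term`
    // occurs exactly `wanted` times.
    static List<Integer> docsWithExactFreq(List<String> docs, String term, int wanted) {
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < docs.size(); i++) {
            if (termFreq(docs.get(i), term) == wanted) {
                hits.add(i);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList(
                "memory is cheap",
                "memory leaks waste memory",
                "no match here");
        // Keep only documents containing "memory" exactly twice.
        System.out.println(docsWithExactFreq(docs, "memory", 2));
    }
}
```

In Solr itself, I believe the same exact-count filter can be written by
wrapping the function in the frange parser, e.g.
fq={!frange l=2 u=2}termfreq(text,'memory'), but check that against the
version you are running.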

Lucene does have "FunctionQuery", "ValueSource", and "TermFreqValueSource".

See:
http://lucene.apache.org/solr/api/org/apache/solr/search/function/FunctionQuery.html

-- Jack Krupansky

-----Original Message-----
From: Mike Sokolov
Sent: Saturday, June 16, 2012 2:33 PM
To: [EMAIL PROTECTED]
Subject: filter by term frequency

I imagine this is a question that comes up from time to time, but I
haven't been able to find a definitive answer anywhere, so...

I'm wondering whether there is some type of Lucene query that filters by
term frequency. For example, suppose I want to find all documents that
have exactly 2 occurrences of some word. I know that the frequency is
stored and used in scoring, but I don't think it is exposed in a simple
way at the query level. It looks to me as if CustomScoreQuery might be
a convenient way to monkey with scores, but it seems to affect only
sorting, not filtering. Perhaps a Collector could then impose a
score threshold later? Any suggestions here?

-Mike
