Lucene, mail # user - PayloadNearQuery and AveragePayloadFunction


shyama 2012-02-02, 16:57
Re: PayloadNearQuery and AveragePayloadFunction
Peter Keegan 2012-02-02, 21:39
I don't quite follow what you're doing, but is it possible that your
payloads were not on the desired terms when you indexed them? The first
explanation shows that the matching document contained "luteinizing
hormone" in both fields, 'AbstractText' and 'ArticleTitle'. The average
payload value was 3.0, so either both terms had payloads that averaged
3.0 or only one had a payload, scored at 3.0. In the 2nd query, the
phrase was found in both fields again, but no payloads were found (thus
the 1.0). According to your 'scorePayload' method, the first match would
return 3 only if semantic=A. But the Similarity is set on the
IndexSearcher, not per query, so the same 'semantic' would be used for
all queries run through that searcher.
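
To make the point about the shared 'semantic' concrete, here is a minimal, Lucene-free sketch. Everything in it is a hypothetical stand-in rather than your code or the Lucene API: `ClassSimilarity` plays the role of your PayloadSimilarity, `semantic` is the payload code of the class of interest, and `averagePayload` assumes the average falls back to 1.0 when no payloads are seen, which is consistent with the 1.0 in your second explanation.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class PayloadAverageSketch {

    /** Hypothetical stand-in for a Similarity that stores the class of interest as state. */
    static final class ClassSimilarity {
        private final int semantic; // payload code of the selected class, e.g. 3 for class A

        ClassSimilarity(int semantic) {
            this.semantic = semantic;
        }

        /** Returns 3 for a payload of the selected class and 1 for any other payload. */
        float scorePayload(int payloadCode) {
            return payloadCode == semantic ? 3.0f : 1.0f;
        }
    }

    /** Averages the payload scores of the matched terms; 1.0 when no payloads were seen. */
    static float averagePayload(ClassSimilarity sim, List<Integer> payloadsOnMatchedTerms) {
        if (payloadsOnMatchedTerms.isEmpty()) {
            return 1.0f;
        }
        float sum = 0f;
        for (int code : payloadsOnMatchedTerms) {
            sum += sim.scorePayload(code);
        }
        return sum / payloadsOnMatchedTerms.size();
    }

    public static void main(String[] args) {
        // One instance is shared by every query, so its 'semantic' applies to all of them.
        ClassSimilarity interestedInA = new ClassSimilarity(3);
        ClassSimilarity interestedInB = new ClassSimilarity(5);

        List<Integer> bothTermsClassA = Arrays.asList(3, 3);
        List<Integer> noPayloads = Collections.<Integer>emptyList();

        System.out.println(averagePayload(interestedInA, bothTermsClassA));
        System.out.println(averagePayload(interestedInB, bothTermsClassA));
        System.out.println(averagePayload(interestedInA, noPayloads));
    }
}
```

In this sketch the only way to switch from class A to class B is to use a different `ClassSimilarity` instance, which is why a single shared instance gives every query the same class of interest.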

Peter
On Thu, Feb 2, 2012 at 11:57 AM, shyama <[EMAIL PROTECTED]> wrote:

> Hi List
> Apologies for such a long message. I have tried to include everything
> you might need to know to answer my question.
>
> I am having difficulty understanding what AveragePayloadFunction is
> doing. Here is my example:
>
> Title:Human|9 pineal|5 luteinizing hormone receptors.
> Text:The presence of luteinizing hormone receptors in human|9 pineal|5
> glands from five females and three males, ranging in age from 61-89 yr, was
> examined by in situ hybridization and immunocytochemistry. The results
> demonstrated the presence of these receptors at the mRNA|7 and protein
> levels in all the pineal|5 glands examined. Pineal|5 gland luteinizing
> hormone receptors could potentially be involved in the regulation of
> melatonin|7 synthesis.
>
> 3 is for class A
> 5 is for class B
> 7 is for class C
> 9 is for class D
> These are the payloads stored in the index. At search time, I use these
> values to identify each term's class, and then return 3 for terms in the
> selected class.
>
> I am using WhitespaceTokenizer and LowerCaseFilter. In my PayloadSimilarity
> class, I manipulate the payload so that, if I am interested in class A,
> it returns the payload value "x=3" only for terms in class A. I decide a
> term's class by checking its payload value.
>
> Now, I query for "luteinizing hormone" using PayloadNearQuery with a slop
> of 5. First I try with interest in class A and next with interest in
> class B.
>
> *Result of Class A interest:*
>
> Explain: 10.97332 = (MATCH) sum of:
>  2.5589073 = (MATCH) weight(payloadNear([AbstractText:luteinizing,
> AbstractText:hormone], 5, true) in 5362133), product of:
>    0.68000716 = queryWeight(payloadNear([AbstractText:luteinizing,
> AbstractText:hormone], 5, true)), product of:
>      14.045828 = idf(AbstractText:  luteinizing=15481 hormone=164637)
>      0.048413463 = queryNorm
>    3.7630591 = (MATCH) fieldWeight(AbstractText:payloadNear([luteinizing,
> hormone], 5, true) in 5362133), product of:
>      2.4494898 = PayloadNearQuery, product of:
>        0.8164966 = tf(phraseFreq=0.6666667)
>        *3.0 = AveragePayloadFunction(...)*
>      14.045828 = idf(AbstractText:  luteinizing=15481 hormone=164637)
>      0.109375 = fieldNorm(field=AbstractText, doc=5362133)
>  8.4144125 = (MATCH) weight(payloadNear([ArticleTitle:luteinizing,
> ArticleTitle:hormone], 5, true) in 5362133), product of:
>    0.7332054 = queryWeight(payloadNear([ArticleTitle:luteinizing,
> ArticleTitle:hormone], 5, true)), product of:
>      15.144659 = idf(ArticleTitle:  hormone=86980 luteinizing=9765)
>      0.048413463 = queryNorm
>    11.476201 = (MATCH) fieldWeight(ArticleTitle:payloadNear([luteinizing,
> hormone], 5, true) in 5362133), product of:
>      1.7320508 = PayloadNearQuery, product of:
>        0.57735026 = tf(phraseFreq=0.33333334)
>        *3.0 = AveragePayloadFunction(...)*
>      15.144659 = idf(ArticleTitle:  hormone=86980 luteinizing=9765)
>      0.4375 = fieldNorm(field=ArticleTitle, doc=5362133)
> ---------------------------------------------------------------------
>
> *Result of Class B interest:*
>
> Explain: 3.657773 = (MATCH) sum of:
>  0.85296905 = (MATCH) weight(payloadNear([AbstractText:luteinizing,
> AbstractText:hormone], 5, true) in 5362133), product of:
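
The numbers in the two Explain outputs quoted above can be checked by hand, which shows how much of the difference comes from the payload factor alone. The sketch below is only arithmetic on the quoted values and does not call Lucene. It assumes that tf is the square root of phraseFreq (which matches the quoted tf lines) and that everything except the payload factor is the same in the class B run, which the truncated output does not fully show. The class and method names are hypothetical.

```java
public class ExplainArithmetic {

    /** Score of one field: queryWeight (idf * queryNorm) times fieldWeight. */
    static double fieldScore(double phraseFreq, double payloadFactor,
                             double idf, double queryNorm, double fieldNorm) {
        double tf = Math.sqrt(phraseFreq);                 // 0.8164966 for phraseFreq=0.6666667
        double queryWeight = idf * queryNorm;
        double fieldWeight = tf * payloadFactor * idf * fieldNorm;
        return queryWeight * fieldWeight;
    }

    /** Sum over both fields, using the values quoted in the class A explanation. */
    static double totalScore(double payloadFactor) {
        double queryNorm = 0.048413463;
        double abstractText = fieldScore(0.6666667, payloadFactor, 14.045828, queryNorm, 0.109375);
        double articleTitle = fieldScore(0.33333334, payloadFactor, 15.144659, queryNorm, 0.4375);
        return abstractText + articleTitle;
    }

    public static void main(String[] args) {
        // Payload factor 3.0 (class A run) versus 1.0 (assumed for the class B run).
        System.out.printf("payload 3.0: %.4f%n", totalScore(3.0));
        System.out.printf("payload 1.0: %.4f%n", totalScore(1.0));
    }
}
```

With a payload factor of 3.0 this comes out at roughly 10.973, and with 1.0 at roughly 3.658, in line with the 10.97332 and 3.657773 totals in the two explanations. The class B score is exactly one third of the class A score, so the payload factor accounts for the whole difference.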
shyama 2012-02-03, 09:13
Peter Keegan 2012-02-03, 13:35
shyama 2012-02-03, 16:50
Peter Keegan 2012-02-03, 17:28