Lucene, mail # dev - Re: Diversifying Search Results - Custom Collector


Karthick Duraisamy Sounda... 2012-08-21, 13:31
Hi Lance,
Thanks for your response. Wouldn't randomizing affect
relevancy? Maybe I should explain my problem better:

Let's say there are 1000 matches for a search of "Sofas". For
the sake of simplicity, let's assume all of these 1000 matches (1000 sofas)
have the same merchant. Then the solution you and Tanguy suggested,
randomizing the result order, would be perfect. However, my case is
different: out of these 1000 matches, there are about 100
unique manufacturers, and each of them makes 10 sofas. So whenever
one sofa from a particular manufacturer is displayed, the other sofas from
that manufacturer appear close to it as well. Please note that the
problem is not relevancy, since all of these sofas are very relevant. They
simply appear close together in the result set because they are described
in more or less the same way, with the same words.

That's why I want a sorting policy along the lines of *"Find
the highest-scoring document for each manufacturer in the current
result set and place those documents ahead of the rest."* The idea
is to display one product from each unique manufacturer first. The number
of unique manufacturers to show before reverting to the normal ordering
can be determined relative to the total number of unique manufacturers.
For example, if there are 90 unique manufacturers, display products from 45
of them (approximately 50%) first, before displaying the rest of the products.
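A minimal sketch of this policy in plain Python, as a post-processing step rather than a Lucene Collector. It assumes the hits for the query have already been retrieved and sorted by descending score, and that each hit carries a manufacturer value; the names `Hit`, `diversify`, and `fraction` are hypothetical. The best hit of each of the top `fraction` of manufacturers (ranked by their best score) moves to the front, and every other hit keeps its original relevance order.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    doc_id: str          # assumed unique per document
    manufacturer: str
    score: float


def diversify(hits, fraction=0.5):
    """Promote one hit per manufacturer, then fall back to score order.

    hits: list of Hit, assumed already sorted by descending score.
    fraction: share of unique manufacturers whose best hit is promoted.
    """
    # Because the input is in descending score order, the first hit seen
    # for a manufacturer is that manufacturer's highest-scoring hit.
    best_per_maker = {}
    for hit in hits:
        best_per_maker.setdefault(hit.manufacturer, hit)

    # How many manufacturers get a promoted slot (e.g. 45 of 90 at 0.5).
    n_promoted = math.ceil(len(best_per_maker) * fraction)

    # Dicts keep insertion order, so these are ranked by each maker's best score.
    promoted = list(best_per_maker.values())[:n_promoted]
    promoted_ids = {h.doc_id for h in promoted}

    # All remaining hits follow in their original relevance order.
    rest = [h for h in hits if h.doc_id not in promoted_ids]
    return promoted + rest


if __name__ == "__main__":
    hits = [
        Hit("a1", "A", 9.0), Hit("a2", "A", 8.5), Hit("a3", "A", 8.0),
        Hit("b1", "B", 7.5), Hit("b2", "B", 7.0), Hit("c1", "C", 6.0),
    ]
    print([h.doc_id for h in diversify(hits, fraction=1.0)])
```

Because the reordering is a pure function of the retrieved list, slicing the returned list into pages gives a stable ordering as long as the underlying result list does not change. Rounding up with `math.ceil` is just one reading of "approximately 50%".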

Does this make sense?

Thanks,
Karthick

On Mon, Aug 20, 2012 at 5:05 PM, Lance Norskog <[EMAIL PROTECTED]> wrote:

> If you do the same search twice in a row, the second search takes < 3
> ms. Try finding your base result set and then augmenting it with a
> second search within the first result set.
> You can sort from a function call. Sorting is multi-level, so you can
> make one of the levels random.
>
> Does this app have to support paging the search list? If so, do you
> plan to do a second search for the next 5 results? Complex result
> shuffling can make this hard. Also, I don't know exactly how random
> sorting works, in particular whether it generates the same random order
> twice. If it doesn't, paging would be impossible.
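The idea of a random level in a multi-level sort, together with the paging concern, can be illustrated in plain Python. This is a sketch, not Solr's or Lucene's random sort: it assumes a pseudo-random key derived by hashing a caller-chosen seed with the document id, and the names `seeded_random_key`, `sort_with_random_tiebreak`, and `page` are hypothetical. Because the random key is the second sort level, it only reorders hits whose scores are equal, so relevance order is preserved; because the key is a pure function of the seed, reusing one seed across page requests yields one consistent ordering.

```python
import hashlib


def seeded_random_key(doc_id: str, seed: str) -> int:
    """Deterministic pseudo-random integer for a (seed, doc_id) pair."""
    digest = hashlib.sha256(f"{seed}:{doc_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def sort_with_random_tiebreak(hits, seed):
    """Two-level sort: score descending first, then the seeded random key.

    hits: list of (doc_id, score) tuples.
    """
    return sorted(hits, key=lambda h: (-h[1], seeded_random_key(h[0], seed)))


def page(hits, seed, start, rows):
    """Return one page of the ordering; the same seed gives the same ordering."""
    return sort_with_random_tiebreak(hits, seed)[start:start + rows]


if __name__ == "__main__":
    hits = [("d1", 2.0), ("d2", 1.0), ("d3", 1.0), ("d4", 1.0), ("d5", 0.5)]
    # Pages 1 and 2 line up because both calls use the same seed.
    print(page(hits, "session-42", 0, 3))
    print(page(hits, "session-42", 3, 3))
```

One consequence for the sofa case: with floating-point relevance scores, exact ties may be rare, so a random secondary level may reorder very little on its own.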
>
> On Mon, Aug 20, 2012 at 1:52 PM, Karthick Duraisamy Soundararaj
> <[EMAIL PROTECTED]> wrote:
> > Hi Mikhail,
> > You are correct. "[+] show 6 result.." will work, but it
> > wouldn't suit my requirements. This is a question of user experience,
> > right?
> >
> > Imagine the product manager comes to you and says, "I don't want to see
> > '[+] show 6 result..'. I want the results to be diverse, but they should be
> > shown like any other search results."
> >
> > I think grouping does this with two-pass collection. In the first pass, it
> > figures out all the groups, and in the second pass, it collects the results
> > into these groups.
> >
> >
> > Thanks,
> > Karthick
> >
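A simplified, in-memory sketch of two-pass grouped collection, with hypothetical function names and hits given as `(doc_id, group, score)` tuples. Note that Lucene's grouping module differs slightly from "figures out all the groups": its first pass (`FirstPassGroupingCollector`) keeps only the top N groups under the group sort, and the second pass collects the top documents within just those groups. The per-group cap in the second pass is what produces the collapsed "[+] show N results" style of output discussed in this thread.

```python
import heapq
from collections import defaultdict


def first_pass_top_groups(hits, top_n_groups):
    """Pass 1: rank groups by their best score and keep the top N.

    hits: iterable of (doc_id, group, score).
    """
    best = {}
    for _doc_id, group, score in hits:
        if group not in best or score > best[group]:
            best[group] = score
    return heapq.nlargest(top_n_groups, best, key=best.get)


def second_pass_collect(hits, groups, docs_per_group):
    """Pass 2: re-scan the hits, keeping the top K docs of each selected group."""
    wanted = set(groups)
    buckets = defaultdict(list)
    for doc_id, group, score in hits:
        if group in wanted:
            buckets[group].append((score, doc_id))
    return {
        g: [doc_id for _score, doc_id in heapq.nlargest(docs_per_group, buckets[g])]
        for g in groups
    }


if __name__ == "__main__":
    hits = [("a1", "A", 9.0), ("b1", "B", 8.0), ("a2", "A", 7.0),
            ("c1", "C", 6.0), ("b2", "B", 5.0), ("a3", "A", 4.0)]
    top = first_pass_top_groups(hits, 2)
    print(second_pass_collect(hits, top, 2))
```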
> > On Mon, Aug 20, 2012 at 3:24 PM, Mikhail Khludnev
> > <[EMAIL PROTECTED]> wrote:
> >>
> >> Hello,
> >>
> >> I don't believe your task can be solved by playing with scoring/collector
> >> or shuffling.
> >> For me it's clearly a Grouping use case (although I don't really know
> >> this feature well).
> >>
> >> > Grouping cannot solve the problem because I don't want to limit the
> >> > number of results shown based on the grouping field.
> >>
> >> I'm not really getting it. Why can't you set the limit to 11 and just show
> >> labels like "[+] show 6 result.." or, if you have 11, "[+] show more than
> >> 10 .."?
> >>
> >> If you have problems constructing the search result page, I can
> >> suggest submitting a search request with rows=0&facet.field=BRAND. Your
> >> algorithm can then choose the number of items needed for every brand and
> >> submit rows=X&fq=BRAND:Y, which gives you arbitrarily sized "groups".
> >>
> >> Will this work for you?
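Mikhail's facet-then-filter approach could be sketched as follows. The code only builds Solr query strings (no HTTP client is assumed); the `BRAND` field comes from the thread, while the function names and the round-robin allocation policy are hypothetical. Solr only computes `facet.field` counts when `facet=true` is also sent, so the first request includes it. The facet counts are assumed to have already been parsed from the first response into a `{brand: count}` dict.

```python
from urllib.parse import urlencode


def facet_request(query, field="BRAND"):
    """Step 1: fetch no documents, only per-value counts for the field."""
    return urlencode({"q": query, "rows": 0, "facet": "true", "facet.field": field})


def allocate_rows(facet_counts, total_rows):
    """Hypothetical policy: hand out rows one brand at a time (round-robin),
    never asking a brand for more documents than its facet count."""
    alloc = {brand: 0 for brand in facet_counts}
    remaining = total_rows
    while remaining > 0:
        progressed = False
        for brand, count in facet_counts.items():
            if remaining == 0:
                break
            if alloc[brand] < count:
                alloc[brand] += 1
                remaining -= 1
                progressed = True
        if not progressed:
            break  # every brand is exhausted
    return {brand: n for brand, n in alloc.items() if n > 0}


def per_brand_requests(query, allocation, field="BRAND"):
    """Step 2: one rows=X&fq=BRAND:Y request per brand.
    Brand values are wrapped in quotes; embedded quotes are not escaped."""
    return [
        urlencode({"q": query, "rows": n, "fq": f'{field}:"{brand}"'})
        for brand, n in allocation.items()
    ]


if __name__ == "__main__":
    print(facet_request("sofas"))
    alloc = allocate_rows({"Acme": 10, "Beta": 1, "Gamma": 3}, 6)
    for qs in per_brand_requests("sofas", alloc):
        print(qs)
```

The trade-off is one extra request per brand shown, and merging the per-brand responses into a single page is left to the application.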
> >>
> >>
> >> On Mon, Aug 20, 2012 at 8:28 PM, Karthick Duraisamy Soundararaj
> >> <[EMAIL PROTECTED]> wrote: