-
Re: Diversifying Search Results - Custom Collector
Tanguy Moal 2012-08-20, 16:01
Hello,

I don't know if that could help, but if I understood your issue, you have a lot of documents with the same or very close scores. Moreover, I think you get your matches in Merchant order (more or less) because they must have been indexed in that very same order, so Solr returns documents with the same score in insertion order (although there is no contract specifying this).

You could work around that issue by:

1/ Turning off tf/idf, because you're searching in documents with little text where only the match counts and frequencies obviously aren't helping.
2/ Adding a random number to each document at index time and boosting on that random value at query time. This will shuffle your results; that's probably the simplest thing to do.

Hope this helps,
Tanguy

2012/8/20 Karthick Duraisamy Soundararaj <[EMAIL PROTECTED]>

> Hello Mikhail,
> Thank you for the reply. In terms of user experience, I want to spread out the products from same brand farther from each other, *atleast* in the first 50-100 results we display. I am thinking about two different approaches as solution.
>
> 1. For first few results, display one top scoring product of a manufacturer (For a given field, display the top scoring results of the unique field values for the first N matches). This N could be either a percentage relative to total matches or a configurable absolute value.
> 2. Enforce a penalty on the score for the results that have duplicate field values. The penalty can be enforced such a way that, the results with higher scores will not be affected as against the ones with lower score.
>
> Both of the solutions can be implemented while sorting the documents with TopFieldCollector / TopScoreDocCollector.
>
> Does this answer your question? Please let me know if you have any more questions.
>
> Thanks,
> Karthick
>
> On Mon, Aug 20, 2012 at 3:26 AM, Mikhail Khludnev <[EMAIL PROTECTED]> wrote:
>
>> Hello,
>>
>> I've got the problem description below. Can you explain the expected user experience, and/or solution approach before diving into the algorithm design?
>>
>> Thanks
>>
>> On Sat, Aug 18, 2012 at 2:50 AM, Karthick Duraisamy Soundararaj <[EMAIL PROTECTED]> wrote:
>>
>>> My problem is that when there are a lot of documents representing products, products from same manufacturer seem to appear in close proximity in the results and therefore, it doesnt provide brand diversity. When you search for sofas, you get sofas from a manufacturer A dominating the first page while the sofas from manufacturer B dominating the second page, etc. The issue here is that a manufacturer tends to describes the different sofas he produces the same way and therefore there is a very little difference between the documents representing two sofas.
>>
>> --
>> Sincerely yours
>> Mikhail Khludnev
>> Tech Lead
>> Grid Dynamics
>> <http://www.griddynamics.com>
>> <[EMAIL PROTECTED]>
-
Re: Diversifying Search Results - Custom Collector
Lance Norskog 2012-08-20, 21:05
If you do the same search twice in a row, the second search takes < 3 ms. Try finding your base result set and then augmenting it with a second search within the first result set.
You can sort from a function call. Sorting is multi-level, so you can make one of the levels random.
Does this app have to support paging the search list? If so, do you plan to do a second search for the next 5 results? Complex results shuffling can make this hard. Also, I don't know exactly how random works, whether it generates the same random order twice. This would make paging impossible.
On Mon, Aug 20, 2012 at 1:52 PM, Karthick Duraisamy Soundararaj <[EMAIL PROTECTED]> wrote:
> Hi Mikhail,
> You are correct. "[+] show 6 result.." will work but it wouldn't suit my requirements. This is a question of user experience right?
>
> Imagine if the product manager comes to you and says I dont want to see "[+] show 6 result.." and I want the results to be diverse but should be showed like any other search results.
>
> I think grouping does this by two pass collection. First pass, it figures out all the groups and then in the second pass, it collects the results into these groups.
>
> Thanks,
> Karthick
>
> On Mon, Aug 20, 2012 at 3:24 PM, Mikhail Khludnev <[EMAIL PROTECTED]> wrote:
>>
>> Hello,
>>
>> I don't believe your task can be solved by playing with scoring/collector or shuffling.
>> For me it's absolutely Grouping usecase (despite I don't really know this feature well).
>>
>> > Grouping cannot solve the problem because I dont want to limit the
>> > number of results showed based on the grouping field.
>>
>> I'm not really getting it. why you can set limit to 11 and just show the labels like "[+] show 6 result.." or if you have 11 "[+] show more than 10 .."
>>
>> If you experience problem with constructing search result page, I can suggest submit search request with rows=0&facet.field=BRAND, then your algorithm can choose number of necessary items per every brand and submit rows=X&fq=BRAND:Y it gives you arbitrarily sizes for "groups".
>>
>> Will this work for you?
>>
>> On Mon, Aug 20, 2012 at 8:28 PM, Karthick Duraisamy Soundararaj <[EMAIL PROTECTED]> wrote:
>>>
>>> Tanguy,
>>> You idea is perfect for cases where there is a too many documents with 80-90% documents having same value for a particular field. As an example, your idea is ideal for, lets say we have 10 documents in total like this,
>>>
>>> doc1 : <merchantName> Kellog's </merchantName>
>>> doc2 : <merchantName> Kellog's </merchantName>
>>> doc3 : <merchantName> Kellog's </merchantName>
>>> doc4 : <merchantName> Kellog's </merchantName>
>>> doc5 : <merchantName> Kellog's </merchantName>
>>> doc6 : <merchantName> Kellog's </merchantName>
>>> doc7 : <merchantName> Kellog's </merchantName>
>>> doc8 : <merchantName> Nestle </merchantName>
>>> doc9 : <merchantName> Kellog's </merchantName>
>>> doc10 : <merchantName> Kellog's </merchantName>
>>>
>>> But I have
>>> doc1 : <merchantName> Maggi </merchantName>
>>> doc2 : <merchantName> Maggi </merchantName>
>>> doc3 : <merchantName> M&M's </merchantName>
>>> doc4 : <merchantName> M&M's </merchantName>
>>> doc5 : <merchantName> Hershey's </merchantName>
>>> doc6 : <merchantName> Hershey's </merchantName>
>>> doc7 : <merchantName> Nestle </merchantName>
>>> doc8 : <merchantName> Nestle </merchantName>
>>> doc9 : <merchantName> Kellog's </merchantName>
>>> doc10 : <merchantName> Kellog's </merchantName>
>>>
>>> Thanks,
>>> Karthick
Lance Norskog [EMAIL PROTECTED]
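Mikhail's facet-based suggestion quoted above (first `rows=0&facet.field=BRAND`, then one `rows=X&fq=BRAND:Y` request per brand) could be sketched roughly as follows. This is only an illustration: the helper names, the round-robin allocation rule and the `BRAND` field default are assumptions, not something specified in the thread, and the code only builds query strings rather than talking to a Solr server.

```python
from urllib.parse import urlencode


def facet_request(query, brand_field="BRAND"):
    """Step 1: ask only for per-brand counts; rows=0 returns no documents."""
    return urlencode({
        "q": query,
        "rows": 0,
        "facet": "true",
        "facet.field": brand_field,
    })


def allocate_slots(brand_counts, page_size):
    """Hypothetical allocation rule: hand out one slot per brand in turn
    until the page is full or every brand has run out of matches."""
    slots = {brand: 0 for brand in brand_counts}
    remaining = page_size
    while remaining > 0:
        progressed = False
        for brand, available in brand_counts.items():
            if remaining == 0:
                break
            if slots[brand] < available:
                slots[brand] += 1
                remaining -= 1
                progressed = True
        if not progressed:  # all brands exhausted before the page filled up
            break
    return {brand: n for brand, n in slots.items() if n > 0}


def brand_request(query, brand, rows, brand_field="BRAND"):
    """Step 2: fetch `rows` documents restricted to a single brand."""
    return urlencode({
        "q": query,
        "rows": rows,
        "fq": f'{brand_field}:"{brand}"',
    })
```

The cost is one request per brand shown on the page, which is the same "multiple Solr calls" concern that comes up later in the thread.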
-
Re: Diversifying Search Results - Custom Collector
Karthick Duraisamy Sounda... 2012-08-21, 13:31
Hi Lance, Thanks for your response. Wouldn't randomizing affect relevancy? Maybe I should explain my problem better:
Let's say there are 1000 matches for a search of "Sofas". For the sake of simplicity, let's assume all of these 1000 matches (1000 sofas) have the same Merchant. Then the solution you and Tanguy suggest, randomizing the result order, would be perfect. However, my case is different. Out of these 1000 matches, there are about 100 unique manufacturers, and each of them makes 10 sofas. So whenever one sofa from a particular manufacturer is displayed, the other sofas from that manufacturer appear close to it as well. Please note that this is not a relevancy problem: the sofas are all very relevant, but because each manufacturer describes its sofas in more or less the same way with the same words, they score alike and end up close together in the result set.
That's why I want a sorting policy along the lines of *"Find the highest-scoring document for each manufacturer in the current result set and place them ahead of the rest. As you can see, the idea is to display one product from each unique manufacturer first"*. The number of unique manufacturers to show before the normal ordering resumes can be determined relative to the total number of unique manufacturers. For example, if there are 90 unique manufacturers, display products from 45 (approx. 50%) first before displaying the rest of the products.
Does this make sense?
Thanks, Karthick
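The policy described in this message can be illustrated with a small in-memory sketch. It assumes the full, score-sorted result list is already available as a list of dicts; the `diversify` name, the `merchantName` key and the `fraction` parameter are hypothetical, and this is not how a Lucene collector would be written.

```python
def diversify(results, brand_key="merchantName", fraction=0.5):
    """Reorder `results` (assumed sorted by descending score) so that the
    best document of the top `fraction` of manufacturers comes first,
    followed by everything else in its original order."""
    seen = set()
    leaders = []  # first (highest-scoring) document of each manufacturer
    for doc in results:
        brand = doc[brand_key]
        if brand not in seen:
            seen.add(brand)
            leaders.append(doc)

    # e.g. 90 unique manufacturers and fraction=0.5 promotes 45 documents
    promoted = leaders[:int(len(leaders) * fraction)]
    promoted_ids = {id(doc) for doc in promoted}
    remainder = [doc for doc in results if id(doc) not in promoted_ids]
    return promoted + remainder
```

Because the reordering needs the whole result set, paging would have to be done on the reordered list rather than with Solr's `start`/`rows`, which is the difficulty discussed in the following messages.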
-
Re: Diversifying Search Results - Custom Collector
Tanguy Moal 2012-08-21, 14:33
Hello Karthick,
2012/8/21 Karthick Duraisamy Soundararaj <[EMAIL PROTECTED]>
> *"Find all the highest scoring document for each manufacuturer in the current result set and place them ahead of the rest. Here as you can see, the idea is to display one product from each unique manufacturer first"*.
> Now to decide how many unique manufacturer to show before the normal ordering can be determined relative to the total number of unique manufacturers. Like for example, if there are 90 unique manufacturers, display products from 45 (approx 50%) first before displaying the rest of the products.
That's exactly what grouping will do. At least for the first sentence. You can ask for many items in each group, display only the first and store the others "somewhere", for later use. When you reach your "merchant representation threshold" (say 50% of total number of groups) then you can start picking the items you stored "somewhere" to display them at randomly chosen positions. That won't help pagination, though.
Could that help you ?
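As a rough sketch of the grouping approach just described: request several documents per merchant, show the first of each group, and keep the others aside. The `group`, `group.field` and `group.limit` parameters are Solr's result-grouping parameters; the helper names, the defaults and the assumption that the response is parsed JSON in Solr's default grouped format are illustrative.

```python
from urllib.parse import urlencode


def grouped_request(query, field="merchantName", per_group=10, groups=100):
    """Build a grouped query; with grouping on, `rows` limits the number
    of groups and `group.limit` the documents returned per group."""
    return urlencode({
        "q": query,
        "group": "true",
        "group.field": field,
        "group.limit": per_group,
        "rows": groups,
    })


def split_groups(grouped_response, field="merchantName"):
    """Return (heads, stored): the top document of every group, and the
    remaining documents kept "somewhere" for later placement."""
    heads, stored = [], []
    for group in grouped_response["grouped"][field]["groups"]:
        docs = group["doclist"]["docs"]
        if docs:
            heads.append(docs[0])
            stored.extend(docs[1:])
    return heads, stored
```

How the stored documents are then placed (random positions, as suggested, or by score) is left to the application.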
-
Re: Diversifying Search Results - Custom Collector
Karthick Duraisamy Sounda... 2012-08-21, 15:03
Hello Tanguy, I need pagination. The problem with your approach is that, to achieve pagination, the sorting has to be done at the application level rather than at the Solr level, which I think would become messy. Do you see a way around this?
Thanks, Karthick
-
Re: Diversifying Search Results - Custom Collector
Tanguy Moal 2012-08-21, 15:32
Sorry then, my approach really disables pagination jumps. You're left with the 'next' button only, or an "infinite-scroll" type of pagination, which may not be what you wanted to do...
Did you try disabling tf/idf and using a random field as a secondary sort? I'm pretty sure it will give you the best results for the least effort. If you like the results and need replayable result sets, remember to store the name you gave to your dynamic random field so you can reuse it later on.
-- Tanguy
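A minimal sketch of the "random field as a secondary sort" idea, assuming the schema declares a dynamic field such as `random_*` backed by `solr.RandomSortField` (the Solr example schema has one, as far as I know). The `page_request` helper and the `seed` naming scheme are hypothetical; the point is only that reusing the same field name on every page keeps the tie-breaking order stable.

```python
from urllib.parse import urlencode


def page_request(query, seed, page, page_size=10):
    """Sort by score first; documents with equal scores are ordered by a
    pseudo-random field whose name embeds a per-session seed."""
    random_field = f"random_{seed}"  # assumed dynamic field pattern random_*
    return urlencode({
        "q": query,
        "sort": f"score desc, {random_field} asc",
        "start": page * page_size,
        "rows": page_size,
    })
```

The random field only decides the order among documents whose scores are exactly tied, which is why it is paired with turning off tf/idf so that ties become common. My understanding is that the random order may also change when the index changes, so pages could still shift after a commit.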
-
Re: Diversifying Search Results - Custom Collector
Karthick Duraisamy Sounda... 2012-08-21, 16:04
On Tue, Aug 21, 2012 at 11:32 AM, Tanguy Moal <[EMAIL PROTECTED]> wrote:
> Sorry then, my approach really disables pagination jumps. You're left with the 'next' button only, or an "infinite-scroll" type of pagination, which may not be what you wanted to do...

You are right.

> Did you try disabling tf/idf and using random field as a secondary sort ? I'm pretty sure it will give you the best results with best efforts.

I was a little nervous about turning off idf, as I was concerned it might affect relevancy. Considering that idf favors documents matching terms that are rare across the index over documents matching common terms, turning off idf might make sense in an ecommerce application. What do you think?

But then, I don't get the idea of a random field for secondary sort. A secondary sort applies only where the scores/primary sort values are tied, right? So I am not quite sure how it would fit in here.
-
Re: Diversifying Search Results - Custom Collector
Mikhail Khludnev 2012-08-22, 06:13
One more idea: the first search is grouped by brand with limit 1, which gives you the most relevant product of each brand for this particular search. Then a second search boosts the top products from the first search result, e.g. q=original:query ID:(44,56,78,99,22)^1000

--
Sincerely yours
Mikhail Khludnev
Tech Lead
Grid Dynamics
<http://www.griddynamics.com>
<[EMAIL PROTECTED]>
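The second step of this two-pass idea, building the boosted query from the IDs returned by the grouped search, might look like the sketch below. The `boosted_query` helper is hypothetical. It separates the IDs with spaces, since the standard Lucene query syntax separates terms with whitespace rather than commas, and it assumes the default operator is OR so that the boosted clause stays optional.

```python
def boosted_query(original_query, top_ids, id_field="ID", boost=1000):
    """Append an optional, heavily boosted clause matching the top
    document of each brand found by the first (grouped) search."""
    if not top_ids:
        return original_query  # nothing to promote
    id_clause = " ".join(str(doc_id) for doc_id in top_ids)
    return f"({original_query}) {id_field}:({id_clause})^{boost}"
```

Since the promoted documents get their order from ordinary scoring, the second request can be paged with `start` and `rows` as usual.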
-
Re: Diversifying Search Results - Custom Collector
Karthick Duraisamy Sounda... 2012-08-22, 11:37
Hey Mikhail,

Yes, that's a very good idea and a certain solution for my problem :). But two Solr calls for each search might be a concern. Maybe I should tweak https://issues.apache.org/jira/browse/SOLR-1093 a little so that it takes the grouping results and boosts them.

The other way, I think, is to come up with a new field type with a custom comparator and a new collector.
-
Re: Diversifying Search Results - Custom Collector
Mikhail Khludnev 2012-08-22, 14:27
SOLR-1093, which is a little bit vague itself, doesn't help in implementing my approach, because the second query is built according to the results of the first one.

--
Sincerely yours
Mikhail Khludnev
Tech Lead
Grid Dynamics
<http://www.griddynamics.com>
<[EMAIL PROTECTED]>
-
Re: Diversifying Search Results - Custom Collector
Karthick Duraisamy Sounda... 2012-08-22, 18:21
Yeah, SOLR-1093 itself is a little vague, but the core idea is to run multiple queries in one request. The patch is an implementation that runs the sub-queries serially.