-
Relevancy Practices
Grant Ingersoll 2010-04-29, 14:14

I'm putting on a talk at Lucene Eurocon (http://lucene-eurocon.org/sessions-track1-day2.html#1) on "Practical Relevance" and I'm curious as to what people put in practice for testing and improving relevance. I have my own inclinations, but I don't want to muddy the water just yet. So, if you have a few moments, I'd love to hear responses to the following questions.

What worked?
What didn't work?
What didn't you understand about it?
What tools did you use?
What tools did you wish you had either for debugging relevance or "fixing" it?
How much time did you spend on it?
How did you avoid over/under tuning?
What stage of development/testing/production did you decide to do relevance tuning? Was that timing planned or not?

Thanks,
Grant
-
Re: Relevancy Practices
Mark Bennett 2010-04-29, 14:59

Hi Grant,

You're welcome to use any of my slides (Dave's got them), with attribution of course.

BUT....

Have you considered a section something like "why the hell do you think Relevancy tweaking is gonna save you!?!?"

Basically that, as a corpus grows exponentially, so do results list sizes, so ALL relevancy tweaks will eventually fail. And FACETS (or other navigators) are the answer. I've got slides on that as well.

Of course relevancy matters.... but it's only ONE of perhaps a three pronged approach:
1: Organic Relevancy and top query suggestions
2: Results list Navigators, the best the system can support, and
3: Data quality (spidering, METADATA quality, source weighting, etc)

Mark

--
Mark Bennett / New Idea Engineering, Inc. / [EMAIL PROTECTED]
Direct: 408-733-0387 / Main: 866-IDEA-ENG / Cell: 408-829-6513
-
Re: Relevancy Practices
Avi Rosenschein 2010-04-30, 12:00

On Thu, Apr 29, 2010 at 5:59 PM, Mark Bennett <[EMAIL PROTECTED]> wrote:

> Have you considered a section something like "why the hell do you think
> Relevancy tweaking is gonna save you!?!?"
> Basically that, as a corpus grows exponentially, so do results list sizes,
> so ALL relevancy tweaks will eventually fail. And FACETS (or other
> navigators) are the answer. I've got slides on that as well.

The idea is to get the relevancy to fail on a smaller and smaller percent of the queries, as the corpus grows larger. Facets can definitely help, but they don't solve the basic problem of search, when there is no facet for the particular way the user is looking for something. The strength of search is that it can help the user to find things even when other forms of navigation fail.

> Of course relevancy matters.... but it's only ONE of perhaps a three pronged
> approach:
> 1: Organic Relevancy and top query suggestions
> 2: Results list Navigators, the best the system can support, and
> 3: Data quality (spidering, METADATA quality, source weighting, etc)

I would prefer to say that data quality can directly contribute to relevance (besides being important for other reasons as well). Basically, search relevancy is a combination of quality of data + quality of algorithm. In general, they are both important, and data even has the potential to be more important than algorithm, if you structure it right.

Also, tuning the algorithms to the users can be very important. For instance, we have found that in a basic search functionality, the default query parser operator OR works very well. But on a page for advanced users, who want to very precisely tune their search results, a default of AND works better.

Regards,
-- Avi
-
Re: Relevancy Practices
Grant Ingersoll 2010-04-30, 15:21

On Apr 30, 2010, at 8:00 AM, Avi Rosenschein wrote:

> Also, tuning the algorithms to the users can be very important. For
> instance, we have found that in a basic search functionality, the default
> query parser operator OR works very well. But on a page for advanced users,
> who want to very precisely tune their search results, a default of AND works
> better.

Avi,

Great example. Can you elaborate on how you arrived at this conclusion? What things did you do to determine it was a problem?

-Grant
-
Re: Relevancy Practices
MitchK 2010-04-30, 21:16

I found your thread on the Solr-user list. However, it seems your topic belongs more to Lucene in general? I am copying my posting from there, so that everything is accessible in one thread.

I think the problems one has to solve depend on the use cases one has to deal with. It makes a difference whether I have many documents that are very similar but have different contexts, where I have to determine which query applies to which context, with what probability, for which document, or whether I have lots of editorially managed documents with relatively clear contexts, because they offer human-created tags etc.

I haven't had much experience with Solr (and none in a production environment). However, the experience I do have shows that splitting a document's context into parts that are as small as possible is always a good idea. I don't mean splitting in the sense of making the parts of a document smaller. I mean it in the sense of making it easier to decide which part of a document is more important than another.

E.g.: I have a social network and every user is able to create his or her own blog. As a corporation I want to make them all searchable. It would be beneficial for high-quality search if I were able to extract the introduction and the category (maybe added by the author). Following from this: if that is not done by people, or not done well enough, then I need to do it algorithmically.

E.g.: If I have a dictionary of person names, then I could use the keepWordFilter to create a field I can facet *and* boost on. Let's say the user writes about Paris Hilton, Barack Obama or any other well-known person; then I can extract their names from the content in an easy way. Of course this could be done better, but that's not the point here. If I search for "Obama's speech", all documents with "Obama" could get a boost. The difference from a solution without this keepWordFilter feature is that, without it, Solr does not know that the most important word in this query is "Obama".

This is only a short summary of some ideas on how one can improve relevancy with several features that Solr offers out of the box. Some of them could be improved with external NLP tools.

My biggest problem with relevancy is that I can't work, out of the box, with metadata computed on the fly or every hour (okay, you mentioned in the discussion on the dev list that it may be possible; however, I answered that the feature you talked about is not well documented, so I don't know whether it fits my needs or how to use it).

How to avoid over- or under-tuning? Easy: test every change I make to the scoring factors against a lot of queries. If it looks really good in 9 of 10 cases, then the 10th case is either a really bad query or could be solved with a facet or... there are a lot of ideas for how to solve this. What I really want to say is: test as much as you can and try to understand what your changes really mean. (For example, I can boost the title of a document with a value of 1,000 while every other field has a boost value between 1 and 10. I am relatively sure that this meets the needs of some queries but works catastrophically for the rest.)

It really helps to understand how Lucene's similarity works and what those factors mean in practice for your existing data. Maybe you need to change the similarity because you don't want the length of a document to influence its score.

Just some thoughts. I don't think I am telling you much that is new; however, if you have any questions or want to know more about this or that, please ask. Unfortunately I can't go to the ApacheCon, but hopefully this helps you give a good presentation.

Kind regards
- Mitch

--
View this message in context: http://lucene.472066.n3.nabble.com/Relevancy-Practices-tp765363p768902.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.
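
As a rough illustration of the name-field idea above, here is a minimal Python sketch rather than actual Solr configuration. The name dictionary, the `names_field` and `score` helpers, and the boost value of 3.0 are all hypothetical, and the scoring is a plain term count, not Lucene's similarity.

```python
# Hypothetical dictionary of person names; in Solr this role would be
# played by the word list given to the keep-word filter.
KNOWN_NAMES = {"obama", "hilton"}


def tokens(text):
    """Lowercase, split on whitespace, drop punctuation and a trailing 's."""
    out = []
    for raw in text.lower().split():
        tok = raw.strip(".,!?\"'")
        if tok.endswith("'s"):
            tok = tok[:-2]
        if tok:
            out.append(tok)
    return out


def names_field(text):
    """Contents of the derived 'names' field: only tokens found in the dictionary."""
    return {t for t in tokens(text) if t in KNOWN_NAMES}


def score(query, doc, name_boost=3.0):
    """One point per matching term, plus name_boost for each match in the names field."""
    q = set(tokens(query))
    base = len(q & set(tokens(doc)))
    boosted = name_boost * len(q & names_field(doc))
    return base + boosted


docs = [
    "A speech about the economy was given yesterday",
    "Obama spoke to reporters about the economy",
]
query = "Obama's speech"
ranked = sorted(docs, key=lambda d: score(query, d), reverse=True)
```

With the boost set to 0 the two documents tie, which is the situation described above where the engine has no way of knowing that "Obama" is the most important word in the query.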
-
Re: Relevancy Practices
Avi Rosenschein 2010-05-02, 12:50

On 4/30/10, Grant Ingersoll <[EMAIL PROTECTED]> wrote:

> Great example. Can you elaborate on how you arrived at this conclusion?
> What things did you do to determine it was a problem?

Hi Grant,

Sure. On http://wiki.answers.com/, we use search in a variety of places and ways.

In the basic search box (what you get if you look stuff up in the main Ask box on the home page), we generally want the relevancy matching to be pretty fuzzy. For example, if the user looked up "Where can you see photos of the Aurora Borealis effect?" I would still want to show them "Where can you see photos of the Aurora Borealis?" as a match.

However, the advanced search page, http://wiki.answers.com/Q/Special:Search, is used by advanced users to filter questions by various facets and searches, and to them it is important for the filter to filter out non-matches, since they use it as a working page. For example, if they want to do a search for "Harry Potter" and classify all results into the "Harry Potter" category, it is important that not every match for "Harry" is returned.

-- Avi
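
The two behaviours described here can be sketched with a toy matcher. This is plain Python, not the answers.com implementation and not Lucene code; the `match` helper and the three sample documents are made up for illustration. In Lucene's classic QueryParser the corresponding switch is `setDefaultOperator`.

```python
PUNCT = "?.,!\"'"


def tokenize(text):
    """Lowercase, split on whitespace and strip basic punctuation."""
    stripped = (t.strip(PUNCT).lower() for t in text.split())
    return [t for t in stripped if t]


def match(query, doc, default_operator="OR"):
    """Return True if doc matches query under the given default operator."""
    q_terms = set(tokenize(query))
    d_terms = set(tokenize(doc))
    if default_operator == "AND":
        return q_terms <= d_terms   # every query term must be present
    return bool(q_terms & d_terms)  # any single query term is enough


docs = [
    "Where can you see photos of the Aurora Borealis?",
    "Who wrote Harry Potter?",
    "Is Prince Harry married?",
]

# Natural-language question from the main search box: OR still finds the
# near-duplicate question, AND fails because of the extra word "effect".
natural_query = "Where can you see photos of the Aurora Borealis effect?"
or_hits = [d for d in docs if match(natural_query, d, "OR")]
and_hits = [d for d in docs if match(natural_query, d, "AND")]

# Filter-style query from the advanced page: AND keeps out the stray "Harry".
filter_query = "Harry Potter"
strict_hits = [d for d in docs if match(filter_query, d, "AND")]
loose_hits = [d for d in docs if match(filter_query, d, "OR")]
```

A real engine would rank the OR results rather than just return them, so the near-duplicate question would normally come out on top; the sketch only shows which documents are eligible at all.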
-
Re: Relevancy Practices
Grant Ingersoll 2010-05-05, 14:08

On May 2, 2010, at 5:50 AM, Avi Rosenschein wrote:

> In the basic search box (what you get if you look stuff up in the main
> Ask box on the home page), we generally want the relevancy matching to
> be pretty fuzzy. For example, if the user looked up "Where can you see
> photos of the Aurora Borealis effect?" I would still want to show them
> "Where can you see photos of the Aurora Borealis?" as a match.
>
> However, the advanced search page,
> http://wiki.answers.com/Q/Special:Search, is used by advanced users to
> filter questions by various facets and searches, and to them it is
> important for the filter to filter out non-matches, since they use it
> as a working page. For example, if they want to do a search for "Harry
> Potter" and classify all results into the "Harry Potter" category, it
> is important that not every match for "Harry" is returned.

I'm curious, Avi, if you can share how you came to these conclusions? For instance, did you have any qualitative evidence that "fuzzy" was better for the main page? Or was it a "I know it when I see it" kind of thing.
-
Re: Relevancy Practices
Avi Rosenschein 2010-05-05, 16:40

On Wed, May 5, 2010 at 5:08 PM, Grant Ingersoll <[EMAIL PROTECTED]> wrote:

> I'm curious, Avi, if you can share how you came to these conclusions? For
> instance, did you have any qualitative evidence that "fuzzy" was better for
> the main page? Or was it a "I know it when I see it" kind of thing.

I guess it was an "I know it when I see it" kind of thing. But it is supported by evidence from our testing team and direct feedback from users.

I guess one could say that the difference is less in level of user sophistication (though that is part of it), and more in user expectation when using different input methods of search. Our home page encourages asking questions in natural language, and therefore search based on that query is going to need to be "fuzzier" than a strict match of all the terms.

-- Avi
-
Re: Relevancy Practices
Ivan Provalov 2010-05-04, 00:41

Grant,

We are currently working on a relevancy improvement project. We took IBM's paper from TREC 2007 and followed the approaches it described to improve Lucene's relevance. It also gave us some idea of Lucene’s out-of-the-box precision performance (MAP). In addition, we used some of the best practices described in the TREC book (Voorhees 2005, MIT). We also looked into the probabilistic scoring model (BM25).

We started by comparing “vanilla” Lucene to our Lucene-based product’s performance. We obtained the collections and the judgments from past TREC evaluations that were close to the genre of the content we store. We then proceeded to study how different tunings affected the scores. We used Lucene's benchmarking module to run against the TREC data. Even though there were a few issues related to old TREC document/topic formats along the way, this benchmarking tool was altogether great in helping find the MAP and measure where we were at.

Then we applied the Sweet Spot similarity, Pivot Point document length normalization (Lnb/Ltc), and BM25 scoring algorithms. After applying these different scoring mechanism changes and other techniques (different stemmers, query expansion), we saw some improvements. We then compared this to our current production system and started tuning it as well.

Our second goal here was to include the relevance measurement in the continuous integration tests that run nightly. The thought here is that if one of the system’s changes inadvertently affected the scoring, we would find out right away. This second phase also helped us discover hidden bugs in our production system.

In addition to the English-based analyzers, we also studied Chinese analyzers and compared the results with the English collection runs. We used TREC data for that.

Some observations:

1. Even though the Vector Space model with Boolean query (OR) gives good MAP scores, in some products the large number of returned results makes the product less usable. So, defaulting to the AND operator may be a better option, as was mentioned earlier in this thread.
2. This TREC-based evaluation is just one of many tools to use. For example, user feedback is still the most important evaluation one can do.
3. We will continue studying how different scoring mechanisms affect relevance quality before making a decision whether to switch from the default VSM. Some of our concerns are over-tuning and performance testing.
4. The Lucene user community has been very helpful. Robert Muir, Joaquin Iglesias, and others helped with applying the scoring algorithms and provided great suggestions.
5. Some of the tools we use constantly: Lucene’s query Explanation and Luke.

Thanks,

Ivan Provalov
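
For readers who have not worked with MAP, the following is a small Python sketch of how mean average precision can be computed from ranked results and relevance judgments, plus a simple check of the kind a nightly build could run. It is not Lucene's benchmark module; the topic ids, document ids, baseline value and tolerance are invented for illustration.

```python
def average_precision(ranked_ids, relevant_ids):
    """Mean of precision@k over the ranks k at which a relevant doc appears."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank
    # Dividing by the number of relevant docs penalises relevant docs never returned.
    return precision_sum / len(relevant)


def mean_average_precision(runs, judgments):
    """runs: topic -> ranked doc ids; judgments: topic -> set of relevant doc ids."""
    scores = [average_precision(runs.get(t, []), rel) for t, rel in judgments.items()]
    return sum(scores) / len(scores) if scores else 0.0


def check_regression(current_map, baseline_map, tolerance=0.01):
    """True if the current MAP has not dropped more than tolerance below the baseline."""
    return current_map >= baseline_map - tolerance


# Hypothetical judgments and system output for two topics.
judgments = {"t1": {"d1", "d3"}, "t2": {"d7"}}
runs = {"t1": ["d1", "d2", "d3"], "t2": ["d5", "d7"]}

map_score = mean_average_precision(runs, judgments)
build_ok = check_regression(map_score, baseline_map=0.65)
```

For topic t1 the relevant documents appear at ranks 1 and 3, giving (1/1 + 2/3) / 2; for t2 the single relevant document is at rank 2, giving 1/2. The mean of the two is 2/3.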
-
Re: Relevancy Practices
Peter Keegan 2010-05-03, 18:08

We discovered very soon after going to production that Lucene's scores were often 'too precise'. For example, a page of 25 results may have several different score values, and all within 15% of each other, but to the end user all 25 results were equally relevant. Thus we wanted the secondary sort field to determine the order, instead. This required writing a custom score comparator to 'round' the scores. The same thing occurred for distance sorting. We also limit the effect of term frequency to help prevent spamming.

In comparison to Avi, we use 'AND' as the default operator for keyword queries and if no docs are found, the query is automatically retried with 'OR'. This improves precision a bit and only occurs if the user provides no operators.

Lucene's Explanation class has been invaluable in helping me to explain a particular sort order in many, many situations.

Most of our relevance tuning has occurred after deployment to production.

Peter
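
Two of the practices above, rounding scores so that a secondary field decides the order and retrying an empty AND query with OR, can be sketched in a few lines of Python. This is not the custom Lucene comparator described in the post; the bucket width, the `posted` date used as the secondary sort field, and the sample data are assumptions made for the example.

```python
from datetime import date


def bucket(score, step=1.0):
    """Collapse a raw score into a coarse bucket; the step width is arbitrary here."""
    return int(score // step)


def sort_hits(hits, step=1.0):
    """Sort by score bucket (highest first), then by posting date (newest first)."""
    return sorted(
        hits,
        key=lambda h: (-bucket(h["score"], step), -h["posted"].toordinal()),
    )


def search(terms, docs, operator):
    """Toy keyword match: docs maps a doc id to its set of words."""
    combine = all if operator == "AND" else any
    return [doc_id for doc_id, words in docs.items()
            if combine(t in words for t in terms)]


def search_with_fallback(terms, docs):
    """Try AND first; if nothing matches, retry the same terms with OR."""
    hits = search(terms, docs, "AND")
    return hits if hits else search(terms, docs, "OR")


hits = [
    {"id": "a", "score": 9.8, "posted": date(2010, 4, 1)},
    {"id": "b", "score": 9.5, "posted": date(2010, 5, 1)},
    {"id": "c", "score": 4.2, "posted": date(2010, 5, 3)},
]
order = [h["id"] for h in sort_hits(hits)]

docs = {"j1": {"nurse", "hospital"}, "j2": {"nurse", "clinic"}}
strict = search_with_fallback(["nurse", "hospital"], docs)
relaxed = search_with_fallback(["nurse", "school"], docs)
```

Hits "a" and "b" fall into the same bucket, so the newer "b" is listed first even though its raw score is slightly lower. Fixed-width buckets have edge effects (9.9 and 10.0 land in different buckets), so a production comparator would need a more careful rounding rule.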
-
Re: Relevancy Practices
Grant Ingersoll 2010-05-05, 14:10

Thanks, Peter.

Can you share what kind of evaluations you did to determine that the end user believed the results were equally relevant? How formal was that process?

-Grant

On May 3, 2010, at 11:08 AM, Peter Keegan wrote:

> We discovered very soon after going to production that Lucene's scores were
> often 'too precise'. For example, a page of 25 results may have several
> different score values, and all within 15% of each other, but to the end
> user all 25 results were equally relevant.
-
Re: Relevancy Practices
Peter Keegan 2010-05-05, 15:31

The feedback came directly from customers and customer facing support folks. Here is an example of a query with keywords: nurse, rn, nursing, hospital. The top 2 hits have scores of 26.86348 and 26.407215. To the customer, both results were equally relevant because all of their keywords were in the documents. For this application, the subtleties of TF/IDF are not appreciated by the end user ;-). Here are the Explanations for the scores (I hope they are readable):

Doc 1:

26.86348 sum of:
  26.86348 product of:
    33.57935 sum of:
      10.403484 weight(contents:nurse in 110320), product of:
        0.30413723 queryWeight(contents:nurse), product of:
          4.8375363 idf(contents: nurse=9554)
          0.06287027 queryNorm
        34.206547 fieldWeight(contents:nurse in 110320), product of:
          7.071068 btq, product of:
            1.4142135 tf(phraseFreq=2.0)
            5.0 scorePayload(...)
          4.8375363 idf(contents: nurse=9554)
          1.0 fieldNorm(field=contents, doc=110320)
      11.005695 weight(contents:rn in 110320), product of:
        0.31281596 queryWeight(contents:rn), product of:
          4.9755783 idf(contents: rn=8322)
          0.06287027 queryNorm
        35.18265 fieldWeight(contents:rn in 110320), product of:
          7.071068 btq, product of:
            1.4142135 tf(phraseFreq=3.0)
            5.0 scorePayload(...)
          4.9755783 idf(contents: rn=8322)
          1.0 fieldNorm(field=contents, doc=110320)
      10.136917 weight(contents:nursing in 110320), product of:
        0.3002155 queryWeight(contents:nursing), product of:
          4.7751584 idf(contents: nursing=10169)
          0.06287027 queryNorm
        33.76547 fieldWeight(contents:nursing in 110320), product of:
          7.071068 btq, product of:
            1.4142135 tf(phraseFreq=11.0)
            5.0 scorePayload(...)
          4.7751584 idf(contents: nursing=10169)
          1.0 fieldNorm(field=contents, doc=110320)
      2.0332527 weight(contents:hospital in 110320), product of:
        0.30064976 queryWeight(contents:hospital), product of:
          4.7820654 idf(contents: hospital=10099)
          0.06287027 queryNorm
        6.7628617 fieldWeight(contents:hospital in 110320), product of:
          1.4142135 btq, product of:
            1.4142135 tf(phraseFreq=3.0)
            1.0 scorePayload(...)
          4.7820654 idf(contents: hospital=10099)
          1.0 fieldNorm(field=contents, doc=110320)
    0.8 coord(4/5)

Doc 2:

26.407215 sum of:
  26.407215 product of:
    33.009018 sum of:
      10.403484 weight(contents:nurse in 271166), product of:
        0.30413723 queryWeight(contents:nurse), product of:
          4.8375363 idf(contents: nurse=9554)
          0.06287027 queryNorm
        34.206547 fieldWeight(contents:nurse in 271166), product of:
          7.071068 btq, product of:
            1.4142135 tf(phraseFreq=4.0)
            5.0 scorePayload(...)
          4.8375363 idf(contents: nurse=9554)
          1.0 fieldNorm(field=contents, doc=271166)
      11.005695 weight(contents:rn in 271166), product of:
        0.31281596 queryWeight(contents:rn), product of:
          4.9755783 idf(contents: rn=8322)
          0.06287027 queryNorm
        35.18265 fieldWeight(contents:rn in 271166), product of:
          7.071068 btq, product of:
            1.4142135 tf(phraseFreq=4.0)
            5.0 scorePayload(...)
          4.9755783 idf(contents: rn=8322)
          1.0 fieldNorm(field=contents, doc=271166)
      1.4335766 weight(contents:nursing in 271166), product of:
        0.3002155 queryWeight(contents:nursing), product of:
          4.7751584 idf(contents: nursing=10169)
          0.06287027 queryNorm
        4.7751584 fieldWeight(contents:nursing in 271166), product of:
          1.0 btq, product of:
            1.0 tf(phraseFreq=1.0)
            1.0 scorePayload(...)
          4.7751584 idf(contents: nursing=10169)
          1.0 fieldNorm(field=contents, doc=271166)
      10.166264 weight(contents:hospital in 271166), product of:
        0.30064976 queryWeight(contents:hospital), product of:
          4.7820654 idf(contents: hospital=10099)
          0.06287027 queryNorm
        33.81431 fieldWeight(contents:hospital in 271166), product of:
          7.071068 btq, product of:
            1.4142135 tf(phraseFreq=9.0)
            5.0 scorePayload(...)
          4.7820654 idf(contents: hospital=10099)
          1.0 fieldNorm(field=contents, doc=271166)
    0.8 coord(4/5)

Peter
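
The arithmetic in these two Explanations can be re-traced with a short Python sketch. It reads the structure off the output as printed: each term contributes queryWeight × fieldWeight, where queryWeight = idf × queryNorm and fieldWeight = btq × idf × fieldNorm, and the sum over terms is multiplied by coord. The `term_weight` and `score` helpers are hypothetical names, and "btq" is taken at face value as the label in the output; the numbers are copied from the Explanations.

```python
QUERY_NORM = 0.06287027
COORD = 4 / 5  # coord(4/5) in both Explanations


def term_weight(idf, btq, field_norm=1.0, query_norm=QUERY_NORM):
    """weight = queryWeight * fieldWeight, following the Explanation tree."""
    query_weight = idf * query_norm
    field_weight = btq * idf * field_norm
    return query_weight * field_weight


def score(terms):
    """Final score = coord * sum of the per-term weights."""
    return COORD * sum(term_weight(idf, btq) for idf, btq in terms.values())


# (idf, btq) per term; btq is tf * scorePayload as printed for each document.
doc1_terms = {
    "nurse": (4.8375363, 7.071068),
    "rn": (4.9755783, 7.071068),
    "nursing": (4.7751584, 7.071068),
    "hospital": (4.7820654, 1.4142135),
}
doc2_terms = {
    "nurse": (4.8375363, 7.071068),
    "rn": (4.9755783, 7.071068),
    "nursing": (4.7751584, 1.0),
    "hospital": (4.7820654, 7.071068),
}

s1 = score(doc1_terms)
s2 = score(doc2_terms)
relative_gap = (s1 - s2) / s1
```

Both documents match all four terms; the whole difference comes from which single term ("hospital" in Doc 1, "nursing" in Doc 2) carries the low btq value. The resulting gap between the two scores is under 2%, which fits the earlier remark that scores within a narrow band looked equally relevant to end users.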
-
RE: Relevancy Practices
Fornoville, Tom 2010-04-29, 14:38

We've only been using Lucene for a couple of weeks and we're still in the evaluation and R&D phase, but there's one single thing that has helped us out enormously with the relevance testing: a set of reference documents and queries.

We basically sat together with the business people and created a list of about 50 (fictional) documents, some queries and the order in which results should be returned. Once everyone agreed on this reference data we converted it to a set of unit tests.

So far this approach has helped us out big time, both in refining the business requirements and in tuning the scoring and relevancy in the search engine itself.

Cheers,
Tom
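
A reference set of this kind can be turned into automated checks along the following lines. This is only a sketch in Python: the three documents, the expected orderings and the `toy_search` ranker are invented stand-ins, and in practice the search function would call the real index.

```python
def toy_search(query, docs):
    """Stand-in ranker: score = number of query-term occurrences, ties broken by doc id."""
    terms = query.lower().split()
    scored = []
    for doc_id, text in docs.items():
        words = text.lower().split()
        score = sum(words.count(t) for t in terms)
        if score > 0:
            scored.append((-score, doc_id))
    return [doc_id for _, doc_id in sorted(scored)]


# Hypothetical reference data agreed with the business side.
REFERENCE_DOCS = {
    "d1": "annual report on hospital staffing",
    "d2": "hospital nurse staffing report for nurse managers",
    "d3": "cafeteria menu",
}
EXPECTED_ORDER = {
    "nurse staffing": ["d2", "d1"],
    "menu": ["d3"],
}


def check_reference_set(search_fn, docs, expected):
    """Return the queries whose result order differs from the agreed order."""
    return [q for q, want in expected.items() if search_fn(q, docs) != want]


failures = check_reference_set(toy_search, REFERENCE_DOCS, EXPECTED_ORDER)
```

An empty `failures` list means every reference query still returns the agreed order; any scoring change that breaks an expectation shows up as a named query rather than as a vague impression that results got worse.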