|
Nimrod Priell
2012-06-21, 20:01
Sean Owen
2012-06-21, 20:48
Sean Owen
2012-06-21, 20:55
Nimrod Priell
2012-06-21, 21:28
Nimrod Priell
2012-06-21, 21:41
Sean Owen
2012-06-21, 22:33
Ted Dunning
2012-06-22, 03:19
Ted Dunning
2012-06-22, 03:20
Nimrod Priell
2012-06-22, 11:40
|
-
"Direction" of co-occurence and log-likelihood ratioNimrod Priell 2012-06-21, 20:01
Hi,
I've been looking at log-likelihood ratio to rank statistically significant co-occurence of items for users, after reading about it first in "Mahout in Action", then reviewing the code, and links, and Ted Dunning's original paper on ACL 93'. I've reproduced Ted's results from the paper, (using the R code in the blog post), in R: > m the swiss OTHER the 0 110 2442 at 76 0 104 OTHER 2476 111 26458 (This is a 2-line example from the results on page 12 (72 in original publication) of the paper) Now, > to_llr(m) the swiss OTHER the 446.26 270.7219 80.975 at 157.21 2.5196 145.986 OTHER 143.11 256.7802 14.745 The results for "the swiss" and "at the" correspond to those appearing in the paper. However, note the result for "the the", which accurately presents this odd bigram as deviating greatly from independence: Given how common "the" is as a unigram, it's highly improbable to never see it as a bigram. I am wondering if there's a way to detect whether the deviation from independence is of the type that the co-occurrance is under-represented or over-represented w.r.t random sampling. Ideally, I'd like a measure on, say, (-inf, inf) where if the result is negative there is under-representation of the class where both A and B occur, and if it is positive, there is an overabundance of samples with (A intersection B). My initial guess was that LLR(k_11, k_12, k_21, k_22) has one minima with respect to k_11, i.e. keeping all other parameters fixed, it will be decreasing with k_11 up to a point, then increasing. That minimum is obviously when the co-occurance is random. Hence my thought was using d_LLR/dk_11 to determine direction: Concretely, looking at the sign of LLR(k_11 + 1, k_12, k_21, k_22) and using that for the direction. But I bet there are smarter ways than that. Ted- on twitter (I am @nimrodpriell) you asked me to e-mail the list and perhaps your full dissertation has more pointers on the subject. So I would love it if you can point me to it. I did note Lingpipe uses a different type of scoring, Pearson C_2 goodness of fit (it seems different from LLR, but I didn't dig deep) to do their collocation scoring: http://alias-i.com/lingpipe/demos/tutorial/interestingPhrases/read-me.html (the exact method is documented in the code, http://alias-i.com/lingpipe/docs/api/com/aliasi/lm/TokenizedLM.html#chiSquaredIndependence(int[]) ). Is that method a good way to capture what I'd like? --- On a completely different subject: I wrote a simple RelevantItemsDataSplitter and RecommenderIRStatsEvaluator which take a list of item IDs, and run CF evaluation by hiding items only out of that list, and asking to recommend only out of that list of items (precision and recall are then also calculated only with that list of items as the universe). The use-case is when you have an entire user-item matrix (think binary CF, not rating-based for now) but only certain items are interesting in an actual recommendation scenario: For instance, you only want to recommend products from a specific store, but you think that knowledge of the products from other stores would help (hence the bigger matrix). It helps because recommending just by popularity is very good and could beat user-item or item-item CF in cases where there is a power-law distribution and most the users have a small set of items. But in a real situation you often don't want to recommend what 80% of the users already have (even if a user is in the 20% that doesn't have that item). You want results on items you're likely to recommend in a real scenario; That are less popular for example. I realize an alternative to the example I proposed with the popularity is looking at the top-n recommendation for large n because only relatively few items are very popular so the precision-recall stats based on popularity become less skewed; But I still think it's a useful constraint for evaluation. If there is interest I will make sure it is well tested, read the guidelines and try to contribute the code (it's really <5 lines of code that change). Or was there a better way to do this rather than overloading functions on the RIDS and RIRSE classes? Thanks, Nimrod Priell
-
Re: "Direction" of co-occurence and log-likelihood ratioSean Owen 2012-06-21, 20:48
On Thu, Jun 21, 2012 at 9:01 PM, Nimrod Priell <[EMAIL PROTECTED]> wrote:
> On a completely different subject: I wrote a simple RelevantItemsDataSplitter and RecommenderIRStatsEvaluator which take a list of item IDs, and run CF evaluation by hiding items only out of that list, and asking to recommend only out of that list of items (precision and recall are then also calculated only with that list of items as the universe). Sure, if you know what the 'right answers' are more specifically in your use case, you can and should use that in the test. That's what the splitter class is for and that's what you did, yes. The more important thing of course is to implement this in your actual recommender! you can use a Rescorer to penalize popular items, if that's what you believe improves the result quality. > I realize an alternative to the example I proposed with the popularity is looking at the top-n recommendation for large n because only relatively few items are very popular so the precision-recall stats based on popularity become less skewed; But I still think it's a useful constraint for evaluation. You mean you want to use as a large "at" value in your test? This tends to increase recall but decrease precision. I don't know if it (necessarily) fixes something in this regard.
-
Re: "Direction" of co-occurence and log-likelihood ratioSean Owen 2012-06-21, 20:55
Is this not just a matter of comparing the frequency of "the" with
"the the"? If "the" is 1/n of the words, then "the the" ought to be 1/n^2. If it's less, it's under-represented. On Thu, Jun 21, 2012 at 9:01 PM, Nimrod Priell <[EMAIL PROTECTED]> wrote: > I am wondering if there's a way to detect whether the deviation from independence is of the type that the co-occurrance is under-represented or over-represented w.r.t random sampling. Ideally, I'd like a measure on, say, (-inf, inf) where if the result is negative there is under-representation of the class where both A and B occur, and if it is positive, there is an overabundance of samples with (A intersection B). > > My initial guess was that LLR(k_11, k_12, k_21, k_22) has one minima with respect to k_11, i.e. keeping all other parameters fixed, it will be decreasing with k_11 up to a point, then increasing. That minimum is obviously when the co-occurance is random.
-
Re: "Direction" of co-occurence and log-likelihood ratioNimrod Priell 2012-06-21, 21:28
Sean, thank you very much. Yes, that is true, I can look at the directionality just by comparing P(A and B) with P(A)*P(B) (where P here is the sample estimation of the probability of the event).
Thanks, Nimrod On Jun 21, 2012, at 4:55 PM, Sean Owen wrote: > Is this not just a matter of comparing the frequency of "the" with > "the the"? If "the" is 1/n of the words, then "the the" ought to be > 1/n^2. If it's less, it's under-represented. > > On Thu, Jun 21, 2012 at 9:01 PM, Nimrod Priell <[EMAIL PROTECTED]> wrote: >> I am wondering if there's a way to detect whether the deviation from independence is of the type that the co-occurrance is under-represented or over-represented w.r.t random sampling. Ideally, I'd like a measure on, say, (-inf, inf) where if the result is negative there is under-representation of the class where both A and B occur, and if it is positive, there is an overabundance of samples with (A intersection B). >> >> My initial guess was that LLR(k_11, k_12, k_21, k_22) has one minima with respect to k_11, i.e. keeping all other parameters fixed, it will be decreasing with k_11 up to a point, then increasing. That minimum is obviously when the co-occurance is random.
-
Re: "Direction" of co-occurence and log-likelihood ratioNimrod Priell 2012-06-21, 21:41
On Jun 21, 2012, at 4:48 PM, Sean Owen wrote: > On Thu, Jun 21, 2012 at 9:01 PM, Nimrod Priell <[EMAIL PROTECTED]> wrote: >> On a completely different subject: I wrote a simple RelevantItemsDataSplitter and RecommenderIRStatsEvaluator which take a list of item IDs, and run CF evaluation by hiding items only out of that list, and asking to recommend only out of that list of items (precision and recall are then also calculated only with that list of items as the universe). > > Sure, if you know what the 'right answers' are more specifically in > your use case, you can and should use that in the test. That's what > the splitter class is for and that's what you did, yes. > > The more important thing of course is to implement this in your actual > recommender! you can use a Rescorer to penalize popular items, if > that's what you believe improves the result quality. > In my real-time context, the recommender will never be asked to recommend for items outside of the specified set. Hence I want to evaluate only on it. The reference to popularity is a little confusing, actually; it just so happens that I realized I should do the scoring differently because the items in my set are less popular, and I didn't see any improvement from UBCF until I compared the results on these less popular items; rather than hit items at random for which the popularity recommender does best. Cool. I'll try to look into submitting a patch this weekend and maybe others could gain from this. > >> I realize an alternative to the example I proposed with the popularity is looking at the top-n recommendation for large n because only relatively few items are very popular so the precision-recall stats based on popularity become less skewed; But I still think it's a useful constraint for evaluation. > > You mean you want to use as a large "at" value in your test? This > tends to increase recall but decrease precision. I don't know if it > (necessarily) fixes something in this regard. What I do is I scatter-plot top-n for every n (say, top-1, top-5, top-10, ...) on a precision-recall space (or just compare F1 across several values of n). Then user-based CF is comparable to and even worse than "most popular" recommendation (in my test). However, as n becomes larger, "popular recommendation" has no "personalization" and starts missing, compared to UBCF so it has a "lower profile" than UBCF the farther out I go in recommendations. For an example (where I took this idea from), see the plot on top of page 22 of http://cran.r-project.org/web/packages/recommenderlab/vignettes/recommenderlab.pdf , a guide to the R recommenderlab package. My rationalization for this was that because 80% of the users have the most popular item, it is picked somewhat more often to be the hidden item, and it's very easy for the popular recommender to guess it right. When I ask the recommender for many items, the effect of the popular ones dampens. Does that make sense? Best, Nimrod
-
Re: "Direction" of co-occurence and log-likelihood ratioSean Owen 2012-06-21, 22:33
The idea is sound but for a different and I think stronger reason.
For this kind of test you need to hold out items that are some of the best recommendations, since that's what the recommender is trying to find. Holding out random items isn't OK since the recommender is not simply trying to parrot back the user's rating set. In the binary case, all items look the same, so you don't know what's best to hold out. The best you can do is random, and it's not a good thing. Here you have side information about what makes a good recommendation. It's valid to prefer these in your test set, and of course you want to prefer them in the recommender too which you are. On Thu, Jun 21, 2012 at 10:41 PM, Nimrod Priell <[EMAIL PROTECTED]> wrote: > My rationalization for this was that because 80% of the users have the most popular item, it is picked somewhat more often to be the hidden item, and it's very easy for the popular recommender to guess it right. When I ask the recommender for many items, the effect of the popular ones dampens. Does that make sense? > > Best, > Nimrod >
-
Re: "Direction" of co-occurence and log-likelihood ratioTed Dunning 2012-06-22, 03:19
Yes. There is for the 2x2 case that arises with cooccurrence.
Simply take the square root and apply a sign according to signum(k11/k1* - k*1 / k**). There is code in mahout with a name like rootLLR that computes this. Sent from my iPhone On Jun 21, 2012, at 2:01 PM, Nimrod Priell <[EMAIL PROTECTED]> wrote: > I am wondering if there's a way to detect whether the deviation from independence is of the type that the co-occurrance is under-represented or over-represented w.r.t random sampling. Ideally, I'd like a measure on, say, (-inf, inf) where if the result is negative there is under-representation of the class where both A and B occur, and if it is positive, there is an overabundance of samples with (A intersection B).
-
Re: "Direction" of co-occurence and log-likelihood ratioTed Dunning 2012-06-22, 03:20
Most correlation measures have trouble with small counts. They ascribe very high score to coincidence (hence the title of the original paper)
Sent from my iPhone On Jun 21, 2012, at 2:01 PM, Nimrod Priell <[EMAIL PROTECTED]> wrote: > > I did note Lingpipe uses a different type of scoring, Pearson C_2 goodness of fit (it seems different from LLR, but I didn't dig deep) to do their collocation scoring: http://alias-i.com/lingpipe/demos/tutorial/interestingPhrases/read-me.html (the exact method is documented in the code, http://alias-i.com/lingpipe/docs/api/com/aliasi/lm/TokenizedLM.html#chiSquaredIndependence(int[]) ). Is that method a good way to capture what I'd like?
-
Re: "Direction" of co-occurence and log-likelihood ratioNimrod Priell 2012-06-22, 11:40
Thanks, for both replies. Helped me a lot.
On Jun 21, 2012, at 11:20 PM, Ted Dunning wrote: > Most correlation measures have trouble with small counts. They ascribe very high score to coincidence (hence the title of the original paper) > > Sent from my iPhone > > On Jun 21, 2012, at 2:01 PM, Nimrod Priell <[EMAIL PROTECTED]> wrote: > >> >> I did note Lingpipe uses a different type of scoring, Pearson C_2 goodness of fit (it seems different from LLR, but I didn't dig deep) to do their collocation scoring: http://alias-i.com/lingpipe/demos/tutorial/interestingPhrases/read-me.html (the exact method is documented in the code, http://alias-i.com/lingpipe/docs/api/com/aliasi/lm/TokenizedLM.html#chiSquaredIndependence(int[]) ). Is that method a good way to capture what I'd like? |