Search criteria: relevant computing.
Results 61 to 70 of 488 (0.584s).
|
|
|
SummerOfCode2011ProjectRankingTerrier - Lucene - Lucene - [wiki]
|
|
A short overview of Terrier's scoring architecture
Terrier is another Java-based, open source search engine developed at the School of Computing Science, University...
|
|
....
Also, even this idea is false. BM25, for instance, does not have its own idfBM25() method in Idf; it is computed in the BM25 class directly.
Idf also has methods that compute the logarithm...
|
| ... randomness models.
AfterEffect subclasses compute the gain.
Normalisation is applied on the "raw" term frequencies before they are passed to the basic model.
These classes define their own... |
| ....
Statistics availability in Lucene
The content of the EntryStatistics class, i.e. the term statistics, is conveniently mirrored by the TermContext class. The relevant fields are
docFreq corresponds... |
| ... the real length; it may be worth having both, since the more options, the more possibilities to experiment with;
avg. field length: has to be computed as in MockBM25Similarity... |
|
|
http://wiki.apache.org/lucene-java/SummerOfCode2011ProjectRankingTerrier
Author: DavidNemeskey,
2011-06-20, 12:51
|
|
|
NewScoring - Nutch - [wiki]
|
|
...-analysis to get a single global relevancy score for each url. Building a webgraph assumes that all links are stored in the current segments to be processed. Links are not held over from one processing...
|
|
... links to D which links back to A. This program is computationally expensive and usually, due to time and space requirements, can't be run to a depth of more than three or four levels. While it does...
|
| ... and link cycles and then allow those links to be removed. The problem is that the class is very expensive computationally. You can set the depth you want it to run, but it is worse than exponential so I... |
| ... scores. Some things to consider:
PageRank is just one of over 200 signals that Google uses (if they still use it) to determine relevancy. Even if Google still uses it, it most likely has... |
| ... changed. Link analysis scores are good global relevancy scores, but a link score does not a search engine make today. Oh how I wish it was that simple. LinkRank is a good starting point, that... |
|
|
http://wiki.apache.org/nutch/NewScoring
Author: LewisJohnMcgibbney,
2011-08-07, 12:55
|
|
|
PublicServers - Nutch - [wiki]
|
|
... of Chinese language websites in North America.
EcoliHub Web Search: an E. coli-specific search engine based on Nutch. EcoliHub WebSearch includes only those sites relevant to E. coli, thereby...
|
|
... discovery and search add-on. Computes similarity between pages using Nutch crawls.
SymbolHound - A search engine targeted toward programming- and math-related queries. Allows users to search...
|
|
|
http://wiki.apache.org/nutch/PublicServers
Author: DallanQuass,
2011-11-10, 21:40
|
|
|
ConversationsBetweenDougMarvinAndGrant - Lucene - Lucene - [wiki]
|
|
... boosts are not ordered.
Personally I think the eight-bit floats used by Lucene give plenty of
precision for this class of computation. Relevant documents should be
easily distinguished from...
|
|
| ... non-relevant documents, and fine-differences
in ranking between relevant documents don't matter. The only time folks
have complained about the precision of eight-bit floats in Lucene... |
|
|
http://wiki.apache.org/lucene-java/ConversationsBetweenDougMarvinAndGrant
Author: localhost,
2009-09-20, 21:47
|
|
|
Reference Reading - Apache Mahout - Apache Software Foundation - Mahout - [wiki]
|
|
..., consider a specialist text, e.g.:
Introduction to Bayesian Statistics (2nd Edition), William H. Bolstad, Wiley.
(amazon)
Then for the computational side of Bayesian (predominantly Markov chain...
|
|
... Monte Carlo), e.g.
Bolstad's Understanding Computational Bayesian Statistics, Wiley.
(amazon)
Then you might try Bayesian Data Analysis, Gelman et al., Chapman & Hall/CRC
On top...
|
| ... http://research.microsoft.com/en-us/um/people/cmbishop/PRML/index.htm
matrix computations/decomposition/factorization etc.?
How's this one?
any idea? any other suggestion?
I found... |
| ...://www.amazon.com/Introduction-Linear-Algebra-Theory-Applications/dp/053400606X
David S. Watkins "Fundamentals of Matrix Computations (Pure and Applied Mathematics: A Wiley Series of Texts, Monographs and Tracts)"
http://www.amazon.com/Fundamentals-Matrix-Computations... |
| ...://people.maths.ox.ac.uk/trefethen/text.html (with some online lecture notes)
I think this is the most relevant book for matrix math on distributed systems:
http://www.amazon.com/Numerical-Linear-Algebra-Lloyd-Trefethen/dp/0898713617... |
|
|
https://cwiki.apache.org/confluence/display/MAHOUT/Reference+Reading
Author: Grant Ingersoll,
2011-05-03, 00:00
|
|
|
OldHadoopTutorial - Nutch - [wiki]
|
|
... of the tutorial though I will point you to relevant resources if you want to know more about the architecture of Nutch and Hadoop.
The tutorial comes in two phases. Firstly we get Hadoop running...
|
|
... not be compatible with future releases of either Nutch or Hadoop.
Five: For this tutorial we set up Nutch across 6 different computers. If you are using a different number of machines you should still...
|
| ...
First let me lay out the computers that we used in our setup. To set up Nutch and Hadoop we had 7 commodity computers ranging from 750 MHz to 1.0 GHz. Each computer had at least 128 MB of RAM... |
| ... and at least a 10 GB hard drive. One computer had dual 750 MHz CPUs and another had dual 30 GB hard drives. All of these computers were purchased for under $500.00 at a liquidation sale... |
| .... I am telling you this to let you know that you don't have to have big hardware to get up and running with Nutch and Hadoop. Our computers were named like this:
devcluster01
devcluster02... |
|
|
http://wiki.apache.org/nutch/OldHadoopTutorial
Author: LewisJohnMcgibbney,
2011-09-02, 19:58
|
|
|
LuceneFAQ - Lucene - Lucene - [wiki]
|
|
... before body matches. But you can also boost queries on title by using query.setBoost(boost) on the relevant clause.
How do I find similar documents?
See the MoreLikeThis class in the org...
|
|
... to the deletable file.
Note that as of 2.1 the deletable file is no longer used. Instead, Lucene computes which files are no longer referenced by the index and removes them whenever a writer is created...
|
|
|
http://wiki.apache.org/lucene-java/LuceneFAQ
Author: SteveRowe,
2011-12-28, 03:22
|
|
|
Collocations - Apache Mahout - Apache Software Foundation - Mahout - [wiki]
|
|
... which co-occur more often than would be expected by chance. Statistically relevant combinations of terms identify additional lexical units which can be treated as features in a vector...
|
|
... overthruster', the Log-Likelihood ratio is computed by looking at the number of occurrences of that word pair in the corpus, the number of word pairs that begin with 'oscillation' but end with something...
|
| ..., frequency)
Pass 2: CollocDriver.computeNGramsPruneByLLR(...)
Pass 1 has calculated full frequencies for ngrams and subgrams, Pass 2 performs the LLR calculation.
Map Phase: IdentityMapper (org... |
|
|
https://cwiki.apache.org/confluence/display/MAHOUT/Collocations
Author: Dan Brickley,
2011-08-30, 00:00
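The log-likelihood ratio mentioned in the Collocations snippet above can be sketched as follows. This is a minimal illustration, assuming the standard 2x2 contingency-table formulation (Dunning's G² statistic) written in terms of unnormalised entropies. The function names and the count labels `k11`, `k12`, `k21`, `k22` are assumptions made for this example, since the snippet is truncated before it lists all four counts.

```python
import math


def x_log_x(x):
    """Return x * ln(x), treating 0 * ln(0) as 0."""
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts):
    """Unnormalised Shannon entropy of a list of counts."""
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)


def log_likelihood_ratio(k11, k12, k21, k22):
    """G^2 statistic for a 2x2 table of word-pair counts.

    Assumed meaning of the counts for a pair (A, B):
      k11: pairs that are exactly (A, B)
      k12: pairs that begin with A but do not end with B
      k21: pairs that end with B but do not begin with A
      k22: pairs containing neither A first nor B second
    """
    row_entropy = entropy(k11 + k12, k21 + k22)
    col_entropy = entropy(k11 + k21, k12 + k22)
    matrix_entropy = entropy(k11, k12, k21, k22)
    # Clamp at zero to absorb small negative values from rounding.
    return max(0.0, 2.0 * (row_entropy + col_entropy - matrix_entropy))
```

A score near zero means the two words co-occur about as often as chance predicts; larger scores indicate a stronger association, which is the basis for pruning n-grams by LLR.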
|
|
|
[LUCENE-4574] FunctionQuery ValueSource value computed twice per document - Lucene - [issue]
|
|
... in a row. This computation isn't exactly cheap to calculate so this is a big problem. I was able to work around this problem trivially on my end by caching the last value with corresponding docid...
|
|
....function.FunctionQuery$AllScorer.score(FunctionQuery.java:153)
at org.apache.lucene.search.ScoreCachingWrappingScorer.score(ScoreCachingWrappingScorer.java:56)
at org.apache.lucene.search.FieldComparator$Relevance...
|
|
|
http://issues.apache.org/jira/browse/LUCENE-4574
Author: David Smiley,
2012-11-30, 17:54
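The workaround described in the LUCENE-4574 snippet above (caching the last value together with its docid) can be sketched in a few lines. This is a language-neutral illustration in Python, not Lucene code: the class name `CachingValueSource` and the `compute` callback are hypothetical and stand in for whatever expensive per-document function is being evaluated.

```python
class CachingValueSource:
    """Remember the most recent (doc_id, value) pair.

    Hypothetical wrapper: `compute` is any expensive function of a
    document id. Repeated requests for the same document in a row
    reuse the cached value instead of recomputing it.
    """

    def __init__(self, compute):
        self._compute = compute
        self._last_doc = None    # no document seen yet
        self._last_value = None

    def value(self, doc_id):
        # Recompute only when the requested document changes.
        if doc_id != self._last_doc:
            self._last_value = self._compute(doc_id)
            self._last_doc = doc_id
        return self._last_value
```

A single-entry cache is enough here because the reported problem is the same document being scored twice in a row; it would not help if requests for different documents were interleaved.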
|
|
|
RE: Getting facet counts for 10,000 most relevant hits - Solr - [mail # user]
|
|
... It can, and I have -- but only for the case of a single node... In general the faceting code in Solr just needs a DocSet. The default impl uses the DocSet computed as a side effect...
|
|
... when executing the main search, but a custom SearchComponent could pick any DocSet it wants. A few years back I wrote a custom faceting plugin that computed a "score" for each constraint...
|
| ... as a "guideline" for a sampling problem, telling each shard to consider only *their* top N results when computing the top facets in shardReq #1, and then do the same "give me an exact count" type logic... |
|
|
Author: Chris Hostetter,
2011-10-01, 01:19
|
|
|
|