How to retrieve the index of a string within a field?
Elaine Li 2009-10-06, 16:10
Hi,
I have a field that holds a sentence. If the user types in a word or a phrase, how can I return the index of that word, or the index of the first word of the phrase, within the sentence? I tried to use &bf=ord..., but it does not work as I expected.
Thanks.
Elaine
Re: How to retrieve the index of a string within a field?
Sandeep Tagore 2009-10-07, 12:12
Hi Elaine,
What do you mean by "index of this word"? Do you want the first occurrence of the word in that sentence, or the document id?
Also, which type of field is it: Text or String? If it is of type Text, you can't achieve that, because the sentence will be tokenized.
Sandeep
Re: How to retrieve the index of a string within a field?
Elaine Li 2009-10-07, 14:19
Hi Sandeep,
Say the field is <field name="sentence">Can you get what you want?</field>, and the field type is Text.
My query contains 'sentence:"get what you"'. Is it possible to get the number 2 directly from a query, since the word 'get' is the token at position 2 in the sentence (counting from 0)?
Thanks.
Elaine
Re: How to retrieve the index of a string within a field?
Sandeep Tagore 2009-10-07, 15:06
Hi Elaine,
You can achieve that with some modifications to the Solr configuration files. Generally, text will be configured as:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

When a field is declared as text (with the above configuration), it will be tokenized. For example, your sentence "Can you get what you want?" will be tokenized into "can, you, get, what, you, want". So when you search for 'sentence:"get what you"' you will get 0 results.
To achieve your objective you can remove the tokenizers from the "text" configuration. The best way, I suggest, is to declare the field as type "string". Search the string with a wildcard, like 'sentence:"*get what you*"', using the Solrj client, and when you get the records (results), save the output of sentence.indexOf(keyword) in your Java bean. Here sentence is a variable declared in the Java bean. For more details, you need to read up on the usage of Solrj.
If you have any issues modifying the configuration, post the configuration you have for the fieldType "text" and I will modify it for you.
Regards,
Sandeep
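The client-side step described above can be sketched as follows. This is only an illustration, assuming the sentence has already been fetched from Solr as a plain Java string; the class and method names are hypothetical and it uses only the JDK, not Solrj. It shows the character offset from String.indexOf next to a zero-based token position. The tokenization (split on whitespace, lowercase, strip surrounding punctuation) is an assumption that only approximates the analyzer chain in the schema.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical helper for illustration; not part of Solr or Solrj.
final class PhrasePosition {

    // Character offset of the phrase in the sentence (case-insensitive), or -1 if absent.
    static int charOffset(String sentence, String phrase) {
        return sentence.toLowerCase(Locale.ROOT).indexOf(phrase.toLowerCase(Locale.ROOT));
    }

    // Zero-based index of the token where the phrase starts, or -1 if absent.
    static int tokenPosition(String sentence, String phrase) {
        List<String> tokens = tokenize(sentence);
        List<String> wanted = tokenize(phrase);
        if (wanted.isEmpty()) {
            return -1; // an empty phrase has no meaningful position
        }
        return Collections.indexOfSubList(tokens, wanted);
    }

    // Assumption: whitespace-separated, lowercased tokens with leading and
    // trailing punctuation removed. The real Solr analyzers may differ.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.trim().split("\\s+")) {
            String token = raw.toLowerCase(Locale.ROOT)
                    .replaceAll("^\\p{Punct}+|\\p{Punct}+$", "");
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String sentence = "Can you get what you want?";
        System.out.println(charOffset(sentence, "get what you"));
        System.out.println(tokenPosition(sentence, "get what you"));
    }
}
```

Note that the two numbers answer different questions: indexOf counts characters, while the token position counts words, which is what the original question asks for.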
Re: How to retrieve the index of a string within a field?
Elaine Li 2009-10-07, 18:12
Sandeep, I do get results when I search for "get what you", not 0 results.
What in my schema makes this difference?
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <!-- Case insensitive stop word removal.
         enablePositionIncrements=true ensures that a 'gap' is left to
         allow for accurate phrase queries.
    -->
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/> -->
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <!-- <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/> -->
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/> -->
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
I need to learn Solrj. I am currently using JavaScript as a client and making HTTP calls to get results to display in the browser. Can Solrj get all the results in one shot without the HTTP call? I need to do some post-processing on all the results and then display the processed data. Submitting multiple HTTP queries and post-processing after each query does not seem to be the right way.
Thanks.
Elaine
Re: How to retrieve the index of a string within a field?
Sandeep Tagore 2009-10-08, 05:31
Elaine,
The field type text contains <tokenizer class="solr.WhitespaceTokenizerFactory"/> in its definition, so all the sentences that are indexed or queried will be split into words. So when you search for 'get what you', you will get sentences containing get, what, you, get what, get you, what you, or get what you. So when you try to find the indexOf of the keyword in a sentence from the results, you may not find it every time.
Solrj can give you the results in one shot, but it uses an HTTP call. You can't avoid it. You don't need to query multiple times with Solrj: query once, get the results, store them in Java beans, process them, and display the results.
Regards,
Sandeep
Re: How to retrieve the index of a string within a field?
Elaine Li 2009-10-08, 13:30
Sandeep,
When I submit a query, I make sure the searched phrase is wrapped in double quotes. When I do that, it only returns sentences with 'get what you'. Without the double quotes, it returns all the sentences described in your email, because it is then a 'get OR what OR you' query. I don't know much about the concepts behind search; I just make use of whatever works for me. Do you think I am still OK using text as my sentence field type?
If the query returns 100 thousand results, will Solrj's HTTP call hang on it?
Thanks a lot.
Elaine
Re: How to retrieve the index of a string within a field?
Sandeep Tagore 2009-10-09, 06:43
Hi Elaine,
As you are able to get the sentences that contain the phrase (when you use double quotes), the 'text' field type is OK.
Frankly speaking, I don't know whether Solrj's HTTP call will hang if you try to get 100 thousand records at a time. I never tried that. But I guess you can't display more than 1000 records at a time. The best thing I can suggest is pagination. You can use the 'start' and 'rows' parameters to get the results in slices, say 1000 records at a time (start=0&rows=1000, then start=1000&rows=1000, and so on). You can easily achieve this using Solrj. In some scenarios, I tried to get 10k records at a time and I didn't have any problem. If you get any heap space errors, try to increase the heap size with JVM parameters.
Thanks,
Sandeep
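A small sketch of the slicing idea, for illustration only. In Solr, 'start' is the zero-based offset of the first document to return and 'rows' is the number of documents per request, so 'rows' stays constant while 'start' advances by the page size. The class and method names below are hypothetical, the code only builds query-string fragments and never contacts Solr, and the total hit count is assumed to be known already (for example from the numFound value of a first response).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; it does not call Solr or Solrj.
final class SolrPages {

    // Returns one "start=...&rows=..." fragment per page needed to cover totalHits.
    static List<String> pageParams(long totalHits, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        List<String> pages = new ArrayList<>();
        // 'start' advances by the page size; 'rows' is the same for every page.
        for (long start = 0; start < totalHits; start += pageSize) {
            pages.add("start=" + start + "&rows=" + pageSize);
        }
        return pages;
    }

    public static void main(String[] args) {
        for (String page : pageParams(2500, 1000)) {
            System.out.println(page);
        }
    }
}
```

Each fragment would be appended to the same query, and the post-processing can then run once over the combined results instead of after every request.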
Re: How to retrieve the index of a string within a field?
Chris Hostetter 2009-10-09, 16:54
: I have a field. The field has a sentence. If the user types in a word
: or a phrase, how can I return the index of this word or the index of
: the first word of the phrase?
: I tried to use &bf=ord..., but it does not work as i expected.
for basic queries (term, phrase, etc...) position information is not available for the matched documents ... you can use highlighting to re-compute where matches occurred, but the accuracy of that information depends a lot on what your field type, query, and highlighter options look like. i don't believe we have any Highlighter options that will just give you back the position information -- but one could be added.
for *true* positional matching info, there are the "Span" family of queries, which can actually return the exact information -- but there is no native query parser support for Span queries in Solr, so you would need to customize your QParser to get that information.
-Hoss
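One way to act on the highlighting suggestion is to re-compute the position on the client from the highlighted text. The sketch below is an illustration under several assumptions: highlighting is enabled with the default <em> marker, the returned snippet is the whole field value rather than a fragment, the text before the first marker is returned unchanged, and highlighted terms start at a whitespace boundary. The class and method names are hypothetical and only the JDK is used.

```java
// Hypothetical helper for illustration; not part of Solr or Solrj.
final class HighlightOffsets {

    // Assumed opening marker placed before each highlighted term.
    static final String PRE = "<em>";

    // Character offset of the first highlighted term in the original text, or -1 if none.
    // No markup precedes the first marker, so its index equals the offset in the raw text.
    static int firstMatchOffset(String snippet) {
        return snippet.indexOf(PRE);
    }

    // Zero-based whitespace-token position of the first highlighted term, or -1 if none.
    static int firstMatchTokenPosition(String snippet) {
        int at = snippet.indexOf(PRE);
        if (at < 0) {
            return -1;
        }
        // Count the whitespace-separated tokens that come before the marker.
        String before = snippet.substring(0, at).trim();
        return before.isEmpty() ? 0 : before.split("\\s+").length;
    }

    public static void main(String[] args) {
        String snippet = "Can you <em>get</em> <em>what</em> <em>you</em> want?";
        System.out.println(firstMatchOffset(snippet));
        System.out.println(firstMatchTokenPosition(snippet));
    }
}
```

As noted in the reply above, how accurate this is depends on the field type, the query, and the highlighter options. If the highlighter also marks an isolated term that occurs before the phrase, the first marker will not be the start of the phrase match.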