|
|
-
search.highlight.InvalidTokenOffsetsException in Solr 3.5
Marian Steinbach 2011-12-05, 10:12

I get InvalidTokenOffsetsException in some searches when highlighting is activated. It seems to depend on the result documents involved.

In previous versions of Solr I haven't experienced this kind of error. Any ideas?

Here is the complete exception stack:

Problem accessing /solr/select. Reason:

org.apache.lucene.search.highlight.InvalidTokenOffsetsException: Token verwaltung exceeds length of provided text sized 3228

org.apache.solr.common.SolrException: org.apache.lucene.search.highlight.InvalidTokenOffsetsException: Token verwaltung exceeds length of provided text sized 3228
    at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlightingByHighlighter(DefaultSolrHighlighter.java:497)
    at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlighting(DefaultSolrHighlighter.java:401)
    at org.apache.solr.handler.component.HighlightComponent.process(HighlightComponent.java:131)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:194)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1372)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
    at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
    at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
    at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
    at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
    at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
    at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
    at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
    at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
    at org.mortbay.jetty.Server.handle(Server.java:326)
    at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
    at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
    at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
    at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
    at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
    at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
    at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
Caused by: org.apache.lucene.search.highlight.InvalidTokenOffsetsException: Token verwaltung exceeds length of provided text sized 3228
    at org.apache.lucene.search.highlight.Highlighter.getBestTextFragments(Highlighter.java:233)
    at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlightingByHighlighter(DefaultSolrHighlighter.java:490)
    ... 24 more

These are my highlighting parameters, which seem to have no effect on the exception:

<str name="hl">true</str>
<str name="hl.fl">body,text</str>
<int name="hl.snippets">3</int>
<int name="hl.maxAnalyzedChars">20000</int>
<str name="hl.mergeContiguous">true</str>
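
Judging by the message text, the exception is raised when a token's start or end offset lies beyond the end of the text handed to the highlighter. The following is a minimal stand-alone sketch of that kind of bounds check, not Lucene's actual code: the class name, the function name and the sample tokens are all hypothetical, and only the wording of the message is taken from the trace above.

```python
class InvalidTokenOffsetsError(Exception):
    """Hypothetical stand-in for Lucene's InvalidTokenOffsetsException."""


def check_token_offsets(text, tokens):
    """Raise if any (term, start, end) token points past the end of text."""
    for term, start, end in tokens:
        if start > len(text) or end > len(text):
            raise InvalidTokenOffsetsError(
                "Token %s exceeds length of provided text sized %d"
                % (term, len(text))
            )


# Illustrative data: offsets that were computed against a longer source
# string no longer fit the shorter text that is being highlighted.
stored_text = "die verwaltung"
good_tokens = [("die", 0, 3), ("verwaltung", 4, 14)]
bad_tokens = [("die", 0, 3), ("verwaltung", 11, 21)]

check_token_offsets(stored_text, good_tokens)  # offsets fit, nothing raised

try:
    check_token_offsets(stored_text, bad_tokens)
    message = None
except InvalidTokenOffsetsError as err:
    message = str(err)
```

Under this reading, any analysis step that reports offsets inconsistent with the text being highlighted could trigger the error, which would explain why it only shows up for some documents.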
-
Re: search.highlight.InvalidTokenOffsetsException in Solr 3.5
sharadgaur 2012-02-28, 19:59

I am also facing the same problem. Do you have any update on it? I am using Solr 3.5 and getting the same error:

Feb 28, 2012 1:40:44 PM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: org.apache.lucene.search.highlight.InvalidTokenOffsetsException: Token to exceeds length of provided text sized 11503
    at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlightingByHighlighter(DefaultSolrHighlighter.java:497)
    at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlighting(DefaultSolrHighlighter.java:401)
    at org.apache.solr.handler.component.HighlightComponent.process(HighlightComponent.java:131)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:194)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1372)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
    at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:567)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
    at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:849)
    at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
    at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:454)
    at java.lang.Thread.run(Thread.java:619)
Caused by: org.apache.lucene.search.highlight.InvalidTokenOffsetsException: Token to exceeds length of provided text sized 11503
    at org.apache.lucene.search.highlight.Highlighter.getBestTextFragments(Highlighter.java:233)
    at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlightingByHighlighter(DefaultSolrHighlighter.java:490)
    ... 20 more

--
View this message in context: http://lucene.472066.n3.nabble.com/search-highlight-InvalidTokenOffsetsException-in-Solr-3-5-tp3560997p3785157.html
Sent from the Solr - User mailing list archive at Nabble.com.
-
Re: search.highlight.InvalidTokenOffsetsException in Solr 3.5
Marian Steinbach 2012-02-28, 20:10

Unfortunately I don't have any news on that. I disabled highlighting on the text field (sadly).

Have you tracked down which field causes the problem? Can you tell which filters you are applying to the corresponding field type?

Marian
-
Re: search.highlight.InvalidTokenOffsetsException in Solr 3.5
Ahmet Arslan 2012-02-28, 20:14

> Unfortunately I don't have any news on that. I disabled highlighting on the
> text field (sadly).
>
> Have you tracked down which field causes the problem? Can you tell which
> filters you are applying to the according field type?

Are you using HTMLStripCharFilter? If yes, this could be:
https://issues.apache.org/jira/browse/LUCENE-3690
-
Re: search.highlight.InvalidTokenOffsetsException in Solr 3.5
Marian Steinbach 2012-02-28, 20:19

On 28 February 2012 at 21:14, Ahmet Arslan <[EMAIL PROTECTED]> wrote:

> Are you using HTMLStripCharFilter? If yes this could be:
> https://issues.apache.org/jira/browse/LUCENE-3690

Not sure whether that question was directed at me, but I am not using HTMLStripCharFilter. I do use some other pattern replacements which modify character positions, probably in the same manner as HTMLStripCharFilter does.
-
Re: search.highlight.InvalidTokenOffsetsException in Solr 3.5
Ahmet Arslan 2012-02-28, 21:23

> Not sure whether that question was directed at me, but I am not using
> HTMLStripCharFilter but some other pattern replacements which modify
> character positions, probably in the same manner as HTMLStripCharFilter
> does.

I thought that the cause of the problem was https://issues.apache.org/jira/browse/LUCENE-2208

What is your field definition? Can you provide a document and query pair that causes this exception?
-
Re: search.highlight.InvalidTokenOffsetsException in Solr 3.5
sharadgaur 2012-02-28, 21:34

I was using fieldType text_general_rev:

<fieldType name="text_general_rev" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ReversedWildcardFilterFactory" withOriginal="true" maxPosAsterisk="3" maxPosQuestion="2" maxFractionAsterisk="0.33"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

But since I changed to fieldType text_general, everything is running fine and I am no longer getting the InvalidTokenOffsetsException.

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
-
Re: search.highlight.InvalidTokenOffsetsException in Solr 3.5
andrew 2012-03-01, 20:14

I have the same problem. This happens only for some documents in the index.

As with sharadgaur, the problem ceased when I removed ReversedWildcardFilterFactory from my analysis chain. HTMLStripCharFilterFactory has been there before and after.

I am running branch-3.6 r1238628. As far as I can tell, this already has the fixes from LUCENE-2208 / LUCENE-3690.
-
Re: search.highlight.InvalidTokenOffsetsException in Solr 3.5
Ahmet Arslan 2012-03-01, 21:05

> I have the same problem. This happens only for some documents in the index.

Andrew, can you provide a document string and a query pair? I will try to reproduce the exception. Then we can create a failing test case that others can look into.
-
Re: search.highlight.InvalidTokenOffsetsException in Solr 3.5
Koji Sekiguchi 2012-03-02, 03:00

(12/03/02 6:05), Ahmet Arslan wrote:
>> I have the same problem. This happens
>> only for some documents in the index.
>
> Andrew, can you provide a document string and a query pair? I will try to re-produce the exception. Then we can create a test case that fails. Others can look into it.

+1. Please do it!

koji
--
Query Log Visualizer for Apache Solr
http://soleami.com/
-
Re: search.highlight.InvalidTokenOffsetsException in Solr 3.5
andrew 2012-03-02, 12:37

I was able to create a test case.

We are querying ranges of documents. When I tried to isolate the document that causes trouble, I found it happens with exactly every second request only for a single document query (it fails constantly when requesting a range of documents where that document is included). I could also reproduce the exception with only that single document in the index.

I think it is not a good idea to post the Solr <add/> XML here - it is very long (text extract of a newspaper page) and may not reproduce verbatim (whitespace etc.) if I paste it here.

iorixxx, koji - is it ok if I send the necessary artifacts (add XML, schema, config) via email?
-
Re: search.highlight.InvalidTokenOffsetsException in Solr 3.5
Ahmet Arslan 2012-03-02, 12:57

> I think it is not a good idea to post the Solr <add/> XML here - it is very
> long (text extract of a newspaper page) and may not reproduce verbatim
> (whitespace etc.) if I paste it here.
>
> iorixxx, koji - is it ok if I send the necessary artifacts (add XML, schema,
> config) via email?

I have seen people use http://pastebin.com/ for this purpose before. Can you provide your full search URL too?
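
For anyone reproducing this, a full search URL of the kind requested here can be assembled from the request parameters. The sketch below uses the highlighting parameters and the /solr/select path quoted in the first message of the thread; the host, port, query term and the helper name build_select_url are assumptions for illustration only.

```python
from urllib.parse import urlencode


def build_select_url(base_url, query, hl_params):
    """Return a Solr select URL with the query and highlighting parameters."""
    params = [("q", query)] + list(hl_params)
    return base_url + "/select?" + urlencode(params)


# Highlighting parameters as quoted in the first message of the thread.
HL_PARAMS = [
    ("hl", "true"),
    ("hl.fl", "body,text"),
    ("hl.snippets", "3"),
    ("hl.maxAnalyzedChars", "20000"),
    ("hl.mergeContiguous", "true"),
]

# Host, port and query term are assumed, not taken from the thread.
url = build_select_url("http://localhost:8983/solr", "verwaltung", HL_PARAMS)
print(url)
```

Posting the exact URL alongside the document and schema lets others replay the failing request without guessing at the parameters.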
-
Re: search.highlight.InvalidTokenOffsetsException in Solr 3.5
Robert Muir 2012-03-02, 13:26

On Fri, Mar 2, 2012 at 7:37 AM, andrew <[EMAIL PROTECTED]> wrote:
> I was able to create a test case.
>
> We are querying ranges of documents. When I tried to isolate the document
> that causes trouble, I found it happens with exactly every second request
> only for a single document query (it fails constantly when requesting a
> range of documents where that document is included). I could also reproduce
> the exception with only that single document in the index.
>
> I think it is not a good idea to post the Solr <add/> XML here - it is very
> long (text extract of a newspaper page) and may not reproduce verbatim
> (whitespace etc.) if I paste it here.
>
> iorixxx, koji - is it ok if I send the necessary artifacts (add XML, schema,
> config) via email?
>

You can also open a jira issue (https://issues.apache.org/jira/browse/SOLR), and upload everything as attachments.

I would also be very interested if you can test a nightly 3.6 build (https://builds.apache.org/job/Solr-3.x/lastSuccessfulBuild/artifact/artifacts/)

There have been *numerous* offsets bugs fixed in 3.6 in a variety of tokenizers/tokenfilters besides the HTMLStripCharFilter:

https://issues.apache.org/jira/browse/LUCENE-3642
https://issues.apache.org/jira/browse/SOLR-2891
https://issues.apache.org/jira/browse/LUCENE-3717

--
lucidimagination.com
-
Re: search.highlight.InvalidTokenOffsetsException in Solr 3.5
andrew 2012-03-02, 13:47

I posted the files here: http://www.mediafire.com/?z43a5qyfvz4zxp1
-
Re: search.highlight.InvalidTokenOffsetsException in Solr 3.5
andrew 2012-03-02, 14:21

Robert, I just tried with 3.6-SNAPSHOT 1296203 from svn - the problem is still there.

I am just about to leave for a vacation. I'll try to open a JIRA issue this evening.
-
Re: search.highlight.InvalidTokenOffsetsException in Solr 3.5
Ahmet Arslan 2012-03-02, 14:41

> Robert, I just tried with 3.6-SNAPSHOT 1296203 from svn - the problem is
> still there.
>
> I am just about to leave for a vacation. I'll try to open a JIRA issue this
> evening.

Andrew, thanks for providing the files. I also reproduced it.

But the cause of the exception is that you are trying to highlight on a field (body) that is not indexed.

To enable highlighting you need both indexed="true" and stored="true".
http://wiki.apache.org/solr/FieldOptionsByUseCase

I changed the definition of the body field from indexed="false" to indexed="true" and it is working now.

But for the record, it is weird that with indexed="false" it produces a snippet in the first request and then fails in the second request.
-
Re: search.highlight.InvalidTokenOffsetsException in Solr 3.5
Robert Muir 2012-03-02, 14:55

On Fri, Mar 2, 2012 at 9:41 AM, Ahmet Arslan <[EMAIL PROTECTED]> wrote:
>
>> Robert, I just tried with
>> 3.6-SNAPSHOT 1296203 from svn - the problem is
>> still there.
>>
>> I am just about to leave for a vacation. I'll try to open a
>> JIRA issue this
>> evening.
>
> Andrew, thanks for providing files. I also re-produced it.
>
> But cause of the exception is that you are trying to highlight on a field (body) that is not indexed.
>
> To enable highlighting you need both indexed="true" and stored="true" .
> http://wiki.apache.org/solr/FieldOptionsByUseCase
>
> I changed definition of body field from indexed="false" to indexed="true" and it is working now.
>
> But for the record (with indexed="false"), it is weird that it produces snippet in the first request, and then fails in the second request.
>

Ahmet, this is a good find. Can we still open a JIRA issue so that a more useful exception is thrown here?

--
lucidimagination.com
-
Re: search.highlight.InvalidTokenOffsetsException in Solr 3.5
andrew 2012-03-02, 15:02

Ah, ok - thank you for looking at it.

But the wiki page has a footnote that says "a tokenizer must be defined for the field, but it doesn't need to be indexed". The body field has the type "dcx_text", which has a tokenizer.

Is the documentation wrong here or am I misunderstanding something?
-
Re: search.highlight.InvalidTokenOffsetsException in Solr 3.5
Ahmet Arslan 2012-03-02, 16:09

> But - the wiki page has a foot note that says "a tokenizer must be defined
> for the field, but it doesn't need to be indexed". The body field has the
> type "dcx_text" which has a tokenizer.
>
> Is the documentation wrong here or am I misunderstanding something?

Ah, I never read that note (I was just looking at the table). I think you are right; I can generate a snippet from the following field:

<field name="body" type="dcx_text" stored="true" indexed="false" multiValued="true"/>
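
When comparing setups like this, it can help to read the stored and indexed attributes of the highlighted fields straight from the schema. The following is a rough sketch using only Python's standard library: the body field line is the one quoted above, while the surrounding schema skeleton, the id field and the helper name field_flags are made up for illustration. It only reports the attributes and makes no claim about which combination the highlighter requires.

```python
import xml.etree.ElementTree as ET

# Hypothetical schema skeleton; only the "body" field is taken from the thread.
SCHEMA_SNIPPET = """
<schema name="example" version="1.4">
  <fields>
    <field name="id" type="string" stored="true" indexed="true"/>
    <field name="body" type="dcx_text" stored="true" indexed="false" multiValued="true"/>
  </fields>
</schema>
"""


def field_flags(schema_xml, field_names):
    """Map each requested field name to its stored/indexed attribute values.

    A value of None means the attribute is not set on the <field> element.
    """
    root = ET.fromstring(schema_xml.strip())
    flags = {}
    for field in root.iter("field"):
        name = field.get("name")
        if name in field_names:
            flags[name] = {
                "stored": field.get("stored"),
                "indexed": field.get("indexed"),
            }
    return flags


flags = field_flags(SCHEMA_SNIPPET, {"body"})
print(flags)
```

Running this against the schema of a failing and a working core makes it easy to see whether the two differ in these attributes for the fields listed in hl.fl.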
-
Re: search.highlight.InvalidTokenOffsetsException in Solr 3.5
Ahmet Arslan 2012-03-02, 17:43

> Ahmet, this is a good find. Can we still open a JIRA issue so that a
> more useful exception is thrown here?

Robert, I created SOLR-3193 and created a test using Andrew's files.
|