-
extends LuceneTestCase, avoid preflex codec?
Ryan McKinley 2011-03-30, 16:26
I have a test framework that extends LuceneTestCase and tests a bunch of spatial indexing strategies.
One strategy writes binary tokens (eventually this should be CSF) and I'm getting an error when it hits the preflex codec.
Is there a way to avoid this?

testSpatialSearch(org.apache.lucene.spatial.strategy.jts.JtsGeoStrategyTestCase) Time elapsed: 0.231 sec <<< FAILURE!
java.lang.AssertionError
    at org.apache.lucene.util.UnicodeUtil.UTF8toUTF16(UnicodeUtil.java:339)
    at org.apache.lucene.index.codecs.preflexrw.TermInfosWriter.compareToLastTerm(TermInfosWriter.java:136)
    at org.apache.lucene.index.codecs.preflexrw.TermInfosWriter.add(TermInfosWriter.java:166)
    at org.apache.lucene.index.codecs.preflexrw.PreFlexFieldsWriter$PreFlexTermsWriter.finishTerm(PreFlexFieldsWriter.java:194)
    at org.apache.lucene.index.FreqProxTermsWriter.appendPostings(FreqProxTermsWriter.java:337)
    at org.apache.lucene.index.FreqProxTermsWriter.flush(FreqProxTermsWriter.java:112)
---------------------------------------------------------------------
-
Re: extends LuceneTestCase, avoid preflex codec?
Robert Muir 2011-03-30, 16:27
yes, the collation tests work the same way, as they use pure binary tokens.
so their tests look like this:
    @Override
    public void setUp() throws Exception {
      super.setUp();
      assumeFalse("preflex format only supports UTF-8 encoded bytes",
          "PreFlex".equals(CodecProvider.getDefault().getDefaultFieldCodec()));
    }
---------------------------------------------------------------------
-
Re: extends LuceneTestCase, avoid preflex codec?
Ryan McKinley 2011-03-30, 20:15
thanks -- I also see it failing on SimpleText. Is that expected?
---------------------------------------------------------------------
-
Re: extends LuceneTestCase, avoid preflex codec?
Robert Muir 2011-03-30, 20:17
On Wed, Mar 30, 2011 at 4:15 PM, Ryan McKinley <[EMAIL PROTECTED]> wrote:
> thanks -- I also see it failing on SimpleText. Is that expected?
I don't think that is expected? The collation keys use binary terms in their tests and pass with SimpleText, though that doesn't mean there isn't a possibility of a bug in SimpleText...
---------------------------------------------------------------------
-
Re: extends LuceneTestCase, avoid preflex codec?
Ryan McKinley 2011-03-30, 20:22
I also see it with:
test params are: codec=RandomCodecProvider: {id=MockFixedIntBlock(blockSize=1821), geo=SimpleText, name=MockSep}, locale=no_NO_NY, timezone=Europe/Chisinau
IIUC, that picks a random provider for each field? and geo got SimpleText.
The actual error I see is with code I have to make sure we don't read too many bytes:
    BytesRef term = te.next();
    while (term != null) {
      WKBReader reader = new WKBReader(factory);
      try {
        final BytesRef ref = term;
        Geometry geo = reader.read(new InStream() {
          int off = ref.offset;

          @Override
          public void read(byte[] buf) throws IOException {
            if (off + buf.length > ref.length) {
              throw new InvalidShapeException("Asking for too many bytes");
            }
            for (int i = 0; i < buf.length; i++) {
              buf[i] = ref.bytes[off + i];
            }
            off += buf.length;
          }
        });
        ...
---------------------------------------------------------------------
-
Re: extends LuceneTestCase, avoid preflex codec?
Robert Muir 2011-03-30, 20:32
On Wed, Mar 30, 2011 at 4:22 PM, Ryan McKinley <[EMAIL PROTECTED]> wrote:
> int off = ref.offset;
...
> if (off + buf.length > ref.length) {
>   throw new InvalidShapeException("Asking for too many bytes");
this check looks like it might be wrong (backwards logic), especially if the codec returns terms with nonzero BytesRef offsets? but it's hard to tell in the context of your small snippet...
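
For readers following along, here is a minimal sketch of one way to read the nonzero-offset concern. It assumes the problem is that `off` is an absolute index into the backing array while `length` is only the size of the slice, so the limit has to be `offset + length`. `Slice` and `SliceReader` are hypothetical stand-ins for BytesRef and the anonymous InStream, and a plain IOException stands in for InvalidShapeException; this is not the code that was actually committed.

```java
import java.io.IOException;
import java.util.Arrays;

public class SliceBoundsSketch {

    /** Hypothetical stand-in for BytesRef: the valid bytes are [offset, offset + length). */
    static final class Slice {
        final byte[] bytes;
        final int offset;
        final int length;

        Slice(byte[] bytes, int offset, int length) {
            this.bytes = bytes;
            this.offset = offset;
            this.length = length;
        }
    }

    /** Hypothetical sequential reader that refuses to read past the end of the slice. */
    static final class SliceReader {
        private final Slice ref;
        private final int end; // absolute end of the slice in ref.bytes (exclusive)
        private int off;       // absolute read position in ref.bytes

        SliceReader(Slice ref) {
            this.ref = ref;
            this.off = ref.offset;
            this.end = ref.offset + ref.length;
        }

        void read(byte[] buf) throws IOException {
            // Compare against the absolute end, not against ref.length alone:
            // with a nonzero offset, "off + buf.length > ref.length" can reject valid reads.
            if (off + buf.length > end) {
                throw new IOException("Asking for too many bytes");
            }
            System.arraycopy(ref.bytes, off, buf, 0, buf.length);
            off += buf.length;
        }
    }

    public static void main(String[] args) throws IOException {
        // The term occupies the last four bytes of a shared backing array.
        byte[] backing = {9, 9, 9, 1, 2, 3, 4};
        SliceReader in = new SliceReader(new Slice(backing, 3, 4));

        byte[] buf = new byte[4];
        in.read(buf);
        System.out.println(Arrays.toString(buf)); // prints [1, 2, 3, 4]
    }
}
```

With the check from the quoted snippet, the same read would be rejected, since 3 + 4 > 4 even though all four bytes lie inside the slice. That would only show up with codecs that hand back terms at a nonzero offset.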
---------------------------------------------------------------------
-
Re: extends LuceneTestCase, avoid preflex codec?
Ryan McKinley 2011-03-30, 20:40
dooh -- yes, logic was backwards.
Thank you random testing! (and Robert -- of course)
---------------------------------------------------------------------