-
Re: [JENKINS] Lucene-3.x - Build # 680 - Failure
Robert Muir 2012-03-24, 03:36

This is likely a real bug (from added tests in LUCENE-3905).

But I cannot reproduce it locally since I don't have the special linedocs file.

I think we need to address this: we can't use random tests if they aren't reproducible... hell, I'd even be happier if linedocsfile took a URL and the "reproduce-with" actually downloaded the huge file? But I think we can do better without this...

For now maybe we should disable the special linedocs file in hudson? Using it means none of our analysis tests are reproducible anymore... which is a big loss.

Index: hudson-lucene-trunk.sh
==================================================================
--- hudson-lucene-trunk.sh (revision 1304701)
+++ hudson-lucene-trunk.sh (working copy)
@@ -33,6 +33,7 @@
 #mv -f dist/*-src.tgz $ARTIFACTS

 # run tests
+# TODO: make reproducible -Dtests.linedocsfile=$TEST_LINE_DOCS_FILE \
 cd $WORKSPACE/$CORE_DIR
 $ANT_HOME/bin/ant \
  -Dsvnversion.exe=$SVNVERSION_EXE \
@@ -40,12 +41,12 @@
  -Dversion=$VERSION \
  -Dtests.multiplier=$TESTS_MULTIPLIER \
  -Dtests.nightly=true \
- -Dtests.linedocsfile=$TEST_LINE_DOCS_FILE \
  -Dargs="$TEST_JVM_ARGS" \
  test
 # release it
 mv -f dist/*.tgz $ARTIFACTS

+# TODO: make reproducible -Dtests.linedocsfile=$TEST_LINE_DOCS_FILE \
 cd $WORKSPACE/$MODULES_DIR
 $ANT_HOME/bin/ant \
  -Dsvnversion.exe=$SVNVERSION_EXE \
@@ -53,7 +54,6 @@
  -Dversion=$VERSION \
  -Dtests.multiplier=$TESTS_MULTIPLIER \
  -Dtests.nightly=true \
- -Dtests.linedocsfile=$TEST_LINE_DOCS_FILE \
  -Dargs="$TEST_JVM_ARGS" \
  test

Index: hudson-lucene-3.x.sh
==================================================================
--- hudson-lucene-3.x.sh (revision 1304701)
+++ hudson-lucene-3.x.sh (working copy)
@@ -24,13 +24,13 @@
 mv -f build/docs/api/* $JAVADOCS_ARTIFACTS

 # run tests
+# TODO: make reproducible -Dtests.linedocsfile=$TEST_LINE_DOCS_FILE \
 JAVA_HOME=$JAVA_HOME_16 $ANT_HOME/bin/ant \
  -Dsvnversion.exe=$SVNVERSION_EXE \
  -Dsvn.exe=$SVN_EXE \
  -Dversion=$VERSION \
  -Dtests.multiplier=$TESTS_MULTIPLIER \
  -Dtests.nightly=true \
- -Dtests.linedocsfile=$TEST_LINE_DOCS_FILE \
  -Dargs="$TEST_JVM_ARGS" \
  test
 # release it

On Fri, Mar 23, 2012 at 11:16 PM, Apache Jenkins Server <[EMAIL PROTECTED]> wrote:
> Build: https://builds.apache.org/job/Lucene-3.x/680/
>
> 1 tests failed.
> REGRESSION: org.apache.lucene.analysis.charfilter.HTMLStripCharFilterTest.testRandomHugeStrings
>
> Error Message:
> Uncaught exception by thread: Thread[Thread-10,5,]
>
> Stack Trace:
> org.apache.lucene.util.UncaughtExceptionsRule$UncaughtExceptionsInBackgroundThread: Uncaught exception by thread: Thread[Thread-10,5,]
> at org.apache.lucene.util.UncaughtExceptionsRule$1.evaluate(UncaughtExceptionsRule.java:83)
> at org.apache.lucene.util.LuceneTestCase$RememberThreadRule$1.evaluate(LuceneTestCase.java:516)
> at org.junit.rules.RunRules.evaluate(RunRules.java:18)
> at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:263)
> at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:68)
> at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:146)
> at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:50)
> at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231)
> at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:60)
> at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:229)
> at org.junit.runners.ParentRunner.access$000(ParentRunner.java:50)
> at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:222)
> at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
> at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:30)
> at org.apache.lucene.util.UncaughtExceptionsRule$1.evaluate(UncaughtExceptionsRule.java:74)
> at org.apache.lucene.util.StoreClassNameRule$1.evaluate(StoreClassNameRule.java:36)

lucidimagination.com
-
Re: [JENKINS] Lucene-3.x - Build # 680 - Failure
Michael McCandless 2012-03-24, 12:11

I agree not being able to quickly reproduce a failure is bad, but, worse, I don't want to lose test coverage.

The big line docs file has much more content (~200X larger) than the checked-in tiny one. Some tests, especially when running nightly, can fully consume the small line docs file, and then they'll "wrap around" and won't see unique terms anymore.

Does the failure reproduce if you use the big line docs file (-Dtests.linedocsfile=...)? Ie, it's still reproducible, but it's a hassle because you need the big line docs file locally?

If so... then maybe all that's missing is the "repro" line adding the -Dtests.linedocsfile=/path/to/big/line/file/docs? This way, when we try to repro locally, we'd get a hard but clear error (because the path presumably doesn't work in our local env), and we'd know to edit that to point to our local copy?

Separately: does the test fail if you beast it using only the tiny line docs file?

This really boils down to a test coverage vs test reproducibility tradeoff... but I would favor coverage here: I'd rather know about a bug, but struggle to reproduce/fix it, than not know about it at all. Ignorance isn't bliss...

Mike McCandless
http://blog.mikemccandless.com

On Fri, Mar 23, 2012 at 11:36 PM, Robert Muir <[EMAIL PROTECTED]> wrote:
> This is likely a real bug (from added tests in LUCENE-3905)
>
> But, I cannot reproduce locally since I dont have the special linedocs file.
>
> I think we need to address this: we cant use random tests if they
> arent reproducible... hell i'd even be happier if linedocsfile took a
> URL and the "reproduce-with" actually downloaded the huge file? But I
> think we can do better without this...
>
> For now maybe we should disable the special linedocs file in hudson?
> By using it, it means none of our analysis tests are reproducible
> anymore... which is a big loss.
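The "wrap around" effect described in this message can be illustrated with a small sketch. This is not Lucene's actual line-docs reader; the class `WrappingLineDocs` and its methods are hypothetical names invented for illustration. It only shows that once a test asks for more documents than the file holds, it starts seeing repeats and no new terms.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a line-docs source that restarts from the first
// line when it runs out (an assumption about the behaviour described above,
// not a copy of Lucene's implementation).
final class WrappingLineDocs {
    private final List<String> lines;
    private int next = 0;

    WrappingLineDocs(List<String> lines) {
        if (lines.isEmpty()) {
            throw new IllegalArgumentException("need at least one line");
        }
        this.lines = lines;
    }

    // Returns the next document, wrapping to the start after the last line.
    String nextDoc() {
        String doc = lines.get(next);
        next = (next + 1) % lines.size();
        return doc;
    }

    // Counts how many distinct documents a test sees after `requests` calls.
    static int uniqueDocs(List<String> lines, int requests) {
        WrappingLineDocs docs = new WrappingLineDocs(lines);
        Set<String> seen = new HashSet<String>();
        for (int i = 0; i < requests; i++) {
            seen.add(docs.nextDoc());
        }
        return seen.size();
    }
}
```

With a three-line file, ten requests still yield only three distinct documents, which is the coverage loss a much larger file avoids.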
-
Re: [JENKINS] Lucene-3.x - Build # 680 - Failure
Robert Muir 2012-03-24, 12:14

On Sat, Mar 24, 2012 at 8:11 AM, Michael McCandless <[EMAIL PROTECTED]> wrote:
>
> This really boils down to a test coverage vs test reproducibility
> tradeoff... but I would favor coverage here: I'd rather know about a
> bug, but struggle to reproduce/fix it, than not know about it at all.
> Ignorance isn't bliss...
>

That's not the case. We can have both. But reproducibility isn't something I want to lose: all the analysis tests were always completely reproducible before.

I opened https://issues.apache.org/jira/browse/LUCENE-3910

--
lucidimagination.com
-
Re: [JENKINS] Lucene-3.x - Build # 680 - Failure
Robert Muir 2012-03-24, 12:21

On Sat, Mar 24, 2012 at 8:11 AM, Michael McCandless <[EMAIL PROTECTED]> wrote:
>
> Does the failure reproduce if you use the big line docs file?
> (-Dtests.linedocsfile=...)? Ie, it's still reproducible, but it's a
> hassle because you need the big line file docs locally?
>

Needing the big line docs file locally goes way beyond hassle:

* Where do I get it from? I have no idea. I'm sure it's on jenkins, but I tried ssh'ing in and forgot to deal with the stupid rmuir/rcmuir shit, and now my IP address is *firewalled out* from mistyping my username ONCE. Is this security really necessary? Is jenkins housing nuclear secrets or something?
* I happen to know it's several gigabytes in size from finding it before: this is a huge hassle for someone who isn't in the US on a fast pipe. I think it's absurdly huge.

--
lucidimagination.com
-
Re: [JENKINS] Lucene-3.x - Build # 680 - Failure
Michael McCandless 2012-03-24, 13:21

On Sat, Mar 24, 2012 at 8:21 AM, Robert Muir <[EMAIL PROTECTED]> wrote:

>> Does the failure reproduce if you use the big line docs file?
>> (-Dtests.linedocsfile=...)? Ie, it's still reproducible, but it's a
>> hassle because you need the big line file docs locally?

OK, I verified: it does in fact reproduce, if you use the big line docs file. So it's not that it doesn't reproduce; it's that it "reproduces with some hassle".

> The big line docs file locally goes way beyond hassle:
>
> * Where do i get it from?

I put a copy here: http://people.apache.org/~mikemccand/enwiki.random.lines.txt.gz

I agree we should make it as easy as possible to obtain (maybe an ant task that pulls it down for you?)...

> * I happen to know its several gigabytes in size from finding it
> before: this is a huge hassle for someone that isnt in the US on a
> fast pipe: i think its absurdly huge.

It's 1.1 GB compressed. I imagine devs that debug these sorts of fails would download it once and then reuse it.

I agree it's "extra work" to reproduce failures, but I think that's the lesser evil here? The alternative is to not discover the bug (loss of test coverage)... eg, I'm unable to reproduce this failure if I only use the tiny line docs file (can anyone else?).

If the "NOTE: reproduce with..." output included the -Dtests.linedocsfile=/path/to/big/file then you'd know you need to point it to your copy of the big line docs file.

Mike McCandless
http://blog.mikemccandless.com
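The suggestion above (put the line docs path into the "NOTE: reproduce with..." output, and fail loudly when the path is wrong) could be sketched roughly as follows. Everything here is an assumption for illustration: the class `ReproLine`, both method names, and the `-Dtestcase`, `-Dtestmethod` and `-Dtests.seed` arguments are not taken from this thread, which only mentions the "NOTE: reproduce with..." prefix and `-Dtests.linedocsfile`.

```java
import java.io.File;

// Hypothetical helper, not Lucene's test framework code.
final class ReproLine {
    private ReproLine() {}

    // Builds a reproduce line; the line docs property is appended only when
    // the run actually used a custom file (lineDocsFile non-null, non-empty).
    static String build(String testCase, String testMethod, String seed, String lineDocsFile) {
        StringBuilder sb = new StringBuilder("NOTE: reproduce with: ant test");
        sb.append(" -Dtestcase=").append(testCase);      // assumed argument name
        sb.append(" -Dtestmethod=").append(testMethod);  // assumed argument name
        sb.append(" -Dtests.seed=").append(seed);        // assumed argument name
        if (lineDocsFile != null && !lineDocsFile.isEmpty()) {
            sb.append(" -Dtests.linedocsfile=").append(lineDocsFile);
        }
        return sb.toString();
    }

    // Gives the "hard but clear error" when the copied path does not exist
    // on the local machine.
    static File requireLineDocs(String path) {
        File f = new File(path);
        if (!f.isFile()) {
            throw new IllegalStateException(
                "tests.linedocsfile does not exist: " + path
                + " (edit the path to point at your local copy)");
        }
        return f;
    }
}
```

The point of the design is that a copied command line fails immediately with a message naming the missing file, instead of silently falling back to the small file and not reproducing.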
-
Re: [JENKINS] Lucene-3.x - Build # 680 - Failure
Robert Muir 2012-03-24, 13:53

On Sat, Mar 24, 2012 at 9:21 AM, Michael McCandless <[EMAIL PROTECTED]> wrote:
> On Sat, Mar 24, 2012 at 8:21 AM, Robert Muir <[EMAIL PROTECTED]> wrote:
>
> OK, I verified: it does in fact reproduce, if you use the big line file docs.
>

But the linedocs method truncates the real docs to fit. It could just be splitting a surrogate pair (making this not htmlstrip's fault, but the test's fault instead).

--
lucidimagination.com
-
Re: [JENKINS] Lucene-3.x - Build # 680 - Failure
Michael McCandless 2012-03-24, 14:10

On Sat, Mar 24, 2012 at 9:53 AM, Robert Muir <[EMAIL PROTECTED]> wrote:
> On Sat, Mar 24, 2012 at 9:21 AM, Michael McCandless
> <[EMAIL PROTECTED]> wrote:
>> On Sat, Mar 24, 2012 at 8:21 AM, Robert Muir <[EMAIL PROTECTED]> wrote:
>>
>> OK, I verified: it does in fact reproduce, if you use the big line file docs.
>>
>
> but the linedocs method truncates the real docs to fit. It could just
> be splitting a surrogate pair (making this not htmlstrips fault, but
> the test's fault instead).

You're right! Not good... I just committed a fix for that, but it looks like that wasn't the cause of HTMLStripCharFilter's test failure... I'll dig.

Separately: I think the tiny line docs file may have no surrogate pairs... I think we should fix that. I'll open an issue...

Mike McCandless
http://blog.mikemccandless.com
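For readers unfamiliar with the surrogate-pair problem raised here: a Java `String` is UTF-16, so cutting it at an arbitrary char index can leave a lone high surrogate at the end. The sketch below shows one way to truncate safely. It is not the fix that was committed; `SafeTruncate` and `truncate` are hypothetical names, and the only library calls used are `Character.isHighSurrogate`, `Character.isLowSurrogate` and `String.substring`.

```java
// Hypothetical helper illustrating truncation on a code-point boundary.
final class SafeTruncate {
    private SafeTruncate() {}

    // Returns at most maxChars UTF-16 chars of s. If the cut would fall
    // between the two halves of a surrogate pair, one char fewer is kept so
    // the result never ends in an unpaired high surrogate created by the cut.
    static String truncate(String s, int maxChars) {
        if (maxChars < 0) {
            throw new IllegalArgumentException("maxChars must be >= 0");
        }
        if (s.length() <= maxChars) {
            return s;
        }
        int end = maxChars;
        // Here end < s.length(), so s.charAt(end) is a valid index.
        if (end > 0
                && Character.isHighSurrogate(s.charAt(end - 1))
                && Character.isLowSurrogate(s.charAt(end))) {
            end--;
        }
        return s.substring(0, end);
    }
}
```

A test that feeds truncated text to an analyzer would otherwise be handing it malformed UTF-16, and any resulting failure would say more about the test than about the code under test.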
-
Re: [JENKINS] Lucene-3.x - Build # 680 - Failure
Dawid Weiss 2012-03-25, 09:11

> http://people.apache.org/~mikemccand/enwiki.random.lines.txt.gz

You can make it nearly 50% smaller if you compress with lzma:

-rw-r--r-- 1 dweiss dweiss 848M 2012-03-25 09:29 enwiki.bz2
-rw-r--r-- 1 dweiss dweiss 658M 2012-03-25 11:09 enwiki.lzma
-rw-r--r-- 1 dweiss dweiss 1.1G 2012-03-24 14:03 enwiki.random.lines.txt.gz
-rw-r--r-- 1 dweiss dweiss 2.9G 2012-03-25 09:21 enwiki.txt

Dawid
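The savings in the listing above can be worked out directly. The sketch below assumes the sizes are `ls -h` style values in binary units (1G = 1024M) and that the rounded figures shown are close enough; `CompressionSavings` and `percentSmaller` are names made up for this example.

```java
// Back-of-the-envelope arithmetic on the file sizes listed above.
final class CompressionSavings {
    private CompressionSavings() {}

    // Percentage by which `compressedMb` is smaller than `originalMb`.
    static double percentSmaller(double originalMb, double compressedMb) {
        return 100.0 * (1.0 - compressedMb / originalMb);
    }

    public static void main(String[] args) {
        double gzMb = 1.1 * 1024;   // enwiki.random.lines.txt.gz, 1.1G (rounded)
        double bz2Mb = 848;         // enwiki.bz2
        double lzmaMb = 658;        // enwiki.lzma
        double rawMb = 2.9 * 1024;  // enwiki.txt, 2.9G (rounded)

        System.out.printf("bz2 vs gz:   %.1f%% smaller%n", percentSmaller(gzMb, bz2Mb));
        System.out.printf("lzma vs gz:  %.1f%% smaller%n", percentSmaller(gzMb, lzmaMb));
        System.out.printf("lzma vs raw: %.1f%% smaller%n", percentSmaller(rawMb, lzmaMb));
    }
}
```

Under those assumptions the lzma file comes out a little over 40% smaller than the gzip one, in line with the "nearly 50%" estimate given that the listed sizes are rounded.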
-
Re: [JENKINS] Lucene-3.x - Build # 680 - Failure
Michael McCandless 2012-03-25, 14:59

On Sun, Mar 25, 2012 at 5:11 AM, Dawid Weiss <[EMAIL PROTECTED]> wrote:
>> http://people.apache.org/~mikemccand/enwiki.random.lines.txt.gz
>
> You can make it nearly 50% smaller if you compress with lzma:
>
> -rw-r--r-- 1 dweiss dweiss 848M 2012-03-25 09:29 enwiki.bz2
> -rw-r--r-- 1 dweiss dweiss 658M 2012-03-25 11:09 enwiki.lzma
> -rw-r--r-- 1 dweiss dweiss 1.1G 2012-03-24 14:03 enwiki.random.lines.txt.gz
> -rw-r--r-- 1 dweiss dweiss 2.9G 2012-03-25 09:21 enwiki.txt

Wow, tempting :)

Is "lzma" generally available... as a command line tool / ant can decompress...? A java impl...?

Mike McCandless
http://blog.mikemccandless.com
-
Re: [JENKINS] Lucene-3.x - Build # 680 - Failure
Dawid Weiss 2012-03-25, 18:49

I think lzma is part of every major linux distribution? I didn't do anything and it was on my ubuntu at least. 7z implements lzma compression if you want to create an archive (lzma is a streaming compressor much like bzip2).

There is a Java compressor (I think) and decompressor available from the 7z author -- it is all right, I've used it in the past:
http://www.7-zip.org/sdk.html

Dawid

On Sun, Mar 25, 2012 at 4:59 PM, Michael McCandless <[EMAIL PROTECTED]> wrote:
> On Sun, Mar 25, 2012 at 5:11 AM, Dawid Weiss
> <[EMAIL PROTECTED]> wrote:
>>> http://people.apache.org/~mikemccand/enwiki.random.lines.txt.gz
>>
>> You can make it nearly 50% smaller if you compress with lzma:
>>
>> -rw-r--r-- 1 dweiss dweiss 848M 2012-03-25 09:29 enwiki.bz2
>> -rw-r--r-- 1 dweiss dweiss 658M 2012-03-25 11:09 enwiki.lzma
>> -rw-r--r-- 1 dweiss dweiss 1.1G 2012-03-24 14:03 enwiki.random.lines.txt.gz
>> -rw-r--r-- 1 dweiss dweiss 2.9G 2012-03-25 09:21 enwiki.txt
>
> Wow, tempting :)
>
> Is "lzma" generally available... as a command line tool / ant can
> decompress...? A java impl...?
>
> Mike McCandless
>
> http://blog.mikemccandless.com
-
RE: [JENKINS] Lucene-3.x - Build # 680 - Failure
Steven A Rowe 2012-03-25, 18:56

I have it on Cygwin, as part of XZ Utils. There's a Mac port listed on the project page: http://tukaani.org/xz/.

Steve

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Dawid Weiss
Sent: Sunday, March 25, 2012 2:49 PM
To: [EMAIL PROTECTED]
Subject: Re: [JENKINS] Lucene-3.x - Build # 680 - Failure

I think lzma is part of every major linux distribution? I didn't do anything and it was on my ubuntu at least. 7z implements lzma compression if you want to create an archive (lzma is a streaming compressor much like bzip2).

There is Java compressor (I think) and decompressor available from 7z author -- it is all right, I've used it in the past:
http://www.7-zip.org/sdk.html

Dawid

On Sun, Mar 25, 2012 at 4:59 PM, Michael McCandless <[EMAIL PROTECTED]> wrote:
> On Sun, Mar 25, 2012 at 5:11 AM, Dawid Weiss
> <[EMAIL PROTECTED]> wrote:
>>> http://people.apache.org/~mikemccand/enwiki.random.lines.txt.gz
>>
>> You can make it nearly 50% smaller if you compress with lzma:
>>
>> -rw-r--r-- 1 dweiss dweiss 848M 2012-03-25 09:29 enwiki.bz2
>> -rw-r--r-- 1 dweiss dweiss 658M 2012-03-25 11:09 enwiki.lzma
>> -rw-r--r-- 1 dweiss dweiss 1.1G 2012-03-24 14:03 enwiki.random.lines.txt.gz
>> -rw-r--r-- 1 dweiss dweiss 2.9G 2012-03-25 09:21 enwiki.txt
>
> Wow, tempting :)
>
> Is "lzma" generally available... as a command line tool / ant can
> decompress...? A java impl...?
>
> Mike McCandless
>
> http://blog.mikemccandless.com