|
Simon Willnauer
2011-06-30, 12:55
Uwe Schindler
2011-06-30, 12:56
Robert Muir
2011-06-30, 13:05
Simon Willnauer
2011-06-30, 13:09
Dawid Weiss
2011-06-30, 13:11
Uwe Schindler
2011-06-30, 13:13
Uwe Schindler
2011-06-30, 13:26
Simon Willnauer
2011-06-30, 13:40
Robert Muir
2011-06-30, 14:44
Simon Willnauer
2011-06-30, 18:31
Dawid Weiss
2011-06-30, 18:50
Michael McCandless
2011-06-30, 18:57
Dawid Weiss
2011-06-30, 19:09
Michael McCandless
2011-06-30, 19:25
Simon Willnauer
2011-06-30, 20:45
Michael McCandless
2011-06-30, 22:19
Simon Willnauer
2011-07-01, 05:47
Uwe Schindler
2011-07-01, 06:33
Michael McCandless
2011-07-01, 13:04
Michael McCandless
2011-07-01, 13:13
Shai Erera
2011-07-01, 16:07
|
-
Re: svn commit: r1141510 - /lucene/dev/trunk/modules/facet/src/java/org/apache/lucene/util/UnsafeByteArrayOutputStream.javaSimon Willnauer 2011-06-30, 12:55
hmm are you concerned about the extra Math.min that happens in the
copyOf method? I don't how that relates to "intrinsic" and java 1.7 I don't have strong feelings here just checking if you mix something up in the comment you put there... I am happy to keep the old and now current code simon On Thu, Jun 30, 2011 at 2:42 PM, <[EMAIL PROTECTED]> wrote: > Author: rmuir > Date: Thu Jun 30 12:42:17 2011 > New Revision: 1141510 > > URL: http://svn.apache.org/viewvc?rev=1141510&view=rev > Log: > LUCENE-3239: remove use of slow Arrays.copyOf > > Modified: > lucene/dev/trunk/modules/facet/src/java/org/apache/lucene/util/UnsafeByteArrayOutputStream.java > > Modified: lucene/dev/trunk/modules/facet/src/java/org/apache/lucene/util/UnsafeByteArrayOutputStream.java > URL: http://svn.apache.org/viewvc/lucene/dev/trunk/modules/facet/src/java/org/apache/lucene/util/UnsafeByteArrayOutputStream.java?rev=1141510&r1=1141509&r2=1141510&view=diff > =============================================================================> --- lucene/dev/trunk/modules/facet/src/java/org/apache/lucene/util/UnsafeByteArrayOutputStream.java (original) > +++ lucene/dev/trunk/modules/facet/src/java/org/apache/lucene/util/UnsafeByteArrayOutputStream.java Thu Jun 30 12:42:17 2011 > @@ -2,7 +2,6 @@ package org.apache.lucene.util; > > import java.io.IOException; > import java.io.OutputStream; > -import java.util.Arrays; > > /** > * Licensed to the Apache Software Foundation (ASF) under one or more > @@ -72,7 +71,11 @@ public class UnsafeByteArrayOutputStream > } > > private void grow(int newLength) { > - buffer = Arrays.copyOf(buffer, newLength); > + // It actually should be: (Java 1.7, when its intrinsic on all machines) > + // buffer = Arrays.copyOf(buffer, newLength); > + byte[] newBuffer = new byte[newLength]; > + System.arraycopy(buffer, 0, newBuffer, 0, buffer.length); > + buffer = newBuffer; > } > > /** > > > ---------------------------------------------------------------------
-
RE: svn commit: r1141510 - /lucene/dev/trunk/modules/facet/src/java/org/apache/lucene/util/UnsafeByteArrayOutputStream.javaUwe Schindler 2011-06-30, 12:56
Hi Robert,
you reverted a use of Arrays.copyOf() on native types which is *exactly* implemented like this in Arrays.java code! The slow ones are <T> T[] copyOf(T[] array, int newlen) (because they use reflection). Uwe ----- Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: [EMAIL PROTECTED] > -----Original Message----- > From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] > Sent: Thursday, June 30, 2011 2:42 PM > To: [EMAIL PROTECTED] > Subject: svn commit: r1141510 - > /lucene/dev/trunk/modules/facet/src/java/org/apache/lucene/util/Unsafe > ByteArrayOutputStream.java > > Author: rmuir > Date: Thu Jun 30 12:42:17 2011 > New Revision: 1141510 > > URL: http://svn.apache.org/viewvc?rev=1141510&view=rev > Log: > LUCENE-3239: remove use of slow Arrays.copyOf > > Modified: > > lucene/dev/trunk/modules/facet/src/java/org/apache/lucene/util/UnsafeB > yteArrayOutputStream.java > > Modified: > lucene/dev/trunk/modules/facet/src/java/org/apache/lucene/util/UnsafeB > yteArrayOutputStream.java > URL: > http://svn.apache.org/viewvc/lucene/dev/trunk/modules/facet/src/java/or > g/apache/lucene/util/UnsafeByteArrayOutputStream.java?rev=1141510&r1 > =1141509&r2=1141510&view=diff > =========================================================> ===================> --- > lucene/dev/trunk/modules/facet/src/java/org/apache/lucene/util/UnsafeB > yteArrayOutputStream.java (original) > +++ > lucene/dev/trunk/modules/facet/src/java/org/apache/lucene/util/Unsaf > +++ eByteArrayOutputStream.java Thu Jun 30 12:42:17 2011 > @@ -2,7 +2,6 @@ package org.apache.lucene.util; > > import java.io.IOException; > import java.io.OutputStream; > -import java.util.Arrays; > > /** > * Licensed to the Apache Software Foundation (ASF) under one or more > @@ -72,7 +71,11 @@ public class UnsafeByteArrayOutputStream > } > > private void grow(int newLength) { > - buffer = Arrays.copyOf(buffer, newLength); > + // It actually should be: (Java 1.7, when its intrinsic on all machines) > + // buffer = Arrays.copyOf(buffer, newLength); > + byte[] newBuffer = new byte[newLength]; > + System.arraycopy(buffer, 0, newBuffer, 0, buffer.length); > + buffer = newBuffer; > } > > /** > ---------------------------------------------------------------------
-
Re: svn commit: r1141510 - /lucene/dev/trunk/modules/facet/src/java/org/apache/lucene/util/UnsafeByteArrayOutputStream.javaRobert Muir 2011-06-30, 13:05
because on windows 32bit at least, -client is still the default on
most jres out there. i realize people don't care about -client, but i will police foo[].clone() / arrays.copyOf etc to prevent problems. There are comments about this stuff on the relevant bug reports (oracle's site is down, sorry) linked to this issue. https://issues.apache.org/jira/browse/LUCENE-2674 Sorry, I don't think we should use foo[].clone() / arrays.copyOf, I think we should always use arraycopy. On Thu, Jun 30, 2011 at 8:55 AM, Simon Willnauer <[EMAIL PROTECTED]> wrote: > hmm are you concerned about the extra Math.min that happens in the > copyOf method? > I don't how that relates to "intrinsic" and java 1.7 > > I don't have strong feelings here just checking if you mix something > up in the comment you put there... I am happy to keep the old and now > current code > > simon > > On Thu, Jun 30, 2011 at 2:42 PM, <[EMAIL PROTECTED]> wrote: >> Author: rmuir >> Date: Thu Jun 30 12:42:17 2011 >> New Revision: 1141510 >> >> URL: http://svn.apache.org/viewvc?rev=1141510&view=rev >> Log: >> LUCENE-3239: remove use of slow Arrays.copyOf >> >> Modified: >> lucene/dev/trunk/modules/facet/src/java/org/apache/lucene/util/UnsafeByteArrayOutputStream.java >> >> Modified: lucene/dev/trunk/modules/facet/src/java/org/apache/lucene/util/UnsafeByteArrayOutputStream.java >> URL: http://svn.apache.org/viewvc/lucene/dev/trunk/modules/facet/src/java/org/apache/lucene/util/UnsafeByteArrayOutputStream.java?rev=1141510&r1=1141509&r2=1141510&view=diff >> =============================================================================>> --- lucene/dev/trunk/modules/facet/src/java/org/apache/lucene/util/UnsafeByteArrayOutputStream.java (original) >> +++ lucene/dev/trunk/modules/facet/src/java/org/apache/lucene/util/UnsafeByteArrayOutputStream.java Thu Jun 30 12:42:17 2011 >> @@ -2,7 +2,6 @@ package org.apache.lucene.util; >> >> import java.io.IOException; >> import java.io.OutputStream; >> -import java.util.Arrays; >> >> /** >> * Licensed to the Apache Software Foundation (ASF) under one or more >> @@ -72,7 +71,11 @@ public class UnsafeByteArrayOutputStream >> } >> >> private void grow(int newLength) { >> - buffer = Arrays.copyOf(buffer, newLength); >> + // It actually should be: (Java 1.7, when its intrinsic on all machines) >> + // buffer = Arrays.copyOf(buffer, newLength); >> + byte[] newBuffer = new byte[newLength]; >> + System.arraycopy(buffer, 0, newBuffer, 0, buffer.length); >> + buffer = newBuffer; >> } >> >> /** >> >> >> > > --------------------------------------------------------------------- > To unsubscribe, e-mail: [EMAIL PROTECTED] > For additional commands, e-mail: [EMAIL PROTECTED] > > ---------------------------------------------------------------------
-
Re: svn commit: r1141510 - /lucene/dev/trunk/modules/facet/src/java/org/apache/lucene/util/UnsafeByteArrayOutputStream.javaSimon Willnauer 2011-06-30, 13:09
Robert I agree but doesn't that apply to Arrays.copyOf(Object[],int)
only? here we use a specialized primitive version? simon On Thu, Jun 30, 2011 at 3:05 PM, Robert Muir <[EMAIL PROTECTED]> wrote: > because on windows 32bit at least, -client is still the default on > most jres out there. > > i realize people don't care about -client, but i will police > foo[].clone() / arrays.copyOf etc to prevent problems. > > There are comments about this stuff on the relevant bug reports > (oracle's site is down, sorry) linked to this issue. > https://issues.apache.org/jira/browse/LUCENE-2674 > > Sorry, I don't think we should use foo[].clone() / arrays.copyOf, I > think we should always use arraycopy. > > On Thu, Jun 30, 2011 at 8:55 AM, Simon Willnauer > <[EMAIL PROTECTED]> wrote: >> hmm are you concerned about the extra Math.min that happens in the >> copyOf method? >> I don't how that relates to "intrinsic" and java 1.7 >> >> I don't have strong feelings here just checking if you mix something >> up in the comment you put there... I am happy to keep the old and now >> current code >> >> simon >> >> On Thu, Jun 30, 2011 at 2:42 PM, <[EMAIL PROTECTED]> wrote: >>> Author: rmuir >>> Date: Thu Jun 30 12:42:17 2011 >>> New Revision: 1141510 >>> >>> URL: http://svn.apache.org/viewvc?rev=1141510&view=rev >>> Log: >>> LUCENE-3239: remove use of slow Arrays.copyOf >>> >>> Modified: >>> lucene/dev/trunk/modules/facet/src/java/org/apache/lucene/util/UnsafeByteArrayOutputStream.java >>> >>> Modified: lucene/dev/trunk/modules/facet/src/java/org/apache/lucene/util/UnsafeByteArrayOutputStream.java >>> URL: http://svn.apache.org/viewvc/lucene/dev/trunk/modules/facet/src/java/org/apache/lucene/util/UnsafeByteArrayOutputStream.java?rev=1141510&r1=1141509&r2=1141510&view=diff >>> =============================================================================>>> --- lucene/dev/trunk/modules/facet/src/java/org/apache/lucene/util/UnsafeByteArrayOutputStream.java (original) >>> +++ lucene/dev/trunk/modules/facet/src/java/org/apache/lucene/util/UnsafeByteArrayOutputStream.java Thu Jun 30 12:42:17 2011 >>> @@ -2,7 +2,6 @@ package org.apache.lucene.util; >>> >>> import java.io.IOException; >>> import java.io.OutputStream; >>> -import java.util.Arrays; >>> >>> /** >>> * Licensed to the Apache Software Foundation (ASF) under one or more >>> @@ -72,7 +71,11 @@ public class UnsafeByteArrayOutputStream >>> } >>> >>> private void grow(int newLength) { >>> - buffer = Arrays.copyOf(buffer, newLength); >>> + // It actually should be: (Java 1.7, when its intrinsic on all machines) >>> + // buffer = Arrays.copyOf(buffer, newLength); >>> + byte[] newBuffer = new byte[newLength]; >>> + System.arraycopy(buffer, 0, newBuffer, 0, buffer.length); >>> + buffer = newBuffer; >>> } >>> >>> /** >>> >>> >>> >> >> --------------------------------------------------------------------- >> To unsubscribe, e-mail: [EMAIL PROTECTED] >> For additional commands, e-mail: [EMAIL PROTECTED] >> >> > ---------------------------------------------------------------------
-
Re: svn commit: r1141510 - /lucene/dev/trunk/modules/facet/src/java/org/apache/lucene/util/UnsafeByteArrayOutputStream.javaDawid Weiss 2011-06-30, 13:11
Arrays.copyOf(primitive) is actually System.arraycopy by default. If
intrinsics are used it can only get faster. For object types it will probably be a bit slower for -client because of a runtime check for the component type. Dawid On Thu, Jun 30, 2011 at 3:05 PM, Robert Muir <[EMAIL PROTECTED]> wrote: > because on windows 32bit at least, -client is still the default on > most jres out there. > > i realize people don't care about -client, but i will police > foo[].clone() / arrays.copyOf etc to prevent problems. > > There are comments about this stuff on the relevant bug reports > (oracle's site is down, sorry) linked to this issue. > https://issues.apache.org/jira/browse/LUCENE-2674 > > Sorry, I don't think we should use foo[].clone() / arrays.copyOf, I > think we should always use arraycopy. > > On Thu, Jun 30, 2011 at 8:55 AM, Simon Willnauer > <[EMAIL PROTECTED]> wrote: >> hmm are you concerned about the extra Math.min that happens in the >> copyOf method? >> I don't how that relates to "intrinsic" and java 1.7 >> >> I don't have strong feelings here just checking if you mix something >> up in the comment you put there... I am happy to keep the old and now >> current code >> >> simon >> >> On Thu, Jun 30, 2011 at 2:42 PM, <[EMAIL PROTECTED]> wrote: >>> Author: rmuir >>> Date: Thu Jun 30 12:42:17 2011 >>> New Revision: 1141510 >>> >>> URL: http://svn.apache.org/viewvc?rev=1141510&view=rev >>> Log: >>> LUCENE-3239: remove use of slow Arrays.copyOf >>> >>> Modified: >>> lucene/dev/trunk/modules/facet/src/java/org/apache/lucene/util/UnsafeByteArrayOutputStream.java >>> >>> Modified: lucene/dev/trunk/modules/facet/src/java/org/apache/lucene/util/UnsafeByteArrayOutputStream.java >>> URL: http://svn.apache.org/viewvc/lucene/dev/trunk/modules/facet/src/java/org/apache/lucene/util/UnsafeByteArrayOutputStream.java?rev=1141510&r1=1141509&r2=1141510&view=diff >>> =============================================================================>>> --- lucene/dev/trunk/modules/facet/src/java/org/apache/lucene/util/UnsafeByteArrayOutputStream.java (original) >>> +++ lucene/dev/trunk/modules/facet/src/java/org/apache/lucene/util/UnsafeByteArrayOutputStream.java Thu Jun 30 12:42:17 2011 >>> @@ -2,7 +2,6 @@ package org.apache.lucene.util; >>> >>> import java.io.IOException; >>> import java.io.OutputStream; >>> -import java.util.Arrays; >>> >>> /** >>> * Licensed to the Apache Software Foundation (ASF) under one or more >>> @@ -72,7 +71,11 @@ public class UnsafeByteArrayOutputStream >>> } >>> >>> private void grow(int newLength) { >>> - buffer = Arrays.copyOf(buffer, newLength); >>> + // It actually should be: (Java 1.7, when its intrinsic on all machines) >>> + // buffer = Arrays.copyOf(buffer, newLength); >>> + byte[] newBuffer = new byte[newLength]; >>> + System.arraycopy(buffer, 0, newBuffer, 0, buffer.length); >>> + buffer = newBuffer; >>> } >>> >>> /** >>> >>> >>> >> >> --------------------------------------------------------------------- >> To unsubscribe, e-mail: [EMAIL PROTECTED] >> For additional commands, e-mail: [EMAIL PROTECTED] >> >> > > --------------------------------------------------------------------- > To unsubscribe, e-mail: [EMAIL PROTECTED] > For additional commands, e-mail: [EMAIL PROTECTED] > > ---------------------------------------------------------------------
-
RE: svn commit: r1141510 - /lucene/dev/trunk/modules/facet/src/java/org/apache/lucene/util/UnsafeByteArrayOutputStream.javaUwe Schindler 2011-06-30, 13:13
Robert,
as noted in my other eMail, ist only slow for the generic Object[] method (as it uses j.l.reflect.Array.newInstance(Class componentType)). We are talking here about byte[], and the Arrays method is implemented with the same 3 lines of code, Simon replaced. The only difference is a Math.min() which is intrinsic (it is used, as Arrays.copyOf supports shrinking size, so the System.arrayCopy() needs upper limit to not AIOOBE). Uwe ----- Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: [EMAIL PROTECTED] > -----Original Message----- > From: Robert Muir [mailto:[EMAIL PROTECTED]] > Sent: Thursday, June 30, 2011 3:05 PM > To: [EMAIL PROTECTED]; [EMAIL PROTECTED] > Cc: [EMAIL PROTECTED] > Subject: Re: svn commit: r1141510 - > /lucene/dev/trunk/modules/facet/src/java/org/apache/lucene/util/Unsafe > ByteArrayOutputStream.java > > because on windows 32bit at least, -client is still the default on most jres out > there. > > i realize people don't care about -client, but i will police > foo[].clone() / arrays.copyOf etc to prevent problems. > > There are comments about this stuff on the relevant bug reports (oracle's > site is down, sorry) linked to this issue. > https://issues.apache.org/jira/browse/LUCENE-2674 > > Sorry, I don't think we should use foo[].clone() / arrays.copyOf, I think we > should always use arraycopy. > > On Thu, Jun 30, 2011 at 8:55 AM, Simon Willnauer > <[EMAIL PROTECTED]> wrote: > > hmm are you concerned about the extra Math.min that happens in the > > copyOf method? > > I don't how that relates to "intrinsic" and java 1.7 > > > > I don't have strong feelings here just checking if you mix something > > up in the comment you put there... I am happy to keep the old and now > > current code > > > > simon > > > > On Thu, Jun 30, 2011 at 2:42 PM, <[EMAIL PROTECTED]> wrote: > >> Author: rmuir > >> Date: Thu Jun 30 12:42:17 2011 > >> New Revision: 1141510 > >> > >> URL: http://svn.apache.org/viewvc?rev=1141510&view=rev > >> Log: > >> LUCENE-3239: remove use of slow Arrays.copyOf > >> > >> Modified: > >> > >> > lucene/dev/trunk/modules/facet/src/java/org/apache/lucene/util/Unsafe > >> ByteArrayOutputStream.java > >> > >> Modified: > >> > lucene/dev/trunk/modules/facet/src/java/org/apache/lucene/util/Unsafe > >> ByteArrayOutputStream.java > >> URL: > >> > http://svn.apache.org/viewvc/lucene/dev/trunk/modules/facet/src/java/ > >> > org/apache/lucene/util/UnsafeByteArrayOutputStream.java?rev=1141510&r > >> 1=1141509&r2=1141510&view=diff > >> > =========================================================> ==========> >> ========> >> --- > >> > lucene/dev/trunk/modules/facet/src/java/org/apache/lucene/util/Unsafe > >> ByteArrayOutputStream.java (original) > >> +++ > lucene/dev/trunk/modules/facet/src/java/org/apache/lucene/util/Un > >> +++ safeByteArrayOutputStream.java Thu Jun 30 12:42:17 2011 > >> @@ -2,7 +2,6 @@ package org.apache.lucene.util; > >> > >> import java.io.IOException; > >> import java.io.OutputStream; > >> -import java.util.Arrays; > >> > >> /** > >> * Licensed to the Apache Software Foundation (ASF) under one or more > >> @@ -72,7 +71,11 @@ public class UnsafeByteArrayOutputStream > >> } > >> > >> private void grow(int newLength) { > >> - buffer = Arrays.copyOf(buffer, newLength); > >> + // It actually should be: (Java 1.7, when its intrinsic on all > >> + machines) > >> + // buffer = Arrays.copyOf(buffer, newLength); > >> + byte[] newBuffer = new byte[newLength]; > >> + System.arraycopy(buffer, 0, newBuffer, 0, buffer.length); > >> + buffer = newBuffer; > >> } > >> > >> /** > >> > >> > >> > > > > --------------------------------------------------------------------- > > To unsubscribe, e-mail: [EMAIL PROTECTED] For > > additional commands, e-mail: [EMAIL PROTECTED] > > > > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: [EMAIL PROTECTED] For additional
-
RE: svn commit: r1141510 - /lucene/dev/trunk/modules/facet/src/java/org/apache/lucene/util/UnsafeByteArrayOutputStream.javaUwe Schindler 2011-06-30, 13:26
We had an issue about this with FST's array growing in Mike's code, in facts ist *much* slower for generic Arrays' T[] copyOf(T[]...), with T extends Object (uses slow reflection).
For primitives it can only get faster in later JVMs, this is why we want to change all ArrayUtils.grow() to use this (and we don’t have a generic one there for above reason). ----- Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: [EMAIL PROTECTED] > -----Original Message----- > From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf > Of Dawid Weiss > Sent: Thursday, June 30, 2011 3:11 PM > To: [EMAIL PROTECTED] > Subject: Re: svn commit: r1141510 - > /lucene/dev/trunk/modules/facet/src/java/org/apache/lucene/util/Unsafe > ByteArrayOutputStream.java > > Arrays.copyOf(primitive) is actually System.arraycopy by default. If intrinsics > are used it can only get faster. For object types it will probably be a bit slower > for -client because of a runtime check for the component type. > > Dawid > > On Thu, Jun 30, 2011 at 3:05 PM, Robert Muir <[EMAIL PROTECTED]> wrote: > > because on windows 32bit at least, -client is still the default on > > most jres out there. > > > > i realize people don't care about -client, but i will police > > foo[].clone() / arrays.copyOf etc to prevent problems. > > > > There are comments about this stuff on the relevant bug reports > > (oracle's site is down, sorry) linked to this issue. > > https://issues.apache.org/jira/browse/LUCENE-2674 > > > > Sorry, I don't think we should use foo[].clone() / arrays.copyOf, I > > think we should always use arraycopy. > > > > On Thu, Jun 30, 2011 at 8:55 AM, Simon Willnauer > > <[EMAIL PROTECTED]> wrote: > >> hmm are you concerned about the extra Math.min that happens in the > >> copyOf method? > >> I don't how that relates to "intrinsic" and java 1.7 > >> > >> I don't have strong feelings here just checking if you mix something > >> up in the comment you put there... I am happy to keep the old and now > >> current code > >> > >> simon > >> > >> On Thu, Jun 30, 2011 at 2:42 PM, <[EMAIL PROTECTED]> wrote: > >>> Author: rmuir > >>> Date: Thu Jun 30 12:42:17 2011 > >>> New Revision: 1141510 > >>> > >>> URL: http://svn.apache.org/viewvc?rev=1141510&view=rev > >>> Log: > >>> LUCENE-3239: remove use of slow Arrays.copyOf > >>> > >>> Modified: > >>> > >>> > lucene/dev/trunk/modules/facet/src/java/org/apache/lucene/util/Unsaf > >>> eByteArrayOutputStream.java > >>> > >>> Modified: > >>> > lucene/dev/trunk/modules/facet/src/java/org/apache/lucene/util/Unsaf > >>> eByteArrayOutputStream.java > >>> URL: > >>> > http://svn.apache.org/viewvc/lucene/dev/trunk/modules/facet/src/java > >>> > /org/apache/lucene/util/UnsafeByteArrayOutputStream.java?rev=1141510 > >>> &r1=1141509&r2=1141510&view=diff > >>> > =========================================================> =========> >>> =========> >>> --- > >>> > lucene/dev/trunk/modules/facet/src/java/org/apache/lucene/util/Unsaf > >>> eByteArrayOutputStream.java (original) > >>> +++ > lucene/dev/trunk/modules/facet/src/java/org/apache/lucene/util/U > >>> +++ nsafeByteArrayOutputStream.java Thu Jun 30 12:42:17 2011 > >>> @@ -2,7 +2,6 @@ package org.apache.lucene.util; > >>> > >>> import java.io.IOException; > >>> import java.io.OutputStream; > >>> -import java.util.Arrays; > >>> > >>> /** > >>> * Licensed to the Apache Software Foundation (ASF) under one or > >>> more @@ -72,7 +71,11 @@ public class UnsafeByteArrayOutputStream > >>> } > >>> > >>> private void grow(int newLength) { > >>> - buffer = Arrays.copyOf(buffer, newLength); > >>> + // It actually should be: (Java 1.7, when its intrinsic on all > >>> + machines) > >>> + // buffer = Arrays.copyOf(buffer, newLength); > >>> + byte[] newBuffer = new byte[newLength]; > >>> + System.arraycopy(buffer, 0, newBuffer, 0, buffer.length); > >>> + buffer = newBuffer; > >>> } > >>> > >>> /** > >>> > >>> > >>> > >> > >> -------
-
Re: svn commit: r1141510 - /lucene/dev/trunk/modules/facet/src/java/org/apache/lucene/util/UnsafeByteArrayOutputStream.javaSimon Willnauer 2011-06-30, 13:40
On Thu, Jun 30, 2011 at 3:26 PM, Uwe Schindler <[EMAIL PROTECTED]> wrote:
> We had an issue about this with FST's array growing in Mike's code, in facts ist *much* slower for generic Arrays' T[] copyOf(T[]...), with T extends Object (uses slow reflection). > > For primitives it can only get faster in later JVMs, this is why we want to change all ArrayUtils.grow() to use this (and we don’t have a generic one there for above reason). +1 - I don't see why this would be any slower... if we can get improvements we should go for it. The issues and bugreports are all for non-primitive copyOf methods so I don't see how this should affect us simon > > ----- > Uwe Schindler > H.-H.-Meier-Allee 63, D-28213 Bremen > http://www.thetaphi.de > eMail: [EMAIL PROTECTED] > > >> -----Original Message----- >> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf >> Of Dawid Weiss >> Sent: Thursday, June 30, 2011 3:11 PM >> To: [EMAIL PROTECTED] >> Subject: Re: svn commit: r1141510 - >> /lucene/dev/trunk/modules/facet/src/java/org/apache/lucene/util/Unsafe >> ByteArrayOutputStream.java >> >> Arrays.copyOf(primitive) is actually System.arraycopy by default. If intrinsics >> are used it can only get faster. For object types it will probably be a bit slower >> for -client because of a runtime check for the component type. >> >> Dawid >> >> On Thu, Jun 30, 2011 at 3:05 PM, Robert Muir <[EMAIL PROTECTED]> wrote: >> > because on windows 32bit at least, -client is still the default on >> > most jres out there. >> > >> > i realize people don't care about -client, but i will police >> > foo[].clone() / arrays.copyOf etc to prevent problems. >> > >> > There are comments about this stuff on the relevant bug reports >> > (oracle's site is down, sorry) linked to this issue. >> > https://issues.apache.org/jira/browse/LUCENE-2674 >> > >> > Sorry, I don't think we should use foo[].clone() / arrays.copyOf, I >> > think we should always use arraycopy. >> > >> > On Thu, Jun 30, 2011 at 8:55 AM, Simon Willnauer >> > <[EMAIL PROTECTED]> wrote: >> >> hmm are you concerned about the extra Math.min that happens in the >> >> copyOf method? >> >> I don't how that relates to "intrinsic" and java 1.7 >> >> >> >> I don't have strong feelings here just checking if you mix something >> >> up in the comment you put there... I am happy to keep the old and now >> >> current code >> >> >> >> simon >> >> >> >> On Thu, Jun 30, 2011 at 2:42 PM, <[EMAIL PROTECTED]> wrote: >> >>> Author: rmuir >> >>> Date: Thu Jun 30 12:42:17 2011 >> >>> New Revision: 1141510 >> >>> >> >>> URL: http://svn.apache.org/viewvc?rev=1141510&view=rev >> >>> Log: >> >>> LUCENE-3239: remove use of slow Arrays.copyOf >> >>> >> >>> Modified: >> >>> >> >>> >> lucene/dev/trunk/modules/facet/src/java/org/apache/lucene/util/Unsaf >> >>> eByteArrayOutputStream.java >> >>> >> >>> Modified: >> >>> >> lucene/dev/trunk/modules/facet/src/java/org/apache/lucene/util/Unsaf >> >>> eByteArrayOutputStream.java >> >>> URL: >> >>> >> http://svn.apache.org/viewvc/lucene/dev/trunk/modules/facet/src/java >> >>> >> /org/apache/lucene/util/UnsafeByteArrayOutputStream.java?rev=1141510 >> >>> &r1=1141509&r2=1141510&view=diff >> >>> >> =========================================================>> =========>> >>> =========>> >>> --- >> >>> >> lucene/dev/trunk/modules/facet/src/java/org/apache/lucene/util/Unsaf >> >>> eByteArrayOutputStream.java (original) >> >>> +++ >> lucene/dev/trunk/modules/facet/src/java/org/apache/lucene/util/U >> >>> +++ nsafeByteArrayOutputStream.java Thu Jun 30 12:42:17 2011 >> >>> @@ -2,7 +2,6 @@ package org.apache.lucene.util; >> >>> >> >>> import java.io.IOException; >> >>> import java.io.OutputStream; >> >>> -import java.util.Arrays; >> >>> >> >>> /** >> >>> * Licensed to the Apache Software Foundation (ASF) under one or >> >>> more @@ -72,7 +71,11 @@ public class UnsafeByteArrayOutputStream >> >>> } >> >>> >> >>> private void grow(int newLength) { >> >>> - buffer = Arrays.copyOf(buffer, newLength);
-
Re: svn commit: r1141510 - /lucene/dev/trunk/modules/facet/src/java/org/apache/lucene/util/UnsafeByteArrayOutputStream.javaRobert Muir 2011-06-30, 14:44
I think Dawid is correct here... so we should change this back? still
personally when I see array clone() or copyOf() it makes me concerned, I know these are as fast as arraycopy in recent versions, but depending on which variant is used, and whether you use -server, can be slower... in general I just don't want us to have performance regressions on say windows 32bit over this stuff, personally I think arraycopy is a sure fire bet every time, but Ill concede the point that copyOf might not be slower for the primitive versions... I think in jdk7 we will not have this issue as -client sorta goes away in favor of the tiered thing? anyway, I think we should proceed with caution here as far as moving things over to copyOf, I don't see any evidence that its ever faster, but its definitely sometimes slower. On Jun 30, 2011 9:12 AM, "Dawid Weiss" <[EMAIL PROTECTED]> wrote: > Arrays.copyOf(primitive) is actually System.arraycopy by default. If > intrinsics are used it can only get faster. For object types it will > probably be a bit slower for -client because of a runtime check for > the component type. > > Dawid > > On Thu, Jun 30, 2011 at 3:05 PM, Robert Muir <[EMAIL PROTECTED]> wrote: >> because on windows 32bit at least, -client is still the default on >> most jres out there. >> >> i realize people don't care about -client, but i will police >> foo[].clone() / arrays.copyOf etc to prevent problems. >> >> There are comments about this stuff on the relevant bug reports >> (oracle's site is down, sorry) linked to this issue. >> https://issues.apache.org/jira/browse/LUCENE-2674 >> >> Sorry, I don't think we should use foo[].clone() / arrays.copyOf, I >> think we should always use arraycopy. >> >> On Thu, Jun 30, 2011 at 8:55 AM, Simon Willnauer >> <[EMAIL PROTECTED]> wrote: >>> hmm are you concerned about the extra Math.min that happens in the >>> copyOf method? >>> I don't how that relates to "intrinsic" and java 1.7 >>> >>> I don't have strong feelings here just checking if you mix something >>> up in the comment you put there... I am happy to keep the old and now >>> current code >>> >>> simon >>> >>> On Thu, Jun 30, 2011 at 2:42 PM, <[EMAIL PROTECTED]> wrote: >>>> Author: rmuir >>>> Date: Thu Jun 30 12:42:17 2011 >>>> New Revision: 1141510 >>>> >>>> URL: http://svn.apache.org/viewvc?rev=1141510&view=rev >>>> Log: >>>> LUCENE-3239: remove use of slow Arrays.copyOf >>>> >>>> Modified: >>>> lucene/dev/trunk/modules/facet/src/java/org/apache/lucene/util/UnsafeByteArrayOutputStream.java >>>> >>>> Modified: lucene/dev/trunk/modules/facet/src/java/org/apache/lucene/util/UnsafeByteArrayOutputStream.java >>>> URL: http://svn.apache.org/viewvc/lucene/dev/trunk/modules/facet/src/java/org/apache/lucene/util/UnsafeByteArrayOutputStream.java?rev=1141510&r1=1141509&r2=1141510&view=diff >>>> =============================================================================>>>> --- lucene/dev/trunk/modules/facet/src/java/org/apache/lucene/util/UnsafeByteArrayOutputStream.java (original) >>>> +++ lucene/dev/trunk/modules/facet/src/java/org/apache/lucene/util/UnsafeByteArrayOutputStream.java Thu Jun 30 12:42:17 2011 >>>> @@ -2,7 +2,6 @@ package org.apache.lucene.util; >>>> >>>> import java.io.IOException; >>>> import java.io.OutputStream; >>>> -import java.util.Arrays; >>>> >>>> /** >>>> * Licensed to the Apache Software Foundation (ASF) under one or more >>>> @@ -72,7 +71,11 @@ public class UnsafeByteArrayOutputStream >>>> } >>>> >>>> private void grow(int newLength) { >>>> - buffer = Arrays.copyOf(buffer, newLength); >>>> + // It actually should be: (Java 1.7, when its intrinsic on all machines) >>>> + // buffer = Arrays.copyOf(buffer, newLength); >>>> + byte[] newBuffer = new byte[newLength]; >>>> + System.arraycopy(buffer, 0, newBuffer, 0, buffer.length); >>>> + buffer = newBuffer; >>>> } >>>> >>>> /** >>>> >>>> >>>> >>> >>> --------------------------------------------------------------------- >>> To unsubscribe, e-mail: [EMAIL PROTECTED]
-
Re: svn commit: r1141510 - /lucene/dev/trunk/modules/facet/src/java/org/apache/lucene/util/UnsafeByteArrayOutputStream.javaSimon Willnauer 2011-06-30, 18:31
On Thu, Jun 30, 2011 at 4:44 PM, Robert Muir <[EMAIL PROTECTED]> wrote:
> I think Dawid is correct here... so we should change this back? still > personally when I see array clone() or copyOf() it makes me concerned, I > know these are as fast as arraycopy in recent versions, but depending on > which variant is used, and whether you use -server, can be slower... in > general I just don't want us to have performance regressions on say windows > 32bit over this stuff, personally I think arraycopy is a sure fire bet every > time, but Ill concede the point that copyOf might not be slower for the > primitive versions... I think in jdk7 we will not have this issue as -client > sorta goes away in favor of the tiered thing? anyway, I think we should > proceed with caution here as far as moving things over to copyOf, I don't > see any evidence that its ever faster, but its definitely sometimes slower. I don't seen any evidence that this is any slower though. simon > > On Jun 30, 2011 9:12 AM, "Dawid Weiss" <[EMAIL PROTECTED]> wrote: >> Arrays.copyOf(primitive) is actually System.arraycopy by default. If >> intrinsics are used it can only get faster. For object types it will >> probably be a bit slower for -client because of a runtime check for >> the component type. >> >> Dawid >> >> On Thu, Jun 30, 2011 at 3:05 PM, Robert Muir <[EMAIL PROTECTED]> wrote: >>> because on windows 32bit at least, -client is still the default on >>> most jres out there. >>> >>> i realize people don't care about -client, but i will police >>> foo[].clone() / arrays.copyOf etc to prevent problems. >>> >>> There are comments about this stuff on the relevant bug reports >>> (oracle's site is down, sorry) linked to this issue. >>> https://issues.apache.org/jira/browse/LUCENE-2674 >>> >>> Sorry, I don't think we should use foo[].clone() / arrays.copyOf, I >>> think we should always use arraycopy. >>> >>> On Thu, Jun 30, 2011 at 8:55 AM, Simon Willnauer >>> <[EMAIL PROTECTED]> wrote: >>>> hmm are you concerned about the extra Math.min that happens in the >>>> copyOf method? >>>> I don't how that relates to "intrinsic" and java 1.7 >>>> >>>> I don't have strong feelings here just checking if you mix something >>>> up in the comment you put there... I am happy to keep the old and now >>>> current code >>>> >>>> simon >>>> >>>> On Thu, Jun 30, 2011 at 2:42 PM, <[EMAIL PROTECTED]> wrote: >>>>> Author: rmuir >>>>> Date: Thu Jun 30 12:42:17 2011 >>>>> New Revision: 1141510 >>>>> >>>>> URL: http://svn.apache.org/viewvc?rev=1141510&view=rev >>>>> Log: >>>>> LUCENE-3239: remove use of slow Arrays.copyOf >>>>> >>>>> Modified: >>>>> >>>>> lucene/dev/trunk/modules/facet/src/java/org/apache/lucene/util/UnsafeByteArrayOutputStream.java >>>>> >>>>> Modified: >>>>> lucene/dev/trunk/modules/facet/src/java/org/apache/lucene/util/UnsafeByteArrayOutputStream.java >>>>> URL: >>>>> http://svn.apache.org/viewvc/lucene/dev/trunk/modules/facet/src/java/org/apache/lucene/util/UnsafeByteArrayOutputStream.java?rev=1141510&r1=1141509&r2=1141510&view=diff >>>>> >>>>> =============================================================================>>>>> --- >>>>> lucene/dev/trunk/modules/facet/src/java/org/apache/lucene/util/UnsafeByteArrayOutputStream.java >>>>> (original) >>>>> +++ >>>>> lucene/dev/trunk/modules/facet/src/java/org/apache/lucene/util/UnsafeByteArrayOutputStream.java >>>>> Thu Jun 30 12:42:17 2011 >>>>> @@ -2,7 +2,6 @@ package org.apache.lucene.util; >>>>> >>>>> import java.io.IOException; >>>>> import java.io.OutputStream; >>>>> -import java.util.Arrays; >>>>> >>>>> /** >>>>> * Licensed to the Apache Software Foundation (ASF) under one or more >>>>> @@ -72,7 +71,11 @@ public class UnsafeByteArrayOutputStream >>>>> } >>>>> >>>>> private void grow(int newLength) { >>>>> - buffer = Arrays.copyOf(buffer, newLength); >>>>> + // It actually should be: (Java 1.7, when its intrinsic on all >>>>> machines) >>>>> + // buffer = Arrays.copyOf(buffer, newLength);
-
Re: svn commit: r1141510 - /lucene/dev/trunk/modules/facet/src/java/org/apache/lucene/util/UnsafeByteArrayOutputStream.javaDawid Weiss 2011-06-30, 18:50
> I don't seen any evidence that this is any slower though.
You need to run with -client (if the machine is a beast this is tricky because x64 will pick -server regardless of the command-line setting) and you need to be copying generic arrays. I think this can be shown -- a caliper benchmark would be perfect to demonstrate this in isolation; I may write one if I find a spare moment. Dawid ---------------------------------------------------------------------
-
Re: svn commit: r1141510 - /lucene/dev/trunk/modules/facet/src/java/org/apache/lucene/util/UnsafeByteArrayOutputStream.javaMichael McCandless 2011-06-30, 18:57
I think it's important Lucene keeps good performance on "ordinary"
machines/envs. It's really quite dangerous that the active Lucene devs all use beasts for development/testing. We draw false conclusions. So we really should be testing with -client and if indeed generified Arrays.copyOf (and anything else) is risky in such envs we should not use it when System.arraycopy works more consistently. Mike McCandless http://blog.mikemccandless.com On Thu, Jun 30, 2011 at 2:50 PM, Dawid Weiss <[EMAIL PROTECTED]> wrote: >> I don't seen any evidence that this is any slower though. > > You need to run with -client (if the machine is a beast this is tricky > because x64 will pick -server regardless of the command-line setting) > and you need to be copying generic arrays. I think this can be shown > -- a caliper benchmark would be perfect to demonstrate this in > isolation; I may write one if I find a spare moment. > > Dawid > > --------------------------------------------------------------------- > To unsubscribe, e-mail: [EMAIL PROTECTED] > For additional commands, e-mail: [EMAIL PROTECTED] > > ---------------------------------------------------------------------
-
Re: svn commit: r1141510 - /lucene/dev/trunk/modules/facet/src/java/org/apache/lucene/util/UnsafeByteArrayOutputStream.javaDawid Weiss 2011-06-30, 19:09
> I think it's important Lucene keeps good performance on "ordinary"
> machines/envs. Not that this voice will help in anything, but I think the above is virtually impossible to achieve unless you have a bunch of machines, OSs and VMs to continually test on and a consistent set of benchmarks plotted over time... and of course check every single commit for regression over all these combinations. And even then you'd always find a case of something being faster or slower on some combination of hardware/ software; optimizing for these differences makes little sense to me (people struggling with performance on some weird software/hardware combination can always change the VM vendor or a VM switch). Sorry for being so pessimistically unconstructive... :( Dawid ---------------------------------------------------------------------
-
Re: svn commit: r1141510 - /lucene/dev/trunk/modules/facet/src/java/org/apache/lucene/util/UnsafeByteArrayOutputStream.javaMichael McCandless 2011-06-30, 19:25
Fair enough, and I agree.
Though the least we could do is rotate in a Windows env, where Java runs with -client, to our Jenkins. But simple-to-follow rules like "Don't use Arrays.copyOf; use System.arraycopy instead" (if indeed System.arraycopy seems to generally not be slower) seem like a no-brainer. Why risk Arrays.copyOf, anytime? Shouldn't we never use it...? Mike McCandless http://blog.mikemccandless.com On Thu, Jun 30, 2011 at 3:09 PM, Dawid Weiss <[EMAIL PROTECTED]> wrote: >> I think it's important Lucene keeps good performance on "ordinary" >> machines/envs. > > Not that this voice will help in anything, but I think the above is > virtually impossible to achieve unless you have a bunch of machines, > OSs and VMs to continually test on and a consistent set of benchmarks > plotted over time... and of course check every single commit for > regression over all these combinations. And even then you'd always > find a case of something being faster or slower on some combination of > hardware/ software; optimizing for these differences makes little > sense to me (people struggling with performance on some weird > software/hardware combination can always change the VM vendor or a VM > switch). > > Sorry for being so pessimistically unconstructive... :( > > Dawid > > --------------------------------------------------------------------- > To unsubscribe, e-mail: [EMAIL PROTECTED] > For additional commands, e-mail: [EMAIL PROTECTED] > > ---------------------------------------------------------------------
-
Re: svn commit: r1141510 - /lucene/dev/trunk/modules/facet/src/java/org/apache/lucene/util/UnsafeByteArrayOutputStream.javaSimon Willnauer 2011-06-30, 20:45
On Thu, Jun 30, 2011 at 8:50 PM, Dawid Weiss
<[EMAIL PROTECTED]> wrote: >> I don't seen any evidence that this is any slower though. > > You need to run with -client (if the machine is a beast this is tricky > because x64 will pick -server regardless of the command-line setting) > and you need to be copying generic arrays. I think this can be shown > -- a caliper benchmark would be perfect to demonstrate this in > isolation; I may write one if I find a spare moment. this is what I want to see. I don't want to discuss based on some bug reported for a non-primitive version of copyOf thats all. its pointless to discuss if there is no evidence which I don't see. I am happy with arraycopy I would just have appreciated a discussion before backing the change out. simon > > Dawid > ---------------------------------------------------------------------
-
Re: svn commit: r1141510 - /lucene/dev/trunk/modules/facet/src/java/org/apache/lucene/util/UnsafeByteArrayOutputStream.javaMichael McCandless 2011-06-30, 22:19
On Thu, Jun 30, 2011 at 4:45 PM, Simon Willnauer
<[EMAIL PROTECTED]> wrote: > On Thu, Jun 30, 2011 at 8:50 PM, Dawid Weiss > <[EMAIL PROTECTED]> wrote: >>> I don't seen any evidence that this is any slower though. >> >> You need to run with -client (if the machine is a beast this is tricky >> because x64 will pick -server regardless of the command-line setting) >> and you need to be copying generic arrays. I think this can be shown >> -- a caliper benchmark would be perfect to demonstrate this in >> isolation; I may write one if I find a spare moment. > > this is what I want to see. I don't want to discuss based on some bug > reported for a non-primitive version of copyOf thats all. > its pointless to discuss if there is no evidence which I don't see. I > am happy with arraycopy I would just have appreciated a discussion > before backing the change out. I think the burden of proof here is on Arrays.copyOf. Ie, until we can prove (through benchmarking in different envs) that it can be trusted, we should just stick with System.arraycopy to reduce the risk. Mike ---------------------------------------------------------------------
-
Re: svn commit: r1141510 - /lucene/dev/trunk/modules/facet/src/java/org/apache/lucene/util/UnsafeByteArrayOutputStream.javaSimon Willnauer 2011-07-01, 05:47
On Fri, Jul 1, 2011 at 12:19 AM, Michael McCandless
<[EMAIL PROTECTED]> wrote: > On Thu, Jun 30, 2011 at 4:45 PM, Simon Willnauer > <[EMAIL PROTECTED]> wrote: >> On Thu, Jun 30, 2011 at 8:50 PM, Dawid Weiss >> <[EMAIL PROTECTED]> wrote: >>>> I don't seen any evidence that this is any slower though. >>> >>> You need to run with -client (if the machine is a beast this is tricky >>> because x64 will pick -server regardless of the command-line setting) >>> and you need to be copying generic arrays. I think this can be shown >>> -- a caliper benchmark would be perfect to demonstrate this in >>> isolation; I may write one if I find a spare moment. >> >> this is what I want to see. I don't want to discuss based on some bug >> reported for a non-primitive version of copyOf thats all. >> its pointless to discuss if there is no evidence which I don't see. I >> am happy with arraycopy I would just have appreciated a discussion >> before backing the change out. > > I think the burden of proof here is on Arrays.copyOf. > > Ie, until we can prove (through benchmarking in different envs) that > it can be trusted, we should just stick with System.arraycopy to > reduce the risk. Mike I think we should do that but the real issue here is what if somebody comes up with any arbitrary method in the future claiming its slow we back out and use the "we think safe way" what if it is actually the other way around and copyOf is optimized by new VMs and the copyarray is slightly slower. If robert would not have reverted this commit we would have not discussed that her at all. I just want to prevent the "we should not do this" it might be a problem in the big picture while the microbenchmark doesn't show a difference. At some point we have to rely on the JVM. Even if we benchmark on all OS with various JVMs we can't prevent jvm bug based perf hits. While there was no bug reported for primitives here we don't have to be afraid of it either. I don't think its saver to use arraycopy at all its just a level of indirection safer but makes development more of a pain IMO. Just my $0.05 > > Mike > ---------------------------------------------------------------------
-
RE: svn commit: r1141510 - /lucene/dev/trunk/modules/facet/src/java/org/apache/lucene/util/UnsafeByteArrayOutputStream.javaUwe Schindler 2011-07-01, 06:33
Hi,
I don't understand the whole discussion here, so please compare these two implementations and tell me which one is faster. Please don't hurt me, if you don't want to see src.jar code from OpenJDK Java6 - just delete this mail if you don’t want to (the code here is licensed under GPL): Arrays.java: /** * Copies the specified array, truncating or padding with zeros (if necessary) * so the copy has the specified length. For all indices that are * valid in both the original array and the copy, the two arrays will * contain identical values. For any indices that are valid in the * copy but not the original, the copy will contain <tt>(byte)0</tt>. * Such indices will exist if and only if the specified length * is greater than that of the original array. * * @param original the array to be copied * @param newLength the length of the copy to be returned * @return a copy of the original array, truncated or padded with zeros * to obtain the specified length * @throws NegativeArraySizeException if <tt>newLength</tt> is negative * @throws NullPointerException if <tt>original</tt> is null * @since 1.6 */ public static byte[] copyOf(byte[] original, int newLength) { byte[] copy = new byte[newLength]; System.arraycopy(original, 0, copy, 0, Math.min(original.length, newLength)); return copy; } This is our implementation, simon replaced and Robert reverted (UnsafeByteArrayOutputStream): private void grow(int newLength) { // It actually should be: (Java 1.7, when its intrinsic on all machines) // buffer = Arrays.copyOf(buffer, newLength); byte[] newBuffer = new byte[newLength]; System.arraycopy(buffer, 0, newBuffer, 0, buffer.length); buffer = newBuffer; } So please look at the code, where is a difference that could slow down, except the Math.min() call which is an intrinsic in almost every JDK on earth? The problem we are talking about here is only about the generic Object[] copyOf method and also affects e.g. *all* Collection.toArray() methods - they all use this code, so whenever you use ArrayList.toArray() or similar, the slow code is executed. This is why we replaced Collections.sort() by CollectionUtil.sort, that does no array copy. Simon & me were not willing to replace the reallocations in FST code (Mike you remember, we reverted that on your GIT repo when we did perf tests) and other parts in Lucene (there are only few of them). The idea was only to replace primitive type code to make it easier readable. And with later JDK code it could even get faster (not slower), if Oracle starts to add intrinsics for those new methods (and that’s Dawid and mine reason to change to copyTo for primitive types). In general, if you use Java SDK methods, that are as fast as ours, they always have a chance to get faster in later JDKs. So we should always prefer Java SDK methods, unless they are slower because their default impl is too generic or has too much safety checks or uses reflection. To come back to UnsafeByteArrayOutputStream: I would change the whole code, as I don’t like the allocation strategy in it (it's exponential, on every grow it doubles its size). We should change that to use ArrayUtils.grow() and ArrayUtils.oversize(), to have a similar allocation strategy like in trunk. Then we can discuss about this problem again when Simon & me wants to change ArrayUtils.grow methods to use Arrays.copyOf... *g* [just joking, I will never ask again, because this discussion here is endless and does not bring us forward]. The other thing I don’t like in the new faceting module is duplication of vint code. Why don’t we change it to use DataInput/DataOutput and use Dawid's new In/OutStream wrapper for DataOutput everywhere. This would be much more streamlined with all the code we currently have. Then we can encode the payloads (or later docvalues) using the new UnsafeByteArrayOutputStream, wrapped with a OutputStreamDataOutput wrapper? Or maybe add a ByteArrayDataOutput class. Uwe (getting crazy) Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: [EMAIL PROTECTED]
-
Re: svn commit: r1141510 - /lucene/dev/trunk/modules/facet/src/java/org/apache/lucene/util/UnsafeByteArrayOutputStream.javaMichael McCandless 2011-07-01, 13:04
On Fri, Jul 1, 2011 at 1:47 AM, Simon Willnauer
<[EMAIL PROTECTED]> wrote: > Mike I think we should do that but the real issue here is what if > somebody comes up with any arbitrary method in the future claiming its > slow we back out and use the "we think safe way" what if it is > actually the other way around and copyOf is optimized by new VMs and > the copyarray is slightly slower. I think we take it case by case? We do need to be careful when using low level ops in Java. In this specific case, we have been almost burned already by Arrays.copyOf, and never by System.arraycopy (that I'm aware of), so we should not cutover until we have some confidence that burning will not occur. Other cases will be different, but this one has enough history that the bias seems clear. > I just want > to prevent the "we should not do this" it might be a problem in the > big picture while the microbenchmark doesn't show a difference. At > some point we have to rely on the JVM. I agree, but generally the burden of proof is on the "new one". Just because we can use Java 1.6 code now doesn't mean we should blindly cutover to new stuff. System.arraycopy is tried & true and we've never hit a perf issue with it, in my memory. > Even if we benchmark on all OS with various JVMs we can't prevent jvm > bug based perf hits. Right, nothing is perfect here; this is just about mitigating risk. We were lucky to have tracked down this slowdown down last time, I think only because Robert was using a -client JVM at the time? > While there was no bug reported for primitives > here we don't have to be afraid of it either. I don't think its saver > to use arraycopy at all its just a level of indirection safer but > makes development more of a pain IMO. Yes it's a slight hassle (2 extra lines), but if this mitigates risk, it's worth it. That said, if we can convince ourselves that in the primitives only case, Arrays.copyOf has very little risk, then I'm OK w/ using it for those cases. Mike ---------------------------------------------------------------------
-
Re: svn commit: r1141510 - /lucene/dev/trunk/modules/facet/src/java/org/apache/lucene/util/UnsafeByteArrayOutputStream.javaMichael McCandless 2011-07-01, 13:13
On Fri, Jul 1, 2011 at 2:33 AM, Uwe Schindler <[EMAIL PROTECTED]> wrote:
> Hi, > > I don't understand the whole discussion here, so please compare these two implementations and tell me which one is faster. Please don't hurt me, if you don't want to see src.jar code from OpenJDK Java6 - just delete this mail if you don’t want to (the code here is licensed under GPL): This is the source code for a specific version of one specific Java impl. If we knew all Java impls simply implemented the primitive case using System.arraycopy (admittedly it's hard to imagine that they wouldn't!) then we are fine. > This is our implementation, simon replaced and Robert reverted (UnsafeByteArrayOutputStream): > > private void grow(int newLength) { > // It actually should be: (Java 1.7, when its intrinsic on all machines) > // buffer = Arrays.copyOf(buffer, newLength); > byte[] newBuffer = new byte[newLength]; > System.arraycopy(buffer, 0, newBuffer, 0, buffer.length); > buffer = newBuffer; > } > > So please look at the code, where is a difference that could slow down, except the Math.min() call which is an intrinsic in almost every JDK on earth? Right, in this case (if you used OpenJDK 6) we are obviously OK. Not sure about other cases... > The problem we are talking about here is only about the generic Object[] copyOf method and also affects e.g. *all* Collection.toArray() methods - they all use this code, so whenever you use ArrayList.toArray() or similar, the slow code is executed. This is why we replaced Collections.sort() by CollectionUtil.sort, that does no array copy. Simon & me were not willing to replace the reallocations in FST code (Mike you remember, we reverted that on your GIT repo when we did perf tests) and other parts in Lucene (there are only few of them). The idea was only to replace primitive type code to make it easier readable. And with later JDK code it could even get faster (not slower), if Oracle starts to add intrinsics for those new methods (and that’s Dawid and mine reason to change to copyTo for primitive types). In general, if you use Java SDK methods, that are as fast as ours, they always have a chance to get faster in later JDKs. So we should always prefer Java SDK methods, unless they are slower because their default impl is too generic or has too much safety checks or uses reflection. OK I'm convinced (I think!) that for primitive types only, let's use Arrays.copyOf! > To come back to UnsafeByteArrayOutputStream: > > I would change the whole code, as I don’t like the allocation strategy in it (it's exponential, on every grow it doubles its size). We should change that to use ArrayUtils.grow() and ArrayUtils.oversize(), to have a similar allocation strategy like in trunk. Then we can discuss about this problem again when Simon & me wants to change ArrayUtils.grow methods to use Arrays.copyOf... *g* [just joking, I will never ask again, because this discussion here is endless and does not bring us forward]. Well, it sounds like for primitive types, we can cutover ArrayUtils.grow methods. Then we can look @ the nightly bench the next day ;) But I agree we should fix UnsafeByteArrayOutputStream... or, isn't it (almost) a dup of ByteArrayDataOutput? > The other thing I don’t like in the new faceting module is duplication of vint code. Why don’t we change it to use DataInput/DataOutput and use Dawid's new In/OutStream wrapper for DataOutput everywhere. This would be much more streamlined with all the code we currently have. Then we can encode the payloads (or later docvalues) using the new UnsafeByteArrayOutputStream, wrapped with a OutputStreamDataOutput wrapper? Or maybe add a ByteArrayDataOutput class. That sounds good! Uwe can you commit TODOs to the code w/ these ideas? Mike ---------------------------------------------------------------------
-
Re: svn commit: r1141510 - /lucene/dev/trunk/modules/facet/src/java/org/apache/lucene/util/UnsafeByteArrayOutputStream.javaShai Erera 2011-07-01, 16:07
About the encoders package - there are several encoders there besides
VInt, so I wouldn't dispose of it so quickly. That said, I think we should definitely explore consolidating VInt with the core classes, and maybe write an encoder which delegate to them. Or, come up w/ a different approach for allowing to plug in different Encoders. I don't rule out anything, as long as we preserve functionality and capabilities. Shai On Friday, July 1, 2011, Michael McCandless <[EMAIL PROTECTED]> wrote: > On Fri, Jul 1, 2011 at 2:33 AM, Uwe Schindler <[EMAIL PROTECTED]> wrote: >> Hi, >> >> I don't understand the whole discussion here, so please compare these two implementations and tell me which one is faster. Please don't hurt me, if you don't want to see src.jar code from OpenJDK Java6 - just delete this mail if you don’t want to (the code here is licensed under GPL): > > This is the source code for a specific version of one specific Java > impl. If we knew all Java impls simply implemented the primitive case > using System.arraycopy (admittedly it's hard to imagine that they > wouldn't!) then we are fine. > >> This is our implementation, simon replaced and Robert reverted (UnsafeByteArrayOutputStream): >> >> private void grow(int newLength) { >> // It actually should be: (Java 1.7, when its intrinsic on all machines) >> // buffer = Arrays.copyOf(buffer, newLength); >> byte[] newBuffer = new byte[newLength]; >> System.arraycopy(buffer, 0, newBuffer, 0, buffer.length); >> buffer = newBuffer; >> } >> >> So please look at the code, where is a difference that could slow down, except the Math.min() call which is an intrinsic in almost every JDK on earth? > > Right, in this case (if you used OpenJDK 6) we are obviously OK. Not > sure about other cases... > >> The problem we are talking about here is only about the generic Object[] copyOf method and also affects e.g. *all* Collection.toArray() methods - they all use this code, so whenever you use ArrayList.toArray() or similar, the slow code is executed. This is why we replaced Collections.sort() by CollectionUtil.sort, that does no array copy. Simon & me were not willing to replace the reallocations in FST code (Mike you remember, we reverted that on your GIT repo when we did perf tests) and other parts in Lucene (there are only few of them). The idea was only to replace primitive type code to make it easier readable. And with later JDK code it could even get faster (not slower), if Oracle starts to add intrinsics for those new methods (and that’s Dawid and mine reason to change to copyTo for primitive types). In general, if you use Java SDK methods, that are as fast as ours, they always have a chance to get faster in later JDKs. So we should always prefer Java SDK methods, unless they are slower because their default impl is too generic or has too much safety checks or uses reflection. > > OK I'm convinced (I think!) that for primitive types only, let's use > Arrays.copyOf! > >> To come back to UnsafeByteArrayOutputStream: >> >> I would change the whole code, as I don’t like the allocation strategy in it (it's exponential, on every grow it doubles its size). We should change that to use ArrayUtils.grow() and ArrayUtils.oversize(), to have a similar allocation strategy like in trunk. Then we can discuss about this problem again when Simon & me wants to change ArrayUtils.grow methods to use Arrays.copyOf... *g* [just joking, I will never ask again, because this discussion here is endless and does not bring us forward]. > > Well, it sounds like for primitive types, we can cutover > ArrayUtils.grow methods. Then we can look @ the nightly bench the > next day ;) > > But I agree we should fix UnsafeByteArrayOutputStream... or, isn't it > (almost) a dup of ByteArrayDataOutput? > >> The other thing I don’t like in the new faceting module is duplication of vint code. Why don’t we change it to use DataInput/DataOutput and use Dawid's new In/OutStream wrapper for DataOutput everywhere. This would be much more streamlined with all the code we currently have. Then we can encode the payloads (or later docvalues) using the new UnsafeByteArrayOutputStream, wrapped with a OutputStreamDataOutput wrapper? Or maybe add a ByteArrayDataOutput class. |