Nested CachedSqlEntityProcessor running for each entity row with Solr 3.6?
not interesting 2012-05-07, 14:42

I just upgraded from Solr 3.4 to Solr 3.6; I'm using the same data-import.xml for both versions. The import functioned properly with 3.4.

I'm using a nested entity to fetch authors associated with each document, and I'm using CachedSqlEntityProcessor to avoid hitting the DB an unreasonable number of times. However, when indexing, Solr indexes very slowly and appears to be fetching all authors in the DB for each document. The index should be ~500 megs; I aborted the indexing when it reached ~6gigs. If I comment out the nested author entity below, Solr will index normally.

Am I missing something obvious or is this a bug?

<document name="documents">
  <entity name="document" dataSource="production"
          transformer="HTMLStripTransformer,TemplateTransformer,RegexTransformer"
          query="select id, ..., from document">
    <field column="id" name="id"/>
    <field column="uid" name="uid" template="DOC${document.id}"/>
    <!-- more fields .. -->
    <entity name="author" dataSource="production"
            query="select
                     cast(da.document_id as text) as document_id,
                     a.id, a.name, a.signature from document_author da
                     left outer join author a on a.id = da.author_id"
            cacheKey="document_id"
            cacheLookup="document.id"
            processor="CachedSqlEntityProcessor">
      <field name="author_id" column="id" />
      <field name="author" column="name" />
      <field name="author_signature" column="signature" />
    </entity>
  </entity>
</document>

Also posted at SO if you prefer to answer there:

http://stackoverflow.com/questions/10482484/nested-cachedsqlentityprocessor-running-for-each-entity-row-with-solr-3-6

Kellen
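
For readers unfamiliar with the processor, the behavior expected from the configuration can be sketched in a few lines of Python. This is only an illustration of the caching idea, not DIH's actual code: the child query runs once, its rows are grouped by the cacheKey column, and each parent row then does an in-memory lookup. The function names, the sample rows, and the str() conversion of the lookup key (mirroring the cast in the child query) are assumptions made for the example.

```python
from collections import defaultdict


def build_cache(child_rows, cache_key):
    """Group the child rows by their cache key in a single pass."""
    cache = defaultdict(list)
    for row in child_rows:
        cache[row[cache_key]].append(row)
    return cache


def import_documents(parent_rows, fetch_child_rows, cache_key, lookup_column):
    """Attach cached child rows to each parent row.

    fetch_child_rows is a hypothetical stand-in for running the child
    entity's SQL query; it should be called once, not once per parent.
    """
    cache = None
    documents = []
    for parent in parent_rows:
        if cache is None:
            # Run the child query once and keep the result in memory.
            cache = build_cache(fetch_child_rows(), cache_key)
        # Keys are compared as text, mirroring the cast in the child query.
        authors = cache.get(str(parent[lookup_column]), [])
        documents.append({
            "id": parent["id"],
            "author": [a["name"] for a in authors],
        })
    return documents


# Hypothetical sample data; the counter records how often the "query" runs.
calls = {"count": 0}


def fetch_authors():
    calls["count"] += 1
    return [
        {"document_id": "1", "id": 10, "name": "Ann"},
        {"document_id": "1", "id": 11, "name": "Bob"},
        {"document_id": "2", "id": 10, "name": "Ann"},
    ]


docs = import_documents(
    [{"id": 1}, {"id": 2}, {"id": 3}],
    fetch_authors,
    cache_key="document_id",
    lookup_column="id",
)
print(docs)
print("child query runs:", calls["count"])
```

With a working cache, the child query count stays at one no matter how many documents are imported. The symptom described in the post is what one would expect to see if that query were instead re-run for every parent row.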
-
RE: Nested CachedSqlEntityProcessor running for each entity row with Solr 3.6?
Dyer, James 2012-05-07, 18:46

Dear Kellen, Brent & Keith,

There are now fixes available for two cache-related bugs that unfortunately made their way into the 3.6.0 release. They were addressed in these two JIRA issues, both of which have been committed to the 3.6 branch (as of today):

- https://issues.apache.org/jira/browse/SOLR-3430
- https://issues.apache.org/jira/browse/SOLR-3360

These problems also affected Trunk/4.x, with both fixes committed to Trunk under SOLR-3430. Should Solr 3.6.1 be released, the fixes will become generally available at that time. They will also be part of the 4.0 release, which the Development Community hopes will be later this year.

In the meantime, I am hoping each of you can test these fixes with your installation. The best way to do this is to get a fresh SVN checkout of the 3.6.1 branch (http://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_3_6/), switch to the "solr" directory, then run "ant dist". I believe you need Ant 1.8 to build.

If you are unable to build it yourself, I put an *unofficial* snapshot of the DIH jar here:

http://people.apache.org/~jdyer/unofficial/apache-solr-dataimporthandler-3.6.1-SNAPSHOT-r1335176.jar

Please let me know if this solves your problems with DIH caching, giving you the functionality you had with 3.5 and prior. Your feedback is greatly appreciated.

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311
-
Re: Nested CachedSqlEntityProcessor running for each entity row with Solr 3.6?
not interesting 2012-05-08, 07:39

> In the meantime, I am hoping each of you can test these fixes with your installation. The best way to do this is to get a fresh SVN checkout of the 3.6.1 branch (http://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_3_6/), switch to the "solr" directory, then run "ant dist". I believe you need Ant 1.8 to build.
>
> If you are unable to build yourself, I put an *unofficial* snapshot of the DIH jar here:
> http://people.apache.org/~jdyer/unofficial/apache-solr-dataimporthandler-3.6.1-SNAPSHOT-r1335176.jar

I understood your suggestion to be that I should use the 3.6.1 dataimporthandler jars with my 3.6.0 installation. If that was correct, then this has not solved my issue. I have tried both the unofficial snapshot and my own built-from-source version of the jars. The behavior of DIH is the same: it fetches far more rows than it should, the index grows to a very large size, and indexing is very slow (10 minutes, 100,000,000 rows fetched, only 1,500 documents processed).

Kellen
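
As a rough check on those figures, the rows fetched per processed document can be worked out directly. The snippet below is only arithmetic on the numbers quoted in the message; the function name is made up for the example, and the size of the author join is not stated anywhere in the thread.

```python
def rows_per_document(rows_fetched, documents_processed):
    """Average number of rows fetched for each processed document."""
    return rows_fetched / documents_processed


# Figures reported in the message above.
ratio = rows_per_document(100_000_000, 1_500)
print(f"{ratio:,.0f} rows fetched per document")
```

That comes to roughly 66,667 rows per document. If the author join returns a row count of that order (an assumption, since the table sizes are not given), the figure is consistent with the whole child query being re-run for each document instead of being served from the cache.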
-
RE: Nested CachedSqlEntityProcessor running for each entity row with Solr 3.6?
Dyer, James 2012-05-08, 14:17

Kellen,

I appreciate your trying this out. Is there any way you can provide your data-config.xml file? I'd really like to get to the bottom of this. Thanks.

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311
-
RE: Nested CachedSqlEntityProcessor running for each entity row with Solr 3.6?
Brent Mills 2012-05-10, 18:31

Hi James,

I just pulled down the newest nightly build of 4.0, and it solves an issue I had been having with Solr ignoring the caching of the child entities. It was basically opening a new connection for each iteration even though everything was specified correctly. This was present in my previous build of 4.0, so it looks like you fixed it with one of those patches. Thanks for all your work on the DIH; the caching improvements are a big help with some of the things we will be rolling out in production soon.

-Brent
-
Re: Nested CachedSqlEntityProcessor running for each entity row with Solr 3.6?
Mikhail Khludnev 2012-05-07, 15:00

Hi,

It sounds like https://issues.apache.org/jira/browse/SOLR-3360. The fix is committed; tests are ongoing.

--
Sincerely yours
Mikhail Khludnev
Tech Lead
Grid Dynamics
<http://www.griddynamics.com>
<[EMAIL PROTECTED]>
-
Re: Nested CachedSqlEntityProcessor running for each entity row with Solr 3.6?
not interesting 2012-05-07, 15:15

> it sounds like
> https://issues.apache.org/jira/browse/SOLR-3360
> fix is committed, tests are on going.

Hmm, I'm running Solr behind Tomcat; where can I configure Solr to use only a single thread for testing?

Kellen
-
Re: Nested CachedSqlEntityProcessor running for each entity row with Solr 3.6?
Mikhail Khludnev 2012-05-07, 15:19

Your dataconfig.xml is already single-threaded. The bug is in the DIH 3.6.0 code. There should be a link to the fixed jar in the comments.

--
Sincerely yours
Mikhail Khludnev
Tech Lead
Grid Dynamics
<http://www.griddynamics.com>
<[EMAIL PROTECTED]>