-
How many <doc></doc> in the XML source file before indexing?
Bruno Mannina 2012-05-24, 07:49
Hi All,
Just a little question concerning the max number of
<add> <doc></doc> </add>
that I can write in the XML source file before indexing? Only one, 10, 100, 1000, unlimited...?
I must index 80M docs, so I can't create one XML file per doc.
thanks, Bruno
-
Re: How many <doc></doc> in the XML source file before indexing?
Paul Libbrecht 2012-05-24, 07:51
Bruno,
see solrconfig.xml; it has all sorts of tweaks for this kind of thing.
paul
-
Re: How many <doc></doc> in the XML source file before indexing?
Bruno Mannina 2012-05-24, 07:55
Sorry, I just found: http://wiki.apache.org/solr/UpdateXmlMessages
I will also take a look to find the max number of <doc></doc>.
-
Re: How many <doc></doc> in the XML source file before indexing?
Bruno Mannina 2012-05-24, 08:58
I can't find an answer concerning the max number of <doc></doc>.
Can someone tell me if there is no limit?
-
Re: How many <doc></doc> in the XML source file before indexing?
Michael Kuhlmann 2012-05-24, 09:46
There is no hard limit on the maximum number of documents per update.
It only depends on memory. The smaller each document, and the more memory Solr can acquire, the more documents you can send in one update.
However, I wouldn't pish it too jard anyway. If you can send, say, 100 documents per update, then you won't gain much if you send 200 documents instead, or even 1000. The number of requests doesn't count that much.
And if the update fails for some reason, then the whole request will be ignored. If you had sent 1000 documents in an update, and one of them had a field missing, for example, then it's hard to find out which one.
Greetings,
Michael
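The batching advice above can be sketched in a few lines of Python. This is only an illustration, not something from the thread: the helper names `batched` and `build_add_message`, the batch size of 100, and the `id`/`title` fields are all assumptions made up for the example. It groups documents into moderate batches and serializes each batch as one <add> message, so a failed update only affects a small, easy-to-inspect group.

```python
from itertools import islice
import xml.etree.ElementTree as ET


def batched(docs, size):
    """Yield lists of at most `size` documents from any iterable."""
    it = iter(docs)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def build_add_message(docs):
    """Serialize a list of {field_name: value} dicts as one <add> message."""
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            field = ET.SubElement(doc_el, "field", name=name)
            field.text = str(value)
    return ET.tostring(add, encoding="unicode")


# Hypothetical sample data: 250 tiny documents with made-up field names.
docs = ({"id": i, "title": f"Document {i}"} for i in range(250))
messages = [build_add_message(chunk) for chunk in batched(docs, 100)]
print(len(messages))  # 3 messages: 100 + 100 + 50 documents
```

Because `batched` works on any iterable, the documents never need to be held in memory all at once; only one batch is materialized at a time.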
-
Re: How many <doc></doc> in the XML source file before indexing?
Bruno Mannina 2012-05-24, 10:37
In fact it's not for an update but only for the first indexing.
I mean, I will receive the full database, with around 80M docs, in several XML files (one per country in the world).
From these 80M docs I will generate the right XML format for each doc. (I don't need all fields from the source.)
For my current test (12,000 docs) I generate one file per doc, and there is no problem.
But with 80M docs I can't generate one file per doc.
That is why I asked about the max number of <doc> in an <add> file.
For the first indexing, if a country file fails, no problem: I will check it and regenerate it.
Is it bad to create a file with 5M <doc>?
-
Re: How many <doc></doc> in the XML source file before indexing?
Michael Kuhlmann 2012-05-24, 11:29
Just try it!
Maybe you're lucky, and it works with 80M docs. If each document takes 100 k, then it only needs 8 GB memory for indexing.
However, I doubt it. I've not been too deeply into the UpdateHandler yet, but I think it first needs to parse the complete XML file before it starts to index.
But the worst thing that can happen is an OOM exception. And if you need to split the XML files, then you can split them into smaller chunks as well.
Just a note: in Solr, you're always updating, even in the first indexing. There's no difference between updates and inserts.
Greetings,
Michael
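A quick back-of-the-envelope check of the memory estimate above, as a rough sketch. It assumes decimal units and two possible readings of "100 k" (100 bytes or 100 kilobytes per document); the helper names are made up for the example. Under these assumptions, 80M documents at 100 bytes each come to about 8 GB, while 100 kilobytes each would come to about 8 TB, so the 8 GB figure only holds for very small documents.

```python
DOC_COUNT = 80_000_000  # "80M docs" from the thread


def total_bytes(doc_count, bytes_per_doc):
    """Total raw size if every document had the same size."""
    return doc_count * bytes_per_doc


def human(n_bytes):
    """Format a byte count with decimal (power-of-1000) units."""
    for unit in ("B", "KB", "MB", "GB", "TB"):
        if n_bytes < 1000 or unit == "TB":
            return f"{n_bytes:g} {unit}"
        n_bytes /= 1000


# Assumed per-document sizes; neither value is confirmed by the thread.
for bytes_per_doc in (100, 100_000):
    print(bytes_per_doc, "bytes/doc ->", human(total_bytes(DOC_COUNT, bytes_per_doc)))
```

Either way, the total only matters if the whole input has to sit in memory at the same time.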
-
Re: How many <doc></doc> in the XML source file before indexing?
Yonik Seeley 2012-05-24, 17:10
On Thu, May 24, 2012 at 7:29 AM, Michael Kuhlmann <[EMAIL PROTECTED]> wrote:
> However, I doubt it. I've not been too deeply into the UpdateHandler yet,
> but I think it first needs to parse the complete XML file before it starts
> to index.
Solr's update handlers all stream (XML, JSON, CSV), reading and indexing a document at a time from the input.
-Yonik
http://lucidimagination.com
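To illustrate what "reading a document at a time" means, here is a minimal Python sketch of streaming over an <add> file. It is not Solr's own code (Solr is written in Java); it only shows the same idea with the standard library's `xml.etree.ElementTree.iterparse`. The function name `iter_docs` and the sample data are assumptions for the example.

```python
import io
import xml.etree.ElementTree as ET


def iter_docs(source):
    """Yield one {field_name: text} dict per <doc> without building the full tree first."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "doc":
            yield {f.get("name"): f.text for f in elem.findall("field")}
            elem.clear()  # free the parsed fields of this <doc>


# Hypothetical two-document input; a real file would be opened from disk instead.
sample = io.BytesIO(
    b'<add>'
    b'<doc><field name="id">1</field></doc>'
    b'<doc><field name="id">2</field></doc>'
    b'</add>'
)
ids = [doc["id"] for doc in iter_docs(sample)]
print(ids)
```

With this style of parsing, the size of the input file matters much less than the size of a single document, which is why a streaming handler can accept very large <add> files.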
-
Re: How many <doc></doc> in the XML source file before indexing?
Bruno Mannina 2012-05-24, 11:45
Hmm... OK, I will do the test as soon as I receive the database.
Thanks a lot!
-
Re: How many <doc></doc> in the XML source file before indexing?
Michael Kuhlmann 2012-05-24, 09:47
"pish it too jard" - sounds funny. :)
I meant "push it too hard".