Bram Rongen 2012-04-18, 12:17
Dear fellow Solr users,

I've been using Solr for a very short time now and I'm stuck. I'm trying to index a Drupal website consisting of 1.2 million smaller nodes and 300k larger nodes (~400KB avg).

I'm using Solr 3.5 on a dedicated Ubuntu 10.04 box with 3TB of disk space and 16GB of memory. I've tried using the Sun JRE and OpenJDK, both resulting in the same problem. Indexing works great until my .fdt file reaches a size of 4.9GB (5217987319 bytes). At this point, when Solr starts merging, it just keeps on merging, starting over and over. Java is using all the available memory even though Xmx is set to 8G. When I restart Solr everything looks fine until merging is triggered. Whenever it hangs the server load averages 3, searching is possible but slow, and the Solr admin interface is reachable, but sending new documents leads to a time-out.

I've tried several different settings for MergePolicy and started reindexing a couple of times, but the behavior stays the same. My current solrconfig.xml can be found here: http://pastebin.com/NXDT0B8f. I'm unable to find errors in the log, which makes it really difficult to debug. Could anyone point me in the right direction?

I've already asked my question on Stack Overflow without receiving a solution: http://stackoverflow.com/questions/9993633/apache-solr-3-5-hangs-when-indexing. Maybe it can provide you with some more information.

Kind regards!
Bram Rongen
Re: Solr file size limit?
Shawn Heisey 2012-04-18, 20:37
On 4/18/2012 6:17 AM, Bram Rongen wrote:
> I'm using Solr 3.5 on a dedicated Ubuntu 10.04 box with 3TB of diskspace
> and 16GB of memory. I've tried using the sun JRE and OpenJDK, both
> resulting in the same problem. Indexing works great until my .fdt file
> reaches the size of 4.9GB/ 5217987319b. At this point when Solr starts
> merging it just keeps on merging, starting over and over.. Java is using
> all the available memory even though Xmx is set at 8G. When I restart Solr
> everything looks fine until merging is triggered. Whenever it hangs the
> server load averages 3, searching is possible but slow, the solr admin
> interface is reachable but sending new documents leads to a time-out.

Solr 3.5 works a little differently than previous versions (MMAPs all the index files), so if you look at the memory usage as reported by the OS, it's going to look all wrong. I've got my max heap set to 8192M, but this is what top looks like:

    Mem:  64937704k total, 58876376k used,  6061328k free,   379400k buffers
    Swap:  8388600k total,    77844k used,  8310756k free, 47080172k cached

      PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
    22798 ncindex   20   0 75.6g  21g  12g S  1.0 34.3  14312:55 java

If you add up the 47GB it says it's using for the disk cache, the 6GB that it says is free, and the 21GB it says that Java has resident, you end up with considerably more than the 64GB total RAM the machine has, even if you include the 77MB of swap that's used.

You can use the jstat command to get a better idea of how much RAM java really is using:

    jstat -gc -t <pid> 5000

Add up the S0C, S1C, EC, OC, and PC columns. The alignment is often wrong on this output, so you'll have to count the columns. If I do this for my system, I end up with 8462972 KB.

Alternatively, if you have a GUI installed on the server or you have set up remote JMX, you can use JConsole to very easily get a correct number.
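Since the jstat columns are often misaligned, it can be easier to add them up by header name than by eye. The following is a minimal Python sketch and not part of the original advice. It assumes you have captured the header line and one data line from jstat -gc -t; the function name is hypothetical and the sample numbers are invented for illustration.

```python
# Sum the heap capacity columns of one `jstat -gc -t` sample by header name.
# The column names follow the advice above (S0C, S1C, EC, OC, PC). JVMs
# without a permanent generation do not print PC, so the tuple would need
# adjusting there.
HEAP_CAPACITY_COLUMNS = ("S0C", "S1C", "EC", "OC", "PC")


def heap_capacity_kb(header_line: str, data_line: str) -> float:
    """Return the sum of the capacity columns, in KB, for one jstat sample."""
    names = header_line.split()
    values = data_line.split()
    if len(names) != len(values):
        raise ValueError("header and data line have different column counts")
    row = dict(zip(names, values))
    return sum(float(row[name]) for name in HEAP_CAPACITY_COLUMNS)


if __name__ == "__main__":
    # Invented sample values, for illustration only.
    header = "Timestamp S0C S1C S0U S1U EC EU OC OU PC PU YGC YGCT FGC FGCT GCT"
    sample = ("1234.5 1024.0 1024.0 0.0 512.0 8192.0 4096.0 20480.0 "
              "10240.0 4096.0 2048.0 10 0.5 2 1.0 1.5")
    print(heap_capacity_kb(header, sample), "KB")
```

Matching values to header names avoids counting columns by hand, which is the step that misaligned output makes error-prone.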
The extra memory reported by the OS is not really being used; it is a side effect of the memory mapping used by the Lucene indexes.

> I've tried using several different settings for MergePolicy and started
> reindexing a couple of times but the behavior stays the same. My current
> solrconf.xml can be found here: http://pastebin.com/NXDT0B8f. I'm unable to
> find errors in the log which makes it really difficult to debug.. Could
> anyone point me in the right direction?

A MergeFactor of 4 is extremely low and will result in very frequent merging. The default is 10. I use a value of 36, but that is unusually high.

Looking at one of my indexes on that machine, the largest fdt file is 7657412 KB, the other three are tiny - 9880, 12160, and 28 KB. That index was recently optimized. The total index size is over 20GB. I have three indexes that size running in different cores on that machine. You're definitely not running into any limits as far as Solr is concerned.

You might be running into I/O issues. Are you relying on autocommit, or explicitly committing your updates and waiting for the commit to finish before doing more updates? When there is segment merging, commits can take a really long time. If you are using autocommit or not waiting for manual commits to finish, it might get bad enough that one commit has not yet finished when another is ready to take place. I don't know what this would actually do, but it would not be a good situation.

How have you created your 3TB of disk space? If you are using RAID5 or RAID6, you can run into very serious and unavoidable performance problems with writes. If it is a single disk, it may not provide enough IOPS for good performance. My servers also have 3TB of disk space, using six 1TB SATA drives in RAID10.

The worst-case scenario for your merges is equivalent to an optimize.
An optimize of one of my 20GB indexes takes 15 minutes even on RAID10, so I only optimize one large index once a day, which means each large index gets optimized every six days.

I hope this helps, but I'll be happy to try and offer more, within my skill set.

Thanks,
Shawn
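To illustrate why a low merge factor leads to frequent merging, here is a toy Python simulation. It is a deliberately simplified model (equal-sized segments that are merged whenever mergeFactor of them accumulate at one level) and not the actual Lucene merge policy, which is more involved. The function name and the figure of 1000 flushed segments are assumptions made for illustration.

```python
def count_merges(num_flushes: int, merge_factor: int) -> int:
    """Count merges in a simplified level-based merge model.

    Each flush adds one segment at level 0. Whenever a level holds
    merge_factor segments, they are merged into one segment on the
    next level up, which may in turn trigger another merge.
    """
    if merge_factor < 2:
        raise ValueError("merge_factor must be at least 2")
    segments_per_level = [0]
    merges = 0
    for _ in range(num_flushes):
        segments_per_level[0] += 1
        level = 0
        while segments_per_level[level] == merge_factor:
            segments_per_level[level] = 0
            merges += 1
            level += 1
            if level == len(segments_per_level):
                segments_per_level.append(0)
            segments_per_level[level] += 1
    return merges


if __name__ == "__main__":
    for factor in (4, 10, 36):
        print("mergeFactor", factor, "->", count_merges(1000, factor), "merges")
```

In this model a merge factor of 4 triggers roughly three times as many merges as the default of 10 for the same number of flushed segments, and the higher levels are rewritten more often as well.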
Re: Solr file size limit?
Shawn Heisey 2012-04-18, 20:54
On 4/18/2012 6:17 AM, Bram Rongen wrote:
> I've been using Solr for a very short time now and I'm stuck. I'm trying to
> index a drupal website consisting of 1.2 million smaller nodes and 300k
> larger nodes (~400kb avg)..
A followup to my previous reply: Your ramBufferSizeMB is only 32, the default in the example config. I have seen recommendations indicating that going beyond 128MB is not usually helpful. With such large input documents, that may not apply to you - try setting it to 512 or 1024. That will result in far fewer index segments being created. They will be larger, so merges will be much less frequent but take longer.
Thanks, Shawn
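As a rough back-of-the-envelope illustration of the point about fewer segments, the Python sketch below estimates how many times the RAM buffer would fill for different ramBufferSizeMB values. It makes a crude assumption that is not from the thread: that the in-memory index data is about the same size as the raw input. Real numbers will differ, and commits also flush segments. The function name is hypothetical.

```python
import math


def estimated_flushes(total_input_mb: float, ram_buffer_mb: int) -> int:
    """Estimate segment flushes, assuming buffered data is about input size."""
    if ram_buffer_mb <= 0:
        raise ValueError("ram_buffer_mb must be positive")
    return math.ceil(total_input_mb / ram_buffer_mb)


if __name__ == "__main__":
    # 300k larger nodes at roughly 400KB each, as described in the thread.
    large_nodes_mb = 300_000 * 400 / 1024
    for buffer_mb in (32, 512, 1024):
        flushes = estimated_flushes(large_nodes_mb, buffer_mb)
        print(buffer_mb, "MB buffer ->", flushes, "flushes")
```

Under these assumptions, going from 32 to 1024 cuts the number of flushed segments by a factor of about 32, which is the effect described above: fewer, larger segments and less frequent merges.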
Re: Solr file size limit?
Bram Rongen 2012-04-19, 13:49
Hello Shawn,

Thanks very much for your answer.

Yesterday I started indexing again, but this time on Solr 3.6. Again Solr is failing around the same point, but not exactly (now the largest fdt file is 4.8G). It happens right after the moment I receive memory errors on the Drupal side, which makes me suspect that it has something to do with a huge document. Is that possible? I was indexing 1500 documents at once every minute. Drupal builds them all up in memory before submitting them to Solr. At some point it runs out of memory and I have to switch to 10/20 documents per minute for a while, then I can switch back to 1000 documents per minute.

The disk is a software RAID1 over 2 disks. But I've also run into the same problem on another server. This was a VM server with only 1GB of RAM and 40GB of disk. On this server the merge-repeat happened at an earlier stage.

I've also let Solr continue with merging for about two days before (in an earlier attempt), without submitting new documents. The merging kept repeating.

Somebody suggested it could be because I'm using Jetty. Could that be right?

My schema.xml and solrconfig.xml can be found here:
http://pastebin.com/GeBrB903
http://pastebin.com/Su8q1WAh

Kind regards,
Bram Rongen
Re: Solr file size limit?
Bram Rongen 2012-04-19, 13:56
I've discovered some documents are 100+MB in size. Could this be the problem?
Re: Solr file size limit?
Shawn Heisey 2012-04-19, 15:04
On 4/19/2012 7:49 AM, Bram Rongen wrote:
> Yesterday I've started indexing again but this time on Solr 3.6.. Again
> Solr is failing around the same time, but not exactly (now the largest fdt
> file is 4.8G).. It's right after the moment I receive memory-errors at the
> Drupal side which make me suspicious that it maybe has something to do with
> a huge document.. Is that possible? I was indexing 1500 documents at once
> every minute. Drupal builds them all up in memory before submitting them to
> Solr. At some point it runs out of memory and I have to switch to 10/20
> documents per minute for a while.. then I can switch back to 1000 documents
> per minute.
>
> The disk is a software RAID1 over 2 disks. But I've also run into the same
> problem at another server.. This was a VM-server with only 1GB ram and 40GB
> of disk. With this server the merge-repeat happened at an earlier stage.
>
> I've also let Solr continue with merging for about two days before (in an
> earlier attempt), without submitting new documents. The merging kept
> repeating.
>
> Somebody suggested it could be because I'm using Jetty, could that be right?
I am using Jetty for my Solr installation and it handles very large indexes without a problem. I have created a single index with all my data (nearly 70 million documents, total index size over 100GB). Aside from how long it takes to build and the fact that I don't have enough RAM to cache it for good performance, Solr handled it just fine. For production I use a distributed index on multiple servers.
I don't know why you are seeing a merge that continually restarts; that's truly odd. I've never used Drupal and don't know a lot about it. From my small amount of research just now, I assume that it uses Tika, another tool that I have no experience with. I am guessing, based purely on the "body" field in your schema, that you store the entire text of your documents in Solr, and that they are indexed up to a maximum of 10000 tokens (the default value of maxFieldLength in solrconfig.xml).
A document that's 100MB in size, if the whole thing gets stored, will completely overwhelm a 32MB buffer, and might even be enough to overwhelm a 256MB buffer as well, because it will basically have to build the entire index segment in RAM, with term vectors, indexed data, and stored data for all fields.
With such large documents, you may have to increase the maxFieldLength, or you won't be able to search on the entire document text. Depending on the content of those documents, it may or may not be a problem that only the first 10,000 tokens will get indexed. Large documents tend to be repetitive and there might not be any search value after the introduction and initial words. Your documents may be different, so you'll have to make that decision.
To test whether my current thoughts are right, I recommend that you try with the following settings during the initial full import: ramBufferSizeMB: 1024 (or maybe higher), autoCommit maxTime: 0, autoCommit maxDocs: 0. This will mean that unless the indexing process issues manual commits (either in the middle of indexing or at the end), you will have to do a manual one. Once you have the initial index built and it is only doing updates, you will probably be able to go back to using autoCommit.
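With autoCommit disabled, the indexing client has to send the commit itself. The following is a minimal Python sketch of such a manual commit using Solr's XML update message. The URL (port 8983 and the /solr/update path) is an assumption about the installation, the function names are hypothetical, and the long timeout reflects the remark earlier in the thread that commits can take a long time during merging.

```python
import urllib.request

# Assumed location of the update handler; adjust to the actual installation.
DEFAULT_UPDATE_URL = "http://localhost:8983/solr/update"


def build_commit_request(update_url: str = DEFAULT_UPDATE_URL) -> urllib.request.Request:
    """Build an HTTP POST carrying an XML <commit/> update message."""
    return urllib.request.Request(
        update_url,
        data=b"<commit/>",
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


def commit(update_url: str = DEFAULT_UPDATE_URL, timeout_s: float = 3600.0) -> int:
    """Send the commit, wait for it to finish, and return the HTTP status."""
    request = build_commit_request(update_url)
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return response.status


if __name__ == "__main__":
    print("commit returned HTTP", commit())
```

Waiting for the response before sending more documents avoids the situation where one commit has not finished when the next one starts.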
It's possible that I have no understanding of the real problem here, and my recommendation above may result in no improvement. General recommendations, no matter what the current problem might be:
1) Get a lot more RAM. Ideally you want to have enough free memory to cache your entire index. That may not be possible, but you want to get as close to that goal as you can.

2) If you can, see what you can do to increase your IOPS. Using mirrored high RPM SAS is an easy solution, and might be slightly cheaper than SATA RAID10, which is my solution. SSD is easy and very fast, but expensive and not redundant -- I am currently not aware of any SSD RAID solutions that have OS TRIM support. RAID10 with high RPM SAS would be best, but very expensive. On the extreme high end, you could go with a high performance SAN.
Thanks, Shawn
Re: Solr file size limit?
Lance Norskog 2012-04-20, 10:15
Good point! Do you store the large file in your documents, or just index them?
Do you have a "largest file" limit in your environment? Try this: ulimit -a
What is the "file size"?
Lance Norskog
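The "file size" limit that ulimit -a reports can also be read from inside a process, which is useful when the indexing service runs under a different user or shell than the one you are logged in with. Below is a minimal Python sketch for Unix systems using the standard resource module. The function names are hypothetical. Note that Python reports this limit in bytes, whereas the shell reports it in blocks.

```python
import resource


def describe_limit(value: int) -> str:
    """Render a resource limit value the way ulimit does for 'unlimited'."""
    if value == resource.RLIM_INFINITY:
        return "unlimited"
    return f"{value} bytes"


def file_size_limits() -> tuple:
    """Return the (soft, hard) maximum file size limits of this process."""
    soft, hard = resource.getrlimit(resource.RLIMIT_FSIZE)
    return describe_limit(soft), describe_limit(hard)


if __name__ == "__main__":
    soft, hard = file_size_limits()
    print("max file size: soft =", soft, ", hard =", hard)
```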
Re: Solr file size limit?
Bram Rongen 2012-04-20, 12:03
Yeah, I'm indexing some PDF documents. I've extracted the text through Tika (pre-indexing), and the largest field in my DB is 20MB. That's quite extensive ;) My solution for the moment is to cut this text to the first 500KB, which should be enough for a decent index and search capabilities. Should I increase the buffer size for these sizes as well, or will 32MB suffice?
FYI, output of ulimit -a is

    core file size          (blocks, -c) 0
    data seg size           (kbytes, -d) unlimited
    scheduling priority             (-e) 20
    *file size               (blocks, -f) unlimited*
    pending signals                 (-i) 16382
    max locked memory       (kbytes, -l) 64
    max memory size         (kbytes, -m) unlimited
    open files                      (-n) 1024
    pipe size            (512 bytes, -p) 8
    POSIX message queues     (bytes, -q) 819200
    real-time priority              (-r) 0
    stack size              (kbytes, -s) 8192
    cpu time               (seconds, -t) unlimited
    max user processes              (-u) unlimited
    virtual memory          (kbytes, -v) unlimited
    file locks                      (-x) unlimited

Kind regards!
Bram
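The workaround of cutting the extracted text to its first 500KB can be sketched as follows. This is a minimal Python illustration, not the Drupal code actually used. The function name is hypothetical, and it assumes the text is sent as UTF-8, so the cut is made on bytes without leaving half of a multi-byte character at the end.

```python
def truncate_utf8(text: str, max_bytes: int = 500 * 1024) -> str:
    """Return text cut to at most max_bytes of UTF-8, on a character boundary."""
    encoded = text.encode("utf-8")
    if len(encoded) <= max_bytes:
        return text
    # A cut in the middle of a multi-byte character leaves an incomplete
    # sequence at the end; errors="ignore" drops that partial character.
    return encoded[:max_bytes].decode("utf-8", errors="ignore")


if __name__ == "__main__":
    body = "x" * 600_000
    print(len(truncate_utf8(body).encode("utf-8")), "bytes after truncation")
```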
Re: Solr file size limit?
Bram Rongen 2012-04-20, 12:09
Hmm, reading your reply again I see that Solr only indexes the first 10k tokens from each field, so field length should not be a problem per se. It could be that my documents contain very large and unorganized tokens. Could this trip up Solr?
Re: Solr file size limit?
Bram Rongen 2012-05-10, 20:18
Hi Guys!
I've removed the two largest documents. One of them consisted of a single field of around 4MB of text.
This fixed my issue.
Kind regards,
Bram Rongen
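For anyone hitting the same problem, finding the largest documents before indexing can be sketched as below. This is a minimal Python illustration under stated assumptions: the documents are available as (id, text) pairs, size is measured in UTF-8 bytes, and the function name is hypothetical. It is not how the documents were located in this thread.

```python
import heapq
from typing import Iterable, List, Tuple


def largest_documents(docs: Iterable[Tuple[str, str]], n: int = 2) -> List[Tuple[int, str]]:
    """Return the n largest documents as (size_in_bytes, doc_id), largest first."""
    sized = ((len(text.encode("utf-8")), doc_id) for doc_id, text in docs)
    return heapq.nlargest(n, sized)


if __name__ == "__main__":
    # Made-up documents, for illustration only.
    sample = [("node/1", "x" * 10), ("node/2", "x" * 4000), ("node/3", "x" * 50)]
    for size, doc_id in largest_documents(sample):
        print(doc_id, size, "bytes")
```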