Re: Plans to remove RAMDirectory?
DM Smith, 2011-12-20, 15:08
How about an issue to track this? I'd be glad to do it, but I'm not
really the "reporter" for it.

-- DM

On 12/20/2011 09:51 AM, Shai Erera wrote:
> Thanks for the clarification, Uwe. If the whole idea is a new RAMDirectory
> implementation that is more efficient, then it's OK. I think the ideas you
> describe are interesting.
>
>> Have you tried MMapDir for read access in comparison to RAMDirectory for a
>> larger index
>
> I have, and I support the decision not to use RAMDirectory for such cases.
> BUT, MMapDir is not recommended for use on all platforms / JDKs. Second, it
> cannot be used on e.g. HDFS. So sometimes RAMDirectory is the best you can
> do.
>
> Again, if the whole idea is improving RAMDirectory's implementation, then I
> totally agree with that and it makes sense. My point was that we should not
> lose the ability to load indexes into RAM.
>
> Shai
>
> On Tue, Dec 20, 2011 at 3:36 PM, Uwe Schindler <[EMAIL PROTECTED]> wrote:
>
>> Hi,
>>
>> You misunderstood the whole thing. The idea was to maybe replace
>> RAMDirectory with a "clone" of MMapDirectory that uses large
>> DirectByteBuffers outside the JVM heap. The current RAMDirectory is very
>> limited: the buffer size is hard-coded to 8 KB, and if you have a 50 GB
>> index in a RAMDirectory, your GC simply goes crazy (we investigated this
>> several times for customers; RAMDirectory was in fact several times slower
>> than a simple disk-based MMapDir). Also, the locking in the RAMFile class is
>> horrible: for large indexes you have to switch buffers many times when
>> seeking/reading/etc., which involves heavy locking. In contrast, MMapDir is
>> completely lock-free!
>>
>> Until there is a replacement, we will not remove it, but the current
>> RAMDirectory is not usable for large indexes. That's a limitation, and the
>> design of this class does not support anything else.
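To make Uwe's point about small buffers and buffer switching concrete, here is a minimal sketch. `ChunkedRamFile` and its `Reader` are hypothetical names, not Lucene classes. The file is stored as power-of-two sized `ByteBuffer` chunks (heap or direct), similar in spirit to how MMapDirectory spreads a large file over several mapped buffers. Each reader works on its own `duplicate()` views, so moving from one chunk to the next touches no shared mutable state and needs no lock. The `main` method also shows the arithmetic: 50 GiB of data in 8 KiB buffers means 6,553,600 separate buffers, versus 50 with 1 GiB chunks.

```java
import java.nio.ByteBuffer;

/**
 * Hypothetical sketch (not a Lucene class): an in-memory file stored as
 * power-of-two sized ByteBuffer chunks, on or off the JVM heap.
 * Each Reader works on its own duplicate() views, so no locking is needed.
 */
public final class ChunkedRamFile {
    private final ByteBuffer[] chunks;
    private final int chunkBits;
    private final long chunkMask;
    private final long length;

    /** Number of chunks of size 2^chunkBits needed to hold fileLength bytes. */
    public static long chunkCount(long fileLength, int chunkBits) {
        return (fileLength + (1L << chunkBits) - 1) >>> chunkBits;
    }

    public ChunkedRamFile(byte[] data, int chunkBits, boolean direct) {
        if (chunkBits < 1 || chunkBits > 30) {
            throw new IllegalArgumentException("chunkBits must be in [1, 30]");
        }
        this.chunkBits = chunkBits;
        this.chunkMask = (1L << chunkBits) - 1;
        this.length = data.length;
        long chunkSize = 1L << chunkBits;
        this.chunks = new ByteBuffer[(int) chunkCount(length, chunkBits)];
        for (int i = 0; i < chunks.length; i++) {
            int from = (int) (i * chunkSize);
            int len = (int) Math.min(chunkSize, length - from);
            ByteBuffer b = direct ? ByteBuffer.allocateDirect(len) : ByteBuffer.allocate(len);
            b.put(data, from, len);
            b.flip(); // position = 0, limit = bytes stored in this chunk
            chunks[i] = b;
        }
    }

    public Reader newReader() {
        return new Reader();
    }

    /** Per-reader cursor: private views and position, nothing shared is mutated. */
    public final class Reader {
        private final ByteBuffer[] views = new ByteBuffer[chunks.length];
        private int curIdx;
        private ByteBuffer cur;

        private Reader() {
            for (int i = 0; i < chunks.length; i++) {
                views[i] = chunks[i].duplicate(); // shares content, own position/limit
            }
            seek(0);
        }

        public void seek(long pos) {
            if (pos < 0 || pos > length) {
                throw new IllegalArgumentException("position out of range: " + pos);
            }
            curIdx = (int) (pos >>> chunkBits);    // which chunk
            if (curIdx == views.length) {          // pos == length on a chunk boundary
                cur = ByteBuffer.allocate(0);
                return;
            }
            cur = views[curIdx];
            cur.position((int) (pos & chunkMask)); // offset inside the chunk
        }

        public byte readByte() {
            while (!cur.hasRemaining()) {          // crossed into the next chunk
                if (curIdx + 1 >= views.length) {
                    throw new IndexOutOfBoundsException("read past end of file");
                }
                cur = views[++curIdx];
                cur.position(0);
            }
            return cur.get();
        }
    }

    public static void main(String[] args) {
        // 50 GiB of index data: 8 KiB buffers vs. 1 GiB chunks.
        System.out.println(chunkCount(50L << 30, 13)); // 6553600
        System.out.println(chunkCount(50L << 30, 30)); // 50

        byte[] data = new byte[100];
        for (int i = 0; i < data.length; i++) data[i] = (byte) i;
        ChunkedRamFile file = new ChunkedRamFile(data, 4, true); // 16-byte chunks, off-heap
        Reader r = file.newReader();
        r.seek(14);
        System.out.println(r.readByte() + " " + r.readByte() + " " + r.readByte()); // 14 15 16
    }
}
```

The tiny 16-byte chunks in `main` only exist to force a chunk switch; the design choice Uwe argues for is to make chunks large so that switches are rare, and to keep per-reader state private so switches never need synchronization.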
>>
>> It's currently unfixable, and instead of putting work into fixing it, the
>> time should be spent on a new ByteBuffer-based RAMDir with larger blocks
>> (or blocks that merge), or with IOContext helping to calculate the file
>> size before writing it (e.g. when triggering a merge, you know the
>> approximate size of the file beforehand, so you can allocate a buffer
>> that's better than 8 KB). Also, DirectByteBuffers keep the GC happy, as the
>> RAMDir's data then lives outside the JVM heap.
>>
>> > Also, RAMDirectory is still more efficient than MMapDirectory, if you
>> > want to index (and then search) on a small (sometimes even transient)
>> > amount of data
>>
>> That's not true, as RAMDir spends more time switching buffers than reading
>> the data. The problem is that MMapDir does not support *writing*, and
>> that's why we plan to improve this. Have you tried MMapDir for read access
>> in comparison to RAMDirectory for a larger index? It outperforms it
>> several times over (depending on the OS and on whether the file data is
>> already in the FS cache). The new directory will simply mimic
>> MMapIndexInput and add an MMapIndexOutput, based not on an mmapped buffer
>> but on an in-memory (Direct)ByteBuffer (outside or inside the JVM heap;
>> both will be supported). This simplifies the code a lot.
>>
>> The limitations of the crappy RAMDirectory were discussed at conferences,
>> sorry. We did *not* decide to remove it (without a patch/replacement). The
>> whole "message" of the issue was that RAMDirectory is a bad idea. The
>> recommended approach at the moment to handle large in-RAM directories
>> would be to use a tmpfs on Linux/Solaris and use MMapDir on top (for
>> larger indexes).
>> The MMap would then directly map the RAM of the underlying tmpfs.
>>
>> Uwe
>>
>> -----
>> Uwe Schindler
>> H.-H.-Meier-Allee 63, D-28213 Bremen
>> http://www.thetaphi.de
>> eMail: [EMAIL PROTECTED]
>>
>> From: Shai Erera [mailto:[EMAIL PROTECTED]]
>> Sent: Tuesday, December 20, 2011 2:13 PM
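Uwe's suggestion of using a known file size (e.g. from IOContext during a merge) to allocate a better buffer than 8 KB can be illustrated with a small sketch. `SizedRamOutput` is a hypothetical class, not Lucene's API: it preallocates a single buffer from an expected-size hint, either on the heap or as a DirectByteBuffer, and only regrows (doubling and copying) when the hint was too small. The `main` method compares a writer that knows its final size with one that has no useful hint.

```java
import java.nio.ByteBuffer;

/**
 * Hypothetical sketch (not a Lucene class): an in-memory output that
 * preallocates a single buffer from an expected-size hint, on or off heap.
 */
public final class SizedRamOutput {
    private final boolean direct;
    private ByteBuffer buf;
    private int reallocations; // how often the hint turned out to be too small

    public SizedRamOutput(int expectedSize, boolean direct) {
        this.direct = direct;
        this.buf = allocate(Math.max(expectedSize, 16));
    }

    private ByteBuffer allocate(int capacity) {
        return direct ? ByteBuffer.allocateDirect(capacity) : ByteBuffer.allocate(capacity);
    }

    public void writeBytes(byte[] b, int off, int len) {
        if (buf.remaining() < len) {
            // Hint too small: grow to max(needed, 2 * capacity) and copy what was written.
            long needed = (long) buf.position() + len;
            long newCap = Math.max(needed, 2L * buf.capacity());
            if (newCap > Integer.MAX_VALUE - 8) {
                newCap = Integer.MAX_VALUE - 8;
            }
            if (newCap < needed) {
                throw new IllegalStateException("file does not fit in one ByteBuffer");
            }
            ByteBuffer bigger = allocate((int) newCap);
            buf.flip();
            bigger.put(buf);
            buf = bigger;
            reallocations++;
        }
        buf.put(b, off, len);
    }

    public long length() { return buf.position(); }

    public int reallocations() { return reallocations; }

    /** Read-only view of exactly the bytes written so far. */
    public ByteBuffer toReadOnlyBuffer() {
        ByteBuffer view = buf.duplicate();
        view.flip(); // limit = bytes written, position = 0
        return view.asReadOnlyBuffer();
    }

    public static void main(String[] args) {
        byte[] block = new byte[1000];
        // Size known up front (as for a merge): one allocation, no copying.
        SizedRamOutput known = new SizedRamOutput(10_000, true);
        // No useful hint: the buffer has to be regrown and copied several times.
        SizedRamOutput unknown = new SizedRamOutput(0, false);
        for (int i = 0; i < 10; i++) {
            known.writeBytes(block, 0, block.length);
            unknown.writeBytes(block, 0, block.length);
        }
        System.out.println(known.length() + " bytes, " + known.reallocations() + " reallocations");   // 10000 bytes, 0 reallocations
        System.out.println(unknown.length() + " bytes, " + unknown.reallocations() + " reallocations"); // 10000 bytes, 5 reallocations
    }
}
```

A single `ByteBuffer` cannot exceed `Integer.MAX_VALUE` bytes, so a real directory for large files would combine this idea with chunking rather than one buffer per file.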
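The currently recommended setup (a tmpfs with MMapDir on top) relies on the fact that mapping a file that lives on tmpfs maps RAM directly. The sketch below shows that mechanism with plain `java.nio` rather than Lucene. It assumes that `/dev/shm` is a writable tmpfs mount, which is common on Linux but not guaranteed, and falls back to the regular temp directory otherwise. The class name `TmpfsMmapDemo` and its helper methods are hypothetical. In Lucene, the equivalent would be pointing an MMapDirectory at a directory inside the tmpfs mount.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

/** Sketch: memory-map a file that lives on tmpfs, so the mapping is backed directly by RAM. */
public final class TmpfsMmapDemo {

    /** Prefers /dev/shm (assumed to be a tmpfs); otherwise uses java.io.tmpdir. */
    public static Path scratchDir() throws IOException {
        Path shm = Paths.get("/dev/shm");
        Path base = Files.isDirectory(shm) && Files.isWritable(shm)
                ? shm
                : Paths.get(System.getProperty("java.io.tmpdir"));
        return Files.createTempDirectory(base, "index-");
    }

    /** Maps the whole file read-only; a single mapping is limited to Integer.MAX_VALUE bytes. */
    public static MappedByteBuffer mapReadOnly(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            // The mapping stays valid after the channel is closed.
            return ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
        }
    }

    /** Writes text to a scratch file, reads it back through the mapping, then cleans up. */
    public static String roundTrip(String text) throws IOException {
        Path dir = scratchDir();
        Path file = dir.resolve("segment.bin");
        try {
            Files.write(file, text.getBytes(StandardCharsets.UTF_8));
            MappedByteBuffer map = mapReadOnly(file);
            byte[] out = new byte[map.remaining()];
            map.get(out);
            return new String(out, StandardCharsets.UTF_8);
        } finally {
            // Deleting a still-mapped file works on Linux; other OSes may refuse.
            Files.deleteIfExists(file);
            Files.deleteIfExists(dir);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(roundTrip("hello tmpfs"));
    }
}
```

Because a single `FileChannel.map` call cannot exceed `Integer.MAX_VALUE` bytes, files larger than 2 GB have to be mapped as several consecutive regions, which is the same chunking idea discussed earlier in the thread.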