Lucene, mail # dev - remove seek-back in terms dict / fold appending codec into default?


Re: remove seek-back in terms dict / fold appending codec into default?
Robert Muir 2012-06-26, 23:21
What's the concern? A read-once file that we slurp in hurts nothing as far
as open file limits etc. I don't think we should be so damn crazy about the
# of files anyway: we have cfs as a solution for that (unrelated).
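
For readers following the thread, here is a minimal sketch of the "read-once metadata file" idea. It is not code from the thread and does not use Lucene's actual codec APIs: the class name, the file names `terms.dat` and `terms.meta`, and the two-field metadata layout are all assumptions made for illustration. The point it shows is that the writer only ever appends (no seek back to patch a header) and the reader never asks for the file length.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SeekableByteChannel;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical sketch; none of these names are Lucene APIs. */
final class ReadOnceMetadataSketch {
  // Hypothetical file names, chosen only for this example.
  static final String DATA_FILE = "terms.dat";
  static final String META_FILE = "terms.meta";

  /** Writes both files strictly front to back: no seek, no header patching. */
  static void write(Path dir, byte[] termBlocks, byte[] fieldSummary) throws IOException {
    long summaryOffset;
    try (DataOutputStream data =
        new DataOutputStream(Files.newOutputStream(dir.resolve(DATA_FILE)))) {
      data.write(termBlocks);
      summaryOffset = data.size(); // bytes written so far = where the summary starts
      data.write(fieldSummary);
    }
    // The small metadata file records where the summary lives in the data file.
    try (DataOutputStream meta =
        new DataOutputStream(Files.newOutputStream(dir.resolve(META_FILE)))) {
      meta.writeLong(summaryOffset);
      meta.writeInt(fieldSummary.length);
    }
  }

  /** Slurps the metadata file in one read, then fetches the summary it points to. */
  static byte[] readFieldSummary(Path dir) throws IOException {
    byte[] metaBytes = Files.readAllBytes(dir.resolve(META_FILE)); // read once, then closed
    long summaryOffset;
    int summaryLength;
    try (DataInputStream meta = new DataInputStream(new ByteArrayInputStream(metaBytes))) {
      summaryOffset = meta.readLong();
      summaryLength = meta.readInt();
    }
    try (SeekableByteChannel data = Files.newByteChannel(dir.resolve(DATA_FILE))) {
      data.position(summaryOffset); // seek on read only; file length is never queried
      ByteBuffer buf = ByteBuffer.allocate(summaryLength);
      while (buf.hasRemaining() && data.read(buf) >= 0) {
        // keep reading until the summary is complete or the file ends
      }
      return buf.array();
    }
  }

  public static void main(String[] args) throws IOException {
    Path dir = Files.createTempDirectory("read-once-sketch");
    write(dir, new byte[] {1, 2, 3, 4}, new byte[] {7, 8});
    System.out.println(readFieldSummary(dir).length + " summary bytes read");
  }
}
```

The trade-off raised in the thread is visible here: the scheme costs one extra file per segment, but that file is opened, read in full, and closed immediately, so it does not hold a file handle open.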
On Jun 26, 2012 6:13 PM, "Andrzej Bialecki" <[EMAIL PROTECTED]> wrote:

> On 26/06/2012 23:13, Michael McCandless wrote:
>
>> +1, if we can find some clean way of doing it that doesn't rely on
>> file length on read (i.e., to seek backwards to the header).
>>
>
> I don't like the additional file idea, we already create too many files
> ... maybe record this in a segmentInfo attribute?
>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>> On Tue, Jun 26, 2012 at 11:32 AM, Robert Muir <[EMAIL PROTECTED]> wrote:
>>
>>> Just looking at the previous thread, I wonder if we should consider
>>> removing AppendingCodec and just removing this seek stuff.
>>>
>>> Currently this is essentially metadata stuff in terms dict/index (e.g.
>>> terms dict field summary section and offsets for each field in terms
>>> index:
>>> https://builds.apache.org/job/Lucene-trunk/javadoc/core/org/apache/lucene/codecs/lucene40/Lucene40PostingsFormat.html
>>> )
>>>
>>> I know the typical argument for keeping this stuff is that we would
>>> need to rely upon additional file operations (e.g. length), and we
>>> want to limit that, but this isn't the only possible solution, e.g. we
>>> could write a read-once file with this metadata that's just slurped in.
>>>
>>> And really, relying upon seek at write could be viewed as just as bad
>>> as relying upon length; obviously we know some filesystems don't
>>> support it.
>>>
>>>
>>> --
>>> lucidimagination.com
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: [EMAIL PROTECTED]
>>> For additional commands, e-mail: [EMAIL PROTECTED]
>>>
>>>
>>
>>
>>
>
>
> --
> Best regards,
> Andrzej Bialecki
> http://www.sigram.com, blog http://www.sigram.com/blog
>  ___.,___,___,___,_._. __________________<><____________________
> [___||.__|__/|__||\/|: Information Retrieval, System Integration
> ___|||__||..\|..||..|: Contact: info at sigram dot com
>
>
>
>
>
>