If you use a single thread then, yes, segments are sequential.
But if e.g. you are updating documents, then deletions (because a document
was updated) can apply to documents in other segments, so re-adding just the
documents from the corrupted segment will mean you don't drop the deletions.
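
As a minimal sketch of that point (the index path and the "id" field are
assumptions): updateDocument() puts the new copy of a document into the
currently building segment and records a delete against wherever the old copy
lives, which may be an earlier, already-flushed segment:

    import java.nio.file.Paths;

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.document.StringField;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;

    public final class UpdateCrossesSegments {
      public static void main(String[] args) throws Exception {
        try (Directory dir = FSDirectory.open(Paths.get("/tmp/demo-index")); // assumed path
             IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {

          Document v1 = new Document();
          v1.add(new StringField("id", "42", Field.Store.YES));
          writer.addDocument(v1);
          writer.commit();                                  // old copy now lives in an earlier, flushed segment

          Document v2 = new Document();
          v2.add(new StringField("id", "42", Field.Store.YES));
          writer.updateDocument(new Term("id", "42"), v2);  // new copy goes to the current segment...
          writer.commit();                                  // ...and a delete is recorded against the earlier one
        }
      }
    }
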
> I deduce the transaction range not from the corrupted segment but from the
> intact segments. The transaction id is incremental, and I imagine segments
> are saved sequentially, so if segment 5 is missing, reading the intact
> segment 4 I can find the maximum transaction id A, and reading segment 6 I
> can find the minimum transaction id B, so I can deduce the hole: the range
> is [A+1, B-1] ... Making a query in the db I reload the corresponding
> documents and add these missing documents back into Lucene.
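
A rough sketch of this gap-deduction idea, assuming every document stores its
transaction id in a NumericDocValues field named "txn_id" (the field name, the
per-segment readers and the RowDao database helper are assumptions; the
doc-values scan uses the Lucene 7+ iterator API):

    import java.io.IOException;
    import java.util.List;

    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexableField;
    import org.apache.lucene.index.LeafReader;
    import org.apache.lucene.index.NumericDocValues;
    import org.apache.lucene.search.DocIdSetIterator;

    final class TxnGapRecovery {

      /** Highest "txn_id" in one surviving segment (assumes every doc has the field). */
      static long maxTxnId(LeafReader survivingSegment) throws IOException {
        NumericDocValues txn = survivingSegment.getNumericDocValues("txn_id");
        long max = Long.MIN_VALUE;
        for (int doc = txn.nextDoc(); doc != DocIdSetIterator.NO_MORE_DOCS; doc = txn.nextDoc()) {
          max = Math.max(max, txn.longValue());
        }
        return max;
      }

      /** Lowest "txn_id" in one surviving segment. */
      static long minTxnId(LeafReader survivingSegment) throws IOException {
        NumericDocValues txn = survivingSegment.getNumericDocValues("txn_id");
        long min = Long.MAX_VALUE;
        for (int doc = txn.nextDoc(); doc != DocIdSetIterator.NO_MORE_DOCS; doc = txn.nextDoc()) {
          min = Math.min(min, txn.longValue());
        }
        return min;
      }

      /** Re-adds the documents for the missing range [A+1, B-1]. */
      static void refillHole(IndexWriter writer, RowDao db,
                             LeafReader segmentBefore, LeafReader segmentAfter) throws IOException {
        long a = maxTxnId(segmentBefore);   // e.g. segment 4
        long b = minTxnId(segmentAfter);    // e.g. segment 6
        for (RowDao.Row row : db.rowsInTxnRange(a + 1, b - 1)) {
          writer.addDocument(row.toLuceneDoc());
        }
        writer.commit();
      }

      /** Minimal stand-in for the database access layer (an assumption, not a real API). */
      interface RowDao {
        List<Row> rowsInTxnRange(long fromInclusive, long toInclusive);
        interface Row {
          Iterable<IndexableField> toLuceneDoc();
        }
      }
    }
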
>
>
> 2017-03-23 15:28 GMT+01:00 Cristian Lorenzetto <
> [EMAIL PROTECTED]>:
>
>>
>>
>> 2017-03-23 15:17 GMT+01:00 Michael McCandless <[EMAIL PROTECTED]>:
>>
>>> Lucene corruption should be rare and only due to bad hardware; if you
>>> are seeing otherwise we really should get to the root cause.
>>>
>>> Mapping documents to each segment will not be easy in general,
>>> especially if that segment is now corrupted so you can't search it.
>>>
>>> Documents lost because of power loss / OS crash while indexing can be
>>> more common, and it's for that use case that the sequence numbers /
>>> transaction log should be helpful.
>>>
>>> Mike McCandless
>>>
>>>
>>> http://blog.mikemccandless.com
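
As a hedged illustration of finding out which segments are actually damaged
before deciding what to reload, Lucene's CheckIndex tool can report per-segment
status (the index path here is an assumption):

    import java.nio.file.Paths;

    import org.apache.lucene.index.CheckIndex;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;

    public final class DiagnoseIndex {
      public static void main(String[] args) throws Exception {
        try (Directory dir = FSDirectory.open(Paths.get("/path/to/index")); // assumed path
             CheckIndex checker = new CheckIndex(dir)) {
          checker.setInfoStream(System.out);             // print per-segment details
          CheckIndex.Status status = checker.checkIndex();
          if (status.clean) {
            System.out.println("Index is healthy");
          } else {
            // status.segmentInfos lists per-segment results; the corrupted segments are
            // the ones whose documents would have to be reloaded from the database.
            System.out.println("Index has problems; see the report above");
            // checker.exorciseIndex(status) would drop the bad segments, losing their docs.
          }
        }
      }
    }
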
>>> On Thu, Mar 23, 2017 at 10:12 AM, Cristian Lorenzetto <
>>> [EMAIL PROTECTED]> wrote:
>>>
>>>> Yes, exactly. Working in the past on systems using Lucene (for
>>>> example Alfresco projects), I saw that Lucene corruption happens sometimes, and
>>>> every time rebuilding the index takes a long time ... so I thought of a way to
>>>> speed up fixing a corrupted index. In addition there is a rare
>>>> case not described here (if, after a database commit, Lucene throws an
>>>> exception, for example because the disk is full): there is a possibility of a
>>>> misalignment between the database and the Lucene index. With this system
>>>> these problems could be solved automatically. In the database every row has a
>>>> property holding the transaction id. So if I know Lucene is missing segment 6,
>>>> which corresponds to the transaction range [1000, 1050], I can reload just the
>>>> corresponding rows with a database query.
>>>>
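
A hedged sketch of the database side of that recovery, assuming a table named
documents with id, body and txn_id columns and a JDBC connection URL (all of
which are assumptions); the missing range [fromTxn, toTxn] would come from the
surviving segments as described above:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;

    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.document.NumericDocValuesField;
    import org.apache.lucene.document.StringField;
    import org.apache.lucene.document.TextField;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.Term;

    final class ReloadMissingRows {
      /** Reloads rows whose txn_id falls in the missing range and re-indexes them. */
      static void reload(IndexWriter writer, long fromTxn, long toTxn) throws Exception {
        try (Connection con = DriverManager.getConnection("jdbc:postgresql://localhost/app"); // assumed URL
             PreparedStatement ps = con.prepareStatement(
                 "SELECT id, body, txn_id FROM documents WHERE txn_id BETWEEN ? AND ?")) {
          ps.setLong(1, fromTxn);
          ps.setLong(2, toTxn);
          try (ResultSet rs = ps.executeQuery()) {
            while (rs.next()) {
              Document doc = new Document();
              doc.add(new StringField("id", rs.getString("id"), Field.Store.YES));
              doc.add(new TextField("body", rs.getString("body"), Field.Store.YES));
              doc.add(new NumericDocValuesField("txn_id", rs.getLong("txn_id")));
              // updateDocument (rather than addDocument) avoids duplicates in case some
              // of these rows already survived in other segments.
              writer.updateDocument(new Term("id", rs.getString("id")), doc);
            }
          }
          writer.commit();
        }
      }
    }
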
>>>> 2017-03-23 14:59 GMT+01:00 Michael McCandless <
>>>> [EMAIL PROTECTED]>:
>>>>
>>>>> You should be able to use the sequence numbers returned by IndexWriter
>>>>> operations to "know" which operations made it into the commit and which did
>>>>> not, and then on disaster recovery replay only those operations that didn't
>>>>> make it?
>>>>>
>>>>> Mike McCandless
>>>>>
>>>>>
>>>>> http://blog.mikemccandless.com
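
A minimal sketch of the sequence-number approach described above; the TxnLog
interface stands in for whatever durable external log is used and is an
assumption, while the long return values of updateDocument() and commit() are
real IndexWriter behaviour (added around Lucene 6.2):

    import java.io.IOException;
    import java.util.List;

    import org.apache.lucene.document.Document;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexableField;
    import org.apache.lucene.index.Term;

    final class SeqNoRecovery {

      /** Stand-in for an external, durable transaction log (an assumption, not a Lucene API). */
      interface TxnLog {
        void append(long seqNo, String id, Iterable<IndexableField> doc);
        void markCommitted(long commitSeqNo);
        long lastCommittedSeqNo();
        List<Entry> entriesAfter(long seqNo);
        interface Entry {
          String id();
          Iterable<IndexableField> doc();
        }
      }

      /** Normal indexing: remember each operation's sequence number. */
      static void index(IndexWriter writer, TxnLog log, String id, Document doc) throws IOException {
        long opSeqNo = writer.updateDocument(new Term("id", id), doc); // IndexWriter ops return sequence numbers
        log.append(opSeqNo, id, doc);
      }

      /** Commit and remember which operations the commit covered. */
      static void checkpoint(IndexWriter writer, TxnLog log) throws IOException {
        long commitSeqNo = writer.commit();   // all ops with seqNo <= commitSeqNo are in this commit
        log.markCommitted(commitSeqNo);
      }

      /** After a crash: replay only the operations the last successful commit did not include. */
      static void recover(IndexWriter writer, TxnLog log) throws IOException {
        for (TxnLog.Entry e : log.entriesAfter(log.lastCommittedSeqNo())) {
          writer.updateDocument(new Term("id", e.id()), e.doc());
        }
        writer.commit();
      }
    }
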
>>>>> On Thu, Mar 23, 2017 at 5:53 AM, Cristian Lorenzetto <
>>>>> [EMAIL PROTECTED]> wrote:
>>>>>
>>>>>> Errata corrige / additions to the questions in my previous post
>>>>>>
>>>>>> I studied these Lucene classes a bit to understand:
>>>>>> 1) setCommitData is designed for versioning the index, not for
>>>>>> passing a transaction log. However, if the user data is different for every
>>>>>> transaction id, it is equivalent.
>>>>>> 2) NRT refreshes the searcher/reader automatically; it doesn't call commit. I