Lucene, mail # dev - Avoiding segment merges during indexing


Re: Avoiding segment merges during indexing
Otis Gospodnetic 2005-08-11, 22:36
Kevin - are you saying that you can just comment out the 2 optimize()
calls and addIndexes(Directory[]) will keep working?  I don't recall
why the optimize() calls are there, but I know several people have had
issues with them...

Otis

--- Kevin Oliver <[EMAIL PROTECTED]> wrote:

> This is a proposal that is in need of some insights.
>
> In an effort to speed up adding documents to an existing index, we are
> pursuing using IndexWriter.addIndexes(Directory[]). In theory this
> should work great -- you index your new documents into a new Directory,
> then add them into your existing directory, saving you the time spent
> merging segments that would be caused by the normal
> IndexWriter.addDocument(Document) calls during indexing.
>
> However, addIndexes() has the property that it calls optimize() both
> before and after adding the new directories. This wipes out the
> performance boost, and then some.
>
> So I found a way to work around this, but I don't like what I've had to
> do and I was wondering if anybody has any ideas on what could be done to
> make this more pleasant.
>
> It appears that by getting the new segment files into the existing
> directory, with the correct segment names, it will work without all of
> the optimize calls. Unfortunately, getting the segment names right and
> getting the files into the right location is a big ugly hack and is
> quite fragile.
>
> Is there a better way? I think maybe some explanation of why the 2
> optimizes are there would help my understanding. Is there a clean way of
> doing what I'm proposing? Is there some hidden catch I'm missing, and
> have I been going down the wrong path?
>
> It seems to me this would be a great benefit to anyone who does indexing
> on existing indexes and wants it to be fast.
>
> Thanks,
> Kevin Oliver
>
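
To make the cost argument in the quoted message concrete, here is a rough
toy model in plain Java. It does not use Lucene at all: a "segment" is just
a document count, and every name in it (MergeCostSketch, addOneByOne,
optimize, addIndexesWithOptimize) is hypothetical. It assumes a simplified
merge rule (merge the last mergeFactor segments whenever they are all the
same size), which is only an approximation of what IndexWriter really does,
and it measures cost as the number of documents rewritten by merges.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MergeCostSketch {

    // Adds newDocs documents one at a time. Each document starts as its own
    // one-document segment. Returns how many documents were rewritten by merges.
    static long addOneByOne(List<Integer> segments, int newDocs, int mergeFactor) {
        long copied = 0;
        for (int i = 0; i < newDocs; i++) {
            segments.add(1);
            copied += cascade(segments, mergeFactor);
        }
        return copied;
    }

    // Simplified merge rule (an assumption, not Lucene's exact policy):
    // while the last mergeFactor segments all have the same size,
    // replace them with one segment holding all of their documents.
    static long cascade(List<Integer> segments, int mergeFactor) {
        long copied = 0;
        while (segments.size() >= mergeFactor) {
            int n = segments.size();
            int size = segments.get(n - 1);
            boolean allSame = true;
            for (int j = n - mergeFactor; j < n; j++) {
                if (segments.get(j) != size) {
                    allSame = false;
                    break;
                }
            }
            if (!allSame) {
                break;
            }
            segments.subList(n - mergeFactor, n).clear();
            segments.add(size * mergeFactor);
            copied += (long) size * mergeFactor;
        }
        return copied;
    }

    // Models optimize(): collapse everything into a single segment.
    // A single-segment index is treated as already optimized (cost 0).
    static long optimize(List<Integer> segments) {
        if (segments.size() <= 1) {
            return 0;
        }
        long total = 0;
        for (int s : segments) {
            total += s;
        }
        segments.clear();
        segments.add((int) total);
        return total;
    }

    // Models the behaviour described in the thread:
    // optimize, append the incoming segments, optimize again.
    static long addIndexesWithOptimize(List<Integer> existing, List<Integer> incoming) {
        long copied = optimize(existing);
        existing.addAll(incoming);
        copied += optimize(existing);
        return copied;
    }

    public static void main(String[] args) {
        List<Integer> a = new ArrayList<>(Arrays.asList(1000));
        System.out.println("addDocument path rewrites: " + addOneByOne(a, 100, 10));

        List<Integer> b = new ArrayList<>(Arrays.asList(1000));
        List<Integer> fresh = new ArrayList<>(Arrays.asList(100));
        System.out.println("addIndexes path rewrites:  " + addIndexesWithOptimize(b, fresh));
    }
}
```

In this model the trailing optimize rewrites every document in the combined
index, so its cost grows with the size of the existing index rather than
with the size of the batch being added. That is consistent with the
complaint above that the optimize() calls wipe out the gain. The model
ignores the cost of building the new index in the first place, and real
numbers depend on Lucene's actual merge policy and on I/O.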