Solr, mail # user - SolrCloud distributed indexing (Re: anyone use hadoop+solr?)


Re: SolrCloud distributed indexing (Re: anyone use hadoop+solr?)
Andrzej Bialecki 2010-09-06, 18:30
On 2010-09-06 16:41, Yonik Seeley wrote:
> On Mon, Sep 6, 2010 at 10:18 AM, MitchK<[EMAIL PROTECTED]>  wrote:
> [...consistent hashing...]
>> But it doesn't solve the problem at all, correct me if I am wrong, but: If
>> you add a new server, let's call him IP3-1, and IP3-1 is nearer to the
>> current resource X, then doc x will be indexed at IP3-1 - even if IP2-1
>> holds the older version.
>> Am I right?
>
> Right.  You still need code to handle migration.
>
> Consistent hashing is a way for everyone to be able to agree on the
> mapping, and for the mapping to change incrementally.  i.e. you add a
> node and it only changes the docid->node mapping of a limited percent
> of the mappings, rather than changing the mappings of potentially
> everything, as a simple MOD would do.
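
To make the difference concrete, here is a minimal Python sketch that
measures how many docid->node mappings change when a third node is added,
once with a plain MOD and once with a consistent-hash ring. It is only an
illustration: the node names reuse the IP1-1/IP2-1/IP3-1 labels from the
thread, while stable_hash, mod_node, HashRing and the choice of 100 virtual
points per node are assumptions made for this example, not anything from
Solr or SolrCloud.

```python
import bisect
import hashlib


def stable_hash(key: str) -> int:
    """Deterministic 64-bit hash (Python's built-in hash() is salted per run)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


def mod_node(doc_id: str, nodes: list) -> str:
    """Simple MOD mapping: hash the docid and take it modulo the node count."""
    return nodes[stable_hash(doc_id) % len(nodes)]


class HashRing:
    """Hypothetical consistent-hash ring with several virtual points per node."""

    def __init__(self, nodes, vnodes=100):
        self._points = sorted(
            (stable_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._keys = [point for point, _ in self._points]

    def node_for(self, doc_id: str) -> str:
        # First ring point clockwise from the docid's hash, wrapping at the end.
        idx = bisect.bisect_right(self._keys, stable_hash(doc_id)) % len(self._keys)
        return self._points[idx][1]


def moved_fraction(before, after, doc_ids) -> float:
    """Fraction of docids whose owning node differs between two mappings."""
    return sum(1 for d in doc_ids if before(d) != after(d)) / len(doc_ids)


docs = [f"doc-{i}" for i in range(10000)]
old_nodes = ["IP1-1", "IP2-1"]
new_nodes = old_nodes + ["IP3-1"]

mod_moved = moved_fraction(
    lambda d: mod_node(d, old_nodes), lambda d: mod_node(d, new_nodes), docs
)
ring_old, ring_new = HashRing(old_nodes), HashRing(new_nodes)
ring_moved = moved_fraction(ring_old.node_for, ring_new.node_for, docs)

print(f"MOD:  {mod_moved:.0%} of docids change node")
print(f"ring: {ring_moved:.0%} of docids change node")
```

With the ring, every docid that changes owner moves to the new node IP3-1;
with MOD, docids are also reshuffled between the two existing nodes. Either
way the moved documents still have to be migrated, as noted above.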

Another strategy to avoid excessive reindexing is to keep splitting the
largest shards, and then your mapping becomes a regular MOD plus a list
of these additional splits. Really, there's an infinite number of ways
you could implement this...
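
One possible reading of "a regular MOD plus a list of splits", as a rough
Python sketch. The SplitModRouter class, the shard naming scheme
("shard1" -> "shard1.0" / "shard1.1") and the use of spare hash bits to
pick a child are all assumptions for illustration; this is just one of the
many ways it could be done.

```python
import hashlib


def stable_hash(key: str) -> int:
    """Deterministic 64-bit hash of a docid."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class SplitModRouter:
    """Hypothetical router: base MOD mapping plus a table of later shard splits."""

    def __init__(self, base_shards: int):
        self.base_shards = base_shards
        self.splits = {}  # shard name -> (child for bit 0, child for bit 1)

    def split(self, shard: str) -> None:
        """Record that `shard` has been split into two children."""
        if shard in self.splits:
            raise ValueError(f"{shard} is already split")
        self.splits[shard] = (f"{shard}.0", f"{shard}.1")

    def shard_for(self, doc_id: str) -> str:
        h = stable_hash(doc_id)
        shard = f"shard{h % self.base_shards}"  # the regular MOD part
        bits = h // self.base_shards            # leftover bits pick a child per split
        while shard in self.splits:
            shard = self.splits[shard][bits & 1]
            bits >>= 1
        return shard


router = SplitModRouter(base_shards=4)
ids = [f"doc-{i}" for i in range(2000)]
before = {d: router.shard_for(d) for d in ids}

router.split("shard1")  # pretend shard1 had grown to be the largest
after = {d: router.shard_for(d) for d in ids}

untouched = sum(1 for d in ids if before[d] == after[d])
print(f"{untouched} of {len(ids)} docids keep their shard after the split")
```

Only documents that lived in the split shard get a new mapping; the other
shards are not touched, which is the point of the strategy.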

>
> For SolrCloud, I don't think we'll end up using consistent hashing -
> we don't need it (although some of the concepts may still be useful).

I imagine there could be situations where a simple MOD won't do ;) so I
think it would be good to hide this strategy behind an
interface/abstract class. It costs nothing, and gives you flexibility in
how you implement this mapping.
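
A small sketch of what hiding the strategy behind an abstract class could
look like, written in Python for brevity. Every name here
(ShardingStrategy, ModStrategy, PrefixStrategy, Indexer, and the
"tenant!docid" id convention) is hypothetical and chosen for this example;
none of it is an existing Solr API.

```python
import zlib
from abc import ABC, abstractmethod


class ShardingStrategy(ABC):
    """Abstract docid -> shard mapping; callers depend only on this type."""

    @abstractmethod
    def shard_for(self, doc_id: str, num_shards: int) -> int:
        """Return the index of the shard that should hold `doc_id`."""


class ModStrategy(ShardingStrategy):
    """The simple case: hash the whole docid and take it modulo the shard count."""

    def shard_for(self, doc_id: str, num_shards: int) -> int:
        return zlib.crc32(doc_id.encode("utf-8")) % num_shards


class PrefixStrategy(ShardingStrategy):
    """A case where plain MOD on the docid won't do: keep all documents that
    share a 'tenant!' prefix (an assumed id convention) on the same shard."""

    def shard_for(self, doc_id: str, num_shards: int) -> int:
        prefix = doc_id.split("!", 1)[0]
        return zlib.crc32(prefix.encode("utf-8")) % num_shards


class Indexer:
    """Toy indexer that routes documents through whichever strategy it is given."""

    def __init__(self, strategy: ShardingStrategy, num_shards: int):
        self.strategy = strategy
        self.shards = [[] for _ in range(num_shards)]

    def add(self, doc_id: str) -> int:
        target = self.strategy.shard_for(doc_id, len(self.shards))
        self.shards[target].append(doc_id)
        return target


indexer = Indexer(PrefixStrategy(), num_shards=4)
first = indexer.add("acme!invoice-1")
second = indexer.add("acme!invoice-2")
print("same shard for both acme documents:", first == second)
```

Swapping ModStrategy for PrefixStrategy (or a split-based or ring-based
mapping) changes nothing in Indexer, which is the flexibility meant here.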

--
Best regards,
Andrzej Bialecki     <><
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com