Solr, mail # dev - Katta's goodness for Solr


Katta's goodness for Solr
Otis Gospodnetic 2008-11-11, 18:15
Quick thought.  I saw Stefan's Katta presentation last night.  Katta seems nice and simple.  If I understood correctly, the juicy stuff that is interesting for Solr is:
- Katta has a notion of a Primary Master and N Secondary Slaves (no SPOF there)
- Search Nodes serve index shards copied locally from some shared storage
- Zookeeper instances (again Primary Master and N Secondary Slaves) that facilitate communication among distributed components

The master:
-- knows how to distribute a set of index shards it is given across a number of search nodes (distribution policy pluggable, similar to Hadoop's, but different)
-- has a map of which shard is on which search node (in Zookeeper)
-- knows how to replicate each shard (replication factor configurable)
-- knows when a search node goes down (via Zookeeper notification)
-- knows how to create more replicas of the shards that were on a dead search node (and how to remove the extra replicas when that search node is revived)
-- can notify search nodes when a new index is available (via Zookeeper)
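To make the master's responsibilities above a bit more concrete, here is a toy sketch in Python. It assumes nothing about Katta's actual code: the names `Master`, `least_loaded`, `node_down` and `node_revived` are made up, a plain dict stands in for the shard map Katta keeps in Zookeeper, "least loaded node" is just one assumed placement policy, and the rule for which surplus replica to drop on revival is also an assumption.

```python
from typing import Callable, Dict, List, Set

# Hypothetical placement policy: (shard, candidate nodes, replicas per node) -> chosen node.
PlacementPolicy = Callable[[str, List[str], Dict[str, int]], str]


def least_loaded(shard: str, candidates: List[str], load: Dict[str, int]) -> str:
    """Pick the candidate node holding the fewest replicas (ties broken by name)."""
    return min(candidates, key=lambda n: (load[n], n))


class Master:
    """Toy master: shard_map stands in for the map Katta keeps in Zookeeper."""

    def __init__(self, replication: int, policy: PlacementPolicy = least_loaded) -> None:
        self.replication = replication
        self.policy = policy
        self.nodes: Set[str] = set()
        self.shard_map: Dict[str, Set[str]] = {}  # shard -> search nodes serving it

    def _load(self) -> Dict[str, int]:
        load = {n: 0 for n in self.nodes}
        for holders in self.shard_map.values():
            for n in holders:
                load[n] += 1
        return load

    def _fill(self, shard: str) -> None:
        # Add replicas until the replication factor is met or no node is left to use.
        holders = self.shard_map.setdefault(shard, set())
        while len(holders) < self.replication:
            candidates = sorted(self.nodes - holders)
            if not candidates:
                break
            holders.add(self.policy(shard, candidates, self._load()))

    def add_shards(self, shards: List[str]) -> None:
        for s in shards:
            self._fill(s)

    def node_up(self, node: str) -> None:
        self.nodes.add(node)
        for s in self.shard_map:
            self._fill(s)  # use the new node for under-replicated shards

    def node_down(self, node: str) -> None:
        # In Katta this would be triggered by a Zookeeper notification.
        self.nodes.discard(node)
        for holders in self.shard_map.values():
            holders.discard(node)
        for s in self.shard_map:
            self._fill(s)  # re-create the replicas that lived on the dead node

    def node_revived(self, node: str, local_shards: List[str]) -> None:
        # The revived node still has some shards on local disk; count them again,
        # then drop surplus replicas elsewhere (assumed rule: from the busiest other node).
        self.nodes.add(node)
        for s in local_shards:
            if s in self.shard_map:
                self.shard_map[s].add(node)
        load = self._load()
        for holders in self.shard_map.values():
            while len(holders) > self.replication:
                others = sorted(holders - {node}) or sorted(holders)
                victim = max(others, key=lambda n: (load[n], n))
                holders.discard(victim)
                load[victim] -= 1
```

For example, with `Master(replication=2)`, three nodes and three shards, every shard ends up on two nodes; after `node_down("a")` the replicas that were on "a" are re-created on the surviving nodes, and after `node_revived("a", [...])` each shard is trimmed back to two replicas. Because the policy is just a callable, a different placement rule can be plugged in without touching the master.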

More in:
http://joa23.files.wordpress.com/2008/09/katta-overview.pdf

Paul Noble will like slide #13 ;)

In particular, I think that:
- Making use of Zookeeper for index snapshot + replication might be useful (the Master publishes the info about a new snapshot to Zookeeper, and Search Slaves get notified immediately and start copying the index)
- Making use of Zookeeper for keeping a map of index shards + applying a replication factor would be very useful
- Making use of a pluggable shard placement policy would be useful
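The snapshot-notification idea in the first point could look roughly like this. It's a sketch only: `FakeZk` is a hypothetical in-memory stand-in, not the real Zookeeper client API, and the path `/solr/index/snapshot` and the `SearchSlave`/`SnapshotMaster` classes are made up. It does mimic one real Zookeeper property, namely that a data watch fires once and has to be re-registered.

```python
from typing import Callable, Dict, List, Optional

Watcher = Callable[[str], None]


class FakeZk:
    """In-memory stand-in for Zookeeper (hypothetical, not a real client API).
    Like Zookeeper data watches, each watch fires once and must be re-registered."""

    def __init__(self) -> None:
        self.data: Dict[str, str] = {}
        self.watches: Dict[str, List[Watcher]] = {}

    def get(self, path: str, watch: Optional[Watcher] = None) -> Optional[str]:
        if watch is not None:
            self.watches.setdefault(path, []).append(watch)
        return self.data.get(path)

    def set(self, path: str, value: str) -> None:
        self.data[path] = value
        for callback in self.watches.pop(path, []):  # one-shot: clear, then fire
            callback(path)


class SearchSlave:
    def __init__(self, zk: FakeZk, path: str) -> None:
        self.zk = zk
        self.copied: List[str] = []  # snapshots this slave has started copying
        current = zk.get(path, watch=self.on_new_snapshot)
        if current is not None:
            self.copied.append(current)

    def on_new_snapshot(self, path: str) -> None:
        # Re-arm the watch while reading the new value, then start the copy.
        snapshot = self.zk.get(path, watch=self.on_new_snapshot)
        if snapshot is not None:
            self.copied.append(snapshot)  # placeholder for the actual index copy


class SnapshotMaster:
    def __init__(self, zk: FakeZk, path: str) -> None:
        self.zk, self.path = zk, path

    def publish(self, snapshot: str) -> None:
        self.zk.set(self.path, snapshot)  # every watching slave is notified
```

Usage: create several `SearchSlave` objects on the same path, then call `SnapshotMaster.publish("snapshot.1")`. Each slave is notified right away and records the snapshot it should copy, instead of polling the master on a schedule.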

Thoughts?

Also:
While Katta provides shard->search server functionality via a pluggable impl, what both Solr and Katta are still missing is the doc->shard functionality (deciding which shard a given document gets indexed into).  However, this might not be terribly hard if we do something similar to Katta's pluggable shard->search server distribution policy.  Please keep in mind that I'm saying this without having looked at any of the Katta code.
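A pluggable doc->shard policy might be as small as the sketch below. The interface name `DocToShardPolicy`, the `HashPolicy` class and the `route` helper are hypothetical; hashing the unique key is just one assumed policy, chosen because it sends an updated document to the same shard as its earlier version.

```python
import zlib
from typing import Dict, List, Protocol


class DocToShardPolicy(Protocol):
    """Hypothetical pluggable interface: which shard should a document go to?"""

    def shard_for(self, doc_id: str, shards: List[str]) -> str: ...


class HashPolicy:
    """Route by a stable hash of the document's unique key, so an updated document
    is sent to the same shard as its earlier version (while the shard list is unchanged)."""

    def shard_for(self, doc_id: str, shards: List[str]) -> str:
        # crc32 gives the same value in every process, unlike Python's salted hash() for str.
        return shards[zlib.crc32(doc_id.encode("utf-8")) % len(shards)]


def route(doc_ids: List[str], shards: List[str],
          policy: DocToShardPolicy) -> Dict[str, List[str]]:
    """Group documents into per-shard batches using whatever policy is plugged in."""
    batches: Dict[str, List[str]] = {s: [] for s in shards}
    for doc_id in doc_ids:
        batches[policy.shard_for(doc_id, shards)].append(doc_id)
    return batches
```

One design caveat: with plain modulo hashing, changing the number of shards moves most documents to a different shard. A consistent-hashing policy would limit that, and it could be plugged in behind the same interface.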

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch