Zack,
I'm really pleased to see you expressing interest in this. Some points
that I'd like to make regarding the topic:
- Distributed Search isn't just hard, it's really hard.
- Lucene, and thus Lucene.Net, is a library, and anything which is
contributed to it should remain within that scope. If it's application
code (e.g. Solr), it should probably be done as a separate project.
- That said, many of the components of such a system would be perfect for
the Contrib library and could facilitate building different (and custom)
implementations of distributed search based on .NET. If you contribute the
reusable bits, you can maintain your design focus and whatever choices you
make about dependencies/stack in your application, and still help others to
get started if they want to do a different implementation.
- Just to state the obvious, the main value of a distributed search
application built around Lucene.Net (vs using an existing one based around
Java Lucene) is the ability to use custom Analyzers, Queries, Scorers, etc,
written in .NET vs Java. This can be a big deal and support for this should
be baked into the application. Consider a (secure?) administrative API for
pushing custom libs across the wire and loading them in an isolated
AppDomain, which interacts with the API service.
- To further state the obvious, I think you're correct about the
abstraction of IndexReader being too chatty, and that moving to a higher
level makes sense. A DistributedQuery and DistributedCollector
implementation is probably the right place to start.
- Please don't assume a .NET client on the other end! This could be a cool
product to use in polyglot environments. HTTP APIs are your friend. :)
- ZMQ makes a pretty good communication layer if HTTP doesn't work for you.
- Developing a language-agnostic peer-to-peer API for the search nodes
would enable others to, say, implement a version in Java or some other
language that fulfills the API. You could even create a hybrid engine, with
Lucene-based search being only one of the node types.
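
To make the DistributedQuery/DistributedCollector point concrete, here is a
minimal, language-neutral sketch of the merge step: each node returns its own
hits sorted by descending score, and a coordinator merges them into a global
top k. The names Hit and merge_top_hits are hypothetical and are not part of
Lucene or Lucene.Net. The sketch also assumes scores are comparable across
nodes, which a real system would have to ensure.

```python
import heapq
import itertools
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Hit:
    """One search hit as returned by a single node (hypothetical shape)."""
    node: str     # identifier of the node that produced the hit
    doc_id: int   # document id, local to that node
    score: float  # relevance score; assumed comparable across nodes


def merge_top_hits(per_node_hits: Iterable[List[Hit]], k: int) -> List[Hit]:
    """Merge per-node hit lists into the global top k.

    Each input list must already be sorted by descending score.
    """
    # heapq.merge lazily merges sorted inputs; negating the score turns
    # "descending by score" into the ascending order it expects.
    merged = heapq.merge(*per_node_hits, key=lambda hit: -hit.score)
    return list(itertools.islice(merged, k))


if __name__ == "__main__":
    node_a = [Hit("a", 1, 0.9), Hit("a", 2, 0.4)]
    node_b = [Hit("b", 7, 0.7), Hit("b", 3, 0.1)]
    for hit in merge_top_hits([node_a, node_b], k=3):
        print(hit)
```

Because the merge is lazy, the coordinator only consumes as many hits as it
needs, so each node can safely return just its own top k.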
Thanks,
Troy
On Mon, Aug 20, 2012 at 9:37 AM, Zachary Gramana <[EMAIL PROTECTED]> wrote:
> Nick,
>
> Thanks for the feedback. You may not have been looking for this long and
> complex of a reply, but I wanted to share my thinking and validate some
> assumptions with the group before I get too much further down the road.
>
> Let me walk you through where my thinking is at, and see what you think.
>
> First, some observations:
>
> * MultiSearcher and RemoteSearchable are deprecated in Java Lucene
> starting with 3.1 (
> http://mail-archives.apache.org/mod_mbox/lucene-java-user/201106.mbox/%3C007001cc2c35$359afba0$a0d0f2e0$@thetaphi.de%3E),
> and for good reason. Not only does it have some bugs related to scoring,
> etc., i
>
> * IndexReader, as the service interface, results in excessive network
> chatter. Query, in my mind, sounds like the right abstraction. Parse an
> incoming query request once, distribute the query objects to core
> instances, then merge the results. IndexSearcher in 3.3 implements a merge
> TopDocs method, so this approach seems promising. This would also enable
> each core to use a request queue to handle concurrent requests. Query,
> Filter, etc., have been marked serializable for a long time.
>
> * I like Solr's separated Web/Core approach. The remoting-based approaches
> buy into a few of the 8 fallacies of distributed computing. The web/core
> approach, not so much.
>
> * Java-Lucene has recently delegated distributed search to Solr (and
> ElasticSearch, Katta, IndexTank, etc.) in v3.1 and later. This says that (a)
> distributed search is hard, and (b) it requires solving problems that are
> beyond the scope of Lucene. Unfortunately, this highlights the lack of a
> .NET Solr analog.
>
> These observations lead me to the following questions:
>
> 1. Jeez, it would be nice if we had a .NET Solr-ish project. Kidding,
> kidding. Kind of.
> 2. Should distributed search live in Contribs, or in another project