Factor out a standalone, shared analysis package for Nutch/Solr/Lucene?
Michael McCandless 2010-02-24, 16:32

I think, in order to stop duplicating our analysis code across Nutch/Solr/Lucene, we should separate out the analyzers into a standalone package, and maybe as its own sub-project under the Lucene tlp?

The goal would be eventually to have a single source for all our analysis needs, and for all Lucene projects to eventually cutover to this source (deprecating their current analysis code).

We could also at this time fix some of the known problems in the analysis APIs, eg that the Analyzer base class confusingly exposes both non-reuse and reuse APIs, that not all Analyzers are final, etc.

What do people think...?

Mike
-
Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene?
Michael Busch 2010-02-24, 16:40

+1! I think that's the way to go. It's also confusing currently that some analysers are in Lucene's core jar, and that there is an additional contrib analysis jar. Your proposal would solve this problem too.

Michael

On Feb 24, 2010, at 8:32 AM, Michael McCandless <[EMAIL PROTECTED]> wrote:
> I think, in order to stop duplicating our analysis code across
> Nutch/Solr/Lucene, we should separate out the analyzers into a
> standalone package, and maybe as its own sub-project under the Lucene
> tlp?
-
Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene?
Simon Willnauer 2010-02-24, 18:57

Mike, thanks for moving this out of the JIRA issue. For completeness, here is the link to the issue where this thread started: https://issues.apache.org/jira/browse/LUCENE-2279

I also think we need a solution for this problem, but it does not seem to be that easy. Would moving the analysis be compatible with the Lucene core having no dependencies? Not that I don't favor that solution; I really think we should move all of that out, but I'm not sure where it should live. My first impression would be a Lucene contrib module, but that would raise other issues, like all Solr committers then needing access to that contrib. A new project would surely make sense, but it is also quite an overhead, isn't it?!

simon

On Wed, Feb 24, 2010 at 5:40 PM, Michael Busch <[EMAIL PROTECTED]> wrote:
> +1! I think that's the way to go. It's also confusing currently that some
> analysers are in Lucene's core jar, and that there is an additional contrib
> analysis jar. Your proposal would solve this problem too.
>
> Michael
-
RE: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene?
Steven A Rowe 2010-02-24, 18:44

+1. We can call the project LuAnn :)

- Steve

On 02/24/2010 at 11:40 AM, Michael Busch wrote:
> +1! I think that's the way to go. It's also confusing currently that
> some analysers are in Lucene's core jar, and that there is an
> additional contrib analysis jar. Your proposal would solve this
> problem too.
>
> Michael
-
Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene?
Yonik Seeley 2010-02-24, 19:20

I've started to think that a merge of Solr and Lucene would be in the best interest of both projects.

Recently, Solr has pulled back from using Lucene trunk (or even the latest version), as the increased amount of change between releases (and in-between releases) made it impractical to deal with. This is a pretty big negative for Lucene, since Solr is the biggest Lucene user (where people are directly exposed to Lucene for the express purpose of developing search features). I know Solr development has always benefited hugely from users using trunk, and Lucene trunk has now lost all the Solr users.

Some in Lucene development have expressed a desire to make Lucene more of a complete solution, rather than just a core full-text search library... things like a data schema, faceting, etc. The Lucene project already has an enterprise search platform with these features... that's Solr. Trying to pull popular pieces out of Solr makes life harder for Solr developers, brings our projects into conflict, and is often unsuccessful (witness the largely failed migration of FunctionQueries from Solr to Lucene). For Lucene to achieve the ultimate in usability for users, it can't require Java experience... it needs higher level abstractions provided by Solr.

The other benefit to Lucene would be to bring features to developers much sooner... Solr has had features years before they were developed in Lucene, and currently has more developers working with it. Esp with Solr not using Lucene trunk, if a Solr developer wants a feature quickly, they cannot add it to Lucene (even if it might make sense there) since that introduces a big unpredictable lag: the wait until that version of Lucene makes its way into Solr.

The current divide is a bit unnatural. For maximum benefit of both projects, it seems like Solr and Lucene should essentially merge. Lucene core would essentially remain as it is, but:
1) Solr would go back to using Lucene's trunk
2) For new Solr features, there would be an effort to abstract it such that non-Solr users could use the functionality (faceting, field collapsing, etc)
3) For new Lucene features, there would be an effort to integrate it into Solr.
4) Releases would be synchronized... Lucene and Solr would release at the same time.

-Yonik
-
Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene?
Chris Hostetter 2010-03-01, 19:00

: I've started to think that a merge of Solr and Lucene would be in the
: best interest of both projects.

As I already mentioned in my previous reply: I think there are incremental steps that could be made before we spend too much effort worrying if/how Solr development could be more tightly integrated with Lucene-Java development; or if Solr should be a TLP.

But this is a different message, where I reply to the larger issue from a slightly different perspective just in case I get hit by a bus and don't get a chance to bring it up if/when the conversation warrants it.

It would probably be wise to consider the overall issues from an anthropological standpoint -- by looking at other (ASF) projects that have been in similar situations.

On the one hand: "branching" projects, so that subprojects graduate from a project and become their own TLPs, seems to generally be the norm -- so maybe we should ask the (obvious) question of why? ... not all existing TLPs make good corollaries, but (based on my limited understanding) it might make sense to look at something like "HTTPD vs APR" in particular: HTTPD came first, and APR was refactored out of it -- but except for the ordering a similar relationship could be found between Solr and Lucene-Java (one is a server, the other is the core foundation on which it's built) ... so why did APR spin off into its own TLP instead of just being a product developed/released in lock step with HTTPD?

Conversely: Hadoop has lots of subprojects with divergent user communities, but (again: based on my limited understanding) they have moved towards doing development with some tight coupling in their release / compatibility processes. (ie: consistent version numbers between products based on the hadoop-core they use) ... how well does that work out for them? what issues do they face?

Better understanding those situations could help us avoid making costly mistakes.

-Hoss
-
Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene?
Ted Dunning 2010-03-01, 19:26

Hadoop is a strange beast. The Hadoop core itself has fractured into three projects that have independent mailing lists but which share release dates.

On Mon, Mar 1, 2010 at 11:00 AM, Chris Hostetter <[EMAIL PROTECTED]> wrote:
> Conversely: Hadoop has lots of subprojects with divergent user
> communities, but (again: based on my limited understanding) they
> have moved towards doing development with some tight coupling in their
> release / compatibility processes. (ie: consistent version numbers between
> products based on the hadoop-core they use) ... how well does that work
> out for them? what issues do they face?

--
Ted Dunning, CTO
DeepDyve
-
Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene?
Doug Cutting 2010-03-01, 20:00

Ted Dunning wrote:
> Hadoop is a strange beast. The Hadoop core itself has fractured into three
> projects that have independent mailing lists but which share release dates.

But without any releases yet. Is that "shared nothing"?

The rationale for the Hadoop split was that the single codebase was too big and too active for developers to easily follow. Splitting dev lists was an initial step towards someday splitting into separate TLPs. The first post-split releases will be sync'd, but long term the expectation is that the release schedules may diverge. (This is all my opinion: I have but one vote on the Hadoop PMC.)

Doug
-
Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene?
Ted Dunning 2010-02-24, 19:36

This would have been a huge benefit to me about a year ago. We had to have clustering (katta provided that), but we also really wanted many features that SOLR has. In the end, we went with clustering for scale and stability and rewrote/backported/punted on many of the other features.

But this is a monumental ambition. I congratulate Yonik for imagining it.

I would love to see it. It will be very difficult, however.

On Wed, Feb 24, 2010 at 11:20 AM, Yonik Seeley <[EMAIL PROTECTED]> wrote:
> I've started to think that a merge of Solr and Lucene would be in the
> best interest of both projects.

--
Ted Dunning, CTO
DeepDyve
-
Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene?
Michael McCandless 2010-02-26, 20:20

I think this is a good idea! LuSolr ;) (kidding)

I agree with all of your points Yonik.

What do other people think...?

Mike

On Wed, Feb 24, 2010 at 2:20 PM, Yonik Seeley <[EMAIL PROTECTED]> wrote:
> I've started to think that a merge of Solr and Lucene would be in the
> best interest of both projects.
-
Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene?
Robert Muir 2010-02-26, 21:11

+1

On Fri, Feb 26, 2010 at 3:20 PM, Michael McCandless <[EMAIL PROTECTED]> wrote:
> I think this is a good idea! LuSolr ;) (kidding)
>
> I agree with all of your points Yonik.
>
> What do other people think...?
>
> Mike

--
Robert Muir
[EMAIL PROTECTED]
-
Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene?
Simon Willnauer 2010-02-26, 21:15

+1

So many people ask me when Solr will have all the Lucene features and how quickly Solr keeps up. If we can make it work somehow, I think it would be a huge improvement. Except for Mark Miller's resume :)

simon

On Fri, Feb 26, 2010 at 10:11 PM, Robert Muir <[EMAIL PROTECTED]> wrote:
> +1
-
Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene?
Marvin Humphrey 2010-02-26, 21:24

On Fri, Feb 26, 2010 at 03:20:58PM -0500, Michael McCandless wrote:
> I think this is a good idea! LuSolr ;) (kidding)
>
> I agree with all of your points Yonik.
>
> What do other people think...?

My ideal would be to go the opposite direction: shrink Lucene to a minimal specification, and put all serious functionality into plugins.

On the other hand, making giant bloatware official policy seems like the natural progression for Lucene. ;)

Marvin Humphrey
-
RE: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene?
Uwe Schindler 2010-02-26, 21:44

-1, I don't use Solr, I still want to be able to use Lucene without any Solr bloat! I lean toward Marvin's comment.

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: [EMAIL PROTECTED]

> -----Original Message-----
> From: Marvin Humphrey [mailto:[EMAIL PROTECTED]]
> Sent: Friday, February 26, 2010 10:24 PM
> To: [EMAIL PROTECTED]
> Subject: Re: Factor out a standalone, shared analysis package for
> Nutch/Solr/Lucene?
>
> On Fri, Feb 26, 2010 at 03:20:58PM -0500, Michael McCandless wrote:
> > I think this is a good idea! LuSolr ;) (kidding)
> >
> > I agree with all of your points Yonik.
> >
> > What do other people think...?
>
> My ideal would be to go the opposite direction: shrink Lucene to a
> minimal specification, and put all serious functionality into plugins.
>
> On the other hand, making giant bloatware official policy seems like
> the natural progression for Lucene. ;)
>
> Marvin Humphrey
-
Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene?
Mark Miller 2010-02-26, 21:53

You would still be able to. I still have some misgivings too, but this should not be one of them. Lucene would still exist without Solr for those that don't use Solr.

On 02/26/2010 04:44 PM, Uwe Schindler wrote:
> -1, I don't use Solr, I still want to be able to use Lucene without any Solr bloat! I lean toward Marvin's comment.
>
> Uwe

--
- Mark

http://www.lucidimagination.com
-
RE: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene?
Steven A Rowe 2010-02-26, 22:15

On 02/24/2010 at 2:20 PM, Yonik Seeley wrote:
> I've started to think that a merge of Solr and Lucene would be in the
> best interest of both projects.

The Sorlucene :) merger could be achieved virtually, i.e. via policy, rather than physically merging:

1. Transfer Solr stuff that logically belongs in Lucene over to Lucene.
2. Make Solr depend on Lucene trunk.
3. Block any future commits to either project that don't have a coordinating change for the other project.
4. Coordinate releases.

Done.

Steve
-
Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene?
Yonik Seeley 2010-02-26, 22:20

On Fri, Feb 26, 2010 at 5:15 PM, Steven A Rowe <[EMAIL PROTECTED]> wrote:
> On 02/24/2010 at 2:20 PM, Yonik Seeley wrote:
>> I've started to think that a merge of Solr and Lucene would be in the
>> best interest of both projects.
>
> The Sorlucene :) merger could be achieved virtually, i.e. via policy, rather than physically merging:

Everything is virtual here anyway :-)
I agree with Mike that a single dev list is highly desirable. There would still be separate downloads. What to do with some of the other stuff is unspecified.

Committers would need to be merged though - that's the only way to make a change across projects w/o breaking stuff.

-Yonik
-
Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene?
Michael McCandless 2010-02-28, 10:57

To make this more concrete, I think this is roughly what's being proposed:

  * Merging the dev lists into a single list.

  * Merging committers.

  * When a change is committed to Lucene, it must pass all Solr tests.

  * Release both at once.

These things would not change:

  * Most importantly, the source code would remain factored into separate dirs/modules.

  * User's lists should remain separate.

  * Web sites would remain separate.

  * Solr & Lucene are still separate downloads, separate JARs, separate subdirs in the source tree, etc.

The outside world still sees Solr & Lucene as separate entities. It's only that they would now be developed/released in synchrony.

There are some important gains by doing this:

  * Single source for all the code dup we now have across the projects (my original reason, specifically on analyzers, for starting this).

  * Whenever a new feature is added to Lucene, we'd work through what the impact is to Solr. This can still mean we separately develop exposure in Solr, but it'd get us to at least more immediately think about it.

  * Solr is Lucene's biggest direct user -- most people who use Lucene use it through Solr -- so having it more closely integrated means we know sooner if we broke something.

  * I could test right now whether flex breaks anything in Solr. I can't do that today since Solr isn't upgraded to 3.1.

Recent big changes (eg segment based searching, Version, attr based tokenstream api) caused a lot of work in Solr that could've been much smoother had Solr "been there" as we were working through them.

Recent new features, eg near-real-time search, which are unavailable in Solr still, would have at least had some discussion about how to expose this in Solr.

Over time (and we don't have to do this right on day 1) we can make core capabilities available to pure Lucene. EG core Lucene users should be able to use faceting, use a schema, etc.

I think this idea makes a lot of sense and I think now is a good time to do it. Yes, this is a big change, but I think the gains are sizable. As Lucene & Solr diverge more, it'll only become harder and harder to merge.

Robert's massive patch on SOLR-1657, upgrading most of Solr's analyzers to 3.0, is aging... while other changes to analyzers are being proposed (SOLR-1799). If we were integrated (or at least single source for analyzers), Robert would already have committed it.

Mike

On Fri, Feb 26, 2010 at 5:20 PM, Yonik Seeley <[EMAIL PROTECTED]> wrote:
> On Fri, Feb 26, 2010 at 5:15 PM, Steven A Rowe <[EMAIL PROTECTED]> wrote:
>> On 02/24/2010 at 2:20 PM, Yonik Seeley wrote:
>>> I've started to think that a merge of Solr and Lucene would be in the
>>> best interest of both projects.
>>
>> The Sorlucene :) merger could be achieved virtually, i.e. via policy, rather than physically merging:
>
> Everything is virtual here anyway :-)
> I agree with Mike that a single dev list is highly desirable. There
> would still be separate downloads. What to do with some of the other
> stuff is unspecified.
>
> Committers would need to be merged though - that's the only way to
> make a change across projects w/o breaking stuff.
>
> -Yonik
-
Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene?
Marvin Humphrey 2010-02-28, 17:32

On Sun, Feb 28, 2010 at 05:57:05AM -0500, Michael McCandless wrote:
> Robert's massive patch on SOLR-1657, upgrading most Solr's analyzers
> to 3.0, is aging... while other changes to analyzers are being
> proposed (SOLR-1799). If we were integrated (or at least single
> source for analyzers), Robert would already have committed it.

Is Analyzer's interface mature and stable enough to break out? Massive patches which can't be applied easily... that doesn't seem like a good sign.

On the other hand, if Analyzers are installed independently, they can have their own version, which could advance independently of Lucene. The need for matchVersion would go away in the context of analysis, to be replaced by a traditional versioning system which I think users would find easier to grok.

Marvin Humphrey
-
Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene?Ian Holsman 2010-02-28, 16:07
I'm not a committer here (or on SOLR), so I can't vote, but I'm
generally against this. but on the flip side I've been using SOLR for quite a while.

firstly SOLR is not the only application that uses lucene as a webservice. waiting for SOLR developers to implement re-factorings and changes made to the core will hamper lucene development. and things like katta, elastic search, neo4j, and zoie will be treated like 2nd class citizens and suffer.

It will also hamper innovative new developments, as now 'oh.. this will break SOLR', or 'SOLR can't use that easily' will stop them. I'm curious how the NRT enhancements and payload changes would have gone if they had to wait for SOLR to change stuff to make them work. and most of the SOLR dev's are on the lucene dev list anyway.

SOLR should just be treated like any API user of lucene and lucene should not be limited by SOLR.

as for the original reason.. I support breaking out the analyzers and making them more generic, or pushing down the changes SOLR (and nutch and whoever) have made back into the core.

as for the assertion that SOLR is the largest user of lucene, I don't even know how you could back that up, and even if it is today, that might change tomorrow. The web is a fickle place.

so.. I'm pretty happy with how things are going today. lucene is a library that other things can include. SOLR is a webservice using lucene.

On 2/28/10 5:57 AM, Michael McCandless wrote:
> To make this more concrete, I think this is roughly what's being
> proposed:
>
>   * Merging the dev lists into a single list.
>
>   * Merging committers.
>
>   * When a change is committed to Lucene, it must pass all Solr
>     tests.
>
>   * Release both at once.
>
> These things would not change:
>
>   * Most importantly, the source code would remain factored into
>     separate dirs/modules.
>
>   * User's lists should remain separate.
>
>   * Web sites would remain separate.
>
>   * Solr & Lucene are still separate downloads, separate JARs,
>     separate subdirs in the source tree, etc.
>
> The outside world still sees Solr & Lucene as separate entities. It's
> only that they would now be developed/released in synchrony.
>
> There are some important gains by doing this:
>
>   * Single source for all the code dup we now have across the
>     projects (my original reason, specifically on analyzers, for
>     starting this).
>
>   * Whenever a new feature is added to Lucene, we'd work through what
>     the impact is to Solr. This can still mean we separately develop
>     exposure in Solr, but it'd get us to at least more immediately
>     think about it.
>
>   * Solr is Lucene's biggest direct user -- most people who use Lucene
>     use it through Solr -- so having it more closely integrated means
>     we know sooner if we broke something.
>
>   * Right now I could test whether flex breaks anything in Solr. I
>     can't do that now since Solr isn't upgraded to 3.1.
>
> Recent big changes (eg segment based searching, Version, attr based
> tokenstream api) caused a lot of work in Solr that could've been much
> smoother had Solr "been there" as we were working through them.
>
> Recent new features, eg near-real-time search, which are unavailable
> in Solr still, would have at least had some discussion about how to
> expose this in Solr.
>
> Over time (and we don't have to do this right on day 1) we can make
> core capabilities available to pure Lucene. EG core Lucene users
> should be able to use faceting, use a schema, etc.
>
> I think this idea makes a lot of sense and I think now is a good time
> to do it. Yes, this is a big change, but I think the gains are sizable.
> As Lucene & Solr diverge more, it'll only become harder and harder to
> merge.
>
> Robert's massive patch on SOLR-1657, upgrading most Solr's analyzers
> to 3.0, is aging... while other changes to analyzers are being
> proposed (SOLR-1799). If we were integrated (or at least single
Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene? Mattmann, Chris A 2010-02-28, 17:27
Hi All,
+1, I'm with Ian on this one. Loose coupling is always better in these types of situations...

Cheers,
Chris

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: [EMAIL PROTECTED]
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene? Shalin Shekhar Mangar 2010-02-28, 18:32
On Sun, Feb 28, 2010 at 9:37 PM, Ian Holsman <[EMAIL PROTECTED]> wrote:
>
> waiting for SOLR developers to implement re-factorings and changes made to
> the core will hamper lucene development.
> and things like katta, elastic search, neo4j, and zoie will be treated like
> 2nd class citizens and suffer.
>

Lucene changes don't need to be in Solr immediately and they won't be, until somebody has the itch.

Many Lucene bugs have been caught by Solr's tests and making sure that a change passes Solr's test suite is a good thing. A Lucene change that fails Solr's tests is either a bug or a backwards-incompatible API change. If it is the latter then I believe changing Solr is a good lesson in the magnitude of changes needed in a typical Lucene application. Possibly, those lessons can lead to a more flexible/simpler API.

This is relevant for new features as well. For example, look at how the trie range query was affected when Solr came into the picture.

I know that many Lucene developers like to use newer features as soon as possible. But seriously, how many update their Lucene applications to support these changes in sync with a patch or even trunk? _*Striving*_ to keep Solr in sync with Lucene will give instant feedback which, I think, will help us build better APIs and give Lucene users a better experience.

Consider another argument: Solr's use of Lucene can be advertised as a best-practice which can be a huge help for Lucene users. You want to know how to add caching on top of Lucene? See Solr. Replication? See Solr etc.

As far as the other projects are concerned, I don't see why they will be treated as 2nd class citizens. The Lucene core will continue to be separate and if some of Solr's features are available to those projects in an easy to assimilate Java API, they too benefit from it. It is a win-win situation.

> It will also hamper innovative new developments, as now 'oh.. this will
> break SOLR', or 'SOLR can't use that easily' will stop them. I'm curious how
> the NRT enhancements and payload changes would have gone if they had to wait
> for SOLR to change stuff to make them work. and most of the SOLR dev's are
> on the lucene dev list anyway.
>

Again, nobody is proposing that all new features must have corresponding support in Solr. New features are anyway designed to be backward compatible and all the proposal says is that the changes should not break Solr, which makes sense.

--
Regards,
Shalin Shekhar Mangar.
Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene? Mark Miller 2010-02-28, 18:43
On 02/28/2010 01:32 PM, Shalin Shekhar Mangar wrote:
> A Lucene
> change that fails Solr's tests is either a bug or a backwards-incompatible
> API change.
>

Not always. I still argue that per segment searching was a valid change that was backwards compatible - but it broke Solr because Solr ignores MultiSearcher and went on the assumption that a single Searcher had access to the entire index. That's somewhat against the design of Lucene, which doesn't (and can't) make such assumptions.

--
- Mark

http://www.lucidimagination.com
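[Editor's illustration] For readers unfamiliar with the per-segment change Mark describes, the following is a minimal sketch of the general idea in plain Python. It is not Lucene or Solr code: `Segment`, `doc_base`, `build_segments`, and `search_per_segment` are hypothetical names made up for this example, and the documents are toy strings. It only shows why code that assumes one searcher sees the whole index can misread document IDs once a search runs one segment at a time.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    """Hypothetical stand-in for one index segment (not a Lucene class)."""
    doc_base: int    # index-wide ID of this segment's first document
    docs: List[str]  # document texts, addressed by segment-local ID


def build_segments(batches: List[List[str]]) -> List[Segment]:
    """Put each batch of documents in its own segment with a running doc_base."""
    segments, base = [], 0
    for docs in batches:
        segments.append(Segment(doc_base=base, docs=docs))
        base += len(docs)
    return segments


def search_per_segment(segments: List[Segment], term: str,
                       rebase: bool = True) -> List[int]:
    """Search one segment at a time; return IDs of documents containing term."""
    hits = []
    for seg in segments:
        for local_id, text in enumerate(seg.docs):
            if term in text.split():
                # local_id restarts at 0 in every segment, so it has to be
                # offset by doc_base to identify a document index-wide.
                hits.append(seg.doc_base + local_id if rebase else local_id)
    return hits


segments = build_segments([["solr search", "lucene core"], ["lucene analyzers"]])
global_hits = search_per_segment(segments, "lucene")               # rebased IDs
local_hits = search_per_segment(segments, "lucene", rebase=False)  # raw per-segment IDs
print(global_hits, local_hits)
```

With `rebase=False` the hit in the second segment comes back as ID 0, which a caller assuming a single whole-index searcher would take to be the first document in the index. Whether this is the exact way Solr was affected is not stated in the thread; the sketch only illustrates the kind of assumption involved.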
Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene? Michael Busch 2010-02-28, 17:52
I'm not very happy with this proposal. I certainly understand what is
being tried to achieve though. I'd like to see a tighter integration and communication between Lucene core and SOLR too, but the proposed requirements seem much too strict. For example, I think it's a good idea for SOLR to ride on Lucene's trunk again. This will show potential problems of API changes and new features in Lucene much more quickly. It will also help SOLR to use new Lucene features much more quickly.

However, I'm -1 for these points:

  * When a change is committed to Lucene, it must pass all Solr tests.

  * Release both at once.

SOLR is a consumer of Lucene's API. So what this requirement basically translates to is that I, as a Lucene committer, now have to not only make sure Lucene's backwards-compatibility is ensured, but also that I make all necessary changes in SOLR. So I have to know much more code suddenly and potentially make many more changes. But this doesn't help all the other Lucene consumers out there. I invested several weeks upgrading our software at IBM to 3.0 APIs, because I had 5000 compile errors.

I think the Lucene backwards-compatibility policy is very strict already and it often takes more time working on bw-compat than the actual feature. With the additional requirement above this will get worse, and I'm afraid it might slow down Lucene's progress.

I don't disagree that things like moving function queries from SOLR to Lucene have failed - but we have to ask why they weren't added to Lucene in the first place. Was there ever a discussion whether those queries should be added to Lucene or SOLR when they were developed? Or I'd also love to see a powerful facet engine in Lucene, and SOLR would build its faceting features on top of those APIs.

So I'm +1 for better communication (maybe even merging the dev lists) and especially talking about where a new feature should live before working on a patch.
Michael
Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene? Grant Ingersoll 2010-03-01, 00:30
On Feb 28, 2010, at 9:52 AM, Michael Busch wrote:

> I'm not very happy with this proposal. I certainly understand what is
> being tried to achieve though. I'd like to see a tighter integration
> and communication between Lucene core and SOLR too, but the proposed
> requirements seem much too strict. For example, I think it's a good
> idea for SOLR to ride on Lucene's trunk again. This will show
> potential problems of API changes and new features in Lucene much more
> quickly. It will also help SOLR to use new Lucene features much more quickly.
>
> However, I'm -1 for these points:
>
> * When a change is committed to Lucene, it must pass all Solr tests.

Not sure why more tests would be a negative. The Solr tests exercise quite a bit of Lucene functionality as well.

-Grant
Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene? Michael Busch 2010-03-01, 05:05
On 2/28/10 4:30 PM, Grant Ingersoll wrote:
> Not sure why more tests would be a negative. The Solr tests exercise quite a bit of Lucene functionality as well.
>
> -Grant
>

Sorry, I should have made myself clearer here. It'd obviously be silly to argue against more test coverage. In general I think it's a great idea to run the Solr tests also when testing a Lucene patch.

I'm just not happy about making this a formal requirement (that Solr tests have to pass in order to commit a Lucene patch). All backwards-incompatible patches, which we had quite a few of in 2.9 and 3.0, would then become even more difficult to commit, because you have to make all changes then in Solr too as part of the Lucene patch. Think about changes like per-segment search or the new TokenStream API and how difficult and time consuming they were for core and contrib already. For backwards-compatible changes, by all means, let's run as many tests as we can.

We have all been saying we want to have more frequent releases. Right now Lucene has no external dependencies that could slow down a release and still we don't release as frequently as we'd like to. If we add dependencies like release alignment with subprojects I'm afraid this will become worse.

I was really happy about the original idea of having a separate analyzer module (or subproject, library, whatever name it'd have), because analysis seems quite separate from indexing/search. Separating the two seems logical. And why not release such an analyzer package more frequently than Lucene. Different pieces of code don't all move with the same pace. It'd be nice to have the freedom of releasing an analyzer library after e.g. a new language was added, maybe even only two weeks after the previous release. IMO more modular release cycles is a better way to go than this new proposal.

I'd be happy if the Solr developers would be more involved in Lucene (again) and if we would discuss new ideas with the question in mind, where the new feature should live. And also the Lucene developers who are not very involved in Solr should understand the impact that Lucene changes have on Solr. So big +1 for better communication between Solr and Lucene devs!

Michael
Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene? Grant Ingersoll 2010-03-01, 14:28
On Feb 28, 2010, at 9:05 PM, Michael Busch wrote:

> On 2/28/10 4:30 PM, Grant Ingersoll wrote:
>
> I was really happy about the original idea of having a separate analyzer module (or subproject, library, whatever name it'd have), because analysis seems quite separate from indexing/search. Separating the two seems logical. And why not release such an analyzer package more frequently than Lucene. Different pieces of code don't all move with the same pace. It'd be nice to have the freedom of releasing an analyzer library after e.g. a new language was added, maybe even only two weeks after the previous release. IMO more modular release cycles is a better way to go than this new proposal.

Yeah, but you know the Analyzers are just the start. Next it's faceting, then some other piece, b/c let's all face facts: Solr is more or less what you build when you build a Lucene search application. People say they don't want all the "bloat" (AFAICT, what they really mean is they prefer their own bloat, since every implementation I ever see of Lucene looks damn well a lot like Solr and I've seen _a lot_ of implementations). So, to me, why not just get it over with? One of the outcomes of it could easily be that Solr is more modular anyway, meaning people can pick and choose more what they want (although they already can).

Also, as Doug alluded to, the Board is likely to ask us to consider less subprojects in the future, so we may be consolidating and spinning off anyway.

-Grant
Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene? Grant Ingersoll 2010-03-01, 14:33
On Mar 1, 2010, at 6:28 AM, Grant Ingersoll wrote:

> On Feb 28, 2010, at 9:05 PM, Michael Busch wrote:
>
>> On 2/28/10 4:30 PM, Grant Ingersoll wrote:
>>
>> I was really happy about the original idea of having a separate analyzer module (or subproject, library, whatever name it'd have), because analysis seems quite separate from indexing/search. Separating the two seems logical. And why not release such an analyzer package more frequently than Lucene. Different pieces of code don't all move with the same pace. It'd be nice to have the freedom of releasing an analyzer library after e.g. a new language was added, maybe even only two weeks after the previous release. IMO more modular release cycles is a better way to go than this new proposal.
>
> Yeah, but you know the Analyzers are just the start. Next it's faceting, then some other piece, b/c let's all face facts: Solr is more or less what you build when you build a Lucene search application. People say they don't want all the "bloat" (AFAICT, what they really mean is they prefer their own bloat, since every implementation I ever see of Lucene looks damn well a lot like Solr and I've seen _a lot_ of implementations). So, to me, why not just get it over with? One of the outcomes of it could easily be that Solr is more modular anyway, meaning people can pick and choose more what they want (although they already can).
>

But, like Mark said, even w/ such a proposed move, people can still happily keep their "bloated" code, too! So, don't take me as implying we would be forcing it on everyone. So, all those other 3rd party sub projects would still be just fine.

-Grant
Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene? Mattmann, Chris A 2010-03-01, 15:04
Hey Grant,
I'd like to explore this - does this imply that the Lucene sub-projects will go away and Lucene will turn into Lucene-java and maintain its Apache TLP, and then you'd have say, solr.apache.org, tika.apache.org, mahout.apache.org (already started), etc. etc.? If so, that may be the best of all worlds, allowing project independence, but also not following the Apache "antipattern" as Doug put it...

Cheers,
Chris

On 3/1/10 7:28 AM, "Grant Ingersoll" <[EMAIL PROTECTED]> wrote:

> Also, as Doug alluded to, the Board is likely to ask us to consider less
> subprojects in the future, so we may be consolidating and spinning off anyway.

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: [EMAIL PROTECTED]
Phone: +1 (818) 354-8810
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene? Mark Miller 2010-03-01, 15:27
{quote}If so, that may be the best of all worlds,
allowing project independence, but also not following the Apache "antipattern" as Doug put it...{quote}

That would really be no real world change from how things work today. The fact is, today, Solr already operates essentially as an independent project. The only real difference is that it shares the same PMC with Lucene now and wouldn't with this change. This would address none of the issues that triggered the idea for a possible merge.

--
- Mark

http://www.lucidimagination.com
Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene? Mattmann, Chris A 2010-03-01, 15:40
Hi Mark,
> That would really be no real world change from how things work today. The fact
> is, today, Solr already operates essentially as an independent project.

Well if that's the case, then it would lead me to think that it's more of a TLP more than anything else per best practices.

> The only real difference is that it shares the same PMC with Lucene now and
> wouldn't with this change. This would address none of the issues that
> triggered
> the idea for a possible merge.

I don't agree -- you're looking to bring together two communities that are "fairly separate" as you put it. The separation likely didn't spring up overnight and has been this way for a while (at least to my knowledge). This is exactly the type of situation that typically leads to TLP creation from what I've seen.

Cheers,
Chris
Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene? Mark Miller 2010-03-01, 15:54
On 03/01/2010 10:40 AM, Mattmann, Chris A (388J) wrote:
> Hi Mark,
>
>> That would really be no real world change from how things work today. The fact
>> is, today, Solr already operates essentially as an independent project.
>
> Well if that's the case, then it would lead me to think that it's more of a
> TLP more than anything else per best practices.
>

That depends. It could be argued it should be a top level project or that it should be closer to the Lucene project. Some people are arguing for both approaches right now. There are two directions we could move in.

>> The only real difference is that it shares the same PMC with Lucene now and
>> wouldn't with this change. This would address none of the issues that
>> triggered
>> the idea for a possible merge.
>
> I don't agree -- you're looking to bring together two communities that are
> "fairly separate" as you put it. The separation likely didn't spring up
> overnight and has been this way for a while (at least to my knowledge). This is
> exactly the type of situation that typically leads to TLP creation from what
> I've seen.
>

It also causes negatives between Solr/Lucene that some are looking to address. Hence the birth of this proposal. Going TLP with Solr will only aggravate those negatives, not help them.

While the communities operate fairly separately at the moment, the people in the communities are not so separate. The committer list has huge overlap. Many committers on one project but not the other do a lot of work on both projects. There is already a strong link with the personnel - merging the management of the projects addresses many of the concerns that have prompted this discussion. TLP'ing Solr only makes those concerns multiply. They would diverge further, and incompatible overlap between them would increase.

> Cheers,
> Chris

--
- Mark

http://www.lucidimagination.com
Mark Miller 2010-03-01, 15:54
Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene? Mattmann, Chris A 2010-03-01, 16:06
Hi Mark,
Thanks for your message. I respect your viewpoint, but I respectfully disagree. It just seems (to me at least based on the discussion) like a TLP for Solr is the way to go.

Cheers,
Chris

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: [EMAIL PROTECTED]
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene? Mark Miller 2010-03-01, 16:11
That's fine with me ;)
I can certainly see people thinking both ways. I'm sure neither approach is a clear win in every aspect.

- Mark
Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene? Robert Muir 2010-03-01, 16:12
this will make the analyzers duplication problem even worse
Robert Muir
[EMAIL PROTECTED]
Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene? Mattmann, Chris A 2010-03-01, 16:20
Hi Robert,
I think my proposal (Solr->TLP) is sort of orthogonal to the whole analyzers issue - I was in favor, at the very least, of having a separate module/project/whatever that both Solr/Lucene (and whatever project) can depend on for the shared analyzer code...

Cheers,
Chris
Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene? Grant Ingersoll 2010-03-01, 16:57
On Mar 1, 2010, at 8:20 AM, Mattmann, Chris A (388J) wrote:

> Hi Robert,
>
> I think my proposal (Solr->TLP) is sort of orthogonal to the whole analyzers issue - I was in favor, at the very least, of having a separate module/project/whatever that both Solr/Lucene (and whatever project) can depend on for the shared analyzer code...

Not really. They are intimately linked.
Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene? Mattmann, Chris A 2010-03-01, 17:01
Hi Grant,
> On Mar 1, 2010, at 8:20 AM, Mattmann, Chris A (388J) wrote:
>
>> Hi Robert,
>>
>> I think my proposal (Solr->TLP) is sort of orthogonal to the whole analyzers
>> issue - I was in favor, at the very least, of having a separate
>> module/project/whatever that both Solr/Lucene (and whatever project) can
>> depend on for the shared analyzer code...
>
> Not really. They are intimately linked.

Ummm, how so? Making project A called "Apache Super Analyzers" and then making Lucene(-java) and Solr depend on Apache Super Analyzers is separate from whether or not Lucene(-java) and Solr are TLPs...

Cheers,
Chris
Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene? Michael McCandless 2010-03-01, 17:44
If we don't somehow first address the code duplication across the 2 projects, making Solr a TLP will make things worse.

I started here with analysis because I think that's the biggest pain point: it seemed like an obvious first step to fixing the code duplication and thus the most likely to reach some consensus. And it's also very timely: Robert is right now making all kinds of great fixes to our collective analyzers (in between bouts of fuzzy DFA debugging).

But it goes beyond analyzers: I'd like to see other modules, now in Solr, eventually moved to Lucene, because they really are "core" functionality (eg facets, function (and other?) queries, spatial, maybe improvements to spellchecker/highlighter).

How can we do this? And how can we do this so that it "lasts" over time? If new cool "core" things are born in Solr-land (which of course happens a lot -- lots of good healthy usage), how will they find their way back to Lucene?

Yonik's proposal (merging development of Solr/Lucene, but keeping all else separate) would achieve this.

If we do the opposite (Solr -> TLP), how could we possibly achieve this?

I guess one possibility is to just suck it up and duplicate the code. Meaning, each project will have to manually merge fixes in from the other project (so long as there's someone around with the itch to do so). Lucene would copy in all of Solr's analysis, and vice-versa (and likewise other dup'd functionality).

I really dislike this solution... it will confuse the daylights out of users, it's error-prone, it's a waste of dev effort, there will always be little differences... but maybe it is in fact the lesser evil?

I would much prefer merging Solr/Lucene development...

Mike
Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene? Chris Hostetter 2010-03-01, 18:43
(Man, why is it you guys alwasy decide to start the monolithic "let's redesign the world" threads while i'm offline for a few days ... I figured at worst I'd 'svn up' and discover that McCandless had reimplemented all of the indexing code in Scala, but i certainly wasn't expecting all of this.) As some one who has attempted to read it all at once, let me just say that this thread is way too big. I say this not as a facetious comment about the number of messages or the depth of replies but as a serious comment about the breadth and depth of the core issues that people seem to be trying to address in a monolithic fashion -- monolithic suggestions which are in many ways diametricly opposed to each other. Without obvious concensious on where we want to go, or a clear sense of how well things will work when we there "there" it seems most productive to focus on what would be needed to achieve some incremental steps that could be productive for any/all goals. At it's core: this thread started with McCandless'ss suggestion that refactoring some of text analysis code from Solr, Nutch and Lucene-Java out of all three projects and into a common code base would be beneficial to all three subprojects -- Not only do I see no flaw to that reasoning, but it also seems like it would (oddly enough) serve as a good first step towards *either* tighter development integration between Lucene-Java and Solr, *OR* towards looser development of the two code bases (via making Solr a seperate TLP). Developing a new code module like this should help demonstrate / excercise some of the "process" issues that might come up in trying to integrate the development and release processes of the existing products. If things work out "well" that may illustrate that tighter integration is better; if things work out "poor" that should also tells us something, and may give us guidance on how to move forward. 
In the worst case scenerio that i can imagine: some code is refactored out of Solr and Nutch in a way that makes it more directly usable by other comsumers of Lucene-Java. (Even if Solr and Nutch never use that code and become their own TLPs and succed from the ASF to become caribbean tax haven that seems like a Net win for Lucene-Java) To put the issue another way: Does anyone see how McCandless'ss suggestion would be counter-productive towards your vision of what Lucene/Solr/Nutch should be like in the future? (regardless of your particular vision is) ... : I started here with analysis because I think that's the biggest pain : point: it seemed like an obvious first step to fixing the code : duplication and thus the most likely to reach some consensus. And : it's also very timely: Robert is right now making all kinds of great : fixes to our collective analyzers (in between bouts of fuzzy DFA : debugging). -Hoss +
Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene?
Mattmann, Chris A 2010-03-01, 18:48

Hey Hoss,

I support Mike's original suggestion of having a shared, independently maintained/released analysis package for Nutch/Solr/Lucene. I emphatically do not support merging Solr and Lucene in the way proposed.

Hope that clarifies things, at least from me.

Cheers,
Chris

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: [EMAIL PROTECTED]
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene?
Mark Miller 2010-03-01, 19:27

On 03/01/2010 01:43 PM, Chris Hostetter wrote:
> (Man, why is it you guys always decide to start the monolithic
> "let's redesign the world" threads while i'm offline for a few days ...
> I figured at worst I'd 'svn up' and discover that McCandless had
> reimplemented all of the indexing code in Scala, but i certainly wasn't
> expecting all of this.)
>
> As someone who has attempted to read it all at once, let me just say that
> this thread is way too big.
>
> I say this not as a facetious comment about the number of messages or the
> depth of replies but as a serious comment about the breadth and depth of
> the core issues that people seem to be trying to address in a monolithic
> fashion -- monolithic suggestions which are in many ways diametrically
> opposed to each other.
>

Personally, I don't think the idea of a merge is too big. I think the implications of it are less than you are making them out to be. Monolithic suggestions? Let's half merge? Let's draft a resolution indicating that both Lucene and Solr devs would like to possibly play nicer together with more communication? I don't think there are a lot of baby steps towards this goal that will have any meaning or ramifications.

> Without obvious consensus on where we want to go, or a clear sense of
> how well things will work when we get "there", it seems most productive
> to focus on what would be needed to achieve some incremental steps that
> could be productive for any/all goals.
>

That sounds like magic to me :) Or focusing on stuff that has nothing to do with a merge or TLP.

> At its core: this thread started with McCandless's suggestion that
> refactoring some of the text analysis code from Solr, Nutch and Lucene-Java
> out of all three projects and into a common code base would be beneficial
> to all three subprojects -- not only do I see no flaw in that reasoning,
> but it also seems like it would (oddly enough) serve as a good first step
> towards *either* tighter development integration between Lucene-Java and
> Solr, *OR* towards looser development of the two code bases (via making
> Solr a separate TLP).
>
> Developing a new code module like this should help demonstrate / exercise
> some of the "process" issues that might come up in trying to integrate the
> development and release processes of the existing products. If things
> work out "well" that may illustrate that tighter integration is better; if
> things work out "poorly" that should also tell us something, and may give
> us guidance on how to move forward. In the worst case scenario that i can
> imagine: some code is refactored out of Solr and Nutch in a way that makes
> it more directly usable by other consumers of Lucene-Java. (Even if Solr
> and Nutch never use that code and become their own TLPs and secede from
> the ASF to become a Caribbean tax haven, that seems like a net win for
> Lucene-Java.)
>
> To put the issue another way: Does anyone see how McCandless's suggestion
> would be counter-productive towards your vision of what Lucene/Solr/Nutch
> should be like in the future? (regardless of what your particular vision is)
>

No, not necessarily - but I don't think it's going to tell us anything useful about a merge. It's just going to factor out some analyzers into what is likely going to be yet *another* project with more "do we run on trunk" or "don't we" issues. Or it will be a Lucene contrib, and cause us even more headaches due to Solr not running on trunk.

> ...
>
> : I started here with analysis because I think that's the biggest pain
> : point: it seemed like an obvious first step to fixing the code
> : duplication and thus the most likely to reach some consensus. And
> : it's also very timely: Robert is right now making all kinds of great
> : fixes to our collective analyzers (in between bouts of fuzzy DFA
> : debugging).
>
> -Hoss
>

--
- Mark

http://www.lucidimagination.com
Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene?
Mattmann, Chris A 2010-03-01, 18:07

Hi Mike,

I'm not sure I follow this line of thinking: how would Solr being a TLP affect the creation of a separate project/module for Analyzers any more so than it not being a TLP? Both Lucene-java and Solr (as a TLP) could depend on the newly created refactored Analysis project.

Chris

On 3/1/10 10:44 AM, "Michael McCandless" <[EMAIL PROTECTED]> wrote:

If we don't somehow first address the code duplication across the 2 projects, making Solr a TLP will make things worse.

I started here with analysis because I think that's the biggest pain point: it seemed like an obvious first step to fixing the code duplication and thus the most likely to reach some consensus. And it's also very timely: Robert is right now making all kinds of great fixes to our collective analyzers (in between bouts of fuzzy DFA debugging).

But it goes beyond analyzers: I'd like to see other modules, now in Solr, eventually moved to Lucene, because they really are "core" functionality (eg facets, function (and other?) queries, spatial, maybe improvements to spellchecker/highlighter). How can we do this?

And how can we do this so that it "lasts" over time? If new cool "core" things are born in Solr-land (which of course happens a lot -- lots of good healthy usage), how will they find their way back to Lucene?

Yonik's proposal (merging development of Solr/Lucene, but keeping all else separate) would achieve this.

If we do the opposite (Solr -> TLP), how could we possibly achieve this?

I guess one possibility is to just suck it up and duplicate the code. Meaning, each project will have to manually merge fixes in from the other project (so long as there's someone around with the itch to do so). Lucene would copy in all of Solr's analysis, and vice-versa (and likewise other dup'd functionality). I really dislike this solution... it will confuse the daylights out of users, it's error-prone, it's a waste of dev effort, there will always be little differences... but maybe it is in fact the lesser evil?

I would much prefer merging Solr/Lucene development...

Mike

On Mon, Mar 1, 2010 at 12:01 PM, Mattmann, Chris A (388J) <[EMAIL PROTECTED]> wrote:
> Hi Grant,
>
>> On Mar 1, 2010, at 8:20 AM, Mattmann, Chris A (388J) wrote:
>>
>>> Hi Robert,
>>>
>>> I think my proposal (Solr->TLP) is sort of orthogonal to the whole analyzers
>>> issue - I was in favor, at the very least, of having a separate
>>> module/project/whatever that both Solr/Lucene (and whatever project) can
>>> depend on for the shared analyzer code...
>>
>> Not really. They are intimately linked.
>
> Ummm, how so? Making project A called "Apache Super Analyzers" and then
> making Lucene(-java) and Solr depend on Apache Super Analyzers is separate
> of whether or not Lucene(-java) and Solr are TLPs or not...
>
> Cheers,
> Chris
>
>>>
>>> Cheers,
>>> Chris
>>>
>>> On 3/1/10 9:12 AM, "Robert Muir" <[EMAIL PROTECTED]> wrote:
>>>
>>> this will make the analyzers duplication problem even worse
>>>
>>> On Mon, Mar 1, 2010 at 11:06 AM, Mattmann, Chris A (388J) <[EMAIL PROTECTED]> wrote:
>>>
>>>> Hi Mark,
>>>>
>>>> Thanks for your message. I respect your viewpoint, but I respectfully
>>>> disagree. It just seems (to me at least based on the discussion) like a TLP
>>>> for Solr is the way to go.
>>>>
>>>> Cheers,
>>>> Chris
>>>>
>>>> On 3/1/10 8:54 AM, "Mark Miller" <[EMAIL PROTECTED]> wrote:
>>>>
>>>> On 03/01/2010 10:40 AM, Mattmann, Chris A (388J) wrote:
>>>>> Hi Mark,
>>>>>
>>>>>> That would really be no real world change from how things work today. The fact
>>>>>> is, today, Solr already operates essentially as an independent project.
>>>>>>
>>>>> Well if that's the case, then it would lead me to think that it's more of a
>>>>> TLP more than anything else per best practices.
>>>>>
>>>> That depends. It could be argued it should be a top level project or
>>>> that it should be closer to the Lucene project. Some people are arguing
Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene?
Michael McCandless 2010-03-01, 18:25

Because the code dup with analyzers is only one of the problems to solve. In fact, it's the easiest of the problems to solve (that's why I proposed it, only, first).

A more differentiating example is a much less mature module....

EG take spatial -- if Solr were its own TLP, how could spatial be built out in a way that we don't waste effort, and so that both direct Lucene and Solr users could use it when it's released?

Mike
Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene?
Mattmann, Chris A 2010-03-01, 18:28

I'm glad that you brought that up! :)

Check out:

http://incubator.apache.org/projects/sis.html

We're just starting to tackle that very issue right now...patches/ideas/contributions welcome.

Cheers,
Chris
Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene?
Michael McCandless 2010-03-01, 18:46

This looks great!

But, the goal is to make a standalone toolkit exposing GIS functions, right?

My original question (integrating this into Lucene/Solr) remains.

EG there's a lot of good work happening now in Solr to make spatial search available. How will that find its way back to Lucene? Lucene has its own (now duplicate) spatial package that was already developed. Users will now be confused about the two; each has different bugs/features, etc.

If we had shared development then the ongoing effort would result in a spatial package that direct Lucene users and Solr users would be able to use.

Mike
Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene?
patrick o'leary 2010-03-02, 08:26

Here's my view on it..

Developing GIS support for lucene took a little bit of time and patience and a couple of iterations, from a basic concept to get buy-in to spend more time working on it, to an "OMG this does what we need, build more more more..."

The lucene version of this was easy enough to support, however Solr support was a different kettle of fish. From really crude duplication or query handlers and write templates to inject distance features, to today where it's a little more componentised, but still a little crude unless you want to cut back on scalability and functionality by using function queries.

I guess my point is that Solr has always required more effort, and solutions that constantly drove me further away from the initial lucene implementation. In my mind if I make something work in lucene, it should be easy to just 'plug-in' to Solr, but that is definitely not the case: leaf index readers, NumericalUtils, Trie all came at major development costs that were not present in lucene development.

The spatial efforts going on in Solr, who knows if they will make it back to lucene, but at the same time has this gap between both systems grown to the point that porting is not a worthwhile effort?

I honestly don't want to maintain both systems, but find that to allow for solr support I have to do a lot more "hacking".
RE: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene?
Steven A Rowe 2010-03-01, 18:41

Hi Chris,

On 03/01/2010 at 1:28 PM, Mattmann, Chris A (388J) wrote:
> http://incubator.apache.org/projects/sis.html
>
> We're just starting to tackle that very issue right
> now...patches/ideas/contributions welcome.

Patches? SVN <https://svn.apache.org/repos/asf/incubator/sis/> looks empty ATM:

    asf - Revision 917638: /incubator/sis
      * ..
    Powered by Subversion version 1.6.9 (r901367).

Also, the website <http://incubator.apache.org/sis/> doesn't seem to exist?:

    Not Found
    The requested URL /sis/ was not found on this server.
    Apache/2.3.5 (Unix) mod_ssl/2.3.5 OpenSSL/0.9.7d mod_fcgid/2.3.2-dev Server at incubator.apache.org Port 80

Steve
Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene?
Mattmann, Chris A 2010-03-01, 18:46

Hey Steve,

Thanks! Yep we just started, and got our mailing lists set up after the positive Incubation vote. You can read the project proposal here:

http://wiki.apache.org/incubator/SpatialProposal

Cheers,
Chris
Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene? Marvin Humphrey 2010-03-01, 17:58
On Mon, Mar 01, 2010 at 12:44:02PM -0500, Michael McCandless wrote:
> But it goes beyond analyzers: I'd like to see other modules, now in
> Solr, eventually moved to Lucene, because they really are "core"
> functionality (eg facets, function (and other?) queries, spatial,
> maybe improvements to spellchecker/highlighter).

I disagree. Those don't belong in core, and though they are all great features, adding them to core constitutes "bloat", IMO.

The Query class belongs in core. All those other modules should be distributed as plugins, which could be used by Solr, Katta, Lucene, whatever.

Note that this is orthogonal to whether Solr and Lucene merge or diverge.

Marvin Humphrey
Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene? Michael McCandless 2010-03-01, 18:03
On Mon, Mar 1, 2010 at 12:58 PM, Marvin Humphrey <[EMAIL PROTECTED]> wrote:
> On Mon, Mar 01, 2010 at 12:44:02PM -0500, Michael McCandless wrote:
>
>> But it goes beyond analyzers: I'd like to see other modules, now in
>> Solr, eventually moved to Lucene, because they really are "core"
>> functionality (eg facets, function (and other?) queries, spatial,
>> maybe improvements to spellchecker/highlighter).
>
> I disagree. Those don't belong in core, and though they are all
> great features, adding them to core constitutes "bloat", IMO.
>
> The Query class belongs in core. All those other modules should be
> distributed as plugins, which could be used by Solr, Katta, Lucene,
> whatever.
>
> Note that this is orthogonal to whether Solr and Lucene merge or
> diverge.

I agree with this (sorry I wasn't clear).

By "core functionality" I mean it should be a separate module (plugin) that direct Lucene users can use, not "whenever you install core Lucene you get these functions".

Ie, users shouldn't have to install Solr to use facets with Lucene.

Mike
Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene? Michael McCandless 2010-03-01, 18:38
Also, there still seems to be a misconception on what's being proposed
here. The proposal is to synchronize the development of Solr and Lucene. Ie, a single dev list, single set of committers, synchronized releases.

Everything else remains the same. EG the release artifacts, users' lists, web sites, branding, all remain separate.

How the source code is modularized is an orthogonal question. We've discussed breaking things out of Lucene's core, like query parser, queries, analyzers, into their own modules (and shipping their own artifacts), which I still think makes great sense. But it's independent of synchronizing our development.

Mike

On Mon, Mar 1, 2010 at 1:03 PM, Michael McCandless <[EMAIL PROTECTED]> wrote:
> On Mon, Mar 1, 2010 at 12:58 PM, Marvin Humphrey <[EMAIL PROTECTED]> wrote:
>> On Mon, Mar 01, 2010 at 12:44:02PM -0500, Michael McCandless wrote:
>>
>>> But it goes beyond analyzers: I'd like to see other modules, now in
>>> Solr, eventually moved to Lucene, because they really are "core"
>>> functionality (eg facets, function (and other?) queries, spatial,
>>> maybe improvements to spellchecker/highlighter).
>>
>> I disagree. Those don't belong in core, and though they are all
>> great features, adding them to core constitutes "bloat", IMO.
>>
>> The Query class belongs in core. All those other modules should be
>> distributed as plugins, which could be used by Solr, Katta, Lucene,
>> whatever.
>>
>> Note that this is orthogonal to whether Solr and Lucene merge or
>> diverge.
>
> I agree with this (sorry I wasn't clear).
>
> By "core functionality" I mean it should be a separate module (plugin)
> that direct Lucene users can use, not "whenever you install core
> Lucene you get these functions".
>
> Ie, users shouldn't have to install Solr to use facets with Lucene.
>
> Mike
Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene? Michael Busch 2010-03-01, 18:13
It seems like most of the people agree with these good goals but are
concerned about the release cycles (including me). How can we achieve these goals without making releases more difficult? Michael On 3/1/10 9:44 AM, Michael McCandless wrote: > If we don't somehow first address the code duplication across the 2 > projects, making Solr a TLP will make things worse. > > I started here with analysis because I think that's the biggest pain > point: it seemed like an obvious first step to fixing the code > duplication and thus the most likely to reach some consensus. And > it's also very timely: Robert is right now making all kinds of great > fixes to our collective analyzers (in between bouts of fuzzy DFA > debugging). > > But it goes beyond analyzers: I'd like to see other modules, now in > Solr, eventually moved to Lucene, because they really are "core" > functionality (eg facets, function (and other?) queries, spatial, > maybe improvements to spellchecker/highlighter). How can we do this? > > And how can we do this so that it "lasts" over time? If new cool > "core" things are born in Solr-land (which of course happens alot -- > lots of good healthy usage), how will they find their way back to > Lucene? > > Yonik's proposal (merging development of Solr/Lucene, but keeping all > else separate) would achieve this. > > If we do the opposite (Solr -> TLP), how could we possibly achieve > this? > > I guess one possibility is to just suck it up and duplicate the code. > Meaning, each project will have to manually merge fixes in from the > other project (so long as there's someone around with the itch to do > so). Lucene would copy in all of Solr's analysis, and vice-versa (and > likewise other dup'd functionality). I really dislike this > solution... it will confuse the daylights out of users, its error > proned, it's a waste of dev effort, there will always be little > differences... but maybe it is in fact the lesser evil? > > I would much prefer merging Solr/Lucene development... 
> > Mike > > On Mon, Mar 1, 2010 at 12:01 PM, Mattmann, Chris A (388J) > <[EMAIL PROTECTED]> wrote: > >> Hi Grant, >> >> >>> On Mar 1, 2010, at 8:20 AM, Mattmann, Chris A (388J) wrote: >>> >>> >>>> Hi Robert, >>>> >>>> I think my proposal (Solr->TLP) is sort of orthogonal to the whole analyzers >>>> issue - I was in favor, at the very least, of having a separate >>>> module/project/whatever that both Solr/Lucene (and whatever project) can >>>> depend on for the shared analyzer code... >>>> >>> Not really. They are intimately linked. >>> >> Ummm, how so? Making project A called "Apache Super Analyzers" and then >> making Lucene(-java) and Solr depend on Apache Super Analyzers is separate >> of whether or not Lucene(-java) and Solr are TLPs or not... >> >> Cheers, >> Chris >> >> >> >>> >>> >>>> Cheers, >>>> Chris >>>> >>>> >>>> >>>> On 3/1/10 9:12 AM, "Robert Muir"<[EMAIL PROTECTED]> wrote: >>>> >>>> this will make the analyzers duplication problem even worse >>>> >>>> On Mon, Mar 1, 2010 at 11:06 AM, Mattmann, Chris A (388J)< >>>> [EMAIL PROTECTED]> wrote: >>>> >>>> >>>>> Hi Mark, >>>>> >>>>> Thanks for your message. I respect your viewpoint, but I respectfully >>>>> disagree. It just seems (to me at least based on the discussion) like a TLP >>>>> for Solr is the way to go. >>>>> >>>>> Cheers, >>>>> Chris >>>>> >>>>> >>>>> >>>>> On 3/1/10 8:54 AM, "Mark Miller"<[EMAIL PROTECTED]> wrote: >>>>> >>>>> On 03/01/2010 10:40 AM, Mattmann, Chris A (388J) wrote: >>>>> >>>>>> Hi Mark, >>>>>> >>>>>> >>>>>> >>>>>>> That would really be no real world change from how things work today. >>>>>>> >>>>> The fact >>>>> >>>>>>> is, today, Solr already operates essentially as an independent project. >>>>>>> >>>>>>> >>>>>> Well if that's the case, then it would lead me to think that it's more of +
Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene? Michael McCandless 2010-03-01, 19:22
The possibility of slowing down releases is the only real concern I
also share.... But, I think release frequency is largely a matter of discipline :) But, digging into it, I think as long as the project keeps a "stable trunk" (something Lucene has always tried to do -- does Solr?)... then release frequency is really a matter of discipline. I mean in Lucene we keep saying we want faster releases, but why doesn't it happen? Couldn't we have done 2X as many releases in the past few years? Did we "really" want to release more frequently? If we really want to take it seriously I think we should have someone unofficially be the next release czar. As soon as a release is finished, this czar is responsible for roughly planning the next one. This means making a tentative schedule, tracking big features and making sure they land "early" enough to bake fully on trunk, etc. New modules (eg spatial) need not gate the release -- that module's docs would call out clearly that it's not fully baked yet... Mike On Mon, Mar 1, 2010 at 1:13 PM, Michael Busch <[EMAIL PROTECTED]> wrote: > It seems like most of the people agree with these good goals but are > concerned about the release cycles (including me). How can we achieve these > goals without making releases more difficult? > > Michael > > On 3/1/10 9:44 AM, Michael McCandless wrote: >> >> If we don't somehow first address the code duplication across the 2 >> projects, making Solr a TLP will make things worse. >> >> I started here with analysis because I think that's the biggest pain >> point: it seemed like an obvious first step to fixing the code >> duplication and thus the most likely to reach some consensus. And >> it's also very timely: Robert is right now making all kinds of great >> fixes to our collective analyzers (in between bouts of fuzzy DFA >> debugging). >> >> But it goes beyond analyzers: I'd like to see other modules, now in >> Solr, eventually moved to Lucene, because they really are "core" >> functionality (eg facets, function (and other?) 
queries, spatial, >> maybe improvements to spellchecker/highlighter). How can we do this? >> >> And how can we do this so that it "lasts" over time? If new cool >> "core" things are born in Solr-land (which of course happens alot -- >> lots of good healthy usage), how will they find their way back to >> Lucene? >> >> Yonik's proposal (merging development of Solr/Lucene, but keeping all >> else separate) would achieve this. >> >> If we do the opposite (Solr -> TLP), how could we possibly achieve >> this? >> >> I guess one possibility is to just suck it up and duplicate the code. >> Meaning, each project will have to manually merge fixes in from the >> other project (so long as there's someone around with the itch to do >> so). Lucene would copy in all of Solr's analysis, and vice-versa (and >> likewise other dup'd functionality). I really dislike this >> solution... it will confuse the daylights out of users, its error >> proned, it's a waste of dev effort, there will always be little >> differences... but maybe it is in fact the lesser evil? >> >> I would much prefer merging Solr/Lucene development... >> >> Mike >> >> On Mon, Mar 1, 2010 at 12:01 PM, Mattmann, Chris A (388J) >> <[EMAIL PROTECTED]> wrote: >> >>> >>> Hi Grant, >>> >>> >>>> >>>> On Mar 1, 2010, at 8:20 AM, Mattmann, Chris A (388J) wrote: >>>> >>>> >>>>> >>>>> Hi Robert, >>>>> >>>>> I think my proposal (Solr->TLP) is sort of orthogonal to the whole >>>>> analyzers >>>>> issue - I was in favor, at the very least, of having a separate >>>>> module/project/whatever that both Solr/Lucene (and whatever project) >>>>> can >>>>> depend on for the shared analyzer code... >>>>> >>>> >>>> Not really. They are intimately linked. >>>> >>> >>> Ummm, how so? Making project A called "Apache Super Analyzers" and then >>> making Lucene(-java) and Solr depend on Apache Super Analyzers is >>> separate >>> of whether or not Lucene(-java) and Solr are TLPs or not... +
Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene? Robert Muir 2010-03-01, 17:02
but Yonik's proposal (or at least some of the ideas from it?) is attractive
as it seems to solve the real problem that created the duplication in the first place, which is not limited to analyzers. On Mon, Mar 1, 2010 at 12:01 PM, Mattmann, Chris A (388J) < [EMAIL PROTECTED]> wrote: > Hi Grant, > > > On Mar 1, 2010, at 8:20 AM, Mattmann, Chris A (388J) wrote: > > > >> Hi Robert, > >> > >> I think my proposal (Solr->TLP) is sort of orthogonal to the whole > analyzers > >> issue - I was in favor, at the very least, of having a separate > >> module/project/whatever that both Solr/Lucene (and whatever project) can > >> depend on for the shared analyzer code... > > > > Not really. They are intimately linked. > > Ummm, how so? Making project A called "Apache Super Analyzers" and then > making Lucene(-java) and Solr depend on Apache Super Analyzers is separate > of whether or not Lucene(-java) and Solr are TLPs or not... > > Cheers, > Chris > > > > > > > >> > >> Cheers, > >> Chris > >> > >> > >> > >> On 3/1/10 9:12 AM, "Robert Muir" <[EMAIL PROTECTED]> wrote: > >> > >> this will make the analyzers duplication problem even worse > >> > >> On Mon, Mar 1, 2010 at 11:06 AM, Mattmann, Chris A (388J) < > >> [EMAIL PROTECTED]> wrote: > >> > >>> Hi Mark, > >>> > >>> Thanks for your message. I respect your viewpoint, but I respectfully > >>> disagree. It just seems (to me at least based on the discussion) like a > TLP > >>> for Solr is the way to go. > >>> > >>> Cheers, > >>> Chris > >>> > >>> > >>> > >>> On 3/1/10 8:54 AM, "Mark Miller" <[EMAIL PROTECTED]> wrote: > >>> > >>> On 03/01/2010 10:40 AM, Mattmann, Chris A (388J) wrote: > >>>> Hi Mark, > >>>> > >>>> > >>>>> That would really be no real world change from how things work today. > >>> The fact > >>>>> is, today, Solr already operates essentially as an independent > project. > >>>>> > >>>> Well if that's the case, then it would lead me to think that it's more > of > >>> a > >>>> TLP more than anything else per best practices. > >>>> > >>> That depends. 
It could be argued it should be a top level project or > >>> that it should be closer to the Lucene project. Some people are arguing > >>> for both approaches right now. There are two directions we could move > in. > >>>> > >>>>> The only real difference is that it shares the same PMC with Lucene > now > >>> and > >>>>> wouldn't with this change. This would address none of the issues that > >>>>> triggered > >>>>> the idea for a possible merge. > >>>>> > >>>> I don't agree -- you're looking to bring together two communities that > >>> are > >>>> "fairly separate" as you put it. The separation likely didn't spring > up > >>> over > >>>> night and has been this way for a while (as least to my knowledge). > This > >>> is > >>>> exactly the type of situation that typically leads to TLP creation > from > >>> what > >>>> I've seen. > >>>> > >>> It also causes negatives between Solr/Lucene that some are looking to > >>> address. Hence the birth of this proposal. Going TLP with Solr will > only > >>> aggravate those negatives, not help them. > >>> > >>> While the communities operate fairly separately at the moment, the > >>> people in the communities are not so separate. The committer list has > >>> huge overlap. Many committers on one project but not the other do a lot > >>> of work on both projects. > >>> > >>> There is already a strong link with the personal - merging the > >>> management of the projects addresses many of the concerns that have > >>> prompted this discussion. TLP'ing Solr only makes those concerns > >>> multiply. They would diverge further, and incompatible overlap between > >>> them would increase. 
> >>> > >>>> Cheers, > >>>> Chris > >>>> > >>>> > >>>> > >>>> > >>>>> > >>>>> > >>>>> On 03/01/2010 10:04 AM, Mattmann, Chris A (388J) wrote: > >>>>> > >>>>>> Hey Grant, > >>>>>> > >>>>>> I'd like to explore this - does this imply that the Lucene > >>> sub-projects will > >>>>>> go away and Lucene will turn into Lucene-java and maintain its Robert Muir [EMAIL PROTECTED]
Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene? Simon Willnauer 2010-03-01, 17:41
IMO the only downside is that we risk a longer release cycle if we
merge. It requires a certain level of discipline, but hasn't that always been the case?! Everything else seems to be a win for both communities, and I personally would love to see the communities come closer again. While I was working on many analyzers, removing code duplication and maintaining BW compat, almost every change we committed caused a new issue on Solr that could have been fixed in one go. Concerns that Solr could slow us down in maintaining BW compat appear invalid to me: the Solr API, as a direct customer of the Lucene API, would enforce our policy, which is a good thing. I also agree with Robert that moving Solr into a TLP would make things even worse. On Mon, Mar 1, 2010 at 6:02 PM, Robert Muir <[EMAIL PROTECTED]> wrote: > but Yonik's proposal (or at least some of the ideas from it?) is attractive > as it seems to solve the real problem that created the duplication in the > first place, which is not limited to analyzers. > > On Mon, Mar 1, 2010 at 12:01 PM, Mattmann, Chris A (388J) < > [EMAIL PROTECTED]> wrote: > >> Hi Grant, >> >> > On Mar 1, 2010, at 8:20 AM, Mattmann, Chris A (388J) wrote: >> > >> >> Hi Robert, >> >> >> >> I think my proposal (Solr->TLP) is sort of orthogonal to the whole >> analyzers >> >> issue - I was in favor, at the very least, of having a separate >> >> module/project/whatever that both Solr/Lucene (and whatever project) can >> >> depend on for the shared analyzer code... >> > >> > Not really. They are intimately linked. >> >> Ummm, how so? Making project A called "Apache Super Analyzers" and then >> making Lucene(-java) and Solr depend on Apache Super Analyzers is separate >> of whether or not Lucene(-java) and Solr are TLPs or not... 
>> >> Cheers, >> Chris >> >> >> > >> > >> >> >> >> Cheers, >> >> Chris >> >> >> >> >> >> >> >> On 3/1/10 9:12 AM, "Robert Muir" <[EMAIL PROTECTED]> wrote: >> >> >> >> this will make the analyzers duplication problem even worse >> >> >> >> On Mon, Mar 1, 2010 at 11:06 AM, Mattmann, Chris A (388J) < >> >> [EMAIL PROTECTED]> wrote: >> >> >> >>> Hi Mark, >> >>> >> >>> Thanks for your message. I respect your viewpoint, but I respectfully >> >>> disagree. It just seems (to me at least based on the discussion) like a >> TLP >> >>> for Solr is the way to go. >> >>> >> >>> Cheers, >> >>> Chris >> >>> >> >>> >> >>> >> >>> On 3/1/10 8:54 AM, "Mark Miller" <[EMAIL PROTECTED]> wrote: >> >>> >> >>> On 03/01/2010 10:40 AM, Mattmann, Chris A (388J) wrote: >> >>>> Hi Mark, >> >>>> >> >>>> >> >>>>> That would really be no real world change from how things work today. >> >>> The fact >> >>>>> is, today, Solr already operates essentially as an independent >> project. >> >>>>> >> >>>> Well if that's the case, then it would lead me to think that it's more >> of >> >>> a >> >>>> TLP more than anything else per best practices. >> >>>> >> >>> That depends. It could be argued it should be a top level project or >> >>> that it should be closer to the Lucene project. Some people are arguing >> >>> for both approaches right now. There are two directions we could move >> in. >> >>>> >> >>>>> The only real difference is that it shares the same PMC with Lucene >> now >> >>> and >> >>>>> wouldn't with this change. This would address none of the issues that >> >>>>> triggered >> >>>>> the idea for a possible merge. >> >>>>> >> >>>> I don't agree -- you're looking to bring together two communities that >> >>> are >> >>>> "fairly separate" as you put it. The separation likely didn't spring >> up >> >>> over >> >>>> night and has been this way for a while (as least to my knowledge). 
>> This >> >>> is >> >>>> exactly the type of situation that typically leads to TLP creation >> from >> >>> what >> >>>> I've seen. >> >>>> >> >>> It also causes negatives between Solr/Lucene that some are looking to >> >>> address. Hence the birth of this proposal. Going TLP with Solr will >> only +
Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene? Grant Ingersoll 2010-03-01, 15:33
All of this needs to be discussed and it's not even clear whether any of it is required. Lucene runs pretty smoothly from a PMC level, so I don't feel a huge need to break something up just for the sake of it.
At any rate, I doubt it makes much sense for some subs to be split out, but Mahout has already decided to do it (after the 0.3 release comes out)

-Grant

On Mar 1, 2010, at 7:04 AM, Mattmann, Chris A (388J) wrote:

> Hey Grant,
>
> I'd like to explore this - does this imply that the Lucene sub-projects will
> go away and Lucene will turn into Lucene-java and maintain its Apache TLP,
> and then you'd have say, solr.apache.org, tika.apache.org, mahout.apache.org
> (already started), etc. etc.? If so, that may be the best of all worlds,
> allowing project independence, but also not following the Apache
> "antipattern" as Doug put it...
>
> Cheers,
> Chris
>
> On 3/1/10 7:28 AM, "Grant Ingersoll" <[EMAIL PROTECTED]> wrote:
>
>> Also, as Doug alluded to, the Board is likely to ask us to consider less
>> subprojects in the future, so we may be consolidating and spinning off anyway.
>
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Chris Mattmann, Ph.D.
> Senior Computer Scientist
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 171-266B, Mailstop: 171-246
> Email: [EMAIL PROTECTED]
> Phone: +1 (818) 354-8810
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Adjunct Assistant Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene? Mattmann, Chris A 2010-03-01, 15:44
Hi Grant,
> All of this needs to be discussed and it's not even clear whether any of it is
> required. Lucene runs pretty smoothly from a PMC level, so I don't feel a
> huge need to break something up just for the sake of it.

Well that's what we're doing, discussing it right? Also, you brought up the comment from Doug at the board meeting, so I thought it was fair game to discuss.

> At any rate, I doubt it makes much sense for some subs to be split out, but
> Mahout has already decided to do it (after the 0.3 release comes out)

I'd agree, unless the project meets the litmus test for TLP status, and it is seeming more and more like (at least) Solr does...

Cheers,
Chris

> On Mar 1, 2010, at 7:04 AM, Mattmann, Chris A (388J) wrote:
>
>> Hey Grant,
>>
>> I'd like to explore this - does this imply that the Lucene sub-projects will
>> go away and Lucene will turn into Lucene-java and maintain its Apache TLP,
>> and then you'd have say, solr.apache.org, tika.apache.org, mahout.apache.org
>> (already started), etc. etc.? If so, that may be the best of all worlds,
>> allowing project independence, but also not following the Apache
>> "antipattern" as Doug put it...
>>
>> Cheers,
>> Chris
>>
>> On 3/1/10 7:28 AM, "Grant Ingersoll" <[EMAIL PROTECTED]> wrote:
>>
>>> Also, as Doug alluded to, the Board is likely to ask us to consider less
>>> subprojects in the future, so we may be consolidating and spinning off
>>> anyway.
>>
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> Chris Mattmann, Ph.D.
>> Senior Computer Scientist
>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>> Office: 171-266B, Mailstop: 171-246
>> Email: [EMAIL PROTECTED]
>> Phone: +1 (818) 354-8810
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> Adjunct Assistant Professor, Computer Science Department
>> University of Southern California, Los Angeles, CA 90089 USA
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: [EMAIL PROTECTED]
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene? Michael Busch 2010-03-01, 05:26
I forgot to mention: I admittedly haven't been very involved in Solr in
the past. So I'm probably not aware of many of the problems Solr might have had with staying in sync with Lucene. If everyone here agrees with Yonik/Mike's proposal I will not try to block it with a -1 veto. I'm just trying to express here the concerns that come to my mind. To do what's best for the future of the Lucene TLP as a whole is of course my main interest too. And I really really want to still be able to use Lucene separately as a library, and I think we all agree here! Michael On 2/28/10 9:05 PM, Michael Busch wrote: > On 2/28/10 4:30 PM, Grant Ingersoll wrote: >> Not sure why more tests would be a negative. The Solr tests exercise >> quite a bit of Lucene functionality as well. >> >> -Grant > > Sorry, I should have made myself clearer here. It'd obviously be silly > to argue against more test coverage. In general I think it's a great > idea to run the Solr tests also when testing a Lucene patch. > > I'm just not happy about making this a formal requirement (that Solr > tests have to pass in order to commit a Lucene patch). All > backwards-incompatible patches, which we had quite a few of in 2.9 and > 3.0, would then become even more difficult to commit, because you have > to make all changes then in Solr too as part of the Lucene patch. > Think about changes like per-segment search or the new TokenStream API > and how difficult and time consuming they were for core and contrib > already. For backwards-compatible changes, by all means, let's run as > many tests as we can. > > We have all been saying we want to have more frequent releases. Right > now Lucene has no external dependencies that could slow down a release > and still we don't release as frequently as we'd like to. If we add > dependencies like release alignment with subprojects I'm afraid this > will become worse. 
> > I was really happy about the original idea of having a separate > analyzer module (or subproject, library, whatever name it'd have), > because analysis seems quite separate from indexing/search. Separating > the two seems logical. And why not release such an analyzer package > more frequently than Lucene. Different pieces of code don't all move > with the same pace. It'd be nice to have the freedom of releasing an > analyzer library after e.g. a new language was added, maybe even only > two weeks after the previous release. IMO more modular release cycles > is a better way to go than this new proposal. > > I'd be happy if the Solr developers would be more involved in Lucene > (again) and if we would discuss new ideas with the question in mind, > where the new feature should live. And also the Lucene developers who > are not very involved in Solr should understand the impact that Lucene > changes have on Solr. So big +1 for better communication between Solr > and Lucene devs! > > Michael +
Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene? Mark Miller 2010-03-01, 13:25
> On 2/28/10 9:05 PM, Michael Busch wrote: >> Think about changes like per-segment search or the new TokenStream >> API and how difficult and time consuming they were for core and >> contrib already. 1. Its not just more work for the same Lucene devs - there would be more devs with a merge to work on these things. More devs that stay more in Solr land would probably have been more involved in these changes earlier in Lucene land with merged projects. 2. Solr found a bunch of issues with the TokenStream API. Might not be such a bad idea for such large changes to have to go through that. Solr also exposed issues that other users were going to have to face with per segment - might be good to be forced to face that as well. 3. It's already been mentioned that you wouldn't have to do the Solr part to add the Lucene part. You'd likely have been able to do the same thing - create the new API, get the backwards compat pieces in, and then create a JIRA issue to get it done for Solr. Then later, Robert and Yonik would have done most of the work - similar to how things worked anyway. At least getting Solr tests to pass seems like a nice way to keep Lucene honest - you have to think about and see your changes play out in actual use. It doesn't mean you actually have to do all of the work to get Solr completely up to speed. If the TokenStream API was perfect, it wouldn't have broken Solr tests. Per segment is a much more rare situation. >I'd be happy if the Solr developers would be more involved in Lucene (again) and if we would discuss new ideas with the question in mind, where >the new feature should live. And also the Lucene developers who are not very involved in Solr should understand the impact that Lucene changes >have on Solr. So big +1 for better communication between Solr and Lucene devs! Again - same way we'd all like there to be more frequent releases. I'd bet a fortune its not going to happen based on what we'd "like" to see. 
I see a solution to getting this done being proposed though. My main concern still is the complication of releasing together, and how that is going to affect release frequency. Other than that, I've only seen wins for the quality of both projects. Most of the arguments against are assuming the merge means more than it does, I think. Lucene will still be a library separate from Solr. People contributing to Lucene will not be required to do the Solr piece. This just moves us along the path of what Michael says he'd like to see above - and what I think most of us would like to see. We have learned from the past though - these things would likely never happen without real change being implemented. Moving to a +1 from me.

--
- Mark

http://www.lucidimagination.com
Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene? Mark Miller 2010-02-28, 18:39
On 02/28/2010 12:52 PM, Michael Busch wrote:
> ... I think it's a good
> idea for SOLR to ride on Lucene's trunk again...
> However, I'm -1 for these points:
>
> * When a change is committed to Lucene, it must pass all Solr tests.
> * Release both at once.
>
> These are huge reasons why we *don't* want SOLR to ride on Lucene's trunk anymore.

bq. but we have to ask why they weren't added to Lucene in the first place.

Because the two communities are fairly separate in a lot of ways. This is one of the things a potential merge would solve. We can say that the projects should communicate more all we like - the history of saying such things implies there will be no changes though.

I'm still +0 here, but I'm starting to lean towards merge just sitting here disagreeing with everyone arguing against :)

Solr is actually part of the project "Lucene" along with Lucene-Java. The divide now is actually almost unnatural considering how things are organized.

To those arguing that this would make Solr a first class citizen of Lucene over other search solutions that use Lucene, that actually already is the case, and the way things are set up, it should be. Solr is part of the Lucene project. Other Lucene search engines are not. That doesn't mean we shouldn't consider Lucene changes in the context of all the projects that may use it, but Solr already is a first class citizen. It's not just some project using Lucene - it's *the* Lucene project's Search Server. Lucene devs *should* consider Solr when developing on Lucene Java - they are the same project - Lucene.

--
- Mark

http://www.lucidimagination.com
Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene? Mattmann, Chris A 2010-02-28, 18:55
Hi Mark,
Thanks for the feedback. My concern is that if the two communities are pretty separate, then merging them is going to be more difficult, and it's not always a good thing to take separated modules (or communities) and integrate them into a monolith, whether physically in the code or community-wise. I and a bunch of others learned this the hard way in OODT-ville at NASA, and we moved towards a more loosely coupled solution, even at the expense of the difficulty in "being out of date" from time to time. This experience makes it difficult for me to support such a move...

Thanks!

Cheers,
Chris

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: [EMAIL PROTECTED]
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene? Jason Rutherglen 2010-02-28, 21:04
I think it's Solr rather than SOLR. :-) A little birdy told me so...
Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene? Shalin Shekhar Mangar 2010-02-28, 17:20
On Sun, Feb 28, 2010 at 4:27 PM, Michael McCandless <[EMAIL PROTECTED]> wrote:

> To make this more concrete, I think this is roughly what's being
> proposed:
>
>   * Merging the dev lists into a single list.
>   * Merging committers.
>   * When a change is committed to Lucene, it must pass all Solr tests.
>   * Release both at once.
>
> These things would not change:
>
>   * Most importantly, the source code would remain factored into
>     separate dirs/modules.
>   * User lists should remain separate.
>   * Web sites would remain separate.
>   * Solr & Lucene are still separate downloads, separate JARs,
>     separate subdirs in the source tree, etc.
>
> The outside world still sees Solr & Lucene as separate entities. It's
> only that they would now be developed/released in synchrony.

+1

--
Regards,
Shalin Shekhar Mangar.
Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene? Jason Rutherglen 2010-03-01, 16:55
Here are the main points that pop up:

>   * Solr is Lucene's biggest direct user -- most people who use Lucene
>     use it through Solr -- so having it more closely integrated means
>     we know sooner if we broke something.
>   * Right now I could test whether flex breaks anything in Solr. I
>     can't do that now since Solr isn't upgraded to 3.1.

Flex is really important to get right and integrated into Solr; otherwise there probably won't be too many new features to add to Solr after Cloud and NRT.

As a Solr user, I find that the exorbitantly long release cycles border on the absurd. I think there are certain Lucene committers who could help with the discipline required for this process to go more smoothly (hopefully).

Also, everyone, great work on Lucene and Solr.

Cheers,
Jason

On Sun, Feb 28, 2010 at 2:57 AM, Michael McCandless <[EMAIL PROTECTED]> wrote:
> To make this more concrete, I think this is roughly what's being
> proposed:
>
>   * Merging the dev lists into a single list.
>   * Merging committers.
>   * When a change is committed to Lucene, it must pass all Solr tests.
>   * Release both at once.
>
> These things would not change:
>
>   * Most importantly, the source code would remain factored into
>     separate dirs/modules.
>   * User lists should remain separate.
>   * Web sites would remain separate.
>   * Solr & Lucene are still separate downloads, separate JARs,
>     separate subdirs in the source tree, etc.
>
> The outside world still sees Solr & Lucene as separate entities. It's
> only that they would now be developed/released in synchrony.
>
> There are some important gains by doing this:
>
>   * Single source for all the code dup we now have across the
>     projects (my original reason, specifically on analyzers, for
>     starting this).
>   * Whenever a new feature is added to Lucene, we'd work through what
>     the impact is to Solr. This can still mean we separately develop
>     exposure in Solr, but it'd get us to at least more immediately
>     think about it.
>   * Solr is Lucene's biggest direct user -- most people who use Lucene
>     use it through Solr -- so having it more closely integrated means
>     we know sooner if we broke something.
>   * Right now I could test whether flex breaks anything in Solr. I
>     can't do that now since Solr isn't upgraded to 3.1.
>
> Recent big changes (eg segment based searching, Version, attr based
> tokenstream api) caused a lot of work in Solr that could've been much
> smoother had Solr "been there" as we were working through them.
>
> Recent new features, eg near-real-time search, which are unavailable
> in Solr still, would have at least had some discussion about how to
> expose this in Solr.
>
> Over time (and we don't have to do this right on day 1) we can make
> core capabilities available to pure Lucene. EG core Lucene users
> should be able to use faceting, use a schema, etc.
>
> I think this idea makes a lot of sense and I think now is a good time
> to do it. Yes, this is a big change, but I think the gains are sizable.
> As Lucene & Solr diverge more, it'll only become harder and harder to
> merge.
>
> Robert's massive patch on SOLR-1657, upgrading most of Solr's analyzers
> to 3.0, is aging... while other changes to analyzers are being
> proposed (SOLR-1799). If we were integrated (or at least single
> source for analyzers), Robert would already have committed it.
>
> Mike
>
> On Fri, Feb 26, 2010 at 5:20 PM, Yonik Seeley <[EMAIL PROTECTED]> wrote:
>> On Fri, Feb 26, 2010 at 5:15 PM, Steven A Rowe <[EMAIL PROTECTED]> wrote:
>>> On 02/24/2010 at 2:20 PM, Yonik Seeley wrote:
>>>> I've started to think that a merge of Solr and Lucene would be in the
>>>> best interest of both projects.
>>>
>>> The Sorlucene :) merger could be achieved virtually, i.e. via policy, rather than physically merging:
>>
>> Everything is virtual here anyway :-)
>> I agree with Mike that a single dev list is highly desirable. There
>> would still be separate downloads. What to do with some of the other
Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene? Doug Cutting 2010-02-24, 19:09
Michael McCandless wrote:
> I think, in order to stop duplicating our analysis code across
> Nutch/Solr/Lucene, we should separate out the analyzers into a
> standalone package, and maybe as its own sub-project under the Lucene
> tlp?

Is the goal to release these on a separate schedule from Lucene Java? If so, then this makes sense; if not, then perhaps this could simply be a separate source code tree in Lucene Java, built as separate jars.

Where would the analyzer APIs live, in the core or in the analyzer tree? My guess is that they'd live in the core, and that the analyzer tree would depend on the core, but one might do it the other way around if one felt there were non-Lucene uses for analyzers.

Note that subprojects with different committer lists are an anti-pattern at Apache. We've long done this in Lucene, but have recently been asked by the board to consider breaking most subprojects into their own TLPs. Would analyzers someday make sense as an independent TLP? If not, then a subproject with disjoint committers might not be the right pattern.

Doug
Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene? Simon Willnauer 2010-02-24, 19:12
On Wed, Feb 24, 2010 at 8:09 PM, Doug Cutting <[EMAIL PROTECTED]> wrote:
> Michael McCandless wrote:
>> I think, in order to stop duplicating our analysis code across
>> Nutch/Solr/Lucene, we should separate out the analyzers into a
>> standalone package, and maybe as its own sub-project under the Lucene
>> tlp?
>
> Is the goal to release these on a separate schedule from Lucene Java? If so,
> then this makes sense, if not, then perhaps this could be simply a separate
> source code tree in Lucene Java built as separate jars.

Afaik, this was the intention. Otherwise I would agree this makes no sense!
Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene? Ted Dunning 2010-02-24, 19:13
Mahout is beginning to use Analyzers independently of Lucene for importing text and text-like data into vector-ish algorithms. Of course, we also use Lucene for the similar purpose of importing entire Lucene indexes as vectors.

That implies that we would not be hurt by any rearrangements of this kind.

On Wed, Feb 24, 2010 at 11:09 AM, Doug Cutting <[EMAIL PROTECTED]> wrote:
> Where would the analyzer APIs live, in the core or in the analyzer tree?
> My guess is that they'd live in the core, and that the analyzer tree would
> depend on the core, but one might do it the other way around if one felt
> there were non-Lucene uses for analyzers.

--
Ted Dunning, CTO
DeepDyve
Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene? Grant Ingersoll 2010-02-24, 21:04
On Feb 24, 2010, at 2:09 PM, Doug Cutting wrote:

> Michael McCandless wrote:
>> I think, in order to stop duplicating our analysis code across
>> Nutch/Solr/Lucene, we should separate out the analyzers into a
>> standalone package, and maybe as its own sub-project under the Lucene
>> tlp?
>
> Is the goal to release these on a separate schedule from Lucene Java? If so, then this makes sense, if not, then perhaps this could be simply a separate source code tree in Lucene Java built as separate jars.
>
> Where would the analyzer APIs live, in the core or in the analyzer tree? My guess is that they'd live in the core, and that the analyzer tree would depend on the core, but one might do it the other way around if one felt there were non-Lucene uses for analyzers.
>
> Note that subprojects with different committer lists are an anti-pattern at Apache. We've long done this in Lucene, but have recently been asked by the board to consider breaking most subprojects into their own TLPs.

Yeah, I've seen rumblings of this, but I'm not sure why it is a big deal here. Many of Lucene's projects are related and interoperate with some committer overlap, but not all. For instance, Lucene.NET and PyLucene don't have a lot of overlap committer-wise, but it would be silly for them to be TLPs. To me, Lucene has spun off subprojects when it makes sense, i.e. Hadoop and potentially Mahout in the near future, but otherwise, "if it ain't broke, don't fix it".

> Would analyzers someday make sense as an independent TLP? If not, then a subproject with disjoint committers might not be the right pattern.

In my mind, I think all current committers for Lucene/Nutch/Solr would be committers on this new project.

-Grant
Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene? Uri Boness 2010-03-02, 17:39
Hi,
Just found out about this discussion, so I realize I'm stepping in rather late with my feedback... still, for what it's worth, here it is :-).

In general I'm against this proposal, as I believe it can cause more harm than good. The way I (and many others) see Lucene is as a separate effort from Solr. I'm a *big* fan of Solr and (as some of you may know) I'm using it daily and promoting it where/when I can. That said, I'm also a big fan of Lucene, and I believe Solr has its value and use cases while Lucene has its own. Joining Solr with Lucene has the potential of creating a "virtual" monopoly over Solr-like solutions built on top of Lucene, which is not community friendly, but more importantly it puts the competition for Solr in jeopardy. IMO competition is a key advantage for products/projects. Yes, there is competition that will always come from the commercial vendors, but competition and challenges must also come from the open source community. This is a big part of what drives innovation. Furthermore, the community and the users of Lucene should have the power/ability to decide which solutions they want to go for - this is the true community-driven development way.

I fully agree that there is a lot of duplication in the work that is currently being done in Solr. But it mainly originates in Solr, not in Lucene, and the Lucene community should not be bothered by that. Such duplicate work should be addressed in the Solr project. So for example, take the analysis code... if all the work that has gone into the analyzers in Solr had been committed to Lucene from the start, there wouldn't have been duplication. The same goes for the spatial support or other duplicate work. Solr development has certainly proven to push Lucene development in many ways, and the best way to handle it is to contribute all this goodness back to Lucene. And yes, it means that Solr releases will need to wait for official Lucene releases, or in the meantime have their own custom Lucene distributions, but this is the fair play that all Lucene-based solutions (be it Katta, ElasticSearch, Sensei, or any other) will have to deal with.

> Merging committers.

I believe this will create a proliferation of committers on these projects, which can bring a lot of mess. Let Lucene committers focus on what they do and know best - which is Lucene - and let Solr committers focus on Solr. If a Solr committer can bring a lot of value to Lucene, then yes, sure, make him/her a Lucene committer, but IMO being a Solr committer doesn't automatically give anyone the credentials or the skills to be a Lucene committer... mainly because the work done in Solr is often at a higher level and often not related to Lucene at all.

> Single source for all the code dup we now have across the
> projects (my original reason, specifically on analyzers, for
> starting this).

As mentioned above, this can easily be done by contributing the changes to the analyzers back to Lucene.

> Whenever a new feature is added to Lucene, we'd work through what
> the impact is to Solr. This can still mean we separately develop
> exposure in Solr, but it'd get us to at least more immediately
> think about it.

This is something that Solr committers need to be responsible for, not Lucene committers. Lucene committers need to make sure that Lucene works and is bug free. I don't think it makes sense to push Solr responsibilities onto Lucene committers.

> Solr is Lucene's biggest direct user -- most people who use Lucene
> use it through Solr -- so having it more closely integrated means
> we know sooner if we broke something.

I disagree here. I believe Lucene still has a larger install base than Solr. Think of Jackrabbit, which uses Lucene directly, and all the CMSs that use Jackrabbit. Think of frameworks like Compass and Hibernate Search (that use Lucene directly), which are used in a lot of JEE deployments around the world. And certainly there are a lot of large infrastructures that use Lucene directly as well (as in LinkedIn, for example). Solr is great in what it does, but it is certainly not everything when it comes to open source search or Lucene.

True, but again, this is an issue Solr committers will have to deal with. And yes, it means that Solr will almost always be one step behind Lucene, but that's how it works with every dependency on every library you use. If you want to test the flex stuff and it's currently being developed as a separate Lucene branch, then you can create a separate Solr branch to see how it works and what future changes might need to go into Solr. Again, Lucene committers shouldn't be bothered with this problem, and the development of Lucene shouldn't be affected by Solr-related issues.

Also take into account the huge difference in the release cycles between the projects. Lucene has quite a steady release cycle (last year it was quite constant, with a release every 3 months or so). Solr, on the other hand, has longer release cycles that can span more than a year. A lot of the issues that stall Solr releases have nothing to do with Lucene, and the Lucene release cycle shouldn't suffer from that. Furthermore, users/projects/products that use Lucene directly should not suffer from that either. All the goodness that is developed in Lucene and all the bug fixes should be available to Lucene users to download as soon as they're ready - they don't need to suffer from any Solr-related issues.

Please rest assured that my goal here is not to step on anyone's toes. I'm not a committer on either project, but I certainly want to see these two projects go in the right direction (at least the direction I believe is right). So I just wanted to express my concerns here.

Keep up the good work!

Cheers,
Uri
Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene? Ard Schrijvers 2010-03-03, 09:21
On Tue, Mar 2, 2010 at 6:39 PM, Uri Boness <[EMAIL PROTECTED]> wrote:
> I disagree here. I believe Lucene still has larger install base than Solr.
> Think of Jackrabbit which uses Lucene directly and all the CMSs that use
> Jackrabbit. Think of frameworks like Compass and Hibernate Search (that use
> Lucene directly) which are used in a lot of JEE deployments around the
> world. And certainly there are a lot of large infrastructures that use
> Lucene directly as well (as in LinkedIn for example). Solr is great in what
> it does but it is certainly not everything when it comes to open source
> search or Lucene.

I have been involved in the Lucene implementations of Jackrabbit (and before that, Slide). With respect to repositories (JSR-170 / JSR-283), where the backing persistence is some database and the storage is mainly key-values, we use Lucene to do all relational queries (a subset of XPath/SQL is translated to Lucene queries). This custom Lucene implementation (see [1] for an overview) can imo never be replaced by Solr; afaik, the only thing it has in common with Solr is that they both use Lucene.

I agree with Uri that Lucene has a much larger install base than Solr.

Regards,
Ard

[1] http://jackrabbit.apache.org/index-readers.html
Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene? Grant Ingersoll 2010-03-03, 14:06
I think it should be clarified, that those who are using Lucene would not be affected at all by this proposal, other than they would probably get some things from Solr that they wish they had anyway (Analyzers, likely Faceting, etc). I would encourage people to go read what Mike wrote again. There would still be Lucene jars. There would still be Solr jars. All of those third party projects would still build exactly as they do now, unless of course they want to add new jars. Most of the merging is behind the scenes, like a single dev list for coordination. I would suspect in practice that most Solr committers would still focus on Solr, but...
What you would also be getting is less friction (and I don't mean that in a negative way) about where things should go. The reason there is often duplication of effort is mainly the arbitrary boundary put up by the fact that most Solr committers are not Lucene committers. So, when a Solr committer comes up w/ something that may belong in Lucene proper (an Analyzer is just one example), they don't bother to make the effort to put it in Lucene, so Lucene loses out.

-Grant
Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene? Ard Schrijvers 2010-03-03, 14:21
On Wed, Mar 3, 2010 at 3:06 PM, Grant Ingersoll <[EMAIL PROTECTED]> wrote:
> I think it should be clarified, that those who are using Lucene would not be affected at all by this proposal, other than they would probably get some things from Solr that they wish they had anyway (Analyzers, likely Faceting, etc). I would encourage people to go read what Mike wrote again.

The thread was a little unclear to me, as it seemed to be sent to this list while it had already been started some time before. Your explanation seems reasonable to me.

Regards,
Ard
Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene? Mattmann, Chris A 2010-03-02, 07:17
Hey Mike,
> This looks great! Thanks!
>
> But, the goal is to make a standalone toolkit exposing GIS functions,
> right?

Yep you got it!

> My original question (integrating this into Lucene/Solr) remains.

Sure, I think the goal would be to provide only the Spatial aspects required by Search (e.g., filters for documents, field types, etc.) as small classes in Lucene/Solr-land, and do the heavy lifting in the SIS project.

> EG there's a lot of good work happening now in Solr to make spatial
> search available. How will that find its way back to Lucene? Lucene
> has its own (now duplicate) spatial package that was already
> developed. Users will now be confused about the two, each have
> different bugs/features, etc.

I think as we move towards having an official SIS/spatial project and start to have releases/libraries, etc., it could partially help, but not totally alleviate, this issue.

Cheers,
Chris

> On Mon, Mar 1, 2010 at 1:28 PM, Mattmann, Chris A (388J)
> <[EMAIL PROTECTED]> wrote:
>> I'm glad that you brought that up! :)
>>
>> Check out:
>>
>> http://incubator.apache.org/projects/sis.html
>>
>> We're just starting to tackle that very issue right
>> now...patches/ideas/contributions welcome.
>>
>> Cheers,
>> Chris
>>
>> On 3/1/10 11:25 AM, "Michael McCandless" <[EMAIL PROTECTED]> wrote:
>>
>> Because the code dup with analyzers is only one of the problems to
>> solve. In fact, it's the easiest of the problems to solve (that's why
>> I proposed it, only, first).
>>
>> A more differentiating example is a much less mature module....
>>
>> EG take spatial -- if Solr were its own TLP, how could spatial be
>> built out in a way that we don't waste effort, and so that both direct
>> Lucene and Solr users could use it when it's released?
>>
>> Mike
>>
>> On Mon, Mar 1, 2010 at 1:07 PM, Mattmann, Chris A (388J)
>> <[EMAIL PROTECTED]> wrote:
>>> Hi Mike,
>>>
>>> I'm not sure I follow this line of thinking: how would Solr being a TLP
>>> affect the creation of a separate project/module for Analyzers any more so
>>> than it not being a TLP? Both Lucene-java and Solr (as a TLP) could depend
>>> on the newly created refactored Analysis project.
>>>
>>> Chris
>>>
>>> On 3/1/10 10:44 AM, "Michael McCandless" <[EMAIL PROTECTED]> wrote:
>>>
>>> If we don't somehow first address the code duplication across the 2
>>> projects, making Solr a TLP will make things worse.
>>>
>>> I started here with analysis because I think that's the biggest pain
>>> point: it seemed like an obvious first step to fixing the code
>>> duplication and thus the most likely to reach some consensus. And
>>> it's also very timely: Robert is right now making all kinds of great
>>> fixes to our collective analyzers (in between bouts of fuzzy DFA
>>> debugging).
>>>
>>> But it goes beyond analyzers: I'd like to see other modules, now in
>>> Solr, eventually moved to Lucene, because they really are "core"
>>> functionality (eg facets, function (and other?) queries, spatial,
>>> maybe improvements to spellchecker/highlighter). How can we do this?
>>>
>>> And how can we do this so that it "lasts" over time? If new cool
>>> "core" things are born in Solr-land (which of course happens a lot --
>>> lots of good healthy usage), how will they find their way back to
>>> Lucene?
>>>
>>> Yonik's proposal (merging development of Solr/Lucene, but keeping all
>>> else separate) would achieve this.
>>>
>>> If we do the opposite (Solr -> TLP), how could we possibly achieve
>>> this?
>>>
>>> I guess one possibility is to just suck it up and duplicate the code.
>>> Meaning, each project will have to manually merge fixes in from the
>>> other project (so long as there's someone around with the itch to do
>>> so). Lucene would copy in all of Solr's analysis, and vice-versa (and
>>> likewise other dup'd functionality). I really dislike this
>>> solution... it will confuse the daylights out of users, its error

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: [EMAIL PROTECTED]
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++