|
Mark Miller
2008-12-02, 23:43
John Wang
2008-12-03, 00:02
Mark Miller
2008-12-03, 00:11
John Wang
2008-12-03, 00:22
Michael McCandless
2008-12-03, 17:14
Doug Cutting
2008-12-03, 18:07
John Wang
2008-12-03, 19:26
John Wang
2008-12-03, 19:43
Grant Ingersoll
2008-12-03, 22:52
John Wang
2008-12-04, 05:36
robert engels
2008-12-04, 05:49
Robert Muir
2008-12-04, 06:13
John Wang
2008-12-04, 06:24
eks dev
2008-12-04, 06:36
Robert Muir
2008-12-04, 07:03
John Wang
2008-12-04, 07:10
John Wang
2008-12-04, 07:25
Robert Muir
2008-12-04, 07:27
John Wang
2008-12-04, 07:45
Robert Muir
2008-12-04, 07:58
John Wang
2008-12-04, 09:00
Michael McCandless
2008-12-04, 11:32
Mark Miller
2008-12-04, 11:42
Grant Ingersoll
2008-12-04, 13:24
John Wang
2008-12-04, 16:48
Doug Cutting
2008-12-04, 18:46
Jason Rutherglen
2008-12-04, 19:21
Doug Cutting
2008-12-04, 20:01
Jason Rutherglen
2008-12-04, 21:24
Grant Ingersoll
2008-12-04, 23:23
John Wang
2008-12-05, 00:18
Doug Cutting
2008-12-05, 17:18
John Wang
2008-12-05, 19:23
Michael McCandless
2008-12-05, 20:07
John Wang
2008-12-05, 21:10
Doug Cutting
2008-12-05, 21:13
Doug Cutting
2008-12-05, 21:23
John Wang
2008-12-05, 21:41
Michael McCandless
2008-12-05, 21:47
Jason Rutherglen
2008-12-05, 22:24
Doug Cutting
2008-12-05, 22:40
Jason Rutherglen
2008-12-06, 00:02
eks dev
2008-12-08, 21:37
robert engels
2008-12-08, 21:51
Erik Hatcher
2008-12-08, 22:40
robert engels
2008-12-08, 22:49
Earwin Burrfoot
2008-12-08, 22:53
robert engels
2008-12-08, 22:56
markharw00d
2008-12-08, 23:10
Grant Ingersoll
2008-12-09, 17:18
|
-
Re: [jira] Commented: (LUCENE-1473) Implement Externalizable in main top level searcher classes
Mark Miller 2008-12-02, 23:43

Whoa! I think you got the wrong impression. I think Doug said basically what I was thinking (if not a bit more clearly than I was thinking it). I think we are all open to any good patches. It's nice to understand and discuss them first though.

To reiterate what Doug mentioned, sometimes you implement Serializable for RMI but you don't want to fully support it. Maybe it's not great Java, but it's common enough, and it makes sense to me in certain instances.

- Mark

On Dec 2, 2008, at 6:30 PM, "John Wang (JIRA)" <[EMAIL PROTECTED]> wrote:

> [ https://issues.apache.org/jira/browse/LUCENE-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12652594#action_12652594 ]
>
> John Wang commented on LUCENE-1473:
> -----------------------------------
>
> The fact that an object implements Serializable implies this object can
> be serialized. It is a known good Java programming practice to include a
> suid in the class (as a static variable) when the object declares itself
> to be Serializable. If it is not meant to be serialized, why did it
> implement Serializable? Furthermore, what is the reason to avoid it being
> serialized? I find the reason being the cost of support kinda ridiculous;
> it seems this reason can be applied to any bug fix, because at the end of
> the day, it is a bug.
>
> I don't understand the issue of "extra bytes" in the term dictionary if
> the Term instance is not actually serialized to the index (at least I
> really hope that is not done).
>
> The serialVersionUID (suid) is a long because it is a Java thing. Here is
> a link to some information on the subject:
> http://java.sun.com/developer/technicalArticles/Programming/serialization/
>
> Use case: deploying Lucene in a distributed environment, we have a
> broker/server architecture (standard stuff), and we want to roll out
> search servers with Lucene 2.4 instance by instance. The problem is that
> the broker is sending a Query object to the searcher via Java
> serialization at the server level, and the broker is running 2.3. And
> because of specifically this problem, 2.3 brokers cannot talk to 2.4
> search servers even when the Query object was not changed.
>
> To me, this is a very valid use case. The problem was that two different
> people did the release with different compilers.
>
> At the risk of pissing off the Lucene powerhouse, I feel I have to
> express some candor. I am growing more and more frustrated with the lack
> of the open source nature of this project and its unwillingness to work
> with the developer community. This is a rather trivial issue, and it is
> taking 7 back-and-forths to reiterate some standard Java behavior that
> has been around for years.
>
> Lucene is a great project and has enjoyed great success, and I think it
> is in everyone's interest to make sure Lucene grows in a healthy
> environment.
>
>> Implement Externalizable in main top level searcher classes
>> -----------------------------------------------------------
>>
>> Key: LUCENE-1473
>> URL: https://issues.apache.org/jira/browse/LUCENE-1473
>> Project: Lucene - Java
>> Issue Type: Bug
>> Components: Search
>> Affects Versions: 2.4
>> Reporter: Jason Rutherglen
>> Priority: Minor
>> Attachments: LUCENE-1473.patch
>>
>> To maintain serialization compatibility between Lucene versions,
>> major classes can implement Externalizable. This will make
>> Serialization faster due to no reflection required and maintain
>> backwards compatibility.
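The practice John describes, declaring the suid as a static field, can be sketched as follows. This is a minimal illustration only: `PinnedTerm` and `PinnedTermDemo` are hypothetical stand-ins, not Lucene classes, and the value `1L` is an arbitrary choice. Once the field is declared, the serialization runtime uses it instead of deriving a value from the shape of the class.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamClass;
import java.io.Serializable;

// Hypothetical stand-in for a class such as Term; not the real Lucene class.
final class PinnedTerm implements Serializable {
    // Declared explicitly, so adding a constructor or method later
    // does not change the stream version of the class.
    private static final long serialVersionUID = 1L;

    final String field;
    final String text;

    PinnedTerm(String field, String text) {
        this.field = field;
        this.text = text;
    }
}

final class PinnedTermDemo {
    // Serializes a value to a byte array with plain Java serialization.
    static byte[] toBytes(Serializable value) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(value);
        }
        return buffer.toByteArray();
    }

    // Reads back one object from a byte array produced by toBytes.
    static Object fromBytes(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        PinnedTerm copy = (PinnedTerm) fromBytes(toBytes(new PinnedTerm("title", "lucene")));
        System.out.println(copy.field + ":" + copy.text);
        // Reports the declared value rather than a computed hash.
        System.out.println(ObjectStreamClass.lookup(PinnedTerm.class).getSerialVersionUID());
    }
}
```

Pinning the value only tells the runtime that two versions of the class claim to be compatible; whether the fields still line up across versions remains the maintainer's responsibility.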
-
Re: [jira] Commented: (LUCENE-1473) Implement Externalizable in main top level searcher classes
John Wang 2008-12-03, 00:02

I have described our use case in good detail. I think it is a common architecture. And we are not using RemoteSearcher. This problem is not tied to RemoteSearcher, and we are not using RMI. Serialized Java objects can be used in places other than RMI.

"sometimes you implement Serializable for RMI but you don't want to fully support it. Maybe it's not great Java, but it's common enough, and it makes sense to me in certain instances." - does not make sense to me. There are lots of bugs that are common, e.g. thread-safety, deadlock, memory leak, and they are bad Java; that doesn't mean they should not be addressed.

Pardon me for being blunt, but this is really a bug: the expected behavior stated by the API is not honored. It would have been avoided if the same compiler had been used for the release; with Java being WORA, this smells like a bug to me.

My frustration is not unfounded. Here are some examples I personally ran into:

https://issues.apache.org/jira/browse/LUCENE-1246: simple 1 line null check, over 8 months, and still being "discussed"

https://issues.apache.org/jira/browse/SOLR-243: with 4 votes, also only a few lines of change when the patch was originally done, over 18 months, and still being "discussed"

-John
-
Re: [jira] Commented: (LUCENE-1473) Implement Externalizable in main top level searcher classes
Mark Miller 2008-12-03, 00:11

I worked on getting both of those issues resolved :) Sorry, can't please everyone. If it helps, I'll commit that second one soon now that I can.

It's lazy consensus around here, man. Maybe it's not ideal, but I think the product speaks well for itself so far. I've never met a more accommodating group of guys myself. It is in large part a volunteer effort.

- Mark
-
Re: [jira] Commented: (LUCENE-1473) Implement Externalizable in main top level searcher classes
John Wang 2008-12-03, 00:22

If you guys need help, maybe you guys should expand your committer list?

"product speaks well for itself so far": from what I have heard, lots of people are just branching off the code base, making changes, and doing merges every release. I really don't want to do that here, but I am being forced down that road. What I think should be avoided is what happened to Linux, where there are different versions of the kernel; that is, ending up with different versions of the Lucene project.

Don't get me wrong, I think it is one of the best projects out there. But sometimes I think you guys should listen to the community a bit more, instead of presuming how the product is used.

Anyway, thanks for looking into those issues.

-John
-
Re: [jira] Commented: (LUCENE-1473) Implement Externalizable in main top level searcher classes
Michael McCandless 2008-12-03, 17:14

John Wang wrote:

> It would have been avoided if the same compiler was used for the
> release,

I took the same compiler (Sun JDK 1.6.0_06) and used the "serialver" tool to compute the SUID for Term.java, and it reports "554776219862331599L" for 2.4.0 and "435090971444481257L" for 2.3.2. In other words, the addition of "public Term(String field)" changed the SUID.

Then I tried Sun JDK 1.4.2_15, and it reports the same results.

Mike
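Mike's check can also be reproduced from inside a program. The sketch below is an assumption-laden illustration, not what Mike ran: `UnpinnedTerm` and `SuidProbe` are hypothetical names, and the class is not Lucene's `Term`. It asks `java.io.ObjectStreamClass` for the same value that `serialver` prints. When no `serialVersionUID` field is declared, the runtime derives the value from a hash over the class description, which includes non-private constructors and methods, so adding a public constructor is enough to change it.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

// Hypothetical class with no declared serialVersionUID, so the value is computed.
final class UnpinnedTerm implements Serializable {
    String field;
    String text;

    UnpinnedTerm(String field, String text) {
        this.field = field;
        this.text = text;
    }
}

final class SuidProbe {
    // Returns the stream version the serialization runtime would use for the class.
    static long suidOf(Class<?> type) {
        ObjectStreamClass descriptor = ObjectStreamClass.lookup(type);
        if (descriptor == null) {
            // lookup returns null for classes that are not Serializable.
            throw new IllegalArgumentException(type.getName() + " is not Serializable");
        }
        return descriptor.getSerialVersionUID();
    }

    public static void main(String[] args) {
        System.out.println(UnpinnedTerm.class.getName() + ": " + suidOf(UnpinnedTerm.class) + "L");
    }
}
```

Running this against two builds of the same class, one before and one after a constructor is added, should show two different values, which matches the difference Mike saw between 2.3.2 and 2.4.0.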
-
Re: [jira] Commented: (LUCENE-1473) Implement Externalizable in main top level searcher classes
Doug Cutting 2008-12-03, 18:07

John Wang wrote:

> If you guys need help, maybe you guys should expand your committer list?

Committers are added when they've contributed a series of high-quality patches that have been committed, and demonstrated their ability to be easy to work with. Displaying anger is not a good way to become a committer. Calm persistence is advised.

Lucene does not currently use Java Serialization much. Many committers may not be terribly familiar with it.

> Use case: deploying lucene in a distributed environment, we have a
> broker/server architecture. (standard stuff), we want roll out search
> servers with lucene 2.4 instance by instance. The problem is that the
> broker is sending a Query object to the searcher via java
> serialization at the server level, and the broker is running 2.3. And
> because of specifically this problem, 2.3 brokers cannot to talk to
> 2.4 search servers even when the Query object was not changed.

Thanks for providing a use case. One way to address this would be for Lucene to better support cross-version serialization. Another way might be for your application, which adds this requirement, to use an alternate representation for queries that it can guarantee is compatible across versions, e.g., a string. Might that be possible?

Doug
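Doug's second option, an application-owned representation such as a string, might look roughly like this. Everything here is hypothetical: `QueryMessage`, its fields, and the three-line layout are invented for illustration, and the receiving side is assumed to turn `queryText` back into a query with whatever parser it already uses. The point is that the wire format is defined by the application and carries its own version number, so it does not depend on the class shape of any library type.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical wire message: the broker sends query text, not a serialized Query object.
final class QueryMessage {
    // Version of this application-defined format, independent of any library version.
    static final int FORMAT_VERSION = 1;

    final String queryText;
    final int maxHits;

    QueryMessage(String queryText, int maxHits) {
        this.queryText = queryText;
        this.maxHits = maxHits;
    }

    // Layout: "<formatVersion>\n<maxHits>\n<queryText>", encoded as UTF-8.
    byte[] encode() {
        return (FORMAT_VERSION + "\n" + maxHits + "\n" + queryText).getBytes(StandardCharsets.UTF_8);
    }

    // Parses a message produced by encode and rejects unknown format versions.
    static QueryMessage decode(byte[] bytes) {
        // Limit of 3 keeps any newlines inside the query text intact.
        String[] parts = new String(bytes, StandardCharsets.UTF_8).split("\n", 3);
        if (parts.length != 3) {
            throw new IllegalArgumentException("malformed message");
        }
        int version = Integer.parseInt(parts[0]);
        if (version != FORMAT_VERSION) {
            throw new IllegalArgumentException("unsupported format version " + version);
        }
        return new QueryMessage(parts[2], Integer.parseInt(parts[1]));
    }
}

final class QueryMessageDemo {
    public static void main(String[] args) {
        QueryMessage received = QueryMessage.decode(new QueryMessage("title:lucene", 10).encode());
        System.out.println(received.queryText + " (max " + received.maxHits + " hits)");
    }
}
```

A broker and a search server built against different library releases can exchange this message as long as both understand the same `FORMAT_VERSION`, which is what makes instance-by-instance rollouts possible in this sketch.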
-
Re: [jira] Commented: (LUCENE-1473) Implement Externalizable in main top level searcher classes
John Wang 2008-12-03, 19:26

Doug:

My apologies if I came off seeming angry and/or trying to lobby to be a committer. Neither is the case.

I am expressing a concern with how patches are being handled in this project, and providing my viewpoint on how this can be better managed. Of course my concern can be either accepted or rejected. I just hope the committers will be "calm" enough to see criticisms for what they are.

I am a strong advocate of Lucene, hence my passion for its success.

-John
-
Re: [jira] Commented: (LUCENE-1473) Implement Externalizable in main top level searcher classes
John Wang 2008-12-03, 19:43

You are right, we can always transmit the string form and re-parse on the other end. Our problem is that we took this (serialization nature) for granted, and once something is deployed over a cluster, it would be difficult to do partial roll-outs in this case. But I guess there is no immediate remedy for this.

Since we all agree careful scrutiny is a good thing:

ScoreDocComparator.sortValue(), according to its javadoc: "The object returned must implement the java.io.Serializable interface." This has implications for how a distributed system should be designed around Lucene, in my case for result merging. You cannot transmit Strings or any other representations around, because you don't know what the Comparable instance is (when SortField.type is set to Custom).

I am curious, how would distributed Solr handle this without resorting to Java serialization?

A side note: do you think returning Comparable here is good API design? Shouldn't it be some sub-interface that extends both Comparable and Serializable, instead of resorting to javadoc?

Thanks

-John

On Wed, Dec 3, 2008 at 10:19 AM, Doug Cutting (JIRA) <[EMAIL PROTECTED]> wrote:

> [ https://issues.apache.org/jira/browse/LUCENE-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12652882#action_12652882 ]
>
> Doug Cutting commented on LUCENE-1473:
> --------------------------------------
>
> > But, what's now being asked for (expected) with this issue is
> > "long-term persistence", which is really a very different beast and a
> > much taller order.
>
> That's the crux, alright. Does Lucene want to start adding cross-version
> guarantees about the durability of its objects when serialized by Java
> serialization? This is a hard problem. Systems like Thrift and
> ProtocolBuffers offer support for this, but Java Serialization itself
> doesn't really provide much assistance. One can roll one's own
> serialization compatibility story manually, as proposed by this patch,
> but that adds a burden to the project. We'd need, for example, test cases
> that keep serialized instances from past versions, so that we can be sure
> that patches do not break this.
>
> The use case provided may not use RMI, but it is similar: it involves
> transmitting Lucene objects over the wire between different versions of
> Lucene. Since Java APIs, like Lucene, do not generally provide
> cross-version compatibility, it would be safer to architect such a system
> so that it controls the serialization of transmitted instances itself and
> can thus guarantee their compatibility as the system is updated. Thus it
> would develop its own representations for queries independent of Lucene's
> Query, and map this to Lucene's Query. Is that not workable in this case?
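John's side note about expressing the "must be Serializable" rule in the type system rather than in javadoc can be sketched two ways. Both are hypothetical and neither is part of Lucene: `SerializableComparable` is an invented sub-interface, and `larger` is an invented helper that uses a generic intersection bound instead. `PriceKey` is a made-up custom sort value.

```java
import java.io.Serializable;

// Option 1 (hypothetical): a sub-interface that combines both requirements.
interface SerializableComparable<T> extends Comparable<T>, Serializable {
}

// A made-up custom sort value that satisfies the combined contract.
final class PriceKey implements SerializableComparable<PriceKey> {
    private static final long serialVersionUID = 1L;

    final long cents;

    PriceKey(long cents) {
        this.cents = cents;
    }

    @Override
    public int compareTo(PriceKey other) {
        return Long.compare(cents, other.cents);
    }
}

final class SortValueDemo {
    // Option 2: an intersection bound. The compiler rejects any argument type
    // that is not both Comparable and Serializable.
    static <T extends Comparable<T> & Serializable> T larger(T a, T b) {
        return a.compareTo(b) >= 0 ? a : b;
    }

    public static void main(String[] args) {
        // Existing JDK types such as String already satisfy the bound.
        System.out.println(larger("apple", "banana"));
        System.out.println(larger(new PriceKey(500), new PriceKey(300)).cents);
    }
}
```

The trade-off, as far as this sketch goes: a dedicated sub-interface documents intent but excludes existing types such as `String` and `Integer`, which cannot be retrofitted to implement it, while the intersection bound accepts them unchanged.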
-
Re: [jira] Commented: (LUCENE-1473) Implement Externalizable in main top level searcher classes
Grant Ingersoll 2008-12-03, 22:52

On Dec 3, 2008, at 2:27 PM, Jason Rutherglen (JIRA) wrote:

> Hoss wrote: "sort of mythical "Lucene powerhouse"
> Lucene seems to run itself quite differently than other open source
> Java projects. Perhaps it would be good to spell out the reasons
> for the reluctance to move ahead with features that developers work
> on, that work, but do not go in. The developer contributions seem
> to be quite low right now, especially compared to neighbor projects
> such as Hadoop. Is this because fewer people are using Lucene? Or
> is it due to the reluctance to work with the developer community?
> Unfortunately the perception in the eyes of some people who work on
> search related projects it is the latter.

Or, could it be that Hadoop is relatively new and in vogue at the moment, very malleable and buggy(?) and has a HUGE corporate sponsor who dedicates lots of resources to it on a full time basis, whilst Lucene has been around in the ASF for 7+ years (and 12+ years total) and has a really large install base and thus must move more deliberately and basically has 1 person who gets to work on it full time while the rest of us pretty much volunteer? That's not an excuse, it's just the way it is. I, personally, would love to work on Lucene all day every day as I have a lot of things I'd love to engage the community on, but the fact is I'm not paid to do that, so I give what I can when I can. I know most of the other committers are that way too.

Thus, I don't think any one of us has a reluctance to move ahead with features or bug fixes. Looking at CHANGES.txt, I see a lot of contributors. Looking at java-dev and JIRA, I see lots of engagement with the community. Is it near the historical high for traffic? No, it's not, but that isn't necessarily a bad thing. I think it's a sign that Lucene is pretty stable.

What we do have a reluctance for are patches that don't have tests (i.e. this one), patches that massively change Lucene APIs in non-trivial ways or break back compatibility, or patches that are not kept up to date. Are we perfect? Of course not. I, personally, would love for there to be a way that helps us process a larger volume of patches (note, I didn't say commit a larger volume). Hadoop's automated patch tester would be a huge start in that, but at the end of the day, Lucene still works the way all ASF projects do: via meritocracy and volunteerism. You want stuff committed, keep it up to date, make it manageable to review, document it, respond to questions/concerns with answers as best you can. To that end, a real simple question can go a long way toward getting something committed, and it simply is: "Hey Luceners, what else can I do to help you review and commit LUCENE-XXXX?"

Lather, rinse, repeat. Next thing you know, you'll be on the receiving end as a committer.

-Grant
-
Re: [jira] Commented: (LUCENE-1473) Implement Externalizable in main top level searcher classes
John Wang 2008-12-04, 05:36

Grant:

I am sorry that I disagree with some points:

1) "I think it's a sign that Lucene is pretty stable." - Lucene is a great project, and great improvements have been made, especially with the 2.x releases, but do we really have a clear picture of how Lucene is being used and deployed? While Lucene works great running as a vanilla search library, when pushed to its limits, one needs to "hack" into Lucene to make certain things work. If 90% of the user base use it to build small indexes with the vanilla API, and the other 10% are really stressing both the scalability and the API side and are running into issues, would you still say: "running well for 90% of the users, therefore it is stable or extensible"? I think it is unfair to the project itself to be measured by the vanilla use case. I have done a couple of large deployments, e.g. >30 million documents indexed and searched in real time, and I really had to do some tweaking.

2) "You want stuff committed, keep it up to date, make it manageable to review, document it, respond to questions/concerns with answers as best you can." - To some degree I would hope it depends on what the issue is, e.g. enforcing such a process on a one-line null check seems to be overkill. I agree with the process itself; what would make it better is some transparency on how patches/issues are evaluated to be committed. At least as seen from the outside, it is purely being decided on by the committers, and since my understanding is that an open source project belongs to the public, the public user base should have some say.

3) Which brings me to this point: "I personally, would love to work on Lucene all day every day as I have a lot of things I'd love to engage the community on, but the fact is I'm not paid to do that, so I give what I can when I can. I know most of the other committers are that way too." - Is this really true? Isn't a large part of the committer base also a part of the for-profit consulting business, e.g. Lucid? Would groups/companies that pay for consulting service get their patches/requirements committed with higher priority? If so, it seems to me there is a conflict of interest there.

4) "Lather, rinse, repeat. Next thing you know, you'll be on the receiving end as a committer." - While I agree that being a committer is a great honor and many committers are awesome, assuming everyone would want to be a committer is a little presumptuous.

In conclusion, I hope I didn't unleash any wrath from the committers for expressing candor.

-John
-
Re: [jira] Commented: (LUCENE-1473) Implement Externalizable in main top level searcher classesrobert engels 2008-12-04, 05:49
My two cents...
I think the committers do a great job of managing the product. I feel the single biggest failure when it comes to producing quality software is lack of vision, and/or enforcement of this vision. If every "wisher" or "submitter" had their code committed - even if it is "good code" - the product would quickly become unwieldy to maintain and/or learn (for new users), lessening its usefulness to everyone.

The only problem I have with Lucene's current focus is that I feel the Lucene folks should work on standardizing the API, focusing on interfaces and/or abstract classes with proper protected level access. By doing this, people are much freer to develop their own enhancements, and can quickly apply them to later Lucene releases just by applying a patch (at worst), or just a link (at best!). Similar to how the JDK works. We have rarely if ever needed to change our code between JDK releases.

I realize this is a dream right now, because of the bad shape (sorry) of the structure of much of Lucene, but if the committers spent more time on issues like this, I think they would hear far fewer complaints from the community.

As an example of the above - being able to access the underlying readers in a multi-reader (I know there is a current bug for this). There is no harm to Lucene folks to expose this, and it is very helpful in many cases. If some developer uses this information in the wrong way, that is their fault, not Lucene's.... Making something protected is very different than making it public.

Robert Engels

On Dec 3, 2008, at 11:36 PM, John Wang wrote:
> Grant:
>
> I am sorry that I disagree with some points:
>
> 1) "I think it's a sign that Lucene is pretty stable." - While
> lucene is a great project, especially with 2.x releases, great
> improvements are made, but do we really have a clear picture on how
> lucene is being used and deployed.
While lucene works great running > as a vanilla search library, when pushed to limits, one needs to > "hack" into lucene to make certain things work. If 90% of the user > base use it to build small indexes and using the vanilla api, and > the other 10% is really stressing both on the scalability and api > side and are running into issues, would you still say: "running > well for 90% of the users, therefore it is stable or extensible"? I > think it is unfair to the project itself to be measured by the > vanilla use-case. I have done couple of large deployments, e.g. >30 > million documents indexed and searched in realtime., and I really > had to do some tweaking. > > 2) "You want stuff committed, keep it up to date, make it > manageable to review, document it, respond to questions/concerns > with answers as best you can. " - To some degree I would hope it > depends on what the issue is, e.g. enforcing such process on a one- > line null check seems to be an overkill. I agree with the process > itself, what would make it better is some transparency on how > patches/issues are evaluated to be committed. At least seemed from > the outside, it is purely being decided on by the committers, and > since my understanding is that an open source project belongs to > the public, the public user base should have some say. > > 3) which brings me to this point: "I personally, would love to work > on Lucene all day every day as I have a lot of things I'd love to > engage the community on, but the fact is I'm not paid to do that, > so I give what I can when I can. I know most of the other > committers are that way too." - Is this really true? Isn't a large > part of the committer base also a part of the for-profit, > consulting business, e.g. Lucid? Would groups/companies that pay > for consulting service get their patches/requirements committed > with higher priority? If so, seems to me to be a conflict of > interest there. > > 4) "Lather, rinse, repeat. 
Next thing you know, you'll be on the
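To make the point above about "proper protected level access" concrete, here is a minimal sketch of the idea, assuming nothing about Lucene's actual classes. `DocReader`, `SegmentLikeReader`, `CompositeReader`, `getSubReaders` and `LargestSegmentReader` are all hypothetical names invented for illustration; the only claim is the general Java pattern of exposing internals to subclasses without putting them in the public API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-ins for illustration only; these are NOT Lucene classes.
interface DocReader {
    int numDocs();
}

class SegmentLikeReader implements DocReader {
    private final int numDocs;

    SegmentLikeReader(int numDocs) {
        this.numDocs = numDocs;
    }

    @Override
    public int numDocs() {
        return numDocs;
    }
}

// A "multi-reader" that presents several sub-readers as one.
class CompositeReader implements DocReader {
    private final List<DocReader> subReaders;

    CompositeReader(List<? extends DocReader> subReaders) {
        this.subReaders = new ArrayList<DocReader>(subReaders);
    }

    @Override
    public int numDocs() {
        int total = 0;
        for (DocReader r : subReaders) {
            total += r.numDocs();
        }
        return total;
    }

    // Protected rather than public: visible to subclasses (and, in Java,
    // to the same package), but not part of the API ordinary callers see.
    // The list is returned read-only so extensions cannot mutate it.
    protected List<DocReader> getSubReaders() {
        return Collections.unmodifiableList(subReaders);
    }
}

// A user-side extension that needs the underlying readers. It compiles
// against the protected accessor instead of patching CompositeReader.
class LargestSegmentReader extends CompositeReader {
    LargestSegmentReader(List<? extends DocReader> subReaders) {
        super(subReaders);
    }

    int largestSubReaderSize() {
        int max = 0;
        for (DocReader r : getSubReaders()) {
            max = Math.max(max, r.numDocs());
        }
        return max;
    }
}
```

With this shape, an extension such as `LargestSegmentReader` keeps working across releases as long as the protected accessor stays stable, which is the kind of "just a link" upgrade described above. Whether a given Lucene class should expose its internals this way is a design decision for the project, not something this sketch settles.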
-
Re: [jira] Commented: (LUCENE-1473) Implement Externalizable in main top level searcher classesRobert Muir 2008-12-04, 06:13
sorry gotta speak up on this. i indexed 300m docs today. I'm using an out of box jar.

yeah i have some special subclasses but if i thought any of this stuff was general enough to be useful to others i'd submit it. I'm just happy to have something scalable that i can customize to my peculiarities.

so i think i fit in your 10% and im not stressing on either scalability or api.

thanks,
robert

On Thu, Dec 4, 2008 at 12:36 AM, John Wang <[EMAIL PROTECTED]> wrote:
> Grant:
> I am sorry that I disagree with some points:
>
> 1) "I think it's a sign that Lucene is pretty stable." - While lucene is a
> great project, especially with 2.x releases, great improvements are made,
> but do we really have a clear picture on how lucene is being used and
> deployed. While lucene works great running as a vanilla search library, when
> pushed to limits, one needs to "hack" into lucene to make certain things
> work. If 90% of the user base use it to build small indexes and using the
> vanilla api, and the other 10% is really stressing both on the scalability
> and api side and are running into issues, would you still say: "running well
> for 90% of the users, therefore it is stable or extensible"? I think it is
> unfair to the project itself to be measured by the vanilla use-case. I have
> done couple of large deployments, e.g. >30 million documents indexed and
> searched in realtime., and I really had to do some tweaking.

--
Robert Muir
[EMAIL PROTECTED]
-
Re: [jira] Commented: (LUCENE-1473) Implement Externalizable in main top level searcher classesJohn Wang 2008-12-04, 06:24
Nice!
Some questions:

1) one index?
2) how big is your document? e.g. how many terms etc.
3) are you serving (searching) the docs in realtime?
4) search speed?

I'd love to learn more about your architecture.

-John

On Wed, Dec 3, 2008 at 10:13 PM, Robert Muir <[EMAIL PROTECTED]> wrote:
> sorry gotta speak up on this. i indexed 300m docs today. I'm using an out
> of box jar.
>
> yeah i have some special subclasses but if i thought any of this stuff was
> general enough to be useful to others i'd submit it. I'm just happy to have
> something scalable that i can customize to my peculiarities.
>
> so i think i fit in your 10% and im not stressing on either scalability or
> api.
>
> thanks,
> robert
-
Re: [jira] Commented: (LUCENE-1473) Implement Externalizable in main top level searcher classeseks dev 2008-12-04, 06:36
John,
Sorry, I have to comment, but I sense some substantial misconceptions about Open Source here.

1) "e.g. >30 million documents indexed and searched in realtime., and I really had to do some tweaking."

So what? What do I or anyone else have to do with it? "Some tweaking" is definitely better than building everything from scratch or going to commercial vendors... no?

2) "what would make it better is some transparency on how patches/issues are evaluated to be committed. At least seemed from the outside, it is purely being decided on by the committers, and since my understanding is that an open source project belongs to the public, the public user base should have some say."

Transparency: Jira + this mailing list. Everybody is allowed to express an opinion, *even committers*; whether you like it or not is just another question. If you put up convincing arguments, be assured even committers can change opinions. Imo, it does not get much more transparent than that.

Sure it belongs to the public, you do not have to pay for it, read the ASF License. If you have a better proposal on how to organize Open Source projects, speak up. I do not know how we could ever avoid committers having the final say on things without provoking chaos.

3) "Would groups/companies that pay for consulting service get their patches/requirements committed with higher priority?"

Sure, of course, *even commercial users are part of the community* and we should be grateful that they contribute and commit their resources so that others can benefit from it. Think again about it, there is absolutely nothing bad behind it, no conspiracy.

Just one example on a micro scale. I had an itch and had to do some "tweaking"; my customer (commercial) had nothing against contributing back to Lucene, so I did it. I get my money and I give something back to the community. End result: I am happy, Lucene gets better and everybody profits a bit from it. Should I have problems with my conscience? I do not think so.

Conflict of interest? No, that is rather evolution. Why do you think committers work on Lucene? Do you honestly believe they have no families to feed and just sit and wait for someone to feed them proposals for nice features? Committers, as well as everybody else here, have their own private agendas, goals, ideas, needs ... and all these things get somehow conflated into Lucene.

Back to my example: I was lucky that a few committers shared my opinion about the usefulness and priority of this patch; it could have been different. If all committers were busy with private agendas and had higher priorities at that moment, well, that would have been bad luck for me. No hard feelings even in that case; why should I expect someone to make my itch their priority?

Cheers, eks

________________________________
From: John Wang <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Sent: Thursday, 4 December, 2008 6:36:28
Subject: Re: [jira] Commented: (LUCENE-1473) Implement Externalizable in main top level searcher classes

Grant:

I am sorry that I disagree with some points:

1) "I think it's a sign that Lucene is pretty stable." - While lucene is a great project, especially with 2.x releases, great improvements are made, but do we really have a clear picture on how lucene is being used and deployed. While lucene works great running as a vanilla search library, when pushed to limits, one needs to "hack" into lucene to make certain things work. If 90% of the user base use it to build small indexes and using the vanilla api, and the other 10% is really stressing both on the scalability and api side and are running into issues, would you still say: "running well for 90% of the users, therefore it is stable or extensible"? I think it is unfair to the project itself to be measured by the vanilla use-case. I have done couple of large deployments, e.g. >30 million documents indexed and searched in realtime., and I really had to do some tweaking.
2) "You want stuff committed, keep it up to date, make it manageable to review, document it, respond to questions/concerns with answers as best you can. " - To some degree I would hope it depends on what the issue is, e.g. enforcing such process on a one-line null check seems to be an overkill. I agree with the process itself, what would make it better is some transparency on how patches/issues are evaluated to be committed. At least seemed from the outside, it is purely being decided on by the committers, and since my understanding is that an open source project belongs to the public, the public user base should have some say. 3) which brings me to this point: "I personally, would love to work on Lucene all day every day as I have a lot of things I'd love to engage the community on, but the fact is I'm not paid to do that, so I give what I can when I can. I know most of the other committers are that way too." - Is this really true? Isn't a large part of the committer base also a part of the for-profit, consulting business, e.g. Lucid? Would groups/companies that pay for consulting service get their patches/requirements committed with higher priority? If so, seems to me to be a conflict of interest there. 4) "Lather, rinse, repeat. Next thing you know, you'll be on the receiving end as a committer." - While I agree that being a committer is a great honor and many committers are awesome, but assuming everyone would want to be a committer is a little presumptuous. In conclusion, I hope I didn't unleash any wrath from the committers for expressing candor. -John On Wed, Dec 3, 2008 at 2:52 PM, Grant Ingersoll <[EMAIL PROTECTED]> wrote: On Dec 3, 2008, at 2:27 PM, Jason Rutherglen (JIRA) wrote: Hoss wrote: "sort of mythical "Lucene powerhouse" Lucene seems to run itself quite differently than other open source Java projects. 
Perhaps it would be good to spell out the reasons for the reluctance to move ahead with features that developers work on, that work, but do not go in. Th
-
Re: [jira] Commented: (LUCENE-1473) Implement Externalizable in main top level searcher classesRobert Muir 2008-12-04, 07:03
On Thu, Dec 4, 2008 at 1:24 AM, John Wang <[EMAIL PROTECTED]> wrote:
> Nice!
> Some questions:
>
> 1) one index?

no, but two individual ones today were around 100M docs

> 2) how big is your document? e.g. how many terms etc.

last one built has over 4M terms

> 3) are you serving(searching) the docs in realtime?

i dont understand this question, but searching is slower if i am indexing on a disk thats also being searched.

> 4) search speed?

usually subsecond (or close) after some warmup. while this might seem slow its fast compared to the competition, trust me.

> I'd love to learn more about your architecture.

i hate to say you would be disappointed, but theres nothing fancy. probably why it works...

> -John

--
Robert Muir
[EMAIL PROTECTED]
-
Re: [jira] Commented: (LUCENE-1473) Implement Externalizable in main top level searcher classesJohn Wang 2008-12-04, 07:10
Thanks Robert for sharing.
Good to hear it is working for what you need it to do.

3) Especially with ReadOnlyIndexReaders, you should not be blocked while indexing, particularly if you have multicore machines.
4) Do you stay with sub-second responses under high throughput?

-John

On Wed, Dec 3, 2008 at 11:03 PM, Robert Muir <[EMAIL PROTECTED]> wrote:
> On Thu, Dec 4, 2008 at 1:24 AM, John Wang <[EMAIL PROTECTED]> wrote:
>> Nice!
>> Some questions:
>>
>> 1) one index?
> no, but two individual ones today were around 100M docs
>
>> 2) how big is your document? e.g. how many terms etc.
> last one built has over 4M terms
>
>> 3) are you serving(searching) the docs in realtime?
> i dont understand this question, but searching is slower if i am indexing
> on a disk thats also being searched.
>
>> 4) search speed?
> usually subsecond (or close) after some warmup. while this might seem slow
> its fast compared to the competition, trust me.
>
>> I'd love to learn more about your architecture.
> i hate to say you would be disappointed, but theres nothign fancy. probably
> why it works...
>
>> -John
-
Re: [jira] Commented: (LUCENE-1473) Implement Externalizable in main top level searcher classesJohn Wang 2008-12-04, 07:25
Thanks Eks for the "education".
1) If you think Lucene is good enough for you, then great. I think there is room for improvement, and wanted to share some work we did with the rest of the community through open source. If you are happy to take a snapshot of Lucene and build on top of it, then good for you.

2) Yes, there is Jira. Yet, at least it seems to me, severity and votes are not reflected in how patches get committed. Good for you that your patches get regularly committed; I guess there is a lot for me to learn from you on how to do that. Obviously being outspoken does not help. Open source politics, cool!

3) If that is how it works, then it is how it works. (Sounds a lot like the Spring project.) Seems like being a committer can be rather lucrative. My comment was on the statements about being volunteers who don't get paid, which are a little misleading.

I guess I need to learn to be a good boy and not piss off the committers anymore (or convince my company to pay to get some patches in). And hopefully someday I get to grow up and become a committer and make some $ too.

-John

On Wed, Dec 3, 2008 at 10:36 PM, eks dev <[EMAIL PROTECTED]> wrote:
> John,
> sorry I have to comment, but I feel here some substantial missconceptions
> abot Open Source
>
> 1)
> "e.g. >30 million documents indexed and searched in realtime., and I really
> had to do some tweaking."
> So what? What I or anyone else has to do with it? "some tweaking" is
> definitely better than making everything from the scratch or going to
> commercial vendors... no?
>
> 2)
> "what would make it better is some transparency on how patches/issues are
> evaluated to be committed. At least seemed from the outside, it is purely
> being decided on by the committers, and since my understanding is that an
> open source project belongs to the public, the public user base should have
> some say."
>
> Transparency, Jira + this mailing list.
Everybody is allowed to express an > opinion, *even committers* , weather you like it or not is just another > question. If you put up convincing arguments, be assured even committers can > change opinions. > Imo, it does not go much more transparent than that. > Sure it belongs to public, you do not have to pay for it, read ASF Licence. > If you have better proposal on how to organize Open Source projects, > speak-up. I do not know how we could ever avoid committers having final say > on things without provoking haos? > > 3) "Would groups/companies that pay for consulting service get their > patches/requirements committed with higher priority?" > Sure, of course, *even commmercial users are parts of the comunity* and we > schould be greatful that they contribute and commit ther resouces so that > others can benefit from it. Think again about it, there is absolutly nothing > bad behind it, no conspiracy. > Just one example on micro scale. I had an itch and had to do some > "tweaking", my customer(comercial) had nothing against contributing back to > Lucene, so I did it. I get my money and I give something back to the > comunity. End result, I am happy, Lucene gets better and everybody profits a > bit from it. > Should I have problems with my consciones? I do not think so. > > Conflict of interests, no, that is rather evolution. What do you think why > commiters work on Lucene, do you honestly beleive they have no families to > feed and just sit and wait someone feeds them with proposals for nice > features? Commiters as well as everybody else here have their own, private > agendas, goals, ideas, needs ... and all these things get somehow conflated > into Lucene. > Back to my example, I was lucky that a few commiters shared my opinion > about usfulness and the priority of this patch, it could have been > different. If all commiters were busy with private agenda and had higher > priorities at that moment, well, that would habe been bad luck for me. 
No > hard feelings even in that case, why should I expect someone puts my itch as
-
Re: [jira] Commented: (LUCENE-1473) Implement Externalizable in main top level searcher classesRobert Muir 2008-12-04, 07:27
yeah i am using read-only.
i will admit to subclassing queryparser and having customized query/scorer for several. all queries contain fuzzy queries so this was necessary.

"high" throughput i guess is a matter of opinion. in attempting to profile high-throughput, again customized query/scorer made it easy for me to simplify some things, such as some math in termquery that doesn't make sense (redundant) for my Similarity. everything is pretty much i/o bound now so if there is some throughput issue i will look into SSD for high volume indexes.

i posted on Use Cases on the wiki how I made fuzzy and regex fast if you are curious.

On Thu, Dec 4, 2008 at 2:10 AM, John Wang <[EMAIL PROTECTED]> wrote:
> Thanks Robert for sharing.
> Good to hear it is working for what you need it to do.
>
> 3) Especially with ReadOnlyIndexReaders, you should not be blocked while
> indexing. Especially if you have multicore machines.
> 4) do you stay with sub-second responses with high thru-put?
>
> -John

--
Robert Muir
[EMAIL PROTECTED]
-
Re: [jira] Commented: (LUCENE-1473) Implement Externalizable in main top level searcher classesJohn Wang 2008-12-04, 07:45
Thanks Robert, definitely interested!
We too are looking into SSDs for performance.

2.4 allows you to extend QueryParser and create your own "leaf" queries.

I am surprised you are mostly IO bound. Lucene does a good job caching. Do you do some sort of caching yourself? If your index is not changing often, there is a lot you can do without SSDs.

-John

On Wed, Dec 3, 2008 at 11:27 PM, Robert Muir <[EMAIL PROTECTED]> wrote:
> yeah i am using read-only.
>
> i will admit to subclassing queryparser and having customized query/scorer
> for several. all queries contain fuzzy queries so this was necessary.
>
> "high" throughput i guess is a matter of opinion. in attempting to profile
> high-throughput, again customized query/scorer made it easy for me to
> simplify some things, such as some math in termquery that doesn't make sense
> (redundant) for my Similarity. everything is pretty much i/o bound now so if
> tehre is some throughput issue i will look into SSD for high volume indexes.
>
> i posted on Use Cases on the wiki how I made fuzzy and regex fast if you
> are curious.
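The remark above about extending the parser to create your own "leaf" queries describes a factory-method override: the parser decides where a leaf query is needed, and a subclass decides which kind to build. The following is only a sketch of that pattern with invented stand-in types. `LeafQuery`, `ExactTerm`, `FuzzyTerm`, `SimpleParser`, `AlwaysFuzzyParser` and `newLeafQuery` are hypothetical names, not the Lucene 2.4 API, and the "parsing" is reduced to splitting `field:text`.

```java
// Hypothetical stand-ins for illustration only; this is NOT the Lucene API.
interface LeafQuery {
    String describe();
}

class ExactTerm implements LeafQuery {
    private final String field;
    private final String text;

    ExactTerm(String field, String text) {
        this.field = field;
        this.text = text;
    }

    @Override
    public String describe() {
        return field + ":" + text;
    }
}

class FuzzyTerm implements LeafQuery {
    private final String field;
    private final String text;

    FuzzyTerm(String field, String text) {
        this.field = field;
        this.text = text;
    }

    @Override
    public String describe() {
        // A trailing "~" marks the term as fuzzy in this toy notation.
        return field + ":" + text + "~";
    }
}

class SimpleParser {
    // Parses the toy syntax "field:text" and delegates construction of the
    // leaf query to an overridable factory method.
    LeafQuery parse(String input) {
        int colon = input.indexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("expected field:text, got " + input);
        }
        return newLeafQuery(input.substring(0, colon), input.substring(colon + 1));
    }

    // Extension point: subclasses choose which leaf query gets built.
    protected LeafQuery newLeafQuery(String field, String text) {
        return new ExactTerm(field, text);
    }
}

// An application-specific parser that turns every term into a fuzzy term,
// without modifying SimpleParser itself.
class AlwaysFuzzyParser extends SimpleParser {
    @Override
    protected LeafQuery newLeafQuery(String field, String text) {
        return new FuzzyTerm(field, text);
    }
}
```

Calling `new AlwaysFuzzyParser().parse("title:lucene").describe()` yields `title:lucene~`, while the base parser yields `title:lucene`. A setup where all queries are fuzzy, as described earlier in the thread, is the sort of case where such a hook avoids patching the parser; the real method names and signatures should be taken from the QueryParser documentation for the release in use.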
-
Re: [jira] Commented: (LUCENE-1473) Implement Externalizable in main top level searcher classesRobert Muir 2008-12-04, 07:58
no, i'm not doing any caching but as mentioned it did require some work to become almost completely i/o bound due to the nature of my wacky queries, for example removing O(n) behavior from fuzzy and regexp.

probably the os cache is not helping much because indexes are very large. I'm very happy being i/o bound because now and especially in the future i think it will be cheaper to speed up with additional ram and faster storage.

still even out of box without any tricks lucene performs *much* better than the commercial alternatives i have fought with. lucene was evaluated a while ago before 2.3 and this was not the case, but I re-evaluated around 2.3 release and it is now.

On Thu, Dec 4, 2008 at 2:45 AM, John Wang <[EMAIL PROTECTED]> wrote:
> Thanks Robert, definitely interested!
> We are too, looking into SSDs for performance.
> 2.4 allows you to create extend QueryParser and create your own "leaf"
> queries.
> I am surprised you are mostly IO bound. Lucene does a good job caching. Do
> you do some sort of caching yourself? If your index is not changing often,
> there is a lot you can do without SSDs.
>
> -John

Robert Muir
[EMAIL PROTECTED]
-
Re: [jira] Commented: (LUCENE-1473) Implement Externalizable in main top level searcher classes
John Wang 2008-12-04, 09:00
good open source projects should be better than the commercial counter
parts. I really like 2.4. The DocIDSet/Filter apis really allowed me to do some interesting stuff. I feel lucene has potential to be more than just a full text search library. -John On Wed, Dec 3, 2008 at 11:58 PM, Robert Muir <[EMAIL PROTECTED]> wrote: > no, i'm not doing any caching but as mentioned it did require some work to > become almost completely i/o bound due to the nature of my wacky queries, > example removing O(n) behavior from fuzzy and regexp. > > probably the os cache is not helping much because indexes are very large. > I'm very happy being i/o bound because now and especially in the future i > think it will be cheaper to speed up with additional ram and faster storage. > > still even out of box without any tricks lucene performs *much* better than > the commercial alternatives i have fought with. lucene was evaluated a while > ago before 2.3 and this was not the case, but I re-evaluated around 2.3 > release and it is now. > > > On Thu, Dec 4, 2008 at 2:45 AM, John Wang <[EMAIL PROTECTED]> wrote: > >> Thanks Robert, definitely interested! >> We are too, looking into SSDs for performance. >> 2.4 allows you to create extend QueryParser and create your own "leaf" >> queries. >> I am surprised you are mostly IO bound. Lucene does a good job caching. Do >> you do some sort of caching yourself? If your index is not changing often, >> there is a lot you can do without SSDs. >> >> -John >> >> >> On Wed, Dec 3, 2008 at 11:27 PM, Robert Muir <[EMAIL PROTECTED]> wrote: >> >>> yeah i am using read-only. >>> >>> i will admit to subclassing queryparser and having customized >>> query/scorer for several. all queries contain fuzzy queries so this was >>> necessary. >>> >>> "high" throughput i guess is a matter of opinion. in attempting to >>> profile high-throughput, again customized query/scorer made it easy for me >>> to simplify some things, such as some math in termquery that doesn't make >>> sense (redundant) for my Similarity. 
everything is pretty much i/o bound now >>> so if tehre is some throughput issue i will look into SSD for high volume >>> indexes. >>> >>> i posted on Use Cases on the wiki how I made fuzzy and regex fast if you >>> are curious. >>> >>> >>> On Thu, Dec 4, 2008 at 2:10 AM, John Wang <[EMAIL PROTECTED]> wrote: >>> >>>> Thanks Robert for sharing. >>>> Good to hear it is working for what you need it to do. >>>> >>>> 3) Especially with ReadOnlyIndexReaders, you should not be blocked while >>>> indexing. Especially if you have multicore machines. >>>> 4) do you stay with sub-second responses with high thru-put? >>>> >>>> -John >>>> >>>> >>>> On Wed, Dec 3, 2008 at 11:03 PM, Robert Muir <[EMAIL PROTECTED]> wrote: >>>> >>>>> >>>>> >>>>> On Thu, Dec 4, 2008 at 1:24 AM, John Wang <[EMAIL PROTECTED]> wrote: >>>>> >>>>>> Nice! >>>>>> Some questions: >>>>>> >>>>>> 1) one index? >>>>>> >>>>> no, but two individual ones today were around 100M docs >>>>> >>>>>> 2) how big is your document? e.g. how many terms etc. >>>>>> >>>>> last one built has over 4M terms >>>>> >>>>>> 3) are you serving(searching) the docs in realtime? >>>>>> >>>>> i dont understand this question, but searching is slower if i am >>>>> indexing on a disk thats also being searched. >>>>> >>>>>> >>>>>> 4) search speed? >>>>>> >>>>> usually subsecond (or close) after some warmup. while this might seem >>>>> slow its fast compared to the competition, trust me. >>>>> >>>>>> >>>>>> I'd love to learn more about your architecture. >>>>>> >>>>> i hate to say you would be disappointed, but theres nothign fancy. >>>>> probably why it works... >>>>> >>>>>> >>>>>> -John >>>>>> >>>>>> >>>>>> On Wed, Dec 3, 2008 at 10:13 PM, Robert Muir <[EMAIL PROTECTED]>wrote: >>>>>> >>>>>>> sorry gotta speak up on this. i indexed 300m docs today. I'm using an >>>>>>> out of box jar. >>>>>>> >>>>>>> yeah i have some special subclasses but if i thought any of this >>>>>>> stuff was general enough to be useful to others i'd submit it. I'm just
-
Re: [jira] Commented: (LUCENE-1473) Implement Externalizable in main top level searcher classes
Michael McCandless 2008-12-04, 11:32
Robert Muir wrote:
> i posted on Use Cases on the wiki how I made fuzzy and regex fast if
> you are curious.

It looks like this is the wiki page:

http://wiki.apache.org/lucene-java/FastSSFuzzy?highlight=(fuzzy)

The approach is similar to how contrib/spellchecker generates its candidates, in that you build a 2nd index from the primary index and use the 2nd index to more quickly (not O(N)) generate candidates.

It'd be nice to get your approach into contrib as well ;)

Mike
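The "2nd index" idea above can be sketched roughly as follows. This is a minimal illustration, assuming a deletion-neighborhood scheme in the spirit of FastSS; it is not the code from the wiki page and uses no Lucene API. The class `DeletionIndexSketch` and its methods are hypothetical names. Each term is indexed under every string obtained by deleting up to k characters, so a lookup touches only a handful of keys instead of scanning all N terms.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch of a secondary "deletion variant" index; not a Lucene class. */
public class DeletionIndexSketch {
    // Maps each deletion variant to the dictionary terms that produce it.
    private final Map<String, Set<String>> variantToTerms = new HashMap<>();
    private final int maxDeletions;

    public DeletionIndexSketch(int maxDeletions) {
        this.maxDeletions = maxDeletions;
    }

    /** Returns word plus every string reachable by deleting at most k characters. */
    static Set<String> deletionVariants(String word, int k) {
        Set<String> all = new HashSet<>();
        all.add(word);
        Set<String> frontier = new HashSet<>();
        frontier.add(word);
        for (int round = 0; round < k; round++) {
            Set<String> next = new HashSet<>();
            for (String s : frontier) {
                for (int i = 0; i < s.length(); i++) {
                    next.add(s.substring(0, i) + s.substring(i + 1));
                }
            }
            all.addAll(next);
            frontier = next;
        }
        return all;
    }

    /** Adds a term from the primary dictionary to the secondary index. */
    public void addTerm(String term) {
        for (String variant : deletionVariants(term, maxDeletions)) {
            variantToTerms.computeIfAbsent(variant, key -> new TreeSet<>()).add(term);
        }
    }

    /**
     * Returns terms that share at least one deletion variant with the query.
     * These are only candidates; a real system would still verify each one
     * with an exact edit-distance check.
     */
    public Set<String> candidates(String query) {
        Set<String> result = new TreeSet<>();
        for (String variant : deletionVariants(query, maxDeletions)) {
            Set<String> terms = variantToTerms.get(variant);
            if (terms != null) {
                result.addAll(terms);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        DeletionIndexSketch index = new DeletionIndexSketch(1);
        index.addTerm("lucene");
        index.addTerm("lucent");
        index.addTerm("search");
        // "lucine" and "lucene" both reduce to "lucne" after one deletion.
        System.out.println(index.candidates("lucine"));
    }
}
```

The trade-off in this sketch is space for time: the secondary index grows with term length and k, which is presumably why such an index is built once from the primary index rather than computed per query.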
-
Re: [jira] Commented: (LUCENE-1473) Implement Externalizable in main top level searcher classes
Mark Miller 2008-12-04, 11:42
John Wang wrote:
> Seems like being a committer can be rather lucrative.

I think being an Apache committer on any project can be somewhat lucrative. Companies know that you probably work well with others if you're a committer, which can probably lead to improved career opportunities. Can't say too much about working well with others :) I may not be extracting as much money as I can though - sounds like I could be taking bribes to commit code if I wanted to make more ;)

> My comment was on the statements of being volunteers and don't get
> paid, which is a little misleading.

It depends. Sometimes, something you're doing with a customer might make its way into Lucene. That's not most of the work that goes on here though. Most of the work is looking at submitted patches in our free time, going over them, running the tests, and possibly committing them. I do that for the project because I like to, not for any money I'm getting (true enough, I haven't been a core committer long, but I did the same as a contrib committer). When I'm sitting around at 11 at night or 7 in the morning, trying to get patches committed, I'd hate to be classified as a non-volunteer. It's just as easy to get the committer title and then fall off the face of the world. No one ensures you are helping anyone get anything done.

> I guess I need to learn to be a good boy not to piss off the
> committers anymore (or convince my company to pay to get some patches
> in) And hopefully someday I get to grow up and get to become a
> committer and make some $ too.

You might consider it. I think you have been a bit rude, but watch and see... quality patches you submit will still get processed like any other. The people around here are friendly and mainly interested in the quality of Lucene. No one is trying to enforce some sort of "power elite" here. There is no blacklist. At the same time, lashing out isn't going to help get any issues passed (in fact, I've seen it sink more than one issue).

I've certainly never been involved in Lucene for the money myself (and I don't have much of it, believe you me).

- Mark

> -John
-
Re: [jira] Commented: (LUCENE-1473) Implement Externalizable in main top level searcher classes
Grant Ingersoll 2008-12-04, 13:24
On Dec 4, 2008, at 12:36 AM, John Wang wrote: > Grant: > > I am sorry that I disagree with some points: > > 1) "I think it's a sign that Lucene is pretty stable." - While > lucene is a great project, especially with 2.x releases, great > improvements are made, but do we really have a clear picture on how > lucene is being used and deployed. While lucene works great running > as a vanilla search library, when pushed to limits, one needs to > "hack" into lucene to make certain things work. If 90% of the user > base use it to build small indexes and using the vanilla api, and > the other 10% is really stressing both on the scalability and api > side and are running into issues, would you still say: "running well > for 90% of the users, therefore it is stable or extensible"? I think > it is unfair to the project itself to be measured by the vanilla use- > case. I have done couple of large deployments, e.g. >30 million > documents indexed and searched in realtime., and I really had to do > some tweaking. Sorry, we should have written a perfect engine the first time out. I'll get on that. Question for you: how much of that tweaking have you contributed back? If you have such obvious wins, put them up as patches so we can all benefit, just like you've benefitted from our volunteering. As for 90%, I'd say it is more like > 95% and, gee, if I can write a general purpose open source search library that keeps 95% of a very, very, very large install base happy all while still improving it and maintaining backward compatibility, than color me stable. > > 2) "You want stuff committed, keep it up to date, make it manageable > to review, document it, respond to questions/concerns with answers > as best you can. " - To some degree I would hope it depends on what > the issue is, e.g. enforcing such process on a one-line null check > seems to be an overkill. 
I agree with the process itself, what would > make it better is some transparency on how patches/issues are > evaluated to be committed. At least seemed from the outside, it is > purely being decided on by the committers, and since my > understanding is that an open source project belongs to the public, > the public user base should have some say. Here's your list of opened issues: https://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=true&reporterSelect=specificuser&[EMAIL PROTECTED] Only 1 of which has more than 2 votes and which is assigned to Hoss. However, from what I can see, you've had all but 1, I repeat ONE, issue not resolved. And, yes, what gets committed is decided on by the COMMITTERS with input from the community; who else can be responsible for committing? Hence the title. We can't please everyone, but I'll be damned if you're going to disparage the work of so many because you have sour grapes over some people (not all) disagreeing with you over how serialization should work in Lucene just b/c you think the problem is trivial when clearly others do not. Committers are picked by the project over a long period of time (feel free to nominate someone who you feel has merit, we've elected committers based on community nominations in the past) because they stick around and stay involved and respond on the list, etc. I'm starting to think your real issue here is that we haven't all agreed with you the minute you suggest something, but sorry, that is how open source works. > > 3) which brings me to this point: "I personally, would love to work > on Lucene all day every day as I have a lot of things I'd love to > engage the community on, but the fact is I'm not paid to do that, so > I give what I can when I can. I know most of the other committers > are that way too." - Is this really true? Isn't a large part of the > committer base also a part of the for-profit, consulting business, > e.g. Lucid? 
Would groups/companies that pay for consulting service Yes, John, it is true. I would love to work on Lucene all day. If I won the lottery tomorrow, I'd probably still volunteer on Lucene. Let me ask you back, who pays you to work on Lucene? Was this patch submitted because you just happened to spot it while pouring over the code at night on your own and out of the goodness of your heart? Or did you discover it at LinkedIn where you were specifically hired because of your Lucene skills and knowledge of the Lucene community? In other words, you're accusing me and others of getting paid for my expertise in Lucene, all the while you are getting paid for your expertise in Lucene. Where did I imply that? All I'm saying, is you can't just throw your code up here and say "Hey, fix this for me the way I want it fixed and then come back and tell me when it's done" It doesn't work that way. It never has. No open source project works that way. Hey, we're all entitled to your opinions. Personally, I think you've made a lot of nice contributions to Lucene over the years in terms of insights, ideas and patches. So, I guess I am a bit surprised by the rancor in your message, which came from out of no where, not too mention the fact that it has completely hijacked an otherwise interesting conversation about the right way to do serialization. If you want to call that candor, than feel free. -Grant
-
Re: [jira] Commented: (LUCENE-1473) Implement Externalizable in main top level searcher classes
John Wang 2008-12-04, 16:48
Mark and Grant:

I do apologize if I came off seeming rude. I guess I let my frustration with the serialization issue get the better of me (along with frustration built up from some of the other issues, which I thought were trivial but were made out not to be). I will improve my behavior in the future.

There is a reason I have stopped submitting patches via Jira. (A reason I no longer dare to express.)

There is absolutely nothing wrong with getting paid for Lucene expertise. I was just commenting on your comment about "volunteering", but if you think I am wrong, then I am. I did have a concern about the focus of the project getting biased by companies paying the committers, but obviously it is not my business.

The issues/patches I am having are trivial stuff, and that was precisely my point. I am not pushing for grandiose ideas; I am frustrated with some very brain-dead issues (I am not smart enough to provide any earth-shattering patches) that have blown out of proportion in my mind.

I will try to keep my mouth shut in the future.

-John

On Thu, Dec 4, 2008 at 5:24 AM, Grant Ingersoll <[EMAIL PROTECTED]> wrote: > > On Dec 4, 2008, at 12:36 AM, John Wang wrote: > > Grant: >> >> I am sorry that I disagree with some points: >> >> 1) "I think it's a sign that Lucene is pretty stable." - While lucene is a >> great project, especially with 2.x releases, great improvements are made, >> but do we really have a clear picture on how lucene is being used and >> deployed. While lucene works great running as a vanilla search library, when >> pushed to limits, one needs to "hack" into lucene to make certain things >> work. If 90% of the user base use it to build small indexes and using the >> vanilla api, and the other 10% is really stressing both on the scalability >> and api side and are running into issues, would you still say: "running well >> for 90% of the users, therefore it is stable or extensible"? I think it is >> unfair to the project itself to be measured by the vanilla use-case.
I have >> done couple of large deployments, e.g. >30 million documents indexed and >> searched in realtime., and I really had to do some tweaking. >> > > Sorry, we should have written a perfect engine the first time out. I'll > get on that. Question for you: how much of that tweaking have you > contributed back? If you have such obvious wins, put them up as patches so > we can all benefit, just like you've benefitted from our volunteering. > > As for 90%, I'd say it is more like > 95% and, gee, if I can write a > general purpose open source search library that keeps 95% of a very, very, > very large install base happy all while still improving it and maintaining > backward compatibility, than color me stable. > > >> 2) "You want stuff committed, keep it up to date, make it manageable to >> review, document it, respond to questions/concerns with answers as best you >> can. " - To some degree I would hope it depends on what the issue is, e.g. >> enforcing such process on a one-line null check seems to be an overkill. I >> agree with the process itself, what would make it better is some >> transparency on how patches/issues are evaluated to be committed. At least >> seemed from the outside, it is purely being decided on by the committers, >> and since my understanding is that an open source project belongs to the >> public, the public user base should have some say. >> > > Here's your list of opened issues: > https://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=true&reporterSelect=specificuser&[EMAIL PROTECTED] Only 1 of which has more than 2 votes and which is assigned to Hoss. > However, from what I can see, you've had all but 1, I repeat ONE, issue not > resolved. > > And, yes, what gets committed is decided on by the COMMITTERS with input > from the community; who else can be responsible for committing? Hence the > title. 
We can't please everyone, but I'll be damned if you're going to > disparage the work of so many because you have sour grapes over some people
-
Re: [jira] Commented: (LUCENE-1473) Implement Externalizable in main top level searcher classes
Doug Cutting 2008-12-04, 18:46
John Wang wrote:
> I agree with the process itself, what would make it better is
> some transparency on how patches/issues are evaluated to be committed.

To be clear: there is no forum for communication about patches except this list, and, by extension, Jira. The process of patch evaluation is completely transparent.

> At least seemed from the outside, it is purely being decided on by the
> committers, and since my understanding is that an open source project
> belongs to the public, the public user base should have some say.

It is not a democracy, it is a meritocracy.

http://www.apache.org/foundation/how-it-works.html#meritocracy

I'll repeat: committers are added when they've both contributed a series of high-quality, easy-to-commit patches, and when they've demonstrated that they are easy to work with. That process has resulted in the current set of committers, and those committers determine which patches are committed and when. Those are the rules.

However, committers cannot ram just any patch through. Committers are only added after they've demonstrated the ability to build consensus around their patches. And they must continue to build consensus around their patches even after they are committers. Patches that receive no endorsement from others are not committed, no matter who contributes them. A contribution is not more rapidly committed simply because the contributor is a committer. Rather, committers know how to elicit and respond to criticism and build consensus around their patches in order to get them committed rapidly.

Doug
-
Re: [jira] Commented: (LUCENE-1473) Implement Externalizable in main top level searcher classes
Jason Rutherglen 2008-12-04, 19:21
To put things in perspective, I believe Microsoft (who could potentially
place a lot of resources towards Lucene) now uses Lucene through Powerset? and I don't think those folks are contributing back. I know of several other companies who do the same, and many potential contributions that are not submitted because people and their companies do not see the benefit of going through the hoops required to get patches committed. A relatively simple patch such as 1473 Serialization represents this well. For example if a company is developing custom search algorithms, Lucene supports TF/IDF but not much else. Custom search algorithms require rewriting lots of Lucene code. Companies who write new search algorithms do not necessarily want to rewrite Lucene as well to make it pluggable for new scoring as it is out of scope, they will simply branch the code. It does not help that the core APIs underneath IndexReader are protected and package protected which assumes a user that is not advanced. It is repeated in the mailing lists that new features will threaten the existing user base which is based on opinion rather than fact. More advanced users are currently hindered by the conservatism of the project and so naturally have stopped trying to submit changes that alter the core non-public code. The rancor is from users would benefit from a faster pace and the ability to be more creative inside the core Lucene system. As the internals change frequently and unnannounced the process of developing core patches is difficult and frustrating. Now that Lucene is stable and flexible indexing is being implemented. It would benefit the community to focus on the future. Who exactly is responsible for this? Which of the committers are building for the future? Which are doing bug fixes? What is the process of developing more advanced features in open source? Right now it seems to be one person, Michael McCandless developing all of the new core code. 
This is great forward progress, however it's unclear how others can get involved and not get stampeded by the constant changes that all happen via one brilliant person. I have requested of people such as Michael Busch to collaborate on the column stride fields and received no response. To me, an good example of volunteers are people who prepare food and donate their time at soup kitchens with no pay, and no hope for pay related to feeding the hungry. -J On Wed, Dec 3, 2008 at 2:52 PM, Grant Ingersoll <[EMAIL PROTECTED]> wrote: > > On Dec 3, 2008, at 2:27 PM, Jason Rutherglen (JIRA) wrote: > > >> >> Hoss wrote: "sort of mythical "Lucene powerhouse" >> Lucene seems to run itself quite differently than other open source Java >> projects. Perhaps it would be good to spell out the reasons for the >> reluctance to move ahead with features that developers work on, that work, >> but do not go in. The developer contributions seem to be quite low right >> now, especially compared to neighbor projects such as Hadoop. Is this >> because fewer people are using Lucene? Or is it due to the reluctance to >> work with the developer community? Unfortunately the perception in the eyes >> of some people who work on search related projects it is the latter. >> > > > Or, could it be that Hadoop is relatively new and in vogue at the moment, > very malleable and buggy(?) and has a HUGE corporate sponsor who dedicates > lots of resources to it on a full time basis, whilst Lucene has been around > in the ASF for 7+ years (and 12+ years total) and has a really large install > base and thus must move more deliberately and basically has 1 person who > gets to work on it full time while the rest of us pretty much volunteer? > That's not an excuse, it's just the way it is. I personally, would love to > work on Lucene all day every day as I have a lot of things I'd love to > engage the community on, but the fact is I'm not paid to do that, so I give > what I can when I can. 
I know most of the other committers are that way
-
Re: [jira] Commented: (LUCENE-1473) Implement Externalizable in main top level searcher classes
Doug Cutting 2008-12-04, 20:01
Jason Rutherglen wrote:
> A relatively simple patch such as 1473 Serialization
> represents this well.

LUCENE-1473 is an incomplete patch that proposes to commit the project to new back-compatibility requirements. Compatibility requirements should not be added lightly, but only deliberately, as they have a long-term impact on the ability of the project to evolve. Prior to this we've not heard from folks who require cross-version java serialization compatibility. Without more folks asserting this as a need it is hard to rationalize adding this.

> As the
> internals change frequently and unnannounced the process of developing
> core patches is difficult and frustrating.

The process is entirely in public. You have as much announcement as anyone. Patches are weighed on their merits as they are contributed.

> It would benefit the community to focus on the future. Who exactly is
> responsible for this? Which of the committers are building for the
> future? Which are doing bug fixes? What is the process of developing
> more advanced features in open source?

I've already explained the process several times. We cannot easily make a long-term plan when we do not have the power to assign folks. We can state long-term goals, like flexible indexing, but in the end, it won't get done until someone volunteers to write the code. So you're welcome to start a wish list on the wiki, and you're welcome to then start contributing patches that implement items on your wish list. If you propose something that folks think is extremely useful, but requires an incompatible change, then it could perhaps be done in a branch. But most of the existing community is interested in pushing forward incrementally, trying hard to keep most things back-compatible. If that's too frustrating for you, you can fork Lucene and build a new community.

> Right now it seems to be one
> person, Michael McCandless developing all of the new core code.

Mike does a lot of development, but he also commits a lot of patches written by others.

> This is
> great forward progress, however it's unclear how others can get involved
> and not get stampeded by the constant changes that all happen via one
> brilliant person.

You want Mike to do less? Others can and do get involved all the time. Look at http://tinyurl.com/5nl78n. The majority of the things Mike works on are instigated by others.

> I have requested of people such as Michael Busch to collaborate on the
> column stride fields and received no response.

Did you pay Michael? No one here is compelled to work with anyone else. We work with others when we feel it is in our mutual self interest.

Doug
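For readers following the serialization point in the message above, here is a minimal sketch of what the "suid" discussion is about. It assumes a hypothetical `SimpleTerm` class (a stand-in, not Lucene's own `Term`) and hypothetical helper names. Declaring `serialVersionUID` explicitly pins the identifier written into the stream; without it, the JVM derives a default from details of the class, so a changed class can fail to deserialize with `InvalidClassException`. Pinning the value is also what turns the serialized form into a compatibility commitment, which is the cost being weighed in this thread.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamClass;
import java.io.Serializable;

/** Hypothetical sketch; SimpleTerm is an illustrative stand-in, not a Lucene class. */
public class SerialVersionSketch {

    static final class SimpleTerm implements Serializable {
        // Explicit stream identifier: stays the same when the class is recompiled.
        private static final long serialVersionUID = 1L;

        final String field;
        final String text;

        SimpleTerm(String field, String text) {
            this.field = field;
            this.text = text;
        }
    }

    /** Serializes a value to bytes, as a broker might before sending it to a searcher. */
    static byte[] toBytes(Serializable value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Deserializes bytes; fails if the class on this side is stream-incompatible. */
    static Object fromBytes(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Reads the identifier the serialization runtime will use for a class. */
    static long declaredUid(Class<?> type) {
        return ObjectStreamClass.lookup(type).getSerialVersionUID();
    }

    public static void main(String[] args) {
        SimpleTerm original = new SimpleTerm("body", "lucene");
        SimpleTerm copy = (SimpleTerm) fromBytes(toBytes(original));
        System.out.println(copy.field + ":" + copy.text);
        System.out.println(declaredUid(SimpleTerm.class));
    }
}
```

Note that a matching `serialVersionUID` only tells the runtime the two class versions claim to be compatible; fields added or removed between versions still need to be handled deliberately, which is part of why the maintenance burden is debated here.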
-
Re: [jira] Commented: (LUCENE-1473) Implement Externalizable in main top level searcher classes
Jason Rutherglen 2008-12-04, 21:24
Correction: Powerset apparently did not use Lucene. And apparently there
are a few other companies who are not open sourcing, use Lucene serialization regularly. > Did you pay Michael? No one here is compelled to work with anyone else. We work with others when we feel it is in our mutual self interest. Nice... I guess our government is the macrocosm. On Thu, Dec 4, 2008 at 11:21 AM, Jason Rutherglen < [EMAIL PROTECTED]> wrote: > To put things in perspective, I believe Microsoft (who could potentially > place a lot of resources towards Lucene) now uses Lucene through Powerset? > and I don't think those folks are contributing back. I know of several > other companies who do the same, and many potential contributions that are > not submitted because people and their companies do not see the benefit of > going through the hoops required to get patches committed. A relatively > simple patch such as 1473 Serialization represents this well. > > For example if a company is developing custom search algorithms, Lucene > supports TF/IDF but not much else. Custom search algorithms require > rewriting lots of Lucene code. Companies who write new search algorithms do > not necessarily want to rewrite Lucene as well to make it pluggable for new > scoring as it is out of scope, they will simply branch the code. It does > not help that the core APIs underneath IndexReader are protected and package > protected which assumes a user that is not advanced. It is repeated in the > mailing lists that new features will threaten the existing user base which > is based on opinion rather than fact. More advanced users are currently > hindered by the conservatism of the project and so naturally have stopped > trying to submit changes that alter the core non-public code. > > The rancor is from users would benefit from a faster pace and the ability > to be more creative inside the core Lucene system. As the internals change > frequently and unnannounced the process of developing core patches is > difficult and frustrating. 
> > Now that Lucene is stable and flexible indexing is being implemented. It > would benefit the community to focus on the future. Who exactly is > responsible for this? Which of the committers are building for the future? > Which are doing bug fixes? What is the process of developing more advanced > features in open source? Right now it seems to be one person, Michael > McCandless developing all of the new core code. This is great forward > progress, however it's unclear how others can get involved and not get > stampeded by the constant changes that all happen via one brilliant person. > > > I have requested of people such as Michael Busch to collaborate on the > column stride fields and received no response. > > To me, an good example of volunteers are people who prepare food and donate > their time at soup kitchens with no pay, and no hope for pay related to > feeding the hungry. > > -J > > > On Wed, Dec 3, 2008 at 2:52 PM, Grant Ingersoll <[EMAIL PROTECTED]>wrote: > >> >> On Dec 3, 2008, at 2:27 PM, Jason Rutherglen (JIRA) wrote: >> >> >>> >>> Hoss wrote: "sort of mythical "Lucene powerhouse" >>> Lucene seems to run itself quite differently than other open source Java >>> projects. Perhaps it would be good to spell out the reasons for the >>> reluctance to move ahead with features that developers work on, that work, >>> but do not go in. The developer contributions seem to be quite low right >>> now, especially compared to neighbor projects such as Hadoop. Is this >>> because fewer people are using Lucene? Or is it due to the reluctance to >>> work with the developer community? Unfortunately the perception in the eyes >>> of some people who work on search related projects it is the latter. >>> >> >> >> Or, could it be that Hadoop is relatively new and in vogue at the moment, >> very malleable and buggy(?) and has a HUGE corporate sponsor who dedicates >> lots of resources to it on a full time basis, whilst Lucene has been around
-
Re: [jira] Commented: (LUCENE-1473) Implement Externalizable in main top level searcher classes
Grant Ingersoll 2008-12-04, 23:23
On Dec 4, 2008, at 2:21 PM, Jason Rutherglen wrote: > To put things in perspective, I believe Microsoft (who could > potentially place a lot of resources towards Lucene) now uses Lucene > through Powerset? and I don't think those folks are contributing > back. I know of several other companies who do the same, and many > potential contributions that are not submitted because people and > their companies do not see the benefit of going through the hoops > required to get patches committed. A relatively simple patch such > as 1473 Serialization represents this well. What do you suggest? We didn't force anyone to use Lucene. Heck, most of our users don't even ever participate on the mailing list. We do provide a very clear, transparent path for making contributions and becoming a committer. I don't know what else we can do, but we're totally open to suggestions on how to improve it. FWIW, just b/c you think 1473 is trivial doesn't make it so. You have a single use case and that's all you care about. The community has dozens, if not hundreds of use cases, and your "trivial" patch may not be so trivial in that regards. How would you feel if we "broke" something that you have relied on for years in the name of us moving faster? I am willing to bet the large number of people here in Lucene appreciate our deliberations for the most part. As for my opinion on 1473, I personally think there are better ways of achieving what you are trying to do, as Robert and others have suggested and I don't think it is worth it to maintain serialization across versions as it is a too large of a burden, IMO. But, heh, make an argument (preferably w/o the accusations) and convince me otherwise. > > > For example if a company is developing custom search algorithms, > Lucene supports TF/IDF but not much else. Custom search algorithms > require rewriting lots of Lucene code. 
Companies who write new > search algorithms do not necessarily want to rewrite Lucene as well > to make it pluggable for new scoring as it is out of scope, they > will simply branch the code. It does not help that the core APIs > underneath IndexReader are protected and package protected which > assumes a user that is not advanced. It is repeated in the mailing > lists that new features will threaten the existing user base which > is based on opinion rather than fact. More advanced users are > currently hindered by the conservatism of the project and so > naturally have stopped trying to submit changes that alter the core > non-public code. So, your mad at us for others not contributing back their forks? Even the ones we don't know about? Simply put, I'm sorry we can't please you. If you go read the archives, you will see plenty of times when even us committers have been frustrated from time to time by the process (just look at the JDK 1.5 debate, or the Interface/Abstract debate) but in the end, I feel Lucene is stronger for it. Community over code, it's the Apache Way. You are free to disagree. In fact, you have several options available to you to show that disagreement: 1. You can work to become a committer and change it from within. The bar really isn't that high, 3 to 4 non-trivial patches and a willingness to work with others in a mostly pleasant way. 2. You can make us aware of the patches and be persistent about seeing it through and we'll try to get to it. Just look at CHANGES.txt and JIRA and you will see that this happens all the time and from a wide variety of contributors (including both you and John). 3. You can fork the code and go do your thing and build your own community, etc. Personally, I hope you choose 1 or 2, as we're all stronger together than we are apart. > > > The rancor is from users would benefit from a faster pace and the > ability to be more creative inside the core Lucene system. 
As the > internals change frequently and unnannounced the process of I'm sorry that we can't work at a faster pace. Suggestions on how to deal with the number of patches we have and still maintain quality and how to move forward w/o breaking old patches are much appreciated. As for the internals changing, you have just hit the nail on the head as to why it is so important to maintain back-compat. I simply don't get the unannounced part. What isn't announced? Geez, I've been a committer for a few years now, and I have yet to see another open source project that is as public as Lucene, for better or worse. Look at the archives, we regularly even put our warts out for public consumption in an effort to improve ourselves. Rather than continue hijacking this thread, why don't we either let it die and focus on serialization, or we go over to java-dev and you and John and the rest of us can create a concrete list of suggestions that we think could make Lucene better and we can all discuss them in a positive manner and see how we can go about addressing them. I'd be more than happy to discuss there if you want. Cheers, Grant
-
Re: [jira] Commented: (LUCENE-1473) Implement Externalizable in main top level searcher classes
John Wang 2008-12-05, 00:18

Hi Grant:

I agree and I apologize for hijacking this thread. If Luceners feel our criticisms are invalid, then so be it.

We should focus on the issue at hand, the serialization story in Lucene, not general Java serialization, so I don't see how moving this to the java-dev list would help.

As far as Lucene serialization goes, incorporating comments from various people, this is what I gather are the choices (feel free to correct me):

1) Remove implementation and support of Serializable: We all agreed this is bad and breaks backward compatibility.

2) Do nothing to the code base and fix documentation, and clarify that Lucene only supports Serialization between components with the release jar. This seems to be the suggested approach, where I have a couple of concerns:

   a) Even given the exact same code base, due to the nature of Java serialization, different builds of the jar (via the IBM VM vs. the Sun VM vs. JRockit, etc.) cannot guarantee compatibility. Thus we are forcing users that care about Serialization to use the release jar.

   b) There is at least one place, as I have previously mentioned, e.g. ScoreDocComparator, where the contract returns a Comparable that, per the javadoc, must be serializable. How should this be treated? This can be an application object. Should we pass on the same enforcement there when merge/sort is happening across the wire, since a similar serialization problem would break inside MultiSearcher?

3) Clean up the serialization story, either add SUID or implement Externalizable for some classes within Lucene that implement Serializable: From what I am told, this is too much work for the committers.

I hope you guys at least agree with me that, the way it is currently, the serialization story is broken, whether in documentation or in code. I see the disagreement being its severity, and whether it is a trivial fix, which I have learned it is not really my place to say.

Please do understand this is not a far-fetched, made-up use-case. We are running into this in production, and we are developing in accordance with the Lucene documentation.

Thanks

-John
-
Re: [jira] Commented: (LUCENE-1473) Implement Externalizable in main top level searcher classes
Doug Cutting 2008-12-05, 17:18

John Wang wrote:
> Thus we are forcing users that care about Serialization to use the release jar.

We already encourage folks to use a release jar if possible. So this is not a big change. Also, if folks choose to build their own jar, then they are expected to use that same jar everywhere, effectively making their own release. That doesn't seem unreasonable to me. Incrementally upgrading distributed systems has, at least in the past, been outside the scope of Lucene.

> 3) Clean up the serialization story, either add SUID or implement Externalizable for some classes within Lucene that implement Serializable:
>
> From what I am told, this is too much work for the committers.

Not that it's too much work today, but that it adds an ongoing burden and we should take this on cautiously if at all. If we want to go this way we'd need to:

- document precisely which classes we'll evolve back-compatibly;
- document the releases (major? minor?) that will be compatible; and
- provide a test suite that validates this.

As a side note, we should probably move the back-compatibility documentation from the wiki to the project website. This would permit patches to it, among other things.

http://wiki.apache.org/lucene-java/BackwardsCompatibility

> I hope you guys at least agree with me with the way it is currently, the serialization story is broken, whether in documentation or in code.

Documenting an unstated assumption is a good thing to do, especially when not everyone seems to share the assumption, but "broken" seems a bit strong here.

> I see the disagreement being its severity, and whether it is a trivial fix, which I have learned it is not really my place to say.

I've outlined above what I think would be required. If you think that's trivial, then please pursue it and show us how trivial it is. The patch provided thus far is incomplete.

> Please do understand this is not a far-fetched, made-up use-case, we are running into this in production, and we are developing in accordance to lucene documentation.

You developed based on some very optimistic guesses about some unstated aspects. In Java, implementing Serializable alone does not generally provide any cross-version guarantees. Assuming that it did was risky.

Doug
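A minimal sketch of the point about unstated guarantees, using hypothetical classes (`ImplicitTerm` and `PinnedTerm` are illustrations, not Lucene classes). When a class implements Serializable without declaring a `serialVersionUID`, the runtime derives one from the shape of the class, and the `java.io.Serializable` documentation warns that this default is sensitive to class details that can vary between compilers. Declaring the field pins the value, but says nothing about whether the fields themselves remain compatible.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

class DefaultSuidDemo {
    // Hypothetical class with no serialVersionUID: the runtime computes one
    // from the class name, fields, methods and so on.
    static class ImplicitTerm implements Serializable {
        String field;
        String text;
    }

    // Same shape, but the stream version is pinned by hand.
    static class PinnedTerm implements Serializable {
        private static final long serialVersionUID = 1L;
        String field;
        String text;
    }

    // Returns the serialVersionUID the serialization runtime will use for cls.
    static long suidOf(Class<?> cls) {
        return ObjectStreamClass.lookup(cls).getSerialVersionUID();
    }

    public static void main(String[] args) {
        System.out.println("implicit: " + suidOf(ImplicitTerm.class));
        System.out.println("pinned:   " + suidOf(PinnedTerm.class));
    }
}
```

The implicit value is whatever the hash of the class shape happens to be, so any edit to `ImplicitTerm` (or, per the JDK documentation, a different compiler) may change it.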
-
Re: [jira] Commented: (LUCENE-1473) Implement Externalizable in main top level searcher classes
John Wang 2008-12-05, 19:23

Doug:

1) "Incrementally upgrading distributed systems has, at least in the past, been outside the scope of Lucene" - That's good to know. Is it also out of scope for the distributed Lucene effort (if it is still happening)?

2) I used the word broken to describe what happened for our deployment. I will try to use less harsh words when addressing Lucene in the future.

3) "If you think that's trivial, then please pursue it and show us how trivial it is." - My proposal is to add the suid to Serializable classes. If you don't think that's trivial, many IDEs do that for you. I think your main concern is that this is not the perfect solution to this problem, but it does provide better behavior than what we have now, IMO. I understand we have discussed earlier in the thread that there are cases where adding a suid does not work. Given many of these classes are rather static, I don't share your concern.

4) "You developed based on some very optimistic guesses about some unstated aspects" - This was developed based on our understanding of Serializable, without Lucene documentation discouraging us from doing so. We also interpreted the fact that RemoteSearcher is part of the package as an example of a valid use-case. The JOSS protocol is designed to handle versioning (although not perfectly). We didn't think that was risky; obviously in hindsight it is. But I do find it hard to believe it is something the author of these classes had in mind when the Serializable interface was implemented.

This is getting into a philosophical discussion on Java Serialization, and how it pertains to Lucene. I don't see any resolution in the near future.

Moving forward, we'd be happy to provide patches given the agreed solution. There is no reason to provide code patches if it is decided only documentation needs to change. (From what you have outlined, I interpret it as being only documentation changes.)

Also, if you find us addressing this issue being a hassle, e.g. addressing serialization in Lucene is an incorrect thing to do, feel free to let us know and we can close the bug and terminate the thread.

Thanks

-John
-
Re: [jira] Commented: (LUCENE-1473) Implement Externalizable in main top level searcher classes
Michael McCandless 2008-12-05, 20:07

John Wang wrote:
> My proposal is to add the suid to Serializable classes

That's too brittle.

If we do that, then what happens when we need to add a field to the class (eg, in 2.9 we've replaced "inclusive" in RangeQuery with "includeLower" and "includeUpper")? The standard answer is you bump the suid, but, then that breaks back compatibility.

Since we would still sometimes, unpredictably, break back compatibility, no app could rely on it. You can't have a "mostly back compatible" promise.

So... we have to either 1) only support "live serialization" and update the javadocs saying so, or 2) support full back compat of serialized classes and spell out the actual policy, make thorough tests for it, etc.

Mike
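What "bump the suid" does to existing streams can be sketched in a single JVM. This is an illustration under stated assumptions: `RangeSpec` is a hypothetical stand-in, not Lucene's RangeQuery, and instead of compiling two versions of the class the sketch patches the serialVersionUID bytes inside an already written stream, relying on the stream layout given in the Java Object Serialization Specification (header, TC_OBJECT, TC_CLASSDESC, class name, then the 8-byte UID).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidClassException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

class SuidBumpDemo {
    // Hypothetical range description with a hand-pinned stream version.
    static class RangeSpec implements Serializable {
        private static final long serialVersionUID = 1L;
        final String lower;
        final String upper;
        final boolean inclusive;

        RangeSpec(String lower, String upper, boolean inclusive) {
            this.lower = lower;
            this.upper = upper;
            this.inclusive = inclusive;
        }
    }

    static byte[] write(Object o) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
                out.writeObject(o);
            }
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Simulates a stream written by a release whose class carried a different UID.
    // Layout: magic(2) version(2) TC_OBJECT(1) TC_CLASSDESC(1) nameLength(2) name(n) suid(8)
    static byte[] withBumpedSuid(byte[] stream) {
        byte[] copy = stream.clone();
        int nameLength = ((copy[6] & 0xFF) << 8) | (copy[7] & 0xFF);
        int lastSuidByte = 8 + nameLength + 7;
        copy[lastSuidByte] ^= 0x02; // the stream now claims UID 3 instead of 1
        return copy;
    }

    // True if the stream deserializes, false if the UID check rejects it.
    static boolean readable(byte[] stream) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(stream))) {
            in.readObject();
            return true;
        } catch (InvalidClassException e) {
            return false;
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] stream = write(new RangeSpec("a", "z", true));
        System.out.println("same UID readable:   " + readable(stream));
        System.out.println("bumped UID readable: " + readable(withBumpedSuid(stream)));
    }
}
```

Keeping the UID fixed avoids the rejection, but then a renamed field such as "inclusive" is simply not populated on the reading side unless the class handles it explicitly, which is the kind of unpredictability the message above is concerned about.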
-
Re: [jira] Commented: (LUCENE-1473) Implement Externalizable in main top level searcher classes
John Wang 2008-12-05, 21:10

Mike:

This has gone back and forth on this thread already. Again, I agree it is not the perfect solution. I am comparing that to the current behavior, and I don't think it is worse. (Only in my opinion.)

"Live serialization" is not familiar to me. To understand it more, can you point me to somewhere the J2EE spec defines it? AFAIK, the J2EE spec does not make a distinction, and from what I gather from this thread, Lucene does not fall into a special category in how Serializable is used. Of course, it could just be my lack of understanding of the spec.

We are happy to accept whatever you guys think on this issue. As it is currently, it is not consistent amongst different committers.

Thanks

-John
-
Re: [jira] Commented: (LUCENE-1473) Implement Externalizable in main top level searcher classes
Doug Cutting 2008-12-05, 21:13

John Wang wrote:
> Also, if you find us addressing this issue being a hassle, e.g. addressing serialization in lucene is an incorrect thing to do, feel free to let us know and we can close the bug and terminate the thread.

I don't know whether cross-version serialization belongs in Lucene. We need to discuss it, to find out how many users might want it, how many developers might fear it, how reasonable their fears are, etc. The discussion so far has not been an easy one. There have been many claims made which have little to do with the technical issue. As a project, we must reach consensus before we can do anything. Polarized comments do not help build consensus.

Doug
-
Re: [jira] Commented: (LUCENE-1473) Implement Externalizable in main top level searcher classes
Doug Cutting 2008-12-05, 21:23

John Wang wrote:
> This has been gone back and forth on this thread already. Again, I agree it is not the perfect solution. I am comparing that to the current behavior, I don't think it is worse. (Only in my opinion).

So, if it's good enough for you, a user of java serialization, then perhaps those of us who don't use java serialization shouldn't complain. I think we'd want to add to the documentation something to the effect that this is all that's been done, and that if the classes change substantially then all bets are off. We do not want to imply that we're making any cross-version compatibility guarantees about serialization, rather just that folks who're willing to take their chances will not be impeded. Could something like that work?

Doug
-
Re: [jira] Commented: (LUCENE-1473) Implement Externalizable in main top level searcher classes
John Wang 2008-12-05, 21:41

Works for me.

Thanks

-John
-
Re: [jira] Commented: (LUCENE-1473) Implement Externalizable in main top level searcher classes
Michael McCandless 2008-12-05, 21:47

OK works for me too.

John or Jason, can you update the patch on LUCENE-1473? We no longer need to implement Externalizable (just add fixed SUIDs), but we do need to update the javadocs for all classes implementing Serializable to state that cross-version compatibility is not guaranteed.

Mike
-
Re: [jira] Commented: (LUCENE-1473) Implement Externalizable in main top level searcher classes
Jason Rutherglen 2008-12-05, 22:24

I think it's best to implement Externalizable as long as someone is willing to maintain it. I commit to maintaining the Externalizable code. The programming overhead is no more than implementing the equals method in the classes.

New classes outside the Lucene code base simply need to implement Serializable to work. External developers are not required to implement Externalizable but may if they see fit.

This will ensure forward compatibility between serialized versions, make the serialized objects smaller, and make serialization faster.

Apparently it matters enough for Hadoop to implement Writable in all over-the-wire classes.
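For readers unfamiliar with the interface under discussion, here is a minimal sketch of what implementing Externalizable involves. The class `TermSpec` and its format tag are hypothetical and are not taken from the LUCENE-1473 patch. The class writes its own fields in an order it controls, so it can lead with a version byte and decide on the reading side which formats it accepts.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.InvalidObjectException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;

class ExternalizableDemo {
    /** Hypothetical stand-in for a term-like class; not Lucene's Term. */
    public static class TermSpec implements Externalizable {
        private static final long serialVersionUID = 1L;
        private static final byte FORMAT = 1; // our own format tag, written first

        String field;
        String text;

        public TermSpec() {} // Externalizable requires a public no-arg constructor

        public TermSpec(String field, String text) {
            this.field = field;
            this.text = text;
        }

        @Override
        public void writeExternal(ObjectOutput out) throws IOException {
            out.writeByte(FORMAT);
            out.writeUTF(field);
            out.writeUTF(text);
        }

        @Override
        public void readExternal(ObjectInput in) throws IOException {
            byte format = in.readByte();
            if (format != FORMAT) {
                // A later format would add a branch here instead of failing.
                throw new InvalidObjectException("unsupported format " + format);
            }
            field = in.readUTF();
            text = in.readUTF();
        }
    }

    // Serializes and immediately deserializes a TermSpec in memory.
    static TermSpec roundTrip(TermSpec term) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
                out.writeObject(term);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(buf.toByteArray()))) {
                return (TermSpec) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        TermSpec copy = roundTrip(new TermSpec("title", "lucene"));
        System.out.println(copy.field + ":" + copy.text);
    }
}
```

The interface only hands control of the byte layout to the class. Whether old formats keep being readable still depends on someone maintaining the branches in `readExternal`, which is the ongoing cost raised elsewhere in the thread.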
-
Re: [jira] Commented: (LUCENE-1473) Implement Externalizable in main top level searcher classes
Doug Cutting 2008-12-05, 22:40

Jason Rutherglen wrote:
> I think it's best to implement Externalizable as long as someone is willing to maintain it. I commit to maintaining the Externalizable code.

We need to agree to maintain things as a community, not as individuals. We can't rely on any particular individual being around in the future.

> This will ensure forward compatibility between serialized versions, make the serialized objects smaller, and make serialization faster.

If we want to promise compatibility we need to scope it and test it. We cannot in good faith promise that Query will be serially compatible forever, nor should we make any promises that we don't test. So if you choose to continue promoting this route, please specify the scope of compatibility and your plans to add tests for it.

> Apparently it matters enough for Hadoop to implement Writable in all over-the-wire classes.

I'm not sure what you're saying here. As I've said before, Hadoop is moving away from Writable because it is too fragile as classes change. As a part of the preparations for Hadoop 1.0 we are agreeing on serialization back-compatibility requirements and what technology we will use to support these. Hadoop is at its core a distributed system, while Lucene is not. Even then, Hadoop will continue to require that one update all nodes in a cluster in a coordinated manner, so only end-user protocols need be cross-version compatible, not internal protocols. I do not yet see a strong analogy here.

Doug
-
Re: [jira] Commented: (LUCENE-1473) Implement Externalizable in main top level searcher classes
Jason Rutherglen 2008-12-06, 00:02

The tests will check backwards compatibility with previous versions of Lucene using the described process of including serialized objects encoded by previous versions in the test code base, similar to how CFS index files are included in the test code tree.

There is an elegance to the RemoteSearcher type of code that allows one to focus on their queries and algorithms and ignore the fact that they are searching over N machines.

Protocol buffers seem okay. However, given the way that Lucene allows customizations in things like SortComparatorSource, I do not see how protocol buffers can be used with custom Java classes in the same way Java serialization works. If in the future Lucene allows greater customization, such as with scorers, similarities and queries in Lucene 3.0, then marrying the data with code in a grid environment using protocol buffers gets ugly. Protocol buffers are nice and can be added to a distributed Lucene environment, but the cost of implementing them vs. Serialization is much higher.

Uber distributed search may not be the most common use case right now for Lucene, but as it improves its capabilities people will try to use Lucene in a distributed grid environment. One could conceivably execute arbitrarily complex coordinated operations over the standard Lucene 3.0 APIs without tearing down processes and other worries.

Oracle has PL/SQL, and Lucene effectively operates using Java for customized query operations like PL/SQL. It would seem natural to at least support Java as a way to execute customized queries. The customized queries would be dynamically loaded Java objects.

In the marketplace Lucene seems to be a good place to do realtime search based data processing, at least compared to Sphinx and MG4J. A little further into the future with SSDs, it should be possible to perform in-place replacement of inverted index data using Lucene (at which point it is similar to a database), and the ability to execute remote code may be very useful. Hopefully the APIs for 3.0 will have a goal of being open enough for this.
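The testing process described in the first paragraph of this message could look roughly like the following sketch. Everything here is hypothetical: `QuerySpec` is not a Lucene class, and the fixture is generated into a temporary file at run time only so the example is self-contained. In a real compatibility suite the file would be produced once by the older release and checked into the test tree, and only the reading half would run against the current code.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.file.Files;
import java.nio.file.Path;

class GoldenStreamDemo {
    // Hypothetical query description whose serialized form we want to keep readable.
    static class QuerySpec implements Serializable {
        private static final long serialVersionUID = 1L;
        final String field;
        final String text;

        QuerySpec(String field, String text) {
            this.field = field;
            this.text = text;
        }
    }

    // Stands in for "the older release wrote this file"; a real suite would check it in.
    static Path writeFixture(QuerySpec spec) {
        try {
            Path file = Files.createTempFile("queryspec-v1-", ".ser");
            try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(file))) {
                out.writeObject(spec);
            }
            return file;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // The part a compatibility test would run against every new version of the class.
    static QuerySpec readFixture(Path file) {
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(file))) {
            return (QuerySpec) in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Path fixture = writeFixture(new QuerySpec("body", "serialization"));
        QuerySpec restored = readFixture(fixture);
        System.out.println(restored.field + ":" + restored.text);
    }
}
```

A test built this way only covers the classes and releases for which fixtures exist, so the scope of the promise is exactly the set of checked-in files.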
-
Re: [jira] Commented: (LUCENE-1473) Implement standard Serialization across Lucene versions
eks dev 2008-12-08, 21:37

That sounds much better. Trying to distribute Lucene itself (my reason why all this would be interesting) is just not going to work for far too many applications and will put a burden on API extensions.

My point is, I do not want to distribute a Lucene index, I need to distribute my application that is using Lucene. Think of it like having a distributed Luke: useful by itself, but not really useful for slightly more complex use cases.

My Hit class is a specialized Lucene Hit object, my Query has totally different features and aggregates a Lucene Query... this is what I can control and what I need to send over the wire, and that is the place where I define what my Version/API is. If the Lucene API classes change and all existing features remain, I have no problems in keeping my serialized objects compatible. So the versioning comes under my control, and Lucene provides only features, as a library.

Having a light, easily extensible layer on top of the core API would be just great. As far as I am concerned, Java serialization is not my world; having something light and extensible in the Etch/Thrift/Hadoop IPC/Protocol Buffers direction is much more thrilling. That is exactly the road Hadoop, Nutch, Katta and probably many others are taking. Having a common base that supports such cases is maybe a good idea. Why not make RemoteSearchable use Hadoop IPC, or Etch/Thrift...

Maybe there are other reasons to support Java serialization, I do not know. Just painting one view on this idea.

----- Original Message ----
> From: Doug Cutting (JIRA) <[EMAIL PROTECTED]>
> Sent: Monday, 8 December, 2008 19:52:46
> Subject: [jira] Commented: (LUCENE-1473) Implement standard Serialization across Lucene versions
>
> [ https://issues.apache.org/jira/browse/LUCENE-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654513#action_12654513 ]
>
> Doug Cutting commented on LUCENE-1473:
> --------------------------------------
>
> Would it take any more lines of code to remove Serializable from the core classes and re-implement RemoteSearchable in a separate layer on top of the core APIs? That layer could be a contrib module and could get all the externalizable love it needs. It could support a specific popular subset of query and filter classes, rather than arbitrary Query implementations. It would be extensible, so that if folks wanted to support new kinds of queries, they easily could. This other approach seems like a slippery slope, complicating already complex code with new concerns. It would be better to encapsulate these concerns in a layer atop APIs whose back-compatibility we already make promises about, no?
>
> > Implement standard Serialization across Lucene versions
> > -------------------------------------------------------
> >
> > Key: LUCENE-1473
> > URL: https://issues.apache.org/jira/browse/LUCENE-1473
> > Project: Lucene - Java
> > Issue Type: Bug
> > Components: Search
> > Affects Versions: 2.4
> > Reporter: Jason Rutherglen
> > Priority: Minor
> > Attachments: custom-externalizable-reader.patch, LUCENE-1473.patch, LUCENE-1473.patch, LUCENE-1473.patch, LUCENE-1473.patch
> >
> > Original Estimate: 8h
> > Remaining Estimate: 8h
> >
> > To maintain serialization compatibility between Lucene versions, serialVersionUID needs to be added to classes that implement java.io.Serializable. java.io.Externalizable may be implemented in classes for faster performance.
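As a rough sketch of the "version under my control" idea in this message, an application can define its own small request type and encode it with an explicit wire version, independent of how any library class is laid out. The names here (`TermRequest`, `WIRE_VERSION`) are hypothetical, and plain `DataOutputStream` is used only to keep the example self-contained; the message above has Thrift, Etch, Hadoop IPC or Protocol Buffers in mind for the same role.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class WireQueryDemo {
    static final int WIRE_VERSION = 1; // owned by the application, not by the search library

    // Hypothetical application-level request: what the broker sends to a search node.
    static final class TermRequest {
        final String field;
        final String text;
        final int maxHits;

        TermRequest(String field, String text, int maxHits) {
            this.field = field;
            this.text = text;
            this.maxHits = maxHits;
        }
    }

    static byte[] encode(TermRequest request) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(buf)) {
                out.writeInt(WIRE_VERSION);
                out.writeUTF(request.field);
                out.writeUTF(request.text);
                out.writeInt(request.maxHits);
            }
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static TermRequest decode(byte[] bytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            int version = in.readInt();
            if (version != WIRE_VERSION) {
                throw new IllegalArgumentException("unsupported wire version " + version);
            }
            return new TermRequest(in.readUTF(), in.readUTF(), in.readInt());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        TermRequest copy = decode(encode(new TermRequest("title", "lucene", 10)));
        System.out.println(copy.field + ":" + copy.text + " top " + copy.maxHits);
    }
}
```

The receiving node would turn such a request into whatever query objects its local library version provides, so nodes on different library versions only have to agree on the wire version.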
-
Re: [jira] Commented: (LUCENE-1473) Implement standard Serialization across Lucene versions
robert engels 2008-12-08, 21:51
I think an important piece of making this work is the query parser/syntax.

We already have a system similar to what is outlined below. We changed the query syntax to support our various query extensions.

The nice thing is that a persisted query is a simple string. That also makes it very easy for external systems to submit queries.

We also have XML definitions for a "result set".

I think the only way to make this work, though, is probably a more detailed query syntax (similar to SQL), so that it can easily be extended with new clauses/functions without breaking existing code.

I would also suggest that every core query class have a representation here.

I would also like to see a way for "proprietary" clauses to be supported (like calls in SQL).

On Dec 8, 2008, at 3:37 PM, eks dev wrote:
> That sounds much better. Trying to distribute Lucene itself (my reason why
> all this would be interesting) is just not going to work for far too many
> applications and will put a burden on API extensions.
>
> My point is that I do not want to distribute a Lucene index; I need to
> distribute my application, which uses Lucene. Think of it like a
> distributed Luke: useful by itself, but not really useful for slightly
> more complex use cases.
> My Hit class is a specialized Lucene Hit object, and my Query has totally
> different features and aggregates a Lucene Query. This is what I can
> control and what I need to send over the wire, and it is the place where
> I define my version/API. If Lucene API classes change but all existing
> features remain, I have no problem keeping my serialized objects
> compatible. Versioning thus comes under my control; Lucene provides only
> the features, as a library.
>
> Having a light, easily extensible layer on top of the core API would be
> just great. As far as I am concerned, Java serialization is not my world;
> something light and extensible in the Etch/Thrift/Hadoop
> IPC/Protocol Buffers direction is much more thrilling. That is exactly
> the road Hadoop, Nutch, Katta and probably many others are taking, so a
> common base that supports such cases may be a good idea. Why not build
> RemoteSearchable on Hadoop IPC, or Etch/Thrift?
>
> Maybe there are other reasons to support Java serialization; I do not
> know. I am just painting one view of this idea.
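One way to picture a string syntax that can grow new clauses/functions, including "proprietary" ones, without breaking existing code is a registry keyed by clause name. The following is a minimal sketch, not the system described in the message: `ClauseRegistry`, `register`, and `parse` are hypothetical names, the `name(argument)` form is an assumed syntax, and the type parameter `Q` stands in for whatever query object an application builds (for instance a Lucene Query).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical registry mapping clause names to builders of query objects of type Q. */
final class ClauseRegistry<Q> {
    private final Map<String, Function<String, Q>> builders = new HashMap<>();

    /** Adds a new clause without touching the parsing of existing ones. */
    void register(String name, Function<String, Q> builder) {
        builders.put(name, builder);
    }

    /** Parses a clause of the assumed form name(argument) and dispatches to its builder. */
    Q parse(String clause) {
        String text = clause.trim();
        int open = text.indexOf('(');
        if (open <= 0 || !text.endsWith(")")) {
            throw new IllegalArgumentException("Expected name(argument): " + clause);
        }
        String name = text.substring(0, open).trim();
        String argument = text.substring(open + 1, text.length() - 1);
        Function<String, Q> builder = builders.get(name);
        if (builder == null) {
            throw new IllegalArgumentException("Unknown clause: " + name);
        }
        return builder.apply(argument);
    }
}
```

Since the persisted form is just the clause string, a stored query keeps working as long as the clause name stays registered, whatever class the builder returns.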
-
Re: [jira] Commented: (LUCENE-1473) Implement standard Serialization across Lucene versions
Erik Hatcher 2008-12-08, 22:40
Well, there's the pretty sophisticated and extensible XML query parser in contrib. I've still only scratched the surface of it, but it meets the specs you mentioned.

Erik

On Dec 8, 2008, at 4:51 PM, robert engels wrote:
> I think an important piece of making this work is the query parser/syntax.
>
> We already have a system similar to what is outlined below. We changed
> the query syntax to support our various query extensions.
>
> The nice thing is that a persisted query is a simple string. That also
> makes it very easy for external systems to submit queries.
>
> We also have XML definitions for a "result set".
>
> I think the only way to make this work, though, is probably a more
> detailed query syntax (similar to SQL), so that it can easily be
> extended with new clauses/functions without breaking existing code.
>
> I would also suggest that every core query class have a
> representation here.
>
> I would also like to see a way for "proprietary" clauses to be
> supported (like calls in SQL).
-
Re: [jira] Commented: (LUCENE-1473) Implement standard Serialization across Lucene versions
robert engels 2008-12-08, 22:49
The problem with that is that in most cases you still need a "string"-based syntax that "people" can enter...

I guess you can always have an "advanced search" page that builds and submits the XML query behind the scenes.

On Dec 8, 2008, at 4:40 PM, Erik Hatcher wrote:
> Well, there's the pretty sophisticated and extensible XML query
> parser in contrib. I've still only scratched the surface of it,
> but it meets the specs you mentioned.
>
> Erik
-
Re: [jira] Commented: (LUCENE-1473) Implement standard Serialization across Lucene versions
Earwin Burrfoot 2008-12-08, 22:53
Building your own parser with ANTLR is really easy. Using Ragel is harder, but yields insane parsing performance.
Is there any reason to worry about library-bundled parsers if you're making something more complex than a college project?

On Tue, Dec 9, 2008 at 01:49, robert engels <[EMAIL PROTECTED]> wrote:
> The problem with that is that in most cases you still need a "string"-based
> syntax that "people" can enter...
>
> I guess you can always have an "advanced search" page that builds and
> submits the XML query behind the scenes.

Kirill Zakharenko/Кирилл Захаренко ([EMAIL PROTECTED])
Home / Mobile: +7 (495) 683-567-4 / +7 (903) 5-888-423
ICQ: 104465785
-
Re: [jira] Commented: (LUCENE-1473) Implement standard Serialization across Lucene versions
robert engels 2008-12-08, 22:56
I only meant it from a persistence standpoint: if you need a full "human enterable" query syntax anyway, why not just use that as the persistence format?

On Dec 8, 2008, at 4:53 PM, Earwin Burrfoot wrote:
> Building your own parser with ANTLR is really easy. Using Ragel is
> harder, but yields insane parsing performance.
> Is there any reason to worry about library-bundled parsers if you're
> making something more complex than a college project?
-
Re: [jira] Commented: (LUCENE-1473) Implement standard Serialization across Lucene versions
markharw00d 2008-12-08, 23:10
> The problem with that is that in most cases you still need a "string"
> based syntax that "people" can enter...

The XML syntax includes a <UserQuery> tag for embedding user input of this type.

> I guess you can always have an "advanced search" page that builds and
> submits the XML query behind the scenes.

Contrib now includes a worked demo web app showing how a very typical search form is converted into XML using XSL. User input is a mixture of edit boxes for classic QueryParser syntax used on free-text fields but also includes drop-downs and checkboxes etc that map to other non-free-text fields.

Cheers
Mark
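The form-to-XML step mentioned above can be sketched with the XSLT support that ships with the JDK. This is not the contrib demo's code: `FormToQueryXml`, the `<form>` input document, and its `text` and `category` elements are hypothetical, and the element and attribute names in the generated query (`BooleanQuery`, `Clause occurs`, `UserQuery`, `TermQuery fieldName`) are written as I understand the contrib XML query syntax, so treat them as assumptions to check against the parser's documentation.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical helper that turns search-form input into an XML query via XSL. */
final class FormToQueryXml {
    // Free text goes into UserQuery (classic query syntax); the optional
    // drop-down value becomes a TermQuery on a non-free-text field.
    static final String TEMPLATE =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:template match='/form'>"
      + "<BooleanQuery>"
      + "<Clause occurs='must'><UserQuery><xsl:value-of select='text'/></UserQuery></Clause>"
      + "<xsl:if test='category'>"
      + "<Clause occurs='must'><TermQuery fieldName='category'>"
      + "<xsl:value-of select='category'/></TermQuery></Clause>"
      + "</xsl:if>"
      + "</BooleanQuery>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    /** Applies the stylesheet to the form document and returns the query XML. */
    static String toQueryXml(String formXml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(TEMPLATE)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(formXml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Could not build query XML", e);
        }
    }
}
```

Keeping the mapping in a stylesheet means the form can gain fields by editing the template rather than the Java code; the resulting XML would then be handed to the XML query parser.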
-
Re: [jira] Commented: (LUCENE-1473) Implement standard Serialization across Lucene versions
Grant Ingersoll 2008-12-09, 17:18
See http://lucene.markmail.org/message/fu34tuomnqejchfj?q=RemoteSearchable for just such a proposal.

On Dec 8, 2008, at 1:52 PM, Doug Cutting (JIRA) wrote:
> [ https://issues.apache.org/jira/browse/LUCENE-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654513#action_12654513 ]
>
> Doug Cutting commented on LUCENE-1473:
> --------------------------------------
>
> Would it take any more lines of code to remove Serializable from
> the core classes and re-implement RemoteSearchable in a separate
> layer on top of the core APIs? That layer could be a contrib module
> and could get all the Externalizable love it needs. It could
> support a specific popular subset of query and filter classes,
> rather than arbitrary Query implementations. It would be
> extensible, so that if folks wanted to support new kinds of queries,
> they easily could. This other approach seems like a slippery slope,
> complicating already complex code with new concerns. It would be
> better to encapsulate these concerns in a layer atop APIs whose
> back-compatibility we already make promises about, no?

--------------------------
Grant Ingersoll

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ
|