Lucene.Net, mail # user - Concatenated TokenStreams


Ben West 2010-12-16, 20:19
Troy Howard 2010-12-16, 21:10
Ben West 2010-12-16, 21:23
Re: Concatenated TokenStreams
Troy Howard 2010-12-17, 03:36
Here's some example code that seems to work...

In a nutshell, I created a new kind of field and a new kind of
TokenStream (MultiField and MultiTokenStream), both of which just
handle the aggregation logic.

https://gist.github.com/744444

This is a naive example, so you might need to do a bit more to toughen
it up for production use. It relies on CaptureState and RestoreState
in the TokenStream API, which copy attribute state for every token, so
this might not be the best-performing code in the world, but it should
point you in the right direction.
Thanks,
Troy
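The gist's code is not reproduced here. As a rough, hypothetical illustration of the aggregation idea, the Java sketch below (Java because Lucene.Net mirrors Java Lucene's API) does not use Lucene at all: `TokenState`, `SimpleTokenStream`, `ListTokenStream`, `ConcatTokenStream`, and the two "analyzer" helpers are made-up stand-ins. What it models is the snag Ben describes: consumers keep references to the wrapper's own attribute objects, so instead of swapping in the child streams' attributes, the wrapper copies each child token's values into its own state, which is roughly the job CaptureState/RestoreState do in the real API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical, simplified model of an attribute-based token stream.
// None of these types are Lucene.Net (or Java Lucene) classes.
class ConcatDemo {

    /** Mutable per-token state; stands in for a stream's attributes. */
    static final class TokenState {
        String term;
        int positionIncrement;

        void copyFrom(TokenState other) {
            term = other.term;
            positionIncrement = other.positionIncrement;
        }
    }

    /** incrementToken() advances; current() always returns the same instance. */
    interface SimpleTokenStream {
        boolean incrementToken();
        TokenState current();
    }

    /** Replays already-analyzed terms, e.g. one source field's analyzer output. */
    static final class ListTokenStream implements SimpleTokenStream {
        private final Iterator<String> terms;
        private final TokenState state = new TokenState();

        ListTokenStream(String... values) {
            this.terms = Arrays.asList(values).iterator();
        }

        public boolean incrementToken() {
            if (!terms.hasNext()) return false;
            state.term = terms.next();
            state.positionIncrement = 1;
            return true;
        }

        public TokenState current() { return state; }
    }

    /**
     * Drains each child in order. The wrapper keeps its OWN state object and
     * copies every child token into it, so a consumer holding a reference to
     * that object keeps seeing fresh values.
     */
    static final class ConcatTokenStream implements SimpleTokenStream {
        private final List<SimpleTokenStream> children;
        private final TokenState state = new TokenState();
        private int index = 0;

        ConcatTokenStream(SimpleTokenStream... children) {
            this.children = Arrays.asList(children);
        }

        public boolean incrementToken() {
            while (index < children.size()) {
                SimpleTokenStream child = children.get(index);
                if (child.incrementToken()) {
                    state.copyFrom(child.current()); // copy values, not references
                    return true;
                }
                index++; // child exhausted; move on to the next one
            }
            return false;
        }

        public TokenState current() { return state; }
    }

    // Hypothetical "analyzers": each source field gets its own tokenization.
    static SimpleTokenStream lowercaseWhitespace(String text) {
        return new ListTokenStream(text.toLowerCase().split("\\s+"));
    }

    static SimpleTokenStream keyword(String text) {
        return new ListTokenStream(text);
    }

    /** Looks up the state object once, then reuses it while consuming. */
    static List<String> collectTerms(SimpleTokenStream stream) {
        TokenState attr = stream.current();
        List<String> out = new ArrayList<>();
        while (stream.incrementToken()) out.add(attr.term);
        return out;
    }

    public static void main(String[] args) {
        SimpleTokenStream combined = new ConcatTokenStream(
                lowercaseWhitespace("Quick Brown Fox"),
                keyword("SKU-42"));
        System.out.println(collectTerms(combined)); // [quick, brown, fox, SKU-42]
    }
}
```

Each source keeps its own tokenization (lowercased words versus one untouched keyword), yet the consumer sees a single stream. A production version would also need to reset and close the child streams, and might add a position-increment gap between sources so phrase queries cannot match across field boundaries; this sketch leaves both out.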
On Thu, Dec 16, 2010 at 1:23 PM, Ben West <[EMAIL PROTECTED]> wrote:
> Wow, yeah, that would be awesome.
>
> I'm using 2.9.2
>
> Thanks,
> -ben
>
> --- On Thu, 12/16/10, Troy Howard <[EMAIL PROTECTED]> wrote:
>
>> From: Troy Howard <[EMAIL PROTECTED]>
>> Subject: Re: Concatenated TokenStreams
>> To: [EMAIL PROTECTED]
>> Date: Thursday, December 16, 2010, 3:10 PM
>> Ben,
>>
>> This seems totally possible with a wrapper. I'll see if I
>> can't mock up a prototype.
>>
>> I assume you're using 2.9.2?
>>
>> Thanks,
>> Troy
>>
>>
>> On Thu, Dec 16, 2010 at 12:19 PM, Ben West <[EMAIL PROTECTED]> wrote:
>> > Hey All,
>> >
>> > I want to make something roughly equivalent to Solr's copy fields.
>> > However, I can't just concatenate the values into one string and then
>> > put that into a field, because each field needs its own analyzer.
>> >
>> > I know you can instantiate a field with a token stream as input, but
>> > I'm not sure how to combine multiple token streams into one. I tried
>> > to write a wrapper class to do this, but I can't figure out how,
>> > since the attribute API doesn't let me move the attributes the
>> > existing streams already have onto a new stream.
>> >
>> > I checked Solr's source, and as far as I could tell it doesn't allow
>> > each input field to be tokenized in its own way (though I could
>> > definitely be misreading it).
>> >
>> > Does anyone know if this is possible?
>> >
>> > Thanks,
>> > -Ben
Ben West 2010-12-20, 18:57