Solr >> mail # user >> How to retrieve tokens?


Re: How to retrieve tokens?
Essentially, you're talking about reconstructing the field from the
tokens, and that's pretty difficult in general and lossy. For instance,
if you use stemming and "running" gets stemmed to "run", you
get back just "run" from the index. Is that acceptable?
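
To make the lossiness concrete, here is a minimal sketch of an analysis
chain. It is a toy, not Lucene's analyzer: the stop list, the `toy_stem`
helper and the `analyze` function are all hypothetical names made up for
illustration. It only shows that different inputs can collapse to the
same tokens, so the original text cannot be rebuilt from them.

```python
import re

# Assumed stop list for illustration only; not Solr's actual stopwords.txt.
STOP_WORDS = {"is", "a", "the"}


def toy_stem(token):
    """Crude suffix stripper (hypothetical; not the Porter stemmer)."""
    if token.endswith("ing") and len(token) > 5:
        stem = token[:-3]
        # Undo a doubled final consonant: "runn" -> "run".
        if len(stem) >= 2 and stem[-1] == stem[-2]:
            stem = stem[:-1]
        return stem
    return token


def analyze(text):
    """Lowercase, split on non-word characters, drop stop words, stem."""
    tokens = re.findall(r"\w+", text.lower())
    return [toy_stem(t) for t in tokens if t not in STOP_WORDS]


print(analyze("All you need is love"))  # ['all', 'you', 'need', 'love']
print(analyze("Running"))               # ['run']
print(analyze("run"))                   # ['run']
```

"Running" and "run" produce the same token, and "is" disappears
entirely, so neither the original word forms nor the stop words can be
recovered from the index alone.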

But otherwise, you've got to go into the low levels of Lucene to
get this info, and reassembling it is lengthy. I suspect you'd find
that performance was unacceptable.
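
If the tokens really are needed, the TermVectorComponent mentioned in
the question is probably the least painful route. The sketch below only
builds a request URL and does not contact a server. It assumes the field
was indexed with termVectors="true" and that a handler named /tvrh is
registered with the component, as in the example solrconfig.xml. The
host, core name, document id and field name are placeholders, and
`term_vector_url` is a hypothetical helper.

```python
from urllib.parse import urlencode


def term_vector_url(base_url, core, doc_id, field):
    """Build a term-vector request URL (hypothetical helper)."""
    params = {
        "q": f"id:{doc_id}",     # select the document of interest
        "fl": "id",              # keep the normal stored-field output small
        "tv": "true",            # turn the TermVectorComponent on
        "tv.fl": field,          # field whose term vectors we want
        "tv.tf": "true",         # include term frequencies
        "tv.positions": "true",  # include token positions
    }
    return f"{base_url}/{core}/tvrh?{urlencode(params)}"


# Placeholder host and core; adjust to the actual installation.
url = term_vector_url("http://localhost:8983/solr", "collection1", "1", "text")
print(url)
```

The response lists the indexed terms for that document, which are the
analyzed tokens, not the original text, so the caveats above still apply.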

Why do you want to do this? This may be an XY problem.
http://people.apache.org/~hossman/#xyproblem

Best
Erick

On Thu, Feb 23, 2012 at 10:22 AM, Thiago <[EMAIL PROTECTED]> wrote:
> Hi to everybody,
>
> My name is Thiago and I'm new to Apache Solr and NoSQL databases. At the
> moment, I'm using Solr for document indexing. My question is: Is
> there any way to retrieve the tokens in place of the original data?
>
> For example:
> I have a field using the fieldtype text_general from the original
> schema.xml. If I insert a document with the following string in this field:
> "All you need is love", the tokens that I get are: all, you, need, love.
> When I search this index, I want to get the tokens (all, you, need, love)
> in place of the indexed string.
>
> I searched for this on the web and in this forum too, and I saw some people
> suggesting TermVectorsComponent. Is there an easier way to do it? From
> what I saw, TermVectorsComponent is more difficult and uses more memory.
>
> Thanks to everybody.
>
> Thiago
>
>
> --
> View this message in context: http://lucene.472066.n3.nabble.com/How-to-retrieve-tokens-tp3770007p3770007.html
> Sent from the Solr - User mailing list archive at Nabble.com.