Lucene, mail # user - Lucene tokenization


RE: Lucene tokenization
Steven A Rowe 2012-03-27, 18:11
Hi Nilesh,

Which version of Lucene are you using?  StandardTokenizer behavior changed in v3.1.

Steve

-----Original Message-----
From: Nilesh Vijaywargiay [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, March 27, 2012 2:04 PM
To: [EMAIL PROTECTED]
Subject: Lucene tokenization

I have a string 01a_b-_-c-d which is tokenized as 01a_b c d

and the string a_b-_-c_d which is tokenized as a b c d

Why is there a difference when the string starts with a digit? I am using the standard tokenizer with no stemming.
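
For what it's worth, the two results above look consistent with the pre-3.1 StandardTokenizer grammar (kept as ClassicTokenizer from 3.1 on), which, as far as I can tell, keeps segments joined by a single `_`, `-`, `/`, `.` or `,` together as one "number-like" token only when every other segment contains a digit. The Python below is a rough model of that one rule, not Lucene code: the function names are made up for illustration, it handles ASCII letters and digits only, and it ignores the grammar's other token types (hosts, e-mail addresses, acronyms, apostrophes).

```python
import re

SEGMENT = re.compile(r"[A-Za-z0-9]+")  # ASCII-only stand-in for an alphanumeric run
PUNCT = "_-/.,"                        # joiners assumed for the number-like rule


def has_digit(s):
    return any(ch.isdigit() for ch in s)


def longest_run(segments, parity):
    """Length of the longest prefix in which every segment whose
    index has the given parity contains a digit."""
    n = 0
    for i, seg in enumerate(segments):
        if i % 2 == parity and not has_digit(seg):
            break
        n = i + 1
    return n


def classic_like_tokens(text):
    """Hypothetical approximation of the old number-like rule."""
    tokens = []
    pos = 0
    while pos < len(text):
        first = SEGMENT.match(text, pos)
        if first is None:
            pos += 1  # skip anything that cannot start a token
            continue
        # Gather segments separated by exactly one joiner character.
        chain = [first]
        while True:
            end = chain[-1].end()
            if end < len(text) and text[end] in PUNCT:
                nxt = SEGMENT.match(text, end + 1)
                if nxt is None:
                    break
                chain.append(nxt)
            else:
                break
        words = [m.group() for m in chain]
        # Longest match: digits in all even-indexed or all odd-indexed
        # segments; otherwise fall back to the first segment alone.
        best = max(1, longest_run(words, 0), longest_run(words, 1))
        tokens.append(text[chain[0].start():chain[best - 1].end()])
        pos = chain[best - 1].end()
    return tokens


print(classic_like_tokens("01a_b-_-c-d"))  # ['01a_b', 'c', 'd']
print(classic_like_tokens("a_b-_-c_d"))    # ['a', 'b', 'c', 'd']
```

Under this model, `01a_b` survives as one token because `01a` contains a digit, while `a_b`, `c-d` and `c_d` have no digit on either side of the joiner and are split. If that is what is happening here, it would point to a pre-3.1 version (or ClassicTokenizer) being in use.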

---------------------------------------------------------------------