Lucene, mail # user - UTF-8/unicode input in querying in Lucene


Re: UTF-8/unicode input in querying in Lucene
Chris Hostetter 2007-09-15, 00:47

: The page http://lucene.apache.org/java/docs/queryparsersyntax.html does not
: mention that \uNNNN Unicode syntax is supported.
: For example, \u0048\u0045\u004c\u004c\u004f is HELLO.
:  
: Please add this to the page, it took experimentation to discover it.

I don't believe the QueryParser actually treats \uNNNN as a special
syntax ... what you may have encountered is that when *javac* parses a
literal string constant, those sequences have special meaning -- so they
have already become the literal Unicode characters long before
QueryParser sees them.
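
To make the javac point concrete, here is a minimal sketch (the class and
method names are made up for illustration). It compares a string literal
written with escapes against the same text with the backslashes kept, which
is what a user would actually type into a search box or what you would read
from a file:

```java
public class LiteralEscapeDemo {
    // The compiler turns each escape in this literal into one character,
    // so the runtime string is just the five letters H, E, L, L, O.
    static String fromLiteral() {
        return "\u0048\u0045\u004c\u004c\u004f";
    }

    // Doubling the backslash keeps it in the string: this is the
    // 30-character text a user would type or a file would contain.
    static String fromUserInput() {
        return "\\u0048\\u0045\\u004c\\u004c\\u004f";
    }

    public static void main(String[] args) {
        System.out.println(fromLiteral().equals("HELLO")); // true
        System.out.println(fromLiteral().length());        // 5
        System.out.println(fromUserInput().length());      // 30
    }
}
```

Only the first form reaches any parser as HELLO, and that conversion is done
by the compiler, not by the code that receives the string.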

As far as the query parser is concerned, the backslash in \uNNNN is only
escaping the "u" (any character can be escaped, whether it needs it or
not).
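
A toy model of that escaping rule, as a sketch only: dropEscapes below is a
hypothetical helper written for this illustration, not Lucene's QueryParser
code, and it assumes nothing more than "a backslash makes the next character
literal". The real parser's behavior may differ between versions.

```java
public class BackslashEscapeSketch {
    // Hypothetical helper, not Lucene code: drops each backslash and keeps
    // the character after it, whatever that character is. A lone trailing
    // backslash has nothing to escape and is kept as-is.
    static String dropEscapes(String input) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '\\' && i + 1 < input.length()) {
                i++;
                out.append(input.charAt(i));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Text as a user would type it: backslash, u, 0048, then "ello".
        String typed = "\\u0048ello";
        System.out.println(dropEscapes(typed));    // u0048ello
        System.out.println(dropEscapes("a\\:b"));  // a:b
    }
}
```

Under this rule the typed escape comes out as the plain text u0048ello rather
than Hello, which is the behavior described above.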

-Hoss