Solr, mail # user - Solr 3.1 indexing error Invalid UTF-8 character 0xffff


Re: Solr 3.1 indexing error Invalid UTF-8 character 0xffff
Bernd Fehling 2011-06-27, 12:47

On 27.06.2011 14:35, Robert Muir wrote:
> On Mon, Jun 27, 2011 at 8:30 AM, Bernd Fehling
> <[EMAIL PROTECTED]>  wrote:
>
>> Unicode U+FFFF is the UTF-8 byte sequence "ef bf bf", that is right.
>>
>> But I was saying that UTF-8 0xffff (which is the byte sequence "ff ff")
>> is illegal,
>> and that's what the java.io.CharConversionException is complaining about.
>> "Invalid UTF-8 character 0xffff".
>>
>> Don't mix up Unicode with UTF-8.
>>
>> Sorry, but I think you are wrong ;-)
>>
>
> Hi, there is no such thing as "UTF-8 0xffff", nor is there any such
> thing as "utf-8 character", despite what this xml parser might say.
>
> This is just a stupid XML parser. Like other stupid things about XML,
> it says 'illegal this' or 'illegal that' for arbitrary sets of Unicode
> (such as control characters).
>
> You can tell the XML parser is totally broken when it uses the phrase
> 'utf-8 character'. This term does not exist.

correct!!!
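
For anyone who wants to check the byte-level claims in this thread, here is a minimal Python sketch. It is not from the original posters, and the helper name `is_xml10_char` is made up for illustration. It shows that the code point U+FFFF serializes to "ef bf bf" in UTF-8, that the raw bytes "ff ff" are not decodable as UTF-8 at all, and that the XML 1.0 `Char` production excludes U+FFFF, which is presumably why the parser rejects it.

```python
# U+FFFF is a Unicode code point; UTF-8 is only one way to serialize it.
encoded = "\uffff".encode("utf-8")
print(" ".join(f"{b:02x}" for b in encoded))

# The raw bytes ff ff are not a UTF-8 sequence: 0xFF never occurs in UTF-8.
try:
    b"\xff\xff".decode("utf-8")
    raw_ff_ff_is_valid_utf8 = True
except UnicodeDecodeError:
    raw_ff_ff_is_valid_utf8 = False


def is_xml10_char(cp: int) -> bool:
    """Hypothetical helper: True if the code point matches XML 1.0 'Char'."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


print(raw_ff_ff_is_valid_utf8)
print(is_xml10_char(0xFFFF))
```

So the "0xffff" in the exception message most likely refers to the decoded code point, not to two literal ff bytes in the input. Stripping or replacing code points that fail a check like `is_xml10_char` before posting the document is one possible workaround, assuming the offending character carries no meaning in the data.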