Nutch, mail # dev - Detecting Encoding with plugins


Re: Detecting Encoding with plugins
Julien Nioche 2012-02-15, 10:59
The MIME type is not the same thing as the encoding. As Ken pointed out, this
is done at the individual parser level.
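
To illustrate the distinction, here is a minimal Python sketch (not Nutch
code; the helper name split_content_type is made up for this example). It
uses the standard library's email.message.Message to split a Content-Type
header value into the MIME type and the optional charset parameter. A missing
charset is the case where a parser would have to detect the encoding itself.

```python
from email.message import Message


def split_content_type(header_value):
    """Return (mime_type, charset) for a Content-Type header value.

    charset is None when the header carries no charset parameter.
    Hypothetical helper for illustration only; it is not part of Nutch.
    """
    msg = Message()
    msg["Content-Type"] = header_value
    # get_content_type() gives the lower-cased "maintype/subtype";
    # get_content_charset() gives the lower-cased charset parameter or None.
    return msg.get_content_type(), msg.get_content_charset()


if __name__ == "__main__":
    print(split_content_type("text/html; charset=ISO-8859-1"))
    print(split_content_type("text/html"))
```

The MIME type says what kind of document the bytes are, while the charset
says how to turn those bytes into characters, so knowing one does not give
you the other.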

On 14 February 2012 23:51, Markus Jelsma <[EMAIL PROTECTED]> wrote:

> Hi,
>
> This was indeed an issue until today. The detected type is in the crawl
> datum metadata.
>
> https://issues.apache.org/jira/browse/NUTCH-1259
>
> > Hi,
> >
> > I can't see anywhere within our parser plugins where we detect encoding
> > of documents. I've also begun looking through the o.a.n.p package but
> > again I can't see anything.
> >
> > Can anyone provide some detail on this please?
> >
> > Thank you
> >
> > Lewis
>

--
Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
http://twitter.com/digitalpebble