Nutch, mail # user - Apparently far from last question :)


Re: Apparently far from last question :)
Tolga 2012-05-23, 12:20
My colleague has just made me realize something. Is it possible that
this xls file wasn't crawled because there isn't a link to it within the
website?

Regards,

On 5/23/12 2:05 PM, Lewis John Mcgibbney wrote:
> There is absolutely no requirement to add this configuration to this file.
> If you look at the XML file in question, one of the first XML
> configuration blocks says
>
> <!--  by default if the mimeType is set to *, or
>          if it can't be determined, use parse-tika -->
> <mimeType name="*">
> <plugin id="parse-tika" />
> </mimeType>
>
> Just remove your unnecessary config and Tika will do the work for you :0)
>
> Lewis
>
> On Wed, May 23, 2012 at 11:44 AM, Tolga <[EMAIL PROTECTED]> wrote:
>> Hi,
>>
>> I put the lines
>>
>> <mimeType name="application/x-excel">
>> <plugin id="parse-tika" />
>> <plugin id="feed" />
>> </mimeType>
>>
>> in parse-plugins.xml, but I still can't crawl xls files. Why is that?
>>
>> Regards,
>
>
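
The wildcard fallback Lewis describes can be sketched roughly as follows. This is only an illustration of the lookup idea, not Nutch's actual resolution code: `resolve_plugins` is a hypothetical helper, and the XML string is a trimmed-down stand-in for parse-plugins.xml that contains just the `*` block quoted above.

```python
import xml.etree.ElementTree as ET

# Trimmed-down stand-in for parse-plugins.xml (assumption: only the
# wildcard block quoted in the thread is present).
PARSE_PLUGINS_XML = """
<parse-plugins>
  <mimeType name="*">
    <plugin id="parse-tika" />
  </mimeType>
</parse-plugins>
"""


def resolve_plugins(xml_text, mime_type):
    """Hypothetical helper: plugin ids for mime_type, falling back to "*"."""
    root = ET.fromstring(xml_text)
    # Map each mimeType name to the ordered list of its plugin ids.
    mapping = {
        m.get("name"): [p.get("id") for p in m.findall("plugin")]
        for m in root.findall("mimeType")
    }
    # No explicit entry for this type: use the wildcard entry if there is one.
    return mapping.get(mime_type, mapping.get("*", []))


print(resolve_plugins(PARSE_PLUGINS_XML, "application/x-excel"))
```

With no explicit `application/x-excel` entry, the lookup lands on the `*` block, which is why the extra mapping should not be needed for Tika to be chosen.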