Results 121 to 130 of 805 (5.436s).

Re: How to configure nutch so that apache tika can extract all the tags ? - Nutch - [mail # user]
...Hi Kiran, You should be able to do that with either parse-html or parse-tika by implementing an extension of HtmlParseFilter and storing the attr_* values in the parse metadata then writ...
Author: Julien Nioche, 2012-09-07, 09:20

Re: SolrDeleteDuplicates bug - Nutch - [mail # user]
...Hi, 1.6 is not published yet and is the trunk on SVN see http://nutch.apache.org/version_control.html On 6 September 2012 20:17, wrote: * *Open Source Soluti...
Author: Julien Nioche, 2012-09-06, 19:44

Re: nutch 1.5 not able to parse mutliValued metatags - Nutch - [mail # user]
...Hi Kiran This looks like a possible improvement indeed. Please open an issue on JIRA https://issues.apache.org/jira/browse/NUTCH Thanks Julien On 5 September 2012 20:...
Author: Julien Nioche, 2012-09-06, 08:38

Re: Cached page (like google) with hits highlighted - Nutch - [mail # user]
...Sorry I had missed your previous comments. On 16 August 2012 09:32, Julien Nioche wrote: * *Open Source Solutions for Text Engineering http://digitalpebble.blogspot....
Author: Julien Nioche, 2012-08-16, 08:33

Re: Cached page (like google) with hits highlighted - Nutch - [mail # user]
...You need to use parse-tika, however the underlying parser for pdf does not currently generate much markup, the Word one does IIRC. Why don't you try Tika standalone with its GUI to exp...
Author: Julien Nioche, 2012-08-16, 08:32

Re: how to add raw HTML field to Solr - Nutch - [mail # user]
...Max, Can you please open an issue on JIRA? I think it would make sense to have the possibility to add the raw content as a binary field so that people could use it on the SOLR side as ...
Author: Julien Nioche, 2012-08-16, 08:27

Re: duplicate jar files by plugin dependencies - Nutch - [mail # dev]
...+1 to using the maven-dependency-plugin within our ANT script. I think I had put a preliminary version for 1.x in JIRA but we'd need to extend the mechanism to the plugins as well. On ...
Author: Julien Nioche, 2012-08-10, 10:10

Re: cache field in index-basic in 2.X - Nutch - [mail # user]
...it does not concern metadata, we store as metadata the policies regarding caching that are specified in the html pages (http://www.i18nguy.com/markup/metatags.html) then store the pol...
Author: Julien Nioche, 2012-08-10, 09:39

Re: CHM Files and Tika - Nutch - [mail # user]
...new JIRA? On 9 August 2012 23:30, Markus Jelsma wrote: * *Open Source Solutions for Text Engineering http://digitalpebble.blogspot.com/ http://www.digitalpebb...
Author: Julien Nioche, 2012-08-10, 07:32

Re: cache field in index-basic in 2.X - Nutch - [mail # user]
...Could this be for the html meta directives? On 9 August 2012 22:36, Lewis John Mcgibbney wrote: * *Open Source Solutions for Text Engineering http://digitalpebble.b...
Author: Julien Nioche, 2012-08-10, 07:30