Results 101 to 110 of 805 (0.281s).
Re: Keeping History/Archive with Nutch 2.x - Nutch - [mail # user]
...Hi James, you could have a custom MapReduce job to copy the documents with a custom ID as you just described. Another option would be to use Nutch 2 + HBase and set a large value of ve...
Author: Julien Nioche, 2012-10-09, 13:30

Re: language profile in Nutch 1.5 - Nutch - [mail # user]
...Hello Patricio, the language identification is delegated to Tika since 1.4 (https://issues.apache.org/jira/browse/NUTCH-1075), so you should create your own models with Tika instead. As...
Author: Julien Nioche, 2012-10-08, 10:09

Re: [ANNOUNCE] Apache Nutch 2.1 Released - Nutch - [mail # dev]
...Thanks Lewis and well done everyone! Enjoy your weekend. Julien. On 5 October 2012 16:12, lewis john mcgibbney wrote: ...
Author: Julien Nioche, 2012-10-05, 16:29

Re: Index HTML raw content - Nutch - [mail # user]
...Hi Matteo, it is not so much the Tika plugin; it is simply that the raw content is not indexed. This can be done, however, by writing a custom parser that will generate a new field for the raw ...
Author: Julien Nioche, 2012-10-05, 10:05

Re: [PING] [VOTE] Apache Nutch 2.1 Release Candidate Available - Nutch - [mail # user]
...Only the Apache distribution of Hadoop version 1.0.3 is officially supported by Nutch. Obviously, if we can get it to work on other distributions then so much the better, but this can't be consid...
Author: Julien Nioche, 2012-10-03, 13:37

Re: nutch-2.0 generate in deploy mode - Nutch - [mail # user]
...Guys, there are certainly overheads in using the distributed mode (communication with servers etc.) and moving the job file around, unpacking it etc., but before we start taking abo...
Author: Julien Nioche, 2012-10-02, 09:11

Re: priorised/scored fetching - Nutch - [mail # user]
...you should be able to do that with a custom scoring filter and give a score based on the mime type. On 2 October 2012 08:28, Markus Jelsma wrote: ...
Author: Julien Nioche, 2012-10-02, 08:34

Re: [VOTE] Apache Nutch 2.1 Release Candidate Available - Nutch - [mail # dev]
...Would be good to get thumbs-up from people who've tested crawls on other backends (Cassandra, HBase) before pushing the release. I can't really give a +1 as I've just checked the most ...
Author: Julien Nioche, 2012-10-01, 14:34

Re: [VOTE] Apache Nutch 2.1 Release Candidate Available - Nutch - [mail # dev]
...Ok, thanks. Was trying to get a minimalistic crawl of http://nutch.apache.org/ with MySQL but no success so far (the URL is not being fetched). Unfortunately won't have the time to investiga...
Author: Julien Nioche, 2012-10-01, 13:18

Re: [VOTE] Apache Nutch 2.1 Release Candidate Available - Nutch - [mail # dev]
...Shouldn't the dependency for gora-sql point to v 0.2.1? On 21 September 2012 16:07, Lewis John Mcgibbney wrote: ...
Author: Julien Nioche, 2012-10-01, 12:36