Results 231 to 240 of 817 (0.729s).
Re: Common Crawl dataset - Nutch - [mail # user]
...I think Common Crawl which uses a slightly different definition of ARCs, not sure though. Anyway they have released a library to read/write to their format https://github.com/commoncrawl/com...
Author: Julien Nioche, 2012-05-23, 08:23

Re: Apache Nutch release 1.5 RC2 - Nutch - [mail # dev]
...Read http://people.apache.org/~lewismc/nutch-1.5-rc2/ :-) On 22 May 2012 20:59, Lewis John Mcgibbney wrote: * *Open Source Solutions for Text Engineering http://dig...
Author: Julien Nioche, 2012-05-22, 20:18

Re: 1.5 RC2 - Nutch - [mail # dev]
...Brilliant! Thanks Lewis On 22 May 2012 20:30, Lewis John Mcgibbney wrote: * *Open Source Solutions for Text Engineering http://digitalpebble.blogspot.com/ http://www...
Author: Julien Nioche, 2012-05-22, 20:16

Re: 1.5 RC2 - Nutch - [mail # dev]
...we'd need to duplicate the tasks tar and zip so that they operate on what package-bin produces + rename the output of the standard package into nutch-X-src. The modif I made to build.xml doe...
Author: Julien Nioche, 2012-05-22, 19:17

Re: Get Parent of URLs fetched by nutch - Nutch - [mail # user]
...Implement your own scoring filter and add the URL of the source to the targets' metadata. See https://issues.apache.org/jira/browse/NUTCH-1331 for something (vaguely) related On 22 May...
Author: Julien Nioche, 2012-05-22, 11:03

Re: svn commit: r1341365 - /nutch/trunk/ivy/mvn.template - Nutch - [mail # dev]
...cut and paste :-) Ferdy wasn't there at all etc... Fixed! Thanks On 22 May 2012 10:33, Lewis John Mcgibbney wrote: * *Open Source Solutions for Text Engineering htt...
Author: Julien Nioche, 2012-05-22, 09:38

1.5 RC2 - Nutch - [mail # dev]
...Hi Lewis, I am sure that Chris will have no problem with you doing the RC2. Chris? It would be a good thing to have more than one person who knows how to do it anyway :-) Note that to ...
Author: Julien Nioche, 2012-05-22, 09:15

Re: Bug in Trunk Generator mapper? - Nutch - [mail # dev]
...Hi Lewis [Moved to dev@] We could normalise before filtering in the mapper indeed. Whether this is accidental or on purpose is not clear. Please open a JIRA for this. On a differ...
Author: Julien Nioche, 2012-05-21, 19:32

Re: [VOTE] Apache Nutch 1.5 release rc #1 - Nutch - [mail # user]
...can't remember the name of the task right now but should be easy to find out by looking at the build.xml. You'll need to make sure that the maven tasks jars are in the lib dir. Don't think ...
Author: Julien Nioche, 2012-05-19, 18:33

Re: Tika parser exception IndexOutOfBoundsException - Nutch - [mail # user]
...Try setting http.content.limit to a very large value or -1. The parser sometimes chokes on truncated content On 15 May 2012 15:17, LEVILLAIN Olivier wrote: * *Open S...
Author: Julien Nioche, 2012-05-15, 14:43
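
The http.content.limit suggestion in the last result could look roughly like the fragment below. This is a sketch, not taken from the thread: it assumes the override goes in conf/nutch-site.xml, the usual place for local overrides of nutch-default.xml, and the description text is illustrative wording only.

```xml
<!-- Assumed location: conf/nutch-site.xml, inside the <configuration> element. -->
<property>
  <name>http.content.limit</name>
  <!-- A negative value disables truncation of content fetched over HTTP. -->
  <value>-1</value>
  <description>Illustrative override: do not truncate downloaded content,
  so the parser is not handed a partial document.</description>
</property>
```

Disabling the limit means very large documents are fetched in full, so a large finite value may be the safer choice on memory-constrained crawls.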