[NUTCH-2576] HTTP protocol plugin based on okhttp
...OkHttp is an Apache 2.0-licensed HTTP library that supports HTTP/2. Julien Nioche's implementation in storm-crawler#443 shows that it should be straightforward to implement a Nutch protocol plug...
, 2018-06-21, 12:16
[NUTCH-2608] Reduce size of Nutch job file and package
...The Nutch 1.15 binary package and the Nutch job file will reach or even exceed 300 MB. A huge job file is not ideal because it has to be distributed across the Hadoop cluster. There are several reas...
, 2018-06-21, 09:56
[NUTCH-2607] ParserChecker should call ScoringFilters.passScoreAfterParsing() on all parses
...A ParseResult may contain multiple parses, e.g., the feed parser adds one for every item in the RSS/Atom feed. The tool ParseSegment calls the method ScoringFilters.passScoreAfterParsing() f...
, 2018-06-21, 09:28
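The point of NUTCH-2607 is that scoring must run for every entry in a ParseResult, not only for the first parse. A minimal sketch of that loop, assuming simplified hypothetical stand-ins (a `Map` from URL to parse text and a `ScoringHook` interface) rather than Nutch's actual `ParseResult` and `ScoringFilters` classes:

```java
import java.util.Map;

// Hypothetical stand-in for ScoringFilters.passScoreAfterParsing(); not Nutch's real signature.
interface ScoringHook {
    void passScoreAfterParsing(String url, String parseText);
}

final class ParseResultScoring {
    private ParseResultScoring() {}

    // Calls the hook once per parse in the result (e.g. the feed itself plus
    // one entry per RSS/Atom item) and returns how many parses were scored.
    static int scoreAllParses(Map<String, String> parseResult, ScoringHook hook) {
        int calls = 0;
        for (Map.Entry<String, String> entry : parseResult.entrySet()) {
            hook.passScoreAfterParsing(entry.getKey(), entry.getValue());
            calls++;
        }
        return calls;
    }
}
```

A feed with two items yields three parses, so the hook fires three times instead of once.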
[NUTCH-2606] MIME detection is wrong for plain-text documents sent as Content-Type "application/msword"
...Plain-text documents sent as Content-Type "application/msword" are parsed as Word documents. The MIME detection should be fixed so that these are correctly identified as plain-text ...
, 2018-06-20, 16:38
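One way to see the problem in NUTCH-2606 is to compare the declared type with the leading bytes of the content. The sketch below is a hypothetical heuristic, not Tika's detector and not the fix adopted in Nutch: it keeps "application/msword" only when the content starts with the OLE2 compound-document signature used by legacy .doc files, and otherwise falls back to "text/plain" if the first 512 bytes contain no NUL byte. It deliberately ignores other formats that servers sometimes label as msword, such as RTF.

```java
import java.util.Arrays;

final class DeclaredTypeCheck {
    private DeclaredTypeCheck() {}

    // OLE2 compound-document signature (legacy Word 97-2003 .doc files).
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, (byte) 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, (byte) 0x1A, (byte) 0xE1
    };

    // Returns the type to use for parsing, overriding a suspicious msword label.
    static String effectiveType(String declaredType, byte[] content) {
        if (!"application/msword".equals(declaredType)) {
            return declaredType;
        }
        if (startsWith(content, OLE2_MAGIC)) {
            return declaredType; // looks like a real .doc file
        }
        return looksLikeText(content) ? "text/plain" : declaredType;
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
            && Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }

    // Hypothetical text check: no NUL byte in the first 512 bytes.
    private static boolean looksLikeText(byte[] data) {
        int limit = Math.min(data.length, 512);
        for (int i = 0; i < limit; i++) {
            if (data[i] == 0) {
                return false;
            }
        }
        return true;
    }
}
```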
[NUTCH-2603] Bring back legacy pre-Tika parsers and use them as back up parsers
...There are cases where legacy parsers successfully parse documents on which Tika fails. I am attaching a list of examples of such documents. Nutch allows the use of more than one parser on a docum...
, 2018-06-20, 15:06
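The backup-parser idea in NUTCH-2603 amounts to trying parsers in order until one succeeds. A minimal sketch, assuming a hypothetical `DocParser` interface (not Nutch's `Parser` API) and leaving out how Nutch actually configures parser order:

```java
import java.util.List;
import java.util.Optional;

// Hypothetical minimal parser contract; not Nutch's Parser interface.
interface DocParser {
    String name();
    String parse(byte[] content) throws Exception;
}

final class FallbackParsing {
    private FallbackParsing() {}

    // Tries each parser in order and returns the first successful result,
    // so a legacy parser only runs when the preferred one fails.
    static Optional<String> parseWithFallback(List<DocParser> parsers, byte[] content) {
        for (DocParser parser : parsers) {
            try {
                // A null result makes Optional.of throw, which is treated as a failure too.
                return Optional.of(parser.parse(content));
            } catch (Exception e) {
                System.err.println(parser.name() + " failed: " + e.getMessage());
            }
        }
        return Optional.empty(); // every parser failed
    }
}
```

Putting Tika first and a legacy parser second keeps Tika as the default while still recovering documents it cannot handle.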
[NUTCH-2605] The Feed plugin causes a NumberFormatException
...The Feed plugin seems to have a major problem. Line 102 in FeedIndexingFilter.java generated a NumberFormatException (which caused the failure of the entire crawling process!) because i...
, 2018-06-20, 09:21
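The report is truncated before it says which value failed to parse, so the following is only a general sketch of defensive parsing, using a hypothetical helper `parseLongOrDefault`: a malformed or missing numeric field yields a fallback value instead of an exception that propagates out of the indexing filter.

```java
final class LenientNumbers {
    private LenientNumbers() {}

    // Returns the parsed value, or the fallback when the input is null,
    // empty, or not a valid long, instead of throwing NumberFormatException.
    static long parseLongOrDefault(String raw, long fallback) {
        if (raw == null) {
            return fallback;
        }
        try {
            return Long.parseLong(raw.trim());
        } catch (NumberFormatException e) {
            return fallback;
        }
    }
}
```

Whether a fallback, a skipped field, or a logged warning is the right behaviour depends on what the field means; the sketch only shows how to keep one bad value from aborting the whole job.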
[NUTCH-2604] The lines defining catch-all (*) parser in parse-plugins.xml are ignored
...The lines defining the catch-all plugin in parse-plugins.xml have no effect: they are ignored as long as there is at least one plugin claiming * in its plugin.xml file. In some...
, 2018-06-20, 06:32
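One plausible reading of the precedence NUTCH-2604 expects is: an explicit mapping for the MIME type first, then the catch-all (*) entry from parse-plugins.xml, and only then the plugins that claim * in their plugin.xml. The sketch below encodes that order with hypothetical plain collections; it is not Nutch's ParserFactory logic.

```java
import java.util.List;
import java.util.Map;

final class CatchAllResolution {
    private CatchAllResolution() {}

    // configured: MIME type -> parser ids from parse-plugins.xml (may include "*").
    // declaredCatchAll: parser ids whose plugin.xml claims "*".
    static List<String> resolve(String mimeType,
                                Map<String, List<String>> configured,
                                List<String> declaredCatchAll) {
        List<String> exact = configured.get(mimeType);
        if (exact != null && !exact.isEmpty()) {
            return exact;
        }
        // The configured catch-all wins over plugins that merely claim "*".
        List<String> configuredCatchAll = configured.get("*");
        if (configuredCatchAll != null && !configuredCatchAll.isEmpty()) {
            return configuredCatchAll;
        }
        return declaredCatchAll;
    }
}
```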
[NUTCH-2498] Docker files are outdated
...The Docker file for HBase is outdated. It uses Java 7, but Nutch requires Java 8. The Cassandra Docker file refers to meabed/debian-jdk, which is also based on Java 7....
, 2018-06-20, 06:19
[NUTCH-2594] Documentation for indexer plugins
...A NutchTutorial wiki page for indexer plugins, to document the structure proposed in GitHub Pull Request #218...
Roannel Fernández Hernánd...
, 2018-06-19, 14:46
[NUTCH-2578] Avoid lock by MimeUtil in constructor of protocol.Content
...The constructor of the class o.a.n.protocol.Content instantiates a new MimeUtil object. That's not cheap as it always creates a new Tika object and there is a lock on the job/jar file when c...
, 2018-06-18, 16:50
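A common way to avoid building an expensive helper in every constructor is to share one lazily created instance. The sketch below uses the initialization-on-demand holder idiom with hypothetical classes (`ExpensiveDetector` standing in for MimeUtil/Tika, `SimpleContent` for protocol.Content); it is not necessarily the change made for NUTCH-2578, and sharing is only safe if the real detector is thread-safe.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for an object that is costly to build (e.g. one wrapping Tika).
final class ExpensiveDetector {
    static final AtomicInteger INSTANCES = new AtomicInteger();

    ExpensiveDetector() {
        INSTANCES.incrementAndGet(); // counts constructions to show the sharing
    }

    String detect(String url) {
        return url.endsWith(".html") ? "text/html" : "application/octet-stream";
    }
}

final class SimpleContent {
    // Holder idiom: the JVM initializes DETECTOR lazily and exactly once,
    // using thread-safe class initialization, instead of once per object.
    private static final class Holder {
        static final ExpensiveDetector DETECTOR = new ExpensiveDetector();
    }

    private final String url;
    private final String contentType;

    SimpleContent(String url) {
        this.url = url;
        this.contentType = Holder.DETECTOR.detect(url);
    }

    String url() {
        return url;
    }

    String contentType() {
        return contentType;
    }
}
```

Creating thousands of SimpleContent objects still constructs the detector only once.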