Search / Big Data / DevOps
  • project

    • Nutch (49365)
    • ElasticSearch (216813)
    • Solr (174825)
    • Mahout (49737)
    • Lucene (26372)
    • ManifoldCF (22981)
    • Tika (15748)
    • PyLucene (2772)
    • Lucene.Net (2465)
    • Lucy (1407)

  • author

    • Markus Jelsma (2556)
    • Lewis John Mcgibbney (1784)
    • Andrzej Bialecki (1638)
    • Julien Nioche (1181)
    • Stefan Groschupf (819)
    • Sebastian Nagel (799)
    • Dennis Kubes (745)
    • Mattmann, Chris A (671)
    • Doug Cutting (667)
    • Doğacan Güney (448)
    • lewis john mcgibbney (410)
    • Jérôme Charron (398)
    • Sami Siren (397)
    • Tejas Patil (343)
    • Lewis John McGibbney (290)
    • ogjunk-nutch@... (269)
    • Piotr Kosiorowski (263)
    • Chris Mattmann (239)
    • Ken Krugler (238)
    • Ferdy Galema (229)
    • Gal Nitzan (225)
    • alxsss@... (220)
    • MilleBii (218)
    • Jack Tang (194)
    • Bai Shen (188)
    • Susam Pal (170)
    • kiran chitturi (167)
    • Otis Gospodnetic (166)
    • feng lu (165)
    • Byron Miller (160)
    • Alexander Aristov (159)
    • remi tassing (158)
    • Fuad Efendi (154)
    • Raghavendra Prabhu (146)
    • Talat Uyarer (145)
    • Jorge Luis Betancourt Gon... (130)
    • AJ Chen (117)
    • Michael Ji (114)
    • TDLN (112)
    • Sean Dean (111)
    • Howie Wang (110)
    • A Laxmi (105)
    • Richard Braman (103)
    • BELLINI ADAM (101)
    • BlackIce (100)
    • Marek Bachmann (99)
    • Stefan Neufeind (94)
    • Dawid Weiss (93)
    • reinhard schwab (93)
    • S.L (92)
    • Zaheed Haque (91)
    • kaveh minooie (90)
    • webdev1977 (88)
    • Arkadi.Kosmynin@... (87)
    • yoursoft@... (87)
    • Marko Bauhardt (85)
    • Joe Zhang (83)
    • Michael Wechner (83)
    • Briggs (82)
    • Vanderdray, Jacob (82)

  • type

    • mail # user (33599)
    • mail # dev (9759)
    • javadoc (2854)
    • issue (2561)
    • source code (900)
    • wiki (61)
    • web site (7)
  • date

    • last 7 days (25)
    • last 30 days (79)
    • last 90 days (322)
    • last 6 months (698)
    • last 9 months (20087)
Results 1 to 10 of 49365 (0.0s).
[NUTCH-2571] SegmentReader -list fails to read segment - Nutch - [issue]
...The -list command of SegmentReader fails to read data from segments: % bin/nutch readseg -list crawl/segments/20180409100315/ Exception in thread "main" java.io.IOException: wrong value class...
http://issues.apache.org/jira/browse/NUTCH-2571    Author: Sebastian Nagel, 2018-04-23, 12:14
  
[NUTCH-2375] Upgrade the code base from org.apache.hadoop.mapred to org.apache.hadoop.mapreduce - Nutch - [issue]
...Nutch is still using the org.apache.hadoop.mapred dependency, which has been deprecated. It needs to be updated to the org.apache.hadoop.mapreduce dependency....
http://issues.apache.org/jira/browse/NUTCH-2375    Author: Omkar Reddy, 2018-04-23, 11:56
  
[NUTCH-2572] HostDb: updatehostdb does not set values - Nutch - [issue]
...% bin/nutch readdb crawl/crawldb -stats -sort ... status 1 (db_unfetched): 3   nutch.apache.org : 3   status 2 (db_fetched): 2   nutch....
http://issues.apache.org/jira/browse/NUTCH-2572    Author: Sebastian Nagel, 2018-04-23, 11:56
  
[NUTCH-2570] Deduplication job fails to install deduplicated CrawlDb - Nutch - [issue]
...The DeduplicationJob ("nutch dedup") fails to install the deduplicated CrawlDb and leaves only the "old" crawldb (if "db.preserve.backup" is true): % tree crawldb crawldb ├── current │   └── par...
http://issues.apache.org/jira/browse/NUTCH-2570    Author: Sebastian Nagel, 2018-04-23, 11:26
  
[NUTCH-2544] Nutch 1.15 no longer compatible with AWS EMR and S3 - Nutch - [issue]
...Nutch 1.14 is working OK with AWS EMR and S3 storage, but NUTCH-2375 appears to have broken this. Generator partitioning fails with Error: java.lang.NullPointerException at org.apache.nutch.c...
http://issues.apache.org/jira/browse/NUTCH-2544    Author: Steven W, 2018-04-23, 11:26
  
[NUTCH-2526] NPE in scoring-opic when indexing document without CrawlDb datum - Nutch - [issue]
...I was trying to write a parse filter plugin whose job was to parse internal links as separate documents. What I did, basically, was break the page into multiple parseResults, each parseResul...
http://issues.apache.org/jira/browse/NUTCH-2526    Author: Yash Thenuan, 2018-04-23, 09:53
  
[NUTCH-2456] Allow to index pages/URLs not contained in CrawlDb - Nutch - [issue]
...If http.redirect.max is set to a positive value, the Fetcher will follow redirects, creating a new CrawlDatum. If the redirected URL is fetched and parsed, during indexing for it we have a sp...
http://issues.apache.org/jira/browse/NUTCH-2456    Author: Yossi Tamari, 2018-04-23, 08:29
  
[NUTCH-2569] ClassNotFoundException when running in (pseudo-)distributed mode - Nutch - [issue]
...The CrawlDb / updatedb job fails in pseudo-distributed mode with a ClassNotFoundException: 18/04/22 19:24:49 INFO mapreduce.Job: Task Id : attempt_1524395182329_0018_m_000000_0, Status : FAIL...
http://issues.apache.org/jira/browse/NUTCH-2569    Author: Sebastian Nagel, 2018-04-22, 19:49
  
[NUTCH-2517] mergesegs corrupts segment data - Nutch - [issue]
...The problem has probably occurred since commit https://github.com/apache/nutch/commit/54510e503f7da7301a59f5f0e5bf4509b37d35b4. How to reproduce: create container from apache/nutch image (latest) op...
http://issues.apache.org/jira/browse/NUTCH-2517    Author: Marco Ebbinghaus, 2018-04-22, 19:18
  
[NUTCH-1228] Change mapred.task.timeout to mapreduce.task.timeout in fetcher - Nutch - [issue]
http://issues.apache.org/jira/browse/NUTCH-1228    Author: Markus Jelsma, 2018-04-21, 17:14
  
Apache Lucene, Apache Solr and all other Apache Software Foundation projects and their respective logos are trademarks of the Apache Software Foundation.
Elasticsearch, Kibana, Logstash, and Beats are trademarks of Elasticsearch BV, registered in the U.S. and in other countries. This site and Sematext Group are in no way affiliated with Elasticsearch BV.
Service operated by Sematext