Search / Big Data / DevOps
Results 1 to 10 of 21.
how do fetch wait times work? - Nutch - [mail # user]
...When I run bin/crawl once and it generates a segment list with a bunch of fetch dates in the future, does nutch proactively run those fetches on those future dates, or do I have to do somethin...
   Author: Fred Zimmerman , 2018-04-09, 19:14
  
OutOfMemoryError when indexing into Solr - Nutch - [mail # user]
...I'm having the exact same problem. I am trying to isolate whether it is a Solr problem or a Nutch+Solr problem.  On Wed, Oct 26, 2011 at 11:54 PM,  wrote:  > Hi, > > ...
   Author: Fred Zimmerman , 2011-10-27, 12:20
  
1) success 2) how to tell Nutch "index everything" - Nutch - [mail # user]
...1) I resolved the issues with solrindex. It turned out to be a matter of adding all the nutch schema-specific fields to solr's schema.xml.  there was one gotcha which is that the latest...
   Author: Fred Zimmerman , 2011-10-26, 14:37
  
solrindexer parameters -- input path does not exist: crawl_fetch, parse_data, etc. - Nutch - [mail # user]
...will do.  Of course I have already googled these terms without much luck.  Fred  On Wed, Oct 26, 2011 at 9:34 AM, lewis john mcgibbney wrote:  > Hi Fred, > ...
   Author: Fred Zimmerman , 2011-10-26, 13:38
  
solrindexer parameters -- input path does not exist: crawl_fetch, parse_data, etc. - Nutch - [mail # user]
...I added just the  field ... I have already modified solr's schema.xml to accommodate some other data types.  Now when starting solr ...  INFO: SolrUpdateServlet.init() done 20...
   Author: Fred Zimmerman , 2011-10-26, 13:31
solrindexer parameters -- input path does not exist: crawl_fetch, parse_data, etc. - Nutch - [mail # user]
...that's it.  org.apache.solr.common.SolrException: ERROR:unknown field 'content'  *ERROR:unknown field 'content'*  request: http://search.zimzaz.com:8983/solr/update?wt=javabin...
   Author: Fred Zimmerman , 2011-10-26, 13:07
solrindexer parameters -- input path does not exist: crawl_fetch, parse_data, etc. - Nutch - [mail # user]
...OK, I've fixed the problem with the parameters giving incorrect paths to the files. Now I get this:  $ bin/nutch solrindex http://search.zimzaz.com:8983/solr crawl/crawldb crawl/linkdb ...
   Author: Fred Zimmerman , 2011-10-26, 12:59
solrindexer parameters -- input path does not exist: crawl_fetch, parse_data, etc. - Nutch - [mail # user]
...I'm still having trouble with this in 1.3. looks as if there's something dumb with syntax or file structure but can't get it.  $ bin/nutch solrindex http://search.zimzaz.com:8983/solr c...
   Author: Fred Zimmerman , 2011-10-25, 23:27
solrindexer parameters -- input path does not exist: crawl_fetch, parse_data, etc. - Nutch - [mail # user]
...Hi -- I am having trouble with the solrindexer parameters -- I see that Lewis had similar problems a few months ago. Any idea what I am doing wrong?  bitnami@ip-10-202-202-68:~/nutch-1....
   Author: Fred Zimmerman , 2011-10-09, 00:22
advice, config files for crawling private wikipedia mirror - Nutch - [mail # user]
...so let me make sure I understand.  what this guy did is that he made an XML file from his local backup of wikipedia but he didn't crawl it? maybe I don't need to crawl it, either, since ...
   Author: Fred Zimmerman , 2011-10-10, 14:41
  
advice, config files for crawling private wikipedia mirror - Nutch - [mail # user]
...OK, that sounds good.  Tell me about the indexing.  I came across an article where someone had indexed about 10% of a wikipedia clone  http://h3x.no/2011/05/10/guide-solr-perf...
   Author: Fred Zimmerman , 2011-10-10, 14:28
advice, config files for crawling private wikipedia mirror - Nutch - [mail # user]
...HI,  I am looking for advice on how to configure Nutch (and Solr) to crawl a private Wikipedia mirror.     - It is my mirror on an intranet so I do not need to be polite to my...
   Author: Fred Zimmerman , 2011-10-08, 17:29
when and how to delete old crawls? - Nutch - [mail # user]
...I mean the directories like this:  crawl-20110920160208 crawl-20110920211805 etc ...     On Wed, Oct 5, 2011 at 11:08, Markus Jelsma wrote:  > "crawls" or segment dire...
   Author: Fred Zimmerman , 2011-10-05, 15:14
  
when and how to delete old crawls? - Nutch - [mail # user]
...hi,  I have a bunch of test crawls that I have carried out in the past sitting around.  most of them are indexed by solr configured per nutch-config to run again in 30 days.  ...
   Author: Fred Zimmerman , 2011-10-05, 14:57
Interpreting Nutch results - Nutch - [mail # user]
...thanks for the tip about filtering  ----------------------------------------------------- Subscribe to the Nimble Books Mailing List  http://eepurl.com/czS- for monthly updates ...
   Author: Fred Zimmerman , 2011-09-30, 15:29
  
Interpreting Nutch results - Nutch - [mail # user]
...What does this mean? Why is db_unfetched so high?  I want to know how I can be confident that the crawler has fetched all the pages in the target site.  CrawlDb statistics start: c...
   Author: Fred Zimmerman , 2011-09-30, 13:23
Understanding Nutch workflow - Nutch - [mail # user]
...this is helpful -- can someone also explain whether there is a mechanism to extract full text of pages from where they are stored in mapreduce?   On Tue, Sep 27, 2011 at 11:24, Bai Shen ...
   Author: Fred Zimmerman , 2011-09-27, 15:42
  
Can't retrieve Tika Parser for mime-type - Nutch - [mail # user]
...Basic question:  I have Nutch crawling and sending documents to Solr for indexing.  Now when I get the Solr answer set, I want to go get all the documents at once and append them i...
   Author: Fred Zimmerman , 2011-09-26, 17:25
  
not writing anything to crawldb - Nutch - [mail # user]
...Ha! but out of curiosity, why is the average score so low out of 1.0? that seems pretty darned weak, whatever it is.   TOTAL urls:     1241 retry 0:        ...
   Author: Fred Zimmerman , 2011-09-22, 18:44
  
not writing anything to crawldb - Nutch - [mail # user]
...ok. i found that the crawl is writing crawldb to my home directory instead of crawldb, presumably because I ran from the wrong place, and presumably I will be able to index this in solr from...
   Author: Fred Zimmerman , 2011-09-22, 18:20
not writing anything to crawldb - Nutch - [mail # user]
...I had to delete the contents of the  crawldb folder to recover from a failed fetch (was this the best response? i doubt it).  now I have a fetch running, successfully, but i don't ...
   Author: Fred Zimmerman , 2011-09-22, 18:00
Service operated by Sematext