Results 1 to 10 of 21 (0.0s).
how do fetch wait times work? - Nutch - [mail # user]
...When I run bin/crawl once and it generates a segment list with a bunch of fetch dates in the future, does nutch proactively run those fetches on those future dates, or do I have to do somethin...

OutOfMemoryError when indexing into Solr - Nutch - [mail # user]
...I'm having the exact same problem. I am trying to isolate whether it is a Solr problem or a Nutch+Solr problem. On Wed, Oct 26, 2011 at 11:54 PM, wrote: > Hi, > > ...

1) success 2) how to tell Nutch "index everything" - Nutch - [mail # user]
...1) I resolved the issues with solrindex. It turned out to be a matter of adding all the nutch schema-specific fields to solr's schema.xml. there was one gotcha which is that the latest...

solrindexer parameters -- input path does not exist: crawl_fetch, parse_data, etc. - Nutch - [mail # user]
...will do. Of course I have already googled these terms without much luck. Fred On Wed, Oct 26, 2011 at 9:34 AM, lewis john mcgibbney wrote: > Hi Fred, > ...

solrindexer parameters -- input path does not exist: crawl_fetch, parse_data, etc. - Nutch - [mail # user]
...I added just the field ... I have already modified solr's schema.xml to accommodate some other data types. Now when starting solr ... INFO: SolrUpdateServlet.init() done 20...

solrindexer parameters -- input path does not exist: crawl_fetch, parse_data, etc. - Nutch - [mail # user]
...that's it. org.apache.solr.common.SolrException: ERROR:unknown field 'content' *ERROR:unknown field 'content'* request: http://search.zimzaz.com:8983/solr/update?wt=javabin...

solrindexer parameters -- input path does not exist: crawl_fetch, parse_data, etc. - Nutch - [mail # user]
...OK, I've fixed the problem with the parameters giving incorrect paths to the files. Now I get this: $ bin/nutch solrindex http://search.zimzaz.com:8983/solr crawl/crawldb crawl/linkdb ...

solrindexer parameters -- input path does not exist: crawl_fetch, parse_data, etc. - Nutch - [mail # user]
...I'm still having trouble with this in 1.3. looks as if there's something dumb with syntax or file structure but can't get it. $ bin/nutch solrindex http://search.zimzaz.com:8983/solr c...

solrindexer parameters -- input path does not exist: crawl_fetch, parse_data, etc. - Nutch - [mail # user]
...Hi -- I am having trouble with the solrindexer parameters -- I see that Lewis had similar problems a few months ago. Any idea what I am doing wrong? bitnami@ip-10-202-202-68:~/nutch-1....

advice, config files for crawling private wikipedia mirror - Nutch - [mail # user]
...so let me make sure I understand. what this guy did is that he made an XML file from his local backup of wikipedia but he didn't crawl it? maybe I don't need to crawl it, either, since ...

when and how to delete old crawls? - Nutch - [mail # user]
...I mean the directories like this: crawl-20110920160208 crawl-20110920211805 etc ... On Wed, Oct 5, 2011 at 11:08, Markus Jelsma wrote: > "crawls" or segment dire...

when and how to delete old crawls? - Nutch - [mail # user]
...hi, I have a bunch of test crawls that I have carried out in the past sitting around. most of them are indexed by solr configured per nutch-config to run again in 30 days. ...

Interpreting Nutch results - Nutch - [mail # user]
...thanks for the tip about filtering ----------------------------------------------------- Subscribe to the Nimble Books Mailing List http://eepurl.com/czS- for monthly updates ...

Interpreting Nutch results - Nutch - [mail # user]
...What does this mean? Why is db_unfetched so high? I want to know how I can be confident that the crawler has fetched all the pages in the target site. CrawlDb statistics start: c...

Understanding Nutch workflow - Nutch - [mail # user]
...this is helpful -- can someone also explain whether there is mechanism to extract full text of pages from where they are stored in mapreduce? On Tue, Sep 27, 2011 at 11:24, Bai Shen ...

Can't retrieve Tika Parser for mime-type - Nutch - [mail # user]
...Basic question: I have Nutch crawling and sending documents to Solr for indexing. Now when I get the Solr answer set, I want to go get all the documents at once and append them i...

not writing anything to crawldb - Nutch - [mail # user]
...Ha! but out of curiosity, why is the average score so low out of 1.0? that seems pretty darned weak, whatever it is. TOTAL urls: 1241 retry 0: ...

not writing anything to crawldb - Nutch - [mail # user]
...ok. i found that the crawl is writing crawldb to my home directory instead of crawldb, presumably because I ran from the wrong place, and presumably I will be able to index this in solr from...

not writing anything to crawldb - Nutch - [mail # user]
...I had to delete the contents of the crawldb folder to recover from a failed fetch (was this the best response? i doubt it). now I have a fetch running, successfully, but i don't ...