When I run bin/crawl once and it generates a segment list with a bunch of fetch dates in the future, does nutch proactively run those fetches on those future dates, or do I have to do something to make that happen?
Nutch does nothing "proactively"; the crawl jobs must be launched explicitly. But you need no special command:
- let's say you didn't change the defaults, so db.fetch.interval.default == 30 days
- if you launch bin/crawl one month later, all pages are refetched and optionally reindexed (404s removed)
- just to clarify: new segments will be created, and old segments can be removed, unless you still need them to recover, e.g. if the index is lost
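To make the scheduling idea concrete, here is a rough Python sketch of the "is this page due yet?" check. It is not Nutch code: the dict standing in for the crawl db and the helper names next_fetch_time and due_for_fetch are made up for illustration, and the only value taken from above is the 30-day default interval.

```python
from datetime import datetime, timedelta

# Assumption from the reply above: db.fetch.interval.default == 30 days.
FETCH_INTERVAL = timedelta(days=30)


def next_fetch_time(last_fetched):
    """Hypothetical helper: the future fetch date recorded after a fetch."""
    return last_fetched + FETCH_INTERVAL


def due_for_fetch(crawldb, now):
    """Hypothetical helper: URLs whose recorded fetch date has passed."""
    return sorted(url for url, fetch_time in crawldb.items() if fetch_time <= now)


# Toy stand-in for the crawl db after a first run on 2018-04-09.
first_run = datetime(2018, 4, 9)
crawldb = {
    "http://example.com/a": next_fetch_time(first_run),
    "http://example.com/b": next_fetch_time(first_run),
}

# Launching again one day later: nothing is due, so nothing would be refetched.
print(due_for_fetch(crawldb, first_run + timedelta(days=1)))

# Launching again 31 days later: both pages are due.
print(due_for_fetch(crawldb, first_run + timedelta(days=31)))
```

The point is only that the future dates are stored, not acted on: nothing happens until someone (or a scheduler such as cron) launches the crawl again.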
On 04/09/2018 09:13 PM, Fred Zimmerman wrote:
> When I run bin/crawl once and it generates a segment list with a bunch of
> fetch dates in the future, does nutch proactively run those fetches on
> those future dates, or do I have to do something to make that happen?