I struggled with this as well. Eventually I moved to Elasticsearch, which is much easier.

What I did manage to find out is that in newer versions of Solr (when running in SolrCloud mode) you need to go through ZooKeeper to update the conf files; see https://stackoverflow.com/a/43351358.
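If it helps, on Solr 6.x the upload step looks something like the following. This is only a sketch: it assumes SolrCloud with ZooKeeper on localhost:2181, and the configset name "nutch" and conf path are placeholders of mine:

bin/solr zk upconfig -z localhost:2181 -n nutch -d /path/to/conf    # upload the conf directory to ZooKeeper
curl 'http://localhost:8983/solr/admin/collections?action=RELOAD&name=nutch'    # reload so the collection picks up the change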

-----Original Message-----
From: Pau Paches [mailto:[EMAIL PROTECTED]]
Sent: 11 July 2017 13:29
To: [EMAIL PROTECTED]
Subject: Re: nutch 1.x tutorial with solr 6.6.0

Hi,
I just crawl a single URL, so no whole-web crawling.
So I follow option 2, running the fetching and invertlinks steps successfully. This is just Nutch 1.x. Then I do the indexing into Apache Solr, so I go to the section "Setup Solr for search".
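For context, the option-2 sequence I ran was roughly the following (the seed URL and the segment variable are placeholders of mine, not from the tutorial):

mkdir -p urls
echo 'http://example.com/' > urls/seed.txt          # my single seed URL (placeholder)
bin/nutch inject crawl/crawldb urls
bin/nutch generate crawl/crawldb crawl/segments
s1=$(ls -d crawl/segments/2* | tail -1)             # pick the newest segment
bin/nutch fetch "$s1"
bin/nutch parse "$s1"
bin/nutch updatedb crawl/crawldb "$s1"
bin/nutch invertlinks crawl/linkdb -dir crawl/segments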
The first thing that does not work:
cd ${APACHE_SOLR_HOME}/example
java -jar start.jar
There is no start.jar at the specified location, but that is no problem: you can start Solr 6.6.0 with bin/solr start instead.
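For what it's worth, on 6.6.0 I believe the equivalent startup plus core creation is the following (the core name "nutch" and the basic_configs configset are my own choices, not from the tutorial):

bin/solr start                                      # starts Solr on port 8983 by default
bin/solr create -c nutch -d basic_configs           # create a core from the built-in basic configset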
Then the tutorial says:
Backup the original Solr example schema.xml:
mv ${APACHE_SOLR_HOME}/example/solr/collection1/conf/schema.xml
${APACHE_SOLR_HOME}/example/solr/collection1/conf/schema.xml.org

But in current Solr, 6.6.0, there is no schema.xml file anywhere in the whole distribution. What should I do here?
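From what I can tell, Solr 6 replaced schema.xml with a managed-schema file. One workaround I have seen (not from the tutorial, so treat it as a sketch; the paths assume the "nutch" core created above) is to drop Nutch's schema into the core's conf directory and let Solr migrate it on restart:

cp ${NUTCH_RUNTIME_HOME}/conf/schema.xml ${APACHE_SOLR_HOME}/server/solr/nutch/conf/schema.xml
rm ${APACHE_SOLR_HOME}/server/solr/nutch/conf/managed-schema    # Solr regenerates this from schema.xml at startup
bin/solr restart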
If I go directly to running the Solr index command from ${NUTCH_RUNTIME_HOME}:
bin/nutch solrindex http://127.0.0.1:8983/solr/ crawl/crawldb -linkdb crawl/linkdb crawl/segments/
(which may not make sense, since I have skipped some steps), it crashes:
The input path at segments is not a segment... skipping
Indexer: java.lang.RuntimeException: Missing elastic.cluster and elastic.host. At least one of them should be set in nutch-site.xml ElasticIndexWriter
elastic.cluster : elastic prefix cluster
elastic.host : hostname
elastic.port : port

Clearly some configuration is missing in nutch-site.xml: apart from setting http.agent.name (which the tutorial does mention), other fields need to be set up. The segments message above is also troubling.
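My guess is that the ElasticIndexWriter complaint means the indexer-elastic plugin is active instead of indexer-solr. If so, the fix would be along these lines in ${NUTCH_RUNTIME_HOME}/conf/nutch-site.xml (a sketch only; the exact plugin list varies by Nutch version, and "nutch" is my core name):

<property>
  <name>plugin.includes</name>
  <value>protocol-http|urlfilter-regex|parse-(html|tika)|index-(basic|anchor)|indexer-solr|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
</property>
<property>
  <name>solr.server.url</name>
  <value>http://127.0.0.1:8983/solr/nutch</value>
</property>

The "not a segment" message, as far as I understand it, means the indexer expects individual segment directories (or -dir crawl/segments) rather than a bare crawl/segments/ argument.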

If the steps above had worked, should we run
bin/nutch solrindex http://127.0.0.1:8983/solr/ crawl/crawldb -linkdb crawl/linkdb crawl/segments/
(this is the last step in "Integrate Solr with Nutch") and then

bin/nutch index http://localhost:8983/solr crawl/crawldb/ -linkdb crawl/linkdb/ crawl/segments/20131108063838/ -filter -normalize -deleteGone
(this is one of the steps of "Using Individual Commands for Whole-Web Crawling", which in fact is also the section to read if you are only crawling a single URL)?
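As far as I can tell, in recent Nutch 1.x releases solrindex is just a deprecated alias for index, so one command should cover both. Something like the following (the -D property and the "nutch" core name are my assumptions, and -dir avoids hard-coding a segment timestamp):

bin/nutch index -D solr.server.url=http://localhost:8983/solr/nutch crawl/crawldb/ -linkdb crawl/linkdb/ -dir crawl/segments/ -filter -normalize -deleteGone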

This is what I found by following the tutorial at https://wiki.apache.org/nutch/NutchTutorial
