Hi Pau,

I have not used the solrindex command, but judging from the "input path" error message, it sounds like it wants an actual segment directory under segments/, not the segments/ parent directory itself.

The nutch crawl script uses the following commands:
* inject
* generate
* fetch
* parse
* updatedb
* invertlinks
* dedup
* index
* clean
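Strung together, those steps look roughly like the sketch below. This is a simplified outline, not the exact crawl script: the crawl directory name, seed dir, Solr core name, and -topN value are placeholders from my setup, and the generate/fetch/parse/updatedb cycle would normally be repeated per round.

```shell
#!/bin/sh
# Sketch of one crawl round with Nutch 1.x (paths/URLs are placeholders).
CRAWL=my_crawl_name
SOLR_URL=http://localhost:8983/solr/name_of_my_core

bin/nutch inject "$CRAWL/crawldb" urls/                 # seed the crawldb
bin/nutch generate "$CRAWL/crawldb" "$CRAWL/segments" -topN 1000
SEGMENT=$(ls -d "$CRAWL"/segments/* | sort | tail -1)   # newest segment
bin/nutch fetch "$SEGMENT"
bin/nutch parse "$SEGMENT"
bin/nutch updatedb "$CRAWL/crawldb" "$SEGMENT"
bin/nutch invertlinks "$CRAWL/linkdb" "$SEGMENT"
bin/nutch dedup "$CRAWL/crawldb"
bin/nutch index -Dsolr.server.url="$SOLR_URL" \
  "$CRAWL/crawldb" -linkdb "$CRAWL/linkdb" "$SEGMENT"
bin/nutch clean -Dsolr.server.url="$SOLR_URL" "$CRAWL/crawldb"
```

Note that each step after generate operates on a concrete segment directory, which is why passing the bare segments/ parent to the indexer fails.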

E.g., this is the nutch index command in my environment:
bin/nutch index -Dsolr.server.url=http://localhost:8983/solr/name_of_my_core my_crawl_name/crawldb -linkdb my_crawl_name/linkdb my_crawl_name/segments/20170710131518
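Since the segment name is a timestamp, you can pick the newest one automatically rather than typing it out. A minimal sketch (crawl dir and core name are placeholders from my setup):

```shell
# Segments are named by timestamp, so a lexical sort finds the newest one.
SEGMENT=$(ls -d my_crawl_name/segments/* | sort | tail -1)

bin/nutch index -Dsolr.server.url=http://localhost:8983/solr/name_of_my_core \
  my_crawl_name/crawldb -linkdb my_crawl_name/linkdb "$SEGMENT"
```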


-----Original Message-----
From: Pau Paches [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, July 11, 2017 2:50 PM
Subject: [EXTERNAL] Re: nutch 1.x tutorial with solr 6.6.0

Hi Rashmi,
I have followed your suggestions.
Now I'm seeing a different error.
bin/nutch solrindex crawl/crawld -linkdb crawl/linkdb crawl/segments
The input path at segments is not a segment... skipping
Indexer: starting at 2017-07-11 20:45:56
Indexer: deleting gone documents: false
Indexer: URL filtering: false
Indexer: URL normalizing: false
Active IndexWriters :
solr.server.url : URL of the SOLR instance
solr.zookeeper.hosts : URL of the Zookeeper quorum
solr.commit.size : buffer size when sending to SOLR (default 1000)
solr.mapping.file : name of the mapping file for fields (default
solr.auth : use authentication (default false)
solr.auth.username : username for authentication
solr.auth.password : password for authentication
Indexer: java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:865)
at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:147)
at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:230)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:239)

I still see the disturbing warning:
The input path at segments is not a segment... skipping.

And it crashes.
If it had not crashed, the tutorial would have me execute
bin/nutch index http://localhost:8983/solr crawl/crawldb/ -linkdb crawl/linkdb/ crawl/segments/20131108063838/ -filter -normalize -deleteGone
which seems redundant with the solrindex command.

I think this is the way to go, but still something is missing.


On 7/11/17, Srinivasa, Rashmi <[EMAIL PROTECTED]> wrote: