solrindexer parameters -- input path does not exist: crawl_fetch, parse_data, etc.
Fred Zimmerman 2011-10-09, 00:22

Hi -- I am having trouble with the solrindexer parameters. I see that Lewis had similar problems a few months ago. Any idea what I am doing wrong?

bitnami@ip-10-202-202-68:~/nutch-1.3/nutch-1.3/runtime/local$ bin/nutch solrindex http://zimzazsearch3-1.bitnamiapp.com:8983/solr/ crawl/crawldb crawl/linkdb crawl/segments/*
SolrIndexer: starting at 2011-10-09 00:13:24
org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20110922143907/crawl_fetch
Input path does not exist: file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20110922143907/crawl_parse
Input path does not exist: file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20110922143907/parse_data
Input path does not exist: file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20110922143907/parse_text
Input path does not exist: file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20110922144329/crawl_fetch
Input path does not exist: file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20110922144329/crawl_parse
Input path does not exist: file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20110922144329/parse_data
Input path does not exist: file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20110922144329/parse_text
Input path does not exist: file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20111008015309/crawl_parse
Input path does not exist: file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20111008015309/parse_data
Input path does not exist: file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20111008015309/parse_text
Input path does not exist: file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/nohup.out/crawl_fetch
Input path does not exist: file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/nohup.out/crawl_parse
Input path does not exist: file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/nohup.out/parse_data
Input path does not exist: file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/nohup.out/parse_text
-
Re: solrindexer parameters -- input path does not exist: crawl_fetch, parse_data, etc.
lewis john mcgibbney 2011-10-09, 10:15

Hi Fred,

How many individual directories do you have under /runtime/local/crawl/segments/?

Another thing that raises alarms is the nohup.out entries! Are these intentional? Interestingly, the missing segment data is not the same for all of these directories: for 20111008015309 only crawl_parse, parse_data and parse_text are reported missing, while the others are also missing crawl_fetch.

Does your log output indicate any discrepancies between the various command transitions?

Lewis
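One way to answer Lewis's question is to walk the segments directory and list anything that is not a complete segment. The following is a minimal sketch, not a Nutch tool: `check_segments` is a hypothetical helper, and the four subdirectory names are simply the ones reported in the error output above.

```shell
#!/bin/sh
# Hypothetical helper (not part of Nutch): report entries under a segments
# directory that the indexer would trip over.
# The four subdirectory names are taken from the error output in this thread.
check_segments() {
    segments_dir=$1
    for entry in "$segments_dir"/*; do
        # If the directory is empty the glob stays unexpanded; skip it.
        [ -e "$entry" ] || continue
        # Stray files (for example a nohup.out) are not segments at all.
        if [ ! -d "$entry" ]; then
            echo "not a directory: $entry"
            continue
        fi
        # Report each expected subdirectory that is absent.
        for part in crawl_fetch crawl_parse parse_data parse_text; do
            [ -d "$entry/$part" ] || echo "missing: $entry/$part"
        done
    done
}
```

Calling `check_segments crawl/segments` from runtime/local prints one line per problem and nothing for complete segments. Because the shell expands `crawl/segments/*` to every entry in that directory, a stray file there is passed to the indexer as if it were a segment.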
-
Re: solrindexer parameters -- input path does not exist: crawl_fetch, parse_data, etc.
Bai Shen 2011-10-25, 16:41

I'm having a similar issue. I'm using 1.4 and getting these errors with linkdb. The segments seem fine.

2011-10-25 10:10:20,060 INFO solr.SolrIndexer - SolrIndexer: starting at 2011-10-25 10:10:20
2011-10-25 10:10:20,110 INFO indexer.IndexerMapReduce - IndexerMapReduce: crawldb: crawl/crawldb
2011-10-25 10:10:20,110 INFO indexer.IndexerMapReduce - IndexerMapReduces: adding segment: crawl/linkdb
2011-10-25 10:10:20,136 INFO indexer.IndexerMapReduce - IndexerMapReduces: adding segment: crawl/segments/20111025095216
2011-10-25 10:10:20,138 INFO indexer.IndexerMapReduce - IndexerMapReduces: adding segment: crawl/segments/20111025100004
2011-10-25 10:10:20,207 ERROR solr.SolrIndexer - org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/opt/nutch-1.4/runtime/local/crawl/linkdb/crawl_fetch
Input path does not exist: file:/opt/nutch-1.4/runtime/local/crawl/linkdb/crawl_parse
Input path does not exist: file:/opt/nutch-1.4/runtime/local/crawl/linkdb/parse_data
Input path does not exist: file:/opt/nutch-1.4/runtime/local/crawl/linkdb/parse_text

Did something change with 1.4?
-
Re: solrindexer parameters -- input path does not exist: crawl_fetch, parse_data, etc.
Markus Jelsma 2011-10-25, 16:49

From the changelog:
http://svn.apache.org/viewvc/nutch/trunk/CHANGES.txt?view=markup

* NUTCH-1054 LinkDB optional during indexing (jnioche)

With your command, the given linkdb is interpreted as a segment.

https://issues.apache.org/jira/browse/NUTCH-1054

This is the new command:

Usage: SolrIndexer <solr url> <crawldb> [-linkdb <linkdb>] (<segment> ... | -dir <segments>) [-noCommit]

Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350
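To make the difference between the two argument layouts concrete, here is a small sketch that prints (but does not run) the invocation for each version. `solrindex_cmd` is a hypothetical helper, not something shipped with Nutch. The 1.3 form mirrors the positional command used earlier in this thread, and the 1.4 form follows the usage line quoted above; the Solr URL and crawl directory are placeholders.

```shell
#!/bin/sh
# Illustrative sketch only: solrindex_cmd is a hypothetical helper.
# It prints the solrindex invocation for a given Nutch version.
# Arguments: <nutch version> <solr url> <crawl dir>
solrindex_cmd() {
    version=$1
    solr_url=$2
    crawl_dir=$3
    case "$version" in
        1.3)
            # 1.3: the linkdb is a plain positional argument after the crawldb.
            printf '%s\n' "bin/nutch solrindex $solr_url $crawl_dir/crawldb $crawl_dir/linkdb $crawl_dir/segments/*"
            ;;
        1.4)
            # 1.4 (NUTCH-1054): the linkdb is optional and must follow -linkdb;
            # without the flag it would be treated as a segment.
            printf '%s\n' "bin/nutch solrindex $solr_url $crawl_dir/crawldb -linkdb $crawl_dir/linkdb -dir $crawl_dir/segments"
            ;;
        *)
            echo "unsupported version: $version" >&2
            return 1
            ;;
    esac
}
```

For example, `solrindex_cmd 1.4 http://localhost:8983/solr crawl` prints the flag-based form. Mixing the two layouts is what produces the linkdb errors in this thread: a positional linkdb on 1.4 is read as a segment, and `-linkdb` on 1.3 is read as a path.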
-
Re: solrindexer parameters -- input path does not exist: crawl_fetch, parse_data, etc.
Fred Zimmerman 2011-10-25, 23:27

I'm still having trouble with this in 1.3. It looks as if there's something dumb with the syntax or file structure, but I can't get it.

$ bin/nutch solrindex http://search.zimzaz.com:8983/solr crawl/crawldb -linkdb crawl/linkdb crawl/segments/*
SolrIndexer: starting at 2011-10-25 23:26:02
org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/home/bitnami/nutch-1.3/runtime/local/crawl/linkdb/crawl_fetch
Input path does not exist: file:/home/bitnami/nutch-1.3/runtime/local/crawl/linkdb/crawl_parse
Input path does not exist: file:/home/bitnami/nutch-1.3/runtime/local/crawl/linkdb/parse_data
Input path does not exist: file:/home/bitnami/nutch-1.3/runtime/local/crawl/linkdb/parse_text
Input path does not exist: file:/home/bitnami/nutch-1.3/runtime/local/-linkdb/current
-
Re: solrindexer parameters -- input path does not exist: crawl_fetch, parse_data, etc.
lewis john mcgibbney 2011-10-26, 06:34

Hi Fred,

Please ensure that the linkdb command was executed successfully. The output logs do not indicate this.
It looks like you've also got a '-' (minus) character in front of the relative linkdb directory.

HTH

Lewis
-
Re: solrindexer parameters -- input path does not exist: crawl_fetch, parse_data, etc.
Markus Jelsma 2011-10-26, 07:34

Besides, the -linkdb param is 1.4, not 1.3. That's what's wrong here. Bai explicitly mentioned 1.4.
-
Re: solrindexer parameters -- input path does not exist: crawl_fetch, parse_data, etc.
Fred Zimmerman 2011-10-26, 12:59

OK, I've fixed the problem with the parameters giving incorrect paths to the files. Now I get this:

$ bin/nutch solrindex http://search.zimzaz.com:8983/solr crawl/crawldb crawl/linkdb crawl/segments/*
SolrIndexer: starting at 2011-10-26 12:57:57
java.io.IOException: Job failed!
-
Re: solrindexer parameters -- input path does not exist: crawl_fetch, parse_data, etc.
Markus Jelsma 2011-10-26, 13:03

Check your hadoop.log and Solr log. If that happens there's usually a field mismatch when indexing.
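Since "Job failed!" on the console says nothing about the cause, a quick way to follow this advice is to pull the most recent ERROR entries out of the log. The sketch below assumes the log uses the `<timestamp> ERROR <logger> - <message>` layout seen in the log excerpts in this thread. `last_errors` is a hypothetical helper, and the log path in the usage note is an assumption about a local runtime layout.

```shell
#!/bin/sh
# Hypothetical helper: print the most recent ERROR lines from a log file.
# Usage: last_errors <logfile> [count]   (count defaults to 20)
last_errors() {
    # Match the log level surrounded by spaces so that the word ERROR inside
    # a message (e.g. "ERROR:unknown field") does not match on its own.
    grep ' ERROR ' "$1" | tail -n "${2:-20}"
}
```

A call such as `last_errors logs/hadoop.log 5` (path assumed) gives the timestamps of the latest failures. The stack trace lines that follow each ERROR entry do not carry the level, so they are best read by opening the log around those timestamps.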
-
Re: solrindexer parameters -- input path does not exist: crawl_fetch, parse_data, etc.
Fred Zimmerman 2011-10-26, 13:07

That's it.

org.apache.solr.common.SolrException: ERROR:unknown field 'content'

ERROR:unknown field 'content'

request: http://search.zimzaz.com:8983/solr/update?wt=javabin&version=2
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:436)
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:245)
    at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
    at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:49)
    at org.apache.nutch.indexer.solr.SolrWriter.close(SolrWriter.java:82)
    at org.apache.nutch.indexer.IndexerOutputFormat$1.close(IndexerOutputFormat.java:48)
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:474)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:411)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216)
2011-10-26 12:58:20,596 ERROR solr.SolrIndexer - java.io.IOException: Job failed!
Re: solrindexer parameters -- input path does not exist: crawl_fetch, parse_data, etc.
Markus Jelsma 2011-10-26, 13:16
Add the schema.xml from nutch/conf to your Solr core.

btw: be careful with your host and port in the mailing lists. If it's open....

On Wednesday 26 October 2011 15:07:56 Fred Zimmerman wrote:
> that's it.
>
> org.apache.solr.common.SolrException: ERROR:unknown field 'content'
>
> *ERROR:unknown field 'content'*
>
> request: http://url/solr/update?wt=javabin&version=2
>         at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:436)
>         at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:245)
>         at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
>         at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:49)
>         at org.apache.nutch.indexer.solr.SolrWriter.close(SolrWriter.java:82)
>         at org.apache.nutch.indexer.IndexerOutputFormat$1.close(IndexerOutputFormat.java:48)
>         at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:474)
>         at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:411)
>         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216)
> 2011-10-26 12:58:20,596 ERROR solr.SolrIndexer - java.io.IOException: Job failed!

--
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350
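As an illustration of the advice above, here is a minimal shell sketch of copying the Nutch schema into a Solr conf directory. The helper name install_nutch_schema and both directory arguments are assumptions for illustration, not something stated in the thread. It keeps a backup because overwriting schema.xml discards any local changes already made to the Solr schema.

```shell
# Hypothetical helper: install Nutch's schema.xml into a Solr conf directory.
#   $1 = Nutch conf directory (assumed to contain schema.xml)
#   $2 = Solr conf directory (assumed layout; adjust for your install)
install_nutch_schema() {
  nutch_conf=$1
  solr_conf=$2

  # Refuse to continue if the source schema is missing.
  if [ ! -f "$nutch_conf/schema.xml" ]; then
    echo "no schema.xml in $nutch_conf" >&2
    return 1
  fi

  # Keep the existing Solr schema so local modifications can be merged back.
  if [ -f "$solr_conf/schema.xml" ]; then
    cp "$solr_conf/schema.xml" "$solr_conf/schema.xml.bak" || return 1
  fi

  cp "$nutch_conf/schema.xml" "$solr_conf/schema.xml"
}
```

A call might look like: install_nutch_schema conf /path/to/solr/conf (the Solr path here is a placeholder). Solr has to be restarted, or the core reloaded, before the new schema takes effect.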
Re: solrindexer parameters -- input path does not exist: crawl_fetch, parse_data, etc.
Fred Zimmerman 2011-10-26, 13:31
I added just the <content> field ... I have already modified Solr's
schema.xml to accommodate some other data types.

Now when starting Solr ...

INFO: SolrUpdateServlet.init() done
2011-10-26 13:29:50.849:INFO::Started SocketConnector@0.0.0.0:8983
2011-10-26 13:30:23.129:WARN::/solr/admin/
java.lang.IllegalStateException: STREAM
        at org.mortbay.jetty.Response.getWriter(Response.java:616)
etc ...

On Wed, Oct 26, 2011 at 9:16 AM, Markus Jelsma <[EMAIL PROTECTED]> wrote:
> Add the schema.xml from nutch/conf to your Solr core.
>
> btw: be careful with your host and port in the mailing lists. If it's
> open....
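When only some fields are merged by hand into an existing Solr schema, it can help to see which field names the Nutch schema declares that the Solr schema still lacks. The following is a rough shell sketch under stated assumptions: the helper names schema_fields and missing_fields are hypothetical, and the pattern only catches <field ...> declarations whose name attribute sits on the same line as the tag. It is not a real XML parser and it ignores dynamicField and fieldType entries.

```shell
# Hypothetical helper: print the field names declared in a schema.xml,
# one per line, sorted and de-duplicated.
schema_fields() {
  grep -o '<field [^>]*name="[^"]*"' "$1" \
    | sed 's/.*name="\([^"]*\)".*/\1/' \
    | sort -u
}

# Hypothetical helper: print fields declared in the first schema ($1)
# that are absent from the second schema ($2).
missing_fields() {
  first_fields=$(mktemp)
  second_fields=$(mktemp)
  schema_fields "$1" > "$first_fields"
  schema_fields "$2" > "$second_fields"
  # comm -23 keeps lines that appear only in the first sorted input.
  comm -23 "$first_fields" "$second_fields"
  rm -f "$first_fields" "$second_fields"
}
```

For example: missing_fields conf/schema.xml /path/to/solr/conf/schema.xml (both paths are placeholders). Any name it prints is a candidate for the next "unknown field" error at indexing time.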
Re: solrindexer parameters -- input path does not exist: crawl_fetch, parse_data, etc.
lewis john mcgibbney 2011-10-26, 13:34
Hi Fred,

These are clearly Solr-related questions, which I would say are specific to
your schema. Maybe try searching the Solr archives for keywords, or else try
the Solr user list. I think you are much more likely to get a substantive
response there.

Thank you

On Wed, Oct 26, 2011 at 3:31 PM, Fred Zimmerman <[EMAIL PROTECTED]> wrote:
> I added just the <content> field ... I have already modified solr's
> schema.xml to accommodate some other data types.
>
> Now when starting solr ...
>
> INFO: SolrUpdateServlet.init() done
> 2011-10-26 13:29:50.849:INFO::Started SocketConnector@0.0.0.0:8983
> 2011-10-26 13:30:23.129:WARN::/solr/admin/
> java.lang.IllegalStateException: STREAM
>         at org.mortbay.jetty.Response.getWriter(Response.java:616)
> etc ...

--
*Lewis*
Re: solrindexer parameters -- input path does not exist: crawl_fetch, parse_data, etc.
Fred Zimmerman 2011-10-26, 13:38
Will do. Of course I have already googled these terms without much luck.

Fred

On Wed, Oct 26, 2011 at 9:34 AM, lewis john mcgibbney <[EMAIL PROTECTED]> wrote:
> Hi Fred,
>
> These are clearly Solr-related questions, which I would say are specific to
> your schema. Maybe try searching the Solr archives for keywords, or else try
> the Solr user list. I think you are much more likely to get a substantive
> response there.
>
> Thank you