Nutch, mail # user - HTTP ERROR 400


Stephan Kristyn 2012-05-09, 10:11
Stephan Kristyn 2012-05-09, 10:17
Lewis John Mcgibbney 2012-05-09, 10:32
Stephan Kristyn 2012-05-09, 11:21
Lewis John Mcgibbney 2012-05-09, 11:33
Stephan Kristyn 2012-05-09, 12:26
Stephan Kristyn 2012-05-09, 12:28
Stephan Kristyn 2012-05-09, 14:04
Tolga 2012-05-09, 19:28
Markus Jelsma 2012-05-09, 19:34
Re: HTTP ERROR 400
Stephan Kristyn 2012-05-09, 14:33
OK, now at the heading "Step-by-Step: Fetching" I get:

-bash-4.1$ bin/nutch generate crawldb crawldb/segments
Generator: starting at 2012-05-09 14:32:44
Generator: Selecting best-scoring urls due for fetch.
Generator: filtering: true
Generator: normalizing: true
Generator: jobtracker is 'local', generating exactly one partition.
Generator: org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/home/kristyns/apache-nutch-1.4-bin/runtime/local/crawldb/current
        at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:190)
        at org.apache.hadoop.mapred.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:44)
        at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:201)
        at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:810)
        at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:781)
        at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730)
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1249)
        at org.apache.nutch.crawl.Generator.generate(Generator.java:538)
        at org.apache.nutch.crawl.Generator.run(Generator.java:704)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.nutch.crawl.Generator.main(Generator.java:660)

Strange...
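
The exception above says that crawldb/current does not exist under runtime/local. In Nutch 1.x that directory is written by the inject step, so one possible cause (an assumption; nothing in this message confirms it) is that inject was never run against this crawldb path, or was run against a different one. A minimal sketch of a guard, where the helper names crawldb_ready and generate_if_ready are hypothetical and not part of Nutch:

```shell
#!/usr/bin/env bash

# Succeeds only if the crawldb directory contains the "current"
# subdirectory that the Generator reads as its input.
crawldb_ready() {
    [ -d "$1/current" ]
}

# Hypothetical wrapper: call generate only when the crawldb looks populated.
# Assumes it is run from runtime/local so that bin/nutch resolves.
generate_if_ready() {
    local crawldb="$1" segments="$2"
    if crawldb_ready "$crawldb"; then
        bin/nutch generate "$crawldb" "$segments"
    else
        echo "No $crawldb/current found; run 'bin/nutch inject $crawldb <url_dir>' first" >&2
        return 1
    fi
}
```

Called as generate_if_ready crawldb crawldb/segments, this prints the hint and returns 1 instead of failing inside Hadoop with the stack trace.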

On 09.05.2012 16:04, Stephan Kristyn wrote:
> Hi, it seems like I forgot to fetch the crawled URLs, as mentioned in
> the tutorial:
>
> http://wiki.apache.org/nutch/NutchTutorial
>
> I'll let you know if and how that worked out for me.
>
> On 09.05.2012 14:28, Stephan Kristyn wrote:
>> This is the query that the Solr interface generates when I enter "test" and hit the search button:
>> http://myDomain:8983/solr/select/?q=test&version=2.2&start=0&rows=10&indent=on
>>
>> Maybe this is a question better suited for the Solr ML?
>>
>> From: Lewis John Mcgibbney [mailto:[EMAIL PROTECTED]]
>> Sent: Wednesday, 9 May 2012 13:34
>> To: [EMAIL PROTECTED]
>> Subject: Re: HTTP ERROR 400
>>
>> Are you attempting to index to Solr, or is this simply when you start your Solr server?
>> On Wed, May 9, 2012 at 12:21 PM, Stephan Kristyn <[EMAIL PROTECTED]> wrote:
>> I copied over the schema and everything else in conf from Nutch.
>>
>> $cp apache-nutch-1.4-bin/runtime/local/conf/* apache-solr-3.6.0/example/solr/conf/
>>
>> On 09.05.2012 12:32, Lewis John Mcgibbney wrote:
>>
>> Which schema are you using with your Solr server?
>>
>> On Wed, May 9, 2012 at 11:17 AM, Stephan Kristyn <[EMAIL PROTECTED]> wrote:
>>
>> Also, entering
>>
>> java -jar post.jar *.xml on RHEL6 I get a
>>
>> INFO: [] webapp=/solr path=/update params={} status=400 QTime=42
>> SimplePostTool: FATAL: Solr returned an error #400 ERROR:
>> [doc=GB18030TEST] unknown field 'name'
>>
>> Thanks,
>> Stephan
>>
>> On 09.05.2012 12:11, Stephan Kristyn wrote:
>>
>> Hi,
>>
>> after installing Nutch and Solr I get a
>>
>>     HTTP ERROR 400
>>
>> Problem accessing /solr/select/. Reason:
>>
>>     undefined field text
>>
>> ------------------------------------------------------------------------
>> Powered by Jetty://
>>
>> Any ideas how to fix this?
>>
>> Thanks,
>> Stephan
>>
>> --
>>
>> stephan
>> kristyn
>> partner operations manager
>>
>> "The Internet? Is that thing still around?" - Homer Simpson
>>
>> [EMAIL PROTECTED]
>> direct +49 (0)89 231 97 207    mobile +49 (0) 162 28899 02
>>
>> yahoo! deutschland gmbh theresienhoehe 12, munich, 80339, germany
>> phone (408) 349 3300    fax (408) 349 3301
>>
>>
>>
>>
>

 

*stephan*
*kristyn*
partner operations manager
 
"The Internet? Is that thing still around?" - Homer Simpson
 
[EMAIL PROTECTED]
direct +49 (0)89 231 97 207    mobile +49 (0) 162 28899 02
 
yahoo! deutschland gmbh theresienhoehe 12, munich, 80339, germany
phone (408) 349 3300    fax (408) 349 3301
 
 

 
Stephan Kristyn 2012-05-09, 15:11
Lewis John Mcgibbney 2012-05-09, 16:05
Markus Jelsma 2012-05-09, 16:36
Stephan Kristyn 2012-05-09, 16:25
Tolga 2012-05-10, 06:10
Markus Jelsma 2012-05-10, 06:42
Michael Erickson 2012-05-10, 12:56
Lewis John Mcgibbney 2012-05-10, 13:35
Markus Jelsma 2012-05-10, 13:45
Ferdy Galema 2012-05-10, 14:03
Tolga 2012-05-10, 19:54
Markus Jelsma 2012-05-10, 20:38
Tolga 2012-05-11, 04:39
Markus Jelsma 2012-05-11, 06:40
Tolga 2012-05-15, 10:40
Markus Jelsma 2012-05-15, 11:05
Tolga 2012-05-15, 12:01
Tolga 2012-05-15, 12:49
Tolga 2012-05-17, 10:07
Jean-François Gingras 2012-05-19, 01:43
m2000hsf 2012-05-19, 06:43
Tolga 2012-05-10, 07:07
Stephan Kristyn 2012-05-10, 10:22
keesp 2012-05-24, 08:29