Results 11 to 20 of 133 (0.53s).
Re: Getting seed url - Nutch - [mail # user]
...Segments have a field called 'outlinks'; could this help? On Tuesday, June 12, 2012, Sebastian Nagel wrote: ...
Author: remi tassing, 2012-06-11, 22:45

Compilation of core classes - Nutch - [mail # user]
...Hello guys, this is probably a basic Java/Ant question. It's pretty easy to compile plugins. All you do is go to the plugin root directory and run 'ant' (e.g. nutch-1.4/src/plugin/prot...
Author: remi tassing, 2012-06-10, 09:35

Re: using less resources - Nutch - [mail # user]
...I was wondering how you know whether the page has changed without actually fetching it. On Wednesday, May 23, 2012, wrote: ...
Author: remi tassing, 2012-05-23, 12:58

Re: Crawl sites with hashtags in url - Nutch - [mail # user]
...Hi Roberto, if you're getting an invalid URI error, this might help you: http://lucene.472066.n3.nabble.com/Invalid-uri-td3742047.html Remi On Tue, May 1, 2012...
Author: remi tassing, 2012-05-02, 00:20

Re: solution for scanned pdf parsing - Nutch - [mail # user]
...It could also be due to the file size. //Remi On Tuesday, April 24, 2012, nutchsolruser wrote: with http://lucene.472066.n3.nabble.com/solution-for-scanned-pdf-parsing-tp393...
Author: remi tassing, 2012-04-24, 10:45

Re: Good workflow for a regular re-indexing job - Nutch - [mail # user]
...Have you read this? http://wiki.apache.org/nutch/NutchTutorial/ You can put all the commands in a shell script. Remi On Monday, April 23, 2012, Ian Piper wrote: ...
Author: remi tassing, 2012-04-23, 22:57

Re: exclude some urls from crawling - Nutch - [mail # user]
...To exclude index.php and index.html, just use:
-index\.html
-index\.php
You can do the same for video and live-score. To check whether a URL is blocked or not, try: echo...
Author: remi tassing, 2012-04-13, 13:46

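The first-match logic behind the exclusion rules in the previous post can be sketched in Java. This is a minimal sketch, not Nutch's RegexURLFilter source: it assumes the semantics described in the comments of Nutch's default regex-urlfilter.txt (each rule is a regex prefixed with + or -, the first matching rule decides, and a URL that matches no rule is ignored), and it assumes a pattern only has to match somewhere in the URL. The class and method names (UrlFilterSketch, parse, accept) and the example URLs are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class UrlFilterSketch {

    // Hypothetical rule type: include == true for '+' lines, false for '-' lines.
    record Rule(boolean include, Pattern pattern) {}

    // Parses regex-urlfilter.txt-style lines; blank lines and '#' comments are skipped.
    // Simplification (assumption): lines not starting with '+' or '-' are silently ignored.
    static List<Rule> parse(List<String> lines) {
        List<Rule> rules = new ArrayList<>();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            char sign = line.charAt(0);
            if (sign != '+' && sign != '-') {
                continue;
            }
            rules.add(new Rule(sign == '+', Pattern.compile(line.substring(1))));
        }
        return rules;
    }

    // The first rule whose pattern is found anywhere in the URL decides; no match means reject.
    static boolean accept(List<Rule> rules, String url) {
        for (Rule rule : rules) {
            if (rule.pattern().matcher(url).find()) {
                return rule.include();
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<Rule> rules = parse(List.of(
                "# skip index pages, videos and live scores",
                "-index\\.html",
                "-index\\.php",
                "-video",
                "-live-score",
                "# accept anything else",
                "+."));

        for (String url : List.of(
                "http://www.example.com/index.php",
                "http://www.example.com/video/clip1",
                "http://www.example.com/news/story.html")) {
            // Prints "excluded", "excluded", "included" for the three URLs in order.
            System.out.println(url + " -> " + (accept(rules, url) ? "included" : "excluded"));
        }
    }
}
```

Because the sketch uses a substring search (Matcher.find), a rule such as -video excludes every URL containing "video" anywhere, including the host name; anchoring a pattern (for example with ^) narrows it. The order of the rules also matters: placing "+." first would accept everything before the exclusions are ever checked.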
Re: How to handle failures in nutch? - Nutch - [mail # user]
...I don't think so! freegen will generate a new segment and you don't need to merge it with the others. Then you can (fetch and) parse the content from that new segment. Fina...
Author: remi tassing, 2012-04-10, 10:15

Re: Returning web page abstract with Solr - Nutch - [mail # user]
...Are you looking for result highlighting? http://wiki.apache.org/solr/HighlightingParameters Remi On Wed, Apr 4, 2012 at 3:30 PM, smooth almonds wrote: ...
Author: remi tassing, 2012-04-04, 07:33

Re: Normalizer error: "IndexOutOfBoundsException: No group 1" - Nutch - [mail # user]
...True true, thanks! On Tue, Apr 3, 2012 at 3:08 AM, Sebastian Nagel wrote: ...
Author: remi tassing, 2012-04-03, 00:19