Search criteria: the 0.8.
Results 1 to 10 of 73 (3.417s).
nutch-0.8-dev/bin/nutch_invertlinks - Nutch - [wiki]
...
"invertlinks" is an alias for "org.apache.nutch.crawl.LinkDb"
Updates the Link Database with linking information from a segment.
Usage
nutch-0.8-dev/bin/nutch org...
....
Commandline options for version 0.8
nutch-0.8-dev/bin/nutch_invertlinks...
http://wiki.apache.org/nutch/nutch-0.8-dev/bin/nutch_invertlinks
Author: localhost, 2009-09-20, 23:10

nutch-0.8-dev/bin/nutch_dedup - Nutch - [wiki]
...
"dedup" is an alias for "org.apache.nutch.indexer.DeleteDuplicates"
Removes duplicate pages from a set of segment indexes.
Usage
nutch-0.8-dev/bin/nutch org...
...-site.xml
Other Files
None.
Caveats and Notes
None.
DevelopmentCommandLineOptions
nutch-0.8-dev/bin/nutch_dedup...
http://wiki.apache.org/nutch/nutch-0.8-dev/bin/nutch_dedup
Author: localhost, 2009-09-20, 23:10

nutch-0.8-dev/bin/nutch_plugin - Nutch - [wiki]
...
"plugin" is an alias for "org.apache.nutch.plugin.PluginRepository"
Used to load a plugin from the repository and execute its main class.
Usage
nutch-0.8-dev/bin/nutch org...
....
DevelopmentCommandLineOptions
nutch-0.8-dev/bin/nutch_plugin...
....apache.nutch.plugin.PluginRepository <pluginId> <className> [args ...]
<pluginId>: The id of the plugin you wish to execute.
<className>: The class with the main() function.
[args]: 0..N arguments...
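The argument order in the usage line above can be illustrated with a dry run. This is only a sketch: it prints a command line and never invokes Nutch. The helper name plugin_cmd, the plugin id example-plugin, and the class org.example.ExampleMain are hypothetical placeholders, and using the "plugin" alias in place of the full class name is assumed from the alias note in this result.

```shell
#!/bin/sh
# Dry-run sketch: prints the plugin command line instead of executing it.
NUTCH="nutch-0.8-dev/bin/nutch"   # launcher path as written in the snippets

# plugin_cmd <pluginId> <className> [args ...]
plugin_cmd() {
    # Both the plugin id and the class name are required.
    if [ "$#" -lt 2 ]; then
        echo "usage: plugin_cmd <pluginId> <className> [args ...]" >&2
        return 1
    fi
    # Any remaining arguments are passed through unchanged.
    echo "$NUTCH plugin $*"
}

# Hypothetical plugin id and class name, for illustration only.
plugin_cmd example-plugin org.example.ExampleMain --verbose
```

Printing the command first makes it easy to check the order of <pluginId>, <className>, and the trailing arguments before running anything against a real installation.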
http://wiki.apache.org/nutch/nutch-0.8-dev/bin/nutch_plugin
Author: localhost, 2009-09-20, 23:09

nutch-0.8-dev/bin/nutch_updatedb - Nutch - [wiki]
...
"updatedb" is an alias for "org.apache.nutch.crawl.CrawlDb"
Updates the Crawl DB with information obtained from the Fetcher.
Usage
nutch-0.8-dev/bin/nutch org...
....
DevelopmentCommandLineOptions
nutch-0.8-dev/bin/nutch_updatedb...
http://wiki.apache.org/nutch/nutch-0.8-dev/bin/nutch_updatedb
Author: localhost, 2009-09-20, 23:09

nutch-0.8-dev/bin/nutch_generate - Nutch - [wiki]
...
"generate" is an alias for "org.apache.nutch.crawl.Generator"
Generates a new Fetcher Segment from the Crawl Database
Usage
nutch-0.8-dev/bin/nutch org...
... segments created. For instance, if -numFetchers 2 was specified, there would be 2 fetcher segments created under <segments_dir>. Under 0.8 this is no longer the case.
Examples
nutch-0.8-dev/bin/nutch generate /my/crawldb /my/segments
This example will generate a fetch list that contains all URLs ready to be fetched from the Crawl Database. The Crawl Database is located at /my/crawldb and the Generator will output the fetch list to /my/segments/yyyyMMddHHmmss.
nutch-0.8-dev/bin/nutch generate /my/crawldb /my/segments -topN 100 -adddays 20
In this example the Generator will add 20 days to the current date/time when determining the top 100 scoring pages to fetch.
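The two generate invocations quoted in this result can be sketched as a dry run. The snippet below only prints the command lines and never calls Nutch. The helper name generate_cmd and the variable names are hypothetical. The final line merely illustrates the yyyyMMddHHmmss naming pattern using the local clock; the Generator chooses the actual directory name at run time.

```shell
#!/bin/sh
# Dry-run sketch: prints generate command lines instead of executing them.
NUTCH="nutch-0.8-dev/bin/nutch"   # launcher path as written in the snippets
CRAWLDB="/my/crawldb"             # Crawl Database location from the example
SEGMENTS="/my/segments"           # parent directory for new segments

# generate_cmd [topN] [adddays]
# Both options are optional; an empty value leaves the option out.
generate_cmd() {
    cmd="$NUTCH generate $CRAWLDB $SEGMENTS"
    if [ -n "$1" ]; then
        cmd="$cmd -topN $1"
    fi
    if [ -n "$2" ]; then
        cmd="$cmd -adddays $2"
    fi
    echo "$cmd"
}

# First example: every URL that is ready to be fetched.
generate_cmd

# Second example: top 100 scoring pages, with 20 days added to the current time.
generate_cmd 100 20

# Illustration of the yyyyMMddHHmmss segment naming pattern (local clock).
example_segment="$SEGMENTS/$(date +%Y%m%d%H%M%S)"
echo "$example_segment"
```

Keeping the command construction in one function makes the optional flags explicit: leaving out both arguments reproduces the first example, and passing 100 and 20 reproduces the second.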
DevelopmentCommandLineOptions
nutch-0.8-dev/bin/nutch_generate...
http://wiki.apache.org/nutch/nutch-0.8-dev/bin/nutch_generate
Author: localhost, 2009-09-20, 23:10

nutch-0.8-dev/bin/nutch_index - Nutch - [wiki]
...-0.8-dev/bin/nutch org.apache.nutch.indexer.Indexer <index> <crawldb> <linkdb> <segment> ...
<index>: Path to the directory where the index will be created...
....
DevelopmentCommandLineOptions
nutch-0.8-dev/bin/nutch_index...
http://wiki.apache.org/nutch/nutch-0.8-dev/bin/nutch_index
Author: localhost, 2009-09-20, 23:09

nutch-0.8-dev/bin/nutch_readlinkdb - Nutch - [wiki]
...
"readlinkdb" is an alias for "org.apache.nutch.crawl.LinkDbReader"
Exports information on the Link Database or returns information on a URL in the Link Database
Usage
nutch-0.8...
...-default.xml
nutch-site.xml
Other Files
None.
Caveats and Notes
None.
DevelopmentCommandLineOptions
nutch-0.8-dev/bin/nutch_readlinkdb...
http://wiki.apache.org/nutch/nutch-0.8-dev/bin/nutch_readlinkdb
Author: localhost, 2009-09-20, 23:09

nutch-0.8-dev/bin/nutch_merge - Nutch - [wiki]
...
"merge" is an alias for "org.apache.nutch.indexer.IndexMerger"
Merges several segment indexes
Usage
nutch-0.8-dev/bin/nutch org.apache.nutch.indexer.IndexMerger [-workingdir <...
...CommandLineOptions
nutch-0.8-dev/bin/nutch_merge...
http://wiki.apache.org/nutch/nutch-0.8-dev/bin/nutch_merge
Author: localhost, 2009-09-20, 23:09

nutch-0.8-dev/bin/nutch_segread - Nutch - [wiki]
...
"segread" is an alias for "org.apache.nutch.segment.SegmentReader"
Reads and exports a segment's data
Usage
nutch-0.8-dev/bin/nutch org.apache.nutch.segment.SegmentReader <...
... with 'reset').
DevelopmentCommandLineOptions
nutch-0.8-dev/bin/nutch_segread...
http://wiki.apache.org/nutch/nutch-0.8-dev/bin/nutch_segread
Author: localhost, 2009-09-20, 23:10

nutch-0.8-dev/bin/nutch_fetch - Nutch - [wiki]
...
"fetch" is an alias for "org.apache.nutch.fetcher.Fetcher"
Runs the Fetcher on a segment.
Usage
nutch-0.8-dev/bin/nutch org.apache.nutch.fetcher.Fetcher <segment> [-threads...
....threads.fetch -> 10
[-noParsing]: Disables automatic parsing of the segment's data. See nutch-0.8-dev/bin/nutch_parse
Configuration Files
hadoop-default.xml
hadoop-site.xml
nutch...
... to fetch both http and https protocols then only protocol-httpclient is needed.
DevelopmentCommandLineOptions
nutch-0.8-dev/bin/nutch_fetch...
http://wiki.apache.org/nutch/nutch-0.8-dev/bin/nutch_fetch
Author: localhost, 2009-09-20, 23:09