
Nutch, mail # dev - Skipping Root File from Indexing


Skipping Root File from Indexing
atul 2012-04-24, 19:44
Hi,

We have a Nutch + Solr setup in place to build a search index of our web
pages. The crawl starts from a source index.html file that contains links to
other web pages. This part works fine: the rest of the web pages are indexed
by following the URLs on index.html.

However, we don't want index.html itself to be indexed in Solr. We want all
the other internal URLs (web pages) to be indexed, just not the root page.
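Put another way, the requirement is a per-URL decision made at indexing time. The crawler should still follow the root page's outlinks, but the root page itself should not be sent to Solr. The sketch below illustrates only that decision logic and makes several assumptions. The class `RootPageSkipper`, its `shouldIndex` method, the URL normalization rules, and the example.com URLs are all hypothetical and are not part of Nutch. In a real Nutch deployment a check like this would probably need to live in a custom indexing filter plugin, and the exact plugin interface depends on the Nutch version.

```java
import java.net.URI;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical helper: decides whether a crawled URL should be sent to Solr.
// Root (seed) pages are still crawled for their links but are never indexed.
final class RootPageSkipper {
    private final Set<String> rootUrls;

    RootPageSkipper(Set<String> rootUrls) {
        // Store the root URLs in normalized form so lookups are insensitive to
        // case in scheme/host and to "/", "" or "/index.html" at the end of the path.
        this.rootUrls = rootUrls.stream()
                .map(RootPageSkipper::normalize)
                .collect(Collectors.toSet());
    }

    // Assumed normalization: lowercase scheme and host, keep an explicit port,
    // drop a trailing "index.html" and a trailing "/". Query strings and
    // fragments are ignored.
    static String normalize(String url) {
        URI uri = URI.create(url.trim());
        String scheme = uri.getScheme() == null ? "" : uri.getScheme().toLowerCase();
        String host = uri.getHost() == null ? "" : uri.getHost().toLowerCase();
        String port = uri.getPort() == -1 ? "" : ":" + uri.getPort();
        String path = uri.getPath() == null ? "" : uri.getPath();
        if (path.endsWith("/index.html")) {
            path = path.substring(0, path.length() - "index.html".length());
        }
        if (path.endsWith("/")) {
            path = path.substring(0, path.length() - 1);
        }
        return scheme + "://" + host + port + path;
    }

    // True if the page should be indexed, false if it is one of the root pages.
    boolean shouldIndex(String url) {
        return !rootUrls.contains(normalize(url));
    }

    public static void main(String[] args) {
        RootPageSkipper skipper = new RootPageSkipper(Set.of("http://example.com/index.html"));
        List<String> crawled = List.of(
                "http://example.com/index.html",
                "http://example.com/",
                "http://example.com/about.html",
                "http://example.com/docs/guide.html");
        for (String url : crawled) {
            System.out.println(url + " -> " + (skipper.shouldIndex(url) ? "index" : "skip"));
        }
    }
}
```

Running `main` prints "skip" for the two forms of the root URL and "index" for the two linked pages. Normalizing before comparing matters here because the same root page is often reached as both "/" and "/index.html".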

Please advise: how can this be achieved?

Thanks,
Atul

--
View this message in context: http://lucene.472066.n3.nabble.com/Skipping-Root-File-from-Indexing-tp3936375p3936375.html
Sent from the Nutch - Dev mailing list archive at Nabble.com.