Nutch, mail # user - Indexing HTML metatags from Nutch into Solr


RE: Indexing HTML metatags from Nutch into Solr
Ing. Eyeris Rodriguez Rue... 2012-05-15, 20:04
Hi ML, this is the configuration for the index-metatags plugin.

In your schema.xml (this file is the same in Solr and Nutch):
<field name="keywords" type="text" indexed="true" stored="true"/>
<field name="description" type="text" indexed="true" stored="true"/>
<field name="lastModified" type="date" stored="true" indexed="true"/>

In nutch-site.xml you need to add something like this
(pay attention to the name and the value):
<property>
    <name>metatags.names</name>
    <value>keywords;description;last_modified</value>
    <description>For plugin index-metatags: Indicate here the name of the
    html meta tag that should be
    parsed. Use a semicolon separated list if you want multiple
    tags, or use '*' to index all.
    Example: description;keywords;role
</description>
</property>

After that, you have to configure your solrindex-mapping.xml like this:
<field dest="description" source="description" />
<field dest="keywords" source="keywords" />
<field dest="lastModified" source="lastModified"/>
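
A minimal sketch of where these lines would sit inside conf/solrindex-mapping.xml, assuming the stock Nutch 1.x layout (a <mapping> root containing a <fields> list and a <uniqueKey>). The id entry and the uniqueKey value are taken from that default layout and are assumptions here, not part of the configuration above; the source names are assumed to match the field names the indexing plugin actually emits.

<mapping>
    <fields>
        <!-- assumed default entry from the stock file -->
        <field dest="id" source="id"/>
        <!-- the three metatag fields from this message -->
        <field dest="description" source="description"/>
        <field dest="keywords" source="keywords"/>
        <field dest="lastModified" source="lastModified"/>
    </fields>
    <!-- assumed default; must match the uniqueKey in schema.xml -->
    <uniqueKey>id</uniqueKey>
</mapping>

Each dest value has to name a field declared in schema.xml, otherwise Solr will reject the document with an unknown-field error.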