Nutch, mail # user - OpenCalais alternatives for use with Nutch?


OpenCalais alternatives for use with Nutch?
Alex McLintock 2010-07-02, 15:53
I'm quite interested in OpenCalais - a Thomson Reuters initiative. It
is a web service to take your free text and identify important terms
in it like people, businesses, places, and so on. If you are the
document owner you can submit your document to their web site and get
back important tags saying what this document is about. I'd like to
tag this sort of data and feed it into a Lucene style index so that it
can be used in searches AND in focussed/topical crawls.
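
To make the idea concrete, here is a toy sketch in Python of the
tag-then-index flow described above. The gazetteer, the function names
and the "tag_*" field names are all made up for illustration; this is
not the OpenCalais API or a Nutch plugin, and a real entity extractor
would need far more than a lookup list.

```python
import re

# Hypothetical, hand-made gazetteer: entity type -> known names.
GAZETTEER = {
    "person": ["Alex McLintock"],
    "company": ["Reuters"],
    "place": ["London"],
}


def tag_text(text):
    """Return a sorted list of (entity_type, name) pairs found in text."""
    tags = set()
    for entity_type, names in GAZETTEER.items():
        for name in names:
            # Whole-word, case-insensitive match on the known name.
            pattern = r"\b" + re.escape(name) + r"\b"
            if re.search(pattern, text, re.IGNORECASE):
                tags.add((entity_type, name))
    return sorted(tags)


def to_index_fields(doc_id, text):
    """Build a flat dict of fields that an indexer could store per document."""
    fields = {"id": doc_id, "content": text}
    for entity_type, name in tag_text(text):
        # One multi-valued field per entity type, e.g. "tag_place".
        fields.setdefault("tag_" + entity_type, []).append(name)
    return fields


if __name__ == "__main__":
    print(to_index_fields("doc1", "Reuters opened an office in London."))
```

The point is only the shape of the output: one multi-valued field per
entity type next to the raw content, so a search or a focussed crawl
could filter on something like tag_place or tag_company.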

Now, here comes the problem. When we crawl the web we don't own the
documents we are crawling so we don't really have permission to use
Reuters' servers to do this analysis. (Maybe we could cut a deal
though if we were a big enough company).

So has anyone else looked at alternatives to OpenCalais that take
free text and try to work out what it is about? I've been looking
for software to do this, but nothing I've found seems suitable.

Alex