Nutch, mail # user - Google Analytics in Hadoop ?


Google Analytics in Hadoop ?
Alex McLintock 2012-04-30, 15:36
Hi Folks,

This is not 100% a Nutch question... and I hate it when other people say "I
know my question is off topic....." so why I am doing it myself I don't
know.

I am looking at building a system similar to Google Analytics, in that it
logs page requests on third-party sites using some kind of JavaScript, does
processing on those logs, and produces reports. I see there are open source
tools for this which are MySQL/RDBMS-backed, but I want a Hadoop-backed
system for scalability. Do I need to implement it myself, or is anyone
already working on such a thing?
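
To make the "processing on those logs" step concrete, here is a minimal sketch of a page-view count written in map/reduce style. It is only an illustration under assumptions: the log format (tab-separated ISO timestamp and URL) and the function names `map_line`, `reduce_counts` and `count_page_views` are hypothetical, and the code runs in plain Python rather than on a Hadoop cluster.

```python
from itertools import groupby
from operator import itemgetter


def map_line(line):
    """Map step: emit ((url, day), 1) for one log line.

    Assumed (hypothetical) log format: "<ISO timestamp>\t<url>[\t...]".
    Malformed lines produce no output.
    """
    parts = line.rstrip("\n").split("\t")
    if len(parts) < 2 or len(parts[0]) < 10:
        return []
    timestamp, url = parts[0], parts[1]
    day = timestamp[:10]  # "YYYY-MM-DD" prefix of the ISO timestamp
    return [((url, day), 1)]


def reduce_counts(sorted_pairs):
    """Reduce step: sum the counts of consecutive pairs sharing a key."""
    for key, group in groupby(sorted_pairs, key=itemgetter(0)):
        yield key, sum(count for _, count in group)


def count_page_views(lines):
    """Run map, sort (the 'shuffle'), and reduce locally over log lines."""
    pairs = [pair for line in lines for pair in map_line(line)]
    pairs.sort(key=itemgetter(0))
    return dict(reduce_counts(pairs))


if __name__ == "__main__":
    sample = [
        "2012-04-30T15:36:00\thttp://example.com/a",
        "2012-04-30T16:00:00\thttp://example.com/a",
        "2012-05-01T09:00:00\thttp://example.com/b",
    ]
    for (url, day), views in sorted(count_page_views(sample).items()):
        print(day, url, views)
```

The map and reduce steps are kept as separate functions so that the same logic could later be moved into a distributed job, where the framework would take over the sorting and grouping between the two steps.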

To bring this back to Nutch, I would also like to fetch and index all the
pages which are logged in this way so that my system knows what they are
about. (But I don't really need any web crawling after that.)
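
One way to feed the logged pages to Nutch would be to turn the logs into a seed list, since Nutch injects URLs from plain-text files with one URL per line. The sketch below assumes the same hypothetical tab-separated log format (timestamp, then URL); the names `seed_urls` and `write_seed_file` and the output path are illustrative, not part of Nutch.

```python
from urllib.parse import urlsplit, urlunsplit


def seed_urls(lines):
    """Return the sorted, de-duplicated http(s) URLs found in log lines.

    Assumed (hypothetical) log format: "<timestamp>\t<url>[\t...]".
    Fragments are dropped and host names lower-cased so that trivially
    different spellings of the same page collapse into one entry.
    """
    seen = set()
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) < 2:
            continue  # skip malformed lines
        url = urlsplit(parts[1])
        if url.scheme not in ("http", "https") or not url.netloc:
            continue  # only web pages are worth fetching
        seen.add(urlunsplit(
            (url.scheme, url.netloc.lower(), url.path, url.query, "")
        ))
    return sorted(seen)


def write_seed_file(urls, path):
    """Write one URL per line, the layout a Nutch seed file uses."""
    with open(path, "w", encoding="utf-8") as handle:
        for url in urls:
            handle.write(url + "\n")


if __name__ == "__main__":
    sample = [
        "2012-04-30T15:36:00\thttp://Example.com/a#top",
        "2012-04-30T16:00:00\thttp://example.com/a",
    ]
    write_seed_file(seed_urls(sample), "seed.txt")  # hypothetical path
```

Because no crawling beyond the logged pages is wanted, such a seed list would presumably be fetched with a crawl depth of one, so that outlinks are not followed.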

Any ideas?

Cheers