Google Analytics in Hadoop?
Alex McLintock 2012-04-30, 15:36
Hi Folks,
This is not 100% a Nutch question... and I hate it when other people say "I know my question is off topic...", so why I am doing it myself I don't know.

I am looking at building a system similar to Google Analytics, in that it logs page requests on third-party sites using some kind of JavaScript, processes those logs, and produces reports. I see there are open source tools for this which are MySQL/RDBMS backed, but I want a Hadoop-backed system for scalability. Do I just need to implement it myself, or is anyone working on such a thing?

To bring this back to Nutch: I would also like to fetch and index all the pages which are logged in this way, so that my system knows what they are about. (But I don't really need any web crawling after that.)

Any ideas?

Cheers