....mail-archive.com/[EMAIL PROTECTED]/msg08665.html Discussion Grub has some interesting ideas about building a search engine using distributed
. And how is that
to Nutch? CategoryHomepage FAQ...
... fetched. Then later send a CONT signal to the process. Do not turn off your
between! How many concurrent threads should I use? This is dependent on your particular set-up; unless...
... bugs, patches, or feature requests to the mailing lists. Refer instead to Commiter's_Rules and HowToContribute areas of the Nutch
. Are there any mailing lists available? There...
... (see above). There are instructions on how to get Nutch working with Eclipse on [http://
.apache.org/nutch/RunNutchInEclipse] but the easiest way of doing it is to use Ant for compiling...
... fetch pages that require Authentication? See the HttpAuthenticationSchemes
page. Speed of Fetching seems to decrease between crawl iterations... what's wrong? A possible reason...
, 2013-02-07, 04:47
... into the Nutch or Hadoop architecture, resources relating to these topics can be found here. It only tells how to get the systems up and running. There are also
resources at the end...
... node I mean that it will run the Hadoop services that coordinate with the slave nodes (all of the other
) and it is the machine on which we performed our crawl. Downloading Hadoop...
... the first time you log in to each
asking if you want to add the
to the known hosts. Answer yes to the prompt. Once the key is copied you shouldn't have to enter a password when...
... machine we are running the master node, we will also need the local
in this slave list. Here is what the slaves file will look like to start. localhost It comes this way to start so...
.... The name node is the coordinator and stores what blocks (not really files but you can think of them as such for now) are on what
and what needs to be replicated to different data nodes...
, 2012-03-20, 14:44
...-analysis to get a single global
score for each url. Building a webgraph assumes that all links are stored in the current segments to be processed. Links are not held over from one processing...
... links to D which links back to A. This program is
expensive and usually, due to time and space requirements, can't be run to more than a depth of three or four levels. While it does...
... and link cycles and then allow those links to be removed. Problem is the class is very expensive
. You can set the depth you want it to run, but it is worse than exponential, so I...
... scores. Some things to consider: PageRank is just one of over 200 signals that Google uses (if they still use it) to determine
. Even if Google still uses it, it most likely has...
... changed. Link analysis scores are good global
scores, but a link score does not a search engine make today. Oh how I wish it was that simple. LinkRank is a good starting point, that...
, 2011-08-07, 12:55
... of the tutorial though I will point you to
resources if you want to know more about the architecture of Nutch and Hadoop. The tutorial comes in two phases. Firstly we get Hadoop running...
... not be compatible with future releases of either Nutch or Hadoop. Five: For this tutorial we set up Nutch across 6 different
. If you are using a different number of machines you should still...
... First let me lay out the
that we used in our setup. To set up Nutch and Hadoop we had 7 commodity
ranging from 750 MHz to 1.0 GHz. Each
had at least 128 MB of RAM...
... and at least a 10 GB hard drive. One
had dual 750 MHz CPUs and another had dual 30 GB hard drives. All of these
were purchased for under $500.00 at a liquidation sale...
.... I am telling you this to let you know that you don't have to have big hardware to get up and running with Nutch and Hadoop. Our
were named like this: devcluster01 devcluster02...
, 2011-09-02, 19:58
All projects made searchable here are trademarks of the Apache Software Foundation. Service operated by