Nutch >> mail # user >> Announcing release of Arch - an extension of Nutch for intranet search
RE: Announcing release of Arch - an extension of Nutch for intranet search
Awesome! This looks very interesting - I'll give it a look over the next
few weeks....

-Mark

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
Sent: 17 March 2010 13:59
To: [EMAIL PROTECTED]
Subject: Announcing release of Arch - an extension of Nutch for intranet
search

Hello,
I have been reading this list for quite a while. This was frustrating at
times because very often I thought, "If only I could release Arch now, I
could help this..., and this..., and this..." But, it was not ready. Now
it is ready and I am more than happy to release it.

I hope it will be useful in more than one way. A few examples:

- People often asked how to avoid a complete re-crawl when a crawl
  fails. With Arch, you can do it. You can split your web site into
  areas and crawl them separately as needed. Then they are combined
  into a single index. If a crawl fails and you restart Arch, it will
  start with the area that failed, skipping already indexed ones.
- People asked how to use Nutch classes from Java. Arch does that;
  see the sources.
- People had issues with updating pages in the index. Arch does not
  have this problem.
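
[Editor's sketch, not part of the original message.] The restart
behaviour described in the first item can be illustrated roughly as
below. This is not Arch's code (Arch is written in Java on top of
Nutch); it is a minimal Python sketch that assumes completed areas are
recorded in a small JSON state file. The names crawl_areas, crawl_one
and state_path are hypothetical.

```python
import json
import os


def crawl_areas(areas, crawl_one, state_path):
    """Crawl each area in order, skipping areas already marked as done.

    areas      -- list of area names (hypothetical identifiers)
    crawl_one  -- callable that crawls and indexes one area; may raise
    state_path -- JSON file listing the areas that completed earlier
    """
    done = set()
    if os.path.exists(state_path):
        with open(state_path) as f:
            done = set(json.load(f))

    for area in areas:
        if area in done:
            continue  # finished in an earlier run, do not re-crawl
        crawl_one(area)  # if this raises, earlier progress stays on disk
        done.add(area)
        # Persist progress after every area so a restart resumes here.
        with open(state_path, "w") as f:
            json.dump(sorted(done), f)
    return done
```

If crawl_one raises for one area, calling crawl_areas again with the
same state_path starts from that area; the areas before it are skipped.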

Arch has a lot more than the above. For me, as a webmaster, it has
everything that I can ask for: document level security, easy support for
multiple web sites, modular pluggable authentication, automatic dynamic
site directory, scheduled cheap index updates.

A very important feature is the improved document weighting scheme. It
works very well on intranets. No more user complaints about finding junk
instead of what they expect to find.

Arch has a dual (PHP and JSP) interface. For those of you that prefer
PHP to Java, the PHP interface will be easier to customise.

More information, sources, screenshots and binaries are available here:

http://www.atnf.csiro.au/computing/software/arch/index.html

Sorry, no demo is available, as Arch runs behind the firewall at ATNF. I
hope to get it out in the open in a few days.

Regards,
Arkadi Kosmynin
CSIRO Astronomy and Space Science