Nutch, mail # user - crawldb modifications


Re: crawldb modifications
remi tassing 2012-02-28, 12:04
I think he meant to remove some specific URLs, not everything.
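
For reference, the workflow quoted below (erase the db, run mergesegs with the -filter option, then updatedb) might be sketched roughly as the shell script that follows. It is only a sketch under assumptions: the crawl/ directory layout, the segments_filtered output directory, and the DRY_RUN switch are hypothetical names chosen for illustration, and the configured URL filters (e.g. conf/regex-urlfilter.txt) are assumed to already exclude the URLs that should be dropped. By default it only prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch only: rebuild the crawldb from filtered segments.
# All paths below are assumptions, not taken from the thread.
NUTCH="${NUTCH:-bin/nutch}"        # assumed location of the nutch launcher
CRAWL_DIR="${CRAWL_DIR:-crawl}"    # assumed crawl directory
DRY_RUN="${DRY_RUN:-1}"            # 1 = print commands, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

rebuild_crawldb() {
  # 1. Erase the existing crawldb.
  run rm -r "$CRAWL_DIR/crawldb"
  # 2. Merge the segments into a new directory (hypothetical name),
  #    applying the configured URL filters to drop unwanted URLs.
  run "$NUTCH" mergesegs "$CRAWL_DIR/segments_filtered" \
      -dir "$CRAWL_DIR/segments" -filter
  # 3. Rebuild the crawldb from the filtered segments.
  run "$NUTCH" updatedb "$CRAWL_DIR/crawldb" \
      -dir "$CRAWL_DIR/segments_filtered"
}

rebuild_crawldb
```

Once the printed commands look right for the local setup, running the script with DRY_RUN=0 would execute them. Backing up crawl/crawldb first is probably wise, since step 1 is destructive.
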

On Tue, Feb 28, 2012 at 1:51 PM, Markus Jelsma
<[EMAIL PROTECTED]> wrote:

> I may be missing something but rm -r crawl/crawldb works fine here.
>
> On Tuesday 28 February 2012 07:03:39 remi tassing wrote:
> > What I do in this case is erase the db, use the mergesegs command with
> > the -filter option, and then run updatedb.
> >
> > I would love to know if there is a simpler way.
> >
> > Remi
> >
> > On Monday, February 27, 2012, Charles Thomas <[EMAIL PROTECTED]> wrote:
> > > Is there a way to clear out the various databases that Nutch uses (e.g.
> > > crawldb)? I did some testing which injected a lot of URLs into the DB that
> > > I want to clear out as I move toward production.
> > >
> > > Thanks!
> > >
> > > CT
> > >
> > > --
> > > View this message in context:
> > > http://lucene.472066.n3.nabble.com/crawldb-modifications-tp3781740p3781740.html
> > > Sent from the Nutch - User mailing list archive at Nabble.com.
>
> --
> Markus Jelsma - CTO - Openindex
>