Parse filters run on a single document during parsing, and indexing filters run
when a document is about to be sent to Solr. It's best to put parse logic in
parse filters and index (or other) logic in indexing filters.
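To make that ordering concrete, here is a minimal Java sketch. It does not use Nutch's real plugin APIs. ParseStep, IndexStep and crawl are hypothetical, simplified stand-ins. In a real crawl, fetching, parsing and indexing are separate steps, but here they run back to back in one process. The sketch only shows which stage sees which data: parse steps see the raw content, and index steps see only what parsing produced.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of when each kind of filter runs.
// ParseStep/IndexStep are NOT Nutch's real HtmlParseFilter/IndexingFilter APIs.
public class PipelineSketch {

    /** Runs once per document while it is being parsed (after fetching). */
    interface ParseStep {
        void filter(String rawContent, Map<String, String> parseMeta);
    }

    /** Runs once per document just before it is sent to Solr. */
    interface IndexStep {
        void filter(Map<String, String> parseMeta, Map<String, String> solrDoc);
    }

    // Records the order in which the stages run, for illustration only.
    static final List<String> trace = new ArrayList<>();

    static Map<String, String> crawl(String rawContent,
                                     List<ParseStep> parseSteps,
                                     List<IndexStep> indexSteps) {
        trace.add("fetch");
        Map<String, String> parseMeta = new HashMap<>();
        trace.add("parse");
        for (ParseStep p : parseSteps) {
            p.filter(rawContent, parseMeta); // parse logic sees the raw content
        }
        Map<String, String> solrDoc = new HashMap<>();
        trace.add("index");
        for (IndexStep i : indexSteps) {
            i.filter(parseMeta, solrDoc); // index logic sees only parse output
        }
        trace.add("send to Solr");
        return solrDoc;
    }

    public static void main(String[] args) {
        // Parse logic: extract a value from the raw content.
        ParseStep extractTitle = (raw, meta) -> {
            int start = raw.indexOf("<title>");
            int end = raw.indexOf("</title>");
            if (start >= 0 && end > start) {
                meta.put("title", raw.substring(start + "<title>".length(), end));
            }
        };
        // Index logic: decide which parsed values become Solr fields.
        IndexStep copyTitle = (meta, out) -> {
            if (meta.containsKey("title")) {
                out.put("title", meta.get("title"));
            }
        };
        Map<String, String> result = crawl("<html><title>Nutch</title></html>",
                List.of(extractTitle), List.of(copyTitle));
        System.out.println(trace);  // [fetch, parse, index, send to Solr]
        System.out.println(result); // {title=Nutch}
    }
}
```

Moving work "from indexing to parsing" in this model means doing the expensive extraction in a ParseStep. The IndexStep then only copies ready-made values into the outgoing document.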
On Friday 27 April 2012 16:19:43 Jim Chandler wrote:
> Hi Lewis,
>
> The idea is to move some of the processing from indexing to parsing, hoping
> to limit the latency on Solr.
>
> I've looked at the wiki, and it may just be me, but I am having a difficult
> time understanding the process as a whole. I am very unfamiliar with crawling,
> parsing and indexing. I'm just trying to understand how everything works
> together and at which point the plugins are run.
>
> Thanks,
> Jim
>
> On Thu, Apr 26, 2012 at 4:49 PM, Lewis John Mcgibbney <[EMAIL PROTECTED]> wrote:
> > Hi Jim,
> >
> > On Thu, Apr 26, 2012 at 2:23 PM, Jim Chandler <[EMAIL PROTECTED]> wrote:
> > > I am in the
> > > process of trying to change a plugin from an IndexingFilter to a
> > > Parser.
> >
> > Personally, I wouldn't do this; I would take an existing parser and
> > adapt it into a new one. Do you have any specific reasons for doing
> > this the other way around?
> >
> > > I am having difficulty understanding where in the Nutch process each
> > > one of these is run.
> >
> > Well, the parser is run once you have fetched your pages and wish to
> > extract content from them.
> > The IndexingFilter is used when you wish to send things to be indexed
> > in some custom manner.
> >
> > > Does anyone have any recommendations of sites or books that would be
> > > helpful?
> >
> > What I think you're asking about is getting up to speed with plugins:
> > how they are used, what they consist of, and how they can be built to
> > solve your domain-specific problems.
> >
> > Check out our wiki; it's the best source of Nutch info on the web:
> >
> > http://wiki.apache.org/nutch/
> > http://wiki.apache.org/nutch/PluginCentral
> >
> > hth
> >
> > Lewis
--
Markus Jelsma - CTO - Openindex