Andy Xue 2012-05-31, 01:34
Re: "nutch-site.xml" not robust
Lewis John Mcgibbney 2012-05-31, 10:37
This is a good catch and I would suggest you open an issue on the Jira
and submit a patch for the few instances where this actually
occurs... e.g. I think there are currently 4 such instances in
nutch-default which concern the ordering of such tools. Admittedly
though I haven't dug down into the code to see if it is consistent as
If you begin by investigating (and patching if necessary) these parts
then this would make a nice patch. As you are using trunk, I wouldn't
imagine it would take you too long.
Thanks very much
On Thu, May 31, 2012 at 2:34 AM, Andy Xue <[EMAIL PROTECTED]> wrote:
> Hi all:
> The following situation has come to my attention regarding "*nutch-site.xml*"
> when I'm using nutch trunk:
> When listing multiple scoring filters in the property "*scoring.filter.order*",
> it is vital that no spaces/newlines/tabs are placed in front of the
> first value. E.g.:
> This is fine:
> <value>org.apache.nutch.scoring.opic.OPICScoringFilter myFilter</value>
> Either of these will generate an exception:
> <value> org.apache.nutch.scoring.opic.OPICScoringFilter myFilter</value>
> The reason is: in *org.apache.nutch.scoring.ScoringFilters*, a statement
> (on line 59) "orderedFilters = order.split("\\s+");" tries to split the
> aforementioned string. The leading whitespace produces an empty string as
> the first array element, which results in a ClassNotFound /
> NullPointer exception.
> It can be easily fixed of course, but what concerns me is that I suspect
> other properties will have the same problem (i.e., the value content must
> immediately follow the *<value>* tag). This is not robust.
> Any thoughts?
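
For anyone following along, the split behaviour described above can be reproduced outside Nutch with plain Java. This is only a minimal sketch: the class name FilterOrderSplit and the helpers naiveSplit and safeSplit are made up for illustration and are not part of Nutch. The trim-before-split variant is one possible fix, not necessarily what a committed patch would do.

```java
import java.util.Arrays;

// Hypothetical demo class, not part of Nutch.
class FilterOrderSplit {

    // Same call as the statement quoted from ScoringFilters:
    // a leading whitespace run yields an empty first element.
    static String[] naiveSplit(String order) {
        return order.split("\\s+");
    }

    // Assumed hardening: trim first, and treat a null or blank value as "no filters".
    static String[] safeSplit(String order) {
        if (order == null) {
            return new String[0];
        }
        String trimmed = order.trim();
        if (trimmed.isEmpty()) {
            return new String[0];
        }
        return trimmed.split("\\s+");
    }

    public static void main(String[] args) {
        // Value as it might be read from a <value> tag with a leading newline and indent.
        String value = "\n  org.apache.nutch.scoring.opic.OPICScoringFilter myFilter";

        // Three elements, the first one empty.
        System.out.println(Arrays.toString(naiveSplit(value)));

        // Two elements, only the filter names.
        System.out.println(Arrays.toString(safeSplit(value)));
    }
}
```

The empty first element from naiveSplit is the one that later gets treated as a class name, which matches the exception described in the quoted message.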