Nutch, mail # dev - [DISCUSS] Issues with Fetcher


Lewis John Mcgibbney 2012-01-20, 17:16
Ken Krugler 2012-01-20, 19:59
Julien Nioche 2012-01-21, 13:45
Lewis John Mcgibbney 2012-01-21, 14:27
Edward Drapkin 2012-01-21, 16:17
Julien Nioche 2012-01-21, 20:55
Ken Krugler 2012-01-21, 18:33

Re: [DISCUSS] Issues with Fetcher
Julien Nioche 2012-01-21, 20:57
> Or, if you have experience with JSPs/GUI work, then I think there's this
> big open issue around improving the Nutch GUI, which would likely provide
> the most benefit to the most users. I haven't been following the current
> status, but I know that there have been periodic discussions, and I think
> 101tec did some work on this a while back (for a client), but I don't know
> if that's been contributed (or could be, for that matter).
>

A related issue is porting the REST API from nutchgora to trunk (
https://issues.apache.org/jira/browse/NUTCH-880), which in turn could be
used by a GUI.

J.

>
> -- Ken
>
> On Jan 21, 2012, at 8:17am, Edward Drapkin wrote:
>
>  On 1/21/2012 8:27 AM, Lewis John Mcgibbney wrote:
>
> Hi Julien,
>
>
>> There are 8 issues in trunk about the fetcher - some of them unrelated
>> to the Fetcher (NUTCH-827 <https://issues.apache.org/jira/browse/NUTCH-827> / NUTCH-1193), with most of the others being improvements
>> (NUTCH-828 <https://issues.apache.org/jira/browse/NUTCH-828> / NUTCH-1079 <https://issues.apache.org/jira/browse/NUTCH-1079>),
>> with possibly just a very few being real issues.
>
>
> This puts the whole discussion into much better context, thanks for
> pointing this out. Maybe I should have made it clearer that I only
> filtered the fetcher issues on our Jira and was simply modelling my
> discussion around that. You are completely correct though, it would be
> different if the fetcher were in a similar state to protocol-httpclient...
> which it is obviously not.
>
>
>> I am also concerned about getting too radical changes to such a core part
>> of the framework, especially when more pressing issues could be looked
>> after instead.
>
> +1
>
>
>> Having said that if someone can come up with an interesting proposal for
>> improving the Fetcher that would be very good, I would simply suggest that
>> we then have a separate implementation for that.
>>
> +1
>
>
>
> OK, with this in mind, is there some guidance we can communicate to
> Eddie? He has specifically mentioned that he shares similar opinions wrt
> the fetcher being a core part of Nutch, radical changes etc., and I also
> share this point of view. He has also added that he doesn't want to spend
> time changing material which we may or may not merge with trunk; this
> also makes perfect sense. Additionally, Ken's comments emphasise that this
> has been somewhat attempted in the past, that lessons have been learned,
> and that the implementation we have cuts the mustard as is.
> Maybe we could nudge Eddie in the right direction, which would benefit
> both him and the project over the next while. I think this was the most
> important point I was trying to emphasise; however, looking over my original
> comment, this was maybe not how it was written.
>
> Thanks
> Lewis
>
>
> If there are more important and/or interesting things for me to work on,
> I'll be glad to.  I'm completely unfamiliar with the current state of the
> project as a whole - and looking through JIRA is a bit daunting.  The only
> reason I'm attracted to working on the fetcher is I think it's a really
> interesting and compelling problem to solve, and making it more
> flexible is something that would directly benefit our use of it, so it
> will be easier to devote time to it while I'm at the office.  I do have a
> glut of free time at the moment though, so I'm perfectly okay working on
> another area that's more pressing - I just don't know what it is.  I saw
> that protocol-httpclient needs to be rewritten; is there someone working on
> that?
>
> I can work on more important and less controversial / radical things, but
> I do think that having a more flexible, pluggable fetcher will be an
> enormous improvement to Nutch and can greatly expand the potential uses for
> it as a piece of software.  There's a ton of cases where pluggable fetching
> could be a huge improvement: local filesystem search, single-threaded /
> small site indexing, email indexing (SMTP, POP, etc.), etc.  I suggested an
Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
Mattmann, Chris A 2012-01-21, 18:54