-
Setting the Fetch time with a CustomFetchSchedule
Vikas Hazrati 2012-05-21, 11:43
Hi,

I would like to write a custom implementation of AbstractFetchSchedule and change the fetch time on the basis of some parameters that I get as part of my parsing.

// something like this
datum.setFetchTime(fetchTime + (long)datum.getFetchInterval() * 1000 + customLogic);

Right now I have a custom URLFilter and a custom parser which extends HtmlParseFilter. During custom parsing I come across some parameters which would help me decide how to set the fetch time for that URL. I would like to pass these values to my CustomFetchSchedule.

Is there a way to do that? Can I pass them as part of the configuration?

Since I only get the data I need to make that decision during parsing, would it be possible to pass this data to the FetchSchedule?

Thoughts?

Regards | Vikas
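
A minimal sketch of the arithmetic in the snippet above, written in plain Java with no Nutch classes. It assumes the fetch time is in epoch milliseconds and the fetch interval is in seconds (which is what the `* 1000` suggests); `FetchTimeSketch`, `nextFetchTime` and `customOffsetMs` are hypothetical names standing in for `customLogic`.

```java
// Plain-Java sketch of the fetch-time arithmetic; no Nutch classes are used.
class FetchTimeSketch {

    /**
     * @param fetchTimeMs     time of the current fetch, in epoch milliseconds
     * @param intervalSeconds re-fetch interval, in seconds
     * @param customOffsetMs  hypothetical extra delay derived from parsing, in milliseconds
     * @return the next fetch time, in epoch milliseconds
     */
    static long nextFetchTime(long fetchTimeMs, int intervalSeconds, long customOffsetMs) {
        // Widen to long before multiplying so that long intervals do not overflow int.
        return fetchTimeMs + (long) intervalSeconds * 1000L + customOffsetMs;
    }

    public static void main(String[] args) {
        int thirtyDays = 30 * 24 * 60 * 60; // interval in seconds
        // Without the cast to long, thirtyDays * 1000 would overflow int.
        System.out.println(nextFetchTime(0L, thirtyDays, 0L));
    }
}
```

The cast matters as soon as the interval exceeds roughly 24 days, because the interval in milliseconds no longer fits in an int.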
-
RE: Setting the Fetch time with a CustomFetchSchedule
Markus Jelsma 2012-05-21, 12:44
Yes, you can pass ParseMeta keys to the FetchSchedule as part of the CrawlDatum's metadata, as I did with:
https://issues.apache.org/jira/browse/NUTCH-1024
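
To illustrate the idea of reading a parse-time value back out of metadata inside a schedule, here is a hedged plain-Java sketch. A `Map<String, String>` stands in for the CrawlDatum's metadata, and the key `custom.refetch.offset.ms` is a hypothetical name, not a key defined by Nutch; the real lookup would go through the CrawlDatum's own metadata accessors.

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java stand-in: the Map plays the role of the CrawlDatum's metadata.
class ParseMetaOffsetSketch {

    // Hypothetical metadata key written by the parse filter; not defined by Nutch.
    static final String OFFSET_KEY = "custom.refetch.offset.ms";

    /** Returns the offset stored under OFFSET_KEY, or the default if it is missing or malformed. */
    static long offsetFromMeta(Map<String, String> meta, long defaultOffsetMs) {
        String raw = meta.get(OFFSET_KEY);
        if (raw == null) {
            return defaultOffsetMs; // the parser did not set a value for this URL
        }
        try {
            return Long.parseLong(raw.trim());
        } catch (NumberFormatException e) {
            return defaultOffsetMs; // ignore values that are not valid numbers
        }
    }

    public static void main(String[] args) {
        Map<String, String> meta = new HashMap<>();
        meta.put(OFFSET_KEY, "3600000"); // one hour, as the parse step might record it
        System.out.println(offsetFromMeta(meta, 0L));
    }
}
```

Falling back to a default keeps the schedule usable for URLs whose pages did not yield the parameter during parsing.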
-
Re: Setting the Fetch time with a CustomFetchSchedule
Vikas Hazrati 2012-05-28, 11:55
Thanks Markus. What I understand from the code is that I should be able to extract and pass meta information from my ParsePlugin and access it in the custom fetch schedule, which extends AbstractFetchSchedule.

If I create a custom fetch class as

class CustomEventFetchScheduler extends AbstractFetchSchedule { ...}

how do I include this custom class as part of my crawl cycle? I understand that there is no extension point for this?

I get this -> Caused by: java.lang.RuntimeException: Plugin (12kdaggregator), extension point: org.apache.nutch.crawl.FetchSchedule does not exist.

Also, I could not successfully plug it in via nutch-site.xml by overriding nutch-default.xml:

<property>
<name>db.fetch.schedule.class</name>
<value>com.custom.CustomEventFetchScheduler</value>
</property>

How do I include my custom logic so that it gets picked up as part of the crawl cycle?

Regards | Vikas
-
Re: Setting the Fetch time with a CustomFetchSchedule
Vikas Hazrati 2012-05-29, 04:48
Anyone? Any idea on what could be going wrong? Is it possible to inject a custom fetch scheduler?
-
RE: Setting the Fetch time with a CustomFetchSchedule
Markus Jelsma 2012-05-29, 09:19
> If I create a custom fetch class as
>
> class CustomEventFetchScheduler extends AbstractFetchSchedule { ...}
>
> how do i include this custom class a part of my crawl cycle? I understand
> that there is no extension point for this?

Indeed, there is no extension point, so you cannot make a nice plugin. What you can do is make sure the class is on the classpath and simply tell the scheduler to use it via db.fetch.schedule.class; that should work just fine.
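
Why the classpath matters can be shown with a small plain-Java sketch of loading an implementation by its configured class name. This is only an illustration of the general technique, under the assumption that the configured name is resolved reflectively; `ScheduleLoaderSketch`, `Schedule` and `DefaultSchedule` are hypothetical names and not the Nutch API.

```java
// Plain-Java sketch: resolve an implementation from a class name, as a config value would supply it.
class ScheduleLoaderSketch {

    /** Stand-in for a schedule interface; not the Nutch FetchSchedule API. */
    interface Schedule {
        String name();
    }

    /** Stand-in implementation that is known to be on the classpath. */
    static class DefaultSchedule implements Schedule {
        public String name() {
            return "default";
        }
    }

    /** Loads and instantiates the named class; fails if it is not on the classpath. */
    static Schedule load(String className) throws ReflectiveOperationException {
        Class<?> impl = Class.forName(className);
        return (Schedule) impl.getDeclaredConstructor().newInstance();
    }

    public static void main(String[] args) throws Exception {
        // A class that is on the classpath loads fine.
        System.out.println(load(DefaultSchedule.class.getName()).name());
        try {
            // The class name from the thread is not available in this sketch.
            load("com.custom.CustomEventFetchScheduler");
        } catch (ClassNotFoundException e) {
            System.out.println("not on classpath: " + e.getMessage());
        }
    }
}
```

Setting the property alone is therefore not enough: the named class also has to be visible to the class loader that performs the lookup.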
-
Re: Setting the Fetch time with a CustomFetchSchedule
Vikas Hazrati 2012-05-29, 10:58
Thanks Markus, I will try with the classpath. I believe I did try that

> <property>
> <name>db.fetch.schedule.class</name>
> <value>com.custom.CustomEventFetchScheduler</value>
> </property>

but I will give it another try and let the group know...
-
Re: Setting the Fetch time with a CustomFetchSchedule
Vikas Hazrati 2012-06-01, 17:46
Ok, the class gets called after I include it as a part of the classpath.
Thanks