Extension of AdaptiveFetchSchedule that allows more flexible configuration
of the DEC and INC factors for different MIME types.
This class is typically useful when a recrawl covers many different MIME types.
Documents with MIME types other than text/html rarely change frequently. This class
lets you configure different factors per MIME type so as to favor frequently changing
MIME types over others.
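To illustrate how per-MIME-type factors steer the interval, here is a minimal sketch that assumes the usual adaptive rule: shrink the interval by DEC when the page changed, grow it by INC when it did not, and clamp the result to configured bounds. The class `MimeIntervalSketch`, its `Factors` holder, and the default-fallback behavior are hypothetical illustrations, not Nutch's API.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch of per-MIME-type INC/DEC factors; not Nutch's API.
 * Intervals are in seconds.
 */
final class MimeIntervalSketch {

  /** INC and DEC factors for one MIME type. */
  static final class Factors {
    final float inc;
    final float dec;

    Factors(float inc, float dec) {
      this.inc = inc;
      this.dec = dec;
    }
  }

  private final Map<String, Factors> byMime = new HashMap<>();
  private final Factors defaults;
  private final float minInterval;
  private final float maxInterval;

  MimeIntervalSketch(Factors defaults, float minInterval, float maxInterval) {
    this.defaults = defaults;
    this.minInterval = minInterval;
    this.maxInterval = maxInterval;
  }

  /** Registers factors for one MIME type (hypothetical helper). */
  void setFactors(String mimeType, float inc, float dec) {
    byMime.put(mimeType, new Factors(inc, dec));
  }

  /** Shrinks the interval when the page changed, grows it otherwise. */
  float nextInterval(String mimeType, float interval, boolean modified) {
    // MIME types without their own entry use the default factors.
    Factors f = byMime.getOrDefault(mimeType, defaults);
    float next = modified ? interval * (1.0f - f.dec) : interval * (1.0f + f.inc);
    // Keep the result within the configured bounds.
    return Math.max(minInterval, Math.min(maxInterval, next));
  }
}
```

With a large DEC for text/html and a large INC for images, changed HTML pages are revisited sooner while unchanged images drift toward the maximum interval.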
To work, this class relies on the Content-Type metadata key being present in the CrawlDB.
This can be achieved either by setting it when injecting new URLs, or by adding "Content-Type"
to the db.parsemeta.to.crawldb configuration setting, which forces the MIME types of newly
discovered URLs to be added to the CrawlDB.
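Because the schedule depends on the Content-Type key, a lookup must cope with a missing value. The sketch below models the CrawlDB metadata as a plain `Map<String, String>` for illustration rather than Nutch's own metadata type; the helper `ContentTypeLookup.mimeType` is hypothetical, and stripping parameters such as `; charset=UTF-8` and lower-casing are assumptions of this sketch. An empty result lets the caller fall back to default factors.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

/** Hypothetical helper: extracts a bare MIME type from page metadata. */
final class ContentTypeLookup {
  static final String CONTENT_TYPE_KEY = "Content-Type";

  /** Returns the bare MIME type, or empty if the key is absent or blank. */
  static Optional<String> mimeType(Map<String, String> metadata) {
    String raw = metadata.get(CONTENT_TYPE_KEY);
    if (raw == null) {
      return Optional.empty(); // key missing: caller should use defaults
    }
    // Drop parameters such as "; charset=UTF-8" (assumption of this sketch).
    int semi = raw.indexOf(';');
    String bare = (semi >= 0 ? raw.substring(0, semi) : raw)
        .trim()
        .toLowerCase(Locale.ROOT);
    return bare.isEmpty() ? Optional.empty() : Optional.of(bare);
  }
}
```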

Sets the fetchInterval and fetchTime on a
successfully fetched page. NOTE: this implementation resets the
retry counter; extending classes should call super.setFetchSchedule() to
preserve this behavior.
datum - page description to be adjusted. NOTE: this instance, passed by reference,
may be modified inside the method.
prevFetchTime - previous value of fetch time, or 0 if not available.
prevModifiedTime - previous value of modifiedTime, or 0 if not available.
fetchTime - the time at which the page was most recently re-fetched. Most FetchSchedule
implementations should update the fetch time in CrawlDatum to a value greater than this.
modifiedTime - the time the content was last modified. This information comes from
the protocol implementations, or is set to < 0 if not available. Most FetchSchedule
implementations should update the modified time in CrawlDatum to this value.
Returns the adjusted page information, including all original information. NOTE: this may
be a different instance than the datum passed in, but implementations should make sure that
it contains at least all information from that datum.
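The contract above can be sketched with simplified stand-ins. `SketchDatum`, `BaseSketchSchedule` and `FixedSketchSchedule` are hypothetical classes, not Nutch's CrawlDatum or FetchSchedule; the parameter list mirrors only the parameters documented here. The subclass calls super first to keep the retry-counter reset, schedules the next fetch strictly after `fetchTime`, adopts `modifiedTime` only when it is not negative (keeping the previous value otherwise is an assumption), and returns the same instance so no original information is lost.

```java
/** Hypothetical stand-in for CrawlDatum: times in ms, interval in s. */
class SketchDatum {
  long fetchTime;
  long modifiedTime;
  int fetchInterval;
  int retriesSinceFetch;
}

/** Hypothetical base schedule: a successful fetch resets the retry counter. */
class BaseSketchSchedule {
  SketchDatum setFetchSchedule(SketchDatum datum, long prevFetchTime,
      long prevModifiedTime, long fetchTime, long modifiedTime) {
    datum.retriesSinceFetch = 0;
    return datum;
  }
}

/** Hypothetical subclass that keeps the reset by calling super first. */
class FixedSketchSchedule extends BaseSketchSchedule {
  @Override
  SketchDatum setFetchSchedule(SketchDatum datum, long prevFetchTime,
      long prevModifiedTime, long fetchTime, long modifiedTime) {
    // Preserve the base behavior (retry counter reset).
    datum = super.setFetchSchedule(datum, prevFetchTime, prevModifiedTime,
        fetchTime, modifiedTime);
    // Next fetch strictly after the latest fetch; the interval is clamped to
    // at least 1 s so the new time is always greater than fetchTime.
    datum.fetchTime = fetchTime + Math.max(1, datum.fetchInterval) * 1000L;
    // Adopt the protocol's modified time only when available (>= 0);
    // keeping the previous value otherwise is an assumption of this sketch.
    datum.modifiedTime = modifiedTime >= 0 ? modifiedTime : prevModifiedTime;
    // Same instance returned, so all original information is retained.
    return datum;
  }
}
```

Returning the modified input instance is the simplest way to satisfy the requirement that the result contain all original information; an implementation that returns a new instance would have to copy every field explicitly.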