remi tassing 2012-07-03, 08:26
I have a similar problem and I'm planning to modify the parsing code. I
hope it works.
On Mon, Jul 2, 2012 at 2:10 PM, Alexander Aristov <
[EMAIL PROTECTED]> wrote:
> If you are referring to these links
> then these types of links cannot be processed and get discarded by url
> You need to live with it.
> Not sure if this idea has been discussed earlier but it would be
> browser in some way....
> Best Regards
> Alexander Aristov
> On 1 July 2012 22:14, arijit <[EMAIL PROTECTED]> wrote:
> > Hi,
> > links contain the meat of all information in this website. However, on
> > I have ensured the following:
> > nutch-site.xml contains parse-js in plugin.includes.
> > by plugin-id="parse-js"
> > regex-urlfilter.txt does not ignore js|JS - however, not sure this would
> > the following command:
> > http://districts.nic.in" does not result in the mentioned hrefs being
> > picked up as outlinks.
> > Any help in this regard is much appreciated.
> > -Arijit