Re: get rid of outlink code for Tika
Mattmann, Chris A 2011-12-21, 15:42
+1 from me -- those 3 Tika content handlers should take care of it...
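Here is a rough illustration of how the three handlers in the quoted message are meant to cooperate. There is one parse, and the SAX event stream is fanned out to a link collector and a text extractor. This is a JDK-only sketch and deliberately does not use Tika. `TeeHandler`, `LinkCollector` and `TextCollector` are hypothetical stand-ins for Tika's TeeContentHandler, LinkContentHandler and BoilerpipeContentHandler. The text collector does no boilerplate removal at all.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// JDK-only sketch of the tee idea; NOT Tika's API. The nested classes are
// hypothetical stand-ins for TeeContentHandler, LinkContentHandler and
// BoilerpipeContentHandler.
public class OutlinkTeeSketch {

    /** Forwards each SAX event to every wrapped handler, in order. */
    static class TeeHandler extends DefaultHandler {
        private final DefaultHandler[] handlers;

        TeeHandler(DefaultHandler... handlers) {
            this.handlers = handlers;
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts)
                throws SAXException {
            for (DefaultHandler h : handlers) h.startElement(uri, localName, qName, atts);
        }

        @Override
        public void endElement(String uri, String localName, String qName) throws SAXException {
            for (DefaultHandler h : handlers) h.endElement(uri, localName, qName);
        }

        @Override
        public void characters(char[] ch, int start, int length) throws SAXException {
            for (DefaultHandler h : handlers) h.characters(ch, start, length);
        }
    }

    /** Records the href of every <a> element; area and form are ignored. */
    static class LinkCollector extends DefaultHandler {
        final List<String> hrefs = new ArrayList<>();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            String href = atts.getValue("href");
            if ("a".equalsIgnoreCase(qName) && href != null) hrefs.add(href);
        }
    }

    /** Concatenates all character data; Boilerpipe would filter this instead. */
    static class TextCollector extends DefaultHandler {
        final StringBuilder text = new StringBuilder();

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }
    }

    /** Parses well-formed XHTML once, sending events to the given handler. */
    static void parse(String xhtml, DefaultHandler handler) {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xhtml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        LinkCollector links = new LinkCollector();
        TextCollector text = new TextCollector();
        parse("<html><body><p>Hello <a href=\"http://example.com/\">world</a></p></body></html>",
                new TeeHandler(links, text));
        System.out.println(links.hrefs); // [http://example.com/]
        System.out.println(text.text);   // Hello world
    }
}
```

Every handler sees the same events, so links and text come out of a single pass over the document. With Tika, the equivalent would be handing a TeeContentHandler to the parser instead of this hand-rolled tee.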
On Dec 21, 2011, at 6:51 AM, Markus Jelsma wrote:
> To use Boilerpipe we need LinkCH, BoilerpipeCH and TeeCH (LinkContentHandler,
> BoilerpipeContentHandler and TeeContentHandler) in Tika. LinkCH returns all
> URLs with some metadata, such as the title. Fixes for old parsers such as Neko
> then become obsolete.
> I propose to rely on Tika for all outlinks. Right now this means that not all
> link types are returned, e.g. area, form and whatever else. Is this a big
> problem? The rel attribute is also not returned, but I patched Tika to return
> it, so we can still handle nofollow, which is important.
> Markus Jelsma - CTO - Openindex
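On the nofollow point, assume the link handler exposes each outlink's rel attribute, which is what the Tika patch mentioned above is for. Deciding whether to follow a link then reduces to a token check. The sketch below is hypothetical: `Outlink`, `isNofollow` and `followable` are illustrative names, not Tika or Nutch API. It treats rel as a whitespace-separated, case-insensitive token list, as HTML does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch: drop outlinks whose rel attribute contains "nofollow".
// Outlink, isNofollow and followable are illustrative names, not Tika/Nutch API.
public class NofollowFilterSketch {

    /** Minimal outlink holder; rel may be null when the attribute is absent. */
    static final class Outlink {
        final String href;
        final String rel;

        Outlink(String href, String rel) {
            this.href = href;
            this.rel = rel;
        }
    }

    /** True if rel's whitespace-separated tokens include "nofollow", ignoring case. */
    static boolean isNofollow(String rel) {
        if (rel == null) return false;
        for (String token : rel.trim().split("\\s+")) {
            if (token.toLowerCase(Locale.ROOT).equals("nofollow")) return true;
        }
        return false;
    }

    /** Returns the links a crawler may follow, preserving input order. */
    static List<Outlink> followable(List<Outlink> links) {
        List<Outlink> kept = new ArrayList<>();
        for (Outlink link : links) {
            if (!isNofollow(link.rel)) kept.add(link);
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Outlink> links = List.of(
                new Outlink("http://a.example/", null),
                new Outlink("http://b.example/", "external NoFollow"),
                new Outlink("http://c.example/", "author"));
        for (Outlink link : followable(links)) System.out.println(link.href);
        // prints http://a.example/ then http://c.example/
    }
}
```

Matching whole tokens rather than substrings avoids false positives on values that merely contain the letters "nofollow" inside another word.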
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: [EMAIL PROTECTED]
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA