|
Arturo Beltran
2010-06-17, 14:39
Ken Krugler
2010-06-17, 16:25
Arturo Beltran
2010-06-21, 10:34
Ken Krugler
2010-06-21, 17:04
Arturo Beltran
2010-07-07, 11:25
Mattmann, Chris A
2010-07-07, 14:04
Arturo Beltran
2010-07-13, 10:28
Nick Burch
2010-07-13, 10:37
Arturo Beltran
2010-07-13, 10:54
Nick Burch
2010-07-13, 11:03
Mattmann, Chris A
2010-07-13, 14:01
Arturo Beltran
2010-07-14, 10:31
Arturo Beltran
2010-07-16, 11:17
Mattmann, Chris A
2010-07-16, 15:53
|
-
Getting started Arturo Beltran 2010-06-17, 14:39
Hi all,

Some of you already know that I'm working on a new parser (https://issues.apache.org/jira/browse/TIKA-443). After spending a whole day setting up an Eclipse workspace, I implemented the typical "hello world" class as a Tika Parser. My problem now is how to configure Tika so that it calls my new parser when it finds a file with a specific extension (e.g. *.shp). I read something about a configuration file (tika-config.xml), but I couldn't find it in the source code.

Greetings and thanks in advance,
Arturo

--
Arturo Beltran Fonollosa
Institute of New Imaging Technologies (INIT): http://www.init.uji.es
Geographic Information research group: http://www.geoinfo.uji.es
Universitat Jaume I, Avda. de Vicente Sos Baynat s/n
E-12071, Castellón, Spain
mailto: [EMAIL PROTECTED]
-
Re: Getting started Ken Krugler 2010-06-17, 16:25
Hi Arturo,

You first need to modify tika-core/src/main/resources/tika-mimetypes.xml. For example, something like this was done for mailbox files:

    <mime-type type="application/mbox">
      <sub-class-of type="text/plain"/>
      <glob pattern="*.mbox"/>
    </mime-type>

That maps the suffix to the mime-type.

Then you define the SUPPORTED_TYPES static class field in your parser class, which declares the mime-types it supports. For example, for MboxParser:

    public class MboxParser implements Parser {

        // The set of media types this parser claims to handle.
        private static final Set<MediaType> SUPPORTED_TYPES =
            Collections.singleton(MediaType.application("mbox"));

        // ... rest of the parser ...
    }

-- Ken

--------------------------------------------
<http://ken-blog.krugler.org>
+1 530-265-2225

--------------------------------------------
Ken Krugler
+1 530-210-6378
http://bixolabs.com
e l a s t i c   w e b   m i n i n g
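As an aside for readers of the archive: the "glob maps the suffix to the mime-type" step can be pictured with a small stand-alone model. This is only a sketch under stated assumptions. The class name GlobMimeLookup and its methods are made up for illustration, it handles only simple "*.ext" globs, and it is not how Tika itself implements detection.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Toy model of a glob-to-MIME-type lookup; hypothetical, not Tika's implementation. */
public class GlobMimeLookup {

    // Hypothetical registry: file-name suffix (".ext") -> MIME type.
    private final Map<String, String> suffixToType = new LinkedHashMap<>();

    /** Registers a glob of the simple form "*.ext" for the given MIME type. */
    public void addGlob(String glob, String mimeType) {
        if (!glob.startsWith("*.")) {
            throw new IllegalArgumentException("only *.ext globs are modelled: " + glob);
        }
        // Drop the leading '*' and keep ".ext" as the suffix to match.
        suffixToType.put(glob.substring(1).toLowerCase(Locale.ROOT), mimeType);
    }

    /** Returns the registered type for the file name, or a generic fallback. */
    public String detect(String fileName) {
        String lower = fileName.toLowerCase(Locale.ROOT);
        for (Map.Entry<String, String> entry : suffixToType.entrySet()) {
            if (lower.endsWith(entry.getKey())) {
                return entry.getValue();
            }
        }
        return "application/octet-stream";
    }

    public static void main(String[] args) {
        GlobMimeLookup lookup = new GlobMimeLookup();
        lookup.addGlob("*.mbox", "application/mbox");
        lookup.addGlob("*.shp", "application/shp");
        System.out.println(lookup.detect("comarques250.shp"));
        System.out.println(lookup.detect("notes.txt"));
    }
}
```

The point of the model is that this step only yields a type name. Which parser then handles that type is a separate question.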
-
Re: Getting started Arturo Beltran 2010-06-21, 10:34
Hi Ken,

First of all, thanks for your quick response. This is exactly what I'm doing, but although Tika recognizes the new MIME type, my new parser is not called.

I added this to tika-mimetypes.xml:

    <mime-type type="application/shp">
      <!--sub-class-of type="application/octet-stream"/-->
      <glob pattern="*.shp"/>
    </mime-type>

I created a new class, GeoParser:

    public class GeoParser implements Parser {

        private static final Set<MediaType> SUPPORTED_TYPES =
            Collections.singleton(MediaType.application("shp"));
        public static final String SHP_MIME_TYPE = "application/shp";

        public Set<MediaType> getSupportedTypes(ParseContext context) {
            return SUPPORTED_TYPES;
        }

        public void parse(
                InputStream stream, ContentHandler handler,
                Metadata metadata, ParseContext context)
                throws IOException, SAXException, TikaException {

            metadata.set(Metadata.CONTENT_TYPE, SHP_MIME_TYPE);
            metadata.set("Hello", "World");

            System.out.println("HELLO WORLD");
            System.err.println("ERR Hello world");

            XHTMLContentHandler xhtml = new XHTMLContentHandler(handler, metadata);
            xhtml.startDocument();
            xhtml.endDocument();
        }
        ...
    }

And this is the result:

    Content-Length: 755072
    Content-Type: application/shp
    resourceName: comarques250.shp

I don't know what exactly is failing, but I can't make it work.

Greetings and thanks in advance for your help.
Arturo
-
Re: Getting started Ken Krugler 2010-06-21, 17:04
Are you sure your new parser is on the classpath?

For example, put a breakpoint on getSupportedTypes() and make sure that it's getting called. If not, then the parser isn't being "found" by Tika.

-- Ken
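For readers who prefer a quick check over a debugger session, the classpath question can also be answered with plain Java reflection. The following is a minimal sketch: ClasspathCheck is a made-up helper, and "org.example.GeoParser" is a placeholder, since the thread does not give the parser's package name.

```java
public class ClasspathCheck {

    /** Returns true if the named class can be located by this class's class loader. */
    public static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers.
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Placeholder class name; pass the real fully qualified name as the first argument.
        String name = args.length > 0 ? args[0] : "org.example.GeoParser";
        System.out.println(name + " on classpath: " + isOnClasspath(name));
    }
}
```

Note that a positive result only shows that the class is loadable. It does not show that a framework has registered the class or will ever call it.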
-
Re: Getting started Arturo Beltran 2010-07-07, 11:25
Hi,

I still have the same problem. I think everything is in order: I run "mvn install" and my new class is included in the generated JAR, but it is never called. It should be very simple, and I feel a little silly. I don't know how to get Tika to find my new parser.

Thanks in advance,
Arturo
-
Re: Getting started Mattmann, Chris A 2010-07-07, 14:04
Hi Arturo,

How exactly are you calling your parser? Are you using the AutoDetectParser? If so, can you put some print statements in the public void parse(...) method of CompositeParser? Specifically, add a line right after:

    Parser parser = getParser(metadata);
    // print out the returned parser
    System.out.println("Parser returned is: [" + parser.getClass().getName() + "]");

What does that return?

Also, have you done the work to map your incoming document type in the tika-mimetypes.xml file? That is, if you're using AutoDetectParser or anything that extends CompositeParser, the MIME type of the incoming document is used to determine which parser gets called. Is the MIME type being detected appropriately? You can check this by putting a println right before getParser in the parse(...) method:

    // print the mime type
    System.out.println("The MIME type is: [" + metadata.get(Metadata.CONTENT_TYPE) + "]");
    Parser parser = getParser(metadata);

What does that print out?

Finally, if both of these printlns check out, you should make sure that your new parser is correctly mapped to the media type it supports; in other words, what Ken said earlier in this thread. Does your parser declare that it supports your expected MIME type?

Let me know and thanks!

Cheers,
Chris

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: [EMAIL PROTECTED]
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
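The idea that a composite parser uses the detected MIME type to choose which parser gets called can be sketched with a small stand-alone model. Everything here is hypothetical: DispatchSketch, TypedHandler and the handler names are invented for illustration, and the real CompositeParser may differ in its details. The sketch only shows why a correctly detected type is not enough if no handler is registered for it.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy model of MIME-type-based dispatch; hypothetical names, not Tika's API. */
public class DispatchSketch {

    /** Minimal stand-in for a parser that declares the types it supports. */
    public interface TypedHandler {
        Set<String> supportedTypes();
        String name();
    }

    private final Map<String, TypedHandler> byType = new HashMap<>();
    private final TypedHandler fallback;

    public DispatchSketch(TypedHandler fallback) {
        this.fallback = fallback;
    }

    /** Registers the handler under every type it declares. */
    public void register(TypedHandler handler) {
        for (String type : handler.supportedTypes()) {
            byType.put(type, handler);
        }
    }

    /** Picks the handler for a detected type, or the fallback if none is registered. */
    public TypedHandler select(String detectedType) {
        return byType.getOrDefault(detectedType, fallback);
    }

    /** Small factory so the example stays short. */
    public static TypedHandler handler(final String name, String... types) {
        final Set<String> supported = new HashSet<>(Arrays.asList(types));
        return new TypedHandler() {
            @Override public Set<String> supportedTypes() { return supported; }
            @Override public String name() { return name; }
        };
    }

    public static void main(String[] args) {
        DispatchSketch dispatcher = new DispatchSketch(handler("EmptyHandler"));
        // The type is detected correctly, but nothing is registered for it yet.
        System.out.println(dispatcher.select("application/shp").name());
        dispatcher.register(handler("GeoHandler", "application/shp"));
        System.out.println(dispatcher.select("application/shp").name());
    }
}
```

In this model, a handler that exists on the classpath but was never passed to register() is simply invisible to select(), whatever types it declares.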
-
Re: Getting started Arturo Beltran 2010-07-13, 10:28
Hi Chris and all,

On 07/07/2010 16:04, Mattmann, Chris A (388J) wrote:
> How exactly are you calling your parser? Are you using the AutoDetectParser?

I'm calling my parser through the bundled tika-app, so I think I'm using AutoDetectParser.

> Also, have you done the work to map your incoming document type in the tika-mimetypes.xml file?

Yes, sure.

> Is the mime type being detected appropriately?

Yes, it returns "application/shp".

> Does your parser declare that it supports your expected MIME type?

Yes, I declared this MIME type in my parser, but the getSupportedTypes(context) function is never called.

I uploaded a file with the Tika source code that includes my modified tika-mimetypes.xml file and my new parser, GeoParser.java. Perhaps one of you can try it and find out where I went wrong. Here is the link: http://elcano.dlsi.uji.es/arturo/tika_geo.zip

Greetings and thanks in advance for your help,
Arturo
-
Re: Getting started Nick Burch 2010-07-13, 10:37
On Tue, 13 Jul 2010, Arturo Beltran wrote:
> I'm calling my parser using the Tika-app included, so I think I'm using
> AutoDetectParser.

You have to explicitly tell the AutoDetectParser to try your parser, in addition to the mime type definition.

List your new parser in:

    tika-parsers/src/main/resources/META-INF/services/org.apache.tika.parser.Parser

and I think it should then be picked up.

Nick
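The file path above follows the standard Java service-provider convention: a resource under META-INF/services named after an interface, holding one fully qualified implementation class name per line. The sketch below shows how such registrations can be listed with java.util.ServiceLoader. It is an illustration only. ProviderListing and FormatHandler are made-up names, and the thread does not say which lookup mechanism the AutoDetectParser uses internally, so treat the use of ServiceLoader here as an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

public class ProviderListing {

    /** Hypothetical service interface standing in for a parser interface. */
    public interface FormatHandler {
        String formatName();
    }

    /** Returns the class names of all providers registered for the given service type. */
    public static <S> List<String> providerNames(Class<S> service) {
        List<String> names = new ArrayList<>();
        // ServiceLoader reads the META-INF/services/<service name> resources
        // visible to the class loader and instantiates each listed class.
        for (S provider : ServiceLoader.load(service)) {
            names.add(provider.getClass().getName());
        }
        return names;
    }

    public static void main(String[] args) {
        // No provider file is shipped for FormatHandler in this sketch,
        // so an implementation would stay undiscovered until one is added.
        System.out.println(providerNames(FormatHandler.class));
    }
}
```

With a registration file in place, an implementation class listed in it would show up in the returned list without any other code change.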
-
Re: Getting started Arturo Beltran 2010-07-13, 10:54
That was my "big" problem all this time; I almost went crazy. Now it works perfectly, thank you very much for your help.

It might be worth writing a small manual, "How to create a new Tika Parser for Dummies", simply covering the three steps that I have finally figured out (new Parser, tika-mimetypes.xml, list the new parser).

Greetings and thanks, Nick; it has been a great help.
-
Re: Getting started Nick Burch 2010-07-13, 11:03
On Tue, 13 Jul 2010, Arturo Beltran wrote:
> It might be interesting to write a small manual: "How to create a new Tika
> Parser for Dummies". Simply including the three steps that I have finally
> figured out (new Parser, tika-mimetypes.xml, list the new parser).

The third step is only needed if you want to use the auto-detect parser. If you select the correct parser a different way, it isn't needed.

It sounds like a very helpful short document though. The wiki is at http://wiki.apache.org/tika/ if you fancy writing it up :)

Nick
-
Re: Getting started Mattmann, Chris A 2010-07-13, 14:01
Thanks Nick, and thanks Arturo for the offer to write a small guide to getting started with parsing. It might be good to create a JIRA issue for this. Arturo, can you head over to JIRA and create an issue to contribute a "get Tika parsing up and running in 5 minutes" quick start guide? Then you could write the guide in APT format (see [1] for an example and [2] for more detailed information), add your new guide file to your local SVN checkout, create a patch, and attach it to your new issue. I'd be happy to get it into the documentation sources.

Thanks!

Cheers,
Chris

[1] http://svn.apache.org/repos/asf/tika/trunk/src/site/apt/formats.apt
[2] http://maven.apache.org/doxia/references/apt-format.html
-
Re: Getting started Arturo Beltran 2010-07-14, 10:31
No problem, I'll do it.

--
Arturo Beltran Fonollosa
Geographic Information research group: http://www.geoinfo.uji.es
Centro de Visualización Interactiva (CeVI) http://www.cevi.uji.es
Departamento de Lenguajes y Sistemas Informáticos (LSI)
Universitat Jaume I, Avda. de Vicente Sos Baynat s/n
E-12071, Castellón, Spain
mailto: [EMAIL PROTECTED]
-
Re: Getting started Arturo Beltran 2010-07-16, 11:17
The guide is ready.

It is attached at: https://issues.apache.org/jira/browse/TIKA-464

Greetings and have a nice weekend,
Arturo
-
Re: Getting started Mattmann, Chris A 2010-07-16, 15:53
Hi Arturo,
Working on committing it right now, thanks!

Cheers,
Chris

On 7/16/10 4:17 AM, "Arturo Beltran" <[EMAIL PROTECTED]> wrote:

The guide is ready. It can be found attached at: https://issues.apache.org/jira/browse/TIKA-464

Greetings and have a nice weekend
Arturo

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: [EMAIL PROTECTED]
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
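For anyone landing on this thread later: the third step Nick describes (listing the parser in META-INF/services/org.apache.tika.parser.Parser) relies on a plain-text file. The sketch below assumes that file follows the standard Java service-provider format, i.e. one fully qualified class name per line with "#" starting a comment. It only shows how such a file body reads; it is not Tika's own loading code. The class ServiceFileSketch, the method providerNames and the entry org.example.shp.ShapefileParser are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class ServiceFileSketch {

    /**
     * Extracts provider class names from the text of a service-provider file,
     * assuming the standard format: one class name per line, '#' starts a comment.
     */
    public static List<String> providerNames(String fileBody) {
        List<String> names = new ArrayList<>();
        for (String line : fileBody.split("\n")) {
            int hash = line.indexOf('#');
            if (hash >= 0) {
                line = line.substring(0, hash); // drop the comment part
            }
            line = line.trim();
            if (!line.isEmpty()) {
                names.add(line);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Illustrative file body; the second entry is a hypothetical new parser.
        String body = "# parsers tried by AutoDetectParser\n"
                + "org.apache.tika.parser.mbox.MboxParser\n"
                + "org.example.shp.ShapefileParser  # hypothetical new parser\n";
        for (String name : providerNames(body)) {
            System.out.println(name);
        }
    }
}
```

In practice the only change needed for a new parser is the extra line in that file, next to the mime-type entry in tika-mimetypes.xml and the parser class itself.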