-
using tika with eclipse (Kevin Milburn, 2012-07-05, 17:10)
Hi

I've been trying to add Tika 1.1 support to an Eclipse RCP application but am struggling to get the parsers loaded.
I have both the tika-core-1.1.jar and tika-bundle-1.1.jar plugins added to the target and selected within the product, and have confirmed both plugins are present in the running program.

The fundamental problem appears to be that TikaConfig ultimately reaches ServiceLoader.findServiceResources, looking for META-INF/services/org.apache.tika.parser.Parser. While doing so, it only appears to check the org.apache.tika.core plugin, which doesn't contain that file, so no Parsers are available.

Any ideas where I may have gone wrong or how to get it working?

TIA
Kevin.
-
Re: using tika with eclipse (Jukka Zitting, 2012-07-05, 17:22)
Hi,

On Thu, Jul 5, 2012 at 7:10 PM, Kevin Milburn <[EMAIL PROTECTED]> wrote:
> Any ideas where I may have gone wrong or how to get it working?

In an OSGi environment Tika makes the Parser and Detector implementations available as OSGi services that tika-core then automatically picks up for use with things like AutoDetectParser and the Tika facade.

In 1.1 you need declarative services support for that to happen, which is probably why you don't see the parsers coming up in your deployment. You can either deploy Tika 1.1 with declarative services, or upgrade to the latest 1.2 SNAPSHOT where declarative services is no longer needed (see https://issues.apache.org/jira/browse/TIKA-896).

BR,

Jukka Zitting
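
A minimal plain-Java sketch of the symptom Kevin describes, assuming the lookup boils down to asking one class loader for the META-INF/services file. It uses no Tika or OSGi classes: a temporary directory stands in for the parser bundle, and the provider name written into it is hypothetical. A loader that can see the directory finds the file; a loader that cannot, which is roughly the position of tika-core's own bundle loader, finds nothing.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;

final class ServiceFileVisibility {

    // The resource name from the original report.
    static final String RESOURCE = "META-INF/services/org.apache.tika.parser.Parser";

    /** Counts how many copies of the services file the given class loader can see. */
    static int countVisible(ClassLoader loader) {
        try {
            return Collections.list(loader.getResources(RESOURCE)).size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Creates a throwaway directory that stands in for the parser bundle. */
    static URL fakeParserBundle() {
        try {
            Path root = Files.createTempDirectory("fake-tika-bundle");
            Path file = root.resolve(RESOURCE);
            Files.createDirectories(file.getParent());
            // Hypothetical provider class name, for illustration only.
            Files.write(file, "com.example.HypotheticalParser\n".getBytes(StandardCharsets.UTF_8));
            return root.toUri().toURL();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        URL bundle = fakeParserBundle();
        // This loader has the stand-in bundle on its search path.
        ClassLoader withBundle = new URLClassLoader(new URL[] {bundle}, null);
        // This loader has an empty search path, so the file is out of reach.
        ClassLoader withoutBundle = new URLClassLoader(new URL[0], null);
        System.out.println("visible with bundle:    " + countVisible(withBundle));
        System.out.println("visible without bundle: " + countVisible(withoutBundle));
    }
}
```

In an OSGi container each bundle gets its own class loader, so the second case is the normal one unless the services are published through the framework instead.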
-
RE: using tika with eclipse (Uwe Schindler, 2012-07-05, 17:26)
Do you have the JAR files in the classpath, or do you extract them and merge all class files and resources? The latter happens, e.g., if you ask Eclipse to create one uber-jar containing everything. The problem that then appears is that the META-INF files coming from the separate jar files overwrite each other. SPI relies on actual jar packages as deployment units.

If you only add the unmodified jar files to the classpath, this should work. The same applies, by the way, to Solr and Lucene 4.0, which also use SPI for their codec infrastructure.

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: [EMAIL PROTECTED]
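
The overwriting Uwe mentions can be sketched in plain Java. This is an illustration only, not how any particular packaging tool works: each inner list stands for the lines of one jar's copy of the same META-INF/services file, and the provider names are hypothetical. Keeping only the last copy loses providers, while concatenating the copies keeps them all.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class ServiceFileMerge {

    /** Naive merge: the copy from the last jar replaces all earlier copies. */
    static List<String> overwrite(List<List<String>> copies) {
        if (copies.isEmpty()) {
            return new ArrayList<>();
        }
        return new ArrayList<>(copies.get(copies.size() - 1));
    }

    /** SPI-friendly merge: keep every provider line from every jar, dropping duplicates. */
    static List<String> concatenate(List<List<String>> copies) {
        Set<String> merged = new LinkedHashSet<>();
        for (List<String> copy : copies) {
            merged.addAll(copy);
        }
        return new ArrayList<>(merged);
    }

    public static void main(String[] args) {
        // Hypothetical provider names; each inner list is one jar's services file.
        List<List<String>> copies = Arrays.asList(
                Arrays.asList("com.example.HypotheticalPdfParser"),
                Arrays.asList("com.example.HypotheticalOfficeParser"));
        System.out.println("overwrite:   " + overwrite(copies));
        System.out.println("concatenate: " + concatenate(copies));
    }
}
```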
-
Re: using tika with eclipse (Kevin Milburn, 2012-07-06, 13:59)
On 2012/07/05 18:26, Uwe Schindler wrote:
> Do you have the JAR files in the classpath, or do you extract them and merge all
> class files and resources? The latter happens, e.g., if you ask Eclipse to create
> one uber-jar containing everything. The problem that then appears is that
> the META-INF files coming from the separate jar files overwrite each
> other. SPI relies on actual jar packages as deployment units.

The JAR files (which are pulled from a Maven repository) have been added to the plugins section of the RCP product and are both loaded (i.e. on the app's classpath). The problem stems from tika-bundle not being on the classpath of the tika-core bundle.

I could repackage tika-core and tika-bundle into a single OSGi bundle, effectively replicating the bundle from before the 1.0 release. However, this would seem to defeat the purpose of the OSGi bundles provided by the Tika project.

Also, from what I can gather, SPI is the cause of the problem, as OSGi and SPI are largely incompatible.
-
Re: using tika with eclipse (Kevin Milburn, 2012-07-06, 15:00)
On 2012/07/05 18:22, Jukka Zitting wrote:
> upgrade to the latest 1.2 SNAPSHOT where declarative services is no
> longer needed (see https://issues.apache.org/jira/browse/TIKA-896).

I've built and installed the 1.2 SNAPSHOT, but it has made no difference. It still suffers from the same fundamental problem: the ServiceLoader (in tika-core) cannot find "META-INF/services/org.apache.tika.parser.Parser" (in tika-bundle).

Is there any guidance anywhere on how to set up an Eclipse RCP application to use the bundles?

Kevin.
-
Re: using tika with eclipse (Jukka Zitting, 2012-07-06, 15:14)
Hi,

On Fri, Jul 6, 2012 at 5:00 PM, Kevin Milburn <[EMAIL PROTECTED]> wrote:
> On 2012/07/05 18:22, Jukka Zitting wrote:
>> upgrade to the latest 1.2 SNAPSHOT where declarative services is no longer
>> needed (see https://issues.apache.org/jira/browse/TIKA-896).
>
> I've built and installed the 1.2 SNAPSHOT, but it has made no difference.

Hmm, do you start/activate the bundles after deploying them to the OSGi environment? I've seen some OSGi setups that only resolve bundles by default, which only makes the contained classes available, but doesn't start up the services provided by the bundles.

> It still suffers from the same fundamental problem: the ServiceLoader
> (in tika-core) cannot find "META-INF/services/org.apache.tika.parser.Parser"
> (in tika-bundle).

It's not supposed to. The tika-bundle should start up Parser and Detector services that tika-core will then access through the OSGi framework. As you mentioned, OSGi and SPI don't work that well together, which is why we're using the OSGi services when Tika gets deployed to an OSGi environment.

BR,

Jukka Zitting
-
Re: using tika with eclipse (Kevin Milburn, 2012-07-06, 16:27)
On 2012/07/06 16:14, Jukka Zitting wrote:
> The tika-bundle should start up Parser and Detector services that
> tika-core will then access through the OSGi framework.

OK, I've done a bit more debugging, and think I know where I've gone wrong. Having got a breakpoint in the right place, I can see that the Parser and Detector services are being generated correctly.

It appears my main mistake is trying to use Tika or TikaConfig, like every example I've found has done, which appears to be completely incompatible with using Tika in an OSGi environment! :(

e.g. the following produces no output, despite the file containing text.

    Tika tika = new Tika();
    System.out.print(tika.parseToString(new FileInputStream(xmlFile)));
-
Re: using tika with eclipse (Nick Burch, 2012-07-06, 16:34)
On Fri, 6 Jul 2012, Kevin Milburn wrote:
> It appears my main mistake is trying to use Tika or TikaConfig, like
> every example I've found has done, which appears to be completely
> incompatible with using Tika in an OSGi environment! :(
>
> e.g. the following produces no output, despite the file containing text.
>     Tika tika = new Tika();
>     System.out.print(tika.parseToString(new FileInputStream(xmlFile)));

Once you work out the appropriate incantation, any chance you could write something up for the Tika wiki about it? <http://wiki.apache.org/tika/>

(As you may have gathered, there aren't a lot of people using Tika with OSGi yet, so the trail you blaze can hopefully help others later!)

Cheers
Nick
-
Re: using tika with eclipse (Jukka Zitting, 2012-07-06, 16:43)
Hi,

On Fri, Jul 6, 2012 at 6:27 PM, Kevin Milburn <[EMAIL PROTECTED]> wrote:
> It appears my main mistake is trying to use Tika or TikaConfig, like
> every example I've found has done, which appears to be completely
> incompatible with using Tika in an OSGi environment! :(

That shouldn't be the case. What's the code you're using?

You'll want to make sure that both the tika-bundle and tika-core bundles are actually started/activated by the OSGi environment, as otherwise the relevant Activators that Tika uses to hook up with the available services won't get started. Adding a breakpoint or a System.out print to the o.a.t.config.TikaActivator class in tika-core and the o.a.t.parser.internal.Activator class in tika-parsers/-bundle should help make sure that these Activators really are being invoked by the OSGi environment.

> e.g. the following produces no output, despite the file containing text.
>     Tika tika = new Tika();
>     System.out.print(tika.parseToString(new FileInputStream(xmlFile)));

See the BundleIT test case inside the tika-bundle component. That's a pretty similar piece of code that works fine in an OSGi environment.

BR,

Jukka Zitting
-
Re: using tika with eclipse (Kevin Milburn, 2012-07-06, 17:47)
On 2012/07/06 17:43, Jukka Zitting wrote:
> You'll want to make sure that both the tika-bundle and tika-core
> bundles are actually started/activated by the OSGi environment, as
> otherwise the relevant Activators that Tika uses to hook up with the
> available services won't get started.

Bingo. Having spent much time on why the Parsers were not behaving, it's actually the tika-core bundle that is not activating.

Eclipse is a finicky beast: even if a bundle has an Activator, it won't be activated if the Bundle-ActivationPolicy is not set, unless the product is modified to explicitly auto-start the bundle.

Ideally, it would be preferable to set the Bundle-ActivationPolicy to lazy to allow Eclipse (and others?) to do the right thing without needless complication.

I've tested this by modifying the tika-core/pom.xml (see attached), and adding the following line:

      <Bundle-Activator>
        org.apache.tika.config.TikaActivator
      </Bundle-Activator>
    + <Bundle-ActivationPolicy>lazy</Bundle-ActivationPolicy>

Any chance of this for the 1.2 release?

Thanks for the help.
Kevin.

p.s. an alternative method of obtaining access to the Detector and Parser involves something like this in your own bundle's activator:

    import org.apache.tika.detect.Detector;
    import org.apache.tika.parser.Parser;
    ...
    @Override
    public void start(BundleContext context) throws Exception {
        super.start(context);
        detector = (Detector) context.getService(context.getServiceReference(Detector.class.getName()));
        parser = (Parser) context.getService(context.getServiceReference(Parser.class.getName()));
    }
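
One way to confirm that the activation policy actually made it into a built bundle is to read the header back from the manifest. The sketch below uses only java.util.jar.Manifest and parses sample manifest text inline. The sample text is illustrative, not copied from a real tika-core jar, and it assumes the pom instruction ends up as a manifest header of the same name.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.jar.Manifest;

final class ActivationPolicyCheck {

    /** Returns the Bundle-ActivationPolicy header value, or null if the header is absent. */
    static String activationPolicy(String manifestText) {
        try {
            Manifest manifest = new Manifest(
                    new ByteArrayInputStream(manifestText.getBytes(StandardCharsets.UTF_8)));
            return manifest.getMainAttributes().getValue("Bundle-ActivationPolicy");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Illustrative manifest text; a real manifest carries many more headers.
        String patched = "Manifest-Version: 1.0\n"
                + "Bundle-SymbolicName: org.apache.tika.core\n"
                + "Bundle-Activator: org.apache.tika.config.TikaActivator\n"
                + "Bundle-ActivationPolicy: lazy\n";
        String unpatched = "Manifest-Version: 1.0\n"
                + "Bundle-SymbolicName: org.apache.tika.core\n"
                + "Bundle-Activator: org.apache.tika.config.TikaActivator\n";
        System.out.println("patched:   " + activationPolicy(patched));
        System.out.println("unpatched: " + activationPolicy(unpatched));
    }
}
```

For a jar on disk, the same Manifest object can be obtained from java.util.jar.JarFile.getManifest() instead of parsing a string.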
-
Re: using tika with eclipse (Jukka Zitting, 2012-07-06, 21:31)
Hi,

On Fri, Jul 6, 2012 at 7:47 PM, Kevin Milburn <[EMAIL PROTECTED]> wrote:
> Eclipse is a finicky beast: even if a bundle has an Activator, it won't be
> activated if the Bundle-ActivationPolicy is not set, unless the product is
> modified to explicitly auto-start the bundle.

Interesting, I didn't know that.

> Ideally, it would be preferable to set the Bundle-ActivationPolicy to lazy
> to allow Eclipse (and others?) to do the right thing without needless
> complication.

Sounds like a good idea!

> I've tested this by modifying the tika-core/pom.xml (see attached), and
> adding the following line:
>
>       <Bundle-Activator>
>         org.apache.tika.config.TikaActivator
>       </Bundle-Activator>
>     + <Bundle-ActivationPolicy>lazy</Bundle-ActivationPolicy>
>
> Any chance of this for the 1.2 release?

Sure, I just committed it, see https://issues.apache.org/jira/browse/TIKA-951.

> p.s. an alternative method of obtaining access to the Detector and Parser
> involves something like this in your own bundle's activator:

The reason why we use ServiceTrackers instead is that we want to support deployments where new parser and detector services can be added or removed dynamically from the running system.

BR,

Jukka Zitting
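
The difference between a one-time lookup in start() and tracking services over time can be shown with a toy model. This is plain Java and deliberately not the OSGi API: ToyRegistry and its methods are hypothetical stand-ins, and the service names are made up. A snapshot taken once misses services registered later, while a live view keeps reflecting the registry.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

final class TrackingVersusSnapshot {

    /** Hypothetical stand-in for a service registry; not an OSGi type. */
    static final class ToyRegistry {
        private final List<String> services = new CopyOnWriteArrayList<>();

        void register(String name) {
            services.add(name);
        }

        void unregister(String name) {
            services.remove(name);
        }

        /** One-time lookup: a copy of whatever is registered right now. */
        List<String> snapshot() {
            return new ArrayList<>(services);
        }

        /** Tracker-style access: a read-only view that follows later changes. */
        List<String> liveView() {
            return Collections.unmodifiableList(services);
        }
    }

    public static void main(String[] args) {
        ToyRegistry registry = new ToyRegistry();
        registry.register("parser-A");

        List<String> snapshot = registry.snapshot();
        List<String> live = registry.liveView();

        // A second parser bundle is started after the lookups were made.
        registry.register("parser-B");

        System.out.println("snapshot sees:  " + snapshot);
        System.out.println("live view sees: " + live);
    }
}
```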
-
Re: using tika with eclipse (Kevin Milburn, 2012-07-16, 11:13)
On 2012/07/06 22:31, Jukka Zitting wrote:
> On Fri, Jul 6, 2012 at 7:47 PM, Kevin Milburn
> <[EMAIL PROTECTED]> wrote:
>> I've tested this by modifying the tika-core/pom.xml (see attached), and
>> adding the following line:
>>
>>       <Bundle-Activator>
>>         org.apache.tika.config.TikaActivator
>>       </Bundle-Activator>
>>     + <Bundle-ActivationPolicy>lazy</Bundle-ActivationPolicy>
>>
>> Any chance of this for the 1.2 release?
> Sure, I just committed it, see https://issues.apache.org/jira/browse/TIKA-951.
>

Thanks for that, I've tested the latest snapshot (and RC1) and things behave themselves a lot better.

It would be nice if the Tika and TikaConfig classes had greater awareness of the OSGi environment, as they currently perform redundant work trying to load the services files which they'll never find.

Thanks again
Kevin.

p.s. For those trying to get Tika to work in Eclipse, you need to do something along these lines.

1. Change the Target Definition (or create a new one)
   a. On the Definition tab, add the location of the tika-bundle and tika-core jars
   b. On the Content tab, make sure the core and bundle plugins are selected
   c. Set as Target Platform
2. In each plugin that needs Tika support, add org.apache.tika.core to the plugin's dependencies
3. Change the Product Configuration (or create a new one)
   a. On the Dependencies tab, add org.apache.tika.core and o.a.t.bundle
   b. On the Configuration tab, add o.a.t.bundle to the Start levels, and set Auto-Start to true
   c. On the Overview tab, test the product by launching a runtime instance of it
-
Re: using tika with eclipse (Jukka Zitting, 2012-07-16, 11:30)
Hi,

On Mon, Jul 16, 2012 at 1:13 PM, Kevin Milburn <[EMAIL PROTECTED]> wrote:
> It would be nice if the Tika and TikaConfig classes had greater awareness of
> the OSGi environment, as they currently perform redundant work trying to load
> the services files which they'll never find.

Note that there are cases where people embed the tika-core jar into a larger bundle that also comes with some of the parser libraries, or where a client bundle uses Tika with parser services loaded from the class loader of the client bundle. In such cases it's a good idea that the Java service provider mechanism is also used to load services.

And in any case the static service loading is a fairly cheap operation that's typically only done once during the lifetime of an application or a bundle.

BR,

Jukka Zitting
-
Re: using tika with eclipse (rodgersh, 2012-07-18, 21:04)
I have a very similar issue, but using Tika on Karaf vs. Eclipse.

I am using Tika v1.2 and Karaf v2.2.7 on Windows 7.

I have made an OSGi bundle that uses Tika and provides a getFileExtensionForMimeType(...) method. I have added an org/apache/tika/mime/custom-mimetypes.xml file to my src/main/resources directory. I have made a custom parser and added a META-INF/services/org.apache.tika.parser.Parser file that lists it (although I am not trying to use the custom parser yet).

When another bundle invokes this bundle's getFileExtensionForMimeType(...) method, it works for mime types that Tika supports by default, but it does not find the mime types in my custom-mimetypes.xml file. It's like this custom mime types file is not found by the OSGi container.

Any help is appreciated.

Here is my method's code:

    public String getFileExtensionForMimeType( String contentType ) throws MimeTypeException
    {
        //TikaConfig config = TikaConfig.getDefaultConfig();  // this did not work for custom mime types
        TikaConfig config = null;
        try
        {
            config = new TikaConfig( this.getClass().getClassLoader() );
        }
        catch ( IOException e )
        {
            logger.warn( "Error creating TikaConfig with ClassLoader", e );
            return null;
        }
        MimeTypes mimeTypes = config.getMimeRepository();
        String extension = null;
        try
        {
            MimeType mimeType = mimeTypes.forName( contentType );
            extension = mimeType.getExtension();
        }
        catch ( Exception e )
        {
            logger.warn( "Exception caught getting file extension for mime type " + contentType, e );
        }
        logger.debug( "mimeType = " + contentType + ", file extension = [" + extension + "]" );
        return extension;
    }

And here is my custom-mimetypes.xml file:

    <?xml version="1.0" encoding="UTF-8"?>
    <mime-info>
      <mime-type type="image/nitf">
        <alias type="image/ntf"/>
        <glob pattern="*.nitf"/>
      </mime-type>
    </mime-info>

I have verified my input is the "image/nitf" mime type.
This method worked when the input was "application/octet-stream": it returned ".bin".
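
Before looking at the container, it can help to rule out the XML file itself. The sketch below is a plain-JDK check, not part of Tika: it parses the mime-info text with the standard DOM parser and lists the glob patterns declared for a given type. The class and method names are hypothetical, and it says nothing about whether Tika or the OSGi container can actually see the file.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

final class MimeInfoGlobs {

    /** Returns the glob patterns declared for the given mime type in a mime-info document. */
    static List<String> globsFor(String xml, String mimeType) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            List<String> globs = new ArrayList<>();
            NodeList types = doc.getElementsByTagName("mime-type");
            for (int i = 0; i < types.getLength(); i++) {
                Element type = (Element) types.item(i);
                if (!mimeType.equals(type.getAttribute("type"))) {
                    continue;
                }
                NodeList globNodes = type.getElementsByTagName("glob");
                for (int j = 0; j < globNodes.getLength(); j++) {
                    globs.add(((Element) globNodes.item(j)).getAttribute("pattern"));
                }
            }
            return globs;
        } catch (Exception e) {
            // Malformed XML or parser configuration problems end up here.
            throw new IllegalStateException("could not parse mime-info XML", e);
        }
    }

    public static void main(String[] args) {
        // Same content as the custom-mimetypes.xml quoted in the message.
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<mime-info>"
                + "<mime-type type=\"image/nitf\">"
                + "<alias type=\"image/ntf\"/>"
                + "<glob pattern=\"*.nitf\"/>"
                + "</mime-type>"
                + "</mime-info>";
        System.out.println("globs for image/nitf: " + globsFor(xml, "image/nitf"));
    }
}
```

If the globs come back as expected, the file content is probably fine and the remaining suspect is resource visibility between bundles.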
-
Re: using tika with eclipse (Nick Burch, 2012-07-18, 22:41)
On Wed, 18 Jul 2012, rodgersh wrote:
> And here is my custom-mimetypes.xml file:
>
> <?xml version="1.0" encoding="UTF-8"?>
> <mime-info>
>   <mime-type type="image/nitf">
>     <alias type="image/ntf"/>
>     <glob pattern="*.nitf"/>
>   </mime-type>
> </mime-info>

I've no idea about OSGi, so I can't comment on what you need to do to have it look at your extra file. Hopefully one of our OSGi experts can help you with the appropriate incantation / jar file blessing / etc.

However, I do know about mimetypes in Tika, so I've fixed your problem that way - see TIKA-957. As of r1363160 Tika should now know about NITF files, and should have some mime magic for them (works on the few sample files I tried).

Nick