[metadata] Input on reorganization of Metadata interfaces
Joerg Ehrlich 2012-05-04, 13:43
Hi,
I wanted to start submitting patches for the following and would like your input on that:
Create one "Core Properties" interface for the Metadata class which contains just the keys for the properties which should be directly addressable through the Metadata class in the future. Those are all DublinCore plus copyright and a bit of other relevant stuff. Those keys will be the ones we have had before, like "Title", "Keywords", "Format", etc.
The keys will always link to properties of other namespace interfaces, like:
    String Title = DublinCore.Title.getName();
    String Author = DublinCore.Creator.getName();
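The aliasing idea can be sketched in plain Java. This is a minimal, self-contained sketch: `Property`, `DublinCore` and `CoreProperties` below are simplified stand-ins written for illustration, not Tika's actual classes, and the "dc:"-prefixed key values are an assumption. Constant names follow the email (`Title`, `Author`) rather than the usual Java UPPER_CASE style.

```java
// Hypothetical stand-in for a metadata property; not Tika's real class.
final class Property {
    private final String name;

    private Property(String name) {
        this.name = name;
    }

    // Factory for a simple text-valued property.
    static Property text(String name) {
        return new Property(name);
    }

    String getName() {
        return name;
    }
}

// Namespace interface: owns the property definitions (key values assumed).
interface DublinCore {
    Property Title = Property.text("dc:title");
    Property Creator = Property.text("dc:creator");
}

// Core interface: plain string keys that alias namespace properties,
// mirroring the two lines in the proposal.
interface CoreProperties {
    String Title = DublinCore.Title.getName();
    String Author = DublinCore.Creator.getName();
}

class CoreAliasDemo {
    public static void main(String[] args) {
        System.out.println(CoreProperties.Title);   // dc:title
        System.out.println(CoreProperties.Author);  // dc:creator
    }
}
```

Because the core keys are derived from the namespace properties, each key has a single definition: changing a Dublin Core property name in this sketch changes the matching core key with it.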
On a side note: for the DublinCore namespace, this version differs a bit from what is provided by TIKA-859. Instead of introducing a new DC_Creator property, I would keep the current Creator property in the Core interface; once the DublinCore interface is removed from the Metadata class, the core properties can easily alias the DC ones as shown above. I would provide a new patch for TIKA-859.
The keys of all other interfaces currently included in the Metadata class will either be removed to avoid conflicts with the Core interface, or be declared @Deprecated, with replacements offered by specific namespace interfaces. For example:
    MSOffice.Author -> removed, replaced by new CoreProperties.Author, which links to DublinCore.Creator
    MSOffice.Template -> kept, but declared deprecated and replaced by new OfficeOpenXMLExtended.Template
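The two migration paths in this example might look roughly like the following. This is a hypothetical sketch: the `Property` stand-in and the key values (for instance "extended-properties:Template") are assumptions for illustration, not the values used in Tika.

```java
// Hypothetical stand-in for a metadata property; not Tika's real class.
final class Property {
    private final String name;

    Property(String name) {
        this.name = name;
    }

    String getName() {
        return name;
    }
}

interface DublinCore {
    Property Creator = new Property("dc:creator");
}

// New namespace-specific interface (key prefix assumed for illustration).
interface OfficeOpenXMLExtended {
    Property Template = new Property("extended-properties:Template");
}

// Path 1: MSOffice.Author is removed; the core key aliases DublinCore.Creator.
interface CoreProperties {
    String Author = DublinCore.Creator.getName();
}

// Path 2: MSOffice.Template stays for now, but is marked deprecated.
interface MSOffice {
    /** @deprecated use {@link OfficeOpenXMLExtended#Template} instead. */
    @Deprecated
    String Template = "Template";
}

class MigrationDemo {
    @SuppressWarnings("deprecation")
    public static void main(String[] args) {
        System.out.println(CoreProperties.Author);                     // dc:creator
        System.out.println(MSOffice.Template);                         // Template
        System.out.println(OfficeOpenXMLExtended.Template.getName());  // extended-properties:Template
    }
}
```

Keeping the deprecated constant means existing callers still compile (with a deprecation warning), while the Javadoc points them at the namespace-specific replacement.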
In the long term, all interfaces except the core one should be removed from the Metadata class; otherwise we end up with tons of naming conflicts.
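The naming conflicts follow from how Java treats constants inherited from several interfaces: a class may inherit two fields with the same simple name, but any reference to that simple name is a compile-time error because it is ambiguous. A small sketch with hypothetical interfaces, where a Metadata-like class implements both:

```java
// Hypothetical namespace interface with a legacy key.
interface MSOffice {
    String Author = "Author";
}

// Hypothetical core interface whose Author aliases dc:creator.
interface CoreProperties {
    String Author = "dc:creator";
}

// A Metadata-like class that implements every namespace interface inherits
// both fields named Author. That compiles, but any reference to the simple
// name Author (inside the class, or as AllInOneMetadata.Author) is rejected
// by the compiler as ambiguous, so every use has to be qualified.
class AllInOneMetadata implements MSOffice, CoreProperties {
    static String[] authorKeys() {
        return new String[] { MSOffice.Author, CoreProperties.Author };
    }

    public static void main(String[] args) {
        for (String key : authorKeys()) {
            System.out.println(key);
        }
    }
}
```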
WDYT?
Regards,
Jörg
--- Jörg Ehrlich | Computer Scientist | XMP Technology | Adobe Systems | [EMAIL PROTECTED] | work: +49(40)306360
-
Re: [metadata] Input on reorganization of Metadata interfaces
Mattmann, Chris A 2012-05-04, 15:36
Hi Jörg,
On May 4, 2012, at 6:43 AM, Joerg Ehrlich wrote:
> Hi,
>
> I wanted to start submitting patches for the following and would like your input on that:
>
> Create one "Core Properties" interface for the Metadata class which contains just the keys for the properties which should be directly addressable through the Metadata class in the future. Those are all DublinCore plus copyright and a bit of other relevant stuff. Those keys will be the ones we have had before like "Title", "Keywords", "Format", etc.
> The keys will always link to properties of other namespace interfaces like:
> String Title = DublinCore.Title.getName();
> String Author = DublinCore.Creator.getName();
>
> On a side note: This version is a bit different for the DublinCore namespace to what is provided by TIKA-859. Instead of introducing a new DC_Creator property I would keep the current Creator property in the Core interface and by removing DublinCore interface from the Metadata class, the core property can easily alias the DC ones like above. I would provide a new patch for TIKA-859.
>
> The keys of all other interfaces currently included in the Metadata class will be either removed to avoid conflicts with the Core interface or declared @Deprecated and replacements will be offered by specific namespace interfaces.
> For example:
> MSOffice.Author -> removed, replaced by new CoreProperties.Author which links to DublinCore.Creator
> MSOffice.Template -> kept, but declared deprecated and replaced by new OfficeOpenXMLExtended.Template
>
> In the long term all interfaces except the core one should be removed from the Metadata class, otherwise we end up with tons of naming conflicts.
I'm OK with the code-level implications of that, but I will just have to scope out the patch and so forth.
Thanks for pushing this. I really appreciate your help here.
Cheers,
Chris
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: [EMAIL PROTECTED]
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
-
RE: [metadata] Input on reorganization of Metadata interfaces
Joerg Ehrlich 2012-05-08, 12:39
Hi Chris,
> I'm OK with the code-level implications of that, but I will just have to scope out the patch and so forth.
> Thanks for pushing this. I really appreciate your help here.
Sorry, I am not a native speaker: Does that mean you would like to see a patch of the proposed ideas and make a decision based on that?
Thanks Jörg
-
Re: [metadata] Input on reorganization of Metadata interfaces
Mattmann, Chris A 2012-05-08, 14:00
Hi Jörg,
On May 8, 2012, at 5:39 AM, Joerg Ehrlich wrote:
> Hi Chris,
>
>> I'm OK with the code-level implications of that, but I will just have to scope out the patch and so forth.
>> Thanks for pushing this. I really appreciate your help here.
>
> Sorry, I am not a native speaker: Does that mean you would like to see a patch of the proposed ideas and make a decision based on that?
No problem at all. Yep, I was suggesting that seeing a patch would help make concrete some of these abstract ideas we're talking about, and I think it would help to drive where we're going.
Thanks, again.
Cheers,
Chris
-
Re: [metadata] Input on reorganization of Metadata interfaces
Nick Burch 2012-05-04, 14:09
On Fri, 4 May 2012, Joerg Ehrlich wrote:
> Create one "Core Properties" interface for the Metadata class which
> contains just the keys for the properties which should be directly
> addressable through the Metadata class in the future. Those are all
> DublinCore plus copyright and a bit of other relevant stuff. Those keys
> will be the ones we have had before like "Title", "Keywords", "Format",
> etc.
>
> The keys will always link to properties of other namespace interfaces like:
> String Title = DublinCore.Title.getName();
> String Author = DublinCore.Creator.getName();
Won't that break existing parsers and consumers, though? Title will suddenly change from being "title" to "dc:title", won't it?
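The breakage can be illustrated with a plain `Map` standing in for the `Metadata` object (an assumption made for brevity; `Keys` and `KeyRenameDemo` are hypothetical names). A consumer that looks up the literal string "title" stops finding the value once the key becomes "dc:title", while one that goes through the constant keeps working after recompiling:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical key constant after the proposed change ("title" -> "dc:title").
final class Keys {
    static final String TITLE = "dc:title";
}

class KeyRenameDemo {
    // A plain Map stands in for Tika's Metadata; the parser writes via the constant.
    static Map<String, String> parse() {
        Map<String, String> metadata = new HashMap<>();
        metadata.put(Keys.TITLE, "Quarterly Report");
        return metadata;
    }

    public static void main(String[] args) {
        Map<String, String> metadata = parse();
        // Consumer hard-coding the old string value: silently finds nothing.
        System.out.println(metadata.get("title"));     // null
        // Consumer using the constant: follows the new value once recompiled.
        System.out.println(metadata.get(Keys.TITLE));  // Quarterly Report
    }
}
```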
Nick
-
RE: [metadata] Input on reorganization of Metadata interfaces
Joerg Ehrlich 2012-05-04, 14:56
On Fri, 4 May 2012, Joerg Ehrlich wrote:
>> Create one "Core Properties" interface for the Metadata class which
>> contains just the keys for the properties which should be directly
>> addressable through the Metadata class in the future. Those are all
>> DublinCore plus copyright and a bit of other relevant stuff. Those
>> keys will be the ones we have had before like "Title", "Keywords",
>> "Format", etc.
>>
>> The keys will always link to properties of other namespace interfaces like:
>> String Title = DublinCore.Title.getName();
>> String Author = DublinCore.Creator.getName();
> Won't that break existing parsers and consumers though? As Title will suddenly change from being "title" to "dc:title", won't it?
If they are not using the Tika constants themselves but their values instead, then yes.
Thinking about it, I am actually not sure whether we really need to have the prefixes in the names anymore if the new keys are properties instead of strings. Then we could implement other means to identify the namespace for a property, by storing it in the property for example :)
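One way the namespace could live in the property rather than in the key string is sketched below. `NamespacedProperty` is a hypothetical class, not part of Tika; the Dublin Core element set namespace URI is the standard one. The name carries no prefix, and identity comes from namespace URI plus name:

```java
import java.util.Objects;

// Hypothetical property that stores its namespace separately from its name,
// so the name itself needs no prefix. Not part of Tika.
final class NamespacedProperty {
    private final String namespaceUri;
    private final String name;

    NamespacedProperty(String namespaceUri, String name) {
        this.namespaceUri = Objects.requireNonNull(namespaceUri);
        this.name = Objects.requireNonNull(name);
    }

    String getNamespaceUri() {
        return namespaceUri;
    }

    String getName() {
        return name;
    }

    // Identity is namespace URI + name; no prefix is involved anywhere.
    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof NamespacedProperty)) {
            return false;
        }
        NamespacedProperty other = (NamespacedProperty) o;
        return namespaceUri.equals(other.namespaceUri) && name.equals(other.name);
    }

    @Override
    public int hashCode() {
        return Objects.hash(namespaceUri, name);
    }
}

interface DublinCore {
    String NAMESPACE = "http://purl.org/dc/elements/1.1/";
    NamespacedProperty Title = new NamespacedProperty(NAMESPACE, "title");
}

class NamespacedDemo {
    public static void main(String[] args) {
        System.out.println(DublinCore.Title.getName());          // title
        System.out.println(DublinCore.Title.getNamespaceUri());  // http://purl.org/dc/elements/1.1/
    }
}
```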
Jörg
-
RE: [metadata] Input on reorganization of Metadata interfaces
Nick Burch 2012-05-04, 21:34
On Fri, 4 May 2012, Joerg Ehrlich wrote:
>>> The keys will always link to properties of other namespace interfaces like:
>>> String Title = DublinCore.Title.getName();
>>> String Author = DublinCore.Creator.getName();
>
>> Won't that break existing parsers and consumers though? As Title will
>> suddenly change from being "title" to "dc:title", won't it?
>
> If they are not using the Tika constants themselves but their values
> instead, then yes.
That'll break things like Alfresco, then. (We do the mapping from Tika metadata to Alfresco metadata on the strings rather than on the Metadata constants, so it's more flexible and easier for users to extend.) I suspect Alfresco isn't the only consumer of Tika's metadata that does the same thing. Anything that uses tika-cli will likewise be string based, not Metadata-constant based.
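A string-keyed mapping of the kind described can be sketched as a lookup table from Tika key strings to the consumer's own names. This is hypothetical: it is not Alfresco's actual code or configuration format, and the target name "myapp:title" is made up. It shows both why such a mapping is easy for users to extend and why a renamed key breaks it silently rather than with an error:

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical string-keyed mapper: Tika key string -> consumer's own name.
class StringKeyMapper {
    private final Map<String, String> tikaToTarget = new LinkedHashMap<>();

    // Users extend the mapping by adding entries, e.g. from a config file.
    void addMapping(String tikaKey, String targetKey) {
        tikaToTarget.put(tikaKey, targetKey);
    }

    // Copies every mapped value that is present in the Tika metadata.
    Map<String, String> map(Map<String, String> tikaMetadata) {
        Map<String, String> result = new HashMap<>();
        for (Map.Entry<String, String> entry : tikaToTarget.entrySet()) {
            String value = tikaMetadata.get(entry.getKey());
            if (value != null) {
                result.put(entry.getValue(), value);
            }
        }
        return result;
    }
}

class StringKeyMapperDemo {
    public static void main(String[] args) {
        StringKeyMapper mapper = new StringKeyMapper();
        mapper.addMapping("title", "myapp:title");         // written against the old key

        Map<String, String> tikaMetadata = new HashMap<>();
        tikaMetadata.put("dc:title", "Quarterly Report");  // key after the proposed rename

        System.out.println(mapper.map(tikaMetadata));      // {} (nothing matches, no error)

        mapper.addMapping("dc:title", "myapp:title");      // user-side fix: map the new key
        System.out.println(mapper.map(tikaMetadata));      // {myapp:title=Quarterly Report}
    }
}
```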
> Thinking about it, I am actually not sure whether we really need to have
> the prefixes in the names anymore if the new keys are properties instead
> of strings. Then we could implement other means to identify the
> namespace for a property, by storing it in the property for example :)
I think the current keys that have a prefix are easier and cleaner to understand than the un-prefixed ones. If we're going to be basing the keys explicitly on a standard, I think we ought to make that explicit wherever we can, including in the key names. It will be a faff for people to change over, and for us to handle in the meantime, but I think if we're going to be making a change of this scale, we should take the chance to do it all properly.
Nick
-
RE: [metadata] Input on reorganization of Metadata interfaces
Joerg Ehrlich 2012-05-08, 12:35
-----Original Message-----
From: Nick Burch [mailto:[EMAIL PROTECTED]]
Sent: Friday, 4 May 2012 23:34
To: [EMAIL PROTECTED]
Subject: RE: [metadata] Input on reorganization of Metadata interfaces
On Fri, 4 May 2012, Joerg Ehrlich wrote:
>>>> The keys will always link to properties of other namespace interfaces like:
>>>> String Title = DublinCore.Title.getName();
>>>> String Author = DublinCore.Creator.getName();
>>
>>> Won't that break existing parsers and consumers though? As Title will
>>> suddenly change from being "title" to "dc:title", won't it?
>>
>> If they are not using the Tika constants themselves but their values
>> instead, then yes.
>
> That'll break things like Alfresco then. (We do the mapping from Tika metadata to Alfresco metadata on the strings, rather than by Metadata constants, so it's more flexible and easier for users to extend). I suspect Alfresco isn't the only consumer of Tika's metadata that does the same thing. Anything that uses tika-cli will likewise be string based, not Metadata Constant based
>
>> Thinking about it, I am actually not sure whether we really need to
>> have the prefixes in the names anymore if the new keys are properties
>> instead of strings. Then we could implement other means to identify
>> the namespace for a property, by storing it in the property for
>> example :)
>
> I think the current ones that have a prefix are easier and cleaner to understand than the un-prefixed ones. If we're going to be basing the keys explicitly on a standard, I think we ought to make that explicit wherever we can, including in the key names. It will be a faff for people to change over, and for us to handle in the mean time, but I think if we're going to be making a change of this scale we should take the chance to do it all properly
I am not sure whether putting prefixes into the strings is the proper way. As you said above, clients depend on those strings, and it is actually not a good thing to depend on namespace prefixes instead of the actual namespaces, because prefixes are just variables that anyone can choose as they like. The only reason I did not touch the prefix concept was that I just didn't want to change yet another part of Tika :) But if you ask me, I would try to keep the prefixes out of the names.
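The point that a prefix is only a local label for a namespace can be shown with the JDK's own `javax.xml.namespace.QName`, whose `equals` compares the namespace URI and local part and ignores the prefix. This is a sketch for illustration only: the alternative prefix "dublincore" is made up, `prefixedKey` is a hypothetical helper, and nothing here reflects how Tika itself stores keys.

```java
import javax.xml.namespace.QName;

class PrefixVsNamespaceDemo {
    // Dublin Core element set namespace URI.
    static final String DC_NS = "http://purl.org/dc/elements/1.1/";

    // Builds a "prefix:localName" string key, as prefixed key names do.
    static String prefixedKey(QName name) {
        return name.getPrefix() + ":" + name.getLocalPart();
    }

    public static void main(String[] args) {
        // The same Dublin Core "title", bound to two different prefixes.
        QName a = new QName(DC_NS, "title", "dc");
        QName b = new QName(DC_NS, "title", "dublincore");

        // As prefixed strings they look like two unrelated keys...
        System.out.println(prefixedKey(a) + " vs " + prefixedKey(b)); // dc:title vs dublincore:title
        System.out.println(prefixedKey(a).equals(prefixedKey(b)));    // false

        // ...but QName.equals compares namespace URI and local part only.
        System.out.println(a.equals(b));                              // true
    }
}
```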
What would you propose, then, for handling this transition, whether with prefixes or not?
Thanks Jörg