Re: improving odf / general questions on forms and deleted text
Ken Krugler 2010-09-25, 21:19
Hi Bart,
I know very little about ODF, so just some general comments below...

On Sep 25, 2010, at 7:56am, Hanssens Bart wrote:

> Hi,
>
> I'm planning to further improve the ODF support in Tika. A few
> questions though, that might also be useful for other formats:
>
> Should Tika parse deleted text ? XHTML has INS and DEL, but they are
> to be used where the content is removed / inserted, while ODF stores
> removed content at the very beginning of the document (so "fixing"
> this will hurt performance, not sure if that's worth it)
> It can also be very confusing for the end user if one gets a result
> for "removed", then again, it is somewhere in the document...

If the above is similar to what you get when tracking changes in say
Word, then I would argue for not including the text. My rule of thumb
would be that if the text doesn't appear in "normal" viewing mode
(whatever that means) using a typical app, then it's more confusing to
include it.

> Forms: most form elements in ODF can be mapped to their HTML
> counterparts, although I have to check if the result is always valid
> HTML (i.e., when ODF parent and form element are mapped to HTML, is
> the HTML form still allowed within the mapped parent)
> Should they be mapped to HTML forms in the first place ? Or just to
> div / span ?

I wouldn't worry about trying to map explicitly to HTML forms -
capturing the text is 99% of the value here, versus trying to maintain
greater logical consistency between ODF and XHTML.

-- Ken

--------------------------
Ken Krugler
+1 530-210-6378
http://bixolabs.com
e l a s t i c   w e b   m i n i n g
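
For what it's worth, here is a rough sketch of the "leave out what isn't shown in normal view" rule of thumb. It is not Tika code, just a small Python illustration. It assumes the removed content sits in a text:tracked-changes element near the start of office:text in content.xml, which matches Bart's description. The function name visible_text and the SAMPLE document are made up for the example.

```python
import xml.etree.ElementTree as ET

# ODF text namespace; the element names below are assumptions based on the ODF schema.
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
TRACKED_CHANGES = f"{{{TEXT_NS}}}tracked-changes"
BLOCK_TAGS = {f"{{{TEXT_NS}}}p", f"{{{TEXT_NS}}}h"}


def visible_text(content_xml: str) -> str:
    """Collect text from an ODF content.xml string, skipping tracked deletions."""
    root = ET.fromstring(content_xml)
    parts = []

    def walk(elem):
        if elem.tag == TRACKED_CHANGES:
            return  # deleted content is stored here; skip the whole subtree
        if elem.text:
            parts.append(elem.text)
        for child in elem:
            walk(child)
            if child.tail:
                parts.append(child.tail)  # text following the child belongs to elem
        if elem.tag in BLOCK_TAGS:
            parts.append("\n")  # keep adjacent paragraphs from running together

    walk(root)
    # Collapse all whitespace runs to single spaces.
    return " ".join("".join(parts).split())


# Hypothetical, heavily trimmed content.xml with one tracked deletion.
SAMPLE = """<office:document-content
    xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
    xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0">
  <office:body><office:text>
    <text:tracked-changes>
      <text:changed-region text:id="ct1">
        <text:deletion><text:p>removed</text:p></text:deletion>
      </text:changed-region>
    </text:tracked-changes>
    <text:p>Hello <text:change text:change-id="ct1"/>world</text:p>
  </office:text></office:body>
</office:document-content>"""

print(visible_text(SAMPLE))  # Hello world
```

Because the tracked-changes block comes before the body text, skipping it is a single tag check while walking the tree. Putting the deleted text back at its original position would be the expensive part, since each text:change marker would have to be matched to its changed-region.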