Problem extracting date from Outlook 2007 .msg file
Joe Wicentowski 2012-06-25, 20:14
Hi all,
Hello! This is my message to the list. I'm building an application that uses Tika to extract text from Outlook 2007 .msg files, among other things. While experimenting with some sample .msg files, I noticed that Tika is not returning the date of most messages.

For example, Outlook indicates that the following message was sent on "Fri 6/22/2012 8:11 AM", but no date appears in the HTML head or in the early portion of the body of the Tika output [1]. I retrieved this using Tika 1.1 on Windows XP with the following command:

    java -jar tika-app-1.1.jar "C:\Documents and Settings\wicentowskijc\Desktop\portal\outlook\RE Inquiry.msg" > inquiry.html

If anyone has suggestions for ensuring that the date can be preserved in Tika's output, I'd be grateful.

Thanks,
Joe

[1] Tika output showing no date

    <html xmlns="http://www.w3.org/1999/xhtml">
    <head>
    <meta name="Message-Bcc" content="" />
    <meta name="subject" content="Inquiry" />
    <meta name="Content-Length" content="40960" />
    <meta name="Message-Recipient-Address" content="[EMAIL PROTECTED]" />
    <meta name="Message-From" content="History Mailbox" />
    <meta name="Author" content="History Mailbox" />
    <meta name="Message-To" content="'Snip'" />
    <meta name="Message-Cc" content="" />
    <meta name="Content-Type" content="application/vnd.ms-outlook" />
    <meta name="resourceName" content="RE Inquiry.msg" />
    </head>
    <body>
    <h1>RE: Inquiry</h1>
    <dl>
    <dt>From</dt> <dd>History Mailbox</dd>
    <dt>To</dt> <dd>'Snip'</dd>
    <dt>Recipients</dt> <dd>[EMAIL PROTECTED]</dd>
    </dl>
    <p>Dear Snip</p>
    ...
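
In case it helps anyone reproduce the check across many files, here is a minimal sketch of how the output can be scanned for date-like metadata. It is not part of Tika: the class name MetaDateCheck, the regular expression, and the keyword list ("date", "created", "modified") are my own assumptions, and the pattern only handles meta tags written exactly in the name/content form shown in [1].

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of Tika): lists the <meta> entries in
// tika-app's XHTML output and flags any whose name looks date-related.
class MetaDateCheck {

    // Assumes tags shaped like <meta name="..." content="..." />, as in [1].
    private static final Pattern META = Pattern.compile(
            "<meta\\s+name=\"([^\"]*)\"\\s+content=\"([^\"]*)\"\\s*/?>");

    // Returns meta name -> content, in document order.
    static Map<String, String> metaTags(String xhtml) {
        Map<String, String> tags = new LinkedHashMap<>();
        Matcher m = META.matcher(xhtml);
        while (m.find()) {
            tags.put(m.group(1), m.group(2));
        }
        return tags;
    }

    // Returns the names that contain one of the assumed date keywords.
    static List<String> dateLikeNames(Map<String, String> tags) {
        List<String> hits = new ArrayList<>();
        for (String name : tags.keySet()) {
            String lower = name.toLowerCase(Locale.ROOT);
            if (lower.contains("date") || lower.contains("created")
                    || lower.contains("modified")) {
                hits.add(name);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // A few of the meta tags copied from the output in [1].
        String head =
                "<meta name=\"subject\" content=\"Inquiry\" />\n"
              + "<meta name=\"Message-From\" content=\"History Mailbox\" />\n"
              + "<meta name=\"Content-Type\" content=\"application/vnd.ms-outlook\" />\n";

        Map<String, String> tags = metaTags(head);
        System.out.println("meta names: " + tags.keySet());
        System.out.println("date-like names: " + dateLikeNames(tags));
    }
}
```

An empty list of date-like names for a given file would match what I am seeing in inquiry.html; a regex is enough here only because the head is small and regular, and a real XML parser would be the safer choice for anything more varied.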