Solr, mail # user - How to import data with a different date format


Re: How to import data with a different date format
Dennis Gearon 2010-09-08, 17:59
I'm doing something similar for dates/times/timestamps.

What I'm actually trying to do is find the appointments (date/time from/to pairs, i.e. timestamps) whose range contains 'now'.

Fairly simple search of:

   What items have a start time BEFORE now, and an end time AFTER now?
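
A minimal sketch of that test as a Solr filter query, assuming the timestamps are stored as Unix seconds in two hypothetical fields, `start_ts` and `end_ts`. The `field:[a TO b]` range syntax with `*` for an open end is standard Solr; the function and field names are made up for illustration.

```python
import time

def active_now_filter(now=None, start_field="start_ts", end_field="end_ts"):
    """Build a Solr filter query matching items whose range contains `now`.

    The field names are hypothetical; values are Unix timestamps in seconds.
    """
    if now is None:
        now = int(time.time())
    # start <= now AND end >= now (square brackets are inclusive bounds)
    return f"{start_field}:[* TO {now}] AND {end_field}:[{now} TO *]"

print(active_now_filter(now=1283968740))
# prints: start_ts:[* TO 1283968740] AND end_ts:[1283968740 TO *]
```

If the fields were real Solr date fields instead, the same shape should work with date math, e.g. `start:[* TO NOW] AND end:[NOW TO *]`.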

My thoughts were to store either:
  Unix timestamps as BIGINTs (64 bit), or
  "ISO_DATE ISO_TIME" strings

Which of the two is going to be faster for:
   1/ Indexing?
   2/ Searching?

How does the 'tint' field mentioned below apply?

Dennis Gearon

Signature Warning
----------------
EARTH has a Right To Life,
  otherwise we all die.

Read 'Hot, Flat, and Crowded'
Laugh at http://www.yert.com/film.php
--- On Wed, 9/8/10, Jonathan Rochkind <[EMAIL PROTECTED]> wrote:

> From: Jonathan Rochkind <[EMAIL PROTECTED]>
> Subject: Re: How to import data with a different date format
> To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
> Date: Wednesday, September 8, 2010, 10:27 AM
> Just throwing it out there, I'd
> consider a different approach for an actual real app,
> although it might not be easier to get up quickly. (For
> quickly, yeah, I'd just store it as a string, more on that
> at bottom).
>
> If none of your dates have times, they're all just full
> days, I'm not sure you really need the date type at all.
>
> Convert the date to an integer number of days since the
> epoch.  (Most languages will have a way to do this,
> but I don't know about pure XSLT).  Store _that_ in a
> Solr 1.4 'int' field.  On top of that, make it a "tint"
> (non-zero precisionStep) for faster range queries.
>
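
The days-since-epoch conversion described above could be sketched like this in Python. The Unix epoch (1970-01-01) is assumed as the reference day, and the helper names are hypothetical.

```python
from datetime import date, timedelta

# Assumption: "epoch" means the Unix epoch, 1970-01-01.
EPOCH = date(1970, 1, 1)

def to_epoch_days(d):
    """Whole days between the epoch and `d`, suitable for an int/tint field."""
    return (d - EPOCH).days

def from_epoch_days(n):
    """Inverse conversion, for displaying a stored value as a date."""
    return EPOCH + timedelta(days=n)

print(to_epoch_days(date(2010, 9, 8)))   # prints: 14860
print(from_epoch_days(14860))            # prints: 2010-09-08
```
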
> But now your actual interface will have to convert from
> "number of days since epoch" to a displayable date. (And if
> you allow user input, convert the input to
> number-of-days-since-epoch before making a range query or
> fq, but you'd have to do that anyway even with solr dates,
> users aren't going to be entering W3CDate raw, I don't
> think).
>
> That is probably the most efficient way to have solr handle
> it -- using an actual date field type gives you a lot more
> precision than you need, which is going to hurt performance
> on range queries. Which you can compensate for with trie
> date sure, but if you don't really need that precision to
> begin with, why use it?  Also the extra precision can
> end up doing unexpected things and making it easier to have
> bugs (range queries on that high precision stuff, you need
> to make sure your start date has 00:00:00 set and your end
> date has 23:59:59 set, to do what you probably expect). If
> you aren't going to use the extra precision, makes
> everything a lot simpler to not use a date field.
>
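
To illustrate the 00:00:00 / 23:59:59 point: a rough sketch of building a whole-day range query against a real date field. The field name `pub_date` and the helper are hypothetical; the timestamp layout (ISO 8601, UTC, trailing `Z`) is what Solr date fields expect.

```python
from datetime import date

def solr_day_range(field, first_day, last_day):
    """Range query covering every second from first_day through last_day."""
    # Pin the start to the first second of the day and the end to the last,
    # otherwise documents on the final day fall outside the range.
    start = f"{first_day.isoformat()}T00:00:00Z"
    end = f"{last_day.isoformat()}T23:59:59Z"
    return f"{field}:[{start} TO {end}]"

print(solr_day_range("pub_date", date(2010, 9, 1), date(2010, 9, 8)))
# prints: pub_date:[2010-09-01T00:00:00Z TO 2010-09-08T23:59:59Z]
```
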
> Alternately, for your "get this done quick" method, yeah,
> I'd just store it as a string. With a string exactly as
> you've specified, sorting and range queries won't work how
> you'd want.  But if you can make it a string of the
> format "yyyy/mm/dd" instead (always two-digit month and
> day), then you can even sort and do range queries on your
> string dates. For the quick and dirty prototype, I'd just do
> that.  In fact, while this might make range queries and
> sorting _slightly_ slower than if you use an int or a tint,
> this might really be good enough even for a real app (hey,
> it's what lots of people did before the trie-based fields
> existed).
>
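
A quick demonstration of why the zero padding matters for string dates: with fixed-width "yyyy/mm/dd" values, plain string order is also chronological order, and without the padding it is not. The sample dates are made up.

```python
from datetime import date

def sortable(d):
    """Format a date as a fixed-width yyyy/mm/dd string."""
    return d.strftime("%Y/%m/%d")

dates = [date(2010, 10, 2), date(2010, 9, 8), date(2009, 12, 31)]

padded = sorted(sortable(d) for d in dates)
unpadded = sorted(f"{d.year}/{d.month}/{d.day}" for d in dates)

print(padded)    # prints: ['2009/12/31', '2010/09/08', '2010/10/02']
print(unpadded)  # prints: ['2009/12/31', '2010/10/2', '2010/9/8']
```

In the unpadded list October sorts before September, because the comparison is character by character and '1' sorts before '9'.
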
> Jonathan
>
> Erick Erickson wrote:
> > I think Markus is spot-on given the fact that you have 2 days. Using a
> > string field is quickest.
> >
> > However, if you absolutely MUST have functioning dates, there are three
> > options I can think of:
> > 1> can you make your XSLT transform the dates? Confession; I'm XSLT-ignorant
> > 2> use DIH and DateFormatTransformer, see:
> > http://wiki.apache.org/solr/DataImportHandler#DateFormatTransformer
> >       you can walk a directory importing all the XML files with
> > FileDataSource.
> > 3> you could write a program to do this manually.
> >
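
For the third option (a program that converts the dates before indexing), a minimal sketch might look like the following. The source format is not shown in this message, so `"%d/%m/%Y"` is purely an assumed placeholder and the function name is hypothetical. The output is the ISO 8601 UTC form that Solr date fields accept.

```python
from datetime import datetime

def to_solr_date(raw, source_format="%d/%m/%Y"):
    """Reformat a date string into Solr's yyyy-MM-ddTHH:mm:ssZ form.

    `source_format` is an assumption; swap in the format your data uses.
    The parsed value is treated as already being in UTC.
    """
    parsed = datetime.strptime(raw, source_format)
    return parsed.strftime("%Y-%m-%dT%H:%M:%SZ")

print(to_solr_date("08/09/2010"))
# prints: 2010-09-08T00:00:00Z
```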