|
Rico Lelina
2010-09-08, 15:35
Markus Jelsma
2010-09-08, 16:34
Rico Lelina
2010-09-08, 17:03
Markus Jelsma
2010-09-08, 17:06
Erick Erickson
2010-09-08, 17:09
Rico Lelina
2010-09-08, 17:23
Rico Lelina
2010-09-08, 17:24
Markus Jelsma
2010-09-08, 17:26
Jonathan Rochkind
2010-09-08, 17:27
Jonathan Rochkind
2010-09-08, 17:29
Jonathan Rochkind
2010-09-08, 17:33
Dennis Gearon
2010-09-08, 17:59
Erick Erickson
2010-09-08, 18:28
Jonathan Rochkind
2010-09-08, 18:35
Dennis Gearon
2010-09-08, 19:01
Chris Hostetter
2010-09-08, 19:09
Jonathan Rochkind
2010-09-08, 22:07
Dennis Gearon
2010-09-08, 23:45
|
-
How to import data with a different date format
Rico Lelina 2010-09-08, 15:35
Hi,
I am attempting to import some of our data into SOLR. I did it the quickest way I know because I literally only have 2 days to import the data and do some queries for a proof-of-concept.

So I have this data in XML format and I wrote a short XSLT script to convert it to the format in solr/example/exampledocs (except I retained the element names, so I had to modify schema.xml in the conf directory). So far so good -- the import works and I can search the data.

One of my immediate problems is that there is a date field with the format MM/DD/YYYY. Looking at schema.xml, it seems SOLR accepts only full date fields -- everything seems to be mandatory, including the Z for Zulu/UTC time, according to the doc. Is there a way to specify the date format?

Thanks very much.
Rico
-
RE: How to import data with a different date format
Markus Jelsma 2010-09-08, 16:34
No. The DateField [1] will not accept it any other way. You could, however, fool your boss and dump your dates in an ordinary string field. But then you cannot use some of the nice date features.
[1]: http://lucene.apache.org/solr/api/org/apache/solr/schema/DateField.html
-
Re: How to import data with a different date format
Rico Lelina 2010-09-08, 17:03
That was my first thought :-) But it would be nice to be able to do date queries. I guess when I export the data I can just add 00:00:00Z.

Thanks.
-
RE: Re: How to import data with a different date format
Markus Jelsma 2010-09-08, 17:06
Your format (MM/DD/YYYY) is not compatible.
-
Re: How to import data with a different date format
Erick Erickson 2010-09-08, 17:09
I think Markus is spot-on given the fact that you have 2 days. Using a string field is quickest.

However, if you absolutely MUST have functioning dates, there are three options I can think of:
1> Can you make your XSLT transform the dates? Confession: I'm XSLT-ignorant.
2> Use DIH and DateFormatTransformer, see: http://wiki.apache.org/solr/DataImportHandler#DateFormatTransformer
   You can walk a directory importing all the XML files with FileDataSource.
3> You could write a program to do this manually.

But given the time constraints, I suspect your time would be better spent doing the other stuff and just using string as per Markus. I have no clue how SOLR-savvy you are, so pardon if this is something you already know. But lots of people trip up over the "string" field type, which is NOT tokenized. You usually want "text" unless it's some sort of ID.... So it might be worth it to do some searching earlier rather than later <G>....

Best
Erick
-
Re: Re: How to import data with a different date format
Rico Lelina 2010-09-08, 17:23
It will work. The original data is in XML format. I have an XSLT that transforms the data into the same format as that in exampledocs: <add><doc><field name="...">...</field></doc>...</add>.
-
Re: How to import data with a different date format
Rico Lelina 2010-09-08, 17:24
I'm going with option 1, converting MM/DD/YYYY to YYYY-MM-DD (which is fairly easy in XSLT) and then adding T00:00:00Z to it.

Thanks.
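For anyone doing the same conversion outside XSLT, here is a minimal sketch in Python of the step described above: reorder MM/DD/YYYY to YYYY-MM-DD and append T00:00:00Z. The function name `to_solr_date` is made up for illustration, and the sketch assumes every input is a plain calendar date with no time or timezone information.

```python
from datetime import datetime


def to_solr_date(us_date: str) -> str:
    """Convert 'MM/DD/YYYY' to the full 'YYYY-MM-DDT00:00:00Z' form.

    Hypothetical helper, not part of Solr. strptime also validates the
    input, so an impossible date such as 02/30/2010 raises ValueError.
    """
    parsed = datetime.strptime(us_date.strip(), "%m/%d/%Y")
    # Midnight UTC is assumed because the source data carries no time part.
    return parsed.strftime("%Y-%m-%dT00:00:00Z")


if __name__ == "__main__":
    print(to_solr_date("09/08/2010"))
```

Parsing rather than splitting on "/" means malformed dates fail loudly at export time instead of being rejected later during indexing.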
-
RE: Re: How to import data with a different date format
Markus Jelsma 2010-09-08, 17:26
Ah, that answers Erick's question. And mine ;)
-
Re: How to import data with a different date format
Jonathan Rochkind 2010-09-08, 17:27
Just throwing it out there, I'd consider a different approach for an actual real app, although it might not be easier to get up quickly. (For quickly, yeah, I'd just store it as a string, more on that at bottom).

If none of your dates have times, they're all just full days, I'm not sure you really need the date type at all.

Convert the date to a number-of-days-since-epoch integer. (Most languages will have a way to do this, but I don't know about pure XSLT). Store _that_ in a 1.4 'int' field. On top of that, make it a "tint" (precision non-zero) for faster range queries.

But now your actual interface will have to convert from "number of days since epoch" to a displayable date. (And if you allow user input, convert the input to number-of-days-since-epoch before making a range query or fq, but you'd have to do that anyway even with solr dates, users aren't going to be entering W3CDate raw, I don't think).

That is probably the most efficient way to have solr handle it -- using an actual date field type gives you a lot more precision than you need, which is going to hurt performance on range queries. Which you can compensate for with trie date, sure, but if you don't really need that precision to begin with, why use it? Also the extra precision can end up doing unexpected things and making it easier to have bugs (for range queries on that high-precision stuff, you need to make sure your start date has 00:00:00 set and your end date has 23:59:59 set, to do what you probably expect). If you aren't going to use the extra precision, it makes everything a lot simpler to not use a date field.

Alternately, for your "get this done quick" method, yeah, I'd just store it as a string. With a string exactly as you've specified, sorting and range queries won't work how you'd want. But if you can make it a string of the format "yyyy/mm/dd" instead (always two-digit month and day), then you can even sort and do range queries on your string dates. For the quick and dirty prototype, I'd just do that. In fact, while this might make range queries and sorting _slightly_ slower than if you use an int or a tint, this might really be good enough even for a real app (hey, it's what lots of people did before the trie-based fields existed).

Jonathan
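The days-since-epoch idea can be sketched roughly as follows, in Python. The helper names are hypothetical, and the choice of 1970-01-01 as the epoch is an assumption; any fixed reference date would do as long as indexing and display code agree on it.

```python
from datetime import date, timedelta

# Assumed epoch; the thread does not name one.
EPOCH = date(1970, 1, 1)


def to_epoch_days(us_date: str) -> int:
    """Turn 'MM/DD/YYYY' into a whole number of days since EPOCH."""
    month, day, year = (int(part) for part in us_date.split("/"))
    return (date(year, month, day) - EPOCH).days


def from_epoch_days(days: int) -> str:
    """Turn a stored day count back into a displayable 'YYYY-MM-DD' string."""
    return (EPOCH + timedelta(days=days)).isoformat()


if __name__ == "__main__":
    stored = to_epoch_days("09/08/2010")
    print(stored, from_epoch_days(stored))
```

The integer is what would go into the int or tint field; the reverse function is the extra work the interface takes on, both for display and for turning user-entered dates into range endpoints.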
-
Re: How to import data with a different date format
Jonathan Rochkind 2010-09-08, 17:29
I'm really thinking, once you convert to YYYY-MM-DD anyway, you might be better off just sticking this in a string field, rather than using a date field at all. The extra precision in the date field is going to make things confusing later, I predict. Especially for a quick and dirty prototype, I'd just use a string.

Solr is not an RDBMS; our learned behavior to always try to normalize everything and define the field 'right' is often not the right way to go with solr/lucene.

Jonathan
-
Re: How to import data with a different date format
Jonathan Rochkind 2010-09-08, 17:33
> how SOLR-savvy you are, so pardon if this is something you already know. But
> lots of people trip up over the "string" field type, which is NOT tokenized.
> You usually want "text" unless it's some sort of ID.... So it might be worth
> it to do some searching earlier rather than later <G>....

Why would you want to tokenize a yyyy-mm-dd value?

I'm liking the 'string' type. If you do yyyy-mm-dd, then you can even sort properly, and range query with endpoints also specified as yyyy-mm-dd, no?

Okay, I'll stop spamming the thread now, heh.

Jonathan
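A small Python sketch of why this works: zero-padded yyyy-mm-dd strings compare in the same order as the dates they represent, while MM/DD/YYYY strings do not. The sample values and the `in_range` helper are invented for illustration; plain string comparison stands in for the lexicographic ordering a string field would give you.

```python
# Three dates in chronological order, in both notations (sample data).
us_style = ["12/31/2009", "01/15/2010", "09/08/2010"]
iso_style = ["2009-12-31", "2010-01-15", "2010-09-08"]


def in_range(value: str, start: str, end: str) -> bool:
    """Inclusive range check using plain string comparison."""
    return start <= value <= end


if __name__ == "__main__":
    # Sorting MM/DD/YYYY as text puts December 2009 last.
    print(sorted(us_style))
    # Sorting yyyy-mm-dd as text keeps chronological order.
    print(sorted(iso_style))
    # A range over all of 2010, with endpoints in the same notation.
    print([d for d in iso_style if in_range(d, "2010-01-01", "2010-12-31")])
```

The zero padding matters: "2010-9-8" would sort after "2010-10-01" as text, which is why the month and day always need two digits.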
-
Re: How to import data with a different date format
Dennis Gearon 2010-09-08, 17:59
I'm doing something similar for dates/times/timestamps.

I'm actually trying to do "'now' is within the range of which appointments" (date/time from-and-to combos, i.e. timestamps).

Fairly simple search of:

What items have a start time BEFORE now, and an end time AFTER now?

My thoughts were to store:
- Unix timestamp BIGINTs (64 bit)
- "ISO_DATE ISO_TIME" strings

Which is going to be faster:
1/ Indexing?
2/ Searching?

How does the 'tint' field mentioned below apply?

Dennis Gearon
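The "start before now, end after now" search can be expressed as two open-ended range clauses. Below is a minimal Python sketch that builds such a query string for integer Unix timestamps. The field names `start_ts` and `end_ts` are hypothetical, and the square brackets make both ranges inclusive, so an item starting or ending exactly at "now" also matches.

```python
import time
from typing import Optional


def active_at_query(now_ts: Optional[int] = None,
                    start_field: str = "start_ts",
                    end_field: str = "end_ts") -> str:
    """Build a range query matching items whose interval contains now_ts.

    Field names are placeholders; substitute whatever the schema defines.
    """
    if now_ts is None:
        now_ts = int(time.time())
    # field:[* TO x] means "up to x"; field:[x TO *] means "from x onward".
    return f"{start_field}:[* TO {now_ts}] AND {end_field}:[{now_ts} TO *]"


if __name__ == "__main__":
    print(active_at_query(1283968740))
```

The same shape works for the string alternative, provided the stored strings sort chronologically and the endpoint is formatted the same way as the stored values.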
-
Re: How to import data with a different date format
Erick Erickson 2010-09-08, 18:28
That was a general comment on SOLR string types. Mostly I wanted to prompt Rico to try some searching before getting too hung up on indexing refinements. I'd far rather demo a prototype being able to say "Dates don't work yet, but you can search" than "searching is broken to pieces, but dates work fine!".....

FWIW
Erick
-
Re: How to import data with a different date format
Jonathan Rochkind 2010-09-08, 18:35
So the standard 'int' field in Solr 1.4 is a "trie based" field, although the example "int" type in the default schema.xml has its "precision" (the precisionStep attribute) set to 0, which means it's not really doing "trie" things. If you set the precision to something greater than 0, as in the default example "tint" type, then it's really using 'trie' functionality. 'trie' functionality speeds up range queries by putting each value into 'buckets' (my own term), per the precision specified, so Solr has to do less work to grab all values within a certain range.

That's all tint/non-zero-precision-trie does: speed up range queries. Your use case involves range queries though, so it's worth investigating. If you use a string or other textual type for sorting or range queries, you need to make sure your values sort the way you want them to as strings. But yyyy-mm-dd will.

More on trie: http://www.lucidimagination.com/blog/2009/05/13/exploring-lucene-and-solrs-trierange-capabilities/

I think there probably won't be much of a difference at query time between non-trie int and string, although I'm not sure, and it may depend on the nature of your data and queries. Using a trie int will be faster for (and only for) range queries, if you have a lot of data. (There are some cases, depending on the data and the nature of your queries, where the overhead of a non-zero-precision trie may outweigh the hypothetical gain, but generally it's faster.)

I don't think there should be any appreciable difference between how long a non-trie int or a string will take to index, at least as far as Solr is concerned. If your app preparing the documents for Solr takes longer to prepare one than another, that's another story. An actual trie (non-zero-precision) theoretically has indexing-time overhead, but I doubt it would be noticeable, unless you have a really, really lean, mean indexing setup where every microsecond counts.

Jonathan
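To make the 'buckets' idea in this message concrete, here is a rough Python sketch. It is a simplified illustration, not Lucene's actual term encoding: it assumes that a trie field indexes each value at several precisions by dropping `precision_step` low-order bits at a time, so that a range query can match a few coarse terms instead of every individual value. The function name `trie_terms` and the 32-bit width are assumptions made for the example.

```python
def trie_terms(value, precision_step, bits=32):
    """Return (shift, prefix) pairs a trie-style field might index for value.

    Simplified illustration only: real Lucene/Solr trie fields use their own
    binary term encoding. A precision_step of 0 (or less) is treated here as
    "trie disabled", so only the full-precision value is returned.
    """
    if value < 0:
        raise ValueError("this sketch only handles non-negative values")
    if precision_step <= 0:
        return [(0, value)]
    # Each step drops precision_step more low-order bits, giving a coarser bucket.
    return [(shift, value >> shift) for shift in range(0, bits, precision_step)]


if __name__ == "__main__":
    days = 14860  # 2010-09-08 expressed as days since 1970-01-01
    print(trie_terms(days, 0))  # full precision only
    print(trie_terms(days, 8))  # one extra, coarser term per 8-bit step
```

With a non-zero step, values that are close together share their coarser prefixes, which is roughly why a range can be covered with fewer terms.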
-
Re: How to import data with a different date formatDennis Gearon 2010-09-08, 19:01
So now, vs when 'trie' came out, Solr has an INT field that IS 'trie', right?

And nothing date/timestamp related has come out since, making 'trie'/INT the field of choice for timestamps, right? Seems like the fastest choice. I will have to read up on it.

Seems like my original choice to use unix timestamp as storage in my SQL database, vs native Postgres timestamp, will make everything easier between:

PHP
Symfony
Postgres
Solr

It's probably going to be a good idea to store two other columns in the search index for display, 'date' and 'time'. That is, unless I force the user's javascript to generate the time and date from the unix timestamp. hmmmmmm.

Dennis Gearon
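For the display question raised in this message, the separate 'date' and 'time' values can be derived from the stored Unix timestamp at index time or at render time. The following is a minimal Python sketch under the assumption that timestamps are in seconds and should be shown in UTC; the helper name `display_fields` is hypothetical.

```python
from datetime import datetime, timezone


def display_fields(unix_ts):
    """Split a Unix timestamp (seconds) into UTC display date and time strings."""
    moment = datetime.fromtimestamp(unix_ts, tz=timezone.utc)
    return moment.strftime("%Y-%m-%d"), moment.strftime("%H:%M:%S")


if __name__ == "__main__":
    # 1283972460 is 2010-09-08 19:01:00 UTC
    print(display_fields(1283972460))
```

Storing the two strings costs index space but keeps the front end simple; deriving them on the fly keeps a single source of truth.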
-
Re: How to import data with a different date formatChris Hostetter 2010-09-08, 19:09
: If none of your dates have times, they're all just full days, I'm not sure you
: really need the date type at all.
:
: Convert the date to number-of-days since epoch integer. (Most languages will
: have a way to do this, but I don't know about pure XSLT). Store _that_ in a
: 1.4 'int' field. On top of that, make it a "tint" (precision non-zero) for
: faster range queries.

There's really no advantage to doing this over using the TrieDateField (available in Solr 1.4). It's essentially how it's implemented under the covers (you can pick the precision just like TrieInt) except that:

1) it uses a long instead of an int
2) it supports DateMath expressions
3) it supports Date Faceting

-Hoss

--
http://lucenerevolution.org/ ... October 7-8, Boston
http://bit.ly/stump-hoss ... Stump The Chump!
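Either approach needs the MM/DD/YYYY source values converted before they are sent to Solr. As a hedged sketch in Python (the thread's original import used XSLT, so this is a swapped-in preprocessing step, and both function names are hypothetical), the first helper produces the full UTC form that Solr date fields expect, assuming midnight UTC for dates without a time, and the second produces the days-since-epoch integer from the quoted suggestion.

```python
from datetime import date, datetime

EPOCH = date(1970, 1, 1)


def to_solr_date(mdy):
    """Convert 'MM/DD/YYYY' to Solr's 'YYYY-MM-DDThh:mm:ssZ' form (midnight UTC assumed)."""
    day = datetime.strptime(mdy, "%m/%d/%Y").date()
    return day.strftime("%Y-%m-%dT00:00:00Z")


def to_epoch_days(mdy):
    """Convert 'MM/DD/YYYY' to whole days since 1970-01-01, for an int/tint field."""
    day = datetime.strptime(mdy, "%m/%d/%Y").date()
    return (day - EPOCH).days


if __name__ == "__main__":
    print(to_solr_date("09/08/2010"))
    print(to_epoch_days("09/08/2010"))
```

`strptime` raises `ValueError` on malformed input, which is usually what you want during a bulk import so that bad rows are noticed rather than silently indexed.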
-
Re: How to import data with a different date formatJonathan Rochkind 2010-09-08, 22:07
Solr 1.4 was the first tagged release with trie fields.

And Solr 1.4+ also includes a 'date' field based on 'trie' just for dates. If your dates are actually going to include hour/minute/second, not just calendar day-of-month, then I'd definitely use the built-in Solr trie date field. That's what it's for: it will do the translation from calendar date-time to integer for you (in both directions), and add trie buckets for fast range querying too.

I was suggesting that just using 'int' might be simpler if you don't need hour/minute/second precision, but are just storing year-month-day. If you've got hour-minute-second too, there's no reason not to use Solr's date type, and lots of reasons to do so.

Jonathan
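The appointment search discussed earlier in the thread (start time before now, end time after now) comes down to two range clauses. Here is a small Python sketch that only builds the query strings; the field names `start_time`, `end_time`, `start_ts`, and `end_ts` are hypothetical. The first form relies on Solr's `NOW` date math and so assumes date-typed fields; the second assumes integer Unix timestamps. Square brackets are inclusive ranges, so boundary values match too.

```python
def active_now_query(start_field="start_time", end_field="end_time"):
    """Query for documents whose date range contains the current time (date fields)."""
    return f"{start_field}:[* TO NOW] AND {end_field}:[NOW TO *]"


def active_at_query(unix_ts, start_field="start_ts", end_field="end_ts"):
    """Same idea for numeric Unix-timestamp fields, with the instant supplied by the caller."""
    ts = int(unix_ts)
    return f"{start_field}:[* TO {ts}] AND {end_field}:[{ts} TO *]"


if __name__ == "__main__":
    print(active_now_query())
    print(active_at_query(1283972460))
```

With the date type, Solr resolves `NOW` itself; with plain integers the client has to supply the current timestamp on every request.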
-
Re: How to import data with a different date formatDennis Gearon 2010-09-08, 23:45
I already have the issue of how to store between different databases, languages, platforms, and frameworks.

Settling on LONGINT/unix timestamp solves the problem on all fronts. I may even send them to the browser and have the JavaScript convert them to date/times (maybe ;-)

So, it's *nix timestamp or bust!

Dennis Gearon
|