Solr, mail # dev - indexing null values?


Ryan McKinley 2007-08-08, 19:56
Yonik Seeley 2007-08-08, 20:09
Re: indexing null values?
Pieter Berkel 2007-08-09, 04:17
From a theoretical IR standpoint, there is no reason to index null values,
or even empty strings for that matter.  However, in practice there are plenty
of cases that I've encountered where it is necessary to obtain a list of
documents where a particular field is null (i.e. hasn't been specified at
index time) or an empty string.

For example, you may need to generate a list of products contained in your
index that do not have a part number.  A dirty, ugly workaround for
this problem that we've used in the past is to replace null or unset values
at index time with a special token value like "__null__" that (hopefully)
won't appear in normal indexed data.  This then allows you to perform a
query like part_number:"__null__" to obtain all documents without
a part number.  This approach has worked in the past for string fields, but
I'm not sure how effective it would be for numerical field types.
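
To make the workaround concrete, here is a minimal Python sketch of the
substitution step as it might run on the client before documents are sent
to the indexer.  It makes no Solr calls; the helper names with_null_tokens
and null_query and the sample product records are hypothetical, and only
the "__null__" token and the part_number field come from the description
above.

```python
# Sentinel from the message above; assumed never to occur in real field data.
NULL_TOKEN = "__null__"


def with_null_tokens(doc, fields):
    """Return a copy of doc in which every listed field that is missing,
    None, or an empty string is set to NULL_TOKEN."""
    prepared = dict(doc)
    for field in fields:
        value = prepared.get(field)
        if value is None or value == "":
            prepared[field] = NULL_TOKEN
    return prepared


def null_query(field):
    """Build the query string that matches documents given the sentinel."""
    return f'{field}:"{NULL_TOKEN}"'


# Hypothetical product records: one complete, one empty, one unset.
products = [
    {"id": "1", "part_number": "A-100"},
    {"id": "2", "part_number": ""},
    {"id": "3"},
]

# Apply the substitution before handing the documents to the indexer.
prepared = [with_null_tokens(p, ["part_number"]) for p in products]
query = null_query("part_number")
```

Note that empty and unset values collapse into the same token here, so the
query cannot tell them apart; a second token would be needed for that.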

Ultimately, this leads to the situation where you are using Lucene (and
Solr) as an RDBMS, which it clearly is not.  While I'd love to have support
for querying null / empty string fields, I don't think it's going to happen
in the near future.

Piete
Yonik Seeley 2007-08-09, 05:27
Pieter Berkel 2007-08-09, 07:27
Yonik Seeley 2007-08-09, 20:03
Yonik Seeley 2007-08-09, 20:07
Pieter Berkel 2007-08-10, 02:07
Luke Tan 2007-08-10, 03:53
Yonik Seeley 2007-08-10, 04:59