Lucene, mail # dev - DocumentsWriter.checkMaxTermLength issues


Re: DocumentsWriter.checkMaxTermLength issues
Yonik Seeley 2007-12-31, 16:10
On Dec 31, 2007 5:53 AM, Michael McCandless <[EMAIL PROTECTED]> wrote:
> Doron Cohen <[EMAIL PROTECTED]> wrote:
> > I like the approach of configuration of this behavior in Analysis
> > (and so IndexWriter can throw an exception on such errors).
> >
> > It seems that this should be a property of Analyzer vs.
> > just StandardAnalyzer, right?
> >
> > It can probably be a "policy" property, with two parameters:
> > 1) maxLength, 2) action: chop/split/ignore/raiseException when
> > generating too long tokens.
>
> Agreed, this should be generic/shared to all analyzers.
>
> But maybe for 2.3, we just truncate any too-long term to the max
> allowed size, and then after 2.3 we make this a settable "policy"?

But we already have a nice component model for analyzers...
why not just encapsulate truncation/discarding in a TokenFilter?

-Yonik
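
To illustrate the suggestion above, here is a minimal sketch of a filter that wraps a token stream and applies a maxLength plus an action, as in Doron's "policy" proposal. It is not code from this thread and it does not use the real Lucene classes: TermStream, ListTermStream, LongTermAction, MaxTermLengthFilter and TermStreams are hypothetical stand-ins so that the example compiles on its own. A real implementation would presumably extend Lucene's TokenFilter and work on Token objects instead of plain strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for a token stream: returns the next term, or null at end of stream.
interface TermStream {
    String next();
}

// Hypothetical policy actions, covering three of the options named in the thread.
enum LongTermAction { TRUNCATE, DISCARD, RAISE_EXCEPTION }

// Hypothetical source stream backed by a fixed list of terms.
final class ListTermStream implements TermStream {
    private final Iterator<String> it;

    ListTermStream(List<String> terms) {
        this.it = terms.iterator();
    }

    public String next() {
        return it.hasNext() ? it.next() : null;
    }
}

// Hypothetical filter: wraps another stream and handles terms longer than maxLength.
// Length is measured in Java chars (UTF-16 code units).
final class MaxTermLengthFilter implements TermStream {
    private final TermStream input;
    private final int maxLength;
    private final LongTermAction action;

    MaxTermLengthFilter(TermStream input, int maxLength, LongTermAction action) {
        if (maxLength < 1) {
            throw new IllegalArgumentException("maxLength must be >= 1");
        }
        this.input = input;
        this.maxLength = maxLength;
        this.action = action;
    }

    public String next() {
        String term;
        while ((term = input.next()) != null) {
            if (term.length() <= maxLength) {
                return term; // short enough, pass through unchanged
            }
            switch (action) {
                case TRUNCATE:
                    return term.substring(0, maxLength); // keep only the prefix
                case DISCARD:
                    continue; // drop this term and read the next one
                case RAISE_EXCEPTION:
                    throw new IllegalArgumentException(
                        "term of length " + term.length() + " exceeds " + maxLength);
            }
        }
        return null; // wrapped stream is exhausted
    }
}

// Hypothetical helper that drains a stream into a list.
final class TermStreams {
    static List<String> collect(TermStream stream) {
        List<String> out = new ArrayList<>();
        String term;
        while ((term = stream.next()) != null) {
            out.add(term);
        }
        return out;
    }
}
```

With the terms "ok", "toolongterm", "fine" and a maxLength of 4, the TRUNCATE action yields "ok", "tool", "fine", and the DISCARD action yields "ok", "fine". Because the behavior lives in a wrapper around the stream, any analyzer could opt in by adding the filter to its chain, and the indexing code would not need a policy setting of its own.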
