
Lucene.Net, mail # user - [Lucene.Net] Analyzer Question for Lucene.Net


Trevor Watson 2011-06-16, 15:31
Trevor Watson 2011-06-16, 15:50
Franklin Simmons 2011-06-16, 20:58

RE: [Lucene.Net] Analyzer Question for Lucene.Net
Digy 2011-06-16, 16:04
Take a look at UnaccentedWordAnalyzer in
https://svn.apache.org/repos/asf/incubator/lucene.net/branches/Lucene.Net_2_9_4g/src/contrib/Core/Analysis/Ext/Analysis.Ext.cs

If you want, you can remove the "ASCIIFoldingFilter" from the chain.
DIGY
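
The question below notes that LetterTokenizer is unsuitable because it drops numbers. In Lucene, LetterTokenizer is a CharTokenizer whose token characters are letters only; a tokenizer that accepts letters and digits still splits at periods but keeps numbers. The sketch below is plain C# with no Lucene.Net dependency and only emulates that letter-or-digit rule plus lowercasing, so the resulting terms are easy to check. LetterOrDigitSplitDemo and Tokenize are hypothetical names, and whether the analyzer linked above uses exactly this rule is an assumption to verify against Analysis.Ext.cs.

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical demo class: emulates a letter-or-digit tokenizer followed by
// lowercasing. It is NOT Lucene.Net code, only an illustration of the rule.
public static class LetterOrDigitSplitDemo
{
    public static List<string> Tokenize(string text)
    {
        var tokens = new List<string>();
        var current = new StringBuilder();
        foreach (char c in text)
        {
            if (char.IsLetterOrDigit(c))
            {
                // Token character: keep it, lowercased (like LowerCaseFilter).
                current.Append(char.ToLowerInvariant(c));
            }
            else if (current.Length > 0)
            {
                // Any other character, including '.', ends the current token.
                tokens.Add(current.ToString());
                current.Clear();
            }
        }
        if (current.Length > 0)
        {
            tokens.Add(current.ToString());
        }
        return tokens;
    }
}
```

For example, Tokenize("AUTOEXEC.BAT Win95.exe") returns autoexec, bat, win95, exe. Adding ASCIIFoldingFilter after the lowercasing step would additionally fold accented letters to their ASCII equivalents (for example é to e); that is the filter Digy says can be dropped if folding is not wanted.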

-----Original Message-----
From: Trevor Watson [mailto:[EMAIL PROTECTED]]
Sent: Thursday, June 16, 2011 6:31 PM
To: [EMAIL PROTECTED]
Subject: [Lucene.Net] Analyzer Question for Lucene.Net

I'm trying to get Lucene.Net to create terms the way we want. I'm
currently running Lucene.Net 2.9.2.2.

Basically, we want the StandardAnalyzer, except that we also want terms
to be split at a period. The StandardAnalyzer seems to split the two
words into separate terms only when the period is followed by whitespace.

So if we index autoexec.bat, it should produce the terms [autoexec] and
[bat], not [autoexec.bat].

I tried to create my own Analyzer that would do this, but could not
figure out how. So far I have a very basic analyzer that uses the
StandardTokenizer and two filters.

// --------- code block ----------------------
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Standard;

class ExtendedStandardAnalyzer : Analyzer
{
    public override TokenStream TokenStream(string fieldName, TextReader reader)
    {
        // StandardTokenizer keeps "autoexec.bat" together as a single token.
        TokenStream result = new StandardTokenizer(Lucene.Net.Util.Version.LUCENE_29, reader);
        // TokenStream result = new LetterTokenizer(reader); // doesn't work because we want numbers

        result = new StandardFilter(result);  // standard token clean-up
        result = new LowerCaseFilter(result); // lowercase every term

        return result;
    }
}
// --------- end code block ------------------
Thanks in advance.
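
One way to keep the StandardTokenizer behavior while also splitting at periods is to rewrite the input before it reaches the tokenizer. The sketch below is a plain .NET TextReader decorator that passes every character through unchanged except '.', which it replaces with a space. PeriodToSpaceReader is a hypothetical name, not part of Lucene.Net, and this is an assumption-laden alternative to the contrib analyzer Digy points to, not its implementation.

```csharp
using System.IO;

// Hypothetical helper: wraps another TextReader and replaces every '.' with ' '.
// One character is replaced by exactly one character, so token offsets
// reported by the tokenizer still line up with the original text.
public class PeriodToSpaceReader : TextReader
{
    private readonly TextReader inner;

    public PeriodToSpaceReader(TextReader inner)
    {
        this.inner = inner;
    }

    // Maps a single character; -1 (end of input) passes through unchanged.
    private static int Map(int c)
    {
        return c == '.' ? ' ' : c;
    }

    public override int Peek()
    {
        return Map(inner.Peek());
    }

    public override int Read()
    {
        return Map(inner.Read());
    }

    public override int Read(char[] buffer, int index, int count)
    {
        int n = inner.Read(buffer, index, count);
        for (int i = index; i < index + n; i++)
        {
            if (buffer[i] == '.')
            {
                buffer[i] = ' ';
            }
        }
        return n;
    }

    protected override void Dispose(bool disposing)
    {
        if (disposing)
        {
            inner.Dispose();
        }
        base.Dispose(disposing);
    }
}
```

In the analyzer above, the tokenizer line would become: TokenStream result = new StandardTokenizer(Lucene.Net.Util.Version.LUCENE_29, new PeriodToSpaceReader(reader)); The trade-off is that every period becomes a separator, so decimal numbers such as 3.14, host names, e-mail addresses and acronyms such as U.S.A., which StandardTokenizer would otherwise treat specially, are split as well.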
Prescott Nasser 2011-06-16, 16:03