Trevor Watson 2011-06-16, 15:31
Trevor Watson 2011-06-16, 15:50
Franklin Simmons 2011-06-16, 20:58
RE: [Lucene.Net] Analyzer Question for Lucene.Net
Digy 2011-06-16, 16:04
Take a look at UnaccentedWordAnalyzer in
https://svn.apache.org/repos/asf/incubator/lucene.net/branches/Lucene.Net_2_9_4g/src/contrib/Core/Analysis/Ext/Analysis.Ext.cs

If you want, you can remove the "ASCIIFoldingFilter" from the chain.

DIGY

-----Original Message-----
From: Trevor Watson [mailto:[EMAIL PROTECTED]]
Sent: Thursday, June 16, 2011 6:31 PM
To: [EMAIL PROTECTED]
Subject: [Lucene.Net] Analyzer Question for Lucene.Net

I'm trying to get Lucene.Net to create terms the way we want. I'm currently running Lucene.Net 2.9.2.2.

Basically, we want the StandardAnalyzer, except that terms should also be divided at a period. The StandardAnalyzer seems to split the two words into separate terms only if the period is followed by whitespace.

So if we index autoexec.bat, it should produce [autoexec] and [bat], not [autoexec.bat].

I was trying to create my own Analyzer that would do this, but could not figure out how. So far I have a very basic analyzer that uses the StandardTokenizer and 2 filters.

// --------- code block ----------------------
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Standard;

class ExtendedStandardAnalyzer : Analyzer
{
    public override TokenStream TokenStream(string fieldName, System.IO.TextReader reader)
    {
        TokenStream result = new StandardTokenizer(Lucene.Net.Util.Version.LUCENE_29, reader);
        // TokenStream result = new LetterTokenizer(reader); // doesn't work because we want numbers
        result = new StandardFilter(result);
        result = new LowerCaseFilter(result);
        return result;
    }
}
// --------- end code block ------------------

Thanks in advance.
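A minimal sketch of one way to get the split described in the question. It assumes it is acceptable to give up StandardTokenizer's special handling (e-mail addresses, host names, acronyms) and to break on every character that is not a letter or a digit. LetterOrDigitTokenizer and PeriodSplittingAnalyzer are hypothetical names, not part of Lucene.Net. This is not the UnaccentedWordAnalyzer approach suggested in the reply, and it assumes CharTokenizer.IsTokenChar is declared "protected internal abstract" in 2.9.x, as it is for LetterTokenizer.

```csharp
using System.IO;
using Lucene.Net.Analysis;

// Hypothetical tokenizer: a token is any maximal run of letters or digits,
// so '.' (like any other punctuation or whitespace) always ends a token.
class LetterOrDigitTokenizer : CharTokenizer
{
    public LetterOrDigitTokenizer(TextReader input) : base(input) { }

    // Assumption: the base member is "protected internal abstract", so an
    // override compiled outside the Lucene.Net assembly is "protected override".
    protected override bool IsTokenChar(char c)
    {
        return char.IsLetterOrDigit(c);
    }
}

// Hypothetical analyzer: split on non-alphanumeric characters, then lowercase.
class PeriodSplittingAnalyzer : Analyzer
{
    public override TokenStream TokenStream(string fieldName, TextReader reader)
    {
        TokenStream result = new LetterOrDigitTokenizer(reader);
        result = new LowerCaseFilter(result);
        return result;
    }
}
```

With this analyzer, autoexec.bat should come out as the terms [autoexec] and [bat], and numbers are kept because digits count as token characters. The same analyzer would have to be used at both index time and query time for searches to match.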
Prescott Nasser 2011-06-16, 16:03