Solr, mail # user - Search a URL


+
Max Lynch 2010-09-23, 20:59
+
Markus Jelsma 2010-09-23, 21:11
-
RE: Search a URL
Dennis Gearon 2010-09-24, 00:42
WDF is not WTF (what I think when I see WDF), right ;-)

What is WDF?

Dennis Gearon

Signature Warning
----------------
EARTH has a Right To Life,
  otherwise we all die.

Read 'Hot, Flat, and Crowded'
Laugh at http://www.yert.com/film.php
--- On Thu, 9/23/10, Markus Jelsma <[EMAIL PROTECTED]> wrote:

> From: Markus Jelsma <[EMAIL PROTECTED]>
> Subject: RE: Search a URL
> To: [EMAIL PROTECTED]
> Date: Thursday, September 23, 2010, 2:11 PM
> Try setting generateWordParts=1 in your WDF. Also, a WhitespaceTokenizer makes
> little sense for URLs: there should be no whitespace in a URL, and the
> StandardTokenizer can tokenize a URL. Anyway, the problem is your WDF.
>  
> -----Original message-----
> From: Max Lynch <[EMAIL PROTECTED]>
> Sent: Thu 23-09-2010 23:00
> To: [EMAIL PROTECTED];
>
> Subject: Search a URL
>
> Is there a tokenizer that will allow me to search for parts of a URL?  For
> example, the search "google" would match on the data
> "http://mail.google.com/dlkjadf"
>
> This tokenizer factory doesn't seem to be sufficient:
>
>     <fieldType name="text_standard" class="solr.TextField" positionIncrementGap="100">
>         <analyzer type="index">
>             <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>             <filter class="solr.WordDelimiterFilterFactory"
>                     generateWordParts="0" generateNumberParts="1" catenateWords="1"
>                     catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
>             <filter class="solr.LowerCaseFilterFactory"/>
>             <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
>         </analyzer>
>         <analyzer type="query">
>             <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>             <filter class="solr.WordDelimiterFilterFactory"
>                     generateWordParts="0" generateNumberParts="1" catenateWords="1"
>                     catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
>             <filter class="solr.LowerCaseFilterFactory"/>
>             <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
>         </analyzer>
>     </fieldType>
>
> Thanks.
>
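
For reference, WDF in the quoted reply is short for WordDelimiterFilter (solr.WordDelimiterFilterFactory in the quoted schema). What follows is only a sketch of the change Markus suggests, assuming everything else in the original field type stays as posted: generateWordParts is switched from "0" to "1" in both analyzers, so that a whitespace-delimited token such as http://mail.google.com/dlkjadf should also be emitted as its word parts (http, mail, google, com, dlkjadf). The tokenizer is left unchanged here; swapping it for StandardTokenizer is a separate suggestion and is not shown.

```xml
<!-- Sketch only: same field type as in the original post, with
     generateWordParts flipped to "1" (the one assumed change). -->
<fieldType name="text_standard" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <!-- generateWordParts="1" emits the alphabetic sub-words of each token -->
        <filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="1" generateNumberParts="1" catenateWords="1"
                catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
    </analyzer>
    <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <!-- keep the query side in step with the index side -->
        <filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="1" generateNumberParts="1" catenateWords="1"
                catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
    </analyzer>
</fieldType>
```

Because the index-time analyzer changes, existing documents would need to be reindexed before a query for "google" could match previously indexed URLs.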
+
Markus Jelsma 2010-09-24, 10:36
+
dl 2010-09-23, 21:07