Solr, mail # user - adding and updating a lot of document to Solr, metadata extraction etc


Re: adding and updating a lot of document to Solr, metadata extraction etc
Lance Norskog 2009-11-04, 01:49
The DataImportHandler (DIH) has improved a great deal from Solr 1.3 to
1.4. You will be much better off using the DIH from 1.4.

This is the current Solr release candidate binary:
http://people.apache.org/~gsingers/solr/1.4.0/
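
For the direct file loading mentioned in the quoted message below, here is a minimal sketch in Python. It writes many documents into one Solr <add> XML file, one document at a time, and then builds an update URL that asks Solr to read that file from its own file system through the stream.file parameter. The function names, field names, file path and base URL are illustrative assumptions and are not taken from this thread.

```python
from urllib.parse import urlencode
from xml.sax.saxutils import escape, quoteattr


def write_add_file(path, docs):
    """Write docs (an iterable of dicts) as one Solr <add> XML file.

    Documents are written one at a time, so the full set never has to
    sit in memory on the client side.
    """
    with open(path, "w", encoding="utf-8") as out:
        out.write('<?xml version="1.0" encoding="UTF-8"?>\n<add>\n')
        for doc in docs:
            out.write("  <doc>\n")
            for name, value in doc.items():
                # quoteattr() adds the surrounding quotes and escapes the name;
                # escape() protects &, < and > in the field value.
                out.write("    <field name=%s>%s</field>\n"
                          % (quoteattr(name), escape(str(value))))
            out.write("  </doc>\n")
        out.write("</add>\n")


def stream_file_url(solr_base, path):
    """Build an update URL that tells Solr to load `path` from local disk."""
    params = {
        "stream.file": path,  # resolved on the Solr server, not the client
        "stream.contentType": "text/xml;charset=utf-8",
        "commit": "true",
    }
    return solr_base.rstrip("/") + "/update?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical documents; real field names must match your schema.xml.
    docs = ({"id": i, "title": "Page %d" % i} for i in range(3))
    write_add_file("docs.xml", docs)
    print(stream_file_url("http://localhost:8983/solr", "/data/docs.xml"))
```

Requesting the resulting URL only works if remote streaming is enabled in solrconfig.xml (enableRemoteStreaming="true" on the requestParsers element), and the path must be readable by the Solr process itself.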

On Tue, Nov 3, 2009 at 8:08 AM, Eugene Dzhurinsky <[EMAIL PROTECTED]> wrote:
> On Mon, Nov 02, 2009 at 05:45:37PM -0800, Lance Norskog wrote:
>> About large XML files and HTTP overhead: you can tell Solr to load the
>> file directly from the file system. This will stream thousands of
>> documents in one XML file without loading everything into memory at
>> once.
>>
>> This is a new book on Solr. It will help you through this early learning phase.
>>
>> http://www.packtpub.com/solr-1-4-enterprise-search-server
>
> Thank you, but we have to prepare a proof of concept with the stable
> version. I don't see any 1.4.0 artifacts released to repo1.maven.org yet.
>
> Additionally, I've learned about http://wiki.apache.org/solr/DataImportHandler
> and it looks like this approach is preferable in my case.
>
> I have a lot of HTML pages on disk, and some metadata stored in SQL
> tables. What I seem to need is to provide a custom EntityProcessor and
> DataSource to the DataImportHandler. Additionally, I will need to provide
> some properties that tell the data source how to retrieve the data (table
> names etc.).
>
> So maybe there is a tutorial or how-to describing how to create
> custom classes for importing data into Solr 1.3.0?
>
> Thank you in advance!
>
> --
> Eugene N Dzhurinsky
>

--
Lance Norskog
[EMAIL PROTECTED]