Re: adding and updating a lot of documents to Solr, metadata extraction etc
Lance Norskog 2009-11-04, 01:49
The DIH (DataImportHandler) has improved a great deal from Solr 1.3 to
1.4. You will be much better off using the DIH from 1.4.
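As a rough sketch of what this might look like for HTML pages on disk plus
metadata in SQL: a data-config.xml along these lines uses only classes that
ship with 1.4 (PlainTextEntityProcessor and HTMLStripTransformer are both new
in 1.4), so custom EntityProcessor or DataSource classes may not be needed.
The table name, column names, JDBC settings, and Solr field names below are
made up for illustration and would have to match your own schema.

```xml
<dataConfig>
  <!-- Assumed JDBC settings; replace driver, url, user and password with your own. -->
  <dataSource name="db" type="JdbcDataSource"
              driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/mydb"
              user="solr" password="secret"/>
  <!-- Reads files from the local file system. -->
  <dataSource name="files" type="FileDataSource" encoding="UTF-8"/>

  <document>
    <!-- Outer entity: one Solr document per metadata row.
         "pages", "id", "title" and "path" are hypothetical names. -->
    <entity name="page" dataSource="db"
            query="SELECT id, title, path FROM pages">
      <field column="id" name="id"/>
      <field column="title" name="title"/>

      <!-- Inner entity: loads the HTML file named by the row's path column.
           PlainTextEntityProcessor puts the whole file into the implicit
           column "plainText"; HTMLStripTransformer removes the markup. -->
      <entity name="html" dataSource="files"
              processor="PlainTextEntityProcessor"
              url="${page.path}"
              transformer="HTMLStripTransformer">
        <field column="plainText" name="text" stripHTML="true"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```

The nested entity runs once per row of the outer query, so the file content
and the SQL metadata end up in the same Solr document.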
This is the current Solr release candidate binary:
On Tue, Nov 3, 2009 at 8:08 AM, Eugene Dzhurinsky <[EMAIL PROTECTED]> wrote:
> On Mon, Nov 02, 2009 at 05:45:37PM -0800, Lance Norskog wrote:
>> About large XML files and http overhead: you can tell solr to load the
>> file directly from a file system. This will stream thousands of
>> documents in one XML file without loading everything in memory at once.
>> This is a new book on Solr. It will help you through this early learning phase.
> Thank you, but we have to prepare some proof of concept with the stable
> version. I didn't see any 1.4.0 artifacts released to repo1.maven.org for now.
> Additionally, I've learned about http://wiki.apache.org/solr/DataImportHandler
> and it looks like this is the preferred way in my case.
> I have a lot of HTML pages on disk and some metadata stored in SQL
> tables. What I seem to need is to provide some sort of EntityProcessor
> and DataSource to DataImportHandler. Additionally, I will need to provide
> some properties to tell the data source how to retrieve the data (table names
> So maybe there is a tutorial or how-to that describes how to create
> custom classes for importing data into Solr 1.3.0?
> Thank you in advance!
> Eugene N Dzhurinsky