Prasad KVSH 2012-02-01, 13:07
Hi,
Can lucene-3.0.3 be used to search for text in PDF, xlsx, docx, doc, xls, msg and TXT files? Is there a common function to accomplish this? Please help me with this.
Thanks
Prasad
KARTHIK SHIVAKUMAR 2012-02-01, 13:34
Hi
>>lucene-3.0.3 can be used for searching a text from
Lucene's primary job is text search,
whether the source is PDF/HTML/XML/MS Word/PPT/XLS.
You need plugin code that does two things:
1) Extracts the text from the documents (PDF/HTML/XML/MS Word/PPT/XLS)
2) Indexes this extracted text using Lucene
The resulting index can later be used to search the required content.
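The two steps above can be sketched in plain Java. This is only an illustration of the extract-then-index flow and does not use Lucene's API: `TextExtractor`, `PlainTextExtractor` and `TinyIndex` are hypothetical names, and the in-memory map merely stands in for a Lucene index. In a real setup a format-specific parser (for example Apache Tika, mentioned later in this thread) would supply the text for PDF or Office files, and Lucene would do the indexing and searching.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeSet;

class ExtractThenIndexSketch {

    /** Step 1: turn raw file bytes into plain text (hypothetical interface). */
    interface TextExtractor {
        String extract(byte[] raw);
    }

    /** Handles plain text only; PDF or Office input would need a real parser here. */
    static class PlainTextExtractor implements TextExtractor {
        public String extract(byte[] raw) {
            return new String(raw, StandardCharsets.UTF_8);
        }
    }

    /** Step 2: a toy inverted index (term -> file names) standing in for Lucene. */
    static class TinyIndex {
        private final Map<String, TreeSet<String>> postings =
                new HashMap<String, TreeSet<String>>();

        void add(String fileName, String text) {
            // Lower-case the text and split on anything that is not a letter or digit.
            for (String term : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
                if (term.isEmpty()) {
                    continue;
                }
                TreeSet<String> files = postings.get(term);
                if (files == null) {
                    files = new TreeSet<String>();
                    postings.put(term, files);
                }
                files.add(fileName);
            }
        }

        /** Returns the names of the files containing the term, sorted by name. */
        List<String> search(String term) {
            TreeSet<String> files = postings.get(term.toLowerCase(Locale.ROOT));
            return files == null ? new ArrayList<String>() : new ArrayList<String>(files);
        }
    }

    public static void main(String[] args) {
        TextExtractor extractor = new PlainTextExtractor();
        TinyIndex index = new TinyIndex();
        byte[] raw = "Steel delivery schedule".getBytes(StandardCharsets.UTF_8);
        index.add("schedule.txt", extractor.extract(raw));
        System.out.println(index.search("steel"));
    }
}
```

The point of the separate extractor interface is that only step 1 changes per file format; the indexing step always receives plain text.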
;)
with regards
karthik
Ian Lea 2012-02-01, 13:51
You could also take a look at Solr. From http://lucene.apache.org/solr/features.html

* Easy ways to pull in data from databases and XML files from local disk and HTTP sources
* Rich Document Parsing and Indexing (PDF, Word, HTML, etc) using Apache Tika

Sounds just what you need.

--
Ian.
Prasad KVSH 2012-02-01, 13:54
It would be great if you could provide some working examples of this. We tried to deploy solr.war but are getting exceptions.

Thanks
Prasad
Erick Erickson 2012-02-01, 13:59
What did you try and what exceptions did you get? You might review:
http://wiki.apache.org/solr/UsingMailingLists

Best
Erick
Prasad KVSH 2012-02-01, 13:51
Hi Karthik,
I appreciate your quick response.
I guess the next question is how to strip the text from PDF/HTML/XML/MS Word/PPT/XLS files, and where the extracted text is stored for indexing.
In which other scenarios (such as adding or deleting files) do we need to run IndexFiles.class again?
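One way to frame the add/delete scenarios: indexing needs to run again for files that are new or whose modification time has changed, and index entries need to be removed for files that no longer exist. The sketch below works that out in plain Java. It assumes a hypothetical map of file path to the modification time recorded at the previous indexing run; `ReindexPlanSketch` and `Plan` are illustrative names and nothing here is Lucene API. With Lucene itself, the resulting lists would typically be fed to `IndexWriter.updateDocument` and `IndexWriter.deleteDocuments`.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeSet;

class ReindexPlanSketch {

    /** Paths that must be (re)indexed and paths that must be removed from the index. */
    static class Plan {
        final TreeSet<String> toIndex = new TreeSet<String>();
        final TreeSet<String> toDelete = new TreeSet<String>();
    }

    /**
     * @param indexed path -> last-modified time recorded at the previous indexing run
     * @param onDisk  path -> last-modified time seen on disk now
     */
    static Plan plan(Map<String, Long> indexed, Map<String, Long> onDisk) {
        Plan p = new Plan();
        for (Map.Entry<String, Long> e : onDisk.entrySet()) {
            Long previous = indexed.get(e.getKey());
            // New file, or modified since it was last indexed.
            if (previous == null || !previous.equals(e.getValue())) {
                p.toIndex.add(e.getKey());
            }
        }
        for (String path : indexed.keySet()) {
            // Indexed earlier but no longer present on disk.
            if (!onDisk.containsKey(path)) {
                p.toDelete.add(path);
            }
        }
        return p;
    }

    public static void main(String[] args) {
        Map<String, Long> indexed = new HashMap<String, Long>();
        indexed.put("a.pdf", 100L);
        indexed.put("b.doc", 100L);
        Map<String, Long> onDisk = new HashMap<String, Long>();
        onDisk.put("a.pdf", 250L); // modified
        onDisk.put("c.txt", 300L); // newly added; b.doc was deleted
        Plan p = plan(indexed, onDisk);
        System.out.println("index: " + p.toIndex + ", delete: " + p.toDelete);
    }
}
```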
Thanks Prasad
Prasad KVSH 2012-02-01, 16:41
Hi,

We have added all the files, including PDF, Word, Excel and TXT files, but the search only finds matches in the text files. How do we strip the text from the other document types (PDF/HTML/XML/MS Word/PPT/XLS)?

Thanks,
Prasad K.V.S.H.
Prasad KVSH 2012-02-01, 16:53
Hi,
Please find our requirement below; this is what we are trying to accomplish.
Our client is looking for an extended search engine that searches for a given text inside documents (PDF, MSG, Excel, XML, Word, TXT, etc.) and returns the list of file names in which the text is found. We can then populate the user interface from the returned list after validating it against the user's access rights. We have one image server that contains a few folders and subfolders, and each folder may have 10,000 files.
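The "validate against user access rights" step described above could look roughly like the following plain-Java sketch. It assumes the search already returned a list of file paths; `AccessChecker` and `visibleHits` are hypothetical names, and the prefix-based rule in `main` is only a placeholder for whatever permission store the application really uses.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class AccessFilterSketch {

    /** Hypothetical access check; a real system would query its own permission store. */
    interface AccessChecker {
        boolean canRead(String user, String filePath);
    }

    /** Keeps only the hits the user may see, preserving the order of the search results. */
    static List<String> visibleHits(List<String> hits, String user, AccessChecker checker) {
        List<String> visible = new ArrayList<String>();
        for (String path : hits) {
            if (checker.canRead(user, path)) {
                visible.add(path);
            }
        }
        return visible;
    }

    public static void main(String[] args) {
        // Placeholder rule: a user may read files below a folder named after them.
        AccessChecker checker = new AccessChecker() {
            public boolean canRead(String user, String filePath) {
                return filePath.startsWith("/images/" + user + "/");
            }
        };
        List<String> hits = Arrays.asList(
                "/images/prasad/report.pdf",
                "/images/other/budget.xls",
                "/images/prasad/notes.txt");
        System.out.println(visibleHits(hits, "prasad", checker));
    }
}
```

Filtering after the search keeps the index independent of the permission model, at the cost of checking each hit individually.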
So far we are searching text in TXT files only, using lucene-3.0.3.
Thanks
Prasad
Sethi, Parampreet 2012-02-01, 16:59
Hi Prasad,

I was looking through the documentation a few days ago and found helpful information in the Lucene FAQ. Here are the links:

http://wiki.apache.org/lucene-java/LuceneFAQ#How_can_I_index_PDF_documents.3F
http://wiki.apache.org/lucene-java/LuceneFAQ#How_can_I_index_file_formats_like_OpenDocument_.28aka_OpenOffice.org.29.2C_RTF.2C_Microsoft_Word.2C_Excel.2C_PowerPoint.2C_Visio.2C_etc.3F

This will be a good starting point for indexing PDF and other files (e.g. you can extract the text from PDF documents using one of the mentioned clients).

-param
Prasad KVSH 2012-02-02, 18:02
Hi everybody,

Will lucene-3.0.3 handle Outlook files, DOCX and XLSX files when searching for text? We have taken IndexFiles.java and SearchFiles.java from the lucene-3.0.3\src folder, and it is working fine for PDF, TXT, DOC, Excel, Java and CSV files.
Thanks Prasad