Burcu Buyukkagnici
2011-11-15, 07:12
Yuval Feinstein
2011-11-15, 07:34
Burcu Buyukkagnici
2011-11-16, 14:38
Lance Norskog
2011-11-17, 04:58
Isabel Drost
2011-11-18, 21:01
-
mahout for enterprise search project
Burcu Buyukkagnici 2011-11-15, 07:12
Hi,
I'm new to this community. I want to use Mahout as a component of an enterprise search project that is currently at the conceptual phase. My business need is to find everything related to a given task and reorganize the output as a new view. The results should be actionable. The system should also integrate with software development tools (Subversion, JIRA, Redmine), SharePoint blogs, wikis, and people (Active Directory). "Everything" means files, tools, and people. Files are mostly text based (Word, PDF, source files); searching audio and video files is a later need.

Where do Mahout, Lucene/Solr, and the UIMA framework fit in the following scenario? And what are the system requirements for setting up a development environment?

X is a new project team member at a software development firm. Her project is mainly a 10-year-old maintenance project; however, customers make small development requests on that platform. Her boss wants her to prepare a software requirements specification (SRS) document for a new request. Since she hasn't prepared an SRS before, she wants to find previously prepared documents and asks her colleagues for a sample. A friend gives her a sample from her local computer, based on a very old version of the SRS. The company has a Windows file server and a new content management system (portal); some projects also use Subversion and wikis to store documents.

1. There should be a platform that can search files in all these environments.
2. The system should understand that an SRS is an outcome of the software requirements engineering or analysis process. It should also understand that "SRS", "software requirements specification" and "functional design description" are similar terms.
3. The company has manuals, templates and process definitions for requirements engineering, and an SRS template that supersedes other versions. When searching, the system should list organizational docs first and then project docs related to SRSes.
4. The project has different SRSes written over 10 years, so the system should list that specific project's SRS templates, indicating version conflicts between the organizational document templates and the project's.
5. The system should list the people who were previously involved in the requirements engineering process, first in that project and then in other projects.
6. The system should have a suggestion mechanism. It should know the domain of the project X is working on and its sub-parts. For example, X is working on an e-commerce project, and the new request is about mobile payments. In the same company, a different project team is working on e-wallet projects for a bank. Based on her profile, the system should be able to suggest people, tools and outcomes from the other project that relate to the payments domain.

Identifying domains and grouping the related docs, tools and people in an existing system is nearly impossible to do manually. I want the system to identify and cluster related things itself, and to learn and improve its results from user feedback. Some people should also give input by classifying concepts for the system. For example: I have organizational assets, namely documents, tools and people. The documents are project docs and organizational docs, and they are related. This can serve as guidance for the system.

I think Carrot2 does something very similar to what I describe, but it has a file limitation. Anyway, I need a roadmap to initiate a project like this. Where should I start?

Thanks,
-
Re: mahout for enterprise search project
Yuval Feinstein 2011-11-15, 07:34
My 2c: Start with getting all the relevant texts into one place, namely a search index. A good prototyping tool would be Solr.

You will need something like ManifoldCF for collecting documents from the various environments:
http://incubator.apache.org/connectors/

Here is Erik Hatcher's "Rapid Prototyping With Solr":
http://www.slideshare.net/erikhatcher/rapid-prototyping-with-solr-4312681

Once you get enough stuff into Solr, you will be able to search it easily. Next, you can start using Mahout:
http://www.lucidimagination.com/blog/2010/03/16/integrating-apache-mahout-with-apache-lucene-and-solr-part-i-of-3/

I would go for an iterative design: first take a small sample of documents from each environment, try the systems out, and then scale.

Good luck,
Yuval
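As a rough illustration of the first step suggested in this reply (getting documents from several sources into one Solr index), here is a minimal Python sketch. It only builds the JSON body that Solr's update handler accepts; the core name `docs`, the local URL, and the field names `source`, `title` and `text` are assumptions made up for this example, not details from the thread.

```python
import json
import urllib.request

# Hypothetical values: the core name "docs" and the field names used below
# are assumptions for illustration only.
SOLR_UPDATE_URL = "http://localhost:8983/solr/docs/update?commit=true"


def build_update_payload(docs):
    """Serialize a list of dicts into a JSON array of documents for Solr."""
    return json.dumps(docs).encode("utf-8")


def post_to_solr(docs, url=SOLR_UPDATE_URL):
    """Send the documents to Solr. Needs a running Solr instance."""
    request = urllib.request.Request(
        url,
        data=build_update_payload(docs),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status


# A small sample drawn from two environments, as in an iterative prototype.
sample_docs = [
    {"id": "svn:/projA/docs/srs_v3.doc", "source": "subversion",
     "title": "SRS v3", "text": "Software requirements specification ..."},
    {"id": "wiki:ReqEngProcess", "source": "wiki",
     "title": "Requirements engineering process", "text": "..."},
]

payload = build_update_payload(sample_docs)
```

Keeping a `source` field on every document is one way to later list organizational docs separately from project docs; in practice a connector framework such as ManifoldCF would do the crawling instead of hand-built dictionaries.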
-
Re: mahout for enterprise search project
Burcu Buyukkagnici 2011-11-16, 14:38
Hi,
Thanks for the resources. They are very helpful for understanding things, especially the blog posts and their links. I haven't read everything yet, so I may have missed the parts about expert finding.

Regarding expert finding: do I need a social engine to create, keep and relate profiles, or do Lucene/Solr or other Apache projects have this kind of functionality? I want people and the organization to be able to identify the experts on a topic, something like maven7:
http://www.maven7.com/index_en.php?page=organizational

Experts can be found through what they produce. For example, from Subversion annotations I can learn who previously worked on a similar subject. I want to see the related developers, test specialists and related bugs. Also, based on code dependencies, I want to identify the people who might be affected by the changes I am making.

I hope I have explained what I'm thinking. So, can profiling experts based mostly on text files and database records be done with Mahout, Lucene, etc.?

Thanks again,
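The idea of finding experts from Subversion annotations can be sketched without any search or machine learning library. The Python snippet below is only an illustration under assumptions: it expects text shaped like `svn blame` output (revision, author, then the line content), and the author names and code lines in the sample are invented.

```python
from collections import Counter


def authors_by_line_count(blame_output):
    """Count lines per author in text shaped like `svn blame` output.

    Assumes each line starts with a revision and an author name separated
    by whitespace, followed by the file content.
    """
    counts = Counter()
    for line in blame_output.splitlines():
        parts = line.split(None, 2)
        if len(parts) < 2 or parts[1] == "-":
            continue  # blank line, or a local change with no author yet
        counts[parts[1]] += 1
    return counts.most_common()


# Made-up sample; the author names are hypothetical.
sample = """\
   101      alice def charge(card, amount):
   101      alice     validate(card)
   187        bob     return gateway.pay(card, amount)
   190      alice
"""

ranking = authors_by_line_count(sample)
```

Per-file counts like these could be stored as extra fields next to each indexed document, so that a topic search also returns candidate people. Line counts are a crude proxy for expertise, so this is a starting point rather than a profiling method.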
-
Re: mahout for enterprise search project
Lance Norskog 2011-11-17, 04:58
This project is mostly a text search project. You can get basic functionality without doing any math of this sort. (The Lucene search algorithms do a simplified and very fast version of one of the recommender algorithms in Mahout.)

Lance Norskog
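To give a feel for what "basic text search" scoring looks like, here is a toy TF-IDF ranking in Python. It is a simplified sketch and not Lucene's actual scoring formula; the document ids and texts are invented for the example.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace; real analyzers do much more."""
    return text.lower().split()


def rank(query, docs):
    """Return ids of matching docs, ordered by a simple TF-IDF score."""
    term_counts = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    n_docs = len(docs)
    scores = {}
    for doc_id, counts in term_counts.items():
        score = 0.0
        for term in tokenize(query):
            # Document frequency: how many docs contain the term at all.
            df = sum(1 for c in term_counts.values() if term in c)
            if df == 0:
                continue
            idf = math.log(1 + n_docs / df)
            score += counts[term] * idf
        scores[doc_id] = score
    matching = [d for d in scores if scores[d] > 0]
    return sorted(matching, key=lambda d: scores[d], reverse=True)


# Hypothetical mini-corpus.
docs = {
    "srs_template": "software requirements specification template",
    "test_plan": "test plan for payment module",
    "srs_projA": "requirements specification for payment requirements",
}

results = rank("requirements specification", docs)
```

Documents that mention the query terms more often rank higher, and documents with no matching term are dropped. A search engine adds analysis, length normalization and an inverted index on top of this idea.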
-
Re: mahout for enterprise search project
Isabel Drost 2011-11-18, 21:01
On 15.11.2011 Burcu Buyukkagnici wrote:
> Where does mahout; Lucene/solr and UIMA framework fit in the following
> scenario?

For more background on how search and machine learning fit together, see also http://www.manning.com/ingersoll/

Also, at the latest ApacheCon NA, Grant provided some ideas and insights on what types of problems can be solved by a search engine alone. Recordings of all talks are online at http://feathercast.org

Isabel