Mahout, mail # user - Persisting trained models in Mahout


Re: Persisting trained models in Mahout
Lance Norskog 2011-12-08, 22:52
It would also be useful to load and cache often-used items and compute
rarely-used items online. The Caching classes are the natural fit for this.
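
As a rough illustration of the load-and-cache idea, here is a minimal sketch using only the JDK. It is not Mahout's caching code: the class name LruComputeCache and the compute function are hypothetical stand-ins for whatever expensive per-item computation is being cached.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/**
 * Bounded LRU cache: frequently requested keys stay in memory,
 * rarely requested keys are evicted and recomputed on demand.
 * Hypothetical helper for illustration; not part of Mahout.
 */
class LruComputeCache<K, V> {
    private final Function<K, V> compute; // the expensive "online" computation
    private final Map<K, V> cache;

    LruComputeCache(final int maxEntries, Function<K, V> compute) {
        this.compute = compute;
        // accessOrder = true keeps entries ordered from least to most recently used
        this.cache = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                // Drop the least recently used entry once the bound is exceeded
                return size() > maxEntries;
            }
        };
    }

    /** Returns the cached value, computing and caching it on a miss. */
    synchronized V get(K key) {
        V value = cache.get(key);
        if (value == null) {
            value = compute.apply(key);
            cache.put(key, value);
        }
        return value;
    }

    synchronized int size() {
        return cache.size();
    }
}
```

With a bound of 2, for example, asking for keys 1, 2, 1, 3 leaves 1 and 3 cached; a later request for 2 is computed again.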

On Thu, Dec 8, 2011 at 9:20 AM, Vinod <[EMAIL PROTECTED]> wrote:

> Sure Suneel. Thanks.
>
> On Thu, Dec 8, 2011 at 8:00 PM, Suneel Marthi <[EMAIL PROTECTED]> wrote:
>
> > Would the ModelSerializer class in Mahout be what you are looking for? I
> > used it to persist trained models for SGD classifiers; you may want to
> > look into it.
> >
> >
> >
> > ________________________________
> >  From: Vinod <[EMAIL PROTECTED]>
> > To: [EMAIL PROTECTED]
> > Sent: Thursday, December 8, 2011 8:46 AM
> > Subject: Re: Persisting trained models in Mahout
> >
> > I'll use the first example from Chapter 2 of your book to clarify what I
> > mean by training:
> >
> > The following code trains the recommender:
> >     DataModel model = new FileDataModel(new File("intro.csv"));
> >
> >     UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
> >     UserNeighborhood neighborhood =
> >       new NearestNUserNeighborhood(2, similarity, model);
> >
> >     Recommender recommender = new GenericUserBasedRecommender(
> >         model, neighborhood, similarity);
> >
> > At this point, recommender is trained on preferences of users 1 to 5 in
> > intro.csv.
> >
> > We should now be able to serialize() this recommender instance into a
> > file, say "Movie Recommender.model", using the steps mentioned here (
> > http://java.sun.com/developer/technicalArticles/Programming/serialization/
> > )
> >
> > All we need to do now is deploy "Movie Recommender.model" to production.
> >
> > If I understand the behavior correctly, this model should now be able to
> > predict recommendations for a new user.
> >
> > As an example, let's assume that production has a different user base. If
> > the recommender instance is loaded from the "Movie Recommender.model" file
> > and asked to provide recommendations for user '7', who has rated 101 and
> > 102 as 4 and 3 respectively, it should be able to predict recommendations
> > for 7, right?
> >
> > regards,
> > Vinod
> >
> >
> >
> >
> > On Thu, Dec 8, 2011 at 6:49 PM, Sean Owen <[EMAIL PROTECTED]> wrote:
> >
> > > Yes, I mean you need to write it and read it in your own code.
> > >
> > > What do you mean by training a model? Computing similarities? I don't
> > > know if there's such a thing here as "training" on one data set and
> > > running on another. The implementations always use all currently
> > > available info. Is this a cold-start issue?
> > >
> > > OutOfMemoryError has nothing to do with this; on such a small data set
> > > it indicates you didn't set your JVM heap size above the default.
> > >
> > >
> > > On Thu, Dec 8, 2011 at 1:02 PM, Vinod <[EMAIL PROTECTED]> wrote:
> > >
> > > > Hi Sean,
> > > >
> > > > Neither Recommender nor any of its parent interfaces extends
> > > > Serializable, so there is no way that I'd be able to serialize it.
> > > >
> > > > I agree that the implementations may not have startup overhead.
> > > > However, training a model on millions of rows is a CPU-, memory- and
> > > > time-consuming activity. For example, when the data set is changed
> > > > from 100K to 1M in chapter 4, the program crashes with an
> > > > OutOfMemoryError after a significant amount of time.
> > > >
> > > > I feel that training should be done in development only. Once a
> > > > developer is OK with the test results, he should be able to save an
> > > > instance of the trained and tested model (for example, a recommender
> > > > or classifier).
> > > >
> > > > Only these saved instances of trained and tested models should be
> > > > deployed to production.
> > > >
> > > > Thoughts?
> > > >
> > > > regards,
> > > > Vinod
> > > >
> > > >
> > > >
> > > > On Thu, Dec 8, 2011 at 6:00 PM, Sean Owen <[EMAIL PROTECTED]> wrote:
> > > >
> > > > > Ah right. No, there's still not a provision for this. You would
> > > > > just have to serialize it yourself if you like.
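
For the "serialize it yourself" route quoted above, one possible shape is to persist the precomputed data rather than the Recommender object, which is not Serializable. The sketch below uses plain Java serialization on a nested map of similarities. The class SimilaritySnapshot and the map layout (ID to neighbor ID to similarity) are assumptions made for illustration; nothing here is a Mahout API.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical helper: writes and reads precomputed similarities
 * (ID -> neighbor ID -> similarity) with standard Java serialization.
 */
final class SimilaritySnapshot {

    private SimilaritySnapshot() {
    }

    /** Writes the map to a file; the inner maps must themselves be Serializable. */
    static void save(Map<Long, Map<Long, Double>> sims, File file) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(
                new BufferedOutputStream(new FileOutputStream(file)))) {
            // Copy the outer map into a HashMap so it is guaranteed to be Serializable
            out.writeObject(new HashMap<>(sims));
        }
    }

    /** Reads back a map previously written by save(). */
    @SuppressWarnings("unchecked")
    static Map<Long, Map<Long, Double>> load(File file)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(
                new BufferedInputStream(new FileInputStream(file)))) {
            return (Map<Long, Map<Long, Double>>) in.readObject();
        }
    }
}
```

A snapshot like this only captures what was computed from the data available when it was saved, so it does not by itself cover users who appear later.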

Lance Norskog
[EMAIL PROTECTED]