Re: RecommenderJob Mahout Creating a data model
Sebastian Schelter 2011-09-14, 14:46
I think we'd best start with you giving us more details about your
use case. How much data do you have? How many users? What kind of domain
does your system live in?
If you answer these questions first, I'm confident we'll figure out the
best way for you to use Mahout.
Mahout's recommender code supports lots of scenarios, ranging from
in-memory recommenders on a single machine for small data to massive
batch recommendation computation on Hadoop for datasets with tens of
millions of interactions.
We'll have to find out how much complexity you really need.
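For the small-data, single-machine path mentioned above, the core idea (estimate preferences from similar users, then rank) can be sketched in plain Java. This is an illustrative sketch of user-based collaborative filtering with cosine similarity, not Mahout's actual API; the class and method names here are hypothetical:

```java
import java.util.*;

// A minimal in-memory user-based recommender sketch: cosine similarity
// between users' rating vectors, similarity-weighted scores for unseen items.
public class TinyRecommender {
    // userId -> (itemId -> rating)
    private final Map<Long, Map<Long, Double>> ratings;

    public TinyRecommender(Map<Long, Map<Long, Double>> ratings) {
        this.ratings = ratings;
    }

    private static double cosine(Map<Long, Double> a, Map<Long, Double> b) {
        double dot = 0, na = 0, nb = 0;
        for (Map.Entry<Long, Double> e : a.entrySet()) {
            Double other = b.get(e.getKey());
            if (other != null) dot += e.getValue() * other;
            na += e.getValue() * e.getValue();
        }
        for (double v : b.values()) nb += v * v;
        return (na == 0 || nb == 0) ? 0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    // Estimate this user's preference for each unseen item, return the top n.
    public List<Long> recommend(long userId, int n) {
        Map<Long, Double> mine = ratings.get(userId);
        if (mine == null) return Collections.emptyList();
        Map<Long, Double> scores = new HashMap<>();
        for (Map.Entry<Long, Map<Long, Double>> e : ratings.entrySet()) {
            if (e.getKey() == userId) continue;
            double sim = cosine(mine, e.getValue());
            if (sim <= 0) continue;
            for (Map.Entry<Long, Double> item : e.getValue().entrySet()) {
                if (!mine.containsKey(item.getKey())) {
                    scores.merge(item.getKey(), sim * item.getValue(), Double::sum);
                }
            }
        }
        List<Long> items = new ArrayList<>(scores.keySet());
        items.sort((x, y) -> Double.compare(scores.get(y), scores.get(x)));
        return items.subList(0, Math.min(n, items.size()));
    }
}
```

For a few hundred or thousand users this whole computation fits comfortably in memory, which is why the single-machine path exists at all.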
On 14.09.2011 16:36, Robert Evans wrote:
> This should probably be directed more toward the Mahout list than the Hadoop Map/Reduce one.
> [EMAIL PROTECTED]
> --Bobby Evans
> On 9/14/11 6:28 AM, "Amit Sangroya" <[EMAIL PROTECTED]> wrote:
> Hi all
> I am trying to run the example from
> with the following command: bin/mahout
> -Dmapred.input.dir=input -Dmapred.output.dir=output --itemsFile itemfile
> --tempDir tempDir
> The algorithm estimates the preference of a user for an item which he/she
> has not yet seen. Once an algorithm can predict preferences, it can also be
> used for Top-N recommendation, where the task is to find the N items a
> given user might like best. It is mentioned that, given a DataModel, it can
> produce recommendations.
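The "predict preferences, then take the Top-N" step the paragraph above describes can be sketched on its own: given per-item preference estimates, keep only the N highest-scored items with a bounded min-heap. This is an illustrative plain-Java sketch, not Mahout code:

```java
import java.util.*;

// Sketch: turning estimated preferences into a Top-N recommendation.
// A min-heap capped at size n is kept while scanning candidates, so only
// the n highest-scored items survive; the result is ordered best-first.
public class TopN {
    public static List<Long> topN(Map<Long, Double> estimates, int n) {
        PriorityQueue<Map.Entry<Long, Double>> heap =
            new PriorityQueue<>(Comparator.comparingDouble(Map.Entry::getValue));
        for (Map.Entry<Long, Double> e : estimates.entrySet()) {
            heap.offer(e);
            if (heap.size() > n) heap.poll(); // drop the current lowest score
        }
        List<Long> result = new ArrayList<>();
        while (!heap.isEmpty()) result.add(heap.poll().getKey());
        Collections.reverse(result); // highest score first
        return result;
    }
}
```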
> The algorithm takes approx. 5 minutes to generate the top 5 recommendations for
> one user on a 10-node Hadoop cluster. The input is reduced to only
> 200 users from the "1 Million MovieLens Dataset" from GroupLens.org.
> I have a few questions:
> 1) Is it possible to isolate the data-model-building
> step from generating recommendations?
> 2) Can we use the model, once generated from the training data, to
> generate recommendations for a range of users?
> 3) To be specific, if I want to provide an online service that generates
> recommendations for users, can I minimize the cost of the MapReduce rounds
> each time?
> I am not a data mining expert. Please help me understand this better.
> Thanks and Regards,
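The separation the questions above ask about (build the model once offline, then serve many users online without re-running MapReduce) can be sketched in plain Java. This illustrates the idea only; it uses a simple item co-occurrence model rather than Mahout's actual classes, and all names here are hypothetical:

```java
import java.util.*;

// Sketch of splitting "build model" from "serve recommendations":
// the expensive item-item co-occurrence counts are computed once (offline,
// e.g. by a batch job, and could be serialized to disk); the online service
// only loads them and answers per-user requests in memory, with no
// MapReduce round trip per request.
public class SplitPhases {

    // Offline phase: count how often two items appear in the same user's history.
    public static Map<Long, Map<Long, Integer>> buildModel(
            Map<Long, Set<Long>> userItems) {
        Map<Long, Map<Long, Integer>> cooc = new HashMap<>();
        for (Set<Long> items : userItems.values()) {
            for (long a : items) {
                for (long b : items) {
                    if (a != b) {
                        cooc.computeIfAbsent(a, k -> new HashMap<>())
                            .merge(b, 1, Integer::sum);
                    }
                }
            }
        }
        return cooc;
    }

    // Online phase: score unseen items by co-occurrence with the user's items.
    public static List<Long> recommend(Map<Long, Map<Long, Integer>> model,
                                       Set<Long> seen, int n) {
        Map<Long, Integer> scores = new HashMap<>();
        for (long item : seen) {
            Map<Long, Integer> row = model.get(item);
            if (row == null) continue;
            for (Map.Entry<Long, Integer> e : row.entrySet()) {
                if (!seen.contains(e.getKey())) {
                    scores.merge(e.getKey(), e.getValue(), Integer::sum);
                }
            }
        }
        List<Long> items = new ArrayList<>(scores.keySet());
        items.sort((x, y) -> Integer.compare(scores.get(y), scores.get(x)));
        return items.subList(0, Math.min(n, items.size()));
    }
}
```

The design point: once the model is a plain data structure, it can be reused for any number of users and refreshed by the batch job on whatever schedule the data warrants, which is what makes an online service with low per-request cost feasible.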