Edward J. Yoon 2012-05-25, 08:35
Manuel Blechschmidt 2012-05-25, 10:22
Thomas Jungblut 2012-05-25, 10:44
Ted Dunning 2012-05-25, 17:20
Thomas Jungblut 2012-05-25, 17:24
Sebastian Schelter 2012-05-25, 19:24
Edward J. Yoon 2012-05-25, 23:31
Edward J. Yoon 2012-05-25, 23:41
Ted Dunning 2012-05-26, 07:54
Edward J. Yoon 2012-05-26, 09:58
Suraj Menon 2012-05-26, 11:22
Ted Dunning 2012-05-26, 21:03
Robin Anil 2012-05-27, 16:11
Suraj Menon 2012-05-28, 11:40
Robin Anil 2012-05-28, 16:08
Sean Owen 2012-05-28, 16:12
Robin Anil 2012-05-28, 16:17
Thomas Jungblut 2012-05-26, 09:26
Sebastian Schelter 2012-05-26, 12:05
Ted Dunning 2012-05-26, 20:55
-
Re: Online machine learning on top of Hama BSP
Edward J. Yoon 2012-05-25, 08:35
CC'ing hama dev.

On Fri, May 25, 2012 at 5:34 PM, Edward J. Yoon <[EMAIL PROTECTED]> wrote:
> Okay, I'm forwarding this to mahout dev.
>
> I'm planning to create a project related to online machine learning
> as an Apache Hama sub-module, since the graph of message queues and
> workers could be implemented using BSP (see also [1]). The first idea
> is an online recommendation system based on click-stream data.
>
> If you are interested in this plan, let's talk together here.
>
> 1. http://codingwiththomas.blogspot.com/2011/10/apache-hama-realtime-processing.html
>
> ---------- Forwarded message ----------
> From: Thomas Jungblut <[EMAIL PROTECTED]>
> Date: Fri, May 25, 2012 at 4:55 PM
> Subject: Re: Online machine learning on top of Hama BSP
> To: [EMAIL PROTECTED]
>
> Should we cooperate with the Mahout guys on this? I'm pretty sure they
> would have fun with it.
> Edward, do you want to ask them?
>
> 2012/5/25 Tommaso Teofili <[EMAIL PROTECTED]>
>
>> Do you have a plan for that, Edward?
>> A separate package in examples or a separate (online) machine learning
>> module? Or something else?
>> Regards
>> Tommaso
>>
>> 2012/5/25 Edward J. Yoon <[EMAIL PROTECTED]>
>>
>> > Okay, then let's get started.
>> >
>> > My first idea is a simple online recommendation system based on
>> > click-stream data.
>> >
>> > On Thu, May 24, 2012 at 6:26 PM, Praveen Sripati
>> > <[EMAIL PROTECTED]> wrote:
>> > > +1
>> > >
>> > > For those who are interested in ML, please check this. GNU Octave
>> > > is used.
>> > >
>> > > https://www.coursera.org/course/ml
>> > >
>> > > Another session is yet to be announced.
>> > >
>> > > Thanks,
>> > > Praveen
>> > >
>> > > On Thu, May 24, 2012 at 12:54 PM, Thomas Jungblut
>> > > <[EMAIL PROTECTED]> wrote:
>> > >
>> > >> +1
>> > >>
>> > >> 2012/5/24 Tommaso Teofili <[EMAIL PROTECTED]>
>> > >>
>> > >> > and same here :)
>> > >> >
>> > >> > 2012/5/24 Vaijanath Rao <[EMAIL PROTECTED]>
>> > >> >
>> > >> > > +1 me too
>> > >> > > On May 23, 2012 10:26 PM, "Aditya Sarawgi"
>> > >> > > <[EMAIL PROTECTED]> wrote:
>> > >> > >
>> > >> > > > +1
>> > >> > > > I would be happy to help :)
>> > >> > > >
>> > >> > > > On Wed, May 23, 2012 at 6:23 PM, Edward J. Yoon
>> > >> > > > <[EMAIL PROTECTED]> wrote:
>> > >> > > >
>> > >> > > > > Hi,
>> > >> > > > >
>> > >> > > > > Is anyone interested in online machine learning?
>> > >> > > > >
>> > >> > > > > --
>> > >> > > > > Best Regards, Edward J. Yoon
>> > >> > > > > @eddieyoon
>> > >> > > >
>> > >> > > > --
>> > >> > > > Cheers,
>> > >> > > > Aditya Sarawgi
>> > >>
>> > >> --
>> > >> Thomas Jungblut
>> > >> Berlin <[EMAIL PROTECTED]>

--
Best Regards, Edward J. Yoon
@eddieyoon
-
Re: Online machine learning on top of Hama BSP
Manuel Blechschmidt 2012-05-25, 10:22
Hi Edward,
do you already have a test dataset?

I might get one with about 300,000 clicks for you.

It is from www.nelou.com and we are already running a recommender in preview mode:
http://www.nelou.com/artikel-803746/Overall-von-mysuro#__apaxoPreviewMode

You might have to sign an NDA. Would this be possible for you?

/Manuel

--
Manuel Blechschmidt
Dortustr. 57
14467 Potsdam
Mobil: 0173/6322621
Twitter: http://twitter.com/Manuel_B
-
Re: Online machine learning on top of Hama BSP
Thomas Jungblut 2012-05-25, 10:44
Hi Manuel,

300k is small; I have one with 6 million clicks.
However, it is more a question of interest and of which algorithms could be suitable for BSP.
In case you wonder what BSP is, it stands for bulk synchronous parallel [1].
We think that real-time and strongly iterative algorithms that are slow in MapReduce could be solved more efficiently with BSP.
If you're interested, let us know.

Regards,
Thomas

[1] http://en.wikipedia.org/wiki/Bulk_synchronous_parallel

--
Thomas Jungblut
Berlin <[EMAIL PROTECTED]>
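For readers who have not met BSP before, the message above can be illustrated with a minimal sketch. It is not Hama code: it only simulates the three phases of a superstep (local computation, message sending, barrier synchronization) with plain Python threads. The function name `bsp_global_sum` and the per-peer inbox lists are assumptions made up for this illustration.

```python
import threading


def bsp_global_sum(partitions):
    """Simulate a tiny BSP job: every peer sums its own partition, sends the
    partial sum to all peers, waits at the barrier, then adds up what it got."""
    n = len(partitions)
    barrier = threading.Barrier(n)            # the bulk synchronization point
    inboxes = [[] for _ in range(n)]          # one message queue per peer
    locks = [threading.Lock() for _ in range(n)]
    results = [None] * n

    def peer(rank):
        # Phase 1: local computation on this peer's data only.
        partial = sum(partitions[rank])
        # Phase 2: communication; messages are queued for every peer.
        for other in range(n):
            with locks[other]:
                inboxes[other].append(partial)
        # Phase 3: barrier; no peer continues until all peers have sent.
        barrier.wait()
        # Next superstep: everything sent before the barrier is now visible.
        results[rank] = sum(inboxes[rank])

    threads = [threading.Thread(target=peer, args=(r,)) for r in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    # Three peers with partial sums 6, 9 and 6; each ends up with 21.
    print(bsp_global_sum([[1, 2, 3], [4, 5], [6]]))
```

An iterative algorithm would simply repeat the three phases in a loop; the argument made in the thread is that doing so inside one long-running job avoids starting a new MapReduce job per iteration.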
-
Re: Online machine learning on top of Hama BSP
Ted Dunning 2012-05-25, 17:20
Apache Giraph probably offers a more mature BSP model of computation. My guess is that it would make a stronger implementation substrate. It certainly has a very strong community.
-
Re: Online machine learning on top of Hama BSP
Thomas Jungblut 2012-05-25, 17:24
Hi Ted,

Giraph offers a graph layer that internally uses BSP on top of MapReduce.
You don't have access to the BSP primitives, so you need to treat every machine learning problem as a graph problem, which may be very inconvenient in many cases.

--
Thomas Jungblut
Berlin <[EMAIL PROTECTED]>
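To make the point about BSP primitives more concrete, here is a hedged sketch of a learning task written directly against send/sync style primitives, with no vertices or edges involved. `ToyBSP`, `send`, `sync` and `train` are hypothetical names for a single-process stand-in; they are not the Hama or Giraph API. Each peer computes a gradient for the model y = w * x on its own partition, sends it to all peers, and after the barrier every peer applies the averaged gradient.

```python
class ToyBSP:
    """Hypothetical single-process stand-in for a BSP runtime."""

    def __init__(self, num_peers):
        self.num_peers = num_peers
        self.outbox = [[] for _ in range(num_peers)]
        self.inbox = [[] for _ in range(num_peers)]

    def send(self, to_peer, message):
        # Messages only become visible to the receiver after sync().
        self.outbox[to_peer].append(message)

    def sync(self):
        # Barrier: deliver everything sent during the finished superstep.
        self.inbox = self.outbox
        self.outbox = [[] for _ in range(self.num_peers)]


def train(partitions, steps=50, lr=0.1):
    """Fit y = w * x by averaging per-peer gradients once per superstep."""
    bsp = ToyBSP(len(partitions))
    weights = [0.0] * len(partitions)         # one model copy per peer
    for _ in range(steps):
        # Local computation and communication phase.
        for rank, data in enumerate(partitions):
            w = weights[rank]
            grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
            for other in range(bsp.num_peers):
                bsp.send(other, grad)
        bsp.sync()
        # After the barrier every peer applies the same averaged gradient.
        for rank in range(bsp.num_peers):
            msgs = bsp.inbox[rank]
            weights[rank] -= lr * sum(msgs) / len(msgs)
    return weights


if __name__ == "__main__":
    # Data generated from y = 3x, split across two peers.
    print(train([[(1.0, 3.0), (2.0, 6.0)], [(3.0, 9.0)]]))
```

In a vertex-centric system the same computation would first have to be recast as vertices exchanging messages along edges, which is the inconvenience the message above refers to.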
-
Re: Online machine learning on top of Hama BSP
Sebastian Schelter 2012-05-25, 19:24
Hi Thomas,

Interesting discussion. Which examples do you have in mind that might be more easily represented in general BSP than in Giraph/Pregel?

To add my two cents: I think the real question is whether BSP itself is the best model for distributed machine learning, or whether an asynchronous model as implemented in GraphLab should be preferred. But that's more of a scientific/esoteric question :)

--sebastian
-
Re: Online machine learning on top of Hama BSP
Edward J. Yoon 2012-05-25, 23:31
Seba,

Hama has a Pregel layer. If you love Pregel, you can use it instead of the basic BSP model.

Ted,

Compared with Hama, what's the advantage of Giraph? probably

--
Best Regards, Edward J. Yoon
@eddieyoon
-
Re: Online machine learning on top of Hama BSP
Edward J. Yoon 2012-05-25, 23:41
> Compared with Hama, what's the advantage of giraph? probably

probably mature implementation? :D

Anyway, what I said was not a discussion of your preferences.

--
Best Regards, Edward J. Yoon
@eddieyoon
-
Re: Online machine learning on top of Hama BSP
Ted Dunning 2012-05-26, 07:54
On Fri, May 25, 2012 at 11:41 PM, Edward J. Yoon <[EMAIL PROTECTED]> wrote:
> > Compared with Hama, what's the advantage of giraph? probably
>
> probably mature implementation? :D

Yes. And very active community. And recent history of rapid development. And easy compatibility with map-reduce programs.
-
Re: Online machine learning on top of Hama BSP
Edward J. Yoon 2012-05-26, 09:58
Pls stop the matador, Ted.

Sent from my iPhone
Edward J. Yoon 2012-05-26, 09:58
-
Re: Online machine learning on top of Hama BSPSuraj Menon 2012-05-26, 11:22
Steering back to relevance, it would be nice to know if there is an expectation on features and benchmarks for any system to be considered as a platform to implement machine learning algorithms on Mahout. This would be good input for the Hama community. Compared to Hadoop/MapReduce, Hama is young and evidently disruptive, even though it is, and intends to remain, as compatible with Hadoop as possible. If you have any input on these matters, it would give our community a good direction for testing Hama.

Thanks,
Suraj

On Sat, May 26, 2012 at 5:58 AM, Edward J. Yoon <[EMAIL PROTECTED]> wrote: > Pls stop the matardor Ted. > > Sent from my iPhone > > On 2012-05-26, at 4:54 PM, Ted Dunning <[EMAIL PROTECTED]> wrote: > >> On Fri, May 25, 2012 at 11:41 PM, Edward J. Yoon <[EMAIL PROTECTED]>wrote: >> >>>> Compared with Hama, what's the advantage of giraph? probably >>> >>> probably mature implementation? :D >>> >> >> Yes. And very active community. And recent history of rapid development. >> And easy compatibility with map-reduce programs. +
Suraj Menon 2012-05-26, 11:22
-
Re: Online machine learning on top of Hama BSPTed Dunning 2012-05-26, 21:03
The key thing to look for is implementation on a platform that is widely
accepted for practical data mining. We have only recently begun considering Pig as an implementation platform after deciding not to use it before. What has changed is the fairly wide adoption of Pig. On Sat, May 26, 2012 at 11:22 AM, Suraj Menon <[EMAIL PROTECTED]> wrote: > Steering back to relevance, it would be nice to know if there is an > expectation on features and benchmarks for any system to be considered > as a platform to implement machine learning algorithms on Mahout. > +
Ted Dunning 2012-05-26, 21:03
-
Re: Online machine learning on top of Hama BSPRobin Anil 2012-05-27, 16:11
I am confused, what is the actual ask from the Hama community to the Mahout community?

Is that
a) Port Mahout algorithms to use BSP?
b) Rewrite Mahout algorithms to use BSP?
c) Argue that Hama is better than Giraph and vice versa?

Because the response will depend on what the actual question is. This thread seems to have lost the intended question.

------
Robin Anil

On Sat, May 26, 2012 at 4:03 PM, Ted Dunning <[EMAIL PROTECTED]> wrote: > The key thing to look for is implementation on a platform that is widely > accepted for practical data mining. > > We have only recently begun considering Pig as an implementation platform > after deciding not to use it before. What has changed is the fairly wide > adoption of Pig. > > On Sat, May 26, 2012 at 11:22 AM, Suraj Menon <[EMAIL PROTECTED]> > wrote: > > > Steering back to relevance, it would be nice to know if there is an > > expectation on features and benchmarks for any system to be considered > > as a platform to implement machine learning algorithms on Mahout. > > > +
Robin Anil 2012-05-27, 16:11
-
Re: Online machine learning on top of Hama BSPSuraj Menon 2012-05-28, 11:40
First of all we would like to mention that the ugly side in this thread was totally not intended. From the options you gave, (c) would be a waste of time.

The original intention of this thread was to politely check with the Mahout community whether it would consider a programming model other than Map-Reduce for implementing machine learning algorithms. My previous mail was to check if there is any specific feature set (e.g. fault-tolerance, proven scalability, etc.) that is required before the Mahout community would consider a new model.

But we do understand now that adoption of a new model could be based on the popularity of the system among ML programmers, which in turn builds a strong community for that project.

Thanks,
Suraj

On Sun, May 27, 2012 at 12:11 PM, Robin Anil <[EMAIL PROTECTED]> wrote: > I am confused, what is the actual ask from the Hama community to Mahout > community? > > Is that > a) Port Mahout algorithms to use BSP? > b) Rewrite Mahout algorithms to use BSP? > c) Argue that Hama is better than Giraph and vice versa? > > Because the response will depend on what the actual question is? This > thread seems to have lost the intended question. > > > ------ > Robin Anil > > > On Sat, May 26, 2012 at 4:03 PM, Ted Dunning <[EMAIL PROTECTED]> wrote: > >> The key thing to look for is implementation on a platform that is widely >> accepted for practical data mining. >> >> We have only recently begun considering Pig as an implementation platform >> after deciding not to use it before. What has changed is the fairly wide >> adoption of Pig. >> >> On Sat, May 26, 2012 at 11:22 AM, Suraj Menon <[EMAIL PROTECTED]> >> wrote: >> >> > Steering back to relevance, it would be nice to know if there is an >> > expectation on features and benchmarks for any system to be considered >> > as a platform to implement machine learning algorithms on Mahout. >> > >> +
Suraj Menon 2012-05-28, 11:40
-
Re: Online machine learning on top of Hama BSPRobin Anil 2012-05-28, 16:08
OK. So say Mahout moves to using BSP. There are obviously the risks you mentioned.

If possible, we need to abstract out the underlying execution, so an iterative algorithm should be written against a wrapper library that hides Giraph, BSP and MapReduce. That is something I think will be attractive to the Mahout community, because the risks would no longer be there. We would implement any algorithm without betting on the future of any execution model. And it will serve as a place where providers of each execution model will strive to improve, benchmarking against a common platform.

Is this something the BSP devs would be willing to push? The way I see it, things are stacked in favour of Hadoop MapReduce, and a common execution library would help BSP persuade people to move away from MapReduce without the risk.

Robin

On May 28, 2012 6:41 AM, "Suraj Menon" <[EMAIL PROTECTED]> wrote: > First of all we would like to mention that the ugly side in this > thread was totally not intended. > From the options you gave, (c) would be a waste of time. > > The original intention of this thread was to politely check with > Mahout community, if it would consider another programming model than > Map-Reduce to implement machine learning algorithms. My previous mail > was to check if there is any specific feature set (e.g. > fault-tolerance, proven scalability, etc.) that is required before > Mahout community would consider a new model. > > But, we do understand now that adoption of a new model could be based > on popularity of the system among ML programmers which in turn builds > a strong community for that project. > > Thanks, > Suraj > > On Sun, May 27, 2012 at 12:11 PM, Robin Anil <[EMAIL PROTECTED]> wrote: > > I am confused, what is the actual ask from the Hama community to Mahout > > community? > > > > Is that > > a) Port Mahout algorithms to use BSP? > > b) Rewrite Mahout algorithms to use BSP? > > c) Argue that Hama is better than Giraph and vice versa? > > > > Because the response will depend on what the actual question is? 
This > > thread seems to have lost the intended question. > > > > > > ------ > > Robin Anil > > > > > > On Sat, May 26, 2012 at 4:03 PM, Ted Dunning <[EMAIL PROTECTED]> > wrote: > > > >> The key thing to look for is implementation on a platform that is widely > >> accepted for practical data mining. > >> > >> We have only recently begun considering Pig as an implementation > platform > >> after deciding not to use it before. What has changed is the fairly > wide > >> adoption of Pig. > >> > >> On Sat, May 26, 2012 at 11:22 AM, Suraj Menon <[EMAIL PROTECTED]> > >> wrote: > >> > >> > Steering back to relevance, it would be nice to know if there is an > >> > expectation on features and benchmarks for any system to be considered > >> > as a platform to implement machine learning algorithms on Mahout. > >> > > >> > +
Robin Anil 2012-05-28, 16:08
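A minimal sketch of the wrapper-library idea in Robin's message above, under stated assumptions: `ExecutionEngine`, `LocalEngine`, `aggregate` and `fit_mean` are hypothetical names invented for illustration and are not part of Mahout, Hama or Giraph. The point is only that the iterative algorithm talks to an abstract engine, so a MapReduce, BSP or Giraph backend could in principle be swapped in behind the same interface.

```python
from abc import ABC, abstractmethod
from functools import reduce


class ExecutionEngine(ABC):
    """Hypothetical interface hiding the execution model (MapReduce, BSP, ...)."""

    @abstractmethod
    def aggregate(self, partitions, local_fn, combine_fn):
        """Run local_fn on every partition and merge the results with combine_fn."""


class LocalEngine(ExecutionEngine):
    """In-process stand-in; a real backend would submit a cluster job here."""

    def aggregate(self, partitions, local_fn, combine_fn):
        return reduce(combine_fn, (local_fn(p) for p in partitions))


def fit_mean(engine, partitions, iterations=50, learning_rate=0.5):
    """Iterative algorithm written only against ExecutionEngine.

    Gradient descent on the squared error, which converges to the mean.
    """
    estimate = 0.0
    for _ in range(iterations):
        def local_gradient(points, m=estimate):
            # Partial gradient and point count for one partition.
            return (sum(m - x for x in points), len(points))

        grad_sum, count = engine.aggregate(
            partitions,
            local_gradient,
            lambda a, b: (a[0] + b[0], a[1] + b[1]),
        )
        estimate -= learning_rate * grad_sum / count
    return estimate


if __name__ == "__main__":
    data = [[1.0, 2.0], [3.0], [4.0, 5.0]]
    print(fit_mean(LocalEngine(), data))
```

Benchmarking the same `fit_mean` against different engine implementations is the kind of common comparison point the message describes; whether one interface can serve such different execution models efficiently is left open here.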
-
Re: Online machine learning on top of Hama BSPSean Owen 2012-05-28, 16:12
Personally -- note, personally -- I think that's a whole other project. I doubt Mahout will ever be anything but Hadoop-based, plus some sequential / pure Java bits. Or, put another way: that's way too much scope, to span a third (fourth?) computation model, in a project already sprawling.

I think this certainly could, and should, just be another project: BSP-based or graph-based ML algorithms. No reason it can't be done by the same or similar people, or reuse code, etc. It's a good idea. I don't see a reason such a thing has to intersect with Mahout directly.

Sean

On Mon, May 28, 2012 at 5:08 PM, Robin Anil <[EMAIL PROTECTED]> wrote: > OK. So say mahout moves to using bsp. There are obviously risks you > mentioned. > > if possible we need to be abstracting out the underlying execution. So an > iterative algorithm should be written using a wrapper library that hides > giraph, bsp and map reduce. That's something I think will be attractive to > mahout community, because the risks would no longer be there. We would > implement any algorithm without betting on the future of any execution > model. And it will serve as a place where providers of each execution model > will strive to improve benchmarking against a common platform > > Is this something bsp dev would be willing to push?. Because the way I see > it things are stacked in favour of hadoop map reduce. And a common > execution library will help bsp push people to go away from map reduce > without the risk > > Robin > On May 28, 2012 6:41 AM, "Suraj Menon" <[EMAIL PROTECTED]> wrote: > > > First of all we would like to mention that the ugly side in this > > thread was totally not intended. > > From the options you gave, (c) would be a waste of time. > > > > The original intention of this thread was to politely check with > > Mahout community, if it would consider another programming model than > > Map-Reduce to implement machine learning algorithms. My previous mail > > was to check if there is any specific feature set (e.g. > > fault-tolerance, proven scalability, etc.) 
that is required before > > Mahout community would consider a new model. > > > > But, we do understand now that adoption of a new model could be based > > on popularity of the system among ML programmers which in turn builds > > a strong community for that project. > > > > Thanks, > > Suraj > > > > On Sun, May 27, 2012 at 12:11 PM, Robin Anil <[EMAIL PROTECTED]> > wrote: > > > I am confused, what is the actual ask from the Hama community to Mahout > > > community? > > > > > > Is that > > > a) Port Mahout algorithms to use BSP? > > > b) Rewrite Mahout algorithms to use BSP? > > > c) Argue that Hama is better than Giraph and vice versa? > > > > > > Because the response will depend on what the actual question is? This > > > thread seems to have lost the intended question. > > > > > > > > > ------ > > > Robin Anil > > > > > > > > > On Sat, May 26, 2012 at 4:03 PM, Ted Dunning <[EMAIL PROTECTED]> > > wrote: > > > > > >> The key thing to look for is implementation on a platform that is > widely > > >> accepted for practical data mining. > > >> > > >> We have only recently begun considering Pig as an implementation > > platform > > >> after deciding not to use it before. What has changed is the fairly > > wide > > >> adoption of Pig. > > >> > > >> On Sat, May 26, 2012 at 11:22 AM, Suraj Menon <[EMAIL PROTECTED]> > > >> wrote: > > >> > > >> > Steering back to relevance, it would be nice to know if there is an > > >> > expectation on features and benchmarks for any system to be > considered > > >> > as a platform to implement machine learning algorithms on Mahout. > > >> > > > >> > > > +
Sean Owen 2012-05-28, 16:12
-
Re: Online machine learning on top of Hama BSPRobin Anil 2012-05-28, 16:17
It doesn't come into Mahout's goals in any way at all. All I am saying is
such a library could reduce the risk of mahout moving to bsp or any other platform. And it is something non-map-reduce devs should try to push if they want ease of adoption. On May 28, 2012 11:12 AM, "Sean Owen" <[EMAIL PROTECTED]> wrote: > Personally -- note, personally -- I think that's a whole other project. I > doubt Mahout will ever be anything but Hadoop-based, plus some sequential / > pure Java bits. Or, put another way: that's way too much scope, to span a > third (fourth?) computation model, in a project already sprawling. > > I think this is certainly could, should, just be another project. BSP-based > or graph-based ML algorithms. No reason it can't be done by same or similar > people or reuse code, etc. It's a good idea. I don't see a reason such a > thing has to intersect with Mahout directly. > > Sean > > On Mon, May 28, 2012 at 5:08 PM, Robin Anil <[EMAIL PROTECTED]> wrote: > > > OK. So say mahout moves to using bsp. There are obviously risks you > > mentioned. > > > > if possible we need to be abstracting out the underlying execution. So an > > iterative algorithm should be written using a wrapper library that hides > > giraph, bsp and map reduce. That's something I think will be attractive > to > > mahout community, because the risks would no longer be there. We would > > implement any algorithm without betting on the future of any execution > > model. And it will serve as a place where providers of each execution > model > > will strive to improve benchmarking against a common platform > > > > Is this something bsp dev would be willing to push?. Because the way I > see > > it things are stacked in favour of hadoop map reduce. And a common > > execution library will help bsp push people to go away from map reduce > > without the risk > > > > Robin > > On May 28, 2012 6:41 AM, "Suraj Menon" <[EMAIL PROTECTED]> wrote: > > > > > First of all we would like to mention that the ugly side in this > > > thread was totally not intended. 
> > > From the options you gave, (c) would be a waste of time. > > > > > > The original intention of this thread was to politely check with > > > Mahout community, if it would consider another programming model than > > > Map-Reduce to implement machine learning algorithms. My previous mail > > > was to check if there is any specific feature set (e.g. > > > fault-tolerance, proven scalability, etc.) that is required before > > > Mahout community would consider a new model. > > > > > > But, we do understand now that adoption of a new model could be based > > > on popularity of the system among ML programmers which in turn builds > > > a strong community for that project. > > > > > > Thanks, > > > Suraj > > > > > > On Sun, May 27, 2012 at 12:11 PM, Robin Anil <[EMAIL PROTECTED]> > > wrote: > > > > I am confused, what is the actual ask from the Hama community to > Mahout > > > > community? > > > > > > > > Is that > > > > a) Port Mahout algorithms to use BSP? > > > > b) Rewrite Mahout algorithms to use BSP? > > > > c) Argue that Hama is better than Giraph and vice versa? > > > > > > > > Because the response will depend on what the actual question is? This > > > > thread seems to have lost the intended question. > > > > > > > > > > > > ------ > > > > Robin Anil > > > > > > > > > > > > On Sat, May 26, 2012 at 4:03 PM, Ted Dunning <[EMAIL PROTECTED]> > > > wrote: > > > > > > > >> The key thing to look for is implementation on a platform that is > > widely > > > >> accepted for practical data mining. > > > >> > > > >> We have only recently begun considering Pig as an implementation > > > platform > > > >> after deciding not to use it before. What has changed is the fairly > > > wide > > > >> adoption of Pig. > > > >> > > > >> On Sat, May 26, 2012 at 11:22 AM, Suraj Menon < > [EMAIL PROTECTED]> > > > >> wrote: > > > >> > > > >> > Steering back to relevance, it would be nice to know if there is > an > > > +
Robin Anil 2012-05-28, 16:17
-
Re: Online machine learning on top of Hama BSPThomas Jungblut 2012-05-26, 09:26
Hi Ted,

please keep this factual, we are not here to start a flame war. But to correct you, if you take a closer look at the mailing list statistics [1]:
hama-commits: 1.51 mails per day (AVG)
As opposed to giraph:
giraph-commits: 0.68 mails per day (AVG)
So our development is faster than Giraph's.
Also we work on top of HDFS, so you can combine MapReduce jobs with BSP jobs easily. We are just not running inside of MapReduce, which will cease to matter anyway once YARN has a stable release. Currently Hama can operate on YARN with its own ApplicationMaster, whereas Giraph still needs to run on top of MapReduce.

Now to you Sebastian,

> Interesting discussion, which examples do you have in mind that might be
> easier representable in general BSP than in Giraph/Pregel?

Straightforward translations from MPI, for example. One of us is currently working on an SVM implementation in BSP, which originally was based on MPI [2]. We would love to have this contributed to Mahout, but if Ted is not interested in Hama we will put this in our modules. Also there are graph problems that need major supervision, like Top-K Shortest Paths, which cannot be easily expressed with aggregators.

We have benchmarks showing the scalability and maturity of Hama [3] and would be glad to roll out to several other Apache projects. BTW it would be cool if we could compare the performance of your k-means in MapReduce with that of our BSP version; you can see the benchmark in [3] as well.

Actually that was not why we are here, we wanted to hear some general interest in real-time recommendation with Hama since all the ML guys are here. Even if Ted is a fanboy of giraph ;)

Regards from Berlin,
Thomas

[1] http://pulse.apache.org/#incubator.apache.org
[2] http://code.google.com/p/psvm/
[3] http://wiki.apache.org/hama/Benchmarks

2012/5/26 Ted Dunning <[EMAIL PROTECTED]> > On Fri, May 25, 2012 at 11:41 PM, Edward J. Yoon <[EMAIL PROTECTED] > >wrote: > > > > Compared with Hama, what's the advantage of giraph? 
probably > > > > probably mature implementation? :D > > > > Yes. And very active community. And recent history of rapid development. > And easy compatibility with map-reduce programs. > -- Thomas Jungblut Berlin <[EMAIL PROTECTED]> +
Thomas Jungblut 2012-05-26, 09:26
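For readers unfamiliar with the model behind the "translations from MPI" remark above, here is a small single-process simulation of BSP supersteps. It is a sketch under stated assumptions: it does not use the Hama API, and `run_bsp_max` and the ring topology are made up for illustration. Each superstep consists of local computation, message sending, and a barrier after which the messages become visible.

```python
def run_bsp_max(peer_values, supersteps):
    """Simulate BSP peers on a ring that agree on the global maximum."""
    n = len(peer_values)
    state = list(peer_values)
    inboxes = [[] for _ in range(n)]
    for _ in range(supersteps):
        outboxes = [[] for _ in range(n)]
        for peer in range(n):
            # Local computation: fold in messages delivered at the last barrier.
            state[peer] = max([state[peer]] + inboxes[peer])
            # Communication: send the current value to the right-hand neighbour.
            outboxes[(peer + 1) % n].append(state[peer])
        # Barrier synchronisation: messages become visible only now.
        inboxes = outboxes
    return state


if __name__ == "__main__":
    print(run_bsp_max([3, 9, 1, 4], supersteps=4))
```

With n peers on a ring, n supersteps are enough for every peer to have seen every value, since each superstep extends a peer's knowledge by one more neighbour.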
-
Re: Online machine learning on top of Hama BSPSebastian Schelter 2012-05-26, 12:05
Hi Thomas,
I think that none of us wants to start a flame war here. As a disclaimer I have to remark that I'm biased towards Giraph as well because besides my engagement at Mahout, I'm committer and PMC member of Giraph. Regarding commit statistics: a single commit can correct a comment or rewrite a whole layer of an application, so looking at the raw number of commits is useless. In my personal opinion, Mahout will have to move away from Hadoop/MapReduce for a lot of problems. The question which alternative execution model to integrate is a hard one, as well as deciding when this should happen. The answer to that question will determine the future of Mahout, and a discussion about this should be unagitated. I think the real question is whether BSP itself is the optimal execution model (regardless of the flavor of implementation) or whether Mahout should better wait for a viable implementation of an asynchronous execution model similar to what is implemented in GraphLab. --sebastian On 26.05.2012 11:26, Thomas Jungblut wrote: > Hi Ted, > > please keep this factual, we are not here to start a flame war. > But to correct you, if you take a closter look at the mailing list > statistics [1]: > hama-commits: 1.51 mails per day (AVG) > Opposed to giraph: > giraph-commits: 0.68 mails per day (AVG) > So we have a more faster development than giraph. > Also we work on top of HDFS, so you can combine mapreduce jobs with BSP > jobs easily. > We are just not running inside of MapReduce, these things will neglect > anyways when YARN has a stable release. > Currently Hama can operate on YARN with it's on ApplicationMaster whereas > Giraph still needs to be on top of MapReduce. > > Now to you Sebastian, > >> Interesting discussion, which examples do you have in mind that might be >> easier representable in general BSP than in Giraph/Pregel? > > > straight forward translations from MPI for example. 
Someone of us is > currently working on a SVM implementation in BSP, which originally was > based on MPI.[2] > We would love to have this contributed to mahout, but if Ted is not > interested in Hama we will put this in our modules. > Also there are graph problems that need major supervision like Top-K > Shortest Paths, which cannot be easily expressed with aggregators. > > We have benchmarks showing the scalability and maturity of Hama [3] and > would be glad to roll out to several other Apache projects. > BTW it would be cool if we could compare the performance of your k-means in > MapReduce with that of our BSP version, you see the benchmark in [3] as > well. > > Actually that was not why were are here, we wanted to hear some general > interest in real-time recommendation with Hama since all the ML guys are > here. Even if Ted is a fanboy of giraph ;) > > Regards from Berlin, > Thomas > > [1] http://pulse.apache.org/#incubator.apache.org > [2] http://code.google.com/p/psvm/ > [3] http://wiki.apache.org/hama/Benchmarks > > > 2012/5/26 Ted Dunning <[EMAIL PROTECTED]> > >> On Fri, May 25, 2012 at 11:41 PM, Edward J. Yoon <[EMAIL PROTECTED] >>> wrote: >> >>>> Compared with Hama, what's the advantage of giraph? probably >>> >>> probably mature implementation? :D >>> >> >> Yes. And very active community. And recent history of rapid development. >> And easy compatibility with map-reduce programs. >> > > > +
Sebastian Schelter 2012-05-26, 12:05
-
Re: Online machine learning on top of Hama BSPTed Dunning 2012-05-26, 20:55
These speeds are not far from what the new streaming k-means achieves
except that instead of 16 nodes it reaches those speeds (1 million points in 20 seconds at 10 dimensions) on a single node. This is with a trivially parallel algorithm with no need for iteration. Running this under Hadoop would incur the normal startup costs (10-20 seconds with MapR), but otherwise should run at the same speed adjusted for node count. See https://github.com/tdunning/knn/tree/master/docs for more info on this clustering algorithm. On Sat, May 26, 2012 at 9:26 AM, Thomas Jungblut < [EMAIL PROTECTED]> wrote: > We have benchmarks showing the scalability and maturity of Hama [3] and > would be glad to roll out to several other Apache projects. > BTW it would be cool if we could compare the performance of your k-means in > MapReduce with that of our BSP version, you see the benchmark in [3] as > well. > > Actually that was not why were are here, we wanted to hear some general > interest in real-time recommendation with Hama since all the ML guys are > here. Even if Ted is a fanboy of giraph ;) > > Regards from Berlin, > Thomas > > [1] http://pulse.apache.org/#incubator.apache.org > [2] http://code.google.com/p/psvm/ > [3] http://wiki.apache.org/hama/Benchmarks > +
Ted Dunning 2012-05-26, 20:55
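To illustrate what "no need for iteration" can look like, here is a simplified single-pass, threshold-based clustering sketch. It is not the streaming k-means from the repository Ted links to; the function name `single_pass_cluster` and the fixed `threshold` parameter are assumptions made for this example. Every point is seen exactly once and is either merged into its nearest centroid or starts a new one.

```python
import math


def single_pass_cluster(points, threshold):
    """Cluster points in one pass; returns a list of [centroid, weight] pairs."""
    centroids = []
    for p in points:
        # Find the nearest existing centroid, if any.
        best, best_d = None, math.inf
        for c in centroids:
            d = math.dist(c[0], p)
            if d < best_d:
                best, best_d = c, d
        if best is None or best_d > threshold:
            # Too far from everything seen so far: start a new cluster.
            centroids.append([list(p), 1])
        else:
            # Merge: move the centroid to the weighted mean including p.
            w = best[1]
            best[0] = [(ci * w + pi) / (w + 1) for ci, pi in zip(best[0], p)]
            best[1] = w + 1
    return centroids


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.1, 0.0), (10.0, 10.0), (10.1, 10.0)]
    for centroid, weight in single_pass_cluster(pts, threshold=1.0):
        print(centroid, weight)
```

Because each input split can be processed independently and the resulting weighted centroids can be clustered again, a scheme of this shape parallelises without iterating over the data, which is the property the message emphasises.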
|