-
Not all Mapper/Reducer slots are taken when running K-Means cluster
WangRamon, 2012-03-10, 15:38
Hi all,

I submitted 5 K-Means jobs simultaneously. My Hadoop cluster has 42 map and 42 reduce slots configured, and I set the default number of reduce tasks per job to 73 (42 * 1.75). Although each K-Means job creates 73 reduce tasks and I do have 42 reduce slots, only about 12 reduce tasks are running at any time, which means about 30 reduce slots are always free.

So I tried Mahout's RecommenderJob again, since I remembered it used all my slots in a previous test, and yes, this time "RowSimilarityJob-CooccurrencesMapper-Reducer" does use all 42 reduce and 42 map slots. So I'm wondering: is something configured in Mahout that causes this strange behavior? Any suggestions?

Thanks in advance. By the way, I'm using the mahout-0.6 release.

Cheers,
Ramon
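The 73 above follows the rule of thumb in the Hadoop MapReduce tutorial: set the number of reduces to 0.95 or 1.75 times (number of nodes * mapred.tasktracker.reduce.tasks.maximum). A small sketch of that arithmetic and of the idle-slot count described in the question; the function name is illustrative only, not part of Hadoop or Mahout.

```python
def suggested_reduce_tasks(total_reduce_slots: int, factor: float) -> int:
    """Hadoop tutorial rule of thumb: factor * (nodes * reduce slots per node).

    0.95 lets every reduce start in the first wave; 1.75 makes faster nodes
    run a second wave, which balances load better.
    """
    return int(total_reduce_slots * factor)  # truncates, e.g. 73.5 -> 73


total_reduce_slots = 42   # from the question
running_reduces = 12      # observed concurrency

print(suggested_reduce_tasks(total_reduce_slots, 0.95))  # 39
print(suggested_reduce_tasks(total_reduce_slots, 1.75))  # 73
print(total_reduce_slots - running_reduces)              # 30 idle reduce slots
```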
-
Re: Not all Mapper/Reducer slots are taken when running K-Means cluster
Paritosh Ranjan, 2012-03-11, 03:40
Can you run the K-Means jobs again (all with the same block size) and give the same statistics for:
a) only 1 job running
b) 2 jobs running simultaneously
c) 5 jobs running simultaneously
-
Re: Not all Mapper/Reducer slots are taken when running K-Means cluster
Paritosh Ranjan, 2012-03-11, 03:44
And to answer the question about the K-Means configuration:
K-Means has two jobs:
1) buildClusters: has a reducer, with no limit on the number of reduce tasks
2) clusterData: executes if runClustering = true, and has no reduce tasks
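To make the two-job split concrete, here is a toy, in-memory model of one K-Means iteration written as map, combine, shuffle, and reduce, followed by a map-only assignment step. This is not Mahout's code: the function names (build_clusters_iteration, cluster_data, and so on) are hypothetical, and points are plain tuples rather than Mahout vectors. It only shows why buildClusters needs a reducer (to average partial sums into new centers) while clusterData does not.

```python
import math
from collections import defaultdict


def nearest(point, centers):
    """Index of the closest center (Euclidean distance)."""
    return min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))


def map_phase(split, centers):
    """Mapper: emit (cluster_id, (vector_sum, count)) for every point."""
    for p in split:
        yield nearest(p, centers), (list(p), 1)


def combine(pairs):
    """Combiner: sum partials per cluster, so a mapper emits at most k records."""
    acc = {}
    for cid, (vec, n) in pairs:
        if cid in acc:
            s, m = acc[cid]
            acc[cid] = ([a + b for a, b in zip(s, vec)], m + n)
        else:
            acc[cid] = (vec, n)
    return acc.items()


def reduce_phase(grouped):
    """Reducer: total the partials for one cluster and divide to get the new center."""
    new_centers = {}
    for cid, partials in grouped.items():
        total = sum(n for _, n in partials)
        sums = [sum(col) for col in zip(*(vec for vec, _ in partials))]
        new_centers[cid] = [x / total for x in sums]
    return new_centers


def build_clusters_iteration(splits, centers):
    """One iteration: map + combine per split, shuffle by cluster id, reduce."""
    grouped = defaultdict(list)
    for split in splits:
        for cid, partial in combine(map_phase(split, centers)):
            grouped[cid].append(partial)
    return reduce_phase(grouped)


def cluster_data(points, centers):
    """Map-only step: label each point with its nearest center; no reducer."""
    return [(nearest(p, centers), p) for p in points]


splits = [[(0, 0), (0, 1)], [(10, 10), (10, 11)]]
print(build_clusters_iteration(splits, [(0, 0), (10, 10)]))
# {0: [0.0, 0.5], 1: [10.0, 10.5]}
```

Note that the shuffle key is the cluster id, so the reduce side only ever sees as many distinct keys as there are clusters.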
-
RE: Not all Mapper/Reducer slots are taken when running K-Means cluster
WangRamon, 2012-03-11, 03:55
Hi Paritosh,

I ran the tests with 1 job and with 5 jobs, and both have the same problem. The job I'm running is buildClusters. The monitoring GUI shows 73 reduce tasks created, but only 12 of them are running at any time (the rest are pending). The tasks finish very quickly: each reduce task takes no more than 18 seconds. Maybe that's the cause?

Thanks.
Cheers,
Ramon
-
Re: Not all Mapper/Reducer slots are taken when running K-Means cluster
Jeff Eastman, 2012-03-11, 05:31
What's your Hadoop config in terms of the maximum number of reducers? It's a function of the available RAM on each node and the number of nodes.
-
RE: Not all Mapper/Reducer slots are taken when running K-Means cluster
WangRamon, 2012-03-11, 05:34
Here is the configuration:

<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>14</value>
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>14</value>
</property>
<property>
  <name>mapred.reduce.tasks</name>
  <value>73</value>
</property>

Each node has 32 GB of RAM, so I think the above configuration should be fine.
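Jeff's point can be checked with simple arithmetic. Cluster-wide slots are the per-TaskTracker maximum times the number of TaskTrackers (42 / 14 implies three nodes, assuming every node uses the same config), and each occupied slot runs a child JVM whose heap is set through mapred.child.java.opts. The 1024 MB heap below is a hypothetical value, not something stated in the thread, and the helper names are illustrative.

```python
def cluster_slots(num_trackers: int, per_tracker_max: int) -> int:
    """Total slots of one kind = TaskTrackers * mapred.tasktracker.*.tasks.maximum."""
    return num_trackers * per_tracker_max


def worst_case_child_heap_mb(map_slots: int, reduce_slots: int, child_heap_mb: int) -> int:
    """Heap needed on one node if every map and reduce slot runs a child JVM at full -Xmx."""
    return (map_slots + reduce_slots) * child_heap_mb


per_tracker = 14                      # from the config above
trackers = 42 // per_tracker          # inferred: 3 nodes, assuming identical configs
print(cluster_slots(trackers, per_tracker))   # 42

node_ram_mb = 32 * 1024
child_heap_mb = 1024                  # hypothetical -Xmx1024m in mapred.child.java.opts
needed = worst_case_child_heap_mb(14, 14, child_heap_mb)
print(needed, node_ram_mb - needed)   # 28672 4096
```

Under that assumption the remaining headroom still has to cover the DataNode and TaskTracker daemons, the OS, and JVM overhead beyond the heap, so the heap setting matters as much as the slot counts.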
-
Re: Not all Mapper/Reducer slots are taken when running K-Means cluster
Paritosh Ranjan, 2012-03-11, 07:48
Can you try reducing/increasing your block size and see the impact? I suspect the block size is the problem.

I faced the same problem once (for a different Hadoop job, and it was very hard to debug). In that case, CompositeInputFormat was being used for the input, which fixed the block size at 64 MB, and hence only a few reducers were activated. So trying different block sizes might give some clue.
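For ordinary FileInputFormat-based input, the HDFS block size shows up mainly in the number of input splits, and therefore map tasks; the reduce count is set separately by mapred.reduce.tasks (73 here). Below is a sketch of the split arithmetic, modeled on the new-API FileInputFormat: split size = max(minSize, min(maxSize, blockSize)), with the last split allowed to run about 10% over. It assumes a single non-empty file; other input formats, such as the CompositeInputFormat mentioned above, compute splits differently. The block size itself is fixed per file when it is written (dfs.block.size in Hadoop 1.x).

```python
SPLIT_SLOP = 1.1  # the final split may be up to 10% larger than the split size


def split_size(block_size, min_size=1, max_size=2**63 - 1):
    """max(minSize, min(maxSize, blockSize)), as in FileInputFormat."""
    return max(min_size, min(max_size, block_size))


def count_splits(file_size, block_size, min_size=1, max_size=2**63 - 1):
    """Number of input splits (map tasks) for one non-empty file."""
    size = split_size(block_size, min_size, max_size)
    remaining = file_size
    splits = 0
    while remaining / size > SPLIT_SLOP:
        splits += 1
        remaining -= size
    if remaining > 0:
        splits += 1  # the tail, possibly slightly oversized
    return splits


MB = 1024 * 1024
for block_mb in (16, 32, 64, 128):
    # hypothetical 1000 MB input file
    print(block_mb, count_splits(1000 * MB, block_mb * MB))
# 16 63
# 32 32
# 64 16
# 128 8
```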
-
RE: Not all Mapper/Reducer slots are taken when running K-Means cluster
WangRamon, 2012-03-11, 11:19
Hi Paritosh,

I think the block size may be the problem too. By the way, do you mean the HDFS block size? I know its default is 64 MB, but I haven't tried any other size.

Thanks,
Ramon
-
Re: Not all Mapper/Reducer slots are taken when running K-Means cluster
Paritosh Ranjan, 2012-03-12, 06:46
Yes, I mean the HDFS block size. Since there is a combiner in the picture in buildClusters, there might not be enough rows for the reduce tasks to process. Just a wild guess. You can also try with larger input data.
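One way to picture the combiner guess: if the shuffle key is the cluster id, each mapper ships roughly one partial record per cluster (assuming the combiner collapses its whole output), and Hadoop's default HashPartitioner sends each key to partition (hash & Integer.MAX_VALUE) % numReduceTasks. The sketch assumes integer cluster ids whose hash equals their value, as with IntWritable; Mahout's actual key type may differ, and k = 20 and 16 mappers are hypothetical numbers, since the thread does not give them.

```python
INT_MAX = 2**31 - 1  # Java Integer.MAX_VALUE


def hash_partition(key_hash: int, num_reduce_tasks: int) -> int:
    """Same formula as Hadoop's HashPartitioner: (hash & MAX_VALUE) % numReduceTasks."""
    return (key_hash & INT_MAX) % num_reduce_tasks


def reducers_with_input(cluster_ids, num_reduce_tasks: int) -> int:
    """How many reduce partitions receive at least one cluster id."""
    return len({hash_partition(cid, num_reduce_tasks) for cid in cluster_ids})


def records_after_combiner(num_mappers: int, k: int) -> int:
    """Rough upper bound on shuffled records: one partial per cluster per mapper."""
    return num_mappers * k


k = 20  # hypothetical number of clusters
print(reducers_with_input(range(k), 73))     # 20 of 73 reducers get data
print(records_after_combiner(16, k))         # 320 records in total
```

With k smaller than 73, most reduce tasks would receive no rows at all, which fits "not enough rows to process", although on its own it does not show why only 12 reduces run concurrently.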