Home | About | Sematext search-lucene.com search-hadoop.com
 Search Lucene and all its subprojects:

Switch to Threaded View
Mahout, mail # user - Decomposing large graphs


Copy link to this message
-
Re: Decomposing large graphs
Sebastian Schelter 2011-12-19, 10:00
Hi Dmitriy,

I finally found the problem... I did not set --reduceTasks, I ran the
job again with 24 reduce tasks and was able to run the whole
decomposition in less than 10 minutes. Sorry for causing you so much work.

--sebastian

hadoop jar mahout-core-0.6-SNAPSHOT-job.jar
org.apache.mahout.math.hadoop.stochasticsvd.SSVDCli
-Dmapred.tasktracker.map.tasks.maximum=8
-Dmapred.tasktracker.reduce.tasks.maximum=6
-Dmapred.child.java.opts=-Xmx1024m --input
hdfs:///ssc/wiki-eig15-tmp/adjacencyMatrix --output hdfs:///ssc/M922-7
--tempDir hdfs:///ssc/M922-tmp-7 --rank 10 --oversampling 5 --computeV
false --powerIter 1 --reduceTasks 24

Q-Job 1mins, 49sec
Bt-Job 2mins, 39sec
ABt-Job 3mins, 51sec
Bt-Job 2mins, 41sec
U-Job 29sec

On 19.12.2011 09:09, Dmitriy Lyubimov wrote:
> Ok reducer then, hm. Then it's probably not the memory, for the
> pressure holds high mostly in the mapper only. Reducer actually
> doesn't use much memory at all. The problem i previously had was with
> memory pressure and load time in the AB' mappers but reducers were
> always ok, copy+sort+run has always taken about  a minute there for
> this input, nothing changed much there still. --> just in case, please
> check how many records map phase outputs. If it is more than 1 per
> mapper, please note by how much and increase -abth option to say
> 300,000 or until you don't see the multiple records output by map
> tasks. it should be ok, it is still roughly 24mb for everything for
> k+p=15 of your case ( (k+p) x abth x 8bytes packed as long double[]
> swathes, so it doesn't take any object overhead so much). Although
> with 922 patch it should not matter much, that was the point.
>
> If you can, please notice if it is a sort, copy or the reducer itself?
> If it is the sort, you may need to tweak sort buffer size, especially
> if you didn't set up too many reducers per job (but i assume you set
> num reducers at at least 20). By default, those buffers are just 100m
> which may turn out below block size, i am not sure how exactly it
> would work out in this case then. i think i bumped that up a little
> some long time ago.
>
>
> What else may be different different? i am at CDH3u0 here. your input
> seems to be 25% wider, which may affect AB' job map time in particular
> by at most that, which in my case would constitute about 15s
> difference, but hopefully not so much anything else.
>
> My setup is actually pretty meager, we have 10 quad-cpu nodes with 16G
> RAM each and they are already running hbase on top of it. which is why
> i mostly avoid running jobs much over 500m mem there, or i start
> scratching something on the disk. So memory doesn't seem to be the
> factor. (But my cluster is a barebone assembly sitting in the same
> rack which is a big plus). I also currently allow 4x4 map/reduce tasks
> per node.
>
> if you can point me where i can download your input exactly (without
> or with minimum prep steps, i think i can't spend much energy on that
> while at the office), i may try to try on your exact input as well.
>
>
> ------
> On a side note i found an error with my work on broadcasting via
> distributed cache. After patching it up, i actually see no difference
> whatsoever between direct streaming and using distributed cache, just
> like i initially figured. My timing for AB' multiplication revolves
> around 3 min 10 s, distributed cache or streaming. Since i did that
> patch anyway, i will be offering it for commit but i will probably
> have to put it on the review board as it touches some iterator stuff
> that Sean previously crafted. the branch name with broadcast option
> enabled is called MAHOUT-922-2.
>
>
>
> On Sun, Dec 18, 2011 at 12:05 AM, Sebastian Schelter <[EMAIL PROTECTED]> wrote:
>> I ran it with 8 map and 4 reduce tasks allowed per machine using
>> -Dmapred.child.java.opts=-Xmx1024m (each machine has 32GB RAM).
>>
>> Most of the time was spent in QRReducer not in ABtMapper. I was promised
>> to get better monitoring next week, I'll rerun the tests and hopefully