Home | About | Sematext search-lucene.com search-hadoop.com
 Search Lucene and all its subprojects:

Switch to Threaded View
Mahout, mail # user - Decomposing large graphs


Copy link to this message
-
Re: Decomposing large graphs
Sebastian Schelter 2011-12-17, 22:58
On 17.12.2011 17:27, Dmitriy Lyubimov wrote:
> Interesting.
>
> Well so how did your decomposing go?

I tested the decomposition of the wikipedia pagelink graph (130M edges,
5.6M vertices making approx. quarter of a billion non-zeros in the
symmetric adjacency matrix) on a 6 machine hadoop cluster.

Got these running times for k = 10, p = 5 and one power-iteration:

Q-job 1mins, 41sec
Bt-job 9mins, 30sec
ABt-job 37mins, 22sec
Bt-job 9mins, 41sec
U-job 30sec

I think I'd need a couple more machines to handle the twitter graph
though...

--sebastian
> On Dec 17, 2011 6:00 AM, "Sebastian Schelter" <[EMAIL PROTECTED]>
> wrote:
>
>> Hi there,
>>
>> I played with Mahout to decompose the adjacency matrices of large graphs
>> lately. I stumbled on a paper of Christos Faloutsos that describes a
>> variation of the Lanczos algorithm they use for this on top of Hadoop.
>> They even explicitly mention Mahout:
>>
>> "Very recently(March 2010), the Mahout project [2] provides
>> SVD on top of HADOOP. Due to insufficient documentation, we were not
>> able to find the input format and run a head-to-head comparison. But,
>> reading the source code, we discovered that Mahout suffers from two
>> major issues: (a) it assumes that the vector (b, with n=O(billion)
>> entries) fits in the memory of a single machine, and (b) it implements
>> the full re-orthogonalization which is inefficient."
>>
>> http://www.cs.cmu.edu/~ukang/papers/HeigenPAKDD2011.pdf
>>
>> --sebastian
>>
>