Home | About | Sematext search-lucene.com search-hadoop.com
 Search Lucene and all its subprojects:

Switch to Plain View
Mahout, mail # user - Re: Decomposing large graphs


+
Dmitriy Lyubimov 2011-12-19, 15:58
+
Sebastian Schelter 2011-12-17, 14:00
+
Ted Dunning 2011-12-17, 20:40
+
Dmitriy Lyubimov 2011-12-17, 16:27
Copy link to this message
-
Re: Decomposing large graphs
Sebastian Schelter 2011-12-17, 22:58
On 17.12.2011 17:27, Dmitriy Lyubimov wrote:
> Interesting.
>
> Well so how did your decomposing go?

I tested the decomposition of the wikipedia pagelink graph (130M edges,
5.6M vertices making approx. quarter of a billion non-zeros in the
symmetric adjacency matrix) on a 6 machine hadoop cluster.

Got these running times for k = 10, p = 5 and one power-iteration:

Q-job 1mins, 41sec
Bt-job 9mins, 30sec
ABt-job 37mins, 22sec
Bt-job 9mins, 41sec
U-job 30sec

I think I'd need a couple more machines to handle the twitter graph
though...

--sebastian
> On Dec 17, 2011 6:00 AM, "Sebastian Schelter" <[EMAIL PROTECTED]>
> wrote:
>
>> Hi there,
>>
>> I played with Mahout to decompose the adjacency matrices of large graphs
>> lately. I stumbled on a paper of Christos Faloutsos that describes a
>> variation of the Lanczos algorithm they use for this on top of Hadoop.
>> They even explicitly mention Mahout:
>>
>> "Very recently(March 2010), the Mahout project [2] provides
>> SVD on top of HADOOP. Due to insufficient documentation, we were not
>> able to find the input format and run a head-to-head comparison. But,
>> reading the source code, we discovered that Mahout suffers from two
>> major issues: (a) it assumes that the vector (b, with n=O(billion)
>> entries) fits in the memory of a single machine, and (b) it implements
>> the full re-orthogonalization which is inefficient."
>>
>> http://www.cs.cmu.edu/~ukang/papers/HeigenPAKDD2011.pdf
>>
>> --sebastian
>>
>
+
Dmitriy Lyubimov 2011-12-18, 03:24
+
Dmitriy Lyubimov 2011-12-18, 03:27
+
Sebastian Schelter 2011-12-18, 08:05
+
Dmitriy Lyubimov 2011-12-19, 08:09
+
Sebastian Schelter 2011-12-19, 10:00
+
Dmitriy Lyubimov 2011-12-19, 16:14
+
Sebastian Schelter 2011-12-19, 16:27
+
Dmitriy Lyubimov 2011-12-19, 16:34
+
Dmitriy Lyubimov 2011-12-19, 17:48
+
Sebastian Schelter 2011-12-19, 14:15
+
Dmitriy Lyubimov 2011-12-19, 16:07
+
Dmitriy Lyubimov 2011-12-17, 23:04
+
Dmitriy Lyubimov 2011-12-17, 23:13
+
Dmitriy Lyubimov 2011-12-17, 23:25
+
Dmitriy Lyubimov 2011-12-18, 01:49