|
Vincent Xue
2011-06-26, 20:47
Ted Dunning
2011-06-26, 21:29
Vincent Xue
2011-06-26, 22:13
Lance Norskog
2011-06-26, 23:46
Danny Bickson
2011-06-26, 23:54
Jake Mannix
2011-06-27, 01:40
Ted Dunning
2011-06-27, 01:51
Jake Mannix
2011-06-27, 03:34
Sean Owen
2011-06-27, 10:55
Jeff Hansen
2011-06-28, 15:31
Vincent Xue
2011-06-28, 15:46
|
-
An inmemory sparse matrix multiplierVincent Xue 2011-06-26, 20:47
Hi.
I was wondering how useful an in memory sparse matrix multiplier would be for mahout. In my current project I needed to multiply many large sparse matrices but submitting hundreds of jobs to the cluster creates too much overhead. I wrote up an implementation of sparse matrix multiplication using threads which can multiply a 30,000 x 48,0000 matrix by its transpose in about 5 minutes using 16 cores. Granted this matrix is composed mostly of 1s, and -1s, (with about 16 elements per row), is this considered fast? I have seen that my implementation is much faster than iterating though a matrix naively and would like some input to whether or not my 5 minute benchmark is by skewed. Many thanks for the input, Vincent
-
Re: An inmemory sparse matrix multiplierTed Dunning 2011-06-26, 21:29
Regarding speed:
How many non-zero elements? What is the size of your input matrices? How long does it take to read the matrices without doing any multiplication? Your test matrices seem small for big sparse matrices. This sort of thing could be very useful. On Sun, Jun 26, 2011 at 1:47 PM, Vincent Xue <[EMAIL PROTECTED]> wrote: > Hi. > > I was wondering how useful an in memory sparse matrix multiplier would > be for mahout. In my current project I needed to multiply many large > sparse matrices but submitting hundreds of jobs to the cluster creates > too much overhead. > > I wrote up an implementation of sparse matrix multiplication using > threads which can multiply a 30,000 x 48,0000 matrix by its transpose > in about 5 minutes using 16 cores. Granted this matrix is composed > mostly of 1s, and -1s, (with about 16 elements per row), is this > considered fast? I have seen that my implementation is much faster > than iterating though a matrix naively and would like some input to > whether or not my 5 minute benchmark is by skewed. > > Many thanks for the input, > Vincent >
-
Re: An inmemory sparse matrix multiplierVincent Xue 2011-06-26, 22:13
NonZero Elements: >1,000,000
Input Matrix Size: 30,000 x 480,000 and 480,000 x 30,000 Reading a matrix is dependent on how many non zero elements there are. To read each element of the matrix, including non zero, would probably take just as long as it currently does using the SparseMatrix.getQuick(). I will test this implementation using larger, denser matrices. So far, I have been using it only with my design matrix but I will generate some denser random matrices and test the performance soon. Vincent On Sun, Jun 26, 2011 at 10:29 PM, Ted Dunning <[EMAIL PROTECTED]> wrote: > Regarding speed: > > How many non-zero elements? > > What is the size of your input matrices? > > How long does it take to read the matrices without doing any multiplication? > > Your test matrices seem small for big sparse matrices. > > This sort of thing could be very useful. > > On Sun, Jun 26, 2011 at 1:47 PM, Vincent Xue <[EMAIL PROTECTED]> wrote: > >> Hi. >> >> I was wondering how useful an in memory sparse matrix multiplier would >> be for mahout. In my current project I needed to multiply many large >> sparse matrices but submitting hundreds of jobs to the cluster creates >> too much overhead. >> >> I wrote up an implementation of sparse matrix multiplication using >> threads which can multiply a 30,000 x 48,0000 matrix by its transpose >> in about 5 minutes using 16 cores. Granted this matrix is composed >> mostly of 1s, and -1s, (with about 16 elements per row), is this >> considered fast? I have seen that my implementation is much faster >> than iterating though a matrix naively and would like some input to >> whether or not my 5 minute benchmark is by skewed. >> >> Many thanks for the input, >> Vincent >> >
-
Re: An inmemory sparse matrix multiplierLance Norskog 2011-06-26, 23:46
The problem is that SparseMatrix does not iterate through the matrix
in a sparse manner. It uses the algorithms from AbstractMatrix which iterate directly through the rows & columns. This is appropriate for DenseMatrix, not SparseMatrix. The way to handle this is to recast the Matrix/Matrix options via the methods that get row and column vectors and walk them via the iterators. Multiplying by +1/-1 does not alter the runtime. SparseMatrix has special encoding for 0-value entries. If you do a special version of SparseMatrix that also encodes for +1/-1 entries, then your batch jobs would really fly. Are +1/-1/0/double matrices common in algorithms? Would this be commonly used in Mahout? Lance On Sun, Jun 26, 2011 at 3:13 PM, Vincent Xue <[EMAIL PROTECTED]> wrote: > NonZero Elements: >1,000,000 > Input Matrix Size: 30,000 x 480,000 and 480,000 x 30,000 > > Reading a matrix is dependent on how many non zero elements there are. > To read each element of the matrix, including non zero, would probably > take just as long as it currently does using the > SparseMatrix.getQuick(). > > I will test this implementation using larger, denser matrices. So far, > I have been using it only with my design matrix but I will generate > some denser random matrices and test the performance soon. > > Vincent > > On Sun, Jun 26, 2011 at 10:29 PM, Ted Dunning <[EMAIL PROTECTED]> wrote: >> Regarding speed: >> >> How many non-zero elements? >> >> What is the size of your input matrices? >> >> How long does it take to read the matrices without doing any multiplication? >> >> Your test matrices seem small for big sparse matrices. >> >> This sort of thing could be very useful. >> >> On Sun, Jun 26, 2011 at 1:47 PM, Vincent Xue <[EMAIL PROTECTED]> wrote: >> >>> Hi. >>> >>> I was wondering how useful an in memory sparse matrix multiplier would >>> be for mahout. In my current project I needed to multiply many large >>> sparse matrices but submitting hundreds of jobs to the cluster creates >>> too much overhead. >>> >>> I wrote up an implementation of sparse matrix multiplication using >>> threads which can multiply a 30,000 x 48,0000 matrix by its transpose >>> in about 5 minutes using 16 cores. Granted this matrix is composed >>> mostly of 1s, and -1s, (with about 16 elements per row), is this >>> considered fast? I have seen that my implementation is much faster >>> than iterating though a matrix naively and would like some input to >>> whether or not my 5 minute benchmark is by skewed. >>> >>> Many thanks for the input, >>> Vincent >>> >> > -- Lance Norskog [EMAIL PROTECTED]
-
Re: An inmemory sparse matrix multiplierDanny Bickson 2011-06-26, 23:54
+1/-1 matrices a.k.a bipolar matrices are very common in many domains.
For example in CMDA, linear analog circuit design, and compressed sensing. I guess having a special treatment for this case is useful. On Sun, Jun 26, 2011 at 7:46 PM, Lance Norskog <[EMAIL PROTECTED]> wrote: > The problem is that SparseMatrix does not iterate through the matrix > in a sparse manner. It uses the algorithms from AbstractMatrix which > iterate directly through the rows & columns. This is appropriate for > DenseMatrix, not SparseMatrix. > > The way to handle this is to recast the Matrix/Matrix options via the > methods that get row and column vectors and walk them via the > iterators. > > Multiplying by +1/-1 does not alter the runtime. SparseMatrix has > special encoding for 0-value entries. If you do a special version of > SparseMatrix that also encodes for +1/-1 entries, then your batch jobs > would really fly. > > Are +1/-1/0/double matrices common in algorithms? Would this be > commonly used in Mahout? > > Lance > > On Sun, Jun 26, 2011 at 3:13 PM, Vincent Xue <[EMAIL PROTECTED]> wrote: > > NonZero Elements: >1,000,000 > > Input Matrix Size: 30,000 x 480,000 and 480,000 x 30,000 > > > > Reading a matrix is dependent on how many non zero elements there are. > > To read each element of the matrix, including non zero, would probably > > take just as long as it currently does using the > > SparseMatrix.getQuick(). > > > > I will test this implementation using larger, denser matrices. So far, > > I have been using it only with my design matrix but I will generate > > some denser random matrices and test the performance soon. > > > > Vincent > > > > On Sun, Jun 26, 2011 at 10:29 PM, Ted Dunning <[EMAIL PROTECTED]> > wrote: > >> Regarding speed: > >> > >> How many non-zero elements? > >> > >> What is the size of your input matrices? > >> > >> How long does it take to read the matrices without doing any > multiplication? > >> > >> Your test matrices seem small for big sparse matrices. > >> > >> This sort of thing could be very useful. > >> > >> On Sun, Jun 26, 2011 at 1:47 PM, Vincent Xue <[EMAIL PROTECTED]> wrote: > >> > >>> Hi. > >>> > >>> I was wondering how useful an in memory sparse matrix multiplier would > >>> be for mahout. In my current project I needed to multiply many large > >>> sparse matrices but submitting hundreds of jobs to the cluster creates > >>> too much overhead. > >>> > >>> I wrote up an implementation of sparse matrix multiplication using > >>> threads which can multiply a 30,000 x 48,0000 matrix by its transpose > >>> in about 5 minutes using 16 cores. Granted this matrix is composed > >>> mostly of 1s, and -1s, (with about 16 elements per row), is this > >>> considered fast? I have seen that my implementation is much faster > >>> than iterating though a matrix naively and would like some input to > >>> whether or not my 5 minute benchmark is by skewed. > >>> > >>> Many thanks for the input, > >>> Vincent > >>> > >> > > > > > > -- > Lance Norskog > [EMAIL PROTECTED] >
-
Re: An inmemory sparse matrix multiplierJake Mannix 2011-06-27, 01:40
On Sun, Jun 26, 2011 at 1:47 PM, Vincent Xue <[EMAIL PROTECTED]> wrote:
> Hi. > > I was wondering how useful an in memory sparse matrix multiplier would > be for mahout. In my current project I needed to multiply many large > sparse matrices but submitting hundreds of jobs to the cluster creates > too much overhead. > > I wrote up an implementation of sparse matrix multiplication using > threads which can multiply a 30,000 x 48,0000 matrix by its transpose > in about 5 minutes using 16 cores. Granted this matrix is composed > mostly of 1s, and -1s, (with about 16 elements per row), is this > considered fast? I have seen that my implementation is much faster > than iterating though a matrix naively and would like some input to > whether or not my 5 minute benchmark is by skewed. > about 16 elements per row (and column, I'm guessing?), means if you did this using the algorithm I usually use for matrix multiplication (matrix summing the outer products of the columns), then you're looking at the matrix sum of 48k matrices, each of which only has 16^2 = 256 entries. So it should take only O(256 * 48k) = 12 million multiplications and 12 million additions. This seems like it should be doable in well under a second using this algorithm. On the other hand, if you're doing 30k * 30k = 900M dot products, most of which are zero, but require 16 operations each to discover this, you'll need closer to 15 billion operations. This would be lot slower. -jake > > Many thanks for the input, > Vincent >
-
Re: An inmemory sparse matrix multiplierTed Dunning 2011-06-27, 01:51
And going down teh columns in a sparse matrix could do this to you.
On Sun, Jun 26, 2011 at 6:40 PM, Jake Mannix <[EMAIL PROTECTED]> wrote: > On Sun, Jun 26, 2011 at 1:47 PM, Vincent Xue <[EMAIL PROTECTED]> wrote: > > > Hi. > > > > I was wondering how useful an in memory sparse matrix multiplier would > > be for mahout. In my current project I needed to multiply many large > > sparse matrices but submitting hundreds of jobs to the cluster creates > > too much overhead. > > > > I wrote up an implementation of sparse matrix multiplication using > > threads which can multiply a 30,000 x 48,0000 matrix by its transpose > > in about 5 minutes using 16 cores. Granted this matrix is composed > > mostly of 1s, and -1s, (with about 16 elements per row), is this > > considered fast? I have seen that my implementation is much faster > > than iterating though a matrix naively and would like some input to > > whether or not my 5 minute benchmark is by skewed. > > > > about 16 elements per row (and column, I'm guessing?), means if you > did this using the algorithm I usually use for matrix multiplication > (matrix summing the outer products of the columns), then you're > looking at the matrix sum of 48k matrices, each of which only has > 16^2 = 256 entries. So it should take only O(256 * 48k) = 12 million > multiplications and 12 million additions. This seems like it should > be doable in well under a second using this algorithm. > > On the other hand, if you're doing 30k * 30k = 900M dot products, > most of which are zero, but require 16 operations each to discover > this, you'll need closer to 15 billion operations. This would be lot > slower. > > -jake > > > > > > Many thanks for the input, > > Vincent > > >
-
Re: An inmemory sparse matrix multiplierJake Mannix 2011-06-27, 03:34
Thus the right approach is transpose first, then do an in-memory "map" (in
parallel) over the rows of the result, outer multiplying each and sending the rows of this outer product to the "reduce" step, which sums (again in parallel, since their independent there should be no sychronization cost) these rows. Even though its in-memory, its still classic MapReduce, and well suited to parallelism (although its a good example of a nicely parallel algorithm which is *not* "emberassingly parallel" - both the map and reduce are nontrivial and different). Now that I think about it further, *self* multiplication (ie A * A'), where the entries are all +/- 1 lends itself to even more optimization chances: the rows of the outer product matrix (produced in the Map phase, for each column of the input matrix) are all equal to either that row, or its negative. Calculating these is just changing 16 signs, not 256 multiplications. -jake On Jun 26, 2011 6:51 PM, "Ted Dunning" <[EMAIL PROTECTED]> wrote: And going down teh columns in a sparse matrix could do this to you. On Sun, Jun 26, 2011 at 6:40 PM, Jake Mannix <[EMAIL PROTECTED]> wrote: > On Sun, Jun 26, 2011...
-
Re: An inmemory sparse matrix multiplierSean Owen 2011-06-27, 10:55
(I can file a JIRA for this.)
On Mon, Jun 27, 2011 at 12:46 AM, Lance Norskog <[EMAIL PROTECTED]> wrote: > The problem is that SparseMatrix does not iterate through the matrix > in a sparse manner. It uses the algorithms from AbstractMatrix which > iterate directly through the rows & columns. This is appropriate for > DenseMatrix, not SparseMatrix. >
-
Re: An inmemory sparse matrix multiplierJeff Hansen 2011-06-28, 15:31
I think you've just described the approach laid out in this document --
http://www.google.com/url?sa=t&source=web&cd=1&ved=0CB8QFjAA&url=http%3A%2F%2Fciteseerx.ist.psu.edu%2Fviewdoc%2Fdownload%3Fdoi%3D10.1.1.144.9712%26rep%3Drep1%26type%3Dpdf&ei=WfAJTtr8Ju6FsgKz6cyeAQ&usg=AFQjCNFxQPXXnYw93eBpUknGy2iZJESmHw&sig2=zFxncZg4n7b_mueF4mbGPQ There's one other trick you could employ -- since the resulting matrix is symmetric, you really only have to bother with half of the multiplications and sums (plus the diagonal). After all, if A*A' = B, then Bij = Bji. So if you only calculate Bij for i <= j, you can flip the matrix to fill in the other half. That means when you're doing your cross product in the mapper phase you've mentioned, you only need to do half the cross multiplications: for(int j = i; j <= i;j++)emit i, j, i*j; Of course this is all based on the assumption that we're multiplying a matrix with itself transposed. On Sun, Jun 26, 2011 at 10:34 PM, Jake Mannix <[EMAIL PROTECTED]> wrote: > Thus the right approach is transpose first, then do an in-memory "map" (in > parallel) over the rows of the result, outer multiplying each and sending > the rows of this outer product to the "reduce" step, which sums (again in > parallel, since their independent there should be no sychronization cost) > these rows. > > Even though its in-memory, its still classic MapReduce, and well suited to > parallelism (although its a good example of a nicely parallel algorithm > which is *not* "emberassingly parallel" - both the map and reduce are > nontrivial and different). > > Now that I think about it further, *self* multiplication (ie A * A'), where > the entries are all +/- 1 lends itself to even more optimization chances: > the rows of the outer product matrix (produced in the Map phase, for each > column of the input matrix) are all equal to either that row, or its > negative. Calculating these is just changing 16 signs, not 256 > multiplications. > > -jake > > On Jun 26, 2011 6:51 PM, "Ted Dunning" <[EMAIL PROTECTED]> wrote: > > And going down teh columns in a sparse matrix could do this to you. > > On Sun, Jun 26, 2011 at 6:40 PM, Jake Mannix <[EMAIL PROTECTED]> > wrote: > > On Sun, Jun 26, 2011... >
-
Re: An inmemory sparse matrix multiplierVincent Xue 2011-06-28, 15:46
I tested the performance of the threaded matrix multiplication on one
machine and my results are provided. I can multiply large matricies, however the running time is proportional to the number of rows in matrix A and is limited by how much memory exists on the machine. I was able to get the product of 2 matricies with 1,000,000 non zero elements in about 30 minutes. My implementation of sparse matrix multiplication is similar as Jake describes. During "load" I store each matrix as 2 OpenIntObjectHashMap of Vectors, one for rows and one for columns. Though it uses up 2x as much memory for each matrix, it works well for sparse matrices. For the multiplication, I queue up the tupples and distribute them be processed on threads. The results are a bit inconsistent, but I think this is an artifact from the the random sparse matrix generator I used. Concluding remarks: The amount of time a calculation takes is heavily determined by the number of rows in the matrix. (This is because each row is a thread in the system). The implementation scales well with lots of columns. =====================Details Below ===================== Here is how I generated my results: 1) Choose 2 dimensions and a density 2) Create 2 random matrices, Matrix A (row x column) and Matrix B (column x row) 3) Multiply AxB I tested its ability to scale by: - Scaling up the dimensions of the matrix while keeping the density the same. (number of non-zero elements will still increase) - Scaling up the dimensions of the matrix while keeping the number of non zero elements the same - Scaling up the number of of non zero elements for a large dimension. ================================================= Started by increasing row dimensions Making: 10000 10 that are 0.01 full Non Zero Elements: 1000 Non Zero Elements: 1000 Took this long to make the random matrices: 47 Using 16 processors Rows: 10000 Columns: 10 Finished In Time Finished and Took: 4509 ================================================Non Zero Elements: 10000 Non Zero Elements: 10000 Took this long to make the random matrices: 167 Using 16 processors Rows: 100000 Columns: 10 Finished In Time Finished and Took: 419494 ~ 6.99156667 minutes ================================================Making: 100000 100 that are 0.01 full Non Zero Elements: 100000 Non Zero Elements: 100000 Took this long to make the random matrices: 284 Using 16 processors Rows: 100000 Columns: 100 Finished In Time Finished and Took: 328268 ~ 5.47113333 minutes ================================================Making: 100000 1000 that are 0.01 full Non Zero Elements: 1000000 Non Zero Elements: 1000000 Took this long to make the random matrices: 4349 Using 16 processors Rows: 100000 Columns: 1000 Finished In Time Finished and Took: 1408320 ~ 23.47200 minutes (Need Lots of Ram) =================================================Started scaling up column dimensions here, (constant non zeros) Making: 100000 1000 that are 0.0010 full Non Zero Elements: 100000 Non Zero Elements: 100000 Took this long to make the random matrices: 338 Using 16 processors Rows: 100000 Columns: 1000 Finished In Time Finished and Took: 339435 ~ 5.65725 minutes ================================================Making: 100000 10000 that are 1.0E-4 full Non Zero Elements: 100000 Non Zero Elements: 100000 Took this long to make the random matrices: 270 Using 16 processors Rows: 100000 Columns: 10000 Finished In Time Finished and Took: 386642 ~ 6.44403333 minutes ================================================Making: 100000 100000 that are 1.0E-5 full Non Zero Elements: 100000 Non Zero Elements: 100000 Took this long to make the random matrices: 273 Using 16 processors Rows: 100000 Columns: 100000 Finished In Time Finished and Took: 412046 ~ 6.86743333 minutes ================================================Making: 100000 1000000 that are 1.0E-6 full Non Zero Elements: 99999 Non Zero Elements: 100000 Took this long to make the random matrices: 376 Using 16 processors Rows: 100000 Columns: 1000000 Finished In Time Finished and Took: 352324~ 5.87206667 minutes ================================================Making: 100000 10000000 that are 1.0E-7 full Non Zero Elements: 100000 Non Zero Elements: 100000 Took this long to make the random matrices: 337 Using 16 processors Rows: 100000 Columns: 10000000 Finished In Time Finished and Took: 411278 ~ 6.85463333 minutes =================================================Started Scaling up density here Making: 100000 100000 that are 2.0E-5 full Non Zero Elements: 200000 Non Zero Elements: 200000 Took this long to make the random matrices: 531 Using 16 processors Rows: 100000 Columns: 100000 Finished In Time Finished and Took: 375450~ 6.2575 minutes ================================================Making: 100000 100000 that are 3.0E-5 full Non Zero Elements: 300000 Non Zero Elements: 300000 Took this long to make the random matrices: 706 Using 16 processors Rows: 100000 Columns: 100000 Finished In Time Finished and Took: 411555 ~ 6.85925 minutes ================================================Making: 100000 100000 that are 4.0E-5 full Non Zero Elements: 400000 Non Zero Elements: 400000 Took this long to make the random matrices: 897 Using 16 processors Rows: 100000 Columns: 100000 Finished In Time Finished and Took: 509539 ~ 8.49231667 minutes On Tue, Jun 28, 2011 at 4:31 PM, Jeff Hansen <[EMAIL PROTECTED]> wrote: |