How dense is the product of Sparse Matrices?

31 Oct 2012

This post is to be filed in the “useless but fun” category. A friend of mine was doing some Hadoopy stuff a few days ago, experimenting with rather large sparse matrices and their products. Long story short, we ended up wondering how sparse the product of 2 sparse matrices should be.

A sparse matrix is a matrix which is primarily filled with zeroes. The product of two matrices involves computing the dot product of each row of the first with each column of the second. There is clearly a relationship between how dense the two matrices are and how dense the result should be. As an obvious illustration, if we consider 2 matrices populated only with zeroes - as sparse as it gets - their product is obviously the zero matrix. Conversely, two matrices populated only with ones - as dense as it gets - will result in a “completely dense” matrix. But… what happens in between?

Should I expect the product of 2 sparse matrices to be more, or less dense than the original matrices? And does it depend on how large the matrices are? What would be your guess?

Because I am lazy, I figured I would let the computer do some work for me, and run some simulations. Plus, this was the excuse I needed to finally play with Matrices in F#.

I’ll define the density of a Matrix as the proportion of its elements that are non-zero. After referencing the F# PowerPack, we write the following function:
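Pulled from the full script at the end of the post, `density` maps every non-zero entry to 1.0, sums the result, and divides by the total number of elements:

```fsharp
#r "FSharp.PowerPack.dll"

// density = proportion of non-zero entries
let density (M: Matrix<float>) =
    let elements = M.NumRows * M.NumCols |> (float)
    let nonZero =
        M
        |> Matrix.map (fun e -> if e = 0.0 then 0.0 else 1.0)
        |> Matrix.sum
    nonZero / elements

let dense = matrix [ [ 42.0; 0.0 ]; [ -1.0; 123.0 ] ]
let sparse = matrix [ [ 0.0; 1.0 ]; [ 0.0; 0.0 ] ]

printfn "Dense matrix: %f" (density dense)
printfn "Sparse matrix: %f" (density sparse)
```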

This is what we expect - the dense matrix contains 3 non-zero entries out of 4 (75% of entries), whereas the sparse example contains 25% of non-zero entries. We are in business.

To keep things simple, let’s look at square matrices, and perform the following experiment: generate multiple random matrices with a given density, multiply them, and compute the density of the result.
Here is what I came up with:
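Again lifted from the full script at the end of the post - `create` builds a random matrix of a given density, and `simulation` averages the product density over `r` runs:

```fsharp
let rng = new System.Random()

// n x n matrix where each entry is 1.0 with probability `density`
let create n density =
    Matrix.create n n 0.0
    |> Matrix.map (fun e -> if rng.NextDouble() > density then 0.0 else 1.0)

// Run r times the product of 2 matrices
// of density d, and size n, and compute
// the average density
let simulation n d r =
    Seq.initInfinite (fun index ->
        let m1 = create n d
        let m2 = create n d
        density (m1 * m2))
    |> Seq.take r
    |> Seq.average
```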

We instantiate one Random - the default .NET random number generator - which we’ll reuse throughout, to avoid the classic non-random random issue. The create function returns an n x n matrix which will have, on average, the requested density; it is initially populated with 0, and each entry is randomly “replaced” by a 1, with probability equal to the density.

And we are now ready to run simulations. The parameters n, d and r correspond to the size of the matrix, the density, and the number of runs that will be performed. We initialize an infinite sequence, where each step creates 2 random matrices, multiplies them and returns the density of the result; we take r elements of that sequence, average it out, and voila! We have a quick-and-dirty simulation.

So how do things look? Let’s try this for densities increasing from 0% to 50%, by 5% increments:

```fsharp
for density in 0.0 .. 0.05 .. 0.5 do
    simulation 10 density 1000
    |> printfn "Density %f -> Result is %f" density
```

Reassuringly, the density of the product increases with the density of its factors. What’s interesting, though, is that the answer to our question isn’t clear cut: for very sparse matrices, the product is even sparser, but as density increases, past a certain point, the product becomes denser than its factors.
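For what it’s worth, a hand-wavy back-of-the-envelope estimate (my addition here, not a proper derivation) hints at why: an entry of the product is a sum of n term-by-term products, and if we pretend all entries are independent, each term is non-zero with probability d × d. The entry is then non-zero when at least one of its n terms is:

```fsharp
// Hand-wavy estimate, assuming independent 0/1 entries: a term of the
// dot product is non-zero with probability d * d, so an entry of the
// product (a sum of n non-negative terms) is non-zero when at least
// one of its n terms is.
let estimate (n: int) (d: float) = 1.0 - (1.0 - d * d) ** (float n)

// e.g. estimate 10 0.1 is roughly 0.096 - the same ballpark
// as the simulated values, but take it with a grain of salt.
```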

How about the size of the matrix? Easy enough:

```fsharp
for size in 5 .. 5 .. 50 do
    simulation size 0.1 1000
    |> printfn "Size %i -> Result is %f" size
```

Visibly, the size of the matrices does have an impact on the product density: the larger the matrix, the denser the product.

So what? Well, a few things. First, with F# and the interactive window, it took me about 15 minutes to get this done, including reading this post on linear algebra in F#. The more I use the REPL, the more I love it - it’s just an incredible tool for exploration. Then, we got some interesting results, but also more questions: we established that size matters (the products of larger matrices are denser than those of smaller ones), and that the product density looks like an S-shaped function of the density of the inputs, with an inflexion point. This is one of the drawbacks of numeric methods: on the one hand, for very cheap, we gained a sense for what is going on; on the other hand, this is rather useless for figuring out how to formally relate the size and density of the matrices to the density of the product - that would be a job better done with paper, pencil, and some effort.

That effort is beyond what I am prepared to invest in this question, so I’ll leave it at that. However, if you know anything about this, I would love to hear about it - I confess I was particularly intrigued by the S-shaped relationship between the input and output densities, and hopefully then I can let that one go.

For convenience, here is the full script code:

```fsharp
#r "FSharp.PowerPack.dll"

let density (M: Matrix<float>) =
    let elements = M.NumRows * M.NumCols |> (float)
    let nonZero =
        M
        |> Matrix.map (fun e -> if e = 0.0 then 0.0 else 1.0)
        |> Matrix.sum
    nonZero / elements

let dense = matrix [ [ 42.0; 0.0 ]; [ -1.0; 123.0 ] ]
let sparse = matrix [ [ 0.0; 1.0 ]; [ 0.0; 0.0 ] ]

printfn "Dense matrix: %f" (density dense)
printfn "Sparse matrix: %f" (density sparse)

let rng = new System.Random()

let create n density =
    Matrix.create n n 0.0
    |> Matrix.map (fun e -> if rng.NextDouble() > density then 0.0 else 1.0)

// Run r times the product of 2 matrices
// of density d, and size n, and compute
// the average density
let simulation n d r =
    Seq.initInfinite (fun index ->
        let m1 = create n d
        let m2 = create n d
        density (m1 * m2))
    |> Seq.take r
    |> Seq.average

// Relationship between input density and output density
for density in 0.0 .. 0.05 .. 0.5 do
    simulation 10 density 1000
    |> printfn "Density %f -> Result is %f" density

// Relationship between size and output density
for size in 5 .. 5 .. 50 do
    simulation size 0.1 1000
    |> printfn "Size %i -> Result is %f" size
```