I am not sure if this is the right place to ask this question, but I believe there will be people here who do computations on computer algebra packages like Sage in their work.
I have been using Sagemath to perform some matrix rank computations. It turns up a few bizarre results occasionally.

For example, I had to find the rank of a matrix ($100 \times 150$) with large integer entries (entries of magnitude in the range of $1$ to $10^{15}$). When I wrote the code with the matrix M declared as matrix(ZZ, R, C), or as matrix(QQ, R, C), it returns a rank of around 90 (which I believe is correct), whereas if I declare the matrix over the reals as matrix(RR, R, C), it returns a rank of around 50, which I believe is too low based on some conjectures I have.

So, overall I am curious, what are the standard way(s) to implement rank computation (and does it differ based on reals, or rationals) and where can I read more about these? If these issues arise due to precision errors, how can I get around them? And more importantly, how do I know beforehand that my computation is susceptible to precision errors?
(I tried looking up the source code of Sagemath a couple of times, but I was quickly lost, so I hope someone can point me to the precise documentation/source code)

$\begingroup$I think that almost all rank calculations over $\mathbb{R}$ will be subject to error. I think that the standard advice is to compute singular values instead. If you see some very small singular values, then a clear gap, then some larger ones, then you can guess that the very small values should really be zero. Of course, you will not be able to prove anything rigorously by this method alone. It may be that Sage is doing something like this internally and applying an inappropriate cutoff.$\endgroup$
– Neil StricklandNov 28 '17 at 12:29

11

$\begingroup$Three axioms of linear algebra: (1) Don’t compute the inverse of a matrix (unless it’s unitary) (2) Don’t multiply two matrices. (3) Never compute the rank of a matrix over the reals.$\endgroup$
– Chris GodsilNov 28 '17 at 13:02

5

$\begingroup$@Nikhil : I do not know for sure, but I would guess that over $\mathbb{Q}$, Sage uses exact arithmetic. Whether that means you should trust the computation is a philosophical question. Should we ever trust a computer program? Should we trust computations that we do by hand?$\endgroup$
– Timothy ChowNov 28 '17 at 18:11

4

$\begingroup$Gaussian elimination with integer matrices can produce huge intermediate results already for 100x150 matrices. That can sometimes be avoided with lattice reduction tricks. It can also be avoided in practice by working modulo a few "random" large primes; the resulting upper bound on the rank might in principle not be sharp but if you want a rigorous proof you can probably build on mod-p results. (Often such commands offer a flag that lets you choose between a heuristic answer and an answer that comes with a proof but takes longer to compute.)$\endgroup$
– Noam D. ElkiesNov 28 '17 at 18:30

4

$\begingroup$. . . with $a \in {\bf Z}^{100}$. Then a reduced basis for this lattice should start with $60$ vectors $(0,a)$ with $Ta=0$, because any vector $(MTa,a)$ with $a$ not in $\ker T$ has size at least $M$. Once you've found a basis containing $60$ such vectors, you're done because $T$ has rank at most $150-60$. Note that even if $M$ must be taken as large as $10^{100} > H^6$ this is way better than Gaussian elimination which would have you doing arithmetic with (ratios of) integers of size $H^{90}$.$\endgroup$
– Noam D. ElkiesNov 29 '17 at 0:26

1 Answer
1

I don't know what algorithm Sage actually uses, but computing rank over the integers is fun and easy: Complexity of computing matrix rank over integers . It is NOT so easy if you want good running time, and for that there are a number of papers of Arne Storjohann, which show that one can do it asymptotically as fast as for real matrices (a surprising result, in view of coefficient blow-up). Storjohann has actually implemented his algorithms, and I am sure Sage uses this or something like it.

As for the reals, the fastest way to compute rank is to compute the singular value decomposition, and throw away singular values below some cutoff (probably around $10^{-6}$). As pointed out by many in the comments, rank is very unstable, so unless you want the "pseudo-rank" (as above, defined by the size of the singular values), don't go there.