In this third episode, we cover what a complex conjugate is, how to compute how fast a wavefunction’s phase is rotating (given dpsi/dt), and roughly what energy is in terms of a wavefunction. That’s even more information than last time, so please don’t hesitate to ask any questions!

I’ve finally gotten Episode 2 on how to solve the Schrödinger equation out! Be sure to ask any questions you might have, so that I can address them in the next episode, and subscribe to be sure you don’t miss it. If you like/dislike the episode, please like/dislike the video, and feel free to share it if you really enjoyed it.

This has taken a huge amount of work, and I still had to cut corners to make the deadline, but hopefully you like it. I’m entering the YouTube Next EDU Guru competition, and I had to get it ready by October 1st. Apologies for the low volume and stupid thumbnail that I can’t change to something more representative of the video.

After 2 years, 1 month, and 1 day, I’m finally back in the swing of things with a new release of Inventor IDE! Version Alpha 6 is planned to be the last major alpha release; I will probably make a few smaller updates as 6a, 6b, etc. It now contains the preliminary performance analysis tool, which I’ll explain in due time. More mundane additions include Find/Replace/Find All References, plus fixes for a bunch of obscure-but-problematic bugs.

Me? Writing a blog post? Well, you see, it all started with this Wikipedia page on the “Jacobi method for complex Hermitian matrices”. I wanted to apply the Jacobi eigenvalue algorithm to diagonalize complex matrices, since it had worked well for me on real symmetric matrices, but the main article only covers the real symmetric case, not the complex equivalent, Hermitian matrices. I was glad to see there was an additional page supposedly explaining how to do something similar for complex Hermitian matrices, but alas, it’s explained in such complicated, ambiguous terms that it’s at best unreadable, and at worst wrong. I implemented two different interpretations of the page’s content, and both failed on the simplest possible example…

…so I derived my own in a much simpler way, and figured out a more general algorithm. Other people have undoubtedly found the exact same method, but the papers I’d found on the topic didn’t seem to state their methods much more clearly than the Wikipedia page. The issue is that I derived it in a completely different way than what’s on Wikipedia. It’s not a matter of “fixing a mistake” in the page; by Wikipedia’s rules, this counts as original research (even though others have probably done the same), so it’s not allowed on Wikipedia. The only other suggestion people had was a blog post, to make sure that a correct version is online somewhere, so here it is.

In a nutshell, the Jacobi eigenvalue algorithm works by repeatedly choosing the largest off-diagonal element and rotating the two vectors associated with that element so that the element becomes zero. Do this enough times and all off-diagonal elements will be quite close to zero (if the matrix is Hermitian), leaving the eigenvalues on the diagonal. The eigenvectors come from applying the same rotations to the identity matrix. The only thing necessary to do this is to know the eigenvectors that diagonalize a 2×2 matrix, namely the unitary $Q$ in

$$Q^* M Q = \begin{pmatrix} \lambda_1 & * \\ 0 & \lambda_2 \end{pmatrix}, \qquad M = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$$

where $M$ is the matrix we’re trying to diagonalize, and $c$ is the element we’re trying to eliminate. If $M$ is Hermitian, $a$ and $d$ are real, and $c = \bar{b}$. However, I won’t assume that $M$ is Hermitian, since even if it’s not, this same approach can still be used to get what’s known as the Schur decomposition, namely $M = Q U Q^*$, where $Q$ is unitary, $U$ is upper triangular, and the eigenvalues of $M$ are on the diagonal of $U$. If $M$ is Hermitian, the Schur decomposition is the same as the eigendecomposition, so $U$ is diagonal. In the non-Hermitian case, it doesn’t give the eigenvectors, but those can be more easily computed afterward. I’ll split this into two cases.

Case 1: $|a - d|$ is not small compared to $|c|$

This case is where you don’t have two diagonal elements that are almost identical compared to the off-diagonal element. If you did, dividing by their difference could cause chaos, madness, and megalomania… or just invalid results. We start by subtracting $dI$ from our 2×2 matrix, since we’re only interested in its (pseudo-)eigenvectors $Q$, and multiples of $I$ can be diagonalized using the same eigenvectors. That basically means that adding/subtracting multiples of $I$ won’t change $Q$. Then, we divide the 2×2 matrix by $a - d$, which also doesn’t change the (pseudo-)eigenvectors. This gives us

$$\begin{pmatrix} 1 & \frac{b}{a-d} \\ \frac{c}{a-d} & 0 \end{pmatrix}$$

For simplicity, we’ll just relabel this as

$$\begin{pmatrix} 1 & u \\ v & 0 \end{pmatrix}$$

Now, if we want to make the lower off-diagonal element zero by multiplying by a unitary matrix on the right and its conjugate transpose on the left, a bit of trial and error can give you that

$$Q = \frac{1}{\sqrt{|x|^2 + 1}} \begin{pmatrix} x & 1 \\ 1 & -\bar{x} \end{pmatrix}, \qquad x = \frac{1 + \sqrt{1 + 4uv}}{2v}$$

is a solution to that problem. Note that the square roots here are square roots of complex numbers, not necessarily real numbers. You can verify that this works in the original case by multiplying out $Q^* M Q$ and confirming that the lower off-diagonal element is zero. If $M$ is Hermitian, both off-diagonal elements should be zero. Case 1 complete.

Case 2: $|a - d|$ is small compared to $|c|$

This case is where you have two diagonal elements that are almost identical compared to the off-diagonal element. Simple solution: after subtracting $dI$ as before, pretend that the diagonal elements are exactly zero. You now have

$$\begin{pmatrix} 0 & b \\ c & 0 \end{pmatrix}$$

The zeros make things much simpler, as we can now find

$$Q = \frac{1}{\sqrt{|x|^2 + 1}} \begin{pmatrix} x & 1 \\ 1 & -\bar{x} \end{pmatrix}, \qquad x = \sqrt{\frac{b}{c}}$$

This time, since we’ve assumed that the diagonal elements are exactly the same when they’re really only almost the same, this won’t generally make the off-diagonal element exactly zero, but it’ll be negligibly small, and if it’s ever the largest off-diagonal element at some later point, it’ll just get picked again to be eliminated, probably without the diagonal elements being so close the second time around.

Again, you don’t have to take my word for it. You can easily verify this solution by just multiplying it out.

How did you get these?

If you’re wondering how I figured these out (after a weekend of trial and error with different methods), I knew that if the off-diagonal element was non-zero (which it must be, else there’s nothing to do), there is some non-zero component of the second element in each vector of $Q$. I temporarily ignored the restriction of having unit vectors and set both of those components to 1, but didn’t ignore the restriction of having the two vectors be complex orthogonal (i.e. $\bar{\mathbf{v}}_1 \cdot \mathbf{v}_2 = 0$), giving me

$$\mathbf{v}_1 = \begin{pmatrix} x_1 \\ 1 \end{pmatrix}, \qquad \mathbf{v}_2 = \begin{pmatrix} x_2 \\ 1 \end{pmatrix}, \qquad \bar{x}_1 x_2 + 1 = 0$$

I know I want that off-diagonal element zero, so

$$\bar{\mathbf{v}}_2 \cdot \begin{pmatrix} 1 & u \\ v & 0 \end{pmatrix} \mathbf{v}_1 = 0 \quad\Rightarrow\quad v x_1^2 - x_1 - u = 0 \quad\Rightarrow\quad x_1 = \frac{1 + \sqrt{1 + 4uv}}{2v}$$

Since the scales of the 2 vectors are arbitrary, including the relative scales, I scaled one by $1$, and the other by $-\bar{x}_1$ (turning $(x_2, 1)$ into $(1, -\bar{x}_1)$). Then I just normalized them both (they happened to be the same length, $\sqrt{|x_1|^2 + 1}$), and that’s the solution above. I did the same for case 2, only it was even simpler. Easy… it just took a weekend to figure out what the easy way was.

As I sit here waiting for Matlab to compute in over 25 minutes what I can now compute in 10 milliseconds in mostly-unoptimized C++ code running in debug mode, I figured I should write a blog post stressing the importance of a few small differences that can result in >150,000x speedups like this.

Note that I’m not putting down Matlab, since it’s not designed to run fast; I’m just using this as an example of why it’s important to not get complacent with performance.

1. Precomputation

This may be the largest performance difference between the two implementations, though it might be a tight race between this and the next two differences. Precomputation, however, is very widely applicable, easy to do, and could have been applied to the Matlab code to some extent.

The basic idea is: instead of computing the same data repeatedly, compute them once and save them.

In those words, it sounds so obvious, but it encompasses a broad range of things, some of which are less obvious. For example, in this case, I’ve precomputed the data before running the program and read them in from input files. I’ll be running the program on the same data possibly thousands of times, and those data took longer to compute than to read in, so there’s no sense in recomputing them each time. In particular, I’ve computed about 1,000 points on each of a bunch of curves, and the program linearly interpolates between them, which is more than accurate enough in this case.

There are of course cases where precomputation doesn’t make sense, such as if the computation is as fast as looking up the data or if the amount of data to save is unreasonably large, but in a very large range of applications, precomputation can speed things up a ton.

2. Native Code

People will yell until they’re blue in the face that interpreted code or Just-In-Time-compiled code is usually as fast as native code (i.e. compiled to machine code ahead of time). They’re dead wrong, and usually somewhere up the chain, they got their information from a marketing department or a study funded by Sun, IBM, etc., even though they’re probably not aware of it. Nonetheless, I was honestly shocked that extremely simple tests confirm this, e.g. running the following code in Java versus in C++.

C++ compilers trivially replace the multiplication with an addition, and the division with a bit shift (plus a small fix-up, because the compiler can’t tell that the value is always positive, let alone even). Of course, a really smart compiler would realize that the entire loop is useless, since the starting number is positive and even, but none of them appear to be that smart. Java, however, will not do even the simple replacement, no matter how many times the loop is run. You can see this by comparing against the performance of:

In Java, this new code is several times faster than the previous, whereas in C++, the performance is (almost) identical between the two, since it’s already made that replacement.

In my case today, I’m not comparing against Java; I’m comparing against Matlab, which not only doesn’t compile its code ahead of time, but literally interprets the text representing the code as it runs, like classic VisualBasic. That adds factors of thousands just for figuring out what the program is doing. It just doesn’t stand a chance outside of its prebuilt libraries. Inside its prebuilt libraries, e.g. matrix multiplication or diagonalization code written in C++, it’s fast, so one must try to use them where reasonable.

3. Customization

As much as Matlab provides some great functionality that can be a pain in the butt to implement from scratch, such as diagonalization of big matrices, it (understandably) doesn’t let you specify everything you know that may help speed up the computation.

For example, matrix-matrix multiplication where you input two matrices and get back their product is very slow because of memory caching issues. However, if the matrix on the right of the multiplication is already transposed ahead of time, it’s much faster. (I’ll save the explanation of why for another time.) Matlab doesn’t know whether there’s a way to rework your code so that some matrices are pre-transposed and others aren’t; it just receives two matrices and has to spit out the product. If you can use your knowledge of your own code to do this, you can get a big performance boost.

Another example in this case: if you have good estimates of the eigenvalues and eigenvectors of the matrix you’ve got, it’s much faster to find them exactly starting from those estimates, but Matlab’s diagonalization just uses random guesses.

Conclusion

As I mentioned at the top, the C++ implementation is mostly unoptimized, so I’m still expecting more performance improvements as I scale it up and optimize it (speeding up a 10 ms computation isn’t all that useful on its own). I’ll try to remember to report again when it’s multi-threaded and vectorized. Nonetheless, simple changes like those above can make a huge difference in the performance of software.