First, I want to give props to Professor Georgios Evangelidis, who created the wonderful ECC algorithm:
https://www.mathworks.com/matlabcentral/fileexchange/27253-ecc-image-alignment-algorithm--image-registration-
The ECC algorithm is a means to find the transformation between two images (registration). It works incredibly well. It belongs to the family of direct, intensity-based, pyramidal, iterative registration methods (like MATLAB's imregister()).
In my wide review of algorithms, this is the most accurate homography (linear) registration method for smooth images. However, until this weekend it ran at about 5 seconds per frame for a 256x256 image on my i5 computer with 8 GB of RAM. This is not fast enough for many applications.
By leveraging some tricks, I reintroduce this algorithm as a real-time, high-quality registration method for images of almost any size. This is possibly the best bang (accuracy) for the buck (run time) you will come across. I have also reviewed algorithms in Python and C++; none were able to both run this quickly and give such good results consistently.
The tricks I used:
** I thought of this as a machine learning problem in which I am trying to fit 8 parameters. The ECC algorithm already sets up the solution as gradient descent.
1) Naturally, I first optimized the run time of every slow part of the code by vectorizing, using conv2, etc.
2) I allow a different number of iterations per pyramid level. The second through second-to-last pyramid levels only slightly refine the rough registration using the newly added information, so a very limited number of iterations is required at these pyramid levels.
3) I run the method on a very small subset of the pixels. There are up to 8 parameters to optimize. If the transformation is close to linear, then I expect the translations suggested by individual pixels across the image to be normally distributed about the true linear transformation. For that reason, 15 pixels per parameter (with x and y translation each, so 30 samples) should be more than enough to give an extremely accurate approximation of the true transformation. This is essentially stochastic gradient descent rather than pure gradient descent.
4) I compensate for the noisier updates by adding a learning rate to improve the stability of the algorithm. This is a common practice in iterative learning schemes. A momentum term may also be helpful here, but I have not had a chance to try it out just yet.
5) I also choose high-information (high-gradient) pixels to further improve each guess.
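
The pixel subsampling, learning rate, and high-gradient selection described in the list above can be illustrated with a small toy. This is only a sketch under loud assumptions: it is plain Python rather than MATLAB, it fits a 2-parameter translation instead of the full 8 parameters, it uses a plain least-squares (Lucas-Kanade style) update in place of the ECC update, and the image is a synthetic Gaussian bump. Every name and value here (`blob`, `pick_high_gradient_pixels`, `register_translation`, the 10% candidate fraction, the 0.5 learning rate) is hypothetical and not taken from the submitted code.

```python
import math
import random

SIZE = 64                  # toy image is SIZE x SIZE pixels
TRUE_SHIFT = (2.5, -1.5)   # ground-truth translation we try to recover


def blob(x, y):
    """Smooth synthetic image: a Gaussian bump defined at any real (x, y)."""
    return math.exp(-((x - 32.0) ** 2 + (y - 32.0) ** 2) / (2.0 * 8.0 ** 2))


def template(x, y):
    return blob(x, y)


def image(x, y):
    # The input image is the template moved by TRUE_SHIFT.
    return blob(x - TRUE_SHIFT[0], y - TRUE_SHIFT[1])


def gradient(f, x, y, h=0.5):
    """Central-difference gradient of f at (x, y)."""
    gx = (f(x + h, y) - f(x - h, y)) / (2.0 * h)
    gy = (f(x, y + h) - f(x, y - h)) / (2.0 * h)
    return gx, gy


def pick_high_gradient_pixels(fraction=0.1):
    """Keep the pixels with the largest template gradient magnitude."""
    scored = []
    for y in range(1, SIZE - 1):
        for x in range(1, SIZE - 1):
            gx, gy = gradient(template, x, y)
            scored.append((math.hypot(gx, gy), x, y))
    scored.sort(reverse=True)
    keep = max(1, int(len(scored) * fraction))
    return [(x, y) for _, x, y in scored[:keep]]


def register_translation(n_samples=30, n_iters=40, learning_rate=0.5, seed=0):
    """Estimate the shift using a fresh random pixel subset per iteration."""
    rng = random.Random(seed)
    candidates = pick_high_gradient_pixels()
    px, py = 0.0, 0.0
    for _ in range(n_iters):
        batch = rng.sample(candidates, n_samples)
        # Accumulate the 2x2 normal equations H * delta = b over the batch.
        a = b = d = bx = by = 0.0
        for x, y in batch:
            wx, wy = x + px, y + py
            gx, gy = gradient(image, wx, wy)
            r = template(x, y) - image(wx, wy)
            a += gx * gx
            b += gx * gy
            d += gy * gy
            bx += gx * r
            by += gy * r
        det = a * d - b * b
        if abs(det) < 1e-12:
            continue  # degenerate batch, skip this update
        dx = (d * bx - b * by) / det
        dy = (a * by - b * bx) / det
        # Damped update: the learning rate scales the suggested step.
        px += learning_rate * dx
        py += learning_rate * dy
    return px, py
```

Calling `register_translation()` should return a shift close to `TRUE_SHIFT` while touching only 30 pixels per iteration. Because this toy is noise-free, it says nothing about how many samples a real image needs.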

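The per-level iteration budget from trick 2 can be sketched as a small helper. This is an illustrative Python sketch, not the submitted MATLAB code: the function name `iterations_per_level` and the default counts are hypothetical, and it assumes levels are listed in processing order, so that the first and last levels get the larger budgets and every level in between gets only a few iterations.

```python
def iterations_per_level(n_levels, first_iters=30, middle_iters=3, last_iters=15):
    """Return one iteration count per pyramid level, in processing order."""
    if n_levels < 1:
        raise ValueError("n_levels must be at least 1")
    if n_levels == 1:
        return [last_iters]
    # Levels 2 .. n-1 only refine the rough estimate, so they get few iterations.
    return [first_iters] + [middle_iters] * (n_levels - 2) + [last_iters]


print(iterations_per_level(4))  # [30, 3, 3, 15]
```
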
Potential improvements:
1) Avoid pyramiding by fitting a Gaussian process every 10 iterations and using it to deal with relatively large transformations. Pyramiding right now is by far the biggest "time sucker".
2) Add a momentum term (and in general study the stability of the learning process).
3) Use an experts method to avoid digressions. I envision this as running the algorithm three times from scratch for 5 iterations and choosing the best of the three solutions on a common, slightly larger set of points (~250-1000). I can then repeat that process.
4) Stop iterations at each pyramid level once the algorithm reaches a satisfactory rho (correlation) factor. As ridiculous as this sounds, since each iteration runs so quickly, adding an if statement may not be worthwhile.
5) Use a faster rough registration method, such as cross correlation, to avoid pyramiding.
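
The idea of stopping once rho is satisfactory can be sketched as follows. This is a hedged Python illustration, not part of the submitted code: it assumes rho is the zero-mean normalized correlation between the template samples and the warped image samples, and `step` is a hypothetical callable that performs one registration iteration and returns the current rho. The names `correlation` and `iterate_until_good` and the 0.999 target are made up for the example.

```python
import math


def correlation(a, b):
    """Zero-mean normalized correlation (rho) between two equal-length lists."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    da = [v - mean_a for v in a]
    db = [v - mean_b for v in b]
    norm = math.sqrt(sum(v * v for v in da) * sum(v * v for v in db))
    if norm == 0.0:
        return 0.0  # a constant signal carries no correlation information
    return sum(x * y for x, y in zip(da, db)) / norm


def iterate_until_good(step, max_iters, rho_target=0.999):
    """Call step() until it reports rho >= rho_target or max_iters is reached."""
    rho = float("-inf")
    iters_done = 0
    for iters_done in range(1, max_iters + 1):
        rho = step()
        if rho >= rho_target:
            break
    return iters_done, rho
```

Whether the check pays off depends on how its cost compares with the few iterations it saves, which is the trade-off noted in the list above.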

Please note:
1) While this algorithm is fairly stable and usually arrives at a good answer, it does sometimes fail. To avoid failure in your application, adjust the number of iterations, number of pyramid levels, number of points per iteration, and learning rate.
2) I have not thoroughly tested this code, so please let me know if you run into trouble and I'll do my best to help.