Studying the Mott metal-insulator transition in a dimer Hubbard model.

Using Cython in a real-world example

Python has grown to be my preferred choice when it comes to programming.
It is just so simple to import from its huge selection of modules and
build upon them to get things working. But at some point, when the
design phase is over, it just feels so slow when it actually has to do the work.

In this case, I'm referring to numerical calculations. Indeed, the numpy
and scipy modules are of great help, letting all the numerical calculations
run faster in their C- or Fortran-coded engines. But what about the
algorithm I code in Python? It is just going to be the slow part of
my work. So I have to avoid writing too much Python and prefer calling
all those C-coded modules. YES!! That is just it. BUT it is not enough, because
numpy and scipy only provide very general functions, and it is up to
the user to make good use of them.

So once users have their algorithm working, they need to speed it up. Recoding
it all in C might be too much of a hassle and error prone. They might also
lose many of the Python modules they depended on in the first place, and the
code will become less readable. The alternatives for now are Cython
and the not-so-new project numba; these two are designed to bridge
Python's ease of use and C-level performance. My problem is that
the simple examples and tutorials they provide, meant to highlight
the huge performance boost you can get, don't really match my real-life
situations. Their examples are based on situations where your complete
algorithm is standalone and the Python interpreter is what slows it
down. In such cases a quick call to numba or some Cython static typing will just
solve the problem and give you a three-orders-of-magnitude performance boost.
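A typical tutorial example of that kind looks roughly like the following sketch (mine, not taken from any specific tutorial): a standalone numerical kernel where a few `cdef` declarations let Cython translate the whole loop to C.

```cython
# Tutorial-style standalone kernel: once every variable in the hot
# loop carries a static C type, Cython compiles it to a pure C loop,
# which is where the advertised huge speedup figures come from.
def squared_sum(double[:] x):
    cdef int i
    cdef double total = 0.0
    for i in range(x.shape[0]):
        total += x[i] * x[i]
    return total
```

That works beautifully when the loop itself is the bottleneck; it does nothing for you when the time is spent inside calls to other Python modules.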

My story begins because I was already calling the fast C functions
of numpy and scipy, and those should have been the most time-consuming
functions of my algorithm. But they weren't: there was enough Python code
around them to slow everything down. My first hope was numba, but there
was no performance gain, as it can't deal with my calls to scipy functions. So
Cython was the way to go. I'm certainly not aware of all the possibilities
Cython offers to speed up my code; I still believe there is room for
improvement, and I would be happy to hear about them in the comments.

So I now present how I solved my problem and gained a performance
boost of 4x. It is certainly not as glamorous or amazing as the 1000x you get
from the advertised Cython examples, but it is a new start
in performance gains over my Python code.

Before I start, I will stress: HAVE TESTS FOR YOUR CODE!
It doesn't help at all to get faster execution times just to arrive at
the wrong results faster. Debugging is a pain, and it gets more painful
every time new modifications break old code. Tests let you exercise
smaller pieces of your code and make it easier to track down what broke.
I love Python so much because there are very simple frameworks
for writing tests, so just use them. When is a good time to write your
tests? Before your own module, if you can: great. After you wrote the module
is also good; and even if it gave you the right results from the beginning,
you have to verify that it continues to do so as your code evolves.
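With pytest, for instance, a test is nothing more than a function whose name starts with `test_` and which contains plain assertions. A minimal sketch (the `accept_flip` helper here is a toy stand-in, not from the actual code):

```python
# test_metropolis.py -- pytest collects any function named test_*
def accept_flip(ratio, rand_draw):
    """Toy stand-in for a Metropolis acceptance decision: accept the
    spin flip when the transition ratio beats the random draw."""
    return ratio > rand_draw


def test_accepts_large_ratio():
    assert accept_flip(0.9, 0.5)


def test_rejects_small_ratio():
    assert not accept_flip(0.1, 0.5)
```

Running `pytest` in the directory picks these up automatically; that low barrier is exactly why there is no excuse to skip the tests.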

The goal of this work is to implement a Quantum Monte Carlo algorithm
to solve the Anderson impurity model and use it in the context of DMFT.
There are already many implementations of this code spread around the
world, many of them in Fortran and C/C++, so why not use one of them? Well, first,
they are more complicated to understand and modify, and sometimes they are
too big as packages. Second, I needed to learn it and shape it in my own style.

The complete software is freely available on github at learn-dmft, matching
the first release of the code (the code itself will evolve as new ideas
come in and as bugs appear and get fixed). I will focus in this post on the local
changes made to speed up the code. These code snippets are not meant to run
independently; they are given to provide some insight into the work
done.

Let's start by profiling the code. Here you see only the relevant parts
I want to focus on for this short example. One sees that most of the time
is spent inside imp_solver, which performs the Monte Carlo update
and has to update a dense matrix, through gnew, at every step. Almost all of the time
should be spent in the matrix update, but because I'm using the scipy
interface to BLAS, that part doesn't take much time; it is the Metropolis
update scheme, manually implemented in Python, that takes all the time.
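The original profiler output is not reproduced here, but this kind of measurement comes straight from the standard library. A minimal sketch of how to collect it, with toy stand-ins for `imp_solver` and `gnew` (the real bodies live in the repository):

```python
import cProfile
import io
import pstats
import random


def gnew(x):
    """Toy stand-in for the dense-matrix Green-function update."""
    return x * x


def imp_solver(steps=20000):
    """Toy stand-in for the Monte Carlo impurity-solver loop."""
    total = 0.0
    for _ in range(steps):
        total += gnew(random.random())
    return total


profiler = cProfile.Profile()
profiler.enable()
imp_solver()
profiler.disable()

# Sort by cumulative time to see which call tree dominates
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(10)
print(stream.getvalue())
```

The cumulative column is the one to watch: it shows whether the time sits in the matrix update itself or in the Python code wrapped around it.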

Based on this example for the Ising model,
I wanted to test numba, but it just did not help, because numba can't, to my
knowledge, compile the calls I make to scipy. Static typing in Cython didn't
work either, as the main delay remained in the calls to the scipy BLAS
module. So I needed to use Cython to call BLAS directly. It was
thanks to tokyo, a Cython wrapper to BLAS,
that I could learn how to do this. scipy seems to be adding Cython wrappers
to BLAS in its development branch, but that will be for a later
time.

When recoding this into Cython syntax,
it is important to keep in mind that numpy normally uses C-ordered arrays,
but if you happen to use scipy.linalg.blas functions in some steps, your numpy arrays will
become Fortran ordered. It is then important to keep that ordering when
you call BLAS. (This was my big bug, and why you need tests.) So pay special
attention to the CblasColMajor keyword, which keeps the Fortran column ordering
of the numpy arrays. The copy instruction is also necessary, although there
should be better ways of copying the data than creating new numpy arrays.
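The direct CBLAS call for the rank-1 update behind gnew can be declared roughly as follows (a sketch in the spirit of tokyo; the argument names and the exact update are illustrative, not the repository code):

```cython
import numpy as np
cimport numpy as np

cdef extern from "cblas.h":
    enum CBLAS_ORDER:
        CblasRowMajor
        CblasColMajor
    void cblas_dger(CBLAS_ORDER order, int M, int N, double alpha,
                    double *X, int incX, double *Y, int incY,
                    double *A, int lda)

def gnew(np.ndarray[np.float64_t, ndim=2, mode='fortran'] g,
         double alpha, int k):
    cdef int n = g.shape[0]
    # The copies are needed because dger updates g in place while it
    # is still reading one of g's rows and one of its columns.
    cdef np.ndarray[np.float64_t, ndim=1] x = np.copy(g[:, k])
    cdef np.ndarray[np.float64_t, ndim=1] y = np.copy(g[k, :])
    # g comes back Fortran ordered from scipy.linalg.blas, so the
    # update must keep column-major ordering: hence CblasColMajor.
    cblas_dger(CblasColMajor, n, n, alpha, &x[0], 1, &y[0], 1,
               &g[0, 0], n)
```

Passing CblasRowMajor here would silently scramble the matrix, which is exactly the kind of bug only a test suite catches.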

This first change cut the number of Python calls almost in half, and the new
Cython function is almost twice as fast. The number of calls differs between runs
because I'm a sloppy profiler and am not using the same seed for the random
number generator, but in this case that has negligible effects. The total
execution time dropped by a bit more than a second.

Now it's time to redo the update function in charge of the Metropolis
algorithm. It has a very simple structure: it spends most of its time calling
the random number generator and doing algebra operations over simple floats.
Once it is rewritten into Cython, notice that I'm again calling external
functions available in C, like the ones from the GSL libraries. After these
changes, the end result is a completely cythonized Metropolis Monte Carlo
update function that runs entirely in C, as shown by the profiler.
The number of Python function calls dropped again, this time by a factor of
five, and the update function, internally calling gnew, requires 0.841s in
total. That drops the total time used by the impurity solver from 3.591s to
0.841s, a 4.27x speed gain. And most of the time is now spent, as expected,
in the matrix update function gnew.
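The cythonized update has roughly the following shape (a sketch, not the exact repository code; gnew stands for the BLAS-backed matrix update):

```cython
cdef extern from "math.h":
    double exp(double x)

cdef extern from "gsl/gsl_rng.h":
    ctypedef struct gsl_rng
    ctypedef struct gsl_rng_type
    gsl_rng_type *gsl_rng_mt19937
    gsl_rng *gsl_rng_alloc(gsl_rng_type *T)
    double gsl_rng_uniform(gsl_rng *r)

cdef gsl_rng *rng = gsl_rng_alloc(gsl_rng_mt19937)

def update(double[:, :] gup, double[:, :] gdw, double[:] v):
    cdef int i
    cdef double dv
    for i in range(v.shape[0]):
        dv = -2. * v[i]
        # The acceptance ratio is plain C double arithmetic, and
        # gsl_rng_uniform keeps the random draw in C as well, so no
        # Python call remains inside the loop.
        if gsl_rng_uniform(rng) < abs(
                (1. + (1. - gup[i, i]) * (exp(dv) - 1.)) *
                (1. + (1. - gdw[i, i]) * (exp(-dv) - 1.))):
            gnew(gup, exp(dv) - 1., i)   # BLAS-backed matrix update
            gnew(gdw, exp(-dv) - 1., i)
            v[i] *= -1.
```

Swapping numpy's generator for GSL's is what removes the last Python call from the sweep.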

Óscar Nájera

During my PhD I always wanted to have an interactive way of plotting a nice-looking Bethe lattice and, particular to my work, a dimer Bethe lattice. Because this visualization wasn't an essential part of my research, I never looked into it; drawing them on paper was enough for any practical purpose. Then I was confronted with my PhD oral presentation: a great opportunity to procrastinate and get the nice, unessential plots done.

Git is a fantastic tool to version control code. And the advantages of version controlling my work are so evident that I want to version control everything else I do besides code. Most of my work consists of editing text files, and I have even forced my workflow into text files just to be able to version control more of it. Thus, I keep track of my source code, org-mode notes and then some \(\LaTeX\) files.

I'm guilty of having an online blog and not writing on it. I must confess that working on its looks and internals is much more fun for me than actually putting some words in plain English for everyone to read. But that is not the point; the content is the point of a blog.
This website has gone through a few visual updates over the years. Each time, I decide to embrace technologies that are new (at least to me) and to build on top of somebody else's work instead of trying again to do the design by myself.

I just found a post on using the right dictionary and I immediately had to review its reference: You're probably using the wrong dictionary by James Somers. I don't regret it. The core message is that we should not only use dictionaries to find the meaning of words we don't know, but also use them to enrich our prose. I never considered there would be a dictionary that allows for such a task.

TL;DR Emacs is the way to go
Optimization is my way of procrastinating. This time it is about getting a decent text editor. I am at the point where it is just annoying that all my text is getting scattered between different editors, and that many times I have to open the same files in different editors to get what I need done.
Why do IDEs never come with a spell checker?

This comes from an old idea that has been bugging me for the last few years: "I can't do everything myself anymore; there is just no time for such a thing." I'm not a specialist in everything; I chose long ago to focus on physics. Thus, all those web development things I can no longer do on my own. I have to use frameworks and templates.
This is a general idea: just take whatever is already done and build upon it.

Today my website got a complete redo with a new design. The previous presentation was made to mimic my blogger site. That required a great amount of effort, because I like simple things and simplifying the blogger theme was quite challenging; I can't always get CSS to do what I want. Also, the website displayed fine in desktop browsers but already had problems in mobile phone browsers, and translating the blogger site's version for mobiles was an unwanted challenge.

I'll aim at not numbering my posts in future releases of content. But for now, since these look more like biographical notes of my life, I can't find better titles for the posts. It's also because I'm not covering just one subject but some history of my life in recent times.
Now for the first improvement: I managed to bring myself to write this post only after a month, and in that time many things have changed.

In my first post after opening the blog, I promised to write as often as possible to record my learning and share it. Unfortunately, even for me, that has not happened. This is the second post. That does not mean I spend my time without gaining new knowledge. It means instead that I still need to practice this blogging task and discipline myself to it. Certainly the lack of monetary income hinders me from publishing.