Adam is a machine learning optimization algorithm. Strictly speaking, “Adam” is not an acronym (which is why it isn’t written in all capitals); the name is derived from “adaptive moment estimation”.

The Adam paper first appeared in late 2014 (less than three years ago as I write this post), and the algorithm has quickly become one of the main ones used for neural network training. Things are moving very fast in the field of machine learning.

The only way I can completely understand an algorithm is to implement it in code. So this morning, during a break at a conference where I’m speaking, I fired up Visual Studio and tackled an Adam demo.

I kept things simple and used Adam to find the minimum of a dummy loss function, f(w0, w1) = w0^2 + w1^2, which is sometimes called the sphere function. The minimum value is 0, at w0 = w1 = 0.
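My demo was in Visual Studio, but the idea can be sketched in a few lines of Python. This is a minimal, self-contained version of Adam applied to the sphere function; the starting point, step count, and learning rate are my choices for illustration, while beta1 = 0.9, beta2 = 0.999, and eps = 1e-8 are the defaults suggested in the paper.

```python
import math

def adam_sphere_demo(steps=200, alpha=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    """Minimize f(w) = w0^2 + w1^2 with Adam. Gradient of f is (2*w0, 2*w1)."""
    w = [3.0, -4.0]    # arbitrary starting point (my choice for the demo)
    m = [0.0, 0.0]     # first-moment (mean of gradient) estimates
    v = [0.0, 0.0]     # second-moment (mean of squared gradient) estimates
    for t in range(1, steps + 1):
        g = [2.0 * wi for wi in w]  # gradient of the sphere function
        for i in range(len(w)):
            m[i] = beta1 * m[i] + (1.0 - beta1) * g[i]
            v[i] = beta2 * v[i] + (1.0 - beta2) * g[i] ** 2
            m_hat = m[i] / (1.0 - beta1 ** t)   # bias-corrected first moment
            v_hat = v[i] / (1.0 - beta2 ** t)   # bias-corrected second moment
            w[i] -= alpha * m_hat / (math.sqrt(v_hat) + eps)
    return w

w = adam_sphere_demo()
print(w)  # both components end up close to 0.0
```

The bias-correction terms (the divisions by 1 - beta^t) matter most in the first few iterations, when m and v are still biased toward their zero initializations.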

I located the source research paper at https://arxiv.org/pdf/1412.6980v8.pdf. Compared to many research papers, the Adam paper is very well written, and I didn’t have too much difficulty understanding it, although I’ve been reading such papers for many years, so that’s to be expected.

Anyway, after a couple of hours, I had a demo up and running. There’s still a lot about the Adam algorithm I don’t understand yet, but coding up a demo is a big first step towards full understanding.