This is exactly the kind of content I love to see -- interesting technical topic, completely outside of any area of expertise of mine, clearly presented, interesting, and pretty obviously useful. Please post more!

There's "math knowledge" and then there's "math knowledge". I'm still learning the beginner stuff, but so far all it seems to need is basic calculus (i.e. "do you understand what a gradient is and how to calculate one numerically?") and some linear algebra ("do you know what matrices are and how to multiply them together?").

Basic calculus doesn't seem that hard though. You understand asymptotes, right? That's when something gets closer and closer to a certain value but never quite touches it, like 1/x getting closer to 0 as x grows. That leads quite naturally to limits, where you say "ok, since we can get as close as we like, we'll just ignore that little bit and say it touches".

Then come derivatives. So you have a function graph, and you're trying to determine the slope at a certain point. It's really simple for say y = x, where you just take any two points, but what if the graph is curvy? You'd take the point and then a point close to it, and calculate the slope for that. If the function is fairly smooth, that's fairly close to the actual slope. So you push the other point closer to the original point, and the slope you calculate gets closer to the real slope. You edge closer and closer, and suddenly it sounds a lot like a limit, where your initial point is x, your second point x + h, and h becomes smaller and smaller, so h -> 0.
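To make the "push the second point closer" idea concrete, here is a minimal Python sketch. The function names (`slope_estimate`, `square`) and the choice of x = 3 are just illustrative assumptions, not anything from the comment above.

```python
def slope_estimate(f, x, h):
    """Slope of the straight line through (x, f(x)) and (x + h, f(x + h))."""
    return (f(x + h) - f(x)) / h

def square(x):
    # A curvy example function; its true slope at x is 2x.
    return x ** 2

x = 3.0
for h in (1.0, 0.1, 0.01, 0.001):
    # As h shrinks, the estimate should creep toward the true slope, 6.
    print(f"h = {h:<6} estimated slope = {slope_estimate(square, x, h):.4f}")
```

The estimates should approach 6 as h shrinks, which is the limit the comment is describing.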

Here's what's cool though. Since you didn't use 5, or 7.23, but x, you can put any point into this, so you've got a function that maps x onto the slope at x of the original function (at least if I remember correctly). You play around with a couple of different kinds of functions and arrive at various rules for differentiating things without having to do the whole limit thing, like x^2 -> 2x, x^3 -> 3x^2, e^x -> e^x, etc.
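For what it's worth, here is roughly how the x^2 -> 2x rule falls out of "the whole limit thing", written as a short LaTeX derivation. It only uses the x and x + h setup described above.

```latex
\begin{aligned}
\frac{d}{dx}\,x^2
  &= \lim_{h \to 0} \frac{(x + h)^2 - x^2}{h} \\
  &= \lim_{h \to 0} \frac{2xh + h^2}{h} \\
  &= \lim_{h \to 0} \,(2x + h) \\
  &= 2x
\end{aligned}
```

The leftover h is the "little bit" that the limit lets you drop.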

And derivatives are really handy. Say in kinematics, velocity is the derivative of position with respect to time, acceleration is the derivative of velocity with respect to time, and suddenly a whole bunch of things make a lot of sense.

Integrals are pretty neat too. So you want to estimate the area under a function from point a to point b. So you draw a box, and there's an area you're missing, or an area you're including but shouldn't, but it's close to the actual area underneath the function. So you think...hrm, I could use two boxes of half the width, that'd be closer to what I want to find. Then three, ten, a hundred, with the sum of the areas of the boxes getting closer to the actual area. Said more mathy, you go from a to b, a total width of b-a, with n boxes that each get a width of (b-a)/n, and n gets infinitely large. Then we make that little limit jump again, and get a function. We do that for a couple of functions and deduce some rules, so we don't need to go through the entire process when integrating simple functions. For example, k -> kx + C (for a constant k), and x -> 1/2 x^2 + C.
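The boxes idea can be sketched in a few lines of Python. This is just one way to do it, assuming left-edge boxes and using f(x) = x on the interval from 0 to 2 as a made-up example (true area: 2). The names `riemann_sum` and `identity` are mine.

```python
def riemann_sum(f, a, b, n):
    """Approximate the area under f from a to b with n equal-width boxes.

    Each box has width (b - a) / n and takes its height from f at its left edge.
    """
    width = (b - a) / n
    return sum(f(a + i * width) * width for i in range(n))

def identity(x):
    return x

for n in (1, 2, 10, 100, 1000):
    # More boxes means less missed area; the sum should approach 2.
    print(n, riemann_sum(identity, 0.0, 2.0, n))
```

With one box the estimate misses the whole triangle; with a thousand it is very close to the exact area.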

With some understanding of the FTC (fundamental theorem of calculus), which roughly states that derivatives and integrals are inverse operations of each other (isn't that neat?), it again helps us make sense of things. For example, look at the equations in physics for constant acceleration: v = v_i + at, x = x_i + v_i t + 1/2 at^2. See how v is the integral of a, and x is the integral of v? For the first one, "v_i" is the constant of integration C, and "at" is the integral of the constant a. For the second one, "x_i" is the constant of integration C, "v_i t" is "v_i" integrated, and "1/2 at^2" is the integral of "at".
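One way to convince yourself of this is to add up velocity and acceleration in tiny time steps and compare against the constant-acceleration formulas. The sketch below is a crude numerical integration, not a proper physics simulation, and the starting values (x_i = 2, v_i = 3, a = 4, t = 5) are arbitrary assumptions.

```python
def simulate(x_i, v_i, a, t, steps=100_000):
    """Accumulate position and velocity in small time steps."""
    dt = t / steps
    x, v = x_i, v_i
    for _ in range(steps):
        x += v * dt  # position accumulates velocity
        v += a * dt  # velocity accumulates acceleration
    return x, v

def closed_form(x_i, v_i, a, t):
    """The constant-acceleration equations from the comment above."""
    v = v_i + a * t
    x = x_i + v_i * t + 0.5 * a * t ** 2
    return x, v

print(simulate(2.0, 3.0, 4.0, 5.0))
print(closed_form(2.0, 3.0, 4.0, 5.0))
```

The two results should agree closely, and the gap shrinks as `steps` grows, which is the same limit jump as with the boxes.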

Thanks for writing all of this. I enjoyed it. It makes no sense to me how calculus makes so much sense. It all just fits together. It's an eerie high point in the "unreasonable" effectiveness of mathematics.

I enjoyed your writeup too. If I had to choose between being able to do all the low-level work without a high-level understanding of how the pieces fit together, and having that high-level understanding with no low-level skills to apply it, I would definitely choose the high-level view. It is IMHO far more valuable to know how they fit together.

Go show someone how to calculate a gradient vector (assuming they can calculate basic derivatives) and multiply two matrices, and then try to teach them how gradient descent works or the SVM derivation. That's not going to end even slightly well for your average person. There really is no silver bullet for this stuff. You either need some mathematical maturity, or you need to put in an exorbitant amount of time to understand things.
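For reference, the mechanical part of gradient descent is small. Here is a minimal sketch, assuming a numerically estimated gradient and a made-up bowl-shaped function with its minimum at (1, -2). The function names, learning rate, and step count are all illustrative assumptions, and none of this covers why or when the method works, which is the part the comment is arguing about.

```python
def numerical_gradient(f, point, h=1e-6):
    """Estimate the gradient of f at point using central differences."""
    grad = []
    for i in range(len(point)):
        forward = list(point)
        backward = list(point)
        forward[i] += h
        backward[i] -= h
        grad.append((f(forward) - f(backward)) / (2 * h))
    return grad

def gradient_descent(f, start, learning_rate=0.1, steps=200):
    """Repeatedly step against the gradient, i.e. downhill."""
    point = list(start)
    for _ in range(steps):
        grad = numerical_gradient(f, point)
        point = [p - learning_rate * g for p, g in zip(point, grad)]
    return point

def bowl(p):
    # Hypothetical example function with its minimum at (1, -2).
    x, y = p
    return (x - 1) ** 2 + (y + 2) ** 2

minimum = gradient_descent(bowl, [5.0, 5.0])
print(minimum)
```

On this simple bowl the result should land very near (1, -2); real loss functions are nowhere near this well behaved.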

In general, if you're a user of ML (as in, you apply already known ML methods to your particular problem instead of researching better conceptual ML methods) then you'd be able to work at a higher abstraction level, where all the math is inside the libraries and you're not really touching it, much less implementing the formulae.

Not really, you can go far playing around with off-the-shelf components. IMO, the math is less helpful in NNs than in other areas because you're trying to create a useful approximation, not get a specific correct answer. Thus knowing why something works is only so helpful.

Did Clarifai have a different company name a few years ago? I built a realtime threaded discussion app with images a few years ago (like Secret or Yik Yak, but in 2011) and used a REST API that was capable of recognising actual body parts.

It generally worked well, with the exception of false negatives for cartoon nudity and false positives for pastrami.