How To Understand Derivatives: The Quotient Rule, Exponents, and Logarithms

Last time we tackled derivatives with a “machine” metaphor. A function is a machine with an input (x) and an output (y) lever. The derivative, dy/dx, is how much “output wiggle” we get when we wiggle the input.

Now, we can make a bigger machine from smaller ones (h = f + g, h = f * g, etc.). The derivative rules (addition rule, product rule) give us the “overall wiggle” in terms of the parts. The chain rule is special: we can “zoom into” a single derivative and rewrite it in terms of another input (like converting “miles per hour” to “miles per minute” — we’re converting the “time” input).

And with that recap, let’s build our intuition for the advanced derivative rules. Onward!

Division (Quotient Rule)

Ah, the quotient rule — the one nobody remembers. Oh, maybe you memorized it with a song like “Low dee high, high dee low…”, but that’s not understanding!

It’s time to visualize the division rule (who says “quotient” in real life?). The key is to see division as a type of multiplication:

h = f / g = f * (1/g)

We have a rectangle, we have area, but the sides are “f” and “1/g”. Input x changes off on the side (by dx), so f and g change (by df and dg)… but how does 1/g behave?

Chain rule to the rescue! We can wrap up 1/g into a nice, clean variable and then “zoom in” to see that yes, it has a division inside.

So let’s pretend 1/g is a separate function, m. Inside function m is a division, but ignore that for a minute. We just want to combine two perspectives:

f changes by df, contributing area df * m = df * (1 / g)

m changes by dm, contributing area dm * f = ?

We turned m into 1/g easily. Fine. But what is dm (how much 1/g changed) in terms of dg (how much g changed)?

We want the difference between neighboring values of 1/g: 1/g and 1/(g + dg). For example:

What’s the difference between 1/4 and 1/3? 1/12

How about 1/5 and 1/4? 1/20

How about 1/6 and 1/5? 1/30

How does this work? We get the common denominator: for 1/3 and 1/4, it’s 12. And the difference between “neighbors” (like 1/3 and 1/4) will be 1 / common denominator, aka 1 / (x * (x + 1)). See if you can work out why!

If we make our derivative model perfect, and assume there’s no difference between neighbors, the +1 goes away and we get:

d/dx (1/x) = -1/x^2

(This is useful as a general fact: the change from 1/100 to 1/101 is roughly one ten-thousandth.)
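The neighbor pattern above can be checked with exact fractions. This is a small sketch, not part of the original article; the helper name `neighbor_gap` is made up for illustration.

```python
from fractions import Fraction


def neighbor_gap(x):
    """Exact difference 1/x - 1/(x + 1) between neighboring reciprocals."""
    return Fraction(1, x) - Fraction(1, x + 1)


for x in (3, 4, 5, 100):
    gap = neighbor_gap(x)
    # The gap equals 1 / (x * (x + 1)); multiplying by x^2 shows how close
    # it is to 1 / x^2 (the ratio approaches 1 as x grows).
    print(x, gap, Fraction(1, x * (x + 1)), float(gap) * x**2)
```

For large x the last column creeps toward 1, which is the “+1 goes away” step in numeric form.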

The difference is negative, because the new value (1/4) is smaller than the original (1/3). So what’s the actual change?

g changes by dg, so 1/g becomes 1/(g + dg)

The instant rate of change is -1/g^2 [as we saw earlier]

The total change = dg * rate, or dg * (-1/g^2)

A few gut checks:

Why is the derivative negative? As g increases (a positive dg), the denominator gets larger, the total value gets smaller, so we’re actually shrinking (1/3 to 1/4 is a shrink of 1/12).

Why do we have -1/g^2 * dg and not just -1/g^2? (This confused me at first). Remember, -1/g^2 is the chain rule conversion factor between the “g” and “1/g” scales (like saying 1 hour = 60 minutes). Fine. You still need to multiply by how far you went on the “g” scale, aka dg! An hour may be 60 minutes, but how many do you want to convert?

Where does dm fit in? m is another name for 1/g. dm represents the total change in 1/g, which as we saw, was -1/g^2 * dg. This substitution trick is used all over calculus to help split up gnarly calculations. “Oh, it looks like we’re doing a straight multiplication. Whoops, we zoomed in and saw one variable is actually a division — change perspective to the inner variable, and multiply by the conversion factor”.

Phew. To convert our “dg” wiggle into a “dm” wiggle we do:

dm = (-1/g^2) * dg

And get:

(f/g)' = (df/dx) * (1/g) + f * (-1/g^2) * (dg/dx)

Yay! Now, your overeager textbook may simplify this to:

(f/g)' = (g * df/dx - f * dg/dx) / g^2

and it burns! It burns! This “simplification” hides how the division rule is just a variation of the product rule. Remember, there are still two slivers of area to combine:

The “f” (numerator) sliver grows as expected

The “g” (denominator) sliver is negative (as g increases, the area gets smaller)

Using your intuition, you know it’s the denominator that’s contributing the negative change.
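Here is a rough numerical check that the “two slivers” view and the textbook quotient rule agree. The functions f = sin(x) and g = x^2 + 1 are arbitrary choices for illustration, and all helper names are hypothetical.

```python
import math


def f(x):
    return math.sin(x)      # example numerator (an assumption)


def g(x):
    return x**2 + 1         # example denominator (an assumption, never zero)


def df(x):
    return math.cos(x)      # df/dx


def dg(x):
    return 2 * x            # dg/dx


def quotient_slivers(x):
    """Product-rule view: f's sliver plus the (negative) 1/g sliver."""
    numerator_sliver = df(x) * (1 / g(x))
    denominator_sliver = f(x) * (-1 / g(x) ** 2) * dg(x)
    return numerator_sliver + denominator_sliver


def textbook_quotient(x):
    """The 'simplified' textbook form of the same rule."""
    return (g(x) * df(x) - f(x) * dg(x)) / g(x) ** 2


def numeric_derivative(func, x, dx=1e-6):
    """Central-difference estimate of d(func)/dx."""
    return (func(x + dx) - func(x - dx)) / (2 * dx)


x = 1.3
print(quotient_slivers(x))
print(textbook_quotient(x))
print(numeric_derivative(lambda t: f(t) / g(t), x))
```

All three printed values should agree to several decimal places, and the denominator sliver is the one carrying the minus sign.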

Natural Logarithm

ln(10) is the time to grow from 1 to 10, assuming 100% continuous growth

Ok, fine. How long does it take to grow to the “next” value, like 11? (x + dx, where dx = 1)

When we’re at x=10, we’re growing exponentially at 10 units per second. It takes roughly 1/10 of a second (1/x) to get to the next value. And when we’re at x=11, it takes 1/11 of a second to get to 12. And so on: the time to the next value is 1/x.

The derivative

d/dx ln(x) = 1/x

is mainly a fact to memorize, but it makes sense with a “time to grow” interpretation.
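The “time to grow” reading can be tried out numerically. This is only a sketch using Python's standard `math.log`; the helper name `time_to_next` is invented for this example.

```python
import math


def time_to_next(x, dx=1.0):
    """Extra time to grow from x to x + dx at 100% continuous growth."""
    return math.log(x + dx) - math.log(x)


# With a full step of dx = 1, the time is only roughly 1/x.
for x in (10, 11, 100):
    print(x, time_to_next(x), 1 / x)

# With a tiny step, time / dx matches 1/x much more closely.
tiny = 1e-6
print(time_to_next(10, dx=tiny) / tiny, 1 / 10)
```

The full-step values come out a little under 1/x because we speed up while crossing the gap; shrinking dx removes that effect.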

A Hairy Example: x^x

Time to test our intuition: what’s the derivative of x^x?

This is a bad mamma jamma. There are two approaches:

Approach 1: Rewrite everything in terms of e.

Oh e, you’re so marvelous:

x^x = [e^ln(x)]^x = e^[ln(x) * x]

Any exponent (a^b) is really just e in different clothing: [e^ln(a)]^b. We’re just asking for the derivative of e^foo, where foo = ln(x) * x.

But wait! Since we want the derivative in terms of “x”, not foo, we need to jump into x’s point of view and multiply by d(foo)/dx:

d/dx e^foo = e^foo * d(foo)/dx

The derivative of “ln(x) * x” is just a quick application of the product rule. If h=x^x, the final result is:

h' = (1 + ln(x)) * e^[ln(x) * x] = (1 + ln(x)) * x^x

We wrote e^[ln(x)*x] in its original notation, x^x. Yay! The intuition was “rewrite in terms of e and follow the chain rule”.

Approach 2: Independent Points Of View

Remember, derivatives assume each part of the system works independently. Rather than seeing x^x as a giant glob, assume it’s made from two interacting functions: u^v. We can then add their individual contributions. We’re sneaky though, u and v are the same (u = v = x), but don’t let them know!

From u’s point of view, v is just a static power (i.e., if v=3, then it’s u^3) so we have:

d/du u^v = v * u^(v-1)

And from v’s point of view, u is just some static base (if u=5, we have 5^v). We rewrite into base e, and we get

d/dv u^v = d/dv e^[ln(u) * v] = ln(u) * e^[ln(u) * v] = ln(u) * u^v

We add each point of view for the total change:

d(u^v) = v * u^(v-1) * du + ln(u) * u^v * dv

And the reveal: u = v = x! There’s no conversion factor for this new viewpoint (du/dx = dv/dx = dx/dx = 1), and we have:

d/dx x^x = x * x^(x-1) + ln(x) * x^x = x^x * (1 + ln(x))

It’s the same as before! I was pretty excited to approach x^x from a few different angles.
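Both approaches can be compared against a plain numerical derivative. This is a minimal sketch under the assumption x > 0; the function names are made up for this example.

```python
import math


def x_to_the_x(x):
    return x ** x


def derivative_via_e(x):
    """Approach 1: e^foo * d(foo)/dx, with foo = ln(x) * x."""
    foo = math.log(x) * x
    dfoo_dx = math.log(x) + 1   # product rule on ln(x) * x
    return math.exp(foo) * dfoo_dx


def derivative_via_viewpoints(x):
    """Approach 2: add u's and v's contributions, then reveal u = v = x."""
    u = v = x
    from_u = v * u ** (v - 1)          # v held fixed: power rule
    from_v = math.log(u) * u ** v      # u held fixed: exponent rule
    return from_u + from_v


def numeric_derivative(func, x, dx=1e-6):
    """Central-difference estimate of d(func)/dx."""
    return (func(x + dx) - func(x - dx)) / (2 * dx)


x = 2.0
print(derivative_via_e(x))
print(derivative_via_viewpoints(x))
print(numeric_derivative(x_to_the_x, x))
```

At x = 2 all three values should land near 4 * (1 + ln(2)).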

By the way, use Wolfram Alpha to check your work on derivatives (click “show steps”).

Question: If u were more complex, where would we use du/dx?

Imagine u was a more complex function like u=x^2 + 3: where would we multiply by du/dx?

Let’s think about it: du/dx only comes into play from u’s point of view (when v is changing, u is a static value, and it doesn’t matter that u can be further broken down in terms of x). u’s contribution is

d/du u^v = v * u^(v-1)

If we wanted the “dx” point of view, we’d include du/dx here:

v * u^(v-1) * du/dx

We’re multiplying by the “du/dx” conversion factor to get things from x’s point of view. Similarly, if v were more complex, we’d have a dv/dx term when computing v’s point of view.

Look what happened — we figured out the generic d/du and converted it into a more specific d/dx when needed.
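To see where the conversion factor lands, here is a sketch for the example u = x^2 + 3 with v = x (so du/dx = 2x and dv/dx = 1). The helper names are hypothetical, and the numeric derivative is only a sanity check.

```python
import math


def u_of(x):
    return x**2 + 3


def h(x):
    """h = u^v with u = x^2 + 3 and v = x."""
    return u_of(x) ** x


def dh_dx(x):
    u, v = u_of(x), x
    du_dx, dv_dx = 2 * x, 1.0
    from_u = v * u ** (v - 1) * du_dx      # u's point of view, converted to x
    from_v = math.log(u) * u ** v * dv_dx  # v's point of view, converted to x
    return from_u + from_v


def numeric_derivative(func, x, dx=1e-6):
    """Central-difference estimate of d(func)/dx."""
    return (func(x + dx) - func(x - dx)) / (2 * dx)


x = 1.5
print(dh_dx(x), numeric_derivative(h, x))
```

The du/dx factor multiplies only u's contribution; v's contribution treats u as a fixed base.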

It’s Easier With Infinitesimals

Separating dy from dx in dy/dx is “against the rules” of limits, but works great with infinitesimals. You can figure out the derivative rules really quickly:

Product rule:

(f + df) * (g + dg) - f * g = f * dg + g * df + df * dg

We set “df * dg” to zero when jumping out of the infinitesimal world and back to our regular number system.

Think in terms of “How much did g change? How much did f change?” and derivatives snap into place much easier. “Divide through” by dx at the end.
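A quick sketch of why the “df * dg” corner can be dropped: as dx shrinks, the corner's share of the change per unit dx fades away. The starting values and rates below are arbitrary numbers chosen for illustration.

```python
def product_change(f, g, df, dg):
    """Exact change in the area f * g when the sides grow by df and dg."""
    return (f + df) * (g + dg) - f * g


f, g = 3.0, 5.0              # current side lengths (assumed values)
rate_f, rate_g = 2.0, 4.0    # assumed df/dx and dg/dx

for dx in (1e-1, 1e-3, 1e-5):
    df, dg = rate_f * dx, rate_g * dx
    exact = product_change(f, g, df, dg)
    slivers = f * dg + g * df    # the two slivers kept by the product rule
    corner = df * dg             # the tiny corner we set to zero
    # After "dividing through" by dx, the corner term shrinks along with dx.
    print(dx, exact / dx, slivers / dx, corner / dx)
```

The slivers column stays at f * dg/dx + g * df/dx, while the corner column heads toward zero.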

Summary: See the Machine

Our goal is to build calculus intuition, not to memorize rules. I need a few analogies to get me thinking:

Functions are machines, derivatives are the “wiggle” behavior

Derivative rules find the “overall wiggle” in terms of the wiggles of each part

The chain rule zooms into a perspective (hours => minutes)

The product rule adds area

The quotient rule adds area (but one area contribution is negative)

e changes by 100% of the current amount (d/dx e^x = 100% * e^x)

natural log is the time for e^x to reach the next value (x units/sec means 1/x to the next value)

With practice, ideas start clicking. Don’t worry about getting tripped up: I still try to overuse the chain rule when working with exponents. Learning is a process!

Happy math.

Appendix: Partial Derivatives

Let’s say our function depends on two inputs:

f(x, y)

The derivative of f can be seen from x’s point of view (how does f change with x?) or y’s point of view (how does f change with y?). It’s the same idea: we have two “independent” perspectives that we combine for the overall behavior (it’s like combining the point of view of two Solipsists, who think they’re the only “real” people in the universe).

If x and y depend on the same variable (like t, time), we can write the following:

df/dt = (∂f/∂x) * (dx/dt) + (∂f/∂y) * (dy/dt)

It’s a bit of the chain rule — we’re combining two perspectives, and for each perspective, we dive into its root cause (time).

If x and y are otherwise independent, we represent the derivative along each axis in a vector:

∇f = (∂f/∂x, ∂f/∂y)

This is the gradient, a way to represent “From this point, if you travel in the x or y direction, here’s how you’ll change”. We combined our 1-dimensional “points of view” to get an understanding of the entire 2d system. Whoa.
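The two ideas in this appendix (the gradient, and combining perspectives through a shared variable t) can be sketched together. The function f(x, y) = x^2 * y and the paths x(t) = cos(t), y(t) = t^2 are assumptions picked for this example, not taken from the article.

```python
import math


def f(x, y):
    return x**2 * y            # hypothetical two-input function


def gradient(x, y):
    """(df/dx, df/dy): each axis seen from its own point of view."""
    return (2 * x * y, x**2)


def x_of_t(t):
    return math.cos(t)         # assumed path for x


def y_of_t(t):
    return t**2                # assumed path for y


def total_derivative(t):
    """df/dt = (df/dx) * (dx/dt) + (df/dy) * (dy/dt)."""
    gx, gy = gradient(x_of_t(t), y_of_t(t))
    dx_dt = -math.sin(t)
    dy_dt = 2 * t
    return gx * dx_dt + gy * dy_dt


def numeric_total_derivative(t, dt=1e-6):
    """Central-difference estimate of df/dt along the path."""
    ahead = f(x_of_t(t + dt), y_of_t(t + dt))
    behind = f(x_of_t(t - dt), y_of_t(t - dt))
    return (ahead - behind) / (2 * dt)


print(gradient(2.0, 3.0))
print(total_derivative(0.7), numeric_total_derivative(0.7))
```

Each gradient component is one “point of view”, and the total derivative just converts each one to the t scale before adding.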

When will Math curriculums begin combining concepts in meaningful ways like this? Calculus classes like to split ‘Power Rule,’ ‘Quotient Rule,’ and ‘Chain Rule’ into discrete sections, when really they’re consequences of the same basic idea. Perhaps it’s less labor-intensive teaching distinct formulas to be memorized, but it’s just another reason people hear ‘Calculus’ and immediately glaze over.

And while I’m lamenting–your mention of infinitesimals brings up another sore spot of mine. A Calc TA told me how separating ‘dy/dx’ is ‘against the rules,’ as you say, and I took it to heart. Imagine poor, confused me a couple semesters later in DiffEq: “I thought this was against the rules!” The limit-based approach to teaching Calculus needs some serious revision, particularly for non-mathematicians moving into practical fields.

@Joe: I hear you — we slice and dice concepts and miss the cohesive whole. All the calculus rules are just examples of how different subparts can contribute to the whole, but I’m only seeing that now, 10+ years after high school. Ugh.

And yeah — there’s so much “don’t do this, I don’t know why, but don’t!” in math. Why is it against the rules? What are the “rules”? Limits are a seatbelt introduced to address theoretical concerns many, many years after Calculus was put into use. Learning about seatbelts is fine, but don’t dive into them before you explain what a car [i.e., calculus] is!

What helped me understand derivatives is this: If y=e^x, then y’=e^x, which is of course related to your favorite number, e, which does seem to have more significance than pi. A graph and an explanation could help others.

If I had known this existed while I was taking calculus, there would have been so fewer headaches. I had always known on a subconscious level that there were connections between calculus and the earlier maths–my teacher even confirmed that by joking that all other classes were “pre-calculus”–but for the life of me, I could never find those connections. And they were right there mocking me the whole time! This helped more than any lecture, peer teaching, or textbook ever could. Thanks!