If you present calculus students with a definite integral, their Pavlovian response is “Take the anti-derivative, evaluate it at the limits, and subtract.” They think that’s what it means. But it’s not what a definite integral means. It’s how you (usually) calculate its value. This is not a pedantic fine point but a practically important distinction. It pays to distinguish what something means from how you usually calculate it. Without this distinction, things that are possible may seem impossible. [1]

For example, suppose you want to compute the following integral that comes up frequently in probability.

There is no (elementary) function whose derivative is exp(-x2). It’s not just hard to find or ugly. It simply doesn’t exist, not within the universe of elementary functions. There are functions whose derivative is exp(-x2), but these functions are not finite algebraic combinations of the kinds of functions you’d see in high school.

If you think of the definite integral above as meaning “the result you get when you find an antiderivative, let its arguments go off to ∞ and -∞, and subtract the two limits” then you’ll never calculate it. And when you hear that the antiderivative doesn’t exist (in the world of functions you’re familiar with) then you might think that not only can you not calculate the integral, no one can.

In fact the integral is easy to calculate. It requires an ingenious trick [2], but once you see that trick it’s not hard.

Let I be the value of the integral. Changing the integration variable makes no difference, i.e.

and so

This integral can be converted to polar coordinates. Instead of describing the plane as an infinite square with x and y each going off to infinity in both directions, we can think of it as an infinite disk, with radius going off to infinity. The advantage of this approach is that the Jacobian of the change of variables gives us an extra factor of r that makes the exponential integral tractable.

From this we get I2 = π and so I = √π.

This specific trick comes up occasionally. But more generally, it is often the case that definite integrals are easier to compute than indefinite integrals. One of the most common applications of complex analysis is computing such integrals through the magic of contour integration. This leads to a lesson closely related to the one above, namely that you may not have to do what it looks like you need to do. In this case, you don’t always need to compute indefinite integrals (anti-derivatives) as an intermediate step to compute definite integrals. [3]

Mathematics is filled with theorems that effectively say that you don’t actually have to compute what you conceptually need to compute. Sometimes you can get by with calculating much less.

* * *

[1] One frustration I’ve had working with statisticians is that many have forgotten the distinction between what they want to calculate and how they calculate it. This makes it difficult to suggest better ways of computing things.

[2] Lord Kelvin said of this trick “A mathematician is one to whom that is as obvious as that twice two makes four is to you. Liouville was a mathematician.”

[3] If you look back carefully, we had to compute the integral of exp(-r2) r, which you would do by first computing its anti-derivative. But we didn’t have to compute the anti-derivative of the original integrand. We traded a hard (in some sense impossible) anti-derivative problem for an easy one.

When I was in high school, one year I made the Region choir. I had no intention of competing at the next level, Area, because I didn’t think I stood a chance of going all the way to State, and because the music was really hard: Stravinsky’s Symphony of Psalms.

My choir director persuaded me to try anyway, with just a few days before auditions. That wasn’t enough time for me to learn the music with all its strange intervals. But I tried out. I sang the whole thing. As awful as it was, I kept going. It was about as terrible as it could be, just good enough to not be funny. I wanted to walk out, and maybe I should have out of compassion for the judges, but I stuck it out.

I was proud of that audition, not as a musical achievement, but because I powered through something humiliating.

I did better in band than in choir. I made Area in band and tried out for State but didn’t make it. I worked hard for that one and did a fair job, but simply wasn’t good enough.

That turned out well. It was my senior year, and I was debating whether to major in math or music. I’d told myself that if I made State, I’d major in music. I didn’t make State, so I majored in math and took a few music classes for fun. We can never know how alternative paths would have worked out, but it’s hard to imagine that I would have succeeded as a musician. I didn’t have the talent or the temperament for it.

When I was in college I wondered whether I should have done something like acoustical engineering as a sort of compromise between math and music. I could imagine that working out. Years later I got a chance to do some work in acoustics and enjoyed it, but I’m glad I made a career of math. Applied math has given me the chance to work in a lot of different areas—to play in everyone else’s back yard, as John Tukey put it—and I believe it suits me better than music or acoustics would have.

Francis Su has created an iPhone app MathFeed that gives a stream of new math content: blog posts, book reviews, popular journal articles, and tweets. You can also get the same content via Twitter. Check it out!

Here are a few things I’ve had to figure out in the process of setting up Emacs on a Mac, in particular with getting shell-mode to work as I’d like. Maybe this will save someone else some time if they want to do the same.

I’ve used a Mac occasionally since the days of the beige toasters, but I never owned one until recently. I’ve said for years that I’d buy a Mac as soon as I have a justification, and I recently started a project that needs a Mac.

I’d heard that Emacs was hard to set up on Mac, but that has not been my experience. I’m running Emacs 25.1 on macOS 10.12.1. Maybe there were problems with earlier versions of Emacs or OS X that I skipped. Or maybe there are quirks I haven’t run into yet. So far my only difficulties have been related to running a shell inside Emacs.

Path differences

The first problem I ran into is that my path is not the same inside shell-mode as in a terminal window. A little searching showed a lot of discussion of this problem but no good solutions. My current solution is to run source .bash_profile from my bash shell inside Emacs to manually force it to read the configuration file. There’s probably a way to avoid this, and if you know how please tell me, but this works OK for now.

Manually sourcing the .bash_profile file works for bash but doesn’t work for Eshell. I doubt I’ll have much use for Eshell, however. It’s more useful on Windows when you want a Unix-like shell inside Emacs.

Update: Dan Schmidt pointed out in the comments that Emacs reads .bashrc rather than .bash_profile. It seems that Mac doesn’t read .bashrc at all, at least not if it can find a .bash_profile file. I created a .bashrc file that sources .bash_profile and that fixed my problem, though it did not fix the problem with Eshell or the path problem below.

Scrolling command history

The second problem I had was that Control-up arrow does not scroll through shell history because that key combination has special meaning to the operating system, bringing up Mission Control. Quite a surprise when you expect to scroll through previous commands but instead your entire screen changes.

I got around this by putting the following code in my Emacs config file and using Alt-up and Alt-down instead of Control-up and Control-down to scroll shell history. (I’m using my beloved Microsoft Natural keyboard, so I have an Alt key.)

Another path problem

The last problem I had was running the Clojure REPL inside Emacs. When I ran lein repl from bash inside Emacs I got an error saying command not found. Apparently running source .bash_profile didn’t give me entirely the same path in Emacs as in a terminal. I was able to fix the following to my Emacs config file.

(add-to-list 'exec-path "/usr/local/bin")

This works, though there are a couple things I don’t understand. First, I don’t understand why /usr/local/bin was missing from my path inside Emacs. Second, I don’t understand why adding the path customizations from my .bash_profile to exec-path doesn’t work. Until I need to understand this, I’m willing to let it remain a mystery.

Update: LaTeX path problem

After fixing the problems mentioned in the original post, I ran into another problem. Trying to run LaTeX on a file failed saying that pdflatex couldn’t be found. Adding the path to pdflatex to the exec-path didn’t work. But the following code from the TeX Stack Exchange did work:

Should I get a PhD?

Do you have any advice for people going out on their own?

Shortly after I went out on my own, I wrote this post responding to questions people had about my particular situation. My answers there remain valid, except one. I said that planned to do anything I can do well that also pays well. That was true at the time, but I’ve gotten a little more selective since then.

Can you say more about the work you’ve been doing?

Nearly all the work I do is covered under NDA (non-disclosure agreement). Occasionally a project will be public, such as the white paper I wrote for Hitachi Data Systems comparing replication and erasure coding. But usually a project is confidential, though I hope to be able to say more about some projects after they come to market.

Miscellaneous other questions

I wrote an FAQ post of sorts a few years ago. Here are the questions from that post that people still ask fairly often.

For many years, rivals University of Texas and Texas A&M University played each other in football on Thanksgiving. In 1999, the game fell one week after the collapse of the Aggie Bonfire killed 12 A&M students and injured 27.

The University of Texas band’s half time show that year was a beautiful tribute to the fallen A&M students.

Yesterday I got a review copy of The Power of Networks. There’s some math inside, but not much, and what’s there is elementary.

I’d say it’s not a book about networks per se but a collection of topics associated with networks: cell phone protocols, search engines, auctions, recommendation engines, etc. It would be a good introduction for non-technical people who are curious about how these things work. More technically inclined folks probably already know much of what’s here.

I don’t know how people take it, but here’s what I meant by it. Sometimes you can find a smarter way to work, and if you can, I assume you’re doing that. Don’t drive nails with your shoe if you can find a hammer. But ultimately the way to get things done is hard work. You might see some marginal increase in productivity from using some app or another, but there’s nothing that’s going to magically make you 10x more productive without extra effort.

Many people have replied on Twitter “I think you mean ‘work smart.'” At some point “work smarter” wasn’t a cliché, but now it is. The problem of our time isn’t people brute-forcing their way with hard, thoughtless work. We’re more likely to wish for a silver bullet. We’re gnostics.

Smart work is a kind of hard work. It may take less physical work but more mental work. Or less mental work and more emotional work. It’s hard work to try to find a new perspective and take risks.

One last thought: hard work is not necessarily long work. Sometimes it is, but often not. Hard creative work requires bursts of mental or emotional effort that cannot be sustained for long.

We are very good at building complex software systems that work 95% of the time. But we do not know how to build complex software systems that are ultra-reliably safe (i.e. P_f < 10^-7/hour).

Emphasis added.

Developing medium-reliability and high-reliability software are almost entirely different professions. Using typical software development procedures on systems that must be ultra-reliable would invite disaster. But using extremely cautious development methods on systems that can afford to fail relatively often would be an economic disaster.

I used to wonder why people “convert” from one technology to another. For example, someone might convert from Windows to Linux and put a penguin sticker on their car. Or they might move from Java to Ruby and feel obligated to talk about how terrible Java is. They don’t add a new technology, they switch from one to the other. In the words of Stephen Sondheim, “Is it always or, and never and?”

Rivalries seem sillier to outsiders the more similar the two options are. And yet this makes sense. I’ve forgotten the psychological term for this, but it has a name: Similar things compete for space in your brain more than things that are dissimilar. For example, studying French can make it harder to spell English words. (Does literature have two t’s in French and one in English or is it the other way around?) But studying Chinese doesn’t impair English orthography.

It’s been said that academic politics are so vicious because the stakes are so small [1]. Similarly, there are fierce technological loyalties because the differences with competing technologies are so small, small enough to cause confusion. My favorite example: I can’t keep straight which languages use else if, elif, elseif, … in branching.

If you have to learn two similar technologies, it may be easier to devote yourself exclusively to one, then to the other, then use both and learn to keep them straight.

How does this function compare to its limit, exp(x)? We might want to know because it’s often useful to have polynomial upper or lower bounds on exp(x).

For x > 0 it’s clear that exp(x) is larger than Tn(x) since the discarded terms in the power series for exp(x) are all positive.

The case of x < 0 is more interesting. There exp(x) > Tn(x) if n is odd and exp(x) < Tn(x) if n is even.

Define fn(x) = exp(x) – Tn(x). If x > 0 then fn(x) > 0.

We want to show that if x < 0 then fn(x) > 0 for odd n and fn(x) < 0 for even n.

For n = 1, note that f1 and its derivative are both zero at 0. Now suppose f1 is zero at some point a < 0. Then by Rolle’s theorem, there is some point b with a <b < 0 where the derivative of f1 is 0. Since the derivative of f1 is also zero at 0, there must be some point c with b < c < 0 where the second derivative of f1 is 0, again by Rolle’s theorem. But the second derivative of f1 is exp(x) which is never 0. So our assumption f1(a) = 0 leads to a contradiction.

Now f1(0) = 0 and f1(x) ≠ 0 for x < 0. So f1(x) must be always positive or always negative. Which is it? For negative x, exp(x) is bounded and so

f1(x) = exp(x) – 1 – x

is eventually dominated by the –x term, which is positive since x is negative.

The proof for n = 2 is similar. If f2(x) is zero at some point a < 0, then we can use Rolle’s theorem to find a point b < 0 where the derivative of f2 is zero, and a point c < 0 where the second derivative is zero, and a point d < 0 where the third derivative is zero. But the third derivative of f2 is exp(x) which is never zero.

As before the contradiction shows f2(x) ≠ 0 for x < 0. So is f2(x) always positive or always negative? This time we have

f2(x) = exp(x) – 1 – x – x2/2

which is eventually dominated by the –x2 term, which is negative.

For general n, we assume fn is zero for some point x < 0 and apply Rolle’s theorem n+1 times to reach the contradiction that exp(x) is zero somewhere. This tells us that fn(x) is never zero for negative x. We then look at the dominant term –xn to argue that fn is positive or negative depending on whether n is odd or even.

Another way to show the sign of fn(x) for negative x would be to apply the alternating series theorem to x = -1.

In geometry, you’d say that if a square has side x, then it has area x2.

In calculus, you’d say more. First you’d say that if a square has side nearx, then it has area nearx2. That is, area is a continuous function of the length of a side. As the length of the side changes, there’s never an abrupt jump in area. Next you could be more specific and say that a small change Δx to a side of length x corresponds to approximately a change of 2x Δx in the area.

In probability, you ask what is the area of a square like if you pick the length of its side at random. If you pick the length of the side from a distribution with mean μ, does the distribution of the area have mean μ2? No, but if the probability distribution on side length is tightly concentrated around μ, then the distribution on area will be concentrated near μ2. And you can approximate just how near the area is to μ2 using the delta method, analogous to the calculus discussion above.

If the distribution on side lengths is not particularly concentrated, finding the distribution on the area is more interesting. It will depend on the specific distribution on side length, and the mean area might not be particularly close to the square of the mean side length. The function to compute area is trivial, and yet the question of what happens when you stick a random variable into that function is not trivial. Random variables behave as you might expect when you stick them into linear functions, but offer surprises when you stick them into nonlinear functions.

Suppose you pick the length of the side of a square uniformly from the interval [0, 1]. Then the average side is 1/2, and so you might expect the average area to be 1/4. But the expected area is actually 1/3. You could see this a couple ways, analytically and empirically.

First an analytical derivation. If X has a uniform [0, 1] distribution and Z = X2, then the CDF of Z is

Prob(Z ≤ z) = Prob(X ≤ √z) = √ z.

and so the PDF for Z, the derivative of the CDF, is -1/2√z. From there you can compute the expected value by integrating z times the PDF.

You could check your calculations by seeing whether simulation gives you similar results. Here’s a little Python code to do that.

Now lets look at an exponential distribution on side length with mean 1. Then a calculation similar to the one above shows that the expected value of the product is 2. You can also check this with simulation. This time we’ll be a little fancier and let SciPy generate our random values for us.

print( sum(expon.rvs(size=N)**2)/N )

When I ran this, I got 1.99934, close to the expected value of 2.

You’ll notice that in both examples, the expected value of the area is more than the square of the expected value of the side. This is not a coincidence but consequence of Jensen’s inequality. Squaring is a convex function, so the expected value of the square is larger than the square of the expected value for any random variable.

The hazard function of a probability distribution is the instantaneous probability density of an event given that it hasn’t happened yet. This works out to be the ratio of the PDF (probability density function) to the CCDF (complementary cumulative density function).

For the standard normal distribution, the hazard function is

and has a surprisingly simple continued fraction representation:

Aside from being an elegant curiosity, this gives an efficient way to compute the hazard function for large x. (It’s valid for any positive x, but most efficient for large x.)

Sam Northshield [1] came up with the following clever proof that there are infinitely many primes.

Suppose there are only finitely many primes and let P be their product. Then

The original publication gives the calculation above with no explanation. Here’s a little commentary to explain the calculation.

Since prime numbers are greater than 1, sin(π/p) is positive for every prime. And a finite product of positive terms is positive. (An infinite product of positive terms could converge to zero.)

Since p is a factor of P, the arguments of sine in the second product differ from those in the first product by an integer multiple of 2π, so the corresponding terms in the two products are the same.

There must be some p that divides 1 + 2P, and that value of p contributes the sine of an integer multiple of π to the product, i.e. a zero. Since one of the terms in the product is zero, the product is zero. And since zero is not greater than zero, we have a contradiction.