After having understood the inclusion-exclusion principle by working out a few cases and examples in my earlier post, we are now ready to prove the general version of the principle.

As with many things in mathematics, there is a “normal” way of doing proofs and there is the “Polya/Szego” way of doing proofs. (Ok, ok, I admit it’s just a bias I have.) I will stick to the latter. Ok, let’s state the principle first and follow it up with its proof in a step-by-step fashion.

Inclusion-Exclusion Principle: Let there be a set of $N$ objects. Suppose out of these objects, there are $N_a$ objects of type $a$, $N_b$ objects of type $b$, $N_c$ objects of type $c$, ..., and $N_l$ objects of type $l$. Also, suppose $N_{ab}, N_{ac}, \dots, N_{abc}, \dots, N_{abc\cdots l}$ denote the number of objects that are simultaneously of type $a$ AND $b$, $a$ AND $c$, ..., $a$ AND $b$ AND $c$, ..., and $a$ AND $b$ AND $\cdots$ AND $l$, respectively. Then, the number of objects that are NOT of type $a, b, c, \dots, l$ is

$N - (N_a + N_b + \cdots + N_l) + (N_{ab} + N_{ac} + \cdots) - (N_{abc} + \cdots) + \cdots + (-1)^k N_{abc\cdots l}$,

where $k$ is the number of types.

— Notation —

Let $S$ (finite or infinite) be the universe of discourse. Suppose $A \subseteq S$. Then, the characteristic function $\chi_A$ of $A$ is defined as

$\chi_A(x) = 1$ if $x \in A$,

and $\chi_A(x) = 0$ otherwise, for all $x \in S$.

For example, suppose $S = \mathbb{N} = \{1, 2, 3, \dots\}$. Let $A = \{2, 4, 6, \dots\}$ (i.e. the even integers.) Then, $\chi_A(1) = 0$, $\chi_A(2) = 1$, $\chi_A(3) = 0$, and so on.

Note: $\chi_S(x) = 1$ and $\chi_{\emptyset}(x) = 0$ for all $x \in S$. Here, $\emptyset$ denotes the empty set. Due to this, we will use $1$ and $\chi_S$, and $0$ and $\chi_{\emptyset}$, interchangeably from now.
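As a quick illustration (my own, not part of the original argument), here is a minimal Python sketch of the characteristic function, with the universe truncated to the finite slice $\{1, \dots, 10\}$ so that everything is computable:

```python
def chi(A):
    """Characteristic function of the set A: x -> 1 if x is in A, else 0."""
    return lambda x: 1 if x in A else 0

S = range(1, 11)                      # a finite slice of the naturals
A = {x for x in S if x % 2 == 0}      # the even integers in S

chi_A = chi(A)
values = [chi_A(x) for x in S]
print(values)  # [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
```

Note also that summing the values recovers $|A|$, which is precisely the counting identity recorded in the note after Lemma 1.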

—

Lemma 1: $A \subseteq B$ iff $\chi_A(x) \le \chi_B(x)$ for all $x \in S$.

Proof: We first prove the “only if” part. So, suppose $A \subseteq B$. Let $x \in S$. If $x \in A$, then $\chi_A(x) = 1$. But, we also have $x \in B$, in which case, $\chi_B(x) = 1$. If, on the other hand, $x \notin A$, then $\chi_A(x) = 0 \le \chi_B(x)$. Hence, in either case, $\chi_A(x) \le \chi_B(x)$ for all $x \in S$.

We now prove the “if” part. So, suppose $\chi_A(x) \le \chi_B(x)$ for all $x \in S$. Let $x \in A$. Then, $\chi_A(x) = 1$, which forces $\chi_B(x) = 1$, which implies $x \in B$. Hence, $A \subseteq B$, and this completes our proof.

Note: If $A$ is finite, then $|A| = \sum_{x \in S} \chi_A(x)$.

Lemma 2: $\chi_{A \cap B}(x) = \chi_A(x)\,\chi_B(x)$

and

$\chi_{A^c}(x) = 1 - \chi_A(x)$

for all $x \in S$. (Here, $A^c$ is the complement of $A$.)

Proof: The proof for each case is elementary.

Lemma 3: Suppose $n \in \mathbb{N}$. If $A = A_1 \cup A_2 \cup \cdots \cup A_n$, then the characteristic function of $A$ is $1 - (1 - \chi_{A_1})(1 - \chi_{A_2})\cdots(1 - \chi_{A_n})$, i.e.

$\chi_A(x) = 1 - \prod_{i=1}^{n}\left(1 - \chi_{A_i}(x)\right)$

for all $x \in S$.

Proof: Note the above is an extension of the second part of Lemma 2. A simple induction on the number of subsets $n$ proves the result.
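Lemma 3 is also easy to test numerically. The following Python sketch (mine, purely a sanity check) verifies the product formula on a few randomly chosen subsets of a small universe:

```python
import random

def chi(A):
    """Characteristic function of the set A."""
    return lambda x: 1 if x in A else 0

random.seed(0)
S = range(20)
# three random subsets playing the role of A_1, A_2, A_3
subsets = [set(random.sample(range(20), 7)) for _ in range(3)]
A = set().union(*subsets)             # A = A_1 ∪ A_2 ∪ A_3

for x in S:
    prod = 1
    for A_i in subsets:
        prod *= 1 - chi(A_i)(x)       # ∏ (1 - χ_{A_i}(x))
    assert chi(A)(x) == 1 - prod      # the identity of Lemma 3
print("Lemma 3 holds on all of S")
```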

— Proof of the inclusion-exclusion principle —

Now, suppose $A_1, A_2, \dots, A_k$ are the subsets of objects of type $a, b, \dots, l$, respectively. Observe that the set of objects that are NOT of type $a, b, \dots, l$ is simply the region outside of all the oval regions! (Look at the previous post to see what this means.) And this region is simply the subset $(A_1 \cup A_2 \cup \cdots \cup A_k)^c$. Using the second part of Lemma 2, we see that the characteristic function of this outside region is $1 - \chi_{A_1 \cup A_2 \cup \cdots \cup A_k}$, which from Lemma 3 is the same as

$\prod_{i=1}^{k}\left(1 - \chi_{A_i}(x)\right)$.

Expand the last expression to get

$1 - \sum_{i} \chi_{A_i}(x) + \sum_{i < j} \chi_{A_i}(x)\,\chi_{A_j}(x) - \cdots + (-1)^k \chi_{A_1}(x)\,\chi_{A_2}(x)\cdots\chi_{A_k}(x)$.

Now, sum over all the elements of $S$ and use the first part of Lemma 2 (together with the note after Lemma 1) to obtain the desired result. And this completes our proof.
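To see the whole argument land on concrete numbers, here is a short Python check (my own) that compares the inclusion-exclusion count against a direct brute-force count. The types are, hypothetically, divisibility by 2, 3 and 5 among the integers 1 through 60:

```python
from itertools import combinations

S = set(range(1, 61))
# objects of type a, b, c: divisibility by 2, 3, 5 respectively
types = [{x for x in S if x % d == 0} for d in (2, 3, 5)]

# inclusion-exclusion: N - Σ|A_i| + Σ|A_i ∩ A_j| - |A_1 ∩ A_2 ∩ A_3|
count = len(S)
for r in range(1, len(types) + 1):
    for combo in combinations(types, r):
        count += (-1) ** r * len(set.intersection(*combo))

direct = len(S - set().union(*types))  # direct count of "no type" objects
print(count, direct)  # both are 16
assert count == direct
```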

[Update: Thanks to Andreas for pointing out that I may have been a little sloppy in stating the maximum modulus principle! The version below is an updated and correct one. Also, Andreas pointed out an excellent post on “amplification, arbitrage and the tensor power trick” (by Terry Tao) in which the “tricks” discussed are indeed very useful and far more powerful generalizations of the “method” of E. Landau discussed in this post. The Landau method mentioned here, it seems, is just one of the many examples of the “tensor power trick”.]

The maximum modulus principle states that if $f: D \to \mathbb{C}$ (where $D \subseteq \mathbb{C}$ is a bounded domain) is a holomorphic function that extends continuously to the compact set $\bar{D}$, then $|f|$ attains its maximal value on $\bar{D}$ somewhere on the boundary $\partial D$. (If $|f|$ attains its maximal value anywhere in the interior of $D$, then $f$ is a constant. However, we will not bother about this part of the theorem in this post.)

Problems and Theorems in Analysis II, by Polya and Szego, provides a short proof of the “inequality part” of the principle. The proof by E. Landau employs Cauchy’s integral formula, and the technique is very interesting and useful indeed. The proof is as follows.

From Cauchy’s integral formula, we have

$f(z) = \dfrac{1}{2\pi i} \oint_{C} \dfrac{f(\zeta)}{\zeta - z}\, d\zeta$,

for every $z$ in the interior of the closed curve $C$.

Now, suppose $|f(\zeta)| \le M$ for all $\zeta$ on $C$. Then,

$|f(z)| \le \dfrac{1}{2\pi} \oint_{C} \dfrac{|f(\zeta)|}{|\zeta - z|}\, |d\zeta| \le KM$,

where the constant $K$ depends only on the curve $C$ and on the position of $z$, and is independent of the specific choice of $f$. Now, this rough estimate can be significantly improved by applying the same argument to $f^n$, where $n \in \mathbb{N}$, to obtain

$|f(z)|^n \le KM^n$, or $|f(z)| \le K^{1/n} M$.

By allowing $n$ to go to infinity, we get $|f(z)| \le M$, which is what we set out to prove.

Polya/Szego mention that the proof shows that a rough estimate may sometimes be transformed into a sharper estimate by making appropriate use of the generality for which the original estimate is valid.
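The sharpening is easy to see numerically. With hypothetical values $K = 10$ and $M = 2$ (any $K > 1$ behaves the same way), the bound $K^{1/n} M$ obtained from $f^n$ collapses down to $M$ as $n$ grows:

```python
# The crude bound |f(z)| <= K*M (with K > 1) applied to f^n
# yields |f(z)| <= K**(1/n) * M, which tends to M as n -> infinity.
K = 10.0   # hypothetical constant depending on the curve and on z
M = 2.0    # hypothetical bound for |f| on the boundary curve

bounds = [K ** (1.0 / n) * M for n in (1, 10, 100, 1000)]
print(bounds)  # strictly decreasing toward M = 2.0
assert all(b1 > b2 for b1, b2 in zip(bounds, bounds[1:]))
assert abs(bounds[-1] - M) < 0.01
```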

I will follow this up with, maybe, a couple of problems/solutions to demonstrate the effectiveness of this useful technique.

The Harvard College Mathematics Review (HCMR) published its inaugural issue in April 2007, and the second issue came out almost a fortnight ago. The level of exposition in the articles is extremely high, and the articles are a pleasure to read even if some of the material will be beyond many readers. I would recommend anyone to visit their website and browse their issues. For problem-solvers, the problem section in each issue is a delight!

Anyway, the first issue’s problem section contained a somewhat hard inequality problem (proposed by Shrenik Shah), which I was able to solve and for which my solution was acknowledged in the problem section of the second issue. The HCMR carried Greg Price’s solution to that particular problem, and I must say his solution is somewhat more “natural” and “intuitive” than the one I gave.

Well, I want to present my solution here but in a more detailed fashion. In particular, I want to develop the familiar AM-GM inequality up to a point where the solution to the aforesaid problem turns out to be somewhat “trivial.” The buildup to the final solution itself is non-trivial, however. This post relies on the material presented in the classic book Problems and Theorems in Analysis I by George Pólya and Gabor Szegö. Again, I strongly recommend all problem-solvers to study this book. Anyway, we will now state the problem and discuss its solution. (I have slightly changed the formulation of the problem in order to avoid any possible copyright issues.)

Problem: For all distinct positive reals $a$ and $b$, show that

$\left(\dfrac{b^b}{a^a}\right)^{\frac{1}{b-a}} < e \cdot \dfrac{a+b}{2}$.

First, let us discuss some facts.

1. AM-GM-HM inequality: If $a_1, a_2, \dots, a_n$ are positive real numbers, then

$\dfrac{n}{\frac{1}{a_1} + \frac{1}{a_2} + \cdots + \frac{1}{a_n}} \le \sqrt[n]{a_1 a_2 \cdots a_n} \le \dfrac{a_1 + a_2 + \cdots + a_n}{n}$,

with equality if and only if $a_1 = a_2 = \cdots = a_n$.

Proof: For a hint on proving the above using mathematical induction, read this. However, we will make use of Jensen’s inequality to prove the above result. We won’t prove Jensen’s inequality here, though it too can be proved using induction.

Jensen’s inequality: Let $f$ be a continuous convex function on an interval $I \subseteq \mathbb{R}$. Let $p_1, p_2, \dots, p_n$ be positive reals such that $p_1 + p_2 + \cdots + p_n = 1$. Then for all $x_1, x_2, \dots, x_n \in I$, we have

$f\left(p_1 x_1 + p_2 x_2 + \cdots + p_n x_n\right) \le p_1 f(x_1) + p_2 f(x_2) + \cdots + p_n f(x_n)$,

with equality (when $f$ is strictly convex) if and only if $x_1 = x_2 = \cdots = x_n$.

Now, in order to prove (1), consider the function $f: (0, \infty) \to \mathbb{R}$ defined by $f(x) = -\ln x$. Indeed, $f$ is continuous on the stated domain. Also, $f''(x) = 1/x^2 > 0$, which implies $f$ is convex on $(0, \infty)$. Therefore, using Jensen’s inequality and setting $p_1 = p_2 = \cdots = p_n = 1/n$, for positive reals $a_1, a_2, \dots, a_n$, we have

$-\ln\left(\dfrac{a_1 + a_2 + \cdots + a_n}{n}\right) \le -\dfrac{\ln a_1 + \ln a_2 + \cdots + \ln a_n}{n} = -\ln \sqrt[n]{a_1 a_2 \cdots a_n}$,

and hence $\sqrt[n]{a_1 a_2 \cdots a_n} \le \dfrac{a_1 + a_2 + \cdots + a_n}{n}$ (since $e^x$ is monotonically increasing on $\mathbb{R}$.)

This proves the second part of the inequality in (1). Now, replace each $a_i$ with $1/a_i$ for $i = 1, 2, \dots, n$ to derive the first part of the inequality. And, this proves (1).
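A quick numerical check of (1), in Python (my own sanity check, not part of the proof); the sample values are arbitrary and not all equal, so the inequalities are strict:

```python
import math

a = [1.0, 2.0, 3.0, 4.0]   # arbitrary positive reals, not all equal
n = len(a)

am = sum(a) / n                        # arithmetic mean
gm = math.prod(a) ** (1.0 / n)         # geometric mean
hm = n / sum(1.0 / x for x in a)       # harmonic mean

assert hm < gm < am                    # HM < GM < AM, strictly
print(hm, gm, am)
```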

We can generalize this further. Indeed, for any positive reals $a_1, \dots, a_n$ and positive reals $p_1, \dots, p_n$, replacing each $p_i$ with $p_i/(p_1 + p_2 + \cdots + p_n)$ for $i = 1, 2, \dots, n$, and using Jensen’s inequality for $f(x) = -\ln x$ once again, we obtain the following generalized inequality, which we shall call by a different name:

2. Generalized Cauchy Inequality (non-integral version): For any positive reals $a_1, \dots, a_n$ and positive reals $p_1, \dots, p_n$, we have

$\left(a_1^{p_1} a_2^{p_2} \cdots a_n^{p_n}\right)^{\frac{1}{p_1 + p_2 + \cdots + p_n}} \le \dfrac{p_1 a_1 + p_2 a_2 + \cdots + p_n a_n}{p_1 + p_2 + \cdots + p_n}$.
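A quick numerical check of the weighted version (my own; the numbers are arbitrary):

```python
import math

a = [2.0, 3.0, 5.0]     # arbitrary positive reals, not all equal
p = [1.0, 2.0, 4.0]     # arbitrary positive weights
P = sum(p)

# weighted geometric mean vs. weighted arithmetic mean
lhs = math.prod(ai ** pi for ai, pi in zip(a, p)) ** (1.0 / P)
rhs = sum(pi * ai for pi, ai in zip(p, a)) / P
assert lhs < rhs        # strict, since the a_i are not all equal
print(lhs, rhs)
```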

Now, the remarkable thing is that we can generalize the above inequality even further to obtain the following integral version of the inequality.

3. Generalized Cauchy Inequality (integral version): Let $f$ and $p$ be continuous and positive functions on the interval $[a, b]$; also suppose $f$ is not a constant function. Then we have

$\exp\left(\dfrac{1}{P}\int_a^b p(x) \ln f(x)\, dx\right) < \dfrac{1}{P}\int_a^b p(x) f(x)\, dx$,

where $P = \int_a^b p(x)\, dx$.
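Before using it, here is a quick quadrature check of the integral version (a sketch of mine; the weight $p(x) = 1/x$ is an arbitrary choice):

```python
import math

def integrate(h, a, b, n=100000):
    """Midpoint-rule quadrature; accurate enough for a sanity check."""
    dx = (b - a) / n
    return sum(h(a + (i + 0.5) * dx) for i in range(n)) * dx

a, b = 1.0, 3.0
f = lambda x: x           # positive and non-constant on [a, b]
p = lambda x: 1.0 / x     # an arbitrary positive weight

P = integrate(p, a, b)
lhs = math.exp(integrate(lambda x: p(x) * math.log(f(x)), a, b) / P)
rhs = integrate(lambda x: p(x) * f(x), a, b) / P
assert lhs < rhs
print(lhs, rhs)
```

With these choices the two sides work out to $\sqrt{3} \approx 1.732$ and $2/\ln 3 \approx 1.820$, so the strict inequality is visible directly.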

Okay, now we are finally ready to solve our original problem. First, without any loss of generality, we can assume $a < b$. Now, we shall use the above version of the Generalized Cauchy Inequality, and so we set $f(x) = x$ and $p(x) = 1$. Here $f$ and $p$ are both positive functions on the interval $[a, b]$ (recall $a > 0$). Also, note that $f$ is not a constant function.

Thus, we have $P = \int_a^b p(x)\, dx = b - a$.

Also, $\int_a^b p(x) f(x)\, dx = \int_a^b x\, dx = \dfrac{b^2 - a^2}{2}$.

And, $\int_a^b p(x) \ln f(x)\, dx = \int_a^b \ln x\, dx = b \ln b - a \ln a - (b - a)$.

Combining these three computations with (3), we get

$\exp\left(\dfrac{b \ln b - a \ln a}{b - a} - 1\right) < \dfrac{1}{b - a} \cdot \dfrac{b^2 - a^2}{2} = \dfrac{a + b}{2}$,

which upon multiplying both sides by $e$ becomes $\left(\dfrac{b^b}{a^a}\right)^{\frac{1}{b-a}} < e \cdot \dfrac{a + b}{2}$, and we immediately obtain the desired inequality. And, we are done.
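Finally, a numerical sanity check of the inequality we just proved (my own sketch; the sample pairs are arbitrary):

```python
import math

def identric_vs_arithmetic(a, b):
    """Return (lhs, rhs) for the inequality (b^b/a^a)^(1/(b-a)) < e(a+b)/2,
    computing the left side in log space for numerical stability."""
    lhs = math.exp((b * math.log(b) - a * math.log(a)) / (b - a))
    rhs = math.e * (a + b) / 2.0
    return lhs, rhs

pairs = [(1.0, 2.0), (0.5, 7.0), (3.0, 3.1), (0.01, 100.0)]
for a, b in pairs:
    lhs, rhs = identric_vs_arithmetic(a, b)
    assert lhs < rhs
print("inequality verified on all sample pairs")
```

For $(a, b) = (1, 2)$ the left side is exactly $4$ and the right side is $3e/2 \approx 4.077$, so the inequality is sharp-looking but strict.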