Right at least once with high probability


As is quite evident from the sparse content of this blog, I have a tendency to jump around from subject to subject. As of late I have been making a concerted effort to become a better coder, an effort that takes literally all of my energy. It is disappointing to let all the ideas I had for this site fall by the wayside, but perhaps new ideas can take their places. I ran across a fun problem while practicing the other day, and although it is an easy problem for the experienced coder I was happy to solve it, so here it is.

The problem is taken directly from Codeforces and can be found here, and if you click on the link to the “tutorial” on the lower right side of the page you can find their solution. I didn’t read it too carefully but it looks like more or less the same thing. But you might as well read mine, right?

If you didn’t click the link, here’s the problem statement (sans their creative background story).

Exercise 1 Given two integers $n$ and $k$ with $n, k \ge 1$. How many sequences of $k$ integers $b_1, b_2, \dots, b_k$ are there such that $1 \le b_i \le n$ and $b_i \mid b_{i+1}$ for $1 \le i \le k-1$?

I stared at this problem for a while with no helpful ideas. I knew there should be a recurrence of some sort, and the idea that removing any element from a sequence of length $k$ generates a sequence of length $k-1$ sort of supported this idea. Ultimately though I realized that none of this was making use of the key property of the sequences, the divisor requirement. This requires that all elements of the sequence be divisors of the last element, $b_k$. This made me think of the sequence as a process of accumulating prime factors of $b_k$, and since the sequences are easily partitioned by their last element, I figured this should lead somewhere.

Slightly more formally, let $f$ be the number of valid sequences in the question and let $g(k, m)$ be the number of valid sequences of length $k$ where $b_k = m$ (end with $m$). Clearly $f = \sum_{m=1}^{n} g(k, m)$, so if we can compute $g$ in reasonable time we should be good. I actually am terrible at estimating the time it will take code to run; it is something I definitely need to learn. For now we will steer clear of the discussion of complexity with the confidence that the algorithm “looks probably good enough.”

So now our challenge is to compute $g(k, m)$, and as we mentioned we are going to need the prime factorization of $m$. Actually it turns out we only need the exponents of the distinct prime factors, but I realized this far too late and didn’t really feel like modifying the code. Computing prime factorizations is probably a well-studied topic, but for our purposes the simple process of dividing out all prime factors works fine. Here’s the code.
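A sketch of that trial-division process (the function name is an arbitrary choice of mine; it returns each prime with its exponent):

```python
def prime_factorize(n):
    """Trial division: divide out each prime factor completely.

    Returns a dict mapping each prime factor of n to its exponent.
    """
    factors = {}
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors[d] = factors.get(d, 0) + 1
            n //= d
        d += 1
    if n > 1:
        # Whatever remains is a single prime factor larger than sqrt(n).
        factors[n] = factors.get(n, 0) + 1
    return factors
```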

So now we’ve got our prime factorization, but how do we use it? As I mentioned before I like to think of the sequence as the process of accumulating prime factors, so in the “gap” between each element of the sequence we throw in some new prime factors. We also create an artificial $1$ at the beginning of the sequence since we can start from any divisor. Since numbers can repeat we aren’t required to put any prime factors in a given gap, and since the last number has to get to $m$ we have to use all the prime factors eventually. Splitting them up by primes, say a prime $p$ with exponent $a$, we can also just do the assignments of each prime independently. There are $k$ slots, one preceding each element of the sequence, we are permitted to have empty slots, and we must eventually assign all $a$ copies of $p$. We recognize this as the number of nonnegative solutions to $x_1 + x_2 + \cdots + x_k = a$, which is just $\binom{a + k - 1}{k - 1}$. Since the distinct prime factors can be dealt with independently this gives us our expression:

Proposition 1 If the prime factorization of $m$ is $p_1^{a_1} p_2^{a_2} \cdots p_r^{a_r}$, then $g(k, m) = \prod_{i=1}^{r} \binom{a_i + k - 1}{k - 1}$.
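As a quick sanity check on the stars-and-bars step, brute force agrees with the binomial count on small cases (a throwaway sketch; the helper name is mine):

```python
from math import comb
from itertools import product

def count_solutions(a, k):
    """Brute-force count of nonnegative integer solutions to
    x_1 + x_2 + ... + x_k = a, to check against C(a + k - 1, k - 1)."""
    return sum(1 for xs in product(range(a + 1), repeat=k) if sum(xs) == a)
```

For example, `count_solutions(3, 4)` and `comb(3 + 4 - 1, 4 - 1)` both come out to 20.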

The corresponding code shouldn’t be interesting, but apparently Python’s math library doesn’t feel like including a function to compute $\binom{n}{k}$. Writing one isn’t hard but I had to show off my impressive Python knowledge. Unfortunately doing so exposed the fact that I don’t know how to call the basic operators as functions, so it became slightly less than impressive.
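Something like the following sketch does the job, with functools.reduce and operator.mul standing in for calling the operators as functions (newer Pythons do ship math.comb, for what it’s worth):

```python
from functools import reduce
from operator import mul

def binom(n, k):
    """Compute C(n, k) as a ratio of products, calling * as a function."""
    if k < 0 or k > n:
        return 0
    k = min(k, n - k)  # symmetry keeps the products short
    numerator = reduce(mul, range(n - k + 1, n + 1), 1)
    denominator = reduce(mul, range(1, k + 1), 1)
    return numerator // denominator
```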

The code for computing the final answer isn’t interesting at all, but I suppose it serves as a reasonable summary. Again it was hardly necessary to record the actual prime factors since we only need their exponents, but Python handled it nicely enough for me to not really care.
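For completeness, here is a hedged sketch of the whole computation (the function name and inlined trial division are my own choices, and this ignores the judge’s exact I/O format and any required modulus):

```python
from math import comb

def count_sequences(n, k):
    """Count length-k sequences of integers in [1, n] where each element
    divides the next: sum over the last element m, and for each prime
    exponent a in m's factorization multiply in C(a + k - 1, k - 1)."""
    total = 0
    for m in range(1, n + 1):
        ways = 1
        x, d = m, 2
        while d * d <= x:
            a = 0
            while x % d == 0:
                a += 1
                x //= d
            if a:
                ways *= comb(a + k - 1, k - 1)
            d += 1
        if x > 1:
            # One leftover prime with exponent 1.
            ways *= comb(1 + k - 1, k - 1)
        total += ways
    return total
```

For instance, with $n = 3$ and $k = 2$ the five sequences are $(1,1), (1,2), (2,2), (1,3), (3,3)$, and the function agrees.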

All in all there are a ton of improvements I could probably make. My binom function is mostly a joke, I could probably cache prime factorizations, etc. I was pretty happy with the solution though and it ran in well under the time limit on Codeforces, so further optimization seemed a little unnecessary. I hope I can solve some more interesting problems as I improve as a coder – I say my goal is to improve coding but in reality my ability to design algorithms also lags far behind where I would like it. At the risk of being terribly mistaken once again, expect more of these types of posts in the future.

My pursuit of fake mathematician status is on hold for a bit as I find myself getting deeper into statistical learning and optimization. I’m in an online class that talks about statistical learning, taught by Trevor Hastie and Rob Tibshirani. I don’t think I’m really the target audience but I do think it helps me read their book, The Elements of Statistical Learning by Hastie, Tibshirani, and Friedman [1]. I really like the material but sometimes I find it very difficult, so their high-level non-technical explanations help a lot. In one of their lectures they said that a particular proof was fun, and so of course I had to go look into it. It’s actually an exercise in their book, and I liked it a lot so I figured I’d write it up. As for my rule about not posting solutions to exercises in books, I have fewer reservations in this case because a solution manual is available here. Actually, before we look at the exercise I should give some relevant background and definitions (for my benefit as much as anyone else’s).

We’re concerned with the problem of fitting a function to data. This is of course really vague and there are tons of approaches, but the approach we’re going to use is called polynomial splines, more specifically cubic splines. This basically involves splitting the feature space into a few different regions and fitting a polynomial (cubic) to the data in each region. For convenience we’re going to work in a one-dimensional feature space, and the x-coordinates which define the regions we are splitting the feature space into will be called knots. We also have a few other requests; we want our function to be continuous and we want its first two derivatives to be continuous. In general when working with splines using polynomials of degree $d$ we require $d - 1$ continuous derivatives. Hastie and company note that cubic splines are the most common and that making the second derivative continuous makes the transition imperceptible to the human eye. I have absolutely no idea why this is a good criterion concerning continuity – honestly I suspect it’s not, since I see no further discussion. But moving on, here’s a more formal-ish definition of a cubic spline.

Definition 1 A cubic spline interpolant with knots on feature space and target set is a function defined as follows:

where is a cubic function with , , and for . A natural cubic spline is a cubic spline such that and are linear.

Presumably the are actually fit to data in some way, although I suppose in a strictly technical sense that’s not required. A natural cubic spline is sometimes preferred because polynomials are not really to be trusted when they go off to infinity. There’s still a problem here, though. How do we actually pick the knots? I suppose in some scenarios there might be a definitive divide in the data, but in general it is not at all obvious. But like everything in statistical learning (at least in my experience so far), a simple idea comes to the rescue. Just make all the data points knots! This is the maximal set of useful knots since adding more cannot improve the fit. This is called the smoothing spline. It’s not actually immediately clear why this is a great idea; while we will have minimal training error, why should we expect such an approach to produce a stable hypothesis function? That’s where the exercise posed by Professors Hastie and Tibshirani comes in.

Exercise 1 Given points $(x_i, y_i)$, $i = 1, \dots, N$, with $x_i < x_{i+1}$ for $1 \le i \le N - 1$, consider the following optimization problem:

Show that the minimizer over all functions of class $C^2$ defined on $[a, b]$ is a natural cubic spline interpolant with knots at each $x_i$ ($C^2$ is the class of functions which have continuous first and second derivatives).

This objective function has a simple interpretation; the first term is the residual sum of squares, and the second term is a regularization or penalty term with tuning parameter $\lambda$ which penalizes large second derivatives. With $\lambda = 0$ any function that interpolates the data will be a minimizer, and with $\lambda = \infty$ we will be forced to use a linear function, so the problem collapses to least squares, which is a sort of degenerate natural cubic spline. It is much more clear why the minimizer of this objective would be a good model than why just making all data points knots would produce a good model, but it turns out that they are actually one and the same.
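Written out in the standard notation ($N$ data points $(x_i, y_i)$ and tuning parameter $\lambda$, as described above), the objective is:

```latex
\min_{f} \; \sum_{i=1}^{N} \big(y_i - f(x_i)\big)^2 + \lambda \int \big(f''(t)\big)^2 \, dt
```

The first term rewards fidelity to the data while the second charges for curvature, with $\lambda$ trading one off against the other.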

We begin with the ever-classic proof technique. Let $g$ be the natural cubic spline interpolant to the pairs $(x_i, y_i)$ and let $\tilde{g}$ be another function of class $C^2$ interpolating the data. We’re going to show that if $\tilde{g}$ is as good a solution as $g$, then it is equal to $g$ on $[a, b]$. Let $h = \tilde{g} - g$. It’s not too hard to show that $g$ can perfectly interpolate the data (natural cubic splines with $N$ knots are defined by a set of basis functions of size $N$), but we’ll just assume it here. Consider the following calculation, where we let $a \le x_1$ and $x_N \le b$ for convenience.

$$\begin{aligned}
\int_a^b g''(t)h''(t)\,dt
&= \Big[g''(t)h'(t)\Big]_a^b - \int_a^b g'''(t)h'(t)\,dt && \text{integration by parts}\\
&= -\int_a^b g'''(t)h'(t)\,dt && \text{$g''(a) = g''(b) = 0$ since $g$ is linear outside the knots}\\
&= -\sum_{j=1}^{N-1} \int_{x_j}^{x_{j+1}} g'''(t)h'(t)\,dt && \text{splitting the integral; $g''' = 0$ outside the knots}\\
&= -\sum_{j=1}^{N-1} \left( \Big[g'''(t)h(t)\Big]_{x_j}^{x_{j+1}} - \int_{x_j}^{x_{j+1}} g''''(t)h(t)\,dt \right) && \text{integration by parts again}\\
&= -\sum_{j=1}^{N-1} \Big[g'''(t)h(t)\Big]_{x_j}^{x_{j+1}} && \text{$g'''' \equiv 0$ since $g$ is a cubic}\\
&= -\sum_{j=1}^{N-1} g'''(x_j^{+})\big(h(x_{j+1}) - h(x_j)\big) && \text{$g'''$ is constant between knots}
\end{aligned}$$

If we plug in $h(x_j) = 0$ (both $g$ and $\tilde{g}$ interpolate the data, so $h$ vanishes at every knot) we see that it implies that $\int_a^b g''(t)h''(t)\,dt = 0$, hence $\int_a^b g''(t)\tilde{g}''(t)\,dt = \int_a^b (g''(t))^2\,dt$, and we can now use Cauchy-Schwarz (the operation $\langle u, v \rangle = \int_a^b u(t)v(t)\,dt$ defines an inner product assuming $u$ and $v$ are square-integrable):

$$\int_a^b \big(g''(t)\big)^2\,dt = \int_a^b g''(t)\tilde{g}''(t)\,dt \le \left(\int_a^b \big(g''(t)\big)^2\,dt\right)^{1/2}\left(\int_a^b \big(\tilde{g}''(t)\big)^2\,dt\right)^{1/2},$$

which gives $\int_a^b (g'')^2\,dt \le \int_a^b (\tilde{g}'')^2\,dt$.

Equality holds if and only if $\tilde{g}'' = g''$, i.e. $h''$ is identically zero on $[a, b]$. This implies that their difference $h$ is linear on $[a, b]$, but since $g$ and $\tilde{g}$ agree on the $N \ge 2$ points $x_i$ that difference must actually be zero, so $h$ is identically zero on $[a, b]$.

Armed with this result it’s apparent that the objective function will evaluate to something greater on any that interpolates the data, so is the unique minimizer over all functions that perfectly interpolate the data. What about functions not perfectly interpolating the data? Could there exist a where some function that’s not a cubic spline will produce a slightly greater residual sum of squares but a significantly smaller penalty term? No; for any such function , let be the set of points it evaluates to on the . We can find a cubic spline that perfectly interpolates these points and run through the same argument with an objective function using the points to show that the penalty term will be smaller in . Since the residual sum of squares in the original objective function is the same for and and the penalty term is the same across the two optimization problems, our cubic spline is optimal. We can safely conclude that our cubic spline is a unique minimizer.

There are actually a ton of other really cool ideas in this book. Hopefully I’ll find something to say about them instead of just doing one of the exercises, but honestly sometimes I just like to see a finished writeup. I will try to incorporate some of that next time.

In this post I’m basically just following along with General Topology by Kelley [1] aided by Wikipedia. Absolutely no thought went into this reference choice so I hope it isn’t impossibly difficult. I will also break my rule about posting solutions to exercises here because I don’t have confidence in making up my own exercises and I think this book is sufficiently old (published 1955) that no one is using it in class anymore. Elementary point-set topology, like elementary analysis, is at least fun to think about before it gets too abstract for my abilities. As always I use my reference for the statement of the theorems and the supplementary text and try to prove the theorems myself.

With topology I start from the very beginning, so I hope all readers will be able to follow along to some degree. The first question I had to ask is what the basic unit of study is. It turns out it’s a topological space. How is it defined? Apparently in a bunch of different ways, as we are about to see.

We first define a topological space in the following manner:

Definition 1 (Topology by Open Sets) A topological space is an ordered pair $(X, \mathcal{T})$ of a set $X$ and a family $\mathcal{T}$ of subsets of $X$ with the following properties:

Both $\emptyset$ and $X$ are in $\mathcal{T}$.

For any $U, V \in \mathcal{T}$, $U \cap V \in \mathcal{T}$.

The union of any family of sets in $\mathcal{T}$ is also in $\mathcal{T}$.

The set $X$ is the space (universe) and $\mathcal{T}$ is called the topology on $X$.

The second stipulation of course implies that the intersection of any finite number of sets in is still in . By contrast the third stipulation must hold not only for finite families but for countably and uncountably infinite ones as well. Interestingly enough there are a bunch of ways to get to an equivalent definition according to Wikipedia, but we’ll stick to this one to begin with. This is the definition everyone seems to use. Kelley actually works in a marginally different way – he doesn’t have the first condition at all and just defines to be the union of all sets in . Therefore is in by definition. I find this a little strange because he talks about pairs just like everyone else, but in the case of his definition is already completely determined by , so I don’t really understand why this is done. His third condition is also worded slightly differently: he says that “the union of any subfamily of sets of is also in .” Under this phrasing he goes on to say that this implies that since is a subfamily of the family of sets and its union is . I don’t want to get too technical for no reason so I’m just going to stick with the majority here. It’s just a matter of what to verify when showing something is a topology anyway.
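On a finite space the axioms can be checked mechanically, which makes for a nice sanity check when experimenting. A minimal sketch (the function name and the set-of-frozensets encoding are my own choices):

```python
def is_topology(X, T):
    """Check the open-set axioms for a candidate topology T on a finite set X."""
    X = frozenset(X)
    T = {frozenset(U) for U in T}
    # First condition: both the empty set and the whole space are open.
    if frozenset() not in T or X not in T:
        return False
    # Second and third conditions: for a finite family, closure under
    # pairwise intersections and unions gives closure under all finite
    # intersections and all (here necessarily finite) unions.
    return all(A & B in T and A | B in T for A in T for B in T)
```

For example, {∅, {0}, {0, 1}} is a topology on {0, 1}, while {∅, {0}, {1}, {0, 1, 2}} on {0, 1, 2} is not, since the union {0, 1} is missing.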

The sets in are called the open sets. Technically openness is a property of the topology, so they are -open, but I think it will be mostly clear from context. If there are two topologies in question then we’ll specify which we are talking about. A go-to example of a topological space is of course the real numbers. The usual topology for the real numbers refers to the family of sets which contain an open interval about each of their points. First we’ll better define an open interval and verify that it in fact forms a topology:

Proposition 2 Let an open interval be a subset of such that for every there exists such that contains and is contained in . Then the family of sets defined by all open intervals on is a topology.

Proof: Since there does not exist an it is included in our family of sets. Similarly if we consider then any such that is contained in so this too is in our family of sets.

If and are disjoint open intervals then their intersection forms the empty set, so that case is fine. If and are open intervals where there exists an , let and . Then is contained in , so this is again open.

Let be the union of some family of open sets where is an index set. Then for any we know for some , and so must contain an interval that contains which in turn must be contained in , so is again open.

That proof is so easy Kelley didn’t even bother to include it, but I figured I’d write it up to decrease the probability I was misunderstanding everything. Of course under this definition open intervals conveniently correspond to open sets, and since a set is closed iff it is the complement of some open set, closed intervals correspond to closed sets. To my dismay (and the cost of about an hour of staring at a piece of paper), this particular naming scheme does not hold up in more general cases. The usual topology does not allow sets to be both open and closed, excepting the empty set and the entire space, but a generic topology allows this for more than just the two sets.
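The clopen phenomenon is easy to see concretely: in a finite topology we can simply list the sets whose complements are also open. A small sketch (naming is mine):

```python
def clopen_sets(X, T):
    """Return the sets that are both open and closed in the space (X, T)."""
    X = frozenset(X)
    T = {frozenset(U) for U in T}
    # A set is closed iff its complement is open, so a clopen set is one
    # where both the set and its complement appear in T.
    return {U for U in T if X - U in T}
```

In the discrete topology on {0, 1} every one of the four subsets is clopen, while in {∅, {0}, {0, 1}} only the empty set and the whole space are.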

A neighborhood of a point is a set which contains an open set containing . A neighborhood system of a point is the family of all neighborhoods of . This sets up Kelley’s first two theorems:

Theorem 3 A set is open if and only if it contains a neighborhood of for every .

Proof: If a set is open then it itself is a neighborhood of any , so that direction is simple. If contains a neighborhood of for every , it also contains an open set which contains . Consider the union of all these open sets, call it , which is by definition again open. Then clearly since implies . We also have since is just a union of subsets of , so and is open.

Theorem 4 If is the neighborhood system of a point , then finite intersections of members of belong to , and each set which contains a member of belongs to .

Proof: The latter statement is kind of obvious, because a set which contains a neighborhood of must contain an open set containing , so it is also a neighborhood of .

Let be neighborhoods of . Then there exist open sets containing such that and . We see that is open by definition and , so contains an open set containing and is therefore a neighborhood of .

This brings us to another definition of a topological space, which we then show to be equivalent to our first definition.

Definition 5 (Topology by Neighborhoods) Let be a function which assigns to each a non-empty family of sets such that the following properties hold:

If , then .

If and are members of , then .

If and , then .

If , then there is a member of such that and for each .

Then the family of all sets such that whenever is a topology on .

Exercise 1 (1B(a) from Kelley) Suppose is a topological space and for each let be the family of all neighborhoods of . Show that the four conditions in the above definition hold.

Proof: We’ll go down the list

If then it’s a neighborhood of so by definition it contains an open set which contains . Therefore it contains .

This is our second theorem.

If , then contains an open set containing . If then contains that same open set containing , so .

We can choose to be the open set contained in which contains . Clearly and . Since is itself an open set it must be a neighborhood for all its points.

So these four conditions are implied by our open sets definition. As you might expect, the second part of the exercise is to show that these four conditions imply our open sets definition.

Exercise 2 (1B(b) from Kelley) If is a function which assigns to each a non-empty family of sets satisfying the first three conditions in the previous exercise, then the family of all sets , such that whenever , is a topology on . If the fourth condition is satisfied, then is precisely the neighborhood system of relative to the topology .

Proof: A more explicit definition of is .

The empty set is trivially in and the entire space is as well by the third condition, since it contains all sets (and is non-empty). Let be sets in . Let be a point in , so and . Since we have . Then by the second condition we have so and . Now let with for all (an index set). We know means that for some , so since . But is a subset of so by the third condition .

Finally, if we also have the fourth condition, we can show that is the neighborhood system of .

If then if we have . By the fourth condition there exists a which is open (by theorem 3) and , so is a neighborhood of (and is in the neighborhood system). If we have a set which is a neighborhood of then it contains an open set which contains . We know that by the fourth condition so if then by the third condition. Thus implies so . We conclude that is the neighborhood system of .

We’re going to run through a few definitions and theorems now to prevent this from getting too long. An accumulation point of a subset is a point for which all neighborhoods of intersect at a point other than . This is just a step toward the way the neighborhoods definition of a topology expresses closed sets. A subset of a topological space is closed if and only if it contains all its accumulation points. These are of course all slightly generalized versions of concepts people learn in the first few weeks of analysis. The closure of a set is the intersection of all the closed sets containing . This leads to a theorem which I will write up the proof for, having skipped enough of them already. Unfortunately I got tripped up on this one so I had to go use Kelley’s proof, which starts with this preliminary theorem:

Theorem 6 The union of a set and the set of its accumulation points is closed.

Proof: For some let be an open set containing and not intersecting , which exists because . In fact, no points of are in by definition and no points are in since is a neighborhood of all its points. Taking the union of all such over all we get an open set which is the complement of , so must be closed.

This makes the second half of the next theorem, the part I got stuck on, obvious.

Theorem 7 The closure of any set is the union of the set and the set of its accumulation points.

Proof: Let denote the set of accumulation points of . If then it’s obviously in the closure of since closed (or any) sets containing must contain . Similarly if then must be an accumulation point of a set containing , and since closed sets must contain all their accumulation points it must be in any closed set containing . So if then .

If then it’s in every closed set containing . Since is closed by the previous theorem and obviously contains , .

This gives us yet another way to define a topological space. A closure operator on $X$ is an operator which assigns to each subset $A$ of $X$ a set $A^-$ (not to be confused with complement) such that the following axioms are satisfied:

$\emptyset^- = \emptyset$.

For each $A \subset X$, $A \subset A^-$.

For each $A \subset X$, $(A^-)^- = A^-$.

For all $A, B \subset X$, $(A \cup B)^- = A^- \cup B^-$.

Theorem 9 (Topology by Closure Operator) Let be a closure operator on , let be the family of all subsets of such that , and let be the family of complements of members of . Then is a topology on and (with respect to ) for all .

Proof: We know that by the first axiom, so . Similarly since . Since we’re dealing with closed sets directly it’s probably easier to apply De Morgan’s laws to the open sets definition of a topology. We then need to show that the union of two sets in is in and the intersection of an arbitrary number of sets in is in . For , we have using the fourth axiom, so . If where is an index set and for all , we have the following chain: , so . By the second axiom , so and .

Finally we want to show that is actually just . I took the massive implied hint here and used the third axiom since it hasn’t come up yet. By that axiom we see that is actually in . But is the intersection of the closed sets containing , i.e. the sets in containing , and since we see that . For the other direction we note that for some (possibly empty) . So , and since is actually in because it’s an intersection of elements of , we have . Putting the two together yields , completing the proof.

To conclude we’ll look at one more way to specify a topological space after the relevant definitions. The interior of a set is the set of such that is a neighborhood of . It’s denoted . I tend to think of the interior operator as the opposite of the closure operator, and indeed it can be shown that the interior is the largest open subset of , as opposed to the smallest closed set containing . This kind of duality holds up pretty well. The interior operator, as you might imagine, takes to . Now we’re going to define a topology one last time, based on this exercise:

Exercise 3 (1C in Kelley) If is an operator which carries subsets of into subsets of , and is the family of all subsets such that , under what conditions will be a topology for and the interior operator relative to this topology?

Definition 10 (Topology by Interior Operator) Let be an interior operator on where is the family of subsets such that , and let the following conditions hold:

We have .

For all , .

For all , .

For all , .

Then is a topology on and is its interior operator.

It did not take a lot of creativity to come up with those conditions, of course. Since this post has more than its fair share of uninteresting proofs and this one is extremely similar to the previous one, I will omit it.

That admittedly got really long and dry. I wanted to do all the really mundane and boring exercises for practice and confirmation of understanding. I wouldn’t even recommend a reader go too carefully through the proofs to learn (if you are then I recommend reading a real resource). I am, however, hopeful that in coming posts we’ll get to some more tangible results. In those perhaps there will be more value in cursory reading.

Source: [1] Kelley, John. General Topology. Published 1955 by D. Van Nostrand Company, Inc.
Used for the statement of definitions, theorems, comments, and theorem proofs where specified.

In this post I’m using Dummit and Foote [1] and Wikipedia for reference. As a minor disclaimer some of Dummit and Foote’s proofs are more general. I made a few extra assumptions that at least in my understanding still give us results that are sufficiently powerful for our purposes. As with all these theorems the proofs are simple. I have omitted large chunks, mostly the parts I consider verification, but if you’re following along I would recommend at least mentally checking off properties. Dummit and Foote actually introduce quotient groups using homomorphisms rather than defining normal subgroups and then quotient groups, but while I think that’s a reasonable approach I don’t think it’s how most people first learned the concept. I went with the traditional approach of beginning with normal subgroups.

Anyway, let’s get on to the material. First a brief and common lemma will make my life a bit easier:

Lemma 1 When $G$ is a group and $H \le G$, we have $uH = vH$ if and only if $u^{-1}v \in H$.

Proof: If then for some , and we have , so . Conversely if then for some we have so .

As a last comment, in case this notation isn’t standard, Dummit and Foote use to say is a subgroup of , to say is a normal subgroup of , and to say is isomorphic to .

Theorem 2 (The First Isomorphism Theorem) Let $\varphi: G \to H$ be a surjective homomorphism. Then $\ker \varphi \trianglelefteq G$ and $G / \ker \varphi \cong H$. For ease of writing we’ll let $K$ be $\ker \varphi$.

Proof: First we’ll verify that .

Closure: For we have , so .

Identity: Since for any we have , and so .

Inverses: For any we have , but since we also have that and so .

To show that we just need to show that where is well-defined, a homomorphism, injective, and surjective.

Well-defined: If , then by lemma 1 and so , , , and finally , so which means that is well defined.

Homomorphism: Since , is a homomorphism of groups.

Injective: This amounts to reversing the proof that is well defined: We have , so we can conclude by lemma 1 and so is injective.

Surjective: For any there exists a such that since is surjective, and so , so is surjective.
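To see the theorem in action on a small example, here is a quick computational check; the specific homomorphism $x \mapsto x \bmod m$ from $\mathbb{Z}_n$ to $\mathbb{Z}_m$ and all the names are my own choices:

```python
def first_iso_check(n, m):
    """Verify Z_n / ker(phi) is isomorphic to Z_m for phi(x) = x mod m.

    Requires m | n so that phi is a homomorphism. Returns the number
    of cosets of the kernel, which should equal m.
    """
    assert n % m == 0
    phi = lambda x: x % m
    kernel = frozenset(x for x in range(n) if phi(x) == 0)
    # The cosets of the kernel partition Z_n.
    cosets = {frozenset((x + h) % n for h in kernel) for x in range(n)}
    induced = {}
    for coset in cosets:
        images = {phi(x) for x in coset}
        assert len(images) == 1  # phi is constant on each coset: well defined
        induced[coset] = images.pop()
    # The induced map hits each element of Z_m exactly once: a bijection.
    assert sorted(induced.values()) == list(range(m))
    return len(cosets)
```

For instance, with $n = 12$ and $m = 4$ the kernel is $\{0, 4, 8\}$ and its four cosets map bijectively onto $\mathbb{Z}_4$.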

Dummit and Foote also point out an immediate consequence of this theorem that sort of sheds some light on its utility:

Corollary 3 If $\varphi$ is a group homomorphism then $\varphi$ is injective iff $\ker \varphi = 1$.

Proof: If isn’t surjective then we need only consider its image, so we can just assume that it is. If is injective then it’s a bijection so the statement is obvious. If then and so is an isomorphism, so it’s injective.

A picture I like to maintain in my head looks like this:

Where is , , is defined as , and is defined as . We can think of as , and while this doesn’t immediately appear helpful, in the case where we can see that this is actually a lot of information about , since it fixes , in this case . By relating to this composition we preclude the idea that it might be mapping elements in an unequal fashion because of Lagrange’s theorem. In my mind I think of this as saying that the set of homomorphisms on are in some sense equivalent to the set of normal subgroups, although this is mostly speculation.

The next two theorems heavily utilize the first, so hopefully the result makes sense. We’ll see some applications.

Theorem 4 (The Second Isomorphism Theorem) Let $G$ be a group and let $A, B \le G$ with $B$ normal. Then $AB \le G$, $A \cap B \trianglelefteq A$, and $AB/B \cong A/(A \cap B)$.

Proof: The first two claims are a matter of verification, so I’ll skip those. As a result of those claims we know that is a normal subgroup of so is well defined. For the other quotient group is normal in by assumption and since it’s also normal in (note that this is not guaranteed for some where ).

Here’s an outline of what we’re going to do. We want to show that , and we know that . So it would be ideal if we could define a such that . If we do this then by the first isomorphism we’ll be done.

An obvious guess at how to define is to let . Since all our quotient groups are legitimate this is well-defined and everything, and it’s easy to show that it’s a surjective homomorphism. Really all that remains to be shown is that . The identity of is , so the kernel of is the set of elements where . By lemma 1 this implies that , and so the kernel of is the set of elements in but also in by definition, so . So we have a such that with , so by the first isomorphism theorem we’re done.

I was actively trying to apply the first isomorphism theorem in that proof. I’m sure there are more direct ways to prove it, but I don’t think this one is too bad. It basically illustrates a situation where if you define a homomorphism you can make some non-obvious statements about the structure of . In this case the second isomorphism theorem was here to tell me how to fit the pieces together to make the result seem nice, but the idea is more general. Here’s the relevant part of the lattice diagram of , where quotient groups formed by subgroups joined by the red lines are isomorphic.

Dummit and Foote also call this the diamond isomorphism theorem. The third isomorphism theorem, stated next, is another example of showing isomorphism by defining the appropriate homomorphism.

Theorem 5 (The Third Isomorphism Theorem) Let $G$ be a group with $H, K \trianglelefteq G$ and $H \le K$. Then $K/H \trianglelefteq G/H$ and $(G/H)/(K/H) \cong G/K$.

Proof: Again I’ll leave it to the reader to verify the first part. The last part is more fitting pieces into the first isomorphism theorem. Let be defined by which can be verified as a surjective homomorphism. Then is the set of cosets of the form such that , and again by lemma 1 this implies that , so the kernel is the set of cosets of the form , better known as the elements of , and so by the first isomorphism theorem we’re done.

Putting aside the obvious jokes about canceling the , after the first isomorphism theorem we pretty much went back to our symbol-manipulating ways. We could work through some examples of groups but I think there is only limited utility in that. A real application of the theorems is more desirable, and these do come up in solvable groups if I recall correctly. This gives me an excuse to get sidetracked in the Galois theory proof that quintics are not solvable in radicals. Personally this is one of my favorite results in all of math, which is why I was looking for an excuse to go learn about it.

Some authors, Dummit and Foote included, have a fourth isomorphism theorem that for a group and normal subgroup shows a bijection between the subgroups of which contain and the subgroups . I don’t think this is any less useful than the other theorems, but I think enough algebra has been covered for our purposes. Next time I will write a little bit about definitions in topology, a subject I have no more than a week of formal education in, so despite the elementary level it will be a challenge. After that I will try to talk a little about Hilbert’s Nullstellensatz and basis theorem (which I am only marginally more educated in), and then we should be ready to at least attempt a foray into our end goal of algebraic geometry.

Sources:
[1] Dummit and Foote. Abstract Algebra, third edition, pages 97-99. Published 2004.
Used for the statement of isomorphism theorems.

I am not much of a mathematician, and I think it is unlikely I will ever be. I know some basic facts about analysis and algebra but certainly nothing beyond that. This makes me a little sad. Recently I’ve been trying to read Hartshorne’s book on algebraic geometry. Unfortunately given the extremely suspect state of my intuition combined with some shoddy recall, this has been a struggle to say the least. I have a tendency to bounce around from subject to subject a bit much though, so this time I will try to make a concerted effort to keep with it, even if that means I have to go back and re-learn basic math. To that end I’m going to write a few posts about algebra and how I attempt to understand the structure. There is absolutely nothing novel about this and it is more for me than for anyone else, but if I am able to pursue algebraic geometry in more depth I also hope they will serve as a small reference for any readers wishing to follow along. Also in this vein it should be noted that any general comments I make should be taken with heaps of salt as they are mostly just guesses about what I think may be important.

I won’t start entirely from the beginning even though that wouldn’t be entirely useless to me. Here are the Wikipedia pages for groups, rings, ideals, and fields. In the introduction of the page on fields you can see the chain of algebraic structures. These names sometimes come up, but since I personally don’t do a good job of keeping them straight I’ll try to mention definitions when they do. We also will skip over the definitions of normal subgroups, quotient groups, and homomorphisms. Algebraic geometry (at least in my limited understanding) has commutative algebra as a prerequisite, so the article on commutative rings is also relevant, but I’ll actually write a bit about that myself. To clarify one possible issue, I will assume throughout that rings have a one.

A very fair question is what applications algebraic geometry has or what it even is about. To this I have no good answer. The Wikipedia page lists some areas it is useful in, but I must be clear that I have zero understanding of how it is applied in those fields. Hartshorne’s book itself makes a very brief statement but then returns to the question after the first chapter. I will attempt to do the same. At the very least I have been informed that it finds some application in biology, and despite my abhorrence of the subject, it’s hard not to find that a little interesting. With our minimal assumptions of knowledge I will write up a little bit about basic algebra and the most basic of definitions in topology before attempting to dive into Hartshorne. If you’re interested in a real resource, in school I worked out of Algebra by Michael Artin and Abstract Algebra by Dummit and Foote, both of which I thought were perfectly reasonable.

Even when I get to algebraic geometry my commentary will of course be less insightful than Hartshorne or any other author (or most math students). I hope there will be some value in watching someone struggle to learn the topic rather than reading the perspective of someone who knows it inside and out. As always, any comments are more than appreciated.

This wasn’t really my week in terms of results, but I decided to post what I was up to anyway. I always try to solve my problems without help from other sources. Sometimes I make up my own problems that I think shouldn’t be too hard and try to solve those by myself. Sometimes that works, but sometimes I am very incorrect about the difficulty of the problem.

I mentioned previously that I’m in CVX 101, an online class on convex optimization. The homework, unsurprisingly, involves using their software (called CVX, based in Matlab) to solve convex optimization problems. I unfortunately did not bother to read the user’s manual very carefully before attempting some optimization problems. I got a little frustrated because it wouldn’t let me use a certain function as an objective, even though it was clearly convex. If I had actually read the user’s guide I would have known that CVX only allows objective functions constructed from a pre-specified set of rules, and the function I was trying didn’t fall into that category. For example in this problem I rewrote the function as an explicit sum of squares of affine functions, which is a little inconvenient but CVX accepts, since the sum of squares of affine functions is always convex. I asked myself why they would make life so difficult, and I had a guess. I was pretty sure that determining the convexity of a polynomial in n variables was NP-hard.

Doing anything with multivariate polynomials is hard in my experience. For example, determining whether a polynomial in n variables has at least one real zero is NP-hard. We can reduce from 3-SAT. One of my professors insisted on stating decision problems in the following manner and I got used to it, so bear with me. Here’s the statement for 3-SAT:

INSTANCE: A boolean formula in 3-CNF form with n variables.
QUESTION: Is the formula satisfiable?

And here’s the statement of the multivariate polynomial problem:

INSTANCE: A polynomial p in n variables with integer coefficients.
QUESTION: Does p have a zero on R^n?

For a clause with literals l_1, l_2, l_3, map each positive literal x_i to the factor (1 - x_i) and each negated literal to the factor x_i, and map the clause to the square of the product of its three factors. Map the rest of the clauses appropriately, and let’s call the sum of all these terms p. We see that p has a zero if and only if our instance of 3-SAT is satisfiable. If 3-SAT has a satisfying assignment and x_i is true in that assignment, let x_i = 1, otherwise let x_i = 0. Clearly a satisfying assignment produces a zero of p. If p has a zero then all the terms are zero because they’re all nonnegative, and a term is zero if and only if one of its factors is zero, so one of the variables is zero or one. Reverse the mapping and we have a satisfying assignment. A last detail is that if we have a variable that isn’t zero or one in the root of p, it doesn’t matter in the assignment so we can ignore it.

I’ve never gotten any real verification that thinking like this is a good idea, but when I see a reduction like that I generally think of the problem we reduced to as being “much harder” than the first problem. In this case we could have made all kinds of restrictions to the target problem; we only needed a polynomial of degree 6 or less, and a very specific type of polynomial at that. Well, 3-SAT is already strongly NP-hard so determining zeroes in multivariate polynomials must be really, really hard, right? Of course in this case we know that the multivariate polynomial problem is in NP and so it’s capped in difficulty, so it sort of implies that going from the restricted types of polynomials to the rest of polynomials is actually easy.
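To make the reduction concrete, here’s a small Python sketch of the clause-to-polynomial encoding described above (the function name and clause representation are my own; literals are signed integers, so -3 means the negation of variable 3):

```python
import itertools

def clause_polynomial(clauses):
    """Build f(x) = sum over clauses of (product of factors)^2, where a
    positive literal i contributes the factor (1 - x_i) and a negated
    literal -i contributes the factor x_i.  At a 0/1 point a term
    vanishes exactly when its clause contains a true literal, so f = 0
    iff every clause is satisfied."""
    def f(x):
        total = 0.0
        for clause in clauses:
            term = 1.0
            for lit in clause:
                term *= (1.0 - x[lit - 1]) if lit > 0 else x[-lit - 1]
            total += term ** 2
        return total
    return f

# (x1 or x2 or not x3) and (not x1 or x2 or x3), variables numbered from 1
f = clause_polynomial([(1, 2, -3), (-1, 2, 3)])

# the 0/1 zeros of f are exactly the satisfying assignments
zeros = [x for x in itertools.product((0.0, 1.0), repeat=3) if f(x) == 0.0]
```

Each term is a product of three factors squared, so f has degree 6, matching the restriction mentioned above.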

Anyway, back to the convexity problem. Here’s a statement:

INSTANCE: A polynomial p in n variables.
QUESTION: Is p convex on R^n?

If p is degree 1, 2, or 3 this is pretty simple. If it’s degree one the answer is always YES because it’s affine, so convex. If it’s degree three then the answer is always NO. We need only consider the cubic terms. If we find a term of the form x_i^3 then taking the derivative with respect to x_i twice will make it linear in x_i, and so a diagonal element of the Hessian can be made negative, which implies that it’s not positive semidefinite by the extended version of Sylvester’s criterion. The same argument works for terms of the form x_i^2 x_j. If all the cubic terms are of the form x_i x_j x_k then all diagonal elements of the Hessian are zero, and the principal minor using rows and columns i and j is negative since it’s zero on the diagonal and nonzero on the off-diagonal. As for degree two polynomials, we can just write p as a quadratic form x^T Q x plus some affine stuff. Then p is convex iff Q is positive semidefinite, which we can either test for with Sylvester’s Criterion or old-fashioned high school math, row reduction. I’m not actually sure which is faster but they’re both polynomial time.
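For the degree-two case, the test is easy to sketch in Python. This is a minimal illustration using an eigenvalue check (rather than Sylvester’s Criterion or row reduction); the function name is my own:

```python
import numpy as np

def quadratic_is_convex(Q, tol=1e-10):
    """f(x) = x^T Q x + b^T x + c is convex iff the symmetric part of Q
    is positive semidefinite; test via its smallest eigenvalue."""
    S = (Q + Q.T) / 2.0
    return np.linalg.eigvalsh(S).min() >= -tol

# x^2 + y^2 is convex; x^2 - y^2 is not
convex_example = quadratic_is_convex(np.diag([1.0, 1.0]))
saddle_example = quadratic_is_convex(np.diag([1.0, -1.0]))
```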

At this point I was feeling pretty good. I had knocked out a few cases, and all that remained was showing this for polynomials of even degree 4 or higher. In fact if you can show that deciding the convexity of degree d polynomials is hard then you can just tack on a term like x_{n+1}^k for some even k > d, which is of course convex, and you’ll show that deciding the convexity of a higher degree polynomial is also NP-hard. I tried a bunch of stuff but nothing I did really got anywhere. After a few days I decided to give in and started googling for some answers.

It turns out that this was solved somewhat recently. My guess that determining convexity is hard was correct. It’s proven here by Ahmadi, Olshevsky, Parrilo, and Tsitsiklis [1]. The general outline is that determining non-negativity of biquadratic forms, which are expressions of the form b(x, y) = sum_{i,j,k,l} a_{ijkl} x_i x_j y_k y_l where x is in R^n and y is in R^m, is NP-hard, as shown in [2]. That reduction uses Clique [3]. In general I find it kind of difficult to make the jump from discrete structures to continuous objects in reductions. The actual paper is clear and shockingly easy to read, so I won’t summarize it here, but I recommend it. The authors in [1] showed the reduction from the biquadratic form problem to convexity, as well as a bunch of other results: the hardness of determining strict and strong convexity and a polynomial time algorithm for determining quasiconvexity.

At any rate, my plans for proving that determining the convexity of quartics is NP-hard were squashed, but I figured relaxing the problem to rational functions was an acceptable concession. After all, you can easily type a rational function into Matlab, so it’s still a concern for the coders of CVX. After a little bit I realized that there was a somewhat dirty way to do this problem as a reduction from the problem of determining whether a polynomial in n variables has a zero. As far as Matlab is concerned, a function like 1/x can’t be evaluated at x = 0. Using this you can just take a polynomial p in n variables and determine whether p(x)/p(x) is convex: if p has no zeroes then the function is identically 1, which is convex, and if it has one or more zeroes then it’s not convex according to Matlab because the domain isn’t even convex.

I don’t really like that reduction, so I kept searching. I also relaxed the problem even more, allowing myself to determine the convexity of functions with exponentials and logarithms. Unfortunately I am still on the hunt. If I do manage to find one I’ll edit it in here, but until then that’s all I’ve got.

Lately I’ve come across a lot of linear algebra. Apparently it’s pretty important. One topic I always thought was pretty fantastic was spectral graph theory, or analyzing graphs from their adjacency or similar matrices. I never really got a chance to look into it at school, but I started trying to understand some very basic results which I think are pretty fascinating. Of course, after I proved these results I looked at more organized approaches to spectral graph theory only to find that I had missed the main point entirely. That’s why I wouldn’t call the content of this post spectral graph theory. It’s more just facts about the Laplacian matrix.

For a simple graph G on n vertices we’ll define its Laplacian L(G) (I will drop the G from now on) as the n x n matrix with entries as follows: the entry in row i, column i is the degree of vertex i, the entry in row i, column j is -1 if vertices i and j are adjacent, and every other entry is 0.

This matrix has a lot of interesting properties. It’s obviously symmetric. It’s also singular, since it maps the all-ones vector to zero: in each row the degree is the only positive term and is exactly the number of -1s in that row. We can actually say more interesting things about its eigenvalues. First we’ll show that L is positive semidefinite, then we’ll show that the multiplicity of the eigenvalue 0 is exactly the number of components in G.
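Here’s a minimal Python sketch of the definition and of the properties just mentioned (the helper name `laplacian` is my own):

```python
import numpy as np

def laplacian(n, edges):
    """Laplacian of a simple graph on vertices 0..n-1: degree of vertex i
    on the diagonal, -1 in entry (i, j) when {i, j} is an edge, 0 elsewhere."""
    L = np.zeros((n, n))
    for i, j in edges:
        L[i, j] = L[j, i] = -1.0
        L[i, i] += 1.0
        L[j, j] += 1.0
    return L

# path graph on 4 vertices: 0 - 1 - 2 - 3
L = laplacian(4, [(0, 1), (1, 2), (2, 3)])
# L is symmetric, kills the all-ones vector, and has no negative eigenvalues
```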

The fact that is positive semidefinite is proven in a short and elegant fashion on Wikipedia. Naturally I disregarded it and set out to prove it on my own. This resulted in a proof that I am not really pleased with since it unnecessarily makes use of some pretty high-powered ideas to prove what is probably a very simple result.

At any rate, there are two facts we’re going to have to use that I won’t prove here. The first is Sylvester’s Criterion, which states that a symmetric matrix is positive definite if and only if its leading principal minors are positive. To review in case you forgot (because I did), a minor is the determinant of the matrix you get when you cut out a set of rows and a set of columns. A principal minor is the determinant of the matrix you get when the indices of those rows and columns are the same. A leading principal minor means you cut out the rows and columns from k + 1 to n (i.e. you keep the upper left submatrix of size k x k).

The second fact we’re going to use is the extension of this to positive semidefinite matrices. It says that a symmetric matrix is positive semidefinite if and only if all its principal minors are nonnegative. I came across this fact when my friend linked me to his convex optimization homework here [1], where it is stated without proof. Unfortunately there doesn’t seem to be an elegant justification even assuming Sylvester’s criterion. Overall though it is a bit useless. Sylvester’s criterion at least provides a sort of computationally efficient way to test for positive definiteness. The extension clearly does no such thing, since it requires checking exponentially many minors.

With those out of the way here’s how we’re going to proceed. We’re going to deal only with Laplacians of connected graphs for now, and the extension will be simple enough. We will encounter matrices which are almost Laplacian but have at least one diagonal element that is too large by an integer amount. Phrased another way, this is a matrix of the form L + D where L is a Laplacian and D is a nonzero positive semidefinite diagonal matrix with integer entries. I’m going to call this a super-Laplacian. Be warned that this is probably horribly non-standard, although it shouldn’t be.

Our first claim is that a super-Laplacian matrix is positive definite. This can be shown by induction on n (the size of the matrix). The case n = 1 is trivial. The inductive step needs a simple fact. If we take a super-Laplacian (or even a normal Laplacian) and throw out the same subset of rows and columns we’re going to get a super-Laplacian matrix. The way I understand this is that throwing out rows and columns is like deleting a set of vertices from the graph, except we’re not reducing the degrees of the vertices that had edges going into the set we just threw out. Furthermore, since we’re only working with connected graphs for the time being, a vertex in the set of vertices we threw out had to be adjacent to some remaining vertex (and so that diagonal entry will be too large), so our modified matrix is super-Laplacian.

Getting back to the proof, we see that all the proper principal minors of a super-Laplacian are positive by the inductive hypothesis. All that remains to be shown to satisfy Sylvester’s Criterion is that the determinant of the matrix itself is positive. Any super-Laplacian can be constructed from a Laplacian matrix by adding to its diagonal one entry at a time. When we increase one diagonal entry of the Laplacian, what happens to its determinant? Consider the traditional expansion along a row or column containing that entry. We increase the determinant (increase since we’re on the diagonal so the associated sign is positive) by the minor resulting from throwing out that vertex, i.e. the determinant of a super-Laplacian. By the inductive hypothesis this is positive and so we added a nonzero quantity to the determinant. Further additions do not cause problems since we are similarly increasing by the determinant of a super-Laplacian. So our matrix has positive determinant and by Sylvester’s Criterion it is positive definite.

We’re basically done now by the extended version of Sylvester’s Criterion. A principal minor of a Laplacian is the determinant of a super-Laplacian matrix, which is nonnegative, so the Laplacian of a connected graph is positive semidefinite. Extension to graphs with multiple components follows by relabeling vertices (or switching rows and columns, depending on how you want to think about it) to get a block diagonal matrix diag(L_1, ..., L_k), where the L_i are positive semidefinite since they’re Laplacians of connected components. The block matrix is positive semidefinite, which can be seen straight from the definition. So all Laplacians are positive semidefinite.

The other result we’re going to show is a relation between the eigenvalues of the Laplacian and the number of components in the graph. What we just did already suggests as much, but we’ll show it a bit more explicitly. Let lambda_1 <= lambda_2 <= ... <= lambda_n be the eigenvalues of L ordered from least to greatest. Clearly they’re all nonnegative and there are n of them (counting multiplicity) since L is diagonalizable. We’re going to show that the multiplicity of the eigenvalue 0 is exactly the number of components in G (recall that every L has the all-ones vector as an eigenvector and so lambda_1 is always 0).

If G has k components, then we can find k linearly independent vectors in the null space of L. For ease of visualization we can relabel to form a block diagonal matrix as we did above, and then the independent eigenvectors are just the indicator vectors of the blocks, with the appropriate sizes. So lambda_1 = ... = lambda_k = 0.

If G has fewer than k components, say m < k components, then we can find m columns (and rows) to throw out and we will have a super-Laplacian matrix, since we can throw out one vertex from each component, giving us a block diagonal matrix of super-Laplacians. This is full rank, so our original matrix is of rank at least n - m, so the dimension of the null space of L is at most m < k, and we can conclude that lambda_k > 0.
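The multiplicity claim is easy to check numerically. A small Python sketch, with a Laplacian builder included so it stands alone (names are mine):

```python
import numpy as np

def laplacian(n, edges):
    """Laplacian of a simple graph on vertices 0..n-1."""
    L = np.zeros((n, n))
    for i, j in edges:
        L[i, j] = L[j, i] = -1.0
        L[i, i] += 1.0
        L[j, j] += 1.0
    return L

def zero_multiplicity(L, tol=1e-8):
    """Number of (numerically) zero eigenvalues of a symmetric matrix."""
    return int((np.abs(np.linalg.eigvalsh(L)) < tol).sum())

# a triangle on {0, 1, 2} plus a disjoint edge {3, 4}: two components
L2 = laplacian(5, [(0, 1), (1, 2), (0, 2), (3, 4)])
```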

Like I said, this is far from the heart of spectral graph theory, but it’s a fun thing to play around with since the concepts are really basic and easy to understand. Another thing to note that I haven’t proven: the second smallest eigenvalue lambda_2 of L is called the algebraic connectivity of G. We saw a little bit of how that might work since lambda_2 > 0 if and only if G is connected. Apparently a larger lambda_2 means the graph is more connected, although not quite in the sense of k-vertex or k-edge connectivity. That more or less sums up the basic facts on Wikipedia, but I’m sure there are many more simple observations to be made.

Sources:
[1] The material for this course helped me a lot in trying to approach this problem, and of course provided the extended Sylvester’s Criterion. I’m in the Stanford Online version and I really enjoy it.

It’s been a while since I’ve posted so I figured I’d post something short to get back in the swing of things. I spent a week chasing some results in spectral graph theory without knowing anything about it. I still have hope of finding something there, but first I need to fill in some gaps in my knowledge. Anyway, recently I came across this problem set. I really like all the problems, but the question about the G(n, m) model caught my attention. I figured I would look into it.

Apparently this is also a standard model, but I hadn’t heard of it. A graph is generated by taking n vertices and randomly placing m edges in the graph. The most notable difference is that vertex pairs are selected with replacement, so there is a possibility of parallel edges. The problem from that assignment allows for self-loops, which is just insanity. I also don’t want to post solutions to other people’s problems. But mostly it is insanity, so we’ll stick to the more standard model.

Actually, some natural questions about this model are well-known results. We can ask for what value of m we get a double edge with high probability; this is just the birthday paradox. We can ask for what value of m the graph becomes complete; this is coupon collector. Another question we can ask is: if m is on the order of the number of possible edges, what is a bound on the maximum number of parallel edges?

That last one probably isn’t as common but it was actually a homework problem for me once [1]. It was phrased as follows: if we toss n balls into n bins by selecting the bin for each ball uniformly at random, show that no bin contains more than O(log n / log log n) balls with high probability, more specifically with probability at least 1 - 1/n.

This is an exercise in Chernoff bounds, which are essential in basically every randomized algorithm I’ve ever seen. It places bounds on the probability that a random variable with a binomial distribution is really far from its mean. Actually it can be the sum of negatively associated random variables, but that’s not really important here. The really nice thing is that the bound is exponentially small. There are a bunch of different forms but they’re all derived in a similar manner. The version we’re going to use is best when we want to bound the probability that a random variable is far above its mean. Moving past this extremely technical terminology, the bound states that for a sum X of independent 0-1 random variables, Pr[X >= (1 + delta)E[X]] <= (e^delta / (1 + delta)^(1 + delta))^E[X].

The overall plan, as per usual, is to bound the probability that any bin has more than R = 4 ln n / ln ln n balls and then complete the problem by union bounding. For this to work we want to aim to bound the probability for a given bin by 1/n^2. We let X be the number of balls in a given bin; X_i is 1 if the ith ball is thrown in the bin, 0 otherwise. Clearly X = X_1 + ... + X_n and the X_i are independent, so we can apply Chernoff. I’m also going to be really lazy with constants, it’s definitely possible to be more precise. We see that E[X] = 1, so applying the bound with 1 + delta = R gives Pr[X >= R] <= e^(R-1)/R^R <= (e/R)^R <= 1/n^2.

That last step wasn’t obvious at all to me, so here’s a short justification. Let R = 4 ln n / ln ln n. For large enough n we have ln R - 1 = ln 4 + ln ln n - ln ln ln n - 1 >= (1/2) ln ln n, so R(ln R - 1) >= (4 ln n / ln ln n)((1/2) ln ln n) = 2 ln n, and therefore (e/R)^R = e^(-R(ln R - 1)) <= e^(-2 ln n) = 1/n^2.

At this point we’re done, since the union bound implies that the probability any bin has more than R = 4 ln n / ln ln n balls is less than n(1/n^2) = 1/n, and so the probability that no bin has exceeded the bound is at least 1 - 1/n.
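A quick simulation of the balls-in-bins statement. The constant 4 in the bound below matches the lazy constant used above and is only illustrative; the theorem just promises some constant:

```python
import math
import random

def max_load(n, seed=0):
    """Throw n balls into n bins uniformly at random; return the fullest
    bin's count."""
    rng = random.Random(seed)
    counts = [0] * n
    for _ in range(n):
        counts[rng.randrange(n)] += 1
    return max(counts)

n = 10_000
bound = 4 * math.log(n) / math.log(math.log(n))   # about 16.6 for this n
load = max_load(n)
```

In practice the maximum load sits well under the bound, which is consistent with the exponentially small failure probability.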

That didn’t actually have anything to do with the random graph model specifically, I suppose, but the balls in bins idea has obvious applications elsewhere, like hashing and job assignments. Returning to G(n, m), we’re just going to look at the number of isolated vertices in a very brief and incomplete fashion.

A vertex is isolated with probability (1 - 2/n)^m, since it’s basically m repeated attempts to hit one of the n - 1 pairs involving that vertex out of the C(n, 2) possible, and (n - 1)/C(n, 2) = 2/n. Thus the expected number of isolated vertices is n(1 - 2/n)^m. If we let m = (n/2)(ln n + c) and take the limit we see that

n(1 - 2/n)^m -> n e^(-(ln n + c)) = e^(-c),

and so if c -> infinity there is almost surely no isolated vertex. If this looks familiar it’s because it’s basically the same thing as G(n, p) with p = (ln n + c)/n. This is kind of obvious anyway. At least from the perspective of isolated vertices the fact that you can have multiple edges makes very little difference. The only change is the probability of getting an edge adjacent to a vertex is now fixed and we’re altering the number of attempts. The variance computation also appears to be identical, so I won’t put it here.
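A small simulation of the edge-tossing model, counting isolated vertices on either side of the threshold (function name is mine; the thresholds follow the computation above):

```python
import math
import random

def isolated_count(n, m, seed=0):
    """Place m edges with replacement among n vertices (no self-loops)
    and count vertices of degree zero."""
    rng = random.Random(seed)
    deg = [0] * n
    for _ in range(m):
        u = rng.randrange(n)
        v = rng.randrange(n - 1)
        if v >= u:              # uniform over the n - 1 other vertices
            v += 1
        deg[u] += 1
        deg[v] += 1
    return sum(d == 0 for d in deg)

n = 2000
# expected count is about n * (1 - 2/n)^m
few  = isolated_count(n, int(n * math.log(n)))   # well above the threshold
many = isolated_count(n, n)                      # well below the threshold
```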

When I was trying to make my own modifications to G(n, p) for weighted graphs I was trying all kinds of strange things. I tried generating an edge with probability p and then selecting a weight for it from different distributions. Strange things happened. This is also a pretty nice and easy to understand model. I’ll come back to it once I have learned something about spectral graph theory.

Sources:
[1] I looked around for the website of the class I took the balls and bins problem from, but I can’t seem to find it. The class was Probability and Computing, 15-359, taught by professors Harchol-Balter and Sutner in the spring of 2012.

One of the popular models for random graphs is Erdos-Renyi. We generate G(n, p) by taking n nodes and between every pair generating an edge with probability p. This is nice because it makes computing properties of the graph really easy. The downside is that very few structures actually follow this model – for instance in social networks if u and v are adjacent and v and w are adjacent, then u and w are more likely to be adjacent. This is clearly not the case in G(n, p). It’s a shame, but at least it’s fun to play with.

A monotone property of a graph is one that is always preserved by adding more edges. For a monotone property P it is natural to ask: for what values of p does G(n, p) have P with probability one as n goes to infinity? Since the property is monotone there will be some threshold function of n at which point G(n, p) almost surely has property P. There are in fact sharp thresholds for many properties (as always, for a more complete discussion of this, check out [1]). We’ll focus on connectivity, clearly a monotone property.

We’re going to take a shot at bounding the threshold value from below. Our first approach is an extremely loose bound. Let X be the random variable representing the number of edges in G(n, p) (we’ll just call the graph G from now on). If E[X] goes to zero then with high probability G has no edges at all. This is evident from Markov’s inequality (Hopcroft and Kannan call it the First Moment Method). Since Pr[X >= 1] <= E[X], if E[X] drops to zero then the probability there are any edges at all drops to 0. Since the expected number of edges in the graph is C(n, 2)p, we see that if p = f(n)/n^2 where f(n) -> 0 then E[X] -> 0. So for such p, G almost surely has no edges and is therefore disconnected with high probability.

Actually we can do a little better than this even with our super-naive approach. We don’t need E[X] to drop to zero. A graph is disconnected if it has fewer than n - 1 edges, so if Pr[X >= n - 1] goes to zero, G is almost surely disconnected. By Markov again, all we need is that E[X]/(n - 1) goes to zero. This revision suggests that p = f(n)/n will do the trick since E[X] = C(n, 2)p = (n - 1)f(n)/2, and so E[X]/(n - 1) = f(n)/2. For f(n) -> 0 this drops to 0, so G is almost surely disconnected.

It should come as no surprise to find that this is still a terrible, terrible bound. There’s a long way between having n - 1 edges and being connected, since a graph with n - 1 randomly placed edges is very unlikely to be connected. We can achieve a marginally better bound without doing too much work though. A graph is connected if and only if it has a spanning tree. If a graph almost surely does not have a spanning tree then it is disconnected with high probability. Let T be the number of spanning trees in G. If E[T] drops to 0 then G almost surely does not have a spanning tree and so is almost surely disconnected. We know E[T] = n^(n-2) p^(n-1) by Cayley’s formula. If p = 1/n then E[T] = n^(n-2)/n^(n-1) = 1/n, which goes to 0. So our new lower bound for the threshold is 1/n, better by literally an infinitesimal amount.

The actual optimal lower bound for the threshold is p = ln n / n. In fact, connectivity experiences what is called a sharp threshold. That is, for any eps > 0, G is almost surely disconnected for p = (1 - eps) ln n / n and is almost surely connected for p = (1 + eps) ln n / n. The proof is a little messy for my tastes, so we’ll just finish the process of bounding from below. The following proof is just taken from Hopcroft and Kannan.

We will now investigate the expected number of isolated vertices (vertices with degree 0). Clearly if G almost surely has an isolated vertex then G is almost surely disconnected. Let X be the number of isolated vertices, so E[X] = n(1 - p)^(n-1). If p = c ln n / n with c > 1, then E[X] <= n e^(-p(n-1)) -> 0, so Pr[X >= 1] -> 0. So if p = c ln n / n with c > 1 then G almost surely has no isolated vertices.

Unfortunately this is not actually enough to tell us that for p = c ln n / n with c < 1 there are almost surely isolated vertices. It could be that most isolated vertices are all in some small set of graphs and the rest all have no isolated vertices. To fix this we need what Hopcroft and Kannan call the second moment method, which is basically using Chebyshev’s inequality to show that if Var(X)/E[X]^2 goes to 0 then X is almost surely not zero, since Pr[X = 0] <= Var(X)/E[X]^2. An immediate consequence of this by the definition of variance is that if E[X^2]/E[X]^2 -> 1 then X is almost surely greater than zero.

This is generally a little messier since variance computations are not quite as nice as computing the mean, but in this case it isn’t too bad. We want E[X^2]. Splitting it up into the typical indicator random variables gives us E[X^2] = sum_i E[X_i^2] + sum_{i != j} E[X_i X_j]. Since X_i^2 = X_i, that part is n(1 - p)^(n-1). There are n(n - 1) terms in the second sum and each term is (1 - p)^(2n-3), where we avoided double-counting the edge between vertices i and j. Now we can just compute E[X^2]/E[X]^2, substitute in p = c ln n / n where c < 1, and send n to infinity.

So G almost surely has an isolated vertex when p = c ln n / n with c < 1.

As mentioned before, if p = c ln n / n and c > 1, it is possible to show that G is almost surely connected (instead of just that it almost surely has no isolated vertices). This is shown by Hopcroft and Kannan by considering the “giant component.” In G(n, p), as p increases a single component comes to contain most (all but o(n)) of the vertices, and the rest are isolated vertices. Once p is around ln n / n, where the isolated vertices disappear, the giant component has swallowed the whole graph.
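A quick union-find simulation illustrating the threshold behavior around ln n / n. This is a sketch, not a proof; the constants 3 and 0.2 are chosen well away from the threshold so the outcomes are essentially deterministic:

```python
import math
import random

def gnp_is_connected(n, p, seed=0):
    """Sample G(n, p) and report whether it is connected, tracking
    components with union-find (path halving)."""
    rng = random.Random(seed)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    components = n
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[ri] = rj
                    components -= 1
    return components == 1

n = 500
above = gnp_is_connected(n, 3.0 * math.log(n) / n)   # p well above ln n / n
below = gnp_is_connected(n, 0.2 * math.log(n) / n)   # p well below ln n / n
```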

Erdos-Renyi is an interesting model to play around with, mostly because it’s the only model I can easily follow most of the math for. For a while I was trying to apply some basic spectral graph theory techniques to it, but I couldn’t make it hold up except in the most basic of facts. For instance the expected Laplacian of G(n, p) is obvious, and if you apply Kirchhoff’s Theorem to it then you get the expected number of spanning trees. What I couldn’t do was figure out the expected second smallest eigenvalue, which is nonzero if and only if G is connected. Perhaps a little more creativity was needed on that front. Hopefully I get a chance to review it.

A while ago I came across this problem from Stanford’s Putnam seminar:

Two points are independently selected uniformly at random from a line of length h. What is the probability that they are at least a distance d apart, where 0 < d < h? (Putnam 1961 B2)

This problem has a very neat solution which exploits what I find to be a rather clever technique. We put the line on a coordinate system and say it runs from 0 to 1 (rescaling so that the required separation becomes d/h). Let X be the coordinate of the first point and Y be the coordinate of the second, where X and Y are random variables which are clearly uniformly distributed. Now draw the unit square, as in the following diagram [1]:

The shaded region represents the area in which the points are at least d/h apart on the rescaled line. It has area (1 - d/h)^2, so the probability the two points are at least d apart is simply (1 - d/h)^2.
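A Monte Carlo sanity check of the answer, with the line normalized to unit length as above (the function name is mine):

```python
import random

def prob_at_least_d_apart(d, trials=200_000, seed=0):
    """Estimate P(|X - Y| >= d) for X, Y independent uniform on [0, 1]."""
    rng = random.Random(seed)
    hits = sum(abs(rng.random() - rng.random()) >= d for _ in range(trials))
    return hits / trials

d = 0.25
exact = (1 - d) ** 2   # area of the two corner triangles in the unit square
```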

This trick can be applied to a whole host of problems. As far as I can tell the only requirement is that we are sampling from uniform distributions. One somewhat classic problem is to find the probability density function of a random variable Z = X + Y, where X and Y are independent and uniform on [0, 1]. By the same trick we can compute the CDF of Z, we just need to split it into cases:

The CDF is therefore z^2/2 when 0 <= z <= 1 and 1 - (2 - z)^2/2 when 1 < z <= 2. Differentiating gives us the triangular distribution: the density is z when 0 <= z <= 1 and 2 - z when 1 < z <= 2.
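The case analysis translates directly into code. Here’s a sketch comparing the geometric CDF with a simulation (the function name is mine):

```python
import random

def cdf_sum_two_uniforms(z):
    """CDF of Z = X + Y with X, Y independent uniform on [0, 1], read off
    the unit-square picture: area of the region below the line x + y = z."""
    if z <= 0:
        return 0.0
    if z <= 1:
        return z * z / 2.0                 # triangle with legs of length z
    if z <= 2:
        return 1.0 - (2 - z) ** 2 / 2.0    # square minus the missing corner
    return 1.0

rng = random.Random(0)
trials = 200_000
z = 0.8
empirical = sum(rng.random() + rng.random() <= z for _ in range(trials)) / trials
```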

Looking at the last post I thought an interesting idea would be to generalize this to a sum of n uniform random variables. This is of course a solved problem: the density function is known as the Irwin-Hall distribution. I wanted to take a run at it from our high-dimensional geometry perspective, since we can apply the same idea as above to the unit cube in n dimensions to compute the CDF of Irwin-Hall.

Before we attack this we need to know how to compute the volume of the standard n-simplex, or the volume of the set of points satisfying x_1 + ... + x_n <= 1 where the x_i’s are nonnegative. We’ll follow the explanation given in [2]. First let’s consider the simplex defined by the ordering 0 <= x_1 <= x_2 <= ... <= x_n <= 1. We can fit exactly n! of these in the unit cube since there are n! such orderings and each point in the unit cube is in one of these simplices (I’m fudging some issues with the boundaries but it should be fine I think). Thus this simplex has volume 1/n!. We can map this simplex to the standard simplex by the transformation y_1 = x_1, y_i = x_i - x_{i-1}. This is a transformation with determinant one so the volume remains unchanged, so the volume of the standard simplex is 1/n!.
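A Monte Carlo sanity check that the volume of the standard simplex is 1/n! (a sketch; the function name is my own):

```python
import math
import random

def simplex_volume_estimate(n, trials=200_000, seed=0):
    """Estimate the volume of {x in [0,1]^n : x_1 + ... + x_n <= 1}
    by uniform sampling of the unit cube."""
    rng = random.Random(seed)
    hits = sum(sum(rng.random() for _ in range(n)) <= 1 for _ in range(trials))
    return hits / trials
```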

I drew some pictures of the three dimensional case since some interesting things happen there. We’re keeping track of the volume underneath the plane x + y + z = t that’s contained within the unit cube. Similar to the two-dimensional case, our function will be piecewise since it changes when the plane hits vertices of the cube. First our simplex increases at an obvious rate, maintaining a volume of t^3/6. It looks kind of like this:

Then the function changes when t passes 1. The way we compute the new volume is by taking the volume of the red simplex and subtracting out the volumes of the blue simplices (outlined in the following figure), which gives us t^3/6 - 3(t - 1)^3/6.

Finally the function changes for a last time when t passes 2. We could just compute the volume of the missing simplex and subtract that from one, but we’ll try a method that works a little more generally. Note that the blue simplices have overlapped to form new simplices which are outlined in green below. Our method of computation is to take the red simplex, subtract out the blue simplices, and then add back in the green simplices. It’s basically inclusion-exclusion. This gives us t^3/6 - 3(t - 1)^3/6 + 3(t - 2)^3/6.

This appears to generalize in a fairly simple manner. Note that our function changes based on the integer part of t. The first piece of our function is naturally just t^n/n!. When t hits 1, the plane hits exactly n vertices of the cube (more precisely it hits the points with exactly one coordinate equal to 1). Our volume computation then requires that we subtract out the newly formed simplices. These have side length t - 1 and there are exactly n of them, since one forms at each vertex.

If k <= t < k + 1 then we see that we will be computing an alternating sum of k + 1 terms. Each time t crosses an integer k the plane hits C(n, k) vertices (choose which k of the n coordinates are 1) and exactly that many new simplices are formed to be added or subtracted out, depending on parity. Thus our CDF is exactly

F(t) = (1/n!) sum_{k=0}^{floor(t)} (-1)^k C(n, k) (t - k)^n

And differentiating with respect to t gives us our PDF

f(t) = (1/(n-1)!) sum_{k=0}^{floor(t)} (-1)^k C(n, k) (t - k)^(n-1)
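The inclusion-exclusion CDF is easy to check against a direct simulation. A Python sketch (names are mine; the formula is the one just derived):

```python
import math
import random

def irwin_hall_cdf(t, n):
    """CDF of the sum of n independent uniform [0, 1] variables, via the
    inclusion-exclusion over simplices described above."""
    if t <= 0:
        return 0.0
    if t >= n:
        return 1.0
    total = sum((-1) ** k * math.comb(n, k) * (t - k) ** n
                for k in range(int(math.floor(t)) + 1))
    return total / math.factorial(n)

rng = random.Random(0)
n, t, trials = 4, 1.7, 200_000
empirical = sum(sum(rng.random() for _ in range(n)) <= t
                for _ in range(trials)) / trials
```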

Admittedly there are easier ways to go about proving the expressions for the PDF or CDF are valid, but I like this one since it minimizes the amount of “clairvoyance” involved. As a whole I find this trick with the unit cube to be fairly useful whenever solving problems with uniform distributions. I’m not sure of legitimate practical applications but it seems to come up in contest math from time to time. Hopefully someday I will spot an opportunity to use it in a real-world setting.