Stat 900: A Creative Way to Organize Your Final Project Report

The Usual Environment

Almost everyone in the class is either engaged in thesis research or expects to engage in thesis research very shortly. The way you write up your research for your thesis will depend a bit on the subject area, but it will almost inevitably follow what I call the "P to P+1" Model.

The Usual Model

One takes paper P and copies the bibliography except for the addition of a few more recent works. One reprises with modest variations the motivating paragraphs of paper P, and, most important, one explains how P+1 is new in comparison to P. One then does some honest work. Finally, one writes a conclusion that reminds the reader of the larger context for this work and delicately suggests work that may come later.

With few (but regular) exceptions, both good papers and bad papers follow this pattern.

Almost all mathematical progress comes from such evolutionary increments, and one cannot complain. Hadamard's proof of the prime number theorem was "proof mined" by Dirichlet in his study of primes in an arithmetic progression, and the world is better off for Dirichlet's efforts. What we call Central Limit Theory is an amalgam of hundreds of papers that followed the P to P+1 pattern.

Can We Try Something Different?

Here I encourage you to try something out of the usual mold.

You can forget about the introduction. You can forget about anything complicated --- or general, or new. You don't have to sell anything in the conclusion. Your bibliography is not a genealogy of the subject; you just cite what you use.

You can focus on being interesting and clear.

My bet is that if you do this once early in life, then when you go back to the "P to P+1" Model (as we all must do), you will go back a stronger and wiser person. You will even have the chance of writing the kind of papers that people read, remember, and thank you for having written.

Try this Model --- for Our Class Only

I am decently confident that the best way for you to write a report that will be genuinely interesting to another reader is for you to begin with a simple problem.

The problem should be posed in the simplest way that you can possibly pose it, yet the problem should be one that you do not expect the reader to be able to solve right away.

Example 1: Show that in any sequence of n^2+1 real numbers, there is either a monotonic increasing subsequence of length n+1 or a monotonic decreasing subsequence of length n+1.

Example 2: Show that in any graph with 2k^2+1 edges, there are either k+1 disjoint edges or there is a "star" with k+1 edges.

You can then give a picture or a numerical example to make sure that your reader understands the problem completely.
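For Example 1, a numerical check of this kind might look like the following sketch. It is only an illustration under stated assumptions: the helper names longest_monotone and has_long_monotone are hypothetical, the numbers are taken to be distinct, and the check is done by brute force for the single case n = 2.

```python
from itertools import permutations


def longest_monotone(seq, increasing=True):
    """Length of the longest strictly increasing (or decreasing) subsequence."""
    best = []  # best[i] = longest monotone subsequence ending at seq[i]
    for i, x in enumerate(seq):
        prev = [best[j] for j in range(i)
                if (seq[j] < x if increasing else seq[j] > x)]
        best.append(1 + max(prev, default=0))
    return max(best, default=0)


def has_long_monotone(seq, n):
    """True if seq has a monotone subsequence of length n + 1 in either direction."""
    return (longest_monotone(seq, increasing=True) >= n + 1
            or longest_monotone(seq, increasing=False) >= n + 1)


n = 2

# Check every ordering of n^2 + 1 = 5 distinct numbers (120 cases).
all_ok = all(has_long_monotone(p, n) for p in permutations(range(n * n + 1)))
print(all_ok)

# With only n^2 = 4 numbers the claim can fail, so n^2 + 1 cannot be lowered.
tight = [2, 1, 4, 3]
print(has_long_monotone(tight, n))
```

The second check is the more instructive one for a reader: it shows that the count n^2+1 in the problem statement is not arbitrary.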

If you like, you can also give a one- or two-line pep talk. For the second example, you can honestly say "The solution to this problem leads to generalizations that have had far-reaching applications in complexity theory."

Next you coach the reader into seeing how one can solve this problem.

Once the first solution is complete, you propose the "next problem," make sure it is understood, and then you engage its solution. You repeat this same process until you have honestly and completely coached your reader into understanding a sequence of facts and ideas that are genuinely worth knowing.

Why I Like This

If you follow this model, you have a much smaller chance of deluding yourself about what is interesting or not. You also can see what insights are valuable versus what might just be technical clutter.

Also, by giving the solutions as you go, you are forced to be clear and complete. It is an added benefit that you have a solved problem behind you that you can use to provide honest motivation for the "next problem."

This process is modular. You can write it from the beginning to the end without having to do substantial recycling and revising. You can also stop when you have "enough." Or, if one line of thought runs out of steam before you have "enough," you can just start up with another new problem. No one says that they all have to fit together.

Finally, this process lets you focus all of your attention on what you find most interesting and most informative. A paper that has an aim other than to inform and entertain can easily get trapped in issues that an honest person has to see as dull and uninteresting.

Following the Plan

I'll give you more examples of this paradigm in class. I can also work with you to put your material into this form. I think you will find the experience extremely useful. You might even start to feel sorry for those folks who never engaged any alternative to the ragged old P to P+1 model.

Putting Your Material Into This Form

Here you also start with a source paper P, but now the task is to put P into the form of some interesting problems and solutions. Along the way you are sure to have some original insights, but it is fine if these insights are just expositional.

The problem with which you begin will typically be harder to state than the toy examples given above. You may have to give some preliminary definitions, but you'll have to be very careful about this. You must give the reader just the background that is needed for the problem: no more and no less.

I promise that any paper that you honestly find interesting can be profitably recast using the "Problem-Solution" model. Moreover, if you do this reframing in a careful, thoughtful way, you will be astonished at how much more deeply you understand the paper.

Almost without trying, you will have set aside tons of fluff, and you will see that the fluff was not missed in the least.

You are very likely to see that your source paper "just had one trick" and that the trick could be completely mastered once you understood it in a simple, jargon-minimized context. This mastery may have eluded you if you had not gone through this rendering process.

If you happened to find a "two trick" or "three trick" source paper --- well that is both great and unusually lucky!

The process will have helped you to isolate those parts of your source paper that are "purely technical," i.e., the kind of messing around from which no insight ever emerges. These bits are irksome, but sometimes they are unavoidable. You may have to accept some deus ex machina, but please be reluctant to do so.

The real bottom line is that by following this process you will have gotten to the heart of your source paper. This is a very worthwhile thing to do before collapsing back into the P to P+1 world.

Smaller Tricks for an Improved Exposition

Decide on what matters. For example, many inequalities have the form X<CY, where C is a constant. If you can get the best C, then the value of C may matter. If you can't find the best C, then the value of C typically does not matter in the least. If a simple argument gives the inequality with a "bad" C, give that argument first. If it is instructive to work on improving this C, then (and only then) show how the proof can be "squeezed" to get a better C. This two-step process is infinitely preferable to just jumping to the more "cluttered" problem.

Always prefer simplicity over generality --- at least at first. Take your functions to be smooth; take your domains to be nice; take your random variables to have the moments that make life easy; assume that suprema are attained. Do whatever you can to make life easy. Do something non-trivial, then if you think there is still honest content in a more technical version of your problem, you can honestly go back and deal with the more technical problem.

Choose the right level of abstraction. Despite the preceding statement about generality, what we are really after is minimizing clutter and intellectual overhead. If what you are doing works in any complete metric space, it may be clearer to work in a general metric space than to work in some specific metric space. Still, typically we want to keep things as concrete as possible. One can always note at the end that "the same argument works in any metric space."

Optimize the Value of Your Displayed Statements (Lemmas, Theorems, Problems, etc.). In the best of all possible worlds, these displayed statements would be understandable by someone who has read nothing else in the paper. I often get interested in a paper because I see some deeply embedded fact that interests me. I don't have this opportunity if all of the lemmas begin with "Under the conditions of Theorem 6.2...". As a practical matter, we are often forced to give up some modularity, but you should work to keep as much as you can.

Choose Notation Thoughtfully --- And Conventionally --- with a Few Exceptions. I recently read an exposition by an (otherwise) talented expositor who works in differential equations. He had a Markov "transition matrix" that he chose to write so that the column sums were one (but row sums were not). I had to rewrite his argument to understand it. I was not amused. Still, if a field is not ancient (like Markov chains), you should feel free to improve the field's notation.

Avoid Confusing Collisions in Notation. In probability, tau is typically a stopping time. If you use it as a constant, I have to keep reminding myself what you have done. Be aware of the "reserved uses" that certain symbols have in your field. In combinatorics, tau is typically a covering number! As for epsilon and delta, well, if you let them go to infinity, you are beyond redemption!

Smaller Points on Thoughtful Notation:

Don't just choose function names and arguments thoughtlessly. Make conscious decisions. Make the names as memorable and as well-coordinated as you can.

Don't say "Let f(A) denote the area of A." Just use the notation area(A), etc.

If you have f and g and need a "similar pair" consider using phi and gamma. That is, exploit the natural isometry of the Roman and Greek alphabets when you can.

Take care with dummy variables and ranges of integration.

Use the best LaTeX tools and use them wisely. Find out how the pros control spacing, set out matrices, set out exact sequences, use \eqref{}, etc.
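As one possible illustration of the last point, here is a minimal sketch that assumes the amsmath package. The label name eq:transition and the two-state matrix are made up for the example; the matrix follows the usual convention that row sums equal one.

```latex
\documentclass{article}
\usepackage{amsmath} % provides \eqref and the pmatrix environment

\begin{document}

A two-state transition matrix with row sums equal to one:
\begin{equation}\label{eq:transition}
  P = \begin{pmatrix}
        1-a & a \\
        b   & 1-b
      \end{pmatrix},
  \qquad 0 < a,\, b < 1 .
\end{equation}
By \eqref{eq:transition}, the stationary distribution $\pi$ solves $\pi P = \pi$.

A short exact sequence, set with plain arrows:
\[
  0 \longrightarrow A \longrightarrow B \longrightarrow C \longrightarrow 0 .
\]

\end{document}
```

Here \eqref supplies the parentheses around the equation number automatically, and \qquad and \, are two of the standard ways to control spacing inside a display.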

Seeing (and Removing) "Amateurish Arguments". If you look at the work of experienced mathematical writers, you will seldom see a displayed equation of more than a few lines --- no matter how deep or complicated the argument. On the other hand, you will find Ph.D. theses that have multiple instances of displayed equations that cover an entire page. Why? My guess is that the student just banged in the notes for the proof and did not understand that something else was needed. For publication, arguments need to be presented for "informing" --- not just for "checking." Readers are most interested in learning new techniques. They don't get paid to check your assertions. If they sniff something wrong, they won't dig into your argument; they'll start working on a counter-example. That is their shortest path to P+1.

Don't use words that are not defined. This rule is brutally violated in much of the applied mathematics literature, and people get by with it because there is a shared culture. Still, when one is careful about definitions, good things happen mathematically. In particular, one often discovers that an undefined word reveals a "collapsed distinction" --- a situation where one word is being used for many mathematical objects that are technically (or even conceptually) different. Sometimes we benefit from being vague, but to get your license for occasional use of vague words, you have to pay lots of dues. Much more often we benefit from drawing and refining distinctions. Synthesis is almost always easy; it's analysis that pays the rent.

Don't Tell the Reader Everything. When you study a problem deeply, you will discover many things that will simply confuse (or bore) most readers. Such discoveries may have cost you a lot of time, and they may be instructive to (some) experts. Still, you have to understand that such fine points are deadly dull to most readers. If you stuff them into your paper, you can spoil an otherwise excellent piece of work. Writers of fiction and non-fiction often find that to make their story move well they have to delete many paragraphs that they loved when they first wrote them. Good writers have the courage to delete such paragraphs. They call this part of the revision process "killing your babies" because it is so brutally hard.

Big Picture

I do hope that you can commit to writing a clear and interesting report. I am sure that everyone in the class is capable of writing one that would be a delight for me to read. It takes some investment of time, but the investment will pay dividends.

The hardest part of the whole process is to have the courage to be honest with yourself about what really is clear and interesting.