There are two options: you can write an essay related to a topic covered
in class, or you can solve a problem that requires mathematical thinking.
In the list below, the first category is marked with (Essay), the second with (Math).
Regardless whether you take a more mathematical project or
an essay, it is important to not just rehash things you found, but
to submit a serious effort on your part.

For essays, we expect a text of about 3,500 words (or more!).
(If you have the wonderful gift of writing succinct yet an intelligent, cogent essay that illuminates many aspects of the topic you picked, you are welcome to hand in a shorter essay. Experience has taught us, however, that students who feel they need much less space typically need to do some more research. Whatever you do, if your text is shorter than our expectations, don't start padding it, PLEASE.)

For mathematical projects we want to see something that is related to
the material covered in class, but was not actually covered in the
class or in the labs. Again, we ask that you prepare a substantial
report, explaining what the problem is you looked at, how it fits in
with a topic we saw in class, and then explaining clearly what you
figured out, and how you did it. If there are some conclusions to
be drawn, give them! There is no length minimum for a mathematical project.

You can also do something that is intermediate, where you need to
explain some mathematical concept to make your point, and where you then
go on to a more general discussion. In this case our length expectations apply again.

If you pick a topic not on the list below, you MUST clear it with Professors
I. Daubechies or S. Hughes BEFORE the midterm break.

In every case, you MUST mention all the sources you have used for your paper.

Pseudo Random Digits Generation (Math).
Very often, scientists need random numbers for their computer programs,
for example, when they are simulating random processes that occur in
nature. Since computers don't actually do anything randomly, it is
impossible to actually get a random number from a computer. But for many
applications, a set of numbers that seem random are good enough, so we
settle for that. A computer does possess a pseudorandom number generator,
which is just a program that generates numbers that seem random in many
respects, but are not, in fact, random.

Let us use modular arithmetic to try to generate a sequence of random
digits. We will use the prime number p = 1069 in our computation. We will
use the number 342 as our "seed", or starting point. Now we generate the
our first three digits. We calculate seed * seed (modulo p) = 342 * 342
(mod 1069) = 443. Then we generate the next three digits by multiplying
the result 443 by the seed 342 modulo p, and we will get 777. Then we
continue: we multiply each result by the seed modulo 1069 to get the next
3-digit entry of the sequence. In case we obtain a number below 100, we
add leading zeros to get three digits total. In case we get a number more
than 999, we disregard this number and continue (that is, we do not use it
in our sequence, but we still multiply it with 342 to obtain the next 3-digit number).

Explain why we want to add leading zeros to numbers with fewer than
3-digits, and explain why we disregard numbers with more than 3 digits.
That is, explain why the digits would be less randomly distributed if we
didn't follow these procedures.

Explain why this pseudo random digit generator will always end up
generating digits in cycles. (That is why it is a *pseudo*-random digit
generator.)

Try different seeds. Find seeds that generate cycles of different lengths.
Explain how we should try to choose seeds to make the cycle longer.
Explain which seeds we need to avoid, so as not to get easy patterns.

Wilson's Theorem.
Let p be an integer greater than one. p is prime if and only if (p-1)! = -1 (mod p).

This beautiful result is of mostly theoretical value because it is relatively difficult to calculate (p-1)!
In contrast it is easy to calculate ap-1,
so elementary primality tests are built using Fermat's Little Theorem rather than Wilson's Theorem.
For example, the largest prime ever shown prime by Wilson's theorem
is most likely 1099511628401 (as far as we know), and even with a clever approach to calculating n!,
this still to about one day on a SPARC processer. But numbers with
tens of thousands of digits have been shown prime using a converse of Fermat's theorem in a less than an hour.

Explain why Wilson's theorem is true.
Make an estimation of how much time proving primality with Wilson's theorem can take.

There are several primality tests there which are good for special case for the
number n (the primality of which we would like to prove). Pick one of the test and explain how it works.

There are some probabilistic tests there. Pick one of them and explain how it works.

Search the Internet and find the biggest known prime. How many digits does it have? What is special about this number?

Explain why the biggest known prime has much more digits than one of the numbers
nobody can factor with present-day factoring algorithm.

Secret Sharing Schemes (Essay).
Imagine the government trusted you and 15 other people to do a secret
project this Saturday. To enter the extreme-security building you work in,
you and your co-workers must enter a secret number, call it X, into a
computer on the door of the building. You get a secret key (which is
different from X), and your co-workers get their own secret keys, but the
government doesn't trust any one of you to enter the building alone, so
for security reasons, you must get together with at least 10 people with
different keys (one or two of the 15 people might be sick, so you'll still
be fine without them) and do a computation together to figure out the
secret key to the door. The computation won't work without all 11 keys.
What sort of computation should the government set up for you to do?

"Secret sharing schemes were discovered independently by Blakley and
Shamir. The motivation for secret sharing is secure key management. In
some situations, there is usually one secret key that provides access to
many important files. If such a key is lost (for example, the person who
knows the key becomes unavailable, or the computer which stores the key is
destroyed), then all the important files become inaccessible. The basic
idea in secret sharing is to divide the secret key into pieces and
distribute the pieces to different persons in a group so that certain
subsets of the group can get together to recover the key."

The security scheme of Shamir is based on polynomial interpolation. Please
describe in detail how and why this scheme works, and give an example of a
set of 4 keys which together encodes a message using Shamir's scheme. If
any of these keys are missing, one should not be able to decode the
message! For some good descriptions of this scheme, please check out:

"The good news about cryptography is that we already have the algorithms
and protocols we need to secure our systems. The bad news is that that was
the easy part; implementing the protocols successfully requires
considerable expertise. The areas of security that interact with people --
key management, human/computer interface security, access control -- often
defy analysis. And the disciplines of public-key infrastructure, software
security, computer security, network security, and tamper-resistant
hardware design are very poorly understood. Companies often get the easy
part wrong, and implement insecure algorithms and protocols. But even so,
practical cryptography is rarely broken through the mathematics; other
parts of systems are much easier to break. The best
protocol ever invented can fall to an easy attack if no one pays attention
to the more complex and subtle implementation issues. Netscape's security
fell to a bug in the random-number generator. Flaws can be anywhere: the
threat model, the system design, the software or hardware implementation,
the system management. Security is a chain, and a single weak link can
break the entire system. Fatal bugs may be far removed from the security
portion of the software; a design decision that has nothing to do with
security can nonetheless create a security flaw."

Please write a 10 page essay describing 5 or 6 incidences that involve
security flaws in computer systems. Please describe in detail how the
hacker broke into the system, and whether or not these break-ins were due
to mathematical flaws in the encryption scheme. Please try to focus your
discussion away from specific hacker's personal lives and try to
concentrate on the way these hackers were able to bypass security systems
by finding flaws in the current system. Try to talk mostly about technical
problems with the systems and touch only briefly on social aspects.

Just to satisfy your curiousity, one hacker you might want to read about
(but only briefly mention in your essay!) is Kevin Mitnick, AKA Condor. He
did a lot of hacking by tricking people into giving him passwords and
codes. It's generally known that the weakest link in security are the
employees, not the computers! Here's a little paragraph about him from the
following website: http://www.takedown.com/bio/index.html

"That break-in occurred over Memorial Day weekend in 1981, when Kevin and
two friends decided to physically enter Pacific Bell's COSMOS phone center
in downtown Los Angeles. COSMOS, or Computer System for Mainframe
Operations, was a database used by many of the nation's phone companies
for controlling the phone system's basic recordkeeping functions. The
group talked their way past a security guard and ultimately found the room
where the COSMOS system was located. Once inside they took lists of
computer passwords, including the combinations to the door locks at nine
Pacific Bell central offices and a series of operating manuals for the
COSMOS system. To facilitate later social engineering they planted their
pseudonyms and phone numbers in a rolodex sitting on one of the desks in
the room. With a flourish one of the fake names they used was "John
Draper," who was an actual computer programmer also known as the legendary
phone phreak, Captain Crunch, the phone numbers were actually misrouted
numbers that would ring at a coffee shop pay phone in Van Nuys."

"Digital watermarking, sometimes called "fingerprinting," allows copyright
owners to incorporate into their work identifying information invisible to
the human eye. When combined with new tracking services offered by some of
the same companies that provide the watermarking technology, copyright
owners can, in theory, find all illegal copies of their photos and music
on the Internet and take appropriate legal action."

Please create your own "digital watermark" by altering 4 images in a
way which is imperceptible to the human eye (unless one knows where to
look), so that the new image contains your watermark. The watermark must
have the following properties:

One should not be able to identify the presence of the
watermark, even if one zooms in very closely (unless of course, they know
how you did it).

If one alters the image slightly to try to remove the
watermark,
they should not easily be able to do it.

Please write an essay explaining exactly how you implemented this
watermark (please describe your algorithm in detail) and describe how you
would test whether or not an image had been watermarked by you. Also,
please tell us how one might destroy your watermark in a way that would
only alter the image slightly (perhaps by changing the brightness, size,
adding random noise, rotating the image, or other). In other words, how
could someone destroy your watermark even if they didn't know what it was
(does it have a weakness)?

Currently, the game is set up so that the first player can always win
using the strategy (if the amount of objects in the three piles do not
parity add to 0 at the start of the game). Please answer the following
questions about Nim.

1) True or false: If there were 4 piles instead of 3, the first player
could still always win. If it's true, explain why, if it's false, give a
counterexample of a game where the first player always tries to play by
the strategy but still loses.

2) True or false: If there were 3 players instead of 2, and assuming
players 2 and 3 don't play by the winning strategy described and assuming
players 2 and 3 both want to win, the first player almost always loses if
she or he plays by the strategy. Explain why or why not.

3) If we change the rules a little bit, so that the last player who picks
up a piece loses (instead of wins), is it possible to alter the first
player's strategy so that she or he can still win?

The webpage http://www.cut-the-knot.com/front.shtml actually has a lot of
games that are Nim in disguise! Pick 2-3 of these games (i.e. Nimble,
Northcott's Game, Plainim, Scoring, or Turning Turtles) and describe in
detail why these games and their winning strategies are the same as Nim.
(You must describe the winning strategy!) Also, for each game you pick,
play a sample game between you (who knows the winning strategy) and an
opponent (who doesn't know the strategy), describing every move and how
you knew to make that move. Draw a diagram of the game board after every
move that is made to show us what you mean, and after each diagram, write
down how you knew to make that particular move.

Error Correction and Compression

Hamming Codes (Math).
In class we discussed the 4-7 Hamming code, and we used the rules (as inspired by Venn diagrams in the movie):

There is also a variant on this construction, on which one uses a slightly different correspondence.
In this variant, a 4-bit sequence d1 d2 d3 d4 gets translated to the 7-bit codeword d1 d2 d3 d4 d5 d6 d7 with:

d5 = d1 d2
d4

d6 = d1 d3
d4

d7 = d2 d3
d4

The advantage of this method is that there is a very neat way to check whether a given 7-bit
string is a codeword or not; if it isn't, it also
gives a neat trick to find the digit that should be changed to revert back to a codeword.
Here is this trick: given d1 d2 d3 d4 d5 d6 d7,

1) Compute the 3-digit string c1 c2 c3 as follows:

c1 = d4 d5
d6
d7

c2 = d2 d3
d6
d7

c3 = d1 d3
d5
d7

2) If c1 c2 c3 turns out to be 000 then d1 d2 d3 d4 d5 d6 d7 is a codeword.

3) If c1 c2 c3 is not 000, then it gives, in binary, the label of the d-bit to be changed.
For instance, if c1 c2 c3 is 101,
then d5 (and only d5) should be changed (because 101 is the binary notation for 5); if c1c2c3 is 011,
then d3 (and only d3) should be changed (because 011 is the binary notation for 3 -
the leading zero could be disregarded).

The assignment for this midterm topic is to

check that this alternate 4-7 Hamming code, with its neat correction trick, works.

this way of Hamming coding is really constructed by requiring that the trick with the c1 c2 c3 works,
and deriving what then the formulas for the d5, d6, d7 (and c1, c2, c3) should be.
Use the same idea to construct the 11-15 Hamming code, with the corresponding neat correction procedure.

Miraculous Compression (Essay).
In April of 1992, a small technology company announced that they had
developed a compression algorithm that could compress all files of more
than 64K in size to about 1/16th their original size. They claimed that
the compressed output file created by their algorithm can be used as
the
input file to subsequent executions of the program. In their words:

This feature of the utility is known as recursive or iterative compression, and
will enable you to compress your data files to a tiny fraction of the original size.
In fact, virtually any amount of computer data can be compressed to under 1024 bytes using [the
method] to compress its own output files muliple times. Then, by repeating
in reverse the steps taken to perform the recusive compression, all original data can be decompressed
to its original form without the loss of a single bit.

Sounds like a great idea. Try it with the LZ applet used in the
Compression Lab and with a standard compression program (e.g., WINZIP).
That is, compress a text/file and then try compressing the compressed
text/file. For the applet, this can be done by copying the compressed
text from the compression window into the (cleared) input text window. Does
the compression work a second time? Try the process again with the doubly
compressed text/file. And again. What happens?

There is a fundamental mathematical principle at work here. This principle
is one of the foundations of a branch of mathematics called Information Theory.

It is possible to show (using the mathematics you've learned in Math
Alive), that the 1992 claim is clearly false. Let us assume the claim is
that there exists an algorithm which can compress every possible string of
64K = 64*1024*8 bits = 524,288 bits into a string of just 64*1024*8/16
= 32,768 bits. We can show that this is impossible using a simple
counting argument.

Despite the fact that we can show such claims as the ones made by the
company are ludicrous, they are made with alarming regularity.
In fact, a number of U.S. patents have been issued for schemes which we
can prove could never work. For example, in 1996, patent #5533051 entitled
"Method for data compression" was granted which claims a method "for
reducing an input string by one bit regardless of the bit pattern of
the input string." This can also be proven false via a counting argument.

In your paper,

Show that the company's claim from 1992 cannot be true.

Show that the 1996 patent's claim cannot be true.

Discuss why it is difficult to compress random data as compared to English text.

Experiment with the LZ applet (or WINZIP) compressing large English
texts and estimate the redudancy of English text encoded using
ASCII. Describe your experiments and justify your estimate.

Estimate the redundancy of other languages (e.g., French, Spanish,
German, Hebrew) using the same technique. Which language did you
find is the most redundant? Why do you think this is the case?

Define in your own words what "entropy" means with regard to
information theory (the listed reference will be useful) and
discuss how it relates to the redundancy of a language.

Describe some of the techniques used to estimate the redundancy or
entropy of English in the listed reference and relate those
estimates to your estimates of redundancy.

The following reference will be useful:
Silicon Dreams, Robert Lucky

Bar Code Schemes (Math).
Find some bar code schemes that we did NOT discuss (and that are
not written up in the notes), and try to find out how they work. Don't
forget to discuss how effective the check digit system is, for several
types of mistakes! You could also find another check digit system, figure
out what the formula is for the check digit, and discuss its effectiveness.

Probability and Statistics

"Chance" (Essay).
"Chance" is a magazine about statistics and its uses in society, from government issues to sports
(http://www.public.iastate.edu/~chance99/).
The feature articles demonstrate the use (and sometimes misuse) of statistical analysis of data in the social,
biological, physical and medical sciences.

Choose a topic that interests you from one of the "Chance" articles. These
articles are often followed up with Letters to the Editor offering counterpoints,
as well as rebuttals from the original authors. Follow one such chain of exchanges.
Consult the original cited sources, and write your own Letter to the Editor.

To illustrate the wide range of topics covered, here is a list of the
feature articles in issues of a few years ago:

STATISTICS FOR HOMELAND DEFENSE by David L. Banks

DOES HAVING BOYS OR GIRLS RUN IN THE FAMILY (Fall 2001) by Joseph
Rodgers and Debby Doughty

SURPRISES FROM SELF EXPERIMENTATION by Seth Roberts with discussion
by
Robert Rosenthal and Donald Rubin

THE EFFECT OF ADMISSIONS TEST PREPARATION: EVIDENCE FROM NELS88 by
Derek Briggs

Case 1. As chance would have it, you neither have, nor have access to, any coins. You do, however, have a fair die.
Explain how you would use the die to make your decision.

Case 2. You have one coin, which is biased in such a way that tails occur with probability close to 49%.
You would like to use it, but you do not want to flip it more than twice.
What can you do to make your probability much closer to 1/2, than the original 49%?

Case 3. You have only one coin, which is biased in such a way that tails occur with probability 1/3.
Can you use this coin? Explain how. Can you make the probability exactly 1/2?

Case 4. You have only one coin, which is, however, biased in such a way that tails occurs with probability 1/4.
Do you think you could use this coin to help you make your decision? If you can, explain how;
if you can't explain why.

Case 5. You have one coin, which is biased in such a way that tails occur with probability p. Can you use this coin?
Explain how.

Case 6. You have a biased coin, and you do not know the probability with which tails occur.
How can you use the coin to create the probability close to 1/2?

Case 7. You have a biased coin, and you do not know the probability with which tails occur.
You need to design a way to use this coin that allows a yes/no decision, where each alternative (yes or no)
has probability exactly 1/2. Because this may not be possible, you are willing to accept a plan in which
with low probability you cannot make the decision at all, but if you make it you will do so with
probability exactly 1/2 for each alternative.

Case 8. In another situation you have a perfect coin, but you need to make one decision out of three.
That is, you want to use it to generate probability 1/3 (you are allowed to make many flips). Can you do that? Explain.
Can you generate any fixed probability with one coin? Explain. Describe the probabilities that you can generate exactly.

Case 9. A classical puzzle is to design ways to "toss a fair coin over the phone".
That is, Alice and Bob agree that one of them will be in charge of a chore that neither really likes.
If they were in the same room, they would toss a coin for it, but they are at opposite sides of Princeton and
have to reach a decision over the phone. Bob first proposes to toss a coin and to tell Alice, but Alice doesn't
trust that he will relay the answer truthfully, and she refuses to use this method. Bob next proposes the following:
he will open the Princeton White Pages phone book for this year, he will read a phone number to her,
and she has to reply immediately whether it is on a left or right page. Bob then tells her the name corresponding
to the phone number, and she can look up for herself whether she was correct or not.
If she is correct, she wins the coin toss, if not, she loses. What do you think of this protocol?,
That is, would you use it to toss a coin over the phone? Is there a possibility for either Bob or Alice to cheat?

The Placebo Effect (Essay).
In 1955 Henry Beecher published a paper in the The Journal of the
American Medical Association entitled "The powerful placebo" which
essentially claimed that one third of people who take a suger pill
(but are told the pill is an effective drug) will report an improvement
in their condition. This originated the idea of the "placebo effect"
and for decades this effect was accepted as scientific fact.
Recently, the "placebo effect" has come under attack by a number
of physicians who claim that the "placebo effect" is simply due to a
misinterpretation of data, and may truly be just in our heads (and
not exist at all).

In your essay discuss the arguments for and against the "placebo effect".
What statistical analysis led to the "placebo effect"? What analysis
led to the questions of whether the "placebo effect" actually exists?
What lurking variables may cause the improvements often attributed to the
"placebo effect"? Can you think of any study that would resolve the issue?

You may find the following three pdf files and webpage useful (in
addition, the web has a great deal of information on this topic):

Lottery (Math).
The government of Chandan conducts a lottery every week. You can buy a card for one dollar listing the numbers from 1 to
49. You can cross out 3, 4, 5, or 6 numbers and submit your card.

When all the cards are collected the government takes 50% of all the money paid.
The rest of the money goes to a pot to be divided among the winners. The winners are decided during a
TV show when an innocent
child picks randomly 6 balls from a pile of balls numbered 1 through 49. If all the numbers you crossed
belong to the set of 6 numbers drawn, you are a winner.

What is the probability to become a winner if you cross out 3 numbers? Or 4? Or 5? Or 6 numbers?

The money in the pot is distributed among the winners in such a way that those who guessed 3 numbers
receive 10 times less than those who guessed 4, 100 times less than those who guessed 5 and 1000 times
less than those who guessed 6.

What is your expected return on each dollar you invest in playing this lottery if you
cross out 3 numbers? Or 4? Or 5? Or 6 numbers?

Many people are playing this lottery every week. There are always winners who selected 3 or 4 numbers.
But winners who select 5 or 6 numbers do not occur every week. Suppose that you know that this week nobody
guessed 6 correct numbers. What is your expected return on 3, 4, or 5 guesses? Answer the same question
if nobody wins on 5 guesses. What about when nobody wins on either 5 or 6 guesses?

The proportion of people who guess 3 numbers varies very much from week to week. Sometimes it is one out of 50,
sometimes it is one out of 500. Can you explain how can this happen?

A similar lottery was run for a while in Russia. Some players had found a way to make a living of this lottery
(without cheating). When the government noticed that some people won far too often, they started investigation
because they suspected mischief. As soon as the strategy of the frequently winning players became public, it stopped
being effective. Can you find an explanation for both the strategy and its loss of effectiveness.

Find some big lottery in the US with a jackpot. Describe it. Describe the probabilities of
winning with different guesses. How large should the jackpot be at the closing time
so that your expected return would be more than 100%?
Does it depend on the number of people playing that time? Suppose you buy millions of cards and cover all
possible combinations of numbers. For which jackpots would this be profitable?
Does this depend on the number of winners? For which jackpots is it profitable when you are the only winner?
When is it profitable if there are 2 winners? 3 winners? Can you estimate the number of winners?

Of course other suggestions you may have would be very welcome! Several
of you have contacted me with great proposals for topics, and if anyone
else still wants to do so as well, please get in touch with me!

As announced in class, the absolutely firm deadline for handing in the
midterm papers is at noon on Friday, March 27.