In which the author ponders the question, "If you admit that you are a hypocrite, are you really a hypocrite?" He then provides his honest commentary on a number of fascinating topics. He insists, however, that his readers form their own opinions.

My simulation matches up the teams that play in the NCAA bracket and uses one of the schemes below to generate a probability for a Monte Carlo simulation of games between the teams.
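As a rough illustration of the overall idea, here is a minimal Python sketch of a single-elimination Monte Carlo bracket. It is not my actual simulation code: the function names, the four-team field and the coin-flip probability function are all made up for the example, and it assumes the field is listed so that adjacent teams meet in the first round.

```python
import random
from collections import Counter

def simulate_bracket(teams, win_probability, rng):
    """Play one single-elimination bracket and return the champion.

    Assumes len(teams) is a power of two and that adjacent entries
    play each other in the first round (hypothetical layout).
    """
    field = list(teams)
    while len(field) > 1:
        next_round = []
        for i in range(0, len(field), 2):
            team_1, team_2 = field[i], field[i + 1]
            # Team 1 advances when the random draw falls below its win probability.
            if rng.random() < win_probability(team_1, team_2):
                next_round.append(team_1)
            else:
                next_round.append(team_2)
        field = next_round
    return field[0]

def count_champions(teams, win_probability, n_sims=1000, seed=None):
    """Repeat the bracket n_sims times and count how often each team wins it all."""
    rng = random.Random(seed)
    return Counter(
        simulate_bracket(teams, win_probability, rng) for _ in range(n_sims)
    )

# Made-up four-team field where every game is a coin flip.
teams = ["A", "B", "C", "D"]
counts = count_champions(teams, lambda t1, t2: 0.5, n_sims=1000, seed=1)
print(counts.most_common())
```

Any of the probability schemes can be plugged in as the `win_probability` argument, which is the only part that changes between them.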

Probability scheme 1: Sagarin ratings only

The first simply uses the Sagarin ratings to create a probability of team 1 winning: Probability = Team 1 Sagarin / (Team 1 Sagarin + Team 2 Sagarin). I use the Predictor Sagarin rating because that is what he suggests for predicting the score and outcome of a game. If a random number from 0 to 1 is less than the probability above, team 1 wins; otherwise it's team 2.
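The formula and the random draw can be sketched in a few lines of Python. The ratings of 95 and 70 are invented for illustration and are not real Sagarin numbers; the function names are my own as well.

```python
import random

def sagarin_win_probability(rating_1, rating_2):
    """Probability that team 1 wins: rating_1 / (rating_1 + rating_2)."""
    return rating_1 / (rating_1 + rating_2)

def simulate_game(rating_1, rating_2, rng=random):
    """Return 1 if team 1 wins the simulated game, otherwise 2."""
    p = sagarin_win_probability(rating_1, rating_2)
    # A uniform draw below p means team 1 wins.
    return 1 if rng.random() < p else 2

# Invented ratings for a strong team and a much weaker one.
p = sagarin_win_probability(95.0, 70.0)
print(round(p, 3))  # 0.576
```

Even with a 25 point gap between the made-up ratings, the favorite only gets about a 58% chance, which hints at why this scheme keeps games so close.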

I calculated every team's probability of winning vs. every other team and then plotted this vs. the difference in seeds. A -15 means a 1 seed played a 16 seed. This scheme results in probabilities that only vary from about 58% for matchups with a seed difference of 15 down to about 50% for even seeds. However, no 16 seed has ever beaten a number 1 seed, so this scheme leaves the games too evenly matched and does not reflect the history of outcomes in the tournament.

Simulation results with this scheme show the number of simulations out of 1000 that a given seed was the champion. The actual history is here. The results in the chart show far too high a probability that lower-ranked seeds (those with high seed numbers) are the champion in the tournament in these simulations.

A histogram of the teams with seeds and the number of times they are champions in 10,000 simulations shows that Kansas is the most likely winner, but the spread of the data even includes the unlikely play-in winner at the 16 seed as a champion. This simulation is unrealistic.

Probability scheme 2: Seed difference and tournament history only

Another approach is to use the seeds of the teams in the tournament. With 25 years or so of data I captured the number of times a favorite beat an underdog based on the seed difference. For instance, a 16 seed has never beaten a 1 seed, while 8 vs. 9 seeds are almost 50/50. I use the data from 25 years of the round of 64, round of 32 and round of 16 and then fit a line assuming that even seeds are 50/50 and that a seed difference of 15 (1 vs. 16) will result in a favorite win 99.07% of the time. That represents 1 upset in 108 games. Though this upset has never occurred in 26 years of data, it will happen someday, and that could be as soon as this year, so I count one upset among this year's four 1 vs. 16 games. Thus (26*4+3) wins / (27*4) attempts is 99.07%.
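The arithmetic and the pinned line can be written out as a short Python sketch. It assumes the line is simply the straight line through the two anchor points (50% at even seeds, 107/108 at a seed difference of 15) and uses the sign convention from the plot above, where the seed difference is team 1's seed minus team 2's seed. The function name is made up for the example.

```python
FAVORITE_WINS = 26 * 4 + 3   # 107 favorite wins, counting one assumed upset
ATTEMPTS = 27 * 4            # 108 games between 1 and 16 seeds
P_MAX = FAVORITE_WINS / ATTEMPTS
MAX_SEED_DIFF = 15

def seed_win_probability(seed_1, seed_2):
    """Probability that team 1 wins, based only on the seed difference.

    seed_diff = seed_1 - seed_2, so -15 means a 1 seed playing a 16 seed.
    The line passes through 50% at a difference of 0 and P_MAX at -15.
    """
    seed_diff = seed_1 - seed_2
    return 0.5 - (P_MAX - 0.5) * seed_diff / MAX_SEED_DIFF

print(f"{P_MAX:.2%}")  # 99.07%
```

With this line an 8 vs. 9 game comes out only slightly better than a coin flip for the 8 seed, while a 1 vs. 16 game is nearly certain.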

I did not use the fitted line in the curve above because of its unrealistic probabilities at high seed difference. While this approach captures the history, I feel it neglects the variation between similarly seeded teams as reflected in the Sagarin ratings. Additionally, the history shows pretty wide variations in outcome.

Simulation results with this scheme show the number of simulations out of 1000 that a given seed was the champion. These results are more similar to the historical outcomes, but the matchups between evenly seeded teams will be tossups that ignore the differences as determined by the Sagarin ratings.

A histogram of the teams with seeds and the number of times they are champions in 10,000 simulations shows that Kentucky is the most likely winner, with low seeds favored to be champions, but I fear that it neglects the difference in teams as represented by the Sagarin ratings. This simulation is unrealistic.

The final approach combines the two by scaling the average of the Sagarin ratings probability by the expected probability due to seeds as predicted by historical performance. Thus we make sure the average probability for teams at a given seed difference follows the history. In practice I add the residuals of the line fitted through the Sagarin rating probabilities to the line fitted by setting the 15 difference probability to 99.07% and the even difference to 50%.

Thus the probabilities reflect the historical data with a more realistic and very rare chance of 16 seeds beating 1 seeds but with the Sagarin ratings to sort between evenly matched teams.
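One way to read the residual step is sketched below in Python. It assumes an ordinary least-squares line through the Sagarin probabilities plotted against seed difference, and it clamps the result to the range 0 to 1, which is my own addition and not something described above. The matchup numbers are invented and the function names are hypothetical.

```python
P_MAX = 107 / 108  # favorite's win probability at a seed difference of 15

def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def seed_line(seed_diff):
    """Historical seed line: 50% at even seeds, P_MAX at a difference of -15."""
    return 0.5 - (P_MAX - 0.5) * seed_diff / 15

def combined_probabilities(matchups):
    """matchups: list of (seed_diff, sagarin_probability) pairs for team 1."""
    diffs = [d for d, _ in matchups]
    probs = [p for _, p in matchups]
    slope, intercept = fit_line(diffs, probs)
    combined = []
    for d, p in matchups:
        # How far this matchup sits above or below the Sagarin trend line.
        residual = p - (slope * d + intercept)
        # Move that residual onto the seed line; the clamp is an assumption.
        combined.append(min(1.0, max(0.0, seed_line(d) + residual)))
    return combined

# Invented Sagarin probabilities at seed differences of -15, 0 and 15.
matchups = [(-15, 0.575), (-15, 0.565), (0, 0.51), (0, 0.49),
            (15, 0.425), (15, 0.435)]
for (d, p), c in zip(matchups, combined_probabilities(matchups)):
    print(d, p, round(c, 4))
```

Because least-squares residuals average out to zero overall, the combined probabilities stay centered on the seed line while the Sagarin ratings still separate teams with the same seed difference.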

Simulation results with this scheme show the number of simulations out of 1000 that a given seed was the champion. The results are similar to the seed difference with history scheme above, but now the Sagarin ratings are included.

A histogram of the teams with seeds and the number of times they are champions in 10,000 simulations shows that Duke is the most likely winner, and low seeds are still favored, as is true historically. This is the simulation scheme we will proceed with.

Saturday, March 13, 2010

This manhole cover had water shooting out of it today due to the large amount of rainfall. It is right next to Shellpot Creek at the lowest part of the valley so I can see why water might be under some pressure when it finally gets there.

UPDATE (3-21-2010): Apparently water shooting out of the manhole cover means that it is broken. That week we saw crews examining the manhole cover, and they appear to have replaced it along with its cement collar.

Clicking the test button opens a small window in front of the webpage and first asks for your address, but not e-mail or name. The privacy statement says that it will record the IP address as well.

Then it runs a download, upload, latency and jitter (variation in latency) test. I was never able to get it to go past the latency test.

It would fail to show the jitter results and then show what looks like the Java download page through the window of the test. Since it is only a beta, perhaps it will be fixed when the final version is out.

Since Verizon FiOS promises 20 Mbps down and 5 Mbps up, it appears I am getting what I paid for.

Wednesday, March 10, 2010

I think these are called Truck Nutz. I was surprised, and perhaps I show my prejudices, to see this truck in the Brandywine Hundred turning off of Marsh Road to someone's house. Does someone with this poor taste in truck accessories live there? We will never know.

Yesterday at 2:30pm I saw this accident at RT 141 and RT 100 just south of the Tyler McConnell bridge. I was travelling northbound to go over the bridge, but this person looked like they had just come over the bridge and was travelling southbound. The front is all smashed up, but I did not see another car with corresponding damage. I am guessing that they hit someone turning left onto RT 100 in front of them.

State troopers and County police were already there. Perhaps they had moved the other car already. This is a bad intersection to make a left at going northbound unless you have the green turning arrow. People also like to stretch the light and go through when it has just turned red.

Sunday, March 07, 2010

Boing Boing points to a post about obsolete professions and highlights one called the lector, a person who read newspapers to cigar rollers. When we were in Key West on vacation in October of 2007 we went to the Key West Museum of Art & History in the Custom House, where we saw a painted wood carving called "Old Island Days No. 99 - A Fabulous Industry" which depicted cigar rollers being read to by el lector.

From the artwork description:

Old Island Days No. 99 - A Fabulous Industry

This painting shows a section of the Edwardo Gato Cigar Factory where hundreds of cigar makers created elaborate cigars sold around the world. Pedro Sanchez, the artist's father, is the reader, el lector, standing on a platform called la tribuna. He entertained and educated the workers by reading newspapers in the morning and famous novels in the afternoon. Each worker paid Pedro Sanchez 25 cents per week.

I also appreciated the detail in the signs on the wall describing the first flight from Key West to Cuba...

Monday, March 01, 2010

I have not finished watching all of the Olympic women's curling on my DVR, so do not tell me the results. At one point I had 12 hours on there, and I am sure I have watched 24 hours' worth, compressed down to between 50% and 33% of the running time with aggressive skipping (a curling pun if you get it). Part of the reason you can watch three hours of curling in one is that NBC tends to cut about 4 to 8 rocks out of each end, usually the first ones, and fill that time with commercials, on top of the commercials in the regular breaks. I still have to do some skipping ahead during the early part of each end and through timeouts because, much as I would like to, I do not have the time to watch it all the way through.

Here are some cartoons to keep the Olympic curling fun alive a few more days.