Sunday, August 13, 2017

Most of the books I read are recently published, but occasionally I read an older book to continue my education, as I like to put it. This week I read the 1921 novel We by Yevgeny Zamyatin, this edition translated by Natasha Randall in 2006. Some reviews call it one of the first dystopian novels, but that isn't accurate. After Edward Bellamy's successful 1888 utopian novel Looking Backward: 2000-1887, both utopias and dystopias became the fashion for many years to come. Most of these books and their authors are now forgotten, but English-language readers will still recognize 1895's The Time Machine by H.G. Wells, and will know the name of Jack London, though his 1908 book The Iron Heel, about the future dystopian struggles leading to utopian socialism, is not nearly as popular as his thrilling boys' adventure stories.

The book We is compact, but for me it was hard to keep focused. It is written in the first person, so the narrator has to describe what is everyday to him (or her) in ways that will make sense to people like his readers who have never seen this world. There are other such books, but it's a tricky proposition. Zamyatin decides to describe things in mathematical terms and colors. As a mathematician, I find that much of his mathematics irritates the hell out of me. For example, his narrator D-503, a mathematically trained engineer in charge of building an interplanetary spaceship, has a particular distaste for the square root of -1, which he calls "irrational". We usually call this number imaginary, but technically he is right. The number we often call i is not the ratio of two integers. Mathematicians call it algebraic.

What irritates me more is that an engineer should know that this very odd idea is of immense practical value in electrical engineering. Using both real and "imaginary" numbers together creates complex numbers, and this system very cleanly represents the physical fact that electric currents naturally produce counter-currents that run in a perpendicular direction. Electrical engineers find this idea so useful that they call the square root of -1 j instead of i, so any reference to imaginary is erased. The great mathematician Gauss hated that "imaginary number" was already stuck in the mathematical vocabulary even in his era a century before Zamyatin, and wanted positive to be replaced by direct, negative by inverse and the imaginary directions to be called lateral and inverse lateral. It is completely possible Zamyatin was never taught this.
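For anyone who wants to see the perpendicular idea in action, here is a minimal Python sketch. Python happens to side with the electrical engineers and writes the square root of -1 as 1j. The function name quarter_turn and the variable names borrowed from Gauss's vocabulary are my own labels for illustration, not anything taken from Gauss or from an engineering text.

```python
import cmath
import math


def quarter_turn(z: complex) -> complex:
    """Multiply by the square root of -1, which rotates z by 90 degrees."""
    return z * 1j


direct = 1 + 0j                          # the positive ("direct") direction
lateral = quarter_turn(direct)           # one quarter turn
inverse = quarter_turn(lateral)          # two quarter turns
inverse_lateral = quarter_turn(inverse)  # three quarter turns

print(lateral, inverse, inverse_lateral)
print(math.degrees(cmath.phase(lateral)))  # angle of the lateral direction in degrees
```

Two quarter turns land on -1, which is all that "the square of this number is -1" really says.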

Now that I have indulged myself with two paragraphs of mathematical quibbling, let me get to my complaints as a reader of speculative fiction. It's hard to understand some never-seen world when the writing relies heavily on bad mathematical descriptions and a made-up language of the narrator's own personal feelings about colors. Worse still, the first-person narrator has a breakdown in the middle of the story where he believes he has died, and several chapters after this point are later to be understood as dreams or hallucinations caused by fever.

The world where D-503 lives is a city made of glass where all lives are supposed to be completely visible to everyone else to make sure everyone is doing exactly what they should be, but there is an exception for when people have sex. The sex component is excessively important to the plot, and anyone who has seen through Hugh Hefner's idea of utopia can see it for the juvenile male fantasy it is. People can have sex with anyone who agrees to have sex with them, and men are completely free from the burdens of fatherhood. It also presents women who have once given consent and wish to rescind it as horrible and duplicitous creatures. It never occurs to a man that he might not be an ideal lover.

In short, if you have never read We, you have my leave to never read it. The book has fans that range from Garry Kasparov, the former world chess champion who is strongly capitalist and just as strongly anti-Putin, to Noam Chomsky, the renowned linguist whose political views are sometimes described as libertarian socialist. Chomsky has said We is superior to Nineteen Eighty-Four, which he considers wooden. Just to add a little more interest to reading this book I have said you shouldn't read, Orwell considered it completely superior to Brave New World.

Here's where I stand on these provisos to my bold and underlined main position above. Huxley and Orwell did not get along and I am 100% on Team Orwell. As a prose stylist, Orwell runs rings around Huxley and Zamyatin, though I will admit I cannot read Zamyatin in the original Russian, which is my problem, not his. There is one point on which I agree with Orwell that We is better than Brave New World. Both books have characters who are considered great poets in morally empty times. What would such a poet write? Zamyatin gives examples, Huxley does not.

Point to Zamyatin.

More important than any political position or literary merit, Orwell understood the connection between politics of any stripe and lying. Here are his six rules of writing, from his essay Politics and the English Language.

Never use a metaphor, simile or other figure of speech which you are used to seeing in print. (Many of Orwell's examples are now thankfully out of date. The best modern example is the completely meaningless cliche "thoughts and prayers".)

Never use a long word where a short one will do.

If it is possible to cut a word out, always cut it out.

Never use the passive where you can use the active.

Never use a foreign phrase, a scientific word, or a jargon word if you can think of an everyday English equivalent.

Break any of these rules sooner than say anything outright barbarous.

To summarize, if you are intrigued by my description or the testimonials, by all means read We. If you want to take my advice instead, find some Orwell you haven't read, especially his collections of essays. In particular, Shooting an Elephant should be at least as famous as The Declaration of Independence or the preamble to The Constitution.

Saturday, August 5, 2017

Many stories have been written this year saying Trump voters are still happy with Trump. Almost none have been written about Clinton voters still pissed that she won by nearly 3,000,000 votes and over 2% of the popular vote, yet Trump was still installed as president by the seriously anti-democratic Electoral College system.

In contrast to these anecdotes, polls make the attempt to gauge the general public opinion using something approximating scientific methods. The big failure of poll-based prediction in 2016 makes me less confident in these numbers, but statistical methods never promise certainty. That said, the polling numbers for Trump's popularity after six months in office show a public growing quickly disenchanted.

I follow 22 different polling companies, getting their results from the Pollster page funded by The Huffington Post, but I have more confidence in looking at six companies that poll every week or even more often. The two tracking polls that update almost every day are Gallup and Rasmussen. The four polls that give weekly numbers are Politico, SurveyMonkey, YouGov and Ipsos/Reuters. I never consider any one polling company to be the most reliable, but I would rank these six at least as reliable as the companies that poll only two or three times a month or even less and much more reliable than the very sporadic pollsters.

The graph speaks for itself. While there are ups and downs in the average drawn in blue and the median drawn in red, the general trend is downhill. Since the middle of July, the numbers have taken a steep fall. On July 11, Trump's net popularity averaged -11 percentage points and the median was -13.5 points. As August began, those numbers had sunk to -19.7 on average and a median of -21.5 percentage points.

On the left is a slope graph for the six companies, showing their net numbers on January 31 and July 31. Two points are difficult to read due to exact overlap. On the far right, both YouGov and Ipsos/Reuters had Trump at -1 point net in January, while Rasmussen and Gallup now concur that Trump is at -22 percentage points when the unfavorable number is subtracted from favorable.

The first and most obvious point is that everything is downhill. Politico, represented by the light blue line at the top, has been consistently the kindest to Trump, but currently even they have his net favorable numbers at -10 percentage points, worse than even Gallup had at the end of January. The steepest fall is the yellow line, representing Rasmussen, a poll well known throughout this century as being very kind to conservatives. In January, only Politico and Rasmussen gave Trump a net favorable score. Now, Rasmussen is tied with Gallup giving Trump a -22 point rating, only surpassed in the negative direction by Ipsos/Reuters at -24.

Let me repeat that no poll is perfect and even a collection of polls won't always give us an accurate read. For example, in last year's polls of Pennsylvania, not even one company gave Trump the lead, which made his win there all the more shocking. But having written that, I present this data as an antidote to anecdotes. For all the reporters who can find Trump voters still happy with their choice, the polling companies can find large masses of voters who realize they made a horrible mistake in November.

Thursday, August 3, 2017

Longtime readers will know that I love to collect data. Many blog posts have had a collection of data as the jumping-off point, but there are times when I collect data hoping to see a pattern and none becomes apparent, or I see some trend but I'm of several minds about how to present it.

One type of data set I have been collecting for two and a half years concerns the average temperature in Oakland. The website Weather Underground publishes not only the daily temperature highs and lows, but compares each day to the average over the last fifteen years. I have used this data in my statistics classes, showing how to take large sets and input them in calculators using frequency tables. Most Texas Instruments calculators will balk at a data set with 365 or 366 values, but because of repetition of values, we can get all the data in the set and the important statistics from these samples, notably the five-number summary - an old-school way to look at outliers - and also average and standard deviation, the more modern way to discuss which numbers on a list are remarkably high or remarkably low.
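Here is a minimal Python sketch of the frequency table idea, for readers without a TI calculator handy. The counts in frequency_table are made up for illustration and are not the Oakland data, and the function names are my own. I am also assuming one common textbook convention for quartiles, the medians of the lower and upper halves of the sorted list.

```python
import statistics

# Hypothetical frequency table: degrees above/below average -> number of days.
# These counts are invented for illustration; they are NOT the Oakland data.
frequency_table = {-3: 1, 0: 3, 2: 2, 5: 1}


def expand(table: dict[int, int]) -> list[int]:
    """Turn a value -> count table back into a sorted list of raw values."""
    return sorted(value for value, count in table.items() for _ in range(count))


def five_number_summary(values: list[int]) -> tuple:
    """Min, Q1, median, Q3, max, with quartiles taken as medians of the halves."""
    data = sorted(values)
    half = len(data) // 2
    lower, upper = data[:half], data[-half:]  # middle value is left out when n is odd
    return (data[0], statistics.median(lower), statistics.median(data),
            statistics.median(upper), data[-1])


days = expand(frequency_table)
print(len(days), statistics.mean(days), statistics.stdev(days))
print(five_number_summary(days))
```

Four table entries stand in for seven days here. With a real year of temperatures, a few dozen distinct values stand in for all 365 or 366 days.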

This is a dot-plot of the 366 days of 2016 in Oakland, each day listed as the number of degrees above or below the average for the previous fifteen years. The tallest stack of dots is at zero degrees. This represents the mode of the set. Obviously, there are a lot more dots to the right of the tallest stack than there are to the left. The other two famous measures of center, the mean and the median, are not so apparent from this graph. The median is 2 and the average is about 2.604, with a standard deviation of 5.659. Simply put, the more commonly used measures of center say the temperature in 2016 is warmer than the rest of the century.

You might say this is evidence of climate change in Oakland. I am not 100% convinced. Here are my reasons.

1. Should I trust the average daily temperatures given by the website? The averages stay the same for weeks at a time, not even wobbling by a degree. That smells like they are not just averaging the single-day temperatures for a given date, but maybe taking the average of several days in a row, then averaging that over fifteen years. I'm not sure this is kosher.

2. Should I trust the t-score method and the p-value it produces? The t-score test uses average/(standard deviation) x sqrt(size of set) as the test statistic. In this case, that would be 2.604/5.659 x sqrt(366) ~= 8.803. This is a crazy big number for a t-score and it produces a p-value so small it has to be written in scientific notation, 2.705 x 10 ^ -17. Written in regular notation, this is 0.00000000000000002705, which is crazy close to zero. A paper published with a p-value this small is basically saying, "I'm right, so shut the fuck up."
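The arithmetic for the test statistic is easy to check. This is a minimal sketch using only Python's standard library, and the function name one_sample_t is my own. It computes the t-score only, since the standard library has no t-distribution. For the p-value, one option, assuming SciPy is installed, would be the upper tail scipy.stats.t.sf(t_score, df=365).

```python
import math


def one_sample_t(mean_diff: float, std_dev: float, n: int) -> float:
    """t = mean / standard deviation * sqrt(n), testing against a true mean of zero."""
    return mean_diff / std_dev * math.sqrt(n)


t_score = one_sample_t(2.604, 5.659, 366)
print(round(t_score, 3))  # about 8.803, matching the figure above
```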

But let me note here that statistics is math mixed with opinion, and not every statistician loves the t-score/p-value method used with a data set like this. Most notably, W. Edwards Deming, the famously practical statistician credited with turning the Japanese economy around after World War II, argued that if there was any difference between any two sets, all you needed was a large enough sample size to prove that difference significant. In this case, the large sample size gives us a multiplier in the formula of sqrt(366), which is about 19. Since a t-score of 3 will give us a very impressive p-value, having this relatively large number in the formula guarantees an impressive p-value.
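Deming's point can be turned into a quick calculation: given a fixed average and standard deviation, how many days are needed before the t-score reaches 3? This is a sketch under the same formula as above. The function name min_days_for_t and the half-degree example are my own hypothetical additions, and the calculation pretends the average and standard deviation would stay the same at any sample size.

```python
import math


def min_days_for_t(mean_diff: float, std_dev: float, target_t: float = 3.0) -> int:
    """Smallest n with mean_diff / std_dev * sqrt(n) >= target_t (mean_diff > 0)."""
    return math.ceil((target_t * std_dev / mean_diff) ** 2)


print(min_days_for_t(2.604, 5.659))  # days needed with the figures quoted above
print(min_days_for_t(0.5, 5.659))    # hypothetical difference of only half a degree
```

Even a difference too small to feel on your skin clears the bar eventually. It just takes more days.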

3. How should we think about a year in terms of climate change data? A hot or cold day is not climate change. I am skeptical about counting a month as a long enough time to have meaning, though Dr Michael E. Mann often tweets about a month being the hottest or second hottest (fill in the month in question) in history. Mann is not an alarmist, as was made clear when he poured cold water on the New York magazine article from earlier this year that was all doom and gloom. While not an alarmist, he does want to keep climate change in the news, and it is a slow moving process, at least from the standpoint of the 24 hour news cycle.

But I have no problem with thinking a year is a length of time where we can talk about the numbers as having meaning when discussing climate change. Personally, I am uncertain as to whether years should be the basic unit of measure or should be clumped into groups to have clearer meaning. My analogy is this. A year has meaning, but if we compare it to grammar, is a year a sentence or a word or merely a letter? When I wrote my math blog about climate change, I argued that we should look at periods of time between strong El Niño years that included a strong La Niña year as the basic unit.

So those are my provisos and quibbles. Here is the data.

2015: The temperature in 2015 was 2.605° F warmer than the average of the previous fifteen years and the standard deviation was 5.659° F. With a sample of 365 days, this data set makes a very convincing argument that things are getting warmer. Using the average and standard deviation method, an unusually cold day would be 9° F lower than average. That happened once. An unusually hot day would be 14° F higher than average. That happened seventeen times, and very unusually hot days would be over 20° F hotter than average, which happened three times.

2016: The temperature was 2.242° F warmer than the fifteen year average and the standard deviation was 5.447° F. It didn't warm up quite as much as 2015, but the lower standard deviation would mean the t-score/p-value number would again be hard to argue against. There were no days that count as unusually cold (again, 9° F colder than average), but eighteen days at 14° F hotter than average and six days above 19° F hotter than average.

First seven months of 2017: So far, the average temperature is 2.321° F warmer than the previous fifteen-year average with a standard deviation of 5.480° F. No days have been unusually cold so far, twelve have been unusually hot and three have been very unusually hot. The cutoff points for unusually hot and very unusually hot are 14° F above average and 19° F above average, respectively. These thresholds are unchanged from the 2016 numbers, which is not surprising because the averages and standard deviations are so similar.
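The cutoffs quoted for all three years can be reproduced with a short sketch. The rule coded here is an assumption on my part about what "the average and standard deviation method" means: unusually cold is two standard deviations below the average, unusually hot is two above, very unusually hot is three above, and each is rounded away from the average to a whole degree. The function name cutoffs is hypothetical.

```python
import math


def cutoffs(mean: float, std_dev: float) -> tuple[int, int, int]:
    """Whole-degree cutoffs for unusually cold, unusually hot and very unusually hot.

    Assumption: 2 SDs below, 2 SDs above and 3 SDs above the mean,
    each rounded away from the mean to the next whole degree.
    """
    cold = math.floor(mean - 2 * std_dev)
    hot = math.ceil(mean + 2 * std_dev)
    very_hot = math.ceil(mean + 3 * std_dev)
    return cold, hot, very_hot


# (average, standard deviation) in degrees F, as quoted for each year
yearly_stats = {"2015": (2.605, 5.659), "2016": (2.242, 5.447), "2017": (2.321, 5.480)}
for year, (mean, std_dev) in yearly_stats.items():
    print(year, cutoffs(mean, std_dev))
```

Under that assumed rule, 2015 comes out at 9 below, 14 above and 20 above, while 2016 and 2017 both come out at 9 below, 14 above and 19 above.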

Conclusion: Here in Oakland it's getting warmer. 2015 shows the largest change upward, but note that 2015 is part of the last fifteen-year average when measuring 2016 and 2017. I'd love to get more raw data from a weather station that has produced data continuously for a few decades and I have an idea of how to achieve that. I also want to come up with a good way to define a heat wave and I think I have the start of an idea I need to flesh out.

Tomorrow, another math-y blog post, this time about Trump's approval numbers.