R has a number of very good packages for manipulating and aggregating data (plyr, sqldf, ScaleR, data.table, and more), but when it comes to accumulating results the beginning R user is often at sea. The R execution model is a bit exotic, so many R users are uncertain which methods of accumulating results are … Continue reading Efficient accumulation in R →
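As a rough illustration of what "accumulating results" can look like in practice, here is a minimal sketch comparing three common patterns in base R. It is not taken from the linked post; the helper `step_result()` and the size `n` are hypothetical and chosen only for the example.

```r
# Hypothetical per-step computation, used only for illustration.
step_result <- function(i) {
  data.frame(i = i, square = i^2)
}

n <- 100

# Pattern 1: grow the result with repeated rbind(). Each call builds a new
# data frame from everything accumulated so far, so total work grows
# roughly quadratically in n.
grown <- NULL
for (i in seq_len(n)) {
  grown <- rbind(grown, step_result(i))
}

# Pattern 2: store the pieces in a preallocated list, then bind once.
pieces <- vector("list", n)
for (i in seq_len(n)) {
  pieces[[i]] <- step_result(i)
}
bound <- do.call(rbind, pieces)

# Pattern 3: let lapply() build the list, then bind once.
applied <- do.call(rbind, lapply(seq_len(n), step_result))
```

All three produce the same rows. The second and third patterns defer the binding to a single call, which is usually the cheaper choice as `n` grows.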

Did anyone else notice that this DC multiple-murder case seems just like a Pelecanos story? Check out the latest headline, “D.C. Mansion Murder Suspect Is Innocent Because He Hates Pizza, Lawyer Says”: Robin Flicker, a lawyer who has represented suspect Wint in the past but has not been officially hired as his defense attorney, says […] The post Ripped from the pages of a George Pelecanos novel appeared first on…

In our newest column, we take on the recent media obsession with companies that make robots that hire people. As with most articles about data science, the journalists failed to dig up any evidence that these robots work, other than glowing quotes from the people who are selling these robots. We point out a number of challenges that such algorithms must overcome in order to generate proper predictions. We…

Riccardo Rebonato (R) has a fascinating new paper, which builds on important earlier work of Cieslak and Povala (2010) (CP). The cool thing about CP is the way it advances and blends certain aspects of both the spanning literature ("all infor...

Mon: Ripped from the pages of a George Pelecanos novel Tues: “We can keep debating this after 11 years, but I’m sure we all have much more pressing things to do (grants? papers? family time? attacking 11-year-old papers by former classmates? guitar practice?)” Wed: What do I say when I don’t have much to say? […] The post On deck this week appeared first on Statistical Modeling, Causal Inference, and…

I think I've already mentioned this work here and here: after much tribulation, mostly due to the fact that we had to co-ordinate a relatively large number of papers in a single journal issue, we are very close to the publication of our work on the Ste...

Base SAS contains many functions for processing strings, and you can call these functions from within a SAS/IML program. However, sometimes a SAS/IML programmer needs to process a vector of strings. No problem! You can call most Base SAS functions with a vector of parameters. I have previously written about […] The post Convert a vector to a string appeared first on The DO Loop.

Three Problems and a Solution: Modern teaching methods for statistics have gone beyond the mathematical calculation of trivial problems. Computers can enable large-scale studies, bringing reality to the subject, but this is not without its own problems. Problem 1: …

The Raleigh News & Observer published a front-page article about the effect of wealth and poverty on high school athletics in North Carolina. In particular, the article concluded that "high schools with a high percentage of poor students rarely win titles in the so-called country club sports—tennis, golf and swimming—and […] The post Wealth and winning in NC high school athletics appeared first on The DO Loop.

This is a screencast of my UseR! 2015 presentation: Tiny Data, Approximate Bayesian Computation and the Socks of Karl Broman. Based on the original blog post it is a quick’n’dirty introduction to approximate Bayesian computation (and is also, in ...

Last week I ran into a younger colleague who said he had a conference deadline that week and could we get together next week, maybe? So I contacted him on the weekend and asked if he was free. He responded: This week quickly got booked after last week’s NIPS deadline. So we’re meeting in another […] The post The 3 Stages of Busy appeared first on Statistical Modeling, Causal Inference,…

Following my previous post I have decided to try and use a different method: generalized boosted regression models (gbm). I have read the background in Elements of Statistical Learning and arthur charpentier's nice post on it. This data ...

After a post from almost two years ago inviting folks to pose the book with famous Bayesians or non-Bayesians (deceased or not), the book has finally visited a monument to Laplace! Shown below (scroll down) are photos kindly taken by Carlos Ungil. Than...

The celebrated radio quiz show star says: There’s this study done by the Pew Research Center and Smithsonian Magazine . . . they called up one thousand and one Americans. I do not understand why it is a thousand and one rather than just a thousand. Maybe a thousand and one just seemed sexier or […] The post Ira Glass asks. We answer. appeared first on Statistical Modeling, Causal Inference,…

Stephen Senn, Head of Competence Center for Methodology and Statistics (CCMS), Luxembourg Institute of Health. This post first appeared here. An issue sometimes raised about randomized clinical trials is the problem of indefinitely many confounders. This, for example, is what John Worrall has to say: Even if there is only a small probability that an individual factor is […]

Recently, I was listening in on the conversation of some colleagues who were discussing a bug in their R code. The bug was ultimately traced back to the well-known phenomenon that functions like 'read.table()' and 'read.csv()' in R convert columns that are detected as character strings into factor variables. This led to the spontaneous
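The conversion described here can be reproduced with a small sketch. The inline CSV text and variable names are hypothetical. The `stringsAsFactors` argument is set explicitly in both calls because its default changed from `TRUE` to `FALSE` in R 4.0.0, so relying on the default gives different results across versions.

```r
# Hypothetical CSV content, supplied inline so the example is self-contained.
csv_text <- "name,score\nalice,1.5\nbob,2.5\nalice,3.5"

# Request the factor conversion explicitly (the default before R 4.0.0).
as_factor <- read.csv(text = csv_text, stringsAsFactors = TRUE)

# Keep strings as character vectors (the default from R 4.0.0 onward).
as_char <- read.csv(text = csv_text, stringsAsFactors = FALSE)

class(as_factor$name)  # factor
class(as_char$name)    # character

# A common source of bugs: converting a factor to integer returns the
# internal level codes rather than anything derived from the labels.
codes <- as.integer(as_factor$name)
```

Passing `stringsAsFactors` explicitly, rather than depending on the session's default, is one way to keep such code behaving the same on old and new R installations.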

I don’t understand why any researcher would choose not to use panel/multilevel methods on panel/hierarchical data. Let’s take the following linear regression as an example: , where is a random effect for the i-th group. A pooled OLS regression model for the above is unbiased and consistent. However, it will be inefficient, unless for all […]

My colleague Robert Allison finds the most interesting data sets to visualize! Yesterday he posted a visualization of toothless seniors in the US. More precisely, he created graphs that show the estimated prevalence of adults (65 years or older) who have had all their natural teeth extracted. The dental profession […] The post The relationship between toothlessness and income appeared first on The DO Loop.

Gabe Murray wrote to Andrew Gelman, asking for comments about the accusations hurled at the current Tour de France front-runner Chris Froome. He said: This post by VeloClinic has been getting a lot of media attention in the past few days, within the context of Chris Froome's dominant performance in the Tour de France: http://veloclinic.com/estimating-the-probability-of-doping-as-a-function-of-power/ The assumptions seem very dubious to me, and I would love to see a critique…

One of the great things about writing a statistics book was finding an excuse to read about dozens of topics that I knew a little about but hadn't got around to studying in depth. Even so, there were a number of topics I ended up missing out on complet...

One of the smart things Noah (at WNYC) showed to my class was his NFL fan map, based on Facebook data. This is the "home" of the visualization: The fun starts by clicking around. Here are the Green Bay fans...