There’s a wonderful interview at the Notices with last year’s Abel Prize winner John Tate (video here). He blames the fact that his name is on so many mathematical results and concepts on Serge Lang. The 2011 Abel Prize winner will be announced on March 23rd.

Sir Michael Atiyah’s February 1 talk at the College de France titled A Geometer Explores the Universe is now on-line.

I love this article in the WSJ about the crisis at JP Morgan. The key point it highlights is that looking only at the high-level analysis and summaries can be misleading, you have to look at the raw data to see the potential problems. As data become more complex, I think its critical we stay in touch with the raw data, regardless of discipline. At least if I miss something in the raw data I don’t lose a couple billion. Spotted by Leonid K.

On the other hand, this article in the Times drives me a little bonkers. It makes it sound like there is one mathematical model that will solve the obesity epidemic. Lines like this are ridiculous: “Because to do this experimentally would take years. You could find out much more quickly if you did the math.” The obesity epidemic is due to a complex interplay of cultural, sociological, economic, and policy factors. The idea you could “figure it out” with a set of simple equations is laughable. If you check out their model this is clearly not the answer to the obesity epidemic. Just another example of why statistics is not math. If you don’t want to hopelessly oversimplify the problem, you need careful data collection, analysis, and interpretation. For a broader look at this problem, check out this article on Science vs. PR. Via Andrew J.

Some cool applications of the raster package in R. This kind of thing is fun for student projects because analyzing images leads to results that are easy to interpret/visualize.

The overview article on “Approximate Computation and Implicit Regularization for Very Large-scale Data Analysis” associated with the invited talk at the upcoming PODS 2012 meeting is on the arXiv here.

The monograph on “Randomized Algorithms for Matrices and Data” is available in NOW’s “Foundations and Trends in Machine Learning” series here, and it is also available on the arXiv here.

Click here for information (including the slides and video!) on the Tutorial on “Geometric Tools for Identifying Structure in Large Social and Information Networks,” given originally at ICML10 and KDD10 and subsequently at many other places. (The slides are also linked to below.)

The overview chapter on “Algorithmic and Statistical Perspectives on Large-Scale Data Analysis” is finally on the arXiv here; the book in which it will appear is in press; and a video of the associated talk is here.

Logistic regression vs. multiple regression—–Many statisticians seem to advise the use of logistic regression over multiple regression by invoking this logic: “A probability value can’t exceed 1 nor can it be less than 0. Since multiple regression often yields values less than 0 and greater than 1, use logistic regression.” While we can understand this argument, our feeling is that, in the applied fields we toil in, that argument is not a very practical one. In fact a seasoned statistics professor we know says (in effect): “What’s the big deal? If multiple regression yields any predicted values less than 0, consider them 0. If multiple regression yields any values greater than 1, consider them 1. End of story.” We agree.

Maybe mostly useful for me, but for other people with Tumblr blogs, here is a way to insert Latex.—From Simply Statistics

Harvard Business school is getting in on the fun, calling the data scientist the sexy profession for the 21st century. Although I am a little worried that by the time it gets into a Harvard Business document, the hype may be outstripping the real promise of the discipline. Still, good news for statisticians! (via Rafa via Francesca D.’s Facebook feed).—From Simply Statistics

The counterpoint is this article which suggests that data scientists might be able to be replaced by tools/software. I think this is also a bit too much hype for my tastes. Certain things will definitely be automated and we may even end up with a deterministic statistical machine or two. But there will continually be new problems to solve which require the expertise of people with data analysis skills and good intuition (link via Samara K.)—From Simply Statistics

An interview with Brad Efron about scientific writing. I haven’t watched the whole interview, but I do know that Efron is one of my favorite writers among statisticians.

Slidify, another approach for making HTML5 slides directly from R. (1) It is still just a little too hard to change the theme/feel of the slides (2) The placement/insertion of images is still a little clunky, Google Docs has figured this out, if they integrated the best features of Slidify, Latex, etc. into that system, it will be great.

Here is an interesting review of Nate Silver’s book. The interesting thing about the review is that it doesn’t criticize the statistical content, but criticizes the belief that people only use data analysis for good. This is an interesting theme we’ve seen before. Gelman also reviews the review.—–Simply Statistics

Suggested New Year’s resolution: start a blog: A blog forces you to articulate your thoughts rather than having vague feelings about issues; You also get much more comfortable with writing, because you’re doing it rather than thinking about doing it; If other people read your blog you get to hear what they think too. You learn a lot that way. || Set aside time for your blog every day. Keep notes for yourself on bloggy subjects (write a one-line gmail to yourself with the subject “blog ideas”).

################## From SimplyStats ##################Editor’s Note: Last yearI made a list off the top of my head of awesome things other people did. I loved doing it so much that I’m doing it again for 2014. Like last year, I have surely missed awesome things people have done. If you know of some, you should make your own list or add it to the comments! The rules remain the same. I have avoided talking about stuff I worked on or that people here at Hopkins are doing because this post is supposed to be about other people’s awesome stuff. I wrote this post because a blog often feels like a place to complain, but we started Simply Stats as a place to be pumped up about the stuff people were doing with data. Update: I missed pipes in R, now added!

I’m copying everything about Jenny Bryan’s amazing Stat 545 class in my data analysis classes. It is one of my absolute favorite open online set of notes on data analysis.

Ben Baumer, Mine Cetinkaya-Rundel, Andrew Bray, Linda Loi, Nicholas J. Horton wrote this awesome paper on integrating R markdown into the curriculum. I love the stuff that Mine and Nick are doing to push data analysis into undergrad stats curricula.

Somebody tell Hector Corrada Bravo to stop writing so many awesome papers. He is making us all look bad. His epiviz paper is great and you should go start using the Bioconductor package if you do genomics.

Hilary Mason founded fast forward labs. I love the business model of translating cutting edge academic (and otherwise) knowledge to practice. I am really pulling for this model to work.

Rafa and Mike taught an awesome class on data analysis for genomics. They also created a book on Github that I think is one of the best introductions to the statistics of genomics that exists so far.

Hilary Parker wrote this amazing introduction to writing R packages that took the twitterverse by storm. It is perfectly written for people who are just at the point of being able to create their own R package. I think it probably generated 100+ R packages just by being so easy to follow.

FiveThirtyEight launched. Despite some early bumps they have done some really cool stuff. Loved the recent piece on the beer mile and I read every piece that Emily Oster writes. She does an amazing job of explaining pretty complicated statistical topics to a really broad audience.

David Robinson’s broom package is one of my absolute favorite R packages that was built this year. One of the most annoying things about R is the variety of outputs different models give and this tidy version makes it really easy to do lots of neat stuff.

Chung and Storey introduced the jackstraw which is both a very clever idea and the perfect name for a method that can be used to identify variables associated with principal components in a statistically rigorous way.

I rarely dig excel-type replacements, but the simplicity of charted.co makes me love it. It does one thing and one thing really well.

The hipsteR package for teaching old R dogs new tricks is one of the many cool things Karl Broman did this year. I read all of his tutorials and never cease to learn stuff. In related news if I was 1/10th as organized as that dude I’d actually you know, get stuff done.

Whether I agree with them or not that they should be allowed to do unregulated human subjects research, statistics at tech companies, and in particular randomized experiments have never been hotter. The boldest of the bunch is OKCupid who writes blog posts with titles like, “We experiment on human beings!”

In related news, I love the PlanOut project by the folks over at Facebook, so cool to see an open source approach to experimentation at web scale.

No wonder Mike Jordan (no not that Mike Jordan) is such a superstar. His reddit AMA raised my respect for him from already super high levels. First, its awesome that he did it, and second it is amazing how well he articulates the relationship between CS and Stats.

I’m sure this actually happened before 2014, but the Bioconductor folks are still the best open source data science project that exists in my opinion. My favorite development I started using in 2014 is the git-subversion bridge that lets me update my Bioc packages with pull requests.

rOpenSci ran an awesome hackathon. The lineup of people they invited was great and I loved the commitment to a diverse group of junior R programmers. I really, really hope they run it again.

Dirk Eddelbuettel and Carl Boettiger continue to make bigtime contributions to R. This time it is Rocker, with Docker containers for R. I think this could be a reproducibility/teaching gamechanger.

Regina Nuzzo brought the p-value debate to the masses. She is also incredible at communicating pretty complicated statistical ideas to a broad audience and I’m looking forward to more stats pieces by her in the top journals.

Barbara Engelhardt keeps rocking out great papers. But she is also one of the best AE’s I have ever had handle a paper for me at PeerJ. Super efficient, super fair, and super demanding. People don’t get enough credit for being amazing in the peer review process and she deserves it.

Ben Goldacre and Hans Rosling continue to be two of the best advocates for statistics and the statistical discipline – I’m not sure either claims the title of statistician but they do a great job anyway. This piece about Professor Rosling in Science gives some idea about the impact a statistician can have on the most current problems in public health. Meanwhile, I think Dr. Goldacre does a great job of explaining how personalized medicine is an information science in this piece on statins in the BMJ.

Michael Lopez’s series of posts on graduate school in statistics should be 100% required reading for anyone considering graduate school in statistics. He really nails it.

Emma Pierson is a Rhodes Scholar who wrote for 538, 23andMe, and a bunch of other major outlets as an undergrad. Her blog, obsessionwithregression.blogspot.com is another must read. Here is an example of her awesome work on how different communities ignored each other on Twitter during the Ferguson protests.

The Rstudio crowd continues to be on fire. I think they are a huge part of the reason that R is gaining momentum. It wouldn’t be possible to list all their contributions (or it would be an Rstudio exclusive list) but I really like Packrat and R markdown v2.

Julian Wolfson and Joe Koopmeiners at University of Minnesota are straight up gamers. They live streamed their recruiting event this year. One way I judge good ideas is by how mad I am I didn’t think of it and this one had me seeing bright red.

Pipes in R! This stuff is for real. The piping functionality created by Stefan Milton and Hadley is one of the few inventions over the last several years that immediately changed whole workflows for me.