After 1st semester of Statistics PhD program

Have you ever wondered whether the first semester of a PhD is really all that busy? My complete lack of posts last fall should prove it.

Some thoughts on the Fall term, now that Spring is well under way [edit: added a few more points]:

RMarkdown and knitr are amazing. When I next teach a course using R, my students will be turning in homeworks using these tools: The output immediately shows whether the code runs and what its results are. This is much better than students copying and pasting possibly-broken code and unconnected output into a text file or (gasp) Word document.

I’m glad my cohort socializes outside the office, taking each other out for birthday lunches or going to see a Pirates game. Some of the older PhD students are so focused on their thesis work that they don’t take time for a social break, and I’d like to avoid getting stuck in that rut.
However! Our lunches always lead us back to the age-old question: How many statisticians does it take to split a bill? Answer: too long. I threw together a Shiny app, DinneR, to help us answer this question.
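For the curious, the arithmetic behind a bill split fits in a few lines. This is only a minimal Python sketch, not the DinneR app's actual code: the function name `split_bill`, its parameters, and the diners are all made up for illustration, and it assumes shared items are split evenly while tax and tip are charged in proportion to each person's pre-tax subtotal.

```python
def split_bill(own_items, shared_total=0.0, tax_rate=0.0, tip_rate=0.0):
    """Return a dict of what each diner owes (hypothetical helper).

    own_items:    dict mapping name -> pre-tax cost of that diner's own items
    shared_total: pre-tax cost of items split evenly (e.g. appetizers)
    tax_rate, tip_rate: fractions of the pre-tax subtotal (0.07 means 7%)
    """
    if not own_items:
        raise ValueError("need at least one diner")
    # Everyone pays an equal slice of the shared items.
    shared_each = shared_total / len(own_items)
    # Tax and tip are both applied to the pre-tax amount.
    multiplier = 1.0 + tax_rate + tip_rate
    return {
        name: round((cost + shared_each) * multiplier, 2)
        for name, cost in own_items.items()
    }


shares = split_bill(
    {"Ann": 12.00, "Bo": 9.50, "Cy": 15.25},
    shared_total=9.00,
    tax_rate=0.07,
    tip_rate=0.18,
)
for name, amount in shares.items():
    print(f"{name} owes ${amount:.2f}")
```

Because each share is rounded to the cent, the shares can add up to a penny or two more or less than the real total, which is presumably where the statisticians start arguing.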

The first-year PhD courses in Statistics and in Machine Learning have rather different approaches.

Statistics professor: Just assume we can compute this estimator. In class we’ll prove that the estimates are reasonably good (e.g. we’ll bound the probability that an estimate is far from the true value).

Machine Learning professor: Just trust me that this algorithm gets useful estimates. In class we’ll prove that we can compute it in a reasonable amount of time (e.g. we’ll bound the number of steps until the algorithm converges).
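To make the contrast concrete, here are two textbook results of the kinds described above. They are my own picks for illustration, not necessarily what either class covered. The first is Hoeffding's inequality for the sample mean of n i.i.d. observations bounded in [a, b]; the second is the standard convergence rate for gradient descent with step size 1/L on a convex function f whose gradient is L-Lipschitz.

```latex
% Statistics flavor: bound the probability that the estimate is far from the truth.
P\left( \left| \bar{X}_n - \mu \right| \ge t \right)
  \le 2 \exp\left( -\frac{2 n t^2}{(b - a)^2} \right)

% Machine Learning flavor: bound the suboptimality after k steps of the algorithm.
f(x_k) - f(x^*) \le \frac{L \, \| x_0 - x^* \|^2}{2k}
```

The first bound shrinks as the sample size n grows; the second shrinks as the iteration count k grows.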

Somewhere between these ideas, I ran into the sensible concept of optimizing only until your solution is within statistical error. For example, say you only have enough data to publish an estimate with a confidence interval of ±0.1 units. If your optimization algorithm is computer-intensive, then running it until it converges to ±0.00001 units is just a waste of time. See, for instance, Bottou & Bousquet’s “The Tradeoffs of Large Scale Learning.”
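Here is a toy Python sketch of that idea. It is my own illustration, not an example from the paper. It estimates a mean by gradient descent (silly, since the sample mean has a closed form, but it keeps the example small) and compares a stopping tolerance tied to the standard error against a much tighter one. The function name, step size, simulated data, and the choice of one tenth of a standard error are all assumptions.

```python
import math
import random


def mean_by_gradient_descent(data, tol, step=0.1, max_iter=10_000):
    """Minimize the average of (x - theta)^2 / 2 by gradient descent.

    For this objective the gradient is exactly theta minus the sample mean,
    so its size is the distance to the optimum. In general the gradient
    size is only a proxy for that distance.
    Returns (theta, number of updates performed).
    """
    n = len(data)
    theta = 0.0
    for iteration in range(max_iter):
        gradient = sum(theta - x for x in data) / n
        if abs(gradient) < tol:
            return theta, iteration
        theta -= step * gradient
    return theta, max_iter


random.seed(0)
data = [random.gauss(5.0, 2.0) for _ in range(400)]  # simulated data
n = len(data)
sample_mean = sum(data) / n
sample_sd = math.sqrt(sum((x - sample_mean) ** 2 for x in data) / (n - 1))
standard_error = sample_sd / math.sqrt(n)

# Stop once the optimization error is a small fraction of the statistical error.
loose_theta, loose_steps = mean_by_gradient_descent(data, tol=standard_error / 10)
# Or keep going to a precision the data cannot support.
tight_theta, tight_steps = mean_by_gradient_descent(data, tol=1e-5)

print(f"standard error of the mean: {standard_error:.4f}")
print(f"loose tolerance: {loose_steps} updates, estimate {loose_theta:.5f}")
print(f"tight tolerance: {tight_steps} updates, estimate {tight_theta:.5f}")
```

The tight run takes roughly twice as many updates, yet both estimates differ from the sample mean by far less than one standard error, so the extra precision would vanish inside any interval you could honestly report.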

My ML professor, midway through a classification-focused semester, finally discussing regression for 10 minutes: “…And that’s all you need to know about regression.” My Regression professor, at end of semester, finally discussing classification for 20 minutes: “…And that’s all you need to know about classification.”

In any class that covers proofs or other long detailed arguments, handouts+chalkboards are seriously better than slideshows. With a chalkboard, you can show the whole proof at once, so if students get lost halfway through, they can still see the claim we’re proving and all the steps we’ve made so far. But when you cram a proof onto slides, you either oversimplify to get it onto one slide, or split it across slides so that we lose the continuity (and may even forget what we’re trying to prove).

Good homeworks and quick feedback are critical. One of my classes had weekly homeworks, each directly tied to the material we just covered, each problem expanding on a good question or illustrating an interesting principle from class. Homeworks were graded within a week, every single time.
In another class, we had just a few homeworks, very loosely tied to the lecture contents and usually at a very different level (way too easy or too hard relative to what the lecture covered). Although this class had the same number of students and TAs as the other one, we never got our homeworks back in less than 2 weeks—and one of them took a full 2 months to return!

TAing is a mixed bag. I enjoy holding office hours and being there during lab sessions to help students understand something they were missing. I do not so much enjoy grading homeworks and labs by those students who don’t ask questions, don’t come to office hours, and clearly don’t read the comments I leave on their assignments since I see them make the same mistakes over and over. I especially don’t like finding instances of cheating. Urgh.

I was a bit worried about coming back to grad school as an “older” student (the youngest guy in our 1st-year PhD cohort is almost a decade younger than me!). But it’s been great, actually:

My schedule seems much saner than some of my classmates’. Quite a few seem to stay in the office until late most nights, then may sleep through a morning class. For me, after years of waking at 6:30 to spend an hour on the crowded metro to work… it’s been luxurious to sleep in until 7:30 or 8, walk to school in half an hour in the fresh air, have a focused workday of reasonable length, and come home for dinner with my wife, actually relaxing in the evening instead of studying until 3am. Yes, there’s the occasional late night, but occasional is the key word there.

The income’s lower than at my old job, of course, but Pittsburgh is much cheaper than DC, especially for housing. Besides: my previous school loans are all paid off, I have a fair chunk of retirement savings already earning interest, and my wife and I are used to budgeting. (YNAB is an excellent tool for this, and I will blog about it at some point. If you’re interested, here’s a slight discount referral code, or you can wait for the big sale they seem to have every 3-4 months.) [My point is: despite the drop in income, we’re still more financially secure (thanks to savings and paid-off loans) than if I’d gone straight into the PhD from my MSc.]

As Cosma Shalizi points out: “Note to graduate students: It is important that you internalize that you are, in fact, a badass…” With age and experience, I’m far more able to speak confidently when it’s called for (e.g. giving a talk), and far less intimidated about tackling new topics, talking to professors, writing papers, speaking at conferences, etc.

On the other hand, despite longer experience as a statistician than my classmates, I appreciate and admire that they are much better at many things. I’m really impressed by my various classmates’ command of topics like real analysis and measure theory, scientific computing, or practical knowledge about fields like physics or economics.

Pittsburgh is a great town. Affordable housing, decent bus system, beautiful scenic views from the inclines, friendly people, livable walkable neighborhoods, tons of good food, extensive and well-run library system… It has a lot of what I liked about Portland, without as much of the “Portlandia” over-the-top hipsters. There are also beautiful old buildings, like the Carnegie Natural History Museum (with its sweet dinosaur exhibit) and UPitt’s Cathedral of Learning. The weather right now is pretty snowy/icy, but I don’t mind—I’m honestly impressed by how well Pittsburgh just goes ahead and deals with winter weather, in comparison to DC’s city-wide shutdown every time a snowflake is sighted.

Edit: Here’s another good post on the first semester of a PhD program, from several mathematics students. I agree with most of the responses, especially the ones that conflict with each other.

How could I forget: go Steelers! We haven’t been to a Steelers or Penguins game yet, but they’re on the list…
(I admit, I’m also glad that all three major local teams have the same colors. Makes it much easier for my non-sports-savvy self.)

I agree with the RMarkdown and knitr comment. It’s good from the student side, too.

Not sure how often the assignments are. Here’s an idea that I think would be workable for both you and the students… for the first assignment, maybe let them turn it in the usual way. From the second assignment through the end, use RMarkdown and knitr.

Good idea. The other option is that RStudio has an “R Notebook” button: It’ll take a plain R script (no fancy Rmd formatting needed) and basically wrap Rmd around it, making a very simple HTML output document. So even using only R code and comments, you can still get a nice reproducible document, much better than cutting and pasting output from the console.

Thanks for the nice words about RStudio and knitr. I’m so glad that you enjoyed them.

Coincidentally, I just graduated as you came into a stats PhD program. I’d say do not worry about real analysis or measure theory. Real analysis is too far away from real data analysis, and measure theory measures too much for statistics (a little bit of probability theory can get you going smoothly in statistics unless you are aiming at theoretical research). Older people often tend not to agree with me, but there really is a lot more to learn as a stats PhD nowadays than 30 years ago. I think your previous job should give you a clearer direction on what your priorities will be. Good luck!

Thanks, Yihui. I also just gave a short talk on knitr yesterday, and there was a lot of interest. Thank you for making such excellent tools!

Thanks for the advice as well. You’re absolutely right that there is so much more for statisticians to learn today… though I’m actually enjoying my measure-theoretic probability class more than I expected.