How to flunk your rotation in informatics: insights from burrowing mammals

Trigger warning: this post contains graphic descriptions of Talpidae-phobic violence. Sorry, no French language stuff here–come back tomorrow (or so) for our usual exploration of the implications of the statistical properties of language for second-language learners.

Here’s some advice on how to flunk your rotation in informatics. I’ve written this with details that are specific to my particular field–natural language processing–but the broader ideas apply to informatics in general, to dissertation-writing in most academic fields that I can think of, and outside of academia, to software development jobs, to grant-writing, or to almost anything with a deadline at which you will be evaluated at some point. Following this advice won’t guarantee that you’ll flunk your rotation, but not following it is an excellent way to improve your chances of passing.

Be afraid to ask questions

This is the biggie. Afraid that people will think you’re stupid if you ask questions? Don’t be–they’ll definitely think that you’re stupid if you don’t, and then don’t figure stuff out some other way. The absolute best students I’ve known were two people who had weekly appointments with me while they were doing their studies, specifically to ask questions. One of them is a rapidly rising star at a government research institute now, and the other is running a bioinformatics program. If you can’t get over your fear of asking questions, your chances of professional success are low. (I don’t mean to imply that I’m any good at answering questions–but, something about the nature of that interchange seems to have made some sort of contribution to their educations.)

Don’t make a schedule

As soon as you figure out what you’re doing for your project, don’t do what we do in the military—if you want to flunk your rotation. What we do in the military: write down a list of every step that has to be accomplished to get from where you are now to where you need to be at the end of the rotation. Are you going to think of everything? No–but, you’re going to think of most things. Don’t obsess about that.

Now put the due date by the last thing in your list of things that have to be done. Work backwards, estimating the time by which you will hit each of the preceding steps.

Now ask a question: is the date by which you would need to have started in order to get done on time already past? If so: go back to your advisor, because you need to modify your project–now. If not: great! So far, you’re on track!
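For the programmers in the audience, the work-backwards procedure can be sketched in a few lines of Python. This is a minimal illustration, not a planning tool: the step names, the day estimates, the dates, and the function names schedule_backwards and must_renegotiate are all made up for the example.

```python
from datetime import date, timedelta

def schedule_backwards(steps, due_date):
    """Work backwards from the due date.

    steps: list of (name, estimated_days), in the order they will be done.
    Returns a list of (name, start, finish) tuples in chronological order.
    """
    plan = []
    finish = due_date
    # Start from the last step, which must finish on the due date.
    for name, days in reversed(steps):
        start = finish - timedelta(days=days)
        plan.append((name, start, finish))
        finish = start  # the preceding step must finish when this one starts
    plan.reverse()
    return plan

def must_renegotiate(plan, today):
    """True if the date by which you needed to start is already past."""
    return plan[0][1] < today

# Hypothetical steps and estimates, invented for illustration.
steps = [
    ("get access to test data", 14),
    ("build baseline system", 21),
    ("run evaluation", 7),
    ("write up and prepare talk", 14),
]
due = date(2024, 12, 13)
plan = schedule_backwards(steps, due)

for name, start, finish in plan:
    print(f"{start} to {finish}: {name}")
print("Talk to your advisor now:", must_renegotiate(plan, date(2024, 11, 1)))
```

If must_renegotiate comes back True, that is the "go back to your advisor" branch; otherwise the plan gives you the dates to check yourself against as the rotation goes on.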

A good way to flunk your rotation is to not have any way to estimate whether or not you’re on schedule to finish on time. If you don’t want to flunk your rotation: make a realistic schedule that lists everything that you have to do, and by when each step needs to be finished. (See the sidebar for one way to do this.) Go back to your timeline frequently, and make sure that you’re on track to finish by the due date. If you’re not on track: figure out what you need to do differently to get back on schedule. If you are on track: great! Part of the beauty of working out your timeline early is that you find out quickly if you’re falling behind, but to my mind, the real beauty of working out your timeline is that if you see that you’re on schedule, you have a license not to be anxious. No point in sweating if you’re on track to finish on time at the moment. Schedules can be anxiety-inducing if you fall behind, but that’s OK–if you’re falling behind, you want to figure that out now, not a month from now. The thing is, schedules can also be reassuring–if you know that you’re not behind, then there is no reason at all to lie awake at night worrying.

Don’t establish immediately that there’s data available on which to test your system

This is the number-one informatics-specific rookie mistake. (The being-afraid-to-ask-questions thing is an indiscriminate killer of everyone.) Suppose that your rotation project is to build a system that whacks moles. You’re going to want to demonstrate that it does, in fact, whack moles: if you can’t actually get your hands on any moles, you’re going to be asking the faculty to just take your word for it that this would be a really, really great mole-whacker, and that’s not likely to happen. If you find out two weeks before your rotation ends/your conference submission deadline/your grant submission deadline that there’s no data available with which to test your interesting hypothesis, it’s probably game over–come back next semester/year/shift in national scientific priorities and try again. On the other hand, if you realize very quickly that there’s this interesting hypothesis but no existing data with which to test it, and then you propose a way to create the data and an associated evaluation methodology, that’s an excellent approach to doing a rotation/writing a paper/getting a grant. You can use the data to test your hypothesis in the next rotation/paper/grant proposal, and you’ll be the first one to do so (important in academia), ’cause there was never any data around that would have let anyone do the experiment before.

Neil Sarkar, the Founding Director of Brown University’s Brown Center for Biomedical Informatics, makes a related point that is crucial for people doing rotations in biomedical informatics: “One thing to also consider is importance of knowing when an Institutional Review Board protocol must be filed… And not trying to evade the process of getting Institutional Review Board approval…” It’s important to think about this up front, and if you need this kind of institutional approval, you want to ask for it early, because these things can take an amazing amount of time just to prepare the request, and then you have to wait through the approval process, too.

An aside: I’m guessing that all of you non-informatics people out there are thinking that I’m just making things up with this whole issue of mole availability or lack thereof–click here for the search page of Jackson Labs, which exists in large part to connect researchers with mice that have very specific genetic characteristics needed for an incredible variety of experimental investigations. You say that you need some Chinese hamster ovary cells? I ask: what kind? Click here for the CHO-K1 line. 575 euros. They’re super-important in research on therapeutic recombinant proteins. You say you’re a surgeon who does kidney transplants, and you want to do a better job of getting kidneys to survive between when you take them out of the recently-departed and put them into the recipient? You need to understand metabolism at low temperatures. You say you want to understand metabolism at low temperatures? You need to understand hibernation. You say you want to understand hibernation? You need a lab full of arctic ground squirrels. How does a surgeon who does kidney transplants get their hands on a bunch of arctic ground squirrels? Go to the Arctic Circle (during the summer, obviously, ’cause they hibernate in the winter) with a bunch of carrots–see here for an article about how fun this is (warning: graphic picture of an arctic ground squirrel on an anesthesia machine), here for how to figure out where to put your traps, and here for details on things like the trade-offs associated with large traps versus small traps, the relative effectiveness of selective site trapping versus grid trapping, how to use a girth hitch sling to allow a single person to handle an arctic ground squirrel alone, and some stuff about toe amputation that we don’t need to go into. This undoubtedly sounds like a lot of work, and it is. 
It could be worse, though–if what your research requires is woodchucks (useful for the study of a particular kind of liver cancer called hepadnavirus-associated hepatocellular carcinoma), you may have to raise them in the lab yourself. This is a huge big deal if you’ve got a deadline, because they only breed in March and April, and then they’re pregnant for a month, and then they don’t actually have very large litters after all of that. Now, if you’re reading this, you are probably studying some form of informatics, and thinking: this guy’s full of shit–I don’t need no stinking woodchucks. But, keep in mind that the CRAFT corpus took over three years to build, and PropBank has been growing for well over a decade. Data is precious, and sometimes it’s expensive, and it’s not always there when you need it–unlike Chinese hamster ovary cells, it’s often not possible to just go to a web site and buy what you need. So, if you don’t want to find yourself doing the informatics equivalent of scooping the woodchuck litter boxes while the rest of your classmates are giving triumphant rotation talks, the question of availability of data for testing your system has to be the very first thing that you resolve after you walk out of your new rotation supervisor’s office to go sit in your carrel with a warm feeling in your heart and visions of an endowed professorship at Stanford. Let me repeat the word “available”–the fact that your medical school has 10 petabytes of electronic health records with all of the data that you need in them does you no good whatsoever if you can’t get access to them.

Don’t establish scoring criteria up front

You want to have a conversation with your rotation supervisor very early in the process about what will constitute success. Suppose that your project is to build a system that whacks moles. What does it mean to have built a system that whacks moles? Does it have to be a successful system, or can it just exist? If it has to be successful: what does “successful” mean? Does it have to kill the moles, or is it OK to just tap them on the head? Maybe it’s actually preferable to just tap them on the head? If you don’t ask, you won’t know. Does it have to whack every mole, or is it OK if it focusses on whacking the moles that smell bad? If it whacks one mole one time, does that satisfy the requirements of the mole-whacking-system-building project, or does it need to continue whacking moles unto eternity, and if so, what are the requirements regarding the ability of the system to continue whacking moles when the zombie apocalypse comes and there is no more electricity? If it misses 1 mole out of 10, would that still constitute mole-whacking? What about if it misses 5 moles out of 10? 
Suppose that what’s really wanted is a system that whacks every mole, every time, exactly on the top of the head, with uniformly fatal results, all the way through the zombie apocalypse until the spirit of cooperation, mutual assistance, and recognition that we are all connected in a web of interdependence restores humanity to its rightful zombie-free position on the planet. Suppose, too, that your system is only catching 50% of the moles, that sometimes it punches them in the stomach instead of whacking them on the head, and that you don’t really have a good plan for the whole what-happens-when-there’s-no-more-electricity thing–but, in the process of building the system, you’ve come across a really novel approach to thinking about mole-whacking that is likely to yield real insight into the nature of moles, the nature of whacking, and how to think about speciesist violence in terms of a general framework with applicability to subterranean mammals as a whole, and possibly also some of the smaller lizards–just not until a couple months after your project is over and grades are submitted. Is that a success or a failure? You want to know the answer before you start. This might seem persnickety, but I have most definitely seen the situation where the student (or software engineer, or grant writer, or whatever) thought that they were supposed to be whacking moles in the sense of small fossorial mammals, but what their rotation supervisor was looking for was a system that whacks moles in the sense of a spy who has integrated themself into an organization, and those situations most definitely did not end in a way that led to the student feeling happy. (See above for how you can use fear of asking questions about things like this to increase the chances of flunking your rotation.)

A pithier version of the preceding, very long paragraph: the great suicidologist Ed Shneidman used to say that “the most dangerous four-letter word in the English language is only.” (If you’re not a native speaker of English: a “four-letter word” is an idiom meaning a curse word–fuck, shit, piss, etc.) The biggest warning sign of an impending rotation failure (or failed comprehensive exam, or missed grant deadline, or whatever) is the word “something” in your topic. If your description of your topic is “I’m going to do something with mole-whacking/semantic role labelling/protein structure prediction,” then you still have major gaps in your conception of the project, and you have no idea what will constitute success–or a failing grade, either. Seriously: it sounds simplistic, but the presence of “something” is a strong diagnostic.

Spend a lot of time obsessing about minor details early in the process

Have you been tasked with building a mole-whacker? Put a lot of time into thinking about moles with bad breath, moles with nice breath, and moles that would be really cute if only they did something about their taste in Restoration essayists. Are you going to build a system that does deep analysis of subtle differences between different kinds of change-of-state verbs? Spend a lot of time thinking about how you’re going to detect the ends of sentences. (If you’re not a language processing person: getting a computer program to recognize the ends of sentences is a lot harder than you might be thinking. But, it’s not super-crucial to the bigger problem of deep analysis of subtle differences between different kinds of change-of-state verbs.) If there’s one thing that I’ve learnt from spending a lot of time around French people, it’s that minor details are important. But, you need to have the big picture in your mind all the time, and if you have a 10-week rotation and you spend two weeks of that time thinking about how to do a perfect job of finding the ends of sentences, then you have reduced your chances of successfully completing your project quite a bit, unless it’s about improving the ability of computer programs to find the ends of sentences. (If you’re not a language processing person and you think that I’m just making this shit up: click here for a paper on the role of finding the ends of sentences in the task of finding bacteria habitats, or here for a paper on event-related potentials as they relate to prospective and retrospective processes at sentence boundaries, or here for a paper on why you need a support vector machine with a linear kernel (or so the authors claim) to tell the difference between a period at the end of an abbreviation and a period at the end of a sentence in clinical documents (health records).)
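If you doubt that finding the ends of sentences is hard, here is a small sketch of the obvious approach going wrong. The function naive_split and the clinical-sounding example sentence are invented for illustration; this is the sort of quick first attempt anyone might write, not the method of any of the papers mentioned above.

```python
import re

def naive_split(text):
    """Split after '.', '!' or '?' whenever whitespace follows.

    This is the obvious first attempt, and it treats every
    abbreviation-final period as the end of a sentence.
    """
    return re.split(r"(?<=[.!?])\s+", text.strip())

# Two sentences to a human reader; invented example text.
text = "The pt. was seen by Dr. Smith at 3 p.m. today. He was discharged."
sentences = naive_split(text)

for s in sentences:
    print(repr(s))
print(len(sentences), "pieces found; a human reader would say 2")
```

The naive splitter breaks after "pt.", "Dr.", and "p.m." as well as at the real boundaries. Fixing that is a genuine problem, and that is exactly why it can eat two weeks of a 10-week rotation if you let it.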

Don’t differentiate between aspects of the approach that do and don’t test your hypothesis

By now you might accept that it’s important not to spend a lot of time obsessing about minor details early in the process. But: how do you know what makes something a “minor detail”? Minor details are things that have very little to do with actually testing your hypothesis. Now, you’re thinking: I’ve discussed what counts as success with my rotation supervisor, and we reached the consensus that analyzing subtle details of different kinds of change-of-state verbs means reaching an F-measure of 0.80 on the Semantics Evaluation Conference Official Subtly Different Change-Of-State Verb Test Set. What if I pick the wrong find-the-ends-of-sentences system, and that reduces my performance to 0.79, when it could have been 0.81 if only I’d picked the right find-the-ends-of-sentences system? In that case, I would suggest that you renegotiate what you’re doing with your rotation supervisor. The question with which you would start the conversation: what’s interesting about getting an F-measure of 0.80 versus 0.79? How would that change our knowledge of the world, or software for analyzing subtle differences in the various and sundry kinds of change-of-state verbs, or moles, or whatever? Can we frame the project in terms of a question of some sort that might have broader implications for how one might approach this kind of task in the future, such that my career doesn’t succeed or fail on the basis of whether or not I’m good at finding the ends of sentences?
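To see how little separates 0.80 from 0.79, it can help to compute an F-measure by hand. The sketch below uses the standard definition (the weighted harmonic mean of precision and recall); the counts are invented for illustration and do not come from any real test set, and the function name f_measure is my own.

```python
def f_measure(tp, fp, fn, beta=1.0):
    """F-measure from counts of true positives, false positives, and false negatives.

    With beta=1.0 this is the balanced F-measure, the harmonic mean
    of precision and recall.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Hypothetical test set with 100 gold-standard items and 100 system outputs.
target = f_measure(tp=80, fp=20, fn=20)
actual = f_measure(tp=79, fp=21, fn=21)

print(f"target F = {target:.2f}, actual F = {actual:.2f}")
```

With these made-up counts, a single item flipping from right to wrong is the whole difference between the two scores, which is a good reason to ask your supervisor what question the number is supposed to answer.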

Don’t have a hypothesis

If you would like to flunk your rotation, it’s helpful to not have a hypothesis. If you don’t have a hypothesis, then you’re less likely to know whether or not you’ve tested anything, which means that neither you nor the faculty who will be grading your rotation project will know whether or not you finished your rotation project. That’s not a guaranteed way to flunk your rotation–you’ll leave the faculty in the position of guessing whether or not you finished it, and maybe they’ll guess that you did–but, it’s a pretty good one.

Don’t know why you’re doing your project

On some level, you always know why you’re doing your project–you’re doing it because your advisor thinks that it would be a good idea. But, why? Let’s step back a bit. Suppose that you have a hypothesis in hand. From a practical perspective, you care about knowing why you’re investigating that particular hypothesis out of a universe of possible hypotheses because if you know why you’re investigating that particular hypothesis, you’re more likely to do a good job of investigating it, or so I assert. Some reasons that I assert that: we discussed above the importance of being able to differentiate between things that take up a lot of time but don’t actually test the hypothesis and things that do contribute to testing the hypothesis. In fact, if you know why you’re testing the hypothesis, then you might realize (hopefully early in the process) that your specific hypothesis isn’t actually going to contribute very much to achieving whatever it is that was your rotation advisor’s motivation for suggesting the project in the first place. That’s the practical reason. There’s a more general reason, too: you’re a graduate student. You want to get a graduate degree. In most fields, we give people graduate degrees when they have contributed some significant piece of knowledge to the stock of what we know. You can certainly contribute pieces of knowledge to the stock of what we know without having any kind of broader conceptual framework (say, a theory) for understanding why those pieces of knowledge would be relevant to someone somewhere, but it’s harder to contribute a significant piece of knowledge to what we know without some kind of broader conceptual framework. 
It’s that broader conceptual framework that establishes the context that defines your piece of knowledge as significant or not; your piece of knowledge consists, in some sense, of whether or not your results are consistent with your hypothesis; your hypothesis is more likely to be a useful hypothesis if you know why you’re evaluating it. There has been far more written about what makes a hypothesis a useful hypothesis (or not) than I will ever understand before I retire, but it’s worth your while to check out at least some of it. You can find relevant stuff in epistemology, or in philosophy of science, or in statistics–there’s something for every taste.

The epistemology of flunking rotations: Where I got all of this stuff

Some of this stuff comes from my own experience of flunking things–I left graduate school feeling like I knew a lot more about how to not get a PhD than I did about how to get one. I asked a number of people who teach in graduate programs of computer science, medical informatics, bioinformatics, and linguistics to look at the post, and incorporated their comments. The rest comes from years of watching people flunk rotations, as well as flunk master’s thesis defenses, comprehensive exams, prelims… Also watching people miss deadlines for conference submissions, grant submissions, software releases–and I’ve missed more than one of those myself. Learn from my mistakes–it’s a hell of a lot less painful than learning from your own!