If you don’t know what you’re in for, go here for past advice columns and here for an explanation of the name Pythia. Most importantly,

Please submit your questions for Aunt Pythia at the bottom of this column!

——

Dear Aunt Pythia,

Do you prefer that we ask you fake [sex questions] or [fake sex] questions? From your website it seems that you prefer the former, but would you also be amused by the latter?

Fakin’ Bacon

Dear Fakin,

I can’t tell, because I’ve gotten neither kind (frowny face).

If I started getting a bunch then I could do some data collecting on the subject. If I had to guess I’d go with the latter though.

Bring it on!

Aunt Pythia

——

Aunt Pythia,

Are boredom and intelligence correlated?

Bored

Dear Bored,

It has been my fantasy for quite a few years to be bored. Hasn’t happened. All I can conclude from my own experience is that being a working mother of three, blogger, knitting freak, and activist is not correlated to boredom.

Aunt P

——

Dear Aunt Pythia,

How can I get my husband to pee IN the toilet?

Pee I Shouldn’t See Ever, Dammit

Dear PISSED,

Start by asking him to be in charge of cleaning the bathroom. If that’s insufficient ask him to sit down to pee – turns out men can do that. If he’s unwilling, suggest that you’re going to pee standing up now for women’s lib reasons (whilst he’s still in charge of cleaning the bathroom).

Hope I helped!

Aunt Pythia

——

Dear Aunt Pythia,

How can I get my wife to stop nagging me about peeing in the toilet?

Isaac Peter Freely

Dear I.P. Freely,

Look for a nearby gas station and do your business there. That’ll shut her up.

Auntie P

——

Dear Aunt Pythia,

Is there any reason I should bother knitting a Klein Bottle? Isn’t just knowing I could do it enough? Or would it actually impress (or give pleasure to) others?

Procrastinating Parametricist

Dear PP,

If you’re looking for an excuse to knit a Klein Bottle, find a high school math teacher that would be psyched to use it as an exhibit for their class.

If you’re trying to understand how to rationalize the act of knitting anything ever, give up immediately, it makes no sense. We knitters do it because we love doing it.

Love and kisses,

Aunt Pythia

——

Dear Aunt Pythia,

I’m a Data Scientist (or Business Analytics pretending to be a Data Scientist) and I’m the leader of a small team at the company I work for. We have to analyse data, fit models and so on. I’m struggling right now trying to figure out what’s the best way to manage our analysis.

I’m reading some stuff related to project management, and some stuff related to Scrum. However, at least for now, I don’t think they exactly fit our needs. Scrum seems great for software development, but I’m not so sure it works well for modeling development or statistical prototyping. Do you have any ideas on this? Should I just try scrum anyway?

Typically, most of our projects begin with some loose requirements (we want to understand this and that, or to predict this and that, or to learn the causal effect of this and that). Then, we get some data, spend some time cleaning and aggregating it, then do some descriptive analysis, fit some models, and prepare to present our results. I always have in mind what our results will look like, but there is always something I didn’t expect that intervenes.

Say I’m calculating the size of the control group and then I realize my variables of interest aren’t normally distributed, so I have to adapt the way we compute the control group’s sample size. Then either I do a rough calculation based on an assumption of normality, or we study and adopt new ways to better approximate our data (say, using a lognormal distribution). Either way, I’ll probably delay our results or deliver results of inferior quality.

So, my question is, do you know of any software or methodology to use with data science or data analysis in the same ways as there is Scrum for software development?

Brazilian (fake?) Data Scientist

Dear B(f)DS,

I agree with you, data projects aren’t the same kettle of fish as engineering projects. By their very nature they take whimsical turns that can’t be drawn up beforehand.

Even so, I think forcing oneself to break down the steps of a data project can be useful, and for that reason I like using project management tools when I do data projects – not that it will give me a perfect estimate of time til completion, which it won’t, but it will give me a sense of trajectory of the project.

It helps, for example, if I say something like, “I’ll try to be done with exploratory data analysis by the end of the second day.” Otherwise I might just keep doing that without really getting much in return, but if I know I only have two days to squeeze out the juice, I’ll be more thoughtful and targeted in my work.

The other thing about using those tools is that upper-level managers love them. I think they love them so much that it’s worth using them even knowing they will be inaccurate in terms of time, because it makes people feel more in control. And actually being inaccurate doesn’t mean they’re meaningless – there’s more information in those estimates than in nothing.

Finally, one last thing that’s super useful about those tools is that, if your data team is being overloaded with work, you can use the tool to push back. So if someone is giving you a new project, you can point to all the other projects you already have and say, “these are all the projects that won’t be getting done if I take this one on.” Make the tool work for you!

To sum up, I say you try Scrum. After a few projects you can start doing a data analysis on Scrum, estimating how much of a time fudge factor you should add to each estimate due to unforeseen data issues.
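For what it’s worth, here is one rough way that fudge factor could be estimated, as a sketch only. The task history below is made up for illustration, and taking the median ratio of actual to estimated time is just one reasonable choice (it is less thrown off by a single disastrous task than the mean would be), not an established Scrum practice.

```python
from statistics import median

# Hypothetical history of finished tasks: (estimated_days, actual_days).
# These numbers are invented purely for illustration.
history = [(2, 3), (1, 1), (2, 4), (5, 6), (3, 6)]


def fudge_factor(history):
    """Median ratio of actual time to estimated time over past tasks."""
    ratios = [actual / estimated for estimated, actual in history]
    return median(ratios)


def padded_estimate(raw_days, history):
    """Scale a fresh estimate by the fudge factor learned from history."""
    return raw_days * fudge_factor(history)


if __name__ == "__main__":
    print("fudge factor:", fudge_factor(history))
    print("padded estimate for a 4-day task:", padded_estimate(4, history))
```

With only a handful of past tasks the factor will be noisy, so treat it as a sanity check on your gut feeling and not as a forecast.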

I hope that’s helpful,

Aunt Pythia

——

Please submit your question to Aunt Pythia!


There’s no reason dudes have to pee standing up when it’s done in someone’s home. I have no idea when and where guys’ masculinity somehow became attached to this silly and immature practice. Larry David did a funny bit on the subject in one of his Curb episodes.

When I was about ten years old I accidentally walked in on my grandfather peeing in his bathroom (thankfully my one and only “walked in on” experience growing up). He was sitting down backwards, facing the tank. Not sure about the backwards part, but I later realized he probably sat down for no other reason than it keeps the bathroom floor clean and dry. Seemed like a good enough reason to me, so I adopted the practice decades ago (in people’s homes anyway).

mathematrucker

May 25, 2013 at 11:14 am

It just occurred to me that maybe sitting backwards represented a compromise for my grandfather: he was willing to sit down, but not “like a girl”.

KW

May 25, 2013 at 12:37 pm

I’ve found that scrum can be rather useful as long as you can control the length of the sprints and you don’t try to use it to plan too far out. For instance, maybe the first sprint is one week, and it consists of the tasks

* “Understand how the data is laid out (1d)” with a deliverable of a repo filled with i/o code
* “Exploratory analysis (2d)” where the ticket has a bunch of suggested metrics and visualizations to make with a small buffer for “and others” and the deliverable is a bunch of visualizations (I keep track of this sort of work in a running Google doc shared with my team with references to git commits)
* “Preliminary model development (2d)” whose deliverable is a document which outlines the math and models I’m going to try first and relies on the visualizations I conveniently made the few days before in the previous document.

Now that “Preliminary model development” gives you a whole host of tickets to try out. You know about how long it will take you to code up an HMM or run a regression or train some neural net or whatever. The deliverable in each case is code in a repo and entries in a running document of your progress. Then just reserve a day in your sprints for “Taking stock” or something like that, and then by the end of the week you’ll know exactly how to make a new sprint.

When you’re planning more long term, I wouldn’t try until you’ve completed the first sprint and have made that preliminary model development document. You should be able to build an “if all goes well” game plan from it. And then you just pad your long term planning with a couple weeks of “it never goes well” time. If your manager insists on the next eight one-week sprints being planned out explicitly for a data project, she’s crazy and will be disappointed no matter what. So use stories with themes like “model coding”, “model comparison”, “documentation”, “training”, and what not to describe those sprints. That’s about as specific as it can get, I think, but it does help to keep you on track.

Sounds like a plan to me. One of the implied assumptions of Scrum, it seems to me, is that a complex project can be (at least in the short term or in the regime of small variation) broken up into sub-projects that are meaningful to the complex project. This is an assumption of linearity, which may be useful in the small, but must be carefully tested in the large (such as in the interaction of the completed sub-projects in the ongoing evolution of the complex project).

A simplified example of this process might be, say, a complex project that, unbeknownst to the Scrum team, is described by a string of factors of a variable x, including the expression sin(x). A sprint could be done that found, in the small, a proportionality to x. To deliver this sub-project result as an isolated, positivist product component that is prime-time-ready for the stakeholder or customer would be a complete mistake, because x is like sin(x) only in a severely limited linearized regime.
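To put rough numbers on how limited that regime is, here is a small sketch that compares x with sin(x) at a few sample points. The sample points and the helper name linear_error are arbitrary choices made for this illustration.

```python
import math


def linear_error(x):
    """Relative error of approximating sin(x) by x (x must not be a zero of sin)."""
    return abs(x - math.sin(x)) / abs(math.sin(x))


if __name__ == "__main__":
    # Sample points chosen arbitrarily, from "in the small" to "in the large".
    for x in (0.1, 0.5, 1.0, 2.0, 3.0):
        print(f"x = {x:.1f}  sin(x) = {math.sin(x):.4f}  "
              f"relative error = {linear_error(x):.1%}")
```

Near zero the linear guess is off by a fraction of a percent, while by x = 3 it overshoots many times over, which is the gap a sprint "in the small" would never see.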