Author: Anders S

Why are 76% of all math PhDs awarded to men? One major reason, according to Stanford mathematics education professor Jo Boaler, is the way math is taught. As she puts it:

At Stanford University, I teach some of the country’s highest achievers. But when they enter fast-paced lecture halls, even those who were successful in high school mathematics start to think they’re not good enough. One of my undergraduates described the panic she felt when trying to keep pace with a professor: “The material felt like it was flying over my head,” she wrote. “It was like I was watching a lecture at 2x or 3x speed and there was no way to pause or replay it.” She described her fear of failure as “crippling.” This student questioned her intelligence and started to rethink whether she belonged in the field of math at all.

Research tells us that lecturers typically speak at between 100 and 125 words a minute, but students can take note of only about 20 words a minute, often leaving them feeling frustrated and defeated.
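A quick back-of-the-envelope calculation makes that gap concrete. This sketch assumes, as a simplification, that students are trying to write down what the lecturer says word for word; the rates are the ones quoted above, and the function name is just illustrative.

```python
# Rates quoted above, in words per minute. Assumes verbatim note-taking,
# which is a simplification: real notes paraphrase and abbreviate.
LECTURE_RATES_WPM = (100, 125)
NOTE_RATE_WPM = 20


def share_captured(speaking_rate, note_rate=NOTE_RATE_WPM):
    """Fraction of spoken words a student could write down verbatim."""
    return note_rate / speaking_rate


for rate in LECTURE_RATES_WPM:
    print(f"{rate} wpm lecture: about {share_captured(rate):.0%} of words captured")
```

Under that assumption, a student keeps up with only about 16% to 20% of what is said, roughly one word in five or six.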

This style of teaching doesn’t work for lots of people — one college math class was enough to turn me off from math. But it hits women and people of color especially hard.

When students struggle in speed-driven math classes, they often believe the problem lies within themselves, not realizing that fast-paced lecturing is a faulty teaching method. The students most likely to internalize the problem are women and students of color.

But there’s no reason math has to be taught the way it currently is. Boaler recently ran a math teaching experiment with impressive results:

In a recent summer camp with 81 middle school students, we taught mathematics through open, creative lessons to demonstrate how mathematics is about thinking deeply, rather than calculating quickly. After 18 lessons, the students improved their mathematics achievement on standardized tests by an average of 50%, the equivalent of 1.6 years of school.

What’s true in math is even more true in data science: if we want more people to use data science, we need to take a hard look at how it’s taught.

Yesterday, Lea Verou, author of the fabulous book CSS Secrets, announced the launch of Mavo.

Mavo helps you turn your static HTML into reactive web applications without a single line of programming code and no server backend.

Although Mavo is a tool for creating websites and web apps, I think it’s also got a lot to teach data science.

Just as there are a bunch of data science tools that let you quickly take care of business via an easy-to-use UI, there are hundreds of drag-and-drop tools for easily designing websites and web apps. And just like easy-to-use data science tools, these web building tools are “easy to use” right up until you want to make something that’s even a little different from what the tool’s creators originally envisioned. As a Mavo research paper puts it,

research indicates that there are high levels of dissatisfaction with [Content Management Systems (CMSs) for building websites]. One reason is that CMSs impose narrow constraints on authors in terms of possible presentation – far narrower than when editing a standalone HTML and CSS document.

What happens when you need to move beyond these narrow constraints? The same thing that happens with data science: a heck of a lot of blood, sweat, and tears.

It is indicative that even implementing a simple to-do application similar to the one in Figure 1 needs 294 lines of JavaScript (not including comments) with AngularJS, 246 with Polymer, 297 with Backbone.js, and 421 with React. Other JavaScript frameworks are in the same ballpark.

Yuck!

Mavo overcomes this problem by extending HTML so you can do an awful lot with just a few lines of simple code. For example, Mavo has a wonderful system for letting you store data in the browser, in GitHub, or on Dropbox just by adding a little HTML, and you can easily edit that data with an auto-generated, customizable UI.

Similarly, you can create a slider, store its value in a variable, and display the result with just two simple lines of HTML:

Slider value: [strength]/100

Want to display the slider value as a percentage? It’s easy to add a calculation:

Slider value: [strength/100]

At the same time, because it’s built in HTML, you’ve got a lot of control over how it looks; just change the HTML and CSS and you’re good to go.
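The bracket syntax above follows a general pattern: find each [...] in the text, evaluate the expression inside against the current data, and re-render whenever the data changes. Here is a toy Python sketch of that pattern. It is not how Mavo is implemented (Mavo runs in the browser and does far more); the render function and its arithmetic-only evaluator are hypothetical, written only to illustrate the idea.

```python
import ast
import operator
import re

# Arithmetic operators this toy evaluator accepts.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def _eval(node, data):
    """Evaluate a tiny arithmetic expression tree against a dict of properties."""
    if isinstance(node, ast.Expression):
        return _eval(node.body, data)
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.Name):
        return data[node.id]  # look up a named property, e.g. "strength"
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_eval(node.left, data), _eval(node.right, data))
    raise ValueError(f"unsupported expression: {ast.dump(node)}")


def render(template, data):
    """Replace every [expression] in the template with its current value."""
    def substitute(match):
        tree = ast.parse(match.group(1), mode="eval")
        return str(_eval(tree, data))
    return re.sub(r"\[([^\[\]]+)\]", substitute, template)


data = {"strength": 42}
print(render("Slider value: [strength]/100", data))
print(render("Slider value: [strength/100]", data))
data["strength"] = 75  # the "slider" moves; rendering again picks up the change
print(render("Slider value: [strength]/100", data))
```

The point of the sketch is the division of labor: the author writes only the template, and the tool takes care of recomputing the output from the data.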

What’s particularly nice about Mavo is that if you outgrow what’s built into it, you can switch to full-blown JavaScript.

MavoScript is based on a subset of JavaScript, with a few simplifications to make it easy to use even by people with no JavaScript knowledge. If a Mavo expression is more advanced than MavoScript, it is automatically parsed as JavaScript, allowing experienced web developers to do more with Mavo expressions.
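That fallback is a general design pattern: try a small, beginner-friendly language first, and hand anything it can't handle to the full host language. Here is a rough Python analogue, assuming a "simple" tier that accepts only names, numbers, and + - * /, with Python's own eval as the "full" tier. The function names and the whitelist of helpers are hypothetical; this illustrates the pattern, not Mavo's actual parser.

```python
import ast
import operator

# Operators the beginner-friendly "simple" tier understands.
SIMPLE_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


class NotSimple(Exception):
    """Raised when an expression uses something outside the simple subset."""


def simple_eval(node, data):
    """Evaluate names, numbers and + - * / only; reject everything else."""
    if isinstance(node, ast.Expression):
        return simple_eval(node.body, data)
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.Name) and node.id in data:
        return data[node.id]
    if isinstance(node, ast.BinOp) and type(node.op) in SIMPLE_OPS:
        left = simple_eval(node.left, data)
        right = simple_eval(node.right, data)
        return SIMPLE_OPS[type(node.op)](left, right)
    raise NotSimple(type(node).__name__)


def evaluate(expr, data):
    """Try the simple tier first; fall back to full Python if it refuses.

    Returns (value, tier) so you can see which tier handled the expression.
    """
    tree = ast.parse(expr, mode="eval")
    try:
        return simple_eval(tree, data), "simple"
    except NotSimple:
        # Full tier: Python's eval with a few whitelisted helpers.
        # Emptying __builtins__ is NOT a security sandbox; never run
        # untrusted input this way.
        env = {"__builtins__": {}, "max": max, "min": min, "round": round}
        code = compile(tree, "<expr>", "eval")
        return eval(code, env, dict(data)), "full"


print(evaluate("strength / 100", {"strength": 42}))
print(evaluate("round(strength / 3, 1)", {"strength": 42}))
```

The appeal of the split is that beginners never have to see the full language, while experienced users are never boxed in by the simple one.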

Similarly, Mavo was “designed for extensibility from the ground up,” allowing you to add plug-ins to extend what users can do using HTML.

OK, you say, but still, you’re working in HTML. That’s going to turn off a lot of folks, right?

Mavo says no — and they have peer-reviewed research to back up that claim. They did a usability study with 20 users, and they discovered that

Even users with no programming experience were able to quickly craft Mavo applications.

It’s worth pausing for a moment to acknowledge what the Mavo crew did. Rather than assuming Mavo is easy for beginners just because they (and the people who chose to check out their work) like it, they ran a study to find out. Considering that research has shown that some widely used programming languages are no easier for beginners to understand than a randomly generated one, that’s a really big step.

Obviously, if all you ever want to do is build really simple websites, a drag-and-drop tool is going to be hard to beat. But what Mavo has shown is that it’s possible to create a tool that gives ordinary users an awful lot of room to grow without getting clobbered by a very steep learning curve. Pandas, R, D3, and the rest of data science could learn a lot from this accomplishment.

You’ve started using a data science tool that’s supposed to “empower users.” And for some features, that’s true; it’s really easy to get some things done. But as soon as you need to take one step beyond those features — which almost always happens — it’s bang-head-against-wall time.

But that’s OK. You’re a data analyst. You know the drill. Spend enough time with a tool and eventually you’ll get it. In a few months, the weirdnesses will be second nature to you.

But there’s a big if: that’s only true if you spend a lot of your time working as a data analyst.

That’s not true for a lot of people who need to crunch data. They probably have a few weekly/monthly reports or analyses that are critical to their work. But they only tweak these reports once or twice a year.

Two to three times a year, they do get to spend more time on analysis. For example, at the beginning of the year they may set up some reports to track their team or department’s new goals, and they analyze the results at the end of the year. They may also have a quarterly report they tweak every once in a while. But they’re not spending time every week or even every month immersed in the tool.

And that is going to bite them on the ass. Even if initially they can carve out enough time to figure out the bizarre commands needed to get something done, six months from now will they remember what they did and why? Not likely.

It’s not that these analysts don’t want to spend more time crunching data. They can see the potential of what they could do with the data they have if they only could spare the time. But it’s only a small slice of the work on their plate.

Ironically, if it were quicker and easier to do some work in data science, they might be able to muck around more often. Right now, it’s just not worth the hassle, given all the other work on a typical part-time data analyst’s plate.

As AI technology improves, there will be even more part-time analysts struggling with this challenge. IBM, Microsoft, and hundreds of startups are trying to figure out how to automate as much as possible of the work involved in using machine learning and other complex techniques. The closer they get to putting these techniques in the hands of Excel power users, the more likely it is that the world of data science will include lots of people who flex their data science muscles only occasionally.

Most of data science is built around the implicit assumption that the people who do it will be working on it either full time or for a large share of their time. That assumption is understandable: in the world of coding, it’s largely true. But for data science to reach its full potential, it’s going to need to embrace users who don’t or can’t spend anywhere near that kind of time.

Boot camps have become an increasingly popular way for folks in the community to get started in Data Science. It’s easy to see why. Data Science can be pretty overwhelming at first, so getting a concentrated dose with lots of support can be invaluable.

I have a tremendous amount of respect for the people who make data science boot camps happen, and they have made a huge difference in the lives of some of the people who have gone through them. But I think we are at the point where we are hitting the limits of boot camps.

First, most boot camps cost more money than many people can afford. A number of programs aimed at increasing the diversity of Data Science offer scholarships for some or all of their participants. But given how much boot camps cost to run, they can only reach a limited number of people. As a model, it just doesn’t scale – and given how many data science jobs there are out there, that’s a serious problem.

Similarly, most boot camps take far more time than many people can afford. Again, boot camps that try to increase the diversity of data science work very hard to help folks overcome this barrier. But for single working parents and many other people, a model built on one very concentrated dose of learning over several months just isn’t going to work.

Finally, most boot camps simply can’t afford to provide real support once the boot camp is over. That’s an issue for a lot of folks who go through them, because no matter how dedicated the instructors are, most people can only absorb so much information at one time. That’s a problem even if you’re just learning one programming language or skill. And when it comes to retaining even a basic mastery of the array of skills many data science jobs require, boot camps don’t offer a good answer.

So in addition to boot camps, we need another approach that can scale up. That’s why Data Chefs argues for creating a continuum of tools and smoothing the learning curve between them. Part of the reason we need boot camps is that learning these tools is way too hard. Many of these tools are open source, and the companies behind the ones that aren’t are very interested in growing their markets. There is no reason a movement couldn’t change the trajectory of these tools to make it far easier to get started and far easier to make progress.

Similarly, there’s no reason we couldn’t create a more robust, community-centered ecosystem around learning and using these tools so a much wider range of folks could get exposed to them, get their feet wet, and begin to make progress at a pace that their lives could handle.

But won’t this take a lot of work? Yes, it will. But so do boot camps.

Boot camps require a staggering amount of time and energy – one of the many reasons I have so much respect for the people who make them happen. For all the time and energy that go into boot camps, they can only reach a limited number of people. And for the most part, each boot camp – or school of boot camps – is an island unto itself. As a result, they never get the payoff of having many people across many communities working together towards a common goal.

So maybe it’s time to think about taking some of the considerable energy going into boot camps right now and using it to build a solution that can reach a lot more people.

One of Data Chefs’ core assumptions is that there’s no reason data science can’t be accessible to a much wider audience. Some people think that’s crazy. Slicing and dicing data, making sense of data – it’s just too complicated for anyone other than an expert.

Back in the early 1960s, that’s exactly how most folks thought about medicine. Nancy Miriam Hawley recalls an encounter she had with her OB/GYN:

Imagine me as a 23 year old professional young woman asking a question after the doctor (he) recommended that I use a new-to-market pill for birth control. What’s in this pill? I ask. His response: condescending pat on my head and literally said “don’t worry your pretty little head!”

Minus the head pat, that was pretty much the standard answer doctors were expected to give. They had years and years of intensive training. How could anyone — let alone a woman — be expected to have any real say in their treatment given that they couldn’t possibly understand medicine?

In 1969, Hawley and several other women who had met at a women’s conference decided it was time for a change.

We had all experienced similar feelings of frustration and anger toward specific doctors and the medical maze in general, and initially we wanted to do something about those doctors who were condescending, paternalistic, judgmental and noninformative. As we talked and shared our experiences with one another, we realized just how much we had to learn about our bodies. So we decided on a summer project: to research those topics which we felt were particularly pertinent to learning about our bodies, to discuss in the group what we had learned, then to write papers individually or in groups of two or three, and finally to present the results in the fall as a course for women on women and their bodies.

As we developed the course we realized more and more that we really were capable of collecting, understanding, and evaluating medical information. Together we evaluated our reading of books and journals, our talks with doctors and friends who were medical students. We found we could discuss, question, and argue with each other in a new spirit of cooperation rather than competition. We were equally struck by how important it was for us to be able to open up with one another and share our feelings about our bodies. The process of talking was as crucial as the facts themselves. Over time the facts and feelings melted together in ways that touched us very deeply, and that is reflected in the changing titles of the course and then the book, from “Women and Their Bodies” to “Women and Our Bodies” to, finally, “Our Bodies, Ourselves.”

Today, the idea that we couldn’t understand enough about medicine to have an informed opinion seems about as antiquated as using leeches. In fact, these days you can even get a degree in the art and science of making medical information accessible to the public.

And as complex as data science is, it’s not in the same league as medicine. To understand the human body, you need to understand biology, physics, chemistry, psychology, statistics, etc. In fact, medicine is so complex that even someone with years and years of training in one medical specialty isn’t qualified to have an expert opinion about another specialty.

So the next time someone talking about data science does the equivalent of patting you on the head, remember that the only reason they can get away with that crap is that we are just at the beginning of a movement that’s committed to doing in data science what those women did “about those doctors who were condescending, paternalistic, judgmental and noninformative.”

Sarah Drasner is an expert in the arcane, super geeky world of Scalable Vector Graphics (SVG) animation, one of the main ways to do really cool interactive work, like data viz, on the web. Parts of SVG animation can be so mind-numbingly painful that daytime drinking under your desk starts to seem like a very reasonable response. Drasner’s book, SVG Animations, published by O’Reilly, is hands down the best book on this subject. And yet in 2017, she still has to put up with crap like this:

When we first started Data Chefs, we thought we were going to focus on making it a lot easier to clean up and slice & dice data. Many folks in “data science” will tell you that they spend the vast majority of their time prepping their data before they can analyze it, so if we could make this easier for folks in the community, it would be a real win.

After banging our heads against various doors, we realized we were trapped in a chicken-or-egg problem: without something concrete to show folks, the idea that they could have any say in how easy or hard the tools for wrangling data were seemed overwhelming, and without community folks to figure out what “easier” looked like, we wouldn’t be able to convince data geeks to build easier-to-use tools.

So over the past six months, we’ve gradually been shifting our focus. You’re going to start seeing a lot more about data visualization on this blog. That’s because even if people are scared of or overwhelmed by the idea that they could shape the tools they use, everybody likes shiny objects. So going forward, we’re going to explore what it means to make an organization data visualization-literate, and we’re going to do some D3 experiments to see if we can smooth its learning curve.

That said, we aren’t entirely giving up on data manipulation. Although we weren’t able to put together a community of folks, we did learn a lot about the problems that a Data Chefs approach would have to solve; I’ll blog about it in the next few months.

At the beginning of January, Motherboard’s Michael Byrne argued that if you had a New Year’s resolution to start learning how to code, you should learn to do it “the hard way.”

I learned to program in C at a community college and I wouldn’t have done it any other way….. I was hooked on problem solving and algorithms. This is a thing unique to programming: the immediate, nearly continuous rush of problem solving. To me, programming feels like solving puzzles (or, rather, it is solving puzzles), or a great boss fight in a video game….

The point is to learn programming as it is nakedly, minus as much gunk and fluff that can possibly be removed from the experience.

Let’s put aside for a moment the “hard way” vs. “gunk and fluff” pseudo-macho, my-head-was-shoved-in-a-locker-in-high-school-and-I-haven’t-got-over-it framing used by guys like Byrne. There are an awful lot of people who’d agree this is the best way to teach anyone who wants to learn. But that’s because it appeals to them, not because empirical evidence backs them up. In fact, some evidence points the other way. Here’s how one university described a study by its researchers:

Researchers in the University’s Informatics department asked pupils at a secondary school to design and program their own computer game using a new visual programming language that shows pupils the computer programs they have written in plain English.

Dr Kate Howland and Dr Judith Good found that the girls in the classroom wrote more complex programs in their games than the boys and also learnt more about coding compared to the boys.

Why did girls do so much better? Here’s what Good thinks is happening:

Given that girls’ attainment in literacy is higher than boys across all stages of the primary and secondary school curriculum, it may be that explicitly tying programming to an activity that they tend to do well in leads to a commensurate gain in their programming skills.

In other words, if girls’ stories are typically more complex and well developed, then when creating stories in games, their stories will also require more sophisticated programs in order for their games to work.

And it isn’t just that these girls are more skilled at telling complex stories; they also enjoy doing it.

It’s an important lesson, not only for teaching children but also for teaching adults. If we want to make Data Science more accessible, the first question we need to ask is: where are the audiences we are trying to reach coming from? If we understand what gets someone fired up and what skills they bring to the table, it can go a long way toward unlocking their ability, and just as importantly their desire, to excel in this new field.

UPDATE: I’d also like to point out that there are a decent number of guys like me who, unlike Byrne, think solving abstract puzzles is boring as hell. In my three decades of coding and managing complex software projects, this lack of enthusiasm for abstract puzzles hasn’t been a problem so far.

Aliya Rahman, a former union organizer and former director of Code for Progress, gave a terrific talk called Desegregation in the Age of ‘Innovation’.
If you want to hear a very smart, funny, nuanced, in-the-trenches take on how to use and work in tech to help folks build power for social change, check it out.

One of the reactions I’ve gotten to the argument behind my last post is that it’s unrealistic to think we can smooth data science’s learning curve. When you get beyond very simple point and click, you’ve got to immerse yourself in the dirty details of how statistics, machine learning, etc. work. In other words, we can’t really make data science accessible because the body of knowledge you need to go beyond baby steps is just too large.

When I first ran into this argument, I would reply with stories about the skilled practitioners in the field I’ve worked with who’ve forgotten a lot of what they learned in, say, intro stats – couldn’t perform a chi-square test by hand if their life depended on it – but still produce very powerful, highly influential work. These days my answer is a lot simpler.

Let’s have a show of hands of everyone who has relatives or friends who are amazing cooks. Now keep your hand up if most of those amazing cooks know the chemistry and physics behind what they do. Not a whole lot of hands left up.

It’s not that these amazing cooks don’t have any of the knowledge that’s embodied in chemistry and physics. They know a lot about how to work with boiling water, how you know when something they’ve been frying is done, etc. But the model they have in their head – or “in their fingers” – isn’t the one you get in chemistry class.

I think Data Chefs is going to end up demonstrating that’s also true for data science: you don’t need to be a Data Chemist to bake great data cookies. I don’t have any concrete empirical data to back me up. But neither do the people who are saying it can’t be done. All we know for sure is that that’s not how it’s been taught in the past. And if the data-driven revolution has taught us anything, it’s that we shouldn’t build the foundation of data science training on “but that’s the way it’s always been done.”