the personal blog of Akshay Agrawal

Per my sister’s recommendation, I recently picked up Travels with Charley, Steinbeck’s account[1] of a cross-country road trip he took one summer with his beloved poodle in tow.

Steinbeck’s favorite kind of journey is a meandering one. By his own admission, he’s “going somewhere” but “doesn’t greatly care whether” he arrives[2]. Reflecting upon a leisurely detour through Maine’s potato farms, he writes,

everything in the world must have design or the human mind rejects it. But in addition it must have purpose or the human conscience shies away from it. Maine was my design, potatoes my purpose.

It’s tempting to interrogate whether your pursuits are meaningful, be they hobbies or careers[3]. A degree of such interrogation can be constructive: living with intention necessitates a design and a purpose. But indulge too much and you risk descending into a Hamlet-esque, nihilistic spiral that will inevitably derail your pursuit. The last thing you (and certainly I) want is to end up as Camus’ strawman, the individual who cannot cope with his discovery that life is without meaning. That Steinbeck’s design was Maine and his purpose potatoes is a gentle reminder that our own designs and purposes need not be grand. All that we require of them is to exist.

Footnotes

[1] The introduction to the book’s 50th anniversary edition cautions readers against taking Steinbeck’s story too literally, for he was “a novelist at heart.” But the book reads truthfully enough and, just as important, entertainingly enough. As author and writing instructor John McPhee joked in an interview with The New Yorker’s David Remnick, 94 percent accuracy is good enough for creative non-fiction.

[2] Approaching our actions with such a sentiment is precisely the Bhagavad Gita’s prescription for attaining the Good Life. For that matter, it is also the prescription of Kierkegaard’s Fear and Trembling. Both recommend we resign ourselves to the frustration of our desires, but that we do so happily so that we may pursue them nonetheless. If this sounds difficult to you, you’re not alone; Kierkegaard’s narrator describes this process as something he cannot hope to understand, though he spends the entire text describing it.

[3] Academics at MIT’s Sloan School of Management recently asked 135 people what made their work meaningful. For many, meaningful work is simultaneously “intensely personal” and bigger than themselves.

Say you’ve got a normal random variable with mean zero and variance one. Draw a sample from that distribution. Now take that point you’ve just sampled and spawn a unit-variance normal distribution centered around it; draw a sample from this distribution. Continue this process $N$ times for arbitrary $N$. What is the distribution of the $N$th sample?

Michael told me that he and his friends at Cornell thought up this problem during a lull in a dinner-time conversation. The solution is after the break. A hint: It’s either simpler than it appears or, if you’ve done a bit of math, just as simple as it appears.
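If you’d like to check your answer empirically, a quick simulation does the trick. This sketch just runs the chain many times and inspects the empirical moments of the final draw:

```python
import random

def chained_sample(n, seed=None):
    # Run the chained process: start at 0, then repeatedly draw from a
    # unit-variance normal centered at the previous draw. Returns the nth draw.
    rng = random.Random(seed)
    x = 0.0
    for _ in range(n):
        x = rng.gauss(x, 1.0)
    return x

# Estimate the mean and variance of the 25th sample over many independent runs.
samples = [chained_sample(25, seed=i) for i in range(20000)]
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
```

The empirical mean and variance should strongly hint at the closed form.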

Earlier this summer, I crossed the Atlantic and traveled to Madrid to give a talk at the 8th International Conference on Educational Data Mining. I presented a prototype, built by myself and my colleagues at Stanford, that stages intelligent interventions in the discussion forums of Massive Open Online Courses. Our pipeline, dubbed YouEDU, detects confusion in forum posts and recommends instructional video snippets to their presumably confused authors.

The Educational Data Mining Conference took place in Madrid this year. Pictured above is the Retiro Pond in Buen Retiro Park. It has nothing to do with EDM, but I enjoyed the park, so please enjoy the picture.

No, not that kind of EDM
Educational Data Mining — affectionately collapsed to EDM — might sound opaque. From the society’s website, EDM is the science and practice of

developing methods for exploring the unique and increasingly large-scale data that come from educational settings, and using those methods to better understand students, and the settings which they learn in.

Any educational setting that generates data is a candidate for EDM research. So really any educational setting is a candidate, full stop. In practice, EDM-ers often find themselves focusing their efforts on computer-mediated settings, like tutoring systems, educational games, and MOOCs, perhaps because it’s easy to instrument these systems to leave behind trails of data.

Popular methods applied to these educational settings include student modeling, affect detection, and interventions. Student models attempt to approximate the knowledge that a student possesses about a particular subject, just as a teacher might assess her student, while affect detectors classify the behavior and emotional states of students. Interventions attempt to improve the experience of students at critical times. My own work marries affect detectors with interventions in an attempt to improve MOOC discussion forums.

Making discussion forums smarter
I became interested in augmenting online education with artificial intelligence a couple of years ago, after listening to a talk at Google and speaking with Peter Norvig. That interest lay dormant for a year, until I began working as a teaching assistant for a Stanford MOOC. I spent a lot of time answering questions in the discussion forum, questions asked by thousands of students. Helping these students was fulfilling work, to be sure. But slogging through a single, unorganized stream of questions and manually identifying urgent ones wasn’t particularly fun. I would have loved an automatically organized inbox of questions.

The YouEDU architecture. Posts are fed to a classifier that screens posts for confusion, and our recommender then fetches clips relevant to the confused posts.

That these discussion forums were still “dumb”, so to speak, surprised me. I reached out to the platforms team of Stanford Online Learning, who in turn sent me to Andreas Paepcke, a senior research scientist (and, I should add, an incredibly supportive and kind mentor). It turned out that I wasn’t the only one who wished for a more intelligent discussion forum. I paired up with a student of Andreas’ to tackle the problem of automatically classifying posts by the affect or sentiment they expressed.

Our initial efforts at affect detection were circumscribed by the data available to us. Machine learning tasks like ours need human-tagged data — in our case, we needed a dataset of forum posts in which each post was tagged with information about the affect expressed in it. At the time, no such dataset existed. So we created one: the Stanford MOOCPosts dataset, available to researchers upon request.

The dataset powered the rest of our work. It enabled us to build a model to predict whether or not a post expressed confusion, as well as a pipeline to recommend relevant clips from instructional videos to the author of that confused post.
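Our actual models aren’t reproduced here, but to give a flavor of the general approach, here’s a minimal, self-contained sketch of a confusion detector: bag-of-words features feeding a logistic classifier trained by plain gradient descent. The posts, labels, and vocabulary below are invented stand-ins for the real dataset.

```python
import math
import re

def featurize(post, vocab):
    # Bag-of-words indicator features over a fixed vocabulary.
    words = set(re.findall(r"[a-z']+", post.lower()))
    return [1.0 if w in words else 0.0 for w in vocab]

def train_logistic(X, y, epochs=200, lr=0.5):
    # Plain stochastic gradient descent on the logistic loss; no regularization.
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            p = 1.0 / (1.0 + math.exp(-z))
            g = p - yi  # gradient of the loss w.r.t. z
            b -= lr * g
            w = [wj - lr * g * xj for wj, xj in zip(w, xi)]
    return w, b

def predict(post, vocab, w, b):
    # Probability that a post expresses confusion.
    z = b + sum(wj * xj for wj, xj in zip(w, featurize(post, vocab)))
    return 1.0 / (1.0 + math.exp(-z))

# Toy tagged posts standing in for the Stanford MOOCPosts data (1 = confused).
posts = [
    ("I don't understand the gradient step at all", 1),
    ("totally lost on assignment two, please help", 1),
    ("great lecture, thanks for the clear explanation", 0),
    ("loved this week's material", 0),
]
vocab = ["understand", "lost", "help", "thanks", "great", "loved", "clear"]
X = [featurize(p, vocab) for p, _ in posts]
y = [label for _, label in posts]
w, b = train_logistic(X, y)
```

A real pipeline would, of course, use a much larger vocabulary and far more training data; the point is only to show the shape of the detector.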

YouEDU was not meant to replace teaching assistants in MOOCs. Videos are notoriously difficult to search through (they’re not indexed, like books are), and YouEDU simply helps confused students find content relevant to the topic they’re confused about. Our affect classifiers can also be used outside of YouEDU — for example, they could be used to highlight urgent posts for the instructors, or even for other students in the forum.

Data mining is not nefarious
My experience at EDM was a great one. I learned lots from learned people, made lasting friends and memories, and so on. I could talk at length about interesting talks and papers — like Streeter’s mixture modeling of learning curves, or MacLellan’s slip-aware bounded logistic regression. But I won’t. You can skim the proceedings on your own time.

The EDM community is tightly knit, or at least more tightly knit than that of ACM’s Learning @ Scale, the only other education conference I’ve attended. And though no raves were attended, EDM-ers did close the conference by dancing the night away in a bar, after dining, drinking, and singing upon the roof of the Reina Victoria.

Festivities aside, a shared sense of urgency pulsed through the conference. As of late, the public has grown increasingly suspicious of those who collect and analyze data en masse. We see it in popular culture: Ex Machina, for example, with its damning rendition of a Google-like Big Brother who recklessly and dangerously abuses data, captures the sentiment well. The public’s suspicion is certainly justified, but its non-discriminating nature becomes problematic for EDM-ers. The public fears that those analyzing student data are, like Ex Machina’s tragic genius, either greedy, hoping to manipulate education in order to monetize it, or careless, liable to botch students’ education altogether. For the record, neither is true. EDM researchers are both well-intentioned and competent.

What’s an EDM-er to do? Some at the conference casually floated the idea of rebranding — for example, perhaps they should call themselves educational data scientists, not miners. Perhaps, too, they should write to legislators to convince them that their particular data mining tasks are not nefarious. In a rare example of representative government working as intended, Senator Vitter of Louisiana recently introduced a bill that threatens to cripple EDM efforts. The Student Privacy Protection Act, a proposed amendment to FERPA, would make it illegal for researchers to, among other things, assess or model psychological states, behaviors, or beliefs.

Were Vitter’s bill to go into effect as law, it would potentially wipe out the entire field of affect modeling. What’s more, the bill would ultimately harm the experience of students enrolled in online courses — as I hope YouEDU shows, students’ online learning experiences can be significantly improved by intelligent systems.

Now, that said, I understand why folks might fear a computational system that could predict behavior. I could imagine a scenario in which an educator mapped predicted affect to different curricula; students who appeared confused would be placed in a slow curriculum, while those who appeared knowledgeable would be placed in a faster one. Such tracking would likely fulfill the prophecies of the predictor, creating an artificial and unfortunate gap between the “confused” and “knowledgeable” students. In this scenario, however, the predictive model isn’t inherently harmful to the student’s education. The problem instead lies with the misguided educator. Indeed, consider the following paper-and-pencil equivalent of this situation. Our educational system puts too much stock in tests, a type of predictive tool. Perform poorly on a single math test in the fifth grade and you might be placed onto a slow track, making it even less likely you’ll end up mathematically inclined. Does that mean we should ban tests outright? Probably not. It just means that we should think more carefully about the policies we design around tests. And so it is for the virtual: It is the human abuse of predictive modeling, rather than predictive modeling in and of itself, that we should guard against.

Equipped with shiny machine learning tools, computer scientists these days are optimizing lots of previously manual tasks. The idea is that AI can make certain procedures smarter — we can capitalize on a system’s predictability and implicit structure to automate at least part of the task at hand.

For all the progress we’ve made recently in soulmate-searching pipelines and essay-grading tools, I haven’t seen too many applications of AI to computer infrastructure. AI could solve interesting infrastructure problems, particularly when it comes to distributed systems — in a reflexive sort of way, machines can and should use machine learning to learn more about themselves.

Being smart about it: The case for intelligent storage systems
Distributed systems cover a lot of ground; to stop myself from rambling too much, I’ll focus on distributed storage systems here. In these systems, lots of machines work together to provide a transparent storage solution to some number of clients. Different machines often see different workloads — for example, some machines might store particularly hot (i.e., frequently accessed) data, while others might be home to colder data. The variability in workloads matters because particular workloads play better with particular types of storage media.

Manually optimizing for these workloads isn’t feasible. There are just too many files and independent workloads for humans to make good, case-by-case decisions about where files should be stored.

The ideal, then, is a smart storage system. A smart system would automatically adapt to whatever workload we threw at it. By analyzing file system metadata, it would make predictions about files’ eventual usage characteristics and decide where to store them accordingly. If a file looked like it would be hot or short-lived, the smart system could cache it in RAM or flash; otherwise, it could put it on disk. Placement policies built on such predictions would not only minimize IT administrators’ work, but would also boost performance, lowering latency and increasing throughput on average.

From the past, a view into the future: Self-* storage systems
To my surprise, there doesn’t seem to be a whole lot of work in making storage systems smarter. The largest effort I came across was the self-* storage initiative, undertaken by a few faculty over at CMU back in 2003. From their white paper,

There’s a wealth of interesting content to be found in the self-* papers. In particular, in Attribute-Based File Prediction, the authors propose ways to exploit metadata and information latent in filenames to bucket files into binary classes related to their sizes, access permissions, and lifespans.

Predictions were made using decision trees, constructed with the ID3 algorithm. Starting from a root node that holds the entire training set, ID3 splits the data on whichever feature looks like the best predictor (the metric typically used is information gain, though the self-* project used the chi-squared statistic), then recursively grows sub-trees until the leaf nodes correspond to classes. As an aside, it turns out that ID3 tends to overfit training data — these lecture notes discuss ways to prune decision trees in an attempt to increase their predictive power.
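To make the algorithm concrete, here’s a minimal ID3 sketch. It uses information gain rather than the chi-squared statistic, assumes binary features, and runs on invented filename-style features, so it’s an illustration of the technique rather than the self-* implementation:

```python
import math
from collections import Counter

def entropy(labels):
    # Shannon entropy of a label list.
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def id3(rows, labels, features):
    # rows: list of dicts mapping feature name -> 0/1.
    if len(set(labels)) == 1:
        return labels[0]  # pure leaf
    if not features:
        return Counter(labels).most_common(1)[0][0]  # majority leaf

    def gain(f):
        # Information gain of splitting on feature f.
        total = entropy(labels)
        for v in (0, 1):
            subset = [l for r, l in zip(rows, labels) if r[f] == v]
            if subset:
                total -= len(subset) / len(labels) * entropy(subset)
        return total

    best = max(features, key=gain)
    remaining = [f for f in features if f != best]
    tree = {"feature": best, "branches": {}}
    for v in (0, 1):
        sub_rows = [r for r in rows if r[best] == v]
        sub_labels = [l for r, l in zip(rows, labels) if r[best] == v]
        if not sub_rows:
            tree["branches"][v] = Counter(labels).most_common(1)[0][0]
        else:
            tree["branches"][v] = id3(sub_rows, sub_labels, remaining)
    return tree

def classify(tree, row):
    # Walk the tree until we hit a leaf (a plain label).
    while isinstance(tree, dict):
        tree = tree["branches"][row[tree["feature"]]]
    return tree

# Made-up training data: does the file have a .tmp extension? a "cache" prefix?
rows = [
    {"ext_tmp": 1, "prefix_cache": 0},
    {"ext_tmp": 1, "prefix_cache": 1},
    {"ext_tmp": 0, "prefix_cache": 1},
    {"ext_tmp": 0, "prefix_cache": 0},
]
labels = ["short-lived", "short-lived", "long-lived", "long-lived"]
tree = id3(rows, labels, ["ext_tmp", "prefix_cache"])
```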

The features used were coarse. For example, files’ basenames were broken into three chunks: prefixes (characters preceding the first period), extensions (characters following the last period), and middles (everything in between); directories were disregarded. These simple heuristics proved fairly effective; prediction accuracy didn’t fall below 70 percent.
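In code, that chunking heuristic might look something like the following (my own reconstruction of the scheme, not the paper’s implementation):

```python
def filename_chunks(basename):
    # Split a basename into the three coarse features described above:
    # prefix (before the first period), extension (after the last period),
    # and middle (everything in between). A name with no periods is all prefix.
    if "." not in basename:
        return basename, "", ""
    first = basename.index(".")
    last = basename.rindex(".")
    prefix = basename[:first]
    extension = basename[last + 1:]
    middle = basename[first + 1:last]
    return prefix, extension, middle
```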

It’s not clear how a decision tree trained using these same features would perform if more granular predictions were desired, or if the observed filenames were less structured (what if they lacked delimiters?). I could imagine a much richer feature set for filenames; possible features might include the number of directories, the ratio of numbers to characters, TTLs, etc.

From research to reality: Picking up where self-* left off
The self-* project was an ambitious one — the researchers planned to launch a large-scale implementation of it called Ursa Major, which would offer hundreds of terabytes of automatically tuned storage to CMU researchers.

I recently corresponded with CMU professor Greg Ganger, who led the self-* project. It turns out that Ursa Major never fully materialized, though significant and practical progress in smart storage systems was made nonetheless. That the self-* project lives no longer doesn’t mean that the idea of smart storage systems should die, too. The onus lies with us to pick up the torch, and to continue where the folks at CMU left off.

I took a trip up to San Francisco’s Exploratorium, some two weeks past. Though recently relocated, the Exploratorium is comfortably familiar. It’s still packed with exhibits that span the spectrum from mystically enchanting (one station lets museum-goers create delicate purple auroras that warp and spiral in a glass tube) to delightfully curious (another rapidly spins dozens of Lego Batmen and dolphins, making them dance to the tune of the Caped Crusader’s catchy theme song).

Exhibits at this unconventional museum are designed to stir your curiosity. It’s hard to resist playing with them, but of course there’s no need to — almost everything is hands-on. Photo by Sara Yang.

I meandered through the museum, all the while searching for a particular treasure. Just before the closing bells rang, I stumbled upon it: the cloud chamber, a large, humming, refrigerated box with a sky-facing window that allows for the observation of cosmic radiation. Cosmic rays hail from beyond the solar system. They collide with the earth’s atmosphere, and minuscule particles rain torrentially upon us in the aftermath. The cloud chamber makes an otherwise imperceptible and invisible downpour from the heavens palpably visible, if only for a fleeting moment.

Our homemade cloud chamber consists of a small box with a lid lined with black felt. In order to nudge muons into uncloaking themselves, we douse the felt with isopropanol and heat it from above with my desk lamp.

The sight brought me back four years, to the first time I saw muons zip hither and thither through the same chamber. I had spent the better part of that year in my garage, tinkering with my friend Hemanth on our own chamber for a science project.

On a nostalgic whim, I called up Hemanth the next day. We decided to fire up the chamber once again, for old times’ sake. We scrounged the necessary components, lugged them to Hemanth’s garage, and got started. Pulverizing dry ice, we began working to the sound of snow crunching underfoot and the sight of fumes eddying about.

With thick gloves and sturdy hammers, we first crush the dry ice into a coarse powder and pack it tightly into a Styrofoam base, on top of which the chamber sits. The one-two punch of a cooling source and a heating source forces the alcohol into a supersaturated, supercooled state. Muons streaking through the chamber rip electrons off the vapor, causing alcohol to condense visibly around their paths.

I followed our procedure as if on autopilot; my mind wandered and let bittersweet memories leak. We packed the dry ice into a foam base (days colored by failed prototype runs), doused the chamber with isopropanol (afternoons brightened by faint flashes of muons), and positioned my lamp atop the box (nights illuminated by the bluish glow of computer monitors).

Our small glass box held us rapt, as we saw the ghosts of muons pass through it. Unfortunately, the streaks are difficult to capture on camera.

We left the chamber to run for some time. When we returned, muons were streaking visibly through it. Spellbound, we lingered by the chamber for over half an hour. Four years ago, an anxious desire to create something novel and a preoccupation with results left little room for wonder. Now, we could stare into the cloud chamber for but the simple sake of doing so. The muons that passed through it, falling like delicate strands of spider web, were, paradoxically, both otherworldly and earthly. Our small glass box, glued together by a mom-and-pop craft shop, had become a window into the universe’s secrets. The sight was as humbling as it was beautiful.

If you’d like to take a look at the data referenced in this post and the script I wrote to gather them, feel free to head over to my Github.

I’ll admit it: I put stock in video game reviews. I don’t read them religiously, nor am I insulted when a site rates a favorite of mine a bit too low. I read reviews to find out which games I might like. With so many games and so little time, reviews and ratings are, for me, a much-needed filtering process.

But video game ratings come with problems. On a 10 point scale, what’s the difference between a 7.3 and a 7.5? Even a 7 and an 8? And why does it seem like reviewers hardly dish out scores from the bottom half of their scales? How can reviewers package their opinions, highly subjective and finicky things that they are, into definitive scores? Numbers feel a lot more objective than words. It might be this perceived objectivity that makes people protest the seemingly arbitrary scores that reviewers select. (I wouldn’t call these ratings arbitrary — they’re backed by opinion. Subjective? Yes.)

Present-day game rating aggregators add to these problems. Averaging across multiple sites, each with its own distribution, makes things messy. A 7 from one site might well be equivalent to a 5 from another. And what about those troublesome letter-grade ratings? Metacritic converts Cs to 50s — but last time I checked, a C mapped to a 70 percent or so in school. Metacritic claims that they generate their aggregate scores after running their data through a weighted, proprietary algorithm, and I’m sure they do. But I question their algorithm’s efficacy. Pick a few games on their website and do an old-fashioned average of the scores they list. The result will likely closely resemble their aggregate score.

But I digress.

I decided to do some investigation into the nature of game ratings. I’m currently in the process of building a dataset of game ratings from different sites. I started with IGN. After a bit of web scraping, I collected review data on the 75,005 games that IGN kept track of as of July 13, 2013. Of those, only 17,027 had ratings in IGN’s index (the others were not rated, marked as “NR”). I wrote a Python script to collect the data and used BeautifulSoup for HTML parsing. I analyzed the data with an evaluation copy of Wizard, a suite of statistical tools.
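My actual scraper used BeautifulSoup; as a dependency-free illustration, here’s a sketch of the same idea using the standard library’s HTMLParser. The markup here (a span with class "rating") is hypothetical, since IGN’s real pages are more involved:

```python
from html.parser import HTMLParser

class RatingParser(HTMLParser):
    # Collects the text of elements whose class attribute is "rating",
    # converting scores to floats and "NR" (not rated) entries to None.
    # The class name and markup are invented, for illustration only.
    def __init__(self):
        super().__init__()
        self.in_rating = False
        self.ratings = []

    def handle_starttag(self, tag, attrs):
        if dict(attrs).get("class") == "rating":
            self.in_rating = True

    def handle_data(self, data):
        if self.in_rating:
            text = data.strip()
            self.ratings.append(None if text == "NR" else float(text))
            self.in_rating = False

html = '<div><span class="rating">7.5</span><span class="rating">NR</span></div>'
parser = RatingParser()
parser.feed(html)
```

A real scraper would fetch each index page, feed it to the parser, and accumulate the scores.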

Without further ado, here are some summary statistics of IGN’s rating distribution, peppered with some tidbits of information that I found interesting.

A distribution of IGN’s 17,027 rated games. The distribution has a mean of 6.877 with a standard deviation of 1.762, and a median of 7.2 with an IQR of 2.1. All but one of the 51 0s are data collection errors that I didn’t prune out.

Mean score: Approximately 6.9
Standard deviation: Approximately 1.7
Skew: left

IGN’s index erroneously assigns a number of games 0s (at least one game, though, actually did manage to earn a zero). I didn’t manually prune these entries from my data, so the above summary statistics are slightly off.
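For what it’s worth, summary statistics like the ones above are straightforward to recompute from a raw list of scores with Python’s statistics module (shown here on a toy list, not the IGN data):

```python
import statistics

def summarize(scores):
    # Quartiles via the default "exclusive" method; IQR is Q3 - Q1.
    q1, median, q3 = statistics.quantiles(scores, n=4)
    return {
        "mean": statistics.mean(scores),
        "stdev": statistics.stdev(scores),  # sample standard deviation
        "median": median,
        "iqr": q3 - q1,
    }

stats = summarize([1, 2, 3, 4, 5])  # stand-in for the 17,027 real scores
```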

The highs and the lows
Only 314 games, or about two percent of rated games, received a score of 9.5 or above. IGN decorated 38 games with 10s. Classic games (the Zelda, Mario, and Pokemon franchises) dominated here. Some newer games did manage to break into the Hall of 10s, notably Naughty Dog’s third Uncharted entry and their recent The Last of Us. Rockstar’s GTA games and Red Dead DLC also carved out spots for themselves, as did a couple of Kojima’s Metal Gear Solid games. Some 10s you might not have heard of include Checkered Flag, Joust, and Shanghai (all Atari Lynx games), and Tornado Mania, a mobile game. Infinity Blade II was the sole representative of the iPhone.

Of the major modern consoles, the Nintendo Wii had the lowest median rating (a 6.8). The Wii U, Xbox 360, PS3, and PC had median scores of 7.5.

A closing remark
With a median of 7.2 and an IQR of 2.1 (i.e., 50 percent of scores lie between 6.0 and 8.1), it does look like IGN awards higher scores more often than not. This does not mean that they’re doing anything illicit (this might seem obvious, but you’d be surprised at some of the shoddy “journalism” out there that sensationally misinterprets data). Perhaps IGN thinks that most games just aren’t that bad.

I’ve got a hunch that other review sites’ distributions won’t match IGN’s, just as they likely won’t match each other. So I’ll gather more data — maybe I’ll be able to do something with them.

Update, July 23, 2013: Previously, this article linked to a website that I believed had misinterpreted ratings data. This post no longer links to that site.

Update, July 28, 2013: This post now links to my Github repository containing the source code and data referenced here.

Does your website load too slowly? Try using a content delivery network and caching static content. Read on for details and a quick-and-easy three-step guide.

I’ve never been satisfied with Debug Mind’s speed. If you ever visited this site before, you probably drummed your fingers impatiently while your browser slowly painted the web page down the screen. Having done more than my fair share of finger-drumming, I decided to make this site faster: Debug Mind now uses both CloudFlare, a content delivery network and optimizing service, and W3 Total Cache, a WordPress caching plugin.

Below you’ll find an outline of the steps I took to speed up my website. While the third step is WordPress-specific, the first two should be applicable to most websites.

Step one: Diagnose the bottlenecks
Before making your website faster, you might want to know why it’s slow. If you don’t care for the “why” and simply want results, skip ahead to step two. For those of you curious about why your site loads slowly, I recommend completing step one.

Analyze HTTP requests with Chrome Developer Tools
The Chrome browser packages a handy set of tools that you can use to analyze websites. If for some reason you don’t want Chrome on your machine, alternatives to Developer Tools exist for other browsers and a Google search will likely find them.

Chrome Developer Tools can help you diagnose speed bottlenecks. Oftentimes, uncached images and poorly written plugins slow your website down.

Assuming that you’re using Chrome, navigate to the browser menu’s tools section and click on Developer Tools. Click on the Network tab and then navigate over to the website you want to analyze. Once the website loads, you’ll see a list of requests that your browser made while fetching your site. The timeline tells you how long each request took. Isolate requests with the longest streaks in the timeline. These are the things that are slowing your website down. Chances are that the bottlenecks are image loads and other static media. Other things — perhaps plugins, if you’re on WordPress or something comparable — might also contribute to the lag.

I was careless — in over two years, I never made a backup of this site. You can imagine how I felt when, two days ago, I accidentally deleted every single comment posted here. Fortunately, it turns out that Hostmonster creates courtesy backups for its customers.

I chatted with Hostmonster’s tech support and asked them to restore my site for me. They went right to it, and an hour later the restore finished. Or so they claimed — my database was left untouched. I called them up and asked them specifically to restore the database; déjà vu ensued. Since Hostmonster gives me access to these backups, I did a manual restore and lo and behold! The comments had returned.

I lucked out — my comments very well could have disappeared forever. You can bet I’ll religiously back up my site from now (well, actually yesterday) on.

So if you ever find yourself in desperate need of a backup, contact your web hosting service — you might luck out like I did. If your luck runs out with technical support, you can do a manual database restore through (e.g.) phpMyAdmin.

If you don’t already back up your website, I strongly encourage you to start. You’ll probably want to make backups of your database and all your files. An FTP client like Filezilla, coupled with phpMyAdmin, makes backing up your site really simple.
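For WordPress sites like this one, a basic backup can be as simple as two commands run on a schedule. The database name, user, and paths below are placeholders — substitute your own:

```shell
# Dump the database and archive the site's files, stamped with today's date.
# All names and paths here are placeholders. With -p, mysqldump prompts for
# the password interactively; for a cron job, use a credentials file instead.
mysqldump -u backup_user -p blog_db > "blog_db_$(date +%F).sql"
tar -czf "site_files_$(date +%F).tar.gz" /path/to/public_html
```

Stash the resulting files somewhere off-server, and you’re covered against the kind of scare I just had.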

Have any questions about creating and / or restoring backups? I’m happy to help! Just leave a comment and I’ll get back to you.

The New York Times recently published a Guantanamo detainee’s account of the brutal treatment he’s received at his prison. In clear, cutting language, the Yemeni prisoner tells us of his hunger strike — of the tubes thrust “up [his] nose” (and, once, “18 inches into [his] stomach”) that shovel food down his throat.

The violent imagery that he conjures is powerful. But more powerful is the man’s earnest voice. It’s a helpless voice, beaten down by 11 years worth of suffering and humiliation: He tells us of when he was denied toilet usage, and of when he wasn’t permitted to change out of clothes onto which force-fed food had slopped and dribbled. But, remarkably, his is a voice not yet broken:

“And there is no end in sight to our imprisonment. Denying ourselves food and risking death every day is the choice we have made.”

When I first read this article a few days back, I thought that, perhaps, it would effect change, that perhaps the people might rally around it and call for the closing of Guantanamo, or at least ask why it was still open. NPR, The Atlantic, Huffington Post and others took note, and Reddit raised a storm. But I don’t think attention is enough. How do we translate attention into action? When will people demand Guantanamo’s closing? What will it take for Mr. President to belatedly make good on his promise? I don’t pretend to have the answers — but I’m frustrated just the same.