Thursday, September 29, 2011

Yesterday, I thought the best talks I saw were by Devavrat Shah and Dina Katabi.

Dev was talking about how to track where rumors start. The model is that you have a graph of nodes that have been infected (by a rumor, or disease, or whatever), and based on the shape of that graph, you want to figure out where the process started. To get a picture of what I mean, suppose you were looking at a grid graph, and the infected area looked roughly like a circle. You'd expect the infection started somewhere near the middle of the circle. They've placed this in a mathematical framework (a distribution for the time to cross an edge, starting with tree graphs, etc.) that allows for analyzing these sorts of processes. This seems to be the arxiv version of the work.
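Their actual estimator is more subtle than the picture above, but the "middle of the circle" intuition is easy to simulate. Here's a minimal sketch of my own (a toy SI infection process and a centroid heuristic, purely illustrative -- not their model or estimator): infect a grid from a hidden source, then guess the infected node nearest the centroid of the infected region.

```python
import random

def simulate_infection(n, source, steps, p=0.3, seed=0):
    """Toy SI (susceptible-infected) process on an n x n grid graph:
    each step, every infected node infects each susceptible neighbor
    independently with probability p."""
    rng = random.Random(seed)
    infected = {source}
    for _ in range(steps):
        newly = set()
        for (x, y) in infected:
            for nb in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                if (0 <= nb[0] < n and 0 <= nb[1] < n
                        and nb not in infected and rng.random() < p):
                    newly.add(nb)
        infected |= newly
    return infected

def centroid_estimate(infected):
    """Guess the source as the infected node nearest the centroid of
    the infected region -- the 'middle of the circle' heuristic."""
    cx = sum(x for x, _ in infected) / len(infected)
    cy = sum(y for _, y in infected) / len(infected)
    return min(infected, key=lambda v: (v[0] - cx) ** 2 + (v[1] - cy) ** 2)
```

On a symmetric graph like the grid this simple heuristic lands close to the true source; the interesting part of their work is making this kind of inference rigorous on trees and beyond.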

Dina talked about 802.11n+, which they describe as "a fully distributed random access protocol for MIMO networks. 802.11n+ allows nodes that differ in the number of antennas to contend not just for time, but also for the degrees of freedom provided by multiple antennas." By making use of nulling and alignment, they can extend 802.11 so that instead of competing for time slots, multiple antenna systems can compete for degrees of freedom within a time slot, allowing those devices with multiple antennas to take full advantage (and in particular allowing them to overlap transmissions with devices with fewer antennas). I would have heard about it earlier, I guess, if I had gone to SIGCOMM this year. The project page is here.

There's a whole session this morning on "mean field analysis" and games. Mean field analysis (in my loose interpretation) means pretend your system gets big and turn it into a differential equation -- apparently a useful way to tackle various large-scale distributed agent/learning systems. It's what I used to study load-balancing systems way back for my thesis (and I still find an occasional use for it today). Interesting to see it used for another set of problems. Ramesh Johari's student (I didn't catch which one) gave a talk on a really interesting model of dynamic auctions with learning where you could gain some real insight using a mean-field analysis. (How much should agents in an auction "overbid" when they're learning their value for winning an auction? It depends on how much they think they'll gain in the future based on what they learn about their true value.) This seems to be a preliminary version of the work.
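For a concrete picture of what "turn it into a differential equation" means, here's the kind of calculation from my own load-balancing days: the mean-field ODEs for the supermarket model (each arriving job picks d random queues and joins the shortest), where s_i is the fraction of queues with at least i jobs. A quick Euler integration (parameters chosen just for illustration) converges to the known fixed point s_i = lam^((d^i - 1)/(d - 1)):

```python
def supermarket_ode(lam=0.8, d=2, max_len=20, dt=0.01, T=50.0):
    """Euler-integrate the supermarket-model mean-field ODEs:
        ds_i/dt = lam * (s_{i-1}^d - s_i^d) - (s_i - s_{i+1})
    where s[i] is the fraction of queues with at least i jobs
    (so s[0] = 1 always), arrival rate lam, and d choices per job."""
    s = [1.0] + [0.0] * max_len
    for _ in range(int(T / dt)):
        new = s[:]
        for i in range(1, max_len):
            new[i] += dt * (lam * (s[i - 1] ** d - s[i] ** d) - (s[i] - s[i + 1]))
        s = new
    return s
```

At lam = 0.8 and d = 2 the equilibrium tails decay doubly exponentially (s_3 = 0.8^7, about 0.21), which is exactly the "power of two choices" effect -- the payoff you get from letting the system size go to infinity and reading answers off the ODE.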

Now for the "negative" -- something I'm going to suggest to the Allerton folks. It's time, I think, to find a different location. The Allerton conference center is very beautiful, but it's not suitable, in many respects, for this event any more. I was told there were about 330 registered attendees from outside UIUC, plus an additional 120 or so from UIUC. It's a bit hard to turn that into a true count; many (most?) UIUC people probably drop by 1 day, most attendees probably 1.5-2 days of the three. But Allerton really wasn't designed for a crowd that large. If you're not in one of the "big rooms", and people want to come to your talk, many times they can't get in. For example, the social network session was remarkably popular; the room could seat 40 people, and there were 20+ more people crowded around and outside the doorway -- which was not only less than ideal but also disruptive, as the open door meant a lot of noise in the room. Actually, the acoustics aren't particularly good in any of the rooms, even the big ones. Attendance in the early morning is very low, in part I think because people staying "in town" have a non-trivial drive to get to Allerton.

When the event was smaller, like 200-250 people, everyone coped with these problems. It's really not working now.

I can understand there's an attachment to the Allerton center -- and how could it be the "Allerton conference" (next year is its 50th year!) if it wasn't at Allerton? But this has been an issue for years, and only seems to get worse. There must be a conference venue on or near the UIUC campus that would work as well -- at the very least, my opinion is it's time to look....

Wednesday, September 28, 2011

I woke up at an absurdly early hour this morning to get on a plane and go to the Allerton conference. I'm giving a talk this afternoon on Invertible Bloom Lookup Tables (arxiv link) (joint work with Michael Goodrich). It's a "lookup table", not a Bloom filter, because you want to be able to store key-value pairs; you query for a key, and get back a value. It's invertible, because besides being able to do lookups, you can "invert" the data structure and get back all the key-value pairs it contains (with high probability, assuming you haven't overloaded the structure). This paper is really on the theory, but if you want to see a cool use for IBLTs, there's a paper by Eppstein, Goodrich, Uyeda, and Varghese from this year's SIGCOMM (Efficient Set Reconciliation) with a compelling application.
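To make the structure concrete, here's a toy insert-only version (my own illustrative sketch -- the real construction in the paper adds checksums and supports deletions, and the sizes and hash choices below are arbitrary). Each cell keeps a count, an XOR of keys, and an XOR of values; a cell with count 1 is "pure" and can be read off directly, and repeatedly peeling pure cells inverts the whole structure:

```python
import hashlib

class ToyIBLT:
    """Insert-only invertible Bloom lookup table sketch for integer
    key-value pairs; each key gets one cell in each of k subtables."""

    def __init__(self, m=150, k=3):
        assert m % k == 0
        self.m, self.k = m, k
        self.cells = [[0, 0, 0] for _ in range(m)]  # [count, keyXor, valXor]

    def _positions(self, key):
        sub = self.m // self.k
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield i * sub + int.from_bytes(h[:8], "big") % sub

    def _update(self, key, value, delta):
        for p in self._positions(key):
            cell = self.cells[p]
            cell[0] += delta
            cell[1] ^= key
            cell[2] ^= value

    def insert(self, key, value):
        self._update(key, value, +1)

    def get(self, key):
        """Return the value if some cell for this key is pure; a present
        key can still be missed if all of its cells have collisions."""
        for p in self._positions(key):
            cell = self.cells[p]
            if cell[0] == 0:
                return None  # definitely not present
            if cell[0] == 1 and cell[1] == key:
                return cell[2]
        return None

    def list_entries(self):
        """Peel pure cells until none remain -- this recovers all pairs
        with high probability if the table isn't overloaded."""
        pairs = []
        progress = True
        while progress:
            progress = False
            for cell in self.cells:
                if cell[0] == 1:
                    k, v = cell[1], cell[2]
                    pairs.append((k, v))
                    self._update(k, v, -1)
                    progress = True
        return pairs
```

Note that listing mutates the table (each recovered pair is deleted as it's peeled), which is exactly the "inversion": the structure empties itself out as it gives up its contents.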

Allerton is a different sort of conference, as I've discussed before (like here and here, among others). A mix of invited and submitted papers, a wide diversity of topics, lots of parallel sessions. It seems absurdly crowded this year -- I had to park in the ancillary parking because the lot was full, and the rooms where the talks are held all seem to be bursting. (The conference, unfortunately, really has outgrown the space where it's held.) I'm guessing well over 300 registered; I'll have to check. The conference gives me a chance to catch up with colleagues who are more on the EE/networking side; I've seen other CS theorists here in years past, but I haven't noticed anyone yet.

If you're here, maybe come by my talk this afternoon, or say hi if you see me hanging around.

Tuesday, September 27, 2011

As an applied probability exercise, I had my class compute empirically the probability of a game failing on the nth round for the game of Set. (See my previous post about this; here failure means there's no set on the nth round, and they were asked to implement the "choose a random set" strategy if there was more than one.) They were also supposed to try to calculate the probability of using all 81 cards in sets through the course of the game -- what we might call a perfect game.
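For anyone who wants to try the exercise, encoding cards as vectors in F_3^4 makes set-checking one line: three cards form a set exactly when each attribute's values sum to 0 mod 3. Here's a bare-bones sketch of the simulation (mine, not the students' code; the "failure" convention is the one from the assignment, where the game ends as soon as no set is on the table):

```python
import itertools
import random

def find_sets(table):
    """All 'sets' on the table: with cards as vectors in F_3^4, three
    cards form a set iff every coordinate sums to 0 mod 3."""
    return [trio for trio in itertools.combinations(table, 3)
            if all(sum(vals) % 3 == 0 for vals in zip(*trio))]

def play_game(rng):
    """One game under the 'choose a random set' strategy. Returns the
    number of unused cards (0 means a perfect game); the game fails as
    soon as no set is present on the table."""
    deck = list(itertools.product(range(3), repeat=4))  # all 81 cards
    rng.shuffle(deck)
    table, deck = deck[:12], deck[12:]
    while table:
        sets = find_sets(table)
        if not sets:
            return len(table) + len(deck)  # stuck, with cards left over
        chosen = rng.choice(sets)
        table = [c for c in table if c not in chosen]
        while len(table) < 12 and deck:  # replenish the table
            table.append(deck.pop())
    return 0  # perfect game: every card was used in a set
```

Running play_game many times gives both the round-by-round failure probabilities and an estimate of the perfect-game probability; swapping out rng.choice(sets) is where a competition strategy would go.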

Some students explored a bit and noted that changing from the random strategy could significantly increase the probability of a perfect game. So I think the next time I decide to use this assignment, I'll turn it into a competition. Students will have to develop a strategy that maximizes the probability of achieving a perfect game. Prizes will go to the best strategy (which hopefully could be easily determined after a few billion runs or so -- I suppose I'll have to introduce some sort of computational limits on the strategy to ensure that many runs can be done in a suitable time frame), and the best "succinct" strategy -- that is, the best strategy that can be described in English in at most a few reasonable sentences.

It's also interesting to think about optimal strategies in the offline case, where the permutation determining how the cards will be dealt out is given in advance. I keep thinking maybe there's a way to effectively calculate whether a perfect game can be found for a given permutation, and then thinking there can't be. (Though, admittedly, I haven't thought a lot yet.) So maybe it makes sense to run the competition for both the online and offline versions of the problem.

Wednesday, September 21, 2011

I think a variety of things made it work. First, and I apologize to Tim Roughgarden for saying it, but the execution was superb -- someone should make him organize more things. Great room, great food, great staff on hand (thanks to Lynda Harris for being incredibly helpful), and a great collection of speakers. Also, an excellent location. Stanford is relatively easy to get to with 2 major airports nearby (but avoid the cabs -- it's now over $100 to take a cab from SFO!), and the location guarantees an audience of Stanford, Microsoft, and Google folks. (While not everyone was there all the time, I understand well over 100 people were registered.)

To the content. It was a great program -- apparently the talks will be on the web later, so you may want to check them out if you weren't there. My favorites would have to be Dan Spielman's talk on smoothed analysis (perhaps because Dan just always gives great talks), and Kevin Leyton-Brown's talk on statistical methods (using learning theory) to predict an algorithm's performance on new instances. (Can you quickly and accurately predict the running time of your satisfiability algorithm on a specific instance, before running it, just by taking a careful look at the problem? The answer seems to be -- yes, you can!)

There was a panel that focused on questions like "Where could we point students to work on these sorts of problems? What are the currently most glaring examples of algorithms whose properties are poorly explained by existing theoretical models, where there might be hope for progress?" and "To what extent can 'real-world data' be modeled? Is it important to model accurately the properties of 'real-world data'?" While the panel ended up being fairly uncontroversial -- I'm afraid no fights or even major disagreements broke out -- I found it very interesting to listen to the others on the panel give their opinions and insights on these questions.

I think people got into the spirit of the workshop. Kevin, an AI person by trade, found it amazing that theorists were getting together to talk about heuristics and experiments. (Kevin's talk was followed by Dick Karp talking about his work on tuning and validating heuristic algorithms -- though of course not all talks were on that theme.) It will be interesting to see if this workshop inspires any specific new research -- but even if not, it was well organized, well put together for content, and well worth the trip.

Tuesday, September 20, 2011

I was at NYCE 2011 this past Friday. It was a thoroughly enjoyable and productive experience. I feel like I got a conference's worth for the cost and time commitment of a long commute.

The day started with three hour-long plenary talks, followed by lunch and a poster session, followed by an hour of 10 minute talks, and concluded with two more plenary talks. (And all of that for about $20.)

It turns out a lot can be squeezed into 10 minutes. Halfway into the first speaker's talk, with 5 minutes left on the clock, I was almost convinced he was running out of time. Everyone worked well within the time constraint to deliver their talks succinctly.

The plenary talk roster was varied and exciting (at least by one other account). I particularly enjoyed Jonathan Levin's talk on eBay experiments (pdf of the paper) in part because I was least familiar with it, and in part because I think I can use his key method. Here's the gist: instead of relying on random observations, look for experiments that have been conducted on your behalf by market participants. To use his eBay example, auctions have a bunch of parameters: the item, its starting price, its reserve price, and so on. Quite often it turns out you can find sets of auctions ("experiments") that are identical in all but one parameter. This allows you to directly quantify the effect of the varying parameter. To say I wish I'd thought of this is an understatement -- I am trying to convince myself that I actually didn't. (I didn't.)
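The mechanics are simple enough to sketch: bucket the auction records by every field except the one you care about, and keep the buckets where that field actually varies. (The field names below are invented for illustration; Levin's data and matching criteria are of course much richer than this.)

```python
from collections import defaultdict

def natural_experiments(auctions, vary):
    """Group auction records (dicts) that are identical in every field
    except `vary`; groups where `vary` actually differs act as matched
    'experiments' run for us by the market participants."""
    groups = defaultdict(list)
    for a in auctions:
        # the matching key is everything except the varying parameter
        key = tuple(sorted((k, v) for k, v in a.items() if k != vary))
        groups[key].append(a)
    return [g for g in groups.values()
            if len({a[vary] for a in g}) > 1]
```

Comparing outcomes within each returned group then isolates the effect of the one parameter that differs, without ever having to run an experiment yourself.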

Of interest -- likely to myself only -- was the passing mention of Swoopo in two plenary talks. Their sudden demise still puzzles me. While writing our Swoopo paper they were running about 200 auctions a day. A few months after EC'10 when I checked again they were running about one hundred. They were responding to reduced demand, but why were "entertainment shoppers" driven away? The business model certainly did not die, as others have successfully taken Swoopo's place. One possibility is that the barrier to entry in this type of business is low. Around Swoopo's peak there were people making money off penny-auction scripts they'd sell for a few hundred bucks. Hundreds of clones sprang up. Buy a script and some hosting, run auctions, and ship directly from Amazon. In an interesting parallel, the daily deals business also seems to have a very low barrier to entry, judging from the hundreds of Groupon clones. I am still kicking myself for stopping data collection after our Swoopo paper was done.

Back to NYCE itself: if there's one thing I'd have wanted to see more of, it's student talks (especially of the 10 minute variety). I guess the poster session made up for this, but as someone with a poster to present I didn't have the chance to walk around and see others'. Which reminds me, thanks to everyone who stopped by my poster, and to the organizers for putting everything together!

Friday, September 16, 2011

I'll be at the Beyond Worst Case Analysis workshop next week, and Allerton the week after. For each, I'll have to miss a class. I also have some other travels that may involve missing classes -- I'll be missing more class than usual this semester, but they're for good reasons.

I'll have Justin and Giorgos cover a lecture for me (thanks, guys!), but next week I'm doing an experiment: I'm substituting videos for my class. I've pointed them to the 2005 MSRI workshop Models of Real-World Random Networks, and I'm having them watch some talks. One is my survey talk on power laws, so it's still "me" teaching, but then I have a few impressive "guest lecturers" from the workshop to cover the rest; all the talks focus on power laws in some way. I've even put together an "in-class" exercise for them to do on the subject -- out of class, of course.

I wonder if this is a good approach or not. I'll have to ask the students. It's not ideal, but neither is cancelling class. Has anyone else started using videos as a solution to the missed-lecture problem? Are there ways to make it a more useful experience?

By the way, looking back, that 2005 MSRI workshop had a bunch of interesting talks. Worth checking out sometime if you're interested in the area. I looked younger then.

Reminder : Giorgos has a poster at the New York Computer Science and Economics Day today. Go check it out!

Thursday, September 15, 2011

My frequent co-authors John Byers, Giorgos Zervas, and I posted an extended version of a current submission to the arxiv (as people do), which showed up last Friday or thereabouts. Daily Deals: Prediction, Social Diffusion, and Reputational Ramifications discusses some analysis we did on daily deal sites (Groupon, LivingSocial), including interactions between daily deal sites and social network sites (Facebook, Yelp).

The work got a plug Monday on the MIT Technology Review. They naturally focused on our most "controversial" finding, and I quote them:

"A Groupon deal might boost sales but, it can also lower a merchant's reputation as measured by Yelp ratings, say computer scientists who have analyzed the link between daily deals and online reviews."

Apparently, many people are interested in statements of that form, especially business types, and that's where the fun started. We got some e-mails from people who work for firms that use statistical analyses in planning marketing efforts who had seen the article (and wanted our not-yet-released data set). We noticed the review article was getting tweeted. We started tracking a tweet feed to find out where else it was showing up. (As a very partial list, Business Insider, Chicago Tribune, The Independent, Search Engine Journal.) Monday we were worried that people might try to actually call us up and talk with us, but fortunately, that hasn't happened. Instead of feeling pressured, we've just been able to enjoy watching.

It's amusing to see how these things spread through the Internet. A lot of sites are just cutting and pasting from the MIT Tech Review. I know this because, in what I personally find to be the most amusing of mistakes, they refer to Giorgos as "Georgia Zervas". Giorgos does seem to have multiple name spellings (he also uses Georgios), but Georgia is not quite right. (It is, however, my new nickname for him*.) Georgia Zervas, according to Google, has popped up almost 200 times in the last 3 days.

We weren't really expecting this. I think part of the reason we didn't expect much reaction is pre-summer we put up a placeholder with some initial data: A Month in the Life of Groupon. This didn't seem to get much notice. Indeed, we had submitted it to NetEcon, and were essentially told by reviewers the paper was too boring. I must admit I disagreed, then and now, with the reviewers. But, to be fair, that version of the paper didn't contain the data sets and analysis for LivingSocial, the Facebook Likes, and the Yelp reviews; it just had Groupon data (though we made clear this was an "appetizer" and more was to come). John actually completely disagrees with me. I quote: "For the record, I agree with the NetEcon reject decision. They should be applauded for making us do more work." My take was the bar for a workshop paper was too high if this wasn't sufficiently interesting. Giorgos wonders if by John's logic we're hoping the paper gets rejected again.

Anyhow, Giorgos will be at the New York Computer Science and Economics Day this Friday with a poster on the subject. Stop by and talk with him! (Giorgos recently completed his PhD at BU, and is doing a Simons postdoctoral fellowship with Joan Feigenbaum, while also hanging out some with me at Harvard as an affiliate of our Center for Research on Computation and Society.)

Wednesday, September 14, 2011

People in various places have been noting the SODA accepts, but nobody has been talking about the numbers. I count 138 papers accepted. I can't find now how many submissions there were; I recall it was over 600 abstracts, but I think that was cut down to 520-540 or so full submissions. So this seems like just over 25%.

Is that a good number? Too many? Too few?

I admit, I have trouble believing that there weren't at least 40, 50, maybe 100 of those rejected papers that were of sufficient quality that it would have been just fine to have them in. I could be wrong -- I wasn't on the committee -- but I'd guess there was more good stuff out there.

For those papers that were rejected, where will they go? ICALP and ESA deadlines are pretty far off; there aren't a lot of good homes for algorithmic papers until then. (STOC/FOCS may not be appropriate; other conferences and workshops have, I think, weaker reputations that may make them less desirable, especially for up-and-coming students and younger faculty.) It seems like there's a hole in our schedule. Rather than fill it with yet another conference, wouldn't the community be better off accepting more?

Tuesday, September 13, 2011

While I was not blogging, we spent the few days before the semester started with an 80th birthday workshop for Michael Rabin. Impressively, we were able to pull it off despite the best efforts of Hurricane Irene, thanks to the many speakers who made an extra effort to get there, well beyond the call of duty. Because some speakers couldn't make it (flight cancellations), and we were worried about storm cleanup, we postponed the start until Monday afternoon, but other than that, it went fantastically well. (A credit to the organization of Les Valiant!)

Richard Lipton described it all in this blog post, so I don't have to. (The only negative thing in his post is that he refers to me as Mitz -- clearly just the easiest way of distinguishing me from the guest of honor, since he doesn't call me that regularly -- which feels strange to me, as I haven't been called that regularly since middle school.)

If any coding theory people are reading this, I thought I'd point out that the slides for my talk (as well as the slides for many of the other talks), which was on coding theory, are available here. In particular my talk is about how Michael Rabin's JACM paper on the Information Dispersal Algorithm was remarkably prescient, setting up some basic ideas and foundations for both the later work in LDPC codes and network coding. While the paper is far from unknown (Google scholar has it at well over 1000 citations), I'm not sure how widely appreciated the paper is in the coding theory circles; it was a pleasure for me to go back and reread it with new eyes to prepare for this talk.
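The core idea of the IDA is worth sketching: split a file into n pieces, each a 1/m fraction of the original size, so that any m of the pieces suffice to reconstruct it. Here's a toy rendering of my own over GF(257) -- each block of m bytes becomes the coefficients of a polynomial, and share i gets the evaluations at point i (a Vandermonde-matrix instantiation; Rabin's paper uses general matrices with any m rows independent):

```python
def ida_encode(data, m, n, p=257):
    """Disperse a list of byte values into n shares, any m of which
    reconstruct the data. Each block of m bytes is treated as the
    coefficients of a degree-(m-1) polynomial over GF(257), and
    share x holds the evaluations of every block at point x."""
    data = list(data) + [0] * (-len(data) % m)  # zero-pad to full blocks
    shares = [(x, []) for x in range(1, n + 1)]
    for j in range(0, len(data), m):
        block = data[j:j + m]
        for x, vals in shares:
            acc = 0
            for c in reversed(block):  # Horner evaluation at x
                acc = (acc * x + c) % p
            vals.append(acc)
    return shares

def _solve_mod(A, b, p):
    """Solve A x = b over GF(p) by Gaussian elimination (p prime)."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = next(r for r in range(col, n) if M[r][col])
        M[col], M[piv] = M[piv], M[col]
        inv = pow(M[col][col], p - 2, p)  # Fermat inverse
        M[col] = [v * inv % p for v in M[col]]
        for r in range(n):
            if r != col and M[r][col]:
                f = M[r][col]
                M[r] = [(v - f * w) % p for v, w in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]

def ida_decode(shares, m, p=257):
    """Reconstruct from any m shares by inverting, for each block, the
    Vandermonde system relating coefficients to evaluations."""
    shares = shares[:m]
    A = [[pow(x, k, p) for k in range(m)] for x, _ in shares]
    nblocks = len(shares[0][1])
    out = []
    for j in range(nblocks):
        out.extend(_solve_mod(A, [vals[j] for _, vals in shares], p))
    return out
```

Each share carries only a 1/m fraction of the data plus its evaluation point, and any m of the n shares recover everything -- the space-optimal redundancy that makes the paper such a natural ancestor of the later erasure-coding work.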

Monday, September 12, 2011

My thoughts that I would blog more over the summer, during academic "down time", turned out to be more of a fiction than I would have thought. The summer proved remarkably busy, with plenty of research, administration, consulting (the economy must be coming back?), and, more enjoyably, time with the kids. The blog was low priority.

With the school year starting, I actually feel I have things to say, so we'll try again.

Classes have started, and Harvard's continuing growth in CS, well, it continues. Our intro CS course ended up with about 500 students last year; this year, we've currently got 650 enrolled. Our 2nd semester programming course designed primarily for majors has gone from 70 to 110 enrolled. Most importantly for me, our intro theory course (the Turing machine class) went from about 55 to 75. My algorithms and data structures class tends to run just a few students below that 2nd semester course, so I'll have to plan for a jump up. (My biggest year was 90, back in the (last) bubble days; I'm not sure we'll ever get back there, as students have more class choices now.)

Really, we're trying to put more resources early in the pipeline to attract students to the intro course -- and we seem to be in a healthy positive feedback loop, which is (over time) pushing all the numbers up in later courses. Apparently tech is popular again.

I'm teaching randomized algorithms and probabilistic analysis, a graduate course that splits 1/2 grad and 1/2 undergrad. Currently I have 23 students, which is a nice and healthy but not earth-shattering number.

It's exciting to come back and see these numbers continuing to go up. No wonder we're #2 on Newsweek's Schools for Computer Geeks. (And no, we don't take that seriously.) Besides giving some evidence that we're on the right track, it should make the case for hiring more in CS easier to make.