Friday, August 31, 2007

I've already talked a lot in this blog about deletion channels. Trace reconstruction involves a similar problem. We start with an original binary string X = X1,X2,...,Xn. A trace consists of a string Y1,Y2,...,Ym obtained from the original string by passing it through a deletion channel, where each bit is independently deleted with probability p. The trace reconstruction problem basically asks how many independent traces do you need to see to reconstruct the original string X with high probability. Unlike the coding setting, where X might be chosen from a codebook of our own design, in this setting two natural models to study are when X is uniform over binary strings (so the high probability is over the choice of X and the traces), and in the worst case (where the high probability is just over the traces). Variations of the problem include operations other than deletions (including, say, insertions and errors). As an example application, a set of sensors might be monitoring a sequence of events. Each individual sensor is weak and might miss a given event, in which case the question is how many sensors are needed to reconstruct the event sequence perfectly, with high probability.

Trace reconstruction has some history in the information theory community, and the first CS-style paper I saw on it was by Batu, Kannan, Khanna, and McGregor in SODA 2004. The main result of this paper dealt with random input X and considered p values that were O(1/log n). It seems to me much more natural for p to be constant, and it has remained an open problem to determine an efficient algorithm for constant p.

I mentioned this problem last time I visited Microsoft, and it seemed to resonate with some of the people there. Thomas Holenstein, Rina Panigrahy, Udi Wieder and I have a submission with several results, including an algorithm that for random X and sufficiently small constant probability p requires only a polynomial number of traces and polynomial time (with high probability).

The SODA 2004 uses a majority voting technique -- the bits are determined sequentially, with each string voting on the next bit. A key idea in our new algorithm is a "smart voting" technique. We only let traces vote if there is good reason (based on the already determined bits) to think that the trace has a good prediction for the subsequent bit. That is, only well-informed strings are allowed to vote. Feel free to make your own political analogies. My intuition is that this smart voting technique is a closer analogue to the full belief propagation (or Bayesian analysis) that we want to do than just majority voting. Because of this, I hope this "smart voting" technique is a general approach that will find other applications.

I don't yet have an analysis of a belief-propagation-based algorithm. Also, currently we can't analyze a maximum-likelihood algorithm, that finds the most likely original string X. I also don't know how to implement maximum-likelihood efficiently in this setting. So there's still plenty of open questions in this area.

Wednesday, August 29, 2007

Luca Trevisan points to an article in the AMS by Neal Koblitz on modern cryptography that I think everyone in theoretical computer science (TCS) should read, as it exposes how at least some mathematicians view the culture in TCS. In summary, it's very negative. While I think the article is flawed on many levels, I'd argue that we ought to consider it as constructive criticism, and think about what, if anything, we might learn from it.

For example, one statement I found fairly misguided is the following:

Math departments usually believe theConjecture. For the development of mathematics it is better for someone to publish one excellent paper in n years than n nearly worthless papers in one year.In certain other fields of science - including, unfortunately, computer science and cryptography - the analogous conjecture, while most likely true, is not widely believed.

I accept the constructive criticism that in computer science we perhaps publish too quickly, and too many incremental things. On the other hand, this conjecture has little to nothing to do with reality. For every Wiles, who spent essentially a decade on what is a very important result, there are probably 100 mathematicians who spent a decade on the problem getting essentially nowhere and publishing what in retrospect are worthless papers. And the implication that TCS is producing only a stream of worthless papers is fundamentally incorrect. The question is really whether the culture should be that a person works only on big problems with the goal of having a very small number of big results (say, 1) over their lifetime, or that a person helps the community make progress through smaller and quicker increments (as well as the occasional larger contribution). Given the important tie between TCS and real-world applications, there's a clear reason why the community has developed with a larger focus on small/quick progress, although as a community we are also supportive of people who want to work the other way as well. The phrasing of the conjecture is clever, but I actually think TCS progresses much better with the culture it has than the one Koblitz favors. (Just like sometimes fast local heuristics are much better than slow exact algorithms.)

Rather than just start a tirade against Koblitz, however, I think we'd be best off dissecting the article and understanding what aspects of the TCS culture has led to his opinions and, in many cases, misperceptions. We may find some things worth changing, either to improve our culture, or at least to present ourselves differently to other sciences.

Tuesday, August 28, 2007

I've just returned from a week in the Bay Area, primarily to visit my "corporate sponsors"; Cisco and Yahoo have both provided research money this year, and I felt I owed them a visit before classes start. I also visited Microsoft Silicon Valley; even though I still haven't figured out how to get research money out of Microsoft, I've "consulted" there from time to time, and recently co-wrote a paper with several people there after my last visit.

The main purpose of the visit was to talk to people and try to come up with new things to work on. Corporate money seems to synchronize with annual budgets. If you want another check next year, you need to convince them that funding you is still worthwhile. (That's not unusual; I understand if I ever get a DoD-based grant, it will be similar.) I should point out that all research money I've gotten from Cisco and Yahoo is given as a gift -- no requirements (and much less in overhead taken by Harvard!). It's just that it's nice to get these gifts annually, like a research anniversary present, and that means maintaining the relationship.

I like to think that getting corporate money is a simple winning proposition. I get access to interesting problems; I get to work with talented people; and I get research money for doing what I would be doing anyway. But perhaps I'm deluding myself? Perhaps I'd be working on other grander problems if my agenda wasn't influenced by the vision of others? In my own personal case, I really doubt it, and it feels like a pure jackpot to me, but it's something every researcher probably has to think about.

I'd certainly advise that people interested in "practical algorithms" and "algorithm engineering" should seek collaborations with (and funding from!) such industrial sources. In my experience, it takes some work. (But then again, so does writing NSF grants.) Usually the funding comes after a successful collaboration, not before, or there has to be a clear champion in the organization who can make the case that your work is related to something going on at the company. Building such relationships takes time, and doesn't happen overnight. But it's very worthwhile.

To provide evidence, I'll spend some upcoming posts discussing problems I've recently collaborated on with people at Cisco, Microsoft, and Yahoo.

Friday, August 24, 2007

Next semester, I'm teaching my class on randomized algorithms and probabilistic analysis, based on the Mitzenmacher/Upfal book. Harvard, like many universities, has an "Extension School", and I offer my courses through the distance education program. Basically, my lectures get taped and put online, I put the assignments online, and you or anyone who pays the Extension fee can take the course. While I'm sure my teaching performance on video is reminiscent of say an early Mel Gibson or perhaps a Moe Howard, I personally still don't think distance education is as good as "being there" by any means (at least, not yet...). But it offers an opportunity that people may not otherwise have.

The course is taught at the level of an introductory graduate class, meant for non-theorists as well as theorists. These days, who doesn't need to know randomized algorithms and probabilistic analysis? If you know someone, for example in industry, who might like to take such a course, here's the link to the bare-bones syllabus.

Wednesday, August 22, 2007

In a binary symmetric error channel, n bits are sent, and the channel flips each bit independently with probability p. So, for example, the message sent might be 00110011 and the received message could be 01100011 if the 2nd and 4th bits were flipped. Now suppose the same message is sent through k independent channels, and the receiver sees all of results. (Here k should be thought of as a small constant.) The capacity of this channel can be computed; essentially, in this channel, each bit gets maps to a number in the range [0,k], corresponding to the number of 1's in the appropriate position. (Since all errors are independent, exactly which channels flip a specific bit doesn't matter, just the number of flips matter.) As a specific example, when k = 2, we can think in the following nice way -- if we see two 1's (resp 0's) in bit position i, we think the original bit was 1 (resp 0), and now we have an error with probability p^2. With probability 2p(1-p), we see a 1 and a 0 in the ith position -- this corresponds to an "erasure", since the bit is now equally likely to be a 1 and a 0. So we have a channel that gives errors with probability p^2 and erasures with probability 2p(1-p); we can find the capacity (and codes for) such a channel.

In a binary deletion channel, n bits are sent, and the channel deletes each bit independently with probability p. So, for example, the message sent might be 00110011 and the received message could be 010011 if the 2nd and 4th bits were deleted. Now suppose the same message is sent through k independent binary deletion channels, and the receiver sees all of results. Can we say anything useful here? The problem is automatically more challenging since we only have bounds and don't even know the capacity of the standard deletion channel (when k is 1). This is yet another simply stated question from the theory of coding for deletion channels in need of an idea.

Thursday, August 16, 2007

We request submissions of position papers that propose new directions of research, advocate nontraditional approaches to old (or new) ideas, or generate insightful discussion... As a venue for exploring new ideas, HotOS encourages contributions influenced by other fields such as hardware design, networking, economics, social organizations, biological systems, and the impact of compiler developments on systems and vice versa. We particularly look for position papers containing highly original ideas.

The Workshop on Hot Topics in Networks (HotNets) was created in 2002 to discuss early-stage, creative networking research and to debate positions that reflect on the research direction and needs of the broad networking community. Architecture, high-level design work, and positions that may shape long-term research direction are especially welcome. HotNets is structured to work in synergy with conferences such as SIGCOMM by providing a venue in which innovative work may receive feedback to help it mature into conference papers or otherwise have a long-term impact on the community. To fulfill these goals HotNets calls for short position papers that argue a thoughtful point-of-view rather than full-length conference papers, and maintains a broad and diverse scope.

Does theory need a Hot Workshop? A workshop specifically designed for somewhat wacky or not-fully-baked ideas that may turn into long-range research directions?

Wednesday, August 15, 2007

SIGCOMM (the major networking conference) will be taking place shortly. Although I won't be attending, I thought I'd mention some of the papers that involve theory in a non-trivial way. This list is not meant to be exhaustive, so I apologize for (and welcome comments on) anything I might have missed.

The networking community, I've found, is very amenable to theory. I think there's a real understanding in the community that obtaining systematic improvements in something as wild and large as modern networks requires a solid, fundamental grasp of what's going on. This means good models, and good algorithms and data structures. This doesn't mean that SIGCOMM is full of theory papers; they generally want to see some sort of implementation to see that whatever idea is presented actually works. But many papers have a non-trivial theoretical component, and the SIGCOMM conference is certainly a good place to look for important problems where theoretical insight would be welcome. (The same is true for the larger networking conference, INFOCOM, but that conference is so large it's harder to get a handle on. Also, SIGCOMM aims to be a bit more visionary, or "out there", which I think means there's more opportunity for theory to get involved.)

DTN Routing as a Resource Allocation Problem: DTN's are Disruption Tolerant Networks -- in this paper, focusing on mobile nodes that are only connected intermittently. In one section, the paper sets up an appropriate graph model for the problem, and proves lower bounds for various situations. For example, any algorithm that doesn't know the schedule of node meetings is Omega(n)-competitive compared to an offline adversary when n packets are to be delivered. The paper then moves on to heuristic approaches and an experimental evaluation.

Embracing Wireless Interference: Analog Network Coding: Normally, we think of networking coding as XORing packets, or taking random combinations of packet data over some finite field. In wireless, you may be sending signals, not bits. What do you do then? This paper looks at the problem of effective algorithms for this problem.

Again, these examples aren't exhaustive; many other papers have Lemmas and Theorems in there. In fact, I'd wager that having words like "Lemma" and "Theorem" in the paper is positively correlated with acceptance to SIGCOMM. Maybe someone on the PC will confirm or straighten me out. In any case, I'll be giving all the SIGCOMM papers a closer look....

Tuesday, August 14, 2007

One thing I've been enjoying is that as people seem to discover this blog, they comment on the older posts. I appreciate these comments and I hope the newcomers stick around!

Suresh asked me to enable the comment feed, which would allow people to easily see these newly arrived comments. They appear to be enabled, but while there is a link for the standard blog feed at the bottom of the page is a link for the blog feed, there is no link for the comments. I spent some time playing with the template to no avail. (I'm happy to take advice from anyone who could get the link to show up.)

However, you can just enter the URL directly to your reader. Apparently, the URL you need to get the comments is:

Monday, August 13, 2007

In my undergraduate class, I devote one class (and one programming assignment) to heuristic search methods. I focus on hill climbing, since it is most clearly connected to other things covered previously (greedy algorithms, the simplex algorithm), but I also discuss other well-known methods, including the Metropolis algorithm, simulated annealing, tabu search, go with the winners, genetic algorithms, and, of course, BubbleSearch.

Generally, algorithms textbooks hardly mention heuristics. At best, heuristics enter the picture when discussing approximation algorithms, as though the fact that we can prove that the greedy local search algorithm for max-cut achieves a cut of at least 1/2 the edges is the important thing. I wish the standard textbooks would devote a solid chapter on heuristic techniques; in real life, most students are probably more likely to have to implement a heuristic for some NP-hard problem than to have to implement a minimum spanning tree algorithm.

The theory purists will undoubtedly suggest that heuristics don't belong in an algorithms class, because generally we can't prove formal statements about them. Obviously, I disagree. Heuristics are closely tied to other class concepts -- like greedy algorithms and approximations. Perhaps most importantly, a successful heuristic approach depends on sound modeling and building an understanding of a problem, which are exactly the important skills we should be teaching in undergraduate algorithms. Finally, we need to give students some basic, realistic algorithmic tools for handling NP-hard problems in practice. (2-approximations do not suffice.) So while I acknowledge that students may be exposed further to these and other heuristic approaches in for example an AI class, I think it's important to clearly connect it to what they're learning in algorithms.

Friday, August 10, 2007

I became more interested in (and knowledgeable about) heuristic methods when I was working on a related project at the nearby Mistubishi Electric Research Laboratory (MERL) some years ago. The project on Human-Guided Search studied the benefits of having a human dynamically interact with heuristic optimization algorithms, and along the way we did some work just on the heuristics themselves.

Here's a simple heuristic that can be taught, I think, quite easily and productively to undergraduates. Many hard problems -- including most scheduling, packing, and coloring problems -- have natural greedy heuristics, whereby the items are ordered according to some criterion (work, size, degree), and "placed" one at a time according to that ordering. For example, many standard bin packing algorithms such as first fit decreasing and best fit decreasing fit this type. Given time, a natural way to extend such greedy heuristics is to try additional orderings. While we can't hope to consider all orderings, we could certainly try more than one. Of course, intuition tells us we should prefer orderings close to the greedy ordering.

There are a variety of ways one could do this. A historically popular way is to sequentially choose an item uniformly at random from the top k of the greedy ordering, place it, and remove it from the list. One has to choose the parameter k. A negative feature of this approach is that there are many orderings that will never even be considered under this approach.

We suggest what we call BubbleSearch, which makes use of the Kendall-tau distance. More clearly, the Kendall-tau distance between an ordering A and the greedy ordering B is the number of transpositions of adjacent items you would have to make to get from A to B, which corresponds to the number of swaps you would make using BubbleSort if the sorted order was just the greedy ordering. (Hence the name.)

You could just go through all the permutations of items in order by Kendall-tau distance from the sorted order. Most small perturbations of the greedy ordering, however, give very similar results, leading to little or no improvement. A better way for most problems is a variation of the top k approach. To create a new ordering A, we start with a base ordering B (the greedy ordering). We pick the first item of A as follows: choose the first item of B with probability p, and if it isn't selected, choose the next item of B with probability p, and so on down the list (starting at the beginning again if necessary). Once an item is selected, it become the first element of A, and is removed from B. We continue choosing subsequent items for A the same way with the remaining list from B, starting from the beginning of the remaining list. The probability of obtaining an ordering A is then proportional to (1-p)^d(A,B). Here p is the algorithm parameter, determining how close to the base ordering you are likely to be. To me, this approach is much more intuitive than the top-k approach, and in our experiments appeared to do at least marginally better.

A further improvement is to change the base ordering to be the best ordering you have seen so far. Once you've beaten the greedy ordering, there's no reason to keep it as the base.

A motivation for simple heuristics like BubbleSearch is exactly their simplicity. They are easy to code and rely on essentially no problem-dependent knowledge. If coding time matters, something like BubbleSearch is the way to go.

Randomized extensions to greedy algorithms also give rise to interesting theoretical questions, related to work on "priority algorithms". I don't know of any results bounding the performance of random-top-k, BubbleSearch, or similar randomized greedy variations.

Wednesday, August 08, 2007

Most of you probably don't know about Mitsubishi Electric Research Laboratories (MERL), a small lab nestled in Kendall Square (very close to MIT, and a pleasant walk from Harvard). While they did basic research there, it was definitely more application/development oriented, and it specialized more in areas like UI, graphics, speech, etc. Information theorists would know it as the home of Jonathan Yedidia, a big name in coding and belief propagation, but CS theorists might not ever have heard of it.

I've done some consulting work there over the years. I knew the lab director, Joe Marks, from my time as a Harvard undergrad, and he hooked me up. It was a very nice place -- very theory-friendly in its application-oriented way. Joe clearly had the mindset that the goal of the lab was to generate new ideas, and that would drive new products.

So I was disappointed to hear several months ago that Joe had been removed as lab leader. (Don't feel too bad for Joe -- he's now at Disney, and is chairing SIGGRAPH this week. A talent like him will continue to be successful...) And even more disappointed (but not surprised) to read in Xconomy that MERL was being "re-organized", essentially phasing out the basic research component, and many people were leaving.

I've lived vicariously through this before -- I left Digital Systems Research Center before it disappeared (after Digital was bought by Compaq was bought by HP), but knew several people who went through the process. It's very disturbing to see again how hard it is for companies to make research labs work.

What makes research labs successful? Can they really last long-term in any non-monopoly environment? Let me make a bold prediction. I personally am not planning on retiring until at least age 65. Pick any research division that's around today. (Microsoft, Yahoo, Google, AT&T, Bell Labs, IBM, any of them...) I'd bet on pretty much any of them that their research wing will be gone before I hit 60. (I'm still under 40. OK, maybe I should just bet they'll be dramatically reduced or transformed into Advanced Development for a number of years somewhere along the way...) This is something people who go to research labs should know going in. Odds are likely you'll have to change jobs somewhere along the way -- not because of your own talents, but because of company-level problems. This might make such jobs seem risky, but let's not exaggerate -- when the company has problems, the talent moves to a new company.

I'd love to hear thoughts on what makes good research labs, why I'm wrong about research labs dying out, how people who have experienced such moves feel about them, or anything else on the topic.

Update : Jonathan Yedidia says in the comments that the reports of MERL's death have been highly exaggerated.

Tuesday, August 07, 2007

Peter Winzer argued for using (variants of) Valiant's randomized load balancing (route to a random intermediate spot, and go from there to the end destination) as an actual network architecture, over shortest-path and virtual private network methods. And I do mean argued. A heated discussion broke out among various real networking people as to whether this was realistic. The cost of potentially doubling the transport time (speed-of-light time) for say cross-country transmissions was argued to be unacceptable, even if the payoff was essentially zeroing queueing delays and making a simpler, cheaper (less equipment) network. (Perhaps it could be used for smaller networks, or only to/from the center, to avoid doubling cross-country trips.) It was very entertaining and instructive; one of the classis theoretical ideas, an argument about how it could be used and what its benefit could be, and a counterargument based on practical experience, leaving a somewhat vague end result -- you probably can't tell how useful it is until you build it out...

Ho and Sprintson gave their survey on network coding, covering the background, the integrality gap of network coding for undirected networks (Agarwal and Charikar), finding network codes through integer programming for undirected networks, bounds on the number of encoding nodes needed for various graphs, robust network coding in the face of edge failures, practical implementations, and more.

Monday, August 06, 2007

I'm spending a couple of days at the DIMACS Tutorial on Algorithms for Next Generation Networks. Tracey Ho and Alex Sprintson and giving a talk on Network Coding, and since in the past I've promised a post on open questions in network coding, I'm interviewing them. I apologize that the questions might be vague; the problem with a new area is that it's still a little unclear what the right open problems are, and we all use different lingo. But here are a few...

For 3 source-receiver pairs and unicast, in an undirected network, with arbitrary coding, is there an advantage from coding over routing?(I think the reference paper is Li and Li, Network Coding: The Case of Multiple Unicast Sessions)?

General multiple unicast in directed networks: is there an algorithm that, given a network, computes the capacity? For acyclic networks, there's an implicit characterization, but not an explicit characterization; for cyclic networks, there's not even that.

What is the complexity of finding the optimal (in terms of capacity) non-linear code for various network coding problems? Specific problems include determining the truth/falsehood of the following statements:

Given an instance of the general network codingproblem in a directed graph G and a real numberr, there is a polynomial-time algorithm whichcomputes a solution achieving rate r if one exists,and otherwise reports that this is impossible.

Given an instance of the k-pairs communicationproblem in an undirected graph G and a real numberr, it is recursively undecidable to determinewhether the network coding rate is less than r.

For multicast networks, what is the minimum number of nodes that need to do encoding to make network encoding work? That is, can we minimize the coding complexity in terms of the number of nodes doing coding (or some other reasonable metric).

There must be open questions in the recent work by Koetter/Kschischang on coding for errors/erasures in random network coding. For example, they seem to give general (network-oblivious) bounds. Are there ways to improve their bounds by using knowledge of the network topology?

Is there an algorithm to find the minimum cost subnetwork that guarantees delivery using network coding subject to certain classes of adversarial (or non-adversarial) link failures.

A good place to look for CS-style open problems is probably also the Adler/Harvey/Jain/Kleinberg/Lehman paper On the Capacity of Information Networks, which has a nice section on open problems at the end.

I'm sure there are plenty of places to get such information, but for basic tech news (like what's up with Google and the FCC, voting machines, and the iPhone), I'm enjoying the Machinist blog, out of Salon.com. The longer weekly columns are interesting too; I'm already looking forward to getting a Zonbu.

For straight tech policy, my favorite for quite some time has been Ed Felten'sFreedom to Tinker. The past week he's been commenting on the California e-voting reports, which gives further evidence to what most of us already know (or would have guessed) : current voting machines are fundamentally insecure. I prefer to think this is due to incompetence and ignorance rather than malicious intent on anyone's part, but what do I know. My only complaint is that he doesn't post enough!