Pages

Saturday, January 21, 2006

SODA report: ALENEX...

Today is the pre-SODA workshop today; ALENEX (Experimental Algorithms) and ANALCO (Analytic Algorithms and Combinatorics). David Mount was the plenary speaker at ALENEX, and he gave what I thought was an excellent overview of the area of nearest neighbour methods.

As befits an "experimental" talk, it wasn't your usual survey of techniques. He structured the talk around seven points (the "seven habits of highly effective nearest neighbour searching" as he put it), and skilfully weaved the known universe of NN methods around this structure.

The points: to summarize

Grids are good: quite good for many problems

Quad trees are better, if you need higher dimensions

kd-trees are even better, if you want the best bounds

In high dimensions, only locality sensitive hashing can save us.

The central theme was that there are a few basic ideas that work extremely well in near-neighbour searching, and depending on the kind of data you have (uniform, highly skewed, intrinsically high dimensional, ever-changing), some techniques are better suited than others.

Listening to his talk, I realized that we need something like this for clustering methods as well. Like the nearest-neighbour problem, clustering is a fundamental problem in a variety of domains (maybe it's the 3rd-most important problem in all of science.... more on this later). Like NN, there are a huge number of papers out there on different ways of solving the problem, and like NN, what a practitioner really needs is a way of organizing and classifying the different clustering techniques.

Outtakes:

The hotel is in the middle of downtown, which apparently means 'always down'. The closest restaurant to the hotel is a Burger King, and the next closest is a Checkers. Much anger will be vented at the business meeting on Monday.

People who haven't arrived yet will be happy to know that the wireless in the conference area works fine, and the wireless ghetto is up and running.

The SODA proceedings is once again twice as heavy (and probably 5 times as thick) as my laptop, once again raising the question:

Why can't we get PDF copies of the papers before the conference, or even via password-protection at the conference itself.

There is a bigger issue here about electronic proceedings that I don't really care to get into; all I really want is the ability to read the papers without getting a sprained wrist (or neck).

Last year all the proceedings went for a tour of Minnesota, and we ended up downloading papers from a protected SIAM site. People didn't seem to mind too much.

Did I mention that there are few good eating places near the hotel ? There is only one place that looks like it might have decent coffee, and it's a bit of a hoof away.

In a talk early this morning, John Hershberger started his talk by asserting that determining the shape of points was "the most important problem in science". This apparently stole Dave Mount's thunder, who was then reduced to asserting that the nearest neighbour problem was "the second most important problem in science".

I like the fact that ALENEX and ANALCO have joint registration, so that we can walk into talks in either workshop. There are a number of interesting talks at ANALCO that I will attempt to attend.