Recently in Overthinking Category

April 9, 2012

The IETF RTCWEB WG has been operating on a fast track with
an interim meeting between each IETF meeting.
Since we needed to schedule a lot of meetings,
I thought it might be instructive to try to analyze a bunch
of different locations to figure out the best strategy. Here's
a lightly edited version of my post to the RTCWEB WG trying to
address this issue.

Note that I'm not trying to make any claims about what the best set of
venues is. It's obviously easy to figure out any statistic we want
about each proposed venue, but how you map that data to "best" is a
much more difficult problem. The space is full of Pareto optima,
and even if we ignore the troubling philosophical question of
interpersonal utility comparisons, there's some tradeoff
between minimal total travel time and a "fair" distribution of travel
times (or at least an even distribution).

METHODOLOGY
The data below is derived by treating both people and venues as
airport locations and using travel time as our primary instrument.

For each responder to the current Doodle poll, assign a home
airport based on their draft publication history. We're missing a
few people, but it should be basically complete. Since
these people responded before the venues were known, the sample is at
least somewhat unbiased.

Compute the shortest advertised flight between each home airport
and each venue by looking at the shortest
flights Kayak advertises around one of the proposed interim
dates (6/10 - 6/13), ignoring price but excluding "Hacker fares".
[Thanks to Martin Thomson for helping me gather these.]

This lets us compute statistics for any venue and/or combination
of venues, based on the candidate attendee list.
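To make that last step concrete, here's a minimal sketch (in Python) of the kind of bookkeeping involved. The home airports and round-trip hours below are made-up placeholders rather than the actual poll data; the point is just how per-venue and per-rotation statistics fall out of the travel-time table.

    import statistics

    # Hypothetical round-trip travel times in hours, keyed by attendee home
    # airport and candidate venue. Placeholder numbers, not the Kayak data.
    travel_hours = {
        "SJC": {"SFO": 2, "BOS": 13, "ARN": 30},
        "BOS": {"SFO": 14, "BOS": 1, "ARN": 20},
        "ARN": {"SFO": 30, "BOS": 20, "ARN": 1},
    }

    def venue_stats(venue):
        # Mean/median/SD of travel time if every meeting is at one venue.
        times = [row[venue] for row in travel_hours.values()]
        return statistics.mean(times), statistics.median(times), statistics.stdev(times)

    def rotation_stats(venues):
        # For a rotation, each attendee's time is the average over the venues
        # in the rotation (one meeting at each).
        times = [statistics.mean(row[v] for v in venues)
                 for row in travel_hours.values()]
        return statistics.mean(times), statistics.median(times), statistics.stdev(times)

    print(venue_stats("BOS"))
    print(rotation_stats(["SFO", "BOS", "ARN"]))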

The three proposed venues:

San Francisco (SFO)

Boston (BOS)

Stockholm (ARN)

Three hubs not too distant from the proposed venues:

London (LHR)

Frankfurt (FRA)

New York (NYC) (treating all NYC airports as the same location)

Also, Calgary (YYC), since the other two chair locations (BOS and SFO)
were already proposed as venues, and I didn't want Cullen to feel
left out.

RESULTS
Here are the results for each of the above venues, measured in total
hours of travel (i.e., round trip).

XXX/YYY/ZZZ is a three-way rotation of XXX, YYY, and ZZZ. Obviously, mean
and median are intended to be some sort of aggregate measure of travel
time. I don't have any way to measure "fairness", but SD is intended
as some metric of the variation in travel time between attendees.

This was a quick hack, so there may be errors here, but nobody has pointed
out any yet.

OBSERVATIONS
Obviously, it's hard to know what the optimal solution is without
some model for optimality, but we can still make some observations
based on this data:

If we're just concerned with minimizing total travel time, then we
would always meet in New York, since it has both the shortest mean travel
time and the shortest median travel time, but as I said above, this
arguably isn't fair to people who live in either Europe or California,
since they always have to travel.

Combining West Coast, East Coast, and European venues gives
mean/median values comparable to (or at least not too much worse than)
NYC's, with much lower SDs. So, arguably that kind of mix is more fair.

There's a pretty substantial difference between hub and non-hub
venues. In particular, LHR has a median travel time 7 hours less than
ARN, and the SFO/NYC/LHR combination has a median/mean travel time
about 2 hours less than SFO/BOS/ARN (primarily accounted for by the
LHR/ARN difference). [Full disclosure, I've favored Star Alliance hubs
here, but you'd probably get similar results if, for instance, you
used AMS instead of LHR.]

Obviously, your mileage may vary based on your location and feelings
about what's fair, but based on this data, it looks to me like a
three-way rotation between West Coast, East Coast, and European hubs
offers a good compromise between minimum cost and a flat distribution
of travel times.

August 22, 2011

The process of turning raw wool into fabric by hand is extremely time
consuming. Prior to the Industrial Revolution, the production process
operated as a pyramid, with a large number of carders supporting a
smaller number of spinners, who in turn
supported an even smaller number of weavers [Note: weaving
is much faster than the other two major techniques for turning yarn
into cloth: knitting and crocheting]. I've heard varying numbers, but
Wikipedia claims
that the ratio was around 9:3:1.

Isn't it interesting, then, that when you look at the list of common
American surnames, which are often associated with occupations,
"Weaver" appears at position 190 (.05% of the population) but
"Spinner" appears at 1/50th that rate, at position 7393 (.001%)? Carder
is at 4255 (.003%); Carter is, I would assume, a different profession.
[The first 10 names, btw, are: Smith, Johnson, Williams, Jones, Brown,
Davis, Miller, Wilson, Moore, Taylor.]

I'm not attempting to claim that there's some direct relationship
between last name frequency and historical occupation rates, but
it's still entertaining to speculate on the cause. My initial
suggestion was that carding and spinning were more likely to be
women's work and of course in the West women's surnames don't
get propagated. Mrs. Guesswork suggests that spinning and
carding weren't professionalized the way that weaving was
[prior to the invention of the spinning wheel, spinning technology
was extremely low-tech], so you might spin or card in your spare time,
but weaving requires enough capital equipment that you would
expect it to be done professionally and thus be more likely to
get a surname attached to it.

Equally likely, of course, is that it's just coincidence, but what fun
would that be?

March 5, 2011

As I've mentioned before, a world with a lot of vampires is a world with a blood supply problem.
I recently watched Daybreakers,
which takes this seriously; nearly everyone in the world is a vampire and the
vampires farm most of the remaining humans for blood while sending out
undeath squads to round up the rest. Obviously, this isn't a scalable
proposition and sure enough the vampires are frantically trying to develop
some kind of substitute for human blood before supplies run out.

In a world where synthetic blood isn't possible, there's some maximum
stable fraction of vampires, dictated by the maximum amount of blood
that a non-vampire can produce divided by the amount of blood that a
vampire needs to survive. According to Wikipedia
blood donations are typically around 500ml and you can donate every
two months or so. This works out to about 3 liters of blood per donor
per year. Presumably, if you didn't mind doing some harm to the donors
(e.g., if it's involuntary), you could get a bit more, but this still
gives us a back of the envelope estimate. I have no idea what
vampires need, but if it's say a liter a day, then this tells you that
any more than about 1% of the population being vampires is
unstable. This is of course a classic externality problem, since
being a vampire is cool, but not everyone can be a vampire.
If we wish to avoid over-bleeding, we will need some sort of
system to limit the creation of new vampires.
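To make the back-of-the-envelope arithmetic explicit (the liter-a-day figure is, as above, just a guess), a quick sketch:

    # Back-of-the-envelope estimate of the maximum stable vampire fraction.
    donation_liters = 0.5          # per donation
    donations_per_year = 6         # roughly every two months
    supply_per_human = donation_liters * donations_per_year     # ~3 L/year

    liters_per_vampire_day = 1.0   # pure guess
    demand_per_vampire = liters_per_vampire_day * 365            # ~365 L/year

    # With a fraction f of vampires, stability requires
    #   f * demand_per_vampire <= (1 - f) * supply_per_human,
    # so the maximum stable fraction is:
    max_fraction = supply_per_human / (supply_per_human + demand_per_vampire)
    print(f"{max_fraction:.2%}")   # about 0.8%, i.e., roughly 1%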

Luckily, this is a relatively well understood economics problem
with a well-known solution: we simply set a hard limit on the number
of vampires and then auction off the rights (cap-and-trade won't
work well unless we have some way of turning vampires back into
ordinary humans). I'd expect this to raise a lot of money, which we can
then plow into synthetic blood research to hasten the day when everyone can be a vampire;
either that, or into research into better farming methods, the better to
hasten the red revolution.

January 9, 2010

I'm in the market for a new motorcycle and have been looking at the
BMW R1150GS/R1200GS. Like cars, motorcycles have a lot of
depreciation the minute they pull off the lot, and because
you're fairly likely to drop your bike anyway, most people I know
figure you might as well buy pre-dropped and look for a
used model. But once you're buying used you have the problem
of figuring out how much you should pay. KBB
motorcycles isn't
much help here because the market is small and the mileage varies
a lot.

An alternate approach is to mine the available data on what
people are offering vehicles for and use it to build an
analytical model for predicting prices; this lets us figure
out the appropriate asking price (which isn't the same as a fair
price; more on this later) for a given vehicle and identify outliers
in either direction.

Below, you can find the list of the relevant bikes on sale on CL for
the past week or so:

#    Asking ($)   Model     Year   Mileage
1    7650         1150GS    2002    25000
2    7900         1150GS    2001    54000
3    14500        1200GSA   2006     3700
4    8500         1200GS    2005    54000
5    13700        1200GS    2007     3658
6    7400         1150GSA   2004    60000
7    5500         1100GS    1996    23000
8    11500        1200GS    2005    12000
9    7200         1150GS    2002    40000
10   11950        1200GS    2008    29000
11   9600         1200GS    2005    39000

I used a simple OLS regression model to fit this data, using
the model year and mileage for the bike. The result is:

Our model predicts that each year the bike is on the road
it loses about $600 in value and that it loses about $76
for each 1000 miles it has. [Note that I'm treating
mileage and age as independent variables; it might make
more sense to try to estimate "excess" mileage over some
base value, but I don't have the baseline data I would
need.] In any case, we're doing pretty well here: with
only two predictors we are accounting for around 90% of
the price variation. We can see this visually by plotting
the price points against the best fit plane, as below:

Points above the plane (shown with red lines) are likely
too expensive and points below (with blue lines) are worth
checking out to see if they're good deals.
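For reference, here's a rough sketch of how you could reproduce this kind of fit in Python; the post doesn't say what software was used, so this is just an illustration on the asking-price data from the table above, and the exact coefficients may come out a bit different from the numbers quoted.

    import numpy as np

    # Craigslist listings from the table above: (asking price, model year, mileage).
    listings = [
        (7650, 2002, 25000), (7900, 2001, 54000), (14500, 2006, 3700),
        (8500, 2005, 54000), (13700, 2007, 3658), (7400, 2004, 60000),
        (5500, 1996, 23000), (11500, 2005, 12000), (7200, 2002, 40000),
        (11950, 2008, 29000), (9600, 2005, 39000),
    ]

    price = np.array([p for p, _, _ in listings], dtype=float)
    # Predictors: age in years (2010 was the year of the post) and mileage in
    # thousands, plus an intercept column.
    age = 2010 - np.array([y for _, y, _ in listings], dtype=float)
    miles_k = np.array([m for _, _, m in listings], dtype=float) / 1000.0
    X = np.column_stack([np.ones_like(age), age, miles_k])

    # Ordinary least squares: price ~ intercept + age + mileage.
    coef, *_ = np.linalg.lstsq(X, price, rcond=None)
    fitted = X @ coef
    r_squared = 1 - np.sum((price - fitted) ** 2) / np.sum((price - price.mean()) ** 2)

    print("intercept, $/year of age, $/1000 miles:", coef)
    print("R^2:", r_squared)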

Obviously, we're excluding a lot of variables here. We haven't
captured the condition of the bike, how desperate/motivated the
seller is to get rid of it, what accessories it has, etc.
Looking more closely at the data, the two most
comparatively expensive bikes seem to come with a few
more accessories, so this may have led the owners to think
they could extract more money (I don't think this is really
true, however, since often those items are valuable only
to the original owner). For the purposes of selecting
good deals, we would also like to know how flexible the
seller's price is. It's possible that someone lowballing
the price will also be less flexible because they've
already built that discount into their price. On the
other hand, they could be more motivated, so that
could cut in the other direction.
It would be interesting to get secondary data on how much
these bikes actually sell for [you could get some of that
information by seeing whether repeated postings have lower prices],
but while that data is available for houses, I don't think it is for bikes.

July 5, 2009

The problem with climbing grades is that unlike running,
cycling, lifting, etc. there's no objective measure of
difficulty. Routes are just graded by consensus of other
climbers, in this case the gym's routesetters. As
a result, some routes are easier than others—and of course
since different climbers have different styles, which
routes are easiest depends on the climber as well—and
as a practical matter some routes are really harder or easier
than their rated grade. [1] Of course, given that there's no
objective standard, you could argue that this isn't a
meaningful statement, but that's not really true:
a difficulty grade is really a statement about how many
people can do a route, so if there's a bunch of routes
rated 5.10 and I can't climb any of them,
but I jump on a new route rated 5.10 and race up it with no effort, that's a sign
it's not really a 5.10. This is actually a source of real
angst to people just starting to break into a grade—at
least for me—since
if I can do it, I immediately expect that the rating
is soft.

It would be nice to have a more objective measurement of
difficulty. While we can't do this just by measuring
the route (the way we can with running, for instance)
that doesn't mean the problem is insoluble; we just need
to take a more sophisticated approach.
Luckily, we can steal a solution from another problem domain:
psychological testing. The situations are actually
fairly similar: in both cases we have a trait (climbing
skill, intelligence) which isn't directly measurable. Instead, we can
give our subjects a bunch of problems which are generally easier
the higher your level of ability. In the psychological domain, what we
want to do is evaluate people's level of ability; in the
climbing domain, we want to evaluate the level of difficulty
of the problems. With the right methods, it turns out that
these are more or less the same problem.

The technique we want is called Item Response Theory (IRT). IRT assumes that
each item (question on the test or route, as the case may be)
has a certain difficulty level; if you succeed on an item,
that's an indication that your ability is above that level. If you
fail, that's an indication that your ability is below that
level. Given a set of items of known difficulties, then,
we can quickly home in on someone's ability, which is how
computerized adaptive tests work. Similarly, if we take
a small set of people of known abilities and their performance
on each item, we can use that to fit the parameters for
those items.

It's typical to assume that the probability of success on each
item is a logistic curve. The figure below shows an item
with difficulty level 1.
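Since the figure doesn't reproduce here, a minimal sketch of that curve (the simple one-parameter logistic form, with the difficulty set to 1 as in the figure):

    import math

    def p_success(ability, difficulty):
        # Logistic item response function: the probability of success rises
        # smoothly as ability climbs past the item's difficulty.
        return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

    # An item with difficulty 1: 50% success at ability 1, less below, more above.
    for ability in (-2, -1, 0, 1, 2, 3):
        print(ability, round(p_success(ability, 1.0), 2))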

Of course, this assumes that we already know how difficult
the items are, but initially we don't know anything: we just
have a set of people and items without any information
about how good/difficult any of them are.
In order to do the initial calibration we start by collecting a
large, random sample of people and have them try each item. You end
up with a big matrix of each person and whether they succeeded or
failed at each one, but since you don't know how good anyone is other
than by the results of this test, things get a little complicated. The
basic idea behind at least one procedure, due to Birnbaum
(it's not entirely clear to
me whether this is how modern software works; the R ltm documentation is a
little opaque), is to use an iterative technique: you assign
an initial set of abilities to each person and use those to
estimate the difficulty of each item. Given those difficulty estimates,
you can re-fit to determine people's abilities.
You then use those estimates to
re-estimate the item difficulties and iterate back and forth until
the estimates converge, at which point you have estimates of
both the difficulty of each item and the ability of each
individual.
(My description here is based on Baker.)
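Here's a toy sketch of that alternating scheme for the simple one-parameter (Rasch-style) case. It's meant to illustrate the back-and-forth, not to reproduce what ltm or Baker's procedure actually does:

    import numpy as np

    rng = np.random.default_rng(0)

    # Simulated data: true abilities and difficulties, then pass/fail results.
    n_people, n_items = 100, 20
    true_theta = rng.normal(0, 1, n_people)
    true_b = np.linspace(-2, 2, n_items)
    p_true = 1 / (1 + np.exp(-(true_theta[:, None] - true_b[None, :])))
    responses = (rng.random((n_people, n_items)) < p_true).astype(float)

    # Alternate between updating abilities (difficulties held fixed) and
    # difficulties (abilities held fixed) until things settle down.
    theta = np.zeros(n_people)
    b = np.zeros(n_items)
    lr = 0.05
    for _ in range(500):
        p = 1 / (1 + np.exp(-(theta[:, None] - b[None, :])))
        theta += lr * (responses - p).sum(axis=1)   # ability gradient step
        p = 1 / (1 + np.exp(-(theta[:, None] - b[None, :])))
        b += lr * (p - responses).sum(axis=0)       # difficulty gradient step
        theta -= theta.mean()                       # pin down the scale's origin

    print("correlation with true difficulties:", np.corrcoef(b, true_b)[0, 1])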

As an example I generated some toy data with 20 items and 100 subjects
with a variety of abilities and fit it using R's
ltm
package. The figure below shows the results with the response
curves for each item. As you can see, having a range of items with
different difficulties lets us evaluate people along a wide range
of abilities:

Once you've done this rather expensive calibration stage, however,
you can easily calculate someone's abilities just by plugging in
their performance on a small set of items. Actually, you can
do better than that: you can perform an adaptive test where
you start with an initial set of items and then use the
response on those items to determine which items to
use next, but even if you don't do this, you can get results
fairly quickly.
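The cheap per-person step looks roughly like this under the same toy model: once the item difficulties are calibrated, a new person's ability is just a one-dimensional maximum-likelihood estimate.

    import numpy as np

    def estimate_ability(difficulties, outcomes, steps=200, lr=0.1):
        # Maximum-likelihood ability estimate for one person, given calibrated
        # item difficulties and that person's pass(1)/fail(0) outcomes.
        difficulties = np.asarray(difficulties, dtype=float)
        outcomes = np.asarray(outcomes, dtype=float)
        theta = 0.0
        for _ in range(steps):
            p = 1 / (1 + np.exp(-(theta - difficulties)))
            theta += lr * np.sum(outcomes - p)   # gradient of the log-likelihood
        return theta

    # Succeeds on the easier items, fails the harder ones: the estimate lands
    # between the hardest pass and the easiest fail.
    print(estimate_ability([-2, -1, 0, 1, 2], [1, 1, 1, 0, 0]))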

That's nice if you're administering the SATs, but remember
that what we wanted was to solve the opposite problem: rating
the items, not the subjects. However, as I said earlier,
these are the same problem. Once we have a set of subjects
with known abilities, we can use that to roughly calibrate the
difficulty of any new set of items/routes. So, the idea
is that we create some set of benchmark routes and then
we send our raters out to climb those routes. At that
point we know their ability level and can use that to
rate any new set of climbs.

There's still one problem to solve: the difficulty ratings we
get out of our calculations are just numbers along some
arbitrary range (it's conventional to aim for a range
of about -3 to +3 with the average around 0), but we want
to have ratings in the Yosemite Decimal System (5.1-5.15a as
of now). It's of course easy to rescale the difficulty
parameter to match any arbitrary scale of our choice, but
that's not really enough, because the current ratings are
so imprecise. We'll almost certainly find that there
are two problems A and B where A is currently
rated harder than B but our calibrated scale has B harder
than A. We can of course choose a mapping that minimizes
these errors, but because so many routes are misrated it's probably better to start with a
smaller set of benchmark routes where there is a lot of
consensus on their difficulty, make sure they map correctly,
and then readjust the ratings of the rest of the routes
accordingly.
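One simple way to do that mapping, sketched under the assumption that we encode YDS grades as numbers (5.9 as 9, 5.10 as 10, and so on) and fit a linear rescaling using only the benchmark routes; the numbers here are made up.

    import numpy as np

    # Benchmark routes: calibrated IRT difficulty vs. consensus grade
    # (grades encoded numerically, e.g. 5.9 -> 9.0). Made-up values.
    irt_difficulty = np.array([-2.1, -0.8, 0.0, 1.1, 2.3])
    consensus_grade = np.array([8.0, 9.0, 10.0, 11.0, 12.0])

    # Least-squares linear map from the IRT scale to the grade scale.
    slope, intercept = np.polyfit(irt_difficulty, consensus_grade, 1)

    def to_grade(difficulty):
        return slope * difficulty + intercept

    # A new route that calibrates at 0.5 on the IRT scale:
    print(round(to_grade(0.5), 1))   # comes out a bit above 10, i.e. around 5.10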

Note that this doesn't account for the fact that
problems can be difficult in different ways; one
problem might require a lot of strength and one
require a lot of balance. To some extent, this is
dealt with by having a smooth success curve
which doesn't require that every 5.10 climber be
able to climb every 5.10 route. However, ultimately,
if you have a single scalar ability/difficulty
metric, there's only so much you can do in this
regard. IRT can handle multiple underlying abilities, but
the YDS scale we're trying to emulate can't, so
there's not too much we can do along those lines.

Obviously, this is all somewhat speculative—it's
a lot of work and I don't get
the impression that route setters worry too much about the
accuracy of their ratings. On the other hand, at least
in climbing gyms if you were
able to integrate it into a system that let people keep
track of their success in their climbs (I do this already
but most people find it to be too much trouble), you
might be able to get the information you needed to
calibrate new climbers and through them get a better
sense of the ratings for new climbs.

Acknowledgement: This post benefitted from discussions with
Leslie Rescorla,
who initially suggested the IRT direction.

[1] This seems to be especially bad for very
easy and very hard routes. I think the issue with very easy
routes is that routesetters are generally good climbers and so
find all the routes super-easy. I'm not sure about harder
problems, but it may be that they're near the limit of the
routesetters' abilities and so heavily dependent on whether
the route matches their style.

March 27, 2009

Sorry about the lack of content last week—I was at
IETF and just didn't have time to write anything.
I should have some more material up over the weekend.
In the meantime, check out this photo of the bathroom
sink at the Hilton where we were having the
conference:

That thing to the left of the sink is an automatic soap dispenser
(surprisingly, powered by a battery pack underneath the sink). Now
notice that the sink itself is manually operated. Isn't this kind of
backwards? The whole point of automatic soap dispensers and sinks in
bathrooms is to appeal to your OCD by freeing you from having to touch
any surface which has been touched by any other human without being
subsequently sterilized. But when you wash your hands, the sequence of
events is that you turn on the water, wet your hands, soap up, rinse,
and then turn off the water. So, if you have a manually operated
sink, people contaminate the handles with their dirty, unwashed
hands, which means that when you go to turn the sink off, your
just-washed hands get contaminated again. The advantage of
automatic faucets, then, is the automatic shutoff, which
omits the last stage.
By contrast, having the soap
dispenser be automatic doesn't buy you that much
because you only need to touch it before
washing your hands. There's probably some analogy here to
viral spread in computer systems, but for now let's
just say that this is how security guys think.

February 6, 2009

Dan Savage addresses the difficult ethical issue of the mutual obligations of the laptop user
and the coffee shop in which he works:

Don't want people to sit in your cafe with their laptops? There's a
simple solution: don't have WiFi. But if you're going to have WiFi
then for fuck's sake have fucking WiFi. And if your WiFi isn't
working, if it's down and it's gonna be down all day, you might wanna
mention that to people before they wait in line, buy a coffee, leave a
tip, sit down, and pull out their computers. Because then each and
every one of those computer users is going to walk up to the counter
and ask if you have WiFi. It's an asshole move to look at each laptop
computer user/customer in turn like they've just asked you if you have
herpes. And if it really kills you to sneer out, "Yeah, we have WiFi,
but it's down," then put a little sign on the door that says the
WiFi's out. Then laptop users won't bother you with their questions,
their presence, or their patronage.

UPDATE: And laptop users? Tip based on the amount of time you intend
to spend in the cafe, not on the price of your beverage; buy your
refills; share tables; and always remember that you're not actually in
your office.

I occasionally work in coffee shops, so this is a topic I've given
some thought to.
I think it's pretty clear that there's some implicit obligation
for patrons to fork over some money occasionally and not just sit
at a table (yes, yes, I realize that there's no contract requiring
you to do so, but think about the equilibrium issues here: if nobody
ever paid for their drinks you can bet that coffee shops
would start forcing you to rent tables.) But this doesn't tell
you how much to spend or how to allocate your payments between
the coffee shop and the staff.

If the shop is pretty full,
I think it's reasonably clear: you're depriving the shop of
space that could be used by paying customers so you should
be buying a bit more than the average customer. The same
logic holds for the staff, since presumably those customers
would tip.
If the shop is mostly empty, though, the situation seems a
little more complicated. You're not costing the shop any
money and WiFi is basically free for the shop to offer
(the router is cheap and the Internet service is a fixed cost.)
That doesn't mean you don't need to fork over any money,
since, as I said, there's an implicit obligation, but
I have no idea what the right amount is. I usually buy a drink
when I come in and then maybe one every hour or two.
It's not clear how much to tip the staff either: their work
scales with the number of drinks you order, so my instinct
is to tip whatever fraction of your food and drinks you
usually would.

As far as the shop's obligation to you, the flip side of the implicit
contract is that they will offer you Wi-Fi ("Wait", I hear you object,
"why should you even think they have Wi-Fi, let alone rely on it?"
That seems simple: some coffee shops advertise it, and even in shops
which don't, many if not most of the customers are regulars and
so know it's provided and often went to the shop explicitly to work.).
Obviously, that doesn't mean it needs to work perfectly, but if they
know it's hosed they should probably tell you before you've plonked
down your money.

January 30, 2009

While listening to KQED's latest pledge drive, I noticed
something funny about their thank you gift schedule.
This time, they offered the option to have you not take any
gift but instead donate it to the SF Food Bank.
The schedule looks like this:

Donation ($)   Meals
40             2
60             5
144            33
360            180

This seems strangely non-linear, which suggests something
interesting: the fraction of your pledge that
KQED uses to pay for thank you gifts, as opposed to
funding their operations, grows with the size of the pledge.
There are way too few points here to do
a proper fit, but I can't help myself. Playing around with
curves a bit, a quadratic seems to fit pretty well,
with parameters: Meals = .0014 * Donation^2 + 1.2.
It's not just the
$360 data point that throws it out of whack, either:
there's apparent nonlinearity even in the first three points.
(Again, don't get on me about overfitting: with only four
points there's only so much you can do.)
I'm not sure what this suggests about their business model. Naively,
I would have expected the fraction of your donation that goes to
gifts to go down as your gift went up. Indeed, you might
have thought that they would take a small loss on the smallest
pledges just to get people involved and then move to the upsell
at some later date.
Thinking about it some more, I guess the natural model is that
KQED is trying to extract money from you up to the point where
the marginal dollar they extract from you costs them a marginal
dollar in gifts (or in this case food bank donations),
at which point they stop. So, as people's marginal
utility of having given something, anything, to KQED declines, they need to
keep jacking up gift quality faster than the size of the donation
to keep extracting your cash. Other theories are of course welcome.
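For what it's worth, a quick sketch of that curve-fitting exercise (using an unconstrained quadratic, so the coefficients come out slightly different from the hand-tuned ones above):

    import numpy as np

    donation = np.array([40.0, 60.0, 144.0, 360.0])
    meals = np.array([2.0, 5.0, 33.0, 180.0])

    # Fit Meals ~ a*Donation^2 + b*Donation + c to the four points.
    a, b, c = np.polyfit(donation, meals, 2)
    print(f"meals ~= {a:.4f}*d^2 + {b:.3f}*d + {c:.2f}")

    # Meals per dollar pledged; the fact that this rises with the pledge
    # is what makes the schedule look non-linear.
    print(meals / donation)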

January 19, 2009

Mrs. G. and I were up in San Francisco last weekend and while
on our way to Fog City News
we ran into someone we knew. This was sort of surprising,
so I got to thinking about how probable it was (or wasn't).
Grossly oversimplifying, my reasoning goes something like
this:

The population of San Francisco is about 800,000. Let's call it 10^6.
I know perhaps 100 people in the city at any given time. There are
maybe 20-50 people on any given stretch of city block. Say I walk for
an hour at 3 mph and that the average block is 100m long, so I walk
for 50 blocks in that time and pass on the order of 10^3 people.
If we assume people are randomly distributed
(this is probably pessimistic, since I know that I spend
most of my time in SF in a few places and I assume my friends
tend to be somewhat similar) then
I have a .9999 chance of not knowing any given
person I run into. If we assume that these are independent
events, then I have a .9999^1000 chance of not knowing
any of those people [technical note: this is really
(999900/1000000) * (999899/999999) * ..., but these
numbers are large enough, and we've made enough other
approximations, that we can ignore this]. .9999^1000 = .90,
so if I walk around the city for an hour, I have about
a 1/10 chance of meeting someone I know.
That doesn't sound too far out of line.
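The arithmetic as a quick sketch, with the same rough guesses as above:

    population = 1_000_000     # San Francisco, rounded up
    people_i_know = 100
    people_passed = 1_000      # ~50 blocks at 20-50 people per block

    p_stranger = 1 - people_i_know / population      # .9999
    p_know_no_one = p_stranger ** people_passed      # ~.90
    print(f"chance of running into someone I know: {1 - p_know_no_one:.2f}")  # ~0.10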

December 29, 2008

One of Slate's odder sections is the "Green Lantern", where
they take on some simple question like "should I buy a natural
or artificial Christmas Tree" and try to analyze it from an
environmental perspective. The most recent
article asks
whether you should throw away your leftovers or flush them
down the garbage disposal. Unfortunately, the articles tend
to be pretty useless: sometimes they have a real answer
but often they thrash around for a while giving you
the pros and cons of each option and conclude that maybe you
should do A and maybe you should do B:

The research is unambiguous about one point, though: Under normal
circumstances, you should always compost if you can. Otherwise, go
ahead and use your garbage disposal if the following conditions are
met: First, make sure that your community isn't running low on
water. (To check your local status, click here.) Don't put anything
that is greasy or fatty in the disposal. And find out whether your
local water-treatment plant captures methane to produce energy. If it
doesn't--and your local landfill does--you may be better off tossing
those mashed potatoes in the trash.

Or maybe not... Here's another example:

If these ideas don't excite you, the Lantern recommends putting the
new cash toward insulating your family's home. Of course, whether this
makes sense depends on your local climate and whether you buy or
rent. (Likewise, the current state of your home will determine just
how much insulation your $100 will buy.) For the rest of you, it might
be wisest to replace any antiquated, energy-inefficient appliances you
might have--along the lines spelled out here. (Let's put aside the
complicated question of carbon offsets, which will be addressed in a
future column. Suffice to say that they wouldn't be the Lantern's
first choice.)

I'm not saying I can do any better; rather I think this is reflective
of a systemic problem with this kind of overall cost/benefit analysis.
While it's possible to measure the power consumption, carbon
emissions, etc. of any particular microactivity, it's pretty hard
to do an overall cost/benefit analysis of whether you should do
A or B when each of them consists of a whole bunch of individual
activities, all of which require their own analyses. The economist-type
answer is to levy Pigouvian taxes on each individual component
(e.g., carbon taxes) and then let the market sort things out.
I don't know whether that would work any better, but I don't see
people being able to do this kind of analysis for each individual
purchasing decision either.