Apportionment and rounding schemes

Guise I: how many congressmen should each US state have, given that the state populations are known?

Guise II: in "party list" PR voting systems, how many seats should each party get, given that the number of votes for each party is known?

The same problem also arises in pure mathematics, in the area of simultaneous Diophantine approximation.

In both guises, the difficulty is that you cannot have 3.7 congressmen – it is necessary to
round to integers.
Actually, by giving congressmen unequal vote-weights in the legislature,
"3.7 congressmen" would
be possible in some sense. But we shall ignore that.

Let Pk be the (known) population of state k, and let Sk be
its number of congressmen, where the total number
S=∑kSk
of congressmen is fixed by US law to be
S=435.

The question is: what should the Sk be?

Hamilton/Vinton method

Let P=∑k Pk be the total population.
Start with Sk = ⌊Pk · S/P⌋
(rounding every ratio down).
Then, if the total number of seats is too small, award the extra seats one apiece
to the states in decreasing order of the fractional parts that were rounded off, until
the correct total number of seats is achieved.
Hamilton's method has a very appealing "quota property":
Each state gets either its "lower quota"
⌊Pk · S/P⌋
or its "upper quota"
⌈Pk · S/P⌉
worth of seats.
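As a concrete sketch (in Python; the inputs are the small illustrative examples used later on this page), Hamilton's largest-remainder procedure can be written as:

```python
from math import floor

def hamilton(populations, seats):
    """Hamilton/Vinton (largest-remainder) apportionment.

    Each state first receives the floor of its exact quota P_k*S/P;
    any leftover seats go to the states whose rounded-off fractional
    parts are largest."""
    total_pop = sum(populations)
    quotas = [p * seats / total_pop for p in populations]
    alloc = [floor(q) for q in quotas]
    leftover = seats - sum(alloc)
    # Award remaining seats in decreasing order of fractional part.
    order = sorted(range(len(populations)),
                   key=lambda k: quotas[k] - alloc[k], reverse=True)
    for k in order[:leftover]:
        alloc[k] += 1
    return alloc

# The 3-state Alabama-paradox example discussed below:
print(hamilton([5, 3, 1], 4))  # [2, 1, 1]
print(hamilton([5, 3, 1], 5))  # [3, 2, 0] -- state C loses its seat
```

Note that both the quota property and the Alabama paradox are visible directly in this code: each state's count starts at its lower quota and rises by at most one, yet which states get the extras can change discontinuously as the house size grows.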

The USA abandoned the Hamilton/Vinton method after the
"Alabama paradox" arose. That is:
With fixed population proportions,
Hamilton's method can cause a state to lose a seat as a consequence of
an increase in the total number of seats.
I.e. it violates "house size monotonicity."
That may not matter much now that the US House has its size fixed at 435 seats, but
Hamilton also violates population-pair monotonicity, i.e.
it can cause state A to lose seats relative to
state B even though A's population expands and B's contracts!
Here are examples of both failures:

The following may be the simplest example where
Hamilton disobeys house monotonicity ("Alabama paradox").
If we increase the total number of seats from 4 to 5, state C
loses a seat.

State    Population    #seats(4)    #seats(5)
A        5             2            3
B        3             1            2
C        1             1            0

The Alabama paradox was discovered in 1880, when it was found that increasing the total
number of seats would decrease Alabama's number from 8 to 7.
It is expected to occur about once every 8 apportionments
in a model involving
exponentially distributed state populations.

Hamilton disobeys population-pair monotonicity.
After people move around,
state C gains 5.1% population (the largest percentage gain) while state B is the only
state to lose population; yet C loses a seat (the only state to do so) while B keeps
the same number of seats.

State    Population(I)    #seats(I)    Population(II)    #seats(II)    Pop.Change%
A        554              5            566               6             +2.2%
B        290              3            270               3             -6.9%
C        156              2            164               1             +5.1%
total    1000             10           1000              10            0%

Balinski & Young
[Fair Representation: Meeting the Ideal of One Person, One Vote (2nd edition),
Brookings Institution Press 2001]
pointed out that, in the year 1900, Hamilton/Vinton gave
Virginia 10 seats and Maine 3,
whereas in 1901, Virginia would have gotten 9 seats and Maine 4,
despite
the fact that the Virginia/Maine population ratio was 2.67 in 1900 and 2.68 in 1901.
[1901 populations based on linear interpolation between 1900 and 1910].

Desirable-sounding Properties which an apportionment method may or may not satisfy

House monotonicity:
if population of whole country goes up (preserving proportions) no state can
lose seats. [No longer matters much for US congress since total #seats now fixed at 435.]

Population-pair monotonicity:
If population of state A increases but state B decreases, then
A should not lose seats while B stays the same or gains seats.
More generally, if A's percentage population change exceeds B's, then
A should not lose seats while B stays the same or gains seats.

Quota:
Each state should always get either ⌊q⌋ or ⌈q⌉ seats, where q = Pk·S/P is its exact (generally non-integer) quota.

Theorem (Balinski & Young):
All "divisor methods" (and, essentially, only divisor methods)
are both House and population-pair monotone;
but they all disobey quota.
(However, such violations can be rare. The Webster, Dean, and Huntington-Hill methods
have never violated quota on any US states dataset so far, and according to
some vaguely-described computer experiments by B&Y would be
expected to do so only with probability 0.00061, 0.00154, and 0.000286 respectively –
which in the case of Webster would be once every 1600 apportionments, i.e. once every 16000 years.
My own computer experiments
agree that Webster is by far the least likely of the 5 classic
divisor methods to suffer a quota violation, and also agree that Hill and Dean are
in second and third places
respectively, but estimate the quota-violation
probability for Webster as 0.0016, which far exceeds
B&Y's estimate but still corresponds to about 6300 years between violations.
Meanwhile Hamilton exhibits a monotonicity paradox about once every 18 apportionments
in B&Y's same model.
B&Y also showed that
divisor methods avoid paradoxical results when new states are added.)
Webster also satisfies quota if there are ≤3 states (but the others among
the classic five divisor methods only do so if there are ≤2 states).

Meanwhile, Hamilton satisfies quota but disobeys both monotonicity properties.
That leads to the question of whether an apportionment method exists that satisfies all three
properties. The answer
is "no" – the last two
properties are incompatible – but
"yes" if we only ask for the first and third properties
(new method found by Balinski & Young in 1974).

Here's Balinski & Young's 1974 apportionment method which is both house-monotone and
obeys quota:
Let Pk
denote the population, and Sk the number of seats assigned so far, to state k.
We initially assign all states 0 seats: Sk=0.
We now repeatedly award a new seat to whichever state maximizes Pk/(1+Sk),
subject to the constraint that adding this new seat not violate the "upper quota" condition,
i.e. no seat is awarded that would cause
Sk ≤ ⌈Pk·∑jSj/∑jPj⌉
(where ∑jSj is the new seat total) to be violated.
We just keep awarding new seats in this way, never violating quota (which B&Y proved trivially
can always be done), until the correct
total number ∑Sk results.
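Here is a minimal sketch of that procedure (my own reading of the description above; the upper-quota eligibility test uses exact integer arithmetic):

```python
def quota_method(populations, seats):
    """Balinski & Young's 1974 quota method (sketch).

    Award seats one at a time to the state maximizing P_k/(1+S_k),
    restricted to states whose new seat count would stay within the
    upper quota ceil(P_k*house/P) at the new house size."""
    P = sum(populations)
    alloc = [0] * len(populations)
    for house in range(1, seats + 1):
        # -(-a//b) is ceiling division for positive integers
        eligible = [k for k in range(len(populations))
                    if alloc[k] + 1 <= -(-populations[k] * house // P)]
        k = max(eligible, key=lambda k: populations[k] / (1 + alloc[k]))
        alloc[k] += 1
    return alloc

print(quota_method([5, 3, 1], 5))  # [3, 2, 0]
```

Because seats are only ever added, the method is house-monotone by construction; the eligibility filter is what enforces the quota condition.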
Although Balinski & Young initially recommended this method, ultimately they
rejected their own invention
because it fails population-pair monotonicity.
Also Garrett Birkhoff in 1976, and Cyril Carter in 1982, both claimed,
based on empirical numerical experiments, that this method was "biased"
in favor of large-population states. (Birkhoff's and Carter's claims were
not very convincing because the
number of their experiments was very small.)
Jonathan W. Still [SIAM J. Applied Math. 37,2 (1979) 401-418]
then found an infinite class – in fact the entire class – of house-monotonic and
quota-obeying apportionment methods (Balinski & Young's method is just one
member of Still's class). All of Still's methods are of the following form:

1. Start with all seat-counts Sk=0.

2. Find the "eligibility set" E (which, Still proved, is always nonempty)
of states for which (a) you can give that state another seat without violating quota, and (b)
a tricky "lookahead condition" explained by Still
assures that if you do so, it will not force a quota violation
later (throughout the infinite future of seat awards).

3. Pick a member of E (in any way at all, including randomly, or to maximize some arbitrary
"quality function")
and award that state an additional seat (increment Sk for that k).

4. Go back to step 2 until the total number of seats reaches the desired value S.

Still eventually recommended a house-monotone version of Hamilton's method in which the state
awarded the extra seat in step 3 is the one currently
the largest additive amount below its upper quota.

Why is it impossible for any method to satisfy both quota
and population-pair monotonicity?
Consider the following 18-seat apportionment
problem, in which
Webster and Huntington-Hill give the same seat-counts –
but both violate quota (state D's 12 seats fall below its lower quota of 13) – indeed
every "divisor method"
violates quota in this scenario:

State    Population    Quota     #seats (Webster & Hill, using divisor 8.43)
A        13            1.625     2
B        13            1.625     2
C        13            1.625     2
D        105           13.125    12
Total    144           18        18

Now consider any method which tries to satisfy quota by means of A, B, or C giving up
a seat to D.
We claim that whichever one you pick (since they are all the same), you can use it to
construct a population-pair paradox: say you picked A. Now move the people
around so as to decrease A's and D's populations and increase B's and C's
(with A having the largest decrease, percentagewise). Then the obvious apportionment
will yield a population "paradox" since D would give
up a seat to A despite A's faster population decrease. To exhibit that in detail:

State    Population    Quota     #seats (any quota-obeying method)
A        130           1.625     1
B        130           1.625     2
C        130           1.625     2
D        1050          13.125    13
Total    1440          18        18

and then after 60 people move (decreasing D's population by 5.0% and A's by 5.4%):

State

Population

Quota

#seats(Hamilton, Hill, Webster, Dean, Adams)

A

123

1.5375

2

B

160

2

2

C

160

2

2

D

997

12.4625

12

Total

1440

18

18

...which is a population-pair paradox in the sense that D lost a seat to A despite
A's largest-percentage population decrease. This does not actually prove
that every quota-obeying method must exhibit such a paradox – for that see
theorem 6.1 of B&Y's book – but it at least makes it plausible.

Splitting & combining incentives:
Jefferson's method (which is equivalent to d'Hondt's method)
"encourages mergers": by merging into one,
two states can gain seats but
never lose them.
Theorem (see Balinski & Young thms 9.1 & 9.2):
Jefferson is indeed the unique population-pair-monotonic
apportionment method
that offers such a combining incentive (and as a free bonus, Jefferson also always obeys
the "lower quota" condition).
(B&Y proposition 6.4):
Adams is exactly the opposite: with Adams, states can gain seats by splitting,
but never lose seats (although once the number of states exceeds the number of seats,
Adams becomes undefined), and it is the unique population-pair-monotone method
that offers such a splitting incentive (and as a free bonus, Adams
also always obeys the "upper quota" condition).

Bias:
Vaguely speaking, if a method "tends to favor small-population states"
(or large-population ones) it is "biased."
There are, however, many different possible formulas for assessing such
"bias" and "favortism."

Theorem:
Hamilton, Webster, Dean, and Hill all are "limit unbiased." That is,
if the state populations are selected independently at random from some continuous and
bounded probability density, the number of states is held fixed,
and the total #seats is taken to infinity, then
(almost surely) the bias (measured as a correlation coefficient)
will tend to 0. But Jefferson and Adams are not
limit unbiased; they favor large- and small-population states, respectively.
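This claim can be probed numerically. The following Monte Carlo sketch is my own (the exponential-population model and the "surplus of the smallest half" statistic are illustrative assumptions, not B&Y's exact setup); it shows Jefferson shortchanging, and Adams favoring, the small states, with Webster near zero:

```python
import random

def highest_averages(pops, seats, div):
    # Each seat goes to the state with the largest quotient pop/div(s),
    # where s is the number of seats that state already holds.
    alloc = [0] * len(pops)
    for _ in range(seats):
        k = max(range(len(pops)), key=lambda j: pops[j] / div(alloc[j]))
        alloc[k] += 1
    return alloc

def jefferson(pops, S): return highest_averages(pops, S, lambda s: s + 1)
def webster(pops, S):   return highest_averages(pops, S, lambda s: 2*s + 1)
def adams(pops, S):     return highest_averages(pops, S, lambda s: s or 1e-9)

def small_state_surplus(method, trials=300, nstates=10, seats=100, seed=1):
    """Average total (seats - exact quota) of the smallest half of the
    states, over random trials with exponential populations."""
    rng = random.Random(seed)
    tot = 0.0
    for _ in range(trials):
        pops = sorted(rng.expovariate(1.0) for _ in range(nstates))
        alloc = method(pops, seats)
        quotas = [p * seats / sum(pops) for p in pops]
        tot += sum(alloc[k] - quotas[k] for k in range(nstates // 2))
    return tot / trials

print(small_state_surplus(jefferson))  # clearly negative: small states shortchanged
print(small_state_surplus(adams))      # clearly positive: small states favored
print(small_state_surplus(webster))    # near zero
```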

Near the quota:
A method is "near the quota" if no transfer of a seat from one state to another
can bring both states nearer to their exact ratios. (Birkhoff 1976
instead called this "binary consistency.")
Balinski & Young (their theorem 6.2) proved Webster to
be the unique population-pair-monotone method that is always "near the quota."
Further, they proved this regardless of whether "nearer" is measured in terms of
additive or relative difference.
But Ernst (his theorem 4.3)
countered with an analogous theorem indicating that Huntington-Hill was "near the ideal"
in the sense that for it, no transfer of a seat from one state to another
can diminish certain other measures of departure from ideality.
Hamilton/Vinton also is "near the quota," but (Birkhoff 1976 pointed out)
Balinski & Young's quota-obeying method is not.

Uniformity:
Balinski & Young [Proc. Nat'l. Acad. Sci. USA 73 (1980) 684-686]
defined an apportionment method to be "uniform"
if when we restrict to a state-subset, the apportionment within this sub"country" still is a valid
apportionment under the same apportionment rule. They showed every method which is both
uniform and population-pair monotone, must be a "divisor method."
They also showed no method exists that obeys both uniformity and quota.

The 5 classic "divisor methods"

A "divisor method" gives a state Sk = r(Pk/Q)
seats where r is the appropriate rounding function for that method, and Q is a
"divisor," i.e. a number chosen to make the total number
∑k Sk of seats come out right.
(Here r is a monotonic-increasing integer-valued function of the real numbers.)
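To make the definition concrete, here is a sketch (my own illustration) that searches for a working divisor Q by bisection, for any rounding function r; the rounding thresholds shown for Webster and Hill are the standard ones, and exact ties are ignored:

```python
from math import floor, sqrt

def divisor_apportion(pops, seats, r):
    """Generic divisor method: find Q with sum_k r(P_k/Q) = seats.

    The seat total is non-increasing in Q, so bisection works whenever
    a valid divisor interval exists (i.e. barring exact ties)."""
    lo, hi = 1e-9, float(sum(pops))
    for _ in range(200):
        Q = (lo + hi) / 2
        alloc = [r(p / Q) for p in pops]
        total = sum(alloc)
        if total == seats:
            return alloc
        if total > seats:
            lo = Q       # too many seats: divisor must grow
        else:
            hi = Q       # too few seats: divisor must shrink
    raise ValueError("no valid divisor found (exact tie?)")

def webster_round(x):   # round at the arithmetic mean n + 1/2
    return floor(x + 0.5)

def hill_round(x):      # round at the geometric mean sqrt(n(n+1))
    n = floor(x)
    return n + 1 if x > sqrt(n * (n + 1)) else n

# The 18-seat example from earlier on this page (divisor near 8.43):
print(divisor_apportion([13, 13, 13, 105], 18, webster_round))  # [2, 2, 2, 12]
print(divisor_apportion([13, 13, 13, 105], 18, hill_round))     # [2, 2, 2, 12]
```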

The five classic divisor methods differ only in their rounding functions: Jefferson
rounds every quotient down, Adams rounds every quotient up, Webster rounds at the
arithmetic mean n+½ (ordinary rounding), Hill rounds at the geometric mean √(n(n+1)),
and Dean rounds at the harmonic mean 2n(n+1)/(2n+1) of the neighboring integers n and n+1.
Note that Hill, Dean, and Adams "naturally" enforce a one-seat minimum,
since each rounds every positive quotient below 1 up to 1.
E.V.Huntington in 1928
[The Apportionment of Representatives in Congress,
Transactions Amer. Math'l. Soc. 30,1 (Jan 1928) 85-110
(pdf); this method had however also been considered earlier
by Sainte-Laguë
in 1910]
showed that these "classic five" were the only divisor methods
in which no transfer of a seat from one state to another can decrease
some "measure of inequity" where this measure was required by Huntington to
be an absolute value of the difference of two rational functions
of the two state's populations and seat counts, with degree≤1
numerator or denominator in each, and such that equal seat/population ratio would
zero this difference. However, Huntington did not show
(contrary to misleading claims in B&Y's book) this for more-general "measures of inequity"
such as more-complicated rational functions or non-rational functions.
Therefore I deny the idea that these classic five methods are somehow uniquely favored
and all other divisor methods should be excluded from consideration.

Although it is not a "divisor method,"
Hamilton/Vinton can be thought of in a similar way
if we use an additive, rather than multiplicative, "divisor." That is,
find a real value Δ such that when it is added to every
ideal (non-integer) seat-count and the results are rounded to the nearest integers,
the correct total number of seats is obtained.
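A small sketch of that additive reformulation (my own code; exact half-integer ties are ignored), which reproduces ordinary Hamilton:

```python
from math import floor

def hamilton_additive(pops, seats):
    """Hamilton via an additive pseudo-divisor: bisect for a shift D
    such that rounding each exact quota q_k + D to the nearest integer
    gives the desired seat total."""
    P = sum(pops)
    quotas = [p * seats / P for p in pops]
    lo, hi = -1.0, 1.0
    for _ in range(100):
        D = (lo + hi) / 2
        alloc = [floor(q + D + 0.5) for q in quotas]
        total = sum(alloc)
        if total == seats:
            return alloc
        if total < seats:
            lo = D   # shift upward so more quotas round up
        else:
            hi = D
    raise ValueError("tie encountered")

print(hamilton_additive([5, 3, 1], 5))  # [3, 2, 0], same as Hamilton
```

The shift changes which fractional parts cross the rounding threshold of ½, which is exactly Hamilton's "largest fractional parts win" rule.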

Unfortunately, the divisor methods can fail –
that is, a divisor that leads to the right total number of seats can simply fail to exist.
Hamilton's method (if rephrased as a pseudo-divisor method as above) also can fail.
For a simple example, consider 5 seats to be divided among 2 states with
exactly equal populations.
Any apportionment (e.g. 3 and 2) will necessarily be unfair, and the divisor methods
simply cannot
produce one. But with some random tiebreaking in such circumstances we are OK; and in practice
exact population ties are extremely rare
and have never occurred, so this has not been an issue.
[In our computer simulations no such failure ever occurred.]

Connection to d'Hondt, Droop, and Sainte-Laguë methods

J.Q.Adams, T.Jefferson, A.Hamilton, and D.Webster were famous statesmen in early US history
and devised the methods named after them.
Edward V. Huntington,
Joseph Hill, and James Dean were American mathematicians.
The Europeans later encountered the same
mathematical problem but in guise II, and their mathematicians and politicians
invented their own solutions. Some of
the European solutions were equivalent to the American ones:

Jefferson = d'Hondt (idea also known to Henry Droop):
After all the votes have been tallied, successive quotients
are calculated for each list. The formula for the quotient is V/(s+1), where:

V is the total number of votes that party received; and

s is the number of seats that party has been allocated so far
(initially 0 for all parties)

Whichever party has the highest quotient gets the next seat allocated,
and their quotient is recalculated given their new seat total, i.e. its s is incremented.
The process is repeated until all seats have been allocated.
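The procedure just described, as a sketch in Python (the vote totals here are hypothetical):

```python
def dhondt(votes, seats):
    """d'Hondt (= Jefferson) sequential allocation: each seat goes to
    the party with the highest quotient V/(s+1), where s is the number
    of seats that party has been allocated so far."""
    alloc = [0] * len(votes)
    for _ in range(seats):
        k = max(range(len(votes)), key=lambda j: votes[j] / (alloc[j] + 1))
        alloc[k] += 1
    return alloc

print(dhondt([340, 280, 160, 60], 7))  # [3, 3, 1, 0]
```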

Webster = Sainte-Laguë = "method of odd numbers":
like the d'Hondt method, but with a different denominator in the quotient formula,
namely 2s+1 in place of s+1. (Incidentally, Webster could instead be defined using s+½
as the denominator. Rescaling a denominator sequence by any positive constant scale factor,
does not change the system.)

The Europeans also use several other denominator sequences: Imperiali (denominators s+2), modified Sainte-Laguë (1.4, 3, 5, 7, …), Danish (3s+1), Estonian ((s+1)^0.9), and Macau (2^s). Of these, Imperiali is very favoring to large-population states (strictly more than Jefferson).
Modified Sainte-Laguë favors large states and disfavors small ones slightly
versus ordinary Sainte-Laguë; it differs from it only for
states (or parties, in the party-list usage) having ≤1 congressman.
Danish favors small states.
The Estonia and Macau methods actually are nonproportional and favor large and small states
respectively.
Every divisor method can be rewritten as an algorithm of d'Hondt type; you simply
make the d'Hondtian denominator sequence be the same as
(or any positive constant multiple of)
the sequence of "rounding thresholds"
for the divisor method.
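As a sanity check of that equivalence (my own sketch), Webster computed via the odd-number denominator sequence matches Webster computed via an explicit divisor search:

```python
from math import floor

def highest_averages(votes, seats, denom):
    # d'Hondt-type allocation with a pluggable denominator sequence.
    alloc = [0] * len(votes)
    for _ in range(seats):
        k = max(range(len(votes)), key=lambda j: votes[j] / denom(alloc[j]))
        alloc[k] += 1
    return alloc

def webster_divisor(pops, seats):
    # Webster as a divisor method: bisect for Q with sum round(P_k/Q) = seats.
    lo, hi = 1e-9, float(sum(pops))
    for _ in range(200):
        Q = (lo + hi) / 2
        alloc = [floor(p / Q + 0.5) for p in pops]
        if sum(alloc) == seats:
            return alloc
        if sum(alloc) > seats:
            lo = Q
        else:
            hi = Q

pops = [13, 13, 13, 105]
print(highest_averages(pops, 18, lambda s: 2*s + 1))  # [2, 2, 2, 12]
print(webster_divisor(pops, 18))                      # [2, 2, 2, 12]
```

Here the denominators 1, 3, 5, … are (twice) Webster's rounding thresholds ½, 1½, 2½, …, which is the general recipe stated above.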

Pairwise optimality properties

Let Sk be the number of seats for state k and
Pk be its population.

In each case, the apportionment is such that no transfer of a seat between any two states j and k can decrease the stated quantity:

Adams minimizes |Sj - Sk Pj/Pk|.

Dean minimizes |Pj/Sj - Pk/Sk|.

Huntington-Hill minimizes |Sj Pk/(Sk Pj) - 1|.

Webster minimizes |Sj/Pj - Sk/Pk|.

Jefferson minimizes |Sj Pk/Pj - Sk|.

Global optimality properties

Among all possible ways to assign non-negative integers to the Sk summing to the
desired value S,

Webster minimizes
∑k Pk (Sk/Pk - S/P)²
which corresponds to the idea
that individual people have "as equal shares (of congress) as possible," in the sense
of minimizing the variance of the seat-share, according to
André Sainte-Laguë 1910 and F.W.Owens in 1921 (who both argued this independently)
[Owens: On the Apportionment of Representatives,
Quarterly Pub. Amer. Statist. Assoc. 17,136 (Dec 1921) 958-968,
this theorem about "method W" minimizing the "total injustice" is stated without proof
on his page 964].
Equivalently,
given that S=∑kSk,
P=∑kPk, and the Pk
are regarded as fixed,
Webster minimizes
∑k Pk (Sk/Pk)² = ∑k (Sk)²/Pk.
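That optimality claim can be checked by brute force on a small instance (my own sketch; the populations are those of the quota-violation example earlier on this page):

```python
from itertools import product

def webster(pops, seats):
    # Webster in its sequential Sainte-Lague form (odd denominators).
    alloc = [0] * len(pops)
    for _ in range(seats):
        k = max(range(len(pops)), key=lambda j: pops[j] / (2*alloc[j] + 1))
        alloc[k] += 1
    return alloc

def variance_objective(pops, alloc, seats):
    # sum_k P_k (S_k/P_k - S/P)^2
    P = sum(pops)
    return sum(p * (s/p - seats/P)**2 for p, s in zip(pops, alloc))

pops, S = [13, 13, 13, 105], 18
# Exhaustively minimize over all nonnegative integer vectors summing to S.
best = min((a for a in product(range(S + 1), repeat=len(pops)) if sum(a) == S),
           key=lambda a: variance_objective(pops, a, S))
print(list(best), webster(pops, S))  # brute-force minimizer equals Webster
```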

Huntington-Hill minimizes
∑k Sk (Pk/Sk - P/S)²
which corresponds to the idea
that district populations should
be as equal as possible (in the sense of minimizing their variance),
according to Huntington 1928, also stated by Owens on his p.965.
Comparing this objective function versus Webster's, it seems to me that Huntington-Hill's is
"morally wrong." That is, according to Huntington-Hill's, leaving a single person unrepresented
is "infinitely bad." In reality it is only finitely bad. That suggests to me
that Huntington-Hill's
minimization is morally distorted.
(The same criticism may be made of Adams below.)

Jefferson minimizes maxk Sk/Pk, i.e.
minimizes the largest individual share of congress.
(Balinski&Young). Actually, it is stronger than this; it is not just the single max
that matters; if we remove the maxes, then the next max also gets
minimized by Jefferson's apportionment
of the remaining seats, and so on recursively.

Jefferson also minimizes ∑k(Sk/Pk-S/P)+
where (x)+≡max(x,0).

Adams minimizes maxk Pk/Sk, i.e.
minimizes the largest district-population.
(Balinski&Young). Actually, it again is stronger than this in the same way.

Lemma 7.1 (p.117), lemma 6.1 (p.94) and theorem 6.1 (p.95) of
the book
Evaluation and Optimization of Electoral Systems,
by Pietro Grilli di
Cortona, Cecilia Manzi, Aline Pennisi, Federica Ricca, and Bruno
Simeone [SIAM Monographs on Discrete Mathematics and Its
Applications, Philadelphia, 1999] are powerful tools for devising and proving more
global optimality theorems.
Assuming they are correct
and I've used them correctly (which is not certain so please regard the
following as preliminary), we can prove the following new results:

Let H(n) = ∑1≤m≤n 1/m denote the "harmonic numbers."
Then Dean maximizes ∑k [2H(Sk) - 1/Sk] Pk.

The new
divisor method based on the rounding function r(x)=⌈x-d⌉,
where d is an arbitrary
constant with 0≤d≤1
(the case d=½ is Webster, d=0+ is Adams, and d=1- is Jefferson),
minimizes

∑k (Sk - Qk + d - ½)²/Qk

where Qk = Pk·S/P is state k's exact quota.

In 2016 Toby Pereira conjectured something more complicated than, but equivalent to,
the following new optimality property of Jefferson=d'Hondt – and I confirmed it in the
nonrigorous sense that my computer failed to find a counterexample:

Jefferson minimizes
∑k Pk [(Sk+½)/Pk]² = ∑k (Sk+½)²/Pk.

Then I realized this conjecture is implied by the statement just above about
r(x)=⌈x-d⌉, in the case d=1-. Indeed, given that S=∑kSk,
P=∑kPk, and the Pk
are regarded as fixed,
it implies more generally that

∑k (Sk - Qk + d - ½)²/Pk   and   ∑k (Sk + d - ½)²/Pk

are both globally minimized by the
divisor method arising from the rounding function r(x)=⌈x-d⌉,
where d is an arbitrary
constant with 0≤d≤1.

Balinski & Young in their 1982 book Fair Representation,
and
Sainte-Laguë 1910, both provided proofs of this Webster-optimality
theorem (stated by Owens in 1921).
When I looked them up I saw that
their arguments generalize immediately to prove this general-d
theorem. The resulting proofs have
the same validity (or lack thereof) as the originals.

Geometric interpretations

Geometric interpretation of Webster:
The vector V of ideal (albeit not necessarily integer) seat-counts for the N states is
an N-vector constrained to lie on the hyperplane with sum=#seats.
We wish to replace it with a lattice point, i.e. all-integer N-vector,
which also lies on this hyperplane.
Webster chooses the hyperplane lattice point nearest to the line that passes
through both V and the origin 0.

Geometric interpretation of Hamilton/Vinton:
It simply finds the closest hyperplane lattice point to V
(actually not only in the usual L2 distance, but also in any Lq distance
for any q≥1).

Here's another method (the "Hilbert method" to put a name on it):
find the nearest hyperplane lattice point, but "nearest" in
the Hilbert projective metric
rather than the usual Euclidean metric. This metric's
distance function is

HPdist(X, Y) = log maxj,k (Xj Yk)/(Xk Yj),

giving the Hilbert distance between two N-vectors X and Y with all-positive entries.
In other words, Hilbert minimizes maxj,k |log (Pj Sk)/(Pk Sj)|.

New claim: the Hilbert and Huntington-Hill methods are equivalent.
(Incidentally, the claim is stronger than this, in the sense that it is not only the max
that matters: if we remove that max, then the next max also gets
minimized by Huntington-Hill's apportionment
of the remaining seats, and so on recursively.)
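A brute-force check of this claim on a small example (my own sketch; lattice points must have all-positive entries for the Hilbert distance to be defined):

```python
from itertools import product
from math import sqrt, log

def hill(pops, seats):
    # Huntington-Hill in highest-averages form: priority P/sqrt(s(s+1)),
    # infinite at s=0, so every state gets at least one seat.
    alloc = [0] * len(pops)
    for _ in range(seats):
        k = max(range(len(pops)),
                key=lambda j: (pops[j] / sqrt(alloc[j] * (alloc[j] + 1))
                               if alloc[j] else float('inf')))
        alloc[k] += 1
    return alloc

def hpdist(x, y):
    # Hilbert projective distance between two positive vectors.
    r = [xi / yi for xi, yi in zip(x, y)]
    return log(max(r) / min(r))

pops, S = [13, 13, 13, 105], 18
lattice = [a for a in product(range(1, S + 1), repeat=len(pops)) if sum(a) == S]
best = min(lattice, key=lambda a: hpdist(pops, a))
print(list(best), hill(pops, S))  # the two methods agree on this instance
```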

Truly Unbiased Methods

Based on computer numerical experiments,
Webster seems to be, on average over a large number of different randomized situations,
the "least biased" of the above divisor methods...
and we have a new method which can be argued to be even less biased...
but all of the methods discussed above, including Webster and the new one,
are "biased" in some situations.
That is, for any of these methods we can set up an artificial scenario in which the
"small states" always get more than their fair share of seats and the "big states" always get
less (or the reverse).

However, it is possible to devise a truly unbiased method by using
randomized rounding.

Randomized rounding refers to the idea of rounding a number like 3.7
upward to 4 with probability 0.7
and downward to 3 with probability 0.3.
(More generally, you round a number with fractional part x, 0<x<1, upward with probability x.)
Note that this preserves the expected value (here 3.7) of
the original number, i.e., in that sense it is exactly unbiased.
We can choose the quantity we wish to be unbiased, round it in (an appropriate
version of) this manner for every state,
and then if the right total number of seats results, we quit (otherwise we do it
again with new randomness).

This idea will truly be unbiased –
but unfortunately it is randomized, not deterministic.
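A sketch of the retry-until-the-total-matches version (my own code; the loop terminates quickly in practice because the randomized seat total concentrates near its mean, which is the correct total):

```python
import random

def randomized_apportion(pops, seats, rng=random.Random(0)):
    """Randomized-rounding apportionment: round each exact quota up
    with probability equal to its fractional part, and redo the
    randomness until the seat total comes out right."""
    P = sum(pops)
    quotas = [p * seats / P for p in pops]
    while True:
        alloc = [int(q) + (rng.random() < q - int(q)) for q in quotas]
        if sum(alloc) == seats:
            return alloc

print(randomized_apportion([5, 3, 1], 5))  # one of [3,2,0], [3,1,1], [2,2,1]
```

Note that every state always ends up at its lower or upper quota, so this method also satisfies the quota property by construction.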

Another idea, suggested to me by Raphfrk, is
to make corrections over time. That is, we keep track for each state of
its historical "bias budget." If that state has historically had too many seats, we
round it down next time; if it has historically had too few seats, we
round it up next time. There are various ways to do that, but the point
is that averaged over time (and assuming the state-population changes are not insanely wild)
such a technique guarantees exact unbiasedness.