Tuesday, January 29, 2008

Once again, please bear with me on this one, as sometimes it's necessary to endure a tedious first half to fully understand the more interesting second half. Those of you who've seen 'Cloverfield' know what I'm talking about.

I begin by introducing the Gini Index. Developed in the early 1900s by Italian mathematician Corrado Gini, the Gini Index is a statistic oft cited by the UN and other such organizations to measure the inequality of income distribution in countries around the world. It's calculated by taking everybody in a given society, lining them up from left to right in order of increasing income, cumulating their income as you go down the line, and measuring how much this running total differs from what the running total would be if the society's collective income were evenly distributed among all members. The end result is a value between 0 and 100, where 0 is perfect equality (everybody has exactly the same income) and 100 is perfect inequality (the collective income of the entire society belongs to only one member - everybody else makes zero). If you want more details on calculating the Gini Index, click here.
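For the curious, here is a minimal Python sketch of the line-them-up-and-keep-a-running-total procedure described above. The function name gini_index and the sample numbers are my own inventions for illustration, and this is the plain textbook version with no small-sample correction, so treat it as an approximation of the idea rather than the exact formula behind any official figures.

```python
def gini_index(values):
    """Return a Gini Index on a 0-100 scale for a list of non-negative amounts."""
    ordered = sorted(values)          # line everybody up by increasing income
    n = len(ordered)
    total = sum(ordered)
    if n == 0 or total <= 0:
        raise ValueError("need at least one member and a positive total")

    running = 0.0                     # actual running total down the line
    gap = 0.0                         # accumulated shortfall vs. an even split
    for position, amount in enumerate(ordered, start=1):
        running += amount
        even_running = total * position / n   # running total if all were equal
        gap += even_running - running

    # Normalise the accumulated gap so that perfect equality scores 0.
    return 100.0 * 2.0 * gap / (n * total)


if __name__ == "__main__":
    # Made-up incomes, purely for illustration.
    print(gini_index([5, 5, 5, 5]))    # everybody equal
    print(gini_index([1, 2, 3, 4]))    # mild inequality
    print(gini_index([0, 0, 0, 10]))   # one member has everything
```

One caveat with this simple version: for a group of n members, the "one person has everything" case tops out at 100 * (n - 1) / n rather than a full 100, which is barely noticeable for a country but does matter for something as small as a baseball roster.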

For those readers who don't use the Gini Index on a day-to-day basis (what rock do you live under?), below is a sampling of the Gini values for income from selected countries:

Country          Gini Index
Denmark          24.7 (lowest in world)
India            32.5
United States    40.8
China            44.7
Brazil           58.0
Namibia          74.3 (highest in world)

So Denmark is the country with the most evenly distributed earning power among its citizens. Namibia, on the other hand, has the largest imbalance, with huge amounts of income concentrated among very few citizens. The United States lies somewhere in the middle.

Now that your Gini value processor is somewhat calibrated, let's apply the Gini Index to things it was never intended for, such as the home run distribution of baseball teams. If we treat baseball teams as countries and home runs as income, we can quantify how much of a team's power is concentrated among the few or how evenly it is spread across the lineup.

Here are the 30 MLB teams, ranked in order from most evenly distributed to least evenly distributed (most concentrated) home run hitting. I've also added a column "HR Total Rank" to indicate how the teams ranked in terms of total home runs hit:

HR Distribution Rank    Team               HR Total Rank    HR Distribution (Gini Index)  Similar to Income Distribution of
1 (most distributed)    Texas              8                30.8                          Netherlands
2                       Atlanta            12               36.0                          Italy
3                       Baltimore          23               37.2                          Vietnam
4                       Seattle            20               37.8                          Latvia
5                       Oakland            13               37.9                          Jamaica
6                       Detroit            13               38.2                          Portugal
7                       Boston             18               39.1                          Israel
8                       Kansas City        30 (fewest HRs)  39.4                          Burkina Faso
9                       Pittsburgh         22               39.5                          Morocco
10                      LA Dodgers         26               40.3                          Trinidad and Tobago
11                      Cleveland          9                41.1                          United States
12                      San Diego          14               41.6                          Senegal
13                      Arizona            15               41.9                          Thailand
14                      Tampa Bay          7                42.8                          Iran
15                      NY Yankees         4                43.4                          Hong Kong
16                      Milwaukee          1 (most HRs)     44.2                          Venezuela
17                      Cincinnati         3                44.3                          Cameroon
18                      Washington         27               44.4                          Ivory Coast
19                      Toronto            19               44.6                          China
20                      NY Mets            11               45.6                          Rwanda
21                      St Louis           24               46.4                          Philippines
22                      Florida            5                47.6                          Mexico
23                      Colorado           16               48.0                          Madagascar
24                      San Francisco      25               48.5                          Malaysia
25                      LA Angels          28               50.3                          Gambia
26                      Houston            17               50.4                          Malawi
27                      Philadelphia       2                50.7                          Niger
28                      Chicago Cubs       21               52.5                          Argentina
29                      Chicago White Sox  21               53.4                          Chile
30 (least distributed)  Minnesota          29               62.5                          Sierra Leone

So the Texas Rangers are the Denmark of Major League Baseball (although statistically they are closer to the Netherlands), topping the list of most evenly distributed HR production. Their top 6 players account for just over half of the team's home runs. Contrast that with Minnesota, where it takes only their top 2 guys (Torii Hunter and Justin Morneau) to account for half. The Dodgers rank 10th on the list, with nobody demonstrating great power but with 7 guys producing moderate power. They boast the distinction of being the Trinidad and Tobago of baseball.
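The "how many guys does it take to reach half the team's homers" figure is easy to compute on its own. Below is a rough Python sketch; the function name players_needed_for_share is made up for this example, and the two rosters are invented numbers, not the actual 2007 totals for any team.

```python
def players_needed_for_share(hr_counts, share=0.5):
    """Count how many of the top hitters it takes to reach `share` of team HRs."""
    total = sum(hr_counts)
    if total <= 0:
        raise ValueError("team must have hit at least one home run")

    running = 0
    # Walk down the roster from the biggest bopper to the smallest.
    for count, home_runs in enumerate(sorted(hr_counts, reverse=True), start=1):
        running += home_runs
        if running >= share * total:
            return count
    return len(hr_counts)  # only reachable through floating-point rounding


if __name__ == "__main__":
    # Hypothetical rosters, for illustration only.
    top_heavy = [30, 28, 10, 8, 6, 5, 3]
    spread_out = [10, 10, 10, 10, 10, 10, 10, 10, 10]
    print(players_needed_for_share(top_heavy))
    print(players_needed_for_share(spread_out))
```

A low count here goes hand in hand with a high Gini value, but the two aren't interchangeable: this number only looks at the top of the lineup, while the Gini Index takes everybody into account.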

Another thing to note is that there's no obvious correlation between HR frequency and HR distribution. While Minnesota ranked at or near the bottom in both categories, many of the best power teams (Milwaukee, Cincinnati, NY Yankees) and many of the worst (Washington, LA Dodgers, Kansas City) congregated around the middle of the HR distribution rankings, as Orel pointed out.

Well, that's all the insight I have for now... if you have any, please share. Thanks for reading through.

7 comments:

Not to my knowledge. I suspect it'd be tough to meaningfully fit the two on a common scale. All else being equal, I think most would agree that for total homers, the higher the better. But for homer distribution, all else being equal, it's not as clear which is better, being more dispersed or more concentrated. It'd be like trying to combine mean and standard deviation into one stat. There's probably something related out there in the world of math.

well winning % is probably dominated by too many other factors, as even HR total doesn't present clear correlation.

here's a study I'd like to see but don't have the diligence to conduct - throughout the past 10 years or so, for teams at a given total HR level, is there a correlation btwn total runs scored and HR dispersion? however, one of the downsides of being too concentrated wouldn't be accounted for here - that being if one of your top HR hitters goes down - cuz their stats wouldn't be factored in anyways.

Anyways I am heading to airport will try to check in from time to time over next 2 wks -