UZR, Scouting, and the Fans

After reading some discussions over at The Book blog about UZR and regression to scouting reports, I thought it would be a good idea to use the Fans’ Scouting Report as a regression target for UZR.

My methodology was as follows: I binned players into groups based on their positional ranking within the scouting reports, and then calculated the weighted average of the UZR/150s of the players within each bin. The following table shows the results using the data from 2007, 2008, and 2009. (Quick edit: the table below is for SS only; sorry for any confusion.)

Rank     Avg UZR/150
1-10         5.6
11-20        2.5
21-30       -1.5
31-40       -3.3
41-50       -3.9
51+         -9.1
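For anyone who wants to reproduce the binning step, here is a rough sketch in Python. It assumes each player-season is a tuple of (fan rank at the position, UZR/150, defensive games) and that the weighting is by defensive games; the post only says "weighted average", so the choice of weight, along with the names `rank_bin` and `bin_averages`, is my assumption rather than the exact calculation behind the table.

```python
from collections import defaultdict

# Rank bins as used in the table above; anything past 50 falls into "51+".
BIN_EDGES = [(1, 10), (11, 20), (21, 30), (31, 40), (41, 50)]


def rank_bin(rank):
    """Return the bin label for a positional rank in the fans' report."""
    for lo, hi in BIN_EDGES:
        if lo <= rank <= hi:
            return f"{lo}-{hi}"
    return "51+"


def bin_averages(player_seasons):
    """Weighted average UZR/150 per rank bin.

    player_seasons: iterable of (rank, uzr150, games) tuples.
    Weighting by defensive games is an assumption, not stated in the post.
    """
    totals = defaultdict(lambda: [0.0, 0.0])  # label -> [sum(uzr*games), sum(games)]
    for rank, uzr150, games in player_seasons:
        label = rank_bin(rank)
        totals[label][0] += uzr150 * games
        totals[label][1] += games
    # Skip bins with no games so we never divide by zero.
    return {label: s / g for label, (s, g) in totals.items() if g > 0}
```

Feeding it the pooled 2007 to 2009 shortstop seasons would give one average per bin, in the same shape as the table.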

At this point my methodology diverges, as I wasn’t sure which method I liked better. Method 1 is to regress each individual season’s data based on the player’s rank that season to get a new seasonal UZR, and then weight across the three years of data. Method 2 is to weight across the three years of data and then regress using the most recent Fans’ Scouting Report ranking (in this case the interim 2009 results).

Method 1 is clearly sensitive to the ebb and flow of the fans’ opinions, and is also a little more dependent on those rankings, since the UZRs being regressed have fewer defensive games behind them. Method 2 does not create “single season” stats, as some people would probably like, and it only uses the most recent fans’ ranking. Overall I think I prefer Method 2, but could be swayed either way. The following table lists the top 10 shortstops ranked by Method 2 (I really need a better name).
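To make the difference between the two methods concrete, here is a minimal sketch. The bin means are the shortstop values from the first table. Everything else is an assumption for illustration: the shrinkage weight `games / (games + k)`, the constant `k = 150`, and weighting seasons by defensive games are stand-ins, since the post does not spell out the exact regression amount or the year weights.

```python
# Shortstop bin means taken from the first table in the post.
SS_BIN_MEANS = {
    "1-10": 5.6, "11-20": 2.5, "21-30": -1.5,
    "31-40": -3.3, "41-50": -3.9, "51+": -9.1,
}


def regress(uzr150, games, bin_mean, k=150.0):
    """Shrink an observed UZR/150 toward the bin mean.

    The weight games / (games + k) and k itself are assumed, not from the post.
    """
    w = games / (games + k)
    return w * uzr150 + (1.0 - w) * bin_mean


def method_1(seasons, bin_means=SS_BIN_MEANS, k=150.0):
    """Regress each season toward that season's bin, then weight across years.

    seasons: list of (uzr150, games, rank_bin_label), one entry per year.
    """
    total_games = sum(g for _, g, _ in seasons)
    if total_games == 0:
        raise ValueError("no defensive games to weight by")
    regressed = (regress(u, g, bin_means[b], k) * g for u, g, b in seasons)
    return sum(regressed) / total_games


def method_2(seasons, latest_bin, bin_means=SS_BIN_MEANS, k=150.0):
    """Weight across years first, then regress once toward the latest bin."""
    total_games = sum(g for _, g, _ in seasons)
    if total_games == 0:
        raise ValueError("no defensive games to weight by")
    pooled = sum(u * g for u, g, _ in seasons) / total_games
    return regress(pooled, total_games, bin_means[latest_bin], k)
```

With these assumptions, Method 2 regresses less than any single season in Method 1 would, because the pooled sample has more games behind it, and it ignores every fan ranking except the latest one.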

Rank  Name               3-Year UZR  Method 1  Method 2
1     Omar Vizquel          18.4       10.1      11.8
2     Jack Wilson           11.1        8.0       9.3
3     Brendan Ryan          11.4        7.0       8.4
4     Cesar Izturis          9.0        6.1       7.7
5     J.J. Hardy             9.2        5.8       7.2
6     Elvis Andrus           8.3        7.1       6.8
7     Adam Everett          13.2        4.9       6.6
8     Erick Aybar            7.3        6.3       6.5
9     Jimmy Rollins          6.6        5.9       6.3
10    Paul Janish           11.9        6.9       5.8

And the bottom 10:

Rank  Name                 3-Year UZR  Method 1  Method 2
43    Hanley Ramirez          -4.9       -3.3      -3.9
44    Stephen Drew            -5.2       -2.6      -4.1
45    Ramon Vazquez           -7.8       -4.5      -4.3
46    Alex Cora               -5.3       -4.3      -4.4
47    Luis Rodriguez          -7.9       -4.8      -5.0
48    Juan Castro            -16.6       -2.6      -5.2
49    Julio Lugo              -9.3       -5.3      -6.8
50    Khalil Greene          -10.4       -1.5      -8.0
51    Brendan Harris          -8.3       -7.9      -8.7
52    Yuniesky Betancourt    -12.3       -9.6     -11.4

A couple of quick caveats: if you read the comments on the thread linked above, I noted that the defensive games data at FanGraphs looks a little messed up. If those numbers go back to normal, these results would likely change. Also, I didn’t do a great job of searching the blogosphere, so if this has been done before, I apologize for presenting it as a new methodology.

As far as data sources: UZR via FanGraphs and the Fans’ Scouting Report via Tangotiger. As always, comments or suggestions are appreciated.

I’ve sort of done this before myself, using a different method. I converted the scouting reports into runs for each year, weighted each season by a certain amount, weighted each season’s total between defensive metrics and scouting runs, then regressed to a mean of 0. After reading MGL today, I might try regressing the weighted three years, still including each season’s scouting reports, to the current season’s reports.

I might give this a shot for my team before trying anything greater than that, but that seems fair. Truthfully, I never had any issues with the methods I used, but with MGL saying the regression needs to go toward the scouting report instead of the mean, I’d like to see those numbers and whether they pass the sniff test.
