Sunday, September 10, 2006

Minor League Park Factors, 2006

Thanks to Jeff Sackmann’s MinorLeagueSplits.com, calculating these was less of a Sisyphean ordeal than usual!

As usual, minor league park factors come with a lot of caveats that are worth repeating, especially for people who don’t typically use them (and may be more familiar with major league factors). You’ll find them more variable than MLB park factors, in some cases quite a bit more, for a few reasons. This is why multiple-year park factors are so valuable for the minors.

- Season length. This is less of an issue with full-season leagues (for example, 72 home games or so in AAA), but when you’re dealing with short-season leagues such as the NY-Penn League, we’re talking a very small sample of games in a single season.

- Player variability. The vagaries of how the minors work result in lots of changes in team ability. If, for example, Buffalo played a lot of home games after Kevin Kouzmanoff was promoted (I have no idea, this is an example), it would make Dunn Tire Park look more hitter-friendly than it was. While you encounter this to a very slight extent in the majors, the number of top players moving from team to team is nothing compared to the minors.

- Park conditions. There simply isn’t the uniformity of park conditions amongst minor league parks that there is in the majors - things like lighting and field condition can be more of an issue for some teams than others.
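
To make the sample-size caveat concrete, here is a minimal sketch of a basic run park factor (runs per game at home divided by runs per game on the road) and a pooled multi-year version. It is a simplified illustration, not necessarily how the factors in this post were calculated; the `Split` type, the function names, and every number in it are invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Split:
    """Runs scored plus runs allowed, and games played, for one split."""
    runs: int
    games: int


def park_factor(home: Split, road: Split) -> float:
    """Runs per game at home divided by runs per game on the road."""
    return (home.runs / home.games) / (road.runs / road.games)


def multi_year_factor(seasons: list[tuple[Split, Split]]) -> float:
    """Pool home and road totals across seasons before taking the ratio."""
    home = Split(sum(h.runs for h, _ in seasons), sum(h.games for h, _ in seasons))
    road = Split(sum(r.runs for _, r in seasons), sum(r.games for _, r in seasons))
    return park_factor(home, road)


# Invented (home, road) splits for a hypothetical short-season team.
seasons = [
    (Split(runs=380, games=38), Split(runs=342, games=38)),
    (Split(runs=330, games=38), Split(runs=350, games=38)),
    (Split(runs=361, games=38), Split(runs=340, games=38)),
]

single_year = [park_factor(home, road) for home, road in seasons]
pooled = multi_year_factor(seasons)
print([round(pf, 3) for pf in single_year], round(pooled, 3))
```

With only 38 home games a year, the single-season factors in this made-up example swing from pitcher-friendly to hitter-friendly, while the pooled figure sits between the extremes.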

Reader Comments and Retorts

There may be very few people who care about this other than me... but is there anywhere to find independent league park factors? I'd specifically be interested in the Atlantic League if they're available.

catomi01, I care about indy PFs. And, no, I don't know where they are.
***
Has anyone done work to try to account for imbalanced scheduling with PFs? It could be a fair amount of work and often wouldn't lead to huge gains but it's worth pursuing. For example, I suspect that PGE Park's (Portland, OR) always-low PFs are in part because a disproportionate number of their road games are in places like Colorado Springs and Salt Lake.
***
Why no triples, Dan? I'd guess because the data is so noisy and levels are low enough that they don't hugely impact the game, but parks do influence them more than other types of hits. (The follow-up would then be whether you integrate them with work on doubles, but first things first.)

Has anyone done work to try to account for imbalanced scheduling with PFs? It could be a fair amount of work and often wouldn't lead to huge gains but it's worth pursuing.

The problem you run into here is that, in addition to imbalanced schedules, you also have to deal with imbalances in talent due to when teams play. If a team plays all of its games against an opponent at home in the first half, and on the road in the second half, the quality of competition on both sides of the ball is likely to be markedly different due to personnel shuffles.

I think it would be better to combine doubles and triples to create an EBH factor, then apply it to doubles and triples. It probably doesn't make a whole lot of difference except in the odd case.
I used to do separate EBH, 3b, and 2b factors - estimating the # of XBH, then triples, then calculating doubles as XBH-3B. This, in retrospect, may have been overkill.

The problem you run into here is that, in addition to imbalanced schedules, you also have to deal with imbalances in talent due to when teams play.
True - I'm just not sure that that's enough reason for someone not to do it. (Not meaning Dan - Lord knows he's working on enough other things.) Multiyear factors, for example, help with roster churn - they don't help with imbalanced scheduling.
If I've time today, I'll crank out a quick-n'-dirty estimate of how this issue affects Portland-PCL stats, since I used them as an example before.

Multiyear factors, for example, help with roster churn - they don't help with imbalanced scheduling.

Depends on how much the scheduling turns over year-to-year, I guess - when you play your opponents. If the intradivisional games and interdivisional games are always played in roughly the same rotation, it might not help all that much.

I'm not sure that roster churn is all that big an issue any more, though. Except for AAA, teams are showing more of a tendency to leave players at the same level virtually all season; where roster churn is happening, it's often with organizational soldiers who are moved around the organization to whichever team needs a fill-in for a couple of weeks.

Out of the 3253 hitters that got an at-bat in the minors this year, only 15 were in the double digits in triples.

The park factors are going to be so inaccurate with so few triples that we're not going to have enough confidence in them to take anything more than a small park factor seriously. And a small park factor for triples is going to have zero effect on the translated triples for almost every player.

Translated, I only have 2 players total that get 10 triples in the majors (Brooks Conrad and Eugenio Velez). Just to add 2 triples after rounding to these guys, you'd have to be confident of a real triple park factor of 1.30 or greater.
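
One way to arrive at the 1.30 figure, sketched under assumptions that the comment does not spell out: half of a player's games are at home, road parks are neutral, and the factor is treated as a straight multiplier on the season total. The function names are made up for the illustration, and exact fractions are used so the .5 rounding boundary is not blurred by floating point.

```python
import math
from fractions import Fraction


def round_half_up(value: Fraction) -> int:
    """Round to the nearest integer, with exact halves going up."""
    return math.floor(value + Fraction(1, 2))


def season_multiplier(home_pf: Fraction) -> Fraction:
    """Assumption: half the games at home, half in neutral road parks."""
    return (1 + home_pf) / 2


def adjusted_triples(triples: int, home_pf: Fraction) -> int:
    """Apply the season-level multiplier and round to whole triples."""
    return round_half_up(triples * season_multiplier(home_pf))


at_threshold = adjusted_triples(10, Fraction(130, 100))     # 10 * 1.150 = 11.50
below_threshold = adjusted_triples(10, Fraction(129, 100))  # 10 * 1.145 = 11.45
print(at_threshold, below_threshold)
```

Under those assumptions a 10-triple line only moves by two once the home factor reaches 1.30; anything smaller rounds to a change of one or none.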

I shouldn't even bother applying the doubles park factor either. Of the 142 parks I've done, only 2 applications of double park factor would increase or decrease a triples total by more than 1 triple - if Conrad or Velez had played at Batavia.

Just for fun, I decided to examine the MLE difference between my MLE triples totals for minor leaguers and my MLE triples totals if I had used doubles park factor for triples. It would have changed 9 players by 1 triple.

Cliffs notes version: add 2% to PFs from the West, subtract it from the East.

The following has a lot of flaws/shortcuts (I said quick and dirty) and I don't trust my math today (under the weather) - so pick it apart more than usual.

***

The PCL plays a 144 game schedule, with 112 of those games coming against conference opponents (7 other teams, played 16 times - 8 home, 8 away). Of the remaining 32, half are at home versus one division in the other conference (4*4), the other half on the road versus the other division.

In the following, I'm going to ignore the impact of alternating divisions (as well as time of year when they are played, as well as rainouts, etc...) and reduce the analysis to intra- versus inter-conference games. Also, I'm going to use reindexed three year run factors (reindexed: average of all team PF = 100 - needed primarily due to franchise movement - impact is about a 0.3 reduction per team).
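
For anyone following along, the reindexing step amounts to rescaling so the simple average of all team factors is 100. A minimal sketch; the `reindex` helper, the team labels, and the values are all invented for the illustration and are not the actual PCL numbers.

```python
def reindex(factors: dict[str, float], target: float = 100.0) -> dict[str, float]:
    """Rescale park factors so that their simple average equals `target`."""
    mean = sum(factors.values()) / len(factors)
    return {team: pf * target / mean for team, pf in factors.items()}


# Hypothetical three-year run factors whose average has drifted to 100.3.
raw = {"Team A": 104.0, "Team B": 99.5, "Team C": 97.4, "Team D": 100.3}
reindexed = reindex(raw)

for team, pf in reindexed.items():
    print(team, round(pf, 1))
```

In this made-up case the raw average is 100.3, so every team drops by roughly 0.3.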

No shock that more runs are scored in the west (Pacific) than in the east (American) - after all, many of the eastern teams were part of the now-defunct American Association (hence the goofy conference names) which featured scoring levels more in line with the then-IL than the then-PCL.

Luckily, the division thing probably doesn't mean too much on a year-to-year basis (the actual difference between divisions is likely more than it appears here because of shortcuts, but ... whatever). So, the average reindexed PF in the west (Pacific) is 3 points higher than in the east (American) - what does this mean? Only two of every nine (well, 32 of 144) games are played outside the conference. Compared with a balanced schedule (I'm going to say half of the 144 games, even though it would really mean 8/15 of the schedule, since you can't play yourself), the observed gap is diluted, so this implies a "true" conference difference of about 7 points (1+(72/32*(1.016-1))=1.036 for the Pacific, 1+(72/32*(.984-1))=.964 for the American).
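
The arithmetic above, written out as a quick sketch. It uses the same shortcuts (72 games standing in for a balanced schedule, conference averages of 1.016 and .984); the function and constant names are mine, not anything standard.

```python
def true_conference_factor(observed: float, games_outside: int, games_balanced: int) -> float:
    """Scale the observed deviation from 1.0 by games_balanced / games_outside."""
    return 1 + (games_balanced / games_outside) * (observed - 1)


TOTAL_GAMES = 144
INTRA_CONFERENCE = 7 * 16                          # 112 games inside the conference
INTER_CONFERENCE = TOTAL_GAMES - INTRA_CONFERENCE  # 32 games outside it
BALANCED = TOTAL_GAMES // 2                        # 72, the shortcut used above

pacific = true_conference_factor(1.016, INTER_CONFERENCE, BALANCED)
american = true_conference_factor(0.984, INTER_CONFERENCE, BALANCED)

# Per-park multiplier implied by moving from the observed to the "true" level.
west_multiplier = pacific / 1.016
east_multiplier = american / 0.984

print(round(pacific, 3), round(american, 3))
print(round(west_multiplier, 3), round(east_multiplier, 3))
```

Dividing each "true" conference factor by the observed one gives the roughly 2% bump for western parks and the matching cut for eastern ones.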

Here, then, would be the reconfigured park factors, where the west is adjusted upwards about 2% (1.036/1.016) and the east downgraded accordingly:

First off, triples are way less important (I think) than other issues - they're just interesting. That said:

Translated, I only have 2 players total that get 10 triples in the majors (Brooks Conrad and Eugenio Velez). Just to add 2 triples after rounding to these guys, you'd have to be confident of a real triple park factor of 1.30 or greater.
Well, the thing about triples is that we care about the big league factors (which we have more information about) and we care about identifying minor league parks that produce a ton of them (of course, you then run into a chicken and egg question as you only care about this category when looking at a player who triples a lot), so we know to downgrade those players, but we don't care so much about finding minor league parks that depress triples (most MLE systems deflate the heck out of these totals anyway - plus, a triple PF of zero would do no more than double our estimate).

I shouldn't even bother applying the doubles park factor either. Of the 142 parks I've done, only 2 applications of double park factor would increase or decrease a triples total by more than 1 triple - if Conrad or Velez had played at Batavia.
Sure - doubles factors are rarely huge.

The most important thing is recognizing that a park-suppressed triple is likely a double and so forth - as was discussed above.

In Diamond Mind parlance, DMB and ZiPS gave Brooks an Average (while other sources I've seen suggest something closer to Fair). He's relatively surehanded - don't know his rep on the pivot.

***

Depends on how much the scheduling turns over year-to-year, I guess - when you play your opponents. If the intradivisional games and interdivisional games are always played in roughly the same rotation, it might not help all that much.
Ah, that is what you meant. Yeah - I'm a little worried about that as I suspect there's an effect there. That said, you can't control for everything, no matter how hard I might argue that people should.
If I still had my own SAS license and more free time, I'd love to run some regressions to control for all this stuff, but alas...

Actually, I don't do triples factors for the majors, either! :)
Well, I meant if you're gonna do 'em... :)

***

So I thought a little more about the scheduling and point-in-time issue and I'm going a different way with it. If schedules were perfectly static, then this wouldn't be something we'd want to control for at all; season would be an "attribute" of the city/park. It's only when the rotation changes that additional error is introduced to our estimate. Also, this is a concern with individual team PFs as well - for instance, I imagine that teams in the south are at home more often during cold months than more northerly squads.

William, good catch - the formulae for 2 of those cells are wrong (Tri-City and Vancouver are correct). The raw HR factor for Tucson is 0.90 and 1.03 for Tulsa. I'll have that updated in the files in the next couple of hours.

Vancouver is a large park with high walls. Now that it's short-season A instead of AAA, I'm surprised anybody ever hits one out there. When I used to go to AAA Canadians games, it was almost always a shock to see a HR hit.

Sorry to be so late with this post - it appears that I missed this the first time around, but I'm not sure what to think about BB/SO park factors. Is it really a function of park or just a random variance that depends on the talent a given team decides to sign? While the numbers may vary, I fail to see how a park could actually be a determining factor. Has this been regressed against other more likely causes?