
Category Archives: Baseball

Three years ago, I wrote an article called “10 Things I Believe About Baseball Without Evidence”, in which I hypothesized that it ought to be possible to develop some sort of theory of pitch sequencing. To me, pitch sequencing is the very heart of the sport, the chess match between batter and pitcher that makes the sport compelling. But for all our progress in sports analytics in recent years, a theory of pitch sequencing (what it is, how it works, which pitchers are good at it, which batters can be fooled by it) seems as distant as ever.

In that article, I hypothesized (without evidence, as the title suggests) that such a theory would involve understanding how the batter’s brain predicts the next pitch based on previous pitches:

I believe that before any given pitch, the batter is in some sort of Prediction State for the next pitch. After each pitch, the batter then moves into a different Prediction State.

One year after I wrote down this evidence-free idea, a piece of evidence emerged that supported my hypothesis.

In October 2015, Jeff Hawkins and Subutai Ahmad of Numenta, a company trying to reverse engineer the brain with computers, published a paper called “Why Neurons Have Thousands of Synapses, A Theory of Sequence Memory in Neocortex”.

You can read a nice layperson’s summary of the paper here. But I’ll summarize the summary even further.

The brain stores memory in cells called neurons. Neurons have several parts, and one of those parts is the set of “distal synapses”. Up until this point, nobody really had a good idea what these distal synapses were for, because they didn’t seem to do anything while a particular memory was firing. Hawkins and Ahmad theorize that this is because the distal synapses don’t cause the neuron to fire immediately. Instead, they electrically prepare the cell to fire quickly if a signal comes in from a certain direction. It is this preparation that allows the brain to make predictions about sequences of events. A relevant quote from the paper:

“Each neuron learns to recognize hundreds of patterns that often precede the cell becoming active. The recognition of any one of these learned patterns acts as a prediction by depolarizing the cell without directly causing an action potential. Finally, we show how a network of neurons with this property will learn and recall sequences of patterns. The network model relies on depolarized neurons firing quickly and inhibiting other nearby neurons, thus biasing the network’s activation towards its predictions.”

And herein lies the physical foundation of a theory of pitch sequencing. For if Hawkins and Ahmad are correct about sequential learning, it means that there is indeed some sort of Prediction State that the brain is in before each pitch.

Once the brain has seen some sort of sequence of inputs, it prepares itself to recognize that sequence again, and to recognize and react to it more quickly the next time it appears, by being electrically primed to react through this neuronal depolarization.
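To make the idea of a primed, predictive memory a bit more concrete, here is a toy sketch in Python. It is a deliberate simplification, not Hawkins and Ahmad’s actual model: each pattern simply remembers which patterns have followed it, and those followers count as “depolarized” (primed) the next time that pattern shows up. The class name `ToySequenceMemory` and the pattern labels are made up for illustration.

```python
from collections import defaultdict


class ToySequenceMemory:
    """First-order toy: each pattern remembers which patterns have followed it.

    A deliberate simplification for illustration, not Numenta's actual model.
    """

    def __init__(self):
        # transitions[a] = every pattern that has ever come right after pattern a
        self.transitions = defaultdict(set)
        self.previous = None

    def predicted(self):
        """Patterns currently 'depolarized' (primed) by the last input."""
        if self.previous is None:
            return set()
        return set(self.transitions[self.previous])

    def observe(self, pattern):
        """Feed one pattern; return True if it was predicted (the fast path)."""
        was_predicted = pattern in self.predicted()
        if self.previous is not None:
            self.transitions[self.previous].add(pattern)  # learn the transition
        self.previous = pattern
        return was_predicted


memory = ToySequenceMemory()
pitch = ["windup", "release", "spin", "fastball"]
first_pass = [memory.observe(p) for p in pitch]
second_pass = [memory.observe(p) for p in pitch]
print(first_pass)   # [False, False, False, False]
print(second_pass)  # [False, True, True, True]
```

On the first pass nothing is anticipated. On the second pass, every step after the windup is already primed; the windup itself is still a surprise only because nothing had ever preceded it before.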

At this point, it’s important to understand that we’re not just talking about sequences of individual pitches here (a curve followed by a fastball followed by a changeup). It can be that, too, but not only that.

A single pitch is itself a sequence of patterns that the brain needs to recognize. It’s a windup, then a release, then the ball coming out of the hand, then a spin that one can perhaps recognize, and then a speed and a direction of movement of the ball.

Each of these patterns and sub-patterns and sub-sub-patterns that compose a pitch are represented in the brain at the neuronal level. As a batter observes sequences of (sub-)(sub-)patterns, the brain automatically prepares itself to see those sequences again by depolarizing the neurons to make them respond faster to these patterns. Thus, from the pitches it has seen in the past, the brain moves into a sort of Prediction State about the pitches it anticipates seeing in the future.

This has the effect, as Hawkins and Ahmad put it, of “biasing the network’s activation towards its predictions”. The batter’s Prediction State has a bias, and pitchers can exploit this bias. The brain is primed to react quickly to some patterns, at the expense of inhibiting its reaction to other patterns, which it will be slower to react to.

So if you throw three fastballs with the same speed and the same location in a row, the batter’s brain will become more and more prepared/biased to predict that pitch accurately with each subsequent pitch, and the batter becomes more likely to hit the ball hard.

But if pitchers understand what the batter’s brain is biased towards, they can fool the batter by defying that prediction. Throw a changeup to the same location, but with a different speed, and you can make the batter swing too early. The wrong neurons get fired, and the ones that should have fired to hit the ball properly are instead inhibited by the bias, and the batter does the wrong thing.
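The “swing too early” effect can be put into rough numbers. The Python sketch below is a toy model under stated assumptions, not something from Hawkins and Ahmad: the batter’s Prediction State is reduced to an expected pitch speed, built as a weighted average of the speeds he has just seen, and he times his swing for that speed. The 55-foot release distance, the 95 and 85 mph speeds, and the `recency` weight are all illustrative assumptions, and air drag is ignored.

```python
MPH_TO_FT_PER_S = 5280 / 3600  # 1 mph is about 1.467 feet per second
RELEASE_TO_PLATE_FT = 55.0     # assumed: roughly 60.5 ft minus the pitcher's extension


def flight_time(speed_mph, distance_ft=RELEASE_TO_PLATE_FT):
    """Seconds for a pitch to reach the plate at constant speed (drag ignored)."""
    return distance_ft / (speed_mph * MPH_TO_FT_PER_S)


def expected_speed(history, recency=0.5):
    """Toy Prediction State: exponentially weighted average of recent pitch speeds.

    `recency` is a hypothetical knob; higher values weight the latest pitch more.
    """
    guess = history[0]
    for speed in history[1:]:
        guess = recency * speed + (1 - recency) * guess
    return guess


def timing_error(history, actual_mph):
    """Seconds the swing is early (positive) or late (negative)."""
    return flight_time(actual_mph) - flight_time(expected_speed(history))


three_fastballs = [95, 95, 95]
print(round(timing_error(three_fastballs, 95) * 1000))  # 0 (ms): right on time
print(round(timing_error(three_fastballs, 85) * 1000))  # 46 (ms): changeup, swing early
```

Under these assumptions, a pitch 10 mph slower than the primed expectation arrives roughly 46 milliseconds after the batter’s brain expects it.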

They say that pitching is an art, and perhaps at this time it is, but this information has the potential to eventually turn it into a science.

* * *

This information doesn’t explain everything about how the brain processes sequencing, obviously. It’s just an initial framework for understanding how the brain learns to recognize sequences of events and to predict them. And since we don’t understand exactly how that works in the general case, we also don’t understand how it works in the specific case of pitch sequencing.

So if we have unanswered questions about the brain like, “how long does this cell depolarization last?” we also have corresponding unanswered questions about pitch sequencing, like, “how long does a batter remain biased towards a kind of pitch once he has seen it?”

The good news is, we can probably answer the second question without necessarily answering the first. There is data that will tell us how much better a batter gets when he sees the same pitch multiple times, either in a row, or in close proximity. Understanding the basic framework of how the brain works can help us ask better questions about pitch sequencing, and to develop useful theories about how it works, even before the neuroscientists figure out precisely how it works in the brain.
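Here is a rough sketch, in Python, of the kind of question that data could answer. It assumes a hypothetical per-batter pitch log of (pitch type, outcome) pairs in the order the batter saw them; the layout, the pitch-type codes, and the `window` parameter are illustrative assumptions rather than any real data source’s schema.

```python
from collections import defaultdict


def outcome_by_recent_exposure(pitches, window=3):
    """Mean outcome grouped by how many of the previous `window` pitches
    were the same type as the current one.

    `pitches` is a chronological list of (pitch_type, outcome) pairs for one
    batter; `outcome` is any numeric measure, e.g. 1 for hard contact, else 0.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for i, (pitch_type, outcome) in enumerate(pitches):
        recent = pitches[max(0, i - window):i]
        exposure = sum(1 for prev_type, _ in recent if prev_type == pitch_type)
        totals[exposure] += outcome
        counts[exposure] += 1
    return {k: totals[k] / counts[k] for k in sorted(counts)}


# Made-up toy sequence, only to show the mechanics (not real data).
toy = [("FF", 0), ("FF", 0), ("FF", 1), ("CH", 0), ("FF", 1), ("SL", 0)]
print(outcome_by_recent_exposure(toy, window=3))
```

Varying `window` is one crude way to probe the second question above: if outcomes stop depending on exposure once the window grows past a few pitches, that hints at how long the bias lasts. A real study would also need to control for count, location, pitcher, and sample size.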

As I was writing a letter to my third-grade daughter’s principal in support of a change in homework policy (a letter which I’ve posted here), it occurred to me I was making a point about a phenomenon that isn’t unique to education at all, but happens in a lot of other fields, too: baseball, business, economics, and politics.

I don’t know if this phenomenon has a name. It probably does, because you’re very rarely the first person to think of an idea. If it does, I’m sure someone will soon enlighten me. The phenomenon goes like this:

* * *

Suppose you suck at something. Doesn’t matter what it is. You’re bad at this thing, and you know it. You don’t really understand why you’re so bad, but you know you could be so much better. One day, you get tired of sucking, and you decide it’s time to commit yourself to a program of systematic improvement, to try to be good at the thing you want to be good at.

So you decide to collect data on what you are doing, and then study that data to learn where exactly things are going so wrong. Then you try some experiments to see what effect they have on your results. Then you keep the good stuff, throw out the bad stuff, and pretty soon you find yourself getting better and better at this thing you used to suck at.

So far so good, eh? But there’s a problem. You don’t really notice there’s a problem, because things are getting better and better. But the problem is there, and it has been there the whole time. The problem is this: the thing your data is measuring is not *exactly* the thing you’re trying to accomplish.

Why is this a problem? Let’s look at a simplified graph of this issue, so I can explain.

Let’s call the place you started at, the point where you really sucked, “Point A”.
Let’s call the goal you’re trying to reach “Point G”.
And let’s call the best place the data can lead you to “Point D”.

Note that Point D is near Point G, but it’s not exactly the same point. Doesn’t matter why they’re not the same point. Perhaps some part of your goal is not a thing that can be measured easily with data. Maybe you have more than one goal at a time, or your goals change over time. Whatever the reason, all that matters is that they’re not exactly the same point.

Now here’s what happens:

You start out very far from your goal. You likely don’t even know precisely what or where your goal is, but (a) you’ll know it when you see it, and (b) you know it’s sorta in the Point D direction. So, off you go. You embark on your data-driven journey. As a simplified example, we’ll graph your journey like this:

On this particular graph, your starting point, Point A, is 14.8 units away from your goal at Point G. Then you start following the path the data leads you down. You gather data, test, experiment, study the results, and repeat.

After a period of time, you reach Point B on the graph. You are now 10.8 units away from your goal. Wow, you think, this data-driven system is great! Look how much better you are than you were before!

So you keep going. You eventually reach Point C. You’re even closer now: only 6.0 units away from your goal!

And so you invest even more into your data-driven approach, because you’ve had nothing but success with it so far. You organize everything you do around this process. The process, and the changes you’ve made because of it, begin to become your new identity.

In time, you reach Point D. Amazing! You’re only 4.2 units away from your goal now! Everything is awesome! You believe in this process wholeheartedly now. The lessons you’ve learned permeate your entire worldview now. To deviate from the process would be insane, a betrayal of your values, a rejection of the very ideas you stand for. You can’t even imagine that the path you’ve chosen will not get any better than right here, now, at Point D.

Full speed ahead!

And then you reach Point E.

Eek!

Egads, you’re 6.0 units away from your goal now. You’ve followed the data like you always have, and suddenly, for no apparent reason, things have gotten worse.

And you go, what on Earth is going on? Why are you having problems now? You never had problems before.

And you’re human, and you’ve locked into this process and woven it into your identity. You loved Points C & D so much that you can’t stand to see them discredited, so your Cognitive Dissonance kicks in, and you start looking for Excuses. You go looking for someone or something External to blame, so you can mentally wave off this little blip in the road. It’s not you, it’s them, those Evil people over there!

But it’s not a blip in the road. It’s the road itself. The road you chose doesn’t take you all the way to your destination. It gets close, but then it zooms on by.

But you won’t accept this, not now, not after the small sample size of just one little blip. So you continue on your same trajectory, until you reach Point F.

You stop, and look around, and realize you’re now 10.8 units away from your goal. What the F? Things are still getting worse, not better! You’re having more and more problems. You’re really, really F’ed up. What do you do now?

Can you let go of your Cognitive Dissonance, of your Excuse seeking, and step off the trajectory you’ve been on for so long?

F is a really F’ing dangerous point. Because you’re really F’ing confused now. Your belief system, your identity, is being called into question. You need to change direction, but how? How do you know where to aim next if you can’t trust your data to lead you in the right direction? You could head off in a completely wrong direction, and F things up even worse than they were before. And when that happens, it becomes easy for you to say, F this, and blow the whole process up. And then you’re right back to Point A again. All your effort and all the lessons you learned will be for nothing.

WTF do you do now?

F’ing hell!
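The geometry of this story can be checked in a few lines of Python. The sketch below assumes, as the simplified graph does, that the data leads you along a straight line and that Point G sits some distance off that line; the coordinates are hypothetical. Point D is then just the projection of Point G onto the path, and the distance to the goal shrinks until that point and grows again afterward.

```python
import math


def unit(v):
    """Scale a 2-D direction vector to length 1."""
    norm = math.hypot(v[0], v[1])
    return (v[0] / norm, v[1] / norm)


def distances_along_path(start, direction, goal, steps, step_size=1.0):
    """Distance to `goal` after each step along a straight path (the data's road)."""
    ux, uy = unit(direction)
    return [
        math.hypot(goal[0] - (start[0] + i * step_size * ux),
                   goal[1] - (start[1] + i * step_size * uy))
        for i in range(steps + 1)
    ]


def closest_approach(start, direction, goal):
    """(distance along path, gap) where the path passes nearest `goal`:
    the projection of the goal onto the line, i.e. "Point D"."""
    ux, uy = unit(direction)
    t = (goal[0] - start[0]) * ux + (goal[1] - start[1]) * uy
    px, py = start[0] + t * ux, start[1] + t * uy
    return t, math.hypot(goal[0] - px, goal[1] - py)


# Hypothetical layout: the road runs along the x-axis, Point G sits 4.2 units off it.
start, road, goal = (0.0, 0.0), (1.0, 0.0), (14.0, 4.2)
t_d, gap = closest_approach(start, road, goal)
trip = distances_along_path(start, road, goal, steps=28)
print(t_d, gap)                                # 14.0 4.2
print(round(trip[0], 1), round(trip[-1], 1))  # 14.6 14.6
```

That symmetry is also why the numbers in the walk-through pair up: with a closest approach of 4.2 units, a straight road puts Points C and E both 6.0 units away and Points B and F both 10.8 units away, on opposite sides of Point D.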

* * *

That’s the generic version of this phenomenon. Now let’s talk about some real-world examples. Of course, in the real world, things aren’t as simple as I projected above. The real world isn’t two-dimensional, and the data doesn’t lead you in a straight line. But the phenomenon does, I believe, exist in the wild. And it’s becoming more and more common as computers make data-driven processes easy for organizations and industries to implement and follow.

Education

As I said, homework policy is what got me thinking about this phenomenon. I have no doubt whatsoever that the schools my kids are going to now are better than the ones I went to 30-40 years ago. The kids learn more information at a faster rate than my generation ever did. And that improvement, I am confident, is in many ways a result of the data-driven processes that have arisen in the education system over the last few decades. Test scores are how school districts are judged by home buyers, they’re how administrators are judged by school boards, they’re how principals are judged by administrators, and they’re how teachers are judged by principals. The numbers allow education workers to be held accountable for their performance, and provide information about what is working and what needs fixing so that schools have a process that leads to continual improvement.

From my perspective, it’s fairly obvious that my kids’ generation is smarter than mine. But: I’m also pretty sure they’re more stressed out than we were. Way more stressed out, especially when they get to high school. I feel like by the time our kids get to high school, they have internalized a pressure-to-perform ethic that has built up over years. They hear stories about how you need such and such on your SATs and this many AP classes with these particular exam scores to get into the college of their dreams. And the pressure builds as some (otherwise excellent) teachers think nothing of giving hours and hours of homework every day.

Depression, anxiety, panic attacks, psychological breakdowns that require hospitalization: I’m sure those things existed when I went to school, too, but I never heard about them, and now they seem routine. When clusters of kids who should have everything going for them end up committing suicide, something has gone wrong. That’s your Point F moment: perhaps we’ve gone too far down this data-driven path.

Whatever we decide our goal of education is, I’m pretty sure that our Point G will not feature stressed-out kids who spend every waking hour studying. That’s not the exact spot we’re trying to get to. I’m not suggesting we throw out testing or stop giving homework. I am arguing that there exists a Point D, a sweet spot with just the right amount of testing, and just the right amount of homework, that challenges kids the right amount without stressing them out, and leaves the kids with the time they deserve to just be kids. Whatever gap between Point D and Point G that remains should be closed not with data, but with wisdom.

Baseball

The first and most popular story of an industry that transforms itself with data-driven processes is probably Michael Lewis’s Moneyball. It’s the story of how the revenue-challenged Oakland A’s baseball team used statistical analysis to compete with economic powerhouses like the New York Yankees.

I’ve been an A’s fan my whole life, and I covered them closely as an A’s blogger for several years. So I can appreciate the value that the A’s emphasis on statistical analysis has produced. But as an A’s fan, there’s also a certain frustration that comes with the A’s assumption that there is no difference between Point D and Point G. The A’s assume that the best way to win is to be excruciatingly logical in their decisions, and that if you win, everyone will be happy.

But many A’s fans, including myself, do not agree with that assumption. The Point F moment for us came when, during a stretch of three straight post-season appearances, the A’s traded their two most popular players, Yoenis Cespedes and Josh Donaldson, within a span of six months.

When you have a data-driven process that takes emotion out of your decisions, but your Point G includes emotions in the goal of the process, it’s unavoidable that you will have a gap between your Point D and your Point G. The anger and betrayal that A’s fans like myself felt about these trades is the result of the process inevitably shooting beyond its Point D.

Business

If Moneyball is not the most influential business book of the last few decades, it’s only because of Clayton Christensen’s book, The Innovator’s Dilemma. The Innovator’s Dilemma tells the story of a process in which large, established businesses can often find themselves defeated by small, upstart businesses with “disruptive innovations.”

I suppose you can think of the phenomenon described in the Innovator’s Dilemma as a subset of, or perhaps a corollary to, the phenomenon I am trying to describe. The dilemma happens because the established company has some statistical method for measuring its success, usually profit ratios or return on investment or some such thing. It’s on a data-driven track that has served it well and delivered it the success it has. Then the upstart company comes along and sells a worse product with worse statistical results, and because of these bad numbers, the established company ignores it. But the upstart company is on a statistical path of its own, and eventually improves to the point where it passes the established company by. The established company does not realize its Point D and Point G are separate points, and finds itself turning towards Point G too late.

Here, let’s graph the Innovator’s Dilemma on the same scale as our phenomenon above:

The established company is the red line. They have reached Point D by the time the upstart, with the blue line, gets started. The established company thinks, they’re not a threat to us down at Point A. And even if they reach our current level at Point D, we will be beyond Point F by then. They will never catch up.

This line of thinking is how Blockbuster lost to Netflix, how GM lost to Toyota, and how the newspaper industry lost its cash cow, classified ads, to Craigslist.

The mistake the established company makes is assuming that Point G lies on or near the same path that they are currently on, that their current method of measuring success is the best path to victory in the competitive market. But it turns out that the smaller company is taking a shorter path with a more direct line to the real-life Point G, because their technology or business model has, by some twist, a different trajectory which takes it closer to Point G than the established one. By the time the larger company realizes its mistake, the smaller company has already gotten closer to Point G than the larger company, and the race is essentially over.

* * *

There are other ways in which businesses succumb to this phenomenon besides just the Innovator’s Dilemma. Those companies that hold closely to Milton Friedman’s idea that the sole purpose of a company is to maximize shareholder value are essentially saying that Point D is always the same as Point G.

But that creates political conflict with those who think that all stakeholders in a corporation (customers, employees, shareholders and the society and environment at large) need to have a role in the goals of a corporation. In that view, Point D is not the same as Point G. Maximizing profits for the shareholders will take you on a different trajectory from maximizing the outcomes for other stakeholders in various proportions. When a company forgets that, or ignores it, and shoots beyond its Point D, then there is going to inevitably be trouble. It creates distrust in the corporation in particular, and corporations in general. Take any corporate PR disaster you want as an example.

Economics

I’m a big fan of Star Trek, but one of the things I never understood about it was how they say that they don’t use money in the 23rd century. How do they measure the value of things if not by money? Our whole economic system is based on the idea that we measure economic success with money.

But if you think about it, accumulating money is not the goal of human activity. Money takes us to Point D; it’s not the path to Point G. What Star Trek is saying is that they somehow found a path to Point G without needing to pass through Point D first.

But that’s 200 years into a fictional future. Right now, in real life, we use money to measure human activity. But money is not the goal. The goal is human welfare, human happiness, human flourishing, or some such thing. Economics can show us how to get close to the goal, but it can’t take us all the way there. There is a gap between the Point D we can reach with a money-based system of measurement, and our real-life Point G.

And as such, it is inevitable that if we tune our economic systems to optimize some monetary outcome, like GDP or inflation or tax revenues or some such thing, that optimization will eventually shoot past the real-life target. In a sense, that’s kind of what we’re experiencing in our current economy. America’s GDP is fine, production is up, the inflation rate is low, unemployment is down, but there’s still a general unease about our economy. Some people point to economic inequality as the problem now, but measurements of economic inequality aren’t Point G, either, and if you optimized for that, you’d shoot past the real-life Point G, too, only in a different direction. Look at any historically Communist country (or Venezuela right now) to see how miserable missing in that direction can be.

The correct answer, as it seems to me in all of these examples, is to trust your data up to a certain point, your Point D, and then let wisdom be your guide the rest of the way.

Politics

Which brings us to politics. In 2016. Hoo boy.

Well, how did we get here?

I think there are essentially two data-driven processes that have landed us where we are today. Both of these processes have a gap between what we think of as the real-life goals of these entities, and the direction that the data leads them to. One is the process of news outlets chasing media ratings. And the other is political polling.

In the case of the media, the drive for ratings pushes journalism towards sensationalism and outrage and controversy and anger and conflict and drama. What we think journalism should actually do is inform and guide us towards wisdom. Everybody says they hate the media now, because everybody knows that the gap between Point D and Point G is growing larger and larger the further down the path of ratings the media goes. But it is difficult, particularly in a time where the technology and business models that the media operate under are changing rapidly, to change direction off that track.

And then there’s political polling. The process of winning elections has grown more and more data-driven over recent decades. A candidate has to say A, B, and C, but can’t say X, Y, or Z, in order to win. They have to cast votes for D, E, and F, but can’t vote for U, V, or W. They have to make this many phone calls and attend that many fundraisers and kiss the butts of such and such donors in order to raise however many millions of dollars it takes to win. The process has created a generation of robopoliticians, none of whom have an original idea in their heads at all (or if they do, won’t say so for fear of What The Numbers Say). You pretty much know what every politician will say on every issue if you know whether there’s a “D” or an “R” next to their name. Politicians on neither side of the aisle can formulate a coherent idea of what Point G looks like beyond a checklist spit out of a statistical regression.

That leads us to the state of the union in 2016, where both politicians and the media have overshot their respective Point Ds.

And nobody feels like anyone gives a crap about the Point G of this whole process: to make the lives of the citizens that the media and the politicians represent as fruitful as possible. Both of these groups are zooming full speed ahead towards Point F instead of Point G.

And here are the American people, standing at Point E, going, whoa whoa whoa, where are you all going? And then the Republicans put up 13 robocandidates who want to lead everybody to the Republican version of Point F, plus Donald Trump. The Democrats put up Hillary Clinton, who can probably check all the data-driven boxes more skillfully than anybody else in the world, asking to lead everybody to the Democratic version of Point F, plus Bernie Sanders.

And Trump and Sanders surprise the experts, because they’re the only ones who are saying, let’s get off this path. Trump says, this is stupid, let’s head towards Point Fascism. Sanders says, we need a revolution, let’s head towards Point Socialism.

And most Americans like me just shake our heads, unhappy with our options, because Fascism and Socialism sound more like Point A than Point G to us. I don’t want to keep going, I don’t want to start over, and I don’t want to head in some old discredited direction that other countries have headed towards and failed. I just want to turn in the direction of wisdom.

Once upon a time, about a billion years ago, life was simple. Everybody lived in the oceans, and everybody had only one cell each. This was quite a fair and egalitarian way to live. Nobody really had significantly more resources than anyone else. Every individual just floated around, and took whatever it needed and could find, and just let the rest be.

This golden equilibrium was how life did business for a couple billion years. There was no such thing as jealousy or envy, and as a result, everyone lived pretty happy lives.

At first, these multi-celled creatures were just kind of like big blobs of single-celled organisms, and didn’t cause a lot of problems. Everybody was still kind of doing the same job as everyone else, even if they had organized themselves into a limited corporation of sorts. Most other single-celled creatures just figured they were harmless weirdos hanging out together, and ignored them.

They could not have been more wrong. For once the multi-cell genie was out of the bottle, Pandora’s box could not be closed, and the dominos began to fall. This simple change may have seemed innocent at first, but little did the single-cells know that they were the first creatures on earth to fall victim to the innovator’s dilemma. The single-celled creatures were far too invested in the status quo to change, and consequently ignored the multi-cellulars as irrelevant, and did not realize until it was too late that the game had suddenly shifted.

Ok, look, I told y’all with the Cespedes trade that you can’t analyze an A’s trade of a position player without breaking it down by platoon splits across the whole lineup. But did any of y’all listen to me? No. Y’all are still trying to analyze Donaldson vs Lawrie as if they are single players on single teams instead of two players on two platoon teams with other players on the team. So stop that.

Now look, I’m gonna make this simple. I’m going to assume that both Lawrie and Donaldson will be equally healthy, and they’re roughly comparable defensive players. They may not be, but this is a quick and dirty exercise here, so bear with me. And I’m just going to use OPS, so I don’t have to make this story as long as the other Josh Donaldson story that’s coming later today.

See, Brett Lawrie is actually better than Josh Donaldson against RHPs. The difference is that Donaldson crushes LHPs, and Lawrie for whatever reason actually is worse against LHPs than RHPs. He was particularly bad in 2014. I do not know why.

So for the platoon team that plays two-thirds of the A’s games, the one against RHPs, the A’s lineup actually just got better.

* * *

So now we need to fix the 1/3 of the A’s games against LHPs.

Last year, one of the A’s primary 1B/DHs against LHPs was Alberto Callaspo. He was awful. The A’s have signed Billy Butler to replace him.

So the A’s are losing about .400 OPS points by downgrading from Donaldson to Lawrie vs LHPs, but they get back about .300 of those OPS points by upgrading from Callaspo to Butler.

So now all Billy Beane has to do is find that extra .100 points of OPS against LHPs, and the math works. Maybe it will come just from the fact that most players’ reverse splits don’t last their whole careers, and Lawrie will bounce back and hit better against LHPs in the future. If so, QED.
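The back-of-the-envelope math for the lineup against LHPs is just a sum of per-slot OPS changes. This Python sketch uses the approximate numbers quoted above and, like the quick and dirty exercise itself, treats OPS changes across lineup slots as simply additive, which is a rough simplification. The dictionary keys are only labels.

```python
# Approximate OPS changes against LHPs, as quoted above.
LHP_CHANGES = {
    "Donaldson -> Lawrie": -0.400,
    "Callaspo -> Butler": +0.300,
}


def net_change(changes):
    """Sum per-slot OPS changes for one platoon lineup (a crude additive model)."""
    return sum(changes.values())


lhp_net = net_change(LHP_CHANGES)
print(f"Net vs LHP: {lhp_net:+.3f}")  # Net vs LHP: -0.100
```

A negative result is the shortfall still left to make up against LHPs.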

* * *

Disclaimer: the above analysis does not mean I like this trade. I do not like this trade. That (much longer) explanation is here.

Well, here we are. The Giants won another World Series, while the A’s flopped in the playoffs yet again. I’m not one of those A’s fans who hate the Giants, but it’s starting to annoy the crap out of even me to see the Giants always succeed in the playoffs, while seeing the A’s always fail.

The A’s have had 14 chances in the last 14 years to win a game to advance to the next round of the playoffs. They have lost 13 of those 14 games. If the playoffs are truly a crapshoot, the odds of this happening are 1-in-1,170. (So it’s not technically always: they could have gone on to lose the 2006 ALDS against the Twins, too, which would have made them 0-for-16, with odds of 1-in-65,536. So if you want to look on the bright side, things could be 56 times worse than they are.) And in a crapshoot, the odds of the Giants winning 10 playoff series in a row, as they have now done, are 1-in-1,024.

So if you’re an A’s fan who hates the Giants, and who believes that the playoffs are just a crapshoot, you’ve been struck with a series of unfortunate events that had literally less than a 1-in-a-million chance of happening.
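The crapshoot arithmetic above can be reproduced with the binomial formula, treating every game or series as an independent 50/50 coin flip (which is exactly what the “crapshoot” assumption means). Here is a short Python check.

```python
from math import comb


def prob_exactly(losses, games, p_loss=0.5):
    """Binomial probability of exactly `losses` losses in `games` independent games."""
    return comb(games, losses) * p_loss**losses * (1 - p_loss) ** (games - losses)


athletics = prob_exactly(13, 14)        # lost 13 of 14 chances to advance
athletics_worse = prob_exactly(16, 16)  # the hypothetical 0-for-16
giants = 0.5**10                        # won 10 straight series

print(f"1 in {1 / athletics:,.0f}")             # 1 in 1,170
print(f"1 in {1 / athletics_worse:,.0f}")       # 1 in 65,536
print(f"{athletics / athletics_worse:.0f}x")    # 56x
print(f"1 in {1 / giants:,.0f}")                # 1 in 1,024
print(f"1 in {1 / (athletics * giants):,.0f}")  # 1 in 1,198,373
```

Two small notes: 1-in-1,170 is the chance of losing exactly 13 of 14; the chance of losing at least 13 of 14 is 15 in 16,384, or about 1-in-1,092. And the combined figure of roughly 1-in-1.2 million is where “less than 1-in-a-million” comes from.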

Sabermetrics has come up with no good explanation for it except to say, well, these things happen about once every thousand times, or once every million times, sorry A’s fans, it just happened to be your turn to hit that unfortunate lottery, and it’s just bad luck. Oh, and you have a crappy stadium that’s falling apart and a team ownership and a local government who all seem too incompetent to do anything about it unlike those guys across the bay, sorry about that, too, gosh you guys are unlucky, tsk tsk tsk.

Which is just a deeply, deeply unsatisfying answer. If you have an ounce of humanity, you will reject that explanation, and ask the obvious question.

And to answer that question, the sabermetrician dives into the numbers, pulls out some numbers with some number-pulling-out tools, and finds nothing to report. Nope, no evidence here of anything, so it must just be bad luck.

To which I ask: what if the reason the number-pulling-out tools can’t find any cause for the problem is because those number-pulling-out tools themselves are the problem?

I have no evidence of that. But it’s something I believe might be true, even though I can’t prove it.

* * *

I have a number of these beliefs–or hypotheses, if you will–about baseball, but I’ve mostly kept them to myself because of this lack of evidence. What the hell do I know, anyway? Who am I to pontificate? And why bother spouting these theories when I can’t defend them with evidence? So I just keep my mouth shut.

But I got a little bit of self-confidence in my belief system when Robert Arthur of Baseball Prospectus took one of my hypotheses (that injured A’s in the second half of 2014 had begun cheating on fastballs, making themselves vulnerable to offspeed pitches) and found evidence to support it ($):

The overall pattern of changes is beautifully consistent with Ken’s theory…

It’s very satisfying to find that the data supports one’s theory!

But I didn’t just come to this particular hypothesis that Mr. Arthur investigated out of thin air. This hypothesis arose out of a deeper foundation of hypotheses that color the way I look at baseball. I want to put all those hypotheses out on the table now, lack of evidence be damned. And maybe someone (maybe me someday, if I ever find the time and energy and resources and willpower to do so, which hasn’t happened yet) will take those hypotheses and invent the technology needed to find the evidence to support them.

So let’s put it out there.

* * *

Belief Without Evidence #1. A technological Sapir-Whorf hypothesis

The Sapir-Whorf hypothesis, a/k/a the linguistic relativity principle, holds that the language a person speaks influences the way that person conceptualizes their world. The obvious example of this is that people have trouble distinguishing between colors if their language does not have a word for that color.

To a certain extent, I believe this hypothesis. Being fluent in both Swedish and English, I know there are certain concepts, such as the difference between belief in an opinion and belief in a fact, where the Swedish language makes clear distinctions (tycka and tro) and English does not. English speakers spend ridiculous amounts of time arguing about these things, and Swedes simply don’t need to. It’s not that English speakers can’t conceptualize the difference between opinion and fact, but doing so is way more difficult in English, because the word “belief” in English is quite fuzzy, whereas in Swedish, the language makes it simply impossible to confuse the two.

I touched upon this in my essay in the 2014 Baseball Prospectus annual: I believe a similar concept applies to the technology we use. The reason statistical analysis began to influence the way we conceptualize baseball in the 1990s is not that human beings suddenly became smarter in the 1990s. There were statistically informed people who suggested such analysis almost a century earlier. It happened in the 1990s because the price of the technology needed to perform such analysis had finally become reasonable.

The predominant technology we use to perform such analysis is SQL, which is the primary language used to query relational databases. SQL and relational databases are technologies which are built upon set theory. A set is basically an unordered collection of objects.

And this is where I believe that a technological Sapir-Whorf hypothesis applies to baseball. Practically all of our analysis of baseball statistics treats its data as an unordered collection of baseball events: pitches, plate appearances, games, series. Standard baseball analysis (the public kind anyway; who knows what is being done inside these organizations) treats its data that way because that’s the way SQL treats its data. The available technology guides our conceptualization of the world. And that leads us to my second hypothesis:
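To make the set-versus-sequence point concrete, here is a small sketch in Python (my choice of language; the pitch data is made up for illustration). A `GROUP BY`-style count gives the same answer no matter how the pitches are shuffled, while a feature that asks "what happened right after a fastball?" changes when the order changes. The function names are hypothetical.

```python
from collections import Counter

# Made-up sequence of pitches to one batter: (pitch_type, outcome). Not real data.
pitches = [
    ("FB", "ball"), ("FB", "strike"), ("CH", "swinging_strike"),
    ("FB", "foul"), ("CH", "in_play"),
]

def set_style_summary(rows):
    """Order-blind aggregate, in the spirit of COUNT(*) ... GROUP BY pitch_type, outcome."""
    return Counter(rows)

def after_fastball_outcomes(rows):
    """Order-aware feature: outcomes of pitches immediately preceded by a fastball."""
    return Counter(
        outcome
        for (prev_type, _), (_, outcome) in zip(rows, rows[1:])
        if prev_type == "FB"
    )

reordered = list(reversed(pitches))
print(set_style_summary(pitches) == set_style_summary(reordered))              # True
print(after_fastball_outcomes(pitches) == after_fastball_outcomes(reordered))  # False
```

Both views use exactly the same rows; only the second one can "see" sequencing at all.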

Belief Without Evidence #2. Baseball events are NOT unordered

For any batter to hit a ball, the batter needs to predict where the ball is going to be before it reaches the bat. There are two different mechanisms for this prediction.

First, there is a conscious prediction. The batter may decide, consciously, based on some sort of rational analysis, that he is looking for a fastball down and in, and wants to swing at only a pitch in that location that he can pull.

But once the pitcher releases the ball, this kind of conscious prediction mechanism is far, far too slow to be of any use. At this point, everything is turned over to a much faster, subconscious, automatic system to predict the actual flight of the ball, and to send the muscles in motion to meet the ball.

My thoughts here are heavily influenced by Jeff Hawkins’ book On Intelligence, which lays out a framework for how this automatic system in the brain works as a memory-based prediction machine.

Order matters in baseball, because this automatic prediction mechanism has a strong recency bias. (A conscious prediction might not have a recency bias if truly rational, but how often does a batter perform a purely rational analysis at the plate?) The speed, location and movement of the most recent pitch will affect the brain’s automatic prediction of the speed, location and movement of the next pitch. The more recent a pitch, the more it affects the automatic system’s prediction for the next pitch.
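As a toy illustration of what "recency bias" could mean numerically, here is a Python sketch. The exponential weighting and the `decay=0.5` value are my assumptions, not a claim about how the brain actually weights pitches. The point is that the same three pitches in a different order produce a different expectation, which a plain average would miss.

```python
def expected_speed(previous_speeds, decay=0.5):
    """Recency-weighted guess at the next pitch's speed (mph).

    ASSUMPTION: exponential decay. The most recent pitch gets weight 1,
    the one before it `decay`, the one before that `decay**2`, and so on.
    """
    if not previous_speeds:
        raise ValueError("need at least one previous pitch")
    recent_first = list(reversed(previous_speeds))  # index 0 = most recent pitch
    weights = [decay ** age for age in range(len(recent_first))]
    weighted_total = sum(w * s for w, s in zip(weights, recent_first))
    return weighted_total / sum(weights)

# Same pitches, different order: a changeup last pulls the expectation down.
print(round(expected_speed([95, 95, 85]), 2))  # 89.29
print(round(expected_speed([85, 95, 95]), 2))  # 93.57
```

A plain mean of either list is 91.67 mph; only the order-aware version distinguishes the two sequences.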

Pitch sequencing, therefore, is at the heart of the very sport of baseball, yet it is woefully understudied in current public analysis, because our tools, based on a foundation of unordered sets, are woefully bad at processing and studying sequenced events.

There is a whole industry now dedicated to the statistical analysis of baseball using these set-based SQL tools. But SQL does not have a recency bias clause in its syntax that you can apply to a query. Because these tools don’t handle the ordered data well, they basically ignore The. Very. Core. of the sport: the sequencing battle between pitcher and batter.

Let me say that again: statistical analysis (that we in the public are aware of) takes the most important element of the sport, and ignores it.

It’s like having Newtonian physics without relativity and quantum mechanics. There’s a lot you can do with Newtonian physics, but at the extremes, it begins to break down, because it is ignoring some deeper, more fundamental truths.

If you’re a team that relies on constructing its roster using such statistical analysis, what mistakes are you making by ignoring the most important part of the game?

And not vice versa. Things like platoon splits and home field advantage are not Constants of the Universe like the speed of light or the Planck-Einstein relation. They arise from more fundamental truths about human anatomy and psychology.

For instance, I once got into an argument in which I did not believe that Sean Doolittle pitched better to certain catchers than to others. The stats did not agree with me, albeit perhaps with a small sample size. But my objection wasn’t to the numbers, adequate sample size or not; it was to the lack of any underlying physical/psychological mechanism from which these numbers could derive. Sean Doolittle throws 90% fastballs. What the hell difference, physically or psychologically, does it make which catcher is back there catching it? It’s the same pitch, no matter who is catching it.

I do not consider a sabermetric truth to really be a truth unless there is a biomechanical/psychological foundation upon which that truth can rest, and from which that truth is capable of being derived.

Belief Without Evidence #4. Pitches are paths between states in a Prediction State Automaton

First, a little explanation of automata:

Automata theory is used in computer science to study states. For example, you can look at baseball as a base/out automaton, where before each plate appearance, the base/out combination is in one “state”, and in another “state” after the plate appearance. There are rules that tell you what possible states you can be in before and after a plate appearance.

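So, at the beginning of an inning, the baseball base/out automaton is in a {Nobody on, 0 out} state. After the first plate appearance, you will be in one of five possible states:

{Nobody on, 0 out} (home run)
{Runner on 1st, 0 out}
{Runner on 2nd, 0 out}
{Runner on 3rd, 0 out}
{Nobody on, 1 out}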

You can’t, after the first appearance, reach a state where there are two runners on or two outs. You have to go to an intermediate state first. There are exactly 24 possible states you can have in this automaton (8 combinations of runners on base times 3 out counts). Each state in this automaton is a two-dimensional {base, out} object. And from any of these 24 possible states, there is a limited, finite number of possible following states.
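The base/out automaton is small enough to enumerate directly. Here is a minimal Python sketch (the representation and the function name are my own, hypothetical choices): it lists all 24 {base, out} states and the five states reachable in one plate appearance from the start of an inning. It only models where the batter ends up with the bases empty, so it is not a full transition table.

```python
from itertools import product

BASES = ("1st", "2nd", "3rd")

# All 8 runner configurations: each base is either occupied or empty.
BASE_STATES = [
    frozenset(base for base, occupied in zip(BASES, flags) if occupied)
    for flags in product((False, True), repeat=3)
]

# A state is a two-dimensional {base, out} object: (occupied bases, outs).
STATES = [(bases, outs) for bases in BASE_STATES for outs in (0, 1, 2)]
print(len(STATES))  # 24

def successors_with_bases_empty(outs):
    """States reachable in one plate appearance from {Nobody on, `outs` out}.

    With nobody on, the batter either makes an out, ends up on 1st, 2nd,
    or 3rd, or hits a home run (bases stay empty). Only defined for 0 or 1
    out, since a third out ends the inning rather than entering a state.
    """
    if outs not in (0, 1):
        raise ValueError("outs must be 0 or 1")
    reached = {(frozenset({base}), outs) for base in BASES}
    reached.add((frozenset(), outs))      # home run
    reached.add((frozenset(), outs + 1))  # batter makes an out
    return reached

print(len(successors_with_bases_empty(0)))  # 5
```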

The “automaton,” then, defines what possible states can exist, and the rules by which you can move from one state to another.

Got it?

OK, now to the thing I believe without evidence: I believe that before any given pitch, the batter is in some sort of Prediction State for the next pitch. After each pitch, the batter then moves into a different Prediction State.

I don’t have a clear belief on exactly how many dimensions these Prediction States have. Maybe the Prediction State has three dimensions:

1. Whether to swing
2. When to swing
3. Where to swing

Or maybe these Prediction States are much more complex, combining the above three dimensions with specific kinds of pitches and movements and locations. It may be expressed by something like this, for example:

Or whatever. I don’t really know what the parameters for these Prediction States should be. Is it {pitch type, in/out, up/down, movement/straight, fast/slow} or some other combination of pitch attributes? I don’t know.

And are these prediction automata more or less universal, or does each batter have his own unique automaton with its own unique rules? Again, I don’t know.

But I do know that if I were to build a technology for analyzing baseball, this is where I would begin, right at the core of the game, the engine that drives the sport: what pitch the batter is expecting from the pitcher, and what happens when the pitch he gets conforms to or deviates from that expectation.
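Purely as a hypothetical sketch of what a Prediction State automaton might look like in code: here the state is just one expectation weight per pitch type, and each pitch moves the batter into a new state by shifting weight toward the pitch he just saw. The choice of dimensions (pitch type only), the pitch types, and the `learning_rate` parameter are all my assumptions; as noted above, the real parameters are unknown. Because the weights always sum to 1, "establishing" one pitch necessarily lowers the expectation of the others.

```python
from dataclasses import dataclass

PITCH_TYPES = ("FB", "SL", "CH")  # hypothetical pitch repertoire

@dataclass(frozen=True)
class PredictionState:
    """Hypothetical Prediction State: how strongly the batter expects each pitch type.

    ASSUMPTION: pitch type is the only dimension; a real state might also
    need location, movement, and speed.
    """
    expectation: tuple  # one weight per entry in PITCH_TYPES, summing to 1

    def expects(self, pitch_type):
        return self.expectation[PITCH_TYPES.index(pitch_type)]

def transition(state, pitch_type, learning_rate=0.4):
    """Move to the next Prediction State after seeing `pitch_type`.

    Shifts expectation toward the pitch just seen (a recency bias).
    `learning_rate` is a made-up parameter; the weights still sum to 1.
    """
    new = tuple(
        (1 - learning_rate) * w + (learning_rate if t == pitch_type else 0.0)
        for t, w in zip(PITCH_TYPES, state.expectation)
    )
    return PredictionState(new)

state = PredictionState((1 / 3, 1 / 3, 1 / 3))
for pitch in ("FB", "FB", "FB"):  # "establishing the fastball"
    state = transition(state, pitch)
print(round(state.expects("FB"), 3), round(state.expects("CH"), 3))  # 0.856 0.072
```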

In order to unite the quantum and Newtonian versions of baseball analysis, the biophysical and the statistical, any Grand Unified Theory Of Everything Baseball must, in my belief, have some way to handle the Prediction State of the batter.

Belief Without Evidence #5: The quality of a pitch is a function of its speed, location, and movement, and also of the batter’s swing and prediction state

There are a few pitchers, like Aroldis Chapman, who can throw a pitch with such high-quality speed that the location, movement, and prediction state are rather irrelevant. And there are some, like Mariano Rivera, who have such a combination of high-quality location and movement that the speed and prediction state don’t matter much. With pitchers like that, the batter can predict perfectly what pitch he’s going to get, and still not hit it.

But most pitchers do not possess such a high-quality pitch that they can be predictable and get away with it at the Major League level. They need to manipulate the prediction state of the batter in order to succeed.

The less a batter is expecting a certain pitch, the less likely he is to make good contact. But pitching is not just a function of being unpredictable: the pitcher must balance what the Prediction State of the batter is and the batter’s ability to hit it, with his ability to also throw a pitch with good speed, location, and movement.

The complex nature of that 5-dimensional object ( {speed, location, movement, swing, prediction state} ) is what makes baseball so fascinating from pitch to pitch.

So for each pitch, the pitcher wants to:

1. Choose a pitch the batter is likely to predict incorrectly
2. Choose a pitch the pitcher is likely to throw with good speed, location, and movement
3. Choose a pitch which will result in a suboptimal swing path, resulting either in a miss or weak contact
4. Choose a pitch which, if not put in play, worsens the batter’s Prediction State for the next pitch

Belief Without Evidence #6: The quality of an at-bat is a 3-dimensional function

Those three dimensions being:
1. Getting a good pitch to hit
2. Hitting a ball hard when you do
3. Hitting a ball hard if you don’t.

A good pitch to hit is a pitch that (a) he is successfully predicting, and (b) he can get a good swing on. Whether he can get a good swing on a particular pitch depends on what his swing path is.

And again, there are two kinds of predictions: the automatic subconscious one, where the batter just reacts to a pitch, and a conscious one, where the batter decides beforehand to look for a certain pitch and ignore all others. And the count plays a big role in whether the batter can take an approach of consciously looking for a particular pitch, or whether he should (with two strikes especially) just let his subconscious react to whatever comes in there.

On the subconscious level, the more the pitcher keeps throwing the same pitch, the more the batter predicts that pitch accurately, and the more likely the batter is to hit that pitch. When pitchers talk about “establishing the inside fastball” for example, this is what they mean: to change the Prediction State in such a way that an inside fastball becomes part of the Prediction State, and thereby necessarily reduces the expectation of a different pitch in the future.

Just because a batter gets a pitch he is predicting, does not mean he will hit it. Most batters have some kind of hole in their swing. Some batters prefer high pitches, others low. Some are vulnerable inside, and others can’t hit the outside pitch well. Some can hit a fastball, but can’t time an offspeed pitch. Others have a slow bat speed and struggle with fastballs, but feast on the slower pitches.

So for each pitch, the batter wants to:
1. Predict a pitch correctly
2. Swing at a pitch if it lets him approximate his optimal swing path
3. Take a pitch if it would cause a suboptimal swing path (unless 2 strikes in zone)
4. Take pitches out of the zone to move to a better Prediction State for the next pitch
5. If in a 2-strike situation, make contact (foul or fair) on a pitch in the zone

In a vast sea of unordered pitches from an unordered group of pitchers, you will get a randomly-distributed plethora of good pitches to hit, so the numbers will all work out in the end. So you acquire hitters based on these vast seas of data, ignoring what the batter does with difficult pitches to hit, because in the long run, they don’t matter much.

But against a good pitcher on a good day who does not give you a good pitch to hit, what do those batters do? Do they hit a ball hard if they don’t get a good pitch to hit?

To me, the biggest difference between the A’s in the playoffs and the Giants in the playoffs is Pablo Sandoval. Because there may not be anyone in baseball right now better than Sandoval who does damage even when he does not get a good pitch to hit. He can turn pitches in the dirt, in his eyes, and/or six inches off the plate into a hit. He’s almost immune to prediction state manipulation by opposing pitchers. And Hunter Pence, though not as extreme as Sandoval, has similar characteristics.

The A’s simply do not pursue those types of players. Players like Sandoval tend to have low OBPs, because they swing at so many bad pitches. Minor leaguers with that profile flop far more than they succeed, so they’re a bad risk to take. But there are times, against a good pitcher on a good day who is simply not giving hitters a good pitch to hit, that it is valuable to have a player who often does damage even with a bad pitch to hit. And those times happen more often in the playoffs.

A technology that used a system of evaluating players in which high-level statistics of player value were derived from a low-level {speed, location, movement, swing path, prediction state} matrix would better identify the true value of such players.

Belief Without Evidence #8: Diversity is Good for Batting Lineups

This belief is related to the belief about the definition of the quality of a pitch, and to the belief of a biomechanical/psychological foundation to all of this. A lineup with too many batters with similar strengths and weaknesses can make it easier for a pitcher to settle into a psychological/mechanical rhythm and mow down such a lineup. A lineup that is diverse (some hit fastballs, some like it inside, or low, some slug, others make contact, etc.) makes a pitcher change his approach from at-bat to at-bat. That forces the pitcher to make a variety of quality pitches in order to win. It’s harder for a pitcher to win if he has to have multiple pitches working well.

So when I praised the Giants for having Pablo Sandoval, I did not mean that an entire team of hitters like Pablo Sandoval would be ideal. But having one or two guys like him in a lineup with some more patient-type hitters is a good thing.

Belief Without Evidence #9: A lineup without holes scores runs exponentially, not linearly

This is probably the easiest of my hypotheses to disprove. But I have the gut feeling that one guy who is an automatic out in the middle of a lineup can take a rally that might score five runs and drop that rally down to 0 or 1 runs.

I think we saw this play out with the 2014 Oakland A’s. At the beginning of the year, everyone in the lineup was healthy and hitting somewhat near or above expectations. The A’s were just killing it in the Pythagorean win column, because they’d get a rally going and that rally would just keep going and going.

But then Josh Donaldson and Brandon Moss started having some nagging injuries, and Moss in particular became pretty much an automatic out for a month or two. Those five-run rallies, once plentiful, almost instantly disappeared. Every rally seemed to be killed by a terrible at-bat in the middle of it.

Almost every team has a hole in the lineup at any given time, someone who is slumping for whatever reason. So for most teams, run scoring appears to be linear. But in those rare cases when everyone is clicking at the same time, their run-scoring graph bends like a hockey stick and shoots upward.

The A’s success early in the year depended on the lineup being holeless, and when holes appeared, the whole thing collapsed back from exponential scoring into linear.
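This is the kind of hypothesis a quick simulation can start to probe. Below is a crude Monte Carlo sketch in Python. The model (every batter either singles or makes an out, runners move up exactly one base, made-up OBPs) is my simplification and far too crude to settle the question, but it lets you compare a lineup with no holes against the same lineup with one automatic out in the middle.

```python
import random

def runs_per_game(obps, games=20000, seed=1):
    """Average runs per 9-inning game for a lineup under a deliberately crude model.

    Simplifying assumptions (mine, for illustration only): each batter either
    hits a single (with probability equal to his OBP) or makes an out; every
    runner advances exactly one base on a single; no walks, extra-base hits,
    double plays, errors, or steals.
    """
    if not all(0 <= p < 1 for p in obps):
        raise ValueError("each OBP must be in [0, 1)")
    rng = random.Random(seed)
    total_runs = 0
    for _ in range(games):
        batter = 0  # lineup starts at the top each game
        for _ in range(9):
            # With one-base advances, runners always fill 1st, 2nd, 3rd in order,
            # so a simple count of runners on base is enough.
            outs, runners = 0, 0
            while outs < 3:
                if rng.random() < obps[batter]:
                    if runners == 3:
                        total_runs += 1  # bases loaded: the runner on 3rd is forced home
                    else:
                        runners += 1
                else:
                    outs += 1
                batter = (batter + 1) % len(obps)
    return total_runs / games

no_holes = [0.350] * 9
one_hole = [0.350] * 4 + [0.0] + [0.350] * 4  # an automatic out hitting 5th
print(runs_per_game(no_holes), runs_per_game(one_hole))
```

Running it for uniform lineups at several OBPs (say .300, .350, .400) is one way to see whether scoring in this toy model bends upward like a hockey stick; a serious test would need a much richer model of plate-appearance outcomes.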

Belief Without Evidence #10: A’s fans are magical elves

I’ve been playing in my mind lately with the idea that A’s fans are like the house elves in the Harry Potter stories.

We exist so that others may abuse us. The greatest triumphs of others often come at our expense. We dress in ratty clothing (stadium). Yet despite this constant abuse, we are fiercely loyal to our master. We attack viciously anyone who dares attack our master. We perform magic (great stadium atmosphere) on their behalf, no matter how awful our masters treat us in return.

If ever we were given clothing (a new stadium) by our master, we would be free of our bondage. Some, like Dobby, desire this, but others would not know what to do with themselves with freedom and wealth. It would ruin the very essence of their being.

I used to be like Dobby, longing for the freedom that a World Series victory and/or a new stadium would bring. But now, I am beginning to feel that the other elves are right — that it is wrong to support S.P.E.W. and long for something that would destroy who we are.

We are meant to suffer, so that other wizards may have their glories. We are elves. Let us be that we are and seek not to alter us.

I believe the evidence is clear enough to tell us this much: We were created not by a supernatural intelligence but by chance and necessity as one species out of millions in Earth’s biosphere. Hope and wish for otherwise as we will, there is no evidence of an external grace shining down upon us, no demonstrable destiny or purpose assigned us, no second life vouchsafed us for the end of the present one. We are, it seems, completely alone.

In Sophocles’ play Oedipus the King, the title character hears a rumor that he may not be what he thinks he is: the son of Polybus and Merope, the King and Queen of Corinth. Polybus and Merope deny the rumor, but Oedipus seeks external confirmation, and visits the Oracle at Delphi. The oracle ignores his question, and instead prophesies that he will kill his father and wed his mother.

Oedipus has no evidence he is not his parents’ son. He has no evidence to suggest he will eventually kill Polybus and marry Merope. But the latter is a much bigger problem than the former, so Oedipus ignores the first small problem and acts on the second, leaving Corinth forever, so as to avoid this horrible fate. He then proceeds to live his life as if he had solved his problem. And, of course, because this is a Greek tragedy, he hadn’t.

Rumors are not facts. Prophecies are not proven theorems. Yet it is not true that Oedipus had no evidence that he was not his parents’ son. He had the rumor. He had the prophecy. In a Bayesian sense, he should have considered the odds of his being adopted having increased from 0% before hearing the rumor and the prophecy, to what–1%? 10%? 25%?–afterwards.

The odds being less than 50%, however, the logical thing for Oedipus to do when faced with any given binary decision is to act as if the rumor was false. That’s the choice that gives him the best odds of succeeding, based on the information he has.

Hubris is extreme pride and arrogance shown by a character that ultimately brings about his downfall.

Hubris is a typical flaw in the personality of a character who enjoys a powerful position; as a result of which, he overestimates his capabilities to such an extent that he loses contact with reality. A character suffering from Hubris tries to cross normal human limits and violates moral codes.

Is it extreme pride and arrogance to make the most logical decision? If so, then the human condition is tragic no matter what decisions we make.

If we choose with the odds based on the best information we have, we risk making a catastrophic decision because we lacked a critical piece of data. If we choose out of rumor and superstition and fear, we risk living a life where bad decisions compound themselves with every choice we make, and we end up living a suboptimal life.

The more successful we are, however, the more likely we are to make the catastrophic decision that results in a classical, Greek-style tragedy. With every successful decision we make, the less likely it is, in a Bayesian sense, that we are lacking that critical piece of information, and the more likely it is, in a Bayesian sense, that our decision-making process is sound.

If you have a decision-making algorithm, and you’re 50% sure it’s good, and then you test it, and it works, now you’re, what–51%? 55%? 60%?–sure that it works. Test it again and it works again, and the odds rise again. Eventually, if you reach the top of a hierarchy and stay there, you get really confident that you know what you’re doing. You’re the king!
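The numbers in that paragraph are guesses, and Bayes’ rule makes the guess explicit. A minimal Python sketch: the likelihoods (a good algorithm works 90% of the time, a bad one 50% of the time) are made-up numbers of mine. With them, one success moves you from 50% to about 64%, and each further success pushes your confidence higher still.

```python
def update(prior, p_works_if_good, p_works_if_bad):
    """Bayes' rule: P(algorithm is good | it worked on this test)."""
    evidence = prior * p_works_if_good + (1 - prior) * p_works_if_bad
    return prior * p_works_if_good / evidence

# ASSUMPTION: illustrative likelihoods, not measured from anything.
belief = 0.5
for test in range(1, 6):
    belief = update(belief, p_works_if_good=0.9, p_works_if_bad=0.5)
    print(test, round(belief, 3))
```

Note that a prior of exactly 0% never moves, whatever the evidence; the update only works if you allow yourself some doubt to begin with.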

Hubris, then, is the logical result of success. In every form of competition, somebody has to reach the top. The closer to the top you get, the more likely it is that you think your success is because of your knowledge and your decision-making process. The more you become certain that your data and your process are sound, the more you should logically make bigger and bigger bets based on that data and that process. And because of those bigger and bigger bets, the harder you will fall if and when it turns out that your data and/or your decision-making process was flawed.

But if you look at the impact those trades have on this particular team’s offense, it’s negligible. Offensively, the numbers tell us that losing Cespedes is no big deal.

If you look at Yoenis Cespedes statistically, there’s no real evidence that trading him would hurt the A’s very much. His numbers are mediocre, and easily replaced.

But looking back on the trade now, it feels like the A’s and their fans were focused on the wrong prophecy. The prophecy that a superstar ace pitcher was the missing piece to Moneyball. The significant rumor, the important piece of Bayesian evidence that we ignored was this: that the 2012-14 A’s team was not a product of Billy Beane’s genius. That this team played like complete and utter crap for five years, and then Yoenis Cespedes showed up, and it suddenly and immediately became good. That for 2 1/2 years, when Cespedes was in the lineup, the team played well, and when he was out of the lineup, the team played like crap, regardless of how well Cespedes was playing.

And then Beane, in his moment of hubris, trusting the logic and the data and the decision-making process that had made a best-selling book and a Hollywood movie of his life and had seemingly landed him in first place for 2 1/2 years, traded Cespedes away, and the team reverted immediately to playing like complete and utter crap again.

But it’s there. It exists. It hurts to look at it. And it has all of us A’s fans wanting to poke our eyes out.

The gods hate us. They want to punish us for our pride and arrogance.

And you may say, gods are superstitious nonsense, that there is no evidence of an external wrath raining down upon us, no demonstrable cruel destiny or fate assigned us, no eternal Sisyphean existence vouchsafed us for the end of the present one.

And that’s true. There is no evidence for the existence of God, or gods. Except for the small, annoying, persistent rumor that at this particular point in time, we are here.

The Oakland A’s made a huge trade yesterday, sending their biggest name, Yoenis Cespedes, and a draft pick to the Boston Red Sox for Jon Lester and Jonny Gomes. They also made a smaller trade, sending Tommy Milone to the Minnesota Twins in exchange for Sam Fuld. Of course, the sports world was abuzz about the Cespedes trade, which stunned many.

A couple of things left me unsatisfied about the reactions I’ve seen to the Cespedes trade. One is an old idea, expressed in Moneyball back in 2002: you don’t try to replace Giambi/Cespedes with one player, you replace him with other players in aggregate across the roster. The other is a newer idea: the A’s platoon so much that you can’t just analyze A’s players as atomic units. You can’t just say X is a 5 WAR player and Y is a 2 WAR player, and X – Y = 3 WAR. You have to break them down into their platoon split components, because the A’s use platoons far more efficiently than is baked into most of these formulas.

For example, if you look at Jonny Gomes as an atomic unit, he has suffered a severe decline this year. He’s hitting .234/.329/.354 this year, a far cry from the .262/.377/.491 he hit with the A’s in 2012, and in no way close to being able to replace Cespedes’ production. However, if you break Gomes down into platoon splits, you can see that his decline is entirely against right-handed pitching, where he is hitting a godawful .151/.236/.258 this year. Against left-handed pitching, however, he is still hitting a very healthy .302/.400/.431. A’s manager Bob Melvin is a master at getting the platoon advantage for his players, so we can bet we won’t see much of Jonny Gomes against RHPs.

So what I want to see is an analysis that really looks at the A’s as two teams: one team against RHPs which plays 72% of the time, and another team against LHPs which plays 28% of the time. Let’s look at those teams before and after the trade, and see how much the trades affected those two teams, even if we calculate these things in a kind of quick and dirty fashion.

To do that, you need to project performance by splits, which isn’t easy to find. PECOTA has a Marcel-like calculation called “Platoon multi”. Dan Szymborski pointed me to a platoon projection spreadsheet he created for his ZiPS projection. So I took that pre-season projected data, and combined it with their 2014 performance in a spreadsheet, to create a rest-of-season projection. (Okay, that wasn’t so quick, so the rest of this will be kind of dirty. We don’t have to be precise here, we just want a ballpark understanding of what’s going on.)

There’s another complicating factor here, in that the A’s currently have three players who are injured: Coco Crisp, Craig Gentry, and Kyle Blanks. Plus, Stephen Vogt has an injury that prevents him from catching, but not playing 1B or OF. So we’re going to run one set of numbers assuming everyone is healthy, and another assuming these injuries. Here are the best-hitting lineups (not by batting order, but sorted by GPA, from best player to worst). We’ll make removed (traded or optioned) players red, and added players blue.

Healthy team vs LHPs
Estimated runs per game, new lineup: 5.266
Estimated runs per game, old lineup: 5.218

The offense improves vs LHPs, because Gomes is actually slightly more productive than Cespedes, thanks to his high OBP. The defensive effect is that Moss gets moved from DH into the outfield, because he’s a better fielder than Jonny Gomes, but not a better fielder than Cespedes.

Healthy team vs RHPs
Estimated runs per game, new lineup: 4.810
Estimated runs per game, old lineup: 4.841

Losing Cespedes against RHPs has a more noticeable effect. Gomes and Cespedes are equivalent players vs LHPs, but the gap between Cespedes and his replacement against RHPs, Derek Norris, is larger, and creates a slight loss of runs per game. It also shifts Vogt and Moss around defensively to get Norris into the lineup.

Injured team vs LHPs
Estimated runs per game, new lineup: 5.023
Estimated runs per game, old lineup: 4.852

Yeesh, those are some atrocious OBPs at the bottom of the lineup with these injuries, because LH batters Vogt and Reddick are forced into the lineup against LHPs. Fuld is also a LH batter, but he has a weird reverse platoon split in his career; he’s actually been better vs LHPs than RHPs. Like with the healthy group, going from Cespedes to Gomes is a slight upgrade against LHPs; but the upgrade from Burns to Fuld is enormous.

Injured team vs RHPs
Estimated runs per game, new lineup: 4.685
Estimated runs per game, old lineup: 4.708

The main effect here is that Fuld gets Cespedes’ at bats, and that Reddick can move back to right field. But without the Fuld trade to complement the Cespedes trade, Sogard would be getting Cespedes’ at bats, and you’d have an awful outfield of Moss-Reddick-Vogt with Callaspo at 1b. Yeesh. You’re going to lose some offense, but that defensive alignment would probably kill you. I suspect that avoiding that defensive alignment alone is probably justification for trading Milone.

So let’s take those estimated runs per game, and extrapolate them over 162 games, and assume the average split of 72% RHPs and 28% LHPs, and combine those two split-handed teams into one team again, leaving us with just a healthy team and an injured team.

Of course, the injured team is not as good as the healthy team, and will be scoring fewer runs than the healthy team. But to analyze the trades, we don’t need to know the raw totals, we really only need to know how much the trades change the run scoring.

The healthy team loses 3.6 runs vs RHPs in the trades, but gains 2.2 runs vs LHPs, for a total loss of 1.4 runs over a whole season. It’s practically no loss of offense at all.

The injured team loses 2.7 runs vs RHPs in the trades, but gains 7.8 runs vs LHPs, for a total gain of 5.1 runs over a whole season. Most of that gain is from playing Fuld over Burns (vs RHPs) and Reddick/Vogt (vs LHPs).

Let’s say these three injured players are going to miss one-third of the remaining games. Multiply that 5.1 by one-third, and the -1.4 by two-thirds, and you get a gain of about 0.77 runs per 162 games. With only about a third of the season left, that works out to a slight gain of roughly 0.25 runs over the rest of the season, albeit so small that it is practically a wash.
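Here is the arithmetic from the last few paragraphs written out in Python, so it can be rerun if the runs-per-game estimates change. The four pairs of estimates are copied from the figures above. The final division by three is my assumption about how the full-season number was scaled down to the games remaining after the trades. With the unrounded inputs the result comes out a hair under 0.25.

```python
GAMES = 162
SHARE_VS_RHP, SHARE_VS_LHP = 0.72, 0.28

# Estimated runs per game as (new lineup, old lineup), from the estimates above.
RUNS_PER_GAME = {
    ("healthy", "RHP"): (4.810, 4.841),
    ("healthy", "LHP"): (5.266, 5.218),
    ("injured", "RHP"): (4.685, 4.708),
    ("injured", "LHP"): (5.023, 4.852),
}

def season_change(team):
    """Full-season change in runs from the trades: (vs RHP, vs LHP, total)."""
    def change(hand, share):
        new, old = RUNS_PER_GAME[(team, hand)]
        return (new - old) * GAMES * share
    vs_rhp = change("RHP", SHARE_VS_RHP)
    vs_lhp = change("LHP", SHARE_VS_LHP)
    return vs_rhp, vs_lhp, vs_rhp + vs_lhp

healthy_total = season_change("healthy")[2]  # about -1.4
injured_total = season_change("injured")[2]  # about +5.1

# Injured players miss one-third of the remaining games; healthy for the other two-thirds.
blended_per_162 = injured_total / 3 + healthy_total * 2 / 3

# ASSUMPTION: roughly a third of the season remained after the trades.
rest_of_season = blended_per_162 / 3

for team in ("healthy", "injured"):
    print(team, [round(x, 1) for x in season_change(team)])
print(round(blended_per_162, 2), round(rest_of_season, 2))
```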

The trades felt like a shock to many of us. On the surface, losing Cespedes’s sexy bat hurts, and trading a decent starting pitcher like Tommy Milone for a fourth outfielder seems like a waste. In a vacuum, that is true. But if you look at the impact those trades have on this particular team’s offense, it’s negligible.

Offensively, the numbers tell us that losing Cespedes is no big deal. And if everyone is healthy, trading for Fuld is a waste, because he wouldn’t play. But not everyone is healthy, especially in CF, and so Fuld is essential to keeping the offense at the level it would be without the trades.

So basically, we can consider the offense a wash. Now we can move on to analyzing the effect these trades have on the A’s defense and pitching. But I’m leaving that as an exercise for the reader. I’ve done enough for today.

Grant Brisbee has a fun series over on SB Nation where he ranks MLB stadiums by how well they make home runs look impressive. Surprisingly, he ranks the Oakland Coliseum 13th. It gets that high ranking because the various levels of Mount Davis provide a good contrast between a mediocre home run, and a towering one. When someone crushes one at the Coliseum, you can tell it’s crushed because it lands in the 2nd deck (down the line) or hits off the luxury boxes in center field.

That’s fine and all. I suppose it’s good that Mount Davis has some redeeming feature. But there are far more mediocre home runs than monster ones, and it’s what the current version of the Coliseum does to those wimpy home runs that I hate.

Hate hate HATE.

Really, there is nothing I hate more about the Coliseum than the placement of the outfield walls. Nothing. Not the troughs, not the sewage, not the crap we A’s fans have to take from other teams’ fans about the troughs and the sewage, not the 8th-inning Call Me Maybe, not even Mount Davis itself. I hate the placement of the outfield walls more than all of those things.

Except at the foul poles, there is no logic to the outfield walls at all. None. Look at the fence at any point between the foul poles. Why is the fence there? Why is it that height? No reason at all, really.

And worse than that, what really drives me bonkers is this: at EVERY point from pole to pole, if you hit the ball just barely over the fence, it DOES NOT LAND IN A SEAT.

Home runs should land in seats. Or if not IN seats, then OVER seats. Period.

* * *

Ok, Ken, you’ve been made Dictator of the Oakland Athletics for a day, and you can change one thing and one thing only. Give us your plan.

OK, I’m going to assume the A’s will sign a rumored 5-10 year lease extension, and are therefore planning to stay at the Coliseum awhile. This may be putting lipstick on a pig, but nonetheless, let’s make it a better place to watch a ballgame.

First of all, do you know why there is so much foul territory in Oakland? The story goes, as former A’s broadcaster Monte Moore used to tell it, that the third deck had obstructed views of home plate because of its slope, so they had to move home plate farther out than they planned.

I don’t know if that’s true or not, but let’s say that it is. Well, guess what? We’re not using the 3rd deck anymore. It’s (mostly) tarped off. So why is home plate still pushed out so far?

We’re going to put home plate back and the foul poles back to where they originally were supposed to be. Then we’re going to use the extra eight feet or so we gain to add some seats in front of the current bleacher seats. What we end up with is (a) an outfield configuration where, except for at the stairs, every home run lands in or over a seat, and (b) every seat in the main seating bowl is suddenly about two rows closer to the action, in a way that (c) shouldn’t cost ridiculous amounts of money to implement.

Here’s what it looks like with the new configuration in left field, and the old configuration in right field:

Let’s look at this in more detail:

1. We’re moving the foul poles over about 6-7 feet, so that there’s only about 1 foot between the pole and the foul line seats. This pushes home plate back about eight feet or so, thusly:

2. The wall nearest to the foul poles is about 2-3 feet shorter than the seats, and begins to angle away from those seats as you move more towards center field. We’re fixing this. The walls go all the way up to the seats, and hug the seating section all the way. No more balls that land over this fence, but fall short of the seats. Compare the new and old corners:

3. We’ll get rid of that stupid idiotic ledge above the out-of-town scoreboard. With home plate being pushed about 8 feet back, we have room to add two or three extra rows of seats, and still keep roughly the same distance from home plate as before.

I don’t know if we keep a scoreboard there or not. If you give free wifi throughout the stadium instead, you probably don’t need it.

I cut and pasted Fenway’s Green Monster seats here, to show you don’t need to add seats identical to the other bleacher seats. There’s room for some creativity in this new section.

4. Centerfield is now about 405 feet from home instead of 400, but we’ve cut down on the foul territory quite a bit, so this may keep the amount of offense roughly the same as before.

* * *

Ahhhhhhhh, now see? That’s much better.

I’m sure you have all loved your Dictator for the Day, and Wish Long Life for your Beloved Comrade Who Brings Glory to the Homeland. Now please excuse me, I have some propaganda posters to go photoshop.

There’s no way to be gentle about this: A’s General Manager Baby Nellie’s offseason moves have clearly weakened the A’s anagram roster for 2014. They have become slightly worse across the board, but some of his moves in the bullpen…well, I just don’t know what he was thinking.

Starting Rotation:

The A’s have lost the two best anagrams from their 2013 starting rotation: Bartender Snot and No Local Robot. Angry Nosy and Rat Mocks Zit are decent replacements to be sure, but are also both clearly a step down. Fin Jar GIF looks like the odd man out, as acronyms are purely replacement-level stuff, even if they can be pronounced.

The roster of catchers remains the same. Order Sinker is the best gamecaller of the group, of course. Pegs Hot Vent remains to fill in should either of the other two catchers need to go on midseason pilgrimages again.

5: Hajj Soon
21: Pegs Hot Vent
36: Order Sinker

Infielders:

Armload Seas gnip-gnopped his way to Texas last summer, so the A’s have replaced him with Tonic Punk. It’s a slight upgrade, to a mostly intact infield where even the weakest link redeems himself with a Star Wars reference.

A bit of background: in October of 1989, I had just returned from a year living in Sweden with my girlfriend (now wife) Pam. Pam was staying at her parents’ house and I was staying with her brother, until we could find jobs and afford to get our own place.

In hindsight, this letter is quite long, full of unnecessary details and subplots, not unlike a Victorian novel. It also lacks a good plot, because, well, no buildings fell down around me or anything. Nobody in the story was hurt, nobody was rescued. But in my defense, this was back in the days when you couldn’t just send an email or post something on Twitter or Facebook or Instagram and have everyone you know around the world instantly know what’s going on in your life. My Swedish friends probably got some horrific pictures on TV of collapsed buildings and fires and thought San Francisco had fallen into the sea. We weren’t so overwhelmed with data that a lack of filtering was a problem. TL;DR was not a thing back then.

Over at Beyond the Boxscore, Stephen Loftus has posted Pitcher Similarity Scores. The scores compare pitchers to each other based on:

Pitch Velocity

Pitch Break (Horizontal and Vertical)

Pitch Locations

Pitch Release Point

Curious about how the A’s scored, I extracted the A’s pitchers from the spreadsheet. A few pitchers didn’t seem to throw enough pitches last year to qualify (Brett Anderson, Sean Doolittle, Pat Neshek), while Fernando Rodriguez is on it even though he hasn’t pitched in Oakland yet, because he got hurt in spring training.

A few notes:

A.J. Griffin is only mildly Zitoesque, and is actually more similar to Jerry Blevins, of all people.

Griffin is the only player on the A’s who does not have R.A. Dickey among his 10 least-similar players.

Bartolo Colon has the most-similar least-similar player in baseball, if that makes sense. His similarity to John Axford, his least similar player, scores higher in similarity than any other player’s least-similar player. I assume that’s because Colon throws mostly fastballs.

Tommy Milone seems to be the most unique pitcher on the A’s. His #1 comp score (Jason Vargas, 0.739) would be the 24th-highest score on Bartolo Colon’s list.
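The “most-similar least-similar” idea from the Colon note is a max-of-mins. For each pitcher, take the lowest score on his list, which is his least-similar comp. Then ask whose lowest score is the highest. Below is a minimal Python sketch of that lookup. The pitchers A through D and their pairwise scores are invented placeholders, not Loftus’s data, and the sketch makes no attempt to reproduce how his similarity scores are calculated.

```python
# Invented pairwise similarity scores (higher = more alike) for four
# hypothetical pitchers. These are NOT Loftus's numbers or his scoring method.
PAIR_SCORES = {
    ("A", "B"): 0.90, ("A", "C"): 0.55, ("A", "D"): 0.60,
    ("B", "C"): 0.40, ("B", "D"): 0.35, ("C", "D"): 0.70,
}


def score(p: str, q: str) -> float:
    """Symmetric lookup, so score(p, q) == score(q, p)."""
    return PAIR_SCORES[(p, q)] if (p, q) in PAIR_SCORES else PAIR_SCORES[(q, p)]


def least_similar(pitcher: str, pitchers: list[str]) -> tuple[str, float]:
    """Return the comp at the bottom of this pitcher's list, with its score."""
    return min(
        ((other, score(pitcher, other)) for other in pitchers if other != pitcher),
        key=lambda pair: pair[1],
    )


pitchers = sorted({name for pair in PAIR_SCORES for name in pair})
floors = {p: least_similar(p, pitchers) for p in pitchers}

# The "most-similar least-similar" pitcher is the one whose floor is highest.
top = max(pitchers, key=lambda p: floors[p][1])
print(top, floors[top])  # A ('C', 0.55)
```

In this toy data, A plays the role the post describes for Colon. Even his worst comp scores higher than anyone else’s worst comp.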

Yoenis Cespedes fascinates me. He came to the USA from Cuba last year with no professional baseball experience, and went straight to the majors. He had to adjust to the new level of play, of course. All players do. But usually the kind of learning a player does in the majors is subtle, since the difference between AAA and the majors is subtle. It’s hard for a layman like me to catch on to those subtleties.

But with Cespedes, the learning wasn’t subtle, it was obvious. He’s amazingly talented, and you could see, often from pitch to pitch, the adjustments he was making. The first time he faced a pitcher last year, he had a tendency to swing at breaking pitches out of the strike zone. Once. Maybe twice. But the next time, he’d take the pitch. Then the pitcher would have to throw some new wrinkle at him, which he’d fail at initially and then figure out the next time, too. Eventually the pitchers would have to come in and throw him a strike, and he’d hit it, hard.

Which makes me especially intrigued about this year, his second time through the league. How will the league try to get Yoenis Cespedes out now that he’s seen most of the pitchers before?

So I thought I’d look at what Seattle has done in the first two games against him, courtesy of some Pitch F/X graphs from Brooks Baseball.

Game 1

Facing Felix Hernandez. Last year, Cespedes was 4-for-12 against him, with a double and four strikeouts.

Plate appearance #1: Hernandez throws a get-me-over fastball on the first pitch. Cespedes takes. Then Hernandez throws a curve down and away, which Cespedes chases out of the zone, and grounds to third. I’m sure the Mariners wouldn’t mind seeing Cespedes swing at curveballs out of the zone all the time. If this were one year ago, they’d keep throwing it over and over again hoping he’ll still chase it, but as we’ll see, the Mariners don’t just do one thing against him anymore.

Plate appearance #2: Hernandez throws a slider up and over the middle of the plate on the first pitch. That’s a dangerous pitch to throw Cespedes, and he whacks it, but Brendan Ryan manages to make a good play on it and throw him out. The Mariners win this battle, but you wouldn’t want to use that pitch as an example of how you want to get Cespedes out. We’ll find that out in game 2.

Plate appearance #3: Cespedes hasn’t seen a changeup yet, but Hernandez throws him four of them in this at-bat. Also interesting is how Hernandez moves around the strike zone. Up and in, down and away, up and in, down and in, up and away, up and…oops over the plate. The last pitch is a changeup that’s up and over the plate, slightly in. Again Cespedes jumps on it, and again hits it hard right at a fielder, this time, the third baseman. Cespedes works the at-bat and gets a good pitch to hit, again it finds a glove, but again, this isn’t a recipe you probably want to rely on to get Cespedes out.

Plate appearance #4: This was in the bottom of the ninth, and Felix Hernandez had been replaced by Tom Wilhelmsen. Cespedes had struggled against Wilhelmsen last year, going 0-for-5 with 4 strikeouts. Cespedes gets ahead in the count by laying off a first-pitch curveball off the plate. Wilhelmsen then comes in with a fastball which turns out to be the best pitch of the at-bat for Cespedes to hit, but he fouls it off. Cespedes then lays off another curveball out of the zone. Cespedes is probably looking for another fastball like the 2nd pitch, and he does get it. But Wilhelmsen throws it inside off the plate, not a good pitch to hit, and Cespedes jams himself and grounds out to third. Another lesson for Cespedes to learn from: it will be fascinating to see what Cespedes and Wilhelmsen do the next time Cespedes faces him ahead in the count 2-1, 3-1, or 3-2.

Game 2

Facing Hisashi Iwakuma. Last year, Cespedes was 2-for-4 against him, with a homer.

Plate appearance #1: The pitchers Cespedes faces in this game don’t have the kind of stuff that Hernandez and Wilhelmsen had yesterday. Cespedes hardly sees any inside pitches in this game. We can see what Iwakuma wants to do in this game: instead of working inside and outside like the fireballers yesterday, he lives on the outside corner against him, either slightly over the plate away, or slightly off the plate away. Iwakuma misses away on the first two pitches, and Cespedes takes the third to make him throw a strike. Then Iwakuma makes the same mistake Hernandez did yesterday, leaving a slider up and over the plate. This time, Cespedes doesn’t hit it at any fielders, as he deposits it over the center field fence for a home run.

Plate appearance #2: Iwakuma avoids throwing Cespedes any sliders after that. He throws a fastball inside for show on the first pitch, and then goes back to the outside corner. He leads off with a good curveball down and away, and then goes up the ladder with two excellently located fastballs, both of which Cespedes swings through. I’m guessing the second fastball in a row surprised Cespedes a bit.

Plate appearance #3: Iwakuma is gone, and Cespedes is now facing Carter Capps, whom he faced once last year. He takes a first pitch curveball over the inside of the plate. Then Capps gets him to chase a couple of curveballs just off the plate, and strikes him out. Next time they face each other, I’ll be watching to see if Cespedes chases those curveballs again, or if he lays off the next time, and makes him throw something in the zone he can hit.

Plate appearance #4: Oliver Perez, this time, who like Capps had faced Cespedes only once before. Cespedes takes the first pitch for a strike, as he often does. Then on the second pitch, he gets a slider in a hittable location over the middle of the plate, but fouls it off. Then he swings through a well-located fastball low and away in the zone.

If you have a pitcher that Cespedes hasn’t seen much, try to throw him breaking pitches off the plate and get him to chase. That won’t work forever, though.

He can hit fastballs and crushes badly located off-speed stuff. So if he has faced a pitcher multiple times, mix up your pitches and avoid predictability.

If your pitcher has good velocity, you can try to jam him inside. Don’t try this with soft-tossers, though.

Location, location, location.

I don’t know that there’s anything there that isn’t true of most hitters in general, except that Cespedes doesn’t seem to have any one particular hole in his swing or vulnerability in his approach except against unfamiliar pitchers. So you have to try to fool him like Iwakuma did when he went up the ladder on him, or just hope that when you miss your spot that it finds a fielder.

Now go turn on the A’s game and watch Joe Saunders blow my whole rough theory apart tonight by pounding Cespedes inside with loopy sliders or something. That would be cool, because baseball is awesome like that, and there are always new lessons to be learned.

Jason Wojciechowski has a look at why A’s fans may be overoptimistic about the A’s this year. His analysis is reasonable. But I, too, find myself slightly more optimistic than the projections. I want to explore why I feel this way.

The A’s ended the season with five rookies in their starting rotation. Except for Travis Blackley, those rookies all return, joined by Brett Anderson and Bartolo Colon. The bullpen will basically be the same. I have some concerns about the starting pitcher depth — the 7th-9th pitchers in Sacramento are all big question marks — but that’s true for a lot of teams.

I expect the pitching to be roughly the same as last year. The big changes are on offense.

Despite winning their division, the A’s got below-average OBP in 2012 from six of the nine positions on the team, and were the worst in the league in three of them:

The A’s offense last year depended heavily on Yoenis Cespedes, Brandon Moss and the Smith/Gomes platoon at DH. You look at that list and think, well maybe they’ll regress at three spots in the lineup, but there’s lots of room for improvement at six!

And the A’s did make moves to improve the worst of these positions. Jemile Weeks held down second base for most of the year, and was awful, both offensively and defensively. A platoon of Sizemore and Sogard should be able to best Weeks’ numbers. John Jaso at catcher should easily surpass the pitiful numbers Kurt Suzuki put up before he was traded. And Jed Lowrie will surely outhit Cliff Pennington, although he may not be quite as good defensively.

For the players who were not replaced, I expect improvement from several of them. Yoenis Cespedes and Josh Donaldson were both a bit overwhelmed early in the year, but improved dramatically as the year went on. I’ve never seen a player learn to adjust so visibly and impressively as Cespedes. He tends to get fooled with off-speed pitches the first time he sees a pitcher, but the next time, he either lays off the pitch that fooled him, or he crushes it. I can’t wait to see what he does his second time through the league. Donaldson was learning to play third base at the beginning of the year, and seemed to take his defensive struggles to the plate with him. But his defense went from being awful in April to fantastic in September, and as his defense came around, he began to hit about what you’d expect from his minor league numbers in the past.

So that leaves basically Moss, Reddick and the DH platoon as sources for regression. Gomes has basically been replaced by Chris Young. Young, like Gomes, has strong platoon splits, and if Melvin can use Young like he used Gomes, I think the DH platoon can hold up. Young’s strong defense may tempt Melvin to play him more against right-handed pitchers than he played Gomes, with someone like Cespedes moving to DH. That would improve the defense, but hurt the offense. A wash? Maybe.

We might not expect Reddick to hit 32 homers next year, but he was awful for long stretches last year, particularly with men on base. He hit .283/.332/.540 with bases empty, but only .191/.273/.368 with men on base. If both of those splits regress to his personal mean, he’ll have more impact in 2013, because so much of his 2012 output was empty.

That leaves Brandon Moss, who to me is the key to the A’s season. If he produces anything like he did last year, the A’s make the playoffs. He out-OPSed (1.123) both Mike Trout (.900) and Miguel Cabrera (1.071) in September/October last year. But he’s a career .251/.317/.442 hitter. If he hits like his career numbers in 2013, the A’s may disappoint. The projection systems mostly regard his 2012 as a mirage, and expect numbers closer to his mediocre past.

Moss also has a big platoon split. Part of his 2012 success was being platooned with Chris Carter, who hit .241/.404/.494 in his half of the platoon. Carter was traded away to get Lowrie. Replacing Carter as a right-handed first baseman is Nate Freiman, a Rule 5 player who has to stay on the roster all year, or be returned to the Padres. Freiman has power, but he can hardly be expected to put up an OBP over .400, even if strictly platooned against LHPs.

Billy Beane built a roster with a lot of depth and versatility, and if any hitters get hurt or underproduce, there are other players at the same positions who can step in and produce, except at first base. There really isn’t a good replacement for Moss if he gets hurt or reverts to pre-2012 form. But from what I’ve seen in the five spring training games I watched, his swing looks good. I feel optimistic about Moss, which makes me optimistic about the A’s as a whole.

Major League Baseball’s Opening Day fell this year on Easter Sunday. It is probably no coincidence that both Easter and Opening Day arrive in spring, as both are meant to signal, as spring does, a rebirth, a new beginning, a fresh start.

Starting fresh is not as easy as it sounds. We humans are very good at pattern recognition. We see a new thing, and recognize in its shape some other shape we’ve seen in the past. The older we get, the more we do this; the more patterns we can bring to mind, the less we see some new thing as it is today, and the more we see that thing as something that came before.

Look, here comes young Oakland A’s baseball pitcher A.J. Griffin, throwing a curveball. It looks familiar, that curveball. Does he throw that curveball Zitoesquely? Or perhaps it’s more accurate to say that he throws it Duchschereresquely?

Today is Opening Day for Griffin’s A’s 2013 team. Will it be as magical as 2012 was? Or as disappointing as 2007? Or perhaps glorious, like 1972, 1973 and 1974?

We can take all the statistics from all the players from all the history of Major League Baseball, sum them all up in clever and scientifically sound ways, and make predictions. 82.2 wins! 86 wins! 93 wins!

Those predictions, they aren’t the future, or even the present. They are merely shadows of the past. To truly start fresh, we must try to look on things as a child does, like someone who has no past, who has no library of previous patterns in their head.

This is, of course, impossible. These thoughts come to our minds automatically, whether we want them to or not.

And so today will happen, and tomorrow, and the days will add up through October to a number that is greater than or equal to or less than some number we expect in our heads, and we will be delighted or bored or disappointed accordingly. And only then, when it is too late to enjoy the year in and of and by itself, can the 2013 season drop the baggage of its past, and be free to be itself.

For what is truly born on Opening Day is not the current year, but the previous year. Congratulations on your newfound freedom, 2012. You were amazing.

Nobody was elected to the Baseball Hall of Fame today, and Rob Neyer has an interesting post exploring why some writers seem to consider steroid cheating in baseball as being worse than other forms of cheating. I want to address his article, because at one point he says something that is flat out wrong:

Why does the impact matter? I’m trying to imagine a player’s thoughts here … “Gosh, those amphetamines seemed to help a little, so even though it’s cheating I think they’re okay to use. But golly, these steroids everybody’s talking about … I’d better not mess with those, because they seem to help a LOT.”

That just defies everything we know about human nature and, specifically, the nature of world-class athletes. If there’s a small advantage to be taken, big-time athletes will take it. If there’s a larger advantage to be taken, they’ll take that.

Neyer is wrong about that defying what we know about human nature. Just the opposite: it actually conforms to it perfectly. Dan Ariely, Professor of Behavioral Economics at Duke, has made a science out of studying cheating, and he has found that nearly everyone does make a distinction between cheating a little versus cheating a lot. Watch this animated video of an Ariely speech, and keep the steroid issue in mind as you listen to it:

Most people cheat, as Ariely says, “just a little bit”. Only a very very few cheat a lot. You see it every day: if you’re on the freeway, and the speed limit is 55mph, do you stay under 55mph? No, most people drive about 58-63mph–cheating just a little bit. A few will drive 70, 80, 90mph — but they’re a small minority.

If you cheat just a little bit, it’s easy to rationalize it, and still feel good about yourself. It is much harder to rationalize cheating a lot: in that case, you have crossed over into Ariely’s “What the Hell” effect.

I doubt that athletes’ psychology is very different from other humans’ in this respect. People don’t seem to mind people who cheat just a little bit: scuffing a baseball here, or stealing a sign there, or drinking some extra caffeine to stay alert. But there is a point where you flip over into the “What the Hell” effect, where you’re cheating so much that it has a noticeable effect, and you keep doing it, because what the hell, why not?

Where is the line in baseball between cheating a little and cheating a lot? I don’t know, and neither, it seems, do the baseball writers. But this is not a black-and-white issue, where in order to be consistent, either you have to let all cheaters in, or you have to kick all cheaters out, as I’ve seen some people (including, I think, Neyer) arguing. The science says there are levels of cheating wired into human nature. To Neyer’s credit, however much he may not want to draw a line between cheating a little and cheating a lot, he recognizes that writers are doing it, and he hypothesizes that they’re drawing the line at the statistical records being broken:

I continue to believe that a lot of the hand-wringing over steroids — which, by the way, I really wish hadn’t happened — is due to just two players: Mark McGwire and Barry Bonds. I believe that if McGwire and Bonds hadn’t so utterly destroyed the home-run records, leaving first Roger Maris and then Hank Aaron in the dust, we might not be having this discussion at all.

On this point, I think Neyer is right. Many people are outraged by steroids because breaking those cherished records makes it clear that Bonds and McGwire were cheating more than “just a little”. And because that line is built into human psychology, people react emotionally and want to punish that behavior. That baseball writers are taking some time to figure out what and where that line is seems to me quite a reasonable thing to do.