Grand Forks Herald columnist Eric Bergeson’s latest piece stumbled into the sabermetric community on Twitter. My timeline yesterday morning filled with jokes from Baseball Prospectus analysts and former MLB.com reporters, and even Aaron Gleeman, the “analyst” in Bergeson’s column, was in on the joke. The joke, of course, was the awful arguments made in Bergeson’s column.
Now, Bergeson is not a baseball writer. He is not a sports columnist or reporter at all, so I can forgive him for misstating (misremembering?) some things. He calls the 2010 and 2012 San Francisco Giants “pipsqueaks” because to the casual observer, or to a columnist who didn’t bother looking up the regular season statistics for those two teams, they may well have appeared as pipsqueaks. They weren’t, of course: the 2012 team scored a commendable 4.43 runs per game while giving up only 4.01, leading to a 94-68 record, fourth best in MLB that year. Yes, they were dead last in team home run total and merely slightly above league average in runs and runs batted in, but their pitching staff dominated most teams they faced and their batters had one of the best park-adjusted on-base plus slugging rates in the majors. However, I am compelled to respond to Bergeson on the issue that analytics “changed the game of baseball, and not for the better.”
I don’t want to talk about obscure facts or numbers. I want to talk about baseball.
Bergeson’s column is the latest I’ve seen in a string of pieces by newspaper men across the country that flatly attack sabermetrics with the dubious argument that we’ve ruined baseball with numbers. I thought I’d seen the height of these complaints with the AL MVP and WAR argument last year about Miguel Cabrera and Mike Trout, but we’re not even six or seven games into 2013 and here we are, not a step forward. To be fair, sports fandom begets argument. Arguments are fun.
“To [Oakland A’s general manager Billy] Beane, and to the statisticians, baseball isn’t beautiful until it starts to conform to the averages,” writes Bergeson, setting up the most common cliche: that stat-head fans, sabermetricians and analysts only care about numbers. Obviously this is wholly untrue. If we only cared about numbers, we would just do pure math. We would open our math books and start doing rote calculations. Why worry about the fates of men in tights, the boys of summer?
We care deeply about the sport, so deeply that we have developed methodology and glossaries full of well-documented terms to describe our arguments about the game. At its core, sabermetrics isn’t in opposition to having fun. It is a sharpened edge with which smart and obsessive fans (and smart baseball executives, scouts, agents, players, etc.) can hone their arguments about our favorite sport. If part of the fun is having arguments like “Who is the best player of all time?” or “How will my favorite team do compared to your favorite team?” then sabermetrics as a school of thought, as a methodology, as a body of research lends itself to winning those arguments, if for no other reason than that it attempts to bring rigor and logic to their defense.
If part of the fun is also watching your favorite team win games, then wouldn’t you want them to use any advancements in the sport to win those games? I, for one, am not a baseball fan because I like to see home runs (although I do enjoy seeing them); rather, I like to see my favorite teams win ball games.
The search for objective knowledge about baseball
I think we should have a fairer description of what sabermetrics actually is before we continue any further. From ‘The Sabermetric Manifesto’ (1994):
Bill James defined sabermetrics as “the search for objective knowledge about baseball.” Thus, sabermetrics attempts to answer objective questions about baseball, such as “which player on the Red Sox contributed the most to the team’s offense?” or “How many home runs will Ken Griffey hit next year?” It cannot deal with the subjective judgments which are also important to the game, such as “Who is your favorite player?” or “That was a great game.”
Since statistics are the best objective record of the game available, sabermetricians often use them. Of course, a statistic is only useful if it is properly understood. Thus, a large part of sabermetrics involves understanding how to use statistics properly, which statistics are useful for what purposes, and similar things. This does not mean that you need to know a lot about mathematics to understand sabermetrics, only that you need to have some idea of how statistics can be used and misused.
If numbers have ruined baseball, then the father of baseball himself ruined baseball. Henry Chadwick devised the box score to describe game outcomes. Since then, fans, sportswriters and baseball men and women have added to these numbers. Some stats, like WHIP, come from fantasy baseball, but have found use in otherwise describing the game and player performance.
What sets “sabermetrics” apart, I think, is the complexity with which some statistics are created. Things like park factors, for example, are based on analysis of past performance at the various fields. Park factors, though, are merely an acknowledgment that some fields tend to have drastically different outcomes than other fields. Coors Field favors batters and hurts pitchers, while Target Field appears to favor pitchers. If we were to ignore this, we would be intellectually dishonest when comparing counting statistics like runs and runs batted in.
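To make the idea concrete, here is a minimal sketch of one simple way a park factor can be computed: compare the runs per game (by both teams) in a club's home games with the runs per game in its road games. The function name and every number below are hypothetical, made up for illustration, and published park factors typically add refinements such as multi-year averaging that this sketch leaves out.

```python
def park_factor(home_rs, home_ra, home_games, road_rs, road_ra, road_games):
    """Simple run-based park factor (an illustrative sketch, not an official formula).

    Above 1.0 suggests the home park inflates scoring; below 1.0 suggests it suppresses it.
    """
    home_rate = (home_rs + home_ra) / home_games  # total runs per home game
    road_rate = (road_rs + road_ra) / road_games  # total runs per road game
    return home_rate / road_rate


# Hypothetical season totals, invented for this example.
hitters_park = park_factor(450, 430, 81, 350, 370, 81)
pitchers_park = park_factor(300, 310, 81, 340, 350, 81)

print(f"Hypothetical hitter-friendly park: {hitters_park:.3f}")
print(f"Hypothetical pitcher-friendly park: {pitchers_park:.3f}")
```

A factor well above 1.0 is the numerical version of saying a field favors batters, and a factor below 1.0 says it favors pitchers.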
WAR, as another example, is particularly problematic from a lay point of view because there are two versions, FanGraphs WAR and Baseball Reference WAR, which differ in minor ways such as how they set “replacement level” and which statistics they incorporate at what levels. Furthermore, both methods rely on proprietary data, so most of us can’t double check the math.
That said, the vast majority of sabermetric measurements are clearly defined and relatively simple to calculate. One merely needs to understand what each statistic measures and how it derives those measurements. If you graduated high school, you likely have enough background in mathematics to do 90% of these so-called “advanced analysis” techniques. Much of it is merely pre-algebra, or different ways of doing the normal add this, subtract that and divide by this other thing that forms the basis for many “standard” statistics. In other words: arithmetic.
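As a quick illustration of how little math is involved, here is a sketch of a few familiar rate statistics written out as plain arithmetic. The formulas are the standard definitions of batting average, on-base percentage, slugging percentage and WHIP; the function names and the stat lines fed into them are hypothetical and do not belong to any real player.

```python
def batting_average(hits, at_bats):
    return hits / at_bats


def on_base_percentage(hits, walks, hit_by_pitch, at_bats, sac_flies):
    # Times on base divided by the plate appearances that count toward OBP.
    return (hits + walks + hit_by_pitch) / (at_bats + walks + hit_by_pitch + sac_flies)


def slugging_percentage(singles, doubles, triples, home_runs, at_bats):
    total_bases = singles + 2 * doubles + 3 * triples + 4 * home_runs
    return total_bases / at_bats


def whip(walks, hits_allowed, innings_pitched):
    # Walks plus hits per inning pitched.
    return (walks + hits_allowed) / innings_pitched


# Hypothetical batter: 150 hits (100 singles, 30 doubles, 5 triples, 15 home runs)
# in 500 at-bats, with 50 walks, 5 hit-by-pitches and 5 sacrifice flies.
avg = batting_average(150, 500)
obp = on_base_percentage(150, 50, 5, 500, 5)
slg = slugging_percentage(100, 30, 5, 15, 500)
ops = obp + slg  # on-base plus slugging is literally one addition

# Hypothetical pitcher: 50 walks and 170 hits allowed over 200 innings.
pitcher_whip = whip(50, 170, 200)

print(f"AVG {avg:.3f}  OBP {obp:.3f}  SLG {slg:.3f}  OPS {ops:.3f}  WHIP {pitcher_whip:.2f}")
```

Nothing here goes beyond adding, multiplying by small whole numbers and dividing.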
Analytics, on the other hand, has two outcomes. First, it is meant to describe the data. Second, it is meant to forecast and predict outcomes based on the data. This is generally where the mathematics in sabermetrics ramps up, but not to the point of great difficulty should you want to dive in and learn it.
With all due respect to Bergeson’s grandfather, there is no such law in mathematics as “the law of averages.” It is a common misconception of several theorems. I’m not going to get into the math too deeply in this space, but suffice it to say that when a person says a player “is due” for a hit, that person relies heavily on this misconception of probability.
A simple example would be flipping a fair coin. 50% of the time it will land on heads. 50% of the time it will land on tails. If it lands on heads on the first flip, that outcome has no bearing on future flips of the coin. If I were to flip 10 heads in a row, the odds on my next flip would still be 50/50. There’s a name for the belief that independent events form a pattern: the Gambler’s Fallacy.
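If you would rather test this than take my word for it, here is a minimal simulation sketch. It flips a simulated fair coin many times and looks only at the flips that come right after a run of heads. The function name, the streak length of five and the number of flips are arbitrary choices for illustration, and the coin is Python's pseudo-random generator rather than a real one.

```python
import random


def heads_rate_after_streak(n_flips, streak_len, seed=0):
    """Share of heads among flips that immediately follow `streak_len` or more heads."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    run = 0          # consecutive heads seen just before the current flip
    followups = 0    # flips that came right after a long-enough streak
    heads_after = 0  # how many of those flips landed heads
    for _ in range(n_flips):
        is_heads = rng.random() < 0.5  # simulated fair coin
        if run >= streak_len:
            followups += 1
            heads_after += is_heads
        run = run + 1 if is_heads else 0
    return heads_after / followups if followups else None


if __name__ == "__main__":
    rate = heads_rate_after_streak(n_flips=200_000, streak_len=5)
    print(f"Heads rate right after 5+ straight heads: {rate:.3f}")
```

If the coin were ever “due” for tails, that rate would sit noticeably below one half. With independent flips it should hover near 0.5 no matter how long a streak you condition on.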
Bergeson: “Statistics say pitchers who consistently throw over 100 pitches wear out.” To which I ask, what makes such a nice, round number like 100 magical?
MATH ISN’T THE DAMN POINT
The point of all these newfangled numbers is their application. We use, and have always used, numbers to describe baseball. The numbers are a means to an end. When I read a box score or look at a scorecard or parse through data in Retrosheet files, or sort through the Play Index at Baseball Reference, I am doing so to apply the information in some way. Whether I want to recall a specific event from a game or project this season’s ERA for every pitcher on my fantasy baseball team, I use the numbers only for a useful purpose.
On the field, that purpose is winning baseball games. Winning teams get more butts in seats and hotdogs down gullets and jerseys on backs.
When I understand that a player is not necessarily “due” for a hit, I suddenly have a better understanding of the game that I love so much. Math is merely a tool for me to describe the game, the same way Casey At The Bat describes the game’s emotion through poetry, the same way Don DeLillo described nostalgia through the baseball from The Shot Heard ‘Round The World in Underworld, the same way that the smell of freshly cut grass, the taste of boiled hotdogs, the cheers of fans and the sun beating on my face describe a moment in the ballpark.
Rogers Hornsby: “People ask me what I do in winter when there’s no baseball. I’ll tell you what I do. I stare out the window and wait for spring.”
I argue over the value of the Justin Upton trade. I wait for the Sean Lahman database to be finished sometime in early January. I wait for Marcel projections from Tom Tango. I read and re-read The Pitch That Killed. I stare out the window and I wait for spring.
Math is just another lens through which to understand and obsess over baseball.
In that respect, sabermetrics is not in opposition to “traditional” baseball any more than the Internet is in opposition to great writing. I would like to see these false dichotomies destroyed.
Shoeless Joe Jackson Comes to Iowa
The one constant through all the years, Eric, has been America. Baseball has rolled by like an army of steamrollers. It’s been erased like a blackboard, rebuilt, and erased again. But America has marked the time.
We progress, despite the messiness of progress. That is the story of baseball. That is the story of America. Since baseball was invented we have given women the vote and blacks better opportunities, we have created the designated hitter (for better or worse) and begun the destruction of those awful multi-purpose indoor stadiums. We have been to the moon and begun using instant replay in baseball.
The romance never left the game, Eric. You changed. You grew cynical about the game, as the game and America progressed. Rather than growing with the game, you chose to ignore the fact that today is the best day to be alive as a baseball fan.
Today you have access to every statistic imaginable (if you want them). Any question you would like answered is a heartbeat away. You can watch baseball games on your phone! Whatever you feel like talking about with regard to baseball, there is a group that feels the same way on the Internet. There’s probably a podcast out there, just for you.
I have seen this before. This attitude that everything is going to hell in a handbasket. I’m a journalist, by schooling, but a programmer by trade. I have seen newsrooms writhe and kick and scream in horrible fits during transition. These days may be horrible by comparison to the past to be a newspaper man or woman. These days are wonderful as a news consumer, though. Technology and telling stories are not mutually exclusive things, even though I know my fair share of reporters who ended up in the field precisely because a J-School degree meant doing the minimum math requirements. Things have changed, obviously.
nos·tal·gia, noun - a wistful or excessively sentimental yearning for return to or of some past period or irrecoverable condition.
As journalism is the search for objective knowledge about current events, sabermetrics is the search for objective knowledge about baseball. I struggle to see how, again and again, baseball journalists fail to acknowledge their nostalgic bias. It clouds their judgement.
Perhaps it is a fear that if baseball outcomes can be reduced to numbers, the role of the reporter is thus reduced. Reporters, remember, dominated the sport in its infancy. They were essentially the agents of the time. They wielded an amount of inbred power then that is considered unfathomable in a post-Watergate era of objective journalism, save for Hall of Fame ballots. However, they grow ever smaller in their usefulness to the sport.
If that is the case, then I would think we need to re-evaluate. We, as fans, as sports news consumers, are complex creatures. I want raw data. I want this, too:
The Ruth is mighty and shall prevail. He did yesterday. Babe made two home runs and the Yankees won from the Giants at the Polo Grounds by a score of 4 to 2. This evens up the World’s Series, with one game for each contender.
It was the first game the Yankees won from the Giants since Oct. 10, 1921, and it ended a string of eight successive victories for the latter, with one tie thrown in.
Victory came to the American League champions through a change in tactics. Miller Huggins could hardly fail to have observed Wednesday that terrible things were almost certain to happen to his men if they paused any place along the line from first to home.
In order to prevent blunders in base running he wisely decided to eliminate it. The batter who hits a ball into the stands cannot possibly be caught napping off any base.
-Heywood Broun, in the New York World, October 12, 1923
We call them stories for a reason.