By Father Gabe Costa
The last time I taught a sabermetrics course, I was fortunate to have United States Army officer Second Lieutenant Christopher Anderson auditing my class. Holding a degree in Nuclear Engineering from West Point, Lieutenant Anderson is also an avid baseball fan. He is our guest blogger for this installment of By The Numbers.
Lieutenant Christopher Anderson: At 12:05 pm the Cubs took up their defensive positions against the Phillies at Wrigley Field on a sunny, 81° F Saturday afternoon. Two hours and forty-nine minutes later, with little pomp and circumstance, July 3, 2010, was written into the annals of baseball history for the single-game record the Chicago Cubs set. That Saturday, the Cubbies left an unfathomable seventeen men on base, the most in a 9-inning game since 1919, the earliest year for which data are available.
Major League Baseball requires all official scorekeepers to record the number of men left on base by each team at the end of each half-inning. The total number of men LOB “shall include all runners who get on base by any means and who do not score and are not put out [and]…batter-runner[s] whose batted ball results in another runner being retired for the third out,” according to rule 10.02(g) of the Official Baseball Rules. Now that we have defined the statistic of interest, I would like to point out that all statistics and records cited in this blog entry and in the accompanying files come from www.baseball-reference.com.
The contemporary belief about the team LOB statistic is that it reflects how well (or, more accurately, how badly) a team does at getting runners home—manufacturing runs—and thus how likely they are to win a game; the more runs you score, the better your chances of winning. This argument results from the logical fallacy post hoc ergo propter hoc: “We left men on base that did not score runs, and afterwards we lost the game; therefore, leaving men on base that did not score runs caused us to lose the game.” Many people then take the next step of utilizing the logical fallacy of denying the antecedent, which brings us to the idea that “if we do not leave men on base, we will not lose.”
Yankee manager Joe Girardi, for example, engaged in this doubly-fallacious (not to be confused with “phallic”) thought process after a 4-3 loss to the Royals on May 11th, 2011. He cited the 15 Yankees left on base over 11 batted innings as the main reason for the loss. However, I aver that the real area of concern lies elsewhere: after a 7-inning, 1-run performance from the starter, the reliever came in, walked two of the first three batters, and then gave up a 2-out, game-tying RBI single to center field that sent the game to extra innings.
So far, I have supplied you with some anecdotal evidence and philosophical arguments; now it is time to delve into the numbers. Before analyzing any numbers, I hypothesized that the team LOB statistic does not correlate negatively with winning or with runs scored (abbreviated RF, for “runs for”). In addition, I hypothesized that team LOB correlates positively with both winning and runs scored. I reasoned that since you have to get men on base to score (with the exception of the long ball), the number of men a team leaves on base is indicative of its ability to get on base continually, which will eventually result in cycling runners home, thus scoring runs.
If my hypothesis were to hold, it needed to hold for all teams: good, bad, powerful, weak. Therefore, for my data, I chose four American League teams from 2008: the team with the best winning percentage (WPCT), the team with the worst, a team with a .500 WPCT, and the New York team. This query gave me the Angels, Mariners, Indians, and Yankees, respectively. For each game the team played that year, I compiled win or loss, home or away, runs for, runs against, run differential (RD), team LOB, and innings batted. “Win or loss” and “home or away” were treated as binary: 1 for a win and 0 for a loss; 1 for home and 0 for away. Run differential was determined by subtracting “runs against” from “runs for”. Team LOB, runs for, runs against, and innings batted were manually entered. All four teams combined yielded 486 sets of data.
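The encoding described above can be sketched in a few lines of Python. This is only an illustration, not the spreadsheet actually used for this entry: the GameRow class, the encode_game helper, and the sample game are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class GameRow:
    """One game for one team (hypothetical layout for illustration)."""
    win: int             # 1 = win, 0 = loss
    home: int            # 1 = home, 0 = away
    runs_for: int
    runs_against: int
    run_diff: int        # runs_for minus runs_against
    lob: int             # team men left on base
    innings_batted: int


def encode_game(result, site, runs_for, runs_against, lob, innings_batted):
    """Turn a raw box-score line into the numeric row described in the text."""
    return GameRow(
        win=1 if result == "W" else 0,
        home=1 if site == "home" else 0,
        runs_for=runs_for,
        runs_against=runs_against,
        run_diff=runs_for - runs_against,
        lob=lob,
        innings_batted=innings_batted,
    )


# Invented example game: a 4-3 road loss with 15 men left on base in 11 innings.
row = encode_game("L", "away", runs_for=3, runs_against=4, lob=15, innings_batted=11)
print(row)
```

Coding the two yes/no columns as 1 and 0 is what lets them sit in the same correlation calculation as the run and LOB totals.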
To determine how the statistics I compiled relate to one another, I computed the correlation coefficient for each pair. Correlation coefficients range in value from -1 to 1. A value of -1 indicates a perfect negative correlation (as stat A decreases, stat B increases), 1 indicates a perfect positive correlation (as stat A increases, stat B increases), and 0 indicates no linear correlation. The reader can find the equations here, or install the (free) data analysis add-on to Microsoft Excel and have the program do the work, as I chose to do. The results are as follows:
As one can see, the correlation coefficients are very close to zero for all of the compared statistics. The data appear to say that the number of men a team leaves on base in a game does not correlate with whether the team is home or away, whether it wins or loses, how many runs it scores, or how many runs it wins or loses by. To further validate my analysis, I found, for each of the four teams, how many men it left on base per nine innings batted and what its winning percentage was for 2008. I eyeballed a graph of the data (albeit only four points) to see if I could see a correlation: I saw none! A copy of that graph and all of the data I compiled is located in the attached Excel file.
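For readers who prefer code to a spreadsheet, the two calculations discussed above, the correlation coefficient and men left on base per nine innings batted, might look something like this in Python. The sketch assumes the standard Pearson formula; the function names pearson_r and lob_per_nine, and every number in the example, are invented for illustration and are not taken from the attached data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length columns."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two columns of equal length with at least 2 rows")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a column never varies")
    return cov / math.sqrt(var_x * var_y)


def lob_per_nine(total_lob, total_innings_batted):
    """Season LOB scaled to a nine-inning rate."""
    return 9 * total_lob / total_innings_batted


# Invented six-game sample: team LOB per game and the 1/0 win column.
lob = [7, 5, 9, 6, 8, 4]
win = [1, 0, 0, 1, 1, 0]
print("r(LOB, win) =", round(pearson_r(lob, win), 3))

# Invented season totals: men left on base and innings batted.
print("LOB per nine =", lob_per_nine(1150, 1440))
```

A result near 0 from pearson_r is what "no correlation" means in the paragraph above; scaling LOB to nine innings simply keeps extra-inning games from inflating a team's total.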
At the end of the day, my hypothesis turned out to be half right. Team LOB correlates positively, but only very weakly, with the number of runs a team scores, and not at all with winning or losing. Based on this data, I feel safe in saying that team LOB has no meaningful correlation with whether a team wins or loses or with how many runs it scores.
Managers: please stop using Team LOB as a catchall excuse for losing.
Fans: please stop using Team LOB as a scapegoat for why your team lost.
And to everyone who made it to the end of this blog, I say “Congratulations!” much like then-Cubs manager Lou Piniella must have said to his team that July afternoon. They defeated the Phillies 3-1.