Ryan has a post about optimizing the A’s lineup over on The Pastime, using PECOTA projections and a formula from Cyril Morong over at Beyond the Boxscore.
Ryan didn’t have the programming nerdiness to work through all 362,880 (that is, 9!) lineup permutations. But I happened to be cursed with such geekdom, so I wrote a Perl script to churn out the calculations. I ran it twice, once with Frank Thomas in the lineup, and once with Jay Payton in place of Thomas.
Here are the best and worst lineups. The number is runs/162 games.
Five best lineups with Thomas:
853.45: Bradley Chavez Ellis Thomas Johnson Crosby Swisher Kotsay Kendall
853.44: Bradley Chavez Ellis Thomas Johnson Swisher Crosby Kotsay Kendall
853.13: Bradley Johnson Ellis Thomas Chavez Crosby Swisher Kotsay Kendall
853.12: Bradley Johnson Ellis Thomas Chavez Swisher Crosby Kotsay Kendall
852.90: Ellis Chavez Bradley Thomas Johnson Swisher Crosby Kotsay Kendall
Five best lineups with Payton:
834.91: Bradley Johnson Ellis Chavez Swisher Payton Crosby Kotsay Kendall
834.80: Bradley Johnson Ellis Chavez Crosby Payton Swisher Kotsay Kendall
834.78: Bradley Swisher Ellis Chavez Johnson Payton Crosby Kotsay Kendall
834.63: Bradley Crosby Ellis Chavez Johnson Payton Swisher Kotsay Kendall
834.50: Bradley Chavez Ellis Swisher Johnson Payton Crosby Kotsay Kendall
A few interesting notes:
- This formula insists on batting Kotsay eighth and Kendall ninth. The other players switch around a lot at the top of the list, but that configuration is solid. If there is one conclusion to draw from this exercise, this is it.
- The A’s are about 18.5 runs/year better with Thomas in the lineup than with Payton (853.45 vs. 834.91 for the best lineup of each).
- It likes Bradley leading off and Ellis batting third. That’s probably not going to happen in real life, but the presumed order with Ellis leading off also works pretty well.
- Given that Ellis is probably going to lead off, and Chavez will bat either third, fourth, or fifth, the ideal lineups with that configuration are:
With Thomas:
852.58: Ellis Johnson Bradley Thomas Chavez Crosby Swisher Kotsay Kendall
With Payton:
834.36: Ellis Johnson Bradley Chavez Swisher Payton Crosby Kotsay Kendall
That is evidence that Zachary’s preference for Ellis and Johnson at the top of the order is a good one.
- When Thomas is in the lineup, it tends to like Chavez batting second. When Thomas is out of the lineup, it tends to like Chavez batting cleanup.
- Crosby and Swisher are pretty much interchangeable. Swapping them between any two lineup spots produces almost exactly the same result.
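All of the notes above fall out of one linear formula: each batting slot has its own OBP weight and SLG weight, and a lineup's runs per game is the weighted sum plus a constant. As a minimal sketch (the weights and the Payton-lineup projections are copied from the script at the end of this post; the helper name `runs_per_162` is made up here for illustration), this scores one lineup and shows how little a Crosby/Swisher swap moves the total:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Projected OBP and SLG per player (same numbers as the full script in this post).
my %obp = (Ellis => 0.351, Bradley => 0.355, Chavez => 0.354, Payton => 0.322,
           Johnson => 0.353, Crosby => 0.346, Swisher => 0.347,
           Kotsay => 0.332, Kendall => 0.333);
my %slg = (Ellis => 0.426, Bradley => 0.447, Chavez => 0.479, Payton => 0.432,
           Johnson => 0.462, Crosby => 0.453, Swisher => 0.455,
           Kotsay => 0.414, Kendall => 0.338);

# Per-slot weights (slot 1 first) and constant from the Beyond the Boxscore formula.
my @obpx = (2.997, 2.255, 2.141, 1.670, 2.254, 1.346, 1.528, 1.188, 2.550);
my @slgx = (0.931, 1.263, 0.933, 1.504, 1.146, 1.237, 1.164, 0.825, 0.539);
my $constant = -5.261;

# Hypothetical helper: runs per 162 games for a lineup given as a list of names.
sub runs_per_162 {
    my @lineup = @_;
    my $rpg = $constant;
    for my $slot (0 .. $#lineup) {
        my $name = $lineup[$slot];
        $rpg += $obpx[$slot] * $obp{$name} + $slgx[$slot] * $slg{$name};
    }
    return 162 * $rpg;
}

# Best Payton lineup, then the same lineup with Crosby and Swisher swapped.
my @best    = qw(Bradley Johnson Ellis Chavez Swisher Payton Crosby Kotsay Kendall);
my @swapped = qw(Bradley Johnson Ellis Chavez Crosby Payton Swisher Kotsay Kendall);

printf "%.2f: %s\n", runs_per_162(@best),    "@best";
printf "%.2f: %s\n", runs_per_162(@swapped), "@swapped";
```

The two totals should differ by only about 0.11 runs per 162 games. Note that `printf` rounds while the original script truncates, so the second decimal can be off by 0.01 from the lists above.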
Now for some fun: the worst lineups…
With Thomas:
816.79: Crosby Kotsay Johnson Kendall Swisher Ellis Bradley Chavez Thomas
816.84: Swisher Kotsay Johnson Kendall Crosby Ellis Bradley Chavez Thomas
816.92: Crosby Kotsay Johnson Kendall Swisher Bradley Ellis Chavez Thomas
816.97: Swisher Kotsay Johnson Kendall Crosby Bradley Ellis Chavez Thomas
817.05: Kotsay Ellis Swisher Kendall Crosby Bradley Johnson Chavez Thomas
With Payton:
799.02: Payton Kotsay Swisher Kendall Crosby Ellis Bradley Johnson Chavez
799.11: Payton Kotsay Crosby Kendall Swisher Ellis Bradley Johnson Chavez
799.15: Payton Kotsay Swisher Kendall Crosby Bradley Ellis Johnson Chavez
799.24: Payton Kotsay Crosby Kendall Swisher Bradley Ellis Johnson Chavez
799.59: Payton Kotsay Swisher Kendall Crosby Ellis Bradley Chavez Johnson
The Perl code is below, for those of you with the Unixness for these things…
#!/usr/bin/perl
use Algorithm::Permute;

# put players and their obp/slgs here
my @pname = ('Ellis','Bradley','Chavez','Payton','Johnson','Crosby','Swisher','Kotsay','Kendall');
my @pobp = (.351,.355,.354,.322,.353,.346,.347,.332,.333);
my @pslg = (.426,.447,.479,.432,.462,.453,.455,.414,.338);

# formulae from http://www.beyondtheboxscore.com/story/2006/2/12/133645/296
my @obpx = (2.997,2.255,2.141,1.670,2.254,1.346,1.528,1.188,2.550);
my @slgx = (.931,1.263,.933,1.504,1.146,1.237,1.164,.825,.539);
my $constant = -5.261;

my $slots = 9;
my @array = (0..($slots-1));

Algorithm::Permute::permute {
    my $lineup = "";
    $rpg = $constant;
    for (my $i=0; $i<$slots; $i++) {
        $rpg += ($obpx[$i] * $pobp[$array[$i]]) + ($slgx[$i] * $pslg[$array[$i]]);
        $lineup .= $pname[$array[$i]] . " ";
    }
    print 1.00*(int($rpg*16200)/100) . " " . $lineup . "\n";
} @array;

# run the program from the command line like this:
# ./permute.pl | sort -n >somefilename.txt
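If installing Algorithm::Permute is a hurdle, the same kind of search can be done with a few lines of plain recursion. What follows is a sketch, not the script that produced the numbers above: the helpers `score`, `each_lineup`, and `best_constrained` are names invented here, and it reruns only the Payton case with Ellis pinned to the leadoff spot and Chavez restricted to third, fourth, or fifth.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @pname = ('Ellis','Bradley','Chavez','Payton','Johnson','Crosby','Swisher','Kotsay','Kendall');
my @pobp  = (0.351, 0.355, 0.354, 0.322, 0.353, 0.346, 0.347, 0.332, 0.333);
my @pslg  = (0.426, 0.447, 0.479, 0.432, 0.462, 0.453, 0.455, 0.414, 0.338);
my @obpx  = (2.997, 2.255, 2.141, 1.670, 2.254, 1.346, 1.528, 1.188, 2.550);
my @slgx  = (0.931, 1.263, 0.933, 1.504, 1.146, 1.237, 1.164, 0.825, 0.539);
my $constant = -5.261;

# Runs per 162 games for a lineup given as player indexes, one per slot.
sub score {
    my @order = @_;
    my $rpg = $constant;
    for my $slot (0 .. $#order) {
        $rpg += $obpx[$slot] * $pobp[$order[$slot]]
              + $slgx[$slot] * $pslg[$order[$slot]];
    }
    return 162 * $rpg;
}

# Hypothetical helper: call $visit->(@order) for every ordering that starts
# with @$placed and fills the remaining slots from @$unplaced.
sub each_lineup {
    my ($visit, $placed, $unplaced) = @_;
    if (!@$unplaced) {
        $visit->(@$placed);
        return;
    }
    for my $i (0 .. $#$unplaced) {
        my @rest = @$unplaced;
        my ($next) = splice(@rest, $i, 1);
        each_lineup($visit, [@$placed, $next], \@rest);
    }
}

# Ellis (index 0) leads off; Chavez (index 2) must bat third, fourth, or fifth,
# which is slot index 2, 3, or 4 when counting from zero.
sub best_constrained {
    my ($best_score, @best_order);
    each_lineup(sub {
        my @order = @_;
        my ($chavez_slot) = grep { $order[$_] == 2 } 0 .. $#order;
        return if $chavez_slot < 2 || $chavez_slot > 4;
        my $s = score(@order);
        if (!defined $best_score || $s > $best_score) {
            ($best_score, @best_order) = ($s, @order);
        }
    }, [0], [1 .. 8]);
    return ($best_score, map { $pname[$_] } @best_order);
}

my ($runs, @names) = best_constrained();
printf "%.2f: %s\n", $runs, "@names";
```

Pinning Ellis first means only 8! = 40,320 orderings are visited instead of all 362,880. If the constrained optimum reported earlier in the post holds, this should land on the Ellis/Johnson/Bradley/Chavez lineup at roughly 834.4 runs; passing `[]` and `[0 .. 8]` to `each_lineup` instead would walk every ordering.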
1. If I can get that program to run from the terminal emulation on my Mac, I will officially become a geek won't I?
2. Yes. Especially if you can figure out how to download and install Algorithm::Permute from cpan.org.
3. This formula insists on batting Kotsay eighth and Kendall ninth.
"I'm smarter than any stinkin' formula!"
--Ken Macha
4. Cool.
I would really like to see data involving pitches per plate appearance (P/PA). The OBP and SLG statistics are great starting points.
The comments are great for leading up to a few posts this week.
5. Beyond the Boxscore has a new formula based on DH-only leagues, which for some reason really minimizes the value of the #3 spot in the order. Using those numbers puts Kotsay in the #3 slot most of the time.
That's really weird, so I'm starting to question those numbers. Either that, or we need to have a radically different view of batting orders when there's a DH than we're used to.
6. P/PA would be cool, as would having L/R splits. I don't know of any existing projections with L/R splits, but I suppose you could calculate a L/R Marcel projection.
7. It's pretty easy to see what's going on here. If the model says that the #9 hitter has the smallest effect on run production (which is probably true, but not to the extent that the original version claimed), then you'll certainly want to hide your least productive hitter there. And if it further claims (as in the revised version) that, of the 1-8 slots, slugging matters least for the #3 hitter, then you'll probably want to put Kotsay, with the lowest projected slugging other than Kendall, at #3, especially on a team with such a small range of expected OBPs.
The question is whether to believe the model in the first place. It looks to me like the noise in the data is so great that there isn't much that can be salvaged here. In any case, this seems to be a somewhat perverse way of trying to solve the problem of optimizing a batting order. Simulations are simpler and likely to give more useful results.
The drawback of simulations is that it takes around 100K games with a fixed lineup to get precision on the order of .01 runs/game, so it would be somewhat prohibitive to do it for all possible permutations. But that fact should also tell you why it's so hard to draw any useful conclusions from a few years of historical data.
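As a rough check on the 100K figure, assume a team's runs in a single game have a standard deviation of about 3 (an assumed round number, not something taken from this thread) and that simulated games are independent. The standard error of the simulated mean is then:

```latex
\mathrm{SE} = \frac{\sigma}{\sqrt{N}}
\approx \frac{3}{\sqrt{100{,}000}}
\approx 0.0095 \ \text{runs/game}
\approx 1.5 \ \text{runs per 162 games}
```

At that rate, covering all 9! = 362,880 batting orders would take about 3.6 × 10^10 simulated games, and the best lineups listed in the post are separated by far less than 1.5 runs per season.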
Something like the following might be interesting, though: run a simulation with a typical lineup (based on league averages for each slot), and then vary OBP/SLG in each slot slightly to get a table like Morong's. The coefficients should look considerably less random than what we have here. Then you could apply Ken's script to any actual group of players. There would be some circularity in logic here (it would generate lineups that are optimal, given the constraint that they look something like the lineups that managers actually use), and this can be seen as either a bug (it might miss a better answer) or a feature (it gives answers that have some chance of actually being implemented).
I'm also looking forward to seeing what mgl/tango have to say about this subject in The Book.
8. Simulations are simpler than a 10-line script?
I get what you're saying, Turnstiles. Still, it seems a waste to disregard real game data, and use simulated data instead. Maybe there really is something going on here that we wouldn't capture in simulation. Maybe a hybrid solution would be better?
9. What season stats did you use? Or are you using career stats? I started writing a simulator in C to churn through the permutations that is similar to the one that salb918 over at beyondtheboxscore.com wrote in MATLAB, but when I plugged in 2005 stats for your above lineup (using Payton), my estimated runs per season clock in way lower than what you got. I'm double checking to make sure I didn't make any typos in the stats (I'm using PAs, BBs, hits, 2Bs, 3Bs, HRs, SOs), but I wouldn't have expected to be 100 runs lower than what you got.