Tuesday, January 26, 2016

Ramblings on Projections

FanGraphs today released the 2016 ZiPS forecasts for the Nationals (ZiPS is a well-known projection system used by lots of websites, including FG and ESPN). The ZiPS projections, like most other statistically based projections, are pretty optimistic about the Nationals’ 2016 season. A breakdown of the specific projections will come in future blog posts or podcasts, but today’s release got me thinking about projections on a more general level.

I’m sure a lot of people will look at the projections for 2016 and have a hard time putting any faith in them, given that the same projection systems pegged the Nationals as one of the best teams in 2015, only for Washington to vastly underperform those projections. Why trust the models this time around?

It’s a fair question, but it reflects an incomplete understanding of how these projections work. In a nutshell, every projection system takes a player’s historical performance, applies an aging curve based on similar players, runs thousands of season-long simulations, and spits out a result. How each system calculates the aging curve and which stats it pulls in are its “secret sauce,” but the general concept is the same for each one. The raw output of the model, though, isn’t what gets shown on websites or thrown into headlines as clickbait, and that’s where people get lost.

What the systems spit out isn’t one specific win-loss record but a bell curve of potential outcomes. When the Nationals are projected as an 88-win team in 2016 (according to FG), 88 wins sits at the center of that curve, with roughly half of the simulated seasons landing above it and half below. But the model also ran some simulations that saw them win 100 games or only 75, call it 5% of the time. That doesn’t mean the model is broken; in fact, it’s built this way on purpose. In a single season, a sample size of one, the projection’s most likely outcome may be way off from what actually takes place. Take a larger sample, say the projections for every team in every season over 10 seasons, and the projection systems are shockingly accurate.
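To make the bell-curve idea concrete, here is a toy Monte Carlo sketch in Python. It is not how ZiPS works: it assumes, purely for illustration, that every game is an independent coin flip the team wins with probability 88/162, and the function name `simulate_seasons` and the seed are made up. Even this crude version produces a spread of win totals around 88, and any single exact total, 88 included, turns up in only a small share of the simulated seasons.

```python
import random
import statistics


def simulate_seasons(win_prob: float, games: int = 162,
                     trials: int = 10_000, seed: int = 2016) -> list[int]:
    """Return the win total from each of `trials` simulated seasons.

    Toy assumption: every game is an independent coin flip won with
    probability `win_prob`. Real projection systems model far more.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [sum(rng.random() < win_prob for _ in range(games))
            for _ in range(trials)]


wins = simulate_seasons(88 / 162)
print("average wins:", round(statistics.mean(wins), 1))
print("share with exactly 88 wins:", wins.count(88) / len(wins))
print("share with 75 or fewer:", sum(w <= 75 for w in wins) / len(wins))
print("share with 100 or more:", sum(w >= 100 for w in wins) / len(wins))
```

The average lands near 88, but the tails are real: some simulated seasons fall well short and some overshoot, exactly the behavior described above.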

To simplify things, think of a standard six-sided die. How would you predict each roll? Obviously, each roll has a 1-in-6 chance of coming up “4,” and the same goes for every other number from 1 to 6. A model built to predict rolls would expect that if you rolled that die (assuming it isn’t weighted) 162 times, about 1/6 of the rolls, or 27, would come up 4. It’s basic math and statistics. Of course, in reality, even with regulation dice, the numbers likely wouldn’t work out to exactly 1/6 for every face. Sometimes you get a “hot hand” at the craps table and roll the same number five times in a row. Statistically unlikely, sure, but not impossible.
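A quick way to see the dice point for yourself is to simulate it. The Python sketch below rolls a fair die 162 times (one roll per game of a season, a choice made here for illustration), compares each face’s count to the expected share, and finds the longest run of the same number coming up back to back. The seed is arbitrary; change it and the counts wobble around the expected value rather than hitting it exactly.

```python
import random
from collections import Counter

ROLLS = 162  # one roll per game of a 162-game season
rng = random.Random(7)  # arbitrary fixed seed so the run is repeatable
rolls = [rng.randint(1, 6) for _ in range(ROLLS)]  # randint includes both ends
counts = Counter(rolls)

for face in range(1, 7):
    print(f"{face}: {counts[face]} rolls (expected {ROLLS / 6:.0f})")

# Longest run of the same number coming up back to back
longest = current = 1
for prev, nxt in zip(rolls, rolls[1:]):
    current = current + 1 if nxt == prev else 1
    longest = max(longest, current)
print("longest streak of one number:", longest)
```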

This same concept applies to baseball projections, which are obviously much more complicated than a single roll of a die. Take the 2015 Nationals again. While the most likely outcome from the projections pegged them for around 90 wins, some of the simulations undoubtedly produced the 83 wins the Nats actually achieved, just less often than they produced 90. That doesn’t mean the projections are broken; it just means the season didn’t play out the way the model expected on average.
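The same point can be checked exactly instead of by simulation. Under a toy assumption (every game an independent coin flip, here at a 90-win talent level, i.e. a 90/162 chance of winning each game), a team’s win total follows a binomial distribution, and the chance of finishing with 83 or fewer wins is just a sum of binomial probabilities. `prob_at_most` is a hypothetical helper written for this example. Real projections also carry uncertainty about talent and playing time, which this ignores, so the true spread would only be wider.

```python
from math import comb


def prob_at_most(k: int, games: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(games, p): the chance of k or fewer wins."""
    return sum(comb(games, i) * p**i * (1 - p) ** (games - i)
               for i in range(k + 1))


# Toy assumption: each of 162 games is an independent coin flip at 90-win talent.
p_90_win_talent = 90 / 162
print(f"P(83 or fewer wins): {prob_at_most(83, 162, p_90_win_talent):.3f}")
```

The result is well below one half but far from zero, which is the whole argument: an 83-win season is a less likely outcome for a 90-win-talent team, not an impossible one.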

And there’s the rub. To build a model, you need to make some assumptions: how many games Bryce Harper will play, for example, or how many starts Tanner Roark will make. The model bases these assumptions on its database of historical precedent for both the player himself and similar players in the past. For a team like the 2015 Nationals, the actual outcomes behind these assumptions, specifically the multitude of injuries and the underperformance of players like Ian Desmond, fell closer to the low extreme than to the norm, and thus you end up with a win-loss record closer to the low end of the bell curve than the middle.

So when FanGraphs says the Nationals will win 88 games in 2016, understand that it means they think the team will likely win between 83 and 92 games, with 88 being the most likely result. That’s not a tiny spread: 83 wins means the team is playing golf in the fall, while 92 wins all but guarantees a playoff spot. Does that make the projections worthless? You are, of course, free to think so. I respectfully disagree. What projection systems like ZiPS are trying to do is project the true talent level of a team. It’s a very useful indicator, but it isn’t the be-all and end-all. In fact, Billy Beane appears to run the A’s on exactly this idea. He can’t afford to build a team with the true talent of a 90-win team year in and year out; it’s just too expensive. Instead, he assembles a team with the talent of an 85-win team and hopes for some good fortune to bank a couple of extra wins and end the season in the playoffs.
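Billy Beane’s bet can be put into rough numbers the same way. Keeping a toy assumption that every game is an independent coin flip, the Python sketch below compares how often an 85-win-talent team and a 90-win-talent team reach at least 90 wins. The helper `prob_at_least` and the 90-win threshold are illustrative choices made here, not anything from FanGraphs or the A’s.

```python
from math import comb


def prob_at_least(k: int, games: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(games, p): the chance of k or more wins."""
    return sum(comb(games, i) * p**i * (1 - p) ** (games - i)
               for i in range(k, games + 1))


# Toy assumption: each of 162 games is an independent coin flip.
for talent_wins in (85, 90):
    p = talent_wins / 162
    print(f"{talent_wins}-win talent, chance of 90+ wins: "
          f"{prob_at_least(90, 162, p):.3f}")
```

The 85-win team gets there noticeably less often than the 90-win team, but often enough that paying for the cheaper roster and hoping for some luck is a defensible strategy.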

In this light, the projections for the Nationals in 2015 were actually pretty accurate. Factoring in the games lost to injuries, the disappointing performances from key players, and the games blown by the bullpen, it was clear the team wasn’t playing up to its true talent level, and it would have been foolish to expect it to overcome all these unforeseeable setbacks and reach the preseason projections. Pull out the wins that would have come from a healthy Ryan Zimmerman, Anthony Rendon, etc., and the Nationals did look more like an 83-win team. Had some of those unfortunate events not taken place, it’s totally reasonable to expect the Nationals would have won a handful more games than they did, just as the models predicted.

So in a bubble, the 2016 Nationals look like a playoff-caliber team. The games aren’t played in a bubble, of course; things won’t go as planned, and the team probably won’t win exactly 88 games. That’s why you play the games, as they say. But we now have a baseline talent level to compare against. And while it feels like the momentum of the offseason has shifted away from the Nationals, World Series titles aren’t won in the offseason, as Nationals fans surely know. The talent is there for the 2016 Nationals to make the playoffs; they just have to live up to that potential.
