Few people had a better 2008 than Nate Silver. The 30-year-old numbers whiz fed polling data and election returns into a statistical model he designed, allowing him to call the presidential primaries and the general election with uncanny accuracy and turning his Web site, FiveThirtyEight.com (named for the total number of Electoral College votes), into a must-read for political junkies.
Meanwhile, the system he developed for crunching baseball stats, known as PECOTA, predicted a successful season for the previously horrible Tampa Bay Rays, who went on to make it to the World Series. Silver spoke with Senior Writer Edward Cone about increasing the signal-to-noise ratio in political polling, and the implications of his work for business intelligence.
CIO Insight: A year ago, you were a successful baseball wonk, and now you’re talking politics on national television. How did that happen?
Nate Silver: You have 14 or 15 national survey research firms–all putting out polls in different states–and you have the local pollsters. You have so much information to sort through that someone has to play traffic cop and say this is what we should look at; this is how we can aggregate this information in a way that’s sensible. I felt the same frustration you get with the sports media, with conventional wisdom that doesn’t hold up to scrutiny. That was a lot of it–just wanting to improve the political discourse.
So there was data out there, but you analyzed it in a systematic way.
Silver: We have rules and algorithms set up for dealing with all these different polls. If you get a poll that looks like an outlier, nine times out of 10 you can understand why it came up with the numbers that it did–some weird assumption about turnout, or whatever else. You might have 15 polls that come out and one really weird number that comes out by chance or bad design. Those should be the kind that you disregard, even though they’re interesting and drive the most discussion.
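To make the idea of discounting a stray poll concrete, here is a minimal sketch in Python. It is an illustration only, not Silver's model: the median-and-MAD rule, the function name `robust_poll_average`, the `threshold` value and the poll margins are all assumptions chosen for the example.

```python
from statistics import median


def robust_poll_average(margins, threshold=3.0):
    """Average poll margins after setting aside outliers.

    A poll is treated as an outlier when its distance from the median
    exceeds `threshold` times the median absolute deviation (MAD).
    This rule is an assumption for illustration, not a published method.
    """
    center = median(margins)
    # MAD is a spread estimate that a single odd poll barely moves.
    mad = median(abs(m - center) for m in margins)
    if mad == 0:
        # All polls (or most of them) agree exactly; nothing to filter.
        return center, []
    kept = [m for m in margins if abs(m - center) / mad <= threshold]
    dropped = [m for m in margins if abs(m - center) / mad > threshold]
    return sum(kept) / len(kept), dropped


# Hypothetical margins (candidate A minus candidate B, in points) from 15 polls.
polls = [4, 5, 3, 6, 4, 5, 4, 3, 5, 6, 4, 5, 4, 3, 15]
average, dropped = robust_poll_average(polls)
print(f"average of kept polls: {average:.2f}, set aside: {dropped}")
```

With these made-up numbers, the one 15-point poll is set aside and the remaining 14 are averaged. A real aggregator would more likely down-weight a poll based on its methodology than drop it outright.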
You did your work with pretty limited resources. There was no super-computer in your home office.
Silver: A lot of it is not that complicated. You need a good model, and you need to not take shortcuts. You need to sweat the small stuff. I’ve got a problem-solving approach: My background as an undergrad was in economics, not statistics. I’m not that interested in the numbers for the numbers’ sake, but more in the question: Who’s going to win the presidency? When you are building a model, you have to understand the problem you’re trying to solve.
You didn’t just rely on the numbers. Your team spent time crossing the country to see things for themselves. What can businesses learn from that combination of approaches?
Silver: Sometimes, if you depend purely on the numbers, there’s a kind of inverse selection at work. That applies to things like site selection for a business. A site might look great on paper for retail space, but maybe there’s a porn store across the street, or a murder happened there two years ago. There are negative outliers you have to be very careful of.
Getting this right takes a combination of things. You want to be really meticulous, but also keep in mind the big picture: the problem you’re trying to solve. You can’t outsource all this stuff to India.
You could have a million Ph.D.s working on it–people who are very smart–but if they don’t understand the landscape of American politics, or what a baseball player’s career looks like, they won’t get it right. There’s an art to it, as well as a science.
There are a lot of parallels between this kind of political analysis and other areas. I went to a conference about national security, and I could see parallels between predictions in baseball and predictions based on picking up terrorist chatter. And there are overlaps with the way a company like McDonald’s understands its reputation in different countries.
I’m trying to define this science of electoral sabermetrics [statistical analysis used in baseball]. I’m looking at the art and science of forecasting hurricanes, air traffic control, people who predict the future in different ways. What are the commonalities? It’s all driven by data. The art is not in collecting information, but in being able to sort through it and find meaning within all that noise.