Social Media Can't Predict Super Tuesday

A few weeks back, I wrote a post denouncing the idea of predicting the Super Bowl using social data. I had some fun pointing out the questionable research practices behind using consumer opinion to "predict" the outcome of a sporting event. One key point I made was that in sports, public opinion has no influence on the outcome. But what about using consumer opinions to predict a political election? This can work, right?

USA Today has an article running in parallel with Super Tuesday, aptly asking that same question: Can Social Media Predict Election Outcomes? In case my post's title wasn't enough of a spoiler: if you read that piece, you'll find a few quotes from me speaking out against the concept. Although predicting an election using online opinions is a much more plausible concept than predicting a football game, it's not going to work. Here's why:

  • Sample bias. I'll be the first to argue that social media is everywhere. If you're online, you're social. But it's not quite that simple. For years, we've outlined the idea of Social Technographics® - the concept of consumer segments based on social media behavior. Some people observe content; others create it. In fact, only about a quarter of online adults create it. So yes, nearly all audiences participate in social media, but not all of them create the content we'd analyze. And it's actually a bit worse than that: the people who post on social media skew in the opposite direction from the people who vote. People under 18 can't vote, but they sure can tweet. Conversely, recent election data shows that conservative voters skew older - but that segment doesn't create the same volume of social content. We can listen to social media for voter conversation, but it's not a representative sample of the electorate.
  • The single-vote problem. Although many blog posts today are counting candidate mentions to gauge popularity, a mention does not equal a vote. One person blogging and tweeting regularly about their favorite candidate gets the same number of votes as the rest of us: one. But each of that person's posts is counted as a separate mention, while each vote counts only once. And yes, popular social media outlets can reach many, many voters, but measuring that reach is no different from tracking traditional news coverage of candidates. As one source points out, counting the number of mentions will only tell us how popular Ron Paul is online. While his supporters can tweet early and tweet often, they can still only vote once.
  • The social sentiment issue. Although it's easy to count the number of mentions of a candidate, many predictions use sentiment analysis for an added dimension of predictive data. Identifying whether a tweet or blog post is positive or negative toward a candidate is helpful, but it's not the whole story. The problem is that while we go online to say how much we love something or how much we hate something, we rarely go online to say something was OK. Voting works very differently. Today on Super Tuesday, primary voters have a choice, and although some may not feel passionate about their decision, their vote counts the same as that of the person running a candidate's Twitter feed. A positive mention - or negative mention - is not indicative of the voting masses.
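
To make the single-vote problem concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the posts, the author names, the candidates "A", "B" and "C", and the helper functions `mention_share` and `author_share` are all made up, not taken from any real data set or vendor tool. It compares each candidate's share of raw mentions with the share of distinct authors who mostly talk about that candidate.

```python
from collections import Counter, defaultdict

# Hypothetical (author, candidate) pairs, invented purely for illustration.
posts = [
    ("fan_1", "A"), ("fan_1", "A"), ("fan_1", "A"), ("fan_1", "A"),
    ("fan_2", "A"), ("fan_2", "A"),
    ("voter_3", "B"),
    ("voter_4", "B"),
    ("voter_5", "B"),
    ("voter_6", "C"),
]

def mention_share(posts):
    """Share of all mentions per candidate: every post counts."""
    counts = Counter(candidate for _, candidate in posts)
    total = sum(counts.values())
    return {candidate: n / total for candidate, n in counts.items()}

def author_share(posts):
    """Share of distinct authors per candidate: every author counts once."""
    per_author = defaultdict(Counter)
    for author, candidate in posts:
        per_author[author][candidate] += 1
    # Assign each author to the candidate they mention most often.
    counts = Counter(c.most_common(1)[0][0] for c in per_author.values())
    total = sum(counts.values())
    return {candidate: n / total for candidate, n in counts.items()}

by_mentions = mention_share(posts)
by_authors = author_share(posts)
print("Leader by mentions:", max(by_mentions, key=by_mentions.get))
print("Leader by authors: ", max(by_authors, key=by_authors.get))
```

In this toy data, two prolific fans put candidate A ahead on mentions, while candidate B leads once each author is counted only once. Even the per-author figure is only a rough proxy: it does nothing about the sample bias described above, and an author is still not necessarily a voter.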

This brings us back to my original argument about social data: It's only as valuable as the other data you combine it with. Integrating polling data with social media numbers could show us some interesting outcomes. But alone, social data only tells us what's going on online. So as the Super Tuesday results start rolling in (and some sources start to claim their social predictions as correct), let's remember that correlation is not causation.

So what can you do about this? First, if you're going to use social data, focus on its relationship to other key data. Learn about its relationship to your business metrics. Second, if you're a social data vendor, I challenge you to see what better insights you can find with social data. Prove the value of social media as a data source, not just the fun things we can learn by counting keywords. And most importantly for US readers, when November rolls around: vote.