Intro To Predictive Analytics Reading List

Predictive Analytics Is Red Hot

Why? What organization couldn’t benefit from making better decisions? Just ask the Obama campaign, which used sophisticated uplift modeling to target and influence swing voters. Or telecom firms that use predictive analytics to help prevent customer churn. Or police departments that use it to reduce crime. The list goes on and on and on. Virtually every organization could benefit from predictive analytics.

Don’t confuse traditional business intelligence (BI) with predictive analytics. BI is about reports, dashboards, and advanced visualizations (which are still essential to every organization). Predictive is different. Predictive analytics uses machine learning algorithms on large and small data sets alike to predict outcomes. But predictive is not about absolutes; it doesn’t guarantee an outcome. Rather, it’s about probabilities. For example, there is a 76% chance that this person will click on this display ad. Or there is a 63% chance that this customer will buy at a certain price. Or there is an 89% chance that this part will fail.

Good stuff, but it’s hard to understand and harder to do. It’s worth it, though: Organizations that employ predictive analytics can dramatically reduce risk, disrupt competitors, and save tons of dough. Many are doing it now. More want to.
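To make the "probabilities, not absolutes" point concrete, here is a minimal sketch in Python. It assumes a tiny made-up data set (prior ad views per visitor and whether each visitor clicked) and fits a one-variable logistic regression by gradient descent. The data, the function names, and the choice of logistic regression are illustrative assumptions, not a method taken from any of the books or vendors mentioned here.

```python
import math


def sigmoid(z: float) -> float:
    """Map any real number to a value strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.1, epochs=2000):
    """Fit p(y=1 | x) = sigmoid(w*x + b) by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y  # prediction error for one example
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, x):
    """Return the modeled probability of a click, not a yes/no answer."""
    return sigmoid(w * x + b)


# Hypothetical toy data: prior ad views per visitor, and whether they clicked.
views = [0, 1, 1, 2, 3, 3, 4, 5, 6, 7]
clicked = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

w, b = fit_logistic(views, clicked)
for v in (1, 4, 7):
    print(f"{v} prior views -> {predict_proba(w, b, v):.0%} chance of a click")
```

The model never says "this person will click." It returns a number between 0 and 1 for each visitor, and the organization decides what to do with it, for example which visitors are worth showing the ad to.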

Few understand the what, why, and how of predictive analytics. Here’s a short, ordered reading list designed to get you up to speed super fast:

  • The Signal And The Noise: Why So Many Predictions Fail — but Some Don’t by Nate Silver. Nate Silver built an innovative system for predicting baseball performance and predicted the 2008 and 2012 elections within a hair’s breadth — all by the time he was thirty. The New York Times now publishes FiveThirtyEight.com, where Silver is one of the nation’s most influential political forecasters. Why read: This book will simultaneously inspire your “predictive” imagination and ground you in the realities of predictive analytics.
  • Predictive Analytics: The Power To Predict Who Will Click, Buy, Lie, or Die by Eric Siegel. Former Columbia University professor and Predictive Analytics World founder Eric Siegel has written a very accessible book on how predictive analytics works. It’s chock-full of dozens of real-world examples, such as how Chase Bank predicted mortgage risk (before the recession), IBM Watson won Jeopardy!, and Hewlett-Packard predicted employee flight risk. Why read: Predictive Analytics is a perfectly paced explanation of how predictive analytics works and a repository of dozens of real examples across many use cases.
  • Uncontrolled: The Surprising Payoff of Trial-and-Error for Business, Politics, and Society by Jim Manzi. Predictive analytics is not magic; it’s science. Jim Manzi reminds us of the power of the scientific method and reveals the shocking truth that many huge societal and business decisions are made based on misinterpreted data and statistics. Why read: Controlled experimentation amplifies the results of predictive analytics and helps avert the risk of inaccurate predictive models.
  • Data Mining: Practical Machine Learning Tools and Techniques by Ian H. Witten, Eibe Frank, and Mark Hall. If you write code or have a computer science background, you’ll probably want to know the gory details of how predictive analytics works. Why read: Data Mining provides a thorough grounding in machine learning concepts as well as practical advice on applying machine learning tools and techniques in real-world data mining situations. 
  • The Forrester Wave: Big Data Predictive Analytics Solutions, Q1 2013 (Forrester client access only). Forrester assessed the state of the big data predictive analytics market. To see how the vendors stack up against each other, we evaluated the strengths and weaknesses of top big data predictive analytics solutions vendors, including Angoss Software, IBM, KXEN, Oracle, Revolution Analytics, Salford Systems, SAP, SAS, StatSoft, and Tibco Software. Forrester expects the market for big data predictive analytics solutions to be vibrant, highly competitive, and flush with new entrants over the next three years. Why read: These vendor solutions can jump-start your predictive analytics program.

Happy reading, and please recommend some other good predictive analytics reads.

Comments

Great list!

Mike -

Great book recommendations!

Here are a few others that might be of interest to your followers:

Ensemble Methods in Data Mining: Improving Accuracy Through Combining Predictions by Giovanni Seni, John Elder and Robert Grossman ( http://www.amazon.com/Ensemble-Methods-Data-Mining-Predictions/dp/160845... ). This is a very approachable book about this cutting edge technique that produces much higher lift in models. While there's code in it, anyone can read and understand.

Big Data Big Analytics: Emerging Business Intelligence and Analytic Trends for Today's Businesses by Mike Minelli, Michele Chambers, Ambiga Dhiraj (http://www.bigdatabiganalytics.com). This book helps business and IT managers/executives understand the value of big data through practical case studies along with consumable descriptions of the enabling technology. Full disclosure: I'm a co-author, but many folks, both non-technical and highly technical, have told me how much they enjoy the book, so I'd be remiss if I didn't include it among the additional recommendations. Plus my royalties on the book go to a not-for-profit that helps eradicate slavery. So you could spend your money less wisely.

Taming the Big Data Tidal Wave by Bill Franks (http://www.amazon.com/Taming-Data-Tidal-Wave-Opportunities/dp/1118208781). This book is full of case studies and provides a great overview for business managers.

Looking forward to your new post!

Michele Chambers
Chief Strategy Officer
Revolution Analytics
@mcAnalytics

Big Data Illusions

Mike, good post with interesting reading. Let's make clear that we talk mostly about analyzing human behavior.

The problem is the stupid title 'Predictive Analytics'. There is nothing that can be predicted. We can discover the statistically relevant distribution of things that will also be prevalent in future data patterns. It does not and cannot predict what an individual does. Therefore PA creates as many rights and wrongs for individual behavior as GUESSING!

But the big problem is that this is not understood. Data are taken as 'fact'. Mathematical rigor does not turn fiction into fact. It turns numbers into other numbers with no relevance to reality. It obfuscates the causal information that may be in the data in the small.

Over a hundred years ago Gabriel Tarde http://en.wikipedia.org/wiki/Gabriel_Tarde wrote 'Always the same mistake: to believe that to see the logical pattern of social facts, you must extract yourself from the details, go upward until you embrace vast landscapes panoramically.'

Data collection creates a collection bias in areas where data are collectible and where not. That information is however not part of the data set. The context of the data is more important than the data in itself. Predictive statistical processing destroys that context. What is important is 'SMALL DATA' that we can analyze to understand the why.

Tarde was certain that the small psychological interactions between humans are the key element, meaning rather the WHY and not the HOW MANY. The decision-making around imitation and innovation were in his mind the key elements.

The only mathematical model that we can use to analyze that is PATTERN MATCHING, which is not possible and also not necessary in large data sets that are collected arbitrarily. You can't mine data unless you know what you want to mine for before you collect, and then you need to understand what is collected and how. People do not drill for oil just anywhere but they look for certain geological patterns. And that does not even involve people.

The totally irrational thing is that people trust 'BIG DATA' more than 'SMALL DATA' because of our own decision bias (Kahneman & Tversky) that information that is more often prevalent is also more relevant, a bias heavily used in repetitive advertising.

But there you go, we think we can understand big data without understanding humans ...

Regards, Max

Big Data Illusions

I also have a blog post on the subject to offer:

Naive Intervention – Part 3: Illusions of Predictability in Investment Theory and BPM

http://isismjpucher.wordpress.com/2013/01/25/naive-intervention-part-3-i...

One more book, and what illusions?

Nice list. I'll suggest another book aimed at the computer science / data science types: The Elements of Statistical Learning, by Trevor Hastie, Robert Tibshirani, and Jerome Friedman. See more at: http://www-stat.stanford.edu/~tibs/ElemStatLearn/.

This book is comparable to Data Mining: Practical Machine Learning Tools and Techniques, though it goes into more depth overall and it's not explicitly tied to a particular tool like WEKA. R packages are available for many of the algorithms, though.

As for Mr. Pucher, I have to respectfully disagree. At my company (Causata) I see successful predictions in statistically significant quantities every week. These come from useful models that produce measurable ROI and testable hypotheses for further exploration and learning.

Mr. Pucher: if you're waiting for a model to make perfect predictions around human social interactions, then, by all means, pull up a chair! You're going to be waiting a while...

The rest of us will focus on models that are, to paraphrase George E. P. Box, wrong but useful: http://en.wikiquote.org/wiki/George_E._P._Box.

Justin Hemann
www.causata.com

Statistical Learning and Predictive Analytics

Justin, thanks for the comment. I am certainly not waiting for anything in the area of prediction as I have clearly stated.

I have spent a lot of time with statistical learning methods and yes, they are useful in fairly closed systems and obviously in lab environments. They let us understand probabilities and some results seem to be statistically significant.

The main problem is after all the accuracy of the causalities in the model, the relevance of the data collected, the ability to shape execution in regards to the model and the ability to separate the self-fulfilling prophecy from the real results. In most interesting areas there is no option to do double blind tests to see if the model can stand on its own.

We are using a statistical learning engine ourselves but I would not be so arrogant and call that predictions. It simply observes and identifies recurring patterns. That alone is interesting enough.

All I am asking for is some humble acceptance that most of nature's inner workings are hidden to us. As Nassim Taleb wrote:
Mathematical rigor does not turn fiction into fact.

Predicting is always difficult - especially about the future!

Mike,
Good List. I occasionally refer to your writings in my blog.

Yep, All The President's Data Scientists did make a difference.

"Analytics at Work: Smarter Decisions, Better Results" By Thomas Davenport is another relevant book.

I look at it as a continuum - Connected, Descriptive, Reactive, Predictive & Adaptive. Current analytics systems are at best reactive and mostly descriptive, i.e., they tell us how things are.

And yes, models are always a representation of the "real world" but if they help organizations to be adaptive, they have served their purpose.

Cheers

Re - Reading List

Thank you all for the recommendations, very useful to one such as me who is trying to broaden his understanding of the topic.

-dave