Last year I published a reasonably well-received research document on Hadoop infrastructure, “Building the Foundations for Customer Insight: Hadoop Infrastructure Architecture”. Now, less than a year later, it’s looking dated, not so much because it was wrong for traditional Hadoop (and yes, it does seem funny to use a word like “traditional” for a technology that is itself still rapidly evolving and has been in mainstream use for only a handful of years), but because the universe of analytics technologies and tools has been evolving at light speed.
If your analytics are anchored by Hadoop and its underlying MapReduce processing, then the mainstream architecture described in the document (clusters of servers, each with its own compute and storage) may still be appropriate. On the other hand, if, like many enterprises, you are adding analysis tools such as NoSQL databases, SQL-on-Hadoop engines (Impala, Stinger, Vertica), and particularly Spark, an in-memory analytics technology well suited to real-time and streaming data, you may need to reassess the supporting infrastructure so that it can continue to support Hadoop while also catering to the differing access patterns of these other tool sets. This need to rethink the underlying analytics plumbing was brought home by HP’s recent demonstration of a reference architecture for analytics, publicly referred to as the HP Big Data Reference Architecture.
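For readers less familiar with the batch model that classic Hadoop anchors, the MapReduce pattern can be sketched in a few lines of plain Python. This is a toy illustration of the map, shuffle, and reduce phases on a word count, not Hadoop’s actual API:

```python
from collections import defaultdict

def map_phase(documents):
    # Map: emit a (word, 1) pair for every word in every input record
    for doc in documents:
        for word in doc.split():
            yield (word.lower(), 1)

def shuffle(pairs):
    # Shuffle: group all emitted values by key, as the framework
    # does between the map and reduce phases
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    # Reduce: aggregate the grouped values for each key
    return {word: sum(counts) for word, counts in grouped.items()}

docs = ["big data needs big infrastructure", "data tools evolve"]
counts = reduce_phase(shuffle(map_phase(docs)))
```

On a real cluster, the map and reduce tasks run in parallel across nodes and the shuffle moves data over the network; the colocated compute-and-storage architecture exists precisely to keep map tasks close to the data they read.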
At Cisco Live 2014 in San Francisco last week, we heard about plenty of updates, extensions, and new acquisitions to expand the business. The major technologies highlighted were InterCloud, Application Centric Infrastructure (ACI), and the Internet of Everything (IoE). Among these new offerings, I’ll admit that Cisco’s extended big data and analytics capabilities excited me the most. Why? Because its data virtualization techniques can help customers easily analyze large volumes of data, no matter where it physically resides; its enhanced video analytics technology could improve the customer experience when checking out in retail stores or waiting for a train; and its IoE analytics and digital intelligence can increase customer engagement.
Data virtualization supports big data analytics. End user organizations realize the importance of making decisions quickly and carefully; to do this, they plan to centralize data from different branch offices and departments. But consolidating data that resides in multiple systems and global locations, or that is locked away in spreadsheets, is expensive. For example, telecom operators in China have hundreds of millions of subscribers and need to consolidate and analyze this customer data, but it resides in 31 provincial companies. Consolidating it would be a huge and expensive project; data virtualization technology can help solve this problem. Customers could consider adding Cisco to their data virtualization vendor shortlist, especially given Cisco’s acquisition of Composite Software last July.
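The core idea behind data virtualization, one logical query spanning several physical stores without first consolidating them, can be sketched with SQLite’s ATTACH feature as a stand-in for a real virtualization layer. The “provincial” databases and table names here are hypothetical illustrations, not any vendor’s schema:

```python
import sqlite3

# Two "provincial" subscriber stores stay where they are; a single
# connection attaches both and queries them as one logical database.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS beijing")
conn.execute("ATTACH DATABASE ':memory:' AS shanghai")

conn.execute("CREATE TABLE beijing.subscribers (id INTEGER, plan TEXT)")
conn.execute("CREATE TABLE shanghai.subscribers (id INTEGER, plan TEXT)")
conn.executemany("INSERT INTO beijing.subscribers VALUES (?, ?)",
                 [(1, "prepaid"), (2, "postpaid")])
conn.executemany("INSERT INTO shanghai.subscribers VALUES (?, ?)",
                 [(3, "prepaid")])

# One virtual query over both physical stores, with no consolidation step
total = conn.execute("""
    SELECT COUNT(*) FROM (
        SELECT id FROM beijing.subscribers
        UNION ALL
        SELECT id FROM shanghai.subscribers
    )
""").fetchone()[0]
```

An enterprise data virtualization product such as Composite’s applies the same idea at scale: it presents one logical schema and pushes query work down to the systems where the data already lives.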