One reason why only a portion of enterprise and external data (about a third of structured and a quarter of unstructured data) is available for insights is the restrictive architecture of SQL databases. In SQL databases, data and metadata (data models, aka schemas) are tightly bound and inseparable (aka early binding, or schema on write). Changing the model requires, at best, rebuilding an index or an aggregate and, at worst, reloading entire columns and tables. Therefore, many analysts start their work from data sets based on these tightly bound models, into which DBAs and data architects have already built business requirements that may be outdated or incomplete. As a result, the data delivered to end users already contains inherent biases that are opaque to the user and can strongly influence their analysis. As part of the natural evolution of Business Intelligence (BI) platforms, data exploration now addresses this challenge. How? BI pros can now take advantage of ALL the raw data available in their enterprises by:
Last year I published a reasonably well-received research document on Hadoop infrastructure, “Building the Foundations for Customer Insight: Hadoop Infrastructure Architecture”. Now, less than a year later, it’s looking obsolete. That is not so much because it was wrong for traditional Hadoop (and yes, it does seem funny to use a word like “traditional” to describe a technology that is itself still rapidly evolving and has only been in mainstream use for a handful of years), but because the universe of analytics technology and tools has been evolving at light speed.
If your analytics are anchored by Hadoop and its underlying MapReduce processing, then the mainstream architecture described in the document, clusters of servers that each have their own compute and storage, may still be appropriate. On the other hand, if, like many enterprises, you are adding other analysis tools, such as NoSQL databases, SQL on Hadoop (Impala, Stinger, Vertica) and particularly Spark, an in-memory analytics technology that is well suited to real-time and streaming data, you may need to begin reassessing the supporting infrastructure. The goal is to build something that can continue to support Hadoop while also catering to the differing access patterns of these other tool sets. This need to rethink the underlying analytics plumbing was brought home by HP’s recent demonstration of a reference architecture for analytics, publicly referred to as the HP Big Data Reference Architecture.
Ten years ago, open source software (OSS) was more like a toy for independent software vendors (ISVs) in China: Only the geeks in R&D played around with it. However, the software industry has been developing quickly in China throughout the past decade, and technology trends such as service-oriented architecture (SOA), business process management (BPM), cloud computing, the mobile Internet, and big data are driving much broader adoption of OSS.
OSS has become a widely used element of firms’ enterprise architecture. For front-end application architecture on the client side, various open source frameworks, such as jQuery and ExtJS, have been incorporated into many ISVs’ front-end frameworks. On the server side, OSS like Node.js is becoming popular among ISVs in China for its high Web throughput. From an infrastructure and information architecture perspective, open source offerings like OpenStack, CloudStack, and Eucalyptus have been piloted by major telecom carriers, including China Telecom and China Unicom, as well as by information and communication solution providers like Huawei and IT service providers like CIeNET. To round this out, many startup companies are developing solutions based on MongoDB, an open source NoSQL database.
Familiarity with OSS is becoming a necessary qualification for software developers and product strategy professionals. Because OSS is so widely used by both vendors and end users, hands-on experience with and extensive knowledge of OSS are becoming necessary qualifications not only for software engineers but also for product strategy professionals, who need them to establish appropriate product road maps and support their business initiatives.