Do you ever feel like you’re facing a moving target? Whether it’s meeting the latest customer requirements, improving operations, retaining your best employees, or pricing your products, the context in which you are doing business is increasingly dynamic. And so are the tools you need to better understand that context. Everyone is talking about the promise of big data and advanced analytics, but we all know that companies struggle to reach the Holy Grail.
Data and analytics tools and the skills required to use them are changing faster than ever. Technologies that were university research projects just last year are now part of a wide range of products and services. How can firms keep up with the accelerated pace of innovation? Alas, many cannot. According to Forrester's Q3 2015 Global State Of Strategic Planning, Enterprise Architecture, And PMO Online Survey, 73% of companies understand the business value of data and aspire to be data-driven but just 29% confirm that they are actually turning data into action. Many firms report having mature data management, governance, and analytics practices, but yesterday's skills are not necessarily what they will need tomorrow — or even today.
The same goes for data sources. We all know that using external data sources enhances the insights from our business intelligence. But which data should you use, and where do you get it?
With the incredible popularity of big data and Hadoop, every Business Intelligence (BI) vendor also wants to be known as a "BI on Hadoop" vendor. But what they can really do is limited to a) querying HDFS data organized in Hive tables using HiveQL or b) ingesting any flat file into memory and analyzing the data there. Basically, to most BI vendors Hadoop is just another data source. Let's now see what qualifies a BI vendor as a "Native Hadoop BI Platform". If we assume that all BI platforms must have data extraction/integration, persistence, analytics, and visualization layers, then "Native Hadoop/Spark BI Platforms" should be able to do the following (ok, yes, I just had to add Spark):
Use Hadoop/Spark as the primary processing platform for MOST of the aforementioned functionality. The only exception is the visualization layer, which is not what Hadoop/Spark do.
Use distributed processing frameworks natively, such as
Generation of MapReduce and/or Spark jobs
Management of distributed processing framework jobs by YARN, etc.
Note: generating Hive or SparkSQL queries does not qualify
Have declarative work done in the product’s main user interface interpreted and executed directly on Hadoop/Spark, not via a "pass through" mode.
Natively support Apache Sentry and Apache Ranger security
I am kicking off a research stream which will result in the "Text Analytics Roles & Responsibilities" doc. Before I finalize an RFI to our clients to see whom they employ for these projects and applications (and how, when, and where), I'd like to explore what the actual roles and responsibilities are. So far we've come up with the following roles and their respective responsibilities:
Business owner. The ultimate recipient of text analytics process results. So far I have:
Customer intelligence analyst
Customer service/call center analyst
Competitive intelligence analyst
Product R&D analyst
Linguist/Data Scientist. Builds language and statistical rules for text mining (or modifies these from an off-the-shelf-product). Works with business owners to
Create "golden copies" of documents/content which will be used as base for text analytics
Works with data stewards and business owners to define corporate taxonomies and lexicon
Data Steward. Owns corporate lexicon and taxonomies
Architect. Owns big data strategy and architecture (including data hubs, data warehouses, BI, etc.) where unstructured data is one of the components
Developer/integrator. Develops custom-built text analytics apps or embeds text analytics functionality into other applications (ERP, CRM, BI, etc.)
You've done all the right things by following your enterprise vendor selection methodology. You created an RFI and sent it out to all of the vendors on your "approved" list. You then filtered out the responses based on your requirements, and sent out a detailed RFP. You created a detailed scoring methodology, reviewed the proposals, listened to the in-person presentations, and filtered out everyone but the top respondents. But you still ended up with more than one. What do you do?
If you shortlisted two or more market leaders (see Forrester's latest evaluation), I would not agonize over who has better methodologies, reference architectures, training, project execution and risk management, etc. They all have top-of-the-line capabilities in all of the above. Rather, I'd concentrate on the following specifics:
A vendor that proposed specific named individuals for the project, whose resumes you reviewed and liked, gets an edge over a vendor that proposed only general roles to be staffed at project kickoff.
Not very long ago, it would have been almost inconceivable to consider a new large-scale data analysis project in which the open source Apache Hadoop did not play a pivotal role.
Every Hadoop blog post needs a picture of an elephant. (Source: Paul Miller)
Then, as so often happens, the gushing enthusiasm became more nuanced. Hadoop, some began (wrongly) to mutter, was "just about MapReduce." Hadoop, others (not always correctly) suggested, was "slow."
Then newer tools came along. Hadoop, a growing cacophony (inaccurately) trumpeted, was "not as good as Spark."
But, in the real world, Hadoop continues to be great at what it's good at. It's just not good at everything people tried throwing in its direction. We really shouldn't be surprised by this. And yet, it seems, so many of us are.
For CIOs asked to drive new programmes of work in which big data plays a part (and few are not), the competing claims in this space are both unhelpful and confusing. Hadoop and Spark are not, despite some suggestions, directly equivalent. In many cases, asking "Hadoop or Spark" is simply the wrong question.
Predictive analytics has become the key to helping businesses — especially those in the highly dynamic Chinese market — create differentiated, individualized customer experiences and make better decisions. Enterprise architecture professionals must take a customer-oriented approach to developing their predictive analytics strategy and architecture.
I’ve recently published two reports focusing on how to architect predictive analytics capability. These reports analyze the trends around predictive analytics adoption in China and discuss four key areas that EA pros must focus on to accelerate digital transformation. They also show EA pros how to unleash the power of digital business by analyzing the predictive analytics practices of visionary Chinese firms. Some of the key takeaways:
Predictive analytics must cover the full customer life cycle and leverage business insights. Organizations require predictable insights into customer behaviors and business operations. You must implement predictive analytics solutions and deliver value to customers throughout their life cycle to differentiate your customer experience and sustain business growth. You should also realize the importance of business stakeholders and define effective mechanisms for translating their business knowledge into predictive algorithm inputs to optimize predictive models faster and generate deeper customer insights.
Instinctively we know that it is not just about collecting the data. Big and bigger doesn’t necessarily make you smart and smarter. It just makes you one of those pack rats that has piles of stuff in all corners of the house. Yes, it might be very well organized and could have a potential use that makes it worth keeping. But will you ever take it out and use it? Will you ever really benefit from what you’ve so painstakingly collected? Likely not.
Don’t be a data pack rat. This is the year to turn your data into actions and positive business outcomes.
In 2016, the energy around data-driven investments will continue to elevate the importance of data and create incremental improvement in business performance for many but some serious digital disruption for others. Here are a few of our data predictions for 2016.
Three of four architects strive to make their firms data driven. But well-meaning technology managers deal with only part of the problem: how to use technology to glean deeper, faster insight from more data, and more cheaply. Consider that only 29% of architects say their firms are good at connecting analytics results to business outcomes. This is a huge gap! The problem is a ‘data driven’ mentality that never fights its way out of technology and toward what firms care about: outcomes.
In 2016, customer-obsessed leaders will leapfrog their competition, and we will see a shift as firms seek to grow revenue and transform customer experiences. Insight will become a key competitive weapon, as firms move beyond big data and solve problems with data-driven thinking.
Shift #1 - Data and analytics energy will continue to drive incremental improvement
In 2016, the energy around data-driven investments will continue to elevate the importance of data and create incremental improvement in business performance. Forrester predicts:
Chief data officers will gain power, prestige, and presence... for now. But the long-term viability of the role is unclear. Certain types of businesses, like digital natives, won’t benefit from appointing a CDO.
Machine learning will reduce the insight killer: time. Machine learning will replace manual data wrangling and data governance dirty work, and the time this frees up will accelerate data strategies.
You can't bring up semantics without someone inserting an apology for the geekiness of the discussion. If you're a data person like me, geek away! But for everyone else, it's a topic best left alone. Well, like every geek, the semantic geeks now have their day — and may just rule the data world.
It begins with a seemingly innocent set of questions:
"Is there a better way to master my data?"
"Is there a better way to understand the data I have?"
"Is there a better way to bring data and content together?"
"Is there a better way to personalize data and insight to be relevant?"
Semantics discussions today are born out of the data chaos that our traditional data management and governance capabilities are struggling under. They're born out of the fact that even with the best big data technology and analytics being adopted, business stakeholder satisfaction with analytics has decreased by 21% from 2014 to 2015, according to Forrester's Global Business Technographics® Data And Analytics Survey, 2015. Innovative data architects and vendors realize that semantics is the key to bringing context and meaning to our information so we can extract those much-needed business insights at scale and, more importantly, in personalized form.
I recently attended IBM BusinessConnect 2015 in Germany. I had great discussions regarding industrial Internet of Things (IoT) and Industrie 4.0 solutions as well as digital transformation in the B2B segment. One issue that particularly caught my attention: edge computing in the context of the mobile IoT.
Mobility in the IoT context raises the question of when to use a central computing approach and when to use edge computing. The CIO must decide whether solution intelligence should primarily reside in a central location or at the edge of the network and therefore closer to (or even inside) mobile IoT devices like cars, smart watches, or smart meters. At least three factors should guide this decision:
Data transmission costs. The costs of data transmission can quickly undermine any mobile IoT business case. For instance, aircraft engine sensors collect massive amounts of data during a flight but send only a small fraction of that data in real time via satellite connectivity to a central data monitoring center while the plane is in the air. All other data is sent via Wi-Fi or traditional mobile broadband connectivity like UMTS or LTE once the plane is on the ground.
Mobile bandwidth, latency, and speed. The available bandwidth limits the amount of data that can be transmitted at any given time, which in turn restricts the use cases for mobile IoT. For instance, sharing large volumes of data about the turbines of a large container ship and detailed inventory measurements of each container on board is completely impractical unless the ship is close to a coastal area with good mobile broadband connectivity.
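To make the trade-off between transmission cost and bandwidth a little more concrete, here is a minimal Python sketch of a "send now or hold at the edge" check. It covers only the two factors described above. Everything in it is an assumption for illustration: the `Link` class, the `send_now` function, and all prices, bandwidths, data volumes, budgets, and deadlines are hypothetical placeholders, not figures from any vendor or carrier.

```python
from dataclasses import dataclass


@dataclass
class Link:
    """A network link. All figures used with it below are made-up placeholders."""
    name: str
    cost_per_mb: float      # currency units per megabyte (hypothetical)
    bandwidth_mbps: float   # megabits per second (hypothetical)


def transfer_cost(link: Link, megabytes: float) -> float:
    """Cost of pushing the payload over this link."""
    return link.cost_per_mb * megabytes


def transfer_seconds(link: Link, megabytes: float) -> float:
    """Time needed to push the payload; 1 megabyte = 8 megabits."""
    return megabytes * 8 / link.bandwidth_mbps


def send_now(link: Link, megabytes: float, budget: float, deadline_s: float) -> bool:
    """True if sending immediately fits both the cost budget and the time window."""
    return (transfer_cost(link, megabytes) <= budget
            and transfer_seconds(link, megabytes) <= deadline_s)


# Hypothetical in-flight satellite link: expensive and narrow.
satellite = Link("satellite", cost_per_mb=5.0, bandwidth_mbps=0.5)

alerts_mb = 2.0          # small real-time fraction of the sensor data
full_log_mb = 50_000.0   # everything else collected during the trip

print(send_now(satellite, alerts_mb, budget=20.0, deadline_s=60.0))    # True
print(send_now(satellite, full_log_mb, budget=20.0, deadline_s=60.0))  # False
```

With these placeholder numbers, the small alert payload fits the budget and the time window, while the full log does not and would be held at the edge until a cheaper, faster link is available, which mirrors the aircraft example above. A real decision would use measured costs and bandwidth and would weigh further factors that this sketch ignores.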