What Do BI Vendors Mean When They Say They Integrate With Hadoop?

There's certainly a lot of hype out there about big data. As I previously wrote, some of it is indeed hype, but there are still many legitimate big data cases - I saw a great example during my last business trip. Hadoop certainly plays a key role in the big data revolution, so all business intelligence (BI) vendors are jumping on the bandwagon and saying that they integrate with Hadoop. But what does that really mean? First of all, Hadoop is not a single entity; it's a conglomeration of multiple projects, each addressing a certain niche within the Hadoop ecosystem, such as data access, data integration, DBMS, system management, reporting, analytics, data exploration, and much more. To lift the veil of hype, I recommend that you ask your BI vendors the following questions:

  1. Which specific Hadoop projects do you integrate with (HDFS, Hive, HBase, Pig, Sqoop, and many others)?
  2. Do you work with the community edition software or with commercial distributions from MapR, EMC/Greenplum, Hortonworks, or Cloudera? Have these vendors certified your Hadoop implementations?
  3. Do you have tools or utilities to help the client get data into Hadoop in the first place (see comment from Birst)?
  4. Are you querying Hadoop data directly from your BI tools (reports, dashboards) or are you ingesting Hadoop data into your own DBMS? If the latter:
    1. Are you selecting Hadoop result sets using Hive?
    2. Are you ingesting Hadoop data using Sqoop?
    3. Is your ETL generating and pushing down MapReduce jobs to Hadoop? Are you generating Pig scripts?

BI In Russia And Israel

I recently had both the privilege and pleasure to do a deep dive into the cold and warm BI waters in Russia and Israel. Cold - because some of my experiences were sobering. Warm - because the reception could not have been more pleasant. My presentations were well attended (sponsored by www.in4media.ru in Russia and www.matrix.co.il in Israel), showing high levels of BI interest, adoption, experience, and expertise.  Challenges remain the same, as Russian and Israeli businesses struggle with BI governance, ownership, SDLC and PMO methodologies, data, and app integration just like the rest of the world. I spent long evening hours with a large global company in Israel that grew rapidly by M&A and is struggling with multiple strategic challenges: centralize or localize BI, vendor selection, end user empowerment, etc. Sound familiar?

But it was not all business as usual. A few interesting regional peculiarities did come out. For example, the "BI as a key competitive differentiator" message fell on mostly deaf ears in Russia, as Russian companies don't really compete against each other. Territories, brands, markets, and spheres of influence are handed top down from the government or negotiated in high-level deals behind closed doors. That is not to say, however, that BI in Russia is only used for reporting - multiple businesses are pushing BI to its limits with uses such as advanced customer segmentation for better upsell/cross-sell rates.

I was also pleasantly surprised and impressed a few times (and for those of you who know me well, you know that it's pretty hard to impress the old veteran):


What Does R Integration Really Mean For BI Platforms?

I just received yet another call from a reporter asking me to comment on yet another BI vendor announcing R integration. All leading BI vendors are embedding/integrating with R these days, so I was not sure what was really new in the announcement. I guess the real question is the level of integration. For example:

  • Since R is a scripting language, does the BI vendor provide a point-and-click GUI to generate R code?
  • Can R routines take advantage of all of the BI metadata (data structures, definitions, etc.) without having to redefine it just for R?
  • How easily can the output from R calculations (scores, rankings) be embedded in the BI reports and dashboards? Do the new scores just become automagically available for BI reports, or does somebody need to add them to BI data stores and metadata?
  • Can the BI vendor import/export R models based on PMML?
  • Is it a general R integration, or are there prebuilt vertical (industry-specific) or domain (finance, HR, supply chain, risk, etc.) metrics as part of a solution?
  • What server are R models executed in? Reporting server? Database server? Their own server?
  • Then there's the whole business of model design, management, and execution, which is usually the realm of advanced analytics platforms. How much of these capabilities does the BI vendor provide?

Did I get that right? Any other features/capabilities that really distinguish one BI/R integration from another? Really interested in hearing your comments.

Key Questions To Ask Yourself Before Embarking On A Big Data Journey

Do you think you are ready to tackle Big Data because you are pushing the limits of your data Volume, Velocity, Variety and Variability? Take a deep breath (and maybe a cold shower) before you plunge full speed ahead into uncharted territories and murky waters of Big Data. Now that you are calm, cool and collected, ask yourself the following key questions:

  • What’s the business use case? What are some of the business pain points, challenges and opportunities you are trying to address with Big Data? Are your business users coming to you with such requests or are you in the doomed-for-failure realm of technology looking for a solution?
  • Are you sure it’s not just BI 101? Once you identify specific business requirements, ask whether Big Data is really the answer you are looking for. In the majority of my Big Data client inquiries, after a few probing questions I typically find out that it's really BI 101: data governance, data integration, data modeling and architecture, org structures, responsibilities, budgets, priorities, etc. Not Big Data.
  • Why can’t your current environment handle it? Next comes another sanity check. If you are still thinking you are dealing with Big Data challenges, are you sure you need to do something different, technology-wise? Are you really sure your existing ETL/DW/BI/Advanced Analytics environment can't address the pain points in question? Would just adding another node, another server, more memory (if these are all within your acceptable budget ranges) do the trick?

COTS Vs. Home-Grown BI Apps

Wanted to run the following two questions and my answers by the community:

Q. What is the average age of reporting applications at large enterprises?

A. Reporting apps typically involve source data integration, data models, metrics, reports, dashboards, and queries. I'd rate the longevity of these in descending order (data sources being most stable and queries changing all the time).

Q. What is the percentage of reporting applications that are homegrown versus off-the-shelf?

A. These are by no means solid data points but rather my off-the-cuff, albeit educated, guesses:

  • The majority (let's say >50%) of reports are still being built in Excel and Access.
  • Very few (let's say <10%) are done in non-BI-specific environments (programming languages).
  • The other 40% I'd split 50/50 between:
    • off-the-shelf reports and dashboards built into ERP or BI apps,
    • and custom-coded in BI tools
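
The arithmetic behind these guesses can be spelled out in a few lines. This is only a sketch: it assumes the boundary values 50 and 10 as point estimates for ">50%" and "<10%", which are illustrative stand-ins and not measured data.

```python
# Point estimates assumed for illustration only: the guesses above are
# ">50%" and "<10%", so 50 and 10 are boundary values, not survey results.
EXCEL_ACCESS = 50
NON_BI_CODE = 10

# Whatever is left is the "other 40%", split 50/50 between two categories.
REMAINDER = 100 - EXCEL_ACCESS - NON_BI_CODE

SHARES = {
    "Excel and Access": EXCEL_ACCESS,
    "Non-BI programming languages": NON_BI_CODE,
    "Off-the-shelf (ERP or BI apps)": REMAINDER / 2,
    "Custom-coded in BI tools": REMAINDER / 2,
}

if __name__ == "__main__":
    for category, share in SHARES.items():
        print(f"{category}: {share:.0f}%")
```

Under these assumed values, off-the-shelf reports and reports custom-coded in BI tools each come out at roughly 20% of the total.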

Needless to say, this differs greatly by industry and business domain. Thoughts?

Advanced Data Visualization - A Critical BI Component

As Edward Tufte, one of the industry's renowned data visualization experts, once said, “The world is complex, dynamic, multidimensional; the paper is static, flat. How are we to represent the rich visual world of experience and measurement on mere flatland?” Indeed, there’s just too much information out there for all categories of knowledge workers to visualize it effectively. More often than not, traditional reports using tabs, rows, and columns do not paint the whole picture or, even worse, lead an analyst to a wrong conclusion. Firms need to use data visualization because information workers:

  • Cannot see a pattern without data visualization. Simply seeing numbers on a grid often does not convey the whole story — and in the worst case, it can even lead to a wrong conclusion. This is best demonstrated by Anscombe’s quartet where four seemingly similar groups of x/y coordinates reveal very different patterns when represented in a graph.
  • Cannot fit all of the necessary data points onto a single screen. Even with the smallest reasonably readable font, single-line spacing, and no grid, one cannot realistically fit more than a few thousand data points on a single page or screen using numerical information only. When using advanced data visualization techniques, one can fit tens of thousands (an order-of-magnitude difference) of data points onto a single screen. In his book The Visual Display of Quantitative Information, Edward Tufte gives an example of more than 21,000 data points effectively displayed on a US map that fits onto a single screen.
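
The Anscombe's quartet point is easy to check numerically. The sketch below uses the published quartet values and only the Python standard library; the helper names (`pearson`, `summarize`, `SUMMARY`) are my own. It shows that the four data sets share nearly identical means and correlation, which is exactly why a grid of numbers or summary statistics alone hides the very different shapes a chart would reveal.

```python
from math import sqrt
from statistics import mean

# Anscombe's quartet: sets I-III share the same x values.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def summarize(xs, ys):
    """Summary statistics that look alike across all four sets."""
    return {
        "mean_x": round(mean(xs), 2),
        "mean_y": round(mean(ys), 2),
        "r": round(pearson(xs, ys), 2),
    }


SUMMARY = {name: summarize(xs, ys) for name, (xs, ys) in QUARTET.items()}

if __name__ == "__main__":
    for name, stats in SUMMARY.items():
        print(name, stats)
```

Plotting each set as a scatter chart (with any charting tool) shows a linear trend, a curve, a line with one outlier, and a vertical cluster with one outlier, none of which the summary rows above distinguish.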

Use Cases For Specific BI Tools

I get the following question very often. What are the best practices for creating an enterprise reporting policy as to when to use what reporting tool/application? Alas, as with everything else in business intelligence, the answer is not that easy. The old days of developers versus power users versus casual users are gone. The world is way more complex these days. In order to create such a policy, you need to consider the following dimensions:

  •  Report/analysis type
    • Historical (what happened)
    • Operational (what is happening now)
    • Analytical (why did it happen)
    • Predictive (what might happen)
    • Prescriptive (what should I do about it)
    • Exploratory (what's out there that I don't know about)
  • Interaction types
    • Looking at static report output only
    • Lightly interacting with canned reports (sorting, filtering)
    • Fully interacting with canned reports (pivoting, drilling)
    • Assembling existing reports, visualizations, and metrics into customized dashboards
    • Full report authoring capabilities
  • User types
    • Internal
    • External (customers, partners)
  • Data latency
    • Real time
    • Near-real time
    • Batch
  • Report latency, as in when the report is needed:
    • Now
    • Tomorrow
    • In a few days
    • In a few weeks
  • Decision types
    • Strategic (a few complex decisions/reports per month)
    • Tactical (many less-complex decisions/reports per month)
    • Operational (many complex/simple decisions/reports per day)
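
To get a feel for why a simple "tool X for user type Y" policy no longer works, the dimensions listed above can be written down as a small data structure and counted. This is a sketch only: the key names and value labels are my own shorthand for the bullets above, and it covers only the dimensions shown here.

```python
from itertools import product
from math import prod

# Shorthand labels for the dimensions and values listed above
# (the identifiers are illustrative, not an established taxonomy).
DIMENSIONS = {
    "report_type": ["historical", "operational", "analytical",
                    "predictive", "prescriptive", "exploratory"],
    "interaction": ["static_output", "light_interaction", "full_interaction",
                    "dashboard_assembly", "full_authoring"],
    "user_type": ["internal", "external"],
    "data_latency": ["real_time", "near_real_time", "batch"],
    "report_latency": ["now", "tomorrow", "few_days", "few_weeks"],
    "decision_type": ["strategic", "tactical", "operational"],
}

# Number of distinct requirement profiles a policy could be asked to cover.
COMBINATIONS = prod(len(values) for values in DIMENSIONS.values())


def profiles():
    """Yield every combination of dimension values as a dict."""
    keys = list(DIMENSIONS)
    for values in product(*DIMENSIONS.values()):
        yield dict(zip(keys, values))


if __name__ == "__main__":
    print(f"{COMBINATIONS} possible requirement profiles")
    print(next(profiles()))
```

Even with just these six dimensions there are 2,160 possible profiles (6 x 5 x 2 x 3 x 4 x 3), so a practical policy has to group profiles into a manageable number of use cases and not map tools to user roles one by one.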

Self-Service BI

Traditional BI approaches and technologies — even when using the latest technology, best practices, and architectures — almost always have a serious side effect: a constant backlog of BI requests. Enterprises where IT addresses more than 20% of BI requirements will continue to see the snowball effect of an ever-growing BI requests backlog. Why? Because:

  • BI requirements change faster than an IT-centric support model can keep up. Even with by-the-book BI applications, firms still struggle to turn BI applications on a dime to meet frequently changing business requirements. Enterprises can expect a life span of at least several years out of enterprise resource planning (ERP), customer relationship management (CRM), human resources (HR), and financial applications, but a BI application can become outdated the day it is rolled out. Even within implementation times of just a few weeks, the world may have changed completely due to a sudden mergers and acquisitions (M&A) event, a new competitive threat, new management structure, or new regulatory reporting requirements.

Everything You Wanted To Know But Were Afraid To Ask About BI

How does an enterprise — especially a large, global one with multiple product lines and multiple enterprise resource planning (ERP) applications — make sense of operations, logistics, and finances? There’s just too much information for any one person to process. It’s business intelligence (BI) to the rescue! But what is BI, and how does BI differ from reporting and management information systems (MIS)? What is the business impact, and what are the costs versus the benefits? What is the appropriate strategy for implementing BI and achieving continued BI success? Our new report will give business and IT executives an understanding of the four critical phases of strategizing around BI to achieve business goals — or “everything you wanted to know but were afraid to ask” about BI. Here’s a sneak preview of the kinds of topics the report covers and the kinds of BI questions one needs to ask in order to build an effective and efficient enterprise BI environment:

  1. Prepare For Your BI Program
    1. The future of BI is all about agility. IT no longer has exclusive control of BI platforms, tools, and applications; business users demand more empowerment (or make empowered changes without IT involvement), and previously unshakable pillars of the BI foundation such as relational databases are quickly being supplemented with alternative BI platforms. It’s no longer business as usual. Ask yourself:
      1. What are the main business and IT trends driving BI?
      2. What are the latest BI technologies that I need to know about?
      3. What’s out there beyond traditional BI?

Data Discovery And Exploration - IBM Acquires Vivisimo

Today IBM announced its plans to acquire Vivisimo - an enterprise search vendor with big data capabilities. Our research shows that only 1% to 5% of all enterprise data is in a structured, modeled format that fits neatly into enterprise data warehouses (EDWs) and data marts. The rest of enterprise data (and we are not even talking about external data such as social media data, for example) may not be organized into structures that easily fit into relational or multidimensional databases. There’s also a chicken-and-the-egg syndrome going on here. Before you can put your data into a structure, such as a database, you need to understand what’s out there and what structures do or may exist. But in order for you to explore the data in the first place, traditional data integration technologies require some structures to even start the exploration (tables, columns, etc.). So how do you explore something without a structure, without a model, and without preconceived notions? That’s where big data exploration and discovery technologies such as Hadoop and Vivisimo come into play. (There are many other vendors in this space as well, including Oracle Endeca, Attivio, and Saffron Technology. While these vendors may not directly compete with Vivisimo and all use different approaches and architectures, the final objective - data discovery - is often the same.) Data exploration and discovery was one of our top 2012 business intelligence predictions. However, it’s only a first step in the full cycle of business intelligence and
