There's certainly a lot of hype out there about big data. As I previously wrote, some of it is indeed hype, but there are still many legitimate big data cases - I saw a great example during my last business trip. Hadoop certainly plays a key role in the big data revolution, so all business intelligence (BI) vendors are jumping on the bandwagon and saying that they integrate with Hadoop. But what does that really mean? First of all, Hadoop is not a single entity; it's a conglomeration of multiple projects, each addressing a certain niche within the Hadoop ecosystem, such as data access, data integration, DBMS, system management, reporting, analytics, data exploration, and much more. To lift the veil of hype, I recommend that you ask your BI vendors the following questions:
Which specific Hadoop projects do you integrate with (HDFS, Hive, HBase, Pig, Sqoop, and many others)?
Do you work with the community edition software or with commercial distributions from MapR, EMC/Greenplum, Hortonworks, or Cloudera? Have these vendors certified your Hadoop implementations?
Do you have tools or utilities to help the client get data into Hadoop in the first place (see comment from Birst)?
Are you querying Hadoop data directly from your BI tools (reports, dashboards) or are you ingesting Hadoop data into your own DBMS? If the latter:
Are you selecting Hadoop result sets using Hive?
Are you ingesting Hadoop data using Sqoop?
Is your ETL generating MapReduce jobs and pushing them down to Hadoop? Are you generating Pig scripts?
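To make the difference between two of these integration paths concrete, here is a minimal Python sketch. It only builds the text of a HiveQL query (selecting a result set through Hive) and of a Sqoop export command (moving data from HDFS into a relational DBMS); it does not run either one. The table names, the JDBC URL, and the HDFS directory are made-up placeholders, and a real vendor integration will likely look quite different.

```python
import shlex


def hive_select(table, columns, limit):
    """Build a HiveQL query that selects a bounded result set from a Hive table."""
    cols = ", ".join(columns)
    return f"SELECT {cols} FROM {table} LIMIT {limit}"


def sqoop_export_args(jdbc_url, target_table, hdfs_dir):
    """Build the argument list for a Sqoop export from an HDFS directory
    into a table in a relational DBMS."""
    return [
        "sqoop", "export",
        "--connect", jdbc_url,
        "--table", target_table,
        "--export-dir", hdfs_dir,
    ]


# Hypothetical names, used for illustration only.
query = hive_select("web_logs", ["user_id", "page"], 1000)
command = sqoop_export_args(
    "jdbc:postgresql://bi-db.example.com/warehouse",
    "web_logs_summary",
    "/data/web_logs_summary",
)

print(query)
print(shlex.join(command))
```

If a vendor says it uses Hive, expect something like the first string to be sent from the BI tool; if it says it uses Sqoop, expect a batch job along the lines of the second, which means the BI tool is reading a copy of the data rather than Hadoop itself.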
I recently had both the privilege and pleasure to do a deep dive into the cold and warm BI waters in Russia and Israel. Cold - because some of my experiences were sobering. Warm - because the reception could not have been more pleasant. My presentations were well attended (sponsored by www.in4media.ru in Russia and www.matrix.co.il in Israel), showing high levels of BI interest, adoption, experience, and expertise. Challenges remain the same, as Russian and Israeli businesses struggle with BI governance, ownership, SDLC and PMO methodologies, and data and app integration just like the rest of the world. I spent long evening hours with a large global company in Israel that grew rapidly through M&A and is struggling with multiple strategic challenges: whether to centralize or localize BI, vendor selection, end user empowerment, etc. Sound familiar?
But it was not all business as usual. A few interesting regional peculiarities did come out. For example, the "BI as a key competitive differentiator" message fell on mostly deaf ears in Russia, as Russian companies don't really compete against each other. Territories, brands, markets, and spheres of influence are handed top down from the government or negotiated in high-level deals behind closed doors. That is not to say, however, that BI in Russia is only used for reporting - multiple businesses are pushing BI to its limits with applications such as advanced customer segmentation for better upsell/cross-sell rates.
I was also pleasantly surprised and impressed a few times (and for those of you who know me well, you know that it's pretty hard to impress the old veteran):
In a recent media interview I was asked whether the requirements for data visualization had changed. The questions focused on whether users are still satisfied with dashboards, graphs, and charts or whether they have new needs, demands, and expectations.
Arguably, Ancient Egyptian hieroglyphics were the first real "commercial" examples of data visualization (though many people before the Egyptians used the same approach, more often as a general communications tool). Since then, visualization of data has always been both a popular and an important topic. For example, Florence Nightingale changed the course of healthcare with a single compelling polar area chart on the causes of death during the Crimean War.
In looking at this question of how and why data visualization might be changing, I identified at least 5 major triggers. Namely:
Increasing volumes of data. It's no surprise that we now have to process much larger volumes of data. But this also affects the ways we need to represent it. The volume of data stimulates new forms of visualization tools. Strictly speaking, not all of these tools are new, but they have begun to find a much broader audience as we need to communicate much more information much more rapidly. Time walling and infographics are just two approaches that are not necessarily new but that have attracted much greater usage as a direct result of the increasing volume of data.