Big Data Survey

Forrester is in the middle of a major research effort on various Big Data-related topics. As part of this research, we’ll be kicking off a client survey shortly. I’d like to solicit everyone’s input on the survey questions and answer options. Here’s the first draft. What am I missing?

  1. Scope. What is the scope of your Big Data initiative?
    1. Enterprise
    2. LOB
    3. Departmental
    4. Regional
    5. Project-based
  2. Status. What is the status of your Big Data initiative?
    1. In production
    2. Piloting
    3. Testing
    4. Evaluating
  3. Industry. Are the questions you are trying to address with your Big Data initiative general or industry-specific?
    1. General
    2. Industry-specific
    3. Both
  4. Domains. What enterprise areas does your Big Data initiative address?
    1. Sales
    2. Marketing
    3. Customer service
    4. Finance
    5. HR
    6. Product development
    7. Operations
    8. Logistics
    9. Brand management
    10. IT analytics
    11. Risk management
  5. Why Big Data? What are the main business requirements or inadequacies of earlier-generation BI/DW/ETL technologies, applications, and architecture that are causing you to consider or implement Big Data?
    1. Data volume
      1. <10TB
      2. 10-100TB
      3. 100TB-1PB
      4. >1PB
    2. Velocity of change and scope/requirements unpredictability
    3. Data diversity
    4. Analysis-driven requirements (Big Data) vs. requirements-driven analysis (traditional BI/DW)
    5. Cost. Big Data solutions are less expensive than traditional ETL/DW/BI solutions
  6. Big Data as input to BI apps. Do you plan to use Big Data exploration results for
    1. Inputs into BI applications
    2. Specifications for BI applications
  7. Types of data. What types of data/records are you planning to analyze using Big Data technologies?
    1. Transactional data from enterprise applications
    2. Clickstream
    3. Unstructured content from email, office documents, etc.
    4. Social media (Facebook, Twitter, etc.) data
    5. Sensor/machine/device data
    6. Locational/geospatial data
    7. Scientific/genomic data
    8. Image (large video/photographic) data
  8. Ownership. Who owns or drives your Big Data initiative?
    1. Mostly business-driven, with minimal IT support
    2. Business/IT collaboration
    3. Mostly IT driven, with minimal business involvement
  9. External assistance. Are you doing this on your own or with help of consultants and other external SMEs?
    1. All internal
    2. Mostly internal, with some help from third parties
    3. Mostly third parties under our direction and supervision
    4. All outsourced
  10. Integration with BI, DW, etc. How is your Big Data initiative integrated with, embedded in, or part of your other BI/DW/ETL/data governance/MDM initiatives, if at all?
    1. Big Data and BI/DW/ETL are just different areas of a broad information management activity
    2. Big Data and BI/DW/ETL are separate initiatives with close coordination
    3. Big Data and BI/DW/ETL are separate initiatives with some coordination
    4. Big Data and BI/DW/ETL are separate initiatives
  11. Integration with advanced analytics. How is your Big Data initiative integrated with, embedded in, or part of your other advanced analytics (statistical analysis, data mining, predictive modeling, etc.) initiatives, if at all?
    1. Big Data and advanced analytics are just different areas of a broad information management activity
    2. Big Data and advanced analytics are separate initiatives with close coordination
    3. Big Data and advanced analytics are separate initiatives with some coordination
    4. Big Data and advanced analytics are separate initiatives
  12. Project management. Do you run your Big Data initiative using the same or different PMO standards than BI/DW/ETL?
    1. Same
    2. Different
  13. Software development. Do you run your Big Data initiative using the same or different SDLC standards than BI/DW/ETL?
    1. Same
    2. Different
  14. Integration with enterprise apps. Do your Big Data applications stand on their own or are they tightly integrated or embedded with any of the following?
    1. Enterprise applications (ERP, CRM)
    2. Business processes (BPM)
    3. Business rules (BRE)
  15. Concerns. Are the following concerns higher, lower, or the same when dealing with Big Data initiatives as compared with earlier-generation BI/DW/ETL applications?
    1. Security
    2. Privacy
    3. Operational risk (liability, reputation, etc.)
  16. Retention. Do you intend to retain your raw Big Data after the exploration/analysis stage?
    1. No
    2. Yes, for compliance
    3. Yes, for re-processing, more analysis
  17. Big Data technology. What technology do you use for Big Data applications?
    1. Data integration tools based on Big Data technology
    2. DW tools based on Big Data technology
    3. BI tools based on Big Data technology
    4. Advanced analytics tools based on Big Data technology
  18. App delivery model. Do you run your Big Data applications on premises or in the cloud?
    1. On-premises
    2. Hosted/private cloud
    3. Public cloud
  19. Commercial vs. Open Source. Do you use mostly
    1. Open source Big Data technology (Hadoop, MapReduce, Cassandra, and the other Apache open source specs)
    2. Commercial Big Data tools
  20. Business case. Do you have a business case for the Big Data initiative in place?
    1. Yes, with a proven ROI
    2. Yes, with a projected ROI
    3. Yes, with intangible benefits only
    4. No business case
  21. Metrics. How do you plan to measure the success of the Big Data initiative?
    1. With quantitative metrics tied to business performance
    2. With qualitative metrics tied to business performance
    3. With quantitative metrics tied to IT performance
    4. With qualitative metrics tied to IT performance
    5. No specific measurement methodology in place

Comments

Comment on Big Data survey

Boris,

Just an idea: consider distinguishing "structured" from "unstructured" data. It could be relevant, since it makes a real difference: it would be a completely different project, scope, and set of technologies.

Hope it is helpful

Josep Arroyo
Quiterian

Josep, absolutely. That's question 4, the "diverse data" option.

Big Data Defined

As we continue to hear more and more about "Big Data," it has become increasingly clear to me that people project their own meaning onto the term. Is big data more than an approach to managing large volumes of data? Are there inherent parameters as to what is meant by a big data initiative? Curious to know how your audience would define big data.

it's not just about volume

We are exploring at least 4 dimensions to define Big Data:
- Data volume
- Data diversity, disparity
- Data velocity - speed of data and requirements change
- Don't know what to call this one yet, but it's something about a paradigm shift where exploration drives requirements (vs. the other way around in traditional BI/DW)

suggestion for survey

don't know if this will help, but these are seriously big issues that go into design...

I) When confronted with several sources of data for the same subject areas/tables/ fields, what is your strategy:

a) Try to find the one authoritative source and use it and only it
b) Use the best source that you have and populate main tables with it, but keep other sources for reference/alternative uses ... or just in case
c) Take all the data you can, but document their sources, and make no judgements about which is the better source.
d) Any of the above, depending upon the nature of the subject matter.

II) As dimensions slowly change over time (organizational structures or other hierarchies or dimensions change meaning), what is your strategy for handling it?

a) Never change the historical data - let users make decisions as needed.
b) Go back and change historical data to reflect current reality
c) other _________________________
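To make options (a) and (b) concrete, here is a minimal sketch of the two strategies, which dimensional modeling usually calls "Type 2" (add a new version, keep the old one) and "Type 1" (overwrite). The row layout and the field names (`employee`, `region`, `valid_from`, `valid_to`) are assumptions made up for illustration, not anything from the survey.

```python
from datetime import date

def overwrite_in_place(rows, key, new_region):
    """Option (b): rewrite history so every row reflects current reality."""
    return [dict(r, region=new_region) if r["employee"] == key else dict(r)
            for r in rows]

def append_new_version(rows, key, new_region, effective):
    """Option (a): keep the old value; close the open row and add a new version."""
    out = []
    for r in rows:
        r = dict(r)  # copy so the caller's rows are not mutated
        if r["employee"] == key and r["valid_to"] is None:
            r["valid_to"] = effective  # old version stops being current
        out.append(r)
    out.append({"employee": key, "region": new_region,
                "valid_from": effective, "valid_to": None})
    return out

# Hypothetical dimension table with a single employee row.
history = [{"employee": "E1", "region": "East",
            "valid_from": date(2010, 1, 1), "valid_to": None}]

rewritten = overwrite_in_place(history, "E1", "West")
versioned = append_new_version(history, "E1", "West", date(2011, 6, 1))
```

With the versioned approach, users can still choose which reality to report against, at the cost of more rows and date-range joins.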

III) (Related to your Question 12) Data Security

How do you intend to handle very sensitive data like customer name, address, social security numbers, and birthdays? (check all that apply) ...
(This question would need a lot more work to make it right, but you see what I am driving at...)

a) Not applicable to us, we don't have sensitive data
b) We trust our employees and this data is necessary for all people using this data
c) We encrypt sensitive data for all but a handful of trusted, entitled users (but data elements retain their integrity for matching purposes)
d) We mask sensitive data for all but a handful of trusted, entitled users (data elements do not necessarily retain their integrity for matching purposes)

and/or check all that apply

i) Only certain people are entitled to see certain Tables
ii) Only certain people are entitled to see certain Columns
iii) Only certain people are entitled to see certain Rows
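The difference between options (c) and (d) above can be sketched roughly as follows. This is only an illustration under stated assumptions: it uses a keyed hash (HMAC-SHA256) as a stand-in for deterministic encryption, so it is one-way rather than reversible, and the key, function names, and sample values are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical key for illustration; a real key would live in a key store.
SECRET_KEY = b"demo-key-not-for-production"

def tokenize(value: str) -> str:
    """Deterministic keyed hash: equal inputs give equal tokens, so joins still work."""
    normalized = value.strip().lower()
    return hmac.new(SECRET_KEY, normalized.encode("utf-8"), hashlib.sha256).hexdigest()

def mask(value: str, visible: int = 4) -> str:
    """Hide all but the last `visible` characters; distinct inputs may collide."""
    if len(value) <= visible:
        return "*" * len(value)
    return "*" * (len(value) - visible) + value[-visible:]

# Made-up sample identifiers that share their last four digits.
ssn_a = "123-45-6789"
ssn_b = "987-65-6789"

same_person = tokenize(ssn_a) == tokenize(" 123-45-6789 ")  # matching survives
masks_collide = mask(ssn_a) == mask(ssn_b)                   # matching is lost
```

The tokenized values can be used as join keys across sources without exposing the raw identifier, while the masked values are fine for display but cannot reliably tell two customers apart.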

....
I would need a little more time on this one... but I am sure you see its importance.

good luck with the work - stu

Stuart, thanks for the contribution. I am trying only to research questions that are unique to BigData. You seem to be proposing questions that are applicable to any traditional BI / DW project / initiative, and that's not the scope of this survey. Does this clarification make sense?

Would you consider deployment models e.g. Cloud to be in scope?

Boris, would you consider details of the deployment model, i.e. on premises vs. cloud, to be in scope for this survey, or is that too far into the weeds? Also, any thoughts on questions around vendor perceptions?

Leon Katsnelson
http://BigDataOnCloud.com

Leon, yes your first point is already in Q16. Re: the second one - no, this survey and the resulting research document will be all about business cases and best practices around BigData, not specific technologies. Thanks for contributing.

Additional Big Data questions

Boris, these are great questions. How about?

Q. Are you considering Hadoop as part of your Big Data projects?
a. Yes, in production and already part of the extended environment
b. Yes, in pilot and testing
c. Yes, just exploring
d. No, not interested at this time

Q. How are you using, or planning to use, social media like Twitter, Facebook, and LinkedIn for your Big Data projects (multiple choices)?
a. Enrich customer records with insight on relationships, behaviors and influences
b. Proactively respond to customer and partner sentiments, complaints and competitive activities
c. Use social media as sources for product feedback and process improvement
d. Monitor employee activities for behaviors that may be out-of-compliance with HR policy

Q. What so-called Big Data sources are you planning to incorporate into your projects within the next 12 months (multiple choices)?
a. Clickstream
b. Sensor / Machine/Device Data
c. Locational/Geospatial Data
d. Scientific/Genomic data
e. Image (large Video/Photographic) Data
f. Call Detail records

Good luck with the research.

Julianna, excellent suggestions! I'll incorporate some of them. Thank you!

Retention and discovery

Hello

Great initiative. In addition to the other very good comments, I would add (1) documents as a type of big data (email, office documents, etc.) and (2) a section on data retention and discovery.

Data retention and discovery

a. Do you intend to retain your raw big data after the analysis stage?
b. What is your goal for retention (compliance, re-processing)?
c. Is eDiscovery of big data a concern?
d. Do you believe that records management infrastructure needs to evolve to account for the new constraints of big data?

Regards,

Jean-Luc Chatelain

Looks great

Boris:

You should also ask about the scale to which they plan to take their "Big Data" initiatives: 10-100 TB, 100TB-1PB, 1PB+, and how soon. Also, per the other comment, ask whether and how they plan to incorporate Hadoop, MapReduce, Cassandra, and the other Apache open source specs, codebases, and initiatives in their plans. Other than that, looks really good.

Jim

Perfect, Jim, thanks.

Correction

Boris:

Me bad. My suggestion #1 was already covered in your question #1.

My only excuse is general imbecility on my part. Please forgive.

Jim

Big Data Survey

The issue of advanced analytics and big data is the most prevalent issue that I see from my clients. Specifically:
1. Whether to forecast big data trends or patterns using time-series vs. regression analysis
2. Whether to use a statistically valid random sample (SVRS) or the entire data set for prediction
3. Whether to do advanced analytics in an operational data warehouse or use a separate/dedicated analytics environment.
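For point 2, a rough sketch of what the sample-versus-full-data trade-off looks like in practice. The data here is synthetic (a made-up normal distribution), and the seed, sizes, and variable names are assumptions chosen only to keep the example repeatable.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Synthetic stand-in for a large data set (not real client data).
population = [rng.gauss(100.0, 15.0) for _ in range(100_000)]

# Statistic computed on the entire data set.
full_mean = statistics.fmean(population)

# Same statistic estimated from a simple random sample of 1,000 records.
sample = rng.sample(population, 1_000)
sample_mean = statistics.fmean(sample)

# Standard error of the sample mean: how far the estimate is expected to drift.
standard_error = statistics.stdev(sample) / len(sample) ** 0.5

print(f"full: {full_mean:.2f}  sample: {sample_mean:.2f}  +/- {standard_error:.2f}")
```

For a simple average, the sample gets close at 1% of the processing cost; rare events and long-tail patterns are where scanning the entire data set tends to matter.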

BTW. overall your survey is excellent and needed.

Big Data Strategy

We agree wholeheartedly in exploring for discovery as part of a big data strategy. However, we think the exploration can be optimized by using the company’s strategy to ask questions. Executives should ask, “Based on our strategy and goals, what 10 questions can we ask that might be contained and illuminated by our big data-sphere?” Further, as cloud offerings become ever more important to the enterprise, executives should ask vendors about their big data strategy. Rather than reports, alerts and dashboards, what types of higher order algorithms do you use to support decision making? What near purpose applications do you consider sources of higher order inputs? What near purpose applications have you already integrated with to perform higher order output?

Sara, thanks for the insightful comments, although I think I need a bit more explanation of what exactly you mean. Which is OK, because I know we have an inquiry call that is being scheduled.