Jean-Pierre Garbani serves Infrastructure & Operations Professionals.
How Complexity Spilled The Oil
Posted by Jean-Pierre Garbani on January 11, 2011
The Gulf oil spill of April 2010 was an unprecedented disaster. The National Oil Spill Commission’s report summary shows that this could have been prevented with the use of better technology. For example, while the Commission agrees that the monitoring systems used on the platform provided the right data, it points out that the solution used relied on engineers to make sense of that data and correlate the right elements to detect anomalies. “More sophisticated, automated alarms and algorithms” could have been used to create meaningful alerts and maybe prevent the explosion. The Commission’s report shows that the reporting systems used have not kept pace with the increased complexity of drilling platforms. Another conclusion is even more disturbing, as it points out that these deficiencies are not uncommon and that other drilling platforms in the Gulf of Mexico face similar challenges.
If we substitute “drilling platform” with “data center,” this sounds awfully familiar. How many IT organizations rely on relatively simple data collection from point monitoring of the network, servers, or applications while trying to manage the performance and availability of increasingly complex applications? IT operations engineers sift through mountains of data from different sources trying to make sense of what is happening, and they usually fall short of finding meaningful alerts. The consequences may not be as dire as the Gulf oil spill, but they can still translate into lost productivity and revenue.
The fact that many IT operations have not (yet) faced a meltdown is not a valid counterargument: There is, for example, a good reason to purchase hurricane insurance when one lives in Florida, even though destructive storms are not that common. As with the weather, there are so many variables at play in today’s business services that mere humans can’t be expected to make sense of them.
The challenge is real, but finding the right solution may not be easy. IT operations have acquired solutions from diverse vendors, mostly as a reaction to perceived issues and uncertainties. Because the data collected comes from diverse sources, it first needs to be “normalized”: The raw data from a monitoring collector must be run through a normalization algorithm to: 1) convert it into a form that can be compared with other data types, and 2) place it in an actual context to determine its dependencies. An example of normalization is to consider a data value in a “period context”: At a given time of the day, on a given day of the year, is the value collected within x% of its “normal” value?
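The “period context” check can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a description of any vendor's product: the period is assumed to be the weekday and hour of the sample, the “normal” value is assumed to be the mean of past samples in that same period, and the names `period_key`, `build_baseline`, and `is_within_normal` are hypothetical.

```python
from datetime import datetime
from statistics import mean


def period_key(ts):
    """Bucket a timestamp by (weekday, hour). Assumption: this is the 'period'."""
    return (ts.weekday(), ts.hour)


def build_baseline(history):
    """Map each period to the mean of the values historically seen in it.

    history is an iterable of (datetime, value) pairs.
    """
    buckets = {}
    for ts, value in history:
        buckets.setdefault(period_key(ts), []).append(value)
    return {key: mean(values) for key, values in buckets.items()}


def is_within_normal(ts, value, baseline, tolerance_pct):
    """Return True if value is within tolerance_pct of the period's normal value.

    Returns None when there is no history for that period, so the caller
    can tell "no baseline" apart from "anomalous".
    """
    key = period_key(ts)
    if key not in baseline:
        return None
    normal = baseline[key]
    if normal == 0:
        # A percentage deviation is undefined against a zero baseline.
        return value == 0
    deviation_pct = abs(value - normal) / abs(normal) * 100
    return deviation_pct <= tolerance_pct


# Two past Mondays at 09:00 (hypothetical response times in milliseconds).
history = [
    (datetime(2011, 1, 3, 9), 100.0),
    (datetime(2011, 1, 10, 9), 110.0),
]
baseline = build_baseline(history)

# A new Monday 09:00 sample, checked against a 10% tolerance.
ok = is_within_normal(datetime(2011, 1, 17, 9), 112.0, baseline, 10)
spike = is_within_normal(datetime(2011, 1, 17, 9), 150.0, baseline, 10)
```

A real system would likely use a more robust statistic than a plain mean and would account for seasonality across the year, but the shape of the question is the same: compare each value with what is normal for its period rather than with a fixed threshold.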
There are several solutions on the market that provide normalization and statistical analysis for improving alerts. But for these to be effective, we must also remember that all elements of the infrastructure and application must be instrumented and provide data. Another disaster, the Three Mile Island nuclear power plant failure, can be directly traced to incomplete infrastructure monitoring that led to an incorrect conclusion about the root cause of the problem.
Monitoring is useless if it is not: 1) covering all potential points of failure, and 2) using normalization and statistical analysis to make sense of the data. As the Oil Spill Commission points out, you can’t expect a person to spend hours in front of a screen and detect minute variations that are the warning signs of impending disaster.
Comments
Automation of the Complexity
Great analogy, JP. We have far more information to track and correlate these days than ever before, and it is growing more complex with dynamic infrastructures. How can folks afford to play high stakes poker without the insurance policy? Read more here: http://bit.ly/hhnEJO
Michele Hudnall @HudnallsHuddle
mhudnall@novell.com
www.businessservicemanagementhub.com @BSMHub