With business principles gaining ground in the IT arena, I get a lot of inquiries from Infrastructure & Operations professionals who are trying to figure out how to justify an investment in a particular security, monitoring, or management product or solution. Most vendor marketing teams see this either as a futile quest not worth the resources or as too intimidating to even begin. As a result, IT vendors have given their customers little more than marketing words: lower TCO, more efficient, higher value, more secure, more reliable. It's a bummer, since the request is a valid concern for any IT organization. Other industries with complex products and services, such as nuclear power plants, medical delivery systems, and air traffic control, weigh risk against reward all the time to justify their investments. They all use some form of probabilistic risk assessment (PRA) to quantify technological, financial, and programmatic risk, and then combine that risk with disaster costs: revenue losses, productivity losses, compliance and/or reporting penalties, penalties and loss of discounts, impact to customers and strategic partners, and impact to cash flow.

PRA teams use fault tree analysis (FTA) for top-down assessment and failure mode and effects analysis (FMEA) for bottom-up assessment.

  • FTA combines probabilities, fault logic, hierarchical structures, and graphical displays into an analytical technique that, in the context of a system's environment and operation, finds all realistic ways a failure could arise. The faults can be component hardware failures, human errors, software errors, or any other pertinent events that can lead to disruption. Because failure combinations involving hundreds or thousands of contributing causes are difficult to find by manual analysis, software assists with combinatorial methods and simulation.
  • FMEA is an operations management procedure that analyzes the potential failure modes within a system and classifies them by the severity and likelihood of each failure. Drawing on past experience with similar products or processes, an FMEA helps a team identify potential failure modes early, so it can design those failures out of the system with minimal effort and resource expenditure, reducing development time and costs. It is widely used in manufacturing across various phases of the product life cycle and is increasingly used in the service industry.
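To make the two techniques a little more concrete, here is a minimal Python sketch. The fault tree part evaluates AND and OR gates under the simplifying assumption that all basic events are statistically independent (real FTA tools also handle common-cause failures, repair, and minimal cut sets). The FMEA part ranks failure modes by the conventional risk priority number, RPN = severity × occurrence × detection, each rated 1 to 10. The function names, event names, probabilities, and ratings are all hypothetical illustrations, not data from any real system.

```python
from dataclasses import dataclass
from math import prod

# --- Fault tree analysis (top-down) ---
# Simplifying assumption: all basic events are statistically independent.

def and_gate(*probs: float) -> float:
    """The gate's output event occurs only if every input event occurs."""
    return prod(probs)

def or_gate(*probs: float) -> float:
    """The gate's output event occurs if at least one input event occurs."""
    return 1.0 - prod(1.0 - p for p in probs)

# --- FMEA (bottom-up) ---

@dataclass
class FailureMode:
    name: str
    severity: int    # 1 (negligible) .. 10 (catastrophic)
    occurrence: int  # 1 (rare) .. 10 (almost certain)
    detection: int   # 1 (almost always caught) .. 10 (rarely caught before impact)

    @property
    def rpn(self) -> int:
        """Risk priority number: higher values get attention first."""
        return self.severity * self.occurrence * self.detection

if __name__ == "__main__":
    # Hypothetical annual probabilities for the basic events of one service.
    p_disk_a = p_disk_b = 0.02   # each disk in a mirrored pair
    p_power = 0.01               # power failure
    p_config_error = 0.05        # human configuration error

    # Storage is lost only if both mirrored disks fail (AND gate);
    # the service goes down if storage, power, or configuration fails (OR gate).
    p_storage = and_gate(p_disk_a, p_disk_b)
    p_outage = or_gate(p_storage, p_power, p_config_error)
    print(f"P(service outage in a year) = {p_outage:.4f}")

    # Hypothetical failure modes with 1-10 ratings.
    modes = [
        FailureMode("Certificate expires unnoticed", 7, 4, 3),
        FailureMode("Backup job fails silently", 9, 3, 8),
        FailureMode("Monitoring agent crashes", 5, 5, 6),
    ]
    for m in sorted(modes, key=lambda m: m.rpn, reverse=True):
        print(f"RPN {m.rpn:4d}  {m.name}")
```

With these made-up inputs, the sketch prints an outage probability of about 0.0599 and ranks the silently failing backup job first (RPN 216), ahead of the agent crash (150) and the certificate expiry (84).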

Each failure has an associated severity, probability, and frequency. Together these yield a risk probability that can be multiplied against the disaster costs outlined in Stephanie Balaouras' report, Building The Business Case For Disaster Recovery Spending. The result gives business decision-makers a dollar value for the cost of not having visibility, control, resiliency, and security.
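The multiplication described above can be sketched in Python as an expected-loss calculation. This assumes each disruption has a fixed dollar cost built from the cost categories named earlier, and that a product's effect can be expressed as a lower expected number of disruptions per year. The class and function names, and every dollar figure and rate, are hypothetical; none of them come from Balaouras' report.

```python
from dataclasses import dataclass

@dataclass
class DisasterCost:
    """Cost of a single disruption, split into the categories listed earlier.
    All figures used below are hypothetical."""
    revenue_loss: float = 0.0
    productivity_loss: float = 0.0
    compliance_penalties: float = 0.0
    lost_discounts: float = 0.0
    customer_partner_impact: float = 0.0
    cash_flow_impact: float = 0.0

    def total(self) -> float:
        return (self.revenue_loss + self.productivity_loss + self.compliance_penalties
                + self.lost_discounts + self.customer_partner_impact
                + self.cash_flow_impact)

def expected_annual_loss(events_per_year: float, cost: DisasterCost) -> float:
    """Expected yearly loss = expected disruptions per year x cost per disruption.
    For rare events, an annual probability (e.g. from a fault tree) is a close
    approximation of the expected number of events per year."""
    return events_per_year * cost.total()

def net_annual_benefit(rate_before: float, rate_after: float,
                       cost: DisasterCost, annual_tool_cost: float) -> float:
    """Expected loss avoided by the investment, minus what it costs per year."""
    avoided = expected_annual_loss(rate_before, cost) - expected_annual_loss(rate_after, cost)
    return avoided - annual_tool_cost

if __name__ == "__main__":
    outage = DisasterCost(revenue_loss=500_000, productivity_loss=120_000,
                          compliance_penalties=50_000, customer_partner_impact=80_000)
    # Hypothetical: a monitoring product cuts the outage rate from 0.06 to 0.02 per year.
    benefit = net_annual_benefit(0.06, 0.02, outage, annual_tool_cost=20_000)
    print(f"Expected loss per outage: ${outage.total():,.0f}")
    print(f"Net annual benefit: ${benefit:,.0f}")
```

With these made-up inputs, each outage costs $750,000, the product avoids $30,000 of expected loss per year against a $20,000 annual cost, and the net benefit comes out to $10,000 a year.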