Events are, and have been for quite some time, the fundamental elements of real-time IT infrastructure monitoring. Any status change, threshold crossed in device usage, or step performed in a process generates an event that needs to be reported, analyzed, and acted upon by IT operations.
Historically, the lower layers of IT infrastructure (i.e., network components and hardware platforms) have been regarded as the most prone to hardware and software failures and have therefore received most of the attention and most of the management software investment. In reality, today’s failures are much more likely to come from applications and from the management of platform and application updates than from the hardware platforms. Increased infrastructure complexity has multiplied the number of events reported on IT management consoles.
Over the years, several solutions have been developed to extract the truth from the clutter of event messages. Network management pioneered solutions such as rule engines and codebooks. The idea was to determine, among a group of related events, the original straw that broke the camel’s back. We then moved on to more sophisticated statistical and pattern analysis: using historical data, we could determine what was normal at any given time for a group of parameters. This not only reduces the number of events; it also eliminates false alerts and provides predictive analysis based on how parameter values evolve over time.
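To make the statistical approach concrete, here is a minimal sketch of a time-of-day baseline. It is only an illustration under stated assumptions: the function names (`build_baseline`, `is_abnormal`), the hourly bucketing, the three-standard-deviation rule, and the sample CPU readings are all hypothetical and not taken from any particular product.

```python
from collections import defaultdict
from statistics import mean, stdev


def build_baseline(history):
    """Learn what is 'normal' per hour of day.

    history: iterable of (hour_of_day, value) pairs (hypothetical format).
    Returns {hour: (mean, sample standard deviation)}.
    """
    by_hour = defaultdict(list)
    for hour, value in history:
        by_hour[hour].append(value)
    # stdev needs at least two samples, so hours with less history are skipped.
    return {h: (mean(v), stdev(v)) for h, v in by_hour.items() if len(v) >= 2}


def is_abnormal(baseline, hour, value, k=3.0):
    """True when value is more than k standard deviations from that hour's mean."""
    if hour not in baseline:
        return False  # no history for this hour: do not raise an alert
    mu, sigma = baseline[hour]
    if sigma == 0:
        return value != mu
    return abs(value - mu) > k * sigma


# Invented CPU utilization readings (%): busy at 9 a.m., quiet at 3 a.m.
history = [(9, 70), (9, 74), (9, 72), (9, 71), (3, 10), (3, 12), (3, 11), (3, 9)]
baseline = build_baseline(history)

print(is_abnormal(baseline, 9, 73))  # a 73% reading during the busy hour
print(is_abnormal(baseline, 3, 73))  # the same reading in the middle of the night
```

The point of the sketch is that the same reading produces an event at one time of day and none at another, which is how a baseline cuts down on false alerts compared with a single fixed threshold.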
The next step, which has been used in industrial process control and in business activities and is now finding its way into IT management solutions, is complex event processing (CEP).
Infrastructure diversity is one important component of many IT infrastructures’ complexity. Even at a time when organizations are standardizing on x86 hardware, they often maintain separate support groups by type of operating system. In the meantime, we see even more technology diversity developing in a relentless pursuit of performance and, ironically, simplification. This raises a simple question: Should we, for the sake of operational efficiency, standardize at the lowest possible level, e.g., the computing platform, or at a much higher level, e.g., the user interface?
In the past months, I think a clear answer was provided by the mainframe world. One key element that actually limits mainframe expansion in some data centers is the perception among higher levels of management that the mainframe is an obsolete, complex-to-operate platform, too radically different from the Linux and Windows operating systems. This comes from the fact that most mainframe management solutions use an explicit interface for configuration and deployment that requires detailed knowledge of mainframe specifics. Mastering it requires skills and experience that unfortunately do not seem to be taught in most computer science classes. Because mainframe education is lacking, the issue seems to be more acute than in other IT segments. This would eventually condemn the mainframe when all the baby boomers decide that they would rather golf in Florida.
This whole perception was shattered by two major announcements. The most recent one is the new IBM zEnterprise platform, which brings a mix of hardware and software platforms together under a single administration interface. In doing this, IBM provides a solution that actually abstracts the platforms’ diversity and removes the need for different administrators versed in the vagaries of the different operating systems.
One of the great revolutions in manufacturing of the past decades is just-in-time inventory management. The basic idea is to provision only what is needed for a certain level of operation and to put in place a number of management functions that will trigger the provisioning of inventory. This is one of the key elements that allowed manufacturers to contain production costs. We have been trying to adapt the concept to IT for years with little success, but a combination of the latest technologies is finally bringing the concept to a working level. IT operations often faces unpredictable workloads or large variations of workloads during peak periods. Typically, the solution is to over-provision infrastructure capacity and use a number of potential corrective measures: load balancing, traffic shaping, fast reconfiguration and provisioning of servers, etc.
Among critical industrial processes, IT is probably the only one where control and management come as an afterthought. Blame it on product vendors or on immature clients, but it seems that IT management always takes a back seat to application functionality.
IT operations is seen as a purely tactical activity, but this should not obscure the need for a management strategy. Acquiring products on a whim and hastily putting together an ad hoc process to use them is a recipe for chaos. When infrastructure management, which is supposed to bring order and control to IT, leads the way to anarchy, a meltdown is a foregone conclusion.
Most infrastructure management products present a high level of usefulness and innovation. One should, however, be conscious of the vendors’ limitations. Vendors spend a lot of time talking about the mythical customer needs, while most of them have no experience of IT operations. Consequently, their horizon is limited to the technology they have, and that tree hides the forest. Clients should carefully select products for the role they play in the overall infrastructure management strategy, not solely on the basis of immediate relief. As the world of IT operations becomes more complex every day, the value of an IT management product lies not only in its capability to resolve an immediate issue, but also in its ability to participate in future management solutions. The tactical and strategic constraints should not be mutually exclusive.
The choice between different forms of cloud computing (mostly IaaS and SaaS) and their comparison with internal deployment of IT business services must be based on objective criteria. But this is mostly uncharted territory in IT. Many organizations have difficulty implementing a realistic chargeback solution, and the real cost of business services is often an elusive target. We all agree that IT needs a better form of financial management, even though 80% of organizations will consider it primarily as a means for understanding where to cut costs rather than a strategy to drive a better IT organization.
Financial management will help IT better understand its cost structure in all dimensions, but this is not enough to make an informed choice between internal and external deployment of a business service. I think that the problem of which deployment model to choose requires a new methodology that draws its data from financial management. As I often do, I turned to manufacturing to see how they deal with this type of analysis and cost optimization. The starting point is of course an architectural model of the “product”, and this effectively shows how valuable these models are in IT. The two types of analysis, FAST (Function Analysis System Technique) and QFD (Quality Function Deployment), combine into a “Value Analysis Matrix” that lists the customer requirements against the way these requirements are answered by the “product” (or business service) components. Each of these components has a weight (derived from its correlation with the customer requirements) and a cost associated with it. Analyzing several models (for example, a SaaS model against an internal deployment) would not only lead to an informed decision but also open the door to an optimization of the service cost.
I think that such a methodology would complement a financial management product and help IT become more efficient.
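As a rough illustration of how such a matrix could be computed, the sketch below weights each service component by its correlation with the customer requirements and compares that weight with the component's share of the cost. Everything in it is an assumption made for the example: the requirement names, importance ratings, correlation strengths on the common 0/1/3/9 QFD scale, the cost figures, and the "value index" (weight share divided by cost share), which is one common value-engineering convention rather than something prescribed above.

```python
def value_analysis(importance, correlation, cost):
    """Return {component: (weight_share, cost_share, value_index)}.

    importance:  {requirement: rating given by the customer}
    correlation: {component: {requirement: strength of contribution}}
    cost:        {component: cost, in any consistent unit, assumed > 0}
    """
    # Raw weight = sum over requirements of (importance x correlation strength).
    raw = {
        comp: sum(importance[req] * strength for req, strength in links.items())
        for comp, links in correlation.items()
    }
    total_weight = sum(raw.values())
    total_cost = sum(cost.values())
    result = {}
    for comp in correlation:
        weight_share = raw[comp] / total_weight
        cost_share = cost[comp] / total_cost
        # Index above 1: the component delivers more value than its cost share.
        result[comp] = (weight_share, cost_share, weight_share / cost_share)
    return result


# Hypothetical business service with three components.
importance = {"availability": 5, "response time": 3, "self-service": 2}
correlation = {
    "hosting": {"availability": 9, "response time": 3},
    "application": {"response time": 9, "self-service": 9},
    "support desk": {"availability": 1, "self-service": 3},
}
# Invented cost breakdowns for the two deployment models being compared.
models = {
    "internal": {"hosting": 60, "application": 30, "support desk": 10},
    "SaaS": {"hosting": 30, "application": 50, "support desk": 20},
}

for name, cost in models.items():
    print(name)
    for comp, (w, c, idx) in value_analysis(importance, correlation, cost).items():
        print(f"  {comp:13s} weight={w:.2f} cost={c:.2f} value index={idx:.2f}")
```

Components whose index falls well below 1 in a given model are the natural candidates for cost optimization, and comparing the indexes across models gives a more objective basis for the deployment decision than total cost alone.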
Technology growth is exponential. We all know about Moore’s Law, by which the density of transistors on a chip doubles every two years; but there is also Watts Humphrey’s comment that the size of software doubles every two years, Nielsen’s Law, by which Internet bandwidth available to users doubles every two years, and many others concerning storage, computing speed, and power consumption in a data center. IT organizations, and especially IT operations, must cope with this influx of technology, which brings more and more services to the business, while still managing legacy services and technology. I believe that the two most important roadblocks that prevent IT from optimizing its costs are in fact diversity and complexity. Cloud computing, whether SaaS or IaaS, is going to add diversity and complexity, as is virtualization in its current form. This is illustrated by the following chart, which compiles answers to the question: “Approximately how many physical servers with the following processor types does your firm operate that you know about?”
While virtualization could potentially reduce the number of servers in each category, it does not address the diversity of servers, nor does it address the complexity of services running on these diverse technologies.
The marriage of Gomez and Compuware is starting to bear fruit. One of the key aspects of web application performance management is end user experience. This is approached largely from the data center standpoint, within the firewall. But the best way to understand the real customer experience is to have an agent sitting on the customer side of the application, outside the firewall, a possibility that is clearly out of bounds for most public-facing applications. The Gomez-Compuware alliance marks the first time that these two sides are brought together within the same management application, Compuware Vantage. What Vantage brings to the equation is the Application Performance Management (APM) view of IT operations: response time collected from the network and correlated with infrastructure and application monitoring in the data center. But it’s not the customer view. What Gomez brings with its recent version, the “Gomez Winter 2010 Platform Release”, is a number of features that let IT understand what goes on beyond the firewall: not only how the application content was delivered, but also how the additional content from external providers was delivered and what the actual performance was at the end user level. The outside-in view of the application is now combined with the inside-out view of IT operations provided by Vantage APM. And this is now spreading beyond the pure desktop/laptop user group to reach the growing mobile and smartphone crowd. IT used to be able to answer the question “Is it the application or the infrastructure?” with Vantage. IT can now answer a broader set of questions, such as “Is it the application, the internet service provider, or the web services providers?”, for an increasingly broad range of use-case scenarios.