Complex Event Processing and IT Automation

Events are, and have long been, the fundamental elements of real-time IT infrastructure monitoring. Any status change, threshold crossed in device usage, or step performed in a process generates an event that must be reported, analyzed, and acted upon by IT operations.

Historically, the lower layers of IT infrastructure (i.e., network components and hardware platforms) have been regarded as the most prone to hardware and software failures, and have therefore attracted most of the attention and most of the management software investment. In reality, today’s failures are far more likely to originate in applications, and in the management of platform and application updates, than in the hardware platforms. This increased infrastructure complexity has multiplied the events reported on IT management consoles.

Over the years, several solutions have been developed to extract the truth from the clutter of event messages. Network management pioneered approaches such as rule engines and codebook correlation. The idea was to determine, among a group of related events, the original event that set off the cascade — the root cause. We then moved on to more sophisticated statistical and pattern analysis: using historical data, we could determine what was normal at any given time for a group of parameters. This not only reduces the number of events; it also weeds out false alerts and enables predictive analysis based on how parameter values evolve over time.
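To make the baselining idea concrete, here is a minimal sketch in Python. The metric, the hour-of-day bucketing, and the three-sigma tolerance are illustrative assumptions, not a description of any particular product:

    # A toy baseline detector: learns what is "normal" for a metric at a
    # given hour of day from historical samples, then flags only readings
    # that fall outside that band, instead of alerting on every threshold.
    from collections import defaultdict
    from statistics import mean, stdev

    class BaselineDetector:
        def __init__(self, tolerance=3.0):
            self.history = defaultdict(list)   # hour-of-day -> past values
            self.tolerance = tolerance         # allowed deviation in sigmas

        def record(self, hour, value):
            self.history[hour].append(value)

        def is_abnormal(self, hour, value):
            samples = self.history[hour]
            if len(samples) < 2:               # not enough data to judge
                return False
            mu, sigma = mean(samples), stdev(samples)
            return abs(value - mu) > self.tolerance * max(sigma, 1e-9)

    detector = BaselineDetector()
    for day in range(30):                      # a month of quiet 9am CPU readings
        detector.record(9, 40.0 + day % 5)
    print(detector.is_abnormal(9, 43.0))       # False: within the normal band
    print(detector.is_abnormal(9, 95.0))       # True: abnormal pattern

Readings inside the learned band are suppressed rather than reported, which is precisely how baselining cuts event volume and false alerts.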

The next step, which has been used in industrial process control and in business activities and is now finding its way into IT management solutions, is complex event processing (CEP). 

About 10 years ago, I worked in IT operations at a company that was launching an IPO. Naturally, the number of hits on the company’s Web site doubled and even tripled in the days before the launch. This is an obvious and simplistic example, but it leads to different conclusions depending on the event management approach used. Network management rule engines received events from the infrastructure: router memory low, packets dropped, server CPU usage too high, database server performance alerts, etc. With no idea what caused this sudden increase (of course we knew, but this is just a simple example), IT operations might have been tempted to permanently increase the Web site infrastructure capacity. If IT operations had been using predictive analysis, it would have detected an abnormal pattern and predicted the impending crash of the Web site in time to do something about it, but it would have fared no better at identifying the true root cause of the problem.

Complex event processing adds the business event dimension — for example, the fact that the company made its intention public at a specific point in time. Processing this business event together with the infrastructure events tells IT operations why the traffic increased, suggests that it is most probably a transient phenomenon, and therefore recommends a temporary increase in capacity. If a private cloud is in place, it could even trigger the provisioning of this extra, temporary capacity.
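Here is a minimal sketch of such a correlation rule in Python. The event names, the 72-hour window, and the burst threshold are all hypothetical; a real CEP engine would express this declaratively over live streams rather than over a list in memory:

    # A toy CEP rule for the IPO scenario: if a burst of infrastructure
    # alerts arrives within a window following a known business event, the
    # load spike is explained and likely transient, so recommend temporary
    # capacity instead of a permanent upgrade. All names are illustrative.
    from dataclasses import dataclass

    WINDOW = 72 * 3600          # correlation window: 72 hours, in seconds
    BURST_THRESHOLD = 4         # how many infra alerts count as a burst

    @dataclass
    class Event:
        kind: str               # "business" or "infra"
        name: str
        timestamp: float        # seconds since epoch

    def correlate(events):
        business = [e for e in events if e.kind == "business"]
        infra = [e for e in events if e.kind == "infra"]
        for b in business:
            burst = [i for i in infra if 0 <= i.timestamp - b.timestamp <= WINDOW]
            if len(burst) >= BURST_THRESHOLD:
                return (f"Load spike explained by '{b.name}'; likely transient. "
                        f"Recommend temporary capacity increase.")
        return "No business cause found; investigate infrastructure root cause."

    stream = [
        Event("business", "IPO announcement", 0),
        Event("infra", "router memory low", 3600),
        Event("infra", "packets dropped", 7200),
        Event("infra", "server CPU high", 10800),
        Event("infra", "database performance alert", 14400),
    ]
    print(correlate(stream))

The point is the inference step: the same infrastructure alerts produce a different recommendation once a correlated business event explains them.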

Several APM-BTM (application performance management and business transaction management) vendors have announced a CEP capability in their solutions. Building CEP rules is not simple and requires a good dose of analysis and cooperation between business and IT, but the reward is certainly worth the effort.

I hope I did some justice to CEP and welcome your comments.

Comments

Real-time Intelligence combined with Automation is Key

JP, I totally agree. Over the last couple of years, we’ve seen more and more customers needing to sense and respond to business events in real time. Business events are leading indicators: they help ensure successful business and IT automation, exceed SLAs, and manage the IT environment on demand. (It’s why UC4 acquired Senactive!)

The power of process automation in combination with CEP enables customers to fully leverage the elasticity that cloud and virtualization offer. You need the intelligence to interpret (we call it “context-awareness”) business indicators and IT events, but you also need the end-to-end view of the automated process to know which process steps are coming next. The combination gives you the patterns and the ability to predict what action is required (essentially, to do the right thing at the right time). We’re seeing a broad range of use cases for CEP and automation across multiple industries: e-commerce, financial services, utilities, and more.

Complexity is a Good Thing Again

This is exactly why I think complexity is a good thing again. Over the years, we’ve been attacking the word ‘complexity’: the implication was that complexity was unnecessary, difficult to manage, and thus needlessly expensive. But today, we are achieving fantastic savings in capital expenses while increasing flexibility. Solutions such as virtualization, cloud computing, security, and network management are smart and robust, yet (unavoidably) complex in their very nature.

So the battle for IT can now be focused not necessarily on ‘reducing complexity’, but rather on ‘improving the efficiency’ of managing across the variety of complex IT domains. In other words, the goal is to achieve consistent service levels, higher-quality offerings, and cost optimization.

IT process automation is the key. Until now, each of the various IT systems has been optimized in and of itself, but we haven’t yet optimized the day-to-day operational activity and the troubleshooting and diagnostic steps that keep these systems working together. As much as we all want to believe that our environment can become one well-oiled machine that requires no human interaction, it just isn’t a reality. This is not because we haven’t figured out how to get people out of the loop; it is because, in most cases, people are an integral part of that loop!

That’s why we at Ayehu Software developed a solution to handle complex events, or, as we call them at Ayehu, critical situations. Using process automation, we help IT operations manage and resolve critical problems in a simple and efficient way.

CEP - From obscure to obvious

At least the need is.

I agree, and what’s becoming more obvious (even if “CEP” is still a difficult TLA) is how valuable it can be to unite various pieces of event and historical data in order to understand impacts and predict possible issues and opportunities.

While the moniker “CEP” may not be how an end solution is thought of, it seems that most applications will need varying levels of event awareness and intelligence to help companies. From fraud identification to time-of-engagement upselling, I think we can all agree that time is more of the essence, and rapid responsiveness is moving from a “good to have” to a “must-have.”

Being Complex Makes Transaction Assurance Simpler

JP - I read this posting with interest. I agree with other comments that “complex” has put some people off CEP, but letting a system handle the complexity can actually make life simpler. I like the convergence of BTM and CEP: it deepens CEP’s visibility and strengthens BTM’s correlation and pattern detection capabilities. I agree with your comment that encoding the CEP rules can be the challenge, but often domain experts already have the rules and just need a way to express them. We’ve found that in many industries, including financial services, telco, logistics, airlines, gaming, etc.

I included a ref to your blog post in a post I've just written here: http://blogs.progress.com/business_making_progress/2010/09/how-being-com...

JB