As businesses get serious about the cloud, developers are bringing more business-critical transaction data to cloud-resident web and mobile apps. Indeed, web and mobile apps that drive systems of engagement (how you interact with your customers and partners) are the reason why many companies look to the cloud in the first place. Public clouds offer the speed and agility developers want, plus the development tools they need. Once you’ve built a killer web or mobile app in the cloud and it’s in production, driving real revenue, who’s responsible for making sure it performs?
It’s a team effort. Developers have to think about performance management as they build, and IT operations teams need to design application monitoring and management into their cloud deployment processes up front. Why? Because there’s no time to do it later. You won’t have time to implement a new app monitoring solution for each new cloud app before you need to get it out to users. And once it’s out there, you need to be tracking user experience immediately.
In traditional IT, one of the reasons we could get away with limited insight into application performance was that we usually overprovisioned resources so we didn’t have to worry about it. It’s easier to have excess capacity than to solve tricky performance problems, the kind you might only see once in a while.
Bridgekeeper: "What ... is your name?"
Traveler: "John Swainson of Dell."
Bridgekeeper: "What ... is your quest?"
Traveler: "Hey! That's not a bad idea!"
We suspect Dell's process was more methodical than that!
This acquisition was not a surprise, of course. All along, it has been obvious that Dell needed stronger assets in software as it continues on its quest to avoid the Gorge of Eternal Peril that is spanned by the Bridge of Death. When the company announced that John Swainson was joining to lead the newly formed software group, astute industry watchers knew the next steps would include an ambitious acquisition. We predicted such an acquisition would be one of Swainson's first moves, and after only four months on the job, indeed it was.
I have been working on a research document, to be published this quarter, on the impact of 8-socket x86 servers based on Intel’s new Xeon 7500 CPU. In a nutshell, these systems have the performance of the best-of-breed RISC/UNIX systems of three years ago, at a substantially better price, and their overall performance improvement trajectory has been steeper than competing technologies for the past decade.
This is probably not shocking news and is not the subject of this current post, although I would encourage you to read the document when it is finally published. During the course of researching it, I spent time trying to use available benchmark results to prove or disprove my thesis that x86 system performance solidly overlaps that of RISC/UNIX. The process highlighted for me the limitations of using standardized benchmarks for performance comparisons. There are now so many benchmarks available that system vendors run each one only on selected subsets of their product lines, if at all. Additionally, most benchmarks suffer from several common flaws:
They are results from high-end configurations, in many cases far beyond any normal use case, and the results cannot be interpolated down to smaller, more realistic configurations.
They are often the result of teams of very smart experts tuning the system configurations and the application and system software parameters for optimal results. For a large benchmark such as SAP or TPC, it is probably reasonable to assume that there are over 1,000 variables involved in the tuning effort. This makes the results very much like EPA mileage figures: the consumer is guaranteed not to exceed these numbers.
Events are, and have been for quite some time, the fundamental elements of real-time IT infrastructure monitoring. Any status change, any threshold crossed in device usage, or any step performed in a process generates an event that needs to be reported, analyzed, and acted upon by IT operations.
Historically, the lower layers of IT infrastructure (i.e., network components and hardware platforms) have been regarded as the most prone to hardware and software failures and have therefore been the object of all attention and of most management software investments. In reality, today’s failures are much more likely to come from the application, and from the management of platform and application updates, than from the hardware platforms. Increased infrastructure complexity has in turn multiplied the events reported on IT management consoles.
Over the years, several solutions have been developed to extract the truth from the clutter of event messages. Network management pioneered solutions such as rule engines and codebook correlation. The idea was to determine, among a group of related events, the original straw that broke the camel’s back. We then moved on to more sophisticated statistical and pattern analysis: using historical data, we could determine what was normal at any given time for a group of parameters. This approach not only reduces the number of events, it also eliminates false alerts and provides a predictive analysis based on how parameter values evolve over time.
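To make the idea of a historical baseline concrete, here is a minimal sketch in Python. It assumes the simplest possible model: one mean and standard deviation per hour of day, with a value flagged when it falls more than k standard deviations from that hour's mean. The function names, the hour-of-day bucketing, and the default k of 3 are illustrative assumptions, not a description of how any particular monitoring product works.

```python
from collections import defaultdict
from statistics import mean, stdev


def build_baseline(history):
    """Build a per-hour baseline from (hour_of_day, value) samples.

    Returns {hour: (mean, stdev)}. Hours with fewer than two samples
    are skipped because a standard deviation cannot be computed.
    """
    by_hour = defaultdict(list)
    for hour, value in history:
        by_hour[hour].append(value)
    return {
        hour: (mean(values), stdev(values))
        for hour, values in by_hour.items()
        if len(values) >= 2
    }


def is_anomalous(baseline, hour, value, k=3.0):
    """Return True if value deviates more than k stdevs from the hour's mean.

    Hours with no baseline are treated as "not anomalous" (an assumption;
    a real system might prefer to raise or fall back to a static threshold).
    """
    if hour not in baseline:
        return False
    mu, sigma = baseline[hour]
    return abs(value - mu) > k * sigma


# Hypothetical CPU-utilization samples: busy at 09:00, quiet at 03:00.
history = [(9, 40), (9, 42), (9, 38), (9, 41), (3, 5), (3, 6), (3, 4)]
baseline = build_baseline(history)
```

With this sample data, a reading of 41 is normal at 09:00 but would be flagged at 03:00, which is the point of a time-aware baseline: the same value can be routine or alarming depending on when it occurs, so a single static threshold either misses the night-time anomaly or floods the console during the day.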
The next step, which has been used in industrial process control and in business activities and is now finding its way into IT management solutions, is complex event processing (CEP).
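As a rough illustration of what a CEP rule does, the sketch below watches a stream of simple events and emits one higher-level, composite event when a pattern is matched: a deployment followed by several errors on the same application within a time window. The class name, event kinds, window length, and error count are all hypothetical choices made for the example; real CEP engines typically express such patterns in a dedicated query language rather than hand-written code.

```python
class DeployThenErrors:
    """Hypothetical CEP-style rule: a deploy followed by `min_errors`
    error events on the same app within `window` seconds."""

    def __init__(self, window=60.0, min_errors=3):
        self.window = window
        self.min_errors = min_errors
        self.last_deploy = {}  # app -> timestamp of most recent deploy
        self.error_count = {}  # app -> errors seen since that deploy

    def feed(self, ts, app, kind):
        """Consume one event; return a composite event dict or None."""
        if kind == "deploy":
            # A new deploy restarts the pattern for this app.
            self.last_deploy[app] = ts
            self.error_count[app] = 0
            return None
        if kind != "error" or app not in self.last_deploy:
            return None
        deployed_at = self.last_deploy[app]
        if ts - deployed_at > self.window:
            # Window expired: stop correlating errors with this deploy.
            del self.last_deploy[app]
            del self.error_count[app]
            return None
        self.error_count[app] += 1
        # Emit exactly once, when the threshold is first reached.
        if self.error_count[app] == self.min_errors:
            return {
                "kind": "suspect_deploy",
                "app": app,
                "deployed_at": deployed_at,
                "errors": self.min_errors,
            }
        return None


rule = DeployThenErrors(window=60, min_errors=3)
stream = [(0, "shop", "deploy"), (10, "shop", "error"),
          (20, "shop", "error"), (30, "shop", "error")]
composites = [out for ev in stream if (out := rule.feed(*ev)) is not None]
```

Here four low-level events collapse into a single "suspect_deploy" event, so the operator sees one actionable message that ties the errors to the update that probably caused them.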
The marriage of Gomez and Compuware is starting to bear fruit. One of the key aspects of web application performance management is end user experience. This is approached largely from the data center standpoint, within the firewall. But the best way to understand the real customer experience is to have an agent sitting on the customer side of the application, outside the firewall, a possibility that is clearly out of bounds for most public-facing applications. The Gomez-Compuware alliance marks the first time that these two sides have been brought together within the same management application, Compuware Vantage. What Vantage brings to the equation is the Application Performance Management (APM) view of IT Operations: response time collected from the network and correlated with infrastructure and application monitoring in the data center. But it’s not the customer view. What Gomez brings with its recent version, the “Gomez Winter 2010 Platform Release,” is a number of features that let IT understand what goes on beyond the firewall: not only how the application content was delivered, but also how the additional content from external providers was delivered and what the actual performance was at the end user level. The outside-in view of the application is now combined with the inside-out view of IT Operations provided by Vantage APM. And this is now spreading beyond the pure desktop/laptop user group to reach the growing mobile and smartphone crowd. IT used to be able to answer the question “Is it the application or the infrastructure?” with Vantage. IT can now answer a broader set of questions, such as “Is it the application, the internet service provider, or the web services providers?”, for an increasingly broad range of use-case scenarios.