I recently had the opportunity to spend a day in three separate meetings with infrastructure & operations professionals from three of the top six financial services firms in the country. We discussed topics ranging from long-term business and infrastructure strategy to specific likes and dislikes regarding their Tier-1 vendors and those vendors’ challengers. The meetings were neither classic consulting nor classic briefings. They were free-form discussions, guided only loosely by an agenda and, despite possible Federal regulations to the contrary, completely devoid of PowerPoint presentations. As in the past, these in-depth meetings provided a wealth of food for thought, along with interesting and sometimes contradictory signals from the three groups. There was a lot of material to ponder, but I’ll try to summarize some of the high-level takeaways in this post.
Servers and Vendors
Between them, these companies own in the neighborhood of 180,000 servers and probably purchase 30,000 to 50,000 servers per year across various procurement cycles. In short, these are heavyweight users. One thing that struck me during the conversations was their Machiavellian view of their Tier-1 server vendors. Although they regard these vendors as key partners, most of this group devotes a substantial amount of time to keeping them at arm’s length through aggressive vendor management techniques, such as deliberately splitting procurements between competitors. They understand their suppliers' margins and cost structures well, and are committed to driving hardware supplier margins to “as close to zero as we can,” in the words of one participant.
There has been turmoil and angst in the open source community of late over Oracle’s decision to cancel OpenSolaris. Since this community can be expected to react violently whenever something is taken out of open source, the real question is whether this action has any impact on real-world IT and operations professionals. The short answer is no.
Enterprise Solaris users, be they small, medium, or large, use it to run critical applications. As far as we can tell, the uptake of OpenSolaris, as opposed to Solaris supplied and sold by Sun, was very low in commercial accounts, other than possibly a surge in test and dev environments. The decision to take Solaris into the open source arena was, in my opinion, fundamentally flawed, and Oracle’s subsequent decision to reverse it is eminently rational. Oracle’s customers almost certainly are not going to run their companies on an OS that is built and maintained by an open source community (even the vast majority of corporate Linux use is via a distribution supported by a major vendor under a paid subscription model). Oracle also cannot continue to develop Solaris unless it has absolute control over it, just as is the case with every other enterprise OS. In the same vein, unless Oracle can expect to be compensated for its investments in future Solaris development, it has little motivation to continue investing heavily in Solaris.
Historically, Dell has been positioned as something of an also-ran against its two major competitors for high-value enterprise business, particularly where that business involved complex services and the ability to deliver deeply integrated infrastructure and management stacks. Competitors looked at Dell as a price spoiler and a channel for standard storage and networking offerings from its partners. They did not see it as a potential threat to their high ground of delivering complex integrated infrastructure solutions.
This comforting image of Dell as a glorified box pusher appears to be coming to an end. When my colleague Andrew Reichman recently wrote about Dell’s attempted acquisition of 3PAR, it prompted me to take another look at Dell’s recent pattern of investments. I also revisited the series of announcements Dell has made around delivering integrated infrastructure, with a message and solution offering that looks like it is aimed squarely at HP and IBM's Virtual Fabric.
In a recent discussion with a group of infrastructure architects, power architecture, especially UPS engineering, came up as a topic. There was general agreement that UPS systems are a necessary evil, cumbersome and expensive beasts to put into a data center, and there was a lot of speculation about alternatives. The group agreed that the goal was a solution that would be more granular to install and deploy, allowing easier, ad-hoc decisions about which resources to protect. They also agreed that current battery technologies and UPS architectures were not well suited to that kind of solution.
So what if someone were to suddenly increase R&D investment in battery technology by a factor of perhaps 100x, expand high-capacity battery production enormously, and drive prices down precipitously? That’s a tall order for today’s UPS industry, but it’s happening now courtesy of the auto industry and the anticipated wave of plug-in hybrid cars. Batteries for cars and batteries for computers certainly differ in depth and frequency of charge/discharge cycles, packaging, lifespan, etc. Even so, there is little doubt that investments in dense, powerful automotive batteries and power management technology will bleed through into the data center. Throw in recent developments in high-charge capacitors (referred to in the media as “super capacitors”). These provide an impedance match between spiky power demands and a chemical battery’s dislike of sudden state changes. Add them to the mix and you have all the foundational ingredients for a major transformation in the way we think about supplying backup power to our data center components.
It’s probably fair to say that the computer community is obsessed with speed. After all, people buy computers to solve problems, and generally the faster the computer, the faster the problem gets solved. The earliest benchmark that I have seen is published in “High-Speed Computing Devices,” Engineering Research Associates, McGraw-Hill, 1950. It cites the Marchant desktop calculator as achieving a best-in-class result of 1,350 digits per minute for addition. The pressing problems of the day involved figuring out how to break down Newton-Raphson equation solvers for maximum computational efficiency. And so the race begins…
Not much has changed since 1950. While our appetites are now expressed in GFLOPS per CPU and TFLOPS per system, users continue to push for ever-higher performance on numerically intensive problems. Just as we settled into a relatively predictable performance model, with standard CPUs and cores glued into servers and aggregated into distributed computing architectures of various flavors, along came the notion of attached processors. Attached processors first appeared in the 1960s and 1970s as attached mainframe vector processors and attached floating point array processors for minicomputers, and they have always had devoted and vocal minority support within the industry. My own brush with them was as a developer using a Floating Point Systems array processor attached to a 32-bit minicomputer to speed up a nuclear reactor core power monitoring application. When all was said and done, the 50X performance advantage of the FPS box had shrunk to about 3.5X for the total application. Not bad, but well short of expectations. Subsequent brushes with attempts to integrate DSPs with workstations left me a bit jaundiced about the future of attached processors as general-purpose accelerators.
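The drop from a 50X box speedup to a 3.5X application speedup is the pattern Amdahl's law describes when only part of a program's run time is offloaded. The relationship is overall speedup S = 1 / ((1 - f) + f/s), where f is the fraction of the original run time that is accelerated and s is the accelerator's local speedup. The short Python sketch below inverts that formula to estimate how much of the application the FPS box must actually have covered. It assumes the only effect of the accelerator was to speed up that fraction by 50X and ignores data-transfer overhead between the host and the array processor. That assumption is significant, since such overhead would have eroded the gain further. The function names are hypothetical, chosen for illustration only.

```python
def amdahl_speedup(f: float, s: float) -> float:
    """Overall speedup when a fraction f of run time is accelerated by a factor s."""
    return 1.0 / ((1.0 - f) + f / s)


def implied_accelerated_fraction(overall: float, s: float) -> float:
    """Invert Amdahl's law: the fraction f that yields `overall` speedup given local speedup s."""
    return (1.0 - 1.0 / overall) / (1.0 - 1.0 / s)


if __name__ == "__main__":
    # Assumed inputs taken from the anecdote: 50X local speedup, 3.5X end-to-end result.
    f = implied_accelerated_fraction(3.5, 50.0)
    print(f"accelerated fraction ~ {f:.1%}")          # 72.9%
    print(f"check: {amdahl_speedup(f, 50.0):.2f}X")   # 3.50X
    # Upper bound even with an infinitely fast accelerator: 1 / (1 - f).
    print(f"ceiling: {1.0 / (1.0 - f):.2f}X")         # 3.69X
```

Under these assumptions, roughly 73% of the original run time was offloaded. The remaining 27% caps the application at about 3.7X no matter how fast the attached processor is, which is why a 50X box could only deliver a 3.5X result.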
I’ve been getting a number of inquiries recently regarding benchmarking potential savings from consolidating multiple physical servers onto a smaller number of servers using VMs, usually VMware. The variations in the complexity of the existing versus new infrastructures, operating environments, and applications under consideration make it impossible to come up with consistent rules of thumb, and in most cases, also make it very difficult to predict with any accuracy what the final outcome will be absent a very tedious modeling exercise.
However, the major variables that influence the puzzle remain relatively constant, giving us the ability to at least set out a framework to help analyze potential consolidation projects. This list usually includes:
I spoke today at SHARE’s biannual conference, giving a talk on emerging data center architectures, x86 servers, and internal clouds. SHARE is an organization that describes itself as “representing over 2,000 of IBM's top enterprise computing customers.” In other words, it is definitely a mainframe geekfest, as one attendee put it. I saw hundreds of people around my age (think waaay over 30) and was able to swap stories about my long-ago IBM mainframe programming experience (that’s what we called “Software Engineering” back when it was FORTRAN, COBOL, PL/1, and BAL). I was astounded to see that IMS was still a going concern, with sessions on the agenda. In a show of hands, at least a third of the audience reported still running IMS.
Oh well, dinosaurs may survive in some odd corners of the world, and IMS workloads, while not exciting, are a necessary and familiar facet of legacy applications that have decades of stability and embedded culture behind them…
But wait! Look again at the IMS session right next door to my keynote. It was about connecting zLinux and IMS. Other sessions covered more zLinux, WebSphere, and other seemingly new-age topics. Again, my audience confirmed the sea change in the mainframe world. Far more compelling than any claims by IBM reps was the audience reaction to a question about zLinux: more than half of them indicated that they currently run zLinux, a much higher share than I had anticipated. Further discussions after the session with several zLinux users left me with some strong impressions: