OK, out of respect for your time, now that I’ve caught you with a title that promises some drama I’ll cut to the chase and tell you that I definitely lean toward the former. Having spent a couple of days here at Oracle Open World poking around the various flavors of Engineered Systems, including the established Exadata and Exalogic along with the new SPARC Super Cluster (all of a week old) and the newly announced Exalytic system for big data analytics, I am pretty convinced that they represent an intelligent and modular set of optimized platforms for specific workloads. In addition to being modular, they give me the strong impression of a “composable” architecture – the various elements of processing nodes, Oracle storage nodes, ZFS file nodes and other components can clearly be recombined over time as customer requirements dictate, either as standard products or as custom configurations.
My colleague James Staten recently wrote about AutoDesk Cloud as an exemplar of the move toward App Internet, the concept of implementing applications that are distributed between local and cloud resources in a fashion that is transparent to the user except for the improved experience. His analysis is 100% correct, and AutoDesk Cloud represents a major leap in CAD functionality, intelligently offloading the inherently parallel and intensive rendering tasks and facilitating some aspects of collaboration.
But (and there’s always a “but”), having been involved in graphics technology on and off since the '80s, I would say that “cloud” implementation of rendering and analysis is something that has been incrementally evolving for decades, with hundreds of well-documented distributed environments with desktops fluidly shipping their renderings to local rendering and analysis farms that would today be called private clouds, with the results shipped back to the creating workstations. This work was largely developed and paid for either by universities and by media companies as part of major movie production projects. Some of them were of significant scale, such as “Massive,” the rendering and animation farm for "Lord of the Rings" that had approximately 1,500 compute nodes, and a subsequent installation at Weta that may have up to 7,000 nodes. In my, admittedly arguable, opinion, the move to AutoDesk Cloud, while representing a major jump in capabilities by making the cloud accessible to a huge number of users, does not represent a major architectural innovation, but rather an incremental step.
In the good old days, computer industry trade shows were bigger than life events – booths with barkers and actors, ice cream and espresso bars and games in the booth, magic acts and surging crowds gawking at technology. In recent years, they have for the most part become sad shadows of their former selves. The great SHOWS are gone, replaced with button-down vertical and regional events where you are lucky to get a pen or a miniature candy bar for your troubles.
Enter Oracle OpenWorld. Mix 45,000 people, hundreds of exhibitors, one of the world’s largest software and systems company looking to make an impression, and you have the new generation of technology extravaganza. The scale is extravagant, taking up the entire Moscone Center complex (N, S and W) along with a couple of hotel venues, closing off a block of a major San Francisco street for a week, and throwing a little evening party for 20 or 30 thousand people.
But mixed with the hoopla, which included wheel of fortune giveaways that had hundreds of people snaking around the already crowded exhibition floor in serpentine lines, mini golf and whack-a-mole-games in the exhibit booths along with the aforementioned espresso and ice cream stands, there was genuine content and the public face of some significant trends. So far, after 24 hours, some major messages come through loud and clear:
I just attended IDF and I’ve got to say, Intel has certainly gotten the cloud message. Almost everything is centered on clouds, from the high-concept keynotes to the presentations on low-level infrastructure, although if you dug deep enough there was content for general old-fashioned data center and I&O professionals. Some highlights:
Chips and processors and low-level hardware
Intel is, after all, a semiconductor foundry, and despite their expertise in design, their true core competitive advantage is their foundry operations – even their competitors grudgingly acknowledge that they can manufacture semiconductors better than anyone else on the planet. As a consequence, showing off new designs and processes is always front and center at IDF, and this year was no exception. Last year it was Sandy Bridge, the 22nm shrink of the 32nm Westmere (although Sandy Bridge also incorporated some significant design improvements). This year it was Ivy Bridge, the 22nm “tick” of the Intel “tick-tock” design cycle. Ivy Bridge is the new 22nm architecture and seems to have inherited Intel’s recent focus on power efficiency, with major improvements beyond the already solid advantages of their 22nm process, including deeper P-States and the ability to actually shut down parts of the chip when it is idle. While they did not discuss the server variants in any detail, the desktop versions will get an entirely new integrated graphics processor which they are obviously hoping will blunt AMD’s resurgence in client systems. On the server side, if I were to guess, I would guess more cores and larger caches, along with increased support for virtualization of I/O beyond what they currently have.
Well, maybe everybody is saying “cloud” these days, but my first impression of Microsoft Windows Server 8 (not the final name) is that Microsoft has been listening very closely to what customers want from an OS that can support both public and private enterprise cloud implementations. And most importantly, the things that they have built into WS8 for “clouds” also look like they make life easier for plain old enterprise IT.
Microsoft appears to have focused its efforts on several key themes, all of which benefit legacy IT architectures as well as emerging clouds:
Management, migration and recovery of VMs in a multi-system domain – Major improvements in Hyper-V and management capabilities mean that I&O groups can easily build multi-system clusters of WS8 servers, and easily migrate VMs across system boundaries. Muplitle systems can be clustered with Fibre Channel, making it easier to implement high-performance clusters.
Multi-tenancy – A host of features, primarily around management and role-based delegation that make it easier and more secure to implement multi-tenant VM clouds.
Recovery and resiliency – Microsoft claims that they can failover VMs from one machine to another in 25 seconds, a very impressive number indeed. While vendor performance claims are always like EPA mileage – you are guaranteed never to exceed this number – this is an impressive claim and a major capability, with major implications for HA architecture in any data center.
Last year at VMworld I noted Xsigo Systems, a small privately held company at VMworld showing their I/O Director technology, which delivereda subset of HP Virtual Connect or Cisco UCS I/O virtualization capability in a fashion that could be consumed by legacy rack-mount servers from any vendor. I/O Director connects to the server with one or more 10 G Ethernet links, and then splits traffic out into enterprise Ethernet and FC networks. On the server side, the applications, including VMware, see multiple virtual NICs and HBAs courtesy of Xsigo’s proprietary virtual NIC driver.
Controlled via Xsigo’s management console, the server MAC and WWNs can be programmed, and the servers can now connect to multiple external networks with fewer cables and substantially lower costs for NIC and HBA hardware. Virtualized I/O is one of the major transformative developments in emerging data center architecture, and will remain a theme in Forrester’s data center research coverage.
This year at VMworld, Xsigo announced a major expansion of their capabilities – Xsigo Server Fabric, which takes the previous rack-scale single-Xsigo switch domains and links them into a data-center-scale fabric. Combined with improvements in the software and UI, Xsigo now claims to offer one-click connection of any server resource to any network or storage resource within the domain of Xsigo’s fabric. Most significantly, Xsigo’s interface is optimized to allow connection of VMs to storage and network resources, and to allow the creation of private VM-VM links.
We have been repeatedly reminded that the requirements of hyper-scale cloud properties are different from those of the mainstream enterprise, but I am now beginning to suspect that the top strata of the traditional enterprise may be leaning in the same direction. This suspicion has been triggered by the combination of a recent day in NY visiting I&O groups in a handful of very large companies and a number of unrelated client interactions.
The pattern that I see developing is one of “haves” versus “have nots” in terms of their ability to execute on their technology vision with internal resources. The “haves” are the traditional large sophisticated corporations, with a high concentration in financial services. They have sophisticated IT groups, are capable fo writing extremely complex systems management and operations software, and typically own and manage 10,000 servers or more. The have nots are the ones with more modest skills and abilities, who may own 1000s of servers, but tend to be less advanced than the core FSI companies in terms of their ability to integrate and optimize their infrastructure.
The divergence in requirements comes from what they expect and want from their primary system vendors. The have nots are companies who understand their limitations and are looking for help form their vendors in the form of converged infrastructures, new virtualization management tools, and deeper integration of management software to automate operational tasks, These are people who buy HP c-Class, Cisco UCS, for example, and then add vendor-supplied and ISV management and automation tools on top of them in an attempt to control complexity and costs. They are willing to accept deeper vendor lock-in in exchange for the benefits of the advanced capabilities.
A project I’m working on for an approximately half-billion dollar company in the health care industry has forced me to revisit Hyper-V versus VMware after a long period of inattention on my part, and it has become apparent that Hyper-V has made significant progress as a viable platform for at least medium enterprises. My key takeaways include:
Hyper-V has come a long way and is now a viable competitor in Microsoft environments up through mid-size enterprise as long as their DR/HA requirements are not too stringent and as long as they are willing to use Microsoft’s Systems Center, Server Management Suite and Performance Resource Optimization as well as other vendor specific pieces of software as part of their management environment.
Hyper-V still has limitations in VM memory size, total physical system memory size and number of cores per VM compared to VMware, and VMware boasts more flexible memory management and I/O options, but these differences are less significant that they were two years ago.
For large enterprises and for complete integrated management, particularly storage, HA, DR and automated workload migration, and for what appears to be close to 100% coverage of workload sizes, VMware is still king of the barnyard. VMware also boasts an incredibly rich partner ecosystem.
For cloud, Microsoft has a plausible story but it is completely wrapped around Azure.
While I have not had the time (or the inclination, if I was being totally honest) to develop a very granular comparison, VMware’s recent changes to its legacy licensing structure (and subsequent changes to the new pricing structure) does look like license cost remains an attraction for Microsoft Hyper-V, especially if the enterprise is using Windows Server Enterprise Edition.
NVIDIA recently shared a case study involving risk calculations at a JP Morgan Chase that I think is significant for the extreme levels of acceleration gained by integrating GPUs with conventional CPUs, and also as an illustration of a mainstream financial application of GPU technology.
JP Morgan Chase’s Equity Derivatives Group began evaluating GPUs as computational accelerators in 2009, and now runs over half of their risk calculations on hybrid systems containing x86 CPUs and NVIDIA Tesla GPUs, and claims a 40x improvement in calculation times combined with a 75% cost savings. The cost savings appear to be derived from a combination of lower capital costs to deliver an equivalent throughput of calculations along with improved energy efficiency per calculation.
Implicit in the speedup of 40x, from multiple hours to several minutes, is the implication that these calculations can become part of a near real-time business-critical analysis process instead of an overnight or daily batch process. Given the intensely competitive nature of derivatives trading, it is highly likely that JPMC will enhance their use of GPUs as traders demand an ever increasing number of these calculations. And of course, their competition has been using the same technology as well, based on numerous conversations I have had with Wall Street infrastructure architects over the past year.
My net take on this is that we will see a succession of similar announcements as GPUs become a fully mainstream acceleration technology as opposed to an experimental fringe. If you are an I&O professional whose users are demanding extreme computational performance on a constrained space, power and capital budget, you owe it to yourself and your company to evaluate the newest accelerator technology. Your competitors are almost certainly doing so.
Over the past months server vendors have been announcing benchmark results for systems incorporating Intel’s high-end x86 CPU, the E7, with HP trumping all existing benchmarks with their recently announced numbers (although, as noted in x86 Servers Hit The High Notes, the results are clustered within a few percent each other). HP recently announced new performance numbers for their ProLiant DL980, their high-end 8-socket x86 server using the newest Intel E7 processors. With up to 10 cores, these new processors can bring up to 80 cores to bear on large problems such as database, ERP and other enterprise applications.
The performance results on the SAP SD 2-Tier benchmark, for example, at 25160 SD users, show a performance improvement of 35% over the previous high-water mark of 18635. The results seem to scale almost exactly with the product of core count x clock speed, indicating that both the system hardware and the supporting OS, in this case Windows Server 2008, are not at their scalability limits. This gives us confidence that subsequent spins of the CPU will in turn yield further performance increases before hitting system of OS limitations. Results from other benchmarks show similar patterns as well.
Key takeaways for I&O professionals include:
Expect to see at least 25% to 35% throughput improvements in many workloads with systems based on the latest the high-performance PCUs from Intel. In situations where data center space and cooling resources are constrained this can be a significant boost for a same-footprint upgrade of a high-end system.
For Unix to Linux migrations, target platform scalability continues become less of an issue.