On Tuesday, September 4, Microsoft made the official announcement of Windows Server 2012, ending what has seemed like an interminable sequence of rumors, Beta releases, and endless speculation about this successor to Windows Server 2008.
So, is it worth the wait and does it live up to its hype? All omens point to a resounding “YES.”
Make no mistake, this is a really major restructuring of the OS, and a major step-function in capabilities aligned with several major strategic trends for both Microsoft and the rest of the industry. While Microsoft’s high level message is centered on the cloud, and on the Windows Server 2012 features that make it a productive platform upon which both enterprises and service providers can build a cost-effective cloud, its features will be immensely valuable to a wide range of businesses.
What It Does
The reviewers guide for Windows Server 2012 is over 220 pages long, and the OS has at least 100 features that are worth noting, so a real exploration of the features of this OS is way beyond what I can do here. Nonetheless, we can look at several buckets of technology to get an understanding of the general capabilities. Also important to note is that while Microsoft has positioned this as a very cloud-friendly OS, almost all of these cloud-related features are also very useful to an enterprise IT environment.
New file system — Included in WS2012 is ReFS, a new file system designed to survive failures that would bring down or corrupt the previous NTFS file system (which is still available). Combined with improvements in cluster management and failover, this is a capability that will play across the entire user spectrum.
Today HP announced a new set of technology programs and future products designed to move x86 server technology for both Windows and Linux more fully into the realm of truly mission-critical computing. My interpretation of these moves is that it is both a combined defensive and pro-active offensive action on HP’s part that will both protect them as their Itanium/HP-UX portfolio slowly declines as well as offer attractive and potentially unique options for both current and future customers who want to deploy increasingly critical services on x86 platforms.
Bearing in mind that the earliest of these elements will not be in place until approximately mid-2012, the key elements that HP is currently disclosing are:
ServiceGuard for Linux – This is a big win for Linux users on HP, and removes a major operational and architectural hurdle for HP-UX migrations. ServiceGuard is a highly regarded clustering and HA facility on HP-UX, and includes many features for local and geographically distributed HA. The lack of ServiceGuard is often cited as a risk in HP-UX migrations. The availability of ServiceGuard by mid-2012 will remove yet another barrier to smooth migration from HP-UX to Linux, and will help make sure that HP retains the business as it migrates from HP-UX.
Analysis engine for x86 – Analysis engine is internal software that provides system diagnostics, predictive failure analysis and self-repair on HP-UX systems. With an uncommitted delivery date, HP will port this to selected x86 servers. My guess is that since the analysis engine probably requires some level of hardware assist, the analysis engine will be paired with the next item on the list…
NVIDIA recently shared a case study involving risk calculations at a JP Morgan Chase that I think is significant for the extreme levels of acceleration gained by integrating GPUs with conventional CPUs, and also as an illustration of a mainstream financial application of GPU technology.
JP Morgan Chase’s Equity Derivatives Group began evaluating GPUs as computational accelerators in 2009, and now runs over half of their risk calculations on hybrid systems containing x86 CPUs and NVIDIA Tesla GPUs, and claims a 40x improvement in calculation times combined with a 75% cost savings. The cost savings appear to be derived from a combination of lower capital costs to deliver an equivalent throughput of calculations along with improved energy efficiency per calculation.
Implicit in the speedup of 40x, from multiple hours to several minutes, is the implication that these calculations can become part of a near real-time business-critical analysis process instead of an overnight or daily batch process. Given the intensely competitive nature of derivatives trading, it is highly likely that JPMC will enhance their use of GPUs as traders demand an ever increasing number of these calculations. And of course, their competition has been using the same technology as well, based on numerous conversations I have had with Wall Street infrastructure architects over the past year.
My net take on this is that we will see a succession of similar announcements as GPUs become a fully mainstream acceleration technology as opposed to an experimental fringe. If you are an I&O professional whose users are demanding extreme computational performance on a constrained space, power and capital budget, you owe it to yourself and your company to evaluate the newest accelerator technology. Your competitors are almost certainly doing so.
I just spent some time talking to ScaleMP, an interesting niche player that provides a server virtualization solution. What is interesting about ScaleMP is that rather than splitting a single physical server into multiple VMs, they are the only successful offering (to the best of my knowledge) that allows I&O groups to scale up a collection of smaller servers to work as a larger SMP.
Others have tried and failed to deliver this kind of solution, but ScaleMP seems to have actually succeeded, with a claimed 200 customers and expectations of somewhere between 250 and 300 next year.
Their vSMP product comes in two flavors, one that allows a cluster of machines to look like a single system for purposes of management and maintenance while still running as independent cluster nodes, and one that glues the member systems together to appear as a single monolithic SMP.
Does it work? I haven’t been able to verify their claims with actual customers, but they have been selling for about five years, claim over 200 accounts, with a couple of dozen publicly referenced. All in all, probably too elaborate a front to maintain if there was really nothing there. The background of the principals and the technical details they were willing to share convinced me that they have a deep understanding of the low-level memory management, prefectching, and caching that would be needed to make a collection of systems function effectively as a single system image. Their smaller scale benchmarks displayed good scalability in the range of 4 – 8 systems, well short of their theoretical limits.
My quick take is that the software works, and bears investigation if you have an application that:
Either is certified to run with ScaleMP (not many), or one where that you control the code.
You understand the memory reference patterns of the application, and
On Dec. 2, Oracle announced the next move in its program to integrate its hardware and software assets, with the introduction of Oracle Private Cloud Architecture, an integrated infrastructure stack with Infiniband and/or 10G Ethernet fabric, integrated virtualization, management and servers along with software content, both Oracle’s and customer-supplied. Oracle has rolled out the architecture as a general platform for a variety of cloud environments, along with three specific implementations, Exadata, Exalogic and the new Sunrise Supercluster, as proof points for the architecture.
Exadata has been dealt with extensively in other venues, both inside Forrester and externally, and appears to deliver the goods for I&O groups who require efficient consolidation and maximum performance from an Oracle database environment.
Exalogic is a middleware-targeted companion to the Exadata hardware architecture (or another instantiation of Oracle’s private cloud architecture, depending on how you look at it), presenting an integrated infrastructure stack ready to run either Oracle or third-party apps, although Oracle is positioning it as a Java middleware platform. It consists of the following major components integrated into a single rack:
Oracle x86 or T3-based servers and storage.
Oracle Quad-rate Infiniband switches and the Oracle Solaris gateway, which makes the Infiniband network look like an extension of the enterprise 10G Ethernet environment.
Oracle Linux or Solaris.
Oracle Enterprise Manager Ops Center for management.