The Background – Linux as a Fast Follower and the Need for Hot Patching
No doubt about it, Linux has made impressive strides in the last 15 years, gaining many features previously associated with high-end proprietary Unix as it made the transition from small system plaything to core enterprise processing resource and the engine of the extended web as we know it. Along the way it gained reliable and highly scalable schedulers, a multiplicity of efficient and scalable file systems, advanced RAS features, its own embedded virtualization and efficient thread support.
As Linux grew, so did supporting hardware, particularly the capabilities of the ubiquitous x86 CPU upon which the vast majority of Linux runs today. Yet the debate has always been about how close Linux could get to "the real OS", the core proprietary Unix variants that for two decades defined the limits of non-mainframe scalability and reliability. But "the times they are a-changin'", and the new narrative may be "when will Unix catch up to Linux on critical RAS features like hot patching?"
Hot patching, the ability to apply updates to the OS kernel while it is running, is a long sought-after but elusive feature of a production OS. Long sought after because both developers and operations teams recognize that bringing down an OS instance that is doing critical high-volume work is at best disruptive and at worst a logistical nightmare, and elusive because it is incredibly difficult. There have been several failed attempts, and several implementations that "almost worked" but were so fraught with exceptions that they were not really useful in production.[i]
In 2014 I wrote about Microsoft and Dell’s joint Cloud Platform System offering, Microsoft’s initial foray into an “Azure-Like” experience in the enterprise data center. While not a complete or totally transparent Azure experience, it was a definite stake in the ground around Microsoft’s intentions to provide enterprise Azure with hybrid on-premise and public cloud (Azure) interoperability.
I got it wrong about other partners – as far as I know, Dell is the only hardware partner to offer Microsoft CPS – but it looks like my idiot-proof guess that CPS was a stepping stone toward a true on premise Azure was correct, with Microsoft today announcing its technology preview of Azure Stack, the first iteration of a true enterprise Azure offering with hybrid on-prem and public cloud interoperability.
Azure Stack is in some ways a parallel offering to the existing Windows Server/Systems Center and Azure Pack offering, and I believe it represents Microsoft’s long-term vision for enterprise IT, although Microsoft will do nothing to compromise the millions of legacy environments that want to incrementally enhance their Windows environment. But for those looking to embrace a more complete cloud experience, Azure Stack is just what the doctor ordered – an Azure environment that can run in the enterprise and has seamless access to the immense Azure public cloud environment.
On the partner front, this time Microsoft will be introducing this as a pure software offering that can run on one or more standard x86 servers, no special integration required, although I’m sure there will be many bundled offerings of Azure Stack and integration services from partners.
I’ve written and commented in the past about the inevitability of a new class of infrastructure called “composable”, i.e. integrated server, storage and network infrastructure that allows its users to “compose”, that is to say configure, a physical server out of a collection of pooled server nodes, storage devices and shared network connections.[i]
The early exemplars of this class were pioneering efforts from Egenera and blade systems from Cisco, HP, IBM and others, which allowed some level of abstraction (a necessary precursor to composability) of server UIDs including network addresses and storage bindings, and introduced the notion of templates for server configuration. More recently the Dell FX and the Cisco UCS M-Series servers introduced the notion of composing servers from pools of resources within the bounds of a single chassis.[ii] While innovative, they were early efforts, and lacked a number of software and hardware features that were required for deployment against a wide spectrum of enterprise workloads.
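As a rough illustration of what "composing" a server from pooled resources under a template means, here is a minimal Python sketch. It is a toy model only: the class and function names (ResourcePool, ServerTemplate, compose, decompose) and the resource labels are hypothetical and do not correspond to any vendor's actual API or product.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ResourcePool:
    """Hypothetical pool of free resources shared within a chassis or rack."""
    compute_nodes: List[str] = field(default_factory=list)
    storage_devices: List[str] = field(default_factory=list)
    network_links: List[str] = field(default_factory=list)


@dataclass(frozen=True)
class ServerTemplate:
    """Hypothetical template: how many of each resource a server needs."""
    name: str
    compute: int
    storage: int
    network: int


@dataclass
class ComposedServer:
    """A logical server bound to specific resources taken from the pool."""
    template: str
    compute_nodes: List[str]
    storage_devices: List[str]
    network_links: List[str]


def compose(pool: ResourcePool, template: ServerTemplate) -> ComposedServer:
    """Carve a server out of the pool according to the template."""
    needs = [
        (pool.compute_nodes, template.compute),
        (pool.storage_devices, template.storage),
        (pool.network_links, template.network),
    ]
    # Check every requirement first so a failed request leaves the pool intact.
    if any(len(free) < count for free, count in needs):
        raise ValueError(f"not enough free resources for template {template.name!r}")
    taken = []
    for free, count in needs:
        taken.append(free[:count])  # bind the first `count` free items
        del free[:count]            # and remove them from the free pool
    return ComposedServer(template.name, *taken)


def decompose(pool: ResourcePool, server: ComposedServer) -> None:
    """Return a composed server's resources to the pool for reuse."""
    pool.compute_nodes.extend(server.compute_nodes)
    pool.storage_devices.extend(server.storage_devices)
    pool.network_links.extend(server.network_links)
```

In this sketch, composing a "database" template that asks for one node, two disks and one link removes exactly those items from the pool, and decomposing the server hands them back. Real systems layer discovery, fabric configuration and identity management on top of this basic bookkeeping.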
This morning, HPE put a major marker down in the realm of composable infrastructure with the announcement of Synergy, its new composable infrastructure system. HPE Synergy represents a major step-function in capabilities for core enterprise infrastructure as it delivers cloud-like semantics to core physical infrastructure. Among its key capabilities:
Looking at Oracle’s latest iteration of its SPARC processor technology, the new M7 CPU, it is at first blush an excellent implementation of SPARC, with 32 cores with 8 threads each implemented in an aggressive 20 nm process and promising a well-deserved performance bump for legacy SPARC/Solaris users. But the impact of the M7 goes beyond simple comparisons to previous generations of SPARC and competing products such as Intel’s Xeon E7 and IBM POWER 8. The M7 is Oracle’s first tangible delivery of its “Software on Silicon” promise, with significant acceleration of key software operations enabled in the M7 hardware.[i]
Oracle took aim at selected performance bottlenecks and security exposures, some specific to Oracle software, and some generic in nature but of great importance. Among the major enhancements in the M7 are:[ii]
Cryptography – While many CPUs now include some form of acceleration for cryptography, Oracle claims the M7 includes a wider variety and deeper support, resulting in almost indistinguishable performance across a range of benchmarks with SSL and other cryptographic protocols enabled. Oracle claims that the M7 is the first CPU architecture that does not present users with the choice of secure or fast, but allows both simultaneously.
My colleague Henry Baltazar and I have been watching the development of new systems and storage technology for years now, and each of us has been trumpeting in our own way the future potential of new non-volatile memory technology (NVM) to not only provide a major leap for current flash-based storage technology but to trigger a major transformation in how servers and storage are architected and deployed and eventually in how software looks at persistent versus nonpersistent storage.
All well and good, but up until very recently we were limited to vague prognostications about which flavor of NVM would finally belly up to the bar for mass production, and how the resultant systems could be architected. In the last 30 days, two major technology developments, Intel’s further disclosure of its future joint-venture NVM technology, now known as 3D XPoint™ Technology, and Diablo Technologies’ introduction of Memory1, have allowed us to sharpen the focus on the potential outcomes and routes to market for this next wave of infrastructure transformation.
In the world of CMOS semiconductor process, the fundamental heartbeat that drives the continuing evolution of all the devices and computers we use and governs at a fundamental level the services we can layer on top of them is the continual shrinkage of the transistors we build upon, and we are used to the regular cadence of miniaturization, generally led by Intel, as we progress from one generation to the next. 32 nm logic is so old-fashioned, 22 nm parts are in volume production across the entire CPU spectrum, 14 nm parts have started to appear, and the rumor mill is active with reports of initial shipments of 10 nm parts in mid-2016. But there is a collective nervousness about the transition to 7 nm, the next step in the industry process roadmap, with industry leader Intel commenting at the recent 2015 International Solid-State Circuits Conference that it may have to move away from conventional silicon materials for the transition to 7 nm parts, and that there were many obstacles to mass production beyond the 10 nm threshold.
But there are other players in the game, and some of them are anxious to demonstrate that Intel may not have the commanding lead that many observers assume it has. In a surprise move that hints at the future of some of its own products and that will certainly galvanize both partners and competitors, IBM, discounted by many as a spent force in the semiconductor world with its recent divestiture of its manufacturing business, has just made a real jaw-dropper of an announcement – the existence of working 7 nm semiconductors.
Recently we’ve had a chance to look again at two very conflicting views from HP and Facebook on how to do web-scale and cloud computing, both announced at the recent OCP annual event in California.
From HP come its new CloudLine systems, the public face of their joint venture with Foxconn. Early details released by HP show a line of cost-optimized servers descended from a conventional engineering lineage and incorporating selected bits of OCP technology to reduce costs. These are minimalist rack servers designed, after stripping away all the announcement verbiage, to compete with white-box vendors such as Quanta, Supermicro and a host of others. Available in five models ranging from the minimally-featured CL1100 up through larger nodes designed for high I/O, big data and compute-intensive workloads, these systems will allow large installations to install capacity at costs ranging from 10 – 25% less than the equivalent capacity in their standard ProLiant product line. While the strategic implications of HP having to share IP and market presence with Foxconn are still unclear, it is a measure of HP’s adaptability that they were willing to execute on this arrangement to protect against inroads from emerging competition in the most rapidly growing segment of the server market, and one where they have probably been under immense margin pressure.
Intel has made no secret of its development of the Xeon D, an SOC product designed to take Xeon processing close to the power levels and product niches currently occupied by its lower-power and lower-performance Atom line, and where emerging competition from ARM is more viable.
The new Xeon D-1500 is clear evidence that Intel “gets it” as far as platforms for hyperscale computing and other throughput per Watt and density-sensitive workloads, both in the enterprise and in the cloud, are concerned. The D-1500 breaks new ground in several areas:
It is the first Xeon SOC, combining 4 or 8 Xeon cores with embedded I/O including SATA, PCIe and multiple 10 and 1 Gb Ethernet ports.
It is the first of Intel’s 14 nm server chips expected to be introduced this year. This expected process shrink will also deliver a further boost in performance and performance per Watt across the entire line of entry through mid-range server parts this year.
Why is this significant?
With the D-1500, Intel effectively draws a very deep line in the sand for emerging ARM technology as well as for AMD. The D-1500, with 20W – 45W power, delivers the lower end of Xeon performance at power and density levels previously associated with Atom, and close enough to what is expected from the newer generation of higher performance ARM chips to once again call into question the viability of ARM on a pure performance and efficiency basis. While ARM implementations with embedded accelerators such as DSPs may still be attractive in selected workloads, the availability of a mainstream x86 option at these power levels may blunt the pace of ARM design wins both for general-purpose servers and for embedded designs, notably for storage systems.
We have been watching many variants on efficient packaging of servers for highly scalable workloads for years, including blades, modular servers, and dense HPC rack offerings from multiple vendors, most of them highly effective, and all highly proprietary. With the advent of Facebook’s Open Compute Project, the table was set for a wave of standardized rack servers and the prospect of very cost-effective rack-scale deployments of very standardized servers. But the IP for intelligently shared and managed power and cooling at a rack level needed a serious R&D effort that the OCP community, by and large, was unwilling to make. Into this opportunity stepped Intel, which has been quietly working on its internal Rack Scale Architecture (RSA) program for the last couple of years, and whose first product wave was officially outed recently as part of an announcement by Intel and Ericsson.
While not officially announcing Intel’s product nomenclature, Ericsson announced their “HDS 8000” based on Intel’s RSA, and Intel representatives then went on to explain the fundamentals of RSA, including a view of the enhancements coming this year.
RSA is a combination of very standardized x86 servers, a specialized rack enclosure with shared Ethernet switching and power/cooling, and layers of firmware to accomplish a set of tasks common to managing a rack of servers, including:
· Asset discovery
· Switch setup and management
· Power and cooling management across the servers within the rack
On one level, IBM’s new z13, announced last Wednesday in New York, is exactly what the mainframe world has been expecting for the last two and a half years – more capacity (a big boost this time around – triple the main memory, more and faster cores, more I/O ports, etc.), a modest boost in price performance, and a very sexy cabinet design (I know it’s not really a major evaluation factor, but I think IBM’s industrial design for its system enclosures for Flex System, Power and the z System is absolutely gorgeous, should be in the MOMA*). IBM indeed delivered against these expectations, plus more. In this case a lot more.
In addition to the required upgrades to fuel the normal mainframe upgrade cycle and its reasonably predictable revenue, IBM has made a bold but rational repositioning of the mainframe as a core platform for the workloads generated by mobile transactions, the most rapidly growing workload across all sectors of the global economy. What makes this positioning rational as opposed to a pipe-dream for IBM is an underlying pattern common to many of these transactions – at some point they access data generated by and stored on a mainframe. By enhancing the economics of the increasingly Linux-centric processing chain that occurs before the call for the mainframe data, IBM hopes to foster the migration of these workloads to the mainframe, where their access to the resident data will be more efficient, benefitting from inherently lower latency for data access as well as from access to embedded high-value functions such as accelerators for inline analytics. In essence, IBM hopes to shift the center of gravity for mobile processing toward the mainframe and away from distributed x86 Linux systems that it no longer manufactures.