Background — High Performance Attached Processors Handicapped By Architecture
The application of high-performance accelerators, notably GPUs, GPGPUs (APUs in AMD terminology) to a variety of computing problems has blossomed over the last decade, resulting in ever more affordable compute power for both horizon and mundane problems, along with growing revenue streams for a growing industry ecosystem. Adding heat to an already active mix, Intel’s Xeon Phi accelerators, the most recent addition to the GPU ecosystem, have the potential to speed adoption even further due to hoped-for synergies generated by the immense universe of x86 code that could potentially run on the Xeon Phi cores.
However, despite any potential synergies, GPUs (I will use this term generically to refer to all forms of these attached accelerators as they currently exist in the market) suffer from a fundamental architectural problem — they are very distant, in terms of latency, from the main scalar system memory and are not part of the coherent memory domain. This in turn has major impacts on performance, cost, design of the GPUs, and the structure of the algorithms:
Performance — The latency for memory accesses generally dictated by PCIe latencies, which while much improved over previous generations, are a factor of 100 or more longer than latency from coherent cache or local scalar CPU memory. While clever design and programming, such as overlapping and buffering multiple transfers can hide the latency in a series of transfers, it is difficult to hide the latency for an initial block of data. Even AMD’s integrated APUs, in which the GPU elements are on a common die, do not share a common memory space, and explicit transfers are made in and out of the APU memory.
HP today announced the Moonshot 1500 server, their first official volume product in the Project Moonshot server product family (the initial Redstone, a Calxeda ARM-based server, was only available in limited quantities as a development system), and it represents both a significant product today and a major stake in the ground for future products, both from HP and eventually from competitors. It’s initial attractions – an extreme density low power x86 server platform for a variety of low-to-midrange CPU workloads – hides the fact that it is probably a blueprint for both a family of future products from HP as well as similar products from other vendors.
Geek Stuff – What was Announced
The Moonshot 1500 is a 4.3U enclosure that can contain up to 45 plug-in server cartridges, each one a complete server node with a dual-core Intel Atom 1200 CPU, up to 8 GB of memory and a single disk or SSD device, up to 1 TB, and the servers share common power supplies and cooling. But beyond the density, the real attraction of the MS1500 is its scalable fabric and CPU-agnostic architecture. Embedded in the chassis are multiple fabrics for storage, management and network giving the MS1500 (my acronym, not an official HP label) some of the advantages of a blade server without the advanced management capabilities. At initial shipment, only the network and management fabric will be enabled by the system firmware, with each chassis having up two Gb Ethernet switches (technically they can be configured with one, but nobody will do so), allowing the 45 servers to share uplinks to the enterprise network.
Today’s announcements at the Open Compute Project (OCP) 2013 Summit could be considered as tangible markers for the OCP crossing the line into real relevance as an important influence on emerging hyper-scale and cloud computing as well as having a potential bleed-through into the world of enterprise data centers and computing. This is obviously a subjective viewpoint – there is no objective standard for relevance, only post-facto recognition that something was important or not. But in this case I’m going to stick my neck out and predict that OCP will have some influence and will be a sticky presence in the industry for many years.
Even if their specs (which look generally quite good) do not get picked up verbatim, they will act as an influence on major vendors who will, much like the auto industry in the 1970s, get the message that there is a market for economical “low-frills” alternatives.
Major OCP Initiatives
To date, OCP has announced a number of useful hardware specifications, including:
With a couple of months' perspective, I’m pretty convinced that Intel has made a potentially disruptive entry in the market for programmable computational accelerators, often referred to as GPGPUs (General Purpose Graphics Processing Units) in deference to the fact that the market leaders, NVIDIA and AMD, have dominated the segment with parallel computational units derived from high-end GPUs. In late 2012, Intel, referring to the architecture as MIC (Many Independent Cores) introduced the Xeon Phi product, the long-awaited productization of the development project that was known internally (and to the rest of the world as well) as Knight’s Ferry, a MIC coprocessor with up to 62 modified Xeon cores implemented in its latest 22 nm process.
On Tuesday November 8, after more than a year of pre-announcement disclosures that eventually left very little to the imagination, Intel finally announced the Itanium 9500, formerly known as Poulson. Added to this was the big surprise of HP announcing a refresh of its current line of Integrity servers, from blades to the large Superdome servers, with the new Itanium 9500.
As noted in an earlier post, the Itanium 9500 offers considerable performance improvements over its predecessors, and instantiated in HP’s new Integrity line it is positioned as delivering between 2X and 3X the performance per socket as previous Itanium 9300 (Tukwilla) systems at approximately the same price. For those remaining committed to Itanium and its attendant OS platforms, notably HP-UX, this is unmitigated good news. The fly in the ointment (I have never seen a fly in any ointment, but it does sound gross), of course, is HP’s dispute with Oracle. Despite the initial judgment in HP’s favor, the trial is a) not over yet, and b) Oracle has already filed for an early appeal of the initial verdict, which would ordinarily have to wait until the second phase of the trial, scheduled for next year, to finish. The net takeaway is that Oracle’s future availability on Itanium and HP-UX is not yet assured, so we really cannot advise the large number of Oracle users who will require Oracle 12 and later versions to relax yet.
Earlier this week, in conjunction with ARM Holdings plc’s announcement of the upcoming Cortex A53 and A57, full 64-bit CPU implementations based on the ARM V8 specification, AMD also announced that it would be designing and selling SOC (System On a Chip) products based on this technology in 2014, roughly coinciding with availability of 64-bit parts from ARM and other partners.
This is a major event in the ARM ecosystem. AMD, while much smaller than Intel, is still a multi-billion-dollar enterprise, and for the second largest vendor of x86 chips to also throw its hat into the ARM ecosystem and potentially compete with its own mainstream server and desktop CPU business is an aggressive move on the part of AMD management that carries some risk and much potential advantage.
Reduced to its essentials, what AMD announced (and in some cases hinted at):
Intention to produce A53/A57 SOC modules for multiple server segments. There was no formal statement of intentions regarding tablet/mobile devices, but it doesn’t take a rocket scientist to figure out that AMD wants a piece of this market, and ARM is a way to participate.
The announcement is wider that just the SOC silicon. AMD also hinted at making a range of IP, including its fabric architecture from the SeaMicro architecture, available in the form of “reusable IP blocks.” My interpretation is that it intends to make the fabric, reference architectures, and various SOCs available to its hardware system partners.
[For some reason this has been unpublished since April — so here it is well after AMD announced its next spin of the SeaMicro product.]
At its recent financial analyst day, AMD indicated that it intended to differentiate itself by creating products that were advantaged in niche markets, with specific mention, among other segments, of servers, and to generally shake up the trench warfare that has had it on the losing side of its lifelong battle with Intel (my interpretation, not AMD management’s words). Today, at least for the server side of the business, it made a move that can potentially offer it visibility and differentiation by acquiring innovative server startup SeaMicro.
SeaMicro has attracted our attention since its appearance (blog post 1, blog post 2) with its innovative architecture that dramatically reduces power and improves density by sharing components like I/O adapters, disks, and even BIOS over a proprietary fabric. The irony here is that SeaMicro came to market with a tight alignment with Intel, who at one point even introduced a special dual-core packaging of its Atom CPU to allow SeaMicro to improve its density and power efficiency. Most recently SeaMicro and Intel announced a new model that featured Xeon CPUs to address the more mainstream segments that were not a part of SeaMicro’s original Atom-based offering.
The most notable news to come out of the VMworld conference last week was the coronation of Pat Gelsinger as the new CEO of VMware. His tenure officially started over the weekend, on September 1, to be exact.
For those who don’t know Pat’s career, he gained fame at Intel as the personification of the x86 processor family. It’s unfair to pick a single person as the father of the modern x86 architecture, but if you had to pick just one person, it’s probably Pat. He then grew to become CTO, and eventually ran the Digital Enterprise Group. This group accounted for 55% of Intel’s US$37.586B in revenue according to its 2008 annual report, the last full year of Pat’s tenure. EMC poached him from Intel in 2009, naming him president of the Information Infrastructure Products group. EMC’s performance since then has been very strong, with a 17.5% YoY revenue increase in its latest annual report. Pat’s group contributed 53.7% of that revenue. While he’s a geek at heart (his early work), he proved without a doubt that he also has the business execution chops (his later work). Both will serve him well at VMware, especially the latter.
This week the California courts handed down a nice present for HP — a verdict confirming that Oracle was required to continue to deliver its software on HP’s Itanium-based Integrity servers. This was a major victory for HP, on the face of it giving them the prize they sought — continued availability of Oracle’s eponymous database on their high-end systems.
However, HP’s customers should not immediately assume that everything has returned to a “status quo ante.” Once Humpty Dumpty has fallen off the wall it is very difficult to put the pieces together again. As I see it, there are still three major elephants in the room that HP users must acknowledge before they make any decisions:
Oracle will appeal, and there is no guarantee of the outcome. The verdict could be upheld or it could be reversed. If it is upheld, then that represents a further delay in the start date from which Oracle will be measured for its compliance with the court ordered development. Oracle will also continue to press its counterclaims against HP, but those do not directly relate to the continued development or Oracle software on Itanium.
Itanium is still nearing the end of its road map. A reasonable interpretation of the road map tea leaves that have been exposed puts the final Itanium release at about 2015 unless Intel decides to artificially split Kittson into two separate releases. Integrity customers must take this into account as they buy into the architecture in the last few years of Itanium’s life, although HP can be depended on to offer high-quality support for a decade after the last Itanium CPU rolls off Intel’s fab lines. HP has declared its intention to produce Integrity-level x86 systems, but OS support intentions are currently stated as Linux and Windows, not HP-UX.
Earlier this week at its Discover customer event, HP announced a significant set of improvements to its already successful c-Class BladeSystem product line, which, despite continuing competitive pressure from IBM and the entry of Cisco into the market three years ago, still commands approximately 50% of the blade market. The significant components of this announcement fall into four major functional buckets – improved hardware, simplified and expanded storage features, new interconnects and I/O options, and serviceability enhancements. Among the highlights are:
Direct connection of HP 3PAR storage – One of the major drawbacks for block-mode storage with blades has always been the cost of the SAN to connect it to the blade enclosure. With the ability to connect an HP 3PAR storage array directly to the c-Class enclosure without any SAN components, HP has reduced both the cost and the complexity of storage for a wide class of applications that have storage requirements within the scope of a single storage array.
New blades – With this announcement, HP fills in the gaps in their blade portfolio, announcing a new Intel Xeon EN based BL-420 for entry requirements, an upgrade to the BL-465 to support the latest AMD 16-core Interlagos CPU, and the BL-660, a new single-width Xeon E5 based 4-socket blade. In addition, HP has expanded the capacity of the sidecar storage blade to 1.5 TB, enabling an 8-server and 12 TB + chassis configuration.