Extremes of x86 Servers Illustrate the Depth of the Ecosystem and the Diversity of Workloads

Richard Fichera

I’ve recently been thinking a lot about application-specific workloads and architectures (Optimize Scalable Workload-Specific Infrastructure for Customer Experiences), and it got me thinking about the extremes of the server spectrum – the very small and the very large as they apply to x86 servers. The range and the variation in intended workloads are pretty spectacular as we diverge from the mean, which for the enterprise is a 2-socket Xeon server, usually in a 1U or 2U form factor.

At the bottom, we find really tiny embedded servers, some with very non-traditional packaging. My favorite is probably the technology from Arnouse Digital Technology, a small boutique that produces computers primarily for ruggedized military and industrial environments.

Slightly bigger than a credit card, their BioDigital server is a rugged embedded server with up to 8 GB of RAM, a 128 GB SSD, and a very low power footprint. Based on an Atom-class CPU, it is clearly not the choice for most workloads, but it is an exemplar of what happens when the workload sits in a hostile environment and the computer may need to be part of a man-carried or vehicle-mounted portable tactical or field system. While its creators are testing the waters for acceptance as a compute cluster, with up to 4,000 of them mounted in a standard rack, these will likely remain a niche product for applications requiring the intersection of small size, extreme ruggedness, and complete x86 compatibility, a range that runs from military systems to portable desktop modules.

Read more

IBM Announces Next Generation POWER Systems – Big Win for AIX Users, New Option for Linux

Richard Fichera

On April 23, IBM rolled out the long-awaited POWER8 CPU, the successor to POWER7+. Given the extensive pre-announcement speculation, the hardware itself was no big surprise (the details are fascinating, but not suitable for this venue), offering an estimated 30% to 50% improvement in application performance over the latest POWER7+, with potential for order-of-magnitude improvements on selected big data and analytics workloads. While the technology is interesting, we are pretty numb to the “bigger, better, faster” messaging that inevitably accompanies new hardware announcements; the real impact of this announcement lies in its utility for current AIX users and in IBM’s increased focus on Linux and its support of the OpenPOWER initiative.

Technology

OK, so we’re numb, but it’s still interesting. POWER8 is an entirely new processor generation implemented in 22 nm CMOS (the same geometry as Intel’s high-end CPUs). The processor features up to 12 cores, each with up to 8 threads, and a focus not only on throughput but also on high per-thread and per-core performance for low-thread-count applications. Added to the mix are up to 1 TB of memory per socket, massive PCIe 3 I/O connectivity, and the Coherent Accelerator Processor Interface (CAPI), IBM’s technology for delivering memory-controller-based access to accelerators and flash memory in POWER systems. CAPI figures prominently in IBM’s positioning of POWER as the ultimate analytics engine, with the announcement profiling the performance of a configuration using 40 TB of CAPI-attached flash for huge in-memory analytics at a fraction of the cost of a non-CAPI configuration.

A Slam-Dunk for AIX Users and a New Play for Linux

Read more

Intel Bumps Up High-End Servers with New Xeon E7 V2 – A Long-Awaited and Timely Leap

Richard Fichera

The long drought at the high end

It’s been a long wait, about four years if memory serves me well, since Intel introduced the Xeon E7, a high-end server CPU targeted at the highest-performance per-socket x86 systems, from high-end two-socket servers to 8-socket servers with tons of memory and lots of I/O. In the ensuing four years (an eternity in a world where annual product cycles are considered the norm), subsequent generations of lesser Xeons, most recently culminating in the latest-generation 22 nm Xeon E5 V2 Ivy Bridge server CPUs, have somewhat diluted the value proposition of the original E7.

So what is the poor high-end server user with really demanding single-image workloads to do? The answer was to wait for the Xeon E7 V2, and at first glance, it appears that the wait was worth it. High-end CPUs take longer to develop than lower-end products, and in my opinion Intel made the right decision to skip the previous-generation 32 nm Sandy Bridge design and go straight to Ivy Bridge, its successor in the Intel “Tick-Tock” cycle of new process, then new architecture.

What was announced?

The announcement was the formal unveiling of the Xeon E7 V2 CPU, available in multiple performance bins with anywhere from 8 to 15 cores per socket. Critical specifications include:

  • Up to 15 cores per socket
  • 24 DIMM slots, allowing up to 1.5 TB of memory with 64 GB DIMMs
  • Approximately 4X I/O bandwidth improvement
  • New RAS features, including low-level memory controller modes optimized for either high availability or performance (selectable as a BIOS option), enhanced error recovery, and soft-error reporting
Read more

2014 Server and Data Center Predictions

Richard Fichera

As the new year looms, thoughts turn once again to our annual reading of the tea leaves, in this case focused on what I see coming in server land. We’ve just published the full report, Predictions for 2014: Servers & Data Centers, but as a teaser, here are a few of the major highlights from the report:

1. Increasing choices in form factor and packaging – I&O pros will have to cope with a proliferation of new form factors, some optimized for dense low-power cloud workloads, some for general-purpose legacy IT, and some for horizontal VM clusters (or internal clouds if you prefer). These will continue to appear in an increasing number of variants.

2. ARM – Make-or-break time is coming, depending on the success of the coming 64-bit ARM CPU/SOC designs with full server feature sets, including VM support.

3. The beat goes on – Major turn of the great wheel coming for server CPUs in early 2014.

4. Huge potential disruption in flash architecture – Putting flash in main-memory DIMM slots could completely disrupt how flash is used in storage tiers and could break the current storage tiering model, initially physically, with the potential to ripple through memory architectures.

Read more

Intel Lays Out Future Data Center Strategy - Serious Focus on Emerging Opportunities

Richard Fichera

Yesterday Intel held a major press and analyst event in San Francisco to talk about its vision for the future of the data center, anchored on what has become in many eyes the virtuous cycle of future infrastructure demand – mobile devices and “the Internet of Things” driving cloud resource consumption, which in turn spews out big data, which in turn spawns storage and the requirement for yet more computing to analyze it. As usual with these kinds of events from Intel, it was long on serious vision and strong on strategic positioning, but a bit parsimonious on actual future product information, with a couple of interesting exceptions.

Content and Core Topics:

No major surprises on the underlying demand-side drivers. The proliferation of mobile devices, the impending Internet of Things, and the mountains of big data they generate will combine to continue to increase demand for cloud-resident infrastructure, particularly servers and storage, both of which present Intel with an opportunity to sell semiconductors. Needless to say, Intel laced its presentations with frequent reminders about who is the king of semiconductor manufacturing.

Read more

AMD Quietly Rolls Out hUMA – Potential Game-Changer for Parallel Computing

Richard Fichera

Background – High-Performance Attached Processors Handicapped by Architecture

The application of high-performance accelerators, notably GPUs and GPGPUs (APUs in AMD terminology), to a variety of computing problems has blossomed over the last decade, resulting in ever more affordable compute power for both exotic and mundane problems, along with growing revenue streams for an expanding industry ecosystem. Adding heat to an already active mix, Intel’s Xeon Phi accelerators, the most recent addition to the GPU ecosystem, have the potential to speed adoption even further due to hoped-for synergies generated by the immense universe of x86 code that could potentially run on the Xeon Phi cores.

However, despite any potential synergies, GPUs (I will use this term generically to refer to all forms of these attached accelerators as they currently exist in the market) suffer from a fundamental architectural problem — they are very distant, in terms of latency, from the main scalar system memory and are not part of the coherent memory domain. This in turn has major impacts on performance, cost, design of the GPUs, and the structure of the algorithms:

  • Performance — The latency of memory accesses is generally dictated by PCIe latencies, which, while much improved over previous generations, are a factor of 100 or more longer than latencies from coherent cache or local scalar CPU memory. While clever design and programming, such as overlapping and buffering multiple transfers, can hide the latency within a series of transfers (a minimal sketch of this overlap technique follows below), it is difficult to hide the latency of the initial block of data. Even AMD’s integrated APUs, in which the GPU elements are on a common die, do not share a common memory space, and explicit transfers are made in and out of the APU memory.
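To make the latency-hiding point concrete, here is a minimal, illustrative CUDA sketch of the overlap technique: the input is split into chunks, and each chunk’s PCIe copy and kernel launch are queued in its own stream so that transfers for one chunk can proceed while another chunk is being computed. The kernel name (scale_kernel), chunk count, and buffer sizes are hypothetical choices for illustration, not drawn from the post or from any vendor reference code.

    // Illustrative sketch only: overlap PCIe transfers with GPU computation
    // by queueing each chunk's copy/compute/copy-back in its own CUDA stream.
    #include <cuda_runtime.h>
    #include <cstdio>

    __global__ void scale_kernel(float *data, int n) {   // hypothetical kernel
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) data[i] *= 2.0f;
    }

    int main() {
        const int nChunks = 4;                        // illustrative chunk count
        const int chunkElems = 1 << 20;               // 1M floats per chunk
        const size_t chunkBytes = chunkElems * sizeof(float);

        float *hostBuf = nullptr, *devBuf = nullptr;
        cudaMallocHost((void **)&hostBuf, nChunks * chunkBytes); // pinned memory, required for async copies
        cudaMalloc((void **)&devBuf, nChunks * chunkBytes);

        cudaStream_t streams[nChunks];
        for (int c = 0; c < nChunks; ++c) cudaStreamCreate(&streams[c]);

        for (int c = 0; c < nChunks; ++c) {
            float *h = hostBuf + (size_t)c * chunkElems;
            float *d = devBuf + (size_t)c * chunkElems;
            // Copy chunk c to the device; the copy engine can move this chunk
            // while the GPU is still computing on an earlier chunk.
            cudaMemcpyAsync(d, h, chunkBytes, cudaMemcpyHostToDevice, streams[c]);
            scale_kernel<<<(chunkElems + 255) / 256, 256, 0, streams[c]>>>(d, chunkElems);
            cudaMemcpyAsync(h, d, chunkBytes, cudaMemcpyDeviceToHost, streams[c]);
        }
        cudaDeviceSynchronize();                      // wait for all streams to finish

        for (int c = 0; c < nChunks; ++c) cudaStreamDestroy(streams[c]);
        cudaFreeHost(hostBuf);
        cudaFree(devBuf);
        printf("processed %d chunks\n", nChunks);
        return 0;
    }

Even with this kind of pipelining, the very first chunk still pays the full transfer latency before any computation can start, which is exactly the residual cost noted above and which a shared coherent memory space, the promise of hUMA, would remove.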
Read more

HP Launches First Project Moonshot Server – The Shape of Things to Come?

Richard Fichera

Overview – Moonshot Takes Off

HP today announced the Moonshot 1500 server, its first official volume product in the Project Moonshot server family (the initial Redstone, a Calxeda ARM-based server, was only available in limited quantities as a development system), and it represents both a significant product today and a major stake in the ground for future products, both from HP and eventually from competitors. Its initial attraction – an extremely dense, low-power x86 server platform for a variety of low-to-midrange CPU workloads – hides the fact that it is probably a blueprint for a family of future products from HP as well as for similar products from other vendors.

Geek Stuff – What was Announced

The Moonshot 1500 is a 4.3U enclosure that can contain up to 45 plug-in server cartridges, each one a complete server node with a dual-core Intel Atom S1200-series CPU, up to 8 GB of memory, and a single disk or SSD device of up to 1 TB; the servers share common power supplies and cooling. But beyond the density, the real attraction of the MS1500 is its scalable fabric and CPU-agnostic architecture. Embedded in the chassis are multiple fabrics for storage, management, and network, giving the MS1500 (my acronym, not an official HP label) some of the advantages of a blade server without the advanced management capabilities. At initial shipment, only the network and management fabrics will be enabled by the system firmware, with each chassis having up to two Gb Ethernet switches (technically they can be configured with one, but nobody will do so), allowing the 45 servers to share uplinks to the enterprise network.

Read more

Open Compute Project – Rising Relevance And More Stakeholders

Richard Fichera

Background

Today’s announcements at the Open Compute Project (OCP) 2013 Summit could be considered tangible markers of OCP crossing the line into real relevance, both as an important influence on emerging hyper-scale and cloud computing and as a potential bleed-through into the world of enterprise data centers and computing. This is obviously a subjective viewpoint – there is no objective standard for relevance, only post-facto recognition that something was important or not. But in this case I’m going to stick my neck out and predict that OCP will have some influence and will be a sticky presence in the industry for many years.

Even if its specs (which look generally quite good) do not get picked up verbatim, they will act as an influence on major vendors, who will, much like the auto industry in the 1970s, get the message that there is a market for economical “low-frills” alternatives.

Major OCP Initiatives

To date, OCP has announced a number of useful hardware specifications, including:

Read more

Intel Makes Its Mark In The HPC Segment With Xeon Phi

Richard Fichera

Background

With a couple of months’ perspective, I’m pretty convinced that Intel has made a potentially disruptive entry into the market for programmable computational accelerators, often referred to as GPGPUs (General-Purpose Graphics Processing Units) in deference to the fact that the market leaders, NVIDIA and AMD, have dominated the segment with parallel computational units derived from high-end GPUs. In late 2012 Intel, referring to the architecture as MIC (Many Integrated Core), introduced the Xeon Phi product, the long-awaited productization of the development project that was known internally (and to the rest of the world as well) as Knights Ferry: a MIC coprocessor with up to 62 modified Xeon cores implemented in Intel’s latest 22 nm process.

Why Xeon Phi Is Important

Read more

Tectonic Shift In The ARM Ecosystem — AMD Announces ARM Intentions

Richard Fichera

Earlier this week, in conjunction with ARM Holdings plc’s announcement of the upcoming Cortex-A53 and Cortex-A57, full 64-bit CPU implementations based on the ARMv8 specification, AMD also announced that it would be designing and selling SOC (system-on-a-chip) products based on this technology in 2014, roughly coinciding with the availability of 64-bit parts from ARM and other partners.

This is a major event in the ARM ecosystem. AMD, while much smaller than Intel, is still a multi-billion-dollar enterprise, and for the second-largest vendor of x86 chips to also throw its hat into the ARM ecosystem and potentially compete with its own mainstream server and desktop CPU business is an aggressive move on the part of AMD management, one that carries some risk and much potential advantage.

Reduced to its essentials, here is what AMD announced (and in some cases hinted at):

  • Intention to produce A53/A57 SOC modules for multiple server segments. There was no formal statement of intentions regarding tablet/mobile devices, but it doesn’t take a rocket scientist to figure out that AMD wants a piece of this market, and ARM is a way to participate.
  • The announcement is wider than just the SOC silicon. AMD also hinted at making a range of IP, including the fabric from its SeaMicro architecture, available in the form of “reusable IP blocks.” My interpretation is that it intends to make the fabric, reference architectures, and various SOCs available to its hardware system partners.
Read more