On April 23, IBM rolled out the long-awaited POWER8 CPU, the successor to POWER7+, and given the extensive pre-announcement speculation, the hardware itself was no big surprise (the details are fascinating, but not suitable for this venue), offering an estimated 30 - 50% improvement in application performance over the latest POWER7+, with potential for order of magnitude improvements with selected big data and analytics workloads. While the technology is interesting, we are pretty numb to the “bigger, better, faster” messaging that inevitably accompanies new hardware announcements, and the real impact of this announcement lies in its utility for current AIX users and IBM’s increased focus on Linux and its support of the OpenPOWER initiative.
OK, so we’re numb, but it’s still interesting. POWER8 is an entirely new processor generation implemented in 22 nm CMOS (the same geometry as Intel’s high-end CPUs). The processor features up to 12 cores, each with up to 8 threads, and a focus on not only throughput but high performance per thread and per core for low-thread-count applications. Added to the mix is up to 1 TB of memory per socket, massive PCIe 3 I/O connectivity and Coherent Accelerator Processor Interface (CAPI), IBM’s technology to deliver memory-controller-based access for accelerators and flash memory in POWER systems. CAPI figures prominently in IBM’s positioning of POWER as the ultimate analytics engine, with the announcement profiling the performance of a configuration using 40 TB of CAPI-attached flash for huge in-memory analytics at a fraction of the cost of a non-CAPI configuration.[i]
A Slam-dunk for AIX users and a new play for Linux
As the new year looms, thoughts turn once again to our annual reading of the tea leaves, in this case focused on what I see coming in server land. We’ve just published the full report, Predictions for 2014: Servers & Data Centers, but as teaser, here are a few of the major highlights from the report:
1. Increasing choices in form factor and packaging – I&O pros will have to cope with a proliferation of new form factors, some optimized for dense low-power cloud workloads, some for general purpose legacy IT, and some for horizontal VM clusters (or internal cloud if you prefer). These will continue to appear in an increasing number of variants.
2. ARM – Make or break time is coming, depending on the success of coming 64-bit ARM CPU/SOC designs with full server feature sets including VM support.
3. The beat goes on – Major turn of the great wheel coming for server CPUs in early 2014.
4. Huge potential disruption in flash architecture – Introduction of flash in main memory DIMM slots has the potential to completely disrupt how flash is used in storage tiers, and potentially can break the current storage tiering model, initially physically with the potential to ripple through memory architectures.
When I returned to Forrester in mid-2010, one of the first blog posts I wrote was about Oracle’s new roadmap for SPARC and Solaris, catalyzed by numerous client inquiries and other interactions in which Oracle’s real level of commitment to future SPARC hardware was the topic of discussion. In most cases I could describe the customer mood as skeptical at best, and panicked and committed to migration off of SPARC and Solaris at worst. Nonetheless, after some time spent with Oracle management, I expressed my improved confidence in the new hardware team that Oracle had assembled and their new roadmap for SPARC processors after the successive debacles of the UltraSPARC-5 and Rock processors under Sun’s stewardship.
Two and a half years later, it is obvious that Oracle has delivered on its commitments regarding SPARC and is continuing its investments in SPARC CPU and system design as well as its Solaris OS technology. The latest evolution of SPARC technology, the SPARC T5 and the soon-to-be-announced M5, continue the evolution and design practices set forth by Oracle’s Rick Hetherington in 2010 — incremental evolution of a common set of SPARC cores, differentiation by variation of core count, threads and cache as opposed to fundamental architecture, and a reliable multi-year performance progression of cores and system scalability.
Nathan Bedford Forrest, a Confederate general of despicable ideology and consummate tactics, spoke of “keepin up the skeer,” applying continued pressure to opponents to prevent them from regrouping and counterattacking. POWER7+, the most recent version of IBM’s POWER architecture, anticipated as a follow-up to the POWER7 for almost a year, was finally announced this week, and appears to be “keepin up the skeer” in terms of its competitive potential for IBM POWER-based systems. In short, it is a hot piece of technology that will keep existing IBM users happy and should help IBM maintain its impressive momentum in the Unix systems segment.
For the chip heads, the CPU is implemented in a 32 NM process, the same as Intel’s upcoming Poulson, and embodies some interesting evolutions in high-end chip design, including:
Use of DRAM instead of SRAM — IBM has pioneered the use of embedded DRAM (eDRAM) as embedded L3 cache instead of the more standard and faster SRAM. In exchange for the loss of speed, eDRAM requires fewer transistors and lower power, allowing IBM to pack a total of 80 MB (a lot) of shared L3 cache, far more than any other product has ever sported.
Over the last couple of years, IBM, despite having a rich internal technology ecosystem and a number of competitive blade and CI offerings, has not had a comprehensive integrated offering to challenge HP’s CloudSystem Matrix and Cisco’s UCS. This past week IBM effectively silenced its critics and jumped to the head of the CI queue with the announcement of two products, PureFlex and PureApplication, the results of a massive multi-year engineering investment in blade hardware, systems management, networking, and storage integration. Based on a new modular blade architecture and new management architecture, the two products are really more of a continuum of a product defined by the level of software rather than two separate technology offerings.
PureFlex is the base product, consisting of the new hardware (which despite having the same number of blades as the existing HS blade products, is in fact a totally new piece of hardware), which integrates both BNT-based networking as well as a new object-based management architecture which can manage up to four chassis and provide a powerful setoff optimization, installation, and self-diagnostic functions for the hardware and software stack up to and including the OS images and VMs. In addition IBM appears to have integrated the complete suite of Open Fabric Manager and Virtual Fabric for remapping MAC/WWN UIDs and managing VM networking connections, and storage integration via the embedded V7000 storage unit, which serves as both a storage pool and an aggregation point for virtualizing external storage. The laundry list of features and functions is too long to itemize here, but PureFlex, especially with its hypervisor-neutrality and IBM’s Cloud FastStart option, is a complete platform for an enterprise private cloud or a horizontal VM compute farm, however you choose to label a shared VM utility.
There has been a lot of ill-considered press coverage about the “death” of UNIX and coverage of the wholesale migration of UNIX workloads to LINUX, some of which (the latter, not the former) I have contributed to. But to set the record straight, the extinction of UNIX is not going to happen in our lifetime.
While UNIX revenues are not growing at any major clip, it appears as if they have actually had a slight uptick over the past year, probably due to a surge by IBM, and seem to be nicely stuck around the $18 - 20B level annual range. But what is important is the “why,” not the exact dollar figure.
UNIX on proprietary RISC architectures will stay around for several reasons that primarily revolve around their being the only close alternative to mainframes in regards to specific high-end operational characteristics:
Performance – If you need the biggest single-system SMP OS image, UNIX is still the only realistic commercial alternative other than mainframes.
Isolated bulletproof partitionability – If you want to run workload on dynamically scalable and electrically isolated partitions with the option to move workloads between them while running, then UNIX is your answer.
Near-ultimate availability – If you are looking for the highest levels of reliability and availability ex mainframes and custom FT systems, UNIX is the answer. It still possesses slight availability advantages, especially if you factor in the more robust online maintenance capabilities of the leading UNIX OS variants.
Last year I wrote about Oracle’s new plans for SPARC, anchored by a new line of SPARC CPUs engineered in conjunction with Fujitsu (Does SPARC have a Future?), and commented that the first deliveries of this new technology would probably be in early 2012, and until we saw this tangible evidence of Oracle’s actual execution of this road map we could not predict with any confidence the future viability of SPARC.
The T4 CPU
Fast forward a year and Oracle has delivered the first of the new CPUs, ahead of schedule and with impressive gains in performance that make it look like SPARC will remain a viable platform for years. Specifically, Oracle has introduced the T4 CPU and systems based on them. The T4, an evolution of Oracle’s highly threaded T-Series architecture, is implemented with an entirely new core that will form the basis, with variations in number of threads versus cores and cache designs, of the future M and T series systems. The M series will have fewer threads and more performance per thread, while the T CPUs will, like their predecessors, emphasize throughput for highly threaded workloads. The new T4 will have 8 cores, and each core will have 8 threads. While the T4 emphasizes highly threaded workload performance, it is important to note that Oracles has radically improved single-thread performance over its predecessors, with Oracle claiming performance per thread improvements of 5X over its predecessors, greatly improving its utility as a CPU to power less thread-intensive workloads as well.
On June 15, HP announced that it had filed suit against Oracle, saying in a statement:
“HP is seeking the court’s assistance to compel Oracle to:
Reverse its decision to discontinue all software development on the Itanium platform
Reaffirm its commitment to offer its product suite on HP platforms, including Itanium;
Immediately reset the Itanium core processor licensing factor consistent with the model prior to December 1, 2010 for RISC/EPIC systems
HP also seeks:
Injunctive relief, including an order prohibiting Oracle from making false and misleading statements regarding the Itanium microprocessor or HP’s Itanium-based servers and remedying the harm caused by Oracle’s conduct.
Damages and fees and other standard remedies available in cases of this nature.”
Oracle announced today that it is going to cease development for Itanium across its product line, stating that itbelieved, after consultation with Intel management, that x86 was Intel’s strategic platform. Intel of course responded with a press release that specifically stated that there were at least two additional Itanium products in active development – Poulsen (which has seen its initial specifications, if not availability, announced), and Kittson, of which little is known.
This is a huge move, and one that seems like a kick carefully aimed at the you know what’s of HP’s Itanium-based server business, which competes directly with Oracle’s SPARC-based Unix servers. If Oracle stays the course in the face of what will certainly be immense pressure from HP, mild censure from Intel, and consternation on the part of many large customers, the consequences are pretty obvious:
Intel loses prestige, credibility for Itanium, and a potential drop-off of business from its only large Itanium customer. Nonetheless, the majority of Intel’s server business is x86, and it will, in the end, suffer only a token loss of revenue. Intel’s response to this move by Oracle will be muted – public defense of Itanium, but no fireworks.
This week at ISSCC, Intel made its first detailed public disclosures about its upcoming “Poulson” next-generation Itanium CPU. While not in any sense complete, the details they did disclose paint a picture of a competent product that will continue to keep the heat on in the high-end UNIX systems market. Highlights include:
Process — Poulson will be produced in a 32 nm process, skipping the intermediate 45 nm step that many observers expected to see as a step down from the current 65 nm Itanium process. This is a plus for Itanium consumers, since it allows for denser circuits and cheaper chips. With an industry record 3.1 billion transistors, Poulson needs all the help it can get keeping size and power down. The new process also promises major improvements in power efficiency.
Cores and cache — Poulson will have 8 cores and 54 MB of on-chip cache, a huge amount, even for a cache-sensitive architecture like Itanium. Poulson will have a 12-issue pipeline instead of the current 6-issue pipeline, promising to extract more performance from existing code without any recompilation.
Compatibility — Poulson is socket- and pin-compatible with the current Itanium 9300 CPU, which will mean that HP can move more quickly into production shipments when it's available.