With a couple of months' perspective, I’m pretty convinced that Intel has made a potentially disruptive entry in the market for programmable computational accelerators, often referred to as GPGPUs (General Purpose Graphics Processing Units) in deference to the fact that the market leaders, NVIDIA and AMD, have dominated the segment with parallel computational units derived from high-end GPUs. In late 2012, Intel, referring to the architecture as MIC (Many Independent Cores) introduced the Xeon Phi product, the long-awaited productization of the development project that was known internally (and to the rest of the world as well) as Knight’s Ferry, a MIC coprocessor with up to 62 modified Xeon cores implemented in its latest 22 nm process.
Today, after two of its largest partners have already announced their systems portfolios that will use it, Intel finally announced one of the worst-kept secrets in the industry: the Xeon E5-2600 family of processors.
OK, now that I’ve got in my jab at the absurdity of the announcement scheduling, let’s look at the thing itself. In a nutshell, these new processors, based on the previous-generation 32 nm production process of the Xeon 5600 series but incorporating the new “Sandy Bridge” architecture, are, in fact, a big deal. They incorporate several architectural innovations and will bring major improvements in power efficiency and performance to servers. Highlights include:
Performance improvements on selected benchmarks of up to 80% above the previous Xeon 5600 CPUs, apparently due to both improved CPU architecture and larger memory capacity (up to 24 DIMMs at 32 GB per DIMM equals a whopping 768 GB capacity for a two-socket, eight-core/socket server).
Improved I/O architecture, including an on-chip PCIe 3 controller and a special mode that allows I/O controllers to write directly to the CPU cache without a round trip to memory — a feature that only a handful of I/O device developers will use, but one that contributes to improved I/O performance and lowers CPU overhead during PCIe I/O.
Significantly improved energy efficiency, with the SPECpower_ssj2008 benchmark showing a 50% improvement in performance per watt over previous models.
Intel today publicly announced its anticipated “Westmere EX” high end Westmere architecture server CPU as the E7, now part of a new family nomenclature encompassing entry (E3), midrange (E5), and high-end server CPUs (E7), and at first glance it certainly looks like it delivers on the promise of the Westmere architecture with enhancements that will appeal to buyers of high-end x86 systems.
The E7 in a nutshell:
32 nm CPU with up to 10 cores, each with hyper threading, for up to 20 threads per socket.
Intel claims that the system-level performance will be up to 40% higher than the prior generation 8-core Nehalem EX. Notice that the per-core performance improvement is modest (although Intel does offer a SKU with 8 cores and a slightly higher clock rate for those desiring ultimate performance per thread).
Improvements in security with Intel Advanced Encryption Standard New Instruction (AES-NI) and Intel Trusted Execution Technology (Intel TXT).
Major improvements in power management by incorporating the power management capabilities from the Xeon 5600 CPUs, which include more aggressive P states, improved idle power operation, and the ability to separately reduce individual core power setting depending on workload, although to what extent this is supported on systems that do not incorporate Intel’s Node Manager software is not clear.
This week at ISSCC, Intel made its first detailed public disclosures about its upcoming “Poulson” next-generation Itanium CPU. While not in any sense complete, the details they did disclose paint a picture of a competent product that will continue to keep the heat on in the high-end UNIX systems market. Highlights include:
Process — Poulson will be produced in a 32 nm process, skipping the intermediate 45 nm step that many observers expected to see as a step down from the current 65 nm Itanium process. This is a plus for Itanium consumers, since it allows for denser circuits and cheaper chips. With an industry record 3.1 billion transistors, Poulson needs all the help it can get keeping size and power down. The new process also promises major improvements in power efficiency.
Cores and cache — Poulson will have 8 cores and 54 MB of on-chip cache, a huge amount, even for a cache-sensitive architecture like Itanium. Poulson will have a 12-issue pipeline instead of the current 6-issue pipeline, promising to extract more performance from existing code without any recompilation.
Compatibility — Poulson is socket- and pin-compatible with the current Itanium 9300 CPU, which will mean that HP can move more quickly into production shipments when it's available.
Intel today officially announced the first products based on the much-discussed Sandy Bridge CPU architecture, and first impressions are highly favorable, with my take being that Sandy Bridge represents the first step in a very aggressive product road map for Intel in 2011.
Sandy Bridge is the next architectural spin after Intel’s Westmere shrink of the predecessor Nehalem architecture (the “tick” in Intel’s famous “tick-tock” progression of architectural changes followed by process shrink) and incorporates some major innovations compared to the previous architecture:
Minor but in toto significant changes to many aspects of the low-level microarchitecture – more registers, better prefetch, changes to the way instructions and operands are decode, cached and written back to registers and cache.
Major changes in integration of functions on the CPU die – Almost all major subsystems, including CPU, memory controller, graphics controller and PCIe controller, are now integrated onto the same die, along with the ability to share data with much lower latency than in previous generations. In addition to more efficient data sharing, this level of integration allows for better power efficiency.
Improvements to media processing – A dedicated video transcoding engine and an extended vector instruction set for media and floating point calculations improves Sandy Bridge capabilities in several major application domains.