At its recent financial analyst day, AMD indicated that it intended to differentiate itself by creating products with advantages in niche markets, specifically mentioning servers among other segments, and to generally shake up the trench warfare that has left it on the losing side of its lifelong battle with Intel (my interpretation, not AMD management’s words). Today, at least on the server side of the business, AMD made a move that could offer it visibility and differentiation: it acquired innovative server startup SeaMicro.
SeaMicro has attracted our attention since its appearance (blog post 1, blog post 2) with its innovative architecture, which dramatically reduces power and improves density by sharing components like I/O adapters, disks, and even BIOS over a proprietary fabric. The irony here is that SeaMicro came to market tightly aligned with Intel, which at one point even introduced a special dual-core packaging of its Atom CPU to help SeaMicro improve its density and power efficiency. Most recently, SeaMicro and Intel announced a new model featuring Xeon CPUs to address the more mainstream segments that were not a good fit for SeaMicro’s original Atom-based offering.
On Monday, February 13, HP announced the next turn of the great wheel for servers: its Gen8 family of servers. Interestingly, because the announcement came ahead of Intel’s official launch of the supporting E5 server CPUs, HP had absolutely nothing to say about the CPUs or performance of these systems. But even if the CPU information had been available, it would have been a sideshow to the main thrust of the Gen8 launch: improving the overall TCO (particularly Opex) of servers by making them more automated, more manageable, and easier to remediate when there is a problem, along with enhancements to storage, data center infrastructure management (DCIM) capabilities, and a fundamental change in the way that services and support are delivered.
With a little more granularity, the major components of the Gen8 server technology announcement included:
Onboard Automation – A suite of capabilities and tools that provide improved agentless local intelligence for quicker, lower-labor-cost provisioning, including faster boot cycles, “one click” firmware updates of single or multiple systems, intelligent and greatly improved boot-time diagnostics, and run-time diagnostics. This is apparently implemented with more powerful onboard management controllers and a lot of software pre-provisioned on built-in flash memory that the onboard controller uses. HP claims that the combination of these tools can increase operator productivity by up to 65%. One of the eye-catching features is an iPhone app that scans a code printed on the server, goes back through the Insight Management Environment stack, and triggers the appropriate script to provision the server.[i] Possibly a bit of a gimmick, but a cool-looking one.
In late 2010 I noted that startup SeaMicro had introduced an ultra-dense server using Intel Atom chips in an innovative fabric-based architecture that allowed it to factor out much of the power overhead from a large multi-CPU server (http://blogs.forrester.com/richard_fichera/10-09-21-little_servers_big_applications_intel_developer_forum). Along with many observers, I noted that the original SeaMicro server was well-suited to many lightweight edge processing tasks, but that the system would not support more traditional compute-intensive tasks due to the performance of the Atom core. I was, however, quite taken with the basic architecture, which uses a proprietary high-speed (1.28 Tb/s) 3D mesh interconnect to allow the CPU cores to share network, BIOS, and disk resources that are normally replicated on a per-server basis in conventional designs, with commensurate reductions in power and an increase in density.
Today HP announced a new set of technology programs and future products designed to move x86 server technology, for both Windows and Linux, more fully into the realm of truly mission-critical computing. My interpretation is that these moves are a combined defensive and proactive offensive action on HP’s part: they will protect HP as its Itanium/HP-UX portfolio slowly declines, and they offer attractive and potentially unique options for current and future customers who want to deploy increasingly critical services on x86 platforms.
Bearing in mind that the earliest of these elements will not be in place until approximately mid-2012, the key elements that HP is currently disclosing are:
ServiceGuard for Linux – This is a big win for Linux users on HP, and removes a major operational and architectural hurdle for HP-UX migrations. ServiceGuard is a highly regarded clustering and HA facility on HP-UX, and includes many features for local and geographically distributed HA. The lack of ServiceGuard is often cited as a risk in HP-UX migrations. The availability of ServiceGuard by mid-2012 will remove yet another barrier to smooth migration from HP-UX to Linux, and will help HP retain the business as customers migrate away from HP-UX.
Analysis engine for x86 – Analysis engine is internal software that provides system diagnostics, predictive failure analysis, and self-repair on HP-UX systems. With an uncommitted delivery date, HP will port this to selected x86 servers. My guess is that since the analysis engine probably requires some level of hardware assist, it will be paired with the next item on the list…
This week AMD finally released its 6200 and 4200 series CPUs, the long-awaited server-oriented Interlagos and Valencia CPUs. Based on AMD’s new “Bulldozer” core, they offer up to 16 x86 cores in a single socket. The announcement was targeted at (drum roll, one guess per customer only) … “The Cloud.” AMD appears to be positioning its new architectures as the platform of choice for cloud-oriented workloads, focusing on highly threaded, throughput-oriented benchmarks that take full advantage of its high core count and unique floating-point architecture, along with what look like excellent throughput-per-watt metrics.
At the same time it is pushing the now seemingly mandatory “cloud” message, AMD is not ignoring the meat-and-potatoes enterprise workloads that have been the mainstay of server CPU sales (virtualization, database, and HPC), where the combination of many cores, excellent memory bandwidth, and large memory configurations should yield excellent results. In its competitive comparisons, AMD targets Intel’s 5640 CPU, which it claims is Intel’s most widely used Xeon CPU, and shows very favorable comparisons with regard to performance, price, and power consumption. Among the features that AMD cites as contributing to these results are:
Advanced power and thermal management, including the ability to power off inactive cores, which contributes to idle power of less than 4.4W per core. Interlagos also offers a unique capability called TDP Power Cap, which allows I&O groups to set the total power threshold of the CPU in 1W increments for fine-grained tailoring of power in server racks.
Turbo CORE, which can boost clock speed by up to 1 GHz on half the cores or by up to 500 MHz on all the cores, depending on workload.
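To make the Turbo CORE trade-off concrete, here is a rough Python model of the two boost modes described above. The 16-core count comes from the announcement; the 2.3 GHz base clock is a hypothetical value chosen purely for illustration, and the model ignores the power and thermal limits that decide whether a real part can actually sustain either boost.

```python
from __future__ import annotations

# Rough model of the two Turbo CORE boost modes described in the text.
# Assumption: a hypothetical 16-core part with a 2.3 GHz base clock;
# real boost behavior also depends on power and thermal headroom.

def boost_profile(cores: int, base_ghz: float, mode: str) -> list[float]:
    """Return per-core clock speeds (GHz) for a given boost mode."""
    if mode == "half":
        boosted = cores // 2  # up to +1 GHz on half the cores
        return [base_ghz + 1.0] * boosted + [base_ghz] * (cores - boosted)
    if mode == "all":
        return [base_ghz + 0.5] * cores  # up to +500 MHz on every core
    raise ValueError(f"unknown mode: {mode}")


def aggregate_ghz(clocks: list[float]) -> float:
    """Sum of per-core clocks, rounded to avoid float noise in the output."""
    return round(sum(clocks), 3)


if __name__ == "__main__":
    for mode in ("half", "all"):
        clocks = boost_profile(16, 2.3, mode)
        print(mode, "peak core:", max(clocks), "aggregate:", aggregate_ghz(clocks))
```

Under these assumptions both modes add the same 8 GHz of aggregate clock headroom; the half-core mode simply concentrates it on fewer cores, which favors lightly threaded work, while the all-core mode suits the highly threaded workloads AMD is emphasizing.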
Emerging ARM server vendor Calxeda has been hinting for some time that it had a significant partnership announcement in the works, and while we didn’t necessarily disbelieve the company, we hear a lot of claims from startups telling us to “stay tuned” for something big. Sometimes they pan out, sometimes they simply go away. But this morning Calxeda surpassed our expectations by unveiling just one major systems partner, which just happens to be Hewlett-Packard, the company that dominates the worldwide market for x86 servers.
At its core (unintended but not bad pun), the combination of Project Moonshot from HP’s Hyperscale business unit and Calxeda’s server technology is about improving the efficiency of web and cloud workloads, and it promises improvements in excess of 90% in power efficiency and similar improvements in physical density compared with current x86 solutions. As I noted in my first post on ARM servers and in other documents, even if these estimates turn out to be exaggerated, there is still a generous window within which to do much, much better than current technologies. And workloads (such as memcache, Hadoop, and static web servers) will be selected for their fit to this new platform, so the workloads that actually run on it may come close to the cases quoted by HP and Calxeda.
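The arithmetic behind “a generous window” is worth spelling out. The sketch below assumes the 90% figure means the same work gets done with 90% less power, converts that into a performance-per-watt multiplier, and then applies hypothetical discounts to the claim to see how much of the advantage survives.

```python
# Convert a claimed percentage reduction in power (for the same work)
# into a performance-per-watt multiplier, then check how much of the
# advantage survives if the claim turns out to be exaggerated.

def efficiency_multiplier(power_reduction_pct: float) -> float:
    """Perf/watt gain when the same work uses (100 - pct)% of the power."""
    if not 0 <= power_reduction_pct < 100:
        raise ValueError("reduction must be in [0, 100)")
    return 100.0 / (100.0 - power_reduction_pct)


if __name__ == "__main__":
    claimed = 90.0  # the vendors' "in excess of 90%" figure
    # Hypothetical haircuts: take the claim at face value, at half, at a quarter.
    for haircut in (1.0, 0.5, 0.25):
        pct = claimed * haircut
        print(f"{pct:5.1f}% less power -> {efficiency_multiplier(pct):.2f}x perf/watt")
```

Even if the claim is cut in half (45% less power), this simple model still yields roughly 1.8X better performance per watt, which would remain a substantial improvement over current x86 solutions.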
There has been a lot of ill-considered press coverage about the “death” of UNIX and coverage of the wholesale migration of UNIX workloads to Linux, some of which (the latter, not the former) I have contributed to. But to set the record straight, UNIX is not going to become extinct in our lifetime.
While UNIX revenues are not growing at any major clip, they appear to have actually ticked up slightly over the past year, probably due to a surge by IBM, and seem to be holding steady in the $18B to $20B annual range. But what is important is the “why,” not the exact dollar figure.
UNIX on proprietary RISC architectures will stay around for several reasons that primarily revolve around its being the only close alternative to mainframes for specific high-end operational characteristics:
Performance – If you need the biggest single-system SMP OS image, UNIX is still the only realistic commercial alternative other than mainframes.
Isolated bulletproof partitionability – If you want to run workloads on dynamically scalable, electrically isolated partitions, with the option to move workloads between them while running, then UNIX is your answer.
Near-ultimate availability – If you are looking for the highest levels of reliability and availability short of mainframes and custom fault-tolerant (FT) systems, UNIX is the answer. It still holds slight availability advantages, especially when you factor in the more robust online maintenance capabilities of the leading UNIX OS variants.
Last year I wrote about Oracle’s new plans for SPARC, anchored by a new line of SPARC CPUs engineered in conjunction with Fujitsu (Does SPARC have a Future?), and commented that the first deliveries of this new technology would probably arrive in early 2012; until we saw tangible evidence that Oracle was actually executing this road map, we could not predict the future viability of SPARC with any confidence.
The T4 CPU
Fast forward a year and Oracle has delivered the first of the new CPUs, ahead of schedule and with impressive gains in performance that make it look like SPARC will remain a viable platform for years. Specifically, Oracle has introduced the T4 CPU and systems based on it. The T4, an evolution of Oracle’s highly threaded T-Series architecture, is implemented with an entirely new core that will form the basis, with variations in the number of threads versus cores and in cache designs, of the future M and T series systems. The M series will have fewer threads and more performance per thread, while the T CPUs will, like their predecessors, emphasize throughput for highly threaded workloads. The new T4 has 8 cores, each with 8 threads. While the T4 emphasizes highly threaded workload performance, it is important to note that Oracle has radically improved single-thread performance, claiming a 5X improvement in performance per thread over its predecessors, which greatly improves its utility as a CPU for less thread-intensive workloads as well.
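For capacity planning, the hardware thread count is what the OS scheduler (and often the software licensing model) sees. A trivial sketch of that arithmetic, using the 8-core, 8-threads-per-core T4 configuration from the text; the socket counts in the loop are hypothetical examples, not a list of shipping systems.

```python
# Hardware-thread arithmetic for the T4 as described: 8 cores x 8 threads.
# The socket counts below are hypothetical examples, not product specs.

T4_CORES = 8
T4_THREADS_PER_CORE = 8


def hardware_threads(sockets: int, cores: int = T4_CORES,
                     threads_per_core: int = T4_THREADS_PER_CORE) -> int:
    """Total hardware threads visible to the OS for a given socket count."""
    return sockets * cores * threads_per_core


if __name__ == "__main__":
    for sockets in (1, 2, 4):
        print(f"{sockets} socket(s): {hardware_threads(sockets)} threads")
```

A single T4 thus presents 64 hardware threads, which is why the per-thread improvement matters so much for workloads that cannot spread themselves across that many threads.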
I just attended IDF, and I’ve got to say, Intel has certainly gotten the cloud message. Almost everything centered on clouds, from the high-concept keynotes to the presentations on low-level infrastructure, although if you dug deep enough there was content for old-fashioned data center and I&O professionals. Some highlights:
Chips and processors and low-level hardware
Intel is, after all, a semiconductor foundry, and despite its expertise in design, its true core competitive advantage is its foundry operations; even its competitors grudgingly acknowledge that it can manufacture semiconductors better than anyone else on the planet. As a consequence, showing off new designs and processes is always front and center at IDF, and this year was no exception. Last year it was Sandy Bridge, the new microarchitecture (the “tock”) built on the same 32nm process as Westmere. This year it was Ivy Bridge, the 22nm “tick” of the Intel “tick-tock” design cycle. Ivy Bridge is the 22nm shrink of Sandy Bridge and seems to have inherited Intel’s recent focus on power efficiency, with major improvements beyond the already solid advantages of the 22nm process, including deeper P-states and the ability to actually shut down parts of the chip when it is idle. While Intel did not discuss the server variants in any detail, the desktop versions will get an entirely new integrated graphics processor, which Intel is obviously hoping will blunt AMD’s resurgence in client systems. On the server side, if I had to guess, I would expect more cores and larger caches, along with expanded support for I/O virtualization beyond what Intel currently offers.
At the Hot Chips conference last week, Intel disclosed additional details about the upcoming Poulson Itanium CPU due for shipment early next year. For Itanium loyalists (essentially committed HP-UX customers) the disclosures are a ray of sunshine amid the gloomy news that has been the lot of Itanium devotees recently.
Poulson will bring several significant improvements to Itanium in both performance and reliability. On the performance side, the gains come on several fronts:
Process – Poulson will be manufactured with the same 32 nm semiconductor process that will (at least for a while) be driving the high-end Xeon processors. This is goodness all around – performance will improve and Intel now can load its latest production lines more efficiently.
More cores and parallelism – Poulson will be an 8-core processor with a whopping 54 MB of on-chip cache, and Intel has doubled the width of the multi-issue instruction pipeline from 6 to 12 instructions. Together with improved hyperthreading, the combination of 2X cores and 2X the potential instructions issued per clock cycle by each core hints at impressive performance gains.
Architecture and instruction tweaks – Intel has added new instructions based on analysis of workloads. This kind of architectural tuning seldom results in major performance gains, but every small increment helps.
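A back-of-the-envelope view of the “2X cores times 2X issue width” point: the sketch below multiplies cores by issue width to get the theoretical peak number of instructions issued per clock. It assumes the predecessor (Tukwila) is a 4-core, 6-wide part, consistent with the doubling described above; real workloads achieve far less than these peaks, so treat the ratio as an upper bound rather than a performance forecast.

```python
# Theoretical peak instructions issued per clock: cores x issue width.
# Assumption: the predecessor (Tukwila) is a 4-core, 6-wide part,
# consistent with the "2X cores" comparison; real IPC is far lower.

def peak_issue_per_clock(cores: int, issue_width: int) -> int:
    """Upper bound on instructions issued per clock across all cores."""
    return cores * issue_width


tukwila = peak_issue_per_clock(cores=4, issue_width=6)
poulson = peak_issue_per_clock(cores=8, issue_width=12)
print(tukwila, poulson, poulson / tukwila)  # 24 96 4.0
```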