In late 2010 I noted that startup SeaMicro had introduced an ultra-dense server using Intel Atom chips in an innovative fabric-based architecture that allowed them to factor out much of the power overhead from a large multi-CPU server ( http://blogs.forrester.com/richard_fichera/10-09-21-little_servers_big_applications_intel_developer_forum). Along with many observers, I noted that the original SeaMicro server was well-suited to many light-weight edge processing tasks, but that the system would not support more traditional compute-intensive tasks due to the performance of the Atom core. I was, however, quite taken with the basic architecture, which uses a proprietary high-speed (1.28 Tb/s) 3D mesh interconnect to allow the CPU cores to share network, BIOS and disk resources that are normally replicated on a per-server in conventional designs, with commensurate reductions in power and an increase in density.
This week AMD finally released their AMD 6200 and 4200 series CPUs. These are the long-awaited server-oriented Interlagos and Valencia CPUs, based on their new “Bulldozer” core, offering up to 16 x86 cores in a single socket. The announcement was targeted at (drum roll, one guess per customer only) … “The Cloud.” AMD appears to be positioning its new architectures as the platform of choice for cloud-oriented workloads, focusing on highly threaded throughput oriented benchmarks that take full advantage of its high core count and unique floating point architecture, along with what look like excellent throughput per Watt metrics.
At the same time it is pushing the now seemingly mandatory “cloud” message, AMD is not ignoring the meat-and-potatoes enterprise workloads that have been the mainstay of server CPUs sales –virtualization, database, and HPC, where the combination of many cores, excellent memory bandwidth and large memory configurations should yield excellent results. In its competitive comparisons, AMD targets Intel’s 5640 CPU, which it claims represents Intel’s most widely used Xeon CPU, and shows very favorable comparisons in regards to performance, price and power consumption. Among the features that AMD cites as contributing to these results are:
Advanced power and thermal management, including the ability to power off inactive cores contributing to an idle power of less than 4.4W per core. Interlagos offers a unique capability called TDP, which allows I&O groups to set the total power threshold of the CPU in 1W increments to allow fine-grained tailoring of power in the server racks.
Turbo CORE, which allows boosting the clock speed of cores by up to 1 GHz for half the cores or 500 MHz for all the cores, depending on workload.
Emerging ARM server Calxeda has been hinting for some time that they had a significant partnership announcement in the works, and while we didn’t necessarily not believe them, we hear a lot of claims from startups telling us to “stay tuned” for something big. Sometimes they pan out, sometimes they simply go away. But this morning Calxeda surpassed our expectations by unveiling just one major systems partner – but it just happens to be Hewlett Packard, which dominates the WW market for x86 servers.
At its core (unintended but not bad pun), the HP Hyperscale business unit Project Moonshot and Calxeda’s server technology are about improving the efficiency of web and cloud workloads, and promises improvements in excess of 90% in power efficiency and similar improvements in physical density compared with current x86 solutions. As I noted in my first post on ARM servers and other documents, even if these estimates turn out to be exaggerated, there is still a generous window within which to do much, much, better than current technologies. And workloads (such as memcache, Hadoop, static web servers) will be selected for their fit to this new platform, so the workloads that run on these new platforms will potentially come close to the cases quoted by HP and Calxeda.
I just attended IDF and I’ve got to say, Intel has certainly gotten the cloud message. Almost everything is centered on clouds, from the high-concept keynotes to the presentations on low-level infrastructure, although if you dug deep enough there was content for general old-fashioned data center and I&O professionals. Some highlights:
Chips and processors and low-level hardware
Intel is, after all, a semiconductor foundry, and despite their expertise in design, their true core competitive advantage is their foundry operations – even their competitors grudgingly acknowledge that they can manufacture semiconductors better than anyone else on the planet. As a consequence, showing off new designs and processes is always front and center at IDF, and this year was no exception. Last year it was Sandy Bridge, the 22nm shrink of the 32nm Westmere (although Sandy Bridge also incorporated some significant design improvements). This year it was Ivy Bridge, the 22nm “tick” of the Intel “tick-tock” design cycle. Ivy Bridge is the new 22nm architecture and seems to have inherited Intel’s recent focus on power efficiency, with major improvements beyond the already solid advantages of their 22nm process, including deeper P-States and the ability to actually shut down parts of the chip when it is idle. While they did not discuss the server variants in any detail, the desktop versions will get an entirely new integrated graphics processor which they are obviously hoping will blunt AMD’s resurgence in client systems. On the server side, if I were to guess, I would guess more cores and larger caches, along with increased support for virtualization of I/O beyond what they currently have.
We have been repeatedly reminded that the requirements of hyper-scale cloud properties are different from those of the mainstream enterprise, but I am now beginning to suspect that the top strata of the traditional enterprise may be leaning in the same direction. This suspicion has been triggered by the combination of a recent day in NY visiting I&O groups in a handful of very large companies and a number of unrelated client interactions.
The pattern that I see developing is one of “haves” versus “have nots” in terms of their ability to execute on their technology vision with internal resources. The “haves” are the traditional large sophisticated corporations, with a high concentration in financial services. They have sophisticated IT groups, are capable fo writing extremely complex systems management and operations software, and typically own and manage 10,000 servers or more. The have nots are the ones with more modest skills and abilities, who may own 1000s of servers, but tend to be less advanced than the core FSI companies in terms of their ability to integrate and optimize their infrastructure.
The divergence in requirements comes from what they expect and want from their primary system vendors. The have nots are companies who understand their limitations and are looking for help form their vendors in the form of converged infrastructures, new virtualization management tools, and deeper integration of management software to automate operational tasks, These are people who buy HP c-Class, Cisco UCS, for example, and then add vendor-supplied and ISV management and automation tools on top of them in an attempt to control complexity and costs. They are willing to accept deeper vendor lock-in in exchange for the benefits of the advanced capabilities.
A project I’m working on for an approximately half-billion dollar company in the health care industry has forced me to revisit Hyper-V versus VMware after a long period of inattention on my part, and it has become apparent that Hyper-V has made significant progress as a viable platform for at least medium enterprises. My key takeaways include:
Hyper-V has come a long way and is now a viable competitor in Microsoft environments up through mid-size enterprise as long as their DR/HA requirements are not too stringent and as long as they are willing to use Microsoft’s Systems Center, Server Management Suite and Performance Resource Optimization as well as other vendor specific pieces of software as part of their management environment.
Hyper-V still has limitations in VM memory size, total physical system memory size and number of cores per VM compared to VMware, and VMware boasts more flexible memory management and I/O options, but these differences are less significant that they were two years ago.
For large enterprises and for complete integrated management, particularly storage, HA, DR and automated workload migration, and for what appears to be close to 100% coverage of workload sizes, VMware is still king of the barnyard. VMware also boasts an incredibly rich partner ecosystem.
For cloud, Microsoft has a plausible story but it is completely wrapped around Azure.
While I have not had the time (or the inclination, if I was being totally honest) to develop a very granular comparison, VMware’s recent changes to its legacy licensing structure (and subsequent changes to the new pricing structure) does look like license cost remains an attraction for Microsoft Hyper-V, especially if the enterprise is using Windows Server Enterprise Edition.
I recently had an opportunity to spend some time with SUSE management, including President and General Manager Nils Brauckmann, and came away with what I think is a reasonably clear picture of The Attachmate Group’s (TAG) intentions and of SUSE’s overall condition these days. Overall, impressions were positive, with some key takeaways:
TAG has clarified its intentions regarding SUSE. TAG has organized its computer holdings as four independent business units, Novell, NetIQ, Attachmate and SUSE, each one with its own independent sales, development, marketing, etc. resources. The advantages and disadvantages of this approach are pretty straightforward, with the lack of opportunity to share resources aiming the business units for R&D and marketing/sales being balanced off by crystal clear accountability and the attendant focus it brings. SUSE management agrees that it has undercommunicated in the past, and says that now that the corporate structure has been nailed down it will be very aggressive in communicating its new structure and goals.
SUSE’s market presence has shifted to a more balanced posture. Over the last several years SUSE has shifted to a somewhat less European-centric focus, with 50% of revenues coming from North America, less than 50% from EMEA, and claims to be the No. 1 Linux vendor in China, where it has expanded its development staffing. SUSE claims to have gained market share overall, laying claim to approximately 30% of WW Linux market share by revenue.
Focus on enterprise and cloud. Given its modest revenues of under $200 million, SUSE realizes that it cannot be all things to all people, and states that it will be focusing heavily on enterprise business servers and cloud technology, with less emphasis on desktops and projects that do not have strong financial returns, such as its investment in Mono, which it has partnered with Xamarin to continue development,.
NVIDIA recently shared a case study involving risk calculations at a JP Morgan Chase that I think is significant for the extreme levels of acceleration gained by integrating GPUs with conventional CPUs, and also as an illustration of a mainstream financial application of GPU technology.
JP Morgan Chase’s Equity Derivatives Group began evaluating GPUs as computational accelerators in 2009, and now runs over half of their risk calculations on hybrid systems containing x86 CPUs and NVIDIA Tesla GPUs, and claims a 40x improvement in calculation times combined with a 75% cost savings. The cost savings appear to be derived from a combination of lower capital costs to deliver an equivalent throughput of calculations along with improved energy efficiency per calculation.
Implicit in the speedup of 40x, from multiple hours to several minutes, is the implication that these calculations can become part of a near real-time business-critical analysis process instead of an overnight or daily batch process. Given the intensely competitive nature of derivatives trading, it is highly likely that JPMC will enhance their use of GPUs as traders demand an ever increasing number of these calculations. And of course, their competition has been using the same technology as well, based on numerous conversations I have had with Wall Street infrastructure architects over the past year.
My net take on this is that we will see a succession of similar announcements as GPUs become a fully mainstream acceleration technology as opposed to an experimental fringe. If you are an I&O professional whose users are demanding extreme computational performance on a constrained space, power and capital budget, you owe it to yourself and your company to evaluate the newest accelerator technology. Your competitors are almost certainly doing so.
While NVIDIA and to a lesser extent AMD (via its ATI branded product line) have effectively monopolized the rapidly growing and hyperbole-generating market for GPGPUs, highly parallel application accelerators, Intel has teased the industry for several years, starting with its 80-core Polaris Research Processor demonstration in 2008. Intel’s strategy was pretty transparent – it had nothing in this space, and needed to serve notice that it was actively pursuing it without showing its hand prematurely. This situation of deliberate ambiguity came to an end last month when Intel finally disclosed more details on its line of Many Independent Core (MIC) accelerators.
Intel’s approach to attached parallel processing is radically different than its competitors and appears to make excellent use of its core IP assets – fabrication and expertise and the x86 instruction set. While competing products from NVIDIA and AMD are based on graphics processing architectures, employing 100s of parallel non-x86 cores, Intel’s products will feature a smaller (32 – 64 in the disclosed products) number of simplified x86 cores on the theory that developers will be able to harvest large portions of code that already runs on 4 – 10 core x86 CPUs and easily port them to these new parallel engines.
Intel has been publishing research for about a decade on what they call “3D Trigate” transistors, which held out the hope for both improved performance as well as power efficiency. Today Intel revealed details of its commercialization of this research in its upcoming 22 nm process as well as demonstrating actual systems based on 22 nm CPU parts.
The new products, under the internal name of “Ivy Bridge”, are the process shrink of the recently announced Sandy Bridge architecture in the next “Tock” cycle of the famous Intel “Tick-Tock” design methodology, where the “Tick” is a new optimized architecture and the “Tock” is the shrinking of this architecture onto then next generation semiconductor process.
What makes these Trigate transistors so innovative is the fact that they change the fundamental geometry of the semiconductors from a basically flat “planar” design to one with more vertical structure, earning them the description of “3D”. For users the concepts are simpler to understand – this new transistor design, which will become the standard across all of Intel’s products moving forward, delivers some fundamental benefits to CPUs implemented with them:
Leakage current is reduced to near zero, resulting in very efficient operation for system in an idle state.
Power consumption at equivalent performance is reduced by approximately 50% from Sandy Bridge’s already improved results with its 32 nm process.