Calxeda, one of the most visible stealth mode startups in the industry, has finally given us an initial peek at the first iteration of its server plans. They both meet our inflated expectations for this ARM server startup and validate some of the initial claims of ARM proponents.
While still holding its actual delivery dates and specification details close to the vest, Calxeda did reveal the following cards from its hand:
The first reference design, which will be provided to OEM partners as well as delivered directly to selected end users and developers, will be based on an ARM Cortex A9 quad-core SOC design.
The SOC, as Calxeda will demonstrate with one of its reference designs, will enable OEMs to design servers as dense as 120 ARM quad-core nodes (480 cores) in a 2U enclosure, with an average consumption of about 5 watts per node (1.25 watts per core) including DRAM.
While Calxeda was not forthcoming with details about the performance, topology or protocols, the SOC will contain an embedded fabric for the individual quad-core SOC servers to communicate with each other.
Most significantly for prospective users, Calxeda claims, and has some convincing models to back up the claim, that its products will deliver 5X to 10X the performance/watt (and an even greater advantage when price is factored in, for a metric of performance/watt/$) of any products it expects to see on the market when it ships.
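The density and power figures quoted above are easy to sanity-check. The sketch below is only a back-of-the-envelope calculation using the numbers Calxeda disclosed; the function and variable names are illustrative, and the enclosure total assumes node power only (fans, power supply losses, and any shared components are not covered by the quoted figures).

```python
# Figures quoted for the Calxeda reference design.
NODES_PER_2U = 120      # quad-core SOC nodes in a 2U enclosure
CORES_PER_NODE = 4      # ARM Cortex A9 quad-core
WATTS_PER_NODE = 5.0    # average consumption per node, including DRAM


def enclosure_summary(nodes=NODES_PER_2U,
                      cores_per_node=CORES_PER_NODE,
                      watts_per_node=WATTS_PER_NODE):
    """Derive per-enclosure totals from the per-node figures."""
    return {
        "total_cores": nodes * cores_per_node,
        "watts_per_core": watts_per_node / cores_per_node,
        # Assumption: node power only, no enclosure overhead.
        "node_watts_total": nodes * watts_per_node,
    }


summary = enclosure_summary()
print(summary)
```

On these figures, the nodes in a fully populated 2U enclosure would draw on the order of 600 watts in aggregate, before any enclosure overhead.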
Since introducing its Core 2 architecture, Intel has reversed much of the damage done to it by AMD in the server space, with attendant publicity. AMD, however, has been quietly reclaiming some ground with its 12-core 6100 series CPUs, showing strength in benchmarks that emphasize high throughput in process-rich environments as opposed to maximum performance per core. Several manufacturers have also told us that their AMD-based systems are enjoying very strong customer acceptance, thanks to the throughput of the 12-core CPUs combined with their attractive pricing. As a fillip to this success, AMD this past week announced speed bumps for the 6100-series products to give a slight performance boost as they continue to compete with Intel’s Xeon 5600 and 7500 products (Intel’s Sandy Bridge server products have not yet been announced).
But the real news last week was the quiet subtext that the anticipated 16-core Interlagos products based on the new Bulldozer core appear to be on schedule for Q2 ’11 shipments to system partners, who should probably be able to ship systems during Q3, and that AMD is still certifying them as compatible with the current sockets used for the 12-core 6100 CPUs. This implies that system partners will be able to deliver products based on the new parts very rapidly.
Actual performance of these systems will obviously depend on the workloads being run. Our gut feeling is that they will not rival the per-core performance of the Intel Xeon 7500 CPUs. But for large throughput-oriented environments with high numbers of processes, a description that fits a large number of web and middleware environments, these CPUs, each with up to a 50% performance advantage per core over the current AMD CPUs, may deliver some impressive benchmarks and keep the competition in the server space at a boil, which in the end is always helpful to customers.
From nothing more than outlandish speculation, the prospects for a new entrant into the volume Linux and Windows server space have suddenly become much more concrete, culminating in an immense buzz at CES as numerous players, including NVIDIA and Microsoft, stoked the fires with innuendo, announcements, and demos.
Consumers of x86 servers are always on the lookout for faster, cheaper, and more power-efficient servers. In the event that they can’t get all three, the combination of cheaper and more energy-efficient seems to be attractive to a large enough chunk of the market to have motivated Intel, AMD, and all their system partners to develop low-power chips and servers designed for high density compute and web/cloud environments. Up until now the debate was Intel versus AMD, and low power meant a CPU with four cores and a power dissipation of 35 – 65 Watts.
The Promised Land
The performance trajectory of processors that were formerly purely mobile device processors, notably the ARM Cortex, has suddenly introduced a new potential option into the collective industry mindset. But is this even a reasonable proposition, and if so, what does it take for it to become a reality?
Our first item of business is to figure out whether or not it even makes sense to think about these CPUs as server processors. My quick take is yes, with some caveats. The latest ARM offering is the Cortex A9, with vendors offering dual core products at up to 1.2 GHz currently (the architecture claims scalability to four cores and 2 GHz). It draws approximately 2W, much less than any single core x86 CPU, and a multi-core version should be able to execute any reasonable web workload. Coupled with the promise of embedded GPUs, the notion of a server that consumes much less power than even the lowest power x86 begins to look attractive. But…
I met recently with Cisco’s UCS group in San Jose to get a quick update on sales and maybe some hints about future development. The overall picture is one of rapid growth decoupled from whatever pressures Cisco management has cautioned about in other areas of the business.
Overall, according to a recent disclosure by Cisco CEO John Chambers, Cisco’s UCS revenue is growing at 550% Y/Y, with the most recent quarterly revenues indicating a $500M annual run rate (we make that out as about $125M in quarterly revenue). This figure does not seem to include the over 4,000 blades used by Cisco IT, nor does it include units being consumed internally by Cisco and subsequently shipped to customers as part of appliances or other Cisco products. Also of note is the fact that it is fiscal Q1 for Cisco, traditionally its weakest quarter, although with an annual growth rate in excess of 500% we would expect that UCS sequential quarters will be marching to a totally different drummer than the overall company numbers.
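The run-rate arithmetic can be made explicit. The sketch below simply annualizes in reverse (run rate divided by four quarters) and, as a labeled assumption, reads "550% Y/Y growth" as current revenue being 6.5 times the year-ago figure. The implied year-ago number is our own derivation under that reading, not something Cisco disclosed, and the function names are illustrative.

```python
def quarterly_from_run_rate(annual_run_rate_musd):
    """An annual run rate is the latest quarter's revenue times four."""
    return annual_run_rate_musd / 4


def implied_prior_revenue(current_musd, growth_pct):
    """Assumption: growth of g% means current = prior * (1 + g/100)."""
    return current_musd / (1 + growth_pct / 100)


quarterly = quarterly_from_run_rate(500.0)          # $M per quarter
year_ago = implied_prior_revenue(quarterly, 550.0)  # $M, derived, not disclosed

print(f"Quarterly revenue: ${quarterly:.0f}M")
print(f"Implied year-ago quarter: ${year_ago:.1f}M")
```

If the growth figure was measured on a different basis (for example, orders rather than revenue), the second number would not apply.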
I have been working on a research document, to be published this quarter, on the impact of 8-socket x86 servers based on Intel’s new Xeon 7500 CPU. In a nutshell, these systems have the performance of the best-of-breed RISC/UNIX systems of three years ago, at a substantially better price, and their overall performance improvement trajectory has been steeper than competing technologies for the past decade.
This is probably not shocking news and is not the subject of this current post, although I would encourage you to read it when it is finally published. During the course of researching this document I spent time trying to prove or disprove my thesis that x86 system performance solidly overlapped that of RISC/UNIX with available benchmark results. The process highlighted for me the limitations of using standardized benchmarks for performance comparisons. There are now so many benchmarks available that system vendors are only performing each benchmark on selected subsets of their product lines, if at all. Additionally, most benchmarks suffer from several common flaws:
They are results from high-end configurations, in many cases far beyond any normal use case, and the results cannot be interpolated to smaller, more realistic configurations.
They are often the result of teams of very smart experts tuning the system configurations, application and system software parameters for optimal results. For a large benchmark such as SAP or TPC, it is probably reasonable to assume that there are over 1,000 variables involved in the tuning effort. This makes the results very much like EPA mileage figures — the consumer is guaranteed not to exceed these numbers.
Fujitsu? Who? I recently attended Fujitsu’s global analyst conference in Boston, which gave me an opportunity to check in with the best kept secret in the North American market. Even Fujitsu execs admit that many people in this largest of IT markets think that Fujitsu has something to do with film, and few of us have ever seen a Fujitsu system installed in the US unless it was a POS system.
So what is the management of this global $50 Billion information and communications technology company, with a competitive portfolio of client, server and storage products and a global service and integration capability, going to do about its lack of presence in the world’s largest IT market? In a word, invest. Fujitsu’s management, judging from their history and what they have disclosed of their plans, intends to invest in the US over the next three to four years to consolidate their estimated $3 Billion in N. American business into a more manageable (simpler) set of operating companies, and to double down on hiring and selling into the N. American market. The fact that they have given themselves multiple years to do so is very indicative of what I have always thought of as both Fujitsu’s greatest strength and one of their major weaknesses – they operate on Japanese time, so to speak. For an American company to undertake to build a presence over multiple years with seeming disregard for quarterly earnings would be almost unheard of, so Fujitsu’s management gets major kudos for that. On the other hand, years of observing them from a distance also lead me to believe that their approach to solving problems inherently lacks the sense of urgency of some of their competitors.
There has been a lot of press about IBM’s acquisition of BNT (Blade Network Technologies) focusing on the economics and market share of BNT as a competitor to Cisco and HP’s ProCurve/3Com franchise. But at its heart the acquisition is more about defending and expanding a position in the emerging converged server, networking, and storage infrastructure segment than it is about raw switch port market share. It is also a powerful vindication of the proposition that infrastructure convergence is driving major realignment in the vendor industry.
Starting with HP’s success with its c-Class blade servers and Virtual Connect technology, and escalating with Cisco’s entrance into the server market, IBM continued its investment in its Virtual Fabric and Open Fabric Manager technology, heavily leveraging BNT’s switch platforms. At some point it became clear that BNT was a critical element of IBM’s convergence strategy, with IBM’s plans now heavily dependent on a vendor with whom they had an excellent, but non-exclusive relationship, and one whose acquisition by another player could severely compromise their product plans. Hence the acquisition. Now that it owns BNT, IBM can capitalize on its excellent edge network technology for further development of its converged infrastructure strategy without hesitation about further leveraging BNT’s technology.
I recently spent a day with IBM’s x86 team, primarily to get back up to speed on their entire x86 product line, and partially to realign my views of them after spending almost five years as a direct competitor. All in all, time well spent, with some key takeaways:
IBM has fixed some major structural problems with the entire x86 program and its perception in the company – As recently as two years ago, it appeared to the outside world that IBM was not really serious about x86 servers, given its licensing of low-end server designs to Lenovo (although IBM continued to sell its own versions) and an apparent retreat to the upper end of the segment. New management, new alignment with sales, and a higher internal profile for x86 seem to have moved the division back into IBM’s mainstream.
Increased investment – It looks like IBM significantly ramped up investments in x86 products about three years ago. The result has been a relatively steady flow of new products into the marketplace, some of which, such as the HS22 blade, significantly reversed deficits versus equivalent HP products. Others followed in high-end servers, virtualization and systems management, and increased velocity of innovation in low-end systems.
Established leadership in new niches such as dense modular server deployments – IBM’s iDataplex, while representing a small footprint in terms of their total volume, gave them immediate visibility as an innovator in the rapidly growing niche for hyper scale dense deployments. Along the way, IBM has apparently become the leader in GPU deployments as well, another low-volume but high-visibility niche.