NVIDIA recently shared a case study involving risk calculations at JP Morgan Chase that I think is significant both for the extreme levels of acceleration gained by integrating GPUs with conventional CPUs and as an illustration of a mainstream financial application of GPU technology.
JP Morgan Chase’s Equity Derivatives Group began evaluating GPUs as computational accelerators in 2009, and now runs over half of their risk calculations on hybrid systems containing x86 CPUs and NVIDIA Tesla GPUs, and claims a 40x improvement in calculation times combined with a 75% cost savings. The cost savings appear to be derived from a combination of lower capital costs to deliver an equivalent throughput of calculations along with improved energy efficiency per calculation.
The 40x speedup, from multiple hours to several minutes, implies that these calculations can become part of a near real-time business-critical analysis process instead of an overnight or daily batch process. Given the intensely competitive nature of derivatives trading, it is highly likely that JPMC will expand their use of GPUs as traders demand an ever-increasing number of these calculations. And of course, their competition has been using the same technology as well, based on numerous conversations I have had with Wall Street infrastructure architects over the past year.
My net take on this is that we will see a succession of similar announcements as GPUs become a fully mainstream acceleration technology as opposed to an experimental fringe. If you are an I&O professional whose users are demanding extreme computational performance on a constrained space, power and capital budget, you owe it to yourself and your company to evaluate the newest accelerator technology. Your competitors are almost certainly doing so.
Over the past months server vendors have been announcing benchmark results for systems incorporating Intel’s high-end x86 CPU, the E7, with HP trumping all existing benchmarks with their recently announced numbers (although, as noted in x86 Servers Hit The High Notes, the results are clustered within a few percent of each other). HP recently announced new performance numbers for their ProLiant DL980, their high-end 8-socket x86 server using the newest Intel E7 processors. With up to 10 cores, these new processors can bring up to 80 cores to bear on large problems such as database, ERP and other enterprise applications.
The performance results on the SAP SD 2-Tier benchmark, for example, at 25160 SD users, show a performance improvement of 35% over the previous high-water mark of 18635. The results seem to scale almost exactly with the product of core count x clock speed, indicating that both the system hardware and the supporting OS, in this case Windows Server 2008, are not at their scalability limits. This gives us confidence that subsequent spins of the CPU will in turn yield further performance increases before hitting system or OS limitations. Results from other benchmarks show similar patterns as well.
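As a rough sanity check on the scaling claim, the arithmetic can be sketched in a few lines of Python. The benchmark figures are the ones quoted above; the per-socket core counts and clock speeds are my own illustrative assumptions (10 cores at 2.40 GHz for the new parts versus 8 cores at 2.26 GHz for the previous generation, with the same socket count) and are not taken from the published results.

```python
# Back-of-the-envelope check: does the SAP SD gain track cores x clock speed?

def scaling_ratio(new_cores, new_ghz, old_cores, old_ghz):
    """Ratio of (cores x clock) between two CPU generations, per socket."""
    return (new_cores * new_ghz) / (old_cores * old_ghz)

# Benchmark results quoted in the text (SAP SD users).
new_users = 25160
old_users = 18635
observed = new_users / old_users

# ASSUMPTION: these core counts and clock speeds are illustrative values,
# not stated in the text. Socket count is assumed unchanged, so it cancels.
predicted = scaling_ratio(new_cores=10, new_ghz=2.40, old_cores=8, old_ghz=2.26)

gap = abs(observed - predicted) / predicted

print(f"observed gain:  {observed:.3f}x")
print(f"predicted gain: {predicted:.3f}x")
print(f"relative gap:   {gap:.1%}")
```

Under those assumed clock speeds the observed gain lands within about two percent of the simple cores x clock prediction, which is consistent with the reading that neither the hardware nor the OS is yet the bottleneck. Different SKU assumptions would shift the predicted figure.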
Key takeaways for I&O professionals include:
Expect to see at least 25% to 35% throughput improvements in many workloads with systems based on the latest high-performance CPUs from Intel. In situations where data center space and cooling resources are constrained this can be a significant boost for a same-footprint upgrade of a high-end system.
For Unix to Linux migrations, target platform scalability continues to become less of an issue.
While NVIDIA and to a lesser extent AMD (via its ATI branded product line) have effectively monopolized the rapidly growing and hyperbole-generating market for GPGPUs, highly parallel application accelerators, Intel has teased the industry for several years, starting with its 80-core Polaris Research Processor demonstration in 2008. Intel’s strategy was pretty transparent – it had nothing in this space, and needed to serve notice that it was actively pursuing it without showing its hand prematurely. This situation of deliberate ambiguity came to an end last month when Intel finally disclosed more details on its line of Many Integrated Core (MIC) accelerators.
Intel’s approach to attached parallel processing is radically different from its competitors’ and appears to make excellent use of its core IP assets – fabrication expertise and the x86 instruction set. While competing products from NVIDIA and AMD are based on graphics processing architectures, employing 100s of parallel non-x86 cores, Intel’s products will feature a smaller number (32 – 64 in the disclosed products) of simplified x86 cores on the theory that developers will be able to harvest large portions of code that already runs on 4 – 10 core x86 CPUs and easily port them to these new parallel engines.
On June 15, HP announced that it had filed suit against Oracle, saying in a statement:
“HP is seeking the court’s assistance to compel Oracle to:
Reverse its decision to discontinue all software development on the Itanium platform
Reaffirm its commitment to offer its product suite on HP platforms, including Itanium;
Immediately reset the Itanium core processor licensing factor consistent with the model prior to December 1, 2010 for RISC/EPIC systems
HP also seeks:
Injunctive relief, including an order prohibiting Oracle from making false and misleading statements regarding the Itanium microprocessor or HP’s Itanium-based servers and remedying the harm caused by Oracle’s conduct.
Damages and fees and other standard remedies available in cases of this nature.”
When Cisco began shipping UCS slightly over two years ago, competitor reaction ran the gamut from concerned to gleefully dismissive of their chances at success in the server market. The reasons given for their guaranteed lack of success ranged from technical (the product won’t really work) to economic (Cisco can’t live on server margins) to cultural (Cisco doesn’t know servers and can’t succeed in a market where they are not the quasi-monopolistic dominating player). Some ignored them, and some attempted to preemptively introduce products that delivered similar functionality, and in the two years following introduction, competitive reaction was very similar – yes they are selling, but we don’t think they are a significant threat.
Any lingering doubt about whether Cisco can become a credible supplier has been laid to rest with Cisco’s recent quarterly financial disclosures and IDC’s revelation that Cisco is now the No. 3 worldwide blade vendor, with slightly over 10% of worldwide (and close to 20% in North America) blade server shipments. In their quarterly call, Cisco revealed Q1 revenues of $171 million, for a $684 million revenue run rate, and claimed a booking run rate of $900 million annually. In addition, they placed their total customer count at 5,400. While actual customer count is hard to verify, Cisco has been reporting a steady and impressive growth in customers since initial shipment, and Forrester’s anecdotal data confirms both the significant interest and installed UCS systems among Forrester’s clients.
Entering into a new competitive segment, especially one dominated by major players with well-staked out turf, requires a level of hyperbole, dramatic positioning and a differentiable product. Cisco has certainly achieved all this and more in the first two years of shipment of its UCS product, and shows no signs of fatigue to date.
However, Cisco’s announcement this week that it is now part of Microsoft’s Fast Track Data Warehouse and Fast Track OLTP program is a sign that UCS is also entering the mainstream of enterprise technology. The Microsoft Fast Track program, offering a set of reference architectures, system specifications and sizing guides for both common usage scenarios for Microsoft SQL Server, is not new, nor is it in any way unique to Cisco. Fast Track includes Dell, HP, IBM, and Bull. The fact that Cisco will now get equal billing from Microsoft in this program is significant – it is the beginning of the transition from emerging fringe to mainstream, and an endorsement to anyone in the infrastructure business that Cisco is now appearing on the same stage as the major incumbents.
Will this represent a breakthrough revenue opportunity for Cisco? Probably not, since Microsoft will be careful not to play favorites and will certainly not risk alienating its major systems partners, but Cisco’s inclusion on this list is another incremental step in becoming a mainstream server supplier. Like the chicken soup that my grandmother used to offer, it can’t hurt.
Intel has been publishing research for about a decade on what they call “3D Trigate” transistors, which held out the hope for both improved performance and better power efficiency. Today Intel revealed details of its commercialization of this research in its upcoming 22 nm process and demonstrated actual systems based on 22 nm CPU parts.
The new products, under the internal name of “Ivy Bridge”, are the process shrink of the recently announced Sandy Bridge architecture in the next “Tick” cycle of the famous Intel “Tick-Tock” design methodology, where the “Tock” is a new optimized architecture and the “Tick” is the shrinking of this architecture onto the next generation semiconductor process.
What makes these Trigate transistors so innovative is the fact that they change the fundamental geometry of the semiconductors from a basically flat “planar” design to one with more vertical structure, earning them the description of “3D”. For users the concepts are simpler to understand – this new transistor design, which will become the standard across all of Intel’s products moving forward, delivers some fundamental benefits to CPUs implemented with them:
Leakage current is reduced to near zero, resulting in very efficient operation for systems in an idle state.
Power consumption at equivalent performance is reduced by approximately 50% from Sandy Bridge’s already improved results with its 32 nm process.
Since Oracle dropped their bombshell on HP and Itanium, I have fielded multiple emails and about a dozen inquiries from HP and Oracle customers wanting to discuss their options and plans. So far, there has been no general sense of panic, and the scenarios seem to be falling into several buckets:
The majority of Oracle DB/HP customers are not at the latest revision of Oracle, so they have a window within which to make any decisions, bounded on the high end by the time it will take them to make a required upgrade of their application plus DB stack past the current 11.2 supported Itanium release. For those customers still on Oracle release 9, this can be many years, while for those currently on 11.2, the next upgrade cycle will cause a dislocation. The most common application that has come up in inquiries is SAP, with Oracle’s own apps second.
Customers with other Oracle software, such as Hyperion, PeopleSoft, Oracle’s E-Business Suite, etc., and other ISV software are often facing complicated constraints on their upgrades. In some cases decisions by the ISVs will drive the users toward upgrades they do not want to make. Several clients told me they will defer ISV upgrades to avoid being pushed into an unsupported version of the DB.
Egenera, arguably THE pioneer in what the industry is now calling converged infrastructure, has had a hard life. Early to market in 2000 with a solution that was approximately a decade ahead of its time, it offered an elegant abstraction of physical servers into what chief architect Maxim Smith described as “fungible and anonymous” resources connected by software-defined virtual networks. Its interface was easy to use, allowing the definition of virtualized networks, NICs, servers with optional failover and pools of spare resources with a fluidity that has taken the rest of the industry almost 10 years to catch up to. Unfortunately this elegant presentation was chained to a completely proprietary hardware architecture, which encumbered the economics of x86 servers with an obsolete network fabric, expensive system controller and physical architecture (but it was the first vendor to include blue lights on its servers). The power of the PanManager software was enough to keep the company alive, but not enough to overcome the economics of the solution and put them on a fast revenue path, especially as emerging competitors began to offer partial equivalents at lower costs. The company is privately held and does not disclose revenues, but Forrester estimates it is still less than $100 million in annual revenues.
In approximately 2006, Egenera began the process of converting its product to a pure software offering capable of running on commodity server hardware and standard Ethernet switches. In subsequent years they have announced distribution arrangements with Fujitsu (an existing partner for their earlier products) and an OEM partnership with Dell, which apparently was not successful, since Dell subsequently purchased Scalent, an emerging software competitor. Despite this, Egenera claims that its software business is growing and has been a factor in the company’s first full year of profitability.
A lot has been written about potential threats to Intel’s low-power server hegemony, including discussions of threats from not only its perennial minority rival AMD but also from emerging non-x86 technologies such as ARM servers. While these are real threats, with potential for disrupting Intel’s position in the low power and small form factor server segment if left unanswered, Intel’s management has not been asleep at the wheel. As part of the rollout of the new Sandy Bridge architecture, Intel recently disclosed their platform strategy for what they are defining as “Micro Servers,” small single-socket servers with shared power and cooling to improve density beyond the generally accepted dividing line of one server per RU that separates “standard density” from “high density.” While I think that Intel’s definition is a bit myopic, mostly serving to attach a label to a well established category, it is a useful tool for segmenting low-end servers and talking about the relevant workloads.
Intel’s strategy revolves around introducing successive generations of its Sandy Bridge and future architectures embodied as Low Power (LP) and Ultra Low Power (ULP) products with promises of up to 2.2X performance per watt and 30% less actual power compared to previous generation equivalent x86 servers, as outlined in the following chart from Intel:
So what does this mean for Infrastructure & Operations professionals interested in serving the target loads for micro servers, such as: