One of the developing trends in computing, relevant to enterprises and service providers alike, is the notion of workload-specific or application-centric computing architectures. These architectures, optimized for specific workloads, promise improved efficiency for their targeted workloads and, by extension, for the services they support. Earlier this year we covered the basics of this concept in “Optimize Scalable Workload-Specific Infrastructure for Customer Experiences”, and this week HP announced a pair of server cartridges for its Moonshot system that exemplify it. The cartridges are also representative of the next wave of ARM products that will emerge during the remainder of 2014 and into 2015 to tilt once more at the x86 windmill that currently dominates the computing landscape.
Specifically, HP has announced the ProLiant m400 Server Cartridge (m400) and the ProLiant m800 Server Cartridge (m800), both ARM-based servers packaged as cartridges for the HP Moonshot system, which can hold up to 45 of these cartridges in its approximately 4U enclosure. These servers are interesting from two perspectives: both are ARM-based products, one of them the first tier-1 vendor offering of a 64-bit ARM CPU, and both are being introduced with a specific target workload for which they have been optimized.
NVIDIA recently shared a case study involving risk calculations at JP Morgan Chase that I think is significant both for the extreme levels of acceleration gained by integrating GPUs with conventional CPUs and as an illustration of a mainstream financial application of GPU technology.
JP Morgan Chase’s Equity Derivatives Group began evaluating GPUs as computational accelerators in 2009, and now runs over half of its risk calculations on hybrid systems containing x86 CPUs and NVIDIA Tesla GPUs. The group claims a 40x improvement in calculation times combined with a 75% cost savings. The cost savings appear to be derived from a combination of lower capital costs to deliver an equivalent throughput of calculations and improved energy efficiency per calculation.
The 40x speedup, from multiple hours to several minutes, implies that these calculations can become part of a near real-time, business-critical analysis process instead of an overnight or daily batch process. Given the intensely competitive nature of derivatives trading, it is highly likely that JPMC will expand its use of GPUs as traders demand an ever-increasing number of these calculations. And of course, its competitors have been using the same technology as well, based on numerous conversations I have had with Wall Street infrastructure architects over the past year.
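To make the "hours to minutes" arithmetic concrete, here is a minimal sketch of what a 40x speedup does to batch run times. The baseline durations are hypothetical, chosen only for illustration; the case study as summarized here does not give the actual run times, and the function name is my own.

```python
def accelerated_minutes(baseline_hours: float, speedup: float) -> float:
    """Return the run time in minutes after applying a speedup factor."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return baseline_hours * 60.0 / speedup


# Hypothetical baseline batch durations (not taken from the case study).
for hours in (2, 4, 8):
    minutes = accelerated_minutes(hours, 40)
    print(f"{hours} h batch -> {minutes:.0f} min at 40x")
```

Under these assumed baselines, even an eight-hour overnight batch drops to roughly a quarter of an hour, which is what moves the calculation from a daily process toward something a trader could rerun during the day.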
My net take on this is that we will see a succession of similar announcements as GPUs become a fully mainstream acceleration technology as opposed to an experimental fringe. If you are an I&O professional whose users are demanding extreme computational performance on a constrained space, power and capital budget, you owe it to yourself and your company to evaluate the newest accelerator technology. Your competitors are almost certainly doing so.
While NVIDIA and, to a lesser extent, AMD (via its ATI-branded product line) have effectively monopolized the rapidly growing and hyperbole-generating market for GPGPUs (highly parallel application accelerators), Intel has teased the industry for several years, starting with its 80-core Polaris Research Processor demonstration in 2008. Intel’s strategy was pretty transparent: it had nothing in this space, and needed to serve notice that it was actively pursuing it without showing its hand prematurely. This situation of deliberate ambiguity came to an end last month when Intel finally disclosed more details on its line of Many Integrated Core (MIC) accelerators.
Intel’s approach to attached parallel processing is radically different from that of its competitors and appears to make excellent use of its core IP assets: fabrication expertise and the x86 instruction set. While competing products from NVIDIA and AMD are based on graphics processing architectures, employing hundreds of parallel non-x86 cores, Intel’s products will feature a smaller number (32 to 64 in the disclosed products) of simplified x86 cores, on the theory that developers will be able to harvest large portions of code that already run on 4 to 10 core x86 CPUs and easily port them to these new parallel engines.