NVIDIA recently shared a case study involving risk calculations at JP Morgan Chase that I think is significant both for the extreme levels of acceleration gained by integrating GPUs with conventional CPUs and as an illustration of a mainstream financial application of GPU technology.
JP Morgan Chase’s Equity Derivatives Group began evaluating GPUs as computational accelerators in 2009, and now runs over half of their risk calculations on hybrid systems containing x86 CPUs and NVIDIA Tesla GPUs, and claims a 40x improvement in calculation times combined with a 75% cost savings. The cost savings appear to be derived from a combination of lower capital costs to deliver an equivalent throughput of calculations along with improved energy efficiency per calculation.
The speedup of 40x, from multiple hours to several minutes, implies that these calculations can become part of a near-real-time, business-critical analysis process instead of an overnight or daily batch process. Given the intensely competitive nature of derivatives trading, it is highly likely that JPMC will enhance their use of GPUs as traders demand an ever-increasing number of these calculations. And of course, their competition has been using the same technology as well, based on numerous conversations I have had with Wall Street infrastructure architects over the past year.
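To make the jump from batch to near-real-time concrete, here is a minimal Python sketch of the arithmetic. The 4-hour baseline is an assumption chosen purely for illustration; the case study only says "multiple hours".

```python
def accelerated_minutes(baseline_hours: float, speedup: float) -> float:
    """Return the run time in minutes after applying a speedup factor."""
    return baseline_hours * 60.0 / speedup


# Hypothetical baseline: a 4-hour risk run (not a figure from the case study).
# The 40x factor is the speedup claimed in the case study.
print(accelerated_minutes(4.0, 40.0))  # 6.0
```

Under that assumed baseline, a job that once had to run overnight finishes in about the time of a coffee break, which is what makes intraday re-runs plausible.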
My net take on this is that we will see a succession of similar announcements as GPUs become a fully mainstream acceleration technology as opposed to an experimental fringe. If you are an I&O professional whose users are demanding extreme computational performance on a constrained space, power and capital budget, you owe it to yourself and your company to evaluate the newest accelerator technology. Your competitors are almost certainly doing so.
Over the past months server vendors have been announcing benchmark results for systems incorporating Intel’s high-end x86 CPU, the E7, with HP trumping all existing benchmarks with their recently announced numbers (although, as noted in x86 Servers Hit The High Notes, the results are clustered within a few percent of each other). HP recently announced new performance numbers for their ProLiant DL980, their high-end 8-socket x86 server using the newest Intel E7 processors. With up to 10 cores each, these new processors can bring up to 80 cores to bear on large problems such as database, ERP and other enterprise applications.
The performance results on the SAP SD 2-Tier benchmark, for example, at 25160 SD users, show a performance improvement of 35% over the previous high-water mark of 18635. The results seem to scale almost exactly with the product of core count x clock speed, indicating that both the system hardware and the supporting OS, in this case Windows Server 2008, are not at their scalability limits. This gives us confidence that subsequent spins of the CPU will in turn yield further performance increases before hitting system or OS limitations. Results from other benchmarks show similar patterns as well.
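The 35% figure and the "core count x clock speed" comparison can be checked with a short Python sketch. The two benchmark scores come from the text above; the core counts and clock speeds passed to `expected_scaling` are assumptions for illustration only, since the post does not state the exact processor configurations.

```python
def improvement_pct(new_score: float, old_score: float) -> float:
    """Percentage improvement of new_score over old_score."""
    return (new_score / old_score - 1.0) * 100.0


def expected_scaling(cores_new: int, ghz_new: float,
                     cores_old: int, ghz_old: float) -> float:
    """Ratio predicted if throughput scaled with core count x clock speed."""
    return (cores_new * ghz_new) / (cores_old * ghz_old)


# Scores quoted in the text: 25160 vs. 18635 SD users.
print(round(improvement_pct(25160, 18635), 1))  # 35.0

# Assumed per-socket configurations (not stated in the post):
# 10 cores at 2.4 GHz versus 8 cores at 2.26 GHz.
print(round(expected_scaling(10, 2.4, 8, 2.26), 2))  # 1.33
```

If the assumed configurations are close to what was tested, the measured ratio of about 1.35 sits right at the predicted ratio, which is the basis for concluding that neither the hardware nor the OS is yet the bottleneck.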
Key takeaways for I&O professionals include:
Expect to see at least 25% to 35% throughput improvements in many workloads with systems based on the latest high-performance CPUs from Intel. In situations where data center space and cooling resources are constrained, this can be a significant boost for a same-footprint upgrade of a high-end system.
For Unix to Linux migrations, target platform scalability continues to become less of an issue.
After considerable speculation and anticipation, VMware has finally announced vSphere 5 as part of a major cloud infrastructure launch, including vCloud Director 1.5, SRM 5 and vShield 5. From our first impressions, it is both well worth the wait and merits immediate serious consideration as an enterprise virtualization platform, particularly for existing VMware customers.
The list of features is voluminous, with at least 100 improvements, large and small, but several stand out as particularly significant as I&O professionals continue their efforts to virtualize the data center. These deal primarily with support for both larger VMs and larger physical host systems, with improved manageability of storage, and with improvements to Site Recovery Manager (SRM), the remote-site HA component:
Replication improvements for Site Recovery Manager, allowing replication without SANs
Distributed Resource Scheduling (DRS) for Storage
Support for up to 1 TB of memory per VM
Support for 32 vCPUs per VM
Support for up to 160 logical CPUs and 2 TB of RAM
New GUI to configure multicore vCPUs
Profile-driven storage delivery based on the vSphere Storage APIs for Storage Awareness (VASA)
Improved version of the Cluster File System, VMFS5
Storage APIs – Array Integration: Thin Provisioning enabling reclaiming blocks of a thin provisioned LUN on the array when a virtual disk is deleted
Swap to SSD
2TB+ LUN support
Storage vMotion snapshot support
vNetwork Distributed Switch improvements providing improved visibility in VM traffic
vCenter Server Appliance
vCenter Solutions Manager, providing a consistent interface to configure and monitor vCenter-integrated solutions developed by VMware and third parties
Revamped VMware High Availability (HA) with Fault Domain Manager
While NVIDIA and, to a lesser extent, AMD (via its ATI-branded product line) have effectively monopolized the rapidly growing and hyperbole-generating market for GPGPUs, highly parallel application accelerators, Intel has teased the industry for several years, starting with its 80-core Polaris Research Processor demonstration in 2008. Intel’s strategy was pretty transparent – it had nothing in this space, and needed to serve notice that it was actively pursuing it without showing its hand prematurely. This situation of deliberate ambiguity came to an end last month when Intel finally disclosed more details on its line of Many Integrated Core (MIC) accelerators.
Intel’s approach to attached parallel processing is radically different from its competitors’ and appears to make excellent use of its core IP assets – fabrication expertise and the x86 instruction set. While competing products from NVIDIA and AMD are based on graphics processing architectures, employing hundreds of parallel non-x86 cores, Intel’s products will feature a smaller number (32 – 64 in the disclosed products) of simplified x86 cores, on the theory that developers will be able to harvest large portions of code that already runs on 4 – 10 core x86 CPUs and easily port it to these new parallel engines.
Back during the dot.com boom years, existing telcos and dozens of new network operators, especially in western Europe and North America, laid vast amounts of fiber optic networks in anticipation of rapidly rising Internet usage and traffic. When the expected volumes of Internet usage failed to materialize, they did not turn on or “light up” most (some estimate 80% and even 90% on many routes) of this fiber network capacity. This unused capacity was called “dark fiber,” and it has only been in recent years that this dark fiber has been put to use.
I am seeing early signs of something similar in the build-out of infrastructure-as-a-service (IaaS) cloud offerings. Of course, the data centers of servers, storage devices, and networks that IaaS vendors need can scale up in a more linear fashion (add another rack of blade servers as needed to support a new client) than the all-or-nothing build-out of fiber optic networks, so the magnitude of “dark cloud” will never reach the magnitude of “dark fiber.” Nonetheless, if current trends continue and accelerate, there is a real potential for IaaS wannabes to create a glut of “dark cloud” capacity that exceeds actual demand, with resulting downward pressure on prices and shakeouts of unsuccessful IaaS providers.
Cloud computing continues to be hyped. By now, almost every ICT hardware, software, and services company has some form of cloud strategy — even if it’s just a cloud label on a traditional hosting offering — to ride this wave. This misleading vendor “cloud washing” and the complex diversity of the cloud market in general make cloud one of the most popular and yet most misunderstood topics today (for a comprehensive taxonomy of the cloud computing market, see this Forrester blog post).
Software-as-a-service (SaaS) is the largest and fastest-growing cloud computing market; its total market size in 2011 is $21.2 billion, and this will explode to $78.4 billion by the end of 2015, according to our recently published sizing of the cloud market. But SaaS consists of many different submarkets: Historically, customer relationship management (CRM), human capital management (HCM, in the form of “lightweight” modules like talent management rather than payroll), eProcurement, and collaboration software have had the highest SaaS adoption rates, while highly integrated software applications that process the most sensitive business data, such as enterprise resource planning (ERP), bring up the rear of SaaS adoption today.
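For a sense of what that forecast implies per year, the compound annual growth rate can be worked out in a few lines of Python. This is a rough sketch that assumes four annual compounding periods between the 2011 and 2015 figures; the forecast itself does not state a growth rate.

```python
def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate between two values over a number of years."""
    return (end / start) ** (1.0 / years) - 1.0


# Market sizes in billions of dollars, as quoted in the text.
# Assumption: 2011 -> 2015 is treated as 4 compounding periods.
print(f"{cagr(21.2, 78.4, 4):.1%}")  # 38.7%
```

In other words, the forecast amounts to the market growing by well over a third every year for four years running.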
Do you keep every single light on in your house even though you are fast asleep in your bedroom?
Of course you don't. That would be an abject waste. Then why do most firms deploy peak-capacity infrastructure resources that run around the clock even though their applications have distinct usage patterns? Sometimes the applications are sleeping (low usage). At other times, they are huffing and puffing under the stampede of glorious customers. The answer is that they have no choice. Application developers and infrastructure operations pros collaborate (call it DevOps if you want) to determine the infrastructure that will be needed to meet peak demand.
One server, two server, three server, four.
The business is happy when the web traffic pedal is to the floor.
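The cost of leaving every light on can be sketched in Python by comparing server-hours under fixed peak provisioning with server-hours when capacity follows demand. The hourly demand figures and the per-server capacity below are hypothetical values made up for illustration, not measurements from any real application.

```python
import math


def server_hours(demand, capacity_per_server):
    """Return (fixed, elastic) server-hours for a list of hourly demand values.

    fixed:   enough servers for the peak hour, running every hour.
    elastic: just enough servers for each hour, with a minimum of one.
    """
    peak_servers = math.ceil(max(demand) / capacity_per_server)
    fixed = peak_servers * len(demand)
    elastic = sum(max(1, math.ceil(d / capacity_per_server)) for d in demand)
    return fixed, elastic


# Hypothetical requests per second for six consecutive hours.
demand = [100, 100, 400, 1000, 400, 100]
print(server_hours(demand, 250))  # (24, 11)
```

With this made-up load shape, peak provisioning burns more than twice the server-hours of a setup that scales with usage, and the gap widens as the peak gets sharper.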
All of us in the technology industry get caught up in the near-term fluctuations and pressures of our business. This quarter’s earnings, next quarter’s shipments, this year’s hiring plan . . . it’s easy to get swallowed up by the flood of immediate concerns. So one of the things that we work hard on at Forrester, and that our clients value in their relationships with us, is taking a few steps back and looking at the longer-term, bigger picture of the size and shape of the industry’s trajectory. It provides strategic and financial context for the short-term fluctuations and trends that buffet all of us.
I am lucky to co-lead research in Forrester's Vendor Strategy team, which is explicitly chartered to predict and quantify the new growth opportunities and disruptions facing strategists at some of our leading clients. We will put those predictions on display later this month at Forrester's IT Forum, our flagship client event. Among the sessions that Vendor Strategy analysts will be leading:
"The Software Industry in Transition": Holger Kisker will preview his latest research detailing best practices for software vendors navigating the tricky transition from traditional license to as-a-service pricing and engagement models.
"The Computing Technologies of 2016": Frank Gillett will put us in a time machine for a trip five years into the future of computing, storage, network, and component technologies that will underpin new applications, new experiences, and new computing capabilities.
Intel has been publishing research for about a decade on what they call “3D Trigate” transistors, which held out the hope of both improved performance and better power efficiency. Today Intel revealed details of its commercialization of this research in its upcoming 22 nm process and demonstrated actual systems based on 22 nm CPU parts.
The new products, under the internal name of “Ivy Bridge”, are the process shrink of the recently announced Sandy Bridge architecture in the next “Tick” cycle of the famous Intel “Tick-Tock” design methodology, where the “Tock” is a new optimized architecture and the “Tick” is the shrinking of this architecture onto the next-generation semiconductor process.
What makes these Trigate transistors so innovative is the fact that they change the fundamental geometry of the semiconductors from a basically flat “planar” design to one with more vertical structure, earning them the description of “3D”. For users the concepts are simpler to understand – this new transistor design, which will become the standard across all of Intel’s products moving forward, delivers some fundamental benefits to CPUs implemented with them:
Leakage current is reduced to near zero, resulting in very efficient operation for systems in an idle state.
Power consumption at equivalent performance is reduced by approximately 50% from Sandy Bridge’s already improved results with its 32 nm process.
. . . but bad reactive marketing can make the problem much worse.
[co-authored by Zachary Reiss-Davis]
As has been widely reported, in sources broad and narrow, Amazon.com’s cloud service EC2 went down for an extended period of time yesterday, bringing many of the hottest high-tech startups with it, ranging from the well known (Foursquare, Quora) to the esoteric (About.me, EveryTrail). For a partial list of smaller startups affected, see http://ec2disabled.com/.
While this is clearly a blow to both Amazon.com and to the cloud hosting market in general, it also serves as an example of how technology companies must quickly respond publicly and engage with their customers when problems arise. Amazon.com let their customers control the narrative by not participating in any social media response to the problem; their only communication was through their online dashboard with vague platitudes. Instead, they allowed angry heads of product management and CEOs who are used to communicating with their customers on blogs and Twitter to unequivocally blame Amazon.com for the problem.