Over the past months, server vendors have been announcing benchmark results for systems built around Intel’s high-end x86 CPU, the E7, and HP’s recently announced numbers top all existing benchmarks (although, as noted in x86 Servers Hit The High Notes, the results are clustered within a few percent of each other). The new results are for the ProLiant DL980, HP’s high-end 8-socket x86 server, using the newest Intel E7 processors. With up to 10 cores per processor, these systems can bring up to 80 cores to bear on large problems such as database, ERP and other enterprise applications.
On the SAP SD 2-Tier benchmark, for example, the result of 25,160 SD users is a 35% improvement over the previous high-water mark of 18,635. The results seem to scale almost exactly with the product of core count and clock speed, indicating that neither the system hardware nor the supporting OS, in this case Windows Server 2008, is at its scalability limit. This gives us confidence that subsequent spins of the CPU will in turn yield further performance increases before hitting system or OS limits. Results from other benchmarks show similar patterns.
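The arithmetic behind these claims is easy to check. The short Python sketch below (hypothetical helpers, not anything SAP or HP publish) computes the SD-user uplift from the two quoted figures and shows one way to test the "scales with core count and clock speed" observation. The example core counts and clock speeds are invented for illustration; substitute the real configurations of the systems being compared.

```python
# Sketch of the scaling arithmetic discussed above. The "cores x clock"
# model and the example configurations are illustrative assumptions,
# not the method SAP or HP use and not the systems behind the results.

def relative_gain(new: float, old: float) -> float:
    """Fractional improvement of new over old (0.35 means +35%)."""
    return new / old - 1.0

def core_clock_ratio(new_cores: int, new_ghz: float,
                     old_cores: int, old_ghz: float) -> float:
    """Ratio of (core count x clock speed) between two systems."""
    return (new_cores * new_ghz) / (old_cores * old_ghz)

def scaling_efficiency(throughput_ratio: float, cc_ratio: float) -> float:
    """Close to 1.0 when throughput tracks cores x clock almost exactly."""
    return throughput_ratio / cc_ratio

# SAP SD 2-Tier user counts quoted in the text.
NEW_SD_USERS = 25160
OLD_SD_USERS = 18635
gain = relative_gain(NEW_SD_USERS, OLD_SD_USERS)
print(f"SD users improvement: {gain:.1%}")  # SD users improvement: 35.0%

# Hypothetical configurations, chosen only to show how the check works.
cc = core_clock_ratio(new_cores=80, new_ghz=2.2, old_cores=64, old_ghz=2.0)
print(f"cores x clock ratio: {cc:.3f}")
print(f"scaling efficiency: {scaling_efficiency(1 + gain, cc):.2f}")
```

An efficiency near 1.0 is what "scales almost exactly" would look like; a value well below 1.0 would hint at a hardware or OS bottleneck.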
Key takeaways for I&O professionals include:
Expect to see throughput improvements of at least 25% to 35% in many workloads with systems based on the latest high-performance CPUs from Intel. Where data center space and cooling resources are constrained, this can be a significant boost for a same-footprint upgrade of a high-end system.
For Unix to Linux migrations, target platform scalability continues to become less of an issue.
Not to be left out of the announcement fever that has gripped vendors recently, Cisco today announced several updates to its UCS product line aimed at easing potential system bottlenecks by improving the whole I/O chain between the network and the servers, and at improving management. The updates include:
Improved Fabric Interconnect (FI) – The FI is the top of the UCS hardware hierarchy, a thinly disguised Nexus 5xxx series switch that connects the UCS hierarchy to the enterprise network and runs the UCS Manager (UCSM) software. Previously the highest-end FI had 40 ports, each of which had to be specifically configured as Ethernet, FCoE, or FC. The new FI, the model 6248UP, has 48 ports, each of which can be flexibly assigned as a port of up to 10G for any of the supported protocols. In addition to modestly raising the bandwidth, the 6248UP brings increased flexibility and a claimed 40% reduction in latency.
New Fabric Extender (FEX) – The FEX connects the individual UCS chassis with the FI. With the new 2208 FEX, Cisco doubles the bandwidth between the chassis and the FI.
VIC1280 Virtual Interface Card (VIC) – At the bottom of the hierarchy, the new VIC1280 quadruples the bandwidth to each individual server, to a total of 80 Gb. The 80 Gb can be presented as up to eight 10 Gb physical NICs or teamed into a pair of 40 Gb NICs, with up to 256 virtual devices (vNICs, vHBAs, etc.) presented to the software running on the servers.
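The VIC1280 figures hang together arithmetically. As a rough sketch (the helper name is hypothetical, and all figures are in gigabits per second), the equal-speed NIC layouts that exactly fill 80 Gb, and the per-server bandwidth implied by "quadruples," can be computed as follows:

```python
# Sketch of the VIC1280 bandwidth arithmetic described above.
# All figures are in gigabits per second (Gb/s). The prior-generation
# figure is derived from "quadruples" in the text, not quoted from Cisco.

VIC1280_TOTAL_GBPS = 80

def nic_layouts(total_gbps: int, speeds=(10, 40)) -> dict:
    """Map each NIC speed to how many equal-speed NICs exactly fill total_gbps."""
    return {s: total_gbps // s for s in speeds if total_gbps % s == 0}

layouts = nic_layouts(VIC1280_TOTAL_GBPS)
print(layouts)  # {10: 8, 40: 2}

# "Quadruples the bandwidth" implies 80 / 4 = 20 Gb/s per server previously.
previous_gbps = VIC1280_TOTAL_GBPS // 4
print(previous_gbps)  # 20
```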
While NVIDIA and, to a lesser extent, AMD (via its ATI-branded product line) have effectively monopolized the rapidly growing and hyperbole-generating market for GPGPUs, highly parallel application accelerators, Intel has teased the industry for several years, starting with its 80-core Polaris Research Processor demonstration in 2008. Intel’s strategy was pretty transparent: it had nothing in this space, and needed to serve notice that it was actively pursuing it without showing its hand prematurely. This deliberate ambiguity came to an end last month when Intel finally disclosed more details on its line of Many Integrated Core (MIC) accelerators.
Intel’s approach to attached parallel processing is radically different from its competitors’ and appears to make excellent use of its core IP assets: its fabrication expertise and the x86 instruction set. While competing products from NVIDIA and AMD are based on graphics processing architectures, employing hundreds of parallel non-x86 cores, Intel’s products will feature a smaller number (32 – 64 in the disclosed products) of simplified x86 cores, on the theory that developers will be able to harvest large portions of code that already runs on 4 – 10 core x86 CPUs and easily port it to these new parallel engines.