Intel Steps On The Accelerator, Reveals Many Integrated Core Road Map

While NVIDIA, and to a lesser extent AMD (via its ATI-branded product line), have effectively monopolized the rapidly growing and hyperbole-generating market for GPGPUs used as highly parallel application accelerators, Intel has teased the industry for several years, starting with its 80-core Polaris research processor demonstration in 2007. Intel's strategy was fairly transparent: it had nothing in this space and needed to serve notice that it was actively pursuing it, without showing its hand prematurely. This period of deliberate ambiguity ended last month, when Intel finally disclosed more details on its line of Many Integrated Core (MIC) accelerators.

Intel's approach to attached parallel processing is radically different from its competitors' and appears to make excellent use of its core IP assets: its fabrication expertise and the x86 instruction set. While competing products from NVIDIA and AMD are based on graphics processing architectures employing hundreds of parallel non-x86 cores, Intel's products will feature a smaller number (32 to 64 in the disclosed products) of simplified x86 cores, on the theory that developers will be able to harvest large portions of code that already runs on 4- to 10-core x86 CPUs and easily port it to these new parallel engines.

Obviously, the devil is in the details. Intel is also refining an array of parallel programming tools, including compilers, simulators, debugging tools and accelerated parallel libraries, but several early beta partners for the Knights Ferry development system, a PCIe card with 32 cores, have publicly reported excellent results, with speedups of 10x or more on their sample applications. The code samples they have released give us a strong impression that if your code is already optimized for Intel's current multi-core products, the additional compiler directives needed to extend it to a larger number of cores are easily inserted. Of course, the initial redesign of the algorithm to exploit 8 or 10 parallel cores may not have been trivial, and extending it to an attached processor with up to 8x as many cores and a very different, long-latency memory architecture may require significant further alterations to both the algorithm and the code that expresses it. Nonetheless, the ability to start from existing x86 code can only be a strong positive for the size of the potential developer community.
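As an illustration of the kind of change involved, here is a minimal sketch in C using generic OpenMP. This is an assumption on our part: it is not Intel's MIC-specific offload syntax, and we have not seen the partners' actual code. The function names sum_of_squares and available_threads are hypothetical. The point is that the loop body is ordinary serial code, and a single directive distributes its iterations across however many threads the runtime provides, so the same source can run on an 8-core host or a 32-core card without edits to the loop itself. What the sketch does not show is moving data to and from an attached card, which is where the long-latency memory architecture comes into play.

```c
#include <stddef.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical example: sum of squares over an array.
   The loop body is plain serial code; the OpenMP directive splits the
   iterations across however many threads the runtime provides, so the
   same source scales from a few cores to many without modification. */
double sum_of_squares(const double *x, size_t n)
{
    double total = 0.0;
    /* Each thread accumulates a private partial sum; reduction(+:total)
       combines the partial sums when the loop ends. Compiled without
       OpenMP support, the pragma is ignored and the loop runs serially,
       giving the same result up to floating-point rounding order. */
    #pragma omp parallel for reduction(+:total)
    for (long i = 0; i < (long)n; i++) {
        total += x[i] * x[i];
    }
    return total;
}

/* Hypothetical helper: number of threads the runtime will use,
   or 1 when the code is compiled without OpenMP. */
int available_threads(void)
{
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;
#endif
}
```

Compiled with GCC or Clang using -fopenmp, the thread count can be chosen at run time through the OMP_NUM_THREADS environment variable; compiled without that flag, the directive is ignored and the loop simply runs on one core.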

The product road map to date consists of “Knights Ferry,” a 32-core development system available in limited quantities to select partners, and “Knights Corner,” a product we expect to see in 2012 with an undisclosed number of cores, built on Intel's 22 nm process. Intel has stated only that it will have at least 50 cores, but given the process jump from 45 nm (Knights Ferry) to 22 nm, we expect at least 64 cores in Knights Corner, and possibly as many as 128, depending on yields, anticipated pricing and the maturity of the supporting software tools.
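The arithmetic behind that estimate can be sketched as follows, assuming ideal area scaling, in which transistor density grows with the square of the ratio of feature sizes, and assuming the die area and per-core design stay the same as in Knights Ferry. Real process shrinks fall short of this ideal, and changes to cores, caches or interconnect would alter the result, so the figure is best read as a rough upper bound rather than a prediction. Here N denotes the number of cores that fit in a fixed die area.

```latex
\frac{N_{22\,\mathrm{nm}}}{N_{45\,\mathrm{nm}}} \approx \left(\frac{45}{22}\right)^{2} \approx 4.18,
\qquad
N_{22\,\mathrm{nm}} \approx 32 \times 4.18 \approx 134
```

Under these assumptions roughly 134 cores would fit in the same area, consistent with an upper end of 128; realizing only half of the ideal density gain would give about 67, in line with the 64-core lower estimate.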

The big question is how this will play in a technology segment already dominated by two competitors, where users have already demonstrated a willingness to develop in alternative languages such as CUDA and OpenCL. Our prognostication is that Intel will probably not cause an immediate collapse in its competitors' revenues, but it will certainly put increasing pressure on them while opening up the market for less aggressively scaled parallel acceleration of a much wider range of applications. Even if some code changes are needed, the ability to start from the source code of an existing x86 application will remain a strategic advantage for Intel, and it will probably sell a lot of Knights Corner units to people who just want their applications to go faster with minimal development effort, and for whom the performance gain from an additional 64 cores will be significant, assuming that Intel prices these units competitively. Interestingly, the same arguments about the value of starting from existing x86 source code also apply to AMD's APU technology, which has the further benefit of a tighter, lower-latency memory interconnect than the Knights/Xeon pairing.

So what lies beyond Knights Corner? Intel has not disclosed any further details of its product road map, and beyond the high likelihood of a denser product on the next major process node after 22 nm, we have little informed speculation to offer. I would, however, hazard the additional guess that Intel's elves are at work on a QuickPath Interconnect version of the MIC technology, since the major drawback of the Knights family, shared with most of its competitors, is the long latency of loading and unloading blocks of memory over PCIe. (“Drawback” is a relative word here, considering the impressive performance gains these products have demonstrated even in their first release.)

All in all, this is a major step forward in performance potential for the x86 architecture and a source of increased pressure on NVIDIA and AMD. We look forward to continued activity in the attached accelerator segment as these products get closer to market.