ScaleMP – Interesting Twist On Systems Scalability And Virtualization

I just spent some time talking to ScaleMP, an interesting niche player that provides a server virtualization solution. What sets ScaleMP apart is that, rather than splitting a single physical server into multiple VMs, it is the only successful offering (to the best of my knowledge) that allows I&O groups to combine a collection of smaller servers so that they work as a larger SMP.

Others have tried and failed to deliver this kind of solution, but ScaleMP seems to have actually succeeded, with a claimed 200 customers and expectations of somewhere between 250 and 300 next year.

Their vSMP product comes in two flavors: one that allows a cluster of machines to look like a single system for purposes of management and maintenance while still running as independent cluster nodes, and one that glues the member systems together so that they appear as a single monolithic SMP.

Does it work? I haven’t been able to verify their claims with actual customers, but they have been selling for about five years and claim over 200 accounts, with a couple of dozen publicly referenced. All in all, probably too elaborate a front to maintain if there were really nothing there. The background of the principals and the technical details they were willing to share convinced me that they have a deep understanding of the low-level memory management, prefetching, and caching that would be needed to make a collection of systems function effectively as a single system image. Their smaller-scale benchmarks displayed good scalability in the range of 4 – 8 systems, well short of their theoretical limits.

My quick take is that the software works, and bears investigation if:

  1. Your application either is certified to run with ScaleMP (not many are), or is one where you control the code,
  2. You understand the memory reference patterns of the application, and
  3. The application can be tuned to keep memory references cache-local and/or resident in system memory local to the physical node. Remote memory references that require coherence are probably what will bring this solution to its knees.
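
To make the locality point concrete, here is a rough back-of-the-envelope sketch in Python. It is not based on any ScaleMP measurements: the function names and both latency figures (100 ns for a local reference, 2000 ns for a remote, coherent one) are assumptions I picked purely for illustration. It simply shows how quickly the average cost per memory reference grows as the share of remote references rises.

```python
# Illustrative cost model only. The default latencies are assumed values,
# not figures published by ScaleMP or measured on any real system.

def average_access_ns(remote_fraction, local_ns=100.0, remote_ns=2000.0):
    """Expected latency per memory reference for a given share of remote references."""
    if not 0.0 <= remote_fraction <= 1.0:
        raise ValueError("remote_fraction must be between 0 and 1")
    # Weighted average of local and remote latency.
    return (1.0 - remote_fraction) * local_ns + remote_fraction * remote_ns


def slowdown(remote_fraction, local_ns=100.0, remote_ns=2000.0):
    """Per-reference slowdown relative to a run where every reference is local."""
    return average_access_ns(remote_fraction, local_ns, remote_ns) / local_ns


if __name__ == "__main__":
    for fraction in (0.0, 0.01, 0.05, 0.20):
        print(f"{fraction:.0%} remote: {slowdown(fraction):.2f}x per-reference latency")
```

With these assumed numbers, even a small share of remote references noticeably raises the average cost per reference, which is why understanding and tuning the application's memory reference pattern matters so much here. Plug in latencies measured on your own hardware before drawing any conclusions.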

All in all, an interesting niche solution that may be worth looking at if you have an application that is scaling past the bounds of your largest single server and you don’t want to invest in a big SMP.

Would you consider such a solution? You can check them out at