I’ve been talking to a number of users and providers of bare-metal cloud services, and am finding the common threads among the high-profile use cases both interesting individually and starting to connect some dots in terms of common use cases for these service providers who provide the ability to provision and use dedicated physical servers with very similar semantics to the common VM IaaS cloud – servers that can be instantiated at will in the cloud, provisioned with a variety of OS images, be connected to storage and run applications. The differentiation for the customers is in behavior of the resulting images:
Deterministic performance – Your workload is running on a dedicated resource, so there is no question of any “noisy neighbor” problem, or even of sharing resources with otherwise well-behaved neighbors.
Extreme low latency – Like it or not, VMs, even lightweight ones, impose some level of additional latency compared to bare-metal OS images. Where this latency is a factor, bare-metal clouds offer a differentiated alternative.
Raw performance – Under the right conditions, a single bare-metal server can process more work than a collection of VMs, even when their nominal aggregate performance is similar. Benchmarking is always tricky, but several of the bare metal cloud vendors can show some impressive comparative benchmarks to prospective customers.
Last year at VMworld I noted Xsigo Systems, a small privately held company at VMworld showing their I/O Director technology, which delivereda subset of HP Virtual Connect or Cisco UCS I/O virtualization capability in a fashion that could be consumed by legacy rack-mount servers from any vendor. I/O Director connects to the server with one or more 10 G Ethernet links, and then splits traffic out into enterprise Ethernet and FC networks. On the server side, the applications, including VMware, see multiple virtual NICs and HBAs courtesy of Xsigo’s proprietary virtual NIC driver.
Controlled via Xsigo’s management console, the server MAC and WWNs can be programmed, and the servers can now connect to multiple external networks with fewer cables and substantially lower costs for NIC and HBA hardware. Virtualized I/O is one of the major transformative developments in emerging data center architecture, and will remain a theme in Forrester’s data center research coverage.
This year at VMworld, Xsigo announced a major expansion of their capabilities – Xsigo Server Fabric, which takes the previous rack-scale single-Xsigo switch domains and links them into a data-center-scale fabric. Combined with improvements in the software and UI, Xsigo now claims to offer one-click connection of any server resource to any network or storage resource within the domain of Xsigo’s fabric. Most significantly, Xsigo’s interface is optimized to allow connection of VMs to storage and network resources, and to allow the creation of private VM-VM links.
After considerable speculation and anticipation, VMware has finally announced vSphere 5 as part of a major cloud infrastructure launch, including vCloud Director 1.5, SRM 5 and vShield 5. From our first impressions, it is both well worth the wait and merits immediate serious consideration as an enterprise virtualization platform, particularly for existing VMware customers.
The list of features is voluminous, with at least 100 improvements, large and small, but among the features, several stand out as particularly significant as I&O professionals continue their efforts to virtualize the data center, primarily dealing with and support for both larger VMs and physical host systems, and also with the improved manageability of storage and improvements Site Recovery Manager (SRM), the remote-site HA components:
Replication improvements for Site Recovery Manager, allowing replication without SANs
Distributed Resource Scheduling (DRS) for Storage
Support for up to 1 TB of memory per VM
Support for 32 vCPUs per VM
Support for up to 160 Logical CPUs and 2 TB or RAM
New GUI to configure multicore vCPUs
Storage driven storage delivery based on the VMware-Aware Storage APIs
Improved version of the Cluster File System, VMFS5
Storage APIs – Array Integration: Thin Provisioning enabling reclaiming blocks of a thin provisioned LUN on the array when a virtual disk is deleted
Swap to SSD
2TB+ LUN support
Storage vMotion snapshot support
vNetwork Distributed Switch improvements providing improved visibility in VM traffic
vCenter Server Appliance
vCenter Solutions Manager, providing a consistent interface to configure and monitor vCenter-integrated solutions developed by VMware and third parties
Revamped VMware High Availability (HA) with Fault Domain Manager
I just spent some time talking to ScaleMP, an interesting niche player that provides a server virtualization solution. What is interesting about ScaleMP is that rather than splitting a single physical server into multiple VMs, they are the only successful offering (to the best of my knowledge) that allows I&O groups to scale up a collection of smaller servers to work as a larger SMP.
Others have tried and failed to deliver this kind of solution, but ScaleMP seems to have actually succeeded, with a claimed 200 customers and expectations of somewhere between 250 and 300 next year.
Their vSMP product comes in two flavors, one that allows a cluster of machines to look like a single system for purposes of management and maintenance while still running as independent cluster nodes, and one that glues the member systems together to appear as a single monolithic SMP.
Does it work? I haven’t been able to verify their claims with actual customers, but they have been selling for about five years, claim over 200 accounts, with a couple of dozen publicly referenced. All in all, probably too elaborate a front to maintain if there was really nothing there. The background of the principals and the technical details they were willing to share convinced me that they have a deep understanding of the low-level memory management, prefectching, and caching that would be needed to make a collection of systems function effectively as a single system image. Their smaller scale benchmarks displayed good scalability in the range of 4 – 8 systems, well short of their theoretical limits.
My quick take is that the software works, and bears investigation if you have an application that:
Either is certified to run with ScaleMP (not many), or one where that you control the code.
You understand the memory reference patterns of the application, and