Last year I wrote about Oracle’s new plans for SPARC, anchored by a new line of SPARC CPUs engineered in conjunction with Fujitsu (Does SPARC have a Future?), and commented that the first deliveries of this new technology would probably be in early 2012, and until we saw this tangible evidence of Oracle’s actual execution of this road map we could not predict with any confidence the future viability of SPARC.
The T4 CPU
Fast forward a year and Oracle has delivered the first of the new CPUs, ahead of schedule and with impressive gains in performance that make it look like SPARC will remain a viable platform for years. Specifically, Oracle has introduced the T4 CPU and systems based on them. The T4, an evolution of Oracle’s highly threaded T-Series architecture, is implemented with an entirely new core that will form the basis, with variations in number of threads versus cores and cache designs, of the future M and T series systems. The M series will have fewer threads and more performance per thread, while the T CPUs will, like their predecessors, emphasize throughput for highly threaded workloads. The new T4 will have 8 cores, and each core will have 8 threads. While the T4 emphasizes highly threaded workload performance, it is important to note that Oracles has radically improved single-thread performance over its predecessors, with Oracle claiming performance per thread improvements of 5X over its predecessors, greatly improving its utility as a CPU to power less thread-intensive workloads as well.
Well, maybe everybody is saying “cloud” these days, but my first impression of Microsoft Windows Server 8 (not the final name) is that Microsoft has been listening very closely to what customers want from an OS that can support both public and private enterprise cloud implementations. And most importantly, the things that they have built into WS8 for “clouds” also look like they make life easier for plain old enterprise IT.
Microsoft appears to have focused its efforts on several key themes, all of which benefit legacy IT architectures as well as emerging clouds:
Management, migration and recovery of VMs in a multi-system domain – Major improvements in Hyper-V and management capabilities mean that I&O groups can easily build multi-system clusters of WS8 servers, and easily migrate VMs across system boundaries. Muplitle systems can be clustered with Fibre Channel, making it easier to implement high-performance clusters.
Multi-tenancy – A host of features, primarily around management and role-based delegation that make it easier and more secure to implement multi-tenant VM clouds.
Recovery and resiliency – Microsoft claims that they can failover VMs from one machine to another in 25 seconds, a very impressive number indeed. While vendor performance claims are always like EPA mileage – you are guaranteed never to exceed this number – this is an impressive claim and a major capability, with major implications for HA architecture in any data center.
At the Hot Chips conference last week, Intel disclosed additional details about the upcoming Poulson Itanium CPU due for shipment early next year. For Itanium loyalists (essentially committed HP-UX customers) the disclosures are a ray of sunshine among the gloomy news that has been the lot of Itanium devotees recently.
Poulson will bring several significant improvements to Itanium in both performance and reliability. On the performance side, we have significant improvements on several fronts:
Process – Poulson will be manufactured with the same 32 nm semiconductor process that will (at least for a while) be driving the high-end Xeon processors. This is goodness all around – performance will improve and Intel now can load its latest production lines more efficiently.
More cores and parallelism – Poulson will be an 8-core processor with a whopping 54 MB of on-chip cache, and Intel has doubled the width of the multi-issue instruction pipeline, from 6 to 12 instructions. Combined with improved hyperthreading, the combination of 2X cores and 2X the total number of potential instructions executed per clock cycle by each core hints at impressive performance gains.
Architecture and instruction tweaks – Intel has added additional instructions based on analysis of workloads. This kind of tuning of processor architectures seldom results in major gains in performance, but every small increment helps.
We have been repeatedly reminded that the requirements of hyper-scale cloud properties are different from those of the mainstream enterprise, but I am now beginning to suspect that the top strata of the traditional enterprise may be leaning in the same direction. This suspicion has been triggered by the combination of a recent day in NY visiting I&O groups in a handful of very large companies and a number of unrelated client interactions.
The pattern that I see developing is one of “haves” versus “have nots” in terms of their ability to execute on their technology vision with internal resources. The “haves” are the traditional large sophisticated corporations, with a high concentration in financial services. They have sophisticated IT groups, are capable fo writing extremely complex systems management and operations software, and typically own and manage 10,000 servers or more. The have nots are the ones with more modest skills and abilities, who may own 1000s of servers, but tend to be less advanced than the core FSI companies in terms of their ability to integrate and optimize their infrastructure.
The divergence in requirements comes from what they expect and want from their primary system vendors. The have nots are companies who understand their limitations and are looking for help form their vendors in the form of converged infrastructures, new virtualization management tools, and deeper integration of management software to automate operational tasks, These are people who buy HP c-Class, Cisco UCS, for example, and then add vendor-supplied and ISV management and automation tools on top of them in an attempt to control complexity and costs. They are willing to accept deeper vendor lock-in in exchange for the benefits of the advanced capabilities.
I recently had an opportunity to spend some time with SUSE management, including President and General Manager Nils Brauckmann, and came away with what I think is a reasonably clear picture of The Attachmate Group’s (TAG) intentions and of SUSE’s overall condition these days. Overall, impressions were positive, with some key takeaways:
TAG has clarified its intentions regarding SUSE. TAG has organized its computer holdings as four independent business units, Novell, NetIQ, Attachmate and SUSE, each one with its own independent sales, development, marketing, etc. resources. The advantages and disadvantages of this approach are pretty straightforward, with the lack of opportunity to share resources aiming the business units for R&D and marketing/sales being balanced off by crystal clear accountability and the attendant focus it brings. SUSE management agrees that it has undercommunicated in the past, and says that now that the corporate structure has been nailed down it will be very aggressive in communicating its new structure and goals.
SUSE’s market presence has shifted to a more balanced posture. Over the last several years SUSE has shifted to a somewhat less European-centric focus, with 50% of revenues coming from North America, less than 50% from EMEA, and claims to be the No. 1 Linux vendor in China, where it has expanded its development staffing. SUSE claims to have gained market share overall, laying claim to approximately 30% of WW Linux market share by revenue.
Focus on enterprise and cloud. Given its modest revenues of under $200 million, SUSE realizes that it cannot be all things to all people, and states that it will be focusing heavily on enterprise business servers and cloud technology, with less emphasis on desktops and projects that do not have strong financial returns, such as its investment in Mono, which it has partnered with Xamarin to continue development,.
Over the past months server vendors have been announcing benchmark results for systems incorporating Intel’s high-end x86 CPU, the E7, with HP trumping all existing benchmarks with their recently announced numbers (although, as noted in x86 Servers Hit The High Notes, the results are clustered within a few percent each other). HP recently announced new performance numbers for their ProLiant DL980, their high-end 8-socket x86 server using the newest Intel E7 processors. With up to 10 cores, these new processors can bring up to 80 cores to bear on large problems such as database, ERP and other enterprise applications.
The performance results on the SAP SD 2-Tier benchmark, for example, at 25160 SD users, show a performance improvement of 35% over the previous high-water mark of 18635. The results seem to scale almost exactly with the product of core count x clock speed, indicating that both the system hardware and the supporting OS, in this case Windows Server 2008, are not at their scalability limits. This gives us confidence that subsequent spins of the CPU will in turn yield further performance increases before hitting system of OS limitations. Results from other benchmarks show similar patterns as well.
Key takeaways for I&O professionals include:
Expect to see at least 25% to 35% throughput improvements in many workloads with systems based on the latest the high-performance PCUs from Intel. In situations where data center space and cooling resources are constrained this can be a significant boost for a same-footprint upgrade of a high-end system.
For Unix to Linux migrations, target platform scalability continues become less of an issue.
Since Oracle dropped their bombshell on HP and Itanium, I have fielded multiple emails and about a dozen inquiries from HP and Oracle customers wanting to discuss their options and plans. So far, there has been no general sense of panic, and the scenarios seem to be falling into several buckets:
The majority of Oracle DB/HP customers are not at the latest revision of Oracle, so they have a window within which to make any decisions, bounded on the high end by the time it will take them to make a required upgrade of their application plus DB stack past the current 11.2 supported Itanium release. For those customers still on Oracle release 9, this can be many years, while for those currently on 11.2, the next upgrade cycle will cause a dislocation. The most common application that has come up in inquiries is SAP, with Oracle’s own apps second.
Customers with other Oracle software, such as Hyperion, Peoplesoft, Oracle’s eBusiness Suite, etc., and other ISV software are often facing complicated constraints on their upgrades. In some cases decisions by the ISVs will drive the users toward upgrades they do not want to make. Several clients told me they will defer ISV upgrades to avoid being pushed into an unsupported version of the DB.
Intel today publicly announced its anticipated “Westmere EX” high end Westmere architecture server CPU as the E7, now part of a new family nomenclature encompassing entry (E3), midrange (E5), and high-end server CPUs (E7), and at first glance it certainly looks like it delivers on the promise of the Westmere architecture with enhancements that will appeal to buyers of high-end x86 systems.
The E7 in a nutshell:
32 nm CPU with up to 10 cores, each with hyper threading, for up to 20 threads per socket.
Intel claims that the system-level performance will be up to 40% higher than the prior generation 8-core Nehalem EX. Notice that the per-core performance improvement is modest (although Intel does offer a SKU with 8 cores and a slightly higher clock rate for those desiring ultimate performance per thread).
Improvements in security with Intel Advanced Encryption Standard New Instruction (AES-NI) and Intel Trusted Execution Technology (Intel TXT).
Major improvements in power management by incorporating the power management capabilities from the Xeon 5600 CPUs, which include more aggressive P states, improved idle power operation, and the ability to separately reduce individual core power setting depending on workload, although to what extent this is supported on systems that do not incorporate Intel’s Node Manager software is not clear.
This week at ISSCC, Intel made its first detailed public disclosures about its upcoming “Poulson” next-generation Itanium CPU. While not in any sense complete, the details they did disclose paint a picture of a competent product that will continue to keep the heat on in the high-end UNIX systems market. Highlights include:
Process — Poulson will be produced in a 32 nm process, skipping the intermediate 45 nm step that many observers expected to see as a step down from the current 65 nm Itanium process. This is a plus for Itanium consumers, since it allows for denser circuits and cheaper chips. With an industry record 3.1 billion transistors, Poulson needs all the help it can get keeping size and power down. The new process also promises major improvements in power efficiency.
Cores and cache — Poulson will have 8 cores and 54 MB of on-chip cache, a huge amount, even for a cache-sensitive architecture like Itanium. Poulson will have a 12-issue pipeline instead of the current 6-issue pipeline, promising to extract more performance from existing code without any recompilation.
Compatibility — Poulson is socket- and pin-compatible with the current Itanium 9300 CPU, which will mean that HP can move more quickly into production shipments when it's available.
I’ve recently had the opportunity to talk with a small sample of SLES 11 and RH 6 Linux users, all developing their own applications. All were long-time Linux users, and two of them, one in travel services and one in financial services, had applications that can be described as both large and mission-critical.
The overall message is encouraging for Linux advocates, both the calm rational type as well as those who approach it with near-religious fervor. The latest releases from SUSE and Red Hat, both based on the 2.6.32 Linux kernel, show significant improvements in scalability and modest improvements in iso-configuration performance. One user reported that an application that previously had maxed out at 24 cores with SLES 10 was now nearing production certification with 48 cores under SLES 11. Performance scalability was reported as “not linear, but worth doing the upgrade.”
Overall memory scalability under Linux is still a question mark, since the widely available x86 platforms do not exceed 3 TB of memory, but initial reports from a user familiar with HP’s DL 980 verify that the new Linux Kernel can reliably manage at least 2TB of RAM under heavy load.
File system options continue to expand as well. The older Linux FS standard, ETX4, which can scale to “only” 16 TB, has been joined by additional options such as XFS (contributed by SGI), which has been implemented in several installations with file systems in excess of 100 TB, relieving a limitation that may have been more psychological than practical for most users.