Last year I published a reasonably well-received research document on Hadoop infrastructure, “Building the Foundations for Customer Insight: Hadoop Infrastructure Architecture”. Now, less than a year later it’s looking obsolete, not so much because it was wrong for traditional (and yes, it does seem funny to use a word like “traditional” to describe a technology that itself is still rapidly evolving and only in mainstream use for a handful of years) Hadoop, but because the universe of analytics technology and tools has been evolving at light-speed.
If your analytics are anchored by Hadoop and its underlying map reduce processing, then the mainstream architecture described in the document, that of clusters of servers each with their own compute and storage, may still be appropriate. On the other hand, if, like many enterprises, you are adding additional analysis tools such as NoSQL databases, SQL on Hadoop (Impala, Stinger, Vertica) and particularly Spark, an in-memory-based analytics technology that is well suited for real-time and streaming data, it may be necessary to begin reassessing the supporting infrastructure in order to build something that can continue to support Hadoop as well as cater to the differing access patterns of other tools sets. This need to rethink the underlying analytics plumbing was brought home by a recent demonstration by HP of a reference architecture for analytics, publicly referred to as the HP Big Data Reference Architecture.
There is always a tendency to regard the major players in large markets as being a static background against which the froth of smaller companies and the rapid dance of customer innovation plays out. But if we turn our lens toward the major server vendors (who are now also storage and networking as well as software vendors), we see that the relatively flat industry revenues hide almost continuous churn. Turn back the clock slightly more than five years ago, and the market was dominated by three vendors, HP, Dell and IBM. In slightly more than five years, IBM has divested itself of highest velocity portion of its server business, Dell is no longer a public company, Lenovo is now a major player in servers, Cisco has come out of nowhere to mount a serious challenge in the x86 server segment, and HP has announced that it intends to split itself into two companies.
And it hasn’t stopped. Two recent events, the fracturing of the VCE consortium and the formerly unthinkable hook-up of IBM and Cisco illustrate the urgency with which existing players are seeking differential advantage, and reinforce our contention that the whole segment of converged and integrated infrastructure remains one of the active and profitable segments of the industry.
EMC’s recent acquisition of Cisco’s interest in VCE effectively acknowledged what most customers have been telling us for a long time – that VCE had become essentially an EMC-driven sales vehicle to sell storage, supported by VMware (owned by EMC) and Cisco as a systems platform. EMC’s purchase of Cisco’s interest also tacitly acknowledges two underlying tensions in the converged infrastructure space:
At its Paris summit, the OpenStack community celebrated the 10th release of what has become the leading open source Infrastructure as a Service cloud platform software. What stood out about this latest iteration and the progress of its ever-growing ecosystem of vendors, users and service providers was the lack of excitement that comes with maturity. The Juno release addressed many challenges holding back enterprise adoption to this point and showed signs that 2015 may prove to be the year its use shifts over from mostly test & dev, to mostly production. Forrester clients will find a new Quick Take on OpenStack that analyzes the state of this platform and recommended actions here. In this blog post we look at looming questions facing the OpenStack community that could affect the pace and direction of its innovation.
Dell today announced its new FX system architecture, and I am decidedly impressed.
Dell FX is a 2U flexible infrastructure building block that allows infrastructure architects to compose an application-appropriate server and storage infrastructure out of the following set of resources:
Multiple choices of server nodes, ranging from multi-core Atom to new Xeon E5 V3 servers. With configurations ranging from 2 to 16 server nodes per enclosure, there is pretty much a configuration point for most mainstream applications.
A novel flexible method of mapping disks from up to three optional disk modules, each with 16 drives - the mapping, controlled by the onboard management, allows each server to appear as if the disk is locally attached DASD, so no changes are needed in any software that thinks it is accessing local storage. A very slick evolution in storage provisioning.
A set of I/O aggregators for consolidating Ethernet and FC I/O from the enclosure.
All in all, an attractive and flexible packaging scheme for infrastructure that needs to be tailored to specific combinations of server, storage and network configurations. Probably an ideal platform to support the Nutanix software suite that Dell is reselling as well. My guess is that other system design groups are thinking along these lines, but this is now a pretty unique package, and merits attention from infrastructure architects.
On October 20 at TechEd, Microsoft quietly slipped in what looks like a potential game-changing announcement in the private/hybrid cloud world when they rolled out Microsoft Cloud Platform System (CPS), an integrated hardware/software system that combines an Azure-consistent on premise cloud with an optimized hardware stack from Dell.
While the timing of the event comes as a surprise, the fact that IBM has decided to unload its technically excellent but unprofitable semiconductor manufacturing operation does not, nor does its choice of Globalfoundries, with whom it has had a longstanding relationship.
Very much in the shadows of all the press coverage and hysteria attendant on emerging cloud architectures and customer-facing systems of engagement are the nitty-gritty operational details that lurk like monsters in the swamp of legacy infrastructure, and some of them have teeth. And sometimes these teeth can really take a bite out of the posterior of an unprepared organization.
One of those toothy animals that I&O groups are increasingly encountering in their landscapes is the problem of what to do with Windows Server 2003 (WS2003). It turns out there are still approximately 11 million WS2003 systems running today, with another 10+ million instances running as VM guests. Overall, possibly more than 22 million OS images and a ton of hardware that will need replacing and upgrading. And increasing numbers of organizations have finally begun to take seriously the fact that Microsoft is really going to end support and updates as of July 2015.
Based on the conversations I have been having with our clients, the typical I&O group that is now scrambling to come up with a plan has not been willfully negligent, nor are they stupid. Usually WS2003 servers are legacy servers, quietly running some mature piece of code, often in satellite locations or in the shops of acquired companies. The workloads are a mix of ISV and bespoke code, but it is often a LOB-specific application, with the run-of-the-mill collaboration, infrastructure servers and, etc. having long since migrated to newer platforms. A surprising number of clients have told me that they have identified the servers, but not always the applications or the business owners – often a complex task for an old resource in a large company.
I'm at IDF, a major geekfest for the people interested in the guts of today’s computing infrastructure, and will be immersing myself in the flow for a couple of days. Before going completely off the deep end, I wanted to call out the announcement of the new Xeon E5. While I’ve discussed it in more depth in an accompanying Quick Take just published on our main website, I wanted to add some additional comments on its implications for data center operations, particularly in the areas of capacity planning and long-term capital budgeting.
For many years, each successive iteration of Intel’s and partners’ roadmaps has been quietly delivering a major benefit that seldom gets top billing – additional capacity within the same power and physical footprint, and the resulting ability for users from small enterprises to mega-scale service providers, to defer additional data spending capital expense.
A group of us just published an analysis of VMworld (Breaking Down VMworld), and I thought I’d take this opportunity to add some additional color to the analysis. The report is an excellent synthesis of our analysis, the work of a talented team of collaborators with my two cents thrown in as well, but I wanted to emphasize a few additional impressions, primarily around storage, converged infrastructure, and the overall tone of the show.
First, storage. If they ever need a new name for the show, they might consider “StorageWorld” – it seemed to me that just about every other booth on the show floor was about storage. Cloud storage, flash storage, hybrid storage, cheap storage, smart storage, object storage … you get the picture.[i] Reading about the hyper-growth of storage and the criticality of storage management to the overall operation of a virtualized environment does not drive the concept home in quite the same way as seeing 1000s of show attendees thronging the booths of the storage vendors, large and small, for days on end. Another leading indicator, IMHO, was the “edge of the show” booths, the cheaper booths on the edge of the floor, where smaller startups congregate, which was also well populated with new and small storage vendors – there is certainly no shortage of ambition and vision in the storage technology pipeline for the next few years.
I’ve recently been thinking a lot about application-specific workloads and architectures (Optimize Scalalable Workload-Specific Infrastructure for Customer Experiences), and it got me to thinking about the extremes of the server spectrum – the very small and the very large as they apply to x86 servers. The range, and the variation in intended workloads is pretty spectacular as we diverge from the mean, which for the enterprise means a 2-socket Xeon server, usually in 1U or 2U form factors.
At the bottom, we find really tiny embedded servers, some with very non-traditional packaging. My favorite is probably the technology from Arnouse digital technology, a small boutique that produces computers primarily for military and industrial ruggedized environments.
Slightly bigger than a credit card, their BioDigital server is a rugged embedded server with up to 8 GB of RAM and 128 GB SSD and a very low power footprint. Based on an Atom-class CPU, thus is clearly not the choice for most workloads, but it is an exemplar of what happens when the workload is in a hostile environment and the computer maybe needs to be part of a man-carried or vehicle-mounted portable tactical or field system. While its creators are testing the waters for acceptance as a compute cluster with up to 4000 of them mounted in a standard rack, it’s likely that these will remain a niche product for applications requiring the intersection of small size, extreme ruggedness and complete x86 compatibility, which includes a wide range of applications from military to portable desktop modules.