For Better Security Operations, Speak to the Pack in its Native Tongue

Chase Cunningham

I have a huge German Shepherd that ranks only slightly behind my human children when it comes to being spoiled and how much attention he gets.  I’ve been working on training him for nearly a year now, and he amazes me with how intelligent he is. He knows all the basics: sit, stay, here, lay down, etc. But he also picked up detecting scents very quickly and is learning to detect things with his nose that I can’t even see with my eyes. And he does all of these things faster than most kids learn to break the Netflix password.  

The other day, working with him on his training points, I thought to myself, “Woah, my dog speaks human.” Not just English either. He speaks German (that’s the language he's trained in), and he totally understands it. I realized the problem is that I don't speak “Dog.” My dog knows about 30 human words, and they are words in a language his master has no business trying to pronounce, mind you. But he knows what those words mean, and he gets the tasking or request down every time they're uttered. He could look at me for an hour and bark, growl, howl, yip, or yelp constantly, and he could be telling me the cure for cancer and I wouldn’t know it.  

OK that’s interesting, but what does it have to do with better communication among techies?

Read more

Intel Announces Xeon SOC – Seriously Raising the Bar for AMD and ARM Competition

Richard Fichera

Intel has made no secret of its development of the Xeon D, an SOC product designed to take Xeon processing close to power levels and product niches currently occupied by its lower-power and lower performance Atom line, and where emerging competition from ARM is more viable.

The new Xeon D-1500 is clear evidence that Intel “gets it” as far as platforms for hyperscale computing and other throughput per Watt and density-sensitive workloads, both in the enterprise and in the cloud are concerned. The D1500 breaks new ground in several areas:

It is the first Xeon SOC, combining 4 or 8 Xeon cores with embedded I/O including SATA, PCIe and multiple 10 nd 1 Gb Ethernet ports.

(Source: Intel)

It is the first of Intel’s 14 nm server chips expected to be introduced this year. This expected process shrink will also deliver a further performance and performance per Watt across the entire line of entry through mid-range server parts this year.

Why is this significant?

With the D-1500, Intel effectively draws a very deep line in the sand for emerging ARM technology as well as for AMD. The D1500, with 20W – 45W power, delivers the lower end of Xeon performance at power and density levels previously associated with Atom, and close enough to what is expected from the newer generation of higher performance ARM chips to once again call into question the viability of ARM on a pure performance and efficiency basis. While ARM implementations with embedded accelerators such as DSPs may still be attractive in selected workloads, the availability of a mainstream x86 option at these power levels may blunt the pace of ARM design wins both for general-purpose servers as well as embedded designs, notably for storage systems.

Read more

New ARM-based Moonshot Servers from HP Exemplify Workload-Specific Computing

Richard Fichera

One of the developing trends in computing, relevant to both enterprise and service providers alike, is the notion of workload-specific or application-centric computing architectures. These architectures, optimized for specific workloads, promise improved efficiencies for running their targeted workloads, and by extension the services that they support. Earlier this year we covered the basics of this concept in “Optimize Scalable Workload-Specific Infrastructure for Customer Experiences”, and this week HP has announced a pair of server cartridges for their Moonshot system that exemplify this concept, as well as being representative of the next wave of ARM products that will emerge during the remainder of 2014 and into 2015 to tilt once more at the x86 windmill that currently dominates the computing landscape.

Specifically, HP has announced the ProLiant m400 Server Cartridge (m400) and the ProLiant m800 Server Cartridge (m800), both ARM-based servers packaged as cartridges for the HP Moonshot system, which can hold up to 45 of these cartridges in its approximately 4U enclosure. These servers are interesting from two perspectives – that they are both ARM-based products, one being the first tier-1 vendor offering of a 64-bit ARM CPU and that they are both being introduced with a specific workload target in mind for which they have been specifically optimized.

Read more

AMD Quietly Rolls Out hUMA – Potential Game-Changer for Parallel Computing

Richard Fichera

Background  High Performance Attached Processors Handicapped By Architecture

The application of high-performance accelerators, notably GPUs, GPGPUs (APUs in AMD terminology) to a variety of computing problems has blossomed over the last decade, resulting in ever more affordable compute power for both horizon and mundane problems, along with growing revenue streams for a growing industry ecosystem. Adding heat to an already active mix, Intel’s Xeon Phi accelerators, the most recent addition to the GPU ecosystem, have the potential to speed adoption even further due to hoped-for synergies generated by the immense universe of x86 code that could potentially run on the Xeon Phi cores.

However, despite any potential synergies, GPUs (I will use this term generically to refer to all forms of these attached accelerators as they currently exist in the market) suffer from a fundamental architectural problem — they are very distant, in terms of latency, from the main scalar system memory and are not part of the coherent memory domain. This in turn has major impacts on performance, cost, design of the GPUs, and the structure of the algorithms:

  • Performance — The latency for memory accesses generally dictated by PCIe latencies, which while much improved over previous generations, are a factor of 100 or more longer than latency from coherent cache or local scalar CPU memory. While clever design and programming, such as overlapping and buffering multiple transfers can hide the latency in a series of transfers, it is difficult to hide the latency for an initial block of data. Even AMD’s integrated APUs, in which the GPU elements are on a common die, do not share a common memory space, and explicit transfers are made in and out of the APU memory.
Read more