Juniper’s QFabric: The Dark Horse In The Datacenter Fabric Race?

It's been a few years since I was a disciple of, and evangelist for, HP ProCurve's Adaptive EDGE Architecture (AEA). Plain and simple, before the 3Com acquisition, it was HP ProCurve's networking vision: an architecture philosophy created by John McHugh (once HP ProCurve's VP/GM, currently the CMO of Brocade), Brice Clark (HP ProCurve director of strategy), and Paul Congdon (CTO of HP Networking) during a late-night brainstorming session. The trio conceived that network intelligence would move from the traditional enterprise core to the edge and be controlled by centralized policies. Policies based on company strategy and values would come from a policy manager and be distributed over a high-speed, resilient interconnect, much like a carrier backbone (see Figure 1). As soon as users connected to the network, the edge would control them and deliver a customized set of advanced applications and services based on user identity, device, operating system, business needs, location, time, and business policies. This architecture would let infrastructure and operations professionals create an automated, dynamic platform with the agility businesses need to remain relevant and competitive.

As the HP white paper introducing the EDGE said, "Ultimately, the ProCurve EDGE Architecture will enable highly available meshed networks, a grid of functionally uniform switching devices, to scale out to virtually unlimited dimensions and performance thanks to the distributed decision making of control to the edge." Sadly, after John McHugh's departure, HP buried the strategy in favor of its converged infrastructure slogan: Change.

Figure 1 Adaptive EDGE Architecture

Source: HP, “The ProCurve Networking Adaptive EDGE Architecture”

Fast forward 13 years to today, February 23: Juniper Networks releases a set of data center switches that enables its data center version of the AEA, the Stratus Project (see Figure 2). The new top-of-rack (ToR) QFX3500 switch, the QFabric interconnect, and the QFabric controller deconstruct the traditional three-tier network into a flatter, 1.5-tier architecture. (This also assumes that you don't count virtual switches, like VMware's vSwitch, which I would consider a tier, but that's another blog.) Picture Juniper's Stratus Project as distributing the control and data planes to the edge, where servers connect with redundancy and optimization built into the system, while centralizing management functions. It's one big switch: theoretically, you take the line cards from an EX8200, place them at the top of each rack (creating pseudo fixed switches that connect to servers), tether the line cards back to the switch fabric (Part 2) for pure, fast, and resilient switching, and have all the line cards coordinate traffic information through a controller (Part 3). Juniper says this solution can interface with 6,000 servers. Parts 2 and 3 won't be ready until later this year (see Figure 3).

Figure 2 “One tier network fabric tomorrow” — The Stratus Project

Source: Juniper Networks, “Network Fabrics for the Modern Data Center”


Figure 3 Juniper’s Big Switch Data Center Network Architecture

Source: Forrester Research
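Before moving on, here's one way to picture the "one big switch" idea in code. This is a minimal conceptual sketch in Python, my own illustration rather than anything Juniper ships; every class, device name, and port label is hypothetical. It shows only the shape of the design: edge nodes make local forwarding decisions, while a single controller programs and manages them all as one logical device.

```python
# Hypothetical sketch of a "one big switch" fabric: distributed forwarding,
# centralized management. Not Juniper code; all names are illustrative.

class EdgeNode:
    """A ToR 'line card': holds its own forwarding table and forwards locally."""
    def __init__(self, name):
        self.name = name
        self.fib = {}  # destination MAC -> egress port

    def learn(self, mac, port):
        self.fib[mac] = port

    def forward(self, mac):
        # Local decision: traffic between two ports on this node never
        # needs to cross the interconnect.
        return self.fib.get(mac, "flood-to-interconnect")


class FabricController:
    """The centralized management plane: one view over every edge node."""
    def __init__(self):
        self.nodes = []

    def add_node(self, node):
        self.nodes.append(node)

    def push_route(self, mac, node_name, port):
        # One configuration action programs the whole fabric, which is what
        # makes N devices behave like a single switch.
        for node in self.nodes:
            local = node.name == node_name
            node.learn(mac, port if local else f"uplink-to-{node_name}")


controller = FabricController()
tor_a, tor_b = EdgeNode("tor-a"), EdgeNode("tor-b")
controller.add_node(tor_a)
controller.add_node(tor_b)

controller.push_route("00:11:22:33:44:55", "tor-a", "port-1")
print(tor_a.forward("00:11:22:33:44:55"))  # port-1: stays local to the rack
print(tor_b.forward("00:11:22:33:44:55"))  # uplink-to-tor-a: one hop across the fabric
```

The design point the sketch makes is the one Juniper is selling: configuration and management happen once, at the controller, instead of once per box.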

Even though Juniper still needs to deliver on Parts 2 and 3, this launch puts Juniper Networks back in the Data Center Derby with Arista, Avaya, Brocade, and Cisco. Juniper's design overlaps Layers 2 and 3 at the edge, which keeps packets from crossing the fabric for forwarding that can be done locally, eliminating wasted hops. The design also allows the network to be partitioned by workgroup. QFabric's strongest differentiation is the single management plane, which makes all the components behave as one switch without introducing a single point of failure. Juniper's QFabric drives simplicity into the data center, a value long overdue. No one vendor offers every aspect of the next-generation data center network outlined in The Data Center Network Evolution: Five Reasons This Isn't Your Dad's Network, but they are getting close.

My bottom-line take: Juniper's announcement is a big deal, and Infrastructure & Operations pros should pay attention. Why? It's an interesting approach to the data center fabric, which is the last hurdle to unlocking cloud economics in your virtual data center. But make sure you consider competing approaches, particularly Brocade's, the current leader in this race. Regardless, 2011 is the year to build your fabric business case; not doing so risks setting your data center strategy back five years.

Juniper announced a strong horse, but this race is only in the third furlong — with plenty of excitement before the finish line. What’s your take? Is Juniper the dark horse that will win the fabric race?

Comments

The Rise of Ethernet Fabrics?

Andre,

First, I work for Brocade. Second, John McHugh now works for Brocade. Third, we launched our VCS Technology platform with the first Ethernet fabric switches (that you can buy), the VDX 6720 family, back in November 2010.

All of the networking companies with data center footprints see the need for new architectures. An Ethernet fabric seems to address the issues. To get some background on Ethernet fabrics, I provided some links below.

That said, the devil is in the details. I was unable to see any standards identified by JNPR for building theirs, and in their videocast, they seemed to denigrate standards that are available for building an Ethernet fabric. That saddened me. Are you aware of any other details of how QFabric will, in fact, create an Ethernet fabric?

Feel free to visit the following links if you would like an update on Ethernet fabrics and Brocade's VCS Technology, and to see the horse we are riding:
www.ethernetfabric.com
www.brocade.com/vcs
www.brocade.com/vdx
community.brocade.com/community/forums/products_and_solutions/ethernet_fabric
community.brocade.com/community/brocadeblogs/vcs

Reply to Andre's post

Andre,
Well, let's see. VDX can do very few switches in a "fabric" (I'm thinking maybe four; correct me if I'm wrong), and it doesn't bring in any of the edge equipment to be managed together, nor is there a shared communication architecture to push routing information down to the edge switch(es). Second, VDX doesn't do FC/FCoE gateway functions. Third, what has been announced is exactly what VSS, Virtual Chassis, and others announced a few years ago. For Juniper, that is the EX line's Virtual Chassis, the EX4200 and EX8200.

Thirdly, or fourthly, losing count now: VDX is based on TRILL (well, FSPF rather than IS-IS extensions) technology, which doesn't fix the Layer 3, security, and management functions in the data center. It only fixes the Layer 2 multipath problem, without spanning tree, and that technology has nothing to do with a true fabric, which everyone else has already discussed and committed to, including the storage architecture. With Brocade being a storage company, it's interesting this has not been announced yet, which leads everyone here to believe that it probably is not ready, or even in existence.

Fifthly, TRILL is not a standard that fixes all those complex problems (meaning those above); it only fixes one part. That being said, I find it hard to believe that an open technology will come about anytime soon to bring Layer 2, Layer 3, security, and management under a single umbrella. Not to mention, no manufacturer has committed to working on such a project. TRILL seems to be the only open standard, but please let's not confuse a fabric with TRILL. They are not the same thing.
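To make the Layer 2 multipath point concrete, here's a small self-contained sketch (my own illustration in Python; the leaf/spine topology, names, and costs are made up, and no vendor's code is implied). Link-state routing, which is what TRILL runs underneath, can keep every equal-cost path through the fabric in use, while spanning tree has to block the redundant links:

```python
# Link-state multipath vs. spanning tree, on a hypothetical 2-leaf/2-spine fabric.
from collections import defaultdict
import heapq

links = [("leaf1", "spine1", 1), ("leaf1", "spine2", 1),
         ("leaf2", "spine1", 1), ("leaf2", "spine2", 1)]

graph = defaultdict(list)
for a, b, cost in links:
    graph[a].append((b, cost))
    graph[b].append((a, cost))

def equal_cost_paths(src, dst):
    """Dijkstra that records *all* shortest paths, as ECMP-capable fabrics do."""
    best = {src: 0}
    paths = {src: [[src]]}
    heap = [(0, src)]
    while heap:
        dist, node = heapq.heappop(heap)
        if dist > best.get(node, float("inf")):
            continue  # stale heap entry
        for nbr, cost in graph[node]:
            nd = dist + cost
            if nd < best.get(nbr, float("inf")):
                best[nbr] = nd
                paths[nbr] = [p + [nbr] for p in paths[node]]
                heapq.heappush(heap, (nd, nbr))
            elif nd == best[nbr]:
                # Equal-cost path: keep it instead of blocking it.
                paths[nbr] += [p + [nbr] for p in paths[node]]
    return paths.get(dst, [])

for path in equal_cost_paths("leaf1", "leaf2"):
    print(" -> ".join(path))
# Both spine paths print; spanning tree would have blocked one of them.
```

That is the one part TRILL fixes; everything else on the list above still needs an answer.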

If it's TRILL you want, it can be added, but the bigger problem Juniper wants to fix is the entire set of problems, not just a single sub-issue.

To reiterate those issues:
- management of the entire data center fabric
- pooling of resources, and savings from those resources
- security transparency
- Layer 2/Layer 3 routing propagation throughout the entire fabric from a single pane of glass
- performance and scale
- a simplified architecture approach that distributes routing, configuration, and troubleshooting dynamically

One more specific tidbit: your TRILL implementation is pre-standard and not based on IS-IS extensions, which makes it proprietary. More than likely it will require new hardware, thus blowing investment protection.

If you want to talk fabric in the data center, you need the following features, which Brocade One has yet to deliver:
- FC/FCoE gateway functions, or even native FC features
- scale beyond 600 ports in the data center fabric
- provide integrated Layer 3 within the fabric, no external routers required
- change your TRILL approach from FSPF to an IS-IS extension, which is where the standard is heading

If you want to know how this all works, Pradeep explains it around minute 40 of the launch webcast.

http://juniper.stream57.com/juniper/default.aspx?cid=Juniper

Routers in diagram

You say that Stratus does not need routers, yet the diagram shows two MX routers and SRX firewalls (L3).

Reply to Adam C

Hi Adam,

I appreciate the comments. You didn't mention your affiliation, which would have given folks some context for your comments. No worries.

Let me try to clarify some of the misconceptions and ignorance shown in your comments, not to attack you, but to be as factual as I can. Here goes.

ADAM->Well, let's see. VDX can do very few switches in a "fabric" (I'm thinking maybe four; correct me if I'm wrong)
BROOK-> Yep, not even close. In the first release we support ten 60-port switches, and it goes to multiple hundreds of ports from there. We announced a roadmap for chassis-class products with high port counts. So, not to worry, we can scale.

ADAM->and doesn't bring in any of the edge equipment to be managed together,
BROOK-> Nope, that's not correct. Since the VCS Ethernet fabric provides logical chassis management, all switches in a fabric are managed as a single "logical chassis".

ADAM->nor is there a shared communication architecture to push routing information down to the edge switch(es).
BROOK-> Nope, that's not exactly correct. It's true that in the initial release we focused on the Layer 2 network, which is where a lot of the pain in the data center is today, particularly with x86 server virtualization and iSCSI and FCoE storage traffic. But Layer 3 support will be forthcoming in a later firmware release. VCS Distributed Intelligence is the "shared communication architecture" you mention, and it is available and running in customer installations today.

ADAM-> Second, VDX doesn't do FC/FCoE gateway functions.
BROOK-> That's a fact with the current release. The roadmap includes a switch for this very purpose. As you note later about our heritage in storage, I think we know how to do that :-)
I do want to point out that if all a customer needs today is a ToR FCoE-to-FC gateway (the only storage feature in the QFX3500, BTW), we have had that for some time with the Brocade 8000 switch and have sold quite a few. But it's good to see JNPR finally beginning to invest in storage networks, although today that investment is limited (rumor is the silicon is from a third party), with only NPIV and N_Port login. The 8000 provides our Access Gateway option as well as acting as a full FC switch; customer's choice. When you look at the features we offer in Access Gateway, you may find a lot of the functionality storage customers expect that seems lacking in the initial release of the QFX3500. Access Gateway has been available for going on five years, and in time the QFX3500 could provide similar FC gateway functionality, just not right away. That's to be expected when someone enters a new technology they are not familiar with. No worries.

ADAM->Third, what has been announced is exactly what VSS, Virtual Chassis, and others announced a few years ago. For Juniper, that is the EX line's Virtual Chassis, the EX4200 and EX8200.
BROOK-> Well, maybe, if you believe that since apples and oranges are both fruit, there is no difference between them. The links I provided in my earlier comment will help you appreciate the differences between the apples and oranges (VCS Technology and QFabric).

ADAM->Thirdly, or fourthly, losing count now
BROOK-> Yep, old age will do that to you :-) BTW, you are about 1 for 4 so far, but who is keeping score :-)

ADAM-> VDX is based on TRILL (well, FSPF rather than IS-IS extensions) technology,
BROOK-> Yes, VCS Technology and the VDX 6720 use TRILL frames. See my comments later on IS-IS link-state routing support. To be clear, TRILL requires IS-IS extensions to work at Layer 2. We chose to go with a proven and rock-solid link-state routing protocol in our initial release, FSPF. It lowers customer risk while the industry waits for the changes to IS-IS to be ratified, become available, and become stable. The problems with Layer 2 in the data center aren't going to wait that long for a solution.

ADAM->which doesn't fix the Layer 3, security, and management functions in the data center.
BROOK->See my comment regarding firmware updates for L3 support. Yes, today VCS is focused on the problems with Layer 2; that's correct. But we didn't design it ignoring the issues with Layer 3. The more pressing problems our data center customers have today are with Layer 2, so that's where we focused our initial release. It would seem that the focus of the initial QFabric release is to provide a ToR 1/10 GbE switch (40 GbE is not in the initial release) that is kind of pricey and offers pretty limited storage network connectivity, with no qualification by any storage company. But that may be too harsh an assessment on my part. I'm open to persuasion.

ADAM->It only fixes the Layer 2 multipath problem, without spanning tree, and that technology has nothing to do with a true fabric, which everyone else has already discussed and committed to, including the storage architecture.
BROOK-> Ok, I admit I haven't gotten the memo about the "true fabric which everyone else has already discussed". What is that true fabric? I do have a blog posting on what you may be referring to at www.ethernetfabric.com. Please take a look at the post on Gartner's fabric-based taxonomy, and let me know if this is what you are referring to.

ADAM->With Brocade being a storage company
BROOK-> Oh, no, not really. We not only do storage networks (which Juniper does not), but provide a pretty full portfolio of data center, campus/LAN, and service provider solutions. In fact, our MLX-e router is in service at CERN in direct support of the most mission-critical requirement of the largest physics project ever conceived, the Large Hadron Collider. It collects the data from the experiments, and you don't want to lose any of that torrent of bits, do you? Pretty neat demonstration of the full scope of our network products, don't you think?

ADAM-> it's interesting this has not been announced yet, which leads everyone here to believe that it probably is not ready, or even in existence.
BROOK-> I'm not exactly certain what the "not been announced yet" refers to, nor who "everyone here" is, since you didn't let us know your affiliation, but I have already clarified the addition of L3 support in a future firmware release, if that is what you are talking about.

ADAM->Fifthly, TRILL is not a standard that fixes all those complex problems (meaning those above); it only fixes one part.
BROOK-> Couldn't agree more. But VCS does more than just TRILL for that reason. That said, for the Layer 2 problems, TRILL is a very capable, open standard designed by Radia Perlman, who got the chance to upgrade her earlier contribution, spanning tree, for the 21st century. Multiple companies will provide it. Only one company is providing QFabric's interconnect technology, and Juniper stated the details are not available today. Hmmm ... why not?

ADAM->With that being said, I find it hard to believe that an open technology will come about anytime soon to bring Layer 2, Layer 3, security, and management under a single umbrella.
BROOK->Really? Does that mean that even today, with IEEE, IETF, and ANSI standards (to name a few), no one builds an open network? Hmmm ...

ADAM->Not to mention, no manufacturer has committed to working on such a project. TRILL seems to be the only open standard, but please let's not confuse a fabric with TRILL. They are not the same thing.
BROOK->True. But building an Ethernet fabric using open standards (as VCS Technology does) makes sense to me and, I believe, provides great customer value. In the end, as is always the case, customers will make the call with their wallets. It's heartening to see Brocade report on the earnings call last week how much customer demand (orders) for our VCS Technology platform and VDX 6720 switches has materialized.

ADAM->If it's TRILL you want, it can be added, but the bigger problem Juniper wants to fix is the entire set of problems, not just a single sub-issue.
BROOK->We are agreed on that. If you wish, take a look at the Brocade One strategy at www.brocade.com/brocadeone. It's not limited to TRILL, but again, building the future by leveraging open standards as much as possible seems to be in the customers' interest. That's at the heart of the Brocade One strategy.

ADAM->To reiterate those issues:
- management of the entire data center fabric
- pooling of resources, and savings from those resources
- security transparency
- Layer 2/Layer 3 routing propagation throughout the entire fabric from a single pane of glass
- performance and scale
- a simplified architecture approach that distributes routing, configuration, and troubleshooting dynamically
BROOK->These sound like the right ones. We are agreed.

ADAM->One more specific tidbit: your TRILL implementation is pre-standard and not based on IS-IS extensions, which makes it proprietary. More than likely it will require new hardware, thus blowing investment protection.
BROOK-> Hmm, not what I understand at all. Yes, we are an innovator in networking, and that requires taking early standards and making them available so they work. We chose a proven link-state routing protocol for our initial release, FSPF, to reduce customer risk, as I mentioned earlier. We will provide IS-IS support in a future release. Please appreciate that TRILL only requires a link-state routing protocol; the standard opted for IS-IS, and having done that, IS-IS had to be extended to work at Layer 2. TRILL does not specify what those changes to IS-IS are. The point is, TRILL only requires "an already available link state routing protocol", which FSPF is: link-state routing already proven below Layer 3, in Fibre Channel fabrics. So, here are a couple of questions.
- How many routing protocols does a router support?
- Do you buy a new router to run another protocol?
Hmmm ... see how that works?

ADAM->If you want to talk fabric in the data center, you need the following features, which Brocade One has yet to deliver
BROOK->I don't think that's true, so I'll provide a point-by-point response.

ADAM-> -FC/FCoE gateway functions, or even native FC features
BROOK->Ah, how many FCFs does the QFabric support again? And what about FIP (not just snooping), fabric services, name services, zoning, trunking, Fibre Channel routing, FCIP, etc.? What's the QFabric roadmap for 16 Gbps FC support? Hmm ... maybe those are important for storage networks? How long will it take a newcomer to become expert in these?
As I noted earlier, we will support an FC gateway, but in all honesty the ToR gateway, the Brocade 8000, provides a great solution for what most storage customers want today. BTW, how do you see storage vendors certifying the QFX3500 for storage? I think that's going to be very interesting to watch. I'll keep my eye out for press releases about that support, because without it, no one buys any FCoE product, do they?

ADAM -> -scale beyond 600 ports in the data center fabric
BROOK-> Indeed, not a problem, as I mentioned earlier.

ADAM -> -provide integrated Layer 3 within the fabric, no external routers required
BROOK-> Indeed, not a problem, as I mentioned earlier.

ADAM -> -change your TRILL approach from FSPF to an IS-IS extension, which is where the standard is heading.
BROOK-> Indeed, not a problem, as I mentioned earlier.

BROOK->Adam, thanks for posting. I appreciate the opportunity to remove the ignorance as best I can. And please, visit the links I've provided for more information.

What?

I don't have a clue about what you guys are ranting on about. All I know is that I lost a bucket of money on Brocade, their visions, and their debt.

Old wine in a new bottle :)

VDX

Hi, I believe that VDX supports up to 12 switches in a fabric as of last week, so up to 720 ports with the current models. I think the chassis is scheduled for Q3/Q4 this year, which will take us up to 128 switches in a fabric, followed by 1,024 early next year, if my memory serves correctly?

Cheers
Ady Short

What do QFabric/VDX/FabricPath really mean to the business?

Hi Andre,
All of the momentum in the data center right now is about streamlining processes for agility and reducing costs. QFabric and VDX mean fewer cables, converged infrastructure, and simplified management ... all about agility and cost reduction.

One of the hot topics in the data center that each of these solutions addresses is how to apply network policy per server and per VM dynamically, so that the network vendors fit into the world of vMotion and stateless compute.

The question I have, though, is what network policy really should be applied on a per-VM basis. The bottom line is that businesses derive almost all business value from the applications that run on top of the infrastructure. The network "fabric" in use really doesn't matter to the business, and most of the "killer" features that vary between these product lines make no perceivable difference when users are accessing applications. So the main differences between DC fabrics relate to agility (how quickly and easily infrastructure can be deployed) and cost (this includes simplified management).

Before virtualization, the process of provisioning application infrastructure took a significant amount of time for most organizations. Virtualization streamlined many aspects of the provisioning process, including the time and effort it took for server admins to provision network and storage. Virtualized storage and network adapters made it possible for storage and network teams to pre-provision resources, which gave server admins agility when deploying new servers. In other words, the network and storage teams gave the access layer (the hypervisor switch) in the DC to the server admins, which in turn made it much faster, easier, and cheaper to roll out new applications. For years it was Cisco, Cisco, Cisco; every switch end to end had to be Cisco, period ... then VMware came along and stuck a cheap, software-based switch in the access layer of the DC ... and this created a business benefit of a significance that Cisco (and other networking companies) had not provided in many years.

This business benefit was the result of streamlined process: the ability of server admins to provision storage and network resources in mere seconds versus the days/weeks/months it took in the past. Cisco (and other networking companies) realized that they were at significant risk and have been highlighting several "problems" that have occurred as a byproduct of this newfound agility, namely that corporate network policy was no longer being applied on a per-OS basis. Today most networking companies are still chasing the idea that a traditional, idealistic networking policy needs to be applied on a per-VM basis. In my opinion, their approach is all wrong.

The network policy that, in the opinion of the major network companies, needs to be applied to each OS (physical server or VM) includes VLAN, QoS, and security. The VLANs can easily be applied by server admins on the NIC, in the hypervisor, or by other methods. The view of intra-DC network security policy is based on the idea that if an attacker were to compromise a server in the DC, the network policy must ensure that the compromised server cannot be used to access other servers in the DC; therefore, policy must be applied on a per-VM basis. And for QoS, network companies are promoting the idea that as virtual machine density increases, and as more automation is deployed to move stateless servers dynamically, there will be an increased need to apply QoS for more important applications, as they will inevitably be sharing infrastructure with less important applications.

Both of these are valid areas for concern, but I would suggest that the traditional network view of resolving these challenges is no longer feasible. Granular security policies on a per-application basis are simply not feasible for most organizations. Every time a new application is deployed, it is far too time-consuming for network teams to be brought into the loop to determine which ports an application uses and the IP addresses of the hosts the app needs to communicate with, just so they can write static, per-application ACLs ... this is simply not feasible in a dynamic and agile environment. The idea of providing QoS from a network perspective, with an ACL per VM to identify and remark application traffic flows, is also not feasible. Again, this requires too much time invested in discovering application behavior and then enforcing it with static policies ... simply not workable in a dynamic and agile environment.

So how do we solve these challenges? First, the idea of network/storage/server administration as separate management domains needs to change in the DC. The network folks need to change their traditional viewpoints and start thinking about how they can best help the server admins deploy business applications quickly and realistically. IMHO, this involves ceding some areas that have traditionally been in the realm of networking teams to the server teams. Applications need to be placed into categories, or tiers; most organizations are in the process of identifying application tiers today. Correspondingly, we can define network class-of-service tiers so that server admins simply place a network interface in the VLAN associated with the application tier (a sketch of this mapping follows below). Network policy could be associated with VLANs so that network security and QoS are applied as traffic flows between VLANs. To provide more granular security within a VLAN/application class, host-based firewalls and IDSes make a lot more sense, as they are much easier to deploy, dynamic, and agile.
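Here is a hedged sketch of that tier-to-VLAN mapping in Python. The tier names, VLAN IDs, and policy labels are hypothetical illustrations, not any vendor's API; the point is the process change: the server admin picks a tier, and VLAN membership carries the QoS and security posture with it, with no per-VM ticket to the network team.

```python
# Hypothetical tier-to-VLAN policy mapping. The network team defines these
# tiers once; the policy then rides on VLAN membership, not per-VM ACLs.
APP_TIERS = {
    # tier name -> VLAN ID, QoS class, and inter-VLAN security posture
    "gold":     {"vlan": 100, "qos": "priority",    "acl": "permit-monitored"},
    "silver":   {"vlan": 200, "qos": "default",     "acl": "permit-monitored"},
    "dev-test": {"vlan": 300, "qos": "best-effort", "acl": "deny-to-prod"},
}

def provision_vm_interface(vm_name, tier):
    """What a server admin (not the network team) runs for each new VM."""
    policy = APP_TIERS[tier]
    # In practice this would call the hypervisor's API; here we just report
    # the settings that VLAN membership implies.
    print(f"{vm_name}: vlan={policy['vlan']} "
          f"qos={policy['qos']} acl={policy['acl']}")

# No per-VM request to the network team: policy follows tier membership.
provision_vm_interface("web-frontend-07", "gold")
provision_vm_interface("nightly-build-02", "dev-test")
```

The network team's job shifts from writing per-application ACLs to defining a handful of tier policies once, which is exactly the kind of streamlined process that keeps the work in-house instead of pushing it to an IaaS provider.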

The approach here may not be the right fit, but my point is that innovation requires new processes to be developed, and that often does not mean streamlining legacy processes ... sometimes new approaches are required. IT infrastructure teams need to start working together to jointly determine the best way to streamline end-to-end processes without worrying about who controls what. If they do not, IaaS providers will, and organizations will increasingly turn toward IaaS providers who have streamlined these processes and can provide more agile deployment at lower cost.