**This post has been slightly edited thanks to feedback from Sean McGee.**
In previous posts I’ve outlined:
- How UCS server failover occurs from a network perspective: http://www.definethecloud.net/ucs-server-failover
- How inter-fabric traffic is handled in End-Host mode: http://www.definethecloud.net/inter-fabric-traffic-in-ucs
- How inter-fabric traffic is handled in switch mode: http://www.definethecloud.net/inter-fabric-traffic-in-ucspart-ii
If you’re not familiar with UCS networking I suggest you start with those for background. This post updates that discussion, focusing on UCS B-Series server to Fabric Interconnect communication using the new hardware options announced at Cisco Live 2011. First, a recap of the new hardware:
The UCS 6248UP Fabric Interconnect
The 6248 is a 1RU model that provides 48 universal ports (1G/10G Ethernet or 1/2/4/8G FC.) This provides 20 additional ports over the 6120 in the same 1RU form factor. Additionally, the 6248 offers lower latency at 2.0us, down from 3.2us previously.
The UCS 2208XP I/O Module
The 2208 doubles the total uplink bandwidth per I/O module, providing 160Gbps of total throughput per 8-blade chassis. It quadruples the number of internal 10G connections to the blades, allowing for 80Gbps per half-width blade.
UCS 1280 VIC
The 1280 VIC provides 8x10GE ports total, 4x to each IOM, for a total of 80Gbps per half-width slot (160Gbps with 2x in a full-width blade.) It also doubles the VIF numbers of the previous VIC, allowing for 256 (theoretical) vNICs or vHBAs. The new VIC also supports port-channeling to the UCS 2208 IOM and iSCSI boot.
The other addition that affects this conversation is the ability to port-channel the uplinks from the 2208 IOM which could not be done before (each link on a 2104 IOM operated independently.) All of the new hardware is backward compatible with all existing UCS hardware. For more detailed information on the hardware and software announcements visit Sean McGee’s blog where I stole these graphics: http://www.mseanmcgee.com/2011/07/ucs-2-0-cisco-stacks-the-deck-in-las-vegas/.
Let’s start by discussing the connectivity options from the Fabric Interconnects to the IOMs in the chassis focusing on all gen 2 hardware.
There are two modes of operation for the IOM: Discrete (non-bundled) and Port-Channel (bundled). In either mode it is possible to configure 1, 2, 4, or 8 uplinks from each IOM.
UCS 2208 Fabric Interconnect Failover
Discrete Mode:
In discrete mode a static pinning mechanism is used, mapping each blade to a given port dependent on the number of uplinks used. This means that each blade will have an assigned uplink on each IOM for inbound and outbound traffic. In this mode, if a link failure occurs the blade will not ‘re-pin’ on the side of the failure but will instead rely on NIC-teaming/bonding or Fabric Failover to fail over to the redundant IOM/fabric. The pinning behavior is as follows, with the exception of 1 uplink (not shown), in which all blades use the only available port; a short sketch of this mapping also follows the table below:
| Blade | 2 Uplinks | 4 Uplinks | 8 Uplinks |
|-------|-----------|-----------|-----------|
| 1 | Port 1 | Port 1 | Port 1 |
| 2 | Port 2 | Port 2 | Port 2 |
| 3 | Port 1 | Port 3 | Port 3 |
| 4 | Port 2 | Port 4 | Port 4 |
| 5 | Port 1 | Port 1 | Port 5 |
| 6 | Port 2 | Port 2 | Port 6 |
| 7 | Port 1 | Port 3 | Port 7 |
| 8 | Port 2 | Port 4 | Port 8 |
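To make that mapping concrete, here is a minimal Python sketch of the static pinning logic. It is purely illustrative (not UCS Manager code); `pinned_uplink` is a hypothetical helper that simply reproduces the slot-to-port mapping in the table above.

```python
# Hypothetical illustration of discrete-mode static pinning: a blade slot
# maps to an IOM uplink based solely on the number of active uplinks.

def pinned_uplink(blade_slot: int, active_uplinks: int) -> int:
    """Return the IOM uplink port (1-based) a blade slot pins to."""
    if active_uplinks not in (1, 2, 4, 8):
        raise ValueError("Discrete mode uses 1, 2, 4, or 8 uplinks")
    # Slot N pins to ((N - 1) mod uplinks) + 1, which reproduces the
    # odd/even split for 2 links and the 1:1 mapping for 8 links.
    return ((blade_slot - 1) % active_uplinks) + 1

for slot in range(1, 9):
    print(f"Blade {slot}: 2 links -> port {pinned_uplink(slot, 2)}, "
          f"4 links -> port {pinned_uplink(slot, 4)}, "
          f"8 links -> port {pinned_uplink(slot, 8)}")
```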
The same port-pinning is used on both IOMs, so in a redundant configuration each blade is uplinked via the same port on separate IOMs to redundant fabrics. The draw of discrete mode is that bandwidth is predictable in link-failure scenarios. If a link fails on one IOM, that server fails over to the other fabric rather than adding load to the remaining active links on the failed side. In summary, it forces NIC-teaming/bonding or Fabric Failover to handle failure events rather than network-based load-balancing. The following diagram depicts the failover behavior for server three in an 8-uplink scenario.
Discrete Mode Failover
In the previous diagram port 3 on IOM A has failed. With the system in discrete mode, NIC-teaming/bonding or Fabric Failover handles failover to the secondary path on IOM B (which is the same port, 3, based on static pinning).
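As a rough illustration of that behavior, the sketch below models the discrete-mode failure handling just described: the blade keeps its static pin on each IOM and simply shifts to the other fabric when its pinned uplink fails. The `active_path` function is hypothetical and assumes a vNIC whose primary path is fabric A with Fabric Failover (or NIC-teaming) providing the fabric B path.

```python
# Hypothetical model of discrete-mode failover: no re-pinning occurs on
# the fabric with the failed uplink; traffic moves to the other fabric.

def active_path(blade_slot, failed_ports_a, failed_ports_b, uplinks=8):
    """Return (fabric, IOM uplink) carrying the blade's traffic."""
    pin = ((blade_slot - 1) % uplinks) + 1  # same static pin on both IOMs
    if pin not in failed_ports_a:
        return ("Fabric A", pin)
    if pin not in failed_ports_b:
        return ("Fabric B", pin)            # fail over, same pinned port
    return (None, None)                     # both pinned uplinks are down

# Port 3 on IOM A fails: blade 3 fails over to port 3 on IOM B.
print(active_path(3, failed_ports_a={3}, failed_ports_b=set()))
```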
Port-Channel Mode:
In Port-Channel mode all available links are bundled and a port-channel hashing algorithm (TCP/UDP + Port + VLAN, non-configurable) is used to load-balance server traffic. In this mode all server links are still ‘pinned,’ but they are pinned to the logical bundle rather than to individual IOM uplinks. The following diagram depicts this mode.
Port-Channel Mode
In this scenario, when a port fails on an IOM, the port-channel load-balancing algorithm handles failing the server traffic flow over to another available port in the channel. This failover will typically be faster than NIC-teaming/bonding failover. It will decrease the potential throughput for all flows on the side with a failure, but will only affect performance if the links are saturated. The following diagram depicts this behavior.
In the diagram above, Blade 3 was pinned to Port 1 on the A side. When port 1 failed, port 4 was selected (depicted in green), while fabric B port 6 is still active, leaving a potential of 20Gbps.
Note: Actual used ports will vary dependent on port-channel load-balancing. These are used for example purposes only.
As you can see the port-channel mode enables additional redundancy and potential per-server bandwidth as it leaves two paths open. In high utilization situations where the links are fully saturated this will degrade throughput of all blades on the side experiencing the failure. This is not necessarily a bad thing (happens with all port-channel mechanisms), but it is a design consideration. Additionally port-channeling in all forms can only provide the bandwidth of a single link per flow (think of a flow as a conversation.) This means that each flow can only utilize 10Gbps max even though 8x10Gbps links are bundled. For example a single FTP transfer would max at 10Gbps bandwidth, while 8xFTP transfers could potentially use 80Gbps (10 per link) dependent on load-balancing.
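A small sketch may help illustrate the “one link per flow” point. The hash below is not the actual (non-configurable) IOM algorithm; it is a generic stand-in showing that because a flow’s identifiers always hash to the same member link, a single conversation can never use more than one link’s worth of bandwidth, while many distinct flows can spread across the bundle.

```python
# Generic illustration of hash-based member selection in a port-channel.
# Not the real IOM hash; it only shows why one flow maps to one link.
import hashlib

def select_member_link(src_ip, dst_ip, src_port, dst_port, vlan, num_links):
    """Hash the flow identifiers to pick one member link (0-based)."""
    key = f"{src_ip}-{dst_ip}-{src_port}-{dst_port}-{vlan}".encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % num_links

# The same flow always lands on the same link, so one FTP transfer is
# capped at that link's 10Gbps even with 8 links in the bundle ...
print(select_member_link("10.0.0.5", "10.0.0.9", 51000, 21, 100, 8))
# ... while eight distinct flows can spread across all eight links.
for sport in range(51000, 51008):
    print(select_member_link("10.0.0.5", "10.0.0.9", sport, 21, 100, 8))
```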
Next let’s discuss server to IOM connectivity (yes, I use discuss to describe me monologuing in print, get over it, and yes, I know monologuing isn’t a word). I’ll focus on the new UCS 1280 VIC because all other current cards maintain the same connectivity. The following diagram depicts the 1280 VIC connectivity.
The 1280 VIC utilizes 4x10Gbps links across the midplane per IOM to form two 40Gbps port-channels. This provides 80Gbps of total potential throughput per card. This means a half-width blade has a total potential of 80Gbps using this card and a full-width blade can receive 160Gbps (of course this is dependent upon design.) As with any port-channel, link-bonding, trunking, or whatever you may call it, any flow (conversation) can only utilize the bandwidth of one physical link (or backplane trace). This means every flow from any given UCS server has a max potential bandwidth of 10Gbps, but with 8 total uplinks 8 different flows could potentially utilize 80Gbps.
This becomes very important with things like NFS-based storage within hypervisors. Typically a virtualization hypervisor will handle storage connectivity for all VMs. This means that only one flow (conversation) will occur between host and storage. In these typical configurations only 10Gbps will be available for all VM NFS data traffic even though the host may have a potential 80Gbps bandwidth. Again this is not necessarily a concern, but a design consideration as most current/near-future hosts will never use more than 10Gbps of storage I/O.
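A quick back-of-the-envelope calculation shows why this matters. Assuming ideal hashing (every flow lands on its own link), aggregate throughput is bounded by both the number of member links and the number of concurrent flows; `max_throughput_gbps` below is illustrative arithmetic only, not a measured figure, and it ignores real-world limits such as the PCIe bus.

```python
# Idealized arithmetic only: assumes perfect hash distribution and
# ignores PCIe or midplane limits. Each flow rides a single link, so
# aggregate throughput caps at min(flows, links) * link speed.

def max_throughput_gbps(concurrent_flows: int, links: int = 8,
                        link_speed_gbps: int = 10) -> int:
    return min(concurrent_flows, links) * link_speed_gbps

print(max_throughput_gbps(1))   # one NFS session: 10 Gbps, not 80
print(max_throughput_gbps(8))   # eight parallel flows: up to 80 Gbps
```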
Summary:
The new UCS hardware packs a major punch when it comes to bandwidth, port-density and failover options. That being said it’s important to understand the frame flow, port-usage and potential bandwidth in order to properly design solutions for maximum efficiency. As always comments, complaints and corrections are quite welcome!
Great write-up, just wanted to add one small detail: The VIC 1280 card uses PCIe x16 (gen 2), so the maximum bandwidth the card can handle is 64 Gbits instead of the full 80.
A very minor distinction, since 64 Gbits is *a lot* of bandwidth to a single system.
Tony, thanks for the comment and additional info. That’s an excellent point; the card is limited by the PCIe bus and wouldn’t be able to sustain 80Gbps, but that definitely won’t be a limitation most people ever see 😉
Great info Joe. I like the way you describe the difference in discrete vs. channel mode. Also the graphical depiction of how the backplane ports line up in discrete mode was useful.
Good stuff, keep it coming…
Chris,
Thanks for reading and the feedback!
Joe
Joe,
The VIC 1280, when combined with the 2208, provides port-channeled 20Gbit interfaces to the OS, so I would expect to get greater than 10Gb per flow when the OS is seeing one or more 20Gbit interfaces. Regardless of whether the flows are broken up, if the OS doesn’t see the lanes behind the port-channel, it should be able to send more than a single port’s worth of bandwidth.
Thanks,
Bryan
Hi
As you showed in the exhibit “UCS 2208 Fabric Interconnect Failover”, can two IOMs connect to one single FI?
I’m not sure about that.
Haahaahh. I’m not too bright today. Great post!
If we discover a chassis in 2-link discrete mode with one full-width blade occupying slots 5 and 6: usually slot 5 comes under link 1 and slot 6 comes under link 2, so on which link will the full-width blade get discovered?