SAN and NAS storage protocols - FC vs FCoE vs iSCSI vs NFS vs CIFS

A common question when provisioning storage is "which presentation protocol do I use?". This article isn't designed to deep-dive into each protocol, but rather to provide an architectural overview of each delivery method to assist with designing a new storage implementation.

NAS vs SAN

This is the simplest place to start, as you are determining whether you need file-based access or block-based access.

NAS - CIFS and NFS

If you need the storage to be responsible for file services (formatting the filesystem, file-level security and access control, etc.) then CIFS or NFS will be your protocol of choice.

Snapshot management is also a consideration here - NAS presentation makes it far easier to access snapshot versions of files, as you can often simply browse the exported/shared NAS path for the snapshot without requiring any complex cloning/re-signaturing of LUNs etc.

On the other hand, NAS uses far more storage controller resources (memory/CPU) than SAN. When a storage controller is serving file data it needs to run the NFS or CIFS subsystem and handle authentication, directory listings, file locking and a raft of other tasks. I've seen NetApp FAS3000 series systems struggling to serve even 10TB of CIFS data due to CPU contention, whereas the same controller would be able to comfortably handle 100TB of SAN (block SCSI) data delivery without a problem.

CIFS

Typical use cases include

  • Corporate File Server
  • Distributed applications requiring centralised file-level access using domain authentication
  • Hyper-V 2012 and SQL Server 2012 (SMB3 Only)

CIFS will almost always operate within a Microsoft Windows environment (although I've occasionally seen some pretty odd application requirements for CIFS on UNIX).

CIFS protocol selection (SMB2, SMB2.1, SMB3 etc) will depend entirely on the application/user environment. Generally SMB2 is widely accepted as suitable for client access to file servers, whilst SMB3 is reserved for applications/platforms (such as Hyper-V and SQL Server 2012) which have been specifically architected with SMB3 in mind.

NFS

(Figure: NFS v3 protocol overheads)

Typical use cases include

  • Distributed UNIX based applications requiring centralised file storage
  • VMware Datastores
  • User home directories for UNIX operating environments

NFS v3 is the simplest and most common implementation; however, it lacks authentication services, so check your requirements before going down this path. If you can guarantee network segregation (e.g. a dedicated VLAN for NFS hypervisor storage) and can guarantee that the hosts accessing the storage are secure (e.g. ESX hosts and vCenter Server) then this should be suitable.

NFS v4 includes authentication features, however these rely on external services (such as Kerberos), so it takes a bit more configuration.

pNFS (part of NFS v4.1) adds multipathing capabilities; this version is not widely implemented yet (but coming soon).

SAN - FC, iSCSI and FCoE

Typical use cases for Block Storage include

  • Older Microsoft clustering technologies (e.g. Hyper-V 2008, SQL Server 2008, etc)
  • Raw SCSI IO access required by databases / applications
  • VMFS Datastores
  • SAN Storage on individual Windows servers or VMs

As mentioned above, block-level presentation requires fewer resources from the storage controller, so this may be favourable in higher-IO environments or when expecting to scale to larger storage capacities.

The three presentation methodologies - FC, iSCSI and FCoE - all deliver precisely the same block SCSI storage, but differ in the infrastructure required for data delivery. You can take a LUN which is currently presented over FC, disconnect it, re-present the same LUN over iSCSI to the same server, and data access can be seamlessly resumed (once re-mounted on the server).

SCSI - Lossless data bus

The original specification for SCSI storage required guaranteed delivery of SCSI commands along a bus. Concepts common in modern-day IP networking, such as routing, acknowledgements, retransmits and multipathing, aren't part of the SCSI protocol.

As technology evolved and storage began to be shared between multiple computers (initiators), Fibre Channel networks rapidly became common in data centre environments. At this stage, Ethernet was still delivered via 10/100 Megabit hubs and switches, with latency problems preventing Ethernet from being considered for SCSI storage traffic.

Fibre Channel switches presented a very compelling solution to the desire to reliably share SCSI storage amongst large numbers of computers. Firstly, Fibre Channel was a physically separate networking fabric from Ethernet, so existing Ethernet traffic could not pose any risk to the reliability of Fibre Channel traffic. Secondly, Fibre Channel network speeds have until recently been much higher than Ethernet:

  • 1Gbps & 2Gbps Fibre Channel vs 100Mbps Ethernet
  • 4Gbps and 8Gbps Fibre Channel vs 1Gbps Ethernet
  • 16Gbps Fibre Channel vs 10Gbps Ethernet

More recently, advances in 10Gbps and 40Gbps Ethernet, LACP, QoS, and many other features of Ethernet networking have given rise to converged networking (iSCSI, FCoE plus normal Ethernet traffic), negating the need to invest in physically separate Fibre Channel infrastructure.

Today there are still use cases for Fibre Channel storage presentation, although adoption largely seems to be driven by protection of existing investments in Fibre Channel networks. A lot of environments now run two sets of storage presentation: legacy servers remain connected using Fibre Channel (and new LUNs are presented to these systems over Fibre Channel), whilst new systems may use iSCSI or FCoE.

Of course, tape libraries are almost always connected via Fibre Channel, so this technology isn't going away just yet. (And please, let's not ignite the flame wars about whether tape is dead....)

Fibre Channel

If you already have Fibre Channel switches and your servers have Fibre Channel HBAs (Host Bus Adapters) and IP storage is not appropriate/feasible, then Fibre Channel is a good option.

Lossless networking is achieved by a flow control mechanism called Buffer to Buffer Credits. At login, each port advertises how many frames it can receive before its buffers are full; the port at the other end of the link may only transmit that many frames, and receives further notifications (R_RDY primitives) as buffers are emptied and more traffic can be accepted. Buffer to Buffer credits operate on every individual link in the fabric (HBA to switch, switch to switch, switch to storage), so back-pressure propagates hop by hop all the way through the fabric and an initiator or target will not transmit data that cannot get through to the other end. In this way, SCSI commands encapsulated into FC frames are guaranteed to reach the destination (except in the case of a link failure etc, in which case other recovery mechanisms are employed to re-transmit the lost data).
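
To make the mechanism concrete, below is a minimal Python sketch of credit-based flow control (purely illustrative - the class names, buffer size and frame counts are made up, and real FC ports implement this in hardware): the transmitter starts with the credit count advertised by the receiving port, spends one credit per frame sent, and regains one each time the receiver returns an R_RDY after freeing a buffer.

  from collections import deque

  class ReceivingPort:
      def __init__(self, buffer_slots):
          self.buffer_slots = buffer_slots      # credits advertised at login
          self.buffer = deque()

      def accept(self, frame):
          self.buffer.append(frame)             # frame lands in a reserved buffer

      def drain_one(self):
          """Process one buffered frame and signal that a buffer is free."""
          self.buffer.popleft()
          return "R_RDY"

  class TransmittingPort:
      def __init__(self, peer):
          self.peer = peer
          self.credits = peer.buffer_slots      # BB_Credit learned from the peer

      def send(self, frame):
          if self.credits == 0:
              return False                      # must wait: no buffer is guaranteed
          self.peer.accept(frame)
          self.credits -= 1                     # one credit consumed per frame
          return True

      def receive_r_rdy(self):
          self.credits += 1                     # receiver freed a buffer

  # The transmitter can never overrun the receiver, so frames are never dropped.
  rx = ReceivingPort(buffer_slots=4)
  tx = TransmittingPort(rx)

  sent = sum(tx.send(f"frame-{i}") for i in range(6))
  print(sent)                  # -> 4 (the remaining frames must wait for credits)
  if rx.drain_one() == "R_RDY":
      tx.receive_r_rdy()
  print(tx.send("frame-5"))    # -> True (a credit was returned)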

Brocade has an excellent whitepaper (focussed on FICON) which outlines Buffer to Buffer credit theory
http://www.brocade.com/downloads/documents/white_papers/Buffer_to_Buffer_Credits_and_Effect_on_FICON_Performance_WP_00.pdf

Fibre Channel switching has matured extremely well, and a well-tuned switching fabric is capable of extremely low latency and high-performance IO.

Fibre Channel over Ethernet (FCoE)

FCoE requires a specialised set of infrastructure and configuration parameters to support lossless networking over Ethernet switches. Although native Fibre Channel switches are specifically designed to guarantee frame delivery using flow control, Ethernet switching does not employ the same flow control mechanisms and instead relies on upper-level protocols (e.g. TCP) for retransmission of data which has been dropped.

Either the Fibre Channel switch needs to also have Ethernet ports (so it can encapsulate the traffic into FCoE frames), or the Ethernet switch needs to have Fibre Channel ports (in which case the Ethernet switch encapsulates the FC frame within an FCoE frame). The FCoE data is sent down its own VLAN.

FCoE-capable switches then provide flow control, similar to that found on FC switches, over a dedicated priority lane on the switch fabric, to facilitate lossless networking. Not all Ethernet switches are capable of FCoE.

Priority Flow Control (802.1Qbb) allows the priority class carrying FCoE traffic to be paused when buffers fill, without pausing the ordinary Ethernet traffic sharing the link, giving the FCoE class lossless behaviour.

Enhanced Transmission Selection (802.1Qaz) manages bandwidth allocation between the traffic classes, ensuring that different link speeds (e.g. 10Gbps Ethernet feeding 4Gbps Fibre Channel) can coexist without saturating the slower link.
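
As a rough illustration of the bandwidth allocation idea, here is a small Python sketch (the 10Gbps link speed and the per-class weights are made-up example values, not a recommendation):

  LINK_GBPS = 10.0

  # Minimum bandwidth share per traffic class (percentages must sum to 100).
  ets_weights = {"fcoe": 50, "lan": 30, "other": 20}

  def guaranteed_gbps(weights, link_gbps=LINK_GBPS):
      """Bandwidth each class is guaranteed under congestion; when a class
      is idle, the others are allowed to borrow its unused share."""
      return {cls: link_gbps * pct / 100.0 for cls, pct in weights.items()}

  print(guaranteed_gbps(ets_weights))
  # -> {'fcoe': 5.0, 'lan': 3.0, 'other': 2.0}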

Once the data has reached the server, converged network adapters (CNAs) are typically used to carry both FCoE and Ethernet traffic (although it is also perfectly valid to use a dedicated FCoE adapter alongside a separate Ethernet adapter).
In the CNA, the traffic is separated based on VLAN (i.e. the FCoE VLAN is directed to the FCoE function) and the FC frame is stripped out of the FCoE frame. It is then handed to the Fibre Channel controller and ordinary FC operations continue through to the OS.

FCoE typically provides lower-latency data transfer than iSCSI, as FCoE does not rely on TCP for delivery and retransmission.

Jumbo frames are a prerequisite for FCoE (a full-sized FC frame is 2148 bytes, which will not fit inside the standard 1500-byte Ethernet MTU).
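
The arithmetic behind that requirement is sketched below in Python (the 2148-byte figure is the maximum FC frame size mentioned above; the encapsulation overhead values are approximate, illustrative assumptions):

  MAX_FC_FRAME = 2148    # maximum Fibre Channel frame size in bytes
  FCOE_HEADER  = 14      # FCoE version/reserved fields + SOF delimiter (approx.)
  FCOE_TRAILER = 4       # EOF delimiter and padding (approx.)

  # The encapsulated FC frame must fit inside the Ethernet payload (the MTU).
  fcoe_payload = MAX_FC_FRAME + FCOE_HEADER + FCOE_TRAILER

  print(fcoe_payload)           # roughly 2166 bytes
  print(fcoe_payload > 1500)    # True - it cannot fit a default 1500-byte MTU,
                                # hence the "baby jumbo" MTU (commonly around
                                # 2500 bytes) configured on FCoE-enabled links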

FCoE cannot be routed.

iSCSI

This is perhaps the most versatile method for delivering block storage, as it does not rely on Fibre Channel or FCoE-capable switching hardware - it will work over standard Ethernet.

iSCSI relies on TCP for guaranteed delivery of data.

On a side note: TCP actually depends on the occasional dropped frame to discover the capabilities of the network -
Every dropped frame results in the TCP congestion window being halved, throttling the sender back to what the network can sustain; between losses the window grows again. Without that feedback, the sender has no way of knowing when it is exceeding the capacity of the network.
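
A toy Python illustration of that behaviour (real TCP congestion control is considerably more sophisticated; the starting window and the event sequence here are arbitrary):

  def aimd(events, cwnd=10.0):
      """events is a sequence of 'ack' (a clean round trip) or 'loss'."""
      history = []
      for event in events:
          if event == "loss":
              cwnd = max(cwnd / 2, 1.0)   # a dropped frame halves the window
          else:
              cwnd += 1.0                 # each clean round trip grows it
          history.append(cwnd)
      return history

  print(aimd(["ack", "ack", "ack", "loss", "ack", "ack", "loss"]))
  # -> [11.0, 12.0, 13.0, 6.5, 7.5, 8.5, 4.25]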

In-guest block-level storage access is one of the more common use cases for iSCSI, as software initiators can be used to provide this over existing network infrastructure (FCoE and Fibre Channel rely on physical HBAs).

Jumbo frames should be used where possible, and try to avoid routing (it will work, but with increased latency).
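
One nice side effect of running over plain TCP/IP is that basic troubleshooting needs nothing special. For example, here is a quick Python sketch to confirm a target portal is reachable on the standard iSCSI port (TCP 3260) before touching initiator configuration - the portal address shown is a placeholder, not a real system:

  import socket

  def portal_reachable(host, port=3260, timeout=3.0):
      """Return True if a TCP connection to the iSCSI portal can be opened."""
      try:
          with socket.create_connection((host, port), timeout=timeout):
              return True
      except OSError:
          return False

  print(portal_reachable("192.0.2.10"))   # substitute your storage controller's portal IP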

Summary

Below are the main considerations when choosing the delivery protocol

  • File or Block (largely based on application requirements)
  • What level of security is required?
  • CIFS considerations
      • Expected IO load (large CIFS environments may overburden storage controller CPU or memory resources)
  • Block SCSI considerations
      • Available networking infrastructure
      • Do you have existing Fibre Channel switches & HBAs?
      • Do your Fibre Channel and/or Ethernet switches support FCoE?

There is no "one size fits all" approach; use the protocol which best suits the application requirements.