Building True Business Continuance Solutions Over IP

June 21st, 2005

A well-known study from Gartner Group estimates that 43 percent of businesses that experience a major disaster fail within five years and 29 percent fail within the first two to four months. As a direct result, business continuance represents one of the top IT data center initiatives for companies of all sizes, and especially for large enterprises.

Every business continuance plan is based on the creation of one or more secondary data centers, where storage and applications are replicated in such a way that if a disaster occurs, the equipment hosted in the secondary site will be able to take over and keep the business up and running.

Common sense suggests that the secondary data center should be located far enough away from the primary one to reduce the likelihood of all IT operations being affected by the same disaster. But longer distances imply higher latency, which some applications tolerate poorly. This is particularly true for synchronous replication, where each write must be acknowledged by the remote site before it completes; it matters much less for asynchronous replication. Therefore, companies that need synchronous replication to guarantee that the primary and remote data centers constantly hold the same information will typically have to deploy their secondary site within a radius of about 100km of the primary site.
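To make the distance constraint concrete, here is a rough back-of-the-envelope sketch. It assumes (this figure is not from the article) that light travels through optical fibre at about 200,000 km/s, and it deliberately ignores switching, queuing and protocol overhead, so real-world latencies will be higher. The function names are illustrative only.

```python
# Assumption: propagation speed in optical fibre is roughly 200,000 km/s
# (about two thirds of the speed of light in vacuum), i.e. ~5 microseconds per km.
FIBRE_KM_PER_SECOND = 200_000.0


def round_trip_ms(distance_km: float) -> float:
    """Propagation-only round-trip time, in milliseconds, over a fibre path."""
    return 2 * distance_km * 1000.0 / FIBRE_KM_PER_SECOND


def sync_write_delay_ms(distance_km: float, round_trips_per_write: int = 1) -> float:
    """Extra delay added to every synchronous write by distance alone.

    round_trips_per_write is a hypothetical knob: some protocols need more
    than one exchange with the remote site before a write is acknowledged.
    """
    return round_trips_per_write * round_trip_ms(distance_km)


if __name__ == "__main__":
    for km in (10, 100, 1000):
        print(f"{km:>5} km: {round_trip_ms(km):.1f} ms round trip")
```

Under these assumptions, 100km adds about 1 ms of round-trip delay to every synchronous write, while 1,000km adds about 10 ms, which helps explain why synchronous replication is usually kept to short distances.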

At these distances it is possible to choose among many different transport technologies, including of course IP, but also CWDM (Coarse Wavelength Division Multiplexing) or DWDM (Dense WDM), which can be more expensive but provide very high performance and excellent reliability. Companies that do not necessarily require synchronous replication can locate their secondary data center farther away (more than 100km from the primary site) and will have to consider transport technologies that are capable of spanning those distances, such as IP. The technology that leverages IP for the interconnection of remote sites is called FCIP (Fibre Channel over IP); it allows Fibre Channel fabrics to be transparently interconnected over any IP infrastructure.

Fibre Channel Over IP

FCIP is a protocol specification developed by the Internet Engineering Task Force (IETF) that allows a device to transparently tunnel Fibre Channel frames over an IP network. The operation of the FCIP protocol is very similar to that of any other tunneling mechanism. Two edge devices interface between the local Fibre Channel SAN in each of the data centers and the IP network. Each of these devices takes Fibre Channel frames from the SAN and encapsulates them within IP packets, which are transferred reliably over the IP network by using TCP (Transmission Control Protocol) as the transport layer protocol. At the remote site, the other FCIP device receives the incoming FCIP traffic, strips off the additional headers and places the original Fibre Channel frames back onto the SAN.
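The tunneling idea can be illustrated with a minimal sketch. This is not the real FCIP encapsulation: the 4-byte length prefix used here is a made-up stand-in for the actual header, and the names encapsulate and Decapsulator are hypothetical. What the sketch does show is the general problem an edge device has to solve: TCP delivers a byte stream, so the receiving side must recover frame boundaries before it can place the original frames back onto the SAN.

```python
import struct
from typing import List

# Hypothetical framing: each tunneled frame is preceded by a 4-byte,
# big-endian length field. The real FCIP header is different.
_LENGTH_PREFIX = struct.Struct("!I")


def encapsulate(fc_frame: bytes) -> bytes:
    """Sending edge device: wrap one Fibre Channel frame for the TCP stream."""
    return _LENGTH_PREFIX.pack(len(fc_frame)) + fc_frame


class Decapsulator:
    """Receiving edge device: rebuild whole frames from arbitrary TCP chunks."""

    def __init__(self) -> None:
        self._buffer = bytearray()

    def feed(self, data: bytes) -> List[bytes]:
        """Add bytes read from the TCP connection; return any complete frames."""
        self._buffer.extend(data)
        frames: List[bytes] = []
        while len(self._buffer) >= _LENGTH_PREFIX.size:
            (length,) = _LENGTH_PREFIX.unpack_from(self._buffer, 0)
            end = _LENGTH_PREFIX.size + length
            if len(self._buffer) < end:
                break  # the rest of this frame has not arrived yet
            frames.append(bytes(self._buffer[_LENGTH_PREFIX.size:end]))
            del self._buffer[:end]  # strip the tunnel header and the frame
        return frames
```

Because the frames come out byte-for-byte identical to what went in, the SANs on either side never see the tunnel, which is the transparency described above.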

This mode of operation represents both the strength and the weakness of the protocol. It makes FCIP completely transparent to the Fibre Channel SANs that are being interconnected, which implies that most, if not all, of the management procedures that a SAN administrator is used to performing on a Fibre Channel SAN within a single data center are easily extended to the interconnected environments. But it also means that the two Fibre Channel SANs have effectively been “bridged,” thus creating one large, geographically dispersed SAN.

There are two problems associated with bridging Fibre Channel SANs across distance. The first is that the stability of the extended SAN now depends on the stability of the link that connects the two environments. The second is related to having one unified SAN across the two (or more) data centers, which means that a fault at either site gets propagated to the other and may cause disruption to the entire environment.

The main purpose of building a secondary data center is to protect the data and the applications that are hosted at the primary site, but the probability of having a service interruption increases as soon as any connection between the primary and secondary locations is established.

A solution to this paradox exists, and it is based on the combined use of FCIP and two advanced Fibre Channel features called Virtual SANs (VSANs) and InterVSAN Routing (IVR).

The Role of VSANs

Very much like Ethernet, Fibre Channel is a layer-2 protocol without any hierarchical network domain concept that would serve to isolate and localise control protocols and messaging within a given region of the network. Instead, like Ethernet, Fibre Channel maintains a set of control protocols that are fabric-wide in scope, such as zoning or state change notifications. Local control protocol events can potentially result in disruptions that span the full extent of the fabric. This is as true for SANs that are fully confined within a data center as it is for extended SANs, built using any kind of transport. In the case of an extended SAN, a disruption on either side of the link affects both sides equally.

In Ethernet the problem of segmenting large physical domains into multiple logical infrastructures is solved by VLANs (Virtual LANs), whose key attribute is that a disruptive fault on one VLAN does not affect any of the others. This is achieved by having a separate control plane for each VLAN. In Fibre Channel the equivalent of VLANs is represented by VSANs. Now part of the ANSI T11 standard, VSANs behave in a very similar way to VLANs and provide the same benefits in terms of security, scalability and fault isolation.

When multiple VSANs need to be carried over one ISL (Inter Switch Link), each frame is tagged with explicit VSAN membership information so that the receiving switch can take the VSAN tag into account when making its forwarding decision. This VSAN tag is never exposed to end devices, such as HBAs (Host Bus Adapters) or storage array interfaces. A switch-to-switch link that supports VSAN tagging is called an EISL (Enhanced ISL). ISLs and EISLs can also be extended over long distances, possibly using FCIP.
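The tagging behaviour can be sketched with a toy switch model. Everything here is an illustrative assumption rather than a description of any real switch: the Frame and Switch classes, the port names and the flat forwarding table are all hypothetical. The point is only that the forwarding lookup is keyed on the VSAN as well as the destination, that the tag travels on trunk (EISL) links, and that it is removed before a frame reaches an end device.

```python
from dataclasses import dataclass, replace
from typing import Dict, Optional, Set, Tuple


@dataclass(frozen=True)
class Frame:
    dest_id: int
    payload: bytes
    vsan: Optional[int] = None  # tag is present only while on an EISL


class Switch:
    def __init__(self) -> None:
        self.edge_port_vsan: Dict[str, int] = {}      # edge port -> its VSAN
        self.trunk_ports: Set[str] = set()            # EISLs carrying tagged frames
        self.routes: Dict[Tuple[int, int], str] = {}  # (vsan, dest_id) -> egress port

    def forward(self, ingress: str, frame: Frame) -> Optional[Tuple[str, Frame]]:
        """Return (egress port, frame as sent), or None if the frame is dropped."""
        # On a trunk the VSAN comes from the tag; on an edge port, from the port.
        if ingress in self.trunk_ports:
            vsan = frame.vsan
        else:
            vsan = self.edge_port_vsan[ingress]
        egress = self.routes.get((vsan, frame.dest_id))
        if egress is None:
            return None  # destination is not reachable within this VSAN
        # Keep the tag on trunks, strip it before delivery to an end device.
        out_tag = vsan if egress in self.trunk_ports else None
        return egress, replace(frame, vsan=out_tag)
```

In this model, the same destination identifier can exist in two VSANs without conflict, because a lookup in one VSAN never consults the entries of another.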

InterVSAN Routing

VSANs alone do not solve the problem of interconnecting two data centers while keeping them isolated from a control plane perspective. VSANs make it possible to isolate two or more data centers by using different sets of VSANs, but this also prevents data traffic from flowing between the sites. In the Ethernet world, the problem of enabling nodes belonging to different VLANs to communicate with each other is solved by IP. By leveraging a hierarchical addressing scheme and a set of routing techniques, IP, a layer-3 protocol, lets data traffic cross the boundaries of VLANs without merging their control planes.

Unfortunately, there is no IP in SANs, nor any other kind of layer-3 protocol. The common upper layer protocol for storage networking is SCSI (Small Computer System Interface), which was designed on the basis of completely different assumptions than IP. When SCSI was first conceived, nobody could have imagined that the distances spanned by the protocol could be as long as those of an FCIP link. SCSI was originally meant to be a bus protocol to connect peripherals, including storage devices, to a computer. As such, SCSI architects never thought about including a proper layer-3 protocol, but limited themselves to designing a local I/O (Input/Output) protocol.

Since nobody would even dream about changing SCSI today, the solution to SAN internetworking has been built within the fabric and goes under the name of InterVSAN Routing (IVR). Using IVR, a set of policies can be configured on fabric devices to selectively allow nodes belonging to different VSANs to talk to each other. In this way IVR achieves what IP does in the IP/Ethernet world.
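The selective nature of such policies can be sketched as follows. This is a conceptual model only and does not reflect the configuration syntax or internals of any real IVR implementation; the Node and IvrPolicy names are hypothetical, and the sketch assumes, as a simplification, that nodes within the same VSAN can always talk (in practice zoning would restrict that further).

```python
from dataclasses import dataclass
from typing import FrozenSet, Set


@dataclass(frozen=True)
class Node:
    name: str  # hypothetical label standing in for a device identity
    vsan: int


class IvrPolicy:
    """Explicit allow-list of node pairs that may talk across VSAN boundaries."""

    def __init__(self) -> None:
        self._allowed: Set[FrozenSet[Node]] = set()

    def permit(self, a: Node, b: Node) -> None:
        # Pairs are unordered: permitting a<->b also permits b<->a.
        self._allowed.add(frozenset((a, b)))

    def may_communicate(self, a: Node, b: Node) -> bool:
        if a.vsan == b.vsan:
            return True  # simplification: zoning inside a VSAN is ignored
        # Across VSANs, nothing flows unless a policy explicitly allows it.
        return frozenset((a, b)) in self._allowed


# Example: only the two storage arrays replicate between the sites.
primary_array = Node("primary-array", vsan=10)
secondary_array = Node("secondary-array", vsan=20)
primary_host = Node("primary-host", vsan=10)

policy = IvrPolicy()
policy.permit(primary_array, secondary_array)
```

Here the replication traffic between the arrays is allowed to cross the VSAN boundary, while the host in the primary site still cannot reach the secondary array. The two VSANs remain separate fabrics; only the permitted data traffic passes between them.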

By combining VSANs with IVR, it becomes possible to join together primary and secondary data centers and let relevant traffic flow between the two sites, while still preserving the control plane isolation needed to guarantee that a fault at either site, or along the connecting link, does not adversely affect the operation of the entire IT infrastructure.

Conclusions

Building a reliable, secure and scalable business continuance solution over IP is possible by astutely combining FCIP, VSANs and IVR. This solution is architecturally very powerful as there is no restriction on the transport technology that can be used for the long distance connection. IVR strictly relies on basic Fibre Channel services and therefore can leverage any transport option available to Fibre Channel such as Fibre Channel itself, CWDM or DWDM, SDH/SONET and of course IP.
