12.3 Introduction to Storage System based Replication

In storage system-based replication, the storage system operating environment performs the replication process. Unlike server-based replication, server resources such as CPU and memory are not used in the replication process. Consequently, a server running multiple VMs is not burdened by replication operations. Storage system-based replication supports both local and remote replication.

Storage System based Local Replication Techniques

Full Volume Replication (Cloning)

Pointer based Virtual Replication (Snapshot)

Storage System based Remote Replication Techniques

Synchronous Replication

Asynchronous Replication

Multi-site Replication

In storage system-based local replication, the replication is performed within the storage system. In other words, the source and the target LUNs reside on the same storage system. Local replication enables operational recovery in the event of data loss and also supports other business operations such as backup. Storage system-based local replication can be implemented as full volume replication (clone) or pointer-based virtual replication (snapshot).


In storage system-based remote replication, the replication is performed between storage systems. Typically, one of the storage systems is at the source site and the other is at a remote site for disaster recovery (DR) purposes. Data can be transmitted from the source storage system to the target system over a shared or a dedicated network. Replication between storage systems may be performed in synchronous or asynchronous mode.

Storage System based Local Replication Techniques

Full Volume Replication (Cloning)

Full volume replication provides the ability to create fully populated point-in-time copies of LUNs within a storage system. When the replication session is started, an initial synchronization is performed between the source LUN and the replica (clone). Synchronization is the process of copying data from the source LUN to the clone. During the synchronization process, the replica is not available for server access. Once synchronization is complete, the replica is exactly the same as the source LUN. The replica can then be detached from the source LUN and made available to another server for business operations. Subsequent synchronizations copy only the data that has changed on the source LUN since the previous synchronization.

Typically after detachment, changes made to both the source and replica can be tracked at some predefined granularity. This enables incremental resynchronization (source to target) or incremental restore (target to source). The clone must be the same size as the source LUN.
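The change tracking described above can be sketched in a few lines of Python. This is a minimal illustration, not any vendor's implementation: the class name ClonePair, the chunk lists, and the single shared "changed" map are all assumptions made for the example (a real storage system may track source and replica changes separately and at a different granularity).

```python
class ClonePair:
    """Toy model of a source LUN and its clone as lists of equal-sized chunks."""

    def __init__(self, source_chunks):
        self.source = list(source_chunks)
        # The clone must be the same size as the source LUN.
        self.clone = [None] * len(self.source)
        # changed[i] is True when chunk i may differ between source and clone.
        # Everything starts out changed, so the first sync is a full copy.
        self.changed = [True] * len(self.source)

    def write_source(self, index, data):
        self.source[index] = data
        self.changed[index] = True

    def write_clone(self, index, data):
        # Writes to a detached clone are tracked as well.
        self.clone[index] = data
        self.changed[index] = True

    def resynchronize(self):
        """Source to target: copy only tracked chunks. Returns chunks copied."""
        return self._copy(self.source, self.clone)

    def restore(self):
        """Target to source: copy only tracked chunks. Returns chunks copied."""
        return self._copy(self.clone, self.source)

    def _copy(self, src, dst):
        copied = 0
        for i, is_changed in enumerate(self.changed):
            if is_changed:
                dst[i] = src[i]
                self.changed[i] = False
                copied += 1
        return copied


pair = ClonePair(["a", "b", "c", "d"])
full = pair.resynchronize()          # initial synchronization copies every chunk
pair.write_source(1, "B")            # production keeps writing after detachment
incremental = pair.resynchronize()   # only the one changed chunk is copied
```

In this sketch the initial synchronization copies all four chunks, while the second pass copies only the chunk written in between, which is the point of tracking changes at a predefined granularity.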

Pointer based Virtual Replication (Snapshot)

Pointer-based virtual replication (also referred to as storage system-based snapshot) is a space-optimal solution when compared to a full volume replica. At the time of replication session activation, the target (snapshot) contains pointers to the location of the data on the source. The snapshot does not contain data at any time. Therefore, the snapshot is known as a virtual replica. The snapshot is immediately accessible after the replication session is activated. This replication method uses either a Copy on First Write (CoFW) or a Redirect on Write (RoW) mechanism. Multiple snapshots can be created from the same source LUN for various business requirements.


Data on the target is a combined view of unchanged data on the source and data in the save location. If the source device becomes unavailable, the data on the target is invalidated. Because the target contains only pointers to the data, the physical capacity required for the target is a fraction of that of the source device. The capacity required for the save location depends on the amount of expected data change.

Some pointer-based virtual replication implementations use redirect on write (RoW) technology. RoW redirects new writes destined for the source LUN to a reserved LUN in the storage pool. This differs from CoFW, where writes to the source LUN are held until the original data is copied to the save location to preserve the point-in-time replica. With CoFW, a snapshot read always requires a lookup to determine whether the data is on the source LUN or in the save location, which causes snapshot reads to be slower than source LUN reads. In the case of a RoW snapshot, the original data remains where it is, and is therefore read from the original location on the source LUN.
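To make the difference between the two mechanisms concrete, here is a small in-memory sketch in Python. The class names CoFWSnapshot and RoWSnapshot, and the use of a list for the LUN and a dictionary for the save location or reserved area, are assumptions made purely for illustration; real storage systems work on physical blocks and pointer tables.

```python
class CoFWSnapshot:
    """Copy on First Write: preserve the original block before overwriting it."""

    def __init__(self, source_blocks):
        self.source = source_blocks   # the live source LUN, shared with production
        self.save_location = {}       # block index -> original (point-in-time) data

    def write_source(self, index, data):
        # Only the first write to a block triggers a copy to the save location.
        if index not in self.save_location:
            self.save_location[index] = self.source[index]
        self.source[index] = data

    def read_snapshot(self, index):
        # Lookup: changed blocks come from the save location, the rest from source.
        if index in self.save_location:
            return self.save_location[index]
        return self.source[index]


class RoWSnapshot:
    """Redirect on Write: new writes land elsewhere, original data stays put."""

    def __init__(self, source_blocks):
        self.source = source_blocks   # original data is never overwritten
        self.redirected = {}          # block index -> new data in the reserved area

    def write_source(self, index, data):
        self.redirected[index] = data

    def read_source(self, index):
        return self.redirected.get(index, self.source[index])

    def read_snapshot(self, index):
        # The point-in-time data is still at its original location.
        return self.source[index]


cofw_lun = ["a", "b", "c"]
cofw = CoFWSnapshot(cofw_lun)
cofw.write_source(0, "A1")   # first write: "a" is copied to the save location
cofw.write_source(0, "A2")   # second write: no further copy is needed

row = RoWSnapshot(["a", "b", "c"])
row.write_source(0, "A1")    # redirected; the original "a" is untouched
```

In both models the snapshot keeps returning the point-in-time data for block 0 while production sees the new data; the difference is whether the original block is moved out of the way (CoFW) or the new write is (RoW).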

Storage System based Remote Replication Techniques

Synchronous Remote Replication

A storage-based remote replication solution can avoid downtime by enabling business operations at remote sites. Storage-based synchronous remote replication provides near-zero RPO, with the target identical to the source at all times. In synchronous replication, writes must be committed to the source and the remote target prior to acknowledging “write complete” to the production server. Additional writes on the source cannot occur until each preceding write has been completed and acknowledged. This ensures that data is identical on the source and the target at all times. Further, writes are transmitted to the remote site in exactly the order in which they are received at the source. Write ordering is therefore maintained, which ensures transactional consistency when the applications are restarted at the remote location. Most storage systems support consistency groups, which allow all LUNs belonging to a given application, usually a database, to be treated as a single entity and managed as a whole. This helps to ensure that the remote images are consistent. As a result, the remote images are always restartable copies.
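The write path described above can be sketched as follows. This is a simplified single-threaded model in Python, assuming a hypothetical SyncReplicatedLun class in which a method call stands in for the network round trip to the remote storage system; it only illustrates the ordering of commit and acknowledgement.

```python
class SyncReplicatedLun:
    """Toy model: a write is acknowledged only after both sites have committed it."""

    def __init__(self):
        self.source = {}
        self.target = {}
        self.target_order = []   # order in which the target applied the writes

    def _commit_remote(self, address, data):
        # Stands in for sending the write to the remote system and waiting
        # for its commit confirmation.
        self.target[address] = data
        self.target_order.append(address)
        return True

    def write(self, address, data):
        self.source[address] = data
        if not self._commit_remote(address, data):
            raise IOError("remote commit failed; write not acknowledged")
        # Only now is the production server told that the write is complete.
        return "write complete"


lun = SyncReplicatedLun()
writes = [(10, "x"), (11, "y"), (10, "z")]
acks = [lun.write(address, data) for address, data in writes]
```

Because each call to write returns only after the remote commit, the next write cannot start earlier, so the target applies the writes in the same order as the source and the two copies stay identical.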

Asynchronous Remote Replication

It is important for an organization to replicate data across geographical locations in order to mitigate the risk posed by a disaster. If data is replicated synchronously between sites, the sites are usually close to each other, so when a disaster strikes there is a chance that both sites are impacted. This leads to data loss and a service outage. Replicating data across sites that are thousands of kilometers apart helps an organization withstand such a disaster. If a disaster strikes one region, the data is still available in another region and the service can move to that location. Asynchronous replication makes it possible to replicate data across sites that are thousands of kilometers apart.


In asynchronous remote replication, a write from a production server is committed to the source and immediately acknowledged to the server. Because writes are acknowledged immediately, asynchronous replication mitigates the impact on the application’s response time. This makes it possible to replicate data over distances of up to several thousand kilometers between the source site and the secondary (remote) site. In asynchronous replication, the server writes are collected into a buffer (delta set) at the source. This delta set is transferred to the remote site at regular intervals. Therefore, adequate buffer capacity should be provisioned to perform asynchronous replication. The RPO depends on the size of the buffer, the available network bandwidth, and the write workload to the source. Asynchronous replication can also take advantage of locality of reference (repeated writes to the same location). If the same location is written multiple times in the buffer prior to transmission to the remote site, only the final version of the data is transmitted. This feature conserves link bandwidth.
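The delta set and the bandwidth saving from repeated writes can be illustrated with a short Python sketch. The AsyncReplicatedLun class and its method names are hypothetical; keying the buffer by address is one simple way to model "only the final version is transmitted", and is an assumption rather than a description of a specific product.

```python
class AsyncReplicatedLun:
    """Toy model: writes are acknowledged at once and shipped later in a delta set."""

    def __init__(self):
        self.source = {}
        self.target = {}
        # Keyed by address, so a repeated write to the same location
        # replaces the earlier buffered version.
        self.delta_set = {}

    def write(self, address, data):
        self.source[address] = data
        self.delta_set[address] = data
        return "write complete"   # acknowledged without waiting for the remote site

    def transfer_delta_set(self):
        """Ship the buffered writes to the target. Returns the blocks sent."""
        sent = len(self.delta_set)
        self.target.update(self.delta_set)
        self.delta_set = {}
        return sent


lun = AsyncReplicatedLun()
for value in ["v1", "v2", "v3"]:
    lun.write(7, value)        # three writes to the same location
lun.write(9, "w1")

lag_before = lun.source != lun.target    # the target is behind until the transfer
blocks_sent = lun.transfer_delta_set()   # only two blocks cross the link
```

Four writes reach the source, but only two blocks are transmitted, because the three writes to address 7 collapse into the final version. Until the transfer runs, the target lags the source, which is where the non-zero RPO comes from.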

Multi-site Remote Replication

In a two-site synchronous replication, the source and target sites are usually within a short distance. Therefore, if a regional disaster occurs, both the source and the target sites might become unavailable. This can lead to extended RPO and RTO because the last known good copy of data would need to come from another source, such as an offsite tape. A regional disaster will not affect the target site in a two-site asynchronous replication because the sites are typically several hundred or several thousand kilometers apart. If the source site fails, production can be shifted to the target site, but there is no further remote protection of data until the failure is resolved.

Pic Credits - EMC

Multi-site replication mitigates the risks identified in two-site replication. In multi-site replication, data from the source site is replicated to two or more remote sites. In the three-site approach, data at the source is replicated to two different storage systems at two different sites. The source-to-bunker site (target 1) replication is synchronous, with a near-zero RPO. The source-to-remote site (target 2) replication is asynchronous, with an RPO in the order of minutes. The key benefit of this replication is the ability to fail over to either of the two remote sites in the case of a source-site failure, with disaster recovery (asynchronous) protection between the bunker and remote sites.

Disaster recovery protection remains available if any single site fails. During normal operations, all three sites are available and the production workload is at the source site. At any given instant, the data at the bunker and the source is identical. The data at the remote site is behind the data at the source and the bunker. The replication network links between the bunker and the remote sites are in place but not in use. The difference in the data between the bunker and the remote sites is tracked, so that if a source-site disaster occurs, operations can be resumed at the bunker or the remote site with incremental resynchronization between these two sites.
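The tracking of differences between the bunker and remote sites can be sketched as below. This is an illustrative Python model only: the ThreeSiteReplication class, its dictionaries, and the set of pending addresses are assumptions chosen to show the idea of incremental resynchronization, not how any particular storage system records the difference.

```python
class ThreeSiteReplication:
    """Toy model: source -> bunker is synchronous, source -> remote is asynchronous."""

    def __init__(self):
        self.source = {}
        self.bunker = {}
        self.remote = {}
        # Addresses written since the last asynchronous cycle, that is,
        # the tracked difference between the bunker and the remote site.
        self.bunker_remote_diff = set()

    def write(self, address, data):
        self.source[address] = data
        self.bunker[address] = data            # synchronous leg
        self.bunker_remote_diff.add(address)   # the remote site is now behind

    def async_cycle(self):
        """Normal operation: the source ships pending changes to the remote site."""
        for address in self.bunker_remote_diff:
            self.remote[address] = self.source[address]
        self.bunker_remote_diff.clear()

    def resync_after_source_failure(self):
        """Source is lost: copy only the tracked differences, bunker to remote."""
        copied = len(self.bunker_remote_diff)
        for address in self.bunker_remote_diff:
            self.remote[address] = self.bunker[address]
        self.bunker_remote_diff.clear()
        return copied


sites = ThreeSiteReplication()
sites.write(1, "a")
sites.write(2, "b")
sites.async_cycle()          # the remote site catches up
sites.write(2, "b2")
sites.write(3, "c")          # these two writes have not reached the remote site
copied = sites.resync_after_source_failure()
```

After the simulated source failure, only the two addresses written since the last asynchronous cycle are copied from the bunker to the remote site, rather than the whole data set.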
