Active-Active Data Center Design

Figure: Active-active data center design, high-level architectural building blocks

Defining an active-active data center strategy is not an easy task when you talk to network, server, and compute teams, who usually do not collaborate when planning their infrastructure. Most importantly, an active-active data center design requires the end-to-end technology stack to work together cohesively, and it usually needs an enterprise-level architecture drive to establish the idea. In practice, it means providing availability and traffic load sharing for applications across DCs, with the following key use cases:

  • Business Continuity
  • Mobility and load sharing
  • Consistent policy and fast provisioning capability across

As part of the active-active strategy, application load-sharing scenarios must be defined. For example, some applications may be active in DC site-1 with their failover instances on standby in DC site-2, while others may be active in DC site-2 with their failover instances on standby in DC site-1.
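As a minimal sketch of such a load-sharing scenario, the mapping below records an active and a standby site per application and resolves which site should serve it when sites fail. The application names, site names, and the serving_site helper are hypothetical and only illustrate the idea; a real deployment would drive this from the traffic manager or orchestration layer.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass(frozen=True)
class Placement:
    """Active/standby site assignment for one application (names are illustrative)."""
    app: str
    active_site: str
    standby_site: str


def serving_site(placement: Placement, failed_sites: Set[str]) -> Optional[str]:
    """Return the site that should serve the application, or None if both sites are down."""
    if placement.active_site not in failed_sites:
        return placement.active_site
    if placement.standby_site not in failed_sites:
        return placement.standby_site
    return None


# Load is shared by making each site the active one for a different set of applications.
PLACEMENTS = [
    Placement("app-a", active_site="dc-site-1", standby_site="dc-site-2"),
    Placement("app-b", active_site="dc-site-2", standby_site="dc-site-1"),
]

if __name__ == "__main__":
    for failed in (set(), {"dc-site-1"}):
        for p in PLACEMENTS:
            print(p.app, sorted(failed), "->", serving_site(p, failed))
```

With no failures, each application is served from its own active site, so both DCs carry production load. When dc-site-1 fails, both applications resolve to dc-site-2.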

Active-Active Data Center Technical Requirements

The generic technical requirements to consider when formulating the active-active data center design are summarized in the figure below.

Figure: Active-active data center design, technical requirement summary

In addition to the above, the following are the major building blocks, and the associated considerations, for an active-active data center design.

Active-Active Transport Technologies

Transport technologies interconnect the data centers. Link-level and device-level redundancy are part of the transport domain, which provides HA and resiliency across the sites. This could include redundancy for multiplexers, GPONs, DCI network devices, and dark fibers, diverse POPs to survive a POP failure, and 1+1 protection schemes for devices, cards, and links.

The major considerations when designing a transport solution to interconnect data centers are listed below:

  • Recover from various types of failure scenarios: link failure, module failure, node failure, etc.
  • Link latency and application round-trip requirements for the traffic between DCs
  • Bandwidth requirements and associated scalability factors
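To make the latency consideration concrete, here is a rough sketch that estimates the round-trip time over a fiber route and checks it against an application's round-trip budget. It assumes the common rule of thumb of about 5 microseconds of one-way propagation delay per kilometre of fiber. The function names, the distances, and the 5 ms budget are illustrative assumptions, not figures from any vendor.

```python
# Rule of thumb (assumption): light in optical fiber covers roughly 200,000 km/s,
# which is about 5 microseconds of one-way delay per kilometre.
FIBER_DELAY_US_PER_KM = 5.0


def fiber_rtt_ms(route_km: float, equipment_delay_ms: float = 0.0) -> float:
    """Estimate round-trip time in milliseconds for a fiber route.

    route_km is the actual fiber path length, which is usually longer than
    the straight-line distance between sites. equipment_delay_ms is an
    optional allowance for transponders, DCI devices, and similar gear.
    """
    one_way_ms = route_km * FIBER_DELAY_US_PER_KM / 1000.0
    return 2 * one_way_ms + equipment_delay_ms


def meets_rtt_budget(route_km: float, budget_ms: float,
                     equipment_delay_ms: float = 0.0) -> bool:
    """Check whether the estimated RTT fits the application's round-trip budget."""
    return fiber_rtt_ms(route_km, equipment_delay_ms) <= budget_ms


if __name__ == "__main__":
    for km in (100, 1500):
        print(km, "km:", fiber_rtt_ms(km), "ms RTT, within 5 ms budget:",
              meets_rtt_budget(km, budget_ms=5.0))
```

Under these assumptions, a 100 km route adds about 1 ms of round-trip propagation delay, while a 1,500 km route adds about 15 ms and would not fit a 5 ms budget. Real designs should rely on measured latency rather than this estimate.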

Active-Active Network Services

Network services interconnect all the devices in the data centers by performing the required traffic switching and routing functions. The network should forward application traffic and share load without disruption. It should also support application mobility across the data centers by providing a pervasive gateway, L2 extension, and ingress and egress path optimization. It is worth noting that most major network vendors' SDN solutions currently provide an integrated VXLAN overlay to achieve L2 extension, path optimization, and gateway mobility.

The following are the major considerations when designing active-active network services:

  • Recover from various types of failure scenarios: link, module, and network device failure, etc.
  • Pervasive Gateway across the infrastructure: Gateway availability local to the DC and across the DCs
  • Stretching the L2 domain: Ability to extend the L2 domain (VLAN or VXLAN) between the DCs
  • Consistent Policy: Network policies are consistent across the on-premises infrastructure and the various cloud infrastructures. These policies could include naming, segmentation rules for integrating various L4-L7 services, hypervisor integration, etc.
  • Path Optimization: Ingress and egress
  • Centralized Management: Centralized provisioning of the network policies and management (e.g., inventory, troubleshooting, AAA capabilities, backup and restore, traffic flow analysis, and capacity dashboards)

Active-Active L4-L7 Services

Building active-active L4-L7 services across DCs is an expensive task, as it requires placing security and ADC devices in both DCs. Global traffic managers, application policy controllers, load balancers, and firewalls are the major solutions to consider in this space. These need to be deployed at different tiers to protect the perimeter, extranet, WAN, core server farm, UAT segment, etc. Note also that most of the leading L4-L7 services vendors currently offer clustering of their products across DCs. Clustering allows its members to share L4-L7 policies and traffic load while providing seamless failover in case of issues.

The major considerations related to L4-L7 services design are below:

  • Recover from various types of failure scenarios: link, module, and L4-L7 device failure, etc.
  • Consistent Policy: L4-L7 policies are consistent across the on-premises infrastructure and the various clouds. This could include the naming of the policies, L4-L7 rules for various traffic types, etc.
  • Centralized Management: Centralized provisioning of the network policies and management (e.g., inventory, troubleshooting, AAA capabilities, backup and restore, traffic flow analysis, and capacity dashboards)

Active-Active Storage Services 

Storage and the related networking solutions are one of the main pillars of active-active data center design. Active-active here means that the storage in both DCs serves applications. The design should be able to accept read and write requests without interruption, so it is also important to have real-time data mirroring and seamless failover capability across DCs. Some of the major considerations related to storage design are below:

  • Recover from various types of storage failure scenarios, such as single disk, storage array, and storage controller failures, and split-brain scenarios
  • Synchronous vs. asynchronous replication: With synchronous replication, data is written to the primary storage and the replica simultaneously. Because of that, it consumes more bandwidth and typically requires dedicated FC links. With asynchronous replication, the write is acknowledged by the primary storage first and copied to the replica afterwards.
  • Storage high availability & redundancy: Storage replication factors, number of disks available for redundancy, etc.
  • Storage network failure scenarios: link, module, and network device failure, etc.
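The difference between the two replication modes can be illustrated with a toy in-memory model. This is only a sketch under simplifying assumptions: the StorageArray and ReplicatedVolume classes are hypothetical, there is no real I/O or network, and failure handling is left out.

```python
from collections import deque


class StorageArray:
    """Toy stand-in for a storage array: a dictionary of block id -> data."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.blocks = {}


class ReplicatedVolume:
    """Illustrative model of a volume replicated from a primary to a replica array."""

    def __init__(self, primary: StorageArray, replica: StorageArray,
                 synchronous: bool) -> None:
        self.primary = primary
        self.replica = replica
        self.synchronous = synchronous
        self.pending = deque()  # writes not yet copied to the replica (async only)

    def write(self, block_id: int, data: bytes) -> None:
        """Return (acknowledge) only when the mode's durability condition is met."""
        self.primary.blocks[block_id] = data
        if self.synchronous:
            # Synchronous: the replica is updated before the write is acknowledged.
            self.replica.blocks[block_id] = data
        else:
            # Asynchronous: acknowledge now, ship the write to the replica later.
            self.pending.append((block_id, data))

    def drain(self) -> None:
        """Copy all queued writes to the replica (the async background transfer)."""
        while self.pending:
            block_id, data = self.pending.popleft()
            self.replica.blocks[block_id] = data

    def unreplicated_writes(self) -> int:
        """Number of acknowledged writes that would be lost if the primary failed now."""
        return len(self.pending)
```

In this model, a synchronous volume never has unreplicated writes, at the cost of every write waiting on the inter-DC link. An asynchronous volume acknowledges faster but can lose whatever is still queued if the primary site fails before drain runs.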

Active-Active Server Virtualization

Server virtualization has evolved over the years, and organizations are now moving to microservices and containers. The main consideration here is to extend hypervisor/container clusters across the DCs to achieve seamless movement and failover of virtual machine and container instances. The dominant players in this space are VMware, Docker, and Microsoft, and there are others as well, such as KVM and Kubernetes (container management).

Below are some of the key considerations when it comes to server virtualization

  • Virtualization platform to form a cross-DC virtual host cluster
  • HA function to protect the VMs; create affinity rules to prefer local hosts under normal operating conditions
  • Deploy the same service on VMs in two DCs so that when a host machine is unavailable, VMs in the other DC can take over the load in real time
  • The compute nodes across the DCs are provisioned with a symmetric configuration and the resources required for failover
  • Centralized management of computing resources and hypervisors
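The "prefer local hosts" affinity rule can be sketched as a soft preference: a VM is placed on a healthy host in its home DC when one has capacity, and falls back to the other DC otherwise. The Host fields and the pick_host function are hypothetical names for illustration and do not reflect any particular hypervisor's API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Host:
    """Illustrative view of a hypervisor host in a cross-DC cluster."""
    name: str
    dc: str
    healthy: bool
    free_slots: int  # simplified capacity measure (assumption)


def pick_host(vm_home_dc: str, hosts: List[Host]) -> Optional[Host]:
    """Pick a host for a VM using a soft affinity to its home DC.

    Healthy hosts with capacity in the home DC win. If none exist, the VM
    fails over to a healthy host in the other DC. Ties are broken by the
    largest free capacity. Returns None if no host can take the VM.
    """
    candidates = [h for h in hosts if h.healthy and h.free_slots > 0]
    if not candidates:
        return None
    # False sorts before True, so hosts in the home DC are preferred.
    return min(candidates, key=lambda h: (h.dc != vm_home_dc, -h.free_slots))
```

Because the preference is soft rather than a hard rule, the same function covers both normal operation and failover. This only works if the other DC is provisioned with enough spare capacity, as the symmetric configuration point above requires.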

Active-Active Applications Deployment

The infrastructure is built for applications to function. It is therefore important to ensure high availability of the applications across DCs, so that they can fail over and users can be served from the nearest location. The key is to have the Web, App, and DB tiers available at both data centers, so that if the application fails in either DC, it can fail over and continue operating.

The following are some of the major considerations:

  • Deploy the Web services on a virtual machine (VM) or a physical machine, with multiple servers forming independent clusters per DC
  • Deploy the App services on a virtual machine (VM) or a physical machine, with multiple servers in the DC forming a cluster, or multiple cross-DC servers forming a cluster (preferably with different IP-based access, if the application supports distributed deployment)
  • Deploy databases, preferably on physical machines, to form a cross-DC cluster (active-standby or active-active), e.g., Oracle RAC, DB2, or SQL Server with Windows Server Failover Clustering (WSFC)

Summary

The diagram below summarizes the active-active data center design components.

Figure: Active-active data center design, full-stack network components

Active-active data center design requires the network, storage, L4-L7 services, compute, virtualization, and application components to work together. Seamless availability and operation of the business applications when the infrastructure fails in either data center is a key factor. When it comes to cost, operating active-active data centers is more expensive than disaster recovery, but only by about 20%, while delivering 35% more capacity and enabling non-stop operations. This improves uptime, performance, and asset utilization.

For further reading, I recommend the following Cisco Live presentation: https://www.ciscolive.com/c/dam/r/ciscolive/apjc/docs/2016/pdf/BRKDCT-2615.pdf

Finally, please don't miss the blog post on Nutanix solutions from an architectural perspective.
