The following sections explain the Disaster Recovery model in detail.


Overview

Having a robust Disaster Recovery (DR) plan in place helps you recover from a failure of the production environment and continue using the application. Adeptia recommends that you keep the DR environment ready as a failover in case anything unpredictable happens to your application, storage, or database.

The illustration given below depicts the DR model recommended by Adeptia.

[Illustration: DR model recommended by Adeptia]

This can be achieved by having a two-zone deployment of the application in which both zones share the same database for Read/Write purposes. Under normal conditions, all requests are handled and processed in the production environment. The application in the DR environment does not perform any processing, nor does it interact with the database.

...

  • Region 1 and Region 2 are the physical locations of the Kubernetes Clusters where the production and DR environments are set up, respectively.

  • Production environment hosts the active Adeptia Connect application, which executes all the requests. It also has a storage and two databases (Primary Database and Sync Database).

  • DR environment has another copy of the Adeptia Connect application and a storage. The application in the DR environment remains inactive until the application in the production environment goes down and there is a need to switch to the DR environment. This environment is connected to the same database that the production environment uses. However, the application deployed here neither interacts with the database nor stores any data in it until it becomes active and starts processing requests after a disaster.

  • Database is shared by both the environments. This model has a Primary Database synced to a Sync Database. When the Primary Database fails, the Sync Database takes over.

  • Storage in the production environment is synced to the storage in the DR environment. The DR storage comes into use when the application in the DR environment is active.

Deploying the application in production and DR environments

The DR model requires two environments, Production and DR, to be set up in Kubernetes Clusters located in two different zones in the cloud.

...

While deploying the application in the DR environment, take the following prerequisites into account in addition to the standard ones prescribed for a regular deployment:

  • In the values.yaml file, set the value of the environment variable EXECUTE_STATIC_JOB to false so that it doesn't rewrite the static content in the DR storage.

  • In the listener section of the global values.yaml file, set the value of replicaCount to 0 (zero) to avoid processing of data in the DR environment.

  • In the global values.yaml file, set the value of the environment variable QUARTZ_STATUS_ONBOOTUP to pause. This pauses the Scheduler.

  • Ensure that the application version is the same as that in the production environment.
Tip: Upgrading the environments

The prerequisites for upgrading the production and DR environments are the same as those for deploying the application, as discussed in this section.

Switching from PROD to DR Environment

Pre-requisites:

You may want to switch from the production to the DR environment based on your business requirements. To do so, follow the steps given below.

Important!

Before you follow the steps to switch from the production to the DR environment, install the azcopy CLI on the client you use for accessing the Kubernetes cluster.

Steps to switch from PROD to DR environment

  1. Scale down the Runtime and Listener deployments in the PROD environment.
    To scale down the Runtime deployments, use the following format:

    kubectl scale --replicas=0 deployment <Release Name>-ac-runtime -n <namespace>

    To scale down the Listener deployments, use the following format:

    kubectl scale --replicas=0 deployment <Release Name>-ac-listener -n <namespace>

  2. Pause the Scheduler in the PROD environment.

    https://<webapp_Gateway_URL>/event/schedulerservice?_dc=1670999141899&query=Pause&page=1&start=0&limit=25

  3. Scale down the RabbitMQ statefulset in the PROD environment.

    kubectl scale --replicas=0 statefulset <Release Name>-ac-rabbitmq -n <namespace>

  4. Scale down the RabbitMQ statefulset in the DR environment.

    kubectl scale --replicas=0 statefulset <Release Name>-ac-rabbitmq -n <namespace>

  5. Initiate failover for the storage account using the following steps:

    Info
    >> Initiating failover is required only when the Azure File Share is not accessible due to a failure.
    >> As per Microsoft documentation, the time it takes to fail over after initiation can vary, though it is typically less than one hour.

    1. Log in to your Azure Portal.
    2. Go to the Storage Account being used in the PVC of the PROD environment.
    3. Within that storage account, go to the Data Management > Redundancy section.
    4. Click Prepare for failover and follow the instructions given on the screen. The failover process starts.
    5. Once the Account Failover process completes, the data from the secondary region is accessible from the existing endpoint of the storage account.

  6. Copy the web/repository folder from PROD to DR environment.

    ./azcopy sync "https://[account].file.core.windows.net/[PVC]/web/repository/?[SAS]" "https://[account].file.core.windows.net/[PVC]/web/repository/?[SAS]" --recursive=true

    For example:

    ./azcopy sync "https://fxxxxxxxxxxxxxxxxxxxxx1b6a.file.core.windows.net/pvc-e38eca50-xxxx-xxxx-xxxx-828efd6d49df/web/repository/?sv=2021-06-08&ss=bfqt&srt=sco&sp=rwdlacupiytfx&se=2024-03-06T01:18:24Z&st=2023-03-05T17:18:24Z&spr=https,http&sig=VxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxaXBmSAKm4%3D" "https://fxxxxxxxxxxxxxxxxxxx3aeb.file.core.windows.net/pvc-1a6f1c1b-xxxx-xxxx-xxxx-44ff485d707f/web/repository/?sv=2021-06-08&ss=bfqt&srt=sco&sp=rwdlacupiytfx&se=2024-03-06T01:25:19Z&st=2023-03-05T17:25:19Z&spr=https,http&sig=vxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxGazWA%3D" --recursive=true

    Note: To know how to get the SAS, refer to the Appendix: Getting the Shared Access Signature (SAS) of your Azure Storage Account section.

  7. Copy the recovery folder from PROD to DR environment.

    ./azcopy sync "https://[account].file.core.windows.net/[PVC]/recovery/?[SAS]" "https://[account].file.core.windows.net/[PVC]/recovery/?[SAS]" --recursive=true

    For example:

    ./azcopy sync "https://fxxxxxxxxxxxxxxxxxxxxx1b6a.file.core.windows.net/pvc-e38eca50-xxxx-xxxx-xxxx-828efd6d49df/recovery/?sv=2021-06-08&ss=bfqt&srt=sco&sp=rwdlacupiytfx&se=2024-03-06T01:18:24Z&st=2023-03-05T17:18:24Z&spr=https,http&sig=VxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxaXBmSAKm4%3D" "https://fxxxxxxxxxxxxxxxxxxx3aeb.file.core.windows.net/pvc-1a6f1c1b-xxxx-xxxx-xxxx-44ff485d707f/recovery/?sv=2021-06-08&ss=bfqt&srt=sco&sp=rwdlacupiytfx&se=2024-03-06T01:25:19Z&st=2023-03-05T17:25:19Z&spr=https,http&sig=vxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxGazWA%3D" --recursive=true

  8. Copy the reprocessing folder from PROD to DR environment.

    ./azcopy sync "https://[account].file.core.windows.net/[PVC]/reprocessing/?[SAS]" "https://[account].file.core.windows.net/[PVC]/reprocessing/?[SAS]" --recursive=true

    For example:

    ./azcopy sync "https://fxxxxxxxxxxxxxxxxxxxxx1b6a.file.core.windows.net/pvc-e38eca50-xxxx-xxxx-xxxx-828efd6d49df/reprocessing/?sv=2021-06-08&ss=bfqt&srt=sco&sp=rwdlacupiytfx&se=2024-03-06T01:18:24Z&st=2023-03-05T17:18:24Z&spr=https,http&sig=VxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxaXBmSAKm4%3D" "https://fxxxxxxxxxxxxxxxxxxx3aeb.file.core.windows.net/pvc-1a6f1c1b-xxxx-xxxx-xxxx-44ff485d707f/reprocessing/?sv=2021-06-08&ss=bfqt&srt=sco&sp=rwdlacupiytfx&se=2024-03-06T01:25:19Z&st=2023-03-05T17:25:19Z&spr=https,http&sig=vxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxGazWA%3D" --recursive=true

  9. Copy the RabbitMQ PVC content from PROD to DR environment.

    ./azcopy sync "https://[account].file.core.windows.net/[PVC]/?[SAS]" "https://[account].file.core.windows.net/[PVC]/?[SAS]" --recursive=true --mirror-mode=true

    For example:

    ./azcopy sync "https://fxxxxxxxxxxxxxxxxxxxxx1b6a.file.core.windows.net/pvc-ef345e9d-xxxx-xxxx-xxxx-54e38de9cbd1/?sv=2021-06-08&ss=bfqt&srt=sco&sp=rwdlacupiytfx&se=2024-03-06T01:18:24Z&st=2023-03-05T17:18:24Z&spr=https,http&sig=VxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxaXBmSAKm4%3D" "https://fxxxxxxxxxxxxxxxxxxx3aeb.file.core.windows.net/pvc-4a47fa22-xxxx-xxxx-xxxx-f6538dfff1a7/?sv=2021-06-08&ss=bfqt&srt=sco&sp=rwdlacupiytfx&se=2024-03-06T01:25:19Z&st=2023-03-05T17:25:19Z&spr=https,http&sig=vxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxGazWA%3D" --recursive=true --mirror-mode=true

  10. Scale up the RabbitMQ statefulset in the DR environment.

    kubectl scale --replicas=1 statefulset <Release Name>-ac-rabbitmq -n <namespace>

  11. Scale up the Listener in the DR environment.

    kubectl scale --replicas=1 deployment <Release Name>-ac-listener -n <namespace>

  12. Resume the Scheduler in the DR environment.

    https://<webapp_Gateway_URL>/event/schedulerservice?_dc=1670999141899&query=Resume&page=1&start=0&limit=25

  13. Update the DNS record to route all the requests to the DR environment.
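
The scale-down and scale-up steps above can be collected into a dry-run script that only prints the kubectl commands in order, so that you can review them before running anything. This is a minimal sketch, not an Adeptia-provided tool. It assumes that both environments use the same release name and namespace, and that each cluster is reachable through its own kubectl context. The context names, release name, and namespace shown are hypothetical.

```shell
#!/usr/bin/env bash
# Dry-run sketch: prints the kubectl scale commands for a PROD to DR switch.
# Nothing here contacts a cluster; review the output before running any line.
set -eu

# Print one "kubectl scale" command.
# $1 = kubectl context (hypothetical name), $2 = resource kind,
# $3 = resource name, $4 = replica count, $5 = namespace
scale_cmd() {
  printf 'kubectl --context %s scale --replicas=%s %s %s -n %s\n' \
    "$1" "$4" "$2" "$3" "$5"
}

# Print the scale commands in the order of the steps above.
# $1 = release name, $2 = namespace, $3 = PROD context, $4 = DR context
switch_plan() {
  local release=$1 ns=$2 prod=$3 dr=$4
  scale_cmd "$prod" deployment  "${release}-ac-runtime"  0 "$ns"
  scale_cmd "$prod" deployment  "${release}-ac-listener" 0 "$ns"
  echo "# manual: pause the Scheduler in PROD"
  scale_cmd "$prod" statefulset "${release}-ac-rabbitmq" 0 "$ns"
  scale_cmd "$dr"   statefulset "${release}-ac-rabbitmq" 0 "$ns"
  echo "# manual: storage failover (if needed) and the azcopy sync steps"
  scale_cmd "$dr"   statefulset "${release}-ac-rabbitmq" 1 "$ns"
  scale_cmd "$dr"   deployment  "${release}-ac-listener" 1 "$ns"
  echo "# manual: resume the Scheduler in DR, then update the DNS record"
}

# Example call with hypothetical values.
switch_plan my-release my-namespace prod-cluster dr-cluster
```

Printing the plan instead of executing it keeps the manual steps (Scheduler, storage failover, file copy, DNS) visible at the points where they belong in the sequence.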

Switching back from DR to PROD Environment

  1. Scale down the Runtime and Listener deployments in the DR environment.

  2. Pause the Scheduler in the DR environment.

  3. Scale down the RabbitMQ statefulset in the DR environment.

  4. Scale down the RabbitMQ statefulset in the PROD environment.

  5. Copy the web/repository folder from DR to PROD environment.

  6. Copy the recovery folder from DR to PROD environment.

  7. Copy the reprocessing folder from DR to PROD environment.

  8. Copy the RabbitMQ PVC content from DR to PROD environment.

  9. Scale up the RabbitMQ statefulset in the PROD environment.

  10. Scale up the Listener in the PROD environment.

  11. Resume the Scheduler in the PROD environment.

  12. Update the DNS record to route all the requests to the PROD environment.
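
The copy steps for switching back use the same azcopy sync format as the PROD to DR switch, with the DR share as the source and the PROD share as the destination. The sketch below only prints the three folder-level commands. It assumes that each share needs its own SAS, and all account, share, and SAS values are placeholders. The RabbitMQ PVC copy is left out because it uses a different share and the additional --mirror-mode=true flag.

```shell
#!/usr/bin/env bash
# Dry-run sketch: prints azcopy sync commands for copying from DR to PROD.
set -eu

# Build a File Share URL for one folder.
# $1 = storage account, $2 = share (PVC) name, $3 = folder, $4 = SAS token
share_url() {
  printf 'https://%s.file.core.windows.net/%s/%s/?%s' "$1" "$2" "$3" "$4"
}

# Print one azcopy sync command per folder, DR as source, PROD as destination.
# $1-$3 = DR account, share, SAS; $4-$6 = PROD account, share, SAS
sync_plan() {
  local dr_acct=$1 dr_share=$2 dr_sas=$3
  local prod_acct=$4 prod_share=$5 prod_sas=$6
  local folder
  for folder in web/repository recovery reprocessing; do
    printf './azcopy sync "%s" "%s" --recursive=true\n' \
      "$(share_url "$dr_acct" "$dr_share" "$folder" "$dr_sas")" \
      "$(share_url "$prod_acct" "$prod_share" "$folder" "$prod_sas")"
  done
}

# Example call with placeholder values.
sync_plan '[DR account]' '[DR PVC]' '[DR SAS]' \
          '[PROD account]' '[PROD PVC]' '[PROD SAS]'
```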

Using DR to recover from a failure

Adeptia's DR model covers the following three scenarios, along with the steps you need to perform to address the failure and quickly recover the infrastructure, application, and data.

When the production application goes down

Follow the steps below to switch to the DR environment:

...


Click Scheduler in the left pane. 

...

You can also resume the Scheduler by using an API. The format for the API is given below.

<Gateway_URL>/event/schedulerservice?_dc=1670999141899&query=Resume&page=1&start=0&limit=25
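
To avoid typing errors in the query string, the Pause and Resume URLs can be generated from the gateway URL. The helper below is a hypothetical sketch that only builds the URL in the format given in this document. The _dc value is copied unchanged from that format, and how the request must be authenticated is not covered here.

```shell
#!/usr/bin/env bash
# Sketch: builds the Scheduler service URL for the Pause or Resume action.
set -eu

# $1 = gateway base URL (placeholder), $2 = action: Pause or Resume
scheduler_url() {
  case "$2" in
    Pause|Resume) ;;
    *) echo "unsupported action: $2" >&2; return 1 ;;
  esac
  printf '%s/event/schedulerservice?_dc=1670999141899&query=%s&page=1&start=0&limit=25\n' \
    "$1" "$2"
}

# Example call with the placeholder used in this document.
scheduler_url 'https://<webapp_Gateway_URL>' Resume
```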

Run the command in the following format to scale up the listener pod from 0 to 1.

kubectl scale --replicas=1 deployment <name of the listener deployment> -n <namespace>

...

Update the DNS records to point to the DR environment.

...

Switching back to the production environment

Once the production environment is restored, do the following to switch back to it from the DR environment:

...

the Switching from PROD to DR environment section.

When the Primary Database fails

Perform the following step to switch to the Sync Database:

...

Tip: Switching back to the Primary Database

Once the Primary Database is restored, do the following to switch back to it from the Sync Database:

  1. Ensure that the Primary Database is connected and performing Read/Write operations.
  2. Replicate the data from the Sync Database to the Primary Database.
  3. Update the DNS records to point to the Primary Database.

When the Storage is not accessible

Follow the steps below to point to the secondary storage location:

  1. Log in to your Azure Portal.

  2. Go to the Storage Account being used.

  3. Initiate “Account Failover”. The failover process starts.

  4. Once the Account Failover process completes, the data from the secondary region is accessible from the existing endpoint of the storage account.
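
If you prefer the command line to the portal, the Azure CLI provides an account failover command as well. The sketch below only prints the two commands: one to check the last geo-replication sync time, and one to initiate the failover. The storage account and resource group names are placeholders. This assumes that the Azure CLI is installed and logged in, and it is an alternative to the portal steps above rather than part of Adeptia's documented procedure.

```shell
#!/usr/bin/env bash
# Dry-run sketch: prints the Azure CLI commands for a storage account failover.
set -eu

# $1 = storage account name, $2 = resource group (both placeholders)
failover_plan() {
  # Check the last geo-replication sync time before failing over.
  printf 'az storage account show --name %s --resource-group %s --expand geoReplicationStats --query geoReplicationStats.lastSyncTime\n' \
    "$1" "$2"
  # Initiate the account failover.
  printf 'az storage account failover --name %s --resource-group %s\n' \
    "$1" "$2"
}

# Example call with placeholder values.
failover_plan '<storage-account>' '<resource-group>'
```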

Appendix: Getting the Shared Access Signature (SAS) of your Azure Storage Account

  1. Log in to your Azure portal.

  2. Go to the Storage Account.

  3. In the Storage Account, go to Security + Networking > Shared Access Signature.

    [Screenshot: Shared Access Signature page]
  4. Grant the required permissions as shown below:
    [Screenshot: required permissions]

  5. Update the expiry date/time as per your need.

  6. In the Allowed protocols field, select HTTPS and HTTP.

  7. Click Generate SAS and connection string:
    [Screenshot: Generate SAS and connection string]

  8. Use the File service SAS URL.
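
An account SAS can also be generated with the Azure CLI instead of the portal. The sketch below prints such a command for the File service. The permission set (rwdlc) and the resource types (sco) are assumptions made for illustration. Match them to the permissions your environment requires. The account name and expiry are placeholders, and the command needs credentials for the storage account (for example, an account key) when you run it.

```shell
#!/usr/bin/env bash
# Dry-run sketch: prints an Azure CLI command that generates an account SAS.
set -eu

# $1 = storage account name, $2 = expiry in UTC, for example 2024-03-06T01:18Z
sas_cmd() {
  # --services f            : File service only
  # --resource-types sco    : service, container, object (assumed)
  # --permissions rwdlc     : read, write, delete, list, create (assumed)
  printf 'az storage account generate-sas --account-name %s --services f --resource-types sco --permissions rwdlc --expiry %s --output tsv\n' \
    "$1" "$2"
}

# Example call with placeholder values.
sas_cmd '<storage-account>' '2024-03-06T01:18Z'
```

The generated token takes the place of [SAS] in the azcopy sync URLs.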