FAQ: ACExpress Backup and Restore Procedures
1. What is the purpose of the backup and restore procedures?
The purpose is to ensure that AC4 cloud environments and associated resources are properly backed up and can be restored in the event of a failure, minimizing downtime and ensuring data integrity.
2. What is included in the scope of the backup and restoration process?
The scope includes:
Complete AKS cluster backup (namespaces with their PVCs).
Backup of selected resources (e.g., an individual namespace or PVC).
Backup of AC4 cloud backend and log databases.
Restoration of selected resources.
Restoration of AC4 cloud backend and log databases.
3. What tool is used for backup and restoration?
Velero is used to back up and restore Kubernetes objects and persistent volumes. It supports cold disaster recovery: backups are taken on a schedule and restored on demand after a failure.
4. What is the backup frequency and retention period for the different components?
Component | Backup Frequency | Retention Period |
---|---|---|
Database | Point in time | 7 Days |
Adeptia Connect Namespace | Every 8 hours | 14 Days |
Ingress Namespace | Every 8 hours | 14 Days |
Logs (EFK) Namespace | Every 8 hours | 14 Days |
Monitoring (Grafana) | Every 8 hours | 14 Days |
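As a rough sketch of how the 8-hour / 14-day rows could be expressed in Velero, the Python below builds Velero `Schedule` objects (`velero.io/v1`) with a cron schedule of every 8 hours and a 14-day `ttl` (336h). The namespace names and schedule names are hypothetical placeholders, not the actual AC4 names, and the `velero` namespace is assumed to be Velero's default install namespace. The output is a JSON `v1` `List` that `kubectl apply -f` accepts; the same settings can also be passed to `velero schedule create` through `--schedule`, `--ttl`, and `--include-namespaces`.

```python
import json

# Hypothetical namespace names standing in for the namespace rows of the table
# (Adeptia Connect, Ingress, Logs (EFK), Monitoring (Grafana)).
NAMESPACES = ["adeptia-connect", "ingress-nginx", "logging", "monitoring"]

BACKUP_INTERVAL_HOURS = 8  # "Every 8 hours"
RETENTION_DAYS = 14        # "14 Days"


def velero_ttl(days: int) -> str:
    """Express a retention period as a Go-style duration string, e.g. 14 -> '336h0m0s'."""
    return f"{days * 24}h0m0s"


def schedule_manifest(namespace: str) -> dict:
    """Build one Velero Schedule object that backs up a single namespace."""
    return {
        "apiVersion": "velero.io/v1",
        "kind": "Schedule",
        # Assumption: Velero runs in its default "velero" namespace.
        "metadata": {"name": f"{namespace}-every-{BACKUP_INTERVAL_HOURS}h", "namespace": "velero"},
        "spec": {
            # Standard cron: minute 0 of hours 0, 8 and 16.
            "schedule": f"0 */{BACKUP_INTERVAL_HOURS} * * *",
            "template": {
                "includedNamespaces": [namespace],
                "ttl": velero_ttl(RETENTION_DAYS),
            },
        },
    }


def backups_retained(interval_hours: int, retention_days: int) -> int:
    """Approximate number of scheduled backups kept at steady state."""
    return retention_days * 24 // interval_hours


if __name__ == "__main__":
    # Wrap the schedules in a v1 List so they can be applied as one document.
    manifest = {
        "apiVersion": "v1",
        "kind": "List",
        "items": [schedule_manifest(ns) for ns in NAMESPACES],
    }
    print(json.dumps(manifest, indent=2))
```

At steady state, an 8-hour interval with 14-day retention keeps about 42 backups per namespace (14 × 24 / 8).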
5. Where are the backups stored?
Backups are stored in an Azure Blob Storage account with geo-redundant replication between two regions:
Primary Location: West US
Secondary Location: East US
6. How are the backups named?
Backups follow the naming convention: daily<<customername>>prod_YYYYMMDDHHmmSS
The name encodes the customer name, the environment (production or non-production; prod in the example above), and the backup timestamp.
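A minimal Python sketch of building and parsing names that follow this convention is shown below. It assumes the customer name is lowercase alphanumeric, that timestamps are recorded in UTC, and that the environment tag is `prod`; the tag used for non-production is not documented here, so the sketch only handles `prod`. The function names and the customer name `Acme` are hypothetical.

```python
import re
from datetime import datetime, timezone
from typing import Tuple

TIMESTAMP_FORMAT = "%Y%m%d%H%M%S"  # YYYYMMDDHHmmSS
ENV_TAG = "prod"  # production tag from the documented example

# daily<customername>prod_<14-digit timestamp>
NAME_PATTERN = re.compile(rf"^daily(?P<customer>[a-z0-9]+){ENV_TAG}_(?P<ts>\d{{14}})$")


def make_backup_name(customer: str, when: datetime) -> str:
    """Build a backup name from a customer name and a timezone-aware timestamp."""
    customer = customer.lower()
    if not re.fullmatch(r"[a-z0-9]+", customer):
        raise ValueError(f"unsupported customer name: {customer!r}")
    stamp = when.astimezone(timezone.utc).strftime(TIMESTAMP_FORMAT)
    return f"daily{customer}{ENV_TAG}_{stamp}"


def parse_backup_name(name: str) -> Tuple[str, datetime]:
    """Split a backup name back into (customer, UTC timestamp)."""
    match = NAME_PATTERN.fullmatch(name)
    if match is None:
        raise ValueError(f"name does not follow the convention: {name!r}")
    when = datetime.strptime(match["ts"], TIMESTAMP_FORMAT).replace(tzinfo=timezone.utc)
    return match["customer"], when


if __name__ == "__main__":
    name = make_backup_name("Acme", datetime(2024, 1, 15, 8, 0, 0, tzinfo=timezone.utc))
    print(name)  # dailyacmeprod_20240115080000
    print(parse_backup_name(name))
```

Because the customer name and environment tag are concatenated without a separator, the parser relies on the fixed `prod_` suffix before the timestamp to find where the customer name ends.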
7. What are the key recovery points and methods?
Recovery Use Case | Recovery Method | Restore Duration |
---|---|---|
Namespace is lost | Recover using Velero backup | 2-4 hours |
PVC (Persistent Volume Claim) Failed | Recover PVC using Velero backup | 2-4 hours |
Database/Server Failed | Restore database from backup | 2-4 hours |
Kubernetes Failure | Managed by Connectria/Azure | 1-2 days |
Logs/Monitoring Namespace Failure | Recover using Velero backup | 2-4 hours |
8. What is the retention policy for backups?
At a minimum:
Backups must be retained for 7 days.
A fully recoverable version must be stored in a secure off-site location (either a separate Azure region or an off-site storage facility).
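The minimum policy can be checked mechanically against the frequency/retention table. The Python sketch below is illustrative only: the `BackupTarget` type and `policy_violations` function are hypothetical, it assumes every component's backups land in the West US / East US storage described under question 5, and it models only the "separate Azure region" option, not an off-site storage facility.

```python
from dataclasses import dataclass
from typing import List, Optional

MIN_RETENTION_DAYS = 7  # "Backups must be retained for 7 days."


@dataclass(frozen=True)
class BackupTarget:
    component: str
    retention_days: int
    primary_region: str
    # Region holding the geo-redundant copy; None if there is no off-site copy.
    secondary_region: Optional[str]


def policy_violations(target: BackupTarget) -> List[str]:
    """Return violations of the minimum retention policy (empty list if compliant)."""
    problems = []
    if target.retention_days < MIN_RETENTION_DAYS:
        problems.append(
            f"{target.component}: retention of {target.retention_days} days "
            f"is below the {MIN_RETENTION_DAYS}-day minimum"
        )
    if target.secondary_region is None or target.secondary_region == target.primary_region:
        problems.append(f"{target.component}: no recoverable copy in a separate region")
    return problems


# Values from the frequency/retention table; regions assumed from question 5.
TARGETS = [
    BackupTarget("Database", 7, "West US", "East US"),
    BackupTarget("Adeptia Connect Namespace", 14, "West US", "East US"),
    BackupTarget("Ingress Namespace", 14, "West US", "East US"),
    BackupTarget("Logs (EFK) Namespace", 14, "West US", "East US"),
    BackupTarget("Monitoring (Grafana)", 14, "West US", "East US"),
]

if __name__ == "__main__":
    for t in TARGETS:
        issues = policy_violations(t)
        print(f"{t.component}: {'OK' if not issues else '; '.join(issues)}")
```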
9. How often should backup and recovery procedures be tested?
Backup and recovery procedures must be tested at least annually to confirm that backups complete successfully and that the restoration methods work as expected.
10. What are the critical components included in the backup?
Critical components included in the backup are:
AC4 Production and Non-Production namespaces.
PVC (Persistent Volume Claims) for both production and non-production.
AC4 Production and Non-Production databases.
Monitoring tools (Grafana).
EFK (Elasticsearch, Fluentd, Kibana) log viewing tools.
Ingress Controller.
11. What happens if a Kubernetes cluster fails?
In the event of a Kubernetes cluster failure, Connectria and/or Azure will manage the recovery process, which may take 1-2 days depending on the severity of the issue.
12. What is the process for restoring a lost namespace?
A lost namespace can be restored from its Velero backup. The restore recreates the objects in the namespace, including PVCs, ConfigMaps, and certificates, and typically completes within 2-4 hours.
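A minimal Python sketch that assembles the Velero CLI call for this restore, and for the PVC case in question 7, is shown below. It uses the `velero restore create` flags `--from-backup`, `--include-namespaces`, `--include-resources`, and `--wait`; the helper name, backup name, and namespace are hypothetical, and the command is only printed, not executed.

```python
import shlex
from typing import List, Optional


def velero_restore_args(
    backup_name: str,
    namespace: str,
    pvc_only: bool = False,
    restore_name: Optional[str] = None,
) -> List[str]:
    """Build the argument list for a Velero CLI restore of one namespace from one backup."""
    args = ["velero", "restore", "create"]
    if restore_name:
        args.append(restore_name)  # optional; Velero generates a name when omitted
    args += ["--from-backup", backup_name, "--include-namespaces", namespace]
    if pvc_only:
        # Limit the restore to PVC and PV objects for the "PVC failed" case. Whether the
        # volume data comes back depends on how volumes were backed up (snapshots vs.
        # file-system backup), which this FAQ does not specify.
        args += ["--include-resources", "persistentvolumeclaims,persistentvolumes"]
    args.append("--wait")  # block until the restore finishes
    return args


if __name__ == "__main__":
    # Hypothetical backup and namespace names.
    print(shlex.join(velero_restore_args("ac4-prod-backup-example", "adeptia-connect")))
```

By default Velero skips resources that already exist in the cluster rather than overwriting them, so this form of restore is intended for a namespace that is actually gone.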