OpenStack Backup and Recovery

You don’t think much about backup and recovery when it comes to cloud. After all, you build your applications on ephemeral storage and compute. These resources are not expected to persist across reboots and power cycles, and your application is built to tolerate such failures. You create your workloads on the fly from a persistent store such as an object store, perform your computation, save the results back to the persistent store, and life is beautiful.

Unfortunately, only a small subset of applications is written from the ground up to fit the cloud paradigm, and you need an army of smart people to build your applications for the cloud. However, the cloud paradigm is here to stay. Its elasticity, scalability and self-service aspects appeal to many IT managers, who are actively looking to host their traditional IT applications on open source cloud platforms such as OpenStack. These applications need to persist across failures, and backup and recovery is an important strategy for their business continuity.

Backup and recovery did not get much attention until recently. Just as with any other cloud service, a backup and recovery service must enable tenants to define data protection policies for their workloads. Likewise, IT managers are looking for a scalable solution that can grow with their cloud. However, traditional solutions are built to manage tens, perhaps hundreds, of applications. They are centrally administered, and the backup administrator usually has intimate knowledge of the workloads he or she is managing. Such solutions are not a natural fit for the cloud, hence the need to build a solution from the ground up that shares the same attributes as your cloud.

OpenStack has been gaining popularity as the cloud of choice for IT managers who want to build their own on-prem cloud. It does support a few APIs in Nova and Cinder to back up VMs and storage to Swift, but they fall short of providing comprehensive backup and recovery for an OpenStack cloud.


Consider a simple workload as shown in the above picture. To perform a regular backup of this workload using the existing OpenStack APIs, one has to perform the following steps:

  • Pause VM1 and VM2
  • Detach Storage Volume1 and Storage Volume2 from respective VMs
  • Snapshot VM1 and VM2 and store on Glance
  • Call Cinder Backup APIs to back up Storage Volume1 and Storage Volume2 to Swift
  • Keep track of these copies’ URIs in an Excel sheet
  • Attach Volume1 and Volume2 back to VM1 and VM2
  • Resume VM1 and VM2
  • Repeat the above steps as needed
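
The sequence above can be sketched as a small orchestration routine. This is only an illustration under stated assumptions: `RecordingCloud` is a hypothetical stand-in that logs calls instead of talking to Nova, Glance or Cinder, its method names and the `fake://` URIs are invented for this sketch, and the `try`/`finally` cleanup is an addition of ours rather than something the existing APIs provide.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class RecordingCloud:
    """Hypothetical stand-in for the compute and volume services; it only logs calls."""
    calls: List[Tuple[str, ...]] = field(default_factory=list)

    def _do(self, *call: str) -> str:
        self.calls.append(call)
        return "fake://" + "/".join(call)  # placeholder URI, not a real endpoint

    def pause(self, vm: str) -> None:
        self._do("pause", vm)

    def resume(self, vm: str) -> None:
        self._do("resume", vm)

    def detach(self, vm: str, volume: str) -> None:
        self._do("detach", vm, volume)

    def attach(self, vm: str, volume: str) -> None:
        self._do("attach", vm, volume)

    def snapshot_vm(self, vm: str) -> str:
        return self._do("snapshot", vm)

    def backup_volume(self, volume: str) -> str:
        return self._do("backup", volume)


def backup_workload(cloud: RecordingCloud, workload: Dict[str, List[str]]) -> Dict[str, str]:
    """Run the manual steps in order and return a catalog of copy URIs.

    `workload` maps each VM name to the volumes attached to it.
    """
    catalog: Dict[str, str] = {}  # replaces the spreadsheet of URIs
    paused: List[str] = []
    detached: List[Tuple[str, str]] = []
    try:
        for vm in workload:
            cloud.pause(vm)
            paused.append(vm)
        for vm, volumes in workload.items():
            for volume in volumes:
                cloud.detach(vm, volume)
                detached.append((vm, volume))
        for vm in workload:
            catalog[vm] = cloud.snapshot_vm(vm)
        for volumes in workload.values():
            for volume in volumes:
                catalog[volume] = cloud.backup_volume(volume)
    finally:
        # Always put the workload back together, even if a step failed.
        for vm, volume in reversed(detached):
            cloud.attach(vm, volume)
        for vm in reversed(paused):
            cloud.resume(vm)
    return catalog


if __name__ == "__main__":
    cloud = RecordingCloud()
    print(backup_workload(cloud, {"VM1": ["Volume1"], "VM2": ["Volume2"]}))
```

Even in this toy form, the workload is paused for the entire duration of the snapshots and volume backups, and the caller is left to store and manage the catalog.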

As you can see, this is not a comprehensive backup and recovery solution.

At TrilioData, we believe the backup and recovery solution for your OpenStack cloud must have:

Essential Attributes of Cloud Backup and Recovery | Description
Tenant administered backup and recovery | Just like any other service in the cloud, the backup and recovery service must present easy-to-consume policies that tenants can choose and apply to their workloads.
Non-disruptive backups | The backup process must not disrupt running workloads. It must be non-intrusive with respect to their availability and performance.
Instant Restore | Cloud workloads can be huge, and the recovery of a workload from backup must be as quick as possible. Waiting for the entire data set to be copied from backup media to production severely impacts the recovery SLA of the service.
Backup/Recover single/multi VM workloads | Cloud workloads span multiple VMs, so the backup process must be able to back up workloads that span multiple VMs.
Validate Backups | This is another feature that cloud backup can implement using on-demand cloud resources. The backup process must provide a means for tenants to quickly replay a workload from backup media so that they can periodically validate the sanity of their backups.
Efficient data transfers of backup images to service end points | Incremental backups and deduplication at the source significantly improve the backup process.
Disaster Recovery | The backup service must include a disaster recovery element too. Cloud resources are highly available and are periodically replicated to multiple geographical locations, so replicating backup media to multiple locations enhances the ability to restore a workload even in case of an outage at one geographic location.
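
To make the "efficient data transfers" attribute concrete, here is a minimal sketch of source-side deduplication, which also yields incremental behavior because unchanged chunks are never resent. It is not TrilioData's implementation: the fixed 4096-byte chunk size, the in-memory `store` dictionary standing in for the backup target, and the function names are all assumptions made for illustration.

```python
import hashlib
from typing import Dict, List, Tuple

CHUNK_SIZE = 4096  # hypothetical fixed chunk size in bytes


def chunk_digests(data: bytes, chunk_size: int = CHUNK_SIZE) -> List[Tuple[str, bytes]]:
    """Split data into fixed-size chunks and pair each with its SHA-256 digest."""
    return [
        (hashlib.sha256(data[i:i + chunk_size]).hexdigest(), data[i:i + chunk_size])
        for i in range(0, len(data), chunk_size)
    ]


def backup(data: bytes, store: Dict[str, bytes],
           chunk_size: int = CHUNK_SIZE) -> Tuple[List[str], int]:
    """Send only the chunks the store lacks; return (manifest, bytes_sent)."""
    manifest: List[str] = []
    sent = 0
    for digest, chunk in chunk_digests(data, chunk_size):
        if digest not in store:  # digest computed at the source, before any transfer
            store[digest] = chunk
            sent += len(chunk)
        manifest.append(digest)
    return manifest, sent


def restore(manifest: List[str], store: Dict[str, bytes]) -> bytes:
    """Rebuild an image from its ordered list of chunk digests."""
    return b"".join(store[digest] for digest in manifest)


if __name__ == "__main__":
    store: Dict[str, bytes] = {}
    first = b"A" * 4096 + b"B" * 4096 + b"C" * 4096
    second = b"A" * 4096 + b"X" * 4096 + b"C" * 4096  # only the middle chunk changed
    _, sent_first = backup(first, store)
    _, sent_second = backup(second, store)
    print(sent_first, sent_second)
```

In this example the first backup transfers all three chunks, while the second transfers only the chunk that changed.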

If you would like to know more about TrilioData, please stop by the booth at the OpenStack Design Summit in Atlanta.

  • holms

    Just a stupid question: why do you need backup for an HA cluster? Distributed storage is replicated, and all configuration should usually be automated. There’s basically nothing to back up, because everything is replicated.

  • Murali Balcha

    Hi Holms,
    A few people have asked the same question, so I thought I would post a blog on this.

    Your comments are welcome.

    Murali Balcha
