Azure offers an end-to-end backup and disaster recovery solution that is simple, secure, scalable, and cost-effective, and that can be integrated with on-premises data protection solutions.
In this article we will walk through an example of implementing virtual machine disaster recovery with Azure Site Recovery VM replication using Terraform.
Site Recovery helps ensure business continuity by keeping business apps and workloads running during outages. Site Recovery replicates workloads running on physical and virtual machines (VMs) from a primary site to a secondary location. When an outage occurs at your primary site, you fail over to the secondary location and access apps from there. After the primary location is running again, you can fail back to it.
Our use case is to replicate a VM using Site Recovery, so that if a failure occurs we can fail over to a secondary VM that will be available using the same FQDN and public IP as the first one. The second VM should be replicated to a different region.
The advantage of using Site Recovery is that the second VM is not running, so we do not pay for compute resources, only for storage and for traffic to the secondary region.
You should already have a VNET and Subnet deployed.
First, we will deploy our virtual machine with a public IP:
data "azurerm_subnet" "snet-backend" {
  depends_on           = [var.subnets]
  name                 = var.vm.snet_name
  virtual_network_name = var.vm.vnet_name
  resource_group_name  = var.resource_group
}

resource "azurerm_public_ip" "pip-vm-app" {
  name                    = "pip-app"
  location                = var.location
  resource_group_name     = var.resource_group
  allocation_method       = "Static"
  idle_timeout_in_minutes = 30
  domain_name_label       = var.vm.fqdn

  tags = {
    environment = "test"
  }
}

resource "azurerm_network_interface" "main" {
  name                = var.vm.nic_name
  location            = var.location
  resource_group_name = var.resource_group

  ip_configuration {
    name                          = "testconfiguration1"
    subnet_id                     = data.azurerm_subnet.snet-backend.id
    private_ip_address_allocation = "Dynamic"
    public_ip_address_id          = azurerm_public_ip.pip-vm-app.id
  }
}

resource "azurerm_virtual_machine" "main" {
  name                  = var.vm.name
  location              = var.location
  resource_group_name   = var.resource_group
  network_interface_ids = [azurerm_network_interface.main.id]
  vm_size               = var.vm.size

  # Uncomment this line to delete the OS disk automatically when deleting the VM
  # delete_os_disk_on_termination = true

  # Uncomment this line to delete the data disks automatically when deleting the VM
  # delete_data_disks_on_termination = true

  storage_image_reference {
    publisher = var.vm.storage_image_reference.publisher
    offer     = var.vm.storage_image_reference.offer
    sku       = var.vm.storage_image_reference.sku
    version   = var.vm.storage_image_reference.version
  }

  storage_os_disk {
    name              = "disk-${var.vm.name}-os"
    caching           = var.vm.storage_os_disk.caching
    create_option     = var.vm.storage_os_disk.create_option
    managed_disk_type = var.vm.storage_os_disk.managed_disk_type
  }

  os_profile {
    computer_name  = var.vm.os_profile.computer_name
    admin_username = var.vm.os_profile.admin_username
    admin_password = var.vm.os_profile.admin_password
    #custom_data = file(var.vm.os_profile.custom_data)
  }

  os_profile_linux_config {
    disable_password_authentication = false
  }

  boot_diagnostics {
    enabled     = true
    storage_uri = "https://${var.storage_account.name}.blob.core.windows.net"
  }

  tags = var.tags
}

resource "azurerm_managed_disk" "disk-data-app" {
  name                 = "disk-${azurerm_virtual_machine.main.name}-data"
  location             = var.location
  resource_group_name  = var.resource_group
  storage_account_type = "StandardSSD_LRS"
  create_option        = "Empty"
  disk_size_gb         = var.vm.storage_data_disk.disk_size_gb
}

resource "azurerm_virtual_machine_data_disk_attachment" "example" {
  managed_disk_id    = azurerm_managed_disk.disk-data-app.id
  virtual_machine_id = azurerm_virtual_machine.main.id
  lun                = var.vm.storage_data_disk.lun
  caching            = "ReadWrite"
}
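The snippets above read most of their settings from a var.vm object whose declaration is not shown in the article. As a rough sketch only, such a variable might be declared as follows. The attribute names are taken from the references in the code above; the types and the description are assumptions, and the real module may differ.

```hcl
# Hypothetical declaration of var.vm, inferred from how it is referenced above.
# All types are assumptions; adjust them to match your own module.
variable "vm" {
  description = "Settings for the primary virtual machine"

  type = object({
    name      = string
    size      = string
    snet_name = string
    vnet_name = string
    nic_name  = string
    fqdn      = string # passed to domain_name_label on the public IP

    storage_image_reference = object({
      publisher = string
      offer     = string
      sku       = string
      version   = string
    })

    storage_os_disk = object({
      caching           = string
      create_option     = string
      managed_disk_type = string
    })

    os_profile = object({
      computer_name  = string
      admin_username = string
      admin_password = string
    })

    storage_data_disk = object({
      disk_size_gb = number
      lun          = number
    })
  })
}
```

Declaring the object type explicitly lets Terraform reject a misspelled or missing attribute at plan time instead of failing halfway through the deployment.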
Once the VM is deployed, we will deploy a Recovery Services vault in the secondary region so that we can use the Site Recovery service:
data "azurerm_resource_group" "secondary" {
  name = var.resource_group_secondary
}

data "azurerm_resource_group" "primary" {
  name = var.resource_group
}

resource "azurerm_recovery_services_vault" "vault" {
  name                = "rv-app-${var.region_secondary}-${var.environment}"
  location            = var.location_secondary
  resource_group_name = data.azurerm_resource_group.secondary.name
  sku                 = "Standard"
}
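The vault snippet depends on several input variables that the article does not declare. A minimal sketch of those declarations, assuming they are all plain strings (the descriptions are my reading of how each value is used above, not the author's wording), might be:

```hcl
# Assumed declarations for the secondary-region inputs referenced above.
variable "resource_group_secondary" {
  description = "Name of an existing resource group in the secondary region"
  type        = string
}

variable "location_secondary" {
  description = "Azure location of the secondary region"
  type        = string
}

variable "region_secondary" {
  description = "Short region code used in the vault name"
  type        = string
}

variable "environment" {
  description = "Environment suffix used in the vault name"
  type        = string
}
```

Because both resource groups are read through data sources, they must already exist before this configuration is applied.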
Then we will deploy a recovery fabric and a protection container for each region:
resource "azurerm_site_recovery_fabric" "primary" {
  name                = "primary-fabric"
  resource_group_name = data.azurerm_resource_group.secondary.name
  recovery_vault_name = azurerm_recovery_services_vault.vault.name
  location            = data.azurerm_resource_group.primary.location
}

resource "azurerm_site_recovery_fabric" "secondary" {
  name                = "secondary-fabric"
  resource_group_name = data.azurerm_resource_group.secondary.name
  recovery_vault_name = azurerm_recovery_services_vault.vault.name
  location            = var.location_secondary
}

resource "azurerm_site_recovery_protection_container" "primary" {
  name                 = "primary-protection-container"
  resource_group_name  = data.azurerm_resource_group.secondary.name
  recovery_vault_name  = azurerm_recovery_services_vault.vault.name
  recovery_fabric_name = azurerm_site_recovery_fabric.primary.name
}

resource "azurerm_site_recovery_protection_container" "secondary" {
  name                 = "secondary-protection-container"
  resource_group_name  = data.azurerm_resource_group.secondary.name
  recovery_vault_name  = azurerm_recovery_services_vault.vault.name
  recovery_fabric_name = azurerm_site_recovery_fabric.secondary.name
}
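We will define a replication policy that retains recovery points for 24 hours and takes an application-consistent snapshot every 4 hours: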
resource "azurerm_site_recovery_replication_policy" "policy" {
  name                                                 = "policy"
  resource_group_name                                  = data.azurerm_resource_group.secondary.name
  recovery_vault_name                                  = azurerm_recovery_services_vault.vault.name
  recovery_point_retention_in_minutes                  = 24 * 60
  application_consistent_snapshot_frequency_in_minutes = 4 * 60
}
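Both policy settings are expressed in minutes, which is why the snippet above multiplies hours by 60. If you prefer to make the units explicit, one possible variation is to compute the values in a locals block. The local names below are hypothetical and not part of the original configuration.

```hcl
# Hypothetical helper locals that spell out the unit conversion.
locals {
  minutes_per_hour = 60

  # 24 hours of recovery point retention = 1440 minutes
  recovery_point_retention_minutes = 24 * local.minutes_per_hour

  # One application-consistent snapshot every 4 hours = 240 minutes
  app_consistent_snapshot_minutes = 4 * local.minutes_per_hour
}

# The policy arguments could then reference the locals instead of inline math:
#   recovery_point_retention_in_minutes                  = local.recovery_point_retention_minutes
#   application_consistent_snapshot_frequency_in_minutes = local.app_consistent_snapshot_minutes
```

This changes nothing about what gets deployed; it only moves the arithmetic to a named place so the intended hours are easier to read and adjust.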
We will map the source container to the target container:
resource "azurerm_site_recovery_protection_container_mapping" "container-mapping" {
  name                                      = "container-mapping"
  resource_group_name                       = data.azurerm_resource_group.secondary.name
  recovery_vault_name                       = azurerm_recovery_services_vault.vault.name
  recovery_fabric_name                      = azurerm_site_recovery_fabric.primary.name
  recovery_source_protection_container_name = azurerm_site_recovery_protection_container.primary.name
  recovery_target_protection_container_id   = azurerm_site_recovery_protection_container.secondary.id
  recovery_replication_policy_id            = azurerm_site_recovery_replication_policy.policy.id
}
Now we will deploy a VNET and subnet in the secondary region where we will replicate our main virtual machine, a second VNET and subnet to test the failover, and a staging storage account in the primary region for data replication:
resource "random_string" "lower" {
  length  = 4
  upper   = false
  lower   = true
  number  = true
  special = false
}

resource "azurerm_storage_account" "primary" {
  name                     = "prireccache${random_string.lower.result}"
  location                 = var.location
  resource_group_name      = var.resource_group
  account_tier             = "Standard"
  account_replication_type = "LRS"
}

resource "azurerm_virtual_network" "secondary" {
  name                = "vnet-app"
  resource_group_name = data.azurerm_resource_group.secondary.name
  address_space       = var.vnet.address_space
  location            = var.location_secondary
}

resource "azurerm_subnet" "secondary" {
  name                 = "snet-backend"
  resource_group_name  = data.azurerm_resource_group.secondary.name
  virtual_network_name = azurerm_virtual_network.secondary.name
  address_prefix       = var.vnet.subnets[0].address_prefix
}

resource "azurerm_virtual_network" "test-failover" {
  name                = "vnet-app-test-failover"
  resource_group_name = data.azurerm_resource_group.secondary.name
  address_space       = [var.vnet.address_space_failover_test[0]]
  location            = var.location_secondary
}

resource "azurerm_subnet" "test-failover" {
  name                 = var.vnet.subnets_failover_test[0].name
  resource_group_name  = data.azurerm_resource_group.secondary.name
  virtual_network_name = azurerm_virtual_network.test-failover.name
  address_prefix       = var.vnet.subnets_failover_test[0].address_prefix
}

resource "azurerm_network_interface" "vm" {
  name                = "vm-nic"
  location            = var.location_secondary
  resource_group_name = data.azurerm_resource_group.secondary.name

  ip_configuration {
    name                          = "nic-vm-app-01"
    subnet_id                     = azurerm_subnet.secondary.id
    private_ip_address_allocation = "Dynamic"
  }
}
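The networking resources above read their address ranges from a var.vnet object that the article does not declare. A sketch of a compatible declaration follows. The attribute names come from the references above, while the types are assumptions, and only the attributes the snippets actually read are listed.

```hcl
# Hypothetical declaration of var.vnet, inferred from how it is referenced above.
variable "vnet" {
  description = "Address ranges for the secondary and test-failover networks"

  type = object({
    # Address space of the secondary VNET
    address_space = list(string)

    # Only subnets[0].address_prefix is read for the secondary subnet
    subnets = list(object({
      address_prefix = string
    }))

    # Only element [0] is used for the test-failover VNET
    address_space_failover_test = list(string)

    # subnets_failover_test[0] supplies both the name and the prefix
    subnets_failover_test = list(object({
      name           = string
      address_prefix = string
    }))
  })
}
```

Each list needs at least one element, because the resources above index element 0 directly.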
Finally, we will deploy our replicated VM:
resource "azurerm_site_recovery_replicated_vm" "vm-replication" {
  name                                      = "vm-replication"
  resource_group_name                       = data.azurerm_resource_group.secondary.name
  recovery_vault_name                       = azurerm_recovery_services_vault.vault.name
  source_recovery_fabric_name               = azurerm_site_recovery_fabric.primary.name
  source_vm_id                              = azurerm_virtual_machine.main.id
  recovery_replication_policy_id            = azurerm_site_recovery_replication_policy.policy.id
  source_recovery_protection_container_name = azurerm_site_recovery_protection_container.primary.name
  target_resource_group_id                  = data.azurerm_resource_group.secondary.id
  target_recovery_fabric_id                 = azurerm_site_recovery_fabric.secondary.id
  target_recovery_protection_container_id   = azurerm_site_recovery_protection_container.secondary.id

  managed_disk {
    disk_id                    = azurerm_virtual_machine.main.storage_os_disk[0].managed_disk_id
    staging_storage_account_id = azurerm_storage_account.primary.id
    target_resource_group_id   = data.azurerm_resource_group.secondary.id
    target_disk_type           = var.vm.storage_os_disk.managed_disk_type
    target_replica_disk_type   = var.vm.storage_os_disk.managed_disk_type
  }

  managed_disk {
    disk_id                    = azurerm_managed_disk.disk-data-app.id
    staging_storage_account_id = azurerm_storage_account.primary.id
    target_resource_group_id   = data.azurerm_resource_group.secondary.id
    target_disk_type           = "StandardSSD_LRS"
    target_replica_disk_type   = "StandardSSD_LRS"
  }

  target_network_id = azurerm_virtual_network.secondary.id

  network_interface {
    source_network_interface_id = azurerm_network_interface.main.id
    target_static_ip            = azurerm_public_ip.pip-vm-app.id
  }
}
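To make the deployed resources easier to find afterwards, you could optionally expose a few identifiers as outputs. This is only a sketch: the output names are hypothetical, and it relies solely on the name and id attributes of the resources defined above.

```hcl
# Hypothetical outputs for locating the vault and the replicated item.
output "recovery_vault_name" {
  description = "Name of the Recovery Services vault that hosts Site Recovery"
  value       = azurerm_recovery_services_vault.vault.name
}

output "replicated_vm_id" {
  description = "Resource ID of the Site Recovery replicated VM item"
  value       = azurerm_site_recovery_replicated_vm.vm-replication.id
}
```

After terraform apply completes, running terraform output prints these values, so you know which vault to look for.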
Once we have deployed the resources, we can check the replication status in the Recovery Services vault in the Azure Portal and run a test failover.
And voilà! Your VM is replicated to a second region and will keep the same FQDN and public IP when you perform a manual failover during a region outage. The failover can also be automated using Azure Monitor and an Automation Account.
Happy Terraforming!
Regarding using the same public IP for ASR – Have you tested this? My understanding is that Azure public IPs are regionally tied, and cannot be moved to another region (or associated with resources in another region), and that you’d need to define a new PIP in the DR region. An Azure Traffic Manager can be used to make this pretty seamless.
I’ve tested it, and it works when you fail over to another region.
How can we configure DR for multiple VMs? The code above is for a single VM.
After a failover would terraform still run or have an issue with the state? How can changes be done to the DR environment after the failover (e.g. let’s say after failover I see that in DR I need a larger machine)?
And how can I reverse ASR for the failback through terraform without messing up the state?