Azure Monitor helps you maximize the availability and performance of your applications and services. It delivers a comprehensive solution for collecting, analyzing, and acting on telemetry from your cloud and on-premises environments. This information helps you understand how your applications are performing and proactively identify issues affecting them and the resources they depend on.
All data collected by Azure Monitor fits into one of two fundamental types: metrics and logs. Metrics are numerical values that describe some aspect of a system at a particular point in time. They are lightweight and capable of supporting near-real-time scenarios.
Azure provides some out-of-the-box metrics for VMs that we can use to monitor our resources, but in order to monitor guest-level metrics such as free disk space we need to configure performance counters (https://docs.microsoft.com/en-us/azure/azure-monitor/agents/data-sources-performance-counters).
In this article we will see how to monitor CPU, memory, and disk usage.
Requirements:
- A Virtual Machine
- Log Analytics Workspace
- The virtual machine connected to the Log Analytics workspace (https://faun.pub/hook-your-azure-vm-into-log-analytics-with-the-mma-agent-vm-extension-using-terraform-ca438d7e07dc)
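The Terraform snippets below reference two data sources (`data.azurerm_log_analytics_workspace.logs` and `data.azurerm_virtual_machine.vm-test`) that must be declared somewhere in your module. A minimal sketch, assuming the workspace name follows the same naming convention used later and that `var.vm_name` holds your VM's name:

```hcl
# Hypothetical declarations: adjust names and values to your environment.
# The alert resources below reference these exact data source addresses.
data "azurerm_log_analytics_workspace" "logs" {
  name                = "${var.log_name}-${var.region}-${var.environment}"
  resource_group_name = var.resource_group
}

data "azurerm_virtual_machine" "vm-test" {
  name                = var.vm_name # assumption: a variable holding your VM name
  resource_group_name = var.resource_group
}
```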
Unfortunately, for the moment we cannot configure performance counters directly with Terraform, so we are forced to create an ARM template file to do so (which will then be deployed by Terraform):
{
  "$schema": "https://schema.management.azure.com/schemas/2019-08-01/deploymentTemplate.json#",
  "contentVersion": "1.0.0.0",
  "parameters": {
    "workspaceName": {
      "type": "string",
      "metadata": {
        "description": "Name of the workspace."
      }
    },
    "location": {
      "type": "string",
      "metadata": {
        "description": "Specifies the location in which to create the workspace."
      }
    }
  },
  "resources": [
    {
      "apiVersion": "2020-08-01",
      "type": "Microsoft.OperationalInsights/workspaces",
      "name": "[parameters('workspaceName')]",
      "location": "[parameters('location')]",
      "resources": [
        {
          "apiVersion": "2020-08-01",
          "type": "datasources",
          "name": "LinuxPerformanceLogicalDisk",
          "dependsOn": [
            "[concat('Microsoft.OperationalInsights/workspaces/', parameters('workspaceName'))]"
          ],
          "kind": "LinuxPerformanceObject",
          "properties": {
            "objectName": "Logical Disk",
            "instanceName": "*",
            "intervalSeconds": 60,
            "performanceCounters": [
              { "counterName": "% Used Inodes" },
              { "counterName": "Free Megabytes" },
              { "counterName": "% Used Space" },
              { "counterName": "Disk Transfers/sec" },
              { "counterName": "Disk Reads/sec" },
              { "counterName": "Disk Writes/sec" }
            ]
          }
        },
        {
          "apiVersion": "2020-08-01",
          "type": "datasources",
          "name": "LinuxPerformanceProcessor",
          "dependsOn": [
            "[concat('Microsoft.OperationalInsights/workspaces/', parameters('workspaceName'))]"
          ],
          "kind": "LinuxPerformanceObject",
          "properties": {
            "objectName": "Processor",
            "instanceName": "*",
            "intervalSeconds": 60,
            "performanceCounters": [
              { "counterName": "% Processor Time" },
              { "counterName": "% Privileged Time" }
            ]
          }
        },
        {
          "apiVersion": "2020-08-01",
          "type": "datasources",
          "name": "LinuxPerformanceMemory",
          "dependsOn": [
            "[concat('Microsoft.OperationalInsights/workspaces/', parameters('workspaceName'))]"
          ],
          "kind": "LinuxPerformanceObject",
          "properties": {
            "objectName": "Memory",
            "instanceName": "*",
            "intervalSeconds": 60,
            "performanceCounters": [
              { "counterName": "% Used Memory" },
              { "counterName": "% Available Memory" }
            ]
          }
        },
        {
          "apiVersion": "2020-08-01",
          "type": "datasources",
          "name": "DataSource_LinuxPerformanceCollection",
          "dependsOn": [
            "[concat('Microsoft.OperationalInsights/workspaces/', parameters('workspaceName'))]"
          ],
          "kind": "LinuxPerformanceCollection",
          "properties": {
            "state": "Enabled"
          }
        }
      ]
    }
  ]
}
Create a new file named monitoring.tf.
First we will deploy our Log Analytics performance counters ARM template:
resource "random_string" "unique" {
  length  = 8
  special = false
  upper   = false
}

resource "azurerm_template_deployment" "deploy_log_analyitics_linux_performance_counters" {
  name                = "linux-perf-counter-${random_string.unique.result}"
  resource_group_name = var.resource_group
  template_body       = file("${path.module}/arm/PerformanceCountersLogAnalytics.json")

  parameters = {
    "workspaceName" = "${var.log_name}-${var.region}-${var.environment}"
    "location"      = var.location_log_analytics
  }

  deployment_mode = "Incremental"
}
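The snippets in this article also assume a handful of input variables. A minimal variables.tf sketch (names inferred from the references in the code; types for `admin_email` follow its use as a `for_each` map):

```hcl
variable "resource_group" { type = string }
variable "log_name"       { type = string }
variable "region"         { type = string }
variable "environment"    { type = string }

# Azure region for the Log Analytics workspace and the alert resources
variable "location_log_analytics" { type = string }

# Recipients for the action group, e.g. { ops = "ops@example.com" }
variable "admin_email" { type = map(string) }
```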
Then we will create an action group to notify our users by email when an alert fires:
resource "azurerm_monitor_action_group" "action_group_alert" {
  name                = "action-group-test-alert-prod"
  resource_group_name = var.resource_group
  short_name          = "ag-botprod"

  dynamic "email_receiver" {
    for_each = var.admin_email
    content {
      name          = "sendto-${email_receiver.key}"
      email_address = email_receiver.value
    }
  }

  arm_role_receiver {
    name                    = "sentorolemonitoringreader"
    role_id                 = "43d0d8ad-25c7-4714-9337-8ba259a9fe05"
    use_common_alert_schema = true
  }

  arm_role_receiver {
    name                    = "sentorolemonitoringcontributor"
    role_id                 = "749f88d5-cbae-40b8-bcfc-e573ddc772fa"
    use_common_alert_schema = true
  }
}
To monitor disk space we will create a scheduled query rules alert backed by a Log Analytics query:
resource "azurerm_monitor_scheduled_query_rules_alert" "monitor_disk_space" {
  name                = "monitor-disk-test-${var.environment}"
  location            = var.location_log_analytics
  resource_group_name = var.resource_group

  action {
    action_group  = [azurerm_monitor_action_group.action_group_alert.id]
    email_subject = "Used Disk Space Over 85%"
  }

  data_source_id = data.azurerm_log_analytics_workspace.logs.id
  description    = "Alert to monitor used disk space"
  enabled        = true

  query = <<-QUERY
    Perf
    | where TimeGenerated > ago(5m)
    | where (ObjectName == "Logical Disk" or ObjectName == "LogicalDisk") and CounterName contains "% Used Space" and InstanceName != "_Total" and InstanceName != "HarddiskVolume1" and CounterValue >= 85
    | project TimeGenerated, Computer, ObjectName, CounterName, InstanceName, CounterValue
  QUERY

  severity    = 1
  frequency   = 5
  time_window = 5

  trigger {
    operator  = "GreaterThan"
    threshold = 0
  }
}
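Before relying on the alert, it is worth pasting the query into the Logs blade of the workspace to confirm the counters are actually flowing. For example, a sketch of a query showing the latest used-space value per disk (this is only for verification, not part of the alert):

```kusto
Perf
| where ObjectName == "Logical Disk" and CounterName == "% Used Space"
| where InstanceName != "_Total"
| summarize arg_max(TimeGenerated, CounterValue) by Computer, InstanceName
```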
Same thing for the CPU, this time as a metric alert, since Percentage CPU is available as a platform metric without any performance counter:
resource "azurerm_monitor_metric_alert" "cpu" {
  name                = "monitor-cpu-test-${var.environment}"
  resource_group_name = var.resource_group
  scopes              = [data.azurerm_virtual_machine.vm-test.id]
  description         = "Action will be triggered when Average CPU is greater than 85"
  severity            = 1

  criteria {
    metric_namespace = "Microsoft.Compute/virtualMachines"
    metric_name      = "Percentage CPU"
    aggregation      = "Average"
    operator         = "GreaterThan"
    threshold        = 85
  }

  action {
    action_group_id = azurerm_monitor_action_group.action_group_alert.id
  }
}
And finally the memory:
resource "azurerm_monitor_scheduled_query_rules_alert" "monitor_memory" {
  name                = "monitor-memory-test-${var.environment}"
  location            = var.location_log_analytics
  resource_group_name = var.resource_group

  action {
    action_group  = [azurerm_monitor_action_group.action_group_alert.id]
    email_subject = "Memory Over 80%"
  }

  data_source_id = data.azurerm_log_analytics_workspace.logs.id
  description    = "Alert to monitor memory used"
  enabled        = true

  query = <<-QUERY
    Perf
    | where TimeGenerated > ago(5m)
    | where CounterName contains "% Used Memory" and InstanceName != "_Total" and CounterValue >= 80
    | project TimeGenerated, Computer, ObjectName, CounterName, InstanceName, CounterValue
  QUERY

  severity    = 1
  frequency   = 5
  time_window = 5

  trigger {
    operator  = "GreaterThan"
    threshold = 0
  }
}

And there you go: you will now receive an email whenever your virtual machine runs low on disk space or shows high CPU/RAM usage.
Happy terraforming!