Azure Windows VM Scale Sets - Monitoring and Automation

Modified on: Sun, 16 Apr, 2017 at 6:42 AM

Overview

This article describes how to use CloudMonix to monitor Azure Windows VM Scale Sets. Because Azure Service Fabric and Azure Batch run on top of VM Scale Sets, they can also be monitored through the VM Scale Set monitoring functionality.

The article covers the following topics:

common use cases where CloudMonix can help with monitoring and automation

what is needed to connect to and monitor an Azure Windows VM Scale Set

Why use CloudMonix for Azure Windows VMSS and Service Fabric?

Automatically adjust the number of VMs based on the actual demand or according to schedule

Restart all VMs in a set once per day, one at a time, to keep them fresh

Shut down Scale Sets during off hours

Ensure VM availability

Reboot individual VMs if they run low on memory

Configuration

Azure VM Scale Set monitoring can be configured either via Setup Wizard or by using the “Add New” button in the dashboard. It’s highly recommended to use Setup Wizard when configuring permissions for the first time, as that will simplify authorization. Learn more about authorizing with Setup Wizard here.

During configuration it’s necessary to specify the Resource Group, the Resource Name and, if available, the Deployment Id of the monitored resource. If the Storage Account used for storing data from Diagnostics Extensions is not populated automatically, it must also be selected.

Do NOT modify Diagnostics configuration checkbox: If required, it’s possible to prevent CloudMonix from modifying the Diagnostics configuration. In that scenario, however, users are fully responsible for managing the configuration and updating it on all nodes every time it changes. Learn more here.

Do NOT auto-update my computer nodes checkbox: Azure doesn't automatically deploy configuration changes to all nodes in the Scale Set, so by default CloudMonix propagates the changes to ensure that all nodes use the same configuration. If required, it's possible to prevent CloudMonix from automatically propagating configuration changes to all nodes in the Scale Set, but it's important to understand the potential consequences of doing so. Learn more here.

To use CloudMonix’s auto-scaling feature, users should disable native Azure auto-scaling in the Azure portal. It is also highly recommended to disable Azure’s Over-Provision feature if the user intends to use auto-scaling from CloudMonix. The Over-Provision feature deploys extra VMs during a scale event and then removes the unneeded ones. Learn more about it here. The extra VM counts can conflict with the tracking of current instance quantities in CloudMonix.

Metrics

Every diagnostic data point that CloudMonix retrieves from the monitored resource is considered a metric in CloudMonix. Refer to the Metrics article to learn more about metrics in general.

The metrics can be added, removed and customized in the Metrics tab in the resource configuration dialog.

Built-in Metrics

ResourceStatus

Tracks the overall running status of the monitored instances within the Scale Set. This is a critical metric that is captured for most types of resources that CloudMonix tracks. It is used for Uptime reports and should not be removed.

Data Type: string

Possible values: Ready, Down, Unknown

Included in sample profile: yes, in both profiles tracked as a metric called Status

Included in default alerts: yes, in an alert:

ResourceOutage (Error): Raises an alert when a monitored server is reported as not-Ready by Azure, or if no metrics come through from diagnostic agents, for at least 5 min.

Statuses are determined according to the following rules:

Ready - successfully connected to the resource

Down - there was an error when trying to retrieve data from the resource

Unknown - can’t connect to the resource (e.g. because of invalid credentials)
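
The status rules above can be read as a simple decision order. The following Python sketch is illustrative only: the function name and its two boolean inputs are assumptions made for this example and are not part of CloudMonix.

```python
from enum import Enum


class ResourceStatus(Enum):
    READY = "Ready"
    DOWN = "Down"
    UNKNOWN = "Unknown"


def classify_status(connected: bool, data_retrieved: bool) -> ResourceStatus:
    """Map the outcome of one monitoring attempt to a status value.

    Both parameters are hypothetical inputs for this sketch:
    connected      - the resource could be reached (e.g. credentials were valid)
    data_retrieved - diagnostic data was read without an error
    """
    if not connected:
        # Cannot connect at all, for example because of invalid credentials.
        return ResourceStatus.UNKNOWN
    if not data_retrieved:
        # Connected, but retrieving data from the resource failed.
        return ResourceStatus.DOWN
    return ResourceStatus.READY
```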

WindowsPerformanceCounter

Windows Performance Counter is one of the most popular metric types. Windows OS and applications running on it publish a large number of performance counters that highlight various aspects of performance indicators, health, uptime, etc. In order to learn more about the most popular counters refer to the Monitor Windows Server with Performance Counters article. The Performance Counter class documentation explains how to consume and define custom counters, should there be a need for CloudMonix to track user-generated diagnostic data.

CloudMonix can track any published performance counter. Each performance counter that CloudMonix should track must be defined as an individual metric in the Resource Configuration dialog.

Data Type: double

Included in sample profile: yes:

Performance Counter Metrics included in both sample templates:

CPUTime: Processor(_Total)\% Processor Time

CpuTime30MinAverage: CPUTime aggregated over 30 min.

DiskFreeSpaceTotal: LogicalDisk(_Total)\Free Megabytes

DiskIdleTime: PhysicalDisk(_Total)\% Idle Time

DiskReadSpeed: PhysicalDisk(_Total)\Avg. Disk sec/Read

DiskWriteSpeed: PhysicalDisk(_Total)\Avg. Disk sec/Write

MemoryCommittedPct: Memory\% Committed Bytes In Use

MemoryFree: Memory\Available MBytes

Metrics included in the Sample configuration for IIS farm on Azure VM ScaleSet template:

AspNetApplicationRestarts: ASP.NET\Application Restarts

AspNetBytesOut: ASP.NET Applications(__Total__)\Request Bytes Out Total

AspNetErrors: ASP.NET Applications(__Total__)\Errors Total/Sec

AspNetRequests: ASP.NET Applications(__Total__)\Requests/Sec

AspNetRequestsQueued: ASP.NET\Requests Queued

AspNetRequestsRejected: ASP.NET\Requests Rejected

AspNetRequestWaitTime: ASP.NET\Request Wait Time
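
The counter paths listed above follow the usual Windows layout of a category, an optional instance in parentheses, and a counter name. As a rough illustration of that layout, the Python sketch below splits a path into its parts. The `CounterPath` type and `parse_counter_path` function are hypothetical helpers written for this example, not a CloudMonix or Windows API.

```python
import re
from typing import NamedTuple, Optional


class CounterPath(NamedTuple):
    category: str
    instance: Optional[str]  # None when the counter has no instance part
    counter: str


# Category, optional "(instance)", a backslash, then the counter name.
# A leading backslash is tolerated because some tools print paths that way.
_PATTERN = re.compile(
    r"^\\?(?P<category>[^\\(]+)(?:\((?P<instance>[^)]*)\))?\\(?P<counter>.+)$"
)


def parse_counter_path(path: str) -> CounterPath:
    """Split a performance counter path into category, instance and counter."""
    match = _PATTERN.match(path.strip())
    if match is None:
        raise ValueError(f"not a recognizable counter path: {path!r}")
    return CounterPath(
        category=match["category"].strip(),
        instance=match["instance"],
        counter=match["counter"].strip(),
    )


if __name__ == "__main__":
    print(parse_counter_path(r"Processor(_Total)\% Processor Time"))
    print(parse_counter_path(r"Memory\Available MBytes"))
```

Checking a path this way before adding it as a metric can catch a stray space or a missing instance name early.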

Included in default alerts: yes:

Alerts included in both sample templates:

High CPU (Warning): Raises an alert when CPU utilization is over 70% for the last 5 minutes sustained

Low Memory (Warning): Raises an alert if the amount of available physical memory on a specific instance falls below 100 MB for the last 2 monitoring cycles sustained

Low Disk Space (Warning): Raises an alert when any of the disks has less than 1GB of free space left

Alerts included in the Sample configuration for IIS farm on Azure VM ScaleSet template:

Requests are Queueing Up (Warning): Raises an alert when the number of queued requests exceeds 10, for 5 minutes sustained. Queued requests indicate that IIS or backend processes are not able to process the requests quickly enough
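
Several of the alerts above fire only when a threshold is breached for a sustained period. The Python sketch below shows one plausible way to evaluate such a condition over timestamped samples. It illustrates the idea only and is not CloudMonix's actual evaluation logic; the `is_sustained` function and its parameters are assumptions made for this example.

```python
from datetime import datetime, timedelta
from typing import Callable, Sequence, Tuple

Sample = Tuple[datetime, float]  # (timestamp, metric value)


def is_sustained(
    samples: Sequence[Sample],
    breaches: Callable[[float], bool],
    window: timedelta,
    now: datetime,
) -> bool:
    """Return True if every sample inside the window breaches the threshold.

    Assumption for this sketch: a window with no samples never counts as a
    sustained breach.
    """
    recent = [value for ts, value in samples if now - window <= ts <= now]
    return bool(recent) and all(breaches(value) for value in recent)


if __name__ == "__main__":
    now = datetime(2017, 4, 16, 6, 42)
    cpu = [(now - timedelta(minutes=i), 85.0) for i in range(5)]
    # High CPU style check: over 70% for the last 5 minutes.
    print(is_sustained(cpu, lambda v: v > 70.0, timedelta(minutes=5), now))
```

The same helper covers the queued-requests alert by passing `lambda v: v > 10` as the predicate.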

WindowsPerformanceCounterMultiInstance

Data Type: double

Included in sample profile: yes, tracked as a metric:

DiskFreeSpace: LogicalDisk\Free Megabytes

Included in default alerts: yes, included in both profiles:

Low Disk Space (Warning): Raises an alert when any of the disks has less than 1GB of free space left

AzureVirtualMachineOperations

Data Type: array of objects with the following properties:

Name (string): Operation name.

Category (string): Event category.

Description (string): Event description.

Caller (string): Caller.

EventName (string): The event name. This value should not be confused with operation name.
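
For readers who process this metric in their own tooling, the property list above could be modeled roughly as follows. This is a sketch under the assumption that each array element arrives as a dictionary keyed by the property names shown; the `VmOperation` class and `from_dict` helper are hypothetical and not part of CloudMonix.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass(frozen=True)
class VmOperation:
    name: str         # Operation name
    category: str     # Event category
    description: str  # Event description
    caller: str       # Caller
    event_name: str   # Event name (distinct from the operation name)

    @classmethod
    def from_dict(cls, raw: Dict[str, Any]) -> "VmOperation":
        # Assumes keys match the documented property names exactly.
        return cls(
            name=raw["Name"],
            category=raw["Category"],
            description=raw["Description"],
            caller=raw["Caller"],
            event_name=raw["EventName"],
        )


def parse_operations(items: List[Dict[str, Any]]) -> List[VmOperation]:
    """Convert the array of raw objects into typed records."""
    return [VmOperation.from_dict(item) for item in items]
```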

ResourceInstanceCount

WindowsEventLogEntry

Included in sample profile: yes, in both profiles tracked as metrics called ApplicationsEventLogs, SystemEventLogs

Included in default alerts: no

Alerts

Users can create alerts based on changes in any value tracked by CloudMonix (including custom metrics). Each resource template includes alerts which are suitable for a given resource.

Refer to the Alerts article to learn more. The predefined alerts for Azure Windows VM Scale Set are listed in the Metrics section.

Alerts are available during the Trial period or in Professional and Ultimate plans only.

Automation

Automation features (Actions) allow users to set up powerful reactive, proactive and scheduled actions. CloudMonix can execute actions when a specific monitoring condition occurs or according to a schedule. Refer to the Actions article to learn more about automating VM Scale Set reboots.

Automation features are available during the Trial period or in the Ultimate plan only.

As a general rule, every new action should specify the appropriate Suspended period and Sustained period values. See the Automating Actions article to learn more about those settings.

Built-in Actions

AzureVmScaleSetInstanceReboot

CloudMonix will request Azure to reboot the specified VM.

Evaluated and executed on an individual VM level. Available when “Evaluate this condition by individual instance?” is set to true.

Included in the default profiles in the following actions, which have to be explicitly enabled:

Daily reboot (Warning): Reboots VMSS instances once per day, one instance at a time.

Low Ram Reboot (Warning): Reboots a VMSS instance if available memory drops below 100MB for 5 minutes sustained. This action will not be executed more than once per hour due to the Suspended period setting.
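
The interplay of the Sustained period and the Suspended period in an action such as Low Ram Reboot can be pictured with a small gate object. The Python sketch below is only a mental model under stated assumptions: the `ActionGate` class and its fields are hypothetical, and CloudMonix's internal behavior may differ.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class ActionGate:
    sustained: timedelta  # condition must hold at least this long
    suspended: timedelta  # minimum gap between two executions
    breach_started: Optional[datetime] = None
    last_fired: Optional[datetime] = None

    def should_fire(self, condition_met: bool, now: datetime) -> bool:
        """Decide whether the action should run at this monitoring cycle."""
        if not condition_met:
            # The condition cleared, so the sustained timer starts over.
            self.breach_started = None
            return False
        if self.breach_started is None:
            self.breach_started = now
        if now - self.breach_started < self.sustained:
            return False  # not sustained long enough yet
        if self.last_fired is not None and now - self.last_fired < self.suspended:
            return False  # still inside the suspended period
        self.last_fired = now
        return True


if __name__ == "__main__":
    gate = ActionGate(sustained=timedelta(minutes=5), suspended=timedelta(hours=1))
    start = datetime(2017, 4, 16, 6, 0)
    for minute in (0, 5, 10, 65):
        print(minute, gate.should_fire(True, start + timedelta(minutes=minute)))
```

In this model a reboot is requested once memory has been low for five minutes, and further requests are held back until an hour has passed since the last one.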

AzureVmScaleSetInstanceReimage

CloudMonix will request Azure to re-image the specified VM.

Evaluated and executed on an individual VM level. Available when “Evaluate this condition by individual instance?” is set to true.

AzureVmScaleSetStart

CloudMonix will request that a particular VM Scale Set is started.

Evaluated and executed on a Scale Set level. Available when “Evaluate this condition by individual instance?” is set to false.

AzureVmScaleSetStopDeallocate

CloudMonix will request that a particular VM Scale Set is shut down and deallocated (i.e. resources are released). Deallocating VMs helps to lower costs, as Azure doesn’t charge for deallocated resources.

Evaluated and executed on a Scale Set level. Available when “Evaluate this condition by individual instance?” is set to false.

Auto-scaling

Auto-scaling allows users to set up powerful reactive, proactive and scheduled auto-scaling rules. CloudMonix can execute scale adjustments when a specific monitoring condition occurs or according to a schedule. See the Auto-scaling article to learn more about auto-scaling VM Scale Sets.