Background

This ZenPack provides support for monitoring Microsoft Windows. Monitoring is performed using Windows Remote Management (WinRM) and Windows Remote Shell (WinRS) to collect Windows Management Instrumentation (WMI) and Perfmon data.

Note: This ZenPack supersedes the earlier ZenPack named ZenPacks.zenoss.WindowsMonitor for Windows platforms that support WinRM. If you have ZenPacks.zenoss.WindowsMonitor installed on your system, please read the Transitioning from WindowsMonitor section below.

Features

The features added by this ZenPack can be summarized as follows. They are each detailed further below.

Initial discovery and periodic remodeling of relevant components.

Performance monitoring.

Event management.

Custom commands.

Discovery

The following components will be automatically discovered through the Windows server address, username and password you provide. The properties and relationships will be periodically updated by modeling.

Performance Monitoring

Perfmon counters are collected using the PowerShell Get-Counter cmdlet within a remote shell (WinRS). The following metrics will be collected every 5 minutes by default. Any other Windows Perfmon counters can also be collected by adding them to the appropriate monitoring template.

Device-level graphs

File systems

Device

\Memory\Available bytes

\Memory\Committed Bytes

\Memory\Pages Input/sec

\Memory\Pages Output/sec

\Paging File(_Total)\% Usage

\Processor(_Total)\% Privileged Time

\Processor(_Total)\% Processor Time

\Processor(_Total)\% User Time

\System\System Up Time
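As a rough illustration of how device-level counters like those above could be requested in one call, the following Python sketch builds a PowerShell Get-Counter command line. It is not the ZenPack's collection code: the helper name build_get_counter_command and the sample settings are assumptions made for this example.

```python
def build_get_counter_command(counters, sample_interval=1, max_samples=1):
    """Build a PowerShell Get-Counter command line for the given counter paths.

    Hypothetical helper for illustration; the ZenPack's real command differs.
    """
    # Single-quote each path; PowerShell escapes a single quote by doubling it.
    quoted = ", ".join("'{}'".format(c.replace("'", "''")) for c in counters)
    return (
        "powershell -NoProfile -Command "
        "\"Get-Counter -Counter {} -SampleInterval {} -MaxSamples {}\""
    ).format(quoted, sample_interval, max_samples)


command = build_get_counter_command([
    r"\Memory\Available bytes",
    r"\Processor(_Total)\% Processor Time",
])
print(command)
```

Requesting several counters in a single command keeps the number of remote shell round trips low, which matters when many datapoints share the same collection cycle.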

File Systems

\LogicalDisk(${here/instance_name})\Disk Read Bytes/sec

\LogicalDisk(${here/instance_name})\% Disk Read Time

\LogicalDisk(${here/instance_name})\Disk Write Bytes/sec

\LogicalDisk(${here/instance_name})\% Disk Write Time

\LogicalDisk(${here/instance_name})\Free Megabytes

Hard Disks

\PhysicalDisk(${here/instance_name})\Disk Read Bytes/sec

\PhysicalDisk(${here/instance_name})\% Disk Read Time

\PhysicalDisk(${here/instance_name})\Disk Write Bytes/sec

\PhysicalDisk(${here/instance_name})\% Disk Write Time

Interfaces

\Network Interface(${here/instance_name})\Bytes Received/sec

\Network Interface(${here/instance_name})\Bytes Sent/sec

\Network Interface(${here/instance_name})\Packets Received Errors

\Network Interface(${here/instance_name})\Packets Received/sec

\Network Interface(${here/instance_name})\Packets Outbound Errors

\Network Interface(${here/instance_name})\Packets Sent/sec

Interfaces on Windows 2012

\Network Adapter(${here/instance_name})\Bytes Received/sec

\Network Adapter(${here/instance_name})\Bytes Sent/sec

\Network Adapter(${here/instance_name})\Packets Received Errors

\Network Adapter(${here/instance_name})\Packets Received/sec

\Network Adapter(${here/instance_name})\Packets Outbound Errors

\Network Adapter(${here/instance_name})\Packets Sent/sec

Active Directory

\NTDS\DS Client Binds/sec

\NTDS\DS Directory Reads/sec

\NTDS\DS Directory Searches/sec

\NTDS\DS Directory Writes/sec

\NTDS\DS Monitor List Size

\NTDS\DS Name Cache hit rate

\NTDS\DS Notify Queue Size

\NTDS\DS Search sub-operations/sec

\NTDS\DS Server Binds/sec

\NTDS\DS Server Name Translations/sec

\NTDS\DS Threads in Use

\NTDS\KDC AS Requests

\NTDS\KDC TGS Requests

\NTDS\Kerberos Authentications

\NTDS\LDAP Active Threads

\NTDS\LDAP Bind Time

\NTDS\LDAP Client Sessions

\NTDS\LDAP Closed Connections/sec

\NTDS\LDAP New Connections/sec

\NTDS\LDAP New SSL Connections/sec

\NTDS\LDAP Searches/sec

\NTDS\LDAP Successful Binds/sec

\NTDS\LDAP UDP operations/sec

\NTDS\LDAP Writes/sec

\NTDS\NTLM Authentications

Note: The Active Directory monitoring template will only be used when the server has the Primary or Backup Domain Controller role.

Note: If monitoring Exchange with a non-administrator user, the user must be a member of the Active Directory group "Exchange View-Only Administrators" for pre-2010 Exchange installations or "View Only Organization Management" for 2010 and later installations.

IIS

Note: IIS Management Scripts and Tools must be installed on the server in order to model and monitor IIS sites. Install it through the Add Roles and Features tool on the Windows server, under Web Server -> Management Tools -> IIS Management Scripts and Tools.

\Web Service(_Total)\Bytes Received/sec

\Web Service(_Total)\Bytes Sent/sec

\Web Service(_Total)\CGI Requests/sec

\Web Service(_Total)\Connection Attempts/sec

\Web Service(_Total)\Copy Requests/sec

\Web Service(_Total)\Delete Requests/sec

\Web Service(_Total)\Files Received/sec

\Web Service(_Total)\Files Sent/sec

\Web Service(_Total)\Get Requests/sec

\Web Service(_Total)\Head Requests/sec

\Web Service(_Total)\ISAPI Extension Requests/sec

\Web Service(_Total)\Lock Requests/sec

\Web Service(_Total)\Mkcol Requests/sec

\Web Service(_Total)\Move Requests/sec

\Web Service(_Total)\Options Requests/sec

\Web Service(_Total)\Other Request Methods/sec

\Web Service(_Total)\Post Requests/sec

\Web Service(_Total)\Propfind Requests/sec

\Web Service(_Total)\Proppatch Requests/sec

\Web Service(_Total)\Put Requests/sec

\Web Service(_Total)\Search Requests/sec

\Web Service(_Total)\Trace Requests/sec

\Web Service(_Total)\Unlock Requests/sec

IIS Sites

\Web Service(${here/sitename})\Bytes Received/sec

\Web Service(${here/sitename})\Bytes Sent/sec

\Web Service(${here/sitename})\CGI Requests/sec

\Web Service(${here/sitename})\Connection Attempts/sec

\Web Service(${here/sitename})\Copy Requests/sec

\Web Service(${here/sitename})\Delete Requests/sec

\Web Service(${here/sitename})\Files Received/sec

\Web Service(${here/sitename})\Files Sent/sec

\Web Service(${here/sitename})\Get Requests/sec

\Web Service(${here/sitename})\Head Requests/sec

\Web Service(${here/sitename})\ISAPI Extension Requests/sec

\Web Service(${here/sitename})\Lock Requests/sec

\Web Service(${here/sitename})\Mkcol Requests/sec

\Web Service(${here/sitename})\Move Requests/sec

\Web Service(${here/sitename})\Options Requests/sec

\Web Service(${here/sitename})\Other Request Methods/sec

\Web Service(${here/sitename})\Post Requests/sec

\Web Service(${here/sitename})\Propfind Requests/sec

\Web Service(${here/sitename})\Proppatch Requests/sec

\Web Service(${here/sitename})\Put Requests/sec

\Web Service(${here/sitename})\Search Requests/sec

\Web Service(${here/sitename})\Trace Requests/sec

\Web Service(${here/sitename})\Unlock Requests/sec

Note: The IIS monitoring template will only be used when IIS is found during modeling.

Note: The IISAdmin service must be running in order to collect IIS data.

The following metrics are collected directly via WMI.

Processes (Win32_PerfFormattedData_PerfProc_Process)

PercentProcessorTime

WorkingSet

WorkingSetPrivate

Note: IIS 6 Management compatibility role no longer needs to be installed on the server side in order to use the IIS Sites component.

SQL Server

The following performance counters are monitored per database via a PowerShell script:

\SQLServer:Databases(<dbname>)\Active Transactions

\SQLServer:Databases(<dbname>)\Backup/Restore Throughput/sec

\SQLServer:Databases(<dbname>)\Bulk Copy Rows/sec

\SQLServer:Databases(<dbname>)\Bulk Copy Throughput/sec

\SQLServer:Databases(<dbname>)\Cache Entries Count

\SQLServer:Databases(<dbname>)\Cache Entries Pinned Count

\SQLServer:Databases(<dbname>)\Cache Hit Ratio

\SQLServer:Databases(<dbname>)\Cache Hit Ratio Base

\SQLServer:Databases(<dbname>)\DBCC Logical Scan Bytes/sec

\SQLServer:Databases(<dbname>)\Data File(s) Size (KB)

\SQLServer:Databases(<dbname>)\Log Bytes Flushed/sec

\SQLServer:Databases(<dbname>)\Log Cache Hit Ratio

\SQLServer:Databases(<dbname>)\Log Cache Hit Ratio Base

\SQLServer:Databases(<dbname>)\Log Cache Reads/sec

\SQLServer:Databases(<dbname>)\Log File(s) Size (KB)

\SQLServer:Databases(<dbname>)\Log File(s) Used Size (KB)

\SQLServer:Databases(<dbname>)\Log Flush Wait Time

\SQLServer:Databases(<dbname>)\Log Flush Waits/sec

\SQLServer:Databases(<dbname>)\Log Flushes/sec

\SQLServer:Databases(<dbname>)\Log Growths

\SQLServer:Databases(<dbname>)\Log Shrinks

\SQLServer:Databases(<dbname>)\Log Truncations

\SQLServer:Databases(<dbname>)\Percent Log Used

\SQLServer:Databases(<dbname>)\Repl. Pending Xacts

\SQLServer:Databases(<dbname>)\Repl. Trans. Rate

\SQLServer:Databases(<dbname>)\Shrink Data Movement Bytes/sec

\SQLServer:Databases(<dbname>)\Transactions/sec

You can enable/disable any of these or change the cycle time by editing the WinDatabase monitoring template.

Events will be sent depending on one or more of the following database statuses:

AutoClosed: The database has been automatically closed.

EmergencyMode: The database is in emergency mode.

Inaccessible: The database is inaccessible. The server might be switched off or the network connection has been interrupted.

Normal: The database is available.

Offline: The database has been taken offline.

Recovering: The database is going through the recovery process.

RecoveryPending: The database is waiting to go through the recovery process.

Restoring: The database is going through the restore process.

Shutdown: The server on which the database resides has been shut down.

Standby: The database is in standby mode.

Suspect: The database has been marked as suspect. You will have to check the data, and the database might have to be restored from a backup.

A database can report more than one of the statuses above at the same time. For example, taking a database offline will set the status to 'Offline, AutoClosed'.
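Because the status may be a combined string, anything that consumes it has to split it before comparing. The Python sketch below shows one way to do that. The function names, and the rule that only a lone 'Normal' counts as available, are assumptions for illustration and not documented ZenPack behavior.

```python
# Statuses from the list above.
KNOWN_STATUSES = {
    "AutoClosed", "EmergencyMode", "Inaccessible", "Normal", "Offline",
    "Recovering", "RecoveryPending", "Restoring", "Shutdown", "Standby",
    "Suspect",
}


def parse_database_status(status):
    """Split a combined status such as 'Offline, AutoClosed' into a set."""
    parts = {item.strip() for item in status.split(",") if item.strip()}
    unknown = parts - KNOWN_STATUSES
    if unknown:
        raise ValueError("unrecognized status: " + ", ".join(sorted(unknown)))
    return parts


def is_available(status):
    """Assumption: a database is available only if 'Normal' is its sole status."""
    return parse_database_status(status) == {"Normal"}


print(sorted(parse_database_status("Offline, AutoClosed")))
print(is_available("Normal"))
print(is_available("Offline, AutoClosed"))
```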

The WinDBInstance monitoring template will monitor the status of a SQL Server instance to inform the user if it is up or down.

The WinSQLJob monitoring template will monitor the status of a job on a SQL Server instance to inform the user whether it has succeeded, has failed, or is in an unknown or other state.

Thresholds

The following thresholds are set by default on the device monitoring template and will trigger an alert if they are reached:

CPU Utilization - 90% used

Paging File Usage - 95% used

Memory - 90% of total memory used
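The default thresholds amount to a simple comparison against percent-used values. The Python sketch below evaluates them against a sample. The dictionary keys, the helper names, and the way memory usage is derived from total and available bytes are assumptions for illustration; the real thresholds are defined in the monitoring template.

```python
# Default limits from the list above, expressed as percent used.
DEFAULT_THRESHOLDS = {
    "CPU Utilization": 90.0,
    "Paging File Usage": 95.0,
    "Memory": 90.0,
}


def memory_used_percent(total_bytes, available_bytes):
    """Derive percent of memory used from total and available bytes."""
    return 100.0 * (total_bytes - available_bytes) / total_bytes


def breached_thresholds(samples, thresholds=DEFAULT_THRESHOLDS):
    """Return the names of metrics whose value has reached its limit."""
    return sorted(
        name for name, limit in thresholds.items()
        if samples.get(name, 0.0) >= limit
    )


samples = {
    "CPU Utilization": 97.5,
    "Paging File Usage": 40.0,
    "Memory": memory_used_percent(1000, 100),
}
print(breached_thresholds(samples))
```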

Event Management

Events can be collected from the Windows event log using a WinRM subscription. Events collected through this mechanism are timestamped with the time they occurred within the Windows event log, not the time at which they were collected.

To monitor event log events, add a datasource of type "Windows EventLog" to a monitoring template. In the Event Log field, enter the name of the event log you are interested in (e.g. "System"). In the EventQuery field, you can enter a filter for events. The filter can be either a PowerShell Where-Object block or XPath XML taken from a Windows Event Viewer Custom View.

The default Get-WinEvent XML filter returns all events from the last polling cycle. This list can be searched for specific IDs, severities, or specific words in the message using PowerShell.

$$_ is the event object, an instance of the EventLogEntry class. EntryType is the attribute that determines severity and can contain one of the following values: Error, Warning, Information, SuccessAudit, or FailureAudit. The object also has attributes such as Message, MachineName, TimeGenerated, and Source. See http://msdn.microsoft.com/en-us/library/vstudio/system.diagnostics.eventlogentry for the full list.

Note: This query is structured to look for "less than," although we are looking for events "greater than" in severity. This is because the EntryType is an enumeration where the integer values map to 1 = Error, 2 = Warning, etc. This means lower numbers indicate higher severity.

Note: This query is structured to look for "less than or equal" although we are looking for events "greater than or equal" in severity. This is because the Level is an enumeration where the integer values map to 1 = Critical, 2 = Error, 3 = Warning, etc. This means lower numbers indicate higher severity. The LogAlways event level evaluates to 0, which is less than a Warning. These events are typically Informational and will be returned by a PowerShell query that only tests for "less than or equal". To work around this, you can add -and $$_.Level -gt [System.Diagnostics.Eventing.Reader.StandardEventLevel]::LogAlways to your query or use the XML option.
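The inverted comparison and the LogAlways pitfall are easier to see with the enumeration written out. The Python sketch below mirrors the integer values of System.Diagnostics.Eventing.Reader.StandardEventLevel and compares the naive filter with the workaround. The two function names are made up for this example.

```python
from enum import IntEnum


class StandardEventLevel(IntEnum):
    """Integer values as defined by the .NET StandardEventLevel enumeration."""
    LogAlways = 0
    Critical = 1
    Error = 2
    Warning = 3
    Informational = 4
    Verbose = 5


def at_least_warning(level):
    """Naive filter: 'less than or equal' also lets LogAlways (0) through."""
    return level <= StandardEventLevel.Warning


def at_least_warning_excluding_logalways(level):
    """Same filter with the '-gt LogAlways' workaround applied."""
    return StandardEventLevel.LogAlways < level <= StandardEventLevel.Warning


naive = [lvl.name for lvl in StandardEventLevel if at_least_warning(lvl)]
fixed = [lvl.name for lvl in StandardEventLevel
         if at_least_warning_excluding_logalways(lvl)]
print(naive)
print(fixed)
```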

To use the XML query from a custom view in Windows Event Viewer, simply copy the XML and paste it into the Event Query field of the event datasource. Because the event log is queried on a polling cycle, any TimeCreated filter is replaced automatically to avoid duplicate events.

For example, a custom view that searches for events in the last hour, with a severity of Warning or Critical, and IDs of 104, 110-115, or 155 will result in the following XPath query:

'{time}' will be replaced by the number of milliseconds since the last query.
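The substitution can be pictured with a short Python sketch. The filter string below is only an illustration, written in the shape Event Viewer typically generates for such a custom view; it is an assumption, not the query shipped with the ZenPack, and fill_time is a hypothetical helper.

```python
# Illustrative XPath filter (assumption): Critical or Warning events with
# IDs 104, 110-115, or 155, limited by a '{time}' placeholder.
XPATH_TEMPLATE = (
    "*[System[(Level=1 or Level=3) and "
    "(EventID=104 or (EventID >= 110 and EventID <= 115) or EventID=155) and "
    "TimeCreated[timediff(@SystemTime) <= {time}]]]"
)


def fill_time(template, seconds_since_last_query):
    """Replace '{time}' with the elapsed time expressed in milliseconds."""
    milliseconds = int(seconds_since_last_query * 1000)
    return template.replace("{time}", str(milliseconds))


query = fill_time(XPATH_TEMPLATE, 300)
print(query)
```

With the default 5 minute cycle, the placeholder becomes 300000, so each poll only sees events newer than the previous poll.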

Note: The script to search for events and return relevant data is approximately 3700 characters. Due to the Windows 8192 character limit on the shell, any XML or PowerShell queries will need to be less than 4400 characters.
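A quick length check before saving a query can catch this limit early. The Python sketch below uses the figures from the note; the constant and function names are assumptions for illustration.

```python
SHELL_LIMIT = 8192       # Windows shell character limit cited above
SCRIPT_OVERHEAD = 3700   # approximate size of the event collection script
MAX_QUERY_LENGTH = 4400  # recommended upper bound for a query


def query_fits(query):
    """Return True if the query stays under the recommended length."""
    return len(query) < MAX_QUERY_LENGTH


remaining = SHELL_LIMIT - SCRIPT_OVERHEAD
print(remaining)
print(query_fits("{ $$_.Id -eq 4001 -or $$_.EventId -eq 4001 }"))
print(query_fits("x" * 4400))
```

The raw difference is 4492 characters, so the recommended 4400 leaves a small safety margin for the approximate script size.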

Note: The query for servers with .NET 3.5 and later uses the Get-WinEvent PowerShell cmdlet. If your server does not have one of these later versions, the Get-EventLog cmdlet is used instead. It is recommended, but not required, to install .NET version 3.5 SP1 or higher. If you have a mix of these servers using the same Event Log datasource, you can combine the differing PowerShell queries, e.g. { $$_.Id -eq 4001 -or $$_.EventId -eq 4001 }

Enter the script. Be sure to use a double dollar sign, '$$', in order to distinguish any PowerShell-specific variables from a TALES expression.

Add a datapoint to collect the return value from the script, which you can then graph.
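The double dollar sign rule can be applied mechanically. The Python sketch below doubles every '$' that does not begin a '${...}' TALES expression. The helper name is hypothetical, and the sketch assumes the input has not been escaped already and does not use PowerShell's own '${name}' variable syntax.

```python
import re


def escape_powershell_for_tales(script):
    """Double each '$' unless it starts a '${...}' TALES expression."""
    return re.sub(r"\$(?!\{)", "$$", script)


escaped = escape_powershell_for_tales("$_.Id -eq 4001 -and '${here/id}'")
print(escaped)
```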

Configuring Service Monitoring

There are multiple ways to configure Windows service monitoring, depending on whether you want to monitor a single service on a single server, a specific service across all Windows servers, all 'Auto' start services, or something in between.

WinService

Options

Name - Enter a name for the data source

Enabled - Enable or disable the data source

Severity - Choose the severity of the alert

Cycle Time - How often the datasource queries service status

Update services immediately - Changes will be picked up during modeling. To have changes take effect immediately, check this box to start a job to index all services on all devices. This job could take several minutes to complete as it will update every service component on every Windows device in the system.

Service Options - Select the start type(s) to monitor. Add any services to include/exclude using a regex

Service Status - Choose to be alerted if a service is either not Running, not Stopped, not Paused, not Running or Paused, or not Stopped or Paused.

See the following examples:

Manually enable or disable monitoring for a single service on a single server.

Navigate to the service on the server.

Click to select it.

Select Details in the lower component pane.

Choose the Fail Severity.

Choose Monitoring from the gear menu.

Choose Yes or No depending on what you want.

Note: Once monitoring has been enabled or disabled for a service, no monitoring template will apply. To reset this option for a service, uncheck the 'Manually Selected Monitor State' box in the Details of the service and save the change. This check box does not enable or disable monitoring for the service component.

Enable monitoring by default for the WinRM service wherever it is enabled.

Option 1

Navigate to Advanced -> Monitoring Templates.

Verify the list of templates is grouped by template.

Expand the WinService tree.

Click once to select the /Server/Microsoft copy.

Choose Copy / Override Template from the Template gear menu at the bottom left of the page.

Note: Setting a service to be monitored in this fashion will enable monitoring for the service regardless of device class.

Enable/Disable monitoring by default for the WinRM service for a select group of servers.

Create a new device class somewhere under /Server/Microsoft/Windows for the select group of servers.

Move the servers to the new device class.

Follow steps 1-5 from the previous section to create a copy of the WinService template.

Choose your new device class as the target then click submit.

Expand the WinService tree then select the copy in your device class.

Choose View and Edit Details from the gear menu at the bottom left of the page.

Change the template's name to WinRM then click submit.

Double-click to edit the 'DefaultService' datasource.

Optionally select the Update services immediately option. This will start a background job that could take several minutes to complete for a large number of Windows devices.

Tick/Untick the Auto checkbox under Service Options and click save.

Enable monitoring of all services with a start mode of 'Auto'.

Navigate to Advanced -> Monitoring Templates.

Verify the list of templates is grouped by template.

Expand the WinService tree.

Select /Server/Microsoft.

In the Data Sources pane, click the + button to add a new data source, give it a name, and choose Windows Service as the type.

Choose View and Edit Details from the Data Sources gear menu.

Optionally select the Update services immediately option. This will start a background job that could take several minutes to complete for a large number of Windows devices.

Tick the Auto checkbox under Service Options and click save.

Create an organizer to monitor auto start SQL Server services.

Navigate to Advanced -> Monitoring Templates.

Verify the list of templates is grouped by template.

Expand the WinService tree.

Select /Server/Microsoft.

In the Data Sources pane, click the + button to add a new data source, give it a name such as MSSQLSERVER, and choose Windows Service as the type.

Choose View and Edit Details from the Data Sources gear menu.

Optionally select the Update services immediately option. This will start a background job that could take several minutes to complete for a large number of Windows devices.

Tick the Auto checkbox under Service Options.

Enter +MSSQLSERVER.* into the "Inclusions(+)/Exclusions(-)" text box and click save.

The order of precedence for monitoring a service is:

User manually sets monitoring

'DefaultService' datasource from the WinService template associated with the service

Datasource other than the DefaultService in the WinService template associated with the service

Monitoring is enabled via the Infrastructure -> Windows Services page

Windows Service Startmodes (Template vs Windows Services)

Startmodes                                       | Template includes Service startmode | Template excludes Service startmode
Windows Service Class includes Service startmode | monitored                           | monitored
Windows Service Class excludes Service startmode | monitored                           | NOT monitored

Note: The Windows Service Template (default WinService) must have at least one datasource enabled for monitoring to function.

You can optionally include or exclude certain services to be monitored when selecting the Auto, Manual, and/or Disabled start mode(s) by entering a comma-separated list of services. These can be service names or valid regular expressions. Entered names and expressions are case-insensitive. To exclude services, you must specify a '-' at the beginning of the name or regular expression. To include services, specify a '+' at the beginning of the name or regular expression. Exclusions will take precedence over inclusions, but the exclusions must be placed before the wildcard +.* inclusion.
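The ordering rule is easiest to understand as "first matching entry wins". The Python sketch below evaluates a list that way. This is an interpretation of the description above, not the ZenPack's implementation: the function names, the whole-name match, and treating an entry without a prefix as an inclusion are all assumptions, and start mode selection is left out.

```python
import re


def parse_rules(text):
    """Turn '-Foo, +Bar.*' into ordered (include, compiled_regex) pairs."""
    rules = []
    for entry in text.split(","):  # note: a regex containing ',' is split too
        entry = entry.strip()
        if not entry:
            continue
        include = not entry.startswith("-")
        pattern = entry[1:] if entry[0] in "+-" else entry
        rules.append((include, re.compile(pattern, re.IGNORECASE)))
    return rules


def is_selected(service_name, rules):
    """Return the verdict of the first rule matching the whole service name."""
    for include, regex in rules:
        if regex.fullmatch(service_name):
            return include
    return False


rules = parse_rules("-MSSQLFDLauncher, +MSSQL.*")
print(is_selected("mssqlserver", rules))
print(is_selected("MSSQLFDLauncher", rules))
print(is_selected("Spooler", parse_rules("+.*, -Spooler")))
```

The last call shows why an exclusion placed after +.* has no effect under this reading: the wildcard matches first, so the later exclusion is never consulted.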

Note: To enable monitoring by default of a service or services, you must choose a start mode by ticking the appropriate box. Unticking all three boxes disables monitoring by default.

Note: If you choose to update services immediately when saving changes to a service template, a job is created to index all services on all devices. These changes may take several minutes to propagate to all of your devices, depending on the size of your organization. Updating immediately is not recommended if you are making several changes in a short period of time. Updates are automatically applied at the time of the next model.

Note: The Windows Service datasource no longer depends on the 'DefaultService' data source name. User defined datasources are now honored.

DCDiag

Beginning with version 2.4.0, you can now monitor the output of DCDiag. By default all dcdiag tests are enabled in the Active Directory monitoring template. If a test fails an error event is issued. You can also add other tests, such as DNS, and supply specific test parameters.

Note: DCDiag must be run as a user with Administrator permissions. If you will be monitoring a Domain Controller with a non administrator user, you should disable these tests.

PortCheck

Beginning with version 2.4.0, you can now monitor specific ports in the Windows ZenPack. By default, the ZenPack will monitor ports 9389, 3268, 3269, 88, 464, 389, 636, 445, 135, and 3389 as part of the Active Directory monitoring template.

You can add and remove any port you wish to be monitored by editing the PortCheck datasource in the Active Directory monitoring template.

To monitor ports on a Windows server that is not a domain controller, simply create a new datasource and choose Windows PortCheck as the type. Then add the ports you wish to monitor with a short description of each.
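Conceptually, a port check is a TCP connection attempt per port. The Python sketch below shows that idea with the standard socket module. It is not how the Windows PortCheck datasource is implemented; the function names and timeout are assumptions, and the port descriptions reflect the services conventionally assigned to the default Active Directory ports.

```python
import socket

# Default Active Directory ports from the text, with conventional descriptions.
AD_PORTS = {
    9389: "Active Directory Web Services",
    3268: "Global Catalog",
    3269: "Global Catalog over SSL",
    88: "Kerberos",
    464: "Kerberos password change",
    389: "LDAP",
    636: "LDAP over SSL",
    445: "SMB",
    135: "RPC endpoint mapper",
    3389: "Remote Desktop",
}


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_ports(host, ports):
    """Map each port to whether it accepted a connection."""
    return {port: port_is_open(host, port) for port in ports}
```

For example, check_ports("dc1.example.com", AD_PORTS) would return a dictionary of port numbers to booleans, where dc1.example.com is a placeholder host name.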

WinRM Ping

WinRM Ping is a simple datasource that attempts to retrieve basic data over WinRM. If the device cannot answer a simple query, Zenoss will consider the device to be down. An event will appear in the /Status/Winrm/Ping event class with any resulting error message. This is a more comprehensive test than a ping. A simple ping test could easily result in a false positive in many scenarios. The following are just a few:

A target's IP has been reassigned to a non-Windows device between models.

Note: During ZenPack installation, a job to reindex Windows Services may start. It is recommended to either stop zenjobs before installing or to wait until the job finishes before restarting Zenoss. If you restart before the job finishes, you may need to Abort and/or Delete the job after the restart.

Installing Kerberos Dependency

To use Kerberos authentication, the operating system's Kerberos package must be installed on all Zenoss servers. On Enterprise Linux (Red Hat and CentOS) this is the krb5-workstation RPM, which can typically be installed by running the following command as the root user.

yum -y install krb5-workstation

Usage

Monitoring User Account

A monitoring user account must be either an Administrator or a least privileged user.

The Least Privileged User requires the following privileges and permissions:

Membership in the following local groups, or domain-level groups for a Domain Controller:

"Performance Monitor Users"

"Performance Log Users"

"Event Log Readers"

"Distributed COM Users"

"WinRMRemoteWMIUsers__"

“Read Folder” access to "C:\Windows\system32\inetsrv\config" if it exists

Each service needs the following permissions:

SERVICE_QUERY_CONFIG

SERVICE_QUERY_STATUS

SERVICE_INTERROGATE

READ_CONTROL

SERVICE_START

Note: An Administrator level user can be denied local logon and remote desktop access through a group policy object.

Port Requirements

The ZenPack communicates with a Windows device over port 5985 for HTTP or 5986 for HTTPS requests. The compatibility ports 80 and 443 are also acceptable.
For domain authentication, Kerberos communicates on port 88 of the KDC and on port 749 of the Admin Server.

Note: If using the compatibility ports of 80 or 443, you must create the appropriate listener in your server's WinRM configuration.

Adding a Windows Device

Use the following steps to start monitoring a Windows server using local authentication in the Zenoss web interface.

Navigate to the Infrastructure page.

Select the Server/Microsoft/Windows device class.

The Windows server must be added to this class or to a child of this class.

Click Details and set the configuration properties for zWinRMUser and zWinRMPassword.

Click See All.

Choose Add Single Device from the add device button.

Fill out the form.

Name or IP must be resolvable and accessible from the collector server chosen in the Collector field.

Click ADD.

Alternatively you can use zenbatchload to add Windows servers from the command line. To do this, you must create a text file with hostname, username and password of all the servers you want to add. Multiple endpoints can be added under the same /Devices/Server/Microsoft/Windows section. Here is an example...
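Such a file can also be generated from an inventory. The Python sketch below assumes the usual zenbatchload layout: a device class line followed by one line per device of the form hostname property="value", property="value". The function name, host names, and credentials are placeholders, and values containing double quotes are not escaped.

```python
def render_batchload(device_class, servers):
    """Render zenbatchload input: a device class line, then one line per server."""
    lines = [device_class]
    for hostname, user, password in servers:
        lines.append(
            '{} zWinRMUser="{}", zWinRMPassword="{}"'.format(
                hostname, user, password
            )
        )
    return "\n".join(lines) + "\n"


# Placeholder host names and credentials; substitute your own.
servers = [
    ("win2008-1d.example.com", "Administrator", "password"),
    ("win2012-1d.example.com", "Administrator", "password"),
]
text = render_batchload("/Devices/Server/Microsoft/Windows", servers)
print(text)
```

Writing the returned text to a file gives the input for the zenbatchload command.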

You can then load the Windows servers into Zenoss Core or Resource Manager as devices with the following command.

zenbatchload <filename>

Configuration Options

The steps shown above in Adding a Windows Device are for the simplest case of using Windows local authentication. The following configuration properties can be used to support monitoring other environments.

zWinRMUser

The syntax used for zWinRMUser controls whether Zenoss will attempt Windows local authentication or domain (Kerberos) authentication. If the value of zWinRMUser is username, local Windows authentication will be used. If zWinRMUser is username@example.com, domain authentication will be used, and the zWinKDC and potentially the zWinRMServerName properties become important.
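The rule reduces to checking for a domain part in the user name. The Python sketch below expresses it; the function names are hypothetical and the check is a simplification of whatever the ZenPack does internally.

```python
def auth_mode(zwinrm_user):
    """Classify a zWinRMUser value as 'local' or 'domain' authentication."""
    return "domain" if "@" in zwinrm_user else "local"


def required_properties(zwinrm_user):
    """List the properties that must be set for the given user syntax."""
    props = ["zWinRMUser", "zWinRMPassword"]
    if auth_mode(zwinrm_user) == "domain":
        props.append("zWinKDC")  # required for domain authentication
    return props


print(auth_mode("username"))
print(auth_mode("username@example.com"))
print(required_properties("username@example.com"))
```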

zWinRMPassword

Password for user defined by zWinRMUser.

zWinKDC

The zWinKDC property must be set if domain authentication is used. It must be the IP address or resolvable name of a valid Windows domain controller. To use multiple KDCs, you can enter a comma separated list of valid addresses or supply different KDCs across different Device Classes. See the Kerberos Tickets section for more information.

zWinTrustedRealm

Enter the name of the domain which is trusted by the user's domain. This can be a child or other domain which has a trust relationship with the user's domain. For example, if zWinRMUser is username@example.com, and austin.example.com is a child of the example domain, enter austin.example.com into zWinTrustedRealm.

zWinTrustedKDC

This property must be set if zWinTrustedRealm is set. It must be the IP address or resolvable name of a valid Windows domain controller for the trusted realm.

zWinRMServerName

This property should only be used in conjunction with domain authentication when the DNS PTR record for a monitored server's managed IP address does not resolve to the name by which the server is known in Active Directory. For example, if myserver1 is known as myserver1.ad.example.com by Active Directory and is being managed by IP address 192.51.100.21, but 192.51.100.21 resolves to www.example.com, you will have to set zWinRMServerName to myserver1.ad.example.com for domain authentication to work.

If many Windows servers in your environment don't have DNS PTR records that match Active Directory, it is recommended that you set the name of each Zenoss device to the fully-qualified Active Directory name and set zWinRMServerName to ${here/titleOrId} at the /Server/Microsoft/Windows device class. This avoids the necessity of setting zWinRMServerName on every device.

If the server name cannot be resolved and you are using domain authentication, it is recommended that you set the Id of the device to the IP address and the Title to the server name it is known by in Active Directory. Then use ${here/title} for zWinRMServerName. This situation can occur when no DNS server is available. Kerberos always performs a reverse lookup when obtaining a ticket to use a service on a computer. If your servers are known by multiple names, the reverse lookup may return the wrong name and you will see "Server not found in kerberos database" errors. See the troubleshooting section on this topic for a solution.

zWinScheme

This must be set to either http or https. The default is http.

zWinUseWsmanSPN

If the HTTP/HTTPS service principals are exclusively in use for a particular service account, such as on an IIS server, set this option to true to use the WSMAN service principal name. You can use this option for all domain joined Windows Servers that are using a domain monitoring account.

Note: A domain controller may need “Validated write to service principal name” permission for the NETWORK SERVICE account in order for the WSMAN service principal name to be used.

zWinRMPort

The port on which the Windows server is listening for WinRM or WS-Management connections. The default is 5985. It is uncommon for this to be configured as anything else.
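Together, zWinScheme and zWinRMPort determine the address that WinRM requests are sent to. The Python sketch below assembles a WS-Management endpoint URL from them. The function name is hypothetical, and falling back to 5986 when the scheme is https and no port is given is an assumption of this example; the property itself defaults to 5985.

```python
# Standard WinRM listener ports for each scheme.
DEFAULT_PORTS = {"http": 5985, "https": 5986}


def winrm_endpoint(host, scheme="http", port=None):
    """Build the WS-Management URL for a host from scheme and port settings."""
    if scheme not in DEFAULT_PORTS:
        raise ValueError("zWinScheme must be 'http' or 'https'")
    if port is None:
        port = DEFAULT_PORTS[scheme]  # assumption: pick the scheme's usual port
    return "{}://{}:{}/wsman".format(scheme, host, port)


print(winrm_endpoint("myserver1.ad.example.com"))
print(winrm_endpoint("myserver1.ad.example.com", scheme="https"))
print(winrm_endpoint("myserver1.ad.example.com", scheme="http", port=80))
```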

zWinPerfmonInterval

The default interval in seconds at which Windows Perfmon datapoints will be collected. The default is 300 seconds or 5 minutes. It is also possible to override the collection interval for individual counters.

zWinKeyTabFilePath

This property is currently unused and is reserved for future use when keytab files are supported.

zDBInstances

This setting is only relevant when the zenoss.winrm.WinMSSQL modeler plugin is enabled. Multiple instances can be specified to monitor multiple SQL Server instances per server using different credentials. The default instance is MSSQLSERVER. Fill in the user and password to use SQL authentication. Leave the user and password blank to use Windows authentication. The default MSSQLSERVER credentials will be used for all instances not specified.

zWinRMEnvelopeSize

This property is used when the WinRM configuration setting for MaxEnvelopeSizekb exceeds the default of 512 KB. Some WMI queries return large amounts of data, and the envelope size may need to be enlarged. A possible symptom of this is an XML parsing error during collection.

zWinRMLocale

The locale to use for communicating with a Windows server. The default is en-US. This property is reserved for future use.

Optional directory which contains one or more Kerberos configuration files. This is useful when extra Kerberos options are needed, such as disabling reverse DNS lookup. See http://web.mit.edu/kerberos/krb5-devel/doc/admin/conf_files/krb5_conf.html for a description of includedir and the krb5.conf options available. The directory must exist and contain only Kerberos configuration files. If the directory contains non-Kerberos configuration files, it will be ignored.

zWinRMKrb5DisableRDNS

Kerberos always performs a reverse lookup when obtaining a ticket to use the HTTP/HTTPS/WSMAN service principal. If there are multiple names by which servers are known in your organization, or if you do not want to use reverse lookups, set this value to True. Because this is a Kerberos property, it can only be set one way or the other. You cannot mix and match this value; only the top level value at /Server/Microsoft will be honored.

zWinRMClusterNodeClass

Path under which to create cluster nodes. If you need to add cluster nodes to a specific class under the /Server/Microsoft/Windows device class, specify it with this property. The default is /Server/Microsoft/Windows.

Note: The HyperV and MicrosoftWindows ZenPacks share a krb5.conf file as well as tools for sending and receiving data. Therefore, if either a HyperV or a Windows device has a correct zWinKDC setting, it will be used for the other devices as well.

Open Security > Logins, select the user you specified in zDBInstances property or the zWinRMUser property if using Windows Authentication.

Check user Properties > Status and make sure that the user is Enabled.

Check user Properties > Server Roles and make sure that the user has the public role.

If using an Administrator user, make sure it has the sysadmin role.

If not using an Administrator user, check user Properties > Securables and make sure the user has been granted View server state rights.

Support for Local and Failover Cluster SQL instances

This ZenPack adds support for both local and failover cluster SQL Server instances.
Local SQL Server instances can be modeled and monitored within Windows devices (devices in the Server/Microsoft/Windows device class).
SQL Server failover cluster instances can be modeled and monitored within cluster devices (devices in the Server/Microsoft/Cluster device class).

Use the following steps to model/monitor SQL Server instances:

Create a device in the Server/Microsoft/Windows device class if you intend to model local SQL instances, or in the Server/Microsoft/Cluster device class if you intend to model failover cluster instances.

Optionally specify the instance names to be modeled in the zDBInstances zProperty. Provide user names and passwords if SQL Server Authentication is to be used.

Enable the zenoss.winrm.WinMSSQL modeler plugin.

Remodel the device.

SQL Server Monitoring

The monitoring templates for SQL Server are component templates so there is no need to perform a bind. They will automatically be used to monitor databases, instances, and jobs.

Note: The default instance of MSSQLSERVER appears as the host name.

Note: The authenticated user will need to be granted permission to view the server state. For example, run "GRANT VIEW SERVER STATE TO [MYDOMAIN\zenoss_user]", or grant the permission through the GUI in SQL Server Management Studio. The user must also be interactive, i.e. the account must not be denied local logon rights.

Working with WinCommand Notification Action

This ZenPack adds a new event notification action that the zenactiond daemon can use to execute an arbitrary command on a remote Windows machine.

Use the following steps to set up a notification:

Select Events > Triggers from the Navigation Menu.

Create a trigger, selecting the rules that define it.

Select Notifications from the left panel. Add a new notification, enter a name for it and select WinCommand Action from the drop-down menu. Click Submit.

In the Edit Notification dialog on the Notification tab associate the trigger with the notification and optionally select the notification properties (Enabled, Send Clear, Send only on Initial Occurrence, Delay, Repeat).

On the Content tab of the notification, specify the Windows CMD Command to run when configured triggers are matched. You may optionally specify a Clear Windows CMD Command to run when the triggering event clears.

Basic Authentication (the Windows default is Kerberos; see the note below for more information)

winrm s winrm/config/service/auth '@{Basic="true"}'

winrm s winrm/config/service '@{AllowUnencrypted="true"}'

Note: The above instructions use the max values for MaxConcurrentOperationsPerUser and WinRS MaxShellsPerUser. If you do not want to set these to the max, then a value of 50 should be adequate. The default is 5 on both, which will cause problems because Zenoss will open up concurrent requests for each WQL query and set of Perfmon counters.

Note: If you choose to use Basic authentication, it is highly recommended that you also configure HTTPS. If you do not use the HTTPS protocol, your user name and password will be sent in clear text. If you have challenges setting up HTTPS on the Windows clients but require the user name and password to be encrypted, then Kerberos authentication is the best option. HTTPS is not required for Kerberos but is recommended. If you choose to use Kerberos authentication, your payload will be encrypted.

Note: If you are using kerberos on EL6 and higher to connect to your Windows Server, your data will be encrypted over HTTP. For kerberos on EL5, encryption is not supported so you must set the winrm AllowUnencrypted option to true.

Note: If you choose to take the WinRM default configurations, you must supply Kerberos authentication settings in the zProperties. The Kerberos authentication process requires a ticket granting server. In a Microsoft Active Directory environment, the AD server is also the KDC. The zWinKDC value must be set to the IP address of the AD server, and the collector must be able to send TCP/IP packets to this server. Once this is set, your zWinRMUser value must be fully qualified with the domain, such as jsmith@Zenoss.com, and zWinRMPassword must be set correctly for this user account.

Note: In order to use a single domain user in a child domain or other trusted domain, set zWinKDC to the AD server of the user's domain. Then enter the trusted domain name and associated AD server in the zWinTrustedRealm and zWinTrustedKDC properties, respectively.
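
The Kerberos notes above can be turned into a quick sanity check before values are entered in the zProperties. The following is a minimal sketch written for Python 3 and is not part of the ZenPack: the function name check_kerberos_settings and the plain-dictionary input are assumptions made for illustration. It checks only the two rules stated above, namely that zWinKDC is set and that the user name (zWinRMUser, as the property is named earlier in this document) is in user@domain form.

```python
def check_kerberos_settings(zprops):
    """Return a list of problems found in a dict of zProperty values.

    Hypothetical helper: it only encodes the rules from the notes above.
    """
    problems = []
    # A KDC (the AD server) is required for Kerberos authentication.
    if not zprops.get("zWinKDC", "").strip():
        problems.append("zWinKDC is empty")
    # The user must be qualified with the domain, for example jsmith@Zenoss.com.
    user = zprops.get("zWinRMUser", "")
    name, sep, domain = user.partition("@")
    if not (name and sep and domain):
        problems.append("zWinRMUser is not in user@domain form")
    return problems


# Example values are illustrative only.
print(check_kerberos_settings({"zWinKDC": "10.10.10.10", "zWinRMUser": "jsmith"}))
```

An empty list means that neither rule was violated; it does not prove that the KDC is reachable or that the password is correct.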

Note: The HTTPS setup must be completed on each client. At this time we do not have notes on automating this task but are currently in the process of testing several options. To successfully encrypt your payload between the Zenoss server and the Windows client you must install a Server Authentication certificate on the client machine. The process for requesting and installing the appropriate certificate can be found at the following URL.
http://blogs.technet.com/b/meamcs/archive/2012/02/25/how-to-force-winrm-to-listen-interfaces-over-https.aspx
Once the client has the correct certificate installed you only need to change the zWinScheme to HTTPS and zWinRMPort to 5986. If you are still having challenges setting up HTTPS on the client you can execute the following command on any AD server to verify the appropriate SPN record exists for Kerberos authentication.

c:\>setspn -l hostname1

If you do not see a record with HTTPS/ at the beginning of the hostname, you can create the record, but this is not typically necessary because Windows will use the HOST/ record as the default for most built-in services.

c:\>setspn -s HTTPS/hostname1.zenoss.com hostname1

Note: The IdleTimeout/Shell Timeout is the time, in milliseconds, to keep an idle remote shell alive on a Windows Server. It should be between 5 and 15 minutes (300000 to 900000 milliseconds).

Transitioning from WindowsMonitor

If you are installing this ZenPack on an existing Zenoss system or upgrading from an earlier Zenoss version, you may already have a ZenPack named ZenPacks.zenoss.WindowsMonitor installed. You can check this by navigating to Advanced > ZenPacks.

This ZenPack functionally supersedes ZenPacks.zenoss.WindowsMonitor for Windows platforms that support WinRM, but does not automatically migrate monitoring of your Microsoft Windows resources when installed. The ZenPacks can coexist gracefully to allow you time to manually transition monitoring to the newer ZenPack with better capabilities.

1. Navigate to the Infrastructure page.

2. Expand the /Server/Windows/WMI device class.

3. Single-click to select a Windows device.

4. Click the delete (-) button in the bottom-left.

5. Click OK to confirm deleting the Windows device.

6. Add the device back using the #Adding a Windows Device instructions above. Be sure to select the /Server/Microsoft/Windows device class and not the /Server/Windows/WMI device class.

7. Repeat steps 3-6 for each Windows device.

Note: It is also possible to drag and drop selected Windows devices from one class to another. You will need to remodel the devices after the move.

Limitations of Current Release

The current release is known to have the following limitations.

Support for team NICs is limited to Intel and Broadcom interfaces.

The custom widget for MSSQL Server credentials is not compatible with Zenoss 4.1.x, therefore the zDBInstances property in this version should be set as a valid JSON list (e.g. [{"instance": "MSSQLSERVER", "user": "", "passwd": ""}] ).

When upgrading to version 2.2.0 from version 2.1.3 or earlier, you may see a segmentation fault during the install. To ensure a successful installation, run the install once more and restart Zenoss.

Payload encryption is not supported on EL5 systems. This is due to the fact that the default kerberos library on EL5 systems does not contain the necessary functionality.

Current functionality for monitoring Server 2003 has not been removed from the ZenPack, but no future development will be done for Server 2003.

Starting with version 2.6.0 of the ZenPack, existing Windows Service components are no longer compatible. These will be removed upon installation. Once the device is modeled with the Services plugin enabled, Windows Service components will be discovered. Any existing monitoring templates will still apply. Any services that were manually selected to be monitored will not. See the section on Configuring Service Monitoring.

The current release of this ZenPack uses the ZenPack SDK. Some component classes have changed from pre-2.6.x versions of the ZenPack. During installation, the ZenPack will create a job that will update the Windows Devices and Components class types used by the SDK. Depending on your Zenoss instance resources, this job could take a very long time to complete. If the job, ResetClassTypes, was not added during installation, it can be added manually using zendmd:

In [1]: from ZenPacks.zenoss.Microsoft.Windows.jobs import ResetClassTypes
In [2]: dmd.JobManager.addJob(ResetClassTypes)
In [3]: commit()

This is the last version of the Microsoft Windows ZenPack where we provide fixes for Windows 2008.

When removing a Windows device or the Microsoft.Windows ZenPack, you may see errors in the event.log. This is expected and is a known defect in ZenPackLib.

If upgrading from a version prior to 2.6.3 to 2.7.x, you may not be able to view your Windows services until the device is remodeled.
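
For the zDBInstances limitation listed above, the JSON list can be generated rather than typed by hand, which avoids quoting mistakes. This is a minimal sketch for Python 3: the helper name build_zdbinstances is hypothetical, and the only thing taken from this document is the key layout (instance, user, passwd).

```python
import json


def build_zdbinstances(instances):
    """Build a zDBInstances JSON string from (instance, user, passwd) tuples."""
    entries = [
        {"instance": name, "user": user, "passwd": passwd}
        for name, user, passwd in instances
    ]
    return json.dumps(entries)


# Empty user and passwd values mirror the example given in the limitation above.
value = build_zdbinstances([("MSSQLSERVER", "", "")])
print(value)

# Round-trip the string to confirm that it parses as a JSON list.
assert isinstance(json.loads(value), list)
```

The printed string can then be pasted into the zDBInstances property.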

A current list of known issues related to this ZenPack can be found with this JIRA query. You must be logged into JIRA to run this query. If you don't already have a JIRA account, you can create one here.

Kerberos Tickets

The ZenPack will automatically generate a Kerberos configuration file, krb5.conf, in the $ZENHOME/var/krb5/ directory. To use a custom configuration file, place it in the $ZENHOME/var/krb5/config/ directory. In Zenoss 5.x, this location is inside a container, so be sure to commit any changes you make. Upgrading Zenoss discards these changes, so you will need to update your container again after an upgrade. The file name may contain only alphanumeric characters, dashes, and underscores.

To add a permanent location for your configuration file, you can make use of the zWinRMKrb5includedir property. This must be a location accessible from within a container and must contain ONLY Kerberos configuration file(s). If the location is invalid or contains files other than Kerberos configuration files, it will be ignored and not added to the main krb5.conf file.

You can also supply multiple KDCs for a domain with the Windows ZenPack. This can be done using either a comma separated list in the zWinKDC property or supplying single KDCs for multiple devices or device classes under the /Server/Microsoft device class.

This list also supports a simple prefix notation to add a KDC, remove a KDC, and specify an admin_server.

Adding a KDC

Use just the address, or prepend a + to the address, to add a new KDC.

Removing a KDC

Prepend a - to the KDC address to remove an existing KDC from the krb5.conf file. This can be used if a KDC is no longer in service or if the wrong address was entered previously. The entry can be removed from zWinKDC once a ticket granting ticket for the user has been obtained and the krb5.conf file is correct.

Specifying an admin_server

Optionally, use an asterisk, *, to denote the admin_server. If none is provided, the first kdc in the list will be used. The admin_server is used for any admin work, such as changing a password through kinit.

For example, set zWinKDC to the comma separated list *10.10.10.10,10.10.10.20,+10.10.10.30,-10.10.10.40. In this case 10.10.10.10 will be a KDC and the admin_server, 10.10.10.20 and 10.10.10.30 will be added as KDCs, and 10.10.10.40, which is no longer a valid KDC address, will be removed.

Note: Removing one or more errant KDCs from the system can be a time consuming process, so we recommend double-checking that the addresses are valid when entering them into the zWinKDC property.
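
To double-check a zWinKDC value before saving it, the prefix rules described above can be sketched in a few lines of Python 3. This is an illustration of the notation only, not the ZenPack's own parser; the function name parse_kdc_list is an assumption.

```python
def parse_kdc_list(value):
    """Split a zWinKDC-style string into KDCs to add, KDCs to remove,
    and the admin_server, following the prefix rules described above."""
    add, remove, admin_server = [], [], None
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        if item.startswith("-"):
            # A leading - marks a KDC to remove from krb5.conf.
            remove.append(item[1:])
            continue
        if item.startswith("*"):
            # A leading * marks the admin_server, which is also a KDC.
            item = item[1:]
            if admin_server is None:
                admin_server = item
        elif item.startswith("+"):
            # A leading + is optional and means the same as no prefix.
            item = item[1:]
        add.append(item)
    # If no entry is marked with *, the first KDC in the list is used.
    if admin_server is None and add:
        admin_server = add[0]
    return add, remove, admin_server


print(parse_kdc_list("*10.10.10.10,10.10.10.20,+10.10.10.30,-10.10.10.40"))
```

For the example value above, this reports 10.10.10.10, 10.10.10.20 and 10.10.10.30 as KDCs to add, 10.10.10.40 as the KDC to remove, and 10.10.10.10 as the admin_server.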

Service Impact

When combined with the Zenoss Service Dynamics product, this ZenPack adds built-in service impact capability for services running on Microsoft Windows. The following service impact relationships are automatically added. These will be included in any services that contain one or more of the explicitly mentioned entities.

Troubleshooting

Please refer to the Zenoss Service Dynamics documentation if you run into any of the following problems:

ZenPack will not install

Adding a device fails

Don't understand how to add a device

Don't understand how to model a device

If you cannot find the answer in the documentation, then Resource Manager (Service Dynamics) users should contact Zenoss Customer Support. Core users can use the #zenoss IRC channel or the community.zenoss.org forums (there is a forum specific to Windows monitoring).

Troubleshooting Windows

If you see 100% CPU usage on a domain controller and your forest functional level is Windows 2003 or Windows 2008, you could be missing the WinRMRemoteWMIUsers__ security group. Adding this group to your domain should fix the problem. This is a known Microsoft issue; see https://support.microsoft.com/en-us/kb/3118385.

Troubleshooting Kerberos Error Messages

Cannot determine realm for numeric host address

If you enter an IP address for the device id, make sure that the address is resolvable to a name. A common solution is to set the zWinRMServerName property.

Server not found in Kerberos database

More often than not, this error indicates a DNS issue in which the domain controller is unable to locate the specified server by either IP address or name. The best solution varies over different domains and it is left to the user to decide which is best for their environment.

One solution is to disable reverse DNS lookups for Kerberos. This can be achieved by setting the zWinRMKrb5DisableRDNS property to True. If you use this option, you *MUST* set it only at the /Server/Microsoft device class level.

You should also ensure that the correct name is returned for lookups.
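
One way to confirm that lookups return the expected name is to resolve the server name to an address and then resolve that address back. The sketch below assumes Python 3 and uses only the standard socket module; the function name check_name_resolution and the host names are hypothetical. The resolver functions are passed in as parameters so that the example can run with stubs and without a working DNS.

```python
import socket


def check_name_resolution(name, forward=socket.gethostbyname,
                          reverse=socket.gethostbyaddr):
    """Resolve name to an IP, resolve the IP back to a name,
    and report whether the two names match (case-insensitively)."""
    ip = forward(name)
    reverse_name = reverse(ip)[0]
    matches = reverse_name.rstrip(".").lower() == name.rstrip(".").lower()
    return ip, reverse_name, matches


# Stub resolvers so the example runs offline; omit them to query real DNS.
def fake_forward(name):
    return "192.0.2.101"


def fake_reverse(ip):
    return ("otherhost.example.com", [], [ip])


print(check_name_resolution("hostname1.example.com", fake_forward, fake_reverse))
```

A mismatch between the forward and reverse names, as in this stubbed example, is the kind of inconsistency that can lead to the error described above. Run the check from the collector, since that is where the lookups take place.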

Preauthentication failed while getting initial credentials

This typically indicates a bad or expired password.

Realm not local to KDC while getting initial credentials

This indicates that one or more of the defined KDCs for a domain are incorrect. Prepend a - to the incorrect address in the zWinKDC property to remove it from the list of KDCs for the domain.

Message stream modified

This indicates that Windows was unable to decrypt the kerberos encrypted payload. This will typically occur if the HTTP and/or HTTPS service principal is dedicated to a specific service account. For example, many IIS servers will do this. To fix this, set the zWinUseWsmanSPN property.

Troubleshooting Kerberos Authentication with Wireshark

There are many reasons for kerberos authentication not to work, and a lot of them result in the following unhelpful error message.

kerberos authGSSClientStep failed (None)

While Zenoss is unable to extract a useful error message when this occurs, it turns out that Wireshark can get useful errors by looking at the kerberos packets sent between Zenoss, your domain controller (zWinKDC) and the monitored Windows server. Let's walk through an example of using Wireshark to resolve an authGSSClientStep failed error.

First install Wireshark on your system. Its GUI is easier to use than the command line equivalent.

Next you will need to create a packet capture file on your Zenoss server. Assuming the Windows server you're trying to monitor is 192.0.2.101 and the domain controller (zWinKDC) is 203.0.113.10, you would run the following command as the root user on your Zenoss server.

tcpdump -i any -s 0 -w kerberdebug.pcap host 192.0.2.101 or host 203.0.113.10

This will start capturing all packets to or from those two IP addresses. It will continue to capture these packets until you type CTRL-C.

Now attempt to remodel the Windows server where you're encountering the error. Once modeling completes and fails again, go back to the terminal where tcpdump is running and type CTRL-C. You will now have a kerberdebug.pcap file in the directory where you ran the command.

Copy kerberdebug.pcap to your system where you installed Wireshark. Start Wireshark and open kerberdebug.pcap. You should see something like the following.

You'll see that there's a KRB5KRB_AP_ERR_SKEW error. Searching for this specific error code will quickly show that it occurs when the Kerberos client and server don't have their clocks synchronized. There's a tolerance for some difference, but in this case the difference was large due to misconfiguration.
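
The tolerance check behind this error amounts to a simple comparison of two clocks. The sketch below is a Python 3 illustration with made-up timestamps; the function name is hypothetical, and the five-minute limit is an assumption based on the usual Kerberos default, so check your domain policy for the value actually in force.

```python
from datetime import datetime, timedelta, timezone

# Assumption: 5 minutes is the usual default tolerance; your domain may differ.
MAX_SKEW = timedelta(minutes=5)


def skew_exceeds_tolerance(client_time, server_time, max_skew=MAX_SKEW):
    """Return True if the two clocks differ by more than max_skew."""
    return abs(client_time - server_time) > max_skew


# Made-up timestamps: the client clock runs 12 minutes ahead of the server.
server = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
client = server + timedelta(minutes=12)
print(skew_exceeds_tolerance(client, server))
```

In practice, compare the time on the Zenoss collector with the time on the domain controller, and correct the time synchronization on whichever side has drifted.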

There are some Kerberos errors you'll see in the packets that are a completely normal part of negotiation and won't lead to any problems. You should ignore the following errors shown in Wireshark:

KRB5KRB_AP_ERR_TKT_EXPIRED: Zenoss will subsequently request a new ticket when this occurs.

KRB5KDC_ERR_PREAUTH_REQUIRED: This is a normal part of Kerberos negotiation.

KRB5KRB_ERR_RESPONSE_TOO_BIG: Most requests won't fit in UDP. Zenoss will automatically switch to TCP.

You'll also see other kerberos messages that are normal. You should ignore these kerberos messages shown by Wireshark:

TGS-REQ

AS-REQ

The following are the most common errors:

KRB5KRB_AP_ERR_SKEW: As shown in the above example. A clock synchronization issue.

KRB5KDC_ERR_S_PRINCIPAL_UNKNOWN: This can happen if zWinRMServerName resolves to the server's IP address, but is not the name the server is known by in Active Directory. This will also be the error if you don't enter a zWinRMServerName and the reverse resolution of the device's manage IP address resolves to a name that doesn't match the server's name in Active Directory. Typical solutions to this are to add the name to the /etc/hosts file or to directly use the IP address of the server.

Troubleshooting on Resource Manager 4.1.1

In some cases updating the Microsoft Windows ZenPack on Zenoss Resource Manager 4.1.1 may result in the zenhub daemon not starting. The error message will contain AttributeError: zDBInstancesPassword. If you encounter this issue, install the ZenPack again.

If there are existing SQL Server instances being monitored, make sure to reconfigure the zDBInstances property, since the zDBInstancesPassword property no longer exists.

Troubleshooting Services

If you see an event error that shows "The maximum number of concurrent operations for this user has been exceeded", you will need to increase the number of concurrent operations per user in the winrm config.
For example:

winrm s winrm/config/service '@{MaxConcurrentOperationsPerUser="50"}'

If you see an "Index out of range" error, this could indicate a low number of available file handles in Linux. The default is 1024. To view this information on your system, enter 'ulimit -n'. To increase this limit, edit your /etc/sysctl.conf file and set fs.file-max to a sufficiently large number.
For example:

vi /etc/sysctl.conf

fs.file-max=10000
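
The limits discussed above can also be inspected from Python, which is convenient when checking them as the user that runs the collector daemons. This is a minimal sketch for Python 3 on Linux using the standard resource module; the helper name read_file_max is hypothetical. Note that RLIMIT_NOFILE is the per-process limit reported by 'ulimit -n', while fs.file-max is the system-wide limit, so the two values are reported separately.

```python
import resource


def read_file_max(path="/proc/sys/fs/file-max"):
    """Return the system-wide fs.file-max value, or None if it cannot be read."""
    try:
        with open(path) as handle:
            return int(handle.read().strip())
    except (OSError, ValueError):
        return None


# RLIMIT_NOFILE is the per-process limit on open file descriptors,
# the same value that 'ulimit -n' reports for the current shell.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print("per-process soft limit:", soft)
print("per-process hard limit:", hard)
print("system-wide fs.file-max:", read_file_max())
```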

Troubleshooting monitoring

The first step in troubleshooting any monitoring issues is to scan the zenpython log for errors.

During monitoring, network connectivity issues may prevent the Get-Counter command from completing.
If you experience OperationTimeout errors, try decreasing the value of the zWinPerfmonInterval property to 30 seconds.

Other timeout issues on a domain could involve having a large Kerberos token. This could be caused by the user belonging to a large number of groups. See https://support.microsoft.com/en-us/kb/970875 for more information on the cause and resolution. Possible side effects of a large token include high CPU usage on the Windows server.

If you see a corrupt counters error event, this indicates that the specified counters have been corrupted on the Windows device. No data will be collected for the specified counters until the counters have been repaired on the device and zenpython has been restarted.

If you see the following error, check the zenhub log for errors:

Configuration for <device> unavailable -- is that the correct name?

If you see an event stating that a plugin was disabled due to blocking, see the PythonCollector ZenPack documentation for steps to remedy this.

Note: Starting with v2.7.0, this should not occur for Windows data source plugins.

Troubleshooting modeling/monitoring

Version 2.6.0 introduces a command line option to save modeling/monitoring results for troubleshooting. This option will save the results returned from a Windows server from a modeler or datasource plugin. This data can then be viewed/tested using unit tests to determine issues.