ECS basic metric data may be inconsistent with operating system (OS) metric data, mainly for the following reasons:

Different statistical frequencies: Metric charts show the average of the values collected during each measurement period. The statistical period of basic monitoring is one minute, whereas that of OS monitoring is 15 seconds. When metric data fluctuates heavily, peaks in basic metric data are lower than those in OS metric data, because averaging over the longer period smooths out the peaks.

Different statistical perspectives: The network traffic data in basic monitoring is billing data and does not include the unbilled network traffic between ECS and Server Load Balancer, whereas the network traffic statistics in OS monitoring record the actual network traffic of each network adapter. Therefore, the network data in OS monitoring can be greater than that in basic monitoring (that is, the agent-collected data can be greater than the actual purchased bandwidth or traffic quota).
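The effect of different statistical frequencies can be illustrated with a small sketch. The sample values below are hypothetical and the helper `window_averages` is not part of any CloudMonitor API; the code only shows how averaging the same data over 15-second and 1-minute periods produces different peaks.

```python
from statistics import mean

def window_averages(samples, window):
    """Average consecutive, non-overlapping windows of `window` samples."""
    return [mean(samples[i:i + window]) for i in range(0, len(samples), window)]

# Hypothetical CPU usage sampled once per second for one minute:
# a 15-second burst at 90%, then 45 seconds at 10%.
per_second = [90.0] * 15 + [10.0] * 45

# 15-second periods, as in OS monitoring: four data points.
os_points = window_averages(per_second, 15)

# One 1-minute period, as in basic monitoring: a single data point.
basic_points = window_averages(per_second, 60)

print("15-second averages:", os_points)
print("1-minute average:  ", basic_points)
```

The burst is clearly visible in the 15-second series, while the 1-minute average spreads it over the whole period, so the peak in the chart is lower.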

Agent-collected metrics

CPU metrics

You can refer to the Linux top command to understand the meaning of the metrics.

| Metric | Definition | Unit | Remarks |
| --- | --- | --- | --- |
| Host.cpu.idle | Percentage of CPU that is currently idle | % | Percentage of CPU time spent in the idle state. |
| Host.cpu.system | Percentage of CPU currently used by kernel space | % | This metric measures the consumption resulting from system context switching. A high value indicates that many processes or threads are running on the server. |
| Host.cpu.user | Percentage of CPU currently used by user space | % | CPU consumption by user processes. |
| Host.cpu.iowait | Percentage of CPU currently waiting for I/O operations | % | A relatively high value indicates frequent I/O operations. |
| Host.cpu.other | Other CPU usage percentage | % | Other consumption, calculated as nice + softirq + irq + stolen. |
| Host.cpu.totalUsed | Percentage of total CPU currently consumed | % | The sum of the CPU consumption above, usually used for alarm purposes. |
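As an illustration of how the CPU metrics relate to one another, the following sketch derives percentages from two snapshots of cumulative CPU time counters, such as the fields on the `cpu` line of `/proc/stat` on Linux. This is an assumption about how such values can be computed, not a description of the Agent's implementation. The function name, the sample counters, and the interpretation of `totalUsed` as everything except idle time are all hypothetical.

```python
# Counter names follow the first eight fields of the "cpu" line in /proc/stat.
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")

def cpu_percentages(prev, curr):
    """Convert two snapshots of cumulative CPU counters into percentages."""
    delta = {name: curr[name] - prev[name] for name in FIELDS}
    total = sum(delta.values())
    if total <= 0:
        raise ValueError("snapshots must be taken at different times")
    pct = {name: 100.0 * value / total for name, value in delta.items()}
    # "other" follows the remark in the table: nice + softirq + irq + stolen.
    other = pct["nice"] + pct["softirq"] + pct["irq"] + pct["steal"]
    return {
        "idle": pct["idle"],
        "system": pct["system"],
        "user": pct["user"],
        "iowait": pct["iowait"],
        "other": other,
        # Assumption: totalUsed is the sum of all non-idle categories.
        "totalUsed": pct["system"] + pct["user"] + pct["iowait"] + other,
    }

# Hypothetical snapshots (values in clock ticks).
prev = dict.fromkeys(FIELDS, 0)
curr = {"user": 50, "nice": 2, "system": 20, "idle": 100,
        "iowait": 20, "irq": 2, "softirq": 4, "steal": 2}

result = cpu_percentages(prev, curr)
print(result)
```

With these sample counters, idle and totalUsed add up to 100%, which is why totalUsed is convenient for alarm rules.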

Memory-related metrics

You can refer to the Linux free command to understand the meaning of the metrics.

| Metric | Definition | Unit | Description |
| --- | --- | --- | --- |
| Host.mem.total | Total memory | Bytes | Total server memory. |
| Host.mem.used | Amount of used memory | Bytes | Memory used by user programs + buffers + cache, where buffers is the amount of memory used for buffers and cache is the amount of memory used for the system cache. |
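The following sketch shows one way to read the two memory values on Linux. It assumes that used memory as described above (user programs + buffers + cache) corresponds to `MemTotal - MemFree` in `/proc/meminfo`; this is an interpretation, not a statement about the Agent's code. The function names and the sample text are hypothetical.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of values in bytes."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        fields = rest.split()
        # Only keep entries reported in kB (1 kB here means 1024 bytes).
        if len(fields) == 2 and fields[1] == "kB":
            values[name.strip()] = int(fields[0]) * 1024
    return values

def memory_metrics(text):
    """Return total and used memory, where used includes buffers and cache."""
    info = parse_meminfo(text)
    total = info["MemTotal"]
    used = total - info["MemFree"]
    return {"total": total, "used": used}

# Hypothetical excerpt of /proc/meminfo.
SAMPLE = """\
MemTotal:        8000000 kB
MemFree:         2000000 kB
Buffers:          500000 kB
Cached:          1500000 kB
"""

mem = memory_metrics(SAMPLE)
print(mem)
```

On a real server, you could pass the contents of `/proc/meminfo` instead of `SAMPLE`. Because buffers and cache are counted as used, this value is typically higher than the memory consumed by applications alone.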

System load metrics

You can refer to the Linux top command to understand the meaning of the metrics. The higher the value, the busier the system.

| Metric | Definition | Unit |
| --- | --- | --- |
| Host.load1 | Average system load over the past 1 minute. Not available on Windows. | None |
| Host.load5 | Average system load over the past 5 minutes. Not available on Windows. | None |
| Host.load15 | Average system load over the past 15 minutes. Not available on Windows. | None |

Disk-related metrics

For disk usage and inode usage, refer to the Linux df command.

For disk read/write metrics, refer to the Linux iostat command.

| Metric | Definition | Unit |
| --- | --- | --- |
| Host.diskusage.used | Used storage space on the disk | Bytes |
| Host.disk.utilization | Disk usage | % |
| Host.diskusage.free | Remaining storage space on the disk | Bytes |
| Host.diskusage.total | Total disk storage | Bytes |
| Host.disk.readbytes | Number of bytes read from the disk per second | Bytes/s |
| Host.disk.writebytes | Number of bytes written to the disk per second | Bytes/s |
| Host.disk.readiops | Number of read requests to the disk per second | Times/s |
| Host.disk.writeiops | Number of write requests to the disk per second | Times/s |
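To cross-check the disk space metrics on a server, you can compute comparable values yourself. The sketch below uses Python's standard `shutil.disk_usage`. The function name `disk_usage_metrics` and the mapping to metric names are illustrative assumptions. Note that df calculates its Use% relative to the space available to unprivileged users, so the percentage may differ slightly from this simple used/total ratio.

```python
import shutil

def disk_usage_metrics(path="/"):
    """Return total, used, and free bytes plus usage percentage for `path`."""
    usage = shutil.disk_usage(path)
    utilization = 100.0 * usage.used / usage.total if usage.total else 0.0
    return {
        "total": usage.total,        # compare with Host.diskusage.total
        "used": usage.used,          # compare with Host.diskusage.used
        "free": usage.free,          # compare with Host.diskusage.free
        "utilization": utilization,  # compare with the disk usage percentage
    }

metrics = disk_usage_metrics(".")
print(metrics)
```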

File system metrics

| Metric | Definition | Unit | Description |
| --- | --- | --- | --- |
| Host.fs.inode | Inode usage. UNIX/Linux systems use inode numbers to identify files. Even if the disk is not full, no new files can be created on it once all inodes have been allocated. Not available on Windows. | % | The number of inodes in use represents the number of files in the file system. A large number of small files can cause high inode usage. |
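Inode usage can be checked independently with the standard `os.statvfs` call on UNIX/Linux. The sketch below is a minimal illustration with hypothetical function names; it assumes usage is calculated as used inodes divided by total inodes, which may not match the Agent's exact formula. Some file systems report zero total inodes, in which case no percentage is returned.

```python
import os

def inode_usage_percent(total_inodes, free_inodes):
    """Return inode usage as a percentage, or None if the total is unknown."""
    if total_inodes == 0:
        return None
    return 100.0 * (total_inodes - free_inodes) / total_inodes

def inode_usage_for_path(path="/"):
    """Read inode counters for the file system containing `path` (UNIX only)."""
    st = os.statvfs(path)
    return inode_usage_percent(st.f_files, st.f_ffree)

# Hypothetical counters: 1000 inodes in total, 250 still free.
example = inode_usage_percent(1000, 250)
print(example)
```

Calling `inode_usage_for_path("/")` on a Linux server gives a value you can compare with the output of `df -i`.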

Network-related metrics

For network traffic metrics, refer to the Linux iftop command. For the collection of TCP connection counts, refer to the Linux ss command.

By default, statistics on the number of TCP connections are collected in three groups: TCP_TOTAL (total connections), ESTABLISHED (normally established connections), and NON_ESTABLISHED (connections not in the established state). If you want to obtain the number of connections in each state, follow the procedure for your operating system:

Linux

Set netstat.tcp.disable to false in the cloudmonitor/config/conf.properties configuration file to enable data collection. Restart the Agent after you modify the configuration.

Windows

Set netstat.tcp.disable to false in the configuration file under C:\”Program\Alibaba\cloudmonitor\config to enable data collection. Restart the Agent after you modify the configuration.
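If you manage many servers, the configuration change can be scripted. The following sketch is a hypothetical helper, not a CloudMonitor tool. It assumes the configuration file consists of simple `key=value` lines and only shows how the netstat.tcp.disable setting could be rewritten in the file's text. Reading and writing the actual file and restarting the Agent are left to you.

```python
def set_property(text, key, value):
    """Return properties-file text with `key` set to `value`.

    An existing `key=...` line is replaced; otherwise a new line is appended.
    Comment lines (starting with # or !) are left untouched.
    """
    lines = text.splitlines()
    found = False
    for i, line in enumerate(lines):
        stripped = line.strip()
        if stripped.startswith(("#", "!")) or "=" not in stripped:
            continue
        name = stripped.split("=", 1)[0].strip()
        if name == key:
            lines[i] = f"{key}={value}"
            found = True
    if not found:
        lines.append(f"{key}={value}")
    return "\n".join(lines) + "\n"

# Hypothetical file content before the change.
before = "# agent settings\nnetstat.tcp.disable=true\n"
after = set_property(before, "netstat.tcp.disable", "false")
print(after)
```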

| Metric | Definition | Unit |
| --- | --- | --- |
| Host.netin.rate | Number of bits received by the network adapter per second, that is, the inbound bandwidth of the network adapter. | bits/s |
| Host.netout.rate | Number of bits sent by the network adapter per second, that is, the outbound bandwidth of the network adapter. | bits/s |
| Host.netin.packages | Number of packets received by the network adapter per second. | packets/s |
| Host.netout.packages | Number of packets sent by the network adapter per second. | packets/s |
| Host.netin.errorpackage | Number of incoming error packets detected by the driver. | packets/s |
| Host.netout.errorpackages | Number of outgoing error packets detected by the driver. | packets/s |
| Host.tcpconnection | Number of TCP connections in various states, including LISTEN, SYN_SENT, ESTABLISHED, SYN_RECV, FIN_WAIT1, CLOSE_WAIT, FIN_WAIT2, LAST_ACK, TIME_WAIT, CLOSING, and CLOSED. | |
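To compare the TCP connection metric with what the operating system reports, you can count connection states from the output of `ss -tan` on Linux. The sketch below parses a hypothetical sample of that output and groups the result into the three default categories. The function names are illustrative. Note that ss abbreviates some state names (for example, ESTAB for ESTABLISHED) and uses hyphens instead of underscores.

```python
from collections import Counter

def count_tcp_states(ss_output):
    """Count connections per state from `ss -tan` output (header skipped)."""
    rows = [line.split() for line in ss_output.strip().splitlines()[1:]]
    return Counter(row[0] for row in rows if row)

def summarize(states):
    """Group per-state counts into the three default categories."""
    total = sum(states.values())
    established = states.get("ESTAB", 0)
    return {
        "TCP_TOTAL": total,
        "ESTABLISHED": established,
        "NON_ESTABLISHED": total - established,
    }

# Hypothetical `ss -tan` output.
SAMPLE = """\
State      Recv-Q Send-Q Local Address:Port  Peer Address:Port
LISTEN     0      128    0.0.0.0:22          0.0.0.0:*
ESTAB      0      0      10.0.0.5:22         10.0.0.9:51234
ESTAB      0      0      10.0.0.5:443        10.0.0.7:40112
TIME-WAIT  0      0      10.0.0.5:443        10.0.0.8:40200
"""

states = count_tcp_states(SAMPLE)
summary = summarize(states)
print(dict(states))
print(summary)
```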