Runtime Configuration

As the name implies, these statistics can help with verifying the runtime configuration of
a GemFire system:

The number of peer nodes (i.e. servers or peer accessors) in the system:
DistributionStats:nodes. This value should be the same for every node in the
system.

The number of clients and client connections for each server:
CacheServerStats: currentClients, and currentClientConnections

The number of data entries:

CachePerfStats:entries. Each region has its own CachePerfStats
instance per JVM named RegionStats-<region name>,
orRegionStats-partition-<region name> for partitioned regions. Its
entries statistic is the number of entries for that region in the JVM.

DiskRegionStatistics (a per region disk statistic category about the region’s
disk use): entriesInVM, andentriesOnlyOnDisk show the number of entries
in the JVM (which can also be on disk too), and the number of entries that are only on
disk, respectively.

Partitioned Region Configuration: One of the main parameters of Partitioned
Region (PR) configuration is the primary bucket distribution. To make sure that primary
buckets for a PR are evenly distributed, check
thePartitionedRegionStats.primaryBucketCount statistic for each partition. This
statistic shows the number of primary buckets in a partition.

Resources

The resources that are vital for normal operation and
performance are memory, file descriptors (most importantly sockets, then files), CPU,
network, and disk (when disk operations, such as overflow and persistence, are involved).
The following stats cover all those:

Memory: There are several stats categories that show memory usage, for different
types and granularity of memory.

Heap: VMMemoryUsageStats:vmHeapMemoryStats are all about heap usage,
as are the memory stats under VMStats:vmStats: freeMemory,
totalMemory, maxMemory.

Non-heap memory: VMMemoryUsageStats:vmNonHeapMemoryStats.

System-wide memory stats as reported by the OS: The OS statistic category
(e.g. LinuxSystemStats on Linux) includes various system level memory
statistics, such as freeMemory, which shows the free memory on the host (as
opposed to related to the JVM process), physicalMemory (total physical memory
on the host), paging related statistics (pagesSwappedIn,
pagesSwappedOut, unallocatedSwap).

Client and gateway queue sizes: while not actual resources, these queues may
be responsible for increased memory usage, so it’s good to keep them in mind when
investigating memory issues. The client queue stats are in
ClientSubscriptionStats category: eventsQueued, and
eventsRemoved. The difference between the two is the current queue size. The
gateway queue stats are in GatewaySenderStatistics category:
eventQueueSize is the size of the queue.

File Descriptors: file descriptor related statistics are captured in the category
VMStats: fdsOpen and fdLimit show the number of open file descriptors, and
the limit on file descriptors for the host, respectively.

CPU: The CPU usage is captured in OS statistic category, e.g.
LinuxSystemStats. The statistic cpuActive shows the percentage of the
total available CPU time that has been used in a non-idle state.

System load: OS statistic category (e.g. LinuxSystemStats) includes the
loadAverage1, loadAverage5, loadAverage15 statistics, which show
the average system load for 1, 5, and 15 minutes.

Disk: DiskDirStatistics:diskSpace shows the amount of disk space used for GemFire
disk storage on a given disk. Above mentioned entriesOnlyOnDisk, and
entriesInVM from DiskRegionStatistics are useful for determining the
distribution of data between memory and disk, for regions that use disk
overflow/persistence.

The following chart is an example of examining the vmHeapMemoryStats in
relation to entriesInVM statstic.

Throughput for Different Operations

There are several stat categories that capture the throughput for gemfire operations:
CachePerfStats (non-PR, and PR specific), andCacheServerStats, which capture
throughput statistics with respect to clients. Note that the PR specific instances of
CachePerfStats cover only the specific partitioned regions, while the
CachePerfStats instance includes aggregate stats for all non-PR regions.

CachePerfStats category includes the following stats (all measured in the number
of operations per second):

gets: the number of successful gets

puts: the number of times an entry has been added or replaced as a result of
a local operation (put, create, or get which results in a load, netsearch, or netload
of a value)

updates: the number of updates originating remotely

putalls: the number of putAll operations

destroys: the number of destroys

Function execution: FunctionService

Queries: queryExecutions: the number of query executions

Transactions: txCommits, txFailures, txRollbacks: the number of
successful, failed, and rolled back transactions, respectively

CacheServerStats category includes the following throughput stats for client
operations on the cache server:

getRequests, getResponses

getAllRequests, getAllResponses

putRequests, putResponses

putAllReuqests, putAllResponses

queryRequests, queryResponses

Disk operations: If any disk related statistic categories are present in VSD, it
means that there is disk activity (some entries are on disk). Presence of disk operations
may explain a drop in throughput, as disk use slows things down.

DiskStoreStatistics are statistics about a specific disk store’s use of disk.
In addition to write/read as those in DiskRegionStatistics, this category
includes queueSize statistic, which shows the current number of entries in the
asynchronous queue waiting to be flushed to disk.