Graphite and Grafana

Introduction

CPS system and application statistics and Key Performance Indicators (KPIs) are
collected by the system and can be displayed using a browser-based graphical
metrics tool. This chapter provides a high-level overview of the tools CPS uses
to collect and display these statistics.

The list of statistics available in CPS is consolidated in an Excel
spreadsheet. After CPS is installed, this spreadsheet can be found in the
following location on the Cluster Manager VM:

Graphite

collectd clients running on all CPS Virtual Machines (such as Policy Server
(QNS), Policy Director (LB), and sessionmgr) push data to the collectd master on
pcrfclient01. The collectd master node in turn forwards the collected data to
the Graphite database on pcrfclient01.
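
The data path above ends with values written into Graphite. As a rough
illustration of what one such value looks like, the sketch below formats, and
can optionally send, a datapoint in Graphite's plaintext protocol (one line of
"<metric path> <value> <timestamp>" per datapoint). It assumes, rather than
documents, that the collectd master writes to Graphite this way on carbon's
conventional plaintext port 2003; the metric path is hypothetical.

```python
import socket
import time

# Conventional plaintext listener port of Graphite's carbon daemon.
# Whether CPS uses this port and protocol is an assumption.
CARBON_PORT = 2003


def plaintext_line(path: str, value: float, timestamp: int) -> str:
    """Format one datapoint as '<metric path> <value> <unix timestamp>' plus a newline."""
    if not path or any(ch.isspace() for ch in path):
        raise ValueError("metric path must be non-empty and contain no whitespace")
    return f"{path} {value} {timestamp}\n"


def send_datapoint(host: str, path: str, value: float, port: int = CARBON_PORT) -> None:
    """Send a single datapoint, stamped with the current time, to carbon over TCP."""
    line = plaintext_line(path, value, int(time.time()))
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(line.encode("ascii"))


# Hypothetical metric path; real CPS metric names depend on the collectd setup.
print(plaintext_line("qns01.cpu.0.idle", 97.5, 1500000000), end="")
# qns01.cpu.0.idle 97.5 1500000000
```

Each datapoint is self-describing (path, value, time), which is why several
collectd clients can feed the same Graphite database without coordinating.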

The Graphite database stores system-related statistics, such as CPU usage,
memory usage, and Ethernet interface statistics, as well as application message
counters such as Gx, Gy, and Sp.

Figure 1. Graphite

Pcrfclient01 and pcrfclient02 collect and store these bulk statistics
independently.

As a best practice, always use the bulk statistics collected from pcrfclient01.
Use pcrfclient02 as a backup if pcrfclient01 fails.

If pcrfclient01 becomes unavailable, statistics are still gathered on
pcrfclient02. Because statistics data is not synchronized between pcrfclient01
and pcrfclient02, the statistics collected on pcrfclient01 contain a gap for the
period during which pcrfclient01 was down.
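
To locate such a gap, you can scan a series for missing intervals. The
following sketch assumes datapoints in the [value, timestamp] shape returned by
Graphite's render API with format=json, where an interval with no stored data
has a null value (None in Python); the example series is made up. The reported
time ranges are the periods for which pcrfclient02 would be the only source of
data.

```python
from typing import List, Optional, Tuple

# (value, unix timestamp) pairs in time order; a value of None means nothing
# was stored for that interval.
Datapoint = Tuple[Optional[float], int]


def find_gaps(points: List[Datapoint]) -> List[Tuple[int, int]]:
    """Return (first_timestamp, last_timestamp) for each run of missing values."""
    gaps: List[Tuple[int, int]] = []
    run_start: Optional[int] = None
    run_end: Optional[int] = None
    for value, ts in points:
        if value is None:
            if run_start is None:
                run_start = ts  # a new gap begins here
            run_end = ts        # extend the current gap
        elif run_start is not None and run_end is not None:
            gaps.append((run_start, run_end))  # data resumed: close the gap
            run_start = run_end = None
    if run_start is not None and run_end is not None:
        gaps.append((run_start, run_end))  # series ends inside a gap
    return gaps


# Hypothetical 60-second series from pcrfclient01 with an outage in the middle.
series = [(5.0, 0), (None, 60), (None, 120), (7.0, 180), (6.0, 240)]
print(find_gaps(series))  # [(60, 120)]
```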

Note

It is normal to see slight differences between the data on pcrfclient01 and
pcrfclient02. For example, pcrfclient01 generates a file at time t, while
pcrfclient02 generates a file at time t ± the clock drift between the two
machines.

Additional Grafana Documentation

This chapter provides information about the CPS implementation of Grafana. For
more information about Grafana, or to access the general Grafana documentation,
refer to: http://docs.grafana.org.

Configure Grafana Users using CLI

In CPS 7.0.5 and higher releases, users must be authenticated to access Grafana. No default users are provided, so you must add at least one user, as described in the following sections, before you can access Grafana.

The steps in these sections describe how to add and delete users who have view-only access to Grafana. To create or modify dashboards, refer to Grafana Administrative User.

After adding or deleting a Grafana user, manually copy the /var/broadhop/.htpasswd file from the pcrfclient01 VM to the pcrfclient02 VM.

Also, run /var/qps/bin/support/grafana_sync.sh to synchronize the information between the two OAM (pcrfclient) VMs.
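
The two follow-up steps above can be scripted. The sketch below is a
hypothetical helper, not a CPS-supplied tool: it assumes that pcrfclient01 can
reach the peer as pcrfclient02 over SSH and that scp is used for the copy. By
default it only prints the commands (a dry run); calling run(dry_run=False)
executes them.

```python
import subprocess
from typing import List

HTPASSWD = "/var/broadhop/.htpasswd"
SYNC_SCRIPT = "/var/qps/bin/support/grafana_sync.sh"


def sync_commands(peer: str = "pcrfclient02") -> List[List[str]]:
    """Commands to run on pcrfclient01 after adding or deleting a Grafana user.

    Copying with scp to a host reachable as 'pcrfclient02' is an assumption;
    any method that places the file at the same path on the peer will do.
    """
    return [
        ["scp", HTPASSWD, f"{peer}:{HTPASSWD}"],  # copy the user file to the peer
        [SYNC_SCRIPT],                            # then synchronize the two OAM VMs
    ]


def run(dry_run: bool = True) -> None:
    """Print the commands (dry run) or execute them, stopping at the first failure."""
    for cmd in sync_commands():
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run(dry_run=True)
```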

There is no method to change the password for a Grafana user; you can only add and delete users. The change_passwd.sh script cannot be used to change the password for Grafana users.

Log on to the pcrfclient01 VM to perform any of the following operations.

Change the Role of a Grafana User

The steps mentioned here can be performed only by an administrative user.

From the Main Org. drop-down list, select Users. This opens the Organization
users pane, where you can change the role of a user from the Role drop-down
list.

A user can have one of the following roles: Admin, Viewer, Editor, or Read Only
Editor.

Admin: An admin user can view, update, and create dashboards. The admin can
also edit and add data sources and organization users.

Viewer: A viewer can only view dashboards, and cannot save or create them.

Editor: An editor can view, update, and create dashboards.

Read Only Editor: This role behaves like the Viewer role, except that the user
can edit graphs and queries but cannot save dashboards. The Viewer role was
changed in Grafana 2.1 so that users assigned this role can no longer edit
panels.
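
For quick reference, the role descriptions above can be summarized as a
capability table. The sketch below encodes only what this section states; the
capability names are made up for illustration and do not correspond to
Grafana's internal permission model.

```python
# Capabilities per Grafana role, as described in this section. The capability
# names are illustrative only, not Grafana's internal permission model.
ROLE_CAPABILITIES = {
    "Admin": {"view_dashboards", "edit_panels", "save_dashboards",
              "create_dashboards", "manage_data_sources", "manage_org_users"},
    "Editor": {"view_dashboards", "edit_panels", "save_dashboards",
               "create_dashboards"},
    "Read Only Editor": {"view_dashboards", "edit_panels"},  # edits cannot be saved
    "Viewer": {"view_dashboards"},  # no panel editing since Grafana 2.1
}


def can(role: str, action: str) -> bool:
    """Return True if the role allows the action; unknown roles allow nothing."""
    return action in ROLE_CAPABILITIES.get(role, set())


print(can("Read Only Editor", "edit_panels"))      # True
print(can("Read Only Editor", "save_dashboards"))  # False
```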

Add an Organization

Grafana supports multiple organizations to accommodate a wide variety of
deployment models, including using a single Grafana instance to serve multiple,
potentially untrusted, organizations.

In many cases, Grafana is deployed with a single organization. Each
organization can have one or more data sources, and every dashboard is owned by
a particular organization.

Note

The steps mentioned here can be performed only by an administrative user.

Step 1

From the Main Org. drop-down list, select New Organization.
Figure 8. New Organization

Step 2

The Add Organization pane opens. Enter the organization name in the Org. name
field (for example, test).

Step 3

After adding the name, click Create to open the Organization pane.
Figure 9. Organization

In this pane, you can modify the organization name and other organization
information. After modifying the information, click Update to save the changes.

Move a Grafana User to Another Organization

Note

The steps mentioned here can be performed only by an administrative user.

Step 1

On the main page, click Grafana admin to open the System Info page.

Step 2

Click Global Users in the left pane to open the Users pane on the right.

Step 3

Click Edit next to the user you want to change.

Step 4

Under the Organizations section, you can add the user to other organizations.
Figure 10. Move User to another Organization

Step 5

In the Add organization field, enter the name of the new organization.

Step 6

You can also change the role of the user from the
Role drop-down list.

Step 7

After adding the required information, click Add to add the user to the new
organization.

Step 8

The example above shows that the user has been added to the new organization.
To remove the user from the previous organization, click the red cross at the
end of that organization's entry.

Configure Grafana for First Use

After an initial installation, or after upgrading an existing CPS deployment
that used Grafana, you must perform the steps in the following sections to
validate the existing data sources.

Manual Dashboard Configuration using Grafana

Grafana enables you to create custom dashboards that provide graphical
representations of data fetched from the Graphite database. Each dashboard is
made up of panels arranged across the screen in rows.

Note

CPS includes a series of preconfigured dashboard templates. To use these
dashboards, refer to Updating Imported Templates.

Create a New Dashboard Manually

Click Home at the top of the Grafana window and select New as shown below:

A blank dashboard is created.

Step 3

At the top of
the screen, click the gear icon, then click
Settings.

Step 4

Provide a name for the dashboard and configure any other dashboard settings.
When you have finished, click the X icon in the upper right corner to close the
settings screen.

Step 5

To add a graph to this dashboard, hover over the green box on the left side of
the dashboard, point to Add Panel, and then click Graph.

Configure Data Points for the Panel

Step 1

Click the panel title, as shown below, then select Edit.

Step 2

Select the necessary metrics by clicking the select metric option in the query
window. A drop-down list appears from which you can choose the required
metrics.

Continue clicking select metric until you reach the lowest level of the metric
hierarchy.

Note

Clicking the ‘*’ option in the drop-down list selects all the available
metrics at that level of the hierarchy.

Step 3

Click the ‘+’ tab to add aggregation functions for the selected metrics. The
monitoring graph is displayed as shown below.

Step 4

The x-axis and y-axis values can be configured in the
Axes & Grid tab.

Step 5

Click the disk icon (Save dashboard) at the top of the screen, as
shown in the following image.

Note

Your changes to this dashboard will be lost if you do not click the Save icon.

Graphical representations of application messages, such as CCR, CCA, Gx, Gy,
LDAP, and Rx messages, can be configured in a dashboard panel by using the
queries shown in the figure below.
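
The metric selection in Step 2, including the ‘*’ option, follows Graphite's
wildcard rules: a wildcard applies within a single dot-separated node of the
metric path, and only the lowest-level (leaf) nodes hold data, which is why
selection continues to the lowest level. The sketch below re-implements a
subset of this matching locally for illustration only; Graphite performs the
real matching when Grafana sends the query, and the metric names are
hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List


def graphite_match(pattern: str, path: str) -> bool:
    """Match a dotted metric path against a wildcard pattern, node by node.

    As in Graphite, a wildcard applies within one node, so '*' never spans a
    dot and the pattern must have the same number of nodes as the path. Only
    '*', '?' and '[...]' are handled; Graphite's '{a,b}' lists are not.
    """
    pattern_nodes, path_nodes = pattern.split("."), path.split(".")
    return len(pattern_nodes) == len(path_nodes) and all(
        fnmatchcase(node, pat) for pat, node in zip(pattern_nodes, path_nodes)
    )


def expand(pattern: str, paths: Iterable[str]) -> List[str]:
    """Return the metric paths selected by the pattern, in input order."""
    return [p for p in paths if graphite_match(pattern, p)]


# Hypothetical metric tree; real CPS metric names will differ.
metrics = ["qns01.cpu.user", "qns02.cpu.user", "qns01.memory.used", "lb01.cpu.user"]
print(expand("qns*.cpu.user", metrics))  # ['qns01.cpu.user', 'qns02.cpu.user']
print(expand("qns01.*", metrics))        # [] (stops above the lowest level)
print(expand("qns01.*.*", metrics))      # ['qns01.cpu.user', 'qns01.memory.used']
```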

Configure Useful Dashboard Panels

The following section describes the configuration of several useful dashboard
panels for monitoring the processing of Application Messages. Configure each
dashboard panel as shown in the screens below.

Total Error:

This dashboard panel lists the errors found during the processing of
Application Messages. To configure the Total Error dashboard panel, create a
panel named 'Total Error' and configure its query as shown:

Total Delay:

This dashboard panel displays the total delay in processing various Application
Messages. To configure the Total Delay dashboard panel, create a panel named
'Total Delay' and configure its query as shown:

Total TPS:

This panel displays the total transactions per second (TPS) of the CPS system.
The Total TPS count includes all Gx, Gy, Rx, Sy, LDAP, and other messages. The
panel can be configured as shown below:
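
Because the Total TPS count is a sum over interfaces, a panel like this would
typically combine per-interface series with Graphite's sumSeries() function.
The sketch below mimics sumSeries() point by point on series that are already
aligned to the same intervals: null values are ignored, and a point is null only
when every input is null. The per-interface values are made up, and whether the
CPS panel uses sumSeries() is an assumption.

```python
from typing import List, Optional, Sequence


def safe_sum(values: Sequence[Optional[float]]) -> Optional[float]:
    """Sum the non-null values; return None when every value is null."""
    present = [v for v in values if v is not None]
    return sum(present) if present else None


def sum_series(*series: Sequence[Optional[float]]) -> List[Optional[float]]:
    """Point-by-point sum of equally aligned series, in the style of sumSeries()."""
    return [safe_sum(column) for column in zip(*series)]


# Hypothetical per-interface TPS values for the same three intervals.
gx = [120.0, 130.0, None]
gy = [40.0, None, None]
rx = [10.0, 12.0, None]
print(sum_series(gx, gy, rx))  # [170.0, 142.0, None]
```

Skipping nulls keeps the total meaningful when one interface has no traffic in
an interval, instead of blanking the whole point.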

Updating Imported Templates

Some of the preconfigured templates (such as the diameter statistics panels)
have metrics configured that are specific to a particular set of diameter
realms. These panels need to be reconfigured to match customer-specific
diameter realms.

For example, the Gx PGW panel in the Diameter Statistics dashboard does not
fetch any statistics and displays the message “No Datapoints”. Probable reasons
are:

The query uses metrics specific to a particular diameter realm, which is
different on the customer setup.

No application call of that type has ever landed on the CPS Policy Directors
(LBs); that is, no diameter call from the PGW has landed on a Policy Director
since Graphite and Grafana were set up.
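
When the first reason applies, the realm part of each query must be changed to
the customer's realm. For a dashboard exported as JSON, a sketch of doing this
in bulk is shown below. It assumes that each Graphite query is stored in a
string field named target (as in Grafana's Graphite query JSON) and that the
realm appears in the metric path as plain text; the dashboard layout and realm
names are hypothetical. Review the result in Grafana before saving, because a
plain text replacement also changes any other occurrence of the old realm
string.

```python
from typing import Any


def retarget(node: Any, old_realm: str, new_realm: str) -> int:
    """Replace old_realm with new_realm in every 'target' string of a parsed
    dashboard JSON object, in place. Returns the number of targets changed."""
    changed = 0
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "target" and isinstance(value, str):
                if old_realm in value:
                    node[key] = value.replace(old_realm, new_realm)
                    changed += 1
            else:
                changed += retarget(value, old_realm, new_realm)
    elif isinstance(node, list):
        for item in node:
            changed += retarget(item, old_realm, new_realm)
    return changed


# Hypothetical exported dashboard with one Graphite target.
dashboard = {
    "rows": [{"panels": [{"title": "Gx PGW",
                          "targets": [{"target": "sumSeries(*.diameter.lab_realm.gx_ccr)"}]}]}]
}
print(retarget(dashboard, "lab_realm", "customer_realm"))  # 1
```

Walking the whole structure, rather than a fixed path, keeps the sketch
independent of how a given Grafana release nests rows and panels.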

Copy Dashboards and Users to pcrfclient02

As a best practice, keep the internal Grafana database in sync between
pcrfclient01 and pcrfclient02. Perform this sync operation after any dashboard
or Grafana user is migrated, updated, added, or removed.

Under normal operating conditions, all Grafana operations occur from
pcrfclient01. In the event of a pcrfclient01 failure, pcrfclient02 is used as
backup, so keeping the database in sync provides a seamless user experience
during a failover.

The following steps copy all configured Grafana dashboards, Grafana
data sources, and Grafana users configured on pcrfclient01 to pcrfclient02.

Log in to the pcrfclient01 VM and run the following command:

/var/qps/bin/support/grafana_sync.sh

As a precaution, the existing database on pcrfclient02 is saved as a
backup in the
/var/lib/grafana directory.

Configure Garbage Collector KPIs

The following sections
describe the steps to configure Garbage Collector (GC) KPIs in Grafana:

Backend changes: Change the collectd configuration so that GC-related KPIs are
collected by collectd and stored in the Graphite database.

where <hostname> is a regular expression for the names of the hosts from which
KPIs need to be reported.

If this is a CPS All-in-One (AIO) deployment, the hostname is “lab”.

If this is a High Availability (HA) CPS deployment, KPIs need to be reported
from all Policy Server (QNS) VMs.

Assuming the Policy Server (QNS) VMs have “qns” in their hostnames, the
expression would be *qns*. This reports data for all VMs whose hostname
contains “qns” (qns01, qns02, and so on).
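
Note that *qns* is written in wildcard (glob) syntax, where * matches any run
of characters. Read strictly as a regular expression it is not valid, because a
leading * has nothing to repeat; the regular-expression equivalent is .*qns.*
matched against the whole hostname. The sketch below, using hypothetical HA
hostnames, shows that both forms select the same hosts.

```python
import re
from fnmatch import fnmatchcase

# Hypothetical hostnames in an HA deployment.
hosts = ["qns01", "qns02", "qns03", "lb01", "sessionmgr01"]

# Wildcard (glob) form, as written in this section.
glob_matches = [h for h in hosts if fnmatchcase(h, "*qns*")]

# Equivalent regular expression, matched against the whole hostname.
regex_matches = [h for h in hosts if re.fullmatch(r".*qns.*", h)]

print(glob_matches)   # ['qns01', 'qns02', 'qns03']
print(regex_matches)  # ['qns01', 'qns02', 'qns03']

# The glob text itself is rejected by Python's regular-expression engine.
try:
    re.compile("*qns*")
except re.error as exc:
    print("not a valid regular expression:", exc)
```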

AIO Setup

Figure 11. On AIO Setup

HA Setup

Figure 12. On HA Setup

An example statistics graph is shown below.

Figure 13. Example Graph

Step 3

Save the dashboard by clicking the Save icon.

Export and Import Dashboards

Existing dashboard templates can be exported and imported between
environments. This is useful for sharing Grafana dashboards with others.

Session Consumption Report

Introduction

This feature generates a session consumption report and stores the data in a
separate log. The total number of sessions allowed by the license, the total
number of active sessions, and the total transactions per second (TPS) are
recorded in the log at regular intervals. The core license number, which is the
total number of sessions allowed by the license, is derived from the license
file. The active session count and the transaction count are retrieved through
a Graphite query. Each log entry records the current timestamp together with
the statistics values.

Data Collection

The session and TPS counts are collected from the Graphite API as a JSON
response. The JSON response is parsed to extract each counter, which is then
written to the consolidated log. The sample URL and the JSON response are given
below:
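
The following sketch is not the CPS sample. It assumes a response in the
general shape returned by Graphite's render API with format=json: a list of
series, each with a target name and datapoints given as [value, timestamp]
pairs. It returns the most recent non-null value, because the newest interval
can still be null if it has not been written yet. The target name in the
example is hypothetical.

```python
import json
from typing import Optional


def latest_value(render_json: str, target: Optional[str] = None) -> Optional[float]:
    """Return the most recent non-null value from a Graphite render API
    response (format=json), optionally restricted to one target name."""
    for series in json.loads(render_json):
        if target is not None and series["target"] != target:
            continue
        for value, _timestamp in reversed(series["datapoints"]):
            if value is not None:
                return value
    return None


# Hypothetical response for one counter; the newest point is still null.
sample = ('[{"target": "active_sessions", "datapoints": '
          '[[1200.0, 1500000000], [1250.0, 1500000060], [null, 1500000120]]}]')
print(latest_value(sample))  # 1250.0
```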

Logging

Data is logged using the logback mechanism. The consolidated data is stored in
a separate log file named consolidated-sessions.log in the /var/log/broadhop
directory, along with the other logs. Data entries are appended to the log
every 90 seconds. Each entry is detailed and contains the counter name and its
current value, with a timestamp.

Performance

The codebase pulls the JSON response from the Graphite API, which adds an
average overhead of 350 ms.

Log Rotation

A log rotation policy is applied to the logs generated for the Session
Consumption Report. Each log file is limited to 100 MB, and at most five log
files are kept. The logs are rotated when these limits are reached. One file
holds a little more than two years of data, so five files can hold about 10
years of data before the first file is replaced.
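
The retention figures can be cross-checked with simple arithmetic. The sketch
below assumes 365-day years and reads 100 MB as 100 MiB; the per-entry size it
prints is implied by the stated figures rather than documented, and it is an
upper bound because a file holds slightly more than two years of entries.

```python
# Figures from the Log Rotation description; 365-day years are a simplification.
INTERVAL_S = 90                        # one entry appended every 90 seconds
FILE_LIMIT_BYTES = 100 * 1024 * 1024   # 100 MB per file, read here as MiB (assumption)
MAX_FILES = 5
YEARS_PER_FILE = 2                     # "a little more than two years" per file

entries_per_year = 365 * 24 * 3600 // INTERVAL_S
entries_per_file = YEARS_PER_FILE * entries_per_year
bytes_per_entry = FILE_LIMIT_BYTES / entries_per_file  # implied upper bound

print(entries_per_year)             # 350400
print(round(bytes_per_entry))       # 150
print(MAX_FILES * YEARS_PER_FILE)   # 10 (years kept before the oldest file is replaced)
```

In other words, the two-years-per-file figure holds as long as an average log
entry stays under roughly 150 bytes.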