Strategies for scaling the world’s biggest networks

By Joe Kim

Aug 25, 2016

It can be truly astounding to think about the scale of today’s largest government networks – and they’re only going to get larger and more complex.

For a public sector IT pro, it may seem like an impossible challenge to manage this growing behemoth -- an ever-increasing number of network devices, servers and applications, all of which have less and less leeway for downtime, hiccups or problems of any sort.

Challenge? Yes. Impossible? No. Government IT pros can employ a range of strategies to support network growth and scalability while ensuring all architectural and infrastructural requirements are met and system failover scenarios are accounted for.

As the IT environment expands, it becomes more important that its monitoring and management system can scale alongside this growth. But how can IT managers know whether their current system can scale to handle the agency’s growth?

First, most monitoring systems are built with the following elements, each of which has its own scaling requirements and challenges:

A server that hosts the monitoring product and polls for status and performance.

A database where the polled information is stored for historical data access and reporting.

A web console for software management, data visualization and reporting.

Within this environment, three primary variables will affect a system’s scalability:

Infrastructure size: The number of monitored elements (where an element is defined as a single, identifiable node, interface or volume) or the number of servers and applications that can be monitored.

Polling frequency: The interval at which the monitoring system polls for information. For example, collecting statistics every few seconds instead of every minute makes the system work harder and increases its resource requirements.

Concurrent users: The number of simultaneous users accessing the monitoring system.
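
To make the effect of the first two variables concrete, here is a minimal sketch of how polling load grows with infrastructure size and polling frequency. The function name and the sample figures are illustrative assumptions, not measurements from any particular product, and the sketch does not model the load that concurrent users place on the web console.

```python
def polls_per_minute(elements: int, poll_interval_seconds: float) -> float:
    """Estimate how many polls the system issues each minute.

    Assumes every monitored element (node, interface or volume) is polled
    once per interval. Real products often use different intervals for
    status and statistics, so treat this as a rough estimate only.
    """
    if poll_interval_seconds <= 0:
        raise ValueError("poll interval must be positive")
    return elements * 60.0 / poll_interval_seconds


# Hypothetical agency with 10,000 monitored elements.
print(polls_per_minute(10_000, 60))  # 10000.0
print(polls_per_minute(10_000, 5))   # 120000.0
```

In this hypothetical case, moving from a one-minute interval to a five-second interval multiplies the polling work by 12 without adding a single device.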

Those are the basics for assessing whether a monitoring system can scale. Now, let’s move on to ways to manage that environment.

As the infrastructure grows, a command center for monitoring enterprisewide network health is critical. A single interactive screen that aggregates data from multiple locations simplifies the management of large, distributed networks by providing a unified view into the performance of the network and accelerating issue identification and resolution.

A command center is particularly well suited to agencies with multiple regions or sites where the number of nodes to be monitored in each region warrants localized data collection and storage. It works well for regional teams that are responsible for their own environments and need autonomy over their monitoring platform. Yet while the systems are segregated between regions, all data can still be accessed from the centrally located console.
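
The idea of a central console over segregated regional systems can be sketched in a few lines. The example below assumes each regional system reports simple status counts; the region names, the status labels and both function names are hypothetical and do not describe any vendor’s data model.

```python
from collections import Counter

# Hypothetical status counts reported by three regional monitoring systems.
regional_reports = {
    "east": {"up": 480, "warning": 15, "down": 5},
    "central": {"up": 310, "warning": 4, "down": 0},
    "west": {"up": 620, "warning": 22, "down": 8},
}


def unified_view(reports):
    """Sum the per-region status counts into one enterprisewide total."""
    total = Counter()
    for counts in reports.values():
        total.update(counts)
    return dict(total)


def regions_with_outages(reports):
    """List the regions that report at least one element down."""
    return sorted(
        region for region, counts in reports.items()
        if counts.get("down", 0) > 0
    )


print(unified_view(regional_reports))
print(regions_with_outages(regional_reports))  # ['east', 'west']
```

Each region keeps collecting and storing its own data; the central view only reads and combines the summaries, which is what lets regional teams retain autonomy.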

Additional scalability tips

There are several additional strategies that will help manage an agency’s growing infrastructure:

Add polling engines: Distributing the polling load for the monitoring system among multiple servers will provide scalability for large networks. Additional polling engines will also reduce the impact on the monitoring system’s core poller performance if the network grows rapidly.

Add web servers: Additional web servers can help support increasing numbers of concurrent monitoring sessions. This will ensure that more users have uninterrupted web access to network monitoring software.

Add a failover server: To ensure the monitoring system is always available, install a failover mechanism that will switch monitoring system operation to a secondary server if the primary server should fail.
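
The three tips above can be illustrated with a short sketch. It assumes a simple least-loaded assignment for polling engines, a fixed session capacity per web server and a priority-ordered list for failover. All names and numbers are hypothetical, and commercial monitoring products handle these tasks with their own mechanisms.

```python
import heapq
import math


def distribute_elements(element_counts, engines):
    """Assign each site to the polling engine with the least load so far.

    element_counts: mapping of site name -> number of monitored elements
    engines: list of polling engine names
    """
    heap = [(0, name) for name in engines]
    heapq.heapify(heap)
    assignment = {name: [] for name in engines}
    # Place the largest sites first so the totals even out.
    for site, count in sorted(element_counts.items(), key=lambda kv: -kv[1]):
        load, name = heapq.heappop(heap)
        assignment[name].append(site)
        heapq.heappush(heap, (load + count, name))
    return assignment


def web_servers_needed(concurrent_users, sessions_per_server):
    """Round up so every concurrent session has a server to land on."""
    return math.ceil(concurrent_users / sessions_per_server)


def pick_active(servers, is_healthy):
    """Return the first healthy server in priority order, or None."""
    for server in servers:
        if is_healthy(server):
            return server
    return None


# Hypothetical sites and element counts.
sites = {"hq": 4000, "region-a": 2500, "region-b": 2000, "region-c": 1500}
print(distribute_elements(sites, ["poller-1", "poller-2"]))
# {'poller-1': ['hq', 'region-c'], 'poller-2': ['region-a', 'region-b']}

print(web_servers_needed(130, 50))  # 3

# Simulate a failed primary: monitoring moves to the secondary server.
print(pick_active(["primary", "secondary"], lambda s: s != "primary"))
# secondary
```

In practice the health check would be a real probe of the primary server rather than a lambda, and the session capacity would come from load testing rather than a guess.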

Agency networks will certainly get larger -- it’s the nature of an increasingly technology-driven government. While that growth may seem overwhelming, implementing these few tactics will help IT managers embrace it and ultimately realize its value.

About the Author

Joe Kim is executive vice president engineering and global CTO at SolarWinds.