Resolving disk I/O bottlenecks

With the high-speed disks available today, a system’s hard
disks are rarely the primary reason for a bottleneck. It is more
likely that a system is having to do a lot of disk reads and writes
because there isn’t enough physical memory available and the system
has to page to disk. Because reading from and writing to disk is
much slower than reading and writing memory, excessive paging can
degrade the server’s overall performance. To reduce the amount of
disk activity, you want the system to manage memory as
efficiently as possible and page to disk only when necessary.

That said, you can do several things with a system’s hard
disks to improve performance. If the system has faster drives than the
ones used for the paging file, you might consider moving the paging
file to those disks. If the system has one or more drives that are
doing most of the work and other drives that are mostly idle, you
might be able to improve performance by balancing the load across
the drives more efficiently.

To help you better gauge disk I/O activity, use the following counters:

PhysicalDisk\%Disk Time
Records the percentage of time the physical disk
is busy. Track this value for all hard disk drives on the system in conjunction with
Processor\%Processor Time and Network Interface\Bytes
Total/Sec. If the %Disk Time value is high and the processor and
network connection values aren’t high, the system’s hard disk
drives might be creating a bottleneck. You might be able to
improve performance by balancing the load across the drives more
efficiently or by adding drives and configuring the system so
that they are used.
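
The comparison described for PhysicalDisk\%Disk Time can be sketched as a small check. This is only an illustration: the names CounterSample and disk_is_likely_bottleneck are hypothetical, and the threshold values are assumptions, because the text does not define what counts as "high."

```python
from dataclasses import dataclass


@dataclass
class CounterSample:
    disk_time_pct: float          # PhysicalDisk\%Disk Time
    processor_time_pct: float     # Processor\%Processor Time
    network_bytes_per_sec: float  # Network Interface\Bytes Total/Sec


def disk_is_likely_bottleneck(
    sample: CounterSample,
    high_disk_pct: float = 80.0,               # assumed threshold
    high_processor_pct: float = 80.0,          # assumed threshold
    high_network_bytes: float = 100_000_000.0, # assumed threshold
) -> bool:
    """Return True when disk time is high but processor and network are not."""
    disk_high = sample.disk_time_pct >= high_disk_pct
    processor_high = sample.processor_time_pct >= high_processor_pct
    network_high = sample.network_bytes_per_sec >= high_network_bytes
    return disk_high and not processor_high and not network_high


if __name__ == "__main__":
    sample = CounterSample(95.0, 20.0, 1_000_000.0)
    print(disk_is_likely_bottleneck(sample))
```

Tune the thresholds against a baseline for the server being monitored rather than relying on the placeholder defaults.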

Note

Redundant array of independent disks (RAID) devices can
cause the PhysicalDisk\%Disk Time value to exceed 100 percent.
For this reason, don’t rely on PhysicalDisk\%Disk Time for
RAID devices. Instead, use PhysicalDisk\Current Disk Queue Length.

PhysicalDisk\Current Disk Queue Length
Records the number of system requests that are
waiting for disk access. A high value indicates that the disk
waits are affecting system performance. In general, you want
there to be very few waiting requests.

Note

Evaluate physical disk queue lengths relative to the number
of physical disks on the system. A useful measure is the length
of the queue minus the number of drives. For example, if a
system has two drives and there are 6 waiting requests (4 more
than the number of drives), that could be considered a
proportionally large number of queued requests; but if a system
has eight drives and there are 10 waiting requests (2 more than
the number of drives), that is considered a proportionally small
number of queued requests.
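
The arithmetic in this note can be worked through in a few lines. The function names are hypothetical, and dividing the excess by the number of drives is an assumption added here to make the "proportionally large" versus "proportionally small" comparison concrete.

```python
def excess_queue_length(queue_length: int, drive_count: int) -> int:
    """Queue length minus the number of drives, as described in the note."""
    if drive_count < 1:
        raise ValueError("drive_count must be at least 1")
    return queue_length - drive_count


def excess_per_drive(queue_length: int, drive_count: int) -> float:
    """Excess queued requests per drive (an assumed way to compare systems)."""
    return excess_queue_length(queue_length, drive_count) / drive_count


if __name__ == "__main__":
    # Two drives with 6 waiting requests: excess of 4, or 2.0 per drive.
    print(excess_queue_length(6, 2), excess_per_drive(6, 2))
    # Eight drives with 10 waiting requests: excess of 2, or 0.25 per drive.
    print(excess_queue_length(10, 8), excess_per_drive(10, 8))
```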

PhysicalDisk\Avg. Disk Write Queue Length
Records the average number of write requests that were
waiting to be processed during the sample interval.

PhysicalDisk\Avg. Disk Read Queue Length
Records the average number of read requests that were
waiting to be processed during the sample interval.

PhysicalDisk\Disk Writes/Sec
Records the number of disk writes per second. It
is an indicator of how much disk I/O activity there is. By tracking the number of
writes per second and the size of the write queue, you can
determine how write operations are affecting disk performance.
If lots of write operations are queuing and you are using RAID
5, it could be an indicator that you would get better
performance by using RAID 1. Remember that by using RAID 5 you
typically get better read performance than with RAID 1. So,
there’s a tradeoff to be made by using either RAID
configuration.

PhysicalDisk\Disk Reads/Sec
Records the number of disk reads per second. It
is an indicator of how much disk I/O activity there is. By tracking the number
of reads per second and the size of the read queue, you can
determine how read operations are affecting disk performance. If
lots of read operations are queuing and you are using RAID 1, it
could be an indicator that you would get better performance by
using RAID 5. Remember that by using RAID 1 you typically get
better write performance than with RAID 5. So, as mentioned, there’s
a tradeoff to be made by using either RAID configuration.

Resolving network bottlenecks

The network that connects your computers is critically
important. Its responsiveness, or lack thereof, weighs heavily on the
way users perceive the responsiveness of their computers and any
computers to which they connect. It doesn’t matter how fast their
computers are or how fast your servers are. If there’s a big delay
(and big network delays are measured in tens of milliseconds)
between when a request is made and the time it’s received, users
might think systems are slow or nonresponsive.

Unfortunately, in most cases, the delay (latency) users experience is beyond your control. It’s
a function of the type of connection the user has and the route the
request takes to your server. The total capacity of your server to handle requests and the
amount of bandwidth available to your servers are factors you can
control, however. Network capacity is a function of the network
cards and interfaces configured on the servers. Network bandwidth availability is a function of your
organization’s network infrastructure and how much traffic is on it
when a request is made.

Counters you can use to check network activity and look for bottlenecks include the following:

Network Interface\Bytes Total/Sec
Records the rate at which bytes are sent and
received over a network adapter. Track this value separately for
each network adapter configured on the system. If the Bytes
Total/Sec for a particular adapter is substantially lower than
what you’d expect given the speed of the network and the speed
of the network card, you might want to check the network card
configuration. Check to see whether the adapter is set for
half duplex or full duplex. In most cases, you’ll want to use
full duplex.

Network Interface\Current Bandwidth
Estimates the current bandwidth for the selected
network adapter in bits per second. Track this value separately
for each network adapter configured on the system. Most servers
use 100-Mbps, 1-Gbps, or 10-Gbps network cards, which can be
configured in many ways. Someone might have configured a 1-Gbps
card for 100 megabits per second (Mbps). If that is the case,
the current bandwidth might be off by a factor of 10.
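
To relate Bytes Total/Sec to Current Bandwidth, convert bytes to bits and divide by the reported bandwidth. The following sketch does so with hypothetical function names. It also shows the factor-of-10 gap for a 1-Gbps card configured for 100 Mbps. The sample traffic rate is made up for illustration.

```python
def utilization_pct(bytes_total_per_sec: float, current_bandwidth_bps: float) -> float:
    """Percentage of the reported bandwidth consumed by the measured traffic."""
    if current_bandwidth_bps <= 0:
        raise ValueError("current_bandwidth_bps must be positive")
    # Bytes Total/Sec is in bytes; Current Bandwidth is in bits per second.
    return bytes_total_per_sec * 8 * 100 / current_bandwidth_bps


def configured_speed_gap(rated_bps: float, current_bandwidth_bps: float) -> float:
    """How many times lower the configured speed is than the card's rated speed."""
    if current_bandwidth_bps <= 0:
        raise ValueError("current_bandwidth_bps must be positive")
    return rated_bps / current_bandwidth_bps


if __name__ == "__main__":
    traffic = 12_500_000  # illustrative sample: 12.5 million bytes per second
    print(utilization_pct(traffic, 1_000_000_000))             # on a 1-Gbps link
    print(utilization_pct(traffic, 100_000_000))               # on a 100-Mbps link
    print(configured_speed_gap(1_000_000_000, 100_000_000))    # 1-Gbps card set to 100 Mbps
```

The same traffic that uses a small share of a 1-Gbps link saturates the adapter when it is configured for 100 Mbps, which is why a mismatch is worth checking first.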

Network Interface\Bytes Received/Sec
Records the rate at which bytes are received over
a network adapter. Track this value separately for each network
adapter configured on the system.

Network Interface\Bytes Sent/Sec
Records the rate at which bytes are sent over a
network adapter. Track this value separately for each network
adapter configured on the system.

TROUBLESHOOTING: Compare network activity to disk time and processor time

Evaluate the network counter values in conjunction with
PhysicalDisk\%Disk Time and Processor\%Processor Time. If the disk
time and processor time values are low but the network values are
very high, a capacity problem might exist. Solve the problem by
optimizing the network card settings or by adding a network
card.

You might be able to improve network performance by installing
multiple network adapters and teaming the network
cards. You configure NIC teaming using Server Manager by selecting Local
Server in the left pane and then tapping or clicking the link
provided for NIC teaming. You can then create and configure NIC
teams.

INSIDE OUT: NIC teaming

NIC teaming allows multiple network adapters to have their bandwidth aggregated
for the purposes of load balancing and failover protection. Windows Server 2012 supports up
to 32 network adapters aggregated into a team. In turn, these
aggregated adapters then present one or more virtual adapters,
referred to as team network adapters, to the
operating system. Each team network adapter organizes network
traffic by virtual LAN (VLAN), allowing applications to
simultaneously connect to different VLANs.

When you are configuring NIC teaming, you can tap or click Additional
Properties to configure the teaming mode, load-balancing mode, and
standby-adapter mode. By default, team network adapters use the
switch independent team mode, which doesn’t require the network
switch to participate in the teaming, and this allows the team
network adapters to be connected to different switches.
Alternatively, you can configure either of the following:

Static teaming as the teaming mode, which requires you
to configure the switch and the server to work with NIC
teaming. Here, you typically use a server-class switch and identify which links
form the team. Because there is no error detection and
correction, you must be certain the network cables are
properly connected.

Link Aggregation Control Protocol (LACP) as the teaming
mode, which uses Institute of Electrical and Electronics
Engineers (IEEE) 802.1ax LACP to automatically create the NIC
team by dynamically identifying links between the server and
the switch. Here, you typically use a server-class switch and
enable LACP on the appropriate switch ports.

If you have a server-class switch, the switch likely
supports IEEE 802.1ax (also referred to as IEEE 802.3ad) and
you can gain some additional performance benefits by having the
switch participate in the teaming.

For load balancing, the default mode (Address Hash) creates
a simple hash for packets and then assigns packets that have a
particular hash to one of the available team network adapters.
This can help to balance the workload across the team network
adapters. Alternatively, if a server has virtual machines, you can
use the MAC address of each virtual machine to determine how
traffic is balanced. Load balancing by MAC address works best when
virtual machines have similar workloads. Keep in mind that
failover between network adapters in a virtual
machine might result in traffic being sent with the MAC address of
a different network adapter. To prevent this traffic from being
blocked automatically, the virtual machine’s network adapter must
be set to allow MAC spoofing or must have the AllowTeaming
parameter set to On by using the Set-VmNetworkAdapter cmdlet in
Windows PowerShell.
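
The idea behind the Address Hash mode can be illustrated with a short sketch: hash the address information for a packet, and then map the hash to one of the team members. The function name is hypothetical, and CRC32 over the source and destination addresses is a stand-in chosen for illustration, not the hash that Windows Server actually uses.

```python
import zlib


def pick_team_member(src_addr: str, dst_addr: str, team_size: int) -> int:
    """Map a packet's addresses to the index of a team network adapter."""
    if team_size < 1:
        raise ValueError("team_size must be at least 1")
    key = f"{src_addr}|{dst_addr}".encode("utf-8")
    # CRC32 is only an illustrative hash; any stable hash would do here.
    return zlib.crc32(key) % team_size


if __name__ == "__main__":
    flows = [
        ("10.0.0.5", "10.0.1.20"),
        ("10.0.0.6", "10.0.1.20"),
        ("10.0.0.7", "10.0.1.21"),
    ]
    for src, dst in flows:
        print(src, dst, "-> adapter", pick_team_member(src, dst, team_size=4))
```

Because the hash is stable, packets with the same addresses always land on the same adapter, while different address pairs tend to spread across the team.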

Finally, the Standby Adapter setting allows you to specify
whether all network adapters are active. Typically, for optimal
performance, you’ll want all network adapters in a
team to be active. However, you can designate one or more network
adapters in a team as standby adapters. As the name
suggests, a standby adapter is inactive until
an active adapter fails, at which point it is activated as part of
failover. Keep in mind that, technically, you can
place a single network adapter in a team. However, you need two or
more network adapters for fault protection through
failover.
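
The standby behavior described above can be modeled in a few lines. This is a conceptual sketch only: the NicTeam class and the adapter names are hypothetical and do not correspond to any Windows API.

```python
class NicTeam:
    """Toy model of a NIC team with active and standby adapters."""

    def __init__(self, active, standby=()):
        self.active = list(active)
        self.standby = list(standby)

    def fail(self, adapter: str) -> None:
        """Mark an adapter as failed and promote a standby adapter if needed."""
        if adapter in self.standby:
            # A failed standby adapter simply leaves the pool.
            self.standby.remove(adapter)
            return
        self.active.remove(adapter)  # raises ValueError for unknown adapters
        if self.standby:
            # Failover: the first standby adapter becomes active.
            self.active.append(self.standby.pop(0))

    def has_connectivity(self) -> bool:
        return bool(self.active)


if __name__ == "__main__":
    team = NicTeam(active=["NIC1", "NIC2"], standby=["NIC3"])
    team.fail("NIC1")
    print(team.active, team.standby)

    single = NicTeam(active=["NIC1"])
    single.fail("NIC1")
    print(single.has_connectivity())
```

The single-adapter team loses connectivity as soon as its only adapter fails, which is why two or more adapters are needed for fault protection.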