Take Away:

Skype for Business (S4B) and Lync clients may experience problems when traversing a split-tunnel VPN. Use the Name Resolution Policy Table (NRPT) and Windows Firewall group policy objects (GPOs) so that these clients bypass the split-tunnel VPN. This solution is easy to administer and gives remote offices the best multimedia experience.

STUN identifies clients that sit behind Network Address Translation (NAT) (i.e., private IPs). The same process also identifies the public side of the default gateway (i.e., public IP). Multimedia travels directly between end-points when STUN is used. S4B/Lync clients prefer to communicate directly (i.e., peer-to-peer) with clients that reside on the same LAN. N.B., LAN does not refer to a broadcast domain here. LAN, in this situation, includes all internal networks (i.e., subnets) with routes to the Front-End subnet. Internal clients never use the Access Edge server for internal communication.

Similarly, external clients prefer STUN for communicating multimedia content to other external peers. The Access Edge server will only bridge external-to-external clients (i.e., TURN) if peer-to-peer communication is not possible.

Lync clients use the TURN framework when end-points do not share a common LAN. The TURN process creates dynamic ports on the Access Edge server and, in turn (pun intended), proxies external multimedia. TURN is similar to Port Address Translation (PAT), just as the Access Edge server is similar to an Internet gateway.

Split-Tunnel Problems:

The ICE framework generally provides the best multimedia experience. However, it does not work well over split-tunnel VPNs because they create STUN and TURN mismatches. For example, the DCA branch office firewall forwards all domain traffic to the JFK primary office; all other traffic goes out the local gateway (i.e., Internet). DCA and external Lync clients interpret this topology differently (Table 1).

The primary problem with split-tunnel VPNs is how the S4B/Lync client interprets the topology. Recall that internal clients always use the Access Edge server for external communication. Likewise, internal clients never use the Access Edge server for internal conversations. The VPN firewall forwards all domain traffic to the JFK network. Therefore, DCA clients consider themselves internal and consider external clients external. DCA clients will only use the Access Edge server when communicating with external clients.

External clients have an entirely different interpretation of the topology. External clients are aware of the DCA Internet gateway, but they remain unaware of its split-tunneling. External clients will therefore interpret DCA clients as external peers; multimedia traffic is sent directly to the DCA clients (i.e., STUN).

To recap, external clients are unaware of the DCA split-tunnel. These external clients attempt to send audio and video (AV) directly to the DCA clients, and expect to receive AV directly from them. DCA clients, by contrast, send AV to the Access Edge server and expect to receive AV proxied from it.

Figure 2. Lync directional mismatch.
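
The mismatch can be expressed as a tiny model. This is only a sketch of the rules described above (internal-to-internal and external-to-external calls go direct; internal-to-external calls go through the Access Edge server). The function name and the two "views" are hypothetical and are not part of any Lync API.

```python
def expected_path(self_internal: bool, peer_internal: bool) -> str:
    """Media path a client expects, based on how it classifies itself and its peer."""
    if self_internal != peer_internal:
        # One side internal, the other external: proxied by the Edge (TURN).
        return "edge-relay"
    # Both internal, or both external: peer-to-peer (STUN or plain direct).
    return "direct"


# The DCA client believes it is internal and that the remote peer is external.
dca_view = expected_path(self_internal=True, peer_internal=False)

# The external client sees only the DCA Internet gateway, so it treats
# the DCA client as another external peer.
external_view = expected_path(self_internal=False, peer_internal=False)

# Each side waits for media on a path the other side is not using.
mismatch = dca_view != external_view
print(dca_view, external_view, mismatch)
```

The point of the sketch is that neither client is wrong by its own rules; the two simply classify the same call differently.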

The split-tunnel VPN causes a secondary problem between JFK and DCA. These clients use STUN to establish peer-to-peer connections across the VPN. Users complain about overall client AV quality between these locations.

All internal clients, including those on the VPN, use internal DNS for Lync Discovery resolution. External clients use external DNS for their Lync Discovery process. Therefore, VPN clients can bypass split-tunneling using a process that distinguishes Lync traffic, and resolves it using external name records. N.B., Internal DNS continues to resolve all other (i.e., non-Lync) requests. Otherwise, what's the point of having a VPN?
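
The idea behind the name resolution policy can be sketched in a few lines of Python. This is a simplified model, not the Windows NRPT implementation: it matches exact host names only, and the domain, record names, and resolver addresses are all hypothetical placeholders.

```python
INTERNAL_DNS = "10.0.0.53"      # hypothetical internal resolver
EXTERNAL_DNS = "203.0.113.53"   # hypothetical public resolver (documentation range)

# Hypothetical policy table: Lync-related names that should be answered
# from external DNS. The exact records depend on the deployment.
POLICY_TABLE = {
    "lyncdiscover.example.com": EXTERNAL_DNS,
    "sip.example.com": EXTERNAL_DNS,
}


def pick_resolver(query: str) -> str:
    """Return the DNS server that should answer this query."""
    name = query.rstrip(".").lower()   # normalize case and a trailing root dot
    return POLICY_TABLE.get(name, INTERNAL_DNS)


print(pick_resolver("lyncdiscover.example.com"))
print(pick_resolver("fileserver.example.com"))
```

Only names in the table leave the tunnel's DNS; every other query still goes to the internal resolver, which preserves the purpose of the VPN.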

Right click on Inbound Rules → New Inbound Rule → Program → Path: %ProgramFiles%\Microsoft Office\Office15\lync.exe → Block the Connection → Apply rule to Domain. N.B., use the application path that applies to your installation. For example, Lync Basic and Lync Professional may use different paths.

Edit the new Inbound Rule: Right click on the new rule → Click on the Scope tab → Add all internal IP subnets (i.e., primary office) to the Remote IP address field → Click Add → Click OK.

Figure 5. Windows Firewall GPO to bypass VPN.

Apply the newly created firewall GPO to the AD site that corresponds to the branch office. Alternatively, apply this GPO to the OU that contains the branch office computers.
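
To make the effect of the rule concrete, here is a minimal Python sketch of its match logic: inbound traffic to lync.exe is blocked only when the remote address falls inside one of the internal subnets on the Scope tab. The subnets and the helper name are hypothetical, and the sketch models this single rule rather than the Windows Firewall as a whole.

```python
import ipaddress

# Hypothetical internal (primary office) subnets entered on the Scope tab.
INTERNAL_SUBNETS = [
    ipaddress.ip_network("10.10.0.0/16"),
    ipaddress.ip_network("10.20.0.0/16"),
]

LYNC_EXE = r"C:\Program Files\Microsoft Office\Office15\lync.exe"


def blocked_by_rule(program: str, remote_ip: str) -> bool:
    """True when the inbound block rule matches this program and remote address."""
    if program.lower() != LYNC_EXE.lower():
        return False                      # rule is scoped to lync.exe only
    addr = ipaddress.ip_address(remote_ip)
    return any(addr in net for net in INTERNAL_SUBNETS)


print(blocked_by_rule(LYNC_EXE, "10.10.5.20"))    # internal peer over the VPN
print(blocked_by_rule(LYNC_EXE, "198.51.100.7"))  # address outside the internal subnets
```

Because direct media from internal subnets is refused, the client cannot settle on a peer-to-peer path across the tunnel and has to use the path it shares with external clients.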

Skype for Business (S4B)/Lync 2013 Resiliency Outline:

S4B and Lync 2013 resiliency pools are similar to other highly available (HA) and fault-tolerant solutions. If one pool fails (i.e., server or network disruption), its clients automatically connect to the second pool. Even better, the Lync clients maintain their client-to-client sessions (i.e., VOIP and IM) after connecting to the backup pool.

UDP Direct.

UDP NAT.

UDP Relay.

TCP Relay.
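
The four entries above read like the connection paths that ICE negotiates: a direct path, a STUN-discovered NAT address, and then a TURN relay over UDP or TCP. A minimal Python sketch of the selection, assuming the list is in order of preference and using a hypothetical choose_path helper:

```python
from typing import Optional, Set

# Assumption: the list is ordered from most to least preferred.
PREFERENCE = ["UDP Direct", "UDP NAT", "UDP Relay", "TCP Relay"]


def choose_path(reachable: Set[str]) -> Optional[str]:
    """Return the most preferred path that passed connectivity checks, if any."""
    for candidate in PREFERENCE:
        if candidate in reachable:
            return candidate
    return None   # no path worked, so no media session can be established


# Direct and NAT paths failed, but both relay paths work.
print(choose_path({"TCP Relay", "UDP Relay"}))
```

A relay is therefore a fallback: it is used only when neither peer-to-peer path passes its checks.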

When the primary server pool fails (e.g., server down), its clients experience a brief hiccup (e.g., a 1-second delay) as they connect to the backup pool. Although this near-HA solution is useful, the system is not without flaws. The caveat is that clients enter a limited resilience mode upon connecting to the backup pool. Clients in resilience mode operate with limited functionality:

New users connect in resilience mode.

Scheduling features are unavailable.

Presence state displays as unknown.

The Contact list and Address book are not available (searches for individuals still work, though).

Network administrators can later use PowerShell to manually fail-over remaining Lync services. This process changes clients' resilience mode to the fully functional regular mode. A talented Lync administrator can pair a network monitor application with a custom PowerShell script to fully automate the entire fail-over process.

What is the Lync CMS? The central management store (CMS) is a SQL Server Express database (on Lync Standard Edition) that stores configuration data in XML format. The CMS database resides in the SQL instance named RTC. The following Lync server tools make changes to the CMS database:

Lync PowerShell Console.

Lync Server Control Panel (LsCP).

Lync Topology Builder.

Each pool has a single CMS database; however only one of the CMS instances is active.

What is the replication database? The CMS database uses a master-slave replication model. All changes to the active RTC database are replicated to the RTCLocal databases, and changes are replicated across pools. Lync uses the RTCLocal instance, not the master RTC instance, for its client services. This distinction is important for resiliency purposes.

What is the Lync Front End Pool? The Lync Front End pool consists of a single (i.e., Lync Standard) or multiple (i.e., Lync Enterprise) Front End servers that connect to an associated back-end CMS database. There is only one pool per CMS instance. Users are assigned to a single pool; never both.

When users sign in to Lync, their clients automatically connect to Front End servers that reside in the users' assigned Front End pool. The Front End server uses the replication database located in its local pool.

What are Lync Resiliency Pools? The Lync Resiliency pool consists of exactly two Front End pools (e.g., pool_A and pool_B). This 1:1 pool ratio provides one primary pool and one backup pool. If one of the two pools becomes unavailable, the remaining Front End server prevents any change to its CMS database. However, the remaining server continues to operate using its replica database, which retains a static configuration from the time of the loss. Clients that reconnect use resiliency mode because the back-end service is in a read-only state. Consider:

The resiliency pool is considered active/passive when individuals are all assigned to the same pool.

The resiliency pool is considered active/active when individual Lync users are assigned to separate Front End pools.

The difference between the two models is significant during the fail-over process. If one of the Lync servers fails, some clients will enter resiliency mode while others remain in standard mode.
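
The difference is easy to see with a small Python sketch. It assumes, as stated elsewhere in this outline, that users assigned to the unavailable pool are the ones who enter resiliency mode. The user names, pool names, and helper function are hypothetical.

```python
from typing import Dict, Set


def users_in_resiliency_mode(assignments: Dict[str, str], failed_pool: str) -> Set[str]:
    """Users whose assigned pool is down; they reconnect to the backup pool."""
    return {user for user, pool in assignments.items() if pool == failed_pool}


# Active/passive: everyone is assigned to the same pool.
active_passive = {"alice": "pool_A", "bob": "pool_A", "carol": "pool_A"}

# Active/active: users are spread across both pools.
active_active = {"alice": "pool_A", "bob": "pool_B", "carol": "pool_A"}

print(sorted(users_in_resiliency_mode(active_passive, "pool_A")))
print(sorted(users_in_resiliency_mode(active_active, "pool_A")))
```

In the active/active case, the users assigned to the surviving pool stay in standard mode, which is why the two models behave differently during a fail-over.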

What is Lync High Availability? Lync Enterprise separates the front-end and back-end roles to provide truly highly available services within a single pool. Each pool allows up to twelve Front End servers and supports up to 80,000 simultaneous users. In addition, Microsoft SQL 2012 hosts the CMS database using either mirroring or shared-storage clustering. Disruption to any single server within a pool has a negligible effect, so these pools are considered highly available.

Lync 2013 Standard Edition differs because it consolidates roles; both the Front End and the CMS install on the same server. Nonetheless, Lync Standard remains a robust solution. It scales up to 5,000 users, is simple to deploy, and supports all workloads (i.e., VOIP, IM, etc.). Microsoft provides an alternative to HA through the use of resiliency pools.

How do Lync resiliency pools work?

Back-End Server Perspective:

There are only two resiliency pools (e.g., pool_A and pool_B).

There is only one CMS master database per pool.

The CMS instance replicates to the RTCLocal database.

Lync servers always use RTCLocal, never the CMS master.

Front-End Server Perspective:

Front End Servers belong to a single pool. Each pool has at least one Front End Server.

Users are assigned to a single pool; never both.

Active/Active deployments distribute users across both pools.

Active/Passive deployments assign all users to a single pool.

Lync Client Perspective:

Users connect to Front End servers (i.e., pools) based on administrative configuration.

Clients from the primary pool communicate seamlessly with clients from the secondary pool, and vice versa.

Pools are not important to the end users unless the client is forced into resiliency mode.

What happens when a Lync server goes down?

Changes cannot be made to the CMS database from any pool when an outage occurs.

Clients read the data rather than modify it, so the Lync FE server continues to serve them from its second (i.e., replica) database.

Users associated with the unavailable server enter resiliency mode.
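
The outage behavior above can be modeled with a short Python sketch: writes go to the master store and are copied to a local replica, reads always come from the replica, and writes are refused while the master is offline. The class, its attributes, and the instant replication are simplifying assumptions for illustration, not how Lync implements its replication.

```python
class Deployment:
    """Toy model of a master configuration store with a read-only replica."""

    def __init__(self, config):
        self.master = dict(config)    # stands in for the active CMS
        self.replica = dict(config)   # stands in for the local replica
        self.master_online = True

    def set(self, key, value):
        if not self.master_online:
            raise RuntimeError("CMS unavailable: configuration is read-only")
        self.master[key] = value
        self.replica[key] = value     # replication, simplified as immediate

    def get(self, key):
        return self.replica[key]      # Front End reads never touch the master


lync = Deployment({"dial_plan": "v1"})
lync.set("dial_plan", "v2")
lync.master_online = False            # the pool hosting the active CMS goes down
print(lync.get("dial_plan"))          # reads still work from the replica
```

Any call to set() after the outage raises an error, which mirrors the static, read-only configuration that clients see until the CMS is failed over.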

How do we fail-over Lync clients?

Determine which Front End pool hosts the CMS:

Get-CsService -CentralManagement

The command will most likely error out because the primary pool, which normally holds the active CMS, is unavailable.

However, the CMS can run on either pool, and we can at least determine if the secondary pool is the active CMS host. Look for the active attribute in the host Identity field.

Let's assume the primary pool hosted the active CMS and is no longer available. Fail over the CMS to the secondary pool:

About Me

Steven Jordan is an infrastructure and process management specialist. Steven holds a Master of Science degree in ICT from the University of Wisconsin Stout.
Steven is also a Cisco Certified Network Professional (CCNP) and a University of Wisconsin Extension Master Gardener.