This document gives an overview of the Wireless LAN Controller (WLC)
Discovery and Join Process. It also describes some of the reasons why a
Lightweight Access Point (LAP) fails to join a WLC and how to troubleshoot
them.

In a Cisco Unified Wireless network, the LAPs must first discover and
join a WLC before they can service wireless clients.

Originally, the controllers only operated in Layer 2 mode. In Layer 2
mode, the LAPs are required to be on the same subnet as the management
interface and the Layer 3 mode AP-manager interface is not present on the
controller. The LAPs communicate with the controller using Layer 2
encapsulation only (Ethernet encapsulation) and do not obtain an IP address
through the Dynamic Host Configuration Protocol (DHCP).

When Layer 3 mode on the controller was developed, a new Layer 3
interface called AP-manager was introduced. In Layer 3 mode, the LAPs first
obtain an IP address through DHCP and then send their discovery request to the
management interface using IP addresses (Layer 3). This allows the LAPs to be
on a different subnet than the management interface of the controller. Layer 3
mode is the dominant mode today. Some controllers and LAPs can only operate in
Layer 3 mode.

However, this presented a new problem: how did the LAPs find the
management IP address of the controller when it was on a different subnet?

In Layer 2 mode, they were required to be on the same subnet. In Layer
3 mode, the controller and LAP are essentially playing hide and seek in the
network. If you do not tell the LAP where the controller is via DHCP option 43,
DNS resolution of "Cisco-lwapp-controller@local_domain", or statically
configure it, the LAP does not know where in the network to find the management
interface of the controller.

In addition to these methods, the LAP does automatically look on the
local subnet for controllers with a 255.255.255.255 local broadcast. Also, the
LAP remembers the management IP address of any controller it joins across
reboots. Therefore, if you put the LAP first on the local subnet of the
management interface, it will find the controller's management interface and
remember the address. This is called priming. This does not help find the
controller if you replace a LAP later on. Therefore, Cisco recommends using the
DHCP option 43 or DNS methods.
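
As an illustration of the DHCP option 43 method, the sketch below builds the
option value for a list of controller management IP addresses. It assumes the
type-length-value layout commonly documented for Cisco Aironet lightweight APs
(type 0xf1, a length of 4 bytes per controller, then the addresses in
hexadecimal). That layout is not described in this document and some AP models
expect a different format, so verify it for your hardware. The function name
and the sample addresses are hypothetical.

```python
import ipaddress


def option43_hex(controller_ips):
    """Return the option 43 value as a hex string.

    Assumed layout: type byte 0xf1, one length byte, then each
    controller management IPv4 address as 4 raw bytes.
    """
    if not controller_ips:
        raise ValueError("at least one controller address is required")
    value = b"".join(ipaddress.IPv4Address(ip).packed for ip in controller_ips)
    if len(value) > 255:
        raise ValueError("too many controllers for a one-byte length field")
    return (bytes([0xF1, len(value)]) + value).hex()


# Hypothetical management addresses of two controllers.
print(option43_hex(["192.168.10.5", "192.168.10.20"]))
```

The resulting string is what would be entered as the raw hexadecimal value of
option 43 on a DHCP server that accepts hex input.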

When the LAPs discover the controller, they do not know if the
controller is in Layer 2 mode or Layer 3 mode. Therefore, the LAPs always
connect to the management interface address of the controller first with a
discovery request. The controller then tells the LAP which mode it is in
through the discovery reply. If the controller is in Layer 3 mode, the
discovery reply
contains the Layer 3 AP-manager IP address so the LAP can send a join request
to the AP-manager interface next.

Note: By default both management and AP-manager interfaces are left
untagged on their VLAN during configuration. In case these are tagged, make
sure they are tagged to the same VLAN in order to properly receive discovery
and join response from the WLC.

The LWAPP AP goes through this process on startup for Layer 3
mode:

The LAP boots and obtains an IP address through DHCP if it was not
previously assigned a static IP address.

The LAP sends discovery requests to controllers through the various
discovery algorithms and builds a controller list. Essentially, the LAP learns
as many management interface addresses for the controller list as possible via:

DHCP option 43 (good for global companies where offices and
controllers are on different continents)

DNS entry for
cisco-capwap-controller (good for local
businesses - can also be used to find where brand new APs join)

Note: If you use CAPWAP, make sure that there is a DNS entry for
cisco-capwap-controller.

Management IP addresses of controllers that the LAP remembers from
previous joins

A Layer 3 broadcast on the subnet

Over the air provisioning

Statically configured information

From this list, the easiest method to use for deployment is to have
the LAPs on the same subnet as the management interface of the controller and
allow the LAP’s Layer 3 broadcast to find the controller. This method should be
used for companies that have a small network and do not own a local DNS
server.

The next easiest method of deployment is to use a DNS entry with
DHCP. You can have multiple entries of the same DNS name. This allows the LAP
to discover multiple controllers. This method should be used by companies that
have all of their controllers in a single location and own a local DNS server,
or by companies that have multiple DNS suffixes and segregate the controllers
by suffix.

DHCP option 43 is used by large companies to localize the information
through DHCP. This method is used by large enterprises that have a single DNS
suffix. For example, Cisco owns buildings in Europe, Australia, and the United
States. In order to ensure that the LAPs only join controllers locally, Cisco
cannot use a DNS entry and must use DHCP option 43 information to tell the LAPs
what the management IP address of their local controller is.

Finally, static configuration is used for a network that does not
have a DHCP server. You can statically configure the information necessary to
join a controller via the console port and the AP’s CLI. For information on how
to statically configure controller information using the AP CLI, refer to
Manually
Configuring Controller Information Using the Access Point CLI.

For a detailed explanation on the different discovery algorithms that
LAPs use to find controllers, refer to
LAP
Registration with WLC.

Send a discovery request to every controller on the list and wait for
the controller's discovery reply which contains the system name, AP-manager IP
addresses, the number of APs already attached to each AP-manager interface, and
overall excess capacity for the controller.

Look at the controller list and send a join request to a controller
in this order (only if the AP received a discovery reply from it):

Primary Controller system name (previously configured on
LAP)

Secondary Controller system name (previously configured on
LAP)

Tertiary Controller system name (previously configured on
LAP)

Master controller (if the LAP has not been previously configured
with any Primary, Secondary, or Tertiary controller names. Used to always know
which controller brand new LAPs join)

If none of the above are seen, load balance across controllers
using the excess capacity value in the discovery response.

If two controllers have the same excess capacity, then send the
join request to the first controller that responded to the discovery request
with a discovery response. If a single controller has multiple AP-managers on
multiple interfaces, choose the AP-manager interface with the least number of
APs.

The controller will respond to all discovery requests without
checking certificates or AP credentials. However, join requests must have a
valid certificate in order to get a join response from the controller. If the
LAP does not receive a join response from its choice, the LAP will try the next
controller in the list unless the controller is a configured controller
(Primary/Secondary/Tertiary).

When it receives the join reply, the AP checks to make sure it has
the same image as that of the controller. If not, the AP downloads the image
from the controller and reboots to load the new image and starts the process
all over again from step 1.

If it has the same software image, it asks for the configuration from
the controller and moves into the registered state on the controller.

After it downloads the configuration, the AP might reload again to
apply the new configuration. Therefore, an extra reload can occur and is
normal behavior.
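
The controller selection order described in the steps above can be sketched as
follows. This is a simplified model, not the AP firmware logic: the
DiscoveryReply class, its field names, and the sample controller names are
hypothetical, and the choice between multiple AP-manager interfaces on a
single controller is left out.

```python
from dataclasses import dataclass


@dataclass
class DiscoveryReply:
    system_name: str
    excess_capacity: int
    is_master: bool = False


def choose_controller(replies, configured_names=()):
    """Pick the controller that receives the join request.

    replies: DiscoveryReply objects in the order they arrived.
    configured_names: primary, secondary, tertiary names (None if unset).
    """
    by_name = {r.system_name: r for r in replies}
    # 1-3. Configured controllers, but only if they answered discovery.
    for name in configured_names:
        if name in by_name:
            return by_name[name]
    # 4. Master controller, only for LAPs with no configured names.
    if not any(configured_names):
        for reply in replies:
            if reply.is_master:
                return reply
    # 5. Greatest excess capacity; max() keeps the first of equal
    #    values, so a tie goes to the controller that replied first.
    return max(replies, key=lambda r: r.excess_capacity, default=None)


replies = [
    DiscoveryReply("wlc-a", 10),
    DiscoveryReply("wlc-b", 40),
    DiscoveryReply("wlc-c", 40),
]
print(choose_controller(replies).system_name)
```

With no configured names and no master controller, the example selects wlc-b,
because it ties with wlc-c on excess capacity and replied first.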

As mentioned in the previous section, once a LAP registers with the
WLC, it checks to see if it has the same image as the controller. If the images
on the LAP and the WLC are different, the LAPs download the new image from the
WLC first. If the LAP has the same image, it continues to download the
configuration and other parameters from the WLC.

You will see these messages in the debug lwapp events
enable command output if the LAP downloads an image from the
controller as a part of the registration process:

As a part of the join process, the WLC authenticates each LAP by
verifying that its certificate is valid.

When the AP sends the LWAPP Join Request to the WLC, it embeds its
X.509 certificate in the LWAPP message. The AP also generates a random session
ID that is also included in the LWAPP Join Request. When the WLC receives the
LWAPP Join Request, it validates the signature of the X.509 certificate using
the AP's public key and checks that the certificate was issued by a trusted
certificate authority.

It also looks at the starting date and time for the AP certificate's
validity interval and compares that date and time to its own date and time
(hence the controller’s clock needs to be set to close to the current date and
time). If the X.509 certificate is validated, the WLC generates a random AES
encryption key. The WLC plumbs the AES key into its crypto engine so that it
can encrypt and decrypt future LWAPP Control Messages exchanged with the AP.
Note that data packets are sent in the clear in the LWAPP tunnel between the
LAP and the controller.

The debug pm pki enable command shows the
certification validation process that occurs during the join phase on the
controller. The debug pm pki enable command will
also display the AP hash key during the join process if the AP has a
self-signed certificate (SSC) created by the LWAPP conversion program. If the
AP has a Manufacturing Installed Certificate (MIC), you will not see a hash
key.

Note: All APs manufactured after June 2006 have a MIC.

Here is the output of the debug pm pki
enable command when the LAP with a MIC joins the
controller:

Note: Some lines of the output have been moved to the second line due to
space constraints.

If the controller debugs do not indicate a join request, you can debug
the process from the LAP as long as the LAP has a console port. You can see the
LAP boot up process with these commands, but you must first get into enable
mode (default password is Cisco):

debug dhcp detail—Shows DHCP option 43
information.

debug ip udp—Shows the join/discovery
packets to the controller as well as DHCP and DNS queries (all of these are UDP
packets. Port 12223 is the controller’s source port).

debug lwapp client event—Shows LWAPP
events for the AP.

undebug all—Disables debugs on the
AP.

Here is an example of the output from the debug ip
udp command. This partial output gives an idea of the packets
that are sent by the LAP during the boot process to discover and join a
controller.

LAPs that use DHCP to find an IP address before they start the WLC
discovery process might have trouble receiving a DHCP address due to the
misconfiguration of DHCP related parameters. This section explains how DHCP
works with WLCs and provides some of the best practices to avoid DHCP related
issues.

For DHCP, the controller behaves like a router with an IP helper
address. That is, it fills in the gateway IP address and forwards the request
via a unicast packet directly to the DHCP server.

When the DHCP offer comes back to the controller, it changes the DHCP
server IP address to its virtual IP address. It does this because when a
Windows client roams between APs, the first thing it does is try to contact the
DHCP server and renew the address.

With the DHCP server address being 1.1.1.1 (typical virtual IP address
on a controller), the controller can intercept that packet and quickly respond
to Windows.

This is also why the virtual IP address is the same on all controllers.
If a Windows laptop roams to an AP on another controller, it will try to
contact the virtual interface on the controller. Due to the mobility event and
context transfer, the new controller that the Windows client roamed to already
has all the information to respond to Windows again.

If you want to use the internal DHCP server on the controller, all you
have to do is put the management IP address as the DHCP server on the dynamic
interface you create for the subnet. Then assign that interface to the
WLAN.

The reason the controller needs an IP address on each subnet is so it
can fill in the DHCP gateway address in the DHCP request.

These are some of the points to remember when you configure DHCP
servers for the WLAN:

The DHCP server IP address should not fall within any dynamic subnet
that is on the controller. It will be blocked but can be overridden with this
command:

config network mgmt-via-dynamic-interface on version 4.0 only
(command not available in version 3.2)

The controller will forward the DHCP via unicast from its dynamic
interface (in later code) using its IP address on that interface. Make sure
that any firewall allows this address to reach the DHCP server.

Make sure that the response from the DHCP server can reach the
controller's dynamic address on that VLAN through any firewalls. Ping the
dynamic interface address from the DHCP server. Ping the DHCP server with a
source IP address of the dynamic interface's gateway address.

Make sure the AP's VLAN is allowed on the switches and routers, and
that their ports are configured as trunks so the packets (includes DHCP) tagged
with the VLAN are allowed through the wired network.
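
The first point in this list, that the DHCP server address should not fall
within a dynamic subnet on the controller, can be checked before deployment
with a few lines of Python. This is a minimal sketch: the function name and
the sample addresses are hypothetical, and the subnets must be supplied by
hand from your own controller configuration.

```python
import ipaddress


def conflicting_subnets(dhcp_server, dynamic_subnets):
    """Return the dynamic interface subnets that contain the DHCP server."""
    server = ipaddress.ip_address(dhcp_server)
    return [s for s in dynamic_subnets if server in ipaddress.ip_network(s)]


# Hypothetical DHCP server and dynamic interface subnets.
print(conflicting_subnets("10.10.20.5", ["10.10.20.0/24", "10.10.30.0/24"]))
```

An empty list means the server address sits outside every listed dynamic
subnet.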

This information clearly shows that the controller time is outside
the certificate validity interval of the LAP. Therefore, the LAP cannot
register with the controller. Certificates installed in the LAP have a
predefined validity interval. The controller time should be set in such a way
that it is within the certificate validity interval of the LAP’s
certificate.

Issue the show time command from the
controller CLI in order to verify that the date and time set on your controller
fall within this validity interval. If the controller time is earlier or later
than this certificate validity interval, then change the controller time to
fall within this interval.
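
The comparison between the controller clock and the certificate validity
interval amounts to a simple range check, sketched below. The function name
and the dates are hypothetical and only illustrate the logic. Read the real
values from the controller time and from the certificate on the AP.

```python
from datetime import datetime


def check_controller_clock(controller_time, not_before, not_after):
    """Compare the controller clock with a certificate validity interval."""
    if controller_time < not_before:
        return "controller clock is earlier than the validity start"
    if controller_time > not_after:
        return "controller clock is later than the validity end"
    return "controller clock is within the validity interval"


# Hypothetical validity interval and controller clock values.
start = datetime(2006, 6, 1)
end = datetime(2016, 6, 1)
print(check_controller_clock(datetime(2005, 1, 1), start, end))
print(check_controller_clock(datetime(2010, 1, 1), start, end))
```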

Note: If the time is not set correctly on the controller, choose
Commands > Set Time in the controller GUI mode, or issue
the config time command in the controller CLI in
order to set the controller time.

On LAPs with CLI access, verify the certificates with the
show crypto ca certificates command from the AP
CLI.

This command allows you to verify the certificate validity interval
set in the AP. This is an example:

The entire output is not listed as there can be many validity
intervals associated with the output of this command. You need to consider only
the validity interval specified by the Associated Trustpoint:
Cisco_IOS_MIC_cert with the relevant AP name in the name field. In this example
output, it is Name: C1200-001563e50c7e. This is
the actual certificate validity interval to be considered.

The message clearly indicates that there is a mismatch in the
regulatory domain of the LAP and the WLC. The WLC supports multiple regulatory
domains, but each regulatory domain must be selected before a LAP can join from
that domain. For example, the WLC that uses regulatory domain -A can only be
used with APs that use regulatory domain -A (and so on). When you purchase APs
and WLCs, ensure that they share the same regulatory domain. Only then can the
LAPs register with the WLC.

Note: Both 802.11b/g and 802.11a radios must be in the same regulatory
domain for a single LAP.

By default, the 4400 Series Controllers can support up to 48 APs per
port. When you try to connect more than 48 APs on the controller, you receive
this error message. However, you can configure your 4400 Series Controller to
support more APs on a single interface (per port) using one of these
methods:

This command shows the public key-hash that the controller has in
storage.

Issue the debug pm pki enable
command.

This command shows the actual public key-hash. The actual public
key-hash must match the public key-hash that the controller has in storage. A
discrepancy causes the problem. This is a sample output of this debug
message:

Note: Some lines of the output have been moved to the second line due to
space constraints.

If you have a WCS, you can push the SSCs to the new WLC. For more
information on how to configure APs using the WCS, refer to the
Configuring
Access Points section of Cisco Wireless Control System
Configuration Guide, Release 5.1.

Received a Discovery Request with subnet broadcast with wrong AP IP address (A.B.C.D)!

This message means that the controller received a discovery request via
a broadcast IP address that has a source IP address which is not in any
configured subnets on the controller. This also means the controller is
dropping the packet.

The problem is that the AP is not sending the discovery request to the
management IP address. The controller is reporting a broadcast discovery
request from a VLAN that is not configured on the controller. This typically
occurs when the switch trunk to the controller allows VLANs that are not
configured on the controller instead of only the wireless VLANs.

Complete these steps in order to resolve this problem:

If the controller is on another subnet, the APs must be
primed for the controller IP address, or the APs must receive
the controller's IP address using any one of the discovery
methods.

The switch is configured to allow some VLANs that are not on the
controller. Restrict the allowed VLANs on the trunks.

Solution: This is because the Cisco 1250 series LAP is
not supported on version 4.1. The Cisco Aironet 1250 Series AP is supported
from controller versions 4.2.61 and later. In order to fix this issue, upgrade
the controller software to 4.2.61.0 or later.

This is another common issue that is seen when the AP tries to join the
WLC. You might see this error message when the AP tries to join the
controller.

No more AP manager IP addresses remain

One cause of this error message is a duplicate IP address on the
network that matches the AP manager IP address. In such a case, the LAP keeps
power cycling and cannot join the controller.

The debugs will show that the WLC receives LWAPP discovery requests
from the APs and transmits a LWAPP discovery response to the APs. However, WLCs
do not receive LWAPP join requests from the APs.

In order to troubleshoot this issue, ping the AP manager from a wired
host on the same IP subnet as the AP manager. Then, check the ARP cache. If a
duplicate IP address is found, remove the device with the duplicate IP address
or change the IP address on the device so that it has a unique IP address on
the network.

This is because of Cisco bug ID CSCsd94967. LWAPP APs
might fail to join a WLC. If the LWAPP join request is larger than 1500 bytes,
LWAPP must fragment the LWAPP join request. The logic for all LWAPP APs is that
the size of the first fragment is 1500 bytes (including IP and UDP header) and
the second fragment is 54 bytes (including IP and UDP header). If the network
between the LWAPP APs and WLC has an MTU size less than 1500 bytes (as might be
encountered when using a tunneling protocol such as IPsec VPN, GRE, MPLS,
etc.), WLC cannot handle the LWAPP join request.

You will encounter this problem under these conditions:

WLC that runs version 3.2 software or earlier

Network path MTU between the AP and WLC is less than 1500
bytes

In order to resolve this issue, use any one of these options:

Upgrade to WLC software 4.0, if the platform supports it. In WLC
version 4.0, this problem is fixed by allowing the LWAPP tunnel to reassemble
up to 4 fragments.

Increase the network path MTU to 1500 bytes.

Use 1030 REAPs for the locations reachable via low MTU paths. REAP
LWAPP connections to 1030 APs have been modified to handle this situation by
reducing the MTU used for REAP mode.
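
The two conditions listed above can be combined into a quick check when you
audit remote sites. This is a sketch under the assumptions stated in this
section only (WLC version 3.2 or earlier, path MTU below 1500 bytes). The
function name and the version tuple format are hypothetical.

```python
def join_fragmentation_risk(wlc_version, path_mtu):
    """Return True when both conditions for the join failure hold.

    wlc_version: (major, minor) tuple, for example (3, 2).
    path_mtu: smallest MTU in bytes on the path between the AP and the WLC.
    """
    old_software = tuple(wlc_version) <= (3, 2)
    low_mtu = path_mtu < 1500
    return old_software and low_mtu


# A hypothetical site reached over a tunnel with a 1400-byte path MTU.
print(join_fragmentation_risk((3, 2), 1400))
print(join_fragmentation_risk((4, 0), 1400))
```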

The 1142 series LAPs are supported only with WLC release 5.2 and later.
If you run WLC versions earlier than 5.2, you cannot register the LAP to the
Controller and you will see an error message similar to this:

This is because WLC software release 5.0.148.0 or later is not
compatible with Cisco Aironet 1000 series APs. If you have a Cisco 1000 series
LAP in a network that runs WLC version 5.0.148.0 or later, the 1000 series LAP
does not join the controller and you see this trap message on the WLC.

This can happen if the Lightweight Access Point was shipped with a mesh
image and is in Bridge mode. If the LAP was ordered with mesh software on it,
you need to add the LAP to the AP authorization list. Choose Security
> AP Policies and add AP to the Authorization
List. The AP should then join, download the image from the controller, then
register with the WLC in bridge mode. Then you need to change the AP to local
mode. The LAP downloads the image, reboots and registers back to the controller
in local mode.

There is a limit to the number of LAPs that can be supported by a WLC.
Each WLC supports a certain number of LAPs, which depends on the model and
platform. This error message is seen on the WLC when it receives a discovery
request after it has reached its maximum AP capacity.

Here is the number of LAPs supported on the different WLC platforms and
models:

The 2100 series controller supports up to 6, 12, or 25 LAPs. This
depends on the model of the WLC.

The 4402 supports up to 50 LAPs, while the 4404 supports up to 100.
This makes it ideal for large-sized enterprises and large-density
applications.

The Catalyst 6500 Series Wireless Services Module (WiSM) is an
integrated Catalyst 6500 switch and two Cisco 4404 controllers that supports up
to 300 LAPs.

The Cisco 7600 Series Router WiSM is an integrated Cisco 7600 router
and two Cisco 4404 controllers that supports up to 300 LAPs.

The Cisco 28/37/38xx Series Integrated Services Router is an
integrated 28/37/38xx router and Cisco controller network module that supports
up to 6, 8, 12, or 25 LAPs, depending on the version of the network module. The
versions that support 8, 12, or 25 APs and the NME-AIR-WLC6-K9 6-access-point
version feature a high-speed processor and more on-board memory than the
NM-AIR-WLC6-K9 6-access-point version.

The Catalyst 3750G Integrated WLC Switch is an integrated Catalyst
3750 switch and Cisco 4400 series controller that supports up to 25 or 50
LAPs.
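
For capacity planning, the limits listed above can be kept in a small lookup
table. The sketch below uses only the numbers from this list. The dictionary
keys and the function name are hypothetical, and where a platform comes in
several sizes you must state which limit applies to your model.

```python
# Maximum LAP counts per platform, taken from the list above.
MAX_LAPS = {
    "2100": (6, 12, 25),
    "4402": (50,),
    "4404": (100,),
    "WiSM": (300,),
    "ISR network module": (6, 8, 12, 25),
    "3750G": (25, 50),
}


def has_capacity(platform, model_limit, joined_aps):
    """Return True if one more LAP can join a controller of this size."""
    if model_limit not in MAX_LAPS[platform]:
        raise ValueError(f"{model_limit} is not a listed limit for {platform}")
    return joined_aps < model_limit


# A hypothetical 4402 that already has 50 LAPs joined.
print(has_capacity("4402", 50, 50))
```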