Abstract:

One embodiment of the present invention provides a switch. The switch
includes a tunnel management module, a packet processor, and a forwarding
module. The tunnel management module operates the switch as a tunnel
gateway capable of terminating an overlay tunnel. During operation, the
packet processor, which is coupled to the tunnel management module,
identifies in a data packet a virtual Internet Protocol (IP) address
associated with a virtual tunnel gateway. This virtual tunnel gateway is
associated with the switch and the data packet is associated with the
overlay tunnel. The forwarding module determines an output port for an
inner packet in the data packet based on a destination address of the
inner packet.

Claims:

1. A switch, comprising: a tunnel management module configurable to
operate the switch as a tunnel gateway capable of terminating an overlay
tunnel; a packet processor coupled to the tunnel management module and
configurable to identify in a data packet a virtual Internet Protocol
(IP) address associated with a virtual tunnel gateway, wherein the
virtual tunnel gateway is associated with the switch, and wherein the
data packet is associated with the overlay tunnel; and a forwarding
module configurable to determine an output port for an inner packet in
the data packet based on a destination address of the inner packet.

2. The switch of claim 1, wherein the tunnel management module is further
configurable to identify a hypervisor controlling a virtual machine,
wherein the virtual machine initiates the overlay tunnel by encapsulating
the inner packet using the virtual IP address.

3. The switch of claim 1, wherein the packet processor is further
configurable to identify in the data packet a virtual media access
control (MAC) address corresponding to the virtual IP address.

4. The switch of claim 1, further comprising a device management module
configurable to generate a configuration message comprising the virtual
IP address as a tunnel gateway address in response to detecting a
hypervisor.

5. The switch of claim 4, wherein the virtual IP address in the
configuration message corresponds to a default gateway router.

6. The switch of claim 1, wherein the virtual IP address is further
associated with a remote switch, wherein the remote switch operates as a
tunnel gateway and is associated with the virtual tunnel gateway.

7. The switch of claim 1, wherein the data packet is encapsulated based
on Transparent Interconnection of Lots of Links (TRILL) protocol; wherein
the packet processor is further configurable to identify a virtual
routing bridge (RBridge) identifier in the data packet; and wherein the
virtual RBridge identifier is associated with the switch.

8. The switch of claim 1, further comprising a fabric switch management
module configurable to maintain a membership in a fabric switch.

9. The switch of claim 8, wherein the packet processor is further
configurable to identify the inner packet to be a broadcast, unknown
unicast, or multicast packet; and wherein the tunnel management module is
further configurable to select a multicast tree in the fabric switch to
distribute the inner packet based on one or more of: multicast group
membership, virtual local area network (VLAN) membership, and network
load.

10. The switch of claim 1, wherein the tunnel management module is
further configurable to learn a MAC address of a virtual machine via a
tunnel initiated by a first hypervisor associated with the virtual
machine.

11. The switch of claim 10, wherein the tunnel management module is
further configurable to operate in conjunction with the packet processor
to construct a message for a second hypervisor comprising an IP address
of the first hypervisor in response to receiving a data frame with
unknown destination from a virtual machine associated with the second
hypervisor.

12. A computer-executable method, comprising: operating a switch as a
tunnel gateway capable of terminating an overlay tunnel; identifying in a
data packet a virtual Internet Protocol (IP) address associated with a
virtual tunnel gateway, wherein the virtual tunnel gateway is associated
with the switch, and wherein the data packet is associated with the
overlay tunnel; and determining an output port for an inner packet in the
data packet based on a destination address of the inner packet.

13. The method of claim 12, further comprising identifying a hypervisor
controlling a virtual machine, wherein the virtual machine initiates the
overlay tunnel by encapsulating the inner packet using the virtual IP
address.

14. The method of claim 12, further comprising identifying in the data
packet a virtual media access control (MAC) address corresponding to the
virtual IP address.

15. The method of claim 12, further comprising generating a configuration
message comprising the virtual IP address as a tunnel gateway address in
response to detecting a hypervisor.

16. The method of claim 15, wherein the virtual IP address in the
configuration message corresponds to a default gateway router.

17. The method of claim 12, wherein the virtual IP address is further
associated with a remote switch, wherein the remote switch operates as a
tunnel gateway and is associated with the virtual tunnel gateway.

18. The method of claim 12, wherein the data packet is encapsulated based
on Transparent Interconnection of Lots of Links (TRILL) protocol; wherein
an egress routing bridge (RBridge) identifier in the data packet
corresponds to a virtual RBridge identifier; and wherein the virtual
RBridge identifier is associated with the switch.

19. The method of claim 12, further comprising maintaining a membership
in a fabric switch.

20. The method of claim 19, further comprising: identifying the inner
packet to be a broadcast, unknown unicast, or multicast packet; and
selecting a multicast tree in the fabric switch to distribute the inner
packet based on one or more of: multicast group membership, virtual local
area network (VLAN) membership, and network load.

21. The method of claim 12, further comprising learning a MAC address of
a virtual machine via a tunnel initiated by a first hypervisor associated
with the virtual machine.

22. The method of claim 21, further comprising constructing a message for a second
hypervisor comprising an IP address of the first hypervisor in response
to receiving a data frame with unknown destination from a virtual machine
associated with the second hypervisor.

[0008] The present disclosure relates to network management. More
specifically, the present disclosure relates to dynamic insertion of
services in a fabric switch.

[0009] 2. Related Art

[0010] The exponential growth of the Internet has made it a popular
delivery medium for a variety of applications running on physical and
virtual devices. Such applications have brought with them an increasing
demand for bandwidth. As a result, equipment vendors race to build larger
and faster switches with versatile capabilities, such as awareness of
virtual machine migration, to move more traffic efficiently. However, the
size of a switch cannot grow infinitely. It is limited by physical space,
power consumption, and design complexity, to name a few factors.
Furthermore, switches with higher capability are usually more complex and
expensive. More importantly, because an overly large and complex system
often does not provide economy of scale, simply increasing the size and
capability of a switch may prove economically unviable due to the
increased per-port cost.

[0011] A flexible way to improve the scalability of a switch system is to
build a fabric switch. A fabric switch is a collection of individual
member switches. These member switches form a single, logical switch that
can have an arbitrary number of ports and an arbitrary topology. As
demands grow, customers can adopt a "pay as you grow" approach to scale
up the capacity of the fabric switch.

[0012] Meanwhile, layer-2 (e.g., Ethernet) switching technologies continue
to evolve. More routing-like functionalities, which have traditionally
been the characteristics of layer-3 (e.g., Internet Protocol or IP)
networks, are migrating into layer-2. Notably, the recent development of
the Transparent Interconnection of Lots of Links (TRILL) protocol allows
Ethernet switches to function more like routing devices. TRILL overcomes
the inherent inefficiency of the conventional spanning tree protocol,
which forces layer-2 switches to be coupled in a logical spanning-tree
topology to avoid looping. TRILL allows routing bridges (RBridges) to be
coupled in an arbitrary topology without the risk of looping by
implementing routing functions in switches and including a hop count in
the TRILL header.

[0013] As Internet traffic is becoming more diverse, virtual computing in
a network is becoming progressively more important as a value proposition
for network architects. In addition, the evolution of virtual computing
has placed additional requirements on the network. For example, as the
locations of virtual servers become more mobile and dynamic, it is often
desirable that the network infrastructure can provide network overlay
tunnels to assist the location changes of the virtual servers.

[0014] While a fabric switch brings many desirable features to a network,
some issues remain unsolved in facilitating network overlay tunnels to
support virtual machine migration.

SUMMARY

[0015] One embodiment of the present invention provides a switch. The
switch includes a tunnel management module, a packet processor, and a
forwarding module. The tunnel management module operates the switch as a
tunnel gateway capable of terminating an overlay tunnel. During
operation, the packet processor, which is coupled to the tunnel
management module, identifies in a data packet a virtual Internet
Protocol (IP) address associated with a virtual tunnel gateway. This
virtual tunnel gateway is associated with the switch and the data packet
is associated with the overlay tunnel. The forwarding module determines
an output port for an inner packet in the data packet based on a
destination address of the inner packet.

[0016] In a variation on this embodiment, a hypervisor controlling one or
more virtual machines initiates the overlay tunnel by encapsulating the
inner packet.

[0017] In a variation on this embodiment, the packet processor also
identifies in the data packet a virtual media access control (MAC)
address mapped to the virtual IP address.

[0018] In a variation on this embodiment, the switch also includes a
device management module which operates in conjunction with the packet
processor and generates for a hypervisor a configuration message
comprising the virtual IP address as a tunnel gateway address.

[0019] In a further variation, the virtual IP address in the configuration
message also corresponds to a default gateway router.

[0020] In a variation on this embodiment, the virtual IP address is
further associated with a remote switch. This remote switch also operates
as a tunnel gateway and is associated with the virtual tunnel gateway.

[0021] In a variation on this embodiment, the data packet is encapsulated
based on the Transparent Interconnection of Lots of Links (TRILL)
protocol. Under such a scenario, the packet processor also identifies a
virtual routing bridge (RBridge) identifier, which is associated with the
switch, in the data packet.

[0022] In a variation on this embodiment, the switch also includes a
fabric switch management module which maintains a membership in a fabric
switch. Such a fabric switch accommodates a plurality of switches and
operates as a single logical switch.

[0023] In a further variation, the packet processor identifies the inner
packet to be a broadcast, unknown unicast, or multicast packet. In
response, the tunnel management module selects a multicast tree in the
fabric switch to distribute the inner packet based on one or more of:
multicast group membership, virtual local area network (VLAN) membership,
and network load.
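
As a rough illustration of the selection described in paragraph [0023], the following Python sketch picks a multicast tree for a broadcast, unknown unicast, or multicast inner packet. The MulticastTree fields, the select_tree name, and the policy itself (least-loaded tree among those serving the VLAN and, when given, the multicast group) are assumptions made for illustration; the disclosure names only the three criteria.

```python
from dataclasses import dataclass, field
from typing import Iterable, Optional


@dataclass
class MulticastTree:
    # Hypothetical model of a distribution tree inside the fabric switch.
    tree_id: int
    vlans: set = field(default_factory=set)    # VLANs the tree serves
    groups: set = field(default_factory=set)   # multicast groups with members on the tree
    load: float = 0.0                          # current utilization, 0.0 to 1.0


def select_tree(trees: Iterable[MulticastTree], vlan: int,
                group: Optional[str] = None) -> Optional[MulticastTree]:
    """Return the least-loaded tree that serves the VLAN (and group, if given)."""
    candidates = [t for t in trees if vlan in t.vlans]
    if group is not None:
        candidates = [t for t in candidates if group in t.groups]
    # Assumed policy: least-loaded eligible tree; ties go to the lowest tree_id.
    return min(candidates, key=lambda t: (t.load, t.tree_id), default=None)


trees = [
    MulticastTree(1, vlans={10, 20}, groups={"239.1.1.1"}, load=0.7),
    MulticastTree(2, vlans={10}, groups={"239.1.1.1"}, load=0.2),
    MulticastTree(3, vlans={20}, load=0.1),
]
chosen = select_tree(trees, vlan=10, group="239.1.1.1")
```

A broadcast or unknown unicast packet would call select_tree without a group, so only VLAN membership and load are considered.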

[0024] In a variation on this embodiment, the tunnel management module
operates in conjunction with the packet processor to learn a MAC address
of a virtual machine via a tunnel initiated by a first hypervisor
associated with the virtual machine.

[0025] In a further variation, the tunnel management module operates in
conjunction with the packet processor to construct a message for a second
hypervisor comprising an IP address of the first hypervisor in response
to receiving a data frame with unknown destination from a virtual machine
associated with the second hypervisor.

BRIEF DESCRIPTION OF THE FIGURES

[0026] FIG. 1A illustrates an exemplary fabric switch with a virtual
tunnel gateway, in accordance with an embodiment of the present
invention.

[0027] FIG. 1B illustrates a virtual tunnel gateway being associated with
a respective member switch of a fabric switch in conjunction with the
example in FIG. 1A, in accordance with an embodiment of the present
invention.

[0028] FIG. 2A illustrates an exemplary configuration of a fabric switch
with a virtual tunnel gateway, in accordance with an embodiment of the
present invention.

[0029] FIG. 2B illustrates exemplary multi-switch trunks coupling a
plurality of member switches in a fabric switch, in accordance with an
embodiment of the present invention.

[0030] FIG. 3A presents a flowchart illustrating the process of a member
switch in a fabric switch facilitating dynamic configuration of a
hypervisor discovered via an edge port, in accordance with an embodiment
of the present invention.

[0031] FIG. 3B presents a flowchart illustrating the process of a member
switch in a fabric switch facilitating dynamic configuration of a
hypervisor discovered via an inter-switch port, in accordance with an
embodiment of the present invention.

[0032] FIG. 4A presents a flowchart illustrating the process of a member
switch of a fabric switch forwarding a frame received via an edge port,
in accordance with an embodiment of the present invention.

[0033] FIG. 4B presents a flowchart illustrating the process of a member
switch of a fabric switch forwarding a frame received via an inter-switch
port, in accordance with an embodiment of the present invention.

[0034] FIG. 5 illustrates an exemplary processing of broadcast, unknown
unicast, and multicast traffic in a fabric switch with a virtual tunnel
gateway, in accordance with an embodiment of the present invention.

[0035] FIG. 6 presents a flowchart illustrating the process of a member
tunnel gateway in a fabric switch processing broadcast, unknown unicast,
and multicast traffic, in accordance with an embodiment of the present
invention.

[0036] FIG. 7 illustrates an exemplary member switch associated with a
virtual member tunnel gateway in a fabric switch, in accordance with an
embodiment of the present invention.

[0037] In the figures, like reference numerals refer to the same figure
elements.

DETAILED DESCRIPTION

[0038] The following description is presented to enable any person skilled
in the art to make and use the invention, and is provided in the context
of a particular application and its requirements. Various modifications
to the disclosed embodiments will be readily apparent to those skilled in
the art, and the general principles defined herein may be applied to
other embodiments and applications without departing from the spirit and
scope of the present invention. Thus, the present invention is not
limited to the embodiments shown, but is to be accorded the widest scope
consistent with the claims.

Overview

[0039] In embodiments of the present invention, the problem of
facilitating overlay tunneling in a fabric switch is solved by operating
one or more member switches of the fabric switch as tunnel gateways
(which can be referred to as member tunnel gateways) virtualized as one
virtual tunnel gateway. To achieve high utilization of network devices
(e.g., servers and switches), a hypervisor often requires communication
to physical and virtual devices which are external to its VLAN and cannot
establish a tunnel with the hypervisor. For example, a default router of
a network may support a different tunneling technology or may not support
tunneling. A tunnel gateway allows the hypervisor to communicate beyond
its VLAN boundaries without requiring any tunnel support from the desired
destination. Whenever a hypervisor requires communication beyond its VLAN
boundaries, the hypervisor initiates and establishes an overlay tunnel
with the tunnel gateway, which in turn communicates with the desired
destination.

[0040] Because a large number of hypervisors can be associated with a
single network, the tunnel gateway of the network can become a
bottleneck. To reduce the bottleneck, the network can include multiple
tunnel gateways. Consequently, a respective hypervisor requires
configurations to establish association with a tunnel gateway. For
example, if the network has three tunnel gateways, a respective
hypervisor is configured to associate with one of the three tunnel
gateways. Furthermore, if the number of hypervisors increases, the
existing tunnel gateways can again become a bottleneck. When an
additional tunnel gateway is added to the network to reduce the
bottleneck, the hypervisors require reconfigurations. Similarly, when a
tunnel gateway fails, the hypervisors associated with the failed tunnel
gateway need to be reassigned to the existing tunnel gateways. Such
configurations and reconfigurations can be tedious, repetitious, and
error-prone.

[0041] To solve this problem, the member switches, which are member tunnel
gateways in a fabric switch, present the entire fabric switch as one
single logical tunnel gateway to the local hypervisors. The member tunnel
gateways are virtualized as a virtual member switch and a virtual member
tunnel gateway. Other member switches, which are not member tunnel
gateways, consider the virtual gateway switch as another member switch
coupled to the member tunnel gateways. At the same time, the local
hypervisors consider the virtual member tunnel gateway as a local tunnel
gateway. The virtual member tunnel gateway is associated with a virtual
Internet Protocol (IP) address and a virtual Media Access Control (MAC)
address. A respective member tunnel gateway considers these virtual
addresses as local addresses.

[0042] A respective hypervisor coupled to the fabric switch is dynamically
configured to consider the virtual member tunnel gateway as the tunnel
gateway for the hypervisor. This allows the whole fabric switch to act as
a distributed tunnel gateway. As a result, the hypervisor can establish
an overlay tunnel with any of the member tunnel gateways in the fabric
switch associated with the virtual member tunnel gateway; and a member
tunnel gateway can be dynamically added to or removed from the fabric
switch without reconfiguring the local hypervisors. In this way, the
fabric switch with a virtual tunnel gateway supports a large number of
tunnels in a scalable way.
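
The effect described in paragraphs [0041] and [0042] can be sketched in Python as follows. The class name, method names, and address values are hypothetical; the point of the sketch is only that the configuration handed to a hypervisor depends on the virtual addresses and not on which member switches currently act as tunnel gateways.

```python
from dataclasses import dataclass, field


@dataclass
class VirtualTunnelGateway:
    # Addresses shared by every member tunnel gateway (values below are made up).
    virtual_ip: str
    virtual_mac: str
    members: set = field(default_factory=set)

    def add_member(self, switch_id: int) -> None:
        self.members.add(switch_id)

    def remove_member(self, switch_id: int) -> None:
        self.members.discard(switch_id)

    def is_local(self, switch_id: int, dst_ip: str) -> bool:
        """A member gateway treats the virtual IP address as one of its own."""
        return switch_id in self.members and dst_ip == self.virtual_ip

    def hypervisor_config(self) -> dict:
        """What a hypervisor needs to know; note that it does not list the members."""
        return {"tunnel_gateway_ip": self.virtual_ip,
                "tunnel_gateway_mac": self.virtual_mac}


vgw = VirtualTunnelGateway("10.0.0.254", "02:00:00:00:01:50", members={101, 102})
config_before = vgw.hypervisor_config()
vgw.add_member(103)       # a member tunnel gateway is added
vgw.remove_member(101)    # and another is removed
config_after = vgw.hypervisor_config()
```

Because config_before and config_after are identical, no hypervisor needs to be reconfigured when the membership changes.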

[0043] In some embodiments, the fabric switch is an Ethernet fabric
switch. In an Ethernet fabric switch, any number of switches coupled in
an arbitrary topology may logically operate as a single switch. Any new
switch may join or leave the fabric switch in "plug-and-play" mode
without any manual configuration. A fabric switch appears as a single
logical switch to an external device. In some further embodiments, the
fabric switch is a Transparent Interconnection of Lots of Links (TRILL)
network and a respective member switch of the fabric switch is a TRILL
routing bridge (RBridge).

[0044] Although the present disclosure is presented using examples based
on the TRILL protocol, embodiments of the present invention are not
limited to networks defined using TRILL, or a particular Open System
Interconnection Reference Model (OSI reference model) layer. For example,
embodiments of the present invention can also be applied to a
multi-protocol label switching (MPLS) network. In this disclosure, the
term "fabric switch" is used in a generic sense, and can refer to a
network operating in any networking layer, sub-layer, or a combination of
networking layers.

[0045] The term "external device" can refer to a device coupled to a
fabric switch. An external device can be a host, a server, a conventional
layer-2 switch, a layer-3 router, or any other type of device.
Additionally, an external device can be coupled to other switches or
hosts further away from a network. An external device can also be an
aggregation point for a number of network devices to enter the network.
The terms "device" and "machine" are used interchangeably.

[0046] The term "hypervisor" is used in a generic sense, and can refer to
any virtual machine manager. Any software, firmware, or hardware that
creates and runs virtual machines can be a "hypervisor." The term
"virtual machine" is also used in a generic sense and can refer to a software
implementation of a machine or device. Any virtual device which can
execute a software program similar to a physical device can be a "virtual
machine." A host external device on which a hypervisor runs one or more
virtual machines can be referred to as a "host machine."

[0047] The term "tunnel" refers to a data communication where one or more
networking protocols are encapsulated using another networking protocol.
Although the present disclosure is presented using examples based on a
layer-3 encapsulation of a layer-2 protocol, "tunnel" should not be
interpreted as limiting embodiments of the present invention to layer-2
and layer-3 protocols. A "tunnel" can be established for any networking
layer, sub-layer, or a combination of networking layers.

[0048] The term "frame" refers to a group of bits that can be transported
together across a network. "Frame" should not be interpreted as limiting
embodiments of the present invention to layer-2 networks. "Frame" can be
replaced by other terminologies referring to a group of bits, such as
"packet," "cell," or "datagram."

[0049] The term "switch" is used in a generic sense, and it can refer to
any standalone or fabric switch operating in any network layer. "Switch"
should not be interpreted as limiting embodiments of the present
invention to layer-2 networks. Any device that can forward traffic to an
external device or another switch can be referred to as a "switch."
Examples of a "switch" include, but are not limited to, a layer-2 switch,
a layer-3 router, a TRILL RBridge, or a fabric switch comprising a
plurality of similar or heterogeneous smaller physical switches.

[0050] The term "RBridge" refers to routing bridges, which are bridges
implementing the TRILL protocol as described in Internet Engineering Task
Force (IETF) Request for Comments (RFC) "Routing Bridges (RBridges): Base
Protocol Specification," available at http://tools.ietf.org/html/rfc6325,
which is incorporated by reference herein. Embodiments of the present
invention are not limited to application among RBridges. Other types of
switches, routers, and forwarders can also be used.

[0051] The term "edge port" refers to a port in a fabric switch which
exchanges data frames with an external device outside of the fabric
switch. The term "inter-switch port" refers to a port which couples a
member switch of a fabric switch with another member switch and is used
for exchanging data frames between the member switches.

[0052] The term "switch identifier" refers to a group of bits that can be
used to identify a switch. If the switch is an RBridge, the switch
identifier can be an "RBridge identifier." The TRILL standard uses
"RBridge ID" to denote a 48-bit
Intermediate-System-to-Intermediate-System (IS-IS) ID assigned to an
RBridge, and "RBridge nickname" to denote a 16-bit value that serves as
an abbreviation for the "RBridge ID." In this disclosure, "switch
identifier" is used as a generic term, is not limited to any bit format,
and can refer to any format that can identify a switch. The term "RBridge
identifier" is used in a generic sense, is not limited to any bit format,
and can refer to "RBridge ID," "RBridge nickname," or any other format
that can identify an RBridge.

[0053] The term "fabric switch" refers to a number of interconnected
physical switches which form a single, scalable logical switch. In a
fabric switch, any number of switches can be connected in an arbitrary
topology, and the entire group of switches functions together as one
single, logical switch. This feature makes it possible to use many
smaller, inexpensive switches to construct a large fabric switch, which
can be viewed as a single logical switch externally.

Network Architecture

[0054] FIG. 1A illustrates an exemplary fabric switch with a virtual
tunnel gateway, in accordance with an embodiment of the present
invention. As illustrated in FIG. 1A, a fabric switch 100 includes member
switches 101, 102, 103, 104, and 105. Switch 101 is coupled to service
appliance 132 and a layer-3 router 134; and switch 102 is coupled to
layer-3 router 134 and a physical switch 136. Appliance 132 can provide a
service to fabric switch 100, such as firewall protection, load
balancing, and intrusion detection. Member switches in fabric switch
100 send frames outside of fabric switch 100 via router 134. Switch 136
can be coupled to other devices, such as a high-performance database.
Member switches in fabric switch 100 use edge ports to communicate to
external devices and inter-switch ports to communicate to other member
switches. For example, switch 102 is coupled to external devices, such as
router 134 and switch 136, via edge ports and to switches 101, 103, 104,
and 105 via inter-switch ports.

[0055] Switches 101 and 102 also operate as tunnel gateways (i.e., member
tunnel gateways 101 and 102) in fabric switch 100. Switches 101 and 102
are virtualized as a virtual gateway switch 150. Switches 103, 104, and
105 consider virtual gateway switch 150 as another member switch
reachable via switches 101 and 102. Virtual gateway switch 150 is also
virtualized as a virtual member tunnel gateway 150 to the hypervisors
coupled to fabric switch 100. Hence, the terms "member switch" and
"member tunnel gateway" are used interchangeably for virtual gateway
switch 150, and associated member switches 101 and 102. Virtual tunnel
gateway 150 is associated with a virtual IP address and a virtual MAC
address. Member tunnel gateways 101 and 102 are associated with these
virtual addresses in conjunction with each other. Consequently, member
tunnel gateways 101 and 102 consider these virtual addresses as local
addresses. In some embodiments, fabric switch 100 is a TRILL network;
switches 101, 102, 103, 104, and 105 are RBridges; and data frames
transmitted and received via inter-switch ports are encapsulated in TRILL
headers. Under such a scenario, virtual member tunnel gateway 150 can be
a virtual RBridge with a virtual RBridge identifier. Switch
virtualization in a fabric switch and its associated operations, such as
data frame forwarding, are specified in U.S. Patent Publication No.
2010/0246388, titled "Redundant Host Connection in a Routed Network," the
disclosure of which is incorporated herein in its entirety.

[0056] Host machines 112 and 114 are coupled to switches 103 and 105,
respectively. During operation, switch 103 discovers the hypervisor of
host machine 112. Switch 103 then sends a configuration message to the
hypervisor with the virtual IP address, and optionally, the virtual MAC
address associated with virtual member tunnel gateway 150. In some
embodiments, switch 103 forwards the hypervisor information toward
virtual gateway switch 150. Switch 101 or 102 receives the information
and sends the configuration message to the hypervisor via switch 103.
Upon receiving the configuration message, the hypervisor is dynamically
configured with the virtual IP address as the tunnel gateway address. In
the same way, the hypervisor in host machine 114 is also configured with
the virtual IP address as the tunnel gateway address. This allows fabric
switch 100 to act as a distributed tunnel gateway represented by virtual
member tunnel gateway 150.
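
A minimal Python sketch of the dynamic configuration in paragraph [0056] is given below. The message fields, the Hypervisor class, and the address values are assumptions; the description does not specify a message format. The sketch only shows that every discovered hypervisor ends up with the same virtual IP address as its tunnel gateway address, with the virtual MAC address being optional.

```python
from dataclasses import dataclass
from typing import Optional

VIRTUAL_IP = "10.0.0.254"          # made-up virtual IP address of the virtual tunnel gateway
VIRTUAL_MAC = "02:00:00:00:01:50"  # made-up virtual MAC address


def build_config_message(hypervisor_id: str, include_mac: bool = True) -> dict:
    """Build the message that sets the hypervisor's tunnel gateway address."""
    message = {"to": hypervisor_id, "tunnel_gateway_ip": VIRTUAL_IP}
    if include_mac:  # the virtual MAC address is optional in the description
        message["tunnel_gateway_mac"] = VIRTUAL_MAC
    return message


@dataclass
class Hypervisor:
    hypervisor_id: str
    tunnel_gateway_ip: Optional[str] = None
    tunnel_gateway_mac: Optional[str] = None

    def apply(self, message: dict) -> None:
        # Dynamic configuration: adopt whatever gateway address the fabric advertises.
        self.tunnel_gateway_ip = message["tunnel_gateway_ip"]
        self.tunnel_gateway_mac = message.get("tunnel_gateway_mac")


hv_112 = Hypervisor("hv-112")   # hypervisor of host machine 112
hv_114 = Hypervisor("hv-114")   # hypervisor of host machine 114
for hv in (hv_112, hv_114):
    hv.apply(build_config_message(hv.hypervisor_id))
```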

[0057] Suppose that virtual machine 122 in host machine 112 initiates a
data communication which crosses its VLAN boundary and sends an
associated data frame toward router 134. The hypervisor in host machine
112 initiates an overlay tunnel for the frame by encapsulating the frame
in a layer-3 packet with the virtual IP address as the destination IP
address. Examples of such a tunnel include, but are not limited to,
Virtual Extensible Local Area Network (VXLAN), Generic Routing
Encapsulation (GRE), and its variations, such as Network Virtualization
using GRE (NVGRE) and openvSwitch GRE. The hypervisor in host machine 112
can further encapsulate the packet in an Ethernet frame with the virtual
MAC address as the destination MAC address, and forward the frame toward
virtual member tunnel gateway 150.
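
The encapsulation in paragraph [0057] can be sketched in Python using VXLAN, one of the tunnel types listed above. The 8-byte header layout and UDP port 4789 follow the VXLAN specification (RFC 7348). Everything else is an assumption for illustration: the outer Ethernet, IP, and UDP headers are modeled as plain fields rather than serialized bytes, and the addresses and the VNI value are made up.

```python
import struct
from dataclasses import dataclass

VXLAN_UDP_PORT = 4789          # IANA-assigned VXLAN port
VXLAN_FLAG_VNI_VALID = 0x08    # "I" flag: the VNI field is valid


@dataclass
class TunnelFrame:
    outer_dst_mac: str   # virtual MAC address of the virtual tunnel gateway
    outer_dst_ip: str    # virtual IP address of the virtual tunnel gateway
    udp_dst_port: int
    payload: bytes       # VXLAN header followed by the inner Ethernet frame


def vxlan_header(vni: int) -> bytes:
    """8 bytes: flags(8) reserved(24) VNI(24) reserved(8)."""
    if not 0 <= vni < 1 << 24:
        raise ValueError("VNI must fit in 24 bits")
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


def encapsulate(inner_frame: bytes, vni: int,
                virtual_ip: str, virtual_mac: str) -> TunnelFrame:
    return TunnelFrame(virtual_mac, virtual_ip, VXLAN_UDP_PORT,
                       vxlan_header(vni) + inner_frame)


def decapsulate(frame: TunnelFrame) -> tuple:
    """Return (vni, inner_frame); raises if the VNI-valid flag is missing."""
    word0, word1 = struct.unpack("!II", frame.payload[:8])
    if not (word0 >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("VNI-valid flag not set")
    return word1 >> 8, frame.payload[8:]


inner = b"\xaa" * 14 + b"hello"   # placeholder bytes, not a real Ethernet frame
tunneled = encapsulate(inner, vni=5000, virtual_ip="10.0.0.254",
                       virtual_mac="02:00:00:00:01:50")
```

The hypervisor only ever addresses the virtual IP and MAC addresses, so the same encapsulation works regardless of which member tunnel gateway terminates the tunnel.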

[0058] Upon receiving the frame, ingress switch 103 identifies the
destination MAC address to be associated with virtual gateway switch 150.
Switch 103 considers virtual gateway switch 150 to be another member
switch and forwards the frame to switch 101. Upon receiving the frame,
switch 101 recognizes the virtual IP and MAC addresses to be local
addresses, extracts the inner packet, and forwards the inner packet to
router 134 based on the forwarding information of the inner packet.
Similarly, if virtual machine 124 in host machine 114 sends a frame
toward switch 136, the hypervisor in host machine 114 tunnels the frame
by encapsulating the frame in a layer-3 packet with the virtual IP
address as the destination IP address. Switch 102 receives the frame,
recognizes the virtual IP and MAC addresses to be local addresses,
extracts the inner packet, and forwards the inner packet to switch 136
based on the forwarding information of the inner packet.
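
The forwarding decision in paragraph [0058] can be summarized with the following Python sketch. The data classes, the process function, the port names, and the address values are hypothetical, and the forwarding table is reduced to a dictionary from inner destination to output port. The sketch assumes the FIG. 1A arrangement in which only switches 101 and 102 are member tunnel gateways.

```python
from dataclasses import dataclass

VIRTUAL_IP = "10.0.0.254"            # made-up virtual addresses
VIRTUAL_MAC = "02:00:00:00:01:50"
MEMBER_GATEWAYS = {101, 102}         # member tunnel gateways in FIG. 1A


@dataclass
class InnerPacket:
    dst: str          # destination address of the inner packet
    data: bytes = b""


@dataclass
class TunnelFrame:
    outer_dst_mac: str
    outer_dst_ip: str
    inner: InnerPacket


def process(switch_id: int, frame: TunnelFrame, forwarding_table: dict,
            next_hop_to_gateway: int) -> tuple:
    """Return ("decap", output_port) at a member gateway, else ("forward", next switch)."""
    addressed_to_vgw = (frame.outer_dst_mac == VIRTUAL_MAC
                        and frame.outer_dst_ip == VIRTUAL_IP)
    if not addressed_to_vgw:
        return ("not-tunnel", None)   # ordinary traffic; out of scope for this sketch
    if switch_id in MEMBER_GATEWAYS:
        # The virtual addresses are local here: terminate the tunnel and pick
        # the output port from the inner packet's destination address.
        return ("decap", forwarding_table[frame.inner.dst])
    # Any other member switch treats the virtual gateway switch as one more
    # member switch and sends the still-encapsulated frame toward it.
    return ("forward", next_hop_to_gateway)


frame = TunnelFrame(VIRTUAL_MAC, VIRTUAL_IP, InnerPacket(dst="router-134"))
at_103 = process(103, frame, forwarding_table={}, next_hop_to_gateway=101)
at_101 = process(101, frame, forwarding_table={"router-134": "edge-port-1"},
                 next_hop_to_gateway=101)
```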

[0059] Suppose that virtual machine 122 requires migration from host
machine 112 to a remote location via router 134. The hypervisor of host
machine 112 tunnels the data associated with the migration by
encapsulating the data in an IP packet with the virtual IP address of
virtual member tunnel gateway 150 as the destination address. On the
other hand, if virtual machine 122 requires migration from host machine
112 to host machine 114, the hypervisor of host machine 112 can simply
send the associated data to the hypervisor of host machine 114, as long
as they are configured with the same VLAN. If virtual tunnel gateway 150
also operates as a default router for the hypervisors in host machines 112
and 114, the hypervisor of host machine 112 can tunnel the associated
data directly to the hypervisor of host machine 114 via default router
150. Member tunnel gateways 101 and 102 can age out the tunnels from the
hypervisors of host machines 112 and 114 upon detecting inactivity from
the tunnels. In some embodiments, member tunnel gateways 101 and 102
maintain an activity bit for a respective tunnel to indicate activity or
inactivity over a period of time.
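
One way to picture the activity bit mentioned at the end of paragraph [0059] is the Python sketch below. The TunnelTable class, its method names, and the sweep policy (age out any tunnel whose bit was not set since the previous sweep) are assumptions; the description states only that an activity bit per tunnel indicates activity or inactivity over a period of time.

```python
class TunnelTable:
    """Tracks one activity bit per tunnel, keyed by the hypervisor's IP address."""

    def __init__(self) -> None:
        self._active = {}   # hypervisor IP -> activity bit

    def on_packet(self, hypervisor_ip: str) -> None:
        # Traffic creates the tunnel entry if needed and sets its activity bit.
        self._active[hypervisor_ip] = True

    def sweep(self) -> list:
        """Run once per aging period; returns the tunnels that were aged out."""
        aged_out = [ip for ip, bit in self._active.items() if not bit]
        for ip in aged_out:
            del self._active[ip]
        for ip in self._active:
            self._active[ip] = False   # must be set again before the next sweep
        return aged_out

    def tunnels(self) -> set:
        return set(self._active)


table = TunnelTable()
table.on_packet("192.0.2.112")   # example addresses for the two hypervisors
table.on_packet("192.0.2.114")
first = table.sweep()            # both tunnels were active during the first period
table.on_packet("192.0.2.112")   # only one tunnel carries traffic afterwards
second = table.sweep()
```

A single bit per tunnel keeps the per-packet cost to one write, at the price of aging granularity no finer than the sweep period.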

[0060] FIG. 1B illustrates a virtual tunnel gateway being associated with
a respective member switch of a fabric switch in conjunction with the
example in FIG. 1A, in accordance with an embodiment of the present
invention. Because the entire fabric switch 100 appears as a single
tunnel gateway represented by virtual member tunnel gateway 150, another
member tunnel gateway can be dynamically added to fabric switch 100. In
some embodiments, existing member switches can be configured as member
tunnel gateways as well. In the example of FIG. 1B, switches 103, 104,
and 105 are also configured as member tunnel gateways. Switches 103, 104,
and 105 become associated with virtual gateway switch 150, and establish
association with the corresponding virtual IP address and the virtual MAC
address. The hypervisors of host machines 112 and 114 simply continue to
tunnel frames by encapsulating the frames using the virtual IP address.
Consequently, when the hypervisor in host machine 112 tunnels frames
toward virtual member tunnel gateway 150, egress switch 103 recognizes
the virtual IP and MAC addresses as local addresses, extracts the inner
frame, and forwards the frame to router 134 based on the forwarding
information of the inner frame.

Network Configurations

[0061] FIG. 2A illustrates an exemplary configuration of a fabric switch
with a virtual tunnel gateway, in accordance with an embodiment of the
present invention. In this example, a fabric switch 200 includes switches
212, 214, and 216. Fabric switch 200 also includes switches 202, 204, 222
and 224, each with a number of edge ports which can be coupled to
external devices. For example, switches 202 and 204 are coupled with host
machines 250 and 260 via Ethernet edge ports. Switches 222 and 224 are
coupled to network 240, which can be any local or wide area network, such
as the Internet. Host machine 250 includes virtual machines 254, 256, and
258, which are managed by hypervisor 252. Host machine 260 includes
virtual machines 264, 266, and 268, which are managed by hypervisor 262.
Virtual machines in host machines 250 and 260 are logically coupled to
virtual switches 251 and 261, respectively, via their respective virtual
ports. For example, virtual machines 254 and 264 are coupled to virtual
switches 251 and 261, respectively, via virtual ports 253 and 263,
respectively.

[0062] In some embodiments, switches in fabric switch 200 are TRILL
RBridges and communicate with each other using the TRILL protocol. These
RBridges have TRILL-based inter-switch ports for connection with other
TRILL RBridges in fabric switch 200. Although the physical switches
within fabric switch 200 are labeled as "TRILL RBridges," they are
different from conventional TRILL RBridges in the sense that they are
controlled by the Fibre Channel (FC) switch fabric control plane. In
other words, the assignment of switch addresses, link discovery and
maintenance, topology convergence, routing, and forwarding can be handled
by the corresponding FC protocols. Particularly, each TRILL RBridge's
switch ID or nickname is mapped from the corresponding FC switch domain
ID, which can be automatically assigned when a switch joins fabric switch
200 (which is logically similar to an FC switch fabric).

[0063] Note that TRILL is only used as a transport between the switches
within fabric switch 200. This is because TRILL can readily accommodate
native Ethernet frames. Also, the TRILL standards provide a ready-to-use
forwarding mechanism that can be used in any routed network with
arbitrary topology (although the actual routing in fabric switch 200 is
done by the FC switch fabric protocols). Embodiments of the present
invention should not be limited to using only TRILL as the transport.
Other protocols (such as multi-protocol label switching (MPLS) or
Internet Protocol (IP)), either public or proprietary, can also be used
for the transport.

[0064] In the example in FIG. 2A, RBridges 222 and 224 are also member
tunnel gateways. In some embodiments, a respective member tunnel gateway
is capable of processing layer-3 (e.g., IP) packets to facilitate layer-3
overlay tunnels over layer-2 and TRILL networks. RBridges 222 and 224 are
virtualized as a virtual RBridge 230 (which corresponds to a virtual
gateway switch) with virtual RBridge identifier 232. RBridges 222 and 224
are associated with virtual RBridge identifier 232. RBridges 202, 204,
212, 214, and 216 consider virtual RBridge 230 as another member switch
reachable via RBridges 222 and 224. Virtual RBridge 230 is presented to
hypervisors 252 and 262 as virtual member tunnel gateway 230. Hence, the
terms "RBridge" and "member tunnel gateway" are used interchangeably for
virtual RBridge 230, and associated RBridges 222 and 224. Virtual tunnel
gateway 230 is associated with a virtual IP address 236 and a virtual MAC
address 234. Member tunnel gateways 222 and 224 are associated with
virtual IP address 236 and virtual MAC address 234. Consequently, member
tunnel gateways 222 and 224 consider virtual IP address 236 and virtual
MAC address 234 as local addresses.

[0066] Upon receiving the configuration message, hypervisor 252 configures
virtual IP address 236 as the tunnel gateway address, which can also be
the default router IP address for hypervisor 252. In some embodiments,
RBridge 222 can use Dynamic Host Configuration Protocol (DHCP) for
providing the configuration information. Similarly, upon receiving a
configuration message from RBridge 204, hypervisor 262 configures virtual
IP address 236 as the tunnel gateway address for hypervisor 262. Suppose
that virtual machine 254 sends a frame toward network 240. Hypervisor
252, via virtual switch 251, tunnels the frame by encapsulating the frame
in a layer-3 packet with virtual IP address 236 as the destination IP
address. Hypervisor 252 further encapsulates the packet in an Ethernet
frame with virtual MAC address 234 as the destination MAC address, and
forwards the frame to RBridge 202. Upon receiving the frame, ingress
RBridge 202 identifies virtual MAC address 234 to be associated with
virtual RBridge 230 reachable via RBridges 222 and 224. RBridge 202 then
encapsulates the frame in a TRILL packet with virtual RBridge identifier
232 as the egress RBridge identifier and forwards the frame toward
virtual RBridge 230.
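
The nesting of encapsulations described above (inner frame, layer-3 tunnel packet, Ethernet frame, TRILL packet) can be sketched as follows. This is an illustrative model only: the dataclass names, the function names, and the address values standing in for virtual IP address 236 and virtual MAC address 234 are assumptions, and no real header formats are modeled.

```python
from dataclasses import dataclass

# Placeholder values standing in for virtual IP address 236, virtual MAC
# address 234, and virtual RBridge identifier 232 (all assumed).
VIRTUAL_IP = "192.0.2.1"
VIRTUAL_MAC = "02:00:00:00:02:34"
VIRTUAL_RBRIDGE_ID = 232


@dataclass(frozen=True)
class TunnelPacket:
    dst_ip: str          # destination of the layer-3 tunnel encapsulation
    inner_frame: bytes   # frame originally sent by the virtual machine


@dataclass(frozen=True)
class EthernetFrame:
    dst_mac: str
    payload: TunnelPacket


@dataclass(frozen=True)
class TrillPacket:
    egress_rbridge: int
    frame: EthernetFrame


def hypervisor_encapsulate(inner_frame: bytes) -> EthernetFrame:
    """Hypervisor side: tunnel the frame toward the virtual tunnel gateway."""
    packet = TunnelPacket(dst_ip=VIRTUAL_IP, inner_frame=inner_frame)
    return EthernetFrame(dst_mac=VIRTUAL_MAC, payload=packet)


def ingress_encapsulate(frame: EthernetFrame, mac_to_rbridge: dict) -> TrillPacket:
    """Ingress RBridge side: look up the RBridge identifier for the
    destination MAC address and use it as the egress RBridge identifier."""
    return TrillPacket(egress_rbridge=mac_to_rbridge[frame.dst_mac], frame=frame)
```

For example, `ingress_encapsulate(hypervisor_encapsulate(b"data"), {VIRTUAL_MAC: VIRTUAL_RBRIDGE_ID})` yields a TRILL packet whose egress RBridge identifier is the virtual RBridge identifier.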

[0067] The TRILL packet is received by one of intermediate RBridges 212
and 214, and forwarded to RBridge 222 or 224 based on the TRILL routing
in fabric switch 200. Suppose that RBridge 222 receives the TRILL packet.
RBridge 222 identifies virtual RBridge identifier 232 as the egress
RBridge identifier and recognizes virtual RBridge identifier 232 as a
local RBridge identifier. RBridge 222 removes the TRILL encapsulation and
extracts the layer-2 frame. RBridge 222 identifies virtual MAC address
234 as the destination MAC address of the frame and recognizes virtual
MAC address 234 to be a local MAC address. Because RBridge 222 has IP
processing capability, RBridge 222 then promotes the packet in the frame
to the upper layer (e.g., IP layer).

[0068] RBridge 222 identifies virtual IP address 236 as the destination IP
address of the packet, recognizes virtual IP address 236 as a local IP
address, and extracts the inner frame. RBridge 222 thus removes the
tunneling encapsulation of hypervisor 252. RBridge 222 then forwards the
inner frame to network 240 based on the forwarding information of the
inner frame. In this way, the entire fabric switch 200 operates as a
tunnel gateway for hypervisor 252.

[0069] When RBridge 222 removes the tunneling encapsulation, RBridge 222
learns the MAC address of virtual machine 254 from the inner frame. In
some embodiments, RBridge 222 learns the MAC address of virtual machine
254 directly from the tunnel encapsulated packet. RBridge 222 can also
learn other associated information, such as the MAC and IP addresses of
hypervisor 252, and outer and inner VLANs associated with the frame. In
some embodiments, RBridge 222 shares the learned information with other
member tunnel gateways in fabric switch 200, such as RBridge 224. RBridge
224 can consider the information received from RBridge 222 to be learned
from a locally terminated tunnel.
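
The learning and sharing behavior described above can be sketched as follows. This is a minimal illustration: the class `MemberGateway`, its method names, and the fields stored per MAC address are assumptions, and the sharing is modeled as a direct method call rather than as a control message.

```python
class MemberGateway:
    """Hypothetical member tunnel gateway that learns from terminated tunnels."""

    def __init__(self, name):
        self.name = name
        self.peers = []       # other member tunnel gateways in the fabric switch
        self.mac_table = {}   # VM MAC address -> associated information

    def learn_from_tunnel(self, vm_mac, hypervisor_ip, vlan):
        # Learn the VM MAC address and associated information while
        # removing the tunnel encapsulation, then share it with the peers.
        entry = {"hypervisor_ip": hypervisor_ip, "vlan": vlan}
        self.mac_table[vm_mac] = entry
        for peer in self.peers:
            peer.receive_shared(vm_mac, entry)

    def receive_shared(self, vm_mac, entry):
        # Shared information is stored exactly as if it had been learned
        # from a locally terminated tunnel.
        self.mac_table[vm_mac] = dict(entry)
```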

[0070] In this way, RBridges 222 and 224 learn the MAC addresses (and the
associated information) of virtual machines 256, 258, 264, 266, and 268
as well. In some embodiments, RBridges 222 and 224 share the learned MAC
addresses with the rest of fabric switch 200. RBridges 222 and 224 can
also share the learned associated information with the rest of fabric
switch 200. Consequently, whenever any member switch of fabric
switch 200 learns a MAC address, all other member switches learn the MAC
address as well. In some embodiments, switches 202 and 204 use internal
control messages to share the learned MAC addresses.

[0071] In some embodiments, all RBridges in fabric switch 200 operate as
member tunnel gateways and are associated with virtual RBridge 230. Under
such a scenario, RBridge 202 removes tunneling encapsulation of
hypervisor 252 and extracts the inner frame. RBridge 202 recognizes
network 240 to be reachable via RBridges 222 and 224. RBridge 202 then
encapsulates the inner frame in a TRILL packet and forwards the
TRILL-encapsulated inner frame toward one of RBridges 222 and 224. If
hypervisor 252 is sending multiple frames to network 240, RBridge 202 can
use equal cost multiple paths (ECMP). Hence, multi-pathing can be
achieved when RBridges 202 and 204 choose to send TRILL-encapsulated data
frames toward virtual RBridge 230 via RBridges 222 and 224.
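
One possible way to spread frames across equal-cost paths is sketched below. The embodiments do not specify how a path is chosen; hashing the source and destination MAC addresses of the inner frame, the function name `ecmp_next_hop`, and the use of CRC-32 are all assumptions made for illustration.

```python
import zlib


def ecmp_next_hop(src_mac, dst_mac, next_hops):
    """Pick one of several equal-cost next hops for a flow.

    Hashing the flow key keeps all frames of one flow on the same path
    while distributing different flows across the available paths.
    """
    key = f"{src_mac}|{dst_mac}".encode()
    return next_hops[zlib.crc32(key) % len(next_hops)]
```

With `next_hops = ["RBridge 222", "RBridge 224"]`, repeated calls for the same MAC address pair always return the same RBridge.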

[0072] FIG. 2B illustrates exemplary multi-switch trunks coupling a
plurality of member switches in a fabric switch, in accordance with an
embodiment of the present invention. As illustrated in FIG. 2B, RBridges
202 and 204 are configured to operate in a special "trunked" mode for
host machines 250 and 260, and hypervisors 252 and 262. Hypervisors 252
and 262 view RBridges 202 and 204 as a common virtual RBridge 270, with a
corresponding virtual RBridge identifier 272. Hypervisors 252 and 262 are
considered to be logically coupled to virtual RBridge 270 via logical
links represented by dotted lines. Virtual RBridge 270 is considered to
be logically coupled to both RBridges 202 and 204, optionally with
zero-cost links (also represented by dotted lines).

[0073] While forwarding data frames from hypervisors 252 and 262, RBridges
202 and 204 encapsulate the frame using the TRILL protocol and assign
virtual RBridge identifier 272 as the ingress RBridge identifier. As a
result, other RBridges in fabric switch 200 learn that hypervisors 252
and 262, and their corresponding virtual machines are reachable via
virtual RBridge 270. In the following description, RBridges which
participate in link aggregation are referred to as "partner RBridges."
Since the two partner RBridges function as a single logical RBridge, the
MAC address reachability learned by a respective RBridge is shared with
the other partner RBridge. For example, during normal operation, virtual
machine 254 may choose to send its outgoing data frames only via the link
to RBridge 202. As a result, only RBridge 202 would learn virtual machine
254's MAC address. This information is then shared by RBridge 202 with
RBridge 204 via their respective inter-switch ports. In some embodiments,
RBridges 202 and 204 can advertise their respective connectivity
(optionally via zero-cost links) to virtual RBridge 270. Hence,
multi-pathing can be achieved when other RBridges choose to send data
frames to virtual RBridge 270 (which is marked as the egress RBridge in
the frames) via RBridges 202 and 204.

[0074] Note that virtual RBridge 270 is distinct from virtual RBridge 230.
Virtual RBridge 230 represents the member tunnel gateways (i.e., the
gateway switches) in fabric switch 200 as a single logical switch, and,
in addition to virtual RBridge identifier 232, is typically associated
with virtual MAC address 234 and virtual IP address 236. On the other
hand, virtual RBridge 270 represents a multi-switch trunk as one logical
connection via virtual RBridge 270, and is associated with virtual
RBridge identifier 272. Fabric switch 200 can have a plurality of virtual
RBridges associated with different multi-switch trunks.

Dynamic Configuration

[0075] In the example in FIG. 2A, upon detecting hypervisor 252, RBridge
222 dynamically provides configuration information, such as virtual IP
address 236, to hypervisor 252. Hypervisor 252 then configures virtual IP
address 236 as the tunnel gateway address, which can also be the default
router IP address for hypervisor 252. FIG. 3A presents a flowchart
illustrating the process of a member switch in a fabric switch
facilitating dynamic configuration of a hypervisor discovered via an edge
port, in accordance with an embodiment of the present invention. Upon
detecting a new hypervisor via an edge port (operation 302), the switch
checks whether the local switch is a tunnel gateway (operation 304). In
some embodiments, the switch checks whether the local switch is
associated with the virtual IP address to determine whether the local
switch is a tunnel gateway.

[0076] If the local switch is not a tunnel gateway (operation 304), the
switch identifies the virtual gateway switch (operation 312), which is
also a virtual tunnel gateway. The switch constructs a notification
message comprising detected hypervisor information (operation 314) and
encapsulates the notification message with a virtual identifier of the
virtual gateway switch as the egress switch identifier (operation 316).
In some embodiments, the notification message is encapsulated in a TRILL
packet and the virtual identifier is a virtual RBridge identifier. The
switch then sends the encapsulated message toward the virtual gateway
switch (operation 318).

[0077] If the local switch is a tunnel gateway, the switch is aware of the
virtual IP address and the virtual MAC address. The switch then
constructs a configuration message comprising the virtual IP address as
the tunnel gateway address for the hypervisor (operation 322). This
configuration message can be a layer-2 notification/control message. In
some embodiments, the switch sends the configuration message using DHCP.
The configuration message can also indicate the virtual IP address as the
default router address for the hypervisor. The switch, optionally, can
include a mapping between the virtual IP address and the corresponding
virtual MAC address in the configuration message (operation 324). If not
included, upon receiving the configuration message, the hypervisor can
obtain the virtual MAC address by sending an ARP query with the virtual
IP address. The switch then transmits the configuration message to the
edge port coupling the hypervisor (operation 326).
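
The decision flow of FIG. 3A can be sketched as follows. This is an illustrative model only: the dictionary keys, the function name `on_new_hypervisor`, and the message layouts are assumptions, and no DHCP or TRILL formatting is modeled.

```python
def on_new_hypervisor(switch, hypervisor):
    """Handle a hypervisor detected via an edge port (operation 302)."""
    if not switch["is_tunnel_gateway"]:                        # operation 304
        # Operations 312-318: notify the virtual gateway switch.
        notification = {
            "type": "new_hypervisor",
            "hypervisor": hypervisor["name"],
            "ingress": switch["id"],
        }
        return {
            "egress_switch_id": switch["virtual_gateway_id"],
            "payload": notification,
        }

    # Operation 322: the virtual IP address is the tunnel gateway address.
    config = {"tunnel_gateway_ip": switch["virtual_ip"]}
    if switch.get("include_mac_mapping"):                      # operation 324
        config["ip_to_mac"] = {switch["virtual_ip"]: switch["virtual_mac"]}
    # Operation 326: send to the edge port coupling the hypervisor.
    return {"edge_port": hypervisor["edge_port"], "payload": config}
```

When the mapping is omitted, the hypervisor is expected to resolve the virtual MAC address itself, for example with an ARP query as described above.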

[0078] FIG. 3B presents a flowchart illustrating the process of a member
switch in a fabric switch facilitating dynamic configuration of a
hypervisor discovered via an inter-switch port, in accordance with an
embodiment of the present invention. Upon receiving a notification
message from a remote ingress member switch via an inter-switch port
(operation 352), the switch decapsulates the notification message
(operation 354). In some embodiments, the switch removes a TRILL and/or
an FC header to decapsulate the notification message. The switch checks
whether the notification message is for a new hypervisor (operation 356).
If not, the switch takes action based on the information in the
notification message (operation 358).

[0079] If the notification message is for a new hypervisor (operation
356), the switch constructs a configuration message comprising the
virtual IP address as the tunnel gateway address for the hypervisor
(operation 362). The configuration message can also indicate the virtual
IP address as the default router address for the hypervisor. The switch,
optionally, can include a mapping between the virtual IP address and the
corresponding virtual MAC address in the configuration message (operation
364). The switch encapsulates the configuration message with the remote
member switch identifier as the egress switch identifier (operation 366).
In some embodiments, the configuration message is encapsulated in a TRILL
packet and the remote member switch identifier is an RBridge identifier.
The switch then sends the encapsulated message toward the egress switch
(operation 368).
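
The gateway-side handling of FIG. 3B can be sketched as follows. The message layout (a dictionary with `payload`, `type`, and `ingress` keys) and the function name `on_notification` are assumptions for illustration; handling of other notification types (operation 358) is not modeled.

```python
def on_notification(gateway, encapsulated, include_mac_mapping=False):
    """Process a notification received via an inter-switch port (operation 352)."""
    notification = encapsulated["payload"]                     # operation 354
    if notification.get("type") != "new_hypervisor":           # operation 356
        return None                                            # operation 358 (not modeled)

    # Operation 362: the virtual IP address is the tunnel gateway address.
    config = {"tunnel_gateway_ip": gateway["virtual_ip"]}
    if include_mac_mapping:                                    # operation 364
        config["ip_to_mac"] = {gateway["virtual_ip"]: gateway["virtual_mac"]}

    # Operations 366-368: address the reply to the remote member switch
    # that reported the hypervisor.
    return {"egress_switch_id": notification["ingress"], "payload": config}
```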

Frame Forwarding

[0080] FIG. 4A presents a flowchart illustrating the process of a member
switch of a fabric switch forwarding a frame received from a hypervisor
via an edge port, in accordance with an embodiment of the present
invention. The switch receives a data frame from the hypervisor via an
edge port (operation 402) and obtains the destination MAC address of the
received frame (operation 404). If the frame has a tunnel encapsulation,
the destination MAC address is a virtual MAC address associated with the
virtual tunnel gateway. The switch checks whether the MAC address is a
local address (operation 406). For example, if the switch is a member
tunnel gateway, the virtual MAC address is a local address. If the
destination MAC address is local, the switch promotes the frame to the
upper layer (e.g., layer-3), extracts the internal encapsulated packet
(operation 408), and obtains the destination IP address of the extracted
packet (operation 412).

[0081] The destination IP address of the extracted packet is a virtual IP
address associated with the virtual tunnel gateway. The switch checks
whether the destination IP address is a local address (operation 414).
For example, if the switch is a member tunnel gateway, the virtual IP
address is a local address. If the IP address is local, the switch
terminates the tunnel encapsulation (i.e., decapsulates the frame)
(operation 422). The switch extracts the inner frame (operation 424) and
forwards the inner frame based on the destination address of the inner
frame (operation 426), as described in conjunction with FIG. 2A. If the
IP address is not local (operation 414), the switch is incorrectly
configured. If the switch is configured with the virtual MAC address, the
switch should also be configured with the corresponding virtual IP
address. The switch can optionally log the error associated with the
virtual IP address configuration (operation 416).

[0082] If the MAC address is not associated with the switch (operation
406), the frame can be a regular layer-2 frame without any tunnel
encapsulation. The switch identifies the egress switch associated with
the destination MAC address (operation 428). Because a respective member
switch in a fabric switch shares the learned MAC addresses with other
member switches, the switch can be aware of the egress switch associated
with the MAC address. The switch encapsulates the frame using an
identifier of the egress switch (operation 430). In some embodiments, the
switch encapsulates the frame in a TRILL packet and assigns an RBridge
identifier associated with the egress switch as the egress RBridge
identifier. The switch then forwards the frame to the egress switch
(operation 432).
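
The edge-port forwarding flow of FIG. 4A can be sketched as follows. This is an illustrative model: frames are represented as dictionaries, the key names and the returned action tuples are assumptions, and the sketch assumes the destination MAC address of a regular layer-2 frame has already been learned.

```python
def forward_from_edge(switch, frame):
    """Decide how to handle a frame received via an edge port (operation 402)."""
    dst_mac = frame["dst_mac"]                                 # operation 404
    if dst_mac in switch["local_macs"]:                        # operation 406
        packet = frame["payload"]                              # operation 408
        dst_ip = packet["dst_ip"]                              # operation 412
        if dst_ip not in switch["local_ips"]:                  # operation 414
            return ("log_error", dst_ip)                       # operation 416
        inner = packet["inner_frame"]                          # operations 422-424
        return ("forward_inner", inner["dst_mac"])             # operation 426

    # Regular layer-2 frame without tunnel encapsulation.
    egress = switch["mac_to_egress"][dst_mac]                  # operation 428
    return ("encapsulate_and_forward", egress)                 # operations 430-432
```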

[0083] FIG. 4B presents a flowchart illustrating the process of a member
switch of a fabric switch forwarding a frame received via an inter-switch
port, in accordance with an embodiment of the present invention. The
switch receives an encapsulated frame via an inter-switch port (operation
452) and checks whether the egress switch identifier is a local
identifier (operation 454). This local identifier can be a virtual switch
identifier. If not, the switch forwards the frame toward the egress
switch based on the egress switch identifier (operation 468). If the
identifier, which can be a virtual switch identifier, is local, the
switch decapsulates the frame (operation 456). In some embodiments, the
frame encapsulation is based on the TRILL protocol and the egress switch
identifier is a virtual RBridge identifier.

[0084] If the frame has a tunnel encapsulation, the destination MAC
address of the decapsulated frame is a virtual MAC address associated
with the virtual tunnel gateway. The switch checks whether the
destination MAC address is a local address (operation 458). For example,
if the switch is a member tunnel gateway, the virtual MAC address is a
local address. If the destination MAC address is not local, the frame is
destined for a locally coupled external device, and the switch forwards
the decapsulated frame to the locally coupled external device (operation
470). If the MAC address is local, the switch promotes the frame to the
upper layer and extracts the internal encapsulated packet (operation
460), and obtains the destination IP address of the extracted packet
(operation 462).

[0085] The destination IP address of the extracted packet is a virtual IP
address associated with the virtual tunnel gateway. The switch checks
whether the IP address is a local address (operation 464). For example,
if the switch is a member tunnel gateway, the virtual IP address is a
local address. If the IP address is local, the switch terminates the
tunnel encapsulation (operation 472). The switch extracts the inner
packet (operation 474) and forwards the inner packet based on the
destination address of the inner packet (operation 476), as described in
conjunction with FIG. 2A. If the destination IP address is not local, the
switch is incorrectly configured. If the switch is configured with the
virtual MAC address, the switch should also be configured with the
virtual IP address. The switch can optionally log the error associated
with the virtual IP address configuration (operation 466).
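
The inter-switch forwarding flow of FIG. 4B adds a check of the egress switch identifier before the MAC and IP checks. A minimal sketch is given below; the dictionary-based packet model, the key names, and the returned action tuples are assumptions for illustration only.

```python
def forward_from_inter_switch(switch, encapsulated):
    """Decide how to handle a frame received via an inter-switch port (operation 452)."""
    egress_id = encapsulated["egress_id"]
    if egress_id not in switch["local_ids"]:                   # operation 454
        return ("forward_toward", egress_id)                   # operation 468

    frame = encapsulated["frame"]                              # operation 456
    if frame["dst_mac"] not in switch["local_macs"]:           # operation 458
        return ("deliver_to_edge", frame["dst_mac"])           # operation 470

    packet = frame["payload"]                                  # operation 460
    dst_ip = packet["dst_ip"]                                  # operation 462
    if dst_ip not in switch["local_ips"]:                      # operation 464
        return ("log_error", dst_ip)                           # operation 466

    inner = packet["inner_packet"]                             # operations 472-474
    return ("forward_inner", inner["dst"])                     # operation 476
```

Here `local_ids` is assumed to hold both the switch's own identifier and any virtual switch identifier associated with it.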

Broadcast, Unknown Unicast, and Multicast Server

[0086] Typically, broadcast, unknown unicast, or multicast traffic (which
can be referred to as "BUM" traffic) is distributed to multiple
recipients. For ease of deployment, hypervisors typically make multiple
copies of the data frames belonging to such traffic and individually
unicast the data frames. This often leads to inefficient usage of
processing capability of the hypervisors, especially in a large-scale
deployment. To solve this problem, a fabric switch with a virtual tunnel
gateway can facilitate efficient distribution of such traffic. FIG. 5
illustrates an exemplary processing of broadcast, unknown unicast, and
multicast traffic in a fabric switch with a virtual tunnel gateway, in
accordance with an embodiment of the present invention. As illustrated in
FIG. 5, a fabric switch 500 includes member switches 501, 502, 503, 504,
and 505. Member switches in fabric switch 500 use edge ports to
communicate with external devices and inter-switch ports to communicate
with other member switches.

[0087] A respective member switch in fabric switch 500 operates as a
member tunnel gateway. Switches 501, 502, 503, 504, and 505 are
virtualized as a virtual member tunnel gateway 510 to hypervisors 522,
532, 542, 552, 562, and 572 in host machines 520, 530, 540, 550, 560, and
570, respectively. Virtual tunnel gateway 510 is associated with a
virtual IP address and a virtual MAC address. All member tunnel gateways
consider these virtual addresses to be local addresses. In some
embodiments, fabric switch 500 is a TRILL network; switches 501, 502,
503, 504, and 505 are RBridges; and data frames transmitted and received
via inter-switch ports are encapsulated using the TRILL protocol. Under
such a scenario, virtual member tunnel gateway 510 can be a virtual
RBridge with a virtual RBridge identifier.

[0088] To facilitate multicast traffic distribution, fabric switch 500
maintains states for a respective multicast group associated with
hypervisors 522, 532, 542, 552, 562, and 572. Note that such states are
not proportional to the number of virtual machines coupled to the fabric,
but are dependent on the number of multicast groups and VLANs associated
with the virtual machines. A respective member tunnel gateway in fabric
switch 500 is aware of the VLAN and multicast group association of a
respective hypervisor. When a virtual machine sends a join or leave
request for a multicast group, the corresponding hypervisor tunnels the
request to the virtual IP address of virtual tunnel gateway 510.

[0089] In some embodiments, a respective hypervisor implements a multicast
proxy server (e.g., an Internet Group Management Protocol (IGMP) proxy
server) and sends only the first join and last leave requests associated
with a specific multicast group. For example, if virtual machines 554,
556, and 558 send join requests for a multicast group, hypervisor 552
sends only the first join request toward virtual member tunnel gateway
510. On the other hand, if virtual machines 554 and 558 send leave
requests for the multicast group, hypervisor 552 does not send out the
leave requests because virtual machine 556 continues to receive traffic
for the multicast group. However, when virtual machine 556 sends a leave
request for the multicast group, hypervisor 552 recognizes it to be the
last leave request and forwards the leave request toward virtual member
tunnel gateway 510.
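
The first-join and last-leave filtering described above can be sketched as follows. This is a minimal illustration: the class name `MulticastProxy` and its boolean return convention (True when the request should be forwarded toward the virtual tunnel gateway) are assumptions, and no IGMP message handling is modeled.

```python
from collections import defaultdict


class MulticastProxy:
    """Hypothetical hypervisor-side proxy tracking group membership per VM."""

    def __init__(self):
        self.members = defaultdict(set)   # multicast group -> set of member VMs

    def join(self, vm, group):
        """Return True only for the first join request of a group."""
        first = not self.members[group]
        self.members[group].add(vm)
        return first

    def leave(self, vm, group):
        """Return True only when the last member of a group leaves."""
        members = self.members[group]
        if vm not in members:
            return False
        members.discard(vm)
        return not members
```

With virtual machines 554, 556, and 558 joining one group, only the join from 554 is forwarded; if 554 and 558 then leave, nothing is forwarded until 556 also leaves.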

[0090] During operation, virtual machines 524, 546, and 564 become members
of a multicast group. When switch 503 receives a multicast frame from
multicast router 580, switch 503 forwards the frame via multicast tree
592. As a result, a respective switch in fabric switch 500 receives the
frame. Switches 502, 503, and 505 transmit the frame to corresponding
hypervisors 522, 542, and 562, while switches 501 and 504 discard the
frame. In some embodiments, switch 503 identifies virtual machines 524,
546, and 564 to be the members of the multicast group, and forwards the
frame via multicast tree 596, which includes only switches 502, 503, and
505.

[0091] In some embodiments, fabric switch 500 operates as an ARP server.
When virtual machine 534 sends an ARP request, instead of broadcasting
(i.e., unicasting multiple copies), hypervisor 532 tunnels a single copy
of the request toward virtual member tunnel gateway 510. Switch 505,
which is also a member tunnel gateway, receives and decapsulates the
request, as described in conjunction with FIGS. 2A and 2B. Switch 505
then distributes the request in fabric switch 500 via multicast tree 592.
Similarly, when virtual machine 574 sends an ARP request, hypervisor 572
tunnels a single copy of the request toward virtual member tunnel gateway
510. Switch 501 receives the request and distributes the frame in fabric
switch 500 via a different multicast tree 594. In this way, the member
tunnel gateways in fabric switch 500 load balance across a plurality of
multicast trees for broadcast, unknown unicast, or multicast traffic.
Selection of multicast tree can further depend on VLAN memberships of the
member switches.

[0092] FIG. 6 presents a flowchart illustrating the process of a member
tunnel gateway in a fabric switch processing broadcast, unknown unicast,
and multicast traffic, in accordance with an embodiment of the present
invention. The member tunnel gateway receives a packet, which is part of
a broadcast, unknown unicast, or multicast traffic flow, from a
hypervisor (operation 602). This packet is encapsulated with the virtual
MAC and IP addresses of a virtual member tunnel gateway, as described in
conjunction with FIG. 5. The member tunnel gateway terminates the tunnel
encapsulation and extracts the inner packet (operation 604), as described
in conjunction with FIGS. 4A and 4B. The member tunnel gateway checks
whether the packet is a multicast packet (operation 606). If so, the
member tunnel gateway selects a multicast tree in the fabric switch based
on the multicast group and the network load (operation 608).

[0093] If the packet is not a multicast packet, the member tunnel gateway
checks whether the packet is a broadcast packet (operation 610). For
example, an ARP request from a hypervisor is a layer-2 broadcast frame
encapsulated in a layer-3 packet. If the packet is not a broadcast
packet, the member tunnel gateway checks whether the packet is a frame of
unknown destination (operation 620). If the packet is not a frame of
unknown destination (i.e., the member tunnel gateway has already learned
the destination MAC address), the member tunnel gateway sends back a
mapping of the destination MAC address and the corresponding IP address
(which can be a hypervisor IP address) (operation 622) and forwards the
frame based on the destination MAC address (operation 624). For example,
the MAC address can be associated with a remote member switch. The member
tunnel gateway forwards the frame toward that remote member switch.

[0094] If the packet is a broadcast packet (operation 610) or the packet
is a frame with unknown destination (operation 620), the member tunnel
gateway selects a multicast tree comprising all switches in the fabric
switch based on network load and VLAN configuration (operation 612).
After selecting a multicast tree (operation 608 or 612), the member
tunnel gateway forwards the frame via the selected multicast tree
(operation 614). In some embodiments, for multicast traffic of a
multicast group, the member tunnel gateway selects a multicast tree only
with the member switches coupling virtual machines belonging to the
multicast group (e.g., multicast tree 596 in the example in FIG. 5).
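
The classification and tree selection of FIG. 6 can be sketched as follows, applied to an inner frame after the tunnel encapsulation has been removed (operation 604). This is an illustrative model only: the dictionary layouts, the choice of the least-loaded candidate as the load-based criterion, and the function names are assumptions, the sketch assumes at least one candidate tree exists, and the mapping reply of operation 622 is not modeled.

```python
BROADCAST_MAC = "ff:ff:ff:ff:ff:ff"


def least_loaded(trees):
    # Assumed load-balancing policy: pick the tree with the lowest load value.
    return min(trees, key=lambda tree: tree["load"])["name"]


def process_inner_frame(frame, mac_table, group_trees, full_trees):
    """Return ("tree", name) or ("unicast", egress switch) for an inner frame."""
    group = frame.get("group")
    if group is not None:                                      # operation 606
        return ("tree", least_loaded(group_trees[group]))      # operations 608, 614

    dst_mac = frame["dst_mac"]
    if dst_mac == BROADCAST_MAC or dst_mac not in mac_table:   # operations 610, 620
        # Trees spanning all switches, filtered by VLAN configuration.
        candidates = [t for t in full_trees if frame["vlan"] in t["vlans"]]
        return ("tree", least_loaded(candidates))              # operations 612, 614

    # Known unicast destination: forward toward its member switch.
    return ("unicast", mac_table[dst_mac])                     # operation 624
```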

Exemplary Switch

[0095] FIG. 7 illustrates an exemplary member switch associated with a
virtual member tunnel gateway in a fabric switch, in accordance with an
embodiment of the present invention. In this example, a switch 700
includes a number of communication ports 702, a forwarding module 720, a
tunnel management module 730, a packet processor 710 coupled to tunnel
management module 730, and a storage 750. In some embodiments, switch 700
may maintain a membership in a fabric switch, wherein switch 700 also
includes a fabric switch management module 760. Fabric switch management
module 760 maintains a configuration database in storage 750 that
maintains the configuration state of a respective switch within the
fabric switch. Fabric switch management module 760 maintains the state of
the fabric switch, which is used to join other switches. Under such a
scenario, communication ports 702 can include inter-switch communication
channels for communication within a fabric switch. This inter-switch
communication channel can be implemented via a regular communication port
and based on any open or proprietary format.

[0096] Tunnel management module 730 operates switch 700 as a tunnel
gateway capable of terminating an overlay tunnel, as described in
conjunction with FIG. 2A. Tunnel management module 730 also maintains an
association between switch 700 and a virtual tunnel gateway. The virtual
tunnel gateway is associated with a virtual IP address. If switch 700 is
a member switch of a fabric switch, the virtual IP address can also be
associated with another member switch of the fabric switch. This other
member switch also operates as a tunnel gateway and is associated with
the virtual tunnel gateway. In some embodiments, switch 700 is a TRILL
RBridge. Under such a scenario, the virtual tunnel gateway is also
associated with a virtual RBridge identifier.

[0097] In some embodiments, switch 700 also includes a device management
module 732, which operates in conjunction with the packet processor. Upon
detecting a new hypervisor, device management module 732 generates a
configuration message comprising the virtual IP address as a tunnel
gateway address for the hypervisor, as described in conjunction with
FIGS. 3A and 3B. In some embodiments, the virtual IP address in the
configuration message also corresponds to a default gateway router.
During operation, the hypervisor initiates an overlay tunnel with switch
700 by encapsulating inner data packets in another layer-3 data packet.

[0098] Upon receiving the tunnel encapsulated data packet from the
hypervisor, packet processor 710 identifies in the data packet the
virtual IP address associated with the virtual tunnel gateway and
extracts the inner packet from the data packet. In some embodiments, the
packet is TRILL encapsulated and is received via one of the communication
ports 702 capable of receiving TRILL packets. Packet processor 710
identifies the virtual RBridge identifier in the TRILL header, as
described in conjunction with FIG. 2A. Forwarding module 720 then
determines an output port from one of the communication ports 702 for the
inner packet based on the destination address of the inner packet. To
facilitate layer-2 switching, the encapsulated data packet can include a
virtual MAC address mapped to the virtual IP address. Packet processor
710 can identify this virtual MAC address in the data packet as well.

[0099] Note that the above-mentioned modules can be implemented in
hardware as well as in software. In one embodiment, these modules can be
embodied in computer-executable instructions stored in a memory which is
coupled to one or more processors in switch 700. When executed, these
instructions cause the processor(s) to perform the aforementioned
functions.

[0100] In summary, embodiments of the present invention provide a switch
and a method for facilitating overlay tunneling in a fabric switch. In
one embodiment, the switch includes a tunnel management module, a packet
processor, and a forwarding module. The tunnel management module operates
the switch as a tunnel gateway capable of terminating an overlay tunnel.
During operation, the packet processor, which is coupled to the tunnel
management module, identifies in a data packet a virtual IP address
associated with a virtual tunnel gateway. This virtual tunnel gateway is
associated with the switch, and the data packet is associated with the
overlay tunnel. The forwarding module determines an output port for an
inner packet in the data packet based on a destination address of the
inner packet.

[0101] The methods and processes described herein can be embodied as code
and/or data, which can be stored in a computer-readable non-transitory
storage medium. When a computer system reads and executes the code and/or
data stored on the computer-readable non-transitory storage medium, the
computer system performs the methods and processes embodied as data
structures and code and stored within the medium.

[0102] The methods and processes described herein can be executed by
and/or included in hardware modules or apparatus. These modules or
apparatus may include, but are not limited to, an application-specific
integrated circuit (ASIC) chip, a field-programmable gate array (FPGA), a
dedicated or shared processor that executes a particular software module
or a piece of code at a particular time, and/or other programmable-logic
devices now known or later developed. When the hardware modules or
apparatus are activated, they perform the methods and processes included
within them.

[0103] The foregoing descriptions of embodiments of the present invention
have been presented only for purposes of illustration and description.
They are not intended to be exhaustive or to limit this disclosure.
Accordingly, many modifications and variations will be apparent to
practitioners skilled in the art. The scope of the present invention is
defined by the appended claims.