A Primer on Socket Programming

The Unix operating system introduced many ideas that changed the programming world. Among these is the file descriptor. A file descriptor provides a programming interface to a file object. Because nearly every object in a Unix system is defined as a file, file descriptors can be used to send and receive data with many kinds of objects across the Unix system. This makes life much simpler for Unix programmers: the same programming model works no matter what type of device (or file) you are trying to access.

Starting in the 4.2BSD Unix release, network access was also defined using file descriptors. A network file descriptor is called a socket. Both Unix and Windows network programs use sockets for all network communication. This section describes the features of socket network programming in general so that you will be better prepared to understand the concepts behind C# network programming. Following this is the “Socket Programming in Windows” section, which describes the Windows Winsock implementation of sockets. Winsock is the base on which the C# Socket class implementation is built.

Sockets

In socket-based network programming, you do not directly access the network interface device to send and receive packets. Instead, an intermediary file descriptor is created to handle the programming interface to the network. The Unix operating system handles the details of determining which network interface device will be used to send out the data and how.

The special file descriptors used to reference network connections are called sockets. The socket defines the following:

A specific communication domain, such as a network connection or a Unix Interprocess Communication (IPC) pipe

A specific communication type, such as stream or datagram

A specific protocol, such as TCP or UDP

After the socket is created, it must be bound to either a specific network address and port on the system, or to a remote network address and port. Once the socket is bound, it can be used to send and receive data from the network. Figure shows what this process looks like.

Figure: The socket interface

Unix provides the socket() C function to create new sockets:

int socket(int domain, int type, int protocol)

The socket() function returns a socket descriptor which can then be used to send and receive data from the network (more on that later). The three parameters used to create the socket define the communication’s domain, type, and protocol used. Let’s look first at the possible domain values that can be used; see Figure.

Figure: Values for the Socket’s Domain Parameter

Domain Value    Description
PF_UNIX         Unix IPC communication
PF_INET         IPv4 Internet protocol, which is the type covered in this book
PF_INET6        IPv6 Internet protocol
PF_IPX          Novell protocol
PF_NETLINK      Kernel user interface driver
PF_X25          ITU-T X.25/ISO-8208 protocol
PF_AX25         Amateur radio AX.25 protocol
PF_ATMPVC       Access to raw ATM PVCs
PF_APPLETALK    AppleTalk protocol
PF_PACKET       Low-level packet interface

The type value defines the type of network communication used for transmitting the data packets on the domain. Figure shows the type values that can be used.

Figure: Values for Socket Type

Type Value        Description
SOCK_STREAM       Uses connection-oriented communication packets
SOCK_DGRAM        Uses connectionless communication packets
SOCK_SEQPACKET    Uses connection-oriented packets with a fixed maximum length
SOCK_RAW          Uses raw IP packets
SOCK_RDM          Uses a reliable datagram layer that does not guarantee packet ordering

The two most popular type values used for IP communications are SOCK_STREAM, for connection-oriented communication, and SOCK_DGRAM, for connectionless communication.

The specific protocol value used to create the socket depends on which type value you choose. Most socket types (such as SOCK_STREAM and SOCK_DGRAM) can be safely used only with their default protocols (TCP for SOCK_STREAM, and UDP for SOCK_DGRAM). To specify the default protocol, you can specify a zero value in the protocol parameter instead of the normal protocol value.

Using these guidelines, creating a socket in Unix for network communication is fairly straightforward. For instance:

int newsocket;
newsocket = socket(PF_INET, SOCK_STREAM, 0);

This example creates a standard TCP socket for transferring data to a remote host. Creating the socket itself does not define where the socket will connect. That will come later.
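The call above can be wrapped with a basic error check. The following is a minimal sketch, assuming a POSIX system; the helper name try_socket() is hypothetical and only illustrates that socket() returns a negative value on failure and that the descriptor is released with close() like any other file descriptor.

```c
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: creates a socket of the given type in the PF_INET
 * domain using the default protocol (0), then closes it again.
 * Returns 0 on success, -1 if socket() failed (errno holds the reason). */
int try_socket(int type)
{
    int fd = socket(PF_INET, type, 0);
    if (fd < 0)
        return -1;

    /* A socket is a file descriptor, so close() releases it. */
    close(fd);
    return 0;
}
```

Calling try_socket(SOCK_STREAM) exercises the TCP default and try_socket(SOCK_DGRAM) the UDP default.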

Once the socket is created, you can reference it using the returned value; in the example just shown, it is the newsocket variable. Unix also allows the programmer to modify some of the characteristics of the socket to control the communication parameters. The next section describes how to set socket options in Unix.

Socket Options

The Unix socket interface offers a method to change the protocol parameters that are used for communications with the socket: the setsockopt() function, which alters the default behavior of the created socket. Here is the format of the setsockopt() function:

int setsockopt(int s, int level, int optname, const void *optval, socklen_t optlen)

The s parameter references the socket created with the socket() function.

The level parameter references the level of the changes. For IP sockets, there are two levels of options that can be used:

•SOL_SOCKET

•IPPROTO_IP

If you are working with TCP sockets, you can also use the IPPROTO_TCP level.

Each change level contains optname parameters, which describe the socket option to change. The optval and optlen parameters define the value and length of the option change.

Some of the socket options you’ll see most often are SO_BROADCAST, which allows the socket to send broadcast messages, and IP_ADD_MEMBERSHIP, which allows the socket to accept multicast packets.
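As an illustration of the parameters described above, here is a minimal sketch that enables SO_BROADCAST at the SOL_SOCKET level on a UDP socket. The function name enable_broadcast_demo() is hypothetical, and the read-back with getsockopt() (a companion POSIX call not covered in this section) is added only so the effect can be observed.

```c
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical demo: turns on SO_BROADCAST for a new UDP socket and reads
 * the option back. Returns 1 if the option is reported as enabled,
 * 0 if it is not, and -1 on any error. */
int enable_broadcast_demo(void)
{
    int s = socket(PF_INET, SOCK_DGRAM, 0);
    int on = 1;                    /* optval: non-zero enables a boolean option */
    int value = 0;
    socklen_t len = sizeof(value);

    if (s < 0)
        return -1;

    /* level = SOL_SOCKET, optname = SO_BROADCAST, optval/optlen = &on/sizeof(on) */
    if (setsockopt(s, SOL_SOCKET, SO_BROADCAST, &on, sizeof(on)) < 0) {
        close(s);
        return -1;
    }

    /* Read the option back to confirm the change took effect. */
    if (getsockopt(s, SOL_SOCKET, SO_BROADCAST, &value, &len) < 0) {
        close(s);
        return -1;
    }

    close(s);
    return value != 0;
}
```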

Network Addresses

After the socket is created, it must be bound to a network address/port pair. The way that the Unix socket system uses IP addresses and TCP or UDP ports is one of the more confusing parts of socket network programming. A special C structure, sockaddr, is used to designate the address information. The sockaddr structure contains two elements:

sa_family An address family, defined as a short type

sa_data An address for a device, defined as 14 bytes

The address family (sa_family) is designed to allow the sockaddr structure to reference many types of addresses. Because of this, the 14-byte address element (sa_data) is difficult to use directly. Instead, Unix offers an IP-specific address structure, sockaddr_in. Using the sockaddr_in structure requires placing the appropriate IP address and port values in the proper data element. The structure uses the following elements:

sin_family An address family, defined as a short type

sin_port A port number, defined as a short type

sin_addr An address, defined as a long type (4-byte) IP address

sin_zero 8 bytes of padding

To summarize use of these functions, here is some sample code to obtain an IP address/port pair for a host:
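A minimal sketch of such code follows. It assumes a POSIX system; the helper name make_ipv4_address() is hypothetical, and the conversion calls htons() and inet_pton() are standard library functions chosen here for illustration rather than taken from the text.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: fills *addr with an IPv4 address/port pair.
 * Returns 0 on success, -1 if ip is not a valid dotted-decimal string. */
int make_ipv4_address(struct sockaddr_in *addr, const char *ip,
                      unsigned short port)
{
    memset(addr, 0, sizeof(*addr));   /* also clears the 8 padding bytes */
    addr->sin_family = AF_INET;       /* address-family counterpart of PF_INET */
    addr->sin_port = htons(port);     /* port stored in network byte order */

    /* inet_pton() writes the 4-byte address into sin_addr (its s_addr field). */
    if (inet_pton(AF_INET, ip, &addr->sin_addr) != 1)
        return -1;
    return 0;
}
```

For example, make_ipv4_address(&a, "192.168.1.1", 8000) prepares a structure that can later be cast to a sockaddr pointer and passed to bind() or connect(); the address and port here are placeholders.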

Note that the sin_addr element is also a structure that uses elements to define the network address. The s_addr element is used to represent the IP address.

Now that you know how to define IP address/port pairs, you can match the sockets to an IP address and start moving data. You must choose between two function calls depending on whether the socket is connection-oriented or connectionless. The following sections describe the difference between the types of communication and the methods they use.

Using Connection-Oriented Sockets

The world of IP connectivity revolves around two types of communication: connection-oriented and connectionless. In a connection-oriented socket (one that uses the SOCK_STREAM type) the TCP protocol is used to establish a session (connection) between two IP address endpoints. There is a fair amount of overhead involved with establishing the connection, but once it is established, data can be reliably transferred between the devices. To create a connection-oriented socket, separate sequences of functions must be used for server programs and for client programs (see Figure).

Figure: Connection-oriented socket programming functions

The Server Functions

For the server program, the created socket must be bound to a local IP address and port number that will be used for the TCP communication. The Unix bind() function is used to accomplish this:

int bind(int socket, sockaddr *addr, int length);

In bind(), the socket parameter references the return value from the socket() function. The addr parameter references a sockaddr address/port pair to define the local network connection. Because the server usually accepts connections on its own IP address, this is the IP address of the local device, along with the assigned TCP port for the application. If the IP address of the local system is unknown, the INADDR_ANY value can be used to allow the socket to bind to any local address on the system.

After the socket is bound to an address and port, the server program must be ready to accept connections from remote clients. This is a two-step process: first, the program looks for an incoming connection; next, it sends and receives data.

The program must first use the listen() function to "listen" to the network for an incoming connection. Next it must use the accept() function to accept connection attempts from clients. The format of the listen() function is as follows:

int listen(int socket, int backlog);

As you’d expect, the socket parameter refers to the socket descriptor created with the socket() function. The backlog parameter refers to the number of pending connections waiting to be processed that the system can accept. For example, suppose this value is set to 2. If two separate clients attempt to connect to the port, the system will accept one of the connections for processing and hold the other connection until the first one is done. If a third connection attempt arrives, the system refuses it because the backlog value has already been met.

After the listen() function, the accept() function must be called to wait for incoming connections. The format of the accept() function is as follows:

int accept(int socket, sockaddr *from, int *fromlen);

By now, you’re familiar with the socket parameter. The from and fromlen parameters point to a sockaddr address structure and its length. The remote address information from the client is stored in this structure in case it’s needed.

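Once the connection has been accepted, the server can send and receive data from the client using the send() and recv() function calls:

int send(int socket, const void *message, int length, int flags);
int recv(int socket, void *message, int length, int flags);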

Here, the socket parameter again references the open socket for the connection. The message parameter references either the buffer of data to send, or an empty buffer to receive data into. The length parameter indicates the size of the buffer, and the flags parameter indicates if any special flags are necessary (such as for tagging the data as urgent in the TCP packet).

The Client Functions

In a connection-oriented socket, the client must attach its socket to the specific remote host address and port used by the server application. For client programs, the connect() function is used instead of the listen() function:

int connect(int socket, sockaddr *addr, int addrlen);

As in the server functions, the socket parameter references the value returned by the socket() function. The addr parameter points to a sockaddr structure containing the remote IP address and TCP port number.

Once the connect() function succeeds, the client is connected to the server and can use the standard send() and recv() functions to transmit data back and forth with the server.
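The server and client sequences can be seen working together in one small program. The sketch below is an illustration under several assumptions: both ends run in a single process over the loopback address, port 0 is used so the system picks a free port, and getsockname() (a POSIX call not described above) retrieves that port. The function name tcp_loopback_demo() is hypothetical.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical demo: sends msg (non-empty) from a client socket to a server
 * socket over loopback and copies what the server received into out.
 * Returns the number of bytes received, or -1 on any error. */
int tcp_loopback_demo(const char *msg, char *out, size_t outlen)
{
    int server = -1, client = -1, conn = -1, result = -1;
    struct sockaddr_in addr, peer;
    socklen_t addrlen = sizeof(addr), peerlen = sizeof(peer);
    ssize_t n;

    if (outlen == 0)
        return -1;

    /* Server side: socket() -> bind() -> listen() */
    server = socket(PF_INET, SOCK_STREAM, 0);
    if (server < 0) goto done;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);  /* 127.0.0.1 */
    addr.sin_port = htons(0);                       /* 0: system picks a port */
    if (bind(server, (struct sockaddr *)&addr, sizeof(addr)) < 0) goto done;
    if (listen(server, 2) < 0) goto done;
    /* Learn which port was assigned so the client can reach it. */
    if (getsockname(server, (struct sockaddr *)&addr, &addrlen) < 0) goto done;

    /* Client side: socket() -> connect() */
    client = socket(PF_INET, SOCK_STREAM, 0);
    if (client < 0) goto done;
    if (connect(client, (struct sockaddr *)&addr, sizeof(addr)) < 0) goto done;

    /* Server side: accept() returns a new descriptor for this connection. */
    conn = accept(server, (struct sockaddr *)&peer, &peerlen);
    if (conn < 0) goto done;

    /* Client sends, server receives. A stream socket may deliver fewer
     * bytes per recv() call; one call is enough for a short message here. */
    if (send(client, msg, strlen(msg), 0) < 0) goto done;
    n = recv(conn, out, outlen - 1, 0);
    if (n < 0) goto done;
    out[n] = '\0';
    result = (int)n;

done:
    if (conn >= 0) close(conn);
    if (client >= 0) close(client);
    if (server >= 0) close(server);
    return result;
}
```

In a real application the server and client would be separate programs, and the server would normally bind to a well-known port instead of port 0.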

Closing the Connection

When the client and server are finished sending data, two commands should be used to properly terminate the connection:

•shutdown(int socket, int how)

•close(int socket)

It is possible to use the close() function alone (and often you will see programs that use only this function to close the connection). However, the kinder, gentler way is to use shutdown() first, and then close(). The shutdown() function uses the how parameter to allow the programmer to determine how gracefully the connection will close. The options available are as follows:

0 No more packets can be received.

1 No more packets can be sent.

2 No more packets can be sent or received.

By selecting values 0 or 1, you can disable the socket from receiving or sending more data, yet allow the socket to either finish sending pending data, or finish receiving pending data. After the connection has a chance to flush out any pending data, the close() function is called to terminate the connection without any data loss.
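The effect of a half-close can be observed with a short experiment. This is a sketch under assumptions: it uses socketpair() to obtain two already-connected stream sockets (a convenience not discussed in this section), and the function name shutdown_demo() is hypothetical. It shows that after shutdown() with how set to 1, pending data still reaches the peer before the end of the stream is reported.

```c
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical demo: sends 3 bytes, disables further sends with
 * shutdown(), and reports what the peer's second recv() returns.
 * Returns 0 when the peer sees end-of-stream after the pending data,
 * or -1 on any error. */
int shutdown_demo(void)
{
    int sv[2];
    char buf[8];
    ssize_t n;
    int result = -1;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;

    if (send(sv[0], "bye", 3, 0) != 3) goto done;

    /* SHUT_WR is the symbolic name for how = 1: no more sends. */
    if (shutdown(sv[0], SHUT_WR) < 0) goto done;

    /* The pending data is still delivered to the other end... */
    n = recv(sv[1], buf, sizeof(buf), 0);
    if (n != 3) goto done;

    /* ...and only then does recv() return 0 to signal end-of-stream. */
    n = recv(sv[1], buf, sizeof(buf), 0);
    result = (int)n;

done:
    close(sv[0]);   /* close() finally releases both descriptors */
    close(sv[1]);
    return result;
}
```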

Using Connectionless Sockets

Because SOCK_DGRAM-type sockets use the UDP protocol, no connection information is required to be sent between network devices. Because of this, it is often difficult to determine which device is acting as a “server”, and which is acting as a “client”. If a device is initially waiting for data from a remote device, the socket must be bound to a local address/port pair using the bind() function. Once this is done the device can send data out from the socket, or receive incoming data from the socket. Because the client device does not create a connection to a specific server address, the connect() function need not be used for the UDP client program. Figure illustrates the function sequence used for programming a connectionless socket.

Figure: Connectionless socket programming functions

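An established connection does not exist, so the normal send() and recv() functions cannot be used because they do not allow specification of the data’s destination address. Instead, sockets provide the sendto() and recvfrom() functions:

int sendto(int socket, const void *message, int length, int flags, sockaddr *dest, int destlen);
int recvfrom(int socket, void *message, int length, int flags, sockaddr *from, int *fromlen);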

These two functions use the UDP address/port pair to specify the destination address for the dest parameter and to specify the sending host for received packets with the from parameter. After communication is finished between the two devices, you can use the shutdown() and close() functions for the sockets, as described for the TCP method.
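The connectionless sequence can be sketched in the same way as the TCP one. The example below is an illustration under assumptions: both sockets live in one process, the receiver binds to the loopback address with port 0 so the system chooses a free port, and getsockname() is used to discover it. The function name udp_loopback_demo() is hypothetical.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical demo: sends msg as one datagram over loopback and copies
 * the received datagram into out. Returns the number of bytes received,
 * or -1 on any error. */
int udp_loopback_demo(const char *msg, char *out, size_t outlen)
{
    int receiver = -1, sender = -1, result = -1;
    struct sockaddr_in addr, from;
    socklen_t addrlen = sizeof(addr), fromlen = sizeof(from);
    ssize_t n;

    if (outlen == 0)
        return -1;

    /* The side that waits for data must bind() to a local address/port. */
    receiver = socket(PF_INET, SOCK_DGRAM, 0);
    if (receiver < 0) goto done;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(0);
    if (bind(receiver, (struct sockaddr *)&addr, sizeof(addr)) < 0) goto done;
    if (getsockname(receiver, (struct sockaddr *)&addr, &addrlen) < 0) goto done;

    /* The sending side needs neither bind() nor connect(). */
    sender = socket(PF_INET, SOCK_DGRAM, 0);
    if (sender < 0) goto done;

    /* dest names the destination of this one datagram. */
    if (sendto(sender, msg, strlen(msg), 0,
               (struct sockaddr *)&addr, sizeof(addr)) < 0) goto done;

    /* from is filled in with the address/port of the sending socket. */
    n = recvfrom(receiver, out, outlen - 1, 0,
                 (struct sockaddr *)&from, &fromlen);
    if (n < 0) goto done;
    out[n] = '\0';
    result = (int)n;

done:
    if (sender >= 0) close(sender);
    if (receiver >= 0) close(receiver);
    return result;
}
```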

Non-blocking I/O Methods

One drawback to the standard Unix network programming model is that the I/O functions (the functions used for sending and receiving data) block if they cannot be processed immediately. Blocking refers to stopping execution of the program and waiting for a specific statement to complete. For example, when a program gets to a recv() function, it will stop and wait until data is available on the socket to read. In effect, the recv() function blocks further execution of the program until data is present on the socket. If the remote device does not send any data, the program does not continue.

Although this principle may work fine for a single-connection client/server program where you can control the sending and receiving data patterns, it causes a problem for any type of program that must continue to process other events despite errors in sending or receiving data. There are two techniques that can be used to solve this problem: using non-blocking sockets or using socket multiplexing.

Non-blocking Sockets

A simple, rudimentary solution for preventing undesirable blocking is to set a socket to not block when an I/O function is called. The non-blocking feature can be set as a special socket option on the socket using the fcntl() function. The fcntl() function is used to perform miscellaneous low-level operations on file descriptors. Setting blocking on a socket is one of those operations.

Here is the format of the fcntl() function:

int fcntl(int fd, int cmd, int arg)

The fd parameter should be an open file descriptor (or socket, in this case). The cmd parameter specifies what operation will be done on the file descriptor. For example, the command F_SETFL is used to set a file descriptor’s flag options, and F_GETFL is used to read them. The arg parameter is used to specify the flags to set.

So, to set a socket to non-blocking mode, you would use the following:

int newsocket;
newsocket = socket(PF_INET, SOCK_STREAM, 0);
fcntl(newsocket, F_SETFL, O_NONBLOCK);

Here the O_NONBLOCK flag indicates that the socket should be set to non-blocking mode. Whenever a recv() function is performed on the newsocket socket, the program will not wait for data. If no data is immediately present, the recv() function will return a value of -1, and the Unix errno value will be set to EWOULDBLOCK.

Using non-blocking sockets, you can poll any open socket to look for incoming data or to determine if it is ready for outgoing data.
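The behavior described above can be checked with a small experiment. This is a minimal sketch, assuming a POSIX system; the function name nonblocking_recv_demo() is hypothetical. It reads the current flags with F_GETFL before adding O_NONBLOCK, so that any flags already set are preserved, and it accepts EAGAIN as well as EWOULDBLOCK because some systems define them as distinct values.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical demo: puts a UDP socket into non-blocking mode and calls
 * recv() while no data is waiting. Returns 1 if recv() failed at once
 * with EWOULDBLOCK/EAGAIN, 0 if it behaved differently, -1 on setup error. */
int nonblocking_recv_demo(void)
{
    char buf[16];
    ssize_t n;
    int flags, would_block;
    int newsocket = socket(PF_INET, SOCK_DGRAM, 0);

    if (newsocket < 0)
        return -1;

    /* Read the existing flags, then add O_NONBLOCK to them. */
    flags = fcntl(newsocket, F_GETFL, 0);
    if (flags < 0 || fcntl(newsocket, F_SETFL, flags | O_NONBLOCK) < 0) {
        close(newsocket);
        return -1;
    }

    /* Nothing has been sent to this socket, so a blocking recv() would
     * wait indefinitely; in non-blocking mode it returns -1 immediately. */
    n = recv(newsocket, buf, sizeof(buf), 0);
    would_block = (n == -1 && (errno == EWOULDBLOCK || errno == EAGAIN));

    close(newsocket);
    return would_block;
}
```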

Multiplexed Socket

Another solution to the socket blocking problem uses the select() function to multiplex all the active sockets. The select() function lets you watch multiple sockets for events (such as data to be read from or written to the socket), and process only the sockets that need to be processed. Sockets without any pending events are skipped so they won’t block the program execution. The format of the select() function is as follows:

int select(int numfd, fd_set *readfds, fd_set *writefds, fd_set *exceptfds, struct timeval *timeout)

The numfd parameter specifies the highest value of the file descriptors (sockets) that the select function is monitoring, plus one. The select() function can thus know how high to iterate when testing the socket sets.

The readfds, writefds, and exceptfds parameters specify the following lists (or sets) of sockets to be monitored by select() for the specific data function:

readfds Sockets that are checked if data is available to read

writefds Sockets that are checked if ready to write

exceptfds Sockets that are checked for exceptions

The timeout parameter defines a timeval structure to set how long the select() function should wait for any of the sockets to have an event.

The tricky part of socket multiplexing is assigning sockets to the readfds, writefds, and exceptfds parameters. Indeed, there is another whole set of functions that do that, as listed in Figure.

Figure: select() Helper Functions

Function                 Description
FD_ZERO(set)             Zeros out a multiplex set set
FD_SET(socket, set)      Adds socket to the multiplex set set
FD_CLR(socket, set)      Removes socket from the multiplex set set
FD_ISSET(socket, set)    Tests to see if socket is contained in the multiplex set set

The helper functions must be used to set the individual socket sets for each select() call. A select() call cancels out any previous select() calls. Thus you must add or remove any new sockets to or from the existing set before the next call to the select() function.
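A minimal sketch of select() watching two sockets follows. It rests on assumptions: the two connections are simulated with socketpair() so the example is self-contained, only the second one is given data, and the function name select_demo() is hypothetical. The set is named socketset.

```c
#include <stddef.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical demo: monitors two sockets for readable data after data
 * has been sent to only the second one. Returns a bit mask (bit 0: first
 * socket readable, bit 1: second socket readable), or -1 on error. */
int select_demo(void)
{
    int a[2], b[2];
    int numfd, mask = -1;
    fd_set socketset;
    struct timeval timeout;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, a) < 0)
        return -1;
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, b) < 0) {
        close(a[0]);
        close(a[1]);
        return -1;
    }

    /* Only the second connection receives data. */
    if (send(b[1], "x", 1, 0) != 1) goto done;

    /* Build the read set from scratch before the select() call. */
    FD_ZERO(&socketset);
    FD_SET(a[0], &socketset);
    FD_SET(b[0], &socketset);

    timeout.tv_sec = 1;            /* wait at most one second */
    timeout.tv_usec = 0;

    /* numfd is the highest descriptor being monitored, plus one. */
    numfd = (a[0] > b[0] ? a[0] : b[0]) + 1;
    if (select(numfd, &socketset, NULL, NULL, &timeout) < 0) goto done;

    /* After select() returns, the set holds only the ready sockets. */
    mask = 0;
    if (FD_ISSET(a[0], &socketset)) mask |= 1;
    if (FD_ISSET(b[0], &socketset)) mask |= 2;

done:
    close(a[0]); close(a[1]);
    close(b[0]); close(b[1]);
    return mask;
}
```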

This example shows how the select() function can be used to monitor two separate socket connections. Once select() is called with the appropriate socket sets, you can use the FD_ISSET helper function at any time in the program to test if data is available for an individual socket. After select() finishes (either by receiving an event or from the timeout) the socketset value contains only those sockets that have had an event trigger. By using FD_ISSET, you can determine whether either socket is receiving data. If either socket does not have any data, it is not part of the set and does not block the rest of the program.

Socket Programming in Windows

When you are familiar with network programming in the Unix environment, understanding Windows network programming is easy. This section describes the relationship between the Windows network programming interface and the Unix network programming model, and how Windows socket programming has formed the foundation of the .NET Framework network classes.

Windows Socket Functions

It makes sense that the Windows network programming model is derived from the comparable Unix model. Many features of the Windows operating systems have their roots in Unix systems. Much of Windows network programming was modeled after the Unix Berkeley socket method. It was called, not surprisingly, Windows Sockets, or Winsock for short. The Winsock interface was designed to allow network programmers from the Unix environment to easily port existing network programs, or to create new network programs in the Windows environment without a large learning curve.

The Winsock APIs were implemented as a set of header and library files for developers and DLL files to be used by applications. There are two basic Winsock library versions: the 1.1 version was originally released with Windows 95 workstations and provided basic socket functionality. Later, version 2 was released as an add-on for Windows 95 machines. It added significantly more socket functions and protocols that could be deployed by network programmers. By the time Windows 98 was released, the Winsock library had matured to version 2.2, which is still a part of the current Windows operating system releases.

Note

The lone exception to this arrangement is the Windows CE platform. At this writing, Windows CE still only supports the Winsock 1.1 libraries.

The core of the Winsock environment is, of course, the socket. Just as in Unix, all Windows network programs create a socket to establish a link with the underlying network interface on the Windows system. All of the standard socket function calls employed in the Unix world were ported to the Windows system. However, there are a few differences between Unix sockets and Winsock. The following sections describe these differences.

WSAStartup()

To begin a Winsock program, you make a call to the WSAStartup() function. This function informs the operating system which Winsock version the program needs to use. The OS attempts to load the appropriate Winsock library from which the socket functions will operate.

The format of the WSAStartup() function is as follows:

int WSAStartup(WORD wVersion, LPWSADATA lpWSAData)

The first parameter defines the required version for the program. If the program requests version 2.2 of Winsock and only version 1.1 is available, the WSAStartup() function will return an error. However, if the application requests version 1.1 and version 2.2 is loaded, the function will succeed.

When the function succeeds, the lpWSAData parameter points to a structure that will contain information regarding the Winsock library after it’s loaded, such as the actual Winsock version used on the system. This information can then be used to determine the network capabilities of the system the program is running on.

WSACleanup()

A Winsock program must release the Winsock library when it is finished. The WSACleanup() function is used at the end of each Winsock program to indicate that no other Winsock functions will be used, and the Winsock library can be released. The WSACleanup() function does not use any parameters; it just signals the end of the Winsock functions in the program. If any Winsock functions are used after the WSACleanup() function, an error condition will be raised.

Winsock Functions

In between the WSAStartup() and WSACleanup() functions, the Winsock program can behave just like the Unix socket program, using socket(), bind(), connect(), listen(), and accept() calls. In fact, the Winsock interface uses the same structures for addresses (sockaddr_in) and the same values to define protocol families and socket types (such as the PF_INET protocol family and the SOCK_STREAM type) as Unix does. The goal of this was to make porting Unix network programs to the Windows environment as easy as possible.

In addition to the standard Unix network functions, the Winsock version 2 interface includes its own set of network functions, all preceded by WSA. These functions extend the functionality of the standard Unix network functions. For example, the WSARecv() function can be used in place of the standard Unix recv() function call. WSARecv() adds two additional parameters to the original function call, allowing for the Windows-specific functionality of creating overlapped I/O and partial datagram notifications. Figure shows how the Winsock WSA functions can be used to replace standard Unix functions.

Figure: The Winsock WSA programming functions for servers and clients

Winsock Non-blocking Socket Functions

Another similarity to the Unix network environment is that Winsock supplies ways to prevent network I/O functions from blocking the program execution. Winsock supports the standard Unix methods of setting a socket to non-blocking mode using the ioctlsocket() function (similar to the Unix fcntl() function) and the select() function to multiplex multiple sockets.

The ioctlsocket() format is as follows:

int ioctlsocket(SOCKET s, long cmd, u_long FAR *argp)

The socket to be modified is s, the cmd parameter specifies the operation to make on the socket, and the argp parameter specifies the command parameter.

WSAAsyncSelect()

One of the features that differentiates Windows from standard Unix programs is the concept of events. Unlike common structured programs that have a set way of executing, Windows programs are usually event driven. Methods are executed in the program in response to events occurring while the program is running—buttons are clicked, menu items are selected, and so on. The standard technique of waiting around for data to occur on network sockets does not fit well in the Windows event model. Event-driven access to network sockets is the answer.

The WSAAsyncSelect() function expands on the standard Unix select() function by allowing Windows to do the work of querying the sockets. A WSAAsyncSelect() call specifies the socket to monitor, along with a Windows message value that will be passed to the window when one of the socket events occurs (such as data being available to be read, or the socket being ready to accept written data). The format of the WSAAsyncSelect() function is as follows:

int WSAAsyncSelect(SOCKET s, HWND hWnd, unsigned int wMsg, long lEvent)

The socket to monitor is defined by the s parameter, and the parent window to receive the event message is defined by hWnd. The actual event to send is defined by the wMsg parameter. The last parameter, lEvent, defines the events to monitor for the socket. You can monitor more than one event for a socket by performing a bitwise OR of the events shown in Figure.

Figure: WSAAsyncSelect() Event Types

Event                          Description
FD_ACCEPT                      A new connection is established with the socket.
FD_ADDRESS_LIST_CHANGE         The local address list changed for the socket’s protocol family.
FD_CLOSE                       An existing connection has closed.
FD_CONNECT                     The socket has completed a connection with a remote host.
FD_GROUP_QOS                   The socket group’s Quality of Service value has changed.
FD_OOB                         The socket has received out-of-band data.
FD_QOS                         The socket’s Quality of Service value has changed.
FD_READ                        The socket has data that is ready to be read.
FD_ROUTING_INTERFACE_CHANGE    The socket’s routing interface has changed for a specific destination.
FD_WRITE                       The socket is ready for writing data.

An example of the WSAAsyncSelect() function would look like this:

WSAAsyncSelect(sock, hwnd, WM_SOCKET, FD_READ | FD_CLOSE);

In this example, if the socket has data available to be read, or if it detects that the remote host closed the connection, the WM_SOCKET message would be sent to the hwnd window, with the socket handle in the wParam of the window message and the event code in the low word of the lParam. It would then be the responsibility of the hwnd window to detect and handle the WM_SOCKET message and perform the appropriate functions depending on which event was triggered. This is almost always handled in the window procedure (WindowProc) for the window using case statements.

WSAEventSelect()

Instead of handling socket notifications using Windows messages, the WSAEventSelect() function uses an event object handle. The event object is a synchronization object that is set to the signaled state when one of the registered socket events occurs; the program waits on the handle and then handles the event. This technique allows you to create separate routines to handle the various socket events.

For this technique to work, a unique event must first be defined using the WSACreateEvent() function. After the event is created, it must be matched to a socket using the WSAEventSelect() function:

int WSAEventSelect(SOCKET s, WSAEVENT hEvent, long lNetworkEvents)

As usual, the s parameter defines the socket to monitor, and hEvent defines the created event object that will be signaled when the socket event occurs. Similar to the WSAAsyncSelect() function, the lNetworkEvents parameter is a bitwise combination of all the socket events to monitor. The same event definitions are used for the WSAEventSelect() function as for the WSAAsyncSelect() function. When a socket event occurs, the event object created by the WSACreateEvent() function is signaled, and the code waiting on it can respond.

Overlapped I/O

Possibly one of the greatest features of the Winsock interface is the concept of overlapped I/O. This technique allows a program to post one or more asynchronous I/O requests at a time using a special data structure. The data structure (WSAOVERLAPPED) defines multiple sockets and event objects that are matched together. The events are considered to be overlapping, in that multiple events can be called simultaneously as the sockets receive events.

To use the overlapped technique, a socket must be created with the WSASocket() function call using the overlapped enabled flag (the socket() function does not include this flag). Likewise, all data communication must be done using the WSARecv() and WSASend() functions. These Winsock functions use an overlapped I/O flag to indicate that the data will use the WSAOVERLAPPED data structure.

Although using overlapped I/O can greatly improve performance of the network program, it doesn’t solve all of the possible difficulties. One shortcoming of the overlapped I/O technique is that it can define only 64 events. For large-scale network applications that require hundreds of connections, this technique will not work.

Completion Ports

Another downside to the overlapped I/O technique is that all of the events are processed within a single thread in the program. To allow events to be split among threads, Windows introduced the completion port. A completion port allows the programmer to specify a number of threads for use within a program, and assign events to the individual threads. By combining the overlapped I/O technique with the completion port method, a programmer can handle overlapped socket events using separate program threads. This technique produces really interesting results on systems that contain more than one processor. By creating a separate thread for each processor, multiple sockets can be monitored simultaneously on each processor.