-->[OO]::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
-->]OO[:[ Network Programming ]::[OO--[ by z0mba ]--[ zomba@epidemik.org ]:::
-->[OO]:::::::::::::::::::::::::::::::[ New site up s00n ]:::
Woah, f41th 8 is looking pretty leet so far, so I thought i'd add to it with
another k-rad uber el8 article, as promised. Now I know I said that I was
gonna write an article on setting up an FTP server, but I thought this was
both more interesting and more f41th stylee, but I may write the FTP article
for a later issue of f41th, who knows. All I ask of you in return for me
giving you no-day network programming tekniq is your un-told gratitude and of
course all of your c0dez, just email them to the above address, heh.
Introduction
------------
One of the things that a lot of people ask when confronted with lots of
computers linked together is "How do computers actually communicate over a
network?". This article should explain this to you and also give you some
practical examples.
Networking and Linux are a perfect combination because Linux is an OS that
was born from the internet, mainly because its makers had to (and still do)
communicate over the internet using e-mail, usenet and the WWW. Also, Linux
is based on UNIX which is where most of the fundamentals of networking
originate from anyway. Linux is an excellent platform for network
programming because it has mature and fully functional networking features.
Also, because Linux provides full support for the sockets interface, most
programs developed on other versions of *nix will build and run on Linux with
little or no alteration. Documentation about UNIX networking is fully
applicable to Linux as well.
In this article I have used Perl to introduce network programming, which is
quite convenient considering my last article was about Perl, so if you
don't know the basic concepts of Perl, go read that first. The reason for
using Perl is that you can focus on network programming concepts rather
than application development issues and programming environments. Only basic
knowledge of Perl is required to follow my examples (wh1ch 1s why j3w sh0uld
g0 r34d my 0th3r 4rt1cl3), and they are certainly clear enough for C/C++
programmers to follow (at least I hope they are).
This article is intended to serve as an introduction to network programming,
so it doesn't cover deeper topics such as protocol layering and routing, but
if you're that interested you should go buy a book or look for more tutorials
on the web. Right then, let's get on with the file...
Networking Concepts
===================
This section covers the fundamentals of networking, so pay
attention (3sp3c14lly j3w l4m3rs). You will learn what the necessary
components of network communication are and how a program uses them to build
a connection by following a simple program that retrieves networking
information and uses it to connect to another program. By the end you should
have a pretty good understanding of network addresses, sockets and the diffs
between TCP and UDP.
Below is a simple Perl function (makeconn) that creates a connection to a
server using TCP. This function can be found in network.pl, which is
included in the f41th 8 .zip file.
1: sub makeconn {
2:
3: my ($host, $portname, $server, $port, $proto, $servaddr);
4:
5: $host = $_[0];
6: $portname = $_[1];
7:
8: #
9: # Server hostname, port and protocol
10: #
11: $server = gethostbyname($host) or
12: die "gethostbyname: cannot locate host: $!";
13: $port = getservbyname($portname, 'tcp') or
14: die "getservbyname: cannot get port: $!";
15: $proto = getprotobyname('tcp') or
16: die "getprotobyname: cannot get proto: $!";
17:
18: #
19: # Build an inet address
20: #
21: $servaddr = sockaddr_in($port, $server);
22:
23:
24: #
25: # Create the socket and connect it
26: #
27: socket(CONNFD, PF_INET, SOCK_STREAM, $proto);
28: connect(CONNFD, $servaddr) or die "connect: $!";
29:
30: return CONNFD;
31: }
This procedure can be summarised in three basic steps:
o--> Build an address.
o--> Create a socket.
o--> Establish a connection.
The network address is built by retrieving address information in lines 11
and 13, and then assembling it in line 21. In line 27, you create the socket
using protocol information retrieved in line 15. In line 28 you finally
establish the connection.
Building Network Addresses
--------------------------
The steps involved in building a network address and connecting to it provide
a framework for observing how network communication werks. I'll spend some
time covering each part of this process in order to better prepare you for
the tutorials later on.
If you've ever configured a PC or workstation for Internet connectivity, you
have probably seen an IP address (y3s, 3v3n j00 l4m3rs) similar to
192.9.200.10 or 10.7.8.14. This is called 'dotted-decimal format' and, like
many things in computing, is a representation of network addresses that is
intended to make things easier for humans to read. The notation that
computers, routers and other internet devices actually use to communicate is
a 32-bit number, often called a 'canonical address'. When this number is
evaluated, it is broken down into four smaller 8-bit (one byte) values, much
the way the dotted-decimal format consists of four numbers separated by dots.
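To make the relationship between the two notations concrete, here is a
minimal sketch that converts a dotted-decimal string to its 32-bit value and
back. The helper names dotted_to_int() and int_to_dotted() are made up for
this illustration and are not part of network.pl; inet_aton(), inet_ntoa(),
pack() and unpack() are standard Perl.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Socket;

# Hypothetical helper: dotted-decimal string -> 32-bit unsigned integer.
sub dotted_to_int {
    my ($dotted) = @_;
    my $packed = inet_aton($dotted);   # 4 raw bytes in network byte order
    return unpack('N', $packed);       # 'N' = 32-bit unsigned, big-endian
}

# Hypothetical helper: 32-bit unsigned integer -> dotted-decimal string.
sub int_to_dotted {
    my ($num) = @_;
    return inet_ntoa(pack('N', $num));
}

my $n = dotted_to_int('10.7.8.14');
printf "10.7.8.14 as one number: %u (hex 0x%08X)\n", $n, $n;
print  "and back again: ", int_to_dotted($n), "\n";
```

Each pair of hex digits in the printed value is one of the four dotted
numbers (0A, 07, 08, 0E), which is all the dotted format really is.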
An 'internetwork', or 'internet' for short, consists of two or more networks
that are connected. Of course this refers to any two networks connected to
each other, not *the* Internet (I kn0w th4t d1s c0nfus3s s0m3 0f j00 l4m3rs).
The internet protocol (IP) was designed with this sort of topology in mind
(ie: millions of computers). In order for an internet address to be useful,
it has to be capable of identifying not only a specific node (c0mput3r), but
also which network it resides on. Both bits of information are provided in
the 32-bit address. Which portion of the address is related to each component
is decided by the 'netmask' that is applied to the address. Depending on an
organisation's needs, a network architect can decide to have more networks or
more addresses. For details on subnetting networks do some searches for
TCP/IP Network Management or something along those lines. For the sake of
network programming, it's enuff to know the information stored in an internet
address and that individual workstation netmasks have to be correct in order
for a message to be successfully delivered.
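As a rough illustration of how a netmask divides an address, the sketch
below ANDs the address with the mask to get the network part and with the
inverted mask to get the node part. The function name split_address() and
the 255.255.255.0 mask are assumptions chosen for the example, not something
taken from network.pl.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Socket;

# Hypothetical helper: returns (network in dotted form, host part as a number).
sub split_address {
    my ($addr, $netmask) = @_;
    my $ip   = unpack('N', inet_aton($addr));
    my $mask = unpack('N', inet_aton($netmask));
    my $network = $ip & $mask;                 # bits covered by the mask
    my $host    = $ip & ~$mask & 0xFFFFFFFF;   # whatever is left over
    return (inet_ntoa(pack('N', $network)), $host);
}

my ($net, $host) = split_address('192.9.200.10', '255.255.255.0');
print "network: $net  node number on that network: $host\n";
```

With a wider mask such as 255.0.0.0 the same address yields fewer network
bits and more node bits, which is the trade-off the network architect makes.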
Dotted-decimal format is easier to read than 32-bit values, but even so, most
ppl would rather use names than numbers because wikkid or www.hackernews.com
is a lot easier to remember than 12.145.27.2 or 192.148.252.39. For this
reason, the notion of hostnames, domain names, and the domain name system
were invented. You can get access to a database of name-to-number mappings
through a set of 'network library functions', which provide host (n0de)
information in response to names or numbers. For example, in line 11 of the
makeconn listing above, you retrieve the address associated with a name with
one of these functions - gethostbyname().
Depending on the host configuration, gethostbyname() can retrieve the address
associated with a name from a file, /etc/hosts, from the Domain Name System
(DNS), or from the Network Information Service (NIS or Yellow Pages). DNS and
NIS are network-wide services that administrators use to simplify network
configuration because adding and updating network address numbers from a
central location (and maybe a backup location) is obviously a lot easier
than updating files on every workstation individually. These systems are
also useful for internetworks because the address of a remote host can be
determined when it is needed by making a DNS request, rather than needing
to exchange configuration files in advance.
One other advantage of using names is that the address that a name is
associated with can be changed without affecting applications. The
application need only know the name; the address can be discovered at
runtime.
The following Perl script illustrates the use of the gethostbyname()
function and the difference between dotted-decimal formatted addresses and
canonical addresses (type it up and save it as 'resolv'):
1: #!/usr/bin/perl
2: use Socket;
3: $addr = gethostbyname($ARGV[0]) or die "cannot resolve $ARGV[0]\n";
4: $dotfmt = inet_ntoa($addr);
5: print "$ARGV[0]: numeric: $addr dotted: $dotfmt\n";
Line 2 includes the Socket module included with Perl 5 distributions. This
module is required for all the sample code included in this article.
When you run this program, passing it a hostname that you want to see info
on, you will see something like this:
zomba@noday$ ./resolv www.attrition.org
www.attrition.org: numeric: [unprintable characters] dotted: 128.11.253.197
Line 3 passes the name specified on the command line to gethostbyname(),
which places the canonical address in $addr. This address is then passed to
inet_ntoa(), which returns the same address in dotted-decimal format
(the 'ntoa' in inet_ntoa stands for network to ASCII). You then print
both numbers out in line 5. As you can see, the 32-bit address looks pretty
damn weird when printed.
Network Services
----------------
Being able to locate a computer is a fundamental part of network
communication, but it is not the only necessary component in an address. Why
do you want to contact a specific host? Do you want to retrieve an HTML doc
from it? Do you want to log in and check mail? Most workstations, especially
those running Linux or any other version of UNIX, provide more than one
service to other nodes on a network.
Back in line 13 of the makeconn listing, a function called getservbyname()
was called. This function provides the other value used to form the complete
network address. This value, referred to as the 'service port number', is the
portion of the address that specifies the service or program that you want to
communicate with.
Like host addresses, service ports can be referred to by name instead of
number. getservbyname() retrieves the number associated with the name
specified from the file /etc/services (if NIS is available, the number can
also be retrieved from a network database). Port numbers that are listed in
this database are called 'well-known ports' because, in theory, any host can
connect to one of these services on any other because the numbers at least
ought to remain consistent. The port numbers that are used by applications
don't have to be listed in or retrieved from this database; it's just
considered a good idea to list them in /etc/services and share them in
order to prevent conflicts.
After you have retrieved the two components necessary to build a fully
qualified address, you provide them to the sockaddr_in function, which builds
a SOCKADDR_IN structure for us. SOCKADDR_IN is the programmatic
representation of a network address needed for most socket system calls.
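If you want to see what ends up inside that structure, the sketch below
packs a port and an address together and then pulls them apart again. It
uses pack_sockaddr_in() and unpack_sockaddr_in(), the explicit forms behind
sockaddr_in() in the Socket module; build_addr() and describe_addr() are
names invented for this example, and port 25 is hard-coded so the example
doesn't depend on your /etc/services.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Socket;

# Hypothetical helper: dotted address + port number -> packed SOCKADDR_IN.
sub build_addr {
    my ($dotted, $port) = @_;
    return pack_sockaddr_in($port, inet_aton($dotted));
}

# Hypothetical helper: packed SOCKADDR_IN -> "a.b.c.d:port" string.
sub describe_addr {
    my ($sockaddr) = @_;
    my ($port, $ip) = unpack_sockaddr_in($sockaddr);
    return inet_ntoa($ip) . ":$port";
}

my $sockaddr = build_addr('10.7.8.14', 25);
print "structure is ", length($sockaddr), " bytes and holds ",
      describe_addr($sockaddr), "\n";

# Looking the port up by name only works if the service is listed locally.
my $smtp = getservbyname('smtp', 'tcp');
print "smtp is port ", (defined $smtp ? $smtp : 'not listed on this system'), "\n";
```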
Sockets
-------
Before you can use your addressing information you need a socket. The
socket() function in line 27 of the makeconn listing illustrates how to
create one. Some simple explanations of what sockets are and the types
available to a program first will help explain the function.
'Sockets' are an Application Programming Interface (API) used for network
communication. This API was first available for BSD UNIX for the VAX
architecture in the early eighties, but is now used on almost all UNIX
versions, and Microshaft, being late as always, has also recently added them
to Windows. System V UNIX has a different interface called the Transport
Layer Interface (TLI), but even most System V UNIX versions, such as Solaris
2.x, provide socket interfaces. Linux provides a full implementation of the
socket interface.
Socket applications treat network connections, or to be more exact, network
'endpoints', the same way most UNIX interfaces are handled - as file handles.
The reason for the endpoint qualification is simple: Not all network sessions
are connected, and referring to all network streams as connections can be
incorrect and misleading. As a matter of fact, after a network endpoint is
created and bound and/or connected, it can be written to, read from, and
destroyed using the same functions as files. Because of this interface,
socket programs tend to be portable between different versions of UNIX and
frequently many other OS's.
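The file handle idea is easy to try without any network at all. The sketch
below, which is only an illustration and not part of network.pl, uses
socketpair() to get two already-connected sockets and then treats them
exactly like files with syswrite(), sysread() and close(). The function name
echo_through_socket() is made up for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Socket;

# Hypothetical helper: push a message in one end of a socket pair and
# read it back out of the other, using ordinary file-style calls.
sub echo_through_socket {
    my ($message) = @_;
    my ($left, $right);
    socketpair($left, $right, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
        or die "socketpair: $!";
    syswrite($left, $message) or die "syswrite: $!";
    close $left;                                  # done writing
    my $got = '';
    sysread($right, $got, 1024);                  # same call you'd use on a file
    close $right;
    return $got;
}

print echo_through_socket("written to a socket like a file\n");
```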
Protocols and Socket Types
--------------------------
The socket API is designed to support multiple protocols, called 'domains' or
'families'. Most UNIX versions support at least two domains: UNIX and
Internet. (Two of the other domains are Xerox Network system and ISO protocol
suite). UNIX domain sockets use the local workstation filesystem to provide
communication between programs running on the same workstation only. Internet
domain sockets use the Internet Protocol (IP) suite to communicate over the
network. As you might guess, this file is concerned with Internet domain
sockets.
In the following call to socket(), you specify the filehandle that you
want to have the socket descriptor stored in and three values that describe
the type of socket you want to have created - the protocol family, the socket
type, and the protocol. I've already covered which protocol family you will
use, which is PF_INET, for the Internet:
socket(CONNFD, PF_INET, SOCK_STREAM, $proto);
The possible socket types are SOCK_STREAM, SOCK_DGRAM, SOCK_RAW, SOCK_RDM and
SOCK_SEQPACKET. The last three are used for low level, advanced operations
and are beyond the scope of this article.
SOCK_STREAM sockets are connected at both ends, they are reliable, and the
messages between them are sequenced. When I say reliable I don't mean that if
it says it'll pick the kids up from skewl then it will kind of reliable, I
mean that the network guarantees delivery: An application can write a packet
with the understanding that it will arrive at the other end, unless the
connection is suddenly broken by some unforeseen event, like some twat
pulling the power cord on the host machine. In the event that the connection
is broken, the application will receive timely notification. Sequencing
means that all messages are delivered to the other application in the exact
order that they are sent.
SOCK_DGRAM sockets support connectionless and unreliable datagrams. A
'datagram' is typically a small, self-contained message. Applications have no
guarantee that datagrams will be delivered, or the order they will arrive in.
On the surface, it seems that no application would ever want to use
SOCK_DGRAM, but as you will see, many applications do for good reasons.
The type of socket is very closely related to the protocol that is used. In
the case of the Internet suite, SOCK_STREAM sockets always implement TCP,
and SOCK_DGRAM sockets implement UDP.
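You can watch the system enforce this pairing. The sketch below simply asks
socket() for each combination and reports whether it was allowed;
pairing_ok() is a name made up for the example, and the fallback numbers 6
and 17 are the standard IP protocol numbers for TCP and UDP, used only if
/etc/protocols can't be read. On the Linux boxes I'd expect this to run on,
the mismatched pairing is refused, but treat that as an assumption about
your system rather than a promise.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Socket;

# Protocol numbers, with the standard values as a fallback.
my $tcp = getprotobyname('tcp') || 6;
my $udp = getprotobyname('udp') || 17;

# Hypothetical helper: 1 if the system will create an Internet socket
# with this type/protocol pairing, 0 if it refuses.
sub pairing_ok {
    my ($type, $proto) = @_;
    my $ok = socket(my $sock, PF_INET, $type, $proto) ? 1 : 0;
    close $sock if $ok;
    return $ok;
}

printf "SOCK_STREAM + tcp: %s\n", pairing_ok(SOCK_STREAM, $tcp) ? 'ok' : 'refused';
printf "SOCK_DGRAM  + udp: %s\n", pairing_ok(SOCK_DGRAM,  $udp) ? 'ok' : 'refused';
printf "SOCK_STREAM + udp: %s\n", pairing_ok(SOCK_STREAM, $udp) ? 'ok' : 'refused';
```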
The characteristics of the TCP protocol match the characteristics of
SOCK_STREAM. TCP packets are guaranteed to be delivered except in a network
disaster, such as the workstation on the other end of the connection dropping
out, or the network itself suffering a serious, unrecoverable outage. Packets
are always delivered in the same order that they are written. Obviously,
these properties make the job of a network developer very easy because a
message can be sent and pretty much forgotten about, but as always, there is
a cost. TCP messages are much more expensive (demanding) than UDP messages in
terms of both network and computing resources. The workstations at both ends
of a session have to confirm that they have received the correct information,
which results in more work for the operating system and more network traffic.
The systems also have to track the order in which the messages were sent, and
often have to store messages until others arrive, depending on the state of
the network "terrain" between the two workstations. (New messages can arrive
while others are being retransmitted because of an error). In addition, the
fact that TCP connections are just that, connections, means that they have a
price. Every conversation has an endpoint associated with it, so a server
that has more than one client has to switch between multiple sockets, which
can be very difficult (have a look at the section on I/O Multiplexing with
TCP covered later on in this article).
UDP, like SOCK_DGRAM, is connectionless and unreliable. Applications have to
provide whatever reliability mechanisms are necessary for the job that they
are performing. For some applications, that is an advantage because all of
the mechanisms provided by TCP aren't always needed. For example, DNS, which
uses UDP, simply sends a message and waits for a response for a predetermined
interval. Because DNS is a one-to-one message-to-response protocol,
sequencing between client and server is not necessary. UDP is connectionless,
so a server can use one socket to communicate with many clients. All clients
write to the same address for the server, and the server responds
individually by writing to specific client addresses.
UDP messages can also be broadcast to entire networks, which is a blessing to
the application that needs to communicate one message to many users, but a
curse for the workstations that don't need the message but have to read it in
order to figure out that it's not actually meant for them after all. The
ability to broadcast messages over UDP and the fact that the connectionless
aspect of UDP makes it difficult to verify the source of messages are two of
the reasons why many networking people consider the protocol to be a security
risk and dislike even enabling it within their organisations.
Making a Connection
-------------------
Logically, if you are creating a connection like that of the makeconn()
function listed earlier, you need to create a SOCK_STREAM socket with the TCP
protocol information retrieved with getprotobyname() in line 15. Take a look
at lines 27 and 28 from that listing repeated below:
27: socket(CONNFD, PF_INET, SOCK_STREAM, $proto);
28: connect(CONNFD, $servaddr) or die "connect: $!";
After creating the socket in line 27, you then pass it to connect() with the
address structure created by sockaddr_in(). The connect() function actually
contacts the address specified in the structure and establishes the virtual
circuit supported by TCP.
A TCP Client Example
--------------------
The following listing puts makeconn() to work in a sample program. This
should be typed up in vi and saved as 'client1'.
#!/usr/bin/perl
use Socket;
require "./network.pl";
$NETFD = makeconn($ARGV[0], $ARGV[1]);
#
# Get the message
#
sysread $NETFD, $message, 32768 or die "error getting message : $!";
print "$message \n";
close $NETFD;
Run this program with two command-line arguments, the name of a Linux host
that is running sendmail and the email port number, smtp:
zomba@poo$ ./client1 brown smtp
220 brown.poo.com ESMTP Sendmail 8.8.5/8.8.5; Sat, 20 Jun 1999 18:24:08 -0400
This program uses makeconn() to connect to the sendmail program running on
the named host and reads the greeting that it sends to a new client when it
first connects, using the sysread() function.
sysread() is one of the functions used for extracting network messages from
sockets. It is a wrapper for the UNIX read() system call. You cannot use the
Perl read() function because it is designed for standard I/O, which uses
buffering and other high-level features that interfere with network
communications. In a real-world application, you would probably read messages
with sysread() into a buffer of your own and keep careful track of
what you had just read because it is possible to be interrupted in a read
call by a signal (you would also install signal handlers). As this example
demonstrates, establishing a client connection and retrieving some data is
pretty simple.
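As a rough idea of what that more careful reading might look like, here is a
sketch of a read loop that appends to its own buffer and retries when a
signal interrupts the call. read_all() is a name invented for this example
and is not in network.pl, and a local socket pair stands in for a real
server so the example runs on its own.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Socket;
use Errno qw(EINTR);

# Hypothetical helper: read from $fh until end-of-file, retrying any
# read that was interrupted by a signal.
sub read_all {
    my ($fh) = @_;
    my $buffer = '';
    while (1) {
        # The 4th argument makes sysread append at the end of $buffer.
        my $n = sysread($fh, $buffer, 4096, length $buffer);
        if (!defined $n) {
            next if $! == EINTR;     # interrupted by a signal: just retry
            die "sysread: $!";
        }
        last if $n == 0;             # the other end closed the connection
    }
    return $buffer;
}

# Demo: a socket pair plays the part of the client/server connection.
my ($reader, $writer);
socketpair($reader, $writer, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "socketpair: $!";
syswrite($writer, "220 hello from a pretend server\n") or die "syswrite: $!";
close $writer;
print read_all($reader);
close $reader;
```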
A TCP Server Example
--------------------
Now it's time to write your own server for client1 to connect to. Firstly you
have to place a socket in the listen state. You'll use another function that
is defined in network.pl called makelisten(), which is shown in the listing
below:
1: sub makelisten {
2:
3: my ($portname, $port, $proto, $servaddr);
4: $portname = $_[0];
5:
6: #
7: # port and protocol
8: #
9: $port = getservbyname($portname, 'tcp') or
10: die "getservbyname: cannot get port : $!";
11: $proto = getprotobyname('tcp') or
12: die "getprotobyname: cannot get proto : $!";
13:
14: #
15: # Bind an inet address
16: #
17: socket(LISTFD, PF_INET, SOCK_STREAM, $proto);
18: bind (LISTFD, sockaddr_in($port, INADDR_ANY)) or die "bind: $!";
19: listen (LISTFD, SOMAXCONN) or die "listen: $!";
20: return LISTFD;
21: }
The makelisten() function creates a TCP socket, binds it to a local address,
and then places it in the listen state.
Lines 9 and 11 retrieve the same information that makeconn() retrieves in
order to create a connection, with the exception of an internet address.
makelisten() then creates an internet family SOCK_STREAM socket, which by
definition is a TCP socket, but you specify this explicitly anyway, as you
do in makeconn().
In line 18, the socket is bound to a local address. This tells the system
that any messages sent to the specified service port and internet address
should be relayed to the specified socket. You use sockaddr_in() to build
an address from the service port retrieved with getservbyname() and with a
special address (INADDR_ANY) that corresponds to all addresses on the
workstation so that connections can be made to all network interfaces and
even over any dial-up interfaces on the workstation. This function shows a
little laziness in that it passes the result of sockaddr_in() straight to
bind() instead of calling it separately and saving the result.
There are some restrictions on what service ports can be bound. For
historical reasons, only the programs executing with root access can bind
service ports numbered lower than 1024.
After the socket is bound, you can execute listen(), which notifies the
system that you're ready to accept client connections.
server1, the program that uses makelisten(), is just as simple as the client
and is shown below.
#!/usr/bin/perl
use Socket;
require "./network.pl";
$hello = "Hello world!";
$LISTFD = makelisten("test");
LOOP: while (1) {
    unless ($paddr = accept(NEWFD, $LISTFD)) {
        next LOOP;
    }
    syswrite(NEWFD, $hello, length($hello));
    close NEWFD;
}
In the above listing, you simply place a socket in the listen state using
makelisten() and then enter a while loop that centres on the function
accept(). The purpose of accept() is exactly as it sounds: It accepts client
connections. You pass two arguments to accept(): a new variable (NEWFD) that
will contain the socket identifier for the accepted connection and the
socket ($LISTFD) that has been set up with listen().
Whenever accept() returns a connection, you write a string to the new socket
and immediately close it.
Before you can test your server, you need to add the entry for the test
service that it uses. Add the following three lines to the /etc/services
file. You will have to su to root to do this.
test 8000/tcp
test 8000/udp
test1 8001/udp
You have added three entries for your test programs, one for TCP and two
others for UDP that will be used later.
Now to test your server, you need to execute the following commands:
$ ./server1&
$ ./client1 iest test
Hello world!
iest is the hostname of your workstation (ie: lameasfuck). The server
writes back your greeting, and the client prints it and exits. Because the
server is executing inside a while loop, you can run ./client1 repeatedly.
When the test is finished, use kill to stop the server:
$ ps ax| grep server1 | awk '{ print $1 }'
pid
$ kill pid
A UDP Example
-------------
To implement the same test in UDP, you have to set up a SOCK_DGRAM socket
for both a client and a server. This function, makeudpcli() can also be
found in network.pl and is shown below:
sub makeudpcli {
    my ($proto, $servaddr);
    $proto = getprotobyname('udp') or
        die "getprotobyname: cannot get proto : $!";
    #
    # Bind a UDP port
    #
    socket(DGFD, PF_INET, SOCK_DGRAM, $proto);
    bind (DGFD, sockaddr_in(0, INADDR_ANY)) or die "bind: $!";
    return DGFD;
}
In this listing you retrieve the protocol information for UDP and then create
a SOCK_DGRAM socket. You then bind it, but you tell the system to go ahead
and bind to any address and any service port, in other words, you want the
socket named but you don't care what the name is.
The reason for this extra bind() is quite straightforward. Because UDP is
connectionless, special attention has to be paid to addresses when sending
and receiving datagrams. When message datagrams are read, the reader also
receives the address of the originator so that it knows where to send any
replies. If you want to receive replies to your messages, you need to
guarantee that your messages come from a unique address. The call to bind()
ensures that the system allocates a unique address for you.
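If you're curious which port the system actually handed out, getsockname()
will tell you. The following is a small sketch along the lines of
makeudpcli(), not a copy of it: bound_udp_socket() is a made-up name, and
the fallback protocol number 17 (UDP) is only there in case /etc/protocols
can't be read.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Socket;

# Hypothetical helper: create a UDP socket, bind it to "any port", and
# return the socket together with the port the system picked.
sub bound_udp_socket {
    my $proto = getprotobyname('udp') || 17;
    socket(my $sock, PF_INET, SOCK_DGRAM, $proto) or die "socket: $!";
    bind($sock, pack_sockaddr_in(0, INADDR_ANY))   or die "bind: $!";
    my ($port, $ip) = unpack_sockaddr_in(getsockname($sock));
    return ($sock, $port);
}

my ($sock, $port) = bound_udp_socket();
print "the system named this socket with port $port\n";
close $sock;
```

Run it a few times and the port number will usually change, which is fine
for a client: the server learns it from each datagram it receives.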
Now that you have created a datagram socket, you can communicate with a
server, using the program listing below, client2.
1: #!/usr/bin/perl
2:
3: use Socket;
4: require "./network.pl";
5:
6: $poke = "yo!";
7:
8: $NETFD = makeudpcli();
9:
10: #
11: # Work out server address
12: #
13: $addr = gethostbyname($ARGV[0]);
14: $port = getservbyname($ARGV[1], 'udp');
15:
16: $servaddr = sockaddr_in($port, $addr);
17:
18: #
19: # Poke the server
20: #
21: send $NETFD, $poke, 0, $servaddr;
22:
23: #
24: # Recv the reply
25: #
26: recv $NETFD, $message, 32768, 0 or die "error getting message : $!";
27: print "$message \n";
28: close $NETFD
After you create the socket, you still have to create the server address, but
instead of providing this address to the connect() function, you have to
provide it to the send() function in Line 21 so it knows where to send the
message (funn1ly enuff). You might be wondering why you send anything to the
server at all because in the TCP example, the communication is one way.
In the TCP example, the server sends a message as soon as you connect and
then closes the session. The act of connecting is in effect a message from
the client to the server. Because UDP lacks connections, you have to use a
message from the client as a trigger for the conversation.
The server creates a UDP socket in a slightly different manner because it
needs to bind a well-known port. It uses getservbyname() to retrieve a
port number and specifies it as part of the call to bind(). Look at
makeudpserv() in network.pl for details.
The server's main loop is actually pretty close to that of the TCP server and
is shown below, server2:
#!/usr/bin/perl
#
#
use Socket;
require "./network.pl";
$hello = "Hello world!";
$LISTFD = makeudpserv("test");
while (1) {
    $cliaddr = recv $LISTFD, $message, 32768, 0;
    print "Received $message from client\n";
    send $LISTFD, $hello, 0, $cliaddr;
}
Instead of waiting for a client by looping on the accept() function, the
server loops on the recv() function. There is also no new socket to close
after the reply is sent to the client.
When these programs are run, you see the following:
$ ./server2&
$ ./client2 iest test
Received yo! from client
Hello world!
So you see that from a programmer's standpoint, the differences between TCP
and UDP affect not only the socket functions you use and how you use them,
but also how you design your programs. Differences such as the lack of a
connection and the lack of built-in reliability mechanisms must be seriously
considered when you design an application. There is no guarantee, for
example, that the server in this section ever receives your poke message.
For that reason, a mechanism such as a timer would be employed in a
real-world application.
Blocking Versus Nonblocking Descriptors
---------------------------------------
So far, all the examples in this article have relied on blocking I/O.
Certain operations, such as reading, writing and connecting or accepting
connections, are set to block when they wait for completion, which brings a
program (or thread) to a halt. After server1 sets up a listen, for example,
it enters a while loop and calls accept(). Until a client connects to the
listening socket, the program is halted. It doesn't repeatedly call accept(),
it calls it once and blocks. This is also true of client2, which blocks on
the recv() call until the server replies. If the server is unavailable, the
program will block forever. This is especially unwise for an application
that uses UDP, but how could a timer be implemented if the call to recv()
will never return?
Writing can also block on TCP connections when the receiver of the data
hasn't read enough data to allow the current write to complete. In order to
maintain reliability and proper flow control, the systems on both ends of a
connection maintain buffers, usually about 8192 bytes. If these buffers are
full in either direction, communication in that direction will stop until
some space is freed up. This is yet another concern for servers that are
writing large messages to clients that aren't running on very powerful
systems or are on remote networks with low bandwidth links. In these
situations, one client can slow things down for everyone.
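You don't have to take the buffer size on trust, because the system will
tell you if you ask with getsockopt(). The sketch below is just an
illustration (buffer_sizes() is a made-up name and the fallback protocol
number 6 is TCP); the numbers it prints depend on your kernel and may well
differ from the figure quoted above.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Socket;

# Hypothetical helper: return (send buffer size, receive buffer size)
# in bytes for the given socket.
sub buffer_sizes {
    my ($sock) = @_;
    my $snd = getsockopt($sock, SOL_SOCKET, SO_SNDBUF) or die "getsockopt: $!";
    my $rcv = getsockopt($sock, SOL_SOCKET, SO_RCVBUF) or die "getsockopt: $!";
    # getsockopt hands back a packed native integer.
    return (unpack('i', $snd), unpack('i', $rcv));
}

my $proto = getprotobyname('tcp') || 6;
socket(my $sock, PF_INET, SOCK_STREAM, $proto) or die "socket: $!";
my ($sndbuf, $rcvbuf) = buffer_sizes($sock);
print "send buffer: $sndbuf bytes, receive buffer: $rcvbuf bytes\n";
close $sock;
```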
Blocking I/O is acceptable for programs that don't have to maintain GUIs
and only have to maintain one communication channel. Of course, most
programs cannot afford to use blocking communications.
I/O is said to be 'nonblocking' when an operation returns an error or
status code when it cannot be completed. To demonstrate this, run client2
without running the server. It will start and not return until you halt
it by pressing Ctrl+C.
Now run nonblock:
$ ./nonblock
error getting message : Try again at ./nonblock line 30
You receive the Try again message from the recv() function.
nonblock, shown below, is a modified version of client2, which was shown
earlier in the article.
1: #!/usr/bin/perl
2: use Socket;
3: use Fcntl;
4: require "./network.pl";
5: $poke = "yo!";
6: $NETFD = makeudpcli();
7: fcntl($NETFD, F_SETFL, O_NONBLOCK) or die "Fcntl failed : $!\n";
8: (rest of file remains the same)
A new module, Fcntl, is added to the program in line 3, which provides an
interface to the fcntl(2) system call. It is used to alter file descriptor
properties, such as blocking and how to handle certain signals. In line 7,
the last line of modifications to client2, you set the O_NONBLOCK flag for
the UDP socket. The rest of the prog is unchanged.
When nonblocking I/O is used, the application designer has to be very
careful when handling errors returned from recv(), send() and other I/O
related functions. When no more data is available for reading or no more
data can be written, these functions return error codez. As a result, the
application has to be prepared to handle some errors as being routine
conditions. This is also true of the C/C++ interfaces.
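Here is a sketch of what treating an error as routine can look like. It is
not taken from network.pl: set_nonblocking() and try_recv() are names made
up for the example, the socket is bound to the loopback address purely so
the demo is self-contained, and unlike nonblock above it keeps whatever
flags were already set by reading them with F_GETFL first.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Socket;
use Fcntl qw(F_GETFL F_SETFL O_NONBLOCK);
use Errno qw(EAGAIN EWOULDBLOCK);

# Hypothetical helper: add O_NONBLOCK to the flags already on $fh.
sub set_nonblocking {
    my ($fh) = @_;
    my $flags = fcntl($fh, F_GETFL, 0) or die "fcntl F_GETFL: $!";
    fcntl($fh, F_SETFL, $flags | O_NONBLOCK) or die "fcntl F_SETFL: $!";
}

# Hypothetical helper: return a datagram if one is waiting, undef if
# there is simply nothing to read yet, and die on any real error.
sub try_recv {
    my ($sock) = @_;
    my $from = recv($sock, my $msg, 32768, 0);
    return $msg if defined $from;
    return if $! == EAGAIN || $! == EWOULDBLOCK;   # routine: no data yet
    die "recv: $!";
}

my $proto = getprotobyname('udp') || 17;
socket(my $sock, PF_INET, SOCK_DGRAM, $proto) or die "socket: $!";
bind($sock, pack_sockaddr_in(0, INADDR_LOOPBACK)) or die "bind: $!";
set_nonblocking($sock);

my $msg = try_recv($sock);
print defined $msg ? "got: $msg\n" : "nothing waiting, carry on\n";
close $sock;
```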
I/O Multiplexing with UDP
-------------------------
Frequently, applications need to maintain more than one socket or file
descriptor. For example, many system services such as Telnet, rlogin, and
FTP are managed by one process on Linux. In order to do this, the process
(inetd) listens for requests for these services by opening a socket for each
one. Other applications such as Applix, Netscape, and Xemacs monitor file
descriptors for the keyboard, mouse, and maybe the network.
Let's set up an example that monitors the keyboard and a network connection.
The following listing should be named udptalk.
1: #!/usr/bin/perl
2:
3: use Socket;
4: require "./network.pl";
5:
6: $NETFD = makeudpserv($ARGV[2]);
7:
8: $addr = gethostbyname($ARGV[0]);
9: $port = getservbyname($ARGV[1], 'udp');
10:
11: $servaddr = sockaddr_in($port, $addr);
12:
13: $rin = "";
14: vec($rin, fileno(STDIN), 1) = 1;
15: vec($rin, fileno($NETFD), 1) = 1;
16:
17: while (1) {
18:
19: select($ready = $rin, undef, undef, undef);
20:
21: if (vec($ready, fileno(STDIN), 1) == 1) {
22: sysread STDIN, $mesg, 256;
23: send $NETFD, $mesg, 0, $servaddr;
24: }
25: if (vec($ready, fileno($NETFD), 1) == 1) {
26: recv $NETFD, $netmsg, 256, 0;
27: print "$netmsg";
28: $netmsg = "";
29: }
30: }
31: close $NETFD;
In order to test this program, it must be run in either two windows on the
same system, or on two different systems. At one command-line session,
execute the following command, where 'iest' is the host on which the second
command will be run:
$ ./udptalk iest test test1
On the second host, run the following command, where 'iest' is the host on
which the first command was run:
$ ./udptalk iest test1 test
Each session will wait for keyboard input. Each line that is typed by one
program is printed by the other, after you press Enter.
In order to perform the two-way communication required for this, both
instances of udptalk have to bind to a well-known port. To permit this on a
single workstation, the program accepts two port names as the second and
third command line arguments. For obvious reasons, two programs cannot
register interest in the same port.
In line 6 of the above listing, udptalk uses makeudpserv() to create a UDP
socket and bind it to a well known port. For the examples here I used 8000
for one copy and 8001 for the other.
In lines 8-11, you perform the usual procedure for building a network
address. This will be the address to which the keyboard input is written.
Lines 13-15 build bit vectors in preparation for the select() function. In
perl, a 'bit vector' is a scalar variable that is handled as an array of
bits, ie: instead of being evaluated as bytes that add up to characters or
numbers, each individual bit is evaluated as a distinct value.
In line 13, you create a variable ($rin) and tell the perl interpreter to
clear it. You then use the vec() and fileno() functions to determine the
file number for STDIN (the keyboard) and set that bit in $rin. Then you do
the same for the socket created by makeudpserv(). Therefore, if STDIN uses
file descriptor 0 (which is generally the case), the first bit in $rin is
set to 1 (bit vectors, like other arrays, start numbering indexes at zero).
Fortunately, the vec() function can be used to read bit vectors also, so you
can treat these data structures as opaque (which is nice :)).
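If you want to peek inside a bit vector anyway, here is a small sketch. The
descriptor numbers 0 and 3 are made up for illustration (3 stands in for
whatever fileno() would return for a socket).

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $rin = '';
vec($rin, 0, 1) = 1;    # descriptor 0, normally STDIN
vec($rin, 3, 1) = 1;    # descriptor 3, a hypothetical socket

# unpack 'b*' lists the bits lowest index first, so position N in the
# string is the bit for descriptor N.
my $bits = unpack('b*', $rin);
print "$bits\n";        # 10010000

# Reading the vector back with vec(), one bit at a time.
my @set = grep { vec($rin, $_, 1) } 0 .. 7;
print "@set\n";         # 0 3
```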
select() is a key function for systems programmers. Sadly, it suffers from
an arcane interface that is intimidating in any language. System V UNIX has
a replacement, poll(), that is a little easier to use. Linux provides poll(2)
too, and Perl exposes it through the IO::Poll module, but select() is what
this article uses. The following is the function description for select():
select readfds, writefds, exceptfds, timeout;
Like most of the UNIX system interface, this is virtually identical to
select() in C/C++. select() is used for discovering which file descriptors
are ready for reading, are ready for writing, or have an exceptional
condition. An exceptional condition usually corresponds with the arrival
of 'out-of-band' or urgent data. This data is most frequently associated
with TCP connections. When a message is sent out-of-band, it is tagged as
being more important than any previously sent data and is placed at the top
of the data queue. A client or server can use this to notify the process on
the other end of a connection that it is exiting immediately.
The first three arguments are bit vectors that correspond to the file
descriptors that you are interested in reading or writing to or that you are
monitoring for exceptional conditions. If you aren't interested in a set of
file descriptors, you can pass undef instead of a vector. In the udptalk
listing, you aren't interested in writing or exceptions, so you pass undef
for the second and third arguments.
When select() returns, only the bits that correspond to files with activity
are set. If any descriptors aren't ready when select() returns, their
settings are lost in the vector. For that reason, you have select() work on
a copy of the vector, $ready, and leave $rin untouched. This is done by
passing an assignment to select() as the first argument in line 19.
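A quick sketch of why the copy matters. Two pipes stand in for the keyboard
and the socket (an assumption, just to keep the example self-contained), and
only one of them has data waiting.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Two pipes: one with a byte waiting to be read, one left idle.
pipe(my $busy_r, my $busy_w) or die "pipe: $!";
pipe(my $idle_r, my $idle_w) or die "pipe: $!";
syswrite($busy_w, "x") or die "syswrite: $!";

my $rin = '';
vec($rin, fileno($busy_r), 1) = 1;
vec($rin, fileno($idle_r), 1) = 1;

# select() overwrites the vector it is handed, so hand it a copy.
my $nfound = select(my $ready = $rin, undef, undef, 1);

my $busy_ready = vec($ready, fileno($busy_r), 1);    # set by select()
my $idle_ready = vec($ready, fileno($idle_r), 1);    # cleared by select()
my $idle_in_rin = vec($rin, fileno($idle_r), 1);     # original untouched

print "found=$nfound busy=$busy_ready idle=$idle_ready rin=$idle_in_rin\n";
```

Because $rin still has both bits set, it can be reused on the next trip
round the loop without being rebuilt.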
The last parameter is the time-out interval in seconds. select() waits for
activity for this period. If the period expires with no activity occurring,
select() will return with everything in the vector cleared. Because undef
is supplied for timeout in line 19, select() will block until a file is
ready.
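The time-out can be sketched like this. wait_readable() is a hypothetical
helper, not something from network.pl, and a pipe is used in place of a
socket. Perl's select() accepts fractional seconds.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: true if $fh becomes readable within $seconds,
# false if the time-out expires first.
sub wait_readable {
    my ($fh, $seconds) = @_;
    my $rin = '';
    vec($rin, fileno($fh), 1) = 1;
    my $nfound = select(my $ready = $rin, undef, undef, $seconds);
    die "select: $!" if $nfound < 0;
    return $nfound > 0;
}

pipe(my $rd, my $wr) or die "pipe: $!";

# Nothing written yet, so this call gives up after a quarter of a second.
my $before = wait_readable($rd, 0.25) ? 'ready' : 'timed out';

# Once a byte is waiting, the same call returns straight away.
syswrite($wr, "x") or die "syswrite: $!";
my $after = wait_readable($rd, 0.25) ? 'ready' : 'timed out';

print "$before, then $after\n";
```

A time-out like this is handy when the loop has other chores to do between
checks, rather than sleeping forever as udptalk does.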
Inside the while-loop entered in line 17, you call select(), passing it the
bit vector built earlier and the new one to be created. When it returns, you
check the vector using vec() with pretty much the same syntax as you used to
set the bits. However, because you are using == instead of =, vec() returns
the value of the bit instead of setting it.
If the bit for STDIN is set, you read from the keyboard and send it to the
other instance of udptalk. If the bit for the socket is set, you read from
it and print it to the terminal. This sequence illustrates a very important
advantage of the sockets interface. The program is moving data to and from
the network in much the same way as it does for the keyboard and screen.
This technique is called 'multiplexing' and is the loop at the core of many
network-aware applications, although the actual mechanisms can be concealed
by sophisticated dispatchers or notifiers that trigger events based on
which connection is ready to be read from or written to. Something else
missing in the udptalk listing is the minimum amount of error checking and
signal handling that cleans up connections when a quit signal is received.
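As a rough idea of what such a dispatcher looks like, here is a sketch using
the IO::Select module that ships with Perl. The %handler table and the pipe
names are made up for the example, and it does a single pass instead of
looping forever.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::Select;

# Two pipes stand in for two network connections.
pipe(my $left_r,  my $left_w)  or die "pipe: $!";
pipe(my $right_r, my $right_w) or die "pipe: $!";

# Hypothetical dispatch table: descriptor number => callback.
my @log;
my %handler;
$handler{ fileno($left_r) }  = sub { push @log, "left: $_[0]" };
$handler{ fileno($right_r) } = sub { push @log, "right: $_[0]" };

# IO::Select builds and copies the bit vectors behind the scenes.
my $sel = IO::Select->new($left_r, $right_r);

syswrite($right_w, "ping") or die "syswrite: $!";

# One pass of the dispatcher: fire the callback for each ready handle.
for my $fh ($sel->can_read(1)) {
    my $n = sysread($fh, my $buf, 256);
    die "sysread: $!" unless defined $n;
    $handler{ fileno($fh) }->($buf) if $n > 0;
}

print "$_\n" for @log;    # right: ping
```

Note that every system call is checked here, which is the sort of error
handling the udptalk listing leaves out.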
I/O Multiplexing with TCP
-------------------------
In order to demonstrate TCP multiplexing, it is necessary to create different
programs for the client and server. The server, tcplisten, is shown below
and is the one that requires the most scrutiny. The client, tcptalk, is very
similar to the server and so I won't print it, but I will explain how the
client works as I cover the server.
1: #!/usr/bin/perl
2:
3: use Socket;
4: require "./network.pl";
5:
6: $NETFD = makelisten($ARGV[0]);
7:
8: while (1) {
9:
10: $paddr = accept(NEWFD, $NETFD);
11:
12: ($port, $iaddr) = sockaddr_in($paddr);
13:
14: print "Accepted connection from ", inet_ntoa($iaddr),
15: " on port number ", $port, "\n";
16:
17: $rin = "";
18: vec($rin, fileno(STDIN), 1) = 1;
19: vec($rin, fileno(NEWFD), 1) = 1;
20:
21: while (1) {
22:
23: select $ready = $rin, undef, undef, undef;
24:
25: if (vec($ready, fileno(STDIN), 1) == 1) {
26: sysread STDIN, $mesg, 256;
27: syswrite NEWFD, $mesg, length($mesg);
28: }
29: if (vec($ready, fileno(NEWFD), 1) == 1) {
30: $bytes = sysread NEWFD, $netmsg, 256;
31: if ($bytes == 0) { goto EXIT; }
32: print "$netmsg";
33: $netmsg = "";
34: }
35: }
36: EXIT: close NEWFD;
37: print "Client closed connection\n";
38: }
39:
40: close $NETFD;
The server creates a listening socket in line 6 and then immediately enters a
while loop. At the top of the loop is a call to accept(). By placing this in
a loop, the server can repeatedly accept client connections, like the other
TCP server. The listen socket, $NETFD, can accept more than one connection,
regardless of the state of any file descriptors cloned from it using
accept().
accept() returns the address of the connecting client. You use this address
in lines 12 and 14 to print out some information about the client. In line 12
you use sockaddr_in() to reverse engineer the fully qualified address back
into a network address and a service port. Then you use print to display it
on the terminal. Note the call to inet_ntoa() embedded in the print command.
Then you set up for a select() loop using almost the same code as in the
udptalk listing. There is, however, a key difference in the way the network
connection is handled. You are reading with sysread() again, but you are
saving the return value.
When a peer closes a TCP connection, the other program receives an EOF
indication. This is signified by marking the socket as ready for reading
and returning zero bytes when it is read. By saving the number of bytes
returned by sysread(), you are able to detect a closed connection, report
it, and then return to accept() at the top of the outer while loop.
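The EOF check can be tried without a real network. This sketch assumes a
socketpair() is a fair stand-in for an accepted TCP connection, since a
stream socket reports a closed peer the same way. The @events list is just
there to show what happened.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Socket;

# A connected pair of stream sockets: one end plays the server side of an
# accepted connection, the other plays the client.
socketpair(my $server, my $client, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "socketpair: $!";

# The client says its piece and hangs up.
syswrite($client, "Goodbye, cruel....\n") or die "syswrite: $!";
close($client);

my @events;
while (1) {
    my $bytes = sysread($server, my $netmsg, 256);
    die "sysread: $!" unless defined $bytes;    # undef means a real error
    if ($bytes == 0) {                          # zero bytes means EOF
        push @events, 'closed';
        last;
    }
    push @events, 'data';
}
close($server);

print "@events\n";
```

Keep the three outcomes apart: undef is an error, zero is a closed
connection, and anything else is data.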
The following is a server session, followed by a client session that is
communicating with it. The client, tcptalk, is a copy of tcplisten.
$ ./tcplisten test
Accepted connection from 10.8.100.20 on port number 29337
Hello, world.
Goodbye, cruel....
Client closed connection
$ ./tcptalk iest test
Hello, world.
Goodbye, cruel....
^C
Advanced Topics
---------------
One of the biggest issues of TCP applications is queueing messages. Depending
on the nature of the data being transferred, the network bandwidth available,
and the rate at which clients can keep pace with the data being delivered,
data can queue up. Experienced application designers generally specify a
queuing mechanism and the rules associated with it as part of the initial
product description.
UDP applications have to wrestle with data reliability, and some schemes rely
on message sequence numbers. All nodes involved in a transaction (or a
series of transactions) keep track of a numbering scheme. When a node
receives a message out of order, it sends a negative acknowledgement for the
message that it missed. This sort of scheme greatly reduces traffic when
everything goes well but can become very expensive when things fall out of
sequence.
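Here is a bare-bones sketch of the receiving side of such a scheme. It only
shows the bookkeeping: make_receiver() is a hypothetical helper, there is no
actual socket, and late or duplicate messages are simply ignored.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical receiver: remembers the next sequence number it expects and
# returns the list of numbers to NAK when it sees a gap.
sub make_receiver {
    my $expected = 0;
    return sub {
        my ($seq) = @_;
        my @naks;
        @naks = ($expected .. $seq - 1) if $seq > $expected;
        $expected = $seq + 1 if $seq >= $expected;
        return @naks;
    };
}

my $receiver = make_receiver();
my @out;

# Messages 2 and 3 go missing on the way.
for my $seq (0, 1, 4, 5) {
    push @out, map { "NAK $_" } $receiver->($seq);
}

print join(", ", @out), "\n";    # NAK 2, NAK 3
```

When everything arrives in order the receiver sends nothing at all, which
is where the traffic saving comes from.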
Some applications can use asynchronous I/O in order to service network
traffic and other tasks in a single application. This scheme registers
interest in a signal that can be delivered whenever a file descriptor has
data ready to be read. This method is not recommended though, because only
one signal can be delivered for all file descriptors (so select() would still
be needed) and because signals are not reliable.
Secur1ty is always a big issue, regardless of the protocols being used. UDP
is being used less and less over the Internet, essentially because it is very
easy to impersonate a host when no connections are required. Even TCP
connections, however, can be spoofed by someone who has an understanding of
the Internet Protocol and WAN technology. For that reason, applications that
require a high level of security don't rely on TCP to keep them secure and
tend to use encryption and authentication technology.
Summary
-------
Okay, this article covers a lot of ground in a short time. I can't be bothered
to write a proper summary so this is gonna be it, I hope that this has given
you enough information but if not then you can just mail me at the address at
the top of this file. Peace.
Shouts
------
[hybrid] [jasun] [force] [shadowx] [knight] [devious] [frink] [sintax]