This chapter is from the book

The Transmission Control Protocol (TCP) is a stream-based method of network
communication that is far different from any discussed previously. This chapter
discusses TCP streams and how they operate under Java.

6.1 Overview

TCP provides an interface to network communications that is radically
different from the User Datagram Protocol (UDP) discussed in Chapter 5. The
properties of TCP make it highly attractive to network programmers, as it
simplifies network communication by removing many of the obstacles of UDP, such
as ordering of packets and packet loss. While UDP is concerned with the
transmission of packets of data, TCP focuses instead on establishing a network
connection, through which a stream of bytes may be sent and received.

In Chapter 5 we saw that packets may be sent through a network using various
paths and may arrive at different times. This benefits performance and
robustness, as the loss of a single packet doesn't necessarily disrupt the
transmission of other packets. Nonetheless, such a system creates extra work for
programmers who need to guarantee delivery of data. TCP eliminates this extra
work by guaranteeing delivery and order, providing for a reliable byte
communication stream between client and server that supports two-way
communication. It establishes a "virtual connection" between two
machines, through which streams of data may be sent (see Figure 6-1).

TCP uses a lower-level communications protocol, the Internet Protocol (IP),
to establish the connection between machines. This connection provides an
interface that allows streams of bytes to be sent and received, and
transparently converts the data into IP datagram packets. A common problem with
datagrams, as we saw in Chapter 5, is that they do not guarantee that packets
arrive at their destination. TCP takes care of this problem. It provides
guaranteed delivery of bytes of data. Of course, it's always possible that
network errors will prevent delivery, but TCP handles the implementation issues
such as resending packets, and alerts the programmer only in serious cases such
as if there is no route to a network host or if a connection is lost.

The virtual connection between two machines is represented by a socket.
Sockets, introduced in Chapter 5, allow data to be sent and received; there are
substantial differences between a UDP socket and a TCP socket, however. First,
TCP sockets are connected to a single machine, whereas UDP sockets may transmit
or receive data from multiple machines. Second, UDP sockets only send and
receive packets of data, whereas TCP allows transmission of data through byte
streams (represented as an InputStream and OutputStream). These streams are converted
into datagram packets for transmission over the network, without requiring the
programmer to intervene (as shown in Figure 6-2).

Figure 6-2 TCP deals with streams of data such as protocol commands, but converts streams into IP datagrams for transport over the
network.
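To make the stream interface concrete, here is a minimal sketch of a client and a throwaway echo server talking over the loopback interface. The class name LoopbackEcho and the method echoOnce are our own illustrative choices, and the sketch assumes Java 9 or later (for transferTo and readAllBytes). Notice that the code never touches a packet: it only reads and writes bytes.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

class LoopbackEcho {

    // Sends `message` to a temporary echo server on the loopback interface
    // and returns the bytes that come back, decoded as UTF-8.
    static String echoOnce(String message) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // Port 0 asks the operating system to pick any free port.
        try (ServerSocket server = new ServerSocket(0, 1, loopback)) {
            Thread echoThread = new Thread(() -> {
                try (Socket peer = server.accept()) {
                    // Copy the incoming byte stream straight back to the sender.
                    peer.getInputStream().transferTo(peer.getOutputStream());
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            echoThread.start();

            try (Socket client = new Socket(loopback, server.getLocalPort())) {
                OutputStream out = client.getOutputStream();
                out.write(message.getBytes(StandardCharsets.UTF_8));
                out.flush();
                // Tell the server that no more bytes will follow.
                client.shutdownOutput();

                InputStream in = client.getInputStream();
                // Read until the server closes its end of the connection.
                return new String(in.readAllBytes(), StandardCharsets.UTF_8);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(echoOnce("hello over TCP"));
    }
}
```

The conversion of these bytes into IP datagrams, and back again at the other end, happens entirely beneath the two stream objects.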

6.1.1 Advantages of TCP over UDP

The many advantages to using TCP over UDP are briefly summarized below.

6.1.1.1 Automatic Error Control

Data transmission over TCP streams is more dependable than transmission of
packets of information via UDP. Under TCP, data packets sent through a virtual
connection include a checksum to ensure that they have not been corrupted, just
like UDP. However, delivery of data is guaranteed by TCP: data packets
lost in transit are retransmitted.

You may be wondering just how this is achieved; after all, IP and UDP do
not guarantee delivery; neither do they give any warning when datagram packets
are dropped. Whenever a collection of data is sent by TCP using datagrams, a
timer is started. Recall our UDP examples from Chapter 5, in which the
DatagramSocket.setSoTimeout method was used to start a timer for a receive()
operation. In TCP, if the recipient sends an acknowledgment, the timer is
disabled. But if an acknowledgment isn't received before the time runs out,
the packet is retransmitted. This means that any data written to a TCP socket
will reach the other side without the need for further intervention by
programmers (barring some catastrophe that causes an entire network to go down).
All of the code for error control is handled by TCP.
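The timer-and-acknowledgment cycle can be modeled in a few lines. The following is a toy simulation under our own assumptions, not TCP's actual implementation: LossyLink and sendReliably are hypothetical names, and a timer that expires without an acknowledgment is represented simply by transmit returning false.

```java
import java.util.ArrayList;
import java.util.List;

class RetransmitSketch {

    // Hypothetical stand-in for an unreliable network: it "loses" the first
    // `losses` transmissions and acknowledges every transmission after that.
    static class LossyLink {
        private int remainingLosses;
        final List<String> delivered = new ArrayList<>();

        LossyLink(int losses) {
            this.remainingLosses = losses;
        }

        // Returns true if an acknowledgment arrived before the timer ran out.
        boolean transmit(String segment) {
            if (remainingLosses > 0) {
                remainingLosses--;
                return false;       // timer expired, no acknowledgment
            }
            delivered.add(segment);
            return true;            // acknowledgment received, timer disabled
        }
    }

    // Sends one segment, retransmitting after each timeout. Returns the number
    // of attempts used, or -1 if maxAttempts was exhausted without success.
    static int sendReliably(LossyLink link, String segment, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (link.transmit(segment)) {
                return attempt;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        LossyLink link = new LossyLink(2);
        System.out.println("attempts: " + sendReliably(link, "data", 5));
    }
}
```

The give-up case (a return value of -1) corresponds to the serious failures mentioned earlier, where TCP finally reports a problem to the application.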

6.1.1.2 Reliability

Since the data sent between two machines participating in a TCP connection is
transmitted by IP datagrams, the datagram packets may arrive out of
order. Left uncorrected, this would confuse any program reading from a TCP
socket, as the order of the byte stream would be disrupted and the data
unreliable. Fortunately, issues such as ordering are handled by TCP: each
datagram packet contains a sequence number that is used to order data. Later
packets arriving before earlier packets will be held in a queue until an ordered
sequence of data is available. The data will then be passed to the application
through the interface of the socket.
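The holding queue can be sketched as follows. This is a simplified model under stated assumptions: the class ReassemblyQueue is hypothetical, and segments are numbered 0, 1, 2, and so on, whereas real TCP sequence numbers count bytes rather than segments.

```java
import java.util.HashMap;
import java.util.Map;

class ReassemblyQueue {
    // Segments that arrived early, keyed by sequence number.
    private final Map<Integer, String> pending = new HashMap<>();
    // Data already released to the application, in order.
    private final StringBuilder delivered = new StringBuilder();
    private int nextExpected = 0;

    // Accepts one segment and releases every segment that is now contiguous
    // with the data delivered so far.
    void receive(int seq, String data) {
        if (seq < nextExpected) {
            return;                 // duplicate of data already delivered
        }
        pending.put(seq, data);
        while (pending.containsKey(nextExpected)) {
            delivered.append(pending.remove(nextExpected));
            nextExpected++;
        }
    }

    String deliveredSoFar() {
        return delivered.toString();
    }

    public static void main(String[] args) {
        ReassemblyQueue queue = new ReassemblyQueue();
        queue.receive(1, "lo, ");   // held: segment 0 has not arrived yet
        queue.receive(2, "world");  // held
        queue.receive(0, "Hel");    // releases segments 0, 1, and 2 in order
        System.out.println(queue.deliveredSoFar());
    }
}
```

Nothing reaches the application until segment 0 arrives, at which point all three segments are released in sequence.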

6.1.1.3 Ease of Use

While storing information in datagram packets is certainly not beyond the
reach of programmers, it doesn't lead to the most efficient way of
communication between computers. There's added complexity, and it can be
argued that the task of designing and creating software within a deadline
provides complexity enough for programmers. Developers typically welcome
anything that can reduce the complexity of software development, and TCP
does just this. TCP allows the programmer to think in a completely different
way, one that is much more streamlined. Rather than being packaged into discrete
units (datagram packets), the data is instead treated as a continuous stream,
like the I/O streams the reader is by now familiar with. TCP sockets continue
the tradition of Unix programming, in which communication is treated in the same
way as file input and output. The mechanism is the same whether the developer is
writing to a network socket, a communications pipe, a data structure, the user
console, or a file. This also applies, of course, to reading information. This
makes communicating via TCP sockets far simpler than communicating via datagram
packets.
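As a small illustration of this uniformity, the sketch below writes through the general OutputStream type. The class StreamAgnostic and the method writeLine are hypothetical names chosen for this example; the point is only that the same method serves an in-memory buffer, the console, or a socket's output stream.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class StreamAgnostic {

    // Writes one line of text to any OutputStream. The method neither knows
    // nor cares whether the bytes end up in memory, on the console, in a
    // file, or on a TCP connection.
    static void writeLine(OutputStream out, String text) {
        try {
            out.write((text + "\n").getBytes(StandardCharsets.UTF_8));
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        ByteArrayOutputStream memory = new ByteArrayOutputStream();
        writeLine(memory, "to a data structure");
        writeLine(System.out, "to the user console");
        // A connected TCP socket would be used in exactly the same way:
        //   writeLine(socket.getOutputStream(), "to a network socket");
        System.out.println("bytes held in memory: " + memory.size());
    }
}
```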

6.1.2 Communication between Applications Using Ports

It is clear that there are significant differences between TCP and UDP, but
there is also an important similarity between these two protocols. Both share
the concept of a communications port, which distinguishes one application from
another. Many services and clients may run on the same machine, and it would be
impossible to tell them apart without assigning each one a port
number. When a TCP socket establishes a connection to another machine, it
requires two very important pieces of information to connect to the remote
end: the IP address of the machine and the port number. In addition, a local
IP address and port number will be bound to it, so that the remote machine can
identify which application established the connection (as illustrated in Figure
6-3). After all, you wouldn't want your e-mail to be accessible by
another user running software on the same system.

Figure 6-3 Local ports identify the application establishing a connection from other programs, allowing multiple TCP applications to run on the same machine.
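The pairing of local and remote ports can be observed directly. In this sketch (PortPair and observePorts are hypothetical names), a client connects to a temporary server on the loopback interface, and each end reports the port it is bound to and the port it is talking to. The actual numbers are chosen by the operating system and will differ from run to run.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

class PortPair {

    // Returns { client local port, client remote port,
    //           server-side local port, server-side remote port }.
    static int[] observePorts() {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // Port 0 lets the operating system choose a free port for the server.
        try (ServerSocket server = new ServerSocket(0, 1, loopback);
             Socket client = new Socket(loopback, server.getLocalPort());
             Socket accepted = server.accept()) {
            return new int[] {
                client.getLocalPort(),    // bound automatically on connect
                client.getPort(),         // the server's port
                accepted.getLocalPort(),  // the server's port again
                accepted.getPort()        // the client's local port
            };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] ports = observePorts();
        System.out.println("client: local " + ports[0] + ", remote " + ports[1]);
        System.out.println("server: local " + ports[2] + ", remote " + ports[3]);
    }
}
```

Each end's remote port is the other end's local port, which is how the two machines agree on which applications own the connection.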

Ports in TCP are just like ports in UDP: they are represented by a number
in the range 1-65535. Ports below 1024 are restricted to use by well-known
services such as HTTP, FTP, SMTP, POP3, and telnet. Table 6-1 lists a few of the
well-known services and their associated port numbers.

6.1.3 Socket Operations

TCP sockets can perform a variety of operations. They can:

Establish a connection to a remote host

Send data to a remote host

Receive data from a remote host

Close a connection

In addition, there is a special type of socket that provides a service that
will bind to a specific port number. This type of socket is normally used only
in servers, and can perform the following operations:

Bind to a local port

Accept incoming connections from remote hosts

Unbind from a local port
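The operations listed above can be walked through once, in order, on a single machine. This is a sketch under our own assumptions: the class SocketOperations and the method walkThrough are hypothetical, the loopback interface stands in for a remote host, and a single byte stands in for real data.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

class SocketOperations {

    // Performs each socket operation once and returns the single byte
    // that travelled from the client to the server.
    static int walkThrough() {
        try (ServerSocket server = new ServerSocket()) {
            // Server: bind to a local port (0 lets the OS choose one).
            server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));

            try (Socket client = new Socket()) {
                // Client: establish a connection to the (here, local) host.
                client.connect(server.getLocalSocketAddress());

                // Server: accept the incoming connection.
                try (Socket accepted = server.accept()) {
                    // Client: send data to the remote host.
                    client.getOutputStream().write(42);
                    // Server: receive data from the remote host.
                    return accepted.getInputStream().read();
                }
                // Leaving the try blocks closes both ends of the connection.
            }
            // Leaving the outer block closes the server socket,
            // which unbinds it from its local port.
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("received byte: " + walkThrough());
    }
}
```

The try-with-resources blocks perform the close and unbind steps automatically, even if an earlier step fails.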

Table 6-1 Protocols and Their Associated Ports

Well-Known Services              Service Port
Telnet                           23
Simple Mail Transfer Protocol    25
HyperText Transfer Protocol      80
Post Office Protocol 3           110

These two kinds of socket fall into different categories, and either may
be used by a client or a server (since some clients also act
as servers, and some servers as clients). However, it is normal practice for the
roles of client and server to be kept separate.