Internet Software Backbone: Sockets

IT expert Prashant Khergaonkar explains the complex topic of sockets in layman's terms. This article explains the basic protocols, including IP, TCP, HTTP, and SMTP, and the basic socket functions required to write a full-fledged network utility.

From a user's point of view, the Internet that we know today is a
browser where sites can be viewed and searched, information can be entered,
songs can be played, movies can be viewed, information can be sent and received
through email, and files can be sent and received using FTP tools.

Most information on the Internet is transferred using HTTP. Email is
usually transferred using SMTP and POP. Files are usually downloaded or
uploaded using FTP. All of these protocols run on top of TCP/IP.

But what is the software mechanism used to transfer information across the
net using these protocols? The answer is sockets.

Sockets can be defined as the software backbone of the Internet. The Internet
is what it is today because of the socket paradigm.

What Are Sockets?

The socket paradigm was first defined for the C language under UNIX for
network programming. It was later incorporated into the MS Windows environment,
where it is called WINSOCK. If you understand the use of sockets in an
algorithmic sense, you should have no problem applying them on different
platforms and in different environments.

Many functions in these environments deal with sockets. At the highest level,
the basic function names and the parameters that they take are generally the
same across environments such as C under UNIX, Perl, and WINSOCK in Windows.

Before we look at the functions in detail, a few basic concepts must be
understood.

What Is a Port in Socket Terminology?

In very basic terms, a port is a unique number used to identify an
application on a machine. For example, let's say that there are two
programs (applications), PA and PB, running on Machine 1 on a network. Now
imagine that a second machine on the network, Machine 2, needs to use program
PA. Machine 2 sends a request to Machine 1 that names PA by its unique
identifier, which is called a port. Applications that listen for and
service requests from other machines are assigned port numbers.
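
The PA and PB scenario can be tried out on a single machine. The following is a minimal sketch in Python (chosen here for brevity; the same calls exist in C), in which the helper name open_listener and the use of the loopback address 127.0.0.1 are assumptions made for illustration. Passing port 0 asks the operating system to pick any free port, so the actual numbers will differ from run to run.

```python
import socket

def open_listener(host="127.0.0.1", port=0):
    """Create a TCP socket bound to (host, port) and start listening.

    Port 0 asks the operating system to choose any free port.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind((host, port))
    sock.listen(1)
    return sock

# Two stand-ins for the programs PA and PB running on "Machine 1".
pa = open_listener()
pb = open_listener()
pa_port = pa.getsockname()[1]
pb_port = pb.getsockname()[1]
print("PA listens on port", pa_port)
print("PB listens on port", pb_port)

# "Machine 2" reaches PA by naming PA's port (both run on one host here).
client = socket.create_connection(("127.0.0.1", pa_port))
conn, _ = pa.accept()
reached_port = conn.getsockname()[1]
print("client reached the program on port", reached_port)

for s in (client, conn, pa, pb):
    s.close()
```

Because each listener gets its own port, the request addressed to PA's port is delivered to PA and never to PB, even though both programs share one machine.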

What Is Network Bandwidth?

In very basic terms, network bandwidth is the amount of information that can
be transmitted over a network in a given amount of time.
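
As a rough worked example, the ideal time to move a file is its size in bits divided by the bandwidth in bits per second. The sketch below uses made-up numbers (a 5 MB file and a 10 Mbit/s link) and a hypothetical helper, transfer_seconds; it ignores protocol overhead and latency, so real transfers take longer.

```python
def transfer_seconds(size_bytes, bandwidth_bits_per_second):
    """Ideal transfer time, ignoring protocol overhead and latency."""
    return size_bytes * 8 / bandwidth_bits_per_second

# Hypothetical numbers, chosen only for illustration.
file_size = 5_000_000      # bytes (5 MB)
link_speed = 10_000_000    # bits per second (10 Mbit/s)

print(transfer_seconds(file_size, link_speed))  # prints 4.0
```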

What Is a Protocol?

A protocol is a set of rules or standards to be followed to do a particular
task. Therefore, to transmit data between different machines over a network, a
set of rules or standards must be followed. These rules define the procedure of
establishing a connection, packaging information, transmitting information,
addressing the packets of information, performing error checking (if any),
performing error correction (if any) over the network, and so on.

There are two types of network rules or protocols:

Connection-oriented—An example is the Transmission Control
Protocol (TCP)

Connectionless—An example is the User Datagram Protocol
(UDP)

Connection-oriented protocols provide reliable packet delivery to the
destination using mechanisms such as a checksum on each packet, sequence
numbers, acknowledgments, and retransmission of lost packets. Although this
ensures data integrity and reliability, it adds overhead and therefore uses
more bandwidth. This is not the case with connectionless protocols because
they do not guarantee reliable or ordered delivery of data packets to the
destination.

Information to be sent over a network is broken into units of bytes called
packets. In the case of TCP, each packet consists of the actual data and a
header of at least 20 bytes that carries a checksum, a sequence number, and an
acknowledgment number, which are used to ensure data integrity and
reliability. In the case of UDP, the header is only 8 bytes; it includes a
checksum but no sequencing or acknowledgment information. Therefore, less
bandwidth is required compared to TCP to send the same amount of information
over a network.

In the case of TCP, therefore, data either reaches its destination intact and
in order, or the sender learns that the connection has failed. UDP gives no
such assurance. The extra work also makes transmission of data over the
network generally slower with TCP than with UDP.
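
The difference between the two protocol types shows up directly in the socket calls. The following is a minimal sketch in Python that sends one message each way over the loopback address; the helper recv_exact and the message contents are assumptions made for illustration. With TCP, a connection is set up (connect and accept) before any data moves. With UDP, a datagram is simply addressed and sent.

```python
import socket

HOST = "127.0.0.1"

def recv_exact(sock, n):
    """Read exactly n bytes from a stream socket (TCP is a byte stream)."""
    chunks = []
    while n > 0:
        chunk = sock.recv(n)
        if not chunk:
            raise ConnectionError("peer closed the connection early")
        chunks.append(chunk)
        n -= len(chunk)
    return b"".join(chunks)

# --- TCP: connection-oriented (SOCK_STREAM) ---
tcp_server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
tcp_server.bind((HOST, 0))          # port 0: let the OS pick a free port
tcp_server.listen(1)
tcp_port = tcp_server.getsockname()[1]

tcp_client = socket.create_connection((HOST, tcp_port))  # connection setup
conn, _ = tcp_server.accept()
tcp_message = b"hello over TCP"
tcp_client.sendall(tcp_message)
tcp_data = recv_exact(conn, len(tcp_message))

# --- UDP: connectionless (SOCK_DGRAM) ---
udp_receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
udp_receiver.bind((HOST, 0))
udp_receiver.settimeout(2.0)        # a lost datagram would otherwise block forever
udp_port = udp_receiver.getsockname()[1]

udp_sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
udp_sender.sendto(b"hello over UDP", (HOST, udp_port))   # no connection setup
udp_data, _ = udp_receiver.recvfrom(1024)

print(tcp_data, udp_data)

for s in (tcp_client, conn, tcp_server, udp_sender, udp_receiver):
    s.close()
```

The timeout on the UDP receiver is there because nothing in UDP itself reports a lost datagram; an application that needs that assurance has to build it or use TCP.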

Each machine on a network is identified by an Internet Protocol (IP) address.
Each packet of information contains the IP address of the destination machine
(and that of the sender).
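
To make the idea concrete, an IPv4 address is four bytes, usually written as four decimal numbers separated by dots. The short Python sketch below converts between the two forms; the address 192.0.2.10 is a documentation-only example and not a real host.

```python
import ipaddress
import socket

# 192.0.2.10 is from a range reserved for documentation examples.
addr = ipaddress.ip_address("192.0.2.10")

# The dotted text form packs down to the 4 bytes carried in each IPv4 packet.
packed = socket.inet_aton(str(addr))
print(addr.version, list(packed))       # prints 4 [192, 0, 2, 10]

# And the 4 bytes convert back to the familiar dotted form.
print(socket.inet_ntoa(packed))         # prints 192.0.2.10
```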

Depending on the type of protocols used by a socket, the socket is classified
into different types: