Peer-to-Peer Programming

Introduction

What is Peer-to-peer?

This page walks through a basic introduction to developing
peer-to-peer (P2P) applications. By the end of it, you should
understand the concepts and programming constructs necessary to
implement a P2P protocol and/or application program.

The first thing we need is to understand exactly what is meant by
"P2P". The article What Is P2P... And What Isn't,
written by Clay Shirky in 2000, gives a nice overview of the fundamental
essence of P2P, along with a little history. Briefly, P2P applications
organize and manage resources at the edges of the Internet in a
decentralized manner, with little or no interaction with centralized
servers. The resources may be storage or content (e.g. file sharing
applications), processor cycles (e.g. the SETI@home project or other
programs which use your computer to perform part of some large,
distributed task when it is idle), or human presence (e.g. instant
messaging). The 'edges' of the Internet are machines such as the average
home PC which has just a single line (dial-up, DSL, cable, etc.) to
connect it to the vast Internet 'cloud.'

In a traditional client/server setup, your home PC acts as a client
and sends requests to machines-- servers-- somewhere inside the cloud of
the Internet to get things done: browsing the web or getting email. In a
P2P paradigm, on the other hand, your home PC (for example) may connect,
through the Internet, directly to tens or hundreds of other home PCs (or
other machines at the 'edge') in order to share information and data.
Because such resources at the edge of the Internet tend to be ephemeral
(they may connect and disconnect many times in a day), P2P
protocols have to operate in an "environment of unstable connectivity
and unpredictable IP addresses" [Shirky].

P2P vs. Client/Server

So, unlike a server/client architecture where you develop
applications in two asymmetrical pieces -- the server, which provides
services and is assumed to be reliably available at a known Internet
address, and the client, which connects to the server in order to request
information -- P2P applications are somewhat trickier to develop. In a
P2P system, all machines (nodes) are running the same program (this is
somewhat of a generalization: some systems are organized so groups of
P2P nodes are running similar but different programs). Some of the
issues that need to be handled in such a situation include:

Connectivity: how to find and connect to other P2P nodes that are
running in the network (unlike traditional servers, they don't have a
fixed, known IP address)

Instability: nodes may always be joining and leaving the network
(unlike servers-- web, email, etc., which we usually depend on to
"be there")

Message routing: how messages should be routed to get from one node
to another (where the two nodes may not directly know about each other)

Searching (somewhat related to routing): how to find desired
information from the nodes connected to the network

Security: a whole slew of issues including nodes being
able to trust other nodes, preventing malicious nodes from doing bad
things to the P2P network or the individual nodes, being able to send
and receive data anonymously, etc.

The tutorial in this document will primarily focus on the first four
issues -- the basics of getting a P2P system up and running. By the time
you reach the end of the tutorial, you will be familiar with a library
that will help you implement P2P protocols and applications. The library
provides infrastructure-related routines to help you manage issues of
socket handling, threading, and sending messages between peers. To help
you understand what is going on, the main part of this tutorial is a
walk through the development of the library itself.

A P2P Framework

Overview

Before diving into the details of actual code, you should understand
what we are trying to implement at a high level. The figure below
illustrates how conversations happen between peers in a network. Each
application running on a node provides an interface to the user (you and
me), and is simultaneously running a "main loop" that listens for
incoming connections from other peers.

The figure diagrams a scenario in which the user on Peer 1
clicks a button, for example, a "Search" button, in the GUI.
The interface somehow decides to send a "Query" message to another peer,
in this case, Peer 2. The main loop of Peer 2 detects the incoming
connection request (step 2) and starts up a separate thread to handle
the actual data of the request (step 3). (A thread is a task that
a program runs simultaneously, or pseudo-simultaneously, with other
running tasks; see the Wikipedia article on threads if you
are not familiar with the term. The purpose of using threads here is to
allow a peer to handle multiple incoming connections simultaneously.)

Assuming message type "c" refers to a "Query" message, Peer 1 sends
the actual message (step 4) once it has gotten a connection to Peer 2.
In step 5, the "handle peer" task (thread) of Peer 2 receives the
message, sends an acknowledgment back to Peer 1, closes the connection,
and then calls an appropriate function/method to handle the message
based on its type.

After processing the message, the "msg c handler" function decides
that it needs to send a "Query Response" message back to Peer 1, so it
attempts to connect (step 6). Peer 1's main loop, listening for such
connections, accepts the connection and starts its separate handler
thread (step 7) to receive the actual message data from Peer 2 (step 8).
Having received the message, Peer 1 does what Peer 2 did in step 5, and
the process continues...
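
The accept-then-spawn pattern described above can be sketched in a few
lines of Python. This is only an illustration under stated assumptions,
not the framework's own code: the names handle_peer, main_loop,
send_message and demo are hypothetical, the message is sent as raw bytes
with no header, and the loop accepts a fixed number of connections so
that the example terminates (a real peer would loop indefinitely).

```python
import socket
import threading


def handle_peer(conn, received):
    """Runs in its own thread: read one message, acknowledge it, close."""
    with conn:
        data = conn.recv(1024)   # assumption: the whole message fits in one read
        conn.sendall(b"ACK")     # acknowledgment back to the sender
    # The connection is closed here; a real peer would now call the
    # handler registered for this message type.
    received.append(data)


def main_loop(server_sock, received, max_connections):
    """Accept incoming connections and hand each one to a separate thread."""
    threads = []
    for _ in range(max_connections):
        conn, _addr = server_sock.accept()
        worker = threading.Thread(target=handle_peer, args=(conn, received))
        worker.start()
        threads.append(worker)
    for worker in threads:
        worker.join()


def send_message(host, port, payload):
    """Connect to a peer, send one message, and return its acknowledgment."""
    with socket.create_connection((host, port)) as sock:
        sock.sendall(payload)
        return sock.recv(1024)


def demo():
    """Run a listening peer and a sending peer inside one process."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))    # port 0: let the OS pick a free port
    server.listen(5)
    port = server.getsockname()[1]
    received = []
    loop = threading.Thread(target=main_loop, args=(server, received, 1))
    loop.start()
    reply = send_message("127.0.0.1", port, b"Query")
    loop.join()
    server.close()
    return reply, received
```

Calling demo() plays the roles of both peers on the local machine: the
listening side corresponds to steps 2, 3 and 5, and send_message
corresponds to steps 1 and 4.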

Components of a Peer

The implementation of the P2P protocol that each peer is running in
the diagram above is built on a simple framework that you will be
familiar with in detail by the end of this document. Here, I will give a
high-level overview of the various modules (i.e. classes, in
object-oriented terminology) that make up the framework. Note that the
framework does not include anything to do with the user interface-- that
part of the program would be implemented separately and would interact
in appropriate ways with the underlying framework described here.

The Peer module

The Peer module manages the overall operations of a single node in
the P2P network. It contains a main loop that listens for incoming
connections and creates separate threads to handle them. The programmer,
building a P2P protocol on top of this generic framework, would
register handlers (i.e. methods or functions) with the Peer module for
various message types, and the main loop would dispatch incoming
requests to the appropriate handler. The Peer is initialized by
providing a port to listen for incoming connections, and optionally a
host address (i.e. IP address, which may be automatically determined)
and node identifier.
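
As a rough sketch of the registration and dispatch idea, the class below
keeps a table that maps message types to handler functions. The class
name MiniPeer, its method names, the placeholder default host, and the
"host:port" default identifier are all assumptions made for this
example; they are not taken from the framework itself.

```python
class MiniPeer:
    """Hypothetical, stripped-down stand-in for the Peer module."""

    def __init__(self, port, host=None, peer_id=None):
        self.port = int(port)
        # Assumption: a fixed placeholder instead of real address detection.
        self.host = host if host is not None else "127.0.0.1"
        # Assumption: default the identifier to "host:port".
        self.peer_id = peer_id if peer_id is not None else f"{self.host}:{self.port}"
        self.handlers = {}   # message type -> handler function

    def add_handler(self, msg_type, handler):
        """Register the handler for one message type (one handler per type)."""
        if len(msg_type) != 4:
            raise ValueError("message type must be exactly 4 characters")
        self.handlers[msg_type] = handler

    def dispatch(self, msg_type, conn, data):
        """Call the handler for msg_type; return False if none is registered."""
        handler = self.handlers.get(msg_type)
        if handler is None:
            return False
        handler(conn, data)
        return True
```

A protocol built on such a class would call add_handler once per message
type at start-up, and the main loop would call dispatch for every
message it reads.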

A list of known peers, which may be accessed and modified by the
programmer, is also maintained by the Peer module. The size of the list
may be limited, and peers may be accessed using their identifiers or
their sequential position in the list. Besides a list of handlers for
various message types, the node also stores a programmer-supplied
function for deciding how to route messages, and can be set up to run
stabilization operations at specific intervals.
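
One possible shape for the peer list and the routing hook is sketched
below. Everything here is hypothetical: the PeerTable class, its method
names, the convention that a limit of 0 means "unlimited", and the
router signature (destination identifier in, next hop or None out) are
assumptions chosen for illustration. Stabilization is left out.

```python
class PeerTable:
    """Hypothetical stand-in for the peer list kept by the Peer module."""

    def __init__(self, max_peers=0):
        self.max_peers = max_peers   # assumption: 0 means "no limit"
        self.peers = {}              # peer id -> (host, port), in insertion order
        self.router = None           # programmer-supplied routing function

    def add_peer(self, peer_id, host, port):
        """Add a peer unless it is already known or the list is full."""
        if peer_id in self.peers:
            return False
        if self.max_peers and len(self.peers) >= self.max_peers:
            return False
        self.peers[peer_id] = (host, int(port))
        return True

    def get_peer(self, peer_id):
        """Look a peer up by its identifier."""
        return self.peers[peer_id]

    def get_peer_at(self, index):
        """Look a peer up by its sequential position in the list."""
        peer_id = list(self.peers)[index]
        return peer_id, self.peers[peer_id]

    def set_router(self, router):
        self.router = router

    def route(self, dest_id):
        """Ask the router for the next hop toward dest_id (None if unknown)."""
        if self.router is None:
            return None
        return self.router(dest_id)


def make_direct_router(table):
    """Build a trivial router that only reaches peers it knows directly."""
    def router(dest_id):
        if dest_id in table.peers:
            host, port = table.peers[dest_id]
            return dest_id, host, port
        return None
    return router
```

A more interesting router could forward messages for unknown
destinations to some neighbor; the direct router above simply gives up.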

The PeerConnection module

The PeerConnection module encapsulates a socket connected to a peer
node. The framework currently uses TCP/IP sockets for communication
between nodes. A PeerConnection object provides methods that make it easy
for the programmer to send and receive messages and acknowledgments in
the P2P algorithm. It ensures messages are encoded in the correct format
and attempts to detect various error conditions.

Messages exchanged between nodes built on this framework are
prefixed by a header composed of a 4-byte identifier for the type of the
message and a 4-byte integer holding the size of the data in the
message. The 4-byte message code can be viewed as a string, so the
programmer may come up with appropriate strings of length 4 to identify
the various types of messages exchanged in the system. When the main
loop of a peer receives a message, it dispatches it to the appropriate
handler based on the message type.
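
The header layout described above maps naturally onto Python's struct
module. The sketch below is an illustration, not the framework's actual
encoder: the function names are hypothetical, and the choice of network
byte order, an unsigned length field, and UTF-8 text for the message
data are assumptions.

```python
import struct

# 4-byte type code followed by a 4-byte unsigned length.
# Assumption: network (big-endian) byte order.
HEADER = struct.Struct("!4sL")


def encode_message(msg_type, data):
    """Prefix the data with its 8-byte header and return the raw bytes."""
    type_bytes = msg_type.encode("ascii")
    if len(type_bytes) != 4:
        raise ValueError("message type must be exactly 4 characters")
    payload = data.encode("utf-8")
    return HEADER.pack(type_bytes, len(payload)) + payload


def decode_message(raw):
    """Split raw bytes into (message type, data), checking the stated size."""
    if len(raw) < HEADER.size:
        raise ValueError("truncated header")
    type_bytes, size = HEADER.unpack_from(raw)
    payload = raw[HEADER.size:HEADER.size + size]
    if len(payload) != size:
        raise ValueError("truncated message body")
    return type_bytes.decode("ascii"), payload.decode("utf-8")
```

For example, encode_message("QUER", "hello") produces 13 bytes: the four
characters QUER, the length 5 stored in four bytes, and the five bytes
of the data.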

A message handler is simply a function object in Python (or an
object supporting the handler interface in Java) that receives a
reference to an open PeerConnection and the message data. Handlers can
be registered for any message type identified by a 4-byte string.
Currently only one handler per type may be used. When the Peer module
receives an incoming connection request, it sets up a PeerConnection
object, reads in the message type and remainder of the message, and
launches a separate thread to handle the data. The peer connection is
automatically closed when the message handler completes its task.
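
To make the handler contract concrete, here is a small sketch of a
handler and of the "close when the handler finishes" behavior. The
FakeConnection class stands in for a real PeerConnection so the example
runs without a network; its send_data and close methods, the "RESP"
message type, and the run_handler wrapper are all hypothetical names
invented for this illustration. Replying over the same connection is
also an assumption.

```python
class FakeConnection:
    """Hypothetical in-memory stand-in for a PeerConnection."""

    def __init__(self):
        self.sent = []       # (message type, data) pairs "sent" so far
        self.closed = False

    def send_data(self, msg_type, data):
        if self.closed:
            raise RuntimeError("connection already closed")
        self.sent.append((msg_type, data))

    def close(self):
        self.closed = True


def handle_query(conn, data):
    """Example handler: receives the open connection and the message data."""
    conn.send_data("RESP", "no match for " + data)


def run_handler(handler, conn, data):
    """Call the handler, then close the connection even if it raised."""
    try:
        handler(conn, data)
    finally:
        conn.close()
```

Because run_handler closes the connection in a finally clause, a handler
never needs to close it itself, which mirrors the automatic closing
described above.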