Gnutella

Authors:

Overview

Gnutella is a very simple file-sharing protocol that uses the principles of peer-to-peer networking to allow users to share data. It entered the public domain through the reverse engineering of an experimental P2P client developed by Nullsoft. Although the company intended to release the specification of the protocol under the GPL at a later stage, it never came to that due to legal concerns. It is thanks to the open-source clients that appeared shortly after the protocol had been cracked that Gnutella still appears on the P2P map today.

Its initial popularity, which was partly fuelled by the growing legal problems that Napster had at the time, was also the main reason for its early demise. As many people set about using Gnutella clients as a replacement for Napster, the poor scalability of the protocol became apparent. Although later adjustments were introduced to improve its scalability and performance, Gnutella still remains far less popular than the likes of Kazaa, WinMX, etc.
A complete revision that attempts to address all the defects of the current version was recently proposed. However, a first glance at the proposed feature set seems to indicate that the protocol will lose its remarkable simplicity.

History of Gnutella

Gnutella was invented at Nullsoft, a subsidiary of AOL, by Justin Frankel and Tom Pepper. The program was released to the general public from the Nullsoft website on 14 March 2000. At the same time, Napster was being investigated because its network was allowing the distribution of copyrighted material. Once it was known that Gnutella was capable of doing the same as Napster, and with AOL merging with Time Warner at the time, AOL forced Nullsoft to remove all links to Gnutella from their website. It was too late though: in the few hours that the program was on the website, it had been downloaded by a large number of people.

Once Gnutella was out on the Internet, people who had been able to download it set about reverse engineering the protocol. Within days of the official release and removal, the protocol had been reverse engineered and re-released onto the web. As different people had gone about the reverse engineering independently, many different programs using the Gnutella protocol became available on the internet, e.g. Morpheus, LimeWire, GNUcleus and others. They all use the Gnutella protocol to interact, but each is a separate program set to run in its own individual fashion.

When Nullsoft released the original version of Gnutella, it was in the form of a beta release which the programmers at Nullsoft intended to develop and optimise. As the project was shelved, this never happened, and it has been left to the users of the network to update the protocol and make it work better. At the time Gnutella was released, it was running as version 0.4, which is the one still in use today. This implementation, however, suffers from many substantial problems, which will be examined in more detail later.

One of the principles of the designers of Gnutella was to release it as an open-source protocol and use the GPL to allow the protocol to evolve. As the designers of Gnutella were forbidden from doing any further work on the project, this never properly happened. Instead a consortium of programmers was set up to oversee changes and advances in the protocol. The reason this is not true open source is that the designers of the application were not the heads of the consortium, as had been the way with other open-source projects under the GPL. Also, the real code of the protocol was never released, so the consortium had only the reverse-engineered versions of the protocol to work from. To stick to the general idea of the GPL, a central site was set up to facilitate the discussion of Gnutella. This was set up at Gnutella.wego.com.

Gene Kan took on the role of head of the consortium overseeing Gnutella, in recognition of the efforts he made within the first few weeks to ensure that Gnutella took off as a protocol. His major input was to start a project, the seven-day plan, which quickly resolved some of the protocol's problems, and he ran the central Gnutella site mentioned above. As a result of his and others' efforts, Gnutella is now an established peer-to-peer networking protocol.

As head of the consortium, Kan was responsible for ensuring widespread use of and support for the protocol, as well as its further development. To this end, he set about publicising Gnutella and informing the world about the benefits of the protocol.

Technical Overview

The Gnutella protocol (current version 0.4) runs over TCP, a connection-oriented transport protocol. A typical session comprises a client connecting to a server. The client then sends a Gnutella packet advertising its presence. This advertisement is propagated by the servers through the network by recursively forwarding it to other connected servers. All servers that receive the packet reply with a similar packet about themselves.

Queries are propagated in the same manner, with positive responses being routed back along the same path. When a resource is found and selected for downloading, a direct point-to-point connection is made between the client and the host of the resource, and the file is downloaded directly using HTTP. The server in this case acts as a web server capable of responding to HTTP GET requests.

Gnutella packets are of the form:

Message ID (16 bytes)

Function ID (1 byte)

TTL (1 byte)

Hops (1 byte)

Payload length (4 bytes)

where:

Message ID, in conjunction with a given TCP/IP connection, is used to uniquely identify a transaction.

Function ID is one of: Advertisement[response], Query[response] or Push-Request.

TTL is the time-to-live of the packet, i.e. how many more times the packet will be forwarded.

Hops counts the number of times a given packet has already been forwarded.

Payload length is the length in bytes of the body of the packet.
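The fixed 23-byte header described above can be packed and parsed in a few lines of Python. This is a sketch rather than code from any real client: the function-ID values and the little-endian payload length follow the wire format commonly attributed to version 0.4 and should be treated as assumptions.

```python
import struct
import uuid

# Function IDs as commonly used by 0.4 clients (an assumption in this sketch):
# 0x00 Ping, 0x01 Pong, 0x40 Push-Request, 0x80 Query, 0x81 Query response.
PING, PONG, PUSH, QUERY, QUERY_HIT = 0x00, 0x01, 0x40, 0x80, 0x81

# 16-byte message ID, three single bytes, 4-byte little-endian payload length.
HEADER = "<16sBBBI"

def pack_header(message_id: bytes, function_id: int, ttl: int, hops: int,
                payload_length: int) -> bytes:
    """Build the 23-byte packet header (Message ID, Function ID, TTL,
    Hops, Payload length) described above."""
    assert len(message_id) == 16
    return struct.pack(HEADER, message_id, function_id, ttl, hops, payload_length)

def unpack_header(data: bytes) -> tuple:
    """Split a received header back into its five fields."""
    return struct.unpack(HEADER, data[:23])

header = pack_header(uuid.uuid4().bytes, PING, ttl=7, hops=0, payload_length=0)
print(len(header))  # 23
```

A client forwarding a packet would decrement the TTL and increment the Hops count before passing it on.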

Connecting

A client finds a server by trying to connect to any of a local list of known servers that are likely to be available. This list can be downloaded from the internet, or be compiled by the end user, comprising for example servers run by friends, etc. The Advertisement packets (also known as Ping or Init) carry no payload of their own; the server replies (Pongs) comprise the responder's address, the number of files it is sharing, and the size in kilobytes of the shared data. Thus, once connected, a client knows how much data is available on the network.
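As a concrete illustration, a Pong payload could be assembled as below. The field layout (little-endian port and counts, network-order IPv4 address) matches the format generally documented for 0.4 servents, but is an assumption here rather than something taken from the original client.

```python
import socket
import struct

def pack_pong(port: int, ip: str, file_count: int, kb_shared: int) -> bytes:
    """Assemble a Pong payload: listening port, IPv4 address, number of
    shared files and total shared kilobytes (field layout is an assumption)."""
    return (struct.pack("<H", port)    # port, little-endian
            + socket.inet_aton(ip)     # IPv4 address in network byte order
            + struct.pack("<II", file_count, kb_shared))

payload = pack_pong(6346, "192.168.0.10", 120, 540_000)
print(len(payload))  # 14
```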

Queries

As mentioned above, Queries are propagated the same way as Advertisements. To save bandwidth, servers that cannot match the search parameters need not send a reply.

The semantics of matching search parameters are not defined in the current published protocol; the details are server dependent. For example, a search for ".mp3" could be interpreted as all files with that extension, or any file with "mp3" in its name, etc.
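Because matching is left to each server, implementations are free to choose their own rule. One plausible, purely hypothetical choice, matching every query term case-insensitively against the file name, might look like this:

```python
def matches(filename: str, query: str) -> bool:
    """Return True when every whitespace-separated query term occurs,
    case-insensitively, somewhere in the file name. This is just one
    server's possible interpretation, not part of the protocol."""
    name = filename.lower()
    return all(term in name for term in query.lower().split())

print(matches("Beethoven - Symphony No. 9.mp3", "beethoven mp3"))  # True
print(matches("notes.txt", "mp3"))                                 # False
```

Under this rule a search for ".mp3" matches any name containing that substring, i.e. essentially all files with the extension.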

Downloading

A client wishing to make a download opens an HTTP (hyper-text transfer protocol) connection to the host and requests the resource by sending a "GET <URL>" type HTTP command, where the URL (Uniform Resource Locator) is returned by a Query response. Hence, a client sharing resources has to implement a basic HTTP (aka "web") server.
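The request itself is ordinary HTTP. The "/get/<index>/<filename>" path below is the form used by common 0.4 clients (treat it as an assumption); the index and file name would come from a Query response:

```python
def build_request(file_index: int, filename: str) -> bytes:
    """Build the raw download request a client would write to the host's
    socket. The path layout is an assumption, not from the original spec."""
    return (f"GET /get/{file_index}/{filename} HTTP/1.0\r\n"
            "Connection: Keep-Alive\r\n"
            "\r\n").encode("ascii")

print(build_request(42, "song.mp3").decode().splitlines()[0])
# GET /get/42/song.mp3 HTTP/1.0
```

The sharing host answers with a normal HTTP response, which is why a servent must embed a minimal web server.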

Firewalls

A client residing behind a firewall trying to connect to a Gnutella network will have to connect to a server running on a "firewall-friendly" port. Typically this will be port 80, the reserved port number for HTTP, as firewalls generally allow outbound traffic to it.

When a machine hosting a resource cannot accept HTTP connections because it is behind a firewall, it is possible for the client to send a "Push-Request" packet to the host, instructing it to make an outbound connection to the client on a firewall-friendly port, and "upload" the requested resource, as opposed to the more usual client "download" method.
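A Push-Request must carry enough information for the firewalled host to find the requested file and call the requester back. The 26-byte layout below (servent ID, file index, requester's address and port) is the one commonly described for version 0.4, included here as an assumption:

```python
import socket
import struct

def pack_push(servent_id: bytes, file_index: int, ip: str, port: int) -> bytes:
    """Assemble a Push-Request payload: the target host's 16-byte ID, the
    index of the wanted file, and the requester's IPv4 address and port to
    connect back to (field layout is an assumption)."""
    assert len(servent_id) == 16
    return (servent_id
            + struct.pack("<I", file_index)
            + socket.inet_aton(ip)
            + struct.pack("<H", port))

print(len(pack_push(b"\x00" * 16, 42, "10.0.0.5", 6346)))  # 26
```

On receipt, the firewalled host opens an outbound connection to the given address and uploads the file, reversing the usual direction of the transfer.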

The other permutation, where both client and server reside behind firewalls, renders the protocol non-functional.

Limitations of the Protocol

The current shortfalls of Gnutella are no doubt due to the fact that the current protocol was only meant to be a beta release that would be refined as necessary. It was not the finished article that its creators had envisaged. The principal shortcomings of the protocol are:

Scalability

The system had been designed in a laboratory and had been set up to run with a couple of hundred users. When it became available on the internet, it quickly grew to a user base of tens of thousands. At that stage the system became overloaded and was unable to handle the amount of traffic and the number of nodes present. Concepts that had looked good in a laboratory were showing signs of stress right from the start.

Packet Life

To find other users, a packet has to be sent out into the network. It became apparent early on that the time-to-live on some packets had not been set correctly, and a build-up of these packets started circulating around the network indefinitely. This left less bandwidth available on the network for users.

Connection Speeds of Users

Users on the system act as gateways to other users to find the data they need. However, not every user has the same connection speed. Users on slower connections were acting as links for people on higher-bandwidth connections, so transfer speeds were dictated by the slowest connection on the path to the data, leading to bottlenecks.

Furthermore, the entire network is not visible to any one client. Using the standard time-to-live during advertisement and search, only about 4,000 peers are reachable. This arises from the fact that each client typically holds connections to only 4 other clients and a search/init packet is forwarded at most 7 times. In practical terms this means that even though a certain resource is available on the network, it may not be visible to the seeker because it is too many nodes away. To increase the number of reachable peers in the Gnutella network we would need to increase the time-to-live for packets and the number of connections kept open. Unfortunately this gives rise to other problems: if we were to increase both the number of connections and the number of hops to eight, 1.2 gigabytes of aggregate data could potentially be crossing the network just to perform an 18-byte search query.
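The reachability figures can be checked with a back-of-the-envelope calculation. Assuming each peer keeps C connections and forwards an incoming packet to its other C - 1 neighbours for as long as the TTL allows, and ignoring overlap between neighbour sets, we get an idealised upper bound:

```python
def reachable(connections: int, ttl: int) -> int:
    """Idealised upper bound on peers reachable by flooding: C neighbours
    at the first hop, then each forwards to C - 1 others, for ttl hops in
    total. Real networks contain cycles, so actual reachability is lower."""
    c = connections
    return sum(c * (c - 1) ** (hop - 1) for hop in range(1, ttl + 1))

print(reachable(4, 7))  # 4372, close to the figure of about 4,000 quoted above
print(reachable(8, 8))  # 7686400 peers in the enlarged configuration
```

The exponential growth in the second case illustrates why simply raising the TTL and connection count floods the network with traffic.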

Another significant issue that has been identified is Gnutella's susceptibility to denial-of-service attacks. For example, a burst of search requests can easily saturate all the available bandwidth in the attacker's neighbourhood, as there is no easy way for peers to discriminate between malicious and genuine requests. Some workarounds have been presented, but in each case significant compromises have to be made.

All in all, the overall quality of service of the Gnutella network is very poor. This is due to a combination of factors, some deriving directly from the characteristics of the protocol, others induced by the users of the network themselves. For example, users who are reluctant to concede any outgoing bandwidth will go to great lengths to prevent others from downloading the files that they are 'sharing'. Similarly, the ability to find a certain file will largely depend on the naming scheme of the user who makes it available. Conspiracy theorists will of course argue that the above tactics are being used by record companies to undermine the peer-to-peer revolution.

The Future of Gnutella

There is a core theme of the Gnutella architecture that it shares with so many other internet technologies: it was never designed to handle the load it currently supports. Like the Hypertext Transfer Protocol and TCP/IP, it was designed as an ad hoc technology.

Most of the Big 5 record companies (BMG, Bertelsmann, Time-Warner-AOL, et cetera) see this protocol (and its applied use) as merely a music-pirating tool, which, to some extent, is indeed true. The creators of the protocol were employees of Nullsoft, the makers of Winamp, one of the world's most popular MP3 players (in the Windows world, anyway). It would be quite naïve to believe that their intentions were puritan and above the station of simple music-swapping. Indeed, the project was terminated because AOL owned Nullsoft.

Despite the pirating and other nefarious activities, such as downloads of current rehashed Hollywood feel-good movies and so-called warez or pirated software (as if the price of BSA-endorsed software wasn't piracy in itself), the Gnutella network of users and applications will continue to evolve beyond just "stealing music." Many users are realising the vastly different and refreshing attitude to connectivity presented by the Gnutella network.

In a way Gnutella is a refreshing pastiche of past internet technologies sewn up into one. One has the ability to retrieve the desired information from a choice of sources, even in parallel.
In theory, the user's machine can provide data of any kind at the user's discretion, making it a server of anything from his favourite classical music to some über-smutty gallery of Bible quotes. Of course, in practice, the theory does not hold. Adar & Huberman discussed the problem of free-loading on the Gnutella network in their paper "Free Riding on Gnutella." Due to human nature, many people will refuse to share information, simply taking it instead. This drags the architecture back to the old-school Internet ways of a few centralised servers and many clients.

Obviously, this leads to a degradation in service, as most of the traffic is directed along similar lines, causing bottlenecks in the network. Further, it leads to the security problems involved in relying on information from insecure servers that essentially act as authorities on the network. With the realisation that the large-scale decentralised model cannot continue to grow without severe problems, many of the publicly discussed changes to the protocol (e.g. Gnutella2) abandon a fully decentralised model in favour of a loose tree-like structure, creating what are known as supernodes. The supernodes loosely act as authorities: they route information and have the ability to cache search hits. This structure allows all of those teenie-boppers who want to download Britney Spears' latest song about how much she is not a girl, or those who just want to download Miss Spears onto their hard drives, to avoid overloading the entire network with repeated queries.

The World Wide Web, in the form of HTML, used meta-information to enhance searching through content for the right information. So too will new versions of the protocol, as proposed by those nice people at limewire (prepend http://www. and append .com/). The meta-searching abilities being built will have the gift of learning from the severe mistakes of HTML and the rather spurious overuse of the meta tag that ensued.

Community Effort

This network, as with so much of the Internet, does not belong to any select group with a surreptitious motivation. It is a community effort to connect people in a different way. Thus its future will be documented and swayed by those who build the systems that use it.