<p>I2P is an anonymous communication network that selects subsets of peers for tunnel building by means of peer profiling. It optimizes bandwidth while maintaining the anonymity of its clients. Opportunistic measurement of both per-tunnel bandwidth and the number of accepted tunnels is used. The profiling and selection system has been in place since early 2004 and its effectiveness has been conclusively demonstrated.</p>


<p>In this paper we document I2P's peer profiling and selection algorithms, discuss their strengths and weaknesses, and describe directions for future work. We hope that it will be useful for improvements in both I2P and other anonymous networks with similar features.</p>


<h2>I2P Overview</h2>


<p>


I2P is an overlay network based on unidirectional tunnels. The project evolved from the Invisible Internet Project (IIP), a specialized IRC server that transmitted its data through a mix network. Starting as a low-latency transport protocol for The Freenet Project, it grew into an independent project. The main developers, zzz and complication, took over the maintenance of I2P after the departure of the former lead developer jrandom.


<!-- IMHO the last sentence is irrelevant and should not be in the paper. -->


I2P is Free Software released under several licenses (public domain, GPL, MIT etc.). We released seven versions in 2008 and are actively improving the existing codebase. The I2P software suite consists of the core router plus several software packages (webserver, BitTorrent clients etc.).</p>


<p>We call our routing method <em>garlic routing</em>. It is similar to the onion routing used by The Tor Project [Tor] and transmits data from peer to peer until it reaches the endpoint of the tunnel.


<!-- Why is it called garlic routing? IMHO we should explain it in one or two sentences. -->


In contrast to Tor, I2P employs a VPN-like approach: all nodes are primarily reachable only within the network, and only a few outproxies operate as gateways to the outside world.</p>


<p>


At the time of writing the I2P network consists of approximately 830 active peers (pic). This number varies over time: 50-100 new nodes (pic) join the network every day while others leave. The network usually peaks at around 1,000 active nodes on weekends, and on weekdays the minimum is about 700 active routers. This indicates a quite stable network in which approximately 550 active destinations (pic) route traffic. I2P is oriented towards hidden services: all destinations inside the network are hosted on those active peers. Only one HTTP outproxy (exit node in Tor terms) is publicly known. Furthermore, an IRC gateway for I2P development discussion exists. In contrast to Tor, I2P is friendly to peer-to-peer traffic, as it distributes the encrypted traffic among all participating nodes.


The current bandwidth used by all peers is roughly 11 MByte/sec, with peaks at 16 MByte/sec (pic), and the peers build ~40,000 tunnels (pic) producing this load. In the last six months the network doubled in terms of running routers, bandwidth and tunnels, so we expect continued stable growth of the network.</p>


<h2>Tunnel Overview</h2>


<p>I2P transmits its payload through tunnels. A tunnel is a unidirectional encrypted connection between two clients. Every client has several incoming and outgoing tunnels. We distinguish between <em>exploratory tunnels</em> and <em>client tunnels</em>.</p>


<h3>Exploratory tunnels</h3>


<p>Exploratory tunnels manage network information and use little bandwidth. Their main tasks are testing existing tunnels, querying the NetDB and building client tunnels. The section <q>Peer Selection</q> describes the tunnel building process in more detail.</p>


<h3>Client tunnels</h3>


<p>


Client tunnels transmit the payload of protocols like HTTP, IRC and others through I2P. These tunnels can carry high bandwidth, reaching up to 200 KByte/sec.</p>


<p>


Each service in I2P has a destination as a representation of the location at which to reach it. Examples are servers like eepsites (the I2P-internal webpages), monotone servers, IRC servers or audio streaming servers. Clients also have a destination, which serves as a return address, e.g. IRC clients, BitTorrent clients, monotone clients and others. Servers have stable destinations, while client destinations are created anew on every start.</p>


<p>


Each destination has an associated set of tunnels to transport the data. The application and/or user can control the specifications of the tunnels:</p>


<dl>


<dt>distance</dt>


<dd>0-2 peers in between</dd>


<dt>variance</dt>


<dd>a variance of 0-2 peers randomly added to or subtracted from the distance</dd>


<dt>quantity</dt>


<dd>up to three tunnels can be set up; data is distributed across them</dd>

</dl>

<p>To provide anonymity for the destinations, the route from a client to a server is divided into two tunnels: the <em>outgoing tunnel</em>, controlled by the client, and the <em>incoming tunnel</em>, controlled by the server. The connection between these two tunnels is direct: the endpoint (<em>outbound gateway</em>) of the outgoing tunnel connects directly to the entry point (<em>inbound gateway</em>) of the incoming tunnel. This setup allows every participant to control its own security and anonymity level by setting the distance and variance of its tunnels.</p>
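As a rough illustration of the distance and variance settings above, the hop count for a new tunnel might be drawn as follows. This is a minimal sketch under our own naming and clamping assumptions, not I2P's actual code:

```python
import random

def tunnel_length(distance, variance):
    """Pick the number of intermediate hops for a new tunnel: the
    configured distance (0-2) plus or minus a random amount up to
    the configured variance (0-2), clamped at zero."""
    hops = distance + random.randint(-variance, variance)
    return max(0, hops)  # a tunnel cannot have negative hops
```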


<p>


Each client itself selects the peers with which it builds its part of the tunnel, based on the rating described below. In this tunnel building process a strict ordering within a tunnel is obeyed: two or more I2P peers within the same subnet (/24, /16) cannot be used within the same tunnel, which mitigates the predecessor attack.</p>
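The same-subnet restriction can be sketched as a simple check against the peers already chosen for a tunnel. The helper names here are hypothetical, and I2P's real implementation works on router identities rather than raw address strings:

```python
import ipaddress

def same_subnet(ip_a, ip_b, prefix):
    """True if both addresses fall within the same /prefix network."""
    net = ipaddress.ip_network("%s/%d" % (ip_a, prefix), strict=False)
    return ipaddress.ip_address(ip_b) in net

def candidate_excluded(candidate_ip, chosen_ips):
    """Reject a candidate peer that shares a /24 or /16 with any
    peer already selected for the same tunnel."""
    return any(same_subnet(candidate_ip, ip, prefix)
               for ip in chosen_ips for prefix in (24, 16))
```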


<p>


To secure the tunnels and the route from client to server further, each tunnel in I2P has a fixed lifetime of 10 minutes and is discarded after this time. A new tunnel is created before the existing one times out. Data is sent on one or more of the tunnels associated with the specific destination.</p>


<p>


Peers can reject or drop tunnel build requests sent by other peers for a number of reasons (overload, shutdown in progress, limit reached, out of sync). Even within its lifetime a tunnel can be discarded due to overload or an unexpected shutdown of one of its peers. To keep a destination from breaking, each tunnel is tested on a periodic basis. This testing creates about 1 KByte/minute of traffic, a very low bandwidth overhead.</p>


<p>The client tunnels on a peer are sorted into different classes:</p>


<dl>


<dt>application tunnels</dt>


<dd>the tunnel starts or ends in this peer, bound to a server or client on the peer</dd>


<dt>participating tunnels</dt>


<dd>this peer is a member of another client's incoming or outgoing tunnel; a participating tunnel comes in from one peer and continues on to another peer</dd>


</dl>


<p>Three cases are possible: endpoint or "outbound gateway" of an outgoing tunnel, entrance or "inbound gateway" of an incoming tunnel, and a middle peer of a participating tunnel.</p>


<h2>NetDB Overview</h2>


<p>The NetDB is a collection of information stored about each peer. It contains the RouterInfo for each peer and the LeaseSet for each known service; the LeaseSet is the hidden service descriptor. LeaseSets are not described here; for more information on this topic see the I2P website (ref).</p>


<p>The RouterInfo is a small collection of all vital information describing a peer. It includes, for example, the IP address, port, peer ID, I2P stable version number, net ID version, and some statistical bandwidth data. None of this data is trusted by other peers, except for the K&nbsp;bandwidth classification. Each peer is set by default to 48 KByte/sec input and 24 KByte/sec output bandwidth with a 90% share percentage (90% of the maximum bandwidth is allowed to be used by participating tunnels). These peers fall into bandwidth class L.</p>


<p>Each peer is sorted into a bandwidth class based on its configured shared bandwidth. These classes are named K, L, M, N and O; the limits are &lt;12 kbit/sec for K, 12+ for L, 32+ for M, 64+ for N and 128+ kbit/sec for O. This classification is informational only and, except for the K&nbsp;class, is not used for any routing decision (peers of class K are not used at all for routing participating tunnels). Statistics show only a small number of K&nbsp;class routers (96% are class L-O), which indicates that most peers are capable of routing tunnels. But if we trusted these (user-set) classifications, serious anonymity issues and DoS implications (like "low-resource attacks") would arise. This leads to the following implementation of peer profiling.</p>
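The class thresholds can be expressed as a small lookup. This is a sketch based only on the limits quoted above; the function name is our own:

```python
def bandwidth_class(shared_kbit_per_sec):
    """Map a router's configured shared bandwidth (kbit/sec) to its
    class letter: <12 K, 12+ L, 32+ M, 64+ N, 128+ O. Class K peers
    are never used for participating tunnels; the other classes are
    informational only."""
    for limit, cls in ((128, "O"), (64, "N"), (32, "M"), (12, "L")):
        if shared_kbit_per_sec >= limit:
            return cls
    return "K"
```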


<h2>Peer Profiling and Tiers </h2>


<p>Peer profiling was proposed for the I2P network in December 2003 by jrandom [Jr03]. It was introduced in the 0.3 release in March 2004 [Jr04].</p>


<p>


Peer selection within I2P is simply the process of choosing which routers on the network to route locally-generated messages (and replies to those messages) through; in other words, which peers are used for tunnels. To accomplish this, the router maintains a database of each peer's performance, called the "profile". The router uses that data to estimate the bandwidth of a peer, how often a peer will accept tunnel build requests, and whether a peer seems to be overloaded or otherwise unable to perform reliably. The profiling system includes mechanisms similar to the "opportunistic bandwidth measurement algorithm" proposed for Tor [SB08], and much more.</p>


<p>


The profile contains several statistics and is continuously updated. For each statistic, averages, event counts, and maximums are kept for several periods (for example 1 minute, 10 minutes, 1 hour, and 24 hours). Example statistics are: how long it takes a peer to reply to a network database query, how often its tunnels fail, and how many new peers it is able to introduce us to. The profiles are also stored locally on disk, so that they are persistent across restarts and the router need not start collecting data over. The profile also stores several timestamps, such as the last time a peer was heard from, the last time it accepted a tunnel request, when the last communication error occurred, and others.</p>
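A profile statistic of this kind might be sketched as follows. This is illustrative only: the real router keeps pre-aggregated averages rather than raw events, and the class name is ours:

```python
import time
from collections import deque

class ProfileStat:
    """Record timestamped events and report counts and averages
    over several look-back periods, mimicking the 1m/10m/1h/24h
    windows described in the text."""

    PERIODS = (60, 600, 3600, 86400)  # seconds

    def __init__(self):
        self.events = deque()  # (timestamp, value) pairs

    def add(self, value, now=None):
        self.events.append((time.time() if now is None else now, value))

    def count(self, period, now=None):
        now = time.time() if now is None else now
        return sum(1 for t, _ in self.events if now - t <= period)

    def average(self, period, now=None):
        now = time.time() if now is None else now
        vals = [v for t, v in self.events if now - t <= period]
        return sum(vals) / len(vals) if vals else 0.0
```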


<p>


Much of the profile data is unused in the current software. It remains a "reserve" defense that can be easily used to enhance the router's resistance to theoretical or actual attacks, such as denial-of-service attempts, selective packet drops, network database (floodfill) disruption, and others.</p>


<p> The profile data is periodically coalesced and sorted into tiers of peers that are used for various functions, as described further below. The picture below shows a page from the I2P router console, with a portion of the profiles sorted into tiers.</p>


<p>The Fast tier contains a minimum of 8 peers; peers are 'promoted' into it if the tier is too small. This provides a good collection of peers for tunnels.
The speed calculation (as implemented here) simply goes through the profile and estimates how much data can be sent or received on a single tunnel through the peer in a minute. This estimate looks only at past performance.
Specifically, it is the average of the bandwidth of the fastest three tunnels, measured over a one-minute period, through that peer.
Previous algorithms used a longer time period, weighed more recent data more heavily, and extrapolated it into the future. Another previous calculation used total (rather than per-tunnel) bandwidth, but it was decided that this method overrated slow, high-capacity peers (that is, slow in bandwidth but high-capacity in number of tunnels). Even earlier methods were much more complex.</p>
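The current per-tunnel speed estimate is then essentially a one-liner. In this sketch, `peak_bps` stands for the one-minute peak bandwidth observed on each tunnel through the peer; the function name is our own:

```python
def speed_estimate(peak_bps):
    """Average of the bandwidth of the three fastest tunnels seen
    through this peer, each measured over a one-minute period."""
    fastest = sorted(peak_bps, reverse=True)[:3]
    return sum(fastest) / len(fastest) if fastest else 0.0
```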


<p>The current speed calculation has remained unchanged since early 2006, and it is a topic for further study to verify its effectiveness on today's network. </p>


<p> Only a router's locally-generated or locally-received traffic is used for these measurements; transit or "participating" traffic is not used. As it is not known whether the peer before or after the router in a tunnel is a true participant or the originator or destination, that data would not be valid. Also, this could be gamed to get a router to rank some peer of the attacker's choosing as quite fast. Having a remote peer influence the rankings in this way could be dangerous to anonymity.</p>


<p> This algorithm is similar to the 'opportunistic bandwidth measurement' proposed in the tune-up for Tor [SB08]. However, I2P uses a modified peak per-tunnel measurement.</p>


<h3>Capacity</h3>


<p>


An estimate of peer tunnel capacity, defined as the <em>number of successful tunnel builds</em> through the peer in a time period, is crucial to the smooth operation of an I2P router. As tunnel building is expensive, it is important to rate peers based on their willingness to accept tunnels. Peers may reject or drop tunnel requests for any number of reasons. Generally the rejection or drop is caused by a bandwidth limit (from participating tunnels or locally generated traffic), a processing (CPU) limit which manifests itself as large request processing delay, or an absolute limit on participating tunnel count.</p>


<p> The actual capacity rating, before adjustments (see below), is computed as follows:
let r(t) be the successful builds per hour, measured over the time period t; then
R = 4*r(10m) + 3*r(30m) + 2*r(1h) + r(1d).</p>
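The weighted sum can be computed directly from the four per-period rates. This is a direct transcription of the formula; the dictionary keys are our own naming:

```python
def capacity_rating(r):
    """R = 4*r(10m) + 3*r(30m) + 2*r(1h) + r(1d), where each value
    is the successful builds per hour measured over that period."""
    return 4 * r["10m"] + 3 * r["30m"] + 2 * r["1h"] + r["1d"]
```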


<p>


The capacity calculation simply estimates how many tunnels the peer would agree to participate in over the next hour. For this estimate it looks at how many tunnels the peer has agreed to lately, how many it rejected, and how many of the agreed-to tunnels failed.</p>


<p> In addition, it includes a 'growth factor' (when the peer isn't failing or rejecting requests) so that we will keep trying to push their limits. Growth factor adds a little bit each time, so that new peers are tried. </p>


<h3>Other Factors</h3>


<p> For both metrics, "Bonuses" can be used to manually adjust preferences for individual peers.</p>


<p>The profiling system does not use "active bandwidth probing" or the generation of special-purpose traffic for measurement. Nor does it use several other statistics available in the profile, including latency.</p>


<p>The two metrics above are relative to the router's demand, not absolute, as they depend on traffic. The router also maintains an "integration" metric reflecting the number of other peers that peer has told the router about. The integration metric is not used for peer selection.</p>


<h2>Capacity: Crime, Blame, and Punishment </h2>


<p> Raw tunnel build capacity isn't a sufficient metric - it is essential to decrement it for bad behavior, because tunnel build attempts are expensive. A tunnel build request is about 4KB, and the reply is identically sized. Generating the build request also consumes significant CPU and entropy to create the cryptographic keys. As an example of the factors that we must consider:</p>


<ul>


<li> A peer that accepts 10/10 tunnels is better than one that accepts 10/100.</li>


<li> A peer that explicitly rejects a request is better than one that drops it.</li>


<li> A peer that accepts a request but later drops data through the tunnel is the worst.</li>


</ul>


<p>


The tunnel-failure part of the calculation was disabled for a long period, but it was reinstated for release 0.6.2, as it became apparent that recognizing and avoiding unreliable and unreachable peers is critically important. Unfortunately, as the tunnel tests require the participation of several peers, it is difficult to positively identify the cause of a failure. The recent changes assign a probability of failure to each of the peers, and use that probability in the capacity calculation. These recent changes may require additional adjustment, and are a topic for further study to verify their effectiveness on today's network. </p>


<p> Therefore, the capacity is adjusted as a form of "punishment": capacity is decremented for build request rejections, build request timeouts, and tunnel test failures. Unfortunately, a router does not know which of the tunnel peers to blame when a request or tunnel test message is dropped. This is because tunnel build requests are handed off from peer to peer along the path, and because tunnels are unidirectional. What should the router do with this incomplete information?</p>


<ul>


<li> Naive solution: Don't bother blaming anybody. This doesn't work well at all!</li>


<li> Much better solution: Blame everybody, with a weighted value. The bad peers will be discovered over time because they will be consistently bad. The metric for good peers will still be pretty good.</li>


</ul>


<p> As an example, assume a tunnel build request (4 outbound hops through peers A-B-C-D) has expired. The reply was due back through the inbound exploratory tunnel (2 hops E-F). One option is to blame A, B, C, and D with weight 1/8 each and E and F with weight 1/4 each. Alternately: it was probably E's fault, because that is the usual failure point in I2P (the gateway of the inbound tunnel can't be reached), so blame E with weight 3/8 and F with weight 1/8 (no change to the outbound blame). (pic needed here)</p>
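The weighted-blame example can be written out as follows. This is a sketch of the weights from the example above; I2P's actual weights and mechanism may differ:

```python
def assign_blame(outbound, inbound, suspect_inbound_gateway=True):
    """Distribute blame for one failed build: each outbound hop
    gets 1/8; the inbound gateway gets 3/8 if it is the suspected
    failure point (other inbound hops 1/8); otherwise all inbound
    hops get weight 1/4 each."""
    blame = {peer: 1 / 8 for peer in outbound}
    if suspect_inbound_gateway:
        blame[inbound[0]] = 3 / 8          # the usual failure point
        for peer in inbound[1:]:
            blame[peer] = 1 / 8
    else:
        for peer in inbound:
            blame[peer] = 1 / 4
    return blame
```

Note that in both variants the weights sum to 1, so one failed build distributes a fixed total amount of blame.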


<h2>Tiers and Sorting </h2>


<p> To review, exploratory tunnels are generally low-bandwidth, and are used for router operation, including building and testing other tunnels. Client tunnels are used for all user client and server traffic, including accessing internal I2P network destinations or "hidden services" such as "eepsites", connections to external gateways ("inproxies" and "outproxies"), and other uses.</p>


<p> Every 30 seconds, all the profiles are sorted into three tiers:</p>


<ul>


<li> The Not Failing tier is everybody, including the following two tiers. Typical size is 300-500 peers.</li>


<li>The High-Capacity tier includes peers with above-average rating for accepting tunnel build requests, and the following tier. Typical size is 10-30 peers.</li>


<li>The Fast tier includes peers from the High-Capacity tier whose speed rating (i.e. peak bandwidth per tunnel) is above-average for all peers in the Not Failing tier. Typical size is 8-15 peers.</li>


</ul>


<p>


Both the speed and capacity metrics are skewed, with a small number of high ratings and a "long tail". By using the average and not the median, we select a small number of peers for the top two tiers. </p>
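The tier sort can be sketched as follows. The data layout is hypothetical (each profile reduced to a (speed, capacity) pair), and the minimum-size promotion logic is omitted:

```python
def sort_tiers(profiles):
    """Sort peer profiles into the three tiers using the average
    (not the median) of each skewed metric as the threshold.
    `profiles` maps peer id -> (speed, capacity)."""
    not_failing = list(profiles)  # everybody
    avg_capacity = sum(c for _, c in profiles.values()) / len(profiles)
    high_capacity = [p for p, (_, c) in profiles.items() if c > avg_capacity]
    avg_speed = sum(s for s, _ in profiles.values()) / len(profiles)
    # Fast peers: high-capacity AND above-average speed over all peers.
    fast = [p for p in high_capacity if profiles[p][0] > avg_speed]
    return not_failing, high_capacity, fast
```

Because both distributions have a long tail, the averages sit well above the medians, which is what keeps the top two tiers small.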


<h2>Peer Selection </h2>


<p>Candidate peers for tunnel builds are selected as follows. Client tunnels are built from peers in the Fast tier. Exploratory tunnels are built from peers in either the Not Failing tier or the High-Capacity tier. The tier is chosen on a per-build basis, using a weighted random function that compares exploratory tunnel build success against client tunnel build success to adjust the weighting.
As exploratory build success declines, the router builds more tunnels from the High-Capacity tier, to limit the amount of effort spent on the expensive tunnel build request operation.
The selection thus maintains a balance between the cost of tunnel build requests and the need to explore peers.</p>
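The weighted-random tier choice might look like this. The probability formula here is purely our own guess at the shape of the adjustment, not I2P's actual weighting:

```python
import random

def exploratory_tier(exploratory_success, client_success):
    """Choose the source tier for one exploratory build. As the
    exploratory build success rate falls relative to the client
    build success rate, more builds come from High-Capacity."""
    ratio = exploratory_success / max(client_success, 1e-9)
    p_high_capacity = min(1.0, max(0.0, 1.0 - ratio))
    if random.random() < p_high_capacity:
        return "high-capacity"
    return "not-failing"
```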


<p>


It may seem curious to use the Not Failing tier (generally the lowest-bandwidth, lowest-capacity peers) for exploratory tunnels, since these tunnels are required to function for a router to build client tunnels. However, failing exploratory tunnels are recognized quickly, so in practice this works well. And the benefit, of course, is opportunistic bandwidth and capacity measurement.</p>


<p> Peers are selected with equal weight within each tier. If sufficient peers for a tunnel are not found within a given tier, the peer selection moves on to the next-lower tier.</p>


<h2>How It Works, Lessons for Other Implementations </h2>


<p> The profiling and peer selection system works quite well in I2P. The core system has been in place since the beginning of I2P. It includes almost everything proposed in [SB08] except for "tunability".
Most importantly, I2P does not use claimed bandwidth, which prevents low-resource attacks. It also works well enough that peer selection is not a significant obstacle to I2P network performance; there are many other issues we are working on. The basic method has been in place for five years, and has been tuned only modestly in the last two years. </p>


<p> The algorithm is stable, i.e., the Fast and High Capacity tier members don't change rapidly. This is because when a router uses a peer for tunnels, it tends to increase that peer's speed and capacity metrics. This keeps it in the pool. This is a desirable quality for anonymity, as using a large pool for tunnel building tends to assist predecessor attacks (ref).</p>


<p> The tier system is fast-reacting to individual peer failure or overload, and to increased local demand for bandwidth. The speed and capacity metrics are highly weighted for recent performance. When a peer starts to drop test messages, or fails to respond to tunnel build requests, it will quickly be demoted out of the high-capacity pool. As bandwidth demand increases, the speed metric for individual peers will rapidly increase, and the fast tier will quickly become reorganized to include the newly-recognized fast peers.</p>


<p> The tier system tends to use the highest-bandwidth peers when the network is not congested. As congestion increases, the total network traffic "spreads out" to lower-capacity peers. From an overall network perspective, this is optimal as it maintains a similar level of service for all routers.</p>


<p> The profiling system does not over-optimize. The router uses its own, normally-generated traffic for peer profiling. No high-bandwidth test messages are required or used. When a router does not require high bandwidth or a high number of tunnels, the metrics for each peer are correspondingly lower. Therefore, even a low-bandwidth peer may be classified as "fast" and used for tunnels. This tends to, at least partially, spread low-bandwidth tunnels broadly across the network, and leaves the high-capacity peers for high-speed traffic. However, even very-low-bandwidth routers tend to accurately find a few fast peers and thus are well-prepared when higher bandwidth is demanded. </p>


<p>


Also, there is no need for a complete global optimum. I2P routers generally know only a subset of the active peers in the network, usually 20% to 80%. Through exploratory tunnel building and other peer discovery mechanisms, routers have no difficulty finding a sufficient portion of peers, and preventing network partitioning. As the I2P network grows the percentage of peers known to each router will decline as we implement additional mechanisms to limit memory usage, TCP and UDP connections, and other resources in the router. This poses no threat to the profiling system.</p>


<p> The profiling system is persistent across restarts, and maintains measurements for the last 24 hours. This allows a recently-started router to quickly re-integrate to the network, whether the router was just restarted or has been down for some time.</p>


<p> The parameters, weighting, and calculations are quite powerful "knobs" to turn. It is difficult to test and debug in a distributed network, impossible to fully optimize, and can take months or years to get right. The I2P router contains a framework for local network simulation and testing; however, we have not used this framework for profiling and selection testing. As described above, the routers include bandwidth and tunnel build success statistics in the network database entry they publish to the floodfill routers. While this information is not trusted or used in the router, it is gathered by the stats.i2p website (ref). On that website, several network performance graphs are presented, and the I2P developers make heavy use of this facility to monitor the network and judge the effectiveness of software changes in each release.</p>


<p> Here are some other lessons for those wishing to "tune up" their own network.</p>


<p> The basic measurements are much simpler than they used to be. The speed calculation, for example, was at one time over 400 lines of code, and it is now essentially a one-liner. The punishment for bad behavior is what keeps the network running well, and it is also an area for further research. How heavily a router punishes determines how fast the load spreads out across the network as the network gets busy, and how quickly an overloaded peer gets avoided. The implementation contains an implicit estimate of the cost of a tunnel build request, as it rates the relative performance of a rejected request and a dropped request; or, alternatively, of an accepted request vs. a request not made at all. Another way of looking at it is to establish as a baseline a peer that we have not made any requests of, and call that a capacity rating of zero. What percentage (or absolute number) of request rejections, dropped requests, and test failures is required to drop the capacity rating below zero? </p>


<p>If peers are punished too heavily, the network will tend to congestion collapse as most peers are driven to negative capacity ratings and tunnel load spreads out quickly throughout the network, as routers attempt to route tunnels through very-low-capacity peers.</p>


<p>If peers are punished too lightly, routers will be slow to react to overloaded peers, and maintain the same high capacity peers for too long. In other words, a router will accept poor performance from a peer even when better peers may be available.</p>


<p> We recommend that those wishing to implement a profiling and selection system start with relatively simple algorithms. It is always possible to add complexity later if necessary; in practice, you will probably have to simplify again later.
For example, I2P's speed calculation used to be pages and pages of code; now it is a one-liner.</p>


<h2>Summary</h2>


<p>


I2P's exploratory tunnels average only 100 KB in 10 minutes (~150 Bytes/sec). Idle client tunnels average only 10 KB in 10 minutes from tunnel tests (~15 Bytes/sec). Surprisingly, this is sufficient to find fast, high-capacity peers. When a router requires little bandwidth, it doesn't matter how accurately its peers are sorted into tiers. When the router does require more bandwidth, the tiering will be correspondingly better. Even very-low-bandwidth routers tend to accurately find fast peers and thus are well-prepared when higher bandwidth is demanded.</p>


<p> To use the terms of [SB08], I2P's peer profiling and selection system is an opportunistic bandwidth measurement algorithm that is sensitive to network load and client demand. It does not use self-reported values. However, it does not provide a "tunable" mechanism for users to trade off anonymity and performance, and we do not think such a mechanism is necessary in I2P. Not only is active bandwidth probing (i.e. generating large amounts of special-purpose data for testing) impractical, as [SB08] states; it is also unnecessary. In addition to the bandwidth measurements proposed in [SB08], I2P measures the tunnel build acceptance rate, with adjustments for various bad behavior by peers. I2P's profiling and selection system has been in continuous use for approximately five years.</p>


<p>While the base system works well, several adjustments (some described above) are possible. The authors will continue to research and tune I2P's peer profiling and selection system in the coming months. We hope that the information in this paper will prove valuable to the developers of Tor and other anonymous networks.</p>


<h2>References </h2><ul>


<li>[I2P] This paper includes material adapted from http://www.i2p2.de/ written by jrandom and others.