... the Kademlia-based file sharing applications
which recently appeared, with about 1.5 million users, enabled new possibilities.
This research report is therefore the first to describe the structure and processes of the
Kad protocol in detail.

This section evaluates the possibility of controlling a data item by placing fraudulent
peers as close as possible to it. The iteration process finds these peers because they
are the closest peers to the target. A fraudulent peer can manipulate a data item, or
make it disappear once it is the host peer. If a subset of fraudulent peers receives all
published content for a data item ID, they gain total control over it. Figure 7.24 shows
that the 5 furthest peers are rarely found. This suggests that controlling the 6 closest
peers could be sufficient to manipulate data in the Kad network. Some film and music
industry organizations have a strong interest in this attack, as they have already
launched pollution attacks.
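
For readers unfamiliar with how such placement works: Kademlia measures closeness by XOR distance on 128-bit IDs, so an attacker only has to generate IDs that share a long prefix with the target ID. A minimal sketch of the idea (MD5 stands in here for the MD4 hash real Kad clients use; everything else is illustrative):

```python
import hashlib
import random

ID_BITS = 128  # Kad uses 128-bit IDs

def kad_id(data: bytes) -> int:
    """Derive a 128-bit ID from raw bytes (real clients use MD4; MD5 here)."""
    return int.from_bytes(hashlib.md5(data).digest(), "big")

def xor_distance(a: int, b: int) -> int:
    """Kademlia's closeness metric: the XOR of the two IDs."""
    return a ^ b

def forge_close_id(target: int, shared_prefix_bits: int) -> int:
    """Forge an ID sharing the top `shared_prefix_bits` bits with `target`."""
    keep_mask = ((1 << shared_prefix_bits) - 1) << (ID_BITS - shared_prefix_bits)
    random_tail = random.getrandbits(ID_BITS - shared_prefix_bits)
    return (target & keep_mask) | random_tail

target = kad_id(b"some keyword")
fraud = forge_close_id(target, 32)
# Sharing 32 prefix bits bounds the XOR distance by 2^(128-32),
# far closer than almost any honest, randomly chosen ID.
assert xor_distance(fraud, target) < 1 << (ID_BITS - 32)
```

Since the iteration process always converges on the closest IDs it can find, a handful of such forged identities is enough to end up in the result set for the target.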


Could be, but most attacks on the Kad network are passive multi-identity ones. However, most of the passive attacks used today cause only limited damage, as Kad scatters data over a very large area (up to ~16,000 nodes), and due to some chaos logic the data is found even if the closest nodes are blocked. But it is possible to devise far more sinister attacks on the network which could easily render it useless, and that is something one needs to be prepared for.


Damiano Carra, Ernst W. Biersack: Building a Reliable P2P System Out of Unreliable P2P Clients: The Case of KAD (2007), PDF.

Very interesting....

The Periodic Replacement publishing scheme is able
to offer a high reliability, i.e. the system is robust to
node churn. However, this high reliability comes at a
high cost: the number of publishing messages is ten
times higher than the number of search messages [2].
In this paper we proposed a model of the KAD publishing scheme based on reliability theory. Starting from
the weaknesses identified by the model, we proposed an
improved publishing scheme, Desynchronized Quantile
Based Inspection (DQBI), that is able to offer the same
reliability with a dramatic reduction in cost.
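
To see why reliability theory applies here: if a single replica survives the republish interval with probability p, then at least one of k independent replicas survives with probability 1 − (1 − p)^k. A small sketch under made-up numbers (KAD session times were reported to be Weibull distributed; the scale/shape values below are purely illustrative, not taken from the paper):

```python
import math

def replica_survival(p_single: float, k: int) -> float:
    """Probability that at least one of k independent replicas is still alive."""
    return 1.0 - (1.0 - p_single) ** k

def weibull_survival(t_hours: float, scale: float, shape: float) -> float:
    """Weibull survival function S(t) = exp(-(t/scale)^shape).
    Scale and shape here are illustrative, not measured values."""
    return math.exp(-((t_hours / scale) ** shape))

# Hypothetical setting: republish every 5 hours, 10 replicas per item.
p = weibull_survival(5.0, scale=4.0, shape=0.6)
print(f"single replica alive after 5h: {p:.2f}")
print(f"at least one of 10 alive:      {replica_survival(p, 10):.3f}")
```

The point of the model is that the republish interval and replica count can be tuned together: the same target reliability can be hit with far fewer publish messages if the interval is chosen from the survival curve instead of a fixed, conservative period.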

What do you think about it?

PS: for some days now I have been getting a low ID with Kad but NOT with ed2k servers... I think the Kad attacks are no foolishness...

This one is also interesting and very easy to read (a very good find, Nissenice).

General Conclusion:
KAD is a great subject for study. Many issues were raised:
- High redundancy at all levels results in a large amount of traffic
- Very easy to “eclipse” part or all of KAD
- Is a decentralized solution really better than the old centralized one?
- Peers can be turned into DDoS bots
  - Tried it on ourselves
  - Over 100 Mbit/sec incoming traffic
- Theory vs. practice: many theoretical DHT designs exist
  - Very few papers address security
  - No really practical solutions against attacks such as the Eclipse attack

Abstract
Many different distributed hash tables (DHTs) have been designed, but only few have been successfully deployed. The implementation of a DHT needs to deal with practical aspects (e.g. related to churn, or to the delay) that are often only marginally considered in the design. In this paper, we analyze in detail the content retrieval process in KAD, the implementation of the DHT Kademlia that is part of several popular peer-to-peer clients. In particular, we present a simple model to evaluate the impact of different design parameters on the overall lookup latency. We then perform extensive measurements on the lookup performance using an instrumented client. From the analysis of the results, we propose an improved scheme that is able to significantly decrease the overall lookup latency without increasing the overhead.

Abstract:
The Kad network, an implementation of the Kademlia DHT protocol, supports the popular eDonkey peer-to-peer file sharing network and has over 1 million concurrent nodes. We describe several attacks that exploit critical design weaknesses in Kad to allow an attacker with modest resources to cause a significant fraction of all searches to fail. We measure the cost and effectiveness of these attacks against a set of 16,000 nodes connected to the operational Kad network. We also measure the cost of previously proposed, generic DHT attacks against the Kad network and find that our attacks are much more cost effective. Finally, we introduce and evaluate simple mechanisms to significantly increase the cost of these attacks.

Quote

New versions of the two most popular Kad clients have recently been released – aMule 2.2.1 on June 11, 2008 and eMule 0.49a on May 11, 2008. We show that although they have new features intended to improve security, our attacks still work with the same resource requirements.

Some Support, on Aug 1 2008, 08:42 PM, said:

Quote

eMule 0.49b
(..)
-----------------------
- Jun, 27. 2008 -
-----------------------
.: Several changes were made to Kad in order to defy routing attacks researched by University of Minnesota guys [Peng Wang, James Tyra, Eric Chan-Tin, Tyson Malchow, Denis Foo Kune, Nicholas Hopper, Yongdae Kim], in particular:
.: Kad contacts will only be able to update themselves in others' routing tables if they provide the proper key (supported by 0.49a+ nodes), in order to make it impossible to hijack them
.: Kad now uses a three-way handshake (or, for older versions, a similar check) for new contacts, making sure they do not use a spoofed IP
.: Unverified contacts are not used for routing tasks and are marked with a special icon in the GUI
(..)
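
The anti-spoofing idea in that changelog entry can be illustrated: before trusting a new contact, send a random challenge to its claimed address and only mark the contact verified if the challenge comes back. A spoofer who fakes someone else's IP never receives the challenge and so can never answer it. A minimal in-memory sketch, not eMule's actual implementation (class and method names are made up):

```python
import secrets

class ContactVerifier:
    """Sketch of challenge-response contact verification: a contact is
    only marked 'verified' after echoing back a random nonce that was
    sent to its claimed (IP, port) address."""

    def __init__(self):
        self.pending = {}    # claimed (ip, port) -> outstanding nonce
        self.verified = set()

    def challenge(self, addr):
        nonce = secrets.token_hex(8)
        self.pending[addr] = nonce
        return nonce  # in a real client this would be sent via UDP to `addr`

    def handle_response(self, addr, nonce):
        if self.pending.get(addr) == nonce:
            del self.pending[addr]   # nonce is one-shot
            self.verified.add(addr)
            return True
        return False

v = ContactVerifier()
addr = ("203.0.113.7", 4672)
n = v.challenge(addr)
assert v.handle_response(addr, n)                              # genuine contact
assert not v.handle_response(("203.0.113.8", 4672), "guess")   # spoofer fails
```

This matches the changelog's policy of keeping unverified contacts out of routing tasks: only addresses in the verified set would ever be handed out in route responses.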

I'm piled up with stuff to read, so I haven't had any time myself to study these materials. I only note that "Attacking the Kad Network" was announced about a year ago, but for some reason its publication was delayed...

Abstract. With the increasing deployment of P2P networks, supervising the malicious behaviours of participants, which degrade the quality and performance of the overall delivered service, is a real challenge. In this paper, we propose a fully distributed and adaptive revocation mechanism based on the reputation of the peers. The originality of our approach is that the revocation is integrated in the core of the P2P protocol and does not need complex consensus and cryptographic mechanisms, hardly scalable. The reputation criteria evolve with the contribution of a peer to the network in order to highlight and help fight against selfish or malicious behaviours. The preliminary results show that the user perceived delays are not highly impacted and that our solution is resistant to reputation and revocation attacks.
Index Terms—P2P networks, revocation mechanism, reputation mechanism, remote accounts, KAD

Abstract. In this paper, we assess the protection mechanisms entered into recent clients to fight against the Sybil attack in KAD, a widely deployed Distributed Hash Table. We study three main mechanisms: a protection against flooding through packet tracking, an IP address limitation and a verification of identities. We evaluate their efficiency by designing and adapting an attack for several KAD clients with different levels of protection. Our results show that the new security rules mitigate the Sybil attacks previously launched. However, we prove that it is still possible to control a small part of the network despite the new inserted defenses with a distributed eclipse attack and limited resources.

Abstract. A Distributed Hash Table (DHT) is a structured overlay network service that provides a decentralized lookup for mapping objects to locations. In this paper, we study the lookup performance of locating nodes responsible for replicated information in Kad – one of the largest DHT networks existing currently. Throughout the measurement study, we found that Kad lookups locate only 18% of nodes storing replicated data. This failure leads to limited reliability and an inefficient use of resources during lookups. Ironically, we found that this poor performance is due to the high level of routing table similarity, despite the relatively high churn rate in the network. We propose solutions which either exploit the high routing table similarity or avoid the duplicate returns using multiple target keys.

Abstract. ID uniqueness is essential in DHT-based systems as peer lookup and resource searching rely on ID-matching. Many previous works and measurements on Kad do not take into account that IDs among peers may not be unique. We observe that a significant portion of peers, 19.5% of the peers in routing tables and 4.5% of the active peers (those who respond to Kad protocol), do not have unique IDs. These repetitions would mislead the measurements of Kad network. We further observe that there are a large number of peers that frequently change their UDP ports, and there are a few IDs that repeat for a large number of times and all peers with these IDs do not respond to Kad protocol. We analyze the effects of ID repetitions under simplified settings and find that ID repetition degrades Kad’s performance on publishing and searching, but has insignificant effect on lookup process. These measurement and analysis are useful in determining the sources of repetitions and are also useful in finding suitable parameters for publishing and searching.

Abstract. Distributed hash tables (DHTs) have been actively studied in literature and many different proposals have been made on how to organize peers in a DHT. However, very few DHTs have been implemented in real systems and deployed on a large scale. One exception is KAD, a DHT based on Kademlia, which is part of eDonkey, a peer-to-peer file sharing system with several million simultaneous users. We have been crawling a representative subset of KAD every five minutes for six months and obtained information about geographical distribution of peers, session times, daily usage, and peer lifetime. We have found that session times are Weibull distributed and we show how this information can be exploited to make the publishing mechanism much more efficient.
Peers are identified by the so-called KAD ID, which up to now was assumed to be persistent. However, we observed that a fraction of peers changes their KAD ID as frequently as once a session. This change of KAD IDs makes it difficult to characterize end-user behavior. For this reason we have been crawling the entire KAD network once a day for more than a year to track end-users with static IP addresses, which allows us to estimate end-user lifetime and the fraction of end-users changing their KAD ID.

Abstract. In this poster, we present a solution to fight against paedophile activities in KAD. Our distributed architecture can monitor and act on paedophile contents in a very efficient way by controlling keywords and files. Early results on the real network demonstrate the applicability
of our approach.

Abstract. In this paper, we propose a new P2P Honeynet architecture called HAMACK that bypasses the Sybil attack protection mechanisms introduced recently in KAD. HAMACK is composed of distributed Honeypeers in charge of monitoring and acting on specific malicious contents in KAD by controlling the indexation of keywords and files. Our architecture allows to: (1) transparently monitor all the requests sent to the targeted contents in the network, (2) eclipse malicious entries of the DHT, and (3) attract the download requests of peers searching for malicious contents towards the Honeypeers by poisoning the DHT references with fake files and sources. Early results on the KAD network demonstrate the applicability and the efficiency of our approach.

Abstract. As the first DHT implemented in real applications and involving millions of simultaneous users, all aspects of Kad must be analyzed and measured carefully. This paper focuses on measuring the routing table of Kad in eMule/aMule. We present and analyze the availability and stability of routing table by crawling actively.
We find the phenomenon of ID repetition in Kad that many peers use a same ID simultaneously, which will decrease the performance of routing and then reduce the availability of routing table. The connection availability of global routing table is relatively low, the average of which is about 64.9%. Connection availability influences the efficiency of searching and routing in Kad network directly.

Abstract. Kademlia-based DHT has been deployed in many P2P applications and it is reported that there are millions of simultaneous users in Kad network. For such a protocol that significantly involves so many peers, its robustness and security must be evaluated carefully. In this paper, we analyze the Kademlia protocol and identify several potential vulnerabilities. We classify potential attacks as three types: asymmetric attack, routing table reflection attack and index reflection attack. A limited real-world experiment was run on eMule and the results show that these attacks tie up bandwidth and TCP connection resources of victim. We analyze the results of our experiment in three aspects: the effect of DDoS attacks by misusing Kad in eMule, the comparison between asymmetric attack and routing table reflection attack, and the distribution of attacks. More large-scale DDoS attack can be performed by means of a little more effort. We introduce some methods to amplify the performance of attack and some strategies to evade detection. Finally, we further discuss several solutions for these DDoS attacks.

Abstract. We analyze in detail the content retrieval process in kad. kad implements content search (publish and retrieval) functions that use the Kademlia Distributed Hash Table for content routing. Node churn is quite common in peer-to-peer systems and results in information loss and stale routing table entries. To deal with node churn, kad issues parallel route requests and publishes multiple redundant copies of each piece of information. We identify the key design parameters in kad and present an analytical model to evaluate the impact of changes in the values of these parameters on the overall lookup latency and message overhead. Extensive measurements of the lookup performance using an instrumented client allow us to validate the model. The overall lookup latency is in most cases 5 s or larger. We elucidate the cause for such high lookup latencies and propose an improved scheme that significantly decreases the overall lookup latency without
increasing the overhead.
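
For context, the parallel route requests the abstract refers to work roughly like this: the client keeps α (3 in eMule/aMule) requests in flight toward the target and stops once the k closest peers it knows of have all answered. A toy sketch over a simulated network (the 32-bit toy ID space and network parameters are illustrative, not real Kad values):

```python
import random

ALPHA = 3        # parallel route requests (3 in eMule/aMule)
K_CLOSEST = 10   # a lookup tries to settle the 10 closest peers

def iterative_lookup(target, seed_peers, query):
    """Sketch of a Kademlia-style iterative lookup: repeatedly query the
    ALPHA closest not-yet-queried peers, merging their answers into the
    shortlist, until the K_CLOSEST best known peers have all been asked.
    `query(peer)` simulates a route request returning that peer's contacts."""
    dist = lambda p: p ^ target
    shortlist = sorted(set(seed_peers), key=dist)
    queried = set()
    while True:
        batch = [p for p in shortlist if p not in queried][:ALPHA]
        if not batch or all(p in queried for p in shortlist[:K_CLOSEST]):
            break
        for peer in batch:
            queried.add(peer)
            shortlist = sorted(set(shortlist) | set(query(peer)), key=dist)
    return shortlist[:K_CLOSEST]

# Toy network: 500 random 32-bit IDs, each peer knows 20 random others.
random.seed(7)
peers = [random.getrandbits(32) for _ in range(500)]
contacts = {p: random.sample(peers, 20) for p in peers}
target = random.getrandbits(32)
found = iterative_lookup(target, random.sample(peers, ALPHA), lambda p: contacts[p])
print(len(found))  # 10
```

The latency trade-off the paper analyzes lives in ALPHA and in how long each of those route requests is allowed to wait before being given up on.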

Abstract. The constantly growing popularity of peer-to-peer systems has raised interest in studying their topology and dynamics. One of the most commonly used approaches is to create snapshots of the network at specific points in time. The snapshots can be produced by running distributed crawlers on the system of interest.

We are interested in studying one of the most widely deployed p2p networks, namely KAD. Up to now, there has been no open source crawler available for this network. In this report we give an approach and some up-to-date results on creating a crawler for the KAD system.

Abstract. Characterizing peer-to-peer overlays is crucial for understanding their impact on service provider networks and assessing their performance. Most popular file exchange applications use distributed hash tables (DHTs) as a framework for managing information. Their fully decentralized nature makes monitoring and users tracking challenging. In this work, we analyze KAD, a widely deployed DHT system. Thanks to the unique possibility to monitor a large population of about 20,000 ADSL clients at the edge of the network, we are able to characterize the content downloaded and shared by local users. We devised a passive content monitoring toolkit to reliably track users between sessions despite dynamic IP allocation. We applied our tool over one month of data. Our main findings are: (i) Over half a TB of fresh data is downloaded every day by the users we monitor, (ii) A significant fraction of peers (20%) regularly change their ID in the KAD overlay, either on a session basis or on a sub-session basis, which can be detrimental to the proper functioning of the DHT, (iii) Those users, that we term Chameleon users, are connected longer than regular users, and they (claim to) have less data in their shared folder than regular peers and (iv) As a consequence, even a non biased observation of the users shared folder can only provide a lower bound of the content downloaded and shared by a population of ADSL users.

Abstract. Studying deployed Distributed Hash Tables (DHTs) entails monitoring DHT traffic. Commonly, DHT traffic is measured by instrumenting ordinary peers to passively record traffic. In this approach, using a small number of peers leads to a limited (and potentially biased) view of traffic. Alternatively, inserting a large number of peers may disrupt the natural traffic patterns of the DHT and lead to incorrect results. In general, accurately capturing DHT traffic is a challenging task.

In this paper, we propose the idea of minimally visible monitors to capture the traffic at a large number of peers with minimum disruption to the DHT. We implement and validate our proposed technique, called Montra, on the Kad DHT. We show that Montra accurately captures around 90% of the query traffic while monitoring roughly 32,000 peers and can accurately identify destination peers for 90% of captured destination traffic. Using Montra, we characterize the traffic in Kad and present our preliminary results.

Abstract. Since the demise of the Overnet network, the Kad network has become not only the most popular but also the only widely used peer-to-peer system based on a distributed hash table. It is likely that its user base will continue to grow in numbers over the next few years as, unlike the eDonkey network, it does not rely on central servers, which tremendously increases scalability, and it is more efficient than unstructured systems such as Gnutella. However, despite its vast popularity, this thesis shows that today’s Kad network can be attacked in several ways. The presented attacks could be used either to hamper the correct functioning of the network itself, to censor contents, or to harm other entities in the Internet not participating in the Kad network such as ordinary web servers. While there are simple heuristics to reduce the impact of some of the attacks, we believe that the presented attacks cannot be thwarted easily in any fully decentralized peer-to-peer system without some kind of a centralized certification and verification authority.

Although there are many advantages of decentralized peer-to-peer systems compared to server based networks, most existing file sharing systems still employ a centralized architecture. In order to compare these two paradigms, as a case study, we conduct measurements in the eDonkey and the Kad network—two of the most popular peer-to-peer systems in use today. We re-engineered the eDonkey protocol and integrated two modified servers into the eDonkey network in order to monitor traffic. Additionally, we implemented a Kad client exploiting a design weakness to spy on the traffic at arbitrary locations in the ID space. We study the spacial and temporal distributions of the peers’ activities and also examine the searched contents. Finally, we discuss problems related to the collection of such data sets and investigate techniques to verify the representativeness of the measured data.

Btw, I just found out that there are two reports with the title 'ID Repetition in Kad'. One of them is called a technical report, which I linked to in my previous post. I've edited the post and added the other one too.

Abstract. We analyze in detail the content retrieval process in kad. kad implements content search (publish and retrieval) functions that use the Kademlia Distributed Hash Table for content routing. Node churn is quite common in peer-to-peer systems and results in information loss and stale routing table entries. To deal with node churn, kad issues parallel route requests and publishes multiple redundant copies of each piece of information. We identify the key design parameters in kad and present an analytical model to evaluate the impact of changes in the values of these parameters on the overall lookup latency and message overhead. Extensive measurements of the lookup performance using an instrumented client allow us to validate the model. The overall lookup latency is in most cases 5 s or larger. We elucidate the cause for such high lookup latencies and propose an improved scheme that significantly decreases the overall lookup latency without
increasing the overhead.

This one I find particularly interesting, as it confirms some of the findings I made when improving my FastKAD algorithm.

For example, I can confirm that more than 80% of the contacts respond within 700 ms. My latest code calculates on the fly how long to wait for a contact to respond with 95% confidence, and I've seen times as low as 550 ms. It fluctuates up and down a bit as different parts of the world are awake at different times of the day.
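
That on-the-fly wait calculation can be sketched as keeping a window of recent round-trip times and waiting no longer than their 95th percentile. This is just a rough sketch of the idea; the window size and clamp values are illustrative, not the actual FastKAD numbers:

```python
from collections import deque

class AdaptiveTimeout:
    """Pick a per-contact timeout from the 95th percentile of recently
    observed round-trip times, clamped between a floor and a ceiling
    (all constants here are illustrative)."""

    def __init__(self, window=200, quantile=0.95, floor_ms=300, ceil_ms=3000):
        self.rtts = deque(maxlen=window)   # sliding window of recent RTTs
        self.quantile = quantile
        self.floor_ms = floor_ms
        self.ceil_ms = ceil_ms

    def record(self, rtt_ms: float):
        self.rtts.append(rtt_ms)

    def timeout_ms(self) -> float:
        if not self.rtts:
            return self.ceil_ms            # no data yet: be conservative
        ranked = sorted(self.rtts)
        idx = min(len(ranked) - 1, int(self.quantile * len(ranked)))
        return min(self.ceil_ms, max(self.floor_ms, ranked[idx]))

t = AdaptiveTimeout()
for rtt in [120, 250, 400, 550, 480, 700, 310, 290, 650, 500]:
    t.record(rtt)
print(t.timeout_ms())  # 700
```

Because the window slides, the timeout automatically drifts with the time-of-day fluctuations mentioned above.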

Another thing is the parallelism of the search algorithm. Normally eMule searches 3 different paths to the target. But if you modify the algorithm to always search the contacts closest to the target, results can be obtained with only 8 contact lookups. I have been able to complete one search for a popular keyword with only 3 contact lookups. However, if the keyword is rare, so that not enough results are returned (or for a store operation that needs the 10 closest contacts), the number of contacts that need to be looked up becomes very close to the amount for the original algorithm.

What I don't see the report mentioning is that the search algorithm can be sped up significantly if you always keep n lookup requests in flight. Also, having the lookups time out individually allows you to fire new lookups much faster.
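
The keep-n-in-flight idea looks roughly like this in code: a toy asyncio sketch with simulated lookups and an illustrative per-probe timeout, not real Kad I/O:

```python
import asyncio
import random

IN_FLIGHT = 3          # how many lookups to keep running at all times
PROBE_TIMEOUT = 0.05   # each lookup times out individually (illustrative)

async def probe(peer):
    """Simulated contact lookup with its own timeout, so one slow
    contact never stalls the whole pipeline."""
    try:
        await asyncio.wait_for(asyncio.sleep(random.uniform(0.001, 0.01)),
                               timeout=PROBE_TIMEOUT)
        return peer
    except asyncio.TimeoutError:
        return None

async def pipelined_lookup(peers):
    """Keep IN_FLIGHT probes running; the moment any single probe
    finishes (or times out), immediately fire the next from the queue."""
    queue = list(peers)
    pending = set()
    results = []
    while queue or pending:
        while queue and len(pending) < IN_FLIGHT:
            pending.add(asyncio.create_task(probe(queue.pop(0))))
        done, pending = await asyncio.wait(pending,
                                           return_when=asyncio.FIRST_COMPLETED)
        results.extend(t.result() for t in done if t.result() is not None)
    return results

random.seed(3)
answers = asyncio.run(pipelined_lookup(range(10)))
print(sorted(answers))
```

Compared to waiting for a whole batch of 3 to finish before sending the next batch, the pipeline refills the moment any one lookup completes, which is exactly why the individual timeouts pay off.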

From the tests I've done, I saw that the time before a result arrived could be cut down to within 200 milliseconds for a very popular keyword (when forcing the "store" operation as soon as the closest known contacts have responded), and up to 3 seconds for a rare one.