3 Overview
- Centralized: a constantly updated directory hosted at central locations (does not scale well; costly updates; single point of failure)
- Decentralized but structured: the overlay topology is highly controlled, and files (or metadata/index entries) are placed not at random nodes but at specified locations
- Decentralized and unstructured:
  - peers connect in an ad-hoc fashion
  - the location of documents/metadata is not controlled by the system
  - no guarantee for the success of a search
  - no bounds on search time
  - no maintenance cost
  - any kind of query (not just single-key or range queries)

8 Search in Unstructured P2P
- Must find a way to stop the search: Time-to-Live (TTL)
- Exponential number of messages
- Cycles? Note: cycles can be detected but not avoided
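The TTL-bounded flood and its cycle problem can be sketched as follows. This is a minimal illustration, not any particular system's protocol; the adjacency-dict topology is a stand-in for a real overlay. Duplicate deliveries are detected with a seen-set, but the messages that cause them are still sent and counted, mirroring the point that cycles can be detected but not avoided.

```python
def flood_search(graph, start, target, ttl):
    """Flood a query through an unstructured overlay.

    graph: dict mapping node -> list of neighbor nodes.
    Returns (found, messages): whether `target` was reached and how many
    messages were sent. Every forwarded copy costs a message, including
    copies delivered to nodes that already saw the query (cycles).
    """
    seen = {start}
    frontier = [start]
    messages = 0
    found = (start == target)
    for _ in range(ttl):
        nxt = []
        for node in frontier:
            for nb in graph[node]:
                messages += 1            # each forwarded copy is a message
                if nb == target:
                    found = True         # flooding still continues elsewhere
                if nb not in seen:       # duplicate/cycle detection
                    seen.add(nb)
                    nxt.append(nb)
        frontier = nxt
    return found, messages
```

On a 6-node ring, locating a node 3 hops away with TTL = 3 already costs 10 messages, and the count grows exponentially with node degree.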

9 Search in Unstructured P2P
- BFS vs DFS:
  - BFS: better response time, but reaches a larger number of nodes (higher message overhead per node and overall)
  - Note: a BFS search continues (until the TTL is reached) even if the object has already been located on a different path
- Recursive vs iterative:
  - During search, does the node issuing the query contact the other nodes directly (iterative), or does each node forward the query onward (recursive)?
  - Does the result follow the same path back?

11 Search in Unstructured P2P
Two general types of search in unstructured P2P:
- Blind: try to propagate the query to a sufficient number of nodes (example: Gnutella)
- Informed: utilize information about document locations (example: Routing Indices)
Informed search increases the cost of join in exchange for an improved search cost.

12 Blind Search Methods
Gnutella:
- Uses flooding (BFS) to contact all accessible nodes within the TTL value
- Huge overhead: a large number of peers contacted, plus overall network traffic
- Hard to find unpopular items
- At its peak, up to 60% bandwidth consumption of the total Internet traffic

13 Free-riding on Gnutella [Adar00]
24-hour sampling period:
- 70% of Gnutella users share no files
- 50% of all responses are returned by the top 1% of sharing hosts
A social problem, not a technical one.
Problems:
- Degradation of system performance: collapse?
- Increased system vulnerability
- A "centralized" ("backbone") Gnutella → copyright issues?
Verified hypotheses:
- H1: A significant portion of Gnutella peers are free riders
- H2: Free riders are distributed evenly across domains
- H3: Hosts often share files nobody is interested in (they are never downloaded)

16 Free Riders
File-sharing studies show:
- Lots of people download
- Few people serve files
Is this bad?
- If there is no incentive to serve, why do people do so?
- What if there are strong disincentives to being a major server?

17 Simple Solution: Thresholds
- Many programs allow a threshold to be set: don't upload a file to a peer unless it shares > k files
- Problems:
  - What should k be?
  - How to ensure the shared files are interesting?

19 Popularity of Queries [Sripanidkulchai01]
- Very popular documents are approximately equally popular
- Less popular documents follow a Zipf-like distribution (i.e., the probability of seeing a query for the i-th most popular item is proportional to 1/i^α)
- Access frequency of web documents also follows Zipf-like distributions → caching might also work for Gnutella
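The Zipf-like popularity law P(i) ∝ 1/i^α can be made concrete with a few lines of code. The values of n and α below are illustrative, not taken from the measurement study:

```python
def zipf_probs(n, alpha):
    """Zipf-like query popularity: P(rank i) proportional to 1 / i**alpha.

    Returns a normalized list of probabilities for ranks 1..n.
    """
    weights = [1.0 / i**alpha for i in range(1, n + 1)]
    total = sum(weights)
    return [w / total for w in weights]

# With alpha = 1, the most popular query is exactly 10x more likely
# to be seen than the 10th most popular one.
p = zipf_probs(1000, 1.0)
```

The heavy skew is what makes caching attractive: a small cache of the most popular results can absorb a large fraction of the query load.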

21 Topology of Gnutella [Jovanovic01]
- Power-law properties verified ("find everything close by")
- Backbone + outskirts
- Power-Law Random Graph (PLRG): the node degrees follow a power-law distribution: if one ranks all nodes from the most connected to the least connected, then the i-th most connected node has ω/i^a neighbors, where ω is a constant.

23 Why does it work? It's a small world! [Hong01]
- Milgram: 42 out of 160 letters made it from Nebraska to Boston (~6 hops)
- Watts: between order and randomness: short-distance clustering + long-distance shortcuts

In 1967, Stanley Milgram conducted a classic experiment in which he instructed randomly chosen people in Nebraska to pass letters to a selected target person in Boston, using only intermediaries who knew one another on a first-name basis. He found that a median of only six steps was required for the letters to reach their destination, giving rise to "six degrees of separation" and the "small-world effect." Duncan Watts and Steven Strogatz extended this work in 1998 with an influential paper in Nature that described small-world networks as an intermediate state between regular graphs and random graphs. Small-world graphs maintain the high local clustering of regular graphs (as measured by the clustering coefficient: the proportion of the nodes linked to a given node that are also linked to each other) but also have the short path lengths of random graphs. They can be regarded as locally clustered graphs with shortcuts scattered in. Freenet networks can be shown to be small-world graphs (next slide).

- Regular graph: n nodes, k nearest neighbors → path length ~ n/2k (e.g., 4096/16 = 256)
- Rewired graph (1% rewired): path length ~ random graph, clustering ~ regular graph
- Random graph: path length ~ log(n)/log(k) ~ 4
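The two path-length estimates can be checked directly. Taking n = 4096 and k = 8 nearest neighbors (k = 8 is inferred here, since it makes both of the slide's numbers come out: 4096/(2·8) = 256 and log(4096)/log(8) = 4):

```python
import math

def regular_path_length(n, k):
    """Average path length of a ring lattice with k nearest neighbors: ~ n / (2k)."""
    return n / (2 * k)

def random_path_length(n, k):
    """Average path length of a random graph with mean degree k: ~ log(n) / log(k)."""
    return math.log(n) / math.log(k)

# n = 4096 nodes, k = 8 neighbors:
# regular lattice ~ 256 hops, random graph ~ 4 hops
```

Rewiring just 1% of a regular lattice collapses the 256-hop path length toward the 4-hop random-graph value while leaving the clustering almost intact, which is the small-world effect.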

24 Links in the small world [Hong01]
- "Scale-free" link distribution
- Scale-free: independent of the total number of nodes; characteristic for small-world networks
- The proportion of nodes having n links is P(n) = 1/n^k
- Most nodes have only a few connections
- Some have a lot of links: important for binding disparate regions together

A key characteristic of small-world graphs is the "scale-free" link distribution, which has no term related to the size of the network and thus applies at all scales, from small to large. This distribution can be seen in Freenet. The nodes at the top left, with few connections, are the local clusters, while the nodes at the bottom right, with many connections, provide the shortcuts that tie the network together. The outlier at far right is the group of nodes whose datastores are completely filled, with 250 entries; with larger datastores, this column shifts further to the right.

25 Freenet: Links in the small world [Hong01]
P(n) ~ 1/n^1.5

37 Issues
- Trust computations in a dynamic system
- Overloading good nodes
- Bad nodes can provide good content sometimes
- Bad nodes can build up reputation
- Bad nodes can form collectives
- ...

39 Blind Search Methods
- Modified BFS: forward the query only to a random subset of the neighbors
- Iterative deepening: start BFS with a small TTL and repeat the BFS at increasing depths if the first BFS fails
  - Works well when there is some stop condition and a "small" flood will satisfy the query
  - Otherwise, even bigger loads than standard flooding
(more later ...)
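Iterative deepening can be sketched as repeated bounded floods. The TTL schedule below is a hypothetical example; note that each retry re-floods from scratch, so a hit found only at a deep level costs more than a single large flood would have:

```python
def bfs_with_ttl(graph, start, target, ttl):
    """One bounded flood; returns (found, messages). Stops early on a hit."""
    seen, frontier, messages = {start}, [start], 0
    for _ in range(ttl):
        nxt = []
        for node in frontier:
            for nb in graph[node]:
                messages += 1
                if nb == target:
                    return True, messages
                if nb not in seen:
                    seen.add(nb)
                    nxt.append(nb)
        frontier = nxt
    return False, messages

def iterative_deepening(graph, start, target, depths):
    """Repeat the flood at increasing TTLs until the query is satisfied.

    depths: the TTL schedule to try, e.g. [1, 2, 4]. Returns the overall
    outcome and the total message count across all attempts.
    """
    total = 0
    for ttl in depths:
        found, msgs = bfs_with_ttl(graph, start, target, ttl)
        total += msgs
        if found:
            return True, total
    return False, total
```

On a 5-node chain with the schedule [1, 2, 4], finding an object 3 hops away costs 1 + 3 + 5 = 9 messages: the shallow floods fail and their cost is pure overhead.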

40 Blind Search Methods
Random walks:
- The node that poses the query sends out k query messages to an equal number of randomly chosen neighbors
- Each message follows its own path, at each step randomly choosing one neighbor to forward it to; each path is a "walker"
- Two methods to terminate each walker:
  - TTL-based, or
  - a checking method (the walkers periodically check with the query source whether the stop condition has been met)
- Reduces the number of messages to k × TTL in the worst case
- Provides some local load balancing
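A minimal sketch of TTL-terminated random walkers (the defaults for k and the TTL are illustrative; a real deployment would tune them). The worst-case message count is k × TTL by construction, in contrast to flooding's exponential growth:

```python
import random

def random_walk_search(graph, start, target, k=4, ttl=16, seed=0):
    """k independent walkers, each forwarding to one random neighbor per step.

    Returns (found, messages); messages never exceeds k * ttl. The seed
    makes the sketch deterministic for demonstration purposes.
    """
    rng = random.Random(seed)
    messages = 0
    for _ in range(k):            # launch k walkers in turn
        node = start
        for _ in range(ttl):      # each walker takes at most ttl steps
            node = rng.choice(graph[node])   # one message per step
            messages += 1
            if node == target:
                return True, messages
    return False, messages
```

Unlike flooding, a walker visits one node per step, so the load at any single peer stays low, which is the "local load balancing" the slide refers to.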

42 Blind Search Methods
Using super-nodes:
- Super (or ultra) peers are connected to each other
- Each super-peer is also connected to a number of leaf nodes
- Routing happens among the super-peers
- The super-peers then contact their leaf nodes

43 Blind Search Methods
Using super-nodes: Gnutella2
- When a super-peer (or hub) receives a query from a leaf, it forwards it to its relevant leaves and to neighboring super-peers
- The hubs process the query locally and forward it to their relevant leaves
- Neighboring super-peers regularly exchange local repository tables to filter out traffic between them

44 Blind Search Methods
- Ultrapeers can be installed (KaZaA) or self-promoted (Gnutella)
- Interconnection between the superpeers

45 Informed Search Methods
Local indices:
- Each node indexes all files stored at all nodes within a certain radius r and can answer queries on their behalf
- Search proceeds in steps of r: the hop distance between two consecutive searching nodes is 2r + 1
- Increased cost for join/leave: flood within radius r (TTL = r) whenever a node joins or leaves the network
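The 2r + 1 spacing can be illustrated on a hypothetical one-dimensional overlay (a line of nodes): if each searching node's index covers everything within r hops, consecutive searching nodes placed 2r + 1 hops apart cover the whole line with no gaps and no overlap.

```python
def covering_nodes(path_length, r):
    """Nodes 0..path_length on a line: which nodes must process the query
    so that their radius-r local indices cover every node?

    Consecutive searching nodes sit exactly 2r + 1 hops apart.
    """
    return list(range(r, path_length + 1, 2 * r + 1))
```

For example, with r = 1 on a 9-node line, only nodes 1, 4, and 7 process the query, yet between them they index all nine nodes.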

46 Informed Search Methods
Intelligent BFS:
- Each node stores simple statistics about its neighbors: (query, NeighborID) tuples for recently answered requests from or through each neighbor, so that it can rank them
- For each new query, a node finds similar past queries and selects a direction
- How?

47 Informed Search Methods
Intelligent or Directed BFS: heuristics for selecting the direction
- >RES: returned the most results for previous queries
- <TIME: shortest satisfaction time
- <HOPS: minimum hops to results
- >MSG: forwarded the largest number of messages (of all types), suggesting the neighbor is stable
- <QLEN: shortest message queue
- <LAT: lowest latency
- >DEG: highest degree
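A few of these heuristics can be sketched as ranking functions over per-neighbor statistics. The statistics table and its field names below are hypothetical; a real node would update them from observed query traffic:

```python
# Hypothetical per-neighbor statistics accumulated from past queries.
neighbor_stats = {
    "A": {"results": 12, "hops": 2, "latency_ms": 40},
    "B": {"results": 3,  "hops": 1, "latency_ms": 15},
    "C": {"results": 7,  "hops": 4, "latency_ms": 90},
}

def pick_neighbor(stats, heuristic):
    """Rank neighbors by one of the Directed BFS heuristics and pick the best."""
    if heuristic == ">RES":     # most results for previous queries
        return max(stats, key=lambda n: stats[n]["results"])
    if heuristic == "<HOPS":    # fewest hops to results
        return min(stats, key=lambda n: stats[n]["hops"])
    if heuristic == "<LAT":     # lowest latency
        return min(stats, key=lambda n: stats[n]["latency_ms"])
    raise ValueError(f"unknown heuristic: {heuristic}")
```

Different heuristics can disagree: in this table ">RES" prefers neighbor A, while "<HOPS" and "<LAT" both prefer neighbor B.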

49 Informed Search Methods
APS (Adaptive Probabilistic Search):
- Again, each node keeps a local index with one entry for each object it has requested, per neighbor; this reflects the relative probability of that neighbor being chosen to forward the query
- k independent walkers with probabilistic forwarding: each node forwards the query to one of its neighbors based on the local index (for each object, choose a neighbor using the stored probability)
- If a walker succeeds, the probability is increased; otherwise it is decreased
- The update takes the reverse path back to the requestor, either after a walker miss (optimistic update) or after a hit (pessimistic update)
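The per-object index and its feedback updates can be sketched as follows. The initial weight and update amount are tunable parameters, not values fixed by the slide, and the additive update rule is one simple choice among several:

```python
import random

class APSIndex:
    """APS-style local index for a single object: one weight per neighbor."""

    def __init__(self, neighbors, init=30):
        self.weights = {n: init for n in neighbors}

    def choose(self, rng=random):
        """Pick a neighbor with probability proportional to its weight."""
        total = sum(self.weights.values())
        r = rng.uniform(0, total)
        acc = 0.0
        for n, w in self.weights.items():
            acc += w
            if r <= acc:
                return n
        return n  # fallback for floating-point edge cases

    def update(self, neighbor, hit, delta=10):
        """Feedback on the reverse path: raise the weight after a hit,
        lower it after a miss, but keep every neighbor selectable."""
        w = self.weights[neighbor] + (delta if hit else -delta)
        self.weights[neighbor] = max(1, w)
```

Over many queries, weights concentrate on neighbors whose walkers keep succeeding, so the walkers gradually steer toward where the object actually is.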
