Power Law Exploration

More Web Proxy on the site http://driver.im/

An Exploration of Power-Law Networks
Part I: Introduction

Owen Densmore
Sun Microsystems Laboratories owen.densmore@sun.com

Abstract

Recent activities within the Peer to Peer (P2P) movement have prompted interest in the theoretical underpinning of the self organizing networks at the center of P2P systems. The primary characteristic of these systems is "local knowledge", application level protocols with little support within the network (servers, multicast, naming etc.). This paper reviews a core set of seminal papers which are providing the basis for understanding the dynamics of these networks. We then present an exploration of the power-law networks which arise naturally within peer systems.

Introduction

Peer applications such as the Gnutella decentralized file sharing program, rely on a self organizing network of peers (hosts) which are easily and effectively searched for services. They rely on local routing decisions rather than global, unifying servers. The following four milestone papers (along w/ many others!) have lead to a deeper understanding of the ways peer networks operate, and how they may be optimized.

The Small World Problem Milgram, S. May 1967
Collective Dynamics of 'Small-World' Networks Watts, D; Strogatz, S. 1998
Mean-field Theory for Scale-Free Random Networks Barabási, A; Albert, R; Jeong, H. 1999
Search in Power-Law Networks Adamic, L; Lukose, R; Puniyani, A; Huberman, B. Mar 2001

Small Worlds and 6 Degrees of Separation

Stanley Milgram was fascinated by the "small world" concept: you make a new acquaintance and shortly establish surprising shared connections, ending in the exclamation "My, it's a small world". Milgram wanted to understand this at greater depth.

To do so he devised a fascinating social experiment designed to see if randomly chosen "source" people could find a "target" person using only a network of friends on a first name basis. Two studies were performed, one originating in Omaha, NB, the other in Wichita, KA; both targeting a person in Cambridge, MA.

The study required the source set of people to mail a folder to the target, who's name, address and some personal data was given. The rules required the sender to send to a friend if the target was not known. The friend was chosen to be the most likely to get the folder "closer" to the target. Each person sending the folder included themselves in an enclosed roster, thus avoiding loops.

The result: the median number of intermediate friends was 5, the range being from 2-10.

Thus began the now famous 6-degrees of separation notion; that between any two people in the US (say between yourself and the President) there are a small number of links. The Kevin Bacon game, where Hollywood actors are linked to the actor Kevin Bacon by movies they co-starred in, is an example of the phenomenon.

Although Milgram gave the problem a formal academic basis, he did not shed light on why it works:

Why is there such a small number of "hops"?
How do we find them "from the inside" .. i.e. with local knowledge only?

Rewiring Clustered Graphs

The beginning of the answer to the first question came from the authors of the second paper; Watts and Strogatz. They started by looking at graphs and the metrics for graphs common in social networks.

Two key ideas they focused on were Clustering and Path Length. The first concept is a formal way to state how "cliquish" a graph is .. how tightly the links of a node in the graph are connected to each other. In social terms, how many of your friends know each other. The second idea, Path Length, is how far apart are any two nodes in the graph. Clustering is measured by the Clustering Coefficient: the number of links among a node's immediate neighbors to each other, compared to the maximum number of interconnects they might have. Numerically, if a node has K neighbors (edges), then

CC=N/(K*(K-1)/2) where N is the number of edges among neighbors of the node, and K is the node's edges. The path is found by finding choosing a node, calculating the median distance to the rest of the graph, and averaging over all nodes.

They first looked at "regular" graphs, like the one on the left below. It has high clustering but also with a long path length. Here K=4. A given node has 4 neighbors. Looking at just these 4 neighbors, excluding the initial node, we find there are distinct 3 edges shared among them (N=3). This gives CC= 3/(4*3/2) or 0.5.

Their key finding is showing that by "rewiring" a very few edges to be randomly reconnected, the clustering remains high, while the path length (therefore searchability) dramatically reduces:

This work took an important step toward answering the first question, why are there a small number of hops, by showing how the path length drops through these random links (called weak social links due to being outside the cluster of strong links).

Dynamic Growth and Preferential Attachment

Unfortunately, however, the statistics resulting from the Watts/Strogatz model does not match the observed small worlds. These networks, such as power grids, hollywood actors, web pages, and peer networks, exhibit power-law distribution of edges among the nodes while the W/S model does not.

Barabási and his team found that two techniques resulted in power-law distributions:

Dynamic Construction
Preferential Attachment

The first constructs small worlds graphs dynamically, rather than rewiring a graph in place. Secondly, the rewiring of nodes occurs not randomly, but preferentially attaching to the most connected nodes. They introduced a family of such graphs which start with a small number of disconnected nodes. They then dynamically wire into the graph new nodes of fixed order K, with each edge being wired to existing nodes preferentially according to the number of edges the existing node has.

To the left, the example starts with 3 unwired nodes. It starts adding nodes one at a time, with two edges per new node. The edges are wired to destination nodes preferentially, with probability being proportional to edges already attached to the target node.

The sequence continues until all nodes are added to the graph. The resulting distribution of edges to nodes is power-law.

Local Knowledge Searching: Power-Law Strategy

This left the last of Milgram's questions to answer: how can a small world be searched by the nodes themselves; without external, global knowledge.

The authors of the last paper focused on the power-law nature of small worlds graphs and found that a simple strategy of seeking heavily connected nodes produces surprisingly good results. They include a specific recommendation for Gnutella peer networks that can dramatically improve both the bandwidth management and search time of that system.

They simulated their search algorithm on the data gathered by a crawl of the Gnutella network by Clip2. On a network of approximately 700 nodes, assuming a file exists on only one of the nodes, 50% of the files can be found in 8 steps or less.

Exploring Power-Law Networks

The results above, along with initial findings using the agent simulation system Repast, suggested such theoretical findings represented by these papers would form the core for future P2P infrastructure work. Toward that end, the following Power-Law Exploration was made.

An Exploration of Power-Law Networks Part I: Introduction