More Web Proxy on the site http://driver.im/
|
An Exploration of Power-Law Networks
Part I: Introduction
Owen Densmore
Sun Microsystems Laboratories
owen.densmore@sun.com |
|
Abstract
Recent activities within the Peer to Peer (P2P) movement have prompted
interest in the theoretical underpinning of the self organizing networks
at the center of P2P systems. The primary characteristic of these
systems is "local knowledge", application level protocols with little support
within the network (servers, multicast, naming etc.). This paper
reviews a core set of seminal papers which are providing the basis for
understanding the dynamics of these networks. We then present an
exploration of the power-law networks which arise naturally within peer
systems.
Introduction
Peer applications such as the Gnutella decentralized file sharing program,
rely on a self organizing network of peers (hosts) which are easily and
effectively searched for services. They rely on local routing decisions
rather than global, unifying servers. The following four milestone
papers (along w/ many others!) have lead to a deeper understanding of the
ways peer networks operate, and how they may be optimized.
-
The Small World Problem Milgram,
S. May 1967
-
Collective Dynamics of 'Small-World'
Networks Watts, D; Strogatz, S. 1998
-
Mean-field Theory for Scale-Free
Random Networks Barabási, A; Albert, R; Jeong, H. 1999
-
Search in Power-Law Networks
Adamic, L; Lukose, R; Puniyani, A; Huberman, B. Mar 2001
Small Worlds and 6 Degrees of Separation
Stanley Milgram was fascinated by the "small world" concept: you make a
new acquaintance and shortly establish surprising shared connections, ending
in the exclamation "My, it's a small world". Milgram wanted to understand
this at greater depth.
To do so he devised a fascinating social experiment designed to see
if randomly chosen "source" people could find a "target" person using only
a network of friends on a first name basis. Two studies were performed,
one originating in Omaha, NB, the other in Wichita, KA; both targeting
a person in Cambridge, MA.
The study required the source set of people to mail a folder to the
target, who's name, address and some personal data was given. The
rules required the sender to send to a friend if the target was not known.
The friend was chosen to be the most likely to get the folder "closer"
to the target. Each person sending the folder included themselves
in an enclosed roster, thus avoiding loops.
The result: the median number of intermediate friends was 5, the range
being from 2-10.
Thus began the now famous 6-degrees of separation notion; that between
any two people in the US (say between yourself and the President) there
are a small number of links. The Kevin Bacon game, where Hollywood
actors are linked to the actor Kevin Bacon by movies they co-starred in,
is an example of the phenomenon.
Although Milgram gave the problem a formal academic basis, he did not
shed light on why it works:
-
Why is there such a small number of "hops"?
-
How do we find them "from the inside" .. i.e. with local knowledge only?
Rewiring Clustered Graphs
The beginning of the answer to the first question came from the authors
of the second paper; Watts and Strogatz. They started by looking
at graphs and the metrics for graphs common in social networks.
Two key ideas they focused on were Clustering and Path Length.
The first concept is a formal way to state how "cliquish" a graph is ..
how tightly the links of a node in the graph are connected to each other.
In social terms, how many of your friends know each other. The second
idea, Path Length, is how far apart are any two nodes in the graph.
Clustering is measured by the Clustering Coefficient: the number of links
among a node's immediate neighbors to each other, compared to the maximum
number of interconnects they might have. Numerically, if a node has
K neighbors (edges), then
CC=N/(K*(K-1)/2)
where N is the number of edges among neighbors of the node, and
K is the node's edges. The path is found by finding choosing a node,
calculating the median distance to the rest of the graph, and averaging
over all nodes.
They first looked at "regular" graphs, like
the one on the left below. It has high clustering but also with a
long path length. Here K=4. A given node has 4 neighbors.
Looking at just these 4 neighbors, excluding the initial node, we find
there are distinct 3 edges shared among them (N=3). This gives CC=
3/(4*3/2) or 0.5.
Their key finding is showing that by "rewiring" a
very few edges to be randomly reconnected, the clustering remains high,
while the path length (therefore searchability) dramatically reduces:
This work took an important step toward answering the first question, why
are there a small number of hops, by showing how the path length drops
through these random links (called weak social links due to being outside
the cluster of strong links).
Dynamic Growth and Preferential Attachment
Unfortunately, however, the statistics resulting from the Watts/Strogatz
model does not match the observed small worlds. These networks, such
as power grids, hollywood actors, web pages, and peer networks, exhibit
power-law distribution of edges among the nodes while the W/S model does
not.
Barabási and his team found that two techniques resulted in power-law
distributions:
-
Dynamic Construction
-
Preferential Attachment
The first constructs small worlds graphs dynamically, rather than rewiring
a graph in place. Secondly, the rewiring of nodes occurs not randomly,
but preferentially attaching to the most connected nodes. They introduced
a family of such graphs which start with a small number of disconnected
nodes. They then dynamically wire into the graph new nodes of fixed
order K, with each edge being wired to existing nodes preferentially according
to the number of edges the existing node has.
To the left, the example starts with 3 unwired nodes. It starts
adding nodes one at a time, with two edges per new node. The edges
are wired to destination nodes preferentially, with probability being proportional
to edges already attached to the target node.
The sequence continues until all nodes are added to the graph.
The resulting distribution of edges to nodes is power-law. |
|
Local Knowledge Searching: Power-Law Strategy
This left the last of Milgram's questions to answer: how can a small world
be searched by the nodes themselves; without external, global knowledge.
The authors of the last paper focused on the power-law nature of small
worlds graphs and found that a simple strategy of seeking heavily connected
nodes produces surprisingly good results. They include a specific
recommendation for Gnutella peer networks that can dramatically improve
both the bandwidth management and search time of that system.
They simulated their search algorithm on the data gathered by a crawl of
the Gnutella network by Clip2. On a network of approximately 700
nodes, assuming a file exists on only one of the nodes, 50% of the files
can be found in 8 steps or less.
Exploring Power-Law Networks
The results above, along with initial findings using the agent simulation
system Repast, suggested such theoretical findings represented by these
papers would form the core for future P2P infrastructure work. Toward
that end, the following Power-Law Exploration was
made.