Peer Lifespan Trace
The Short Story
Part of our research addresses the problem of highly transient populations in unstructured and loosely structured peer-to-peer (P2P) systems. Due in part to the autonomous nature of peers, the mutual dependency built into their architecture, and their astonishingly large populations, the transiency of peer populations (a.k.a. churn) and its implications for P2P systems have recently attracted the attention of the research community. Measurement studies of deployed P2P systems have reported median session times ranging from one hour down to one minute. As a reference, given Kazaa's current population of close to 2,300,000, this translates into over 38,000 peers joining and leaving the system every minute! The implications of such a degree of transiency for the overall system's performance clearly depend on the level of peers' investment in their friends. At the very least, the number of maintenance-related messages processed by any node is directly correlated with the stability of the node's neighbor set. Beyond this, and in the context of data-sharing P2P systems, the level of replication, the effectiveness of caches, and the spread and satisfaction rate of queries will all be affected by how dynamic the peer population ultimately is.
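The turnover figure above can be reproduced with a back-of-the-envelope calculation. The sketch below assumes a steady-state population and treats the reported session time as an average (Little's law, population = rate × session time); both are simplifying assumptions, and the function name is illustrative. The 38,000-per-minute figure is consistent with the one-hour end of the reported range.

```python
def churn_per_minute(population: int, session_minutes: float) -> float:
    """Peers replaced per minute in steady state (Little's law: N = rate * T)."""
    return population / session_minutes


KAZAA_POPULATION = 2_300_000  # population figure quoted in the text

# One-hour sessions: the case that matches the figure quoted in the text.
print(round(churn_per_minute(KAZAA_POPULATION, 60)))
```

Shorter sessions push the rate up proportionally, which is why the lower end of the reported range is so demanding for maintenance traffic.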
Through active probing of peers in a widely deployed P2P system (collecting over half a million sessions), we determined that the session time of peers can be well modeled by a Pareto distribution. In our context, this means that the expected remaining session time of a peer is directly proportional to its session's current length. This observation forms the basis for a set of new protocols for peer organization and query-related strategies that, by taking into account the expected session times of peers (their lifespans [1]), yield systems with performance characteristics more resilient to the natural instability of their environments.
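The proportionality claim follows from a standard property of the Pareto distribution: for shape parameter alpha > 1, a session that has already lasted t has an expected remaining time of t / (alpha - 1). The sketch below checks this numerically. The shape value 3.0 is an arbitrary illustration, not the parameter fitted to the trace, and the function names are hypothetical.

```python
import random


def expected_remaining(current_length: float, alpha: float) -> float:
    """Mean residual life of a Pareto(alpha) session that has lasted current_length.

    Valid for alpha > 1 and current_length at or above the minimum session time.
    """
    if alpha <= 1:
        raise ValueError("the mean is infinite for alpha <= 1")
    return current_length / (alpha - 1)


def simulated_remaining(current_length: float, alpha: float,
                        samples: int = 200_000, seed: int = 0) -> float:
    """Monte Carlo estimate of the same quantity (minimum session time = 1)."""
    rng = random.Random(seed)
    draws = (rng.paretovariate(alpha) for _ in range(samples))
    # Keep only sessions that outlive current_length, then average the excess.
    excess = [x - current_length for x in draws if x > current_length]
    return sum(excess) / len(excess)


ALPHA = 3.0  # illustrative shape parameter, an assumption
for t in (2.0, 4.0):
    print(t, expected_remaining(t, ALPHA), round(simulated_remaining(t, ALPHA), 2))
```

Doubling the current session length doubles the expected remaining time, which is what makes older peers better candidates for long-lived relationships.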
People
- Yi Qiao
- Fabián E. Bustamante
Publications
- Fabián E. Bustamante and Yi Qiao. Friendships that last: Peer lifespan and its role in P2P protocols. In Proc. of the International Workshop on Web Content Caching and Distribution, September-October 2003 (also published as Tech. Report NWU-CS-03-21).
Resources - Peer Lifespan Trace
Trace Description
To actively measure the lifespan of peers in Gnutella, we modified an open source Gnutella client (Mutella) to both keep track of every peer found and periodically check its availability. Our monitoring peer maintains a hash table, initially empty, of peers it has seen so far. Each entry in the hash table includes fields for (1) IP:port of peer, (2) node type (leaf- or ultra-peer), (3) time of birth (TOB), (4) time when found (TWF), and (5) time of death (TOD).
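The table entry described above can be pictured as a small record keyed by the peer's address. The following is a minimal sketch, not the code of the modified Mutella client; the class and field names are hypothetical, and timestamps are assumed to be seconds since the epoch.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class PeerEntry:
    address: str                 # (1) "IP:port" of the peer
    node_type: str               # (2) "leaf" or "ultra"
    tob: Optional[float] = None  # (3) time of birth, None while unknown
    twf: Optional[float] = None  # (4) time when found
    tod: Optional[float] = None  # (5) time of death, None while alive


# The monitoring peer's table starts out empty and is keyed by address.
peer_table: Dict[str, PeerEntry] = {}

# Example insertion with made-up values (documentation address, arbitrary time).
entry = PeerEntry(address="192.0.2.10:6346", node_type="leaf", twf=1046476800.0)
peer_table[entry.address] = entry
```

Leaving the unset timestamps as `None` keeps "not yet known" distinct from any real time value.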
On each iteration, the monitoring peer updates the existing entries and inserts new ones as it finds new peers. Since it only knows with certainty the TOB of previously known and reborn peers, peers found (alive) for the first time are included in the table with only the TWF field set to the current time. A peer is considered dead when a connection attempt fails (i.e., a third try times out, using the default timeout value of 10 seconds) or an unexpected response is received. Please refer to our paper for details on the strategy used for updating peer lifespan information.
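The death test described above amounts to a bounded retry loop. The sketch below is an approximation under stated assumptions: it only checks whether a TCP connection can be opened, whereas the actual monitor speaks the Gnutella protocol and also treats an unexpected response as death. The names `tcp_probe` and `is_dead` are hypothetical.

```python
import socket
from typing import Callable

ATTEMPTS = 3            # the peer is declared dead when the third try fails
TIMEOUT_SECONDS = 10.0  # default timeout quoted in the text


def tcp_probe(host: str, port: int, timeout: float) -> bool:
    """Return True if a TCP connection opens within the timeout (assumed probe)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers timeouts, refusals, and unreachable hosts
        return False


def is_dead(host: str, port: int,
            probe: Callable[[str, int, float], bool] = tcp_probe,
            attempts: int = ATTEMPTS,
            timeout: float = TIMEOUT_SECONDS) -> bool:
    """A peer is dead only if every one of the attempts fails."""
    for _ in range(attempts):
        if probe(host, port, timeout):
            return False
    return True
```

Passing the probe as a parameter lets the retry logic be exercised without touching the network, for example `is_dead("192.0.2.1", 6346, probe=lambda h, p, t: False)`.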
A single monitoring peer scanning the whole table would clearly be too slow, resulting in too coarse a granularity for our lifespan measurements. To avoid this, we evenly distribute the peer table (based on the hash values of peers) over 20 monitoring peers running across 17 hosts. This approach allows us to achieve a granularity of 1,300 seconds (about 21 minutes), with each client scanning 30,000 to 40,000 entries.
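One simple way to split a table by hash value is to map each peer's address to a monitor index. The sketch below assumes SHA-1 over the "IP:port" string taken modulo the number of monitors; the text does not name the hash function actually used, so this choice and the function name are illustrative.

```python
import hashlib
from collections import Counter

NUM_MONITORS = 20  # number of monitoring peers given in the text


def monitor_for(address: str, num_monitors: int = NUM_MONITORS) -> int:
    """Map a peer's "IP:port" string to the index of the monitor responsible for it."""
    digest = hashlib.sha1(address.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_monitors


# Synthetic addresses, used only to show that the split is roughly even.
addresses = [f"10.{i // 250}.{i % 250}.1:6346" for i in range(20_000)]
load = Counter(monitor_for(a) for a in addresses)
print(min(load.values()), max(load.values()))
```

Because the mapping depends only on the address, a peer that dies and later reappears is handled by the same monitor, which keeps its entry in one place.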
This trace, made available here for the benefit of the open research community, includes over 500,000 peer lifespans measured in the Gnutella network between March 1st and 8th, 2003. To protect sensitive IP information, we have encrypted the trace using Georgia Tech's Crypto-PAn, a prefix-preserving IP anonymizer [2].
[1] We employ lifespan and session time interchangeably. Another metric of transiency sometimes used, lifetime, refers instead to the time between the node first entering the system and its final departure from it.
[2] J. Xu, J. Fan, M. Ammar, and S. Moon. On the Design and Performance of Prefix-Preserving IP Traffic Trace Anonymization. In Proc. of the ACM SIGCOMM Internet Measurement Workshop, November 2001.