CHARACTERIZING INTERNET TOPOLOGY, ROUTING AND HIERARCHY
by
Hongsuda Tangmunarunkit
A Dissertation Presented to the
FACULTY OF THE GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(COMPUTER SCIENCE)
May 2004
Copyright 2004 Hongsuda Tangmunarunkit
Reproduced with permission of the copyright owner. Further reproduction prohibited without permission.
UMI Number: 3140561
UMI Microform 3140561
Copyright 2004 by ProQuest Information and Learning Company.
All rights reserved. This microform edition is protected against
unauthorized copying under Title 17, United States Code.
ProQuest Information and Learning Company
300 North Zeeb Road
P.O. Box 1346
Ann Arbor, MI 48106-1346
Dedication
To my parents and family
for their love and encouragement throughout the years.
Acknowledgments
I am deeply grateful to my advisor, Ramesh Govindan, for his invaluable advice, guidance,
encouragement and support throughout the years, as well as his patience and understanding.
I am also thankful to Scott Shenker, for his invaluable suggestions and guidance;
Deborah Estrin for her advice and for giving me opportunities to work with many great
people; Walter Willinger and Sugih Jamin for their collaboration in the topology work;
and finally, Ashish Goel for his comments and feedback on graph theory and algorithms
related to this thesis.
I am thankful to my family, my brothers and sisters for their love and support throughout
my years abroad.
I would like to thank Karl Czajkowski for many discussions about optimization techniques
and computational resources. He also encouraged me and reminded me of my goals
during difficult times. Without him, my work would have been much harder. I would
also like to thank many of my friends for their encouragement, discussion, suggestions and
feedback: Chalermek Intanagonwiwat, Sukumol Imudom, Fabio Silva, and many others
both at USC and ISI.
At different stages, this work was supported by the Defense Advanced Research Projects
Agency through the SCAN project at ISI under grant DABT63-98-1-0007 and the Yoid/Yallcast
project at ISI under Cooperative Agreement No. F30602-00-2055.
Finally, I would like to thank the Center for Grid Technologies at ISI for the long
term use of their computer cluster and disk space. Without their resources, it would have
taken me much longer to finish my simulations. I would also like to thank the University
of Oregon Route Views Project (http://www.routeviews.org/) and the National Laboratory
for Applied Network Research (NLANR) (http://www.nlanr.net/) that have provided the
Internet research community with free access to invaluable network-related information.
Table of Contents
Dedication
Acknowledgments
List of Figures
Abstract
1 Introduction
2 The Impact of Routing Policy on Internet Paths
2.1 Introduction
2.2 Methodology
2.3 Results
2.3.1 Inflating Unicast Paths
2.3.2 Finding detours
2.3.3 Path Concentration Around Large ASs
2.3.4 Multicast
2.4 Sensitivity Analysis
2.4.1 Sensitivity to Different Snapshot
2.4.2 Sensitivity to More Realistic Policy Model
2.4.2.1 Methodology
2.4.2.2 Result
2.5 Conclusions
3 Topology Characterization
3.1 Introduction
3.2 Related Work
3.3 Networks
3.3.1 Measured Networks
3.3.2 Generators
3.3.3 Canonical Networks
3.4 Metrics
3.4.1 Rate of spreading: Expansion
3.4.2 Existence of alternate paths: Resilience
3.4.3 Tree-like behavior: Distortion
3.4.4 Summary
3.5 Results
3.5.1 Expansion
3.5.2 Resilience
3.5.3 Distortion
3.5.4 Discussion
3.5.5 Other metrics
3.6 Other Power-Law Network Generators: Does Connectivity Matter?
3.6.1 Does Connectivity Matter?
3.6.2 Answers
3.6.3 Conclusion
3.7 Discussion
4 Other Networks
4.1 Introduction
4.2 Results
4.3 Conclusion
5 Network Hierarchy
5.1 Introduction
5.2 Related Work
5.3 Hierarchy Metrics
5.3.1 Weighted Traversal Set (WTSET)
5.3.2 Weighted Vertex Cover (WVC)
5.3.3 WTSET vs. WVC
5.4 Results & Discussion
5.4.1 Networks
5.4.2 Results
5.4.2.1 WTSET
5.4.2.2 WVC
5.4.2.3 Summary
5.4.3 Validation
5.4.4 Discussion
5.4.4.1 Weighted Traversal Set versus Weighted Vertex Cover
5.4.4.2 Unweighted versus Weighted Metrics
5.5 Hierarchy Characteristic
5.6 Correlation between link usage and degree
5.7 Other power-law variant generators?
5.7.1 Results
5.8 Conclusion
6 Does AS Size determine degree in AS topology?
6.1 Introduction
6.2 An Alternative Explanation
6.3 Methodology and Results
6.4 Conclusions
7 Conclusions & Future Directions
7.1 Conclusions and Contributions
7.2 Future Directions
7.2.1 Network Hierarchy Analysis
7.2.2 Inferring backbones
7.2.3 The Impact of Topology on Protocol Performance
7.2.4 Topology Analysis
7.2.5 Internet Topology Modeling
7.2.6 The Origin of the AS Power-law Size Distributions
Reference List
Appendix A
Peering Relationship Identification on the AS overlay map
Appendix B
Results for Other Metrics
Appendix C
Connectivity Sensitivity Analysis
C.0.7 results
Appendix D
Parameter Space Exploration
Appendix E
Policy-induced ball growing
Appendix F
Degree vs. Size
List of Figures
2.1 Macroscopic comparison of AS overlay and the AS topology
2.2 Distribution of path length difference between node pairs in AS overlay and AS paths in the routing table dump
2.3 Validating the shortest AS path model for routing policy
2.4 Cumulative distribution of inflation ratio
2.5 Cumulative distribution of inflation difference
2.6 Cumulative distribution of inflation ratio by path length
2.7 Cumulative distribution of inflation difference by path length
2.8 Comparison of inflation ratio and difference for traceroutes and for shortest AS path
2.9 Comparison of inflation ratio and difference for Lucent traceroutes and for shortest AS path
2.10 Detour gain and gain ratio distributions
2.11 Cumulative dominance fractions for the top 50 ASs
2.12 The effect of policy on multicast tree sizes
2.13 Inflation difference and inflation ratio
2.14 Inflation difference and inflation ratio with respect to different shortest-path-lengths
2.15 Inflation difference and inflation ratio by the realistic and simplified routing policy model
3.1 Table of network topologies used. See Appendix D for a description of parameters for the generated networks
3.2 Degree Distributions for various graphs
3.3 Expansion
3.4 Resilience
3.5 Distortion
3.6 Degree Distributions of PLRG-Variant Networks
3.7 PLRG Variants
3.8 PLRG Variants
3.9 Degree Distributions of various random connectivity methods using PLRG degree distribution as the initial distribution
3.10 Expansion, Resilience and Distortion of various random connectivity networks using PLRG degree distribution as the initial distribution
4.1 Degree Distributions of other real networks
4.2 Our metrics for other real networks
5.1 An illustration of link value computation
5.2 Example topology
5.3 Table of network topologies used. See Appendix D for a description of parameters for the generated networks
5.4 The normalized weighted traversal set distribution
5.5 The link value rank distribution
5.6 Top 10 highest links on the AS map based on WTSET
5.7 Top 10 highest links on the RL map based on WTSET
5.8 Top 10 highest links on the AS map based on WVC
5.9 Top 10 highest links on the RL map based on WVC
5.10 Correlation between the two metrics
5.11 The normalized traversal set distribution (x-axis on log scale)
5.12 The link value rank distribution (x-axis on log scale)
5.13 Fraction of valid paths
5.14 Correlation between minimum degree and link value based on WTSET
5.15 Correlation between minimum degree and link value based on WVC
5.16 The link value (WTSET) rank distribution
5.17 The link value rank (WVC) distribution
6.1 Complementary cumulative distributions of AS size
6.2 Complementary cumulative distributions of AS degree
6.3 Correlation between size and degree
6.4 Correlation between age and degree
B.1 The distribution of eigenvalues of a graph plotted against their rank [26]
B.2 The distribution of node diameters. This is a modified version of the graph diameter metric proposed in [81]
B.3 The vertex cover of the subgraphs within balls of size n, as a function of ball size
B.4 The number of biconnected components within a subgraph defined by a ball of size n, as a function of ball size
B.5 Figures (a)-(c) depict the attack tolerance [4] of our networks. This measures the average path-length of the largest connected component when increasingly larger fractions of nodes are removed, in order of decreasing degree. Figures (d)-(f) plot the error tolerance; the average path length when nodes are removed randomly
B.6 Clustering Coefficient of a subgraph defined by a ball of size n, as a function of ball size
C.1 Degree Distributions of various deterministic connectivity methods using PLRG degree distribution (5969 nodes and 10001 links) as the initial distribution
C.2 Degree Distributions of various random connectivity methods using PLRG degree distribution as the initial distribution
C.3 Expansion, Resilience and Distortion of various deterministic connectivity networks using PLRG degree distribution as the initial distribution
C.4 Expansion, Resilience and Distortion of various random connectivity networks using PLRG degree distribution as the initial distribution
C.5 Degree Distribution, Expansion, Resilience and Distortion of various random connectivity networks using a different instance of PLRG degree distribution (6280 nodes and 10698 links) as an initial distribution
C.6 Degree Distribution, Expansion, Resilience and Distortion of various random connectivity networks using BA degree distribution (20000 nodes and 99975 links) as the initial distribution
C.7 Degree Distribution, Expansion, Resilience and Distortion of various random connectivity networks using Brite degree distribution (5000 nodes and 9996 links) as the initial distribution
D.1 Parameters explored for structural generators
E.1 AS annotated graph with A as the center of the ball
F.1 Actors Map: Correlation of size and degree = 0.77
F.2 Actresses Map: Correlation of size and degree = 0.63
F.3 Airline Map: Correlation of size and degree = 0.80
Abstract
Network researchers have come to rely on artificially generated networks and simulations
for their studies, because it is impractical to analyze protocols on large scale networks.
Topology often has a major impact on the performance of protocols. Therefore, it is
important to produce generated topologies that capture the fundamental characteristics
of real networks. Researchers now have approximations of the real Internet topology: the
Autonomous System (AS) connectivity map and router-level connectivity (RL) map. We
can analyze these maps to gain a deeper understanding of the Internet’s structure, as well
as use them to evaluate existing topology generators.
We use the AS and RL maps for two studies. First we explore the overall efficiency
of the Internet’s routing infrastructure and topology. In particular, we study the impact
of routing policy on Internet paths, finding that routing policy does inflate the length of
Internet paths significantly. Next, we use several topology metrics to characterize the real
maps and several generators. We find that the AS, RL, and many other networks are well
modeled by what we call degree-based generators—a family of generators that focus solely
on generating networks with power-law degree distributions. We then explain this result
by using a few hierarchy metrics to examine the nature of hierarchy in these networks. We
find that degree-based generators produce a form of hierarchy that closely resembles the
hierarchical nature of the Internet. We also find that the hierarchy in the router graph
arises from the deliberate placement of links in the Internet while in the AS graph the
hierarchy is due to the long-tailed nature of AS node degrees.
Chapter 1
Introduction
The Internet is rapidly growing in size and functionality. It is costly, and often impractical,
to test or study protocols on large-scale networks. Therefore, many network researchers
have come to rely on artificially generated networks and simulations for their studies. In
simulation-based studies, there are two fundamental components—the routing protocols
and the artificially generated networks. A routing protocol is used to determine how
packets should be forwarded between any two nodes in the network. A generated or
synthesized network is a representation of a network of interest. In the simulations, the
routing models and network generators are normally chosen to reflect the existing routing
protocols that are being deployed in the Internet, and to capture the fundamental properties
of the Internet topology, respectively. These models often impact the performance, and
therefore conclusions of the studies. But what is the impact of these models on simulation
studies and how well do these models represent the real deployed Internet? We attempt to
address these questions in this thesis.
Most simulation-based studies use a shortest-path routing protocol due to its
simplicity and the lack of more realistic routing models. However, routing in the Internet
is based on hierarchical routing, or specifically on policy-based routing. Depending on its
policy, each administrative domain or Autonomous System (AS) can independently select or
advertise its own routes. The path that a packet takes is normally determined by both
intra-domain routing protocols (for forwarding packets within one administrative domain) and
inter-domain routing protocols (for determining how packets should be forwarded across
multiple domains). Policy-based routing trades off path performance (in terms of hop-count)
against scalability and the ability for each administrative domain to exercise its own
policy. Despite the common acceptance that policy routing inflates the shortest router-hop
path, there has not been much work to quantify the performance of policy-based routing
on Internet paths or how policy-based routing would impact results in simulation studies.
We further investigate this issue in Chapter 2 of this thesis.
Although network topology should have no effect on the correctness of network protocols,
topology sometimes has a major impact on their performance. For this reason,
it is important to generate realistic network topologies, i.e., topologies that embody the
fundamental characteristics of real networks. Because the fundamental characteristics of
the Internet topology were unknown, many synthesized topologies were generated based
on assumptions about the Internet structure [12, 20], for example, based on the assumption
that the Internet is a network of networks that is arranged in a hierarchical manner.
Whenever new findings or discoveries were revealed, many new models were proposed to
accommodate the new findings. For example, since Faloutsos et al. published their paper
reporting that the degree distribution of Internet topology follows a power-law in 1999,
there have been many new generators that were primarily designed to generate networks
matching the Internet’s degree distribution. Due to this process, we now have many models
for synthesizing the Internet topology, each of which emphasizes different characteristics
of the Internet. However, many fundamental questions related to network topologies still
remain unanswered. For example: What are the fundamental characteristics of a real network
such as the Internet? How realistic are the current network models being used in the
Internet community, compared to the Internet topology? And what is the impact of these
characteristics on protocol performance? We address some of these questions in Chapters 3,
4, 5, and 6.
The preceding questions cannot be answered without access to snapshots of Internet
topology. Thanks to advances in Internet topology discovery techniques and considerable
effort expended in collecting the Internet data, researchers now have access to two representations
of the Internet topology: the Autonomous System (AS) connectivity map and
the router-level (RL) connectivity map. As a result, we can now analyze these two maps to
gain a deeper understanding about the efficiency of the Internet protocols and the structure
of the Internet. In this thesis, we use the AS and RL maps for two studies.
Our first study involves understanding the overall efficiency of the Internet’s routing
infrastructure by analyzing the impact of routing policy on its paths. Using a simplified
model of routing policy in the Internet, we obtain approximate indications of the impact
of policy routing on Internet paths. Our investigation in Chapter 2 reveals that routing
policy affects the length of Internet paths significantly. For example, we find that 80% of
the Internet paths are inflated by policy, i.e., the router-level path-lengths of these
policy paths are longer than the corresponding shortest paths. Of these inflated paths, 20% are inflated
by more than 50%. There are some paths that are inflated by a factor of four, and some
are inflated by 25 hops. We also find that, for 50% of the policy paths, there exist superior
detour paths (e.g., there exists an intermediate node such that the composite path through
the intermediate is shorter than the policy path). We confirm that these observations hold
for different snapshots of Internet topology and a more sophisticated routing policy model.
In our second study, we investigate the topological structure of the Internet by analyzing
large scale characteristics (such as network resilience and hierarchy) of the Internet
topologies and other networks. In particular, we use several topology metrics (Chapter 3) to
compare synthesized networks generated by three families of generators with many real and
well-known networks. The three families of network generators are the random-based generators,
which generate networks based on a variant of the classical random network model; the
structural generators, which primarily focus on generating networks with the well-accepted
hierarchical structure of the Internet; and the degree-based generators, which focus entirely
on generating networks with the recently observed power-law degree distribution of the
Internet. Surprisingly, our investigations in Chapters 3 and 4 reveal that despite the long-held
belief that the Internet topology is hierarchical, the large scale characteristics of the
Internet maps and many other real networks are well-modeled by degree-based generators.
In Chapter 5, we investigate this seeming paradox by proposing two metrics to measure
the level of network hierarchy in these networks. According to our metrics, we discover
that both the AS and RL maps, as well as the degree-based networks, have moderate levels
of hierarchy. The hierarchies of the degree-based networks and the AS map are due to
the long-tailed nature of their degree distributions. In contrast, the moderate hierarchy
in the Internet RL map, similar to the structural network, is a result of deliberate link
placements. Our conclusions hold regardless of the routing protocols that are used in the
analysis, i.e., regardless of whether we use shortest-path routing or policy routing.
Finally, in Chapter 6, we extract AS-overlay graphs from different snapshots of Internet
router-level topology to further analyze the AS size distribution, where size is defined by
the number of routers in one AS. We find that AS sizes follow a power-law
distribution and that there is a strong correlation between AS size and degree. Due to the
ubiquity of highly variable size distributions (e.g., power-law size distributions) in many
real-world entities such as file sizes, we conjecture that the power-law degree distribution in
the AS topology simply follows from its power-law size distribution.
This dissertation is organized as follows. Chapter 2 contains a summary of our work on
the impact of routing policy on Internet paths. Chapter 3 details our topology metrics and
the topology characterization. We apply the same topology metrics to measure large scale
properties of other real networks and present the results in Chapter 4. The analysis of
network hierarchy is presented in Chapter 5. Chapter 6 explains our conjecture regarding
the origin of the AS power-law degree distribution and the rationale behind it. Each chapter
contains its own topically related work. Finally, Chapter 7 concludes this thesis.
Chapter 2
The Impact of Routing Policy on Internet Paths
The impact of routing policy on Internet paths is poorly understood. In theory, policy
can inflate shortest-router-hop paths. To our knowledge, the extent of this inflation has
not been previously examined. Using a simplified model of routing policy in the Internet,
we obtain approximate indications of the impact of policy routing on Internet paths. Our
findings suggest that routing policy significantly impacts the length of Internet paths. For
instance, in our model of routing policy, some 20% of Internet paths are inflated by more
than five router-level hops. In this chapter, we present our initial findings of the impact
of policy on Internet paths. In particular, our findings reveal answers to the following
questions: How does policy-based routing affect Internet paths? For a source-destination
pair, does there exist a detour path, and how good is the best detour path compared with the
policy path? And does policy routing funnel Internet paths through larger ASs? Finally,
we show at the end of the chapter that our findings also hold with respect to different
snapshots of Internet topology and a more sophisticated routing policy model.
2.1 Introduction
The earliest Internet routing protocols attempted to construct lowest-delay paths to destinations
[42]. Thereafter, based on operational experience with the stability of delay-sensitive
routing [37], deployed routing protocols evolved to essentially support global shortest hop-
count routing [23].
Today’s Internet contains several administrative domains (or Autonomous Systems,
ASs). Within a domain, routing uses hop-count as a metric, but because intra-domain
routing protocols support hierarchies, the resulting paths are not necessarily shortest in
terms of hops. Routing between domains is determined by policy. Each autonomous
system (AS) can, based on configured policy, independently select routing information
from its neighboring ASs, and selectively propagate this information. These policies are
not expressed in terms of hop-distance to destinations. Depending on how these policies are
constructed, the resulting policy-based paths to destinations may incur more router-level
hops than shortest-router-hop path routing.
In this chapter, we ask the question: By how much does this hierarchical (inter/intra-domain)
form of routing affect Internet paths? This question was motivated by recent
work [62] that observed that, for a significant fraction of Internet paths, there existed an
intermediate node such that the composite path through the intermediate exhibited better
performance (delay, throughput). In other words, routing in the Internet does not result in
delay- or throughput-optimal paths. Perhaps this anomaly can be rectified by changing the
Internet’s routing infrastructure to be delay or load sensitive. Before we do this, however,
it would be appropriate to understand how much of these observations can be explained
by the fact that routing hierarchy and policy can result in longer hop paths. Our work
takes the first step towards this goal. Understanding this question can also be important
for understanding the overall efficiency of the Internet’s routing infrastructure. Finally, an
answer to this question can also inform protocol evaluation studies which typically assume
shortest-router-hop path (henceforth, shortest path) routing.
To understand how policy routing affects Internet paths, we use a simplified model of
inter-AS routing policy that we call shortest AS path (Section 2.2). Even though, in theory,
routing policy can be completely arbitrary, many (but not all) existing routing policies
are based on shortest AS paths. To infer the router-level path corresponding to this policy,
we first begin with a router-level map of the Internet. On this map, we assign routers to
ASs and obtain an AS overlay on top of our router-level map. This construction enables
us to compare the router-level policy path between any two nodes with the shortest path
on the map. Each of the steps in our construction represents a simplification of reality. As
such, then, our results are only approximate indications of the impact of policy on Internet
paths. However, at each stage, we carefully validate our construction using a collection of
actual traceroutes that represent real paths generated by policy-based routing. This gives
us some confidence that our conclusions are meaningful.
We find several surprising results (Section 2.3). On average, about 20% of Internet
paths are inflated by 50% or more. We also find that about half of the source-destination
pairs benefit from a detour. For these pairs, there exists an intermediate node—a detour—
such that the overall policy path length through this intermediate node is less than the
policy path between source and destination.
To our knowledge, no related work has addressed the impact of routing policy on
Internet paths. Our work, however, complements many pieces of recent work aimed at
understanding the structure of the Internet, and the properties of its paths.
• Several efforts have focused on discovering router level topologies of the Internet [11,
18, 31]. Such mapping efforts are the crucial first step to help us understand the
impact of routing policy.
• Other work has empirically studied the availability characteristics of paths [53], the
loss and packet delivery performance of paths [54], and the existence of alternate paths
with lower delay or higher throughput [62]. By considering hop-distance between
network nodes, our work examines the potential inefficiencies resulting from policy
routing.
• Finally, more recent work has looked at macroscopic properties of the inter-AS topology
[25, 57]. By relating the AS structure to the underlying router-level map, our
AS overlay may be able to explain some of the observed macroscopic properties of
the inter-AS topology in terms of the underlying physical structure.
To place our work in context, we point out two important caveats. First, it is well
known that hierarchical routing can result in non-optimal paths [39]. This chapter quantifies
the extent to which hierarchical routing in the Internet, together with routing policy, affects
paths. Second, the correlation, if any, between path length and end-to-end delay is poorly
understood. As such, then, our results cannot be directly extrapolated to observed delays
on Internet paths. Nevertheless, our results are interesting since path hop count is the
yardstick by which today’s operational routing protocols are measured. Whether hop-count
is the right metric for routing protocols is an orthogonal question that we do not
address.
2.2 Methodology
Our first step in understanding the impact of routing policy on Internet paths was to
obtain an Internet map. We used the Mercator program [31] for this purpose. Mercator
uses hop-limited probes—the same primitive used in traceroute—to infer a router-level
Internet map. It employs several heuristics to obtain as complete a representation of
Internet topology as possible. One heuristic, informed random address probing, carefully
explores the IP address space for addressable routers and hosts. Mercator also exploits
source-route capable routers wherever possible to help discover cross-links and thereby
enhances the fidelity of the resulting map. Finally, it implements a technique for resolving
interface aliases (interfaces belonging to the same router).
The map we used was collected between March 26, 2000 and April 10, 2000. Our map
has 102,639 nodes and 142,303 links. This map is smaller than the maps reported in [31].
Despite this, we believe we have, as [31] does, captured the transit portion of the Internet
core, where policy impacts paths. In addition to the Internet map, we also collected the
61,485 traceroutes used in inferring the map. Because these traceroutes represent actual
policy paths, we were able to use them to validate our policy model (see below).
Next, we attempted to compute an AS overlay on top of this Internet map. To do so,
we assigned an autonomous system (AS) number to each router in our Internet map. For
this, we used a BGP routing table from a publicly accessible route server¹. For a given
router interface address, we found the matching route entry in the BGP routing table. We
then assigned that router to the origin AS in the AS path associated with the route entry.
This technique works for globally routed addresses; [31] describes situations where some
ASs number their routers from private address space. These addresses represent poten
tial inaccuracies in our AS overlay computation. There were 3210 such private interface
addresses in our map; to these we assigned a designated unused AS number.
Not all IP addresses had matching entries in the BGP routing table. Furthermore, in
the routing table, route aggregation can mask the actual origin AS. For these reasons, we
also used the RADB² to determine the origin AS. Finally, if a router had many interfaces
corresponding to different AS numbers, we picked the most frequently assigned AS for that
router. In spite of using these two sources of information, we were unable to resolve the
AS numbers for 497 non-private IP addresses. To these we assigned another designated
unused AS number.
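The AS assignment described above can be sketched as a longest-prefix match followed by a majority vote across a router's interfaces. This is only an illustrative sketch, not the code used for the thesis: the function names, the `(prefix, as_path)` route format, and the two placeholder AS numbers are assumptions, and the RADB fallback is omitted.

```python
import ipaddress
from collections import Counter

# Hypothetical placeholders for the two "designated unused" AS numbers.
PRIVATE_AS = 65534
UNRESOLVED_AS = 65535

def origin_as(addr, routes):
    """Return the origin AS of the longest matching prefix, or None.

    routes is a list of (prefix, as_path) pairs; the origin AS is taken
    to be the last AS in the path.
    """
    ip = ipaddress.ip_address(addr)
    best = None  # (prefix length, origin AS)
    for prefix, as_path in routes:
        net = ipaddress.ip_network(prefix)
        if ip in net and (best is None or net.prefixlen > best[0]):
            best = (net.prefixlen, as_path[-1])
    return None if best is None else best[1]

def assign_router_as(interfaces, routes):
    """Pick the AS most frequently assigned to a router's interfaces."""
    votes = Counter()
    for addr in interfaces:
        if ipaddress.ip_address(addr).is_private:
            votes[PRIVATE_AS] += 1
            continue
        asn = origin_as(addr, routes)
        votes[UNRESOLVED_AS if asn is None else asn] += 1
    return votes.most_common(1)[0][0]
```

A linear scan over the routing table keeps the sketch short; a real table with many thousands of prefixes would normally be stored in a trie.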
After assigning each node an AS number, we then applied a simple collapsing algorithm
to generate the AS overlay. The collapsing algorithm recursively marks neighbors belonging
to the same AS with the same “color”. Each color represents a node in the AS-level map.
However, due to our incomplete information (both in the Internet map [31] and in AS
number assignment), there were many disjoint clusters of nodes belonging to the same AS.
In most cases, we found that such ASs normally have one large component with many small
components each with a small number of nodes. We solved this problem by identifying the
¹ route-views.oregon-ix.net; data used with permission from David Meyer.
² whois.radb.net
biggest cluster of each AS and re-assigning the smaller components (about 20,000 nodes
total) to the topologically nearest AS. The resulting AS overlay consists of 2,662 nodes and
4,851 links.
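The collapsing step can be illustrated with a breadth-first "coloring" of neighbors that share an AS number. The sketch below is an assumption-laden illustration: the adjacency-dict input, the function names, and the choice to derive AS-level links from any router link that crosses ASs are ours, and the re-assignment of small clusters to the topologically nearest AS is left out because the text does not specify how "nearest" is computed.

```python
from collections import deque

def same_as_clusters(adj, asn):
    """Group routers into connected clusters whose members share one AS.

    adj maps each router to its neighbors (undirected); asn maps router -> AS.
    """
    seen, clusters = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        cluster, queue = {start}, deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                # Only spread the "color" to unvisited neighbors in the same AS.
                if v not in seen and asn[v] == asn[start]:
                    seen.add(v)
                    cluster.add(v)
                    queue.append(v)
        clusters.append(cluster)
    return clusters

def largest_cluster_per_as(clusters, asn):
    """Return, for each AS, its biggest cluster of routers."""
    best = {}
    for cluster in clusters:
        a = asn[next(iter(cluster))]
        if a not in best or len(cluster) > len(best[a]):
            best[a] = cluster
    return best

def as_overlay_edges(adj, asn):
    """AS-level links: one per pair of distinct ASs joined by a router link."""
    return {frozenset((asn[u], asn[v]))
            for u in adj for v in adj[u] if asn[u] != asn[v]}
```

Clusters that are not the largest for their AS are the candidates for re-assignment.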
Clearly, there are several sources of error in both the collected map, and in our AS
overlay generation technique. The best we could have hoped for is that our techniques
result in a first-order approximation of the actual inter-AS overlay. To verify this, we
compared some macroscopic properties [25] (such as the degree distribution, degree rank
distribution and hop-pair distribution) of the resulting AS overlay to that of an AS level
topology inferred from a BGP routing table collected on April 10, 2000³ (Figure 2.1). Notice
that these macroscopic properties are in qualitative agreement.
As an additional validation, we compared the collection of AS paths in a BGP routing
table dump with the shortest AS path between the two corresponding nodes in our AS
overlay. Figure 2.2 shows the distribution of the path length difference between node pairs
in our map and the routing table dump. About 93% of node pairs in our overlay are within
one AS hop of the corresponding path in the BGP routing table.
The final step in our methodology is to select a plausible model for Internet routing
policy. In general, there exist two kinds of inter-ISP relationships, a provider-customer
relationship and a peering relationship. To our knowledge, the prevalent and widespread
routing policy practice is that an ISP picks the shortest AS path based route for customer
and peer supplied routes. Furthermore, an ISP only propagates its customer supplied
routes to its peers, never routes supplied by other peers. This latter rule can result in
AS-level paths not corresponding to shortest AS path. In the absence of information about
³ This AS topology has 7,306 nodes and 14,707 links.
Figure 2.1: Macroscopic comparison of AS overlay and the AS topology. Panels: (a) Degree Distribution, (b) Degree Rank Distribution, (c) Hop-pair Distribution. Curves in each panel: AS overlay and AS map (04/10/00).
Figure 2.2: Distribution of path length difference between node pairs in AS overlay and AS paths in the routing table dump. x-axis: Difference (Hops).
the exact nature of inter-ISP relationships, we use a simplified model in which the policy
path between two ASs is determined by the shortest AS path between them.
Clearly, this is only an approximate model of routing policy. We validate this model by
comparing the length of the AS path corresponding to each traceroute in our collection with
the length of the shortest AS path between the corresponding source and destination. Figure 2.3
shows the path length comparison between the policy paths and the shortest AS paths of the
collected traceroutes. We found that the shortest AS path underestimates the traceroute
AS path by 1 AS hop or less for 70% of the traces, and less than 2 hops for 95% of the
traces. Though the difference seems small, it doesn’t represent spectacular agreement with
our model since many of the AS paths are relatively small (5 hops or so). Nevertheless,
this validation (and another described in Section 2.3.1) is good enough to encourage us to
pursue our initial understanding of the impact of routing policy on Internet paths.
Figure 2.3: Validating the shortest AS path model for routing policy. x-axis: Difference (AS hops).
2.3 Results
Having a model for AS policy, we can now compute, for any given pair of nodes in the
Internet map, the router-level path between them as determined by routing policy. We call
such a path the policy path. Furthermore, we can also compute the router-level shortest
path between the same pair of nodes.
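One way to picture the two computations is the sketch below. It is a simplification under stated assumptions rather than the procedure used in this chapter: it picks one shortest AS path by breadth-first search on the AS overlay, then takes the shortest router-level path that stays inside the ASs on that AS path. The function names and the adjacency-dict inputs are hypothetical.

```python
from collections import deque

def bfs_path(adj, src, dst, allowed=None):
    """Shortest hop-count path from src to dst, or None if unreachable.

    allowed, if given, is a predicate that a node must satisfy to be visited.
    """
    parent = {src: None}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            path = []
            while u is not None:
                path.append(u)
                u = parent[u]
            return path[::-1]
        for v in adj[u]:
            if v not in parent and (allowed is None or allowed(v)):
                parent[v] = u
                queue.append(v)
    return None

def policy_path(router_adj, as_adj, asn, src, dst):
    """Router-level path constrained to the ASs of one shortest AS path."""
    as_path = bfs_path(as_adj, asn[src], asn[dst])
    if as_path is None:
        return None
    on_path = set(as_path)
    return bfs_path(router_adj, src, dst, allowed=lambda r: asn[r] in on_path)
```

Calling `bfs_path(router_adj, src, dst)` without the restriction gives the router-level shortest path for the same pair, so the two results can be compared hop for hop.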
Being thus empowered, we analyze the impact of routing policy on Internet paths by
asking the following four different questions. These questions are all complementary ways
of looking at how routing policy skews paths in the Internet.
• By how much does policy inflate unicast paths? That is, how different is a policy
path from the corresponding shortest path?
• For nodes A and B in the Internet map, does there exist an intermediate node I such
that the sum of the policy paths from A to I and from I to B is less than the policy
path from A to B? If such an intermediate exists, A and B can circumvent routing
and communicate using fewer hops via I.
• Does policy routing funnel Internet paths through larger ASs?
• How does policy routing impact multicast tree sizes?
Before we discuss our results, a couple of caveats are worth mentioning. First, unless
otherwise specified, all our results below are derived from our policy model, not from the
traceroutes. The model enables us to compute policy paths for arbitrary source destination
pairs. Second, our definition of an “Internet path” implicitly considers paths between each
pair of nodes in the map equally likely. In practice, for example, there may not exist any
end-to-end conversations between two backbone routers. To more accurately model the
distribution of Internet paths, it is necessary to have an estimate of how many “hosts” are
attached to each router, an estimate we do not have.
2.3.1 Inflating Unicast Paths
Figure 2.4: Cumulative distribution of inflation ratio. x-axis: Inflation Ratio.
Our first examination of the impact of routing policy considers the difference between
policy paths and their corresponding shortest paths. Specifically, we look at two metrics.
The first, inflation ratio, measures the ratio of the length of a policy path to the
length of the corresponding shortest path. Figure 2.4 plots the cumulative distribution of
this metric. This figure was obtained by computing the inflation ratio for each random
source-destination pair in our Internet map, a total of 100,000 random pairs. It shows a
quantitatively surprising impact of routing policy. Some 20% of Internet paths are inflated
by more than 50%. Some policy paths are inflated by a factor of nearly 4. Finally, for only
a fifth of the paths does the policy path length equal the length of the shortest path.
Figure 2.5: Cumulative distribution of inflation difference. x-axis: Inflation Difference (hops).
A second metric, the inflation difference, provides an alternative view of the impact of
routing policy. This metric represents the absolute difference, in terms of the number of
router hops, between the policy path and the shortest path. Figure 2.5 plots the cumulative
distribution of inflation difference. For nearly 20% of the node pairs, the policy path is
longer than the shortest path by more than 5 hops. Furthermore, there exist some node
pairs for which the policy path is longer by 25 hops than the shortest path.
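As a small worked illustration of the two metrics, the sketch below computes the inflation ratio and inflation difference for a handful of node pairs and reads off the tail fractions quoted from the cumulative distributions. The function names and the hop counts are made up for illustration; they are not measurements from our map.

```python
def inflation_metrics(policy_hops, shortest_hops):
    """Return (inflation ratio, inflation difference) for one node pair."""
    return policy_hops / shortest_hops, policy_hops - shortest_hops

def fraction_exceeding(values, threshold):
    """Fraction of values strictly above threshold (one point of 1 - CDF)."""
    values = list(values)
    return sum(v > threshold for v in values) / len(values)

# Illustrative (policy path length, shortest path length) pairs, in router hops.
pairs = [(10, 10), (12, 8), (9, 6), (20, 5), (7, 7)]
metrics = [inflation_metrics(p, s) for p, s in pairs]
ratios = [r for r, _ in metrics]
diffs = [d for _, d in metrics]

share_ratio_over_1_5 = fraction_exceeding(ratios, 1.5)  # inflated by more than 50%
share_diff_over_5 = fraction_exceeding(diffs, 5)         # longer by more than 5 hops
```

With these made-up pairs, only the (20, 5) pair exceeds either threshold, so both fractions are one in five.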
Figure 2.6: Cumulative distribution of inflation ratio by path length. Curves: Shortest Path Length = 5, 10, 15, and 20; x-axis: Inflation Ratio.
Perhaps a more reasonable way to consider the impact of routing policy might be
to evaluate inflation for node pairs whose shortest paths are of the same length. Thus,
Figure 2.6 plots the cumulative distribution of the inflation ratio for four different shortest
path lengths. Not surprisingly, in a distribution sense, longer paths are less inflated in
proportion to their lengths—in general, shorter paths have more “room” for inflation than
longer paths. Furthermore, the smaller lengths (notably length 5) have a significantly long
tail.
Figure 2.7: Cumulative distribution of inflation difference by path length. Curves: Shortest Path Length = 5, 10, 15, and 20; x-axis: Inflation Difference (hops).
Finally, Figure 2.7 depicts the inflation difference for different shortest path lengths.
This shows an interesting trend, namely that longer paths are more absolutely inflated
than shorter paths. The exception to this observation is the path length 20. We conjecture
that the explanation for this exception is that really long paths have less absolute “room”
to “grow”. This observation has interesting consequences that we explore in Section 2.3.2.
Figure 2.8: Comparison of inflation ratio and difference for traceroutes and for shortest AS path. Panels: (a) Inflation Ratio, (b) Inflation Difference (hops). Curves: Shortest AS Path Policy Model and Traceroutes.
A subtle point that underlies the results presented in this and the next sections is
that, for a given node pair, there may exist many different “shortest” AS paths. The
corresponding router-level policy paths for each possible shortest AS path may be widely
different. For example, if shortest AS path X traverses larger ASs than shortest AS path
Y, one might expect that the policy path corresponding to X might be longer than that
for Y. Because it is computationally difficult to enumerate all possible shortest AS paths,
all results in this chapter were derived by essentially randomly selecting shortest AS paths.
Does this discrepancy affect our conclusions? One way to answer this question is to
compare the inflation ratio and difference distributions for our collection of traceroutes
(Section 2.2) with the distributions presented above. Figure 2.8 does this. Our shortest AS
path computation yields more conservative inflation ratios, and similar inflation differences,
to the traceroute data. We also tried other approaches to select shortest AS paths, and
found that most of the results presented in this chapter are relatively insensitive to the way
we pick the shortest AS path. Finally, we also compared our inflation ratios and differences
to those obtained from a different mapping effort [16].⁴ Figure 2.9 shows that our policy
model results in fairly conservative inflation ratios and differences.
Figure 2.9: Comparison of inflation ratio and difference for Lucent traceroutes and for shortest AS path. Panels: (a) Inflation Ratio, (b) Inflation Difference (hops). Curves: Shortest AS Path Policy Model, SCAN Traceroutes, and Cheswick Traceroutes.
⁴ To measure the inflation metrics on this collection of traceroutes, we need to create an instance of
an Internet map corresponding to the traceroute collection. For this, we applied the alias resolution technique
described in [31] to resolve IP addresses of the same node, then synthesized the Internet map.
2.3.2 Finding Detours
We have seen that longer policy paths are generally more inflated in the number of hops
than shorter ones. This gap motivates the questions we examine in this section: for what
fraction of node pairs do there exist better alternate paths (called detours) and by how
much are these detours better than the policy paths?
Before we answer these questions, we more carefully define a detour. Consider a pair
of nodes A and B. We say that there exists a detour between A and B if there exists an
intermediate node I such that:
• I lies in a different AS than A or B,
• the AS path from A to I and from I to B is collectively longer than the shortest AS
path between A and B, and
• the sum of the router-level policy paths between A and I and between I and B is
less than the policy path between A and B.
Intuitively, a detour represents a way to circumvent routing by relaying communication
between A and B at the application level.
To answer the above questions, we randomly generate a large number of source-destination
pairs in which source and destination are selected from different ASs. For each source-
destination pair, we find a policy path and the best detour path (if there exists one), then
measure the difference between the two paths. We define two metrics for quantifying detours.
The detour gain is the absolute difference, in router-level hops, between the policy
path and the best detour path. We define the gain to be zero if there exists no detour for
that particular policy path. The detour gain ratio is the ratio between the gain and the
length of the policy path between source and destination.
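The detour definition and the two metrics can be sketched as follows. This is an illustrative sketch: `as_hops` and `policy_hops` are hypothetical callables that return, respectively, the shortest AS path length between two ASs and the router-level policy path length between two nodes, and the candidate intermediates are supplied by the caller.

```python
def best_detour(src, dst, candidates, asn, as_hops, policy_hops):
    """Return (detour gain, intermediate) for the best detour, or (0, None)."""
    direct = policy_hops(src, dst)
    direct_as = as_hops(asn[src], asn[dst])
    best_gain, best_node = 0, None
    for i in candidates:
        # Condition 1: the intermediate lies in a different AS than src or dst.
        if asn[i] in (asn[src], asn[dst]):
            continue
        # Condition 2: the AS path via the intermediate is collectively longer.
        if as_hops(asn[src], asn[i]) + as_hops(asn[i], asn[dst]) <= direct_as:
            continue
        # Condition 3: the composite policy path is shorter in router hops.
        gain = direct - (policy_hops(src, i) + policy_hops(i, dst))
        if gain > best_gain:
            best_gain, best_node = gain, i
    return best_gain, best_node

def gain_ratio(gain, direct_hops):
    """Detour gain as a fraction of the direct policy path length."""
    return gain / direct_hops
```

A gain of zero with no intermediate corresponds to a policy path for which no detour exists.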
Figure 2.10: Detour gain and gain ratio distributions. Panels: (a) Detour Gain (x-axis: Gain (Hops)), (b) Detour Gain Ratio (x-axis: Gain Ratio).
Figure 2.10 shows the cumulative distribution of the two metrics. Surprisingly, for 50%
of the sampled paths, there exist superior detour paths. Furthermore, 20% of the sampled
paths have a detour that is at least 3 hops shorter than the corresponding policy path. An
alternative result is that 20% of the paths have a detour that is more than 20% shorter.
Our finding is in quantitative agreement with that of [62]; their findings were a more direct
measure of path performance whilst ours is based only on path length. To what extent our
finding is an explanation for theirs is unclear.
2.3.3 Path Concentration Around Large ASs
The previous two sections have studied the impact of routing policy on Internet paths.
In this section, we look at a slightly different question: does routing policy force Internet
paths through larger ASs? This question is one aspect of a larger, more general, question:
Is topological connectivity rich enough that the logical connectivity imposed by policy
routing significantly skews the paths Internet traffic takes?
To study this question, we define for each node pair the dominant AS to be the largest
(by size⁵) transit AS encountered in the path between the two nodes. For each AS, we then
define a dominance fraction: the fraction of node pairs for which that AS is the dominant
one. We are interested in the correlation between size and the dominance fraction. In
computing our dominance fraction, we do not consider node pairs which lie within a single
AS, or in adjoining ASs. These, by definition, do not have any transit ASs between them.
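The dominance fraction can be computed along the lines of the sketch below. The function names and the inputs (router-level paths as lists of nodes, a router-to-AS map, and a map from AS to its size in routers) are assumptions made for illustration.

```python
from collections import Counter

def dominant_as(router_path, asn, as_size):
    """Largest transit AS on a router-level path, or None if there is none."""
    as_seq = []
    for r in router_path:
        # Collapse consecutive routers of the same AS into one AS hop.
        if not as_seq or as_seq[-1] != asn[r]:
            as_seq.append(asn[r])
    transit = as_seq[1:-1]  # drop the source and destination ASs
    if not transit:
        return None  # same AS or adjoining ASs: no transit AS
    return max(transit, key=lambda a: as_size[a])

def dominance_fractions(paths, asn, as_size):
    """Fraction of node pairs (with a transit AS) dominated by each AS."""
    counts = Counter()
    for path in paths:
        d = dominant_as(path, asn, as_size)
        if d is not None:
            counts[d] += 1
    total = sum(counts.values())
    return {a: c / total for a, c in counts.items()}
```

Sorting the ASs by size and accumulating these fractions yields a cumulative dominance curve.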
Figure 2.11: Cumulative dominance fractions for the top 50 ASs. Curves: Shortest AS Path Policy Model and Shortest Path; x-axis: AS rank (by size).
We measure this metric both for our policy approximation, as well as for shortest
router-level paths. Figure 2.11 shows that, regardless of whether policy routing is used, or
shortest path routing is used, the top 15 ASs by size dominate about 90% of the paths. This
number is surprising not only in its magnitude, but also in the absence of any qualitative
difference in dominance correlation between policy routing and shortest path routing.
⁵ An alternative way to define a dominant AS is by its degree in the AS topology. We find that the
largest ASs by size are well-correlated with the largest ASs by degree. It is therefore not surprising that
when we use this alternative definition for dominant AS, our results do not change qualitatively.
In Section 2.3.1, we discussed the existence of multiple shortest AS paths. Unlike other
results, it turns out that the dominance correlation is at least quantitatively affected by
our method of selecting the shortest AS path. In particular, if instead of picking larger ASs
to explore in our breadth-first search for shortest AS paths, we choose smaller ASs, then,
we find the dominance correlation curve to be different. Even by pessimally picking the
shortest AS path, we find that almost 70% of the paths are dominated by one of the top 25
ASs. We also checked if our results were sensitive to the method of selecting router-level
shortest paths. They were not.
2.3.4 Multicast
A final question we look at is the effect of policy on multicast tree sizes. For this, we assume
a simple random source and receiver placement, and compute source-rooted multicast trees
using shortest paths and policy paths. We then compare the relative sizes of the shortest-
path tree and the policy tree.
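A source-rooted tree can be approximated as the union of the links on the paths from the source to each receiver. The sketch below assumes that tree size is counted in links and that the per-receiver paths are already available; both are our assumptions for illustration, as are the function names.

```python
def tree_size(paths):
    """Number of distinct links in the union of source-to-receiver paths."""
    links = set()
    for path in paths:
        for u, v in zip(path, path[1:]):
            links.add(frozenset((u, v)))  # undirected link
    return len(links)

def tree_size_ratio(policy_paths, shortest_paths):
    """Size of the policy tree relative to the shortest-path tree."""
    return tree_size(policy_paths) / tree_size(shortest_paths)
```

Links shared by several receivers are counted once, which is what makes the tree smaller than the sum of the unicast path lengths.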
Figure 2.12: The effect of policy on multicast tree sizes. Curves: Multicast Tree Size and Unicast Path Length.
Figure 2.12 shows the ratio between the size of a policy tree and the corresponding
shortest-path tree as a function of the number of receivers. In the same figure, we also
show the average ratio between unicast path length of a policy path versus a shortest path
from the source to all receivers. As expected, the average unicast ratio is independent of
the number of receivers in the group. This number is consistent with the average inflation
ratio shown in Figure 2.4.
However, for a small number of receivers, the policy trees are actually larger by about
30% than shortest-path trees. As the number of receivers increases, policy trees continue
to grow larger than shortest-path trees. This result is somewhat counter-intuitive since
one might expect more path sharing with policy routing between receivers in the same AS,
and hence smaller policy trees. However, any savings from path sharing are probably offset
by the overall increase in path length due to inflation.
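The two ratios compared in Figure 2.12 can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used in this study: tree size is assumed to be the number of distinct links in the union of the source-to-receiver paths, the unicast ratio is assumed to be an average of per-receiver ratios, and the function names and the toy paths are hypothetical.

```python
def tree_size(paths):
    """Number of distinct links in the union of source-to-receiver paths."""
    links = set()
    for path in paths:
        for u, v in zip(path, path[1:]):
            links.add(frozenset((u, v)))
    return len(links)

def mean_unicast_ratio(policy_paths, shortest_paths):
    """Average over receivers of policy hop count / shortest hop count."""
    ratios = [(len(p) - 1) / (len(s) - 1)
              for p, s in zip(policy_paths, shortest_paths)]
    return sum(ratios) / len(ratios)

# Toy example: source "s", receivers "r1" and "r2".
shortest = [["s", "a", "r1"], ["s", "a", "r2"]]
policy = [["s", "b", "c", "r1"], ["s", "b", "c", "r2"]]

tree_ratio = tree_size(policy) / tree_size(shortest)
unicast_ratio = mean_unicast_ratio(policy, shortest)
```

In this toy case the policy paths share two links rather than one, so the tree ratio (4/3) is smaller than the unicast ratio (1.5). This is the path-sharing effect one might expect; the measured curves show that it does not make policy trees smaller than shortest-path trees.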
2.4 Sensitivity Analysis
How sensitive are our results to a) the particular snapshot of the Internet we use, and b)
the assumption that shortest AS path is a reasonable representation of Internet routing
policy? In this section, we re-examine our results from Section 2.3.1 using a different
Internet topology snapshot, and using a more refined routing policy model that does not
violate peering arrangements. We find that our prior observations regarding the path
inflation due to routing policy appear to hold both across snapshots (and therefore time)
and with respect to a more sophisticated model of routing policy.
2.4.1 Sensitivity to Different Snapshot
To study the degree of Internet path inflation across time, first we needed to collect another
snapshot of the Internet topology. We did this by running a slightly more optimized version
of the Mercator software [31]. The map was collected between May 1 and May 6, 2001,
approximately a year after the previous map was collected. It contains 170,589 nodes and 215,385
links. On this map, we computed the AS overlay as described in Section 2.2, and verified that its
large-scale properties were qualitatively consistent with that of the AS map obtained from
BGP routing table dumps.
[Figure 2.13 plots omitted. Panels: (a) Inflation Ratio; (b) Inflation Difference, x-axis: Difference (Router Hops). Curves in both panels: Internet_Map(050601) and Internet_Map(041000).]
Figure 2.13: Inflation difference and inflation ratio.
Figure 2.13 shows the cumulative distribution of the paths with respect to the inflation
difference and ratio. We have included the cumulative distribution of inflation with respect
to our previous study for comparison. Again, the two data sets were obtained from two
snapshots of Internet topology that are approximately one year apart. We observe that the
Internet path inflation with respect to the older map, when compared to the newer map,
is more conservative according to the inflation difference and is approximately the same
according to the inflation ratio. Moreover, we also find that the average path-length on the
newer map is longer than the average path-length on the older map. This is possibly due to
the bigger size of the newer map, i.e., the new map is about 50% larger than the older one®.
Therefore, it is not so surprising that paths on the newer map reveal a greater inflation
difference than the older map. Nevertheless, the cumulative distributions of inflation ratio
are approximately the same.
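A small numeric sketch shows why the two measures can behave differently across maps. It assumes that the inflation difference is the policy path length minus the shortest path length in router hops and that the inflation ratio is their quotient; the function names and hop counts are hypothetical and chosen only for illustration.

```python
def inflation(policy_hops, shortest_hops):
    """Return (difference, ratio) of a policy path relative to the shortest path."""
    return policy_hops - shortest_hops, policy_hops / shortest_hops

def cumulative_fraction(values, threshold):
    """Fraction of values that are <= threshold (one point of an empirical CDF)."""
    return sum(v <= threshold for v in values) / len(values)

# Hypothetical paths: a shorter one (older map) and a longer one (newer map),
# both inflated by the same proportion.
old_diff, old_ratio = inflation(12, 10)
new_diff, new_ratio = inflation(18, 15)

# Fraction of a hypothetical sample of differences that is at most 2 hops.
sample_diffs = [0, 0, 1, 2, 3, 5]
frac_within_two = cumulative_fraction(sample_diffs, 2)
```

With the same proportional inflation, the longer path shows a larger absolute difference (3 hops rather than 2) while the ratio is unchanged. This matches the observation that longer average paths on the newer map raise the difference curve but leave the ratio curve roughly where it was.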
[Figure 2.14 plots omitted. Panels: (a) Inflation Ratio; (b) Inflation Difference, x-axis: Difference (Router Hops). Curves in both panels: Pathlength=10, 15, 20 and 25.]
Figure 2.14: Inflation difference and inflation ratio with respect to different shortest-path lengths.
Finally, in the previous study, we also observed that the longer paths, when compared
to the shorter ones, are more inflated in terms of the absolute difference but are less inflated
in proportion to their lengths. Figure 2.14 shows the inflation difference and ratio of paths
with respect to different shortest-path lengths. We notice similar behavior in this data
set as well. Thus, our recent findings seem to suggest that our observations regarding the
®Whether this is the result of the growth in the Internet or a better router discovery technique revealing
more information remains an open question.
Internet path inflation due to policy may hold across different snapshots of the Internet and
time.
2.4.2 Sensitivity to More Realistic Policy Model
In our prior work, we proposed a simplified policy model, namely shortest AS path, for
policy-based routing. This policy model can be inaccurate and can violate provider-
customer relationships. Therefore, in this section, we improve the accuracy of this simplified
model by computing the shortest AS path that does not violate policy. We call this model
the shortest valid AS path routing. This improved model will allow us to examine the
sensitivity of our conclusions in Section 2.3.1 to a more accurate routing policy.
Our shortest AS path policy model suffers from one important drawback. Although
the model enables us to study router-level paths between any two nodes in the network,
it doesn’t take into consideration the peering relationship among AS nodes. As a result,
some of the generated paths may not be realistic in the Internet context, i.e., they may
violate peering relationships by transiting through a stub domain in between two transit-
domains and hence are considered invalid. As an example, an AS path traversing through
MCI-USC-SPRINT is an invalid path since USC is a customer of MCI and the packets
between the national ISPs should never transit through one of their customers.
Fortunately, recent work by Gao [28] has described a more realistic technique for inferring
AS peering relationships, e.g., provider-customer, peer-peer or sibling-sibling relationships.
Their work makes two assumptions: 1) that there is a strong correlation between the
AS degree and AS size, e.g., an AS with larger degree (i.e., an AS with many connections
to its peers) is a bigger AS domain in size, and 2) that the AS paths are hierarchical. They
assume that one signature of a hierarchy is that paths may go up, down, or up and then
down the hierarchy. A path connecting two regional ISPs must traverse up the hierarchy
to the national ISP, then the two national ISPs exchange traffic and traffic goes down the
hierarchy to the destination ISP. They apply these two assumptions to classify the types
of AS paths or routes that can appear in BGP routing tables [48, 27] and then infer the
peering relationship based on this classification. To proceed with our study, we need to
annotate our AS overlay map with peering relationships. Once the annotated AS overlay
map is obtained, we can improve our policy model to consider only valid shortest AS paths
in our study of impact of policy routing on Internet paths.
2.4.2.1 Methodology
How do we annotate our AS overlay map with peering relationships? In other words,
how do we determine whether a link is a provider-customer, peer-peer link or sibling-sibling
link [28] on our AS overlay? Though Gao’s algorithm can be used to determine the peering
relationship of an actual AS map, an input to the algorithm is a collection of actual AS
paths. Since we do not have a collection of paths corresponding to our AS overlay map,
we cannot apply this algorithm directly. We solve this problem by first applying Gao’s
technique to derive peering relationships for the links on an actual AS map. We chose the
AS map that was collected on May 06, 2001, approximately the same time that the router
map was collected. This AS map was generated from a collection of AS paths obtained
from a BGP routing table dump [48]. The map consists of 10941 nodes and 22568 links.
Once we have the annotated AS map, we then determine a peering relationship for
links on the AS overlay map based on the peering relationship information from the actual
AS map. The peering relationship is determined as follows. For any link l connecting two
ASs in the AS overlay map, if there exists a corresponding link in the actual AS map, the
peering relationship of link l is taken from the relationship of the corresponding link in the
actual AS map.^ If one of the ASs doesn’t exist in the actual AS map, then the existing
AS is the provider of the non-existent AS. If both ASs on the end nodes do not exist in the
actual AS map, the AS with the larger degree is the provider of the AS with the smaller
degree. Lastly, if both ASs exist in the actual AS map but there is no corresponding link
on the actual AS map, then if both ASs have large degrees (i.e., degree > 60) or both ASs
are peers with many other ASs (e.g., each has more than 15 peer-peer links with other
ASs), then both ASs have a peer-to-peer relationship.
However, if the previous condition is not true, then the AS with the larger degree is a provider
of the AS with the smaller degree. Note that most of the heuristics used here were derived from
Gao’s [28]. The pseudo-code for our heuristics is provided in Appendix A.
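The rules above can be sketched in code as follows. This is an illustrative reading of the prose, not the pseudo-code of Appendix A. The data structures are hypothetical: `actual_rel` maps an unordered AS pair to its annotated relationship, `actual_nodes` is the set of ASs on the actual AS map, `degree` is a single degree table, and `peer_count` gives the number of peer-peer links per AS. The tie-break for equal degrees (the first AS is taken as provider) is an assumption, since the text does not specify one.

```python
def assign_relationship(a, b, actual_rel, actual_nodes, degree, peer_count,
                        min_degree=60, min_peers=15):
    """Assign a relationship to overlay link (a, b) following the prose rules."""
    key = frozenset((a, b))
    # Rule 1: copy the relationship if the link exists on the actual AS map.
    if key in actual_rel:
        return actual_rel[key]
    a_known, b_known = a in actual_nodes, b in actual_nodes
    # Rule 2: if only one AS is known, the known AS is the provider.
    if a_known != b_known:
        provider, customer = (a, b) if a_known else (b, a)
        return ("provider-customer", provider, customer)
    # Rule 4: both known but no link; large or well-peered ASs are peers.
    if a_known and b_known:
        both_large = degree[a] > min_degree and degree[b] > min_degree
        both_peered = (peer_count.get(a, 0) > min_peers
                       and peer_count.get(b, 0) > min_peers)
        if both_large or both_peered:
            return ("peer-peer", a, b)
    # Rule 3 and the fallback of rule 4: the larger-degree AS is the provider.
    provider, customer = (a, b) if degree[a] >= degree[b] else (b, a)
    return ("provider-customer", provider, customer)

# Hypothetical inputs for illustration.
actual_nodes = {1, 2, 3, 4}
actual_rel = {frozenset((1, 2)): ("peer-peer", 1, 2)}
degree = {1: 100, 2: 80, 3: 5, 4: 70, 9: 2, 10: 3}
peer_count = {}
```

For example, `assign_relationship(9, 3, actual_rel, actual_nodes, degree, peer_count)` makes AS 3 the provider of AS 9 because only AS 3 appears on the actual AS map.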
After each link on the AS overlay map is assigned a peering relationship, we then
modify our routing policy model to take the direction of each path into account. A path
traverses up the hierarchy through customer-provider links, traverses down the hierarchy
through provider-customer links, and traverses across nodes in the same hierarchical level
through peer-peer links. A valid path is assumed to be hierarchical: the path should never
traverse through a customer-provider link once it traverses through a provider-customer
link. Analogously, after the path is traversing down, it will never traverse up again. There
^We found that 70% of the links (98.7% of the nodes) in the AS overlay map have a corresponding link
(node) in the actual AS map. We also tried aggregating 6 days of BGP table dumps to generate an
annotated aggregate map. The aggregate map improves the link coverage on our AS overlay by less than
1%.
may be more than one valid path between any source and destination; our modified policy
model will always pick one of the equally shortest ones.
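The validity rule can be sketched as a breadth-first search over (AS, has-descended) states. This is a minimal sketch under stated assumptions rather than the implementation used in this study: each directed link is labelled "up" (customer to provider), "down" (provider to customer) or "peer"; peer links are treated as unrestricted, following the description in this section; sibling links are not modelled. The function name, the link table and the AS named ISP_X are hypothetical, and USC is assumed to be a customer of both MCI and SPRINT for the sake of the example.

```python
from collections import deque

def shortest_valid_path(links, src, dst):
    """Shortest AS path that never goes up after it has gone down.

    links[u] is a list of (v, kind) pairs, kind in {"up", "down", "peer"}.
    """
    start = (src, False)          # (current AS, has the path descended yet?)
    parent = {start: None}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        node, descended = state
        if node == dst:
            path = []
            while state is not None:
                path.append(state[0])
                state = parent[state]
            return path[::-1]
        for nxt, kind in links.get(node, []):
            if kind == "up" and descended:
                continue          # climbing after descending is invalid
            nxt_state = (nxt, descended or kind == "down")
            if nxt_state not in parent:
                parent[nxt_state] = state
                queue.append(nxt_state)
    return None

links = {
    "MCI": [("USC", "down"), ("ISP_X", "peer")],
    "SPRINT": [("USC", "down"), ("ISP_X", "peer")],
    "USC": [("MCI", "up"), ("SPRINT", "up")],
    "ISP_X": [("MCI", "peer"), ("SPRINT", "peer")],
}
valid = shortest_valid_path(links, "MCI", "SPRINT")

# Without the peer AS, the only route transits the customer and is rejected.
no_peer = {k: [(v, t) for v, t in vs if v != "ISP_X"]
           for k, vs in links.items() if k != "ISP_X"}
blocked = shortest_valid_path(no_peer, "MCI", "SPRINT")
```

The search returns the path through ISP_X and never the equally short MCI-USC-SPRINT path. When no valid alternative exists the pair is unreachable, which corresponds to the unreachable node pairs discussed in this section.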
We should point out that the peering relationship assignment of links in both AS maps
(i.e., the actual AS map and the AS overlay) is not perfect; there exists a small fraction
of node pairs that cannot be reached from one another. However, among all possible node
pairs on both maps, these unreachable pairs account for less than 2%. Additionally, the
heuristics that we used for assigning peering relationships to links on our AS overlay are
conservative. A peer-peer link allows traffic to flow in both directions without any
restriction, while a provider-customer link enforces a direction on the path. Since our
heuristics assign peer-peer only to links that connect big or important ASs (ASs with many
peers) and assign provider-customer to links in most other cases, they are conservative.
2.4.2.2 Result
[Figure 2.15 plots omitted. Panels: (a) Inflation Ratio, x-axis: Ratio; (b) Inflation Difference, x-axis: Difference (Router Hops). Curves in both panels: Realistic Model and Simplified Model.]
Figure 2.15: Inflation difference and inflation ratio by the realistic and simplified routing policy model.
Is the inflation degree of Internet paths observed earlier (Section 2.3.1) sensitive to a
more accurate routing policy model? To answer this question, we plot the cumulative
distribution of inflation difference and inflation ratio with respect to the modified routing
policy model® in Figure 2.15. For comparison, we also include the earlier plots of inflation
difference and inflation ratio of paths with respect to the simplified policy model in the
plot. Our result indicates that the degree of inflation under the two models (the
simplified model and the realistic model) is qualitatively similar. Thus, for determining
the extent of inflation, a shortest AS path model appears to suffice.
2.5 Conclusions
Does policy have an impact on Internet path length? Our results in this chapter clearly
make the case that it does, even with realistic models of routing policy. In our model,
nearly 50% of paths benefit from a detour. Some small multicast trees are enlarged almost
30% by policy.
While our shortest AS path approximation may be rendered obsolete by more complicated
routing policy, there exists a more enduring interpretation of our work. Shortest AS
path represents the routing that would have resulted from a pure (policy-free) hierarchical
routing in the Internet. In this sense, our work quantifies the impact on Internet paths of
the particular instance of hierarchy that the Internet has evolved to today.
^There are about 6.7% of sampled node pairs that are not reachable. We ignore these pairs in our
inflation distribution.
Chapter 3
Topology Characterization
Following the long-held belief that the Internet is hierarchical, the network topology generators
most widely used by the Internet research community, Transit-Stub and Tiers, create
networks with a deliberately hierarchical structure. However, in 1999 a seminal paper by
Faloutsos et al. revealed that the Internet’s degree distribution is a power-law. Because
the degree distributions produced by the Transit-Stub and Tiers generators are not power-
laws, the research community has largely dismissed them as inadequate and proposed new
network generators that attempt to generate graphs with power-law degree distributions.
Contrary to much of the current literature on network topology generators, we start
with the assumption that it is more important for network generators to accurately model
the large-scale structure of the Internet (such as its hierarchical structure) than to faithfully
imitate its local properties (such as the degree distribution). The purpose of this chapter
is to determine, using various topology metrics, which network generators better represent
this large-scale structure. We find, much to our surprise, that network generators based
on the degree distribution more accurately capture the large-scale structure of measured
topologies.
3.1 Introduction
Network protocols are (or at least should be) designed to be independent of the underlying
network topology. However, while topology should have no effect on the correctness of
network protocols, topology sometimes has a major impact on the performance of network
protocols. For this reason, network researchers often use network topology generators to
generate realistic topologies for their simulations.^ These topology generators do not aspire
to produce exact replicas of the current Internet; instead, they merely attempt to create
network topologies that embody the fundamental characteristics of real networks.
The first network topology generator to become widely used in protocol simulations was
developed by Waxman [76]. This generator is a variant of the classical Erdős-Rényi random
graph [8]; its link creation probabilities are biased by Euclidean distance between the link
endpoints. A later line of research, noting that real network topologies have a non-random
structure, emphasized the fundamental role of hierarchy. The following from [81] reflects
this observation:
...the primary structural characteristic affecting the paths between nodes in the
Internet is the distinction between stub and transit domains... In other words,
there is a hierarchy imposed on nodes...
This reasoning quickly became accepted wisdom and, for many years, the network generators
resulting from this line of research, Transit-Stub [12] and Tiers [20], were considered
state-of-the-art. In what follows, we will refer to these as structural generators because of
their focus on the hierarchical structure of networks.
^It should be noted that sometimes topology generators are used to tickle subtle bugs in protocols.
However, for this purpose the emphasis is not on finding realistic topologies but on finding hard cases.
These structural generators reigned supreme until the appearance of a seminal paper
by Faloutsos et al. [26] in 1999. In that paper, the authors used measurements of the
router-level and AS-level Internet graphs—the former having routers as nodes and the
latter having ASs as nodes—to investigate (among other issues) the node degree, which is
the number of connections a node has. They found that the degree distributions of these
graphs are power-laws.^
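As a concrete illustration of the quantity involved, the sketch below computes node degrees from an edge list and the fraction of nodes whose degree is at least d. This complementary cumulative form is one common way to inspect a degree distribution (a power law would appear as a roughly straight line on log-log axes); it is an assumption of this sketch and not necessarily the form used in [26]. The function name and the toy edge list are hypothetical.

```python
from collections import Counter

def degree_ccdf(edges):
    """Map each observed degree d to the fraction of nodes with degree >= d."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    n = len(degree)
    counts = Counter(degree.values())
    ccdf = {}
    remaining = n
    for d in sorted(counts):
        ccdf[d] = remaining / n   # nodes with degree >= d
        remaining -= counts[d]
    return ccdf

# Toy star graph: one hub connected to four leaves.
star_edges = [(0, 1), (0, 2), (0, 3), (0, 4)]
star_ccdf = degree_ccdf(star_edges)
```

For the star graph, every node has degree at least 1 and one node in five has degree at least 4.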
The aforementioned structural generators do not produce power-law degree distributions.
Many in the field seem to have concluded that this disparity, by itself, proved that
structural generators were unsuitable models for the Internet. Subsequently, there has
been an increasing number of proposals for topology generators that are designed primarily
to match the Internet’s degree distribution and do not attempt to model the Internet’s
hierarchical structure; for example, see [35, 43, 2, 50, 1, 10]. These degree-based topology
generators embody the implicit assumption that it is more important to match a certain
local property (the degree distribution) than to capture the large-scale hierarchical structure
of the Internet. The rapid adoption of these degree-based generators suggests that
this belief, while not often explicitly stated, is widely held.
This chapter starts with a very different premise. We believe that it is more important
for topology generators to accurately model the large-scale structure of the Internet (such
as its hierarchical structure) than to faithfully reproduce its local properties (such as the
degree distribution). In particular, we believe that the scaling performance of protocols
will be more affected by these large-scale structures than by purely local properties. Our
^ There is some disagreement about whether these are true power laws or are Weibull distributions or
perhaps something else. For our purposes we don’t care about the exact mathematical form of the distribution,
merely that it can be closely approximated by a power-law or similar very long-tailed distributions.
belief is based on intuition and judgement, not foundation or fact. It is impossible to
determine in any rigorous way which property of networks—the local or the large-scale—is
more important for topology generators to capture.^
While we cannot prove the correctness of our belief, our work is devoted to exploring its
implications. That is, we wish to determine which topology generators—degree-based or
structural—produce better models of the large-scale structure of the Internet. Some have
argued that this question is vacuous, because networks that do not match local properties
of the Internet cannot possibly match its large-scale structure. But we claim the two
properties (local and global) are separable. Consider, for example, a ternary tree, a two-dimensional
grid, and a degree-four random network; each of these networks has exactly
the same degree distribution (all nodes having degree four) but they obviously have very
different large-scale structure. Similarly, one can define trees with any desired degree
distribution (in particular, the one matching the Internet’s degree distribution), and yet
not alter the tree-like large-scale structure.
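The separability argument can be checked on small graphs. The sketch below builds two graphs in which every node has degree four: a two-dimensional grid with wraparound edges (a torus, assumed here so that boundary nodes also have degree four) and a ring lattice in which each node links to its two nearest neighbours on each side. The ring lattice is a stand-in chosen for this illustration and is not one of the networks named in the text; all function names are hypothetical. Both graphs look the same from any node, so the eccentricity of one node equals the diameter.

```python
from collections import deque

def torus(n):
    """n x n grid with wraparound; every node has exactly four neighbours (n >= 5)."""
    return {(x, y): [((x + 1) % n, y), ((x - 1) % n, y),
                     (x, (y + 1) % n), (x, (y - 1) % n)]
            for x in range(n) for y in range(n)}

def ring_lattice(m):
    """Ring of m nodes, each linked to neighbours at offsets 1 and 2 on both sides."""
    return {i: [(i + 1) % m, (i - 1) % m, (i + 2) % m, (i - 2) % m]
            for i in range(m)}

def eccentricity(adj, src):
    """Largest hop distance from src to any node, by breadth-first search."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return max(dist.values())

grid = torus(10)            # 100 nodes, all of degree 4
ring = ring_lattice(100)    # 100 nodes, all of degree 4

grid_diameter = eccentricity(grid, (0, 0))
ring_diameter = eccentricity(ring, 0)
```

With 100 nodes each and identical degree distributions, the torus has diameter 10 while the ring lattice has diameter 25, so a large-scale property differs even though the local property is identical.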
Thus, we believe, in contrast to much of the research community, that it is still an open
question as to which network topology generators best model the Internet. This chapter is
devoted to addressing this issue. More specifically, we ask the following question: Which
generated networks most closely model the large-scale structure of the Internet?
To answer this question we must first determine what the Internet is and then decide
how to measure the degree of resemblance between it and the generated networks. As we
describe in Section 3.3 we use two representations of the Internet. The first representation
is at the Autonomous System (AS) level, where ASs are nodes and edges represent peering
®And, of course, the answer to this question depends on how the generated topologies are being used.
relationships between ASs. We use BGP routing tables to derive the AS graph. The second
representation is at the router level, where routers are nodes and an edge indicates that the
corresponding routers are separated by one IP-level hop. The router graph comes from the
SCAN project [32] which uses a series of traceroute measurements to map the Internet. The
router graph represents the Internet at a much finer level of granularity, and has roughly 17
times more nodes and links than the AS-level graph. While they are both representations
of the Internet, it isn’t clear that, as graphs, they would have much in common. Thus, we
consider these two measured graphs as distinct entities in our analysis, and separately ask
which generated networks most resemble the AS-level graph and which most resemble the
router-level graph. We should note that the structural topology generators were originally
intended to model the router-level graphs, while the degree-based generators were not
explicitly targeted at one or the other level of granularity.
Even though our topology data is the best we could obtain, it is clear that both of these
measured graphs—the AS graph and the router graph—are far from perfect representations
of the Internet. Not only are they subject to errors and omissions, but they also only reflect
the topology and do not contain any information about the speed of the links. We do,
however, approximately model an aspect of reality that has been shown to impact path
lengths [69, 62] in Internet topologies—policy routing.
To measure the properties of the Internet graphs and the generated graphs, we use
a set of three topology metrics described in Section 3.4. These metrics are intended to
capture the large-scale structure of networks. Our methodology for picking these metrics
was simple and, admittedly, ad-hoc. We computed eight different topology metrics (either
reported in the literature or of our own definition) on the network topologies. Of these, we
find that three basic metrics maximally distinguish our topologies: the addition of more
metrics does not further distinguish between our topologies, but the removal of one or more
of these three blurs some distinctions. Thus, the conclusions we draw are supported by all
eight metrics (not all of our own design), but can be presented with only three of them.
While we are not aware of extensive prior work in the design of metrics to measure large-
scale network properties, and while we have borrowed liberally from the work that exists,
we fully recognize that our metrics may not adequately characterize network topologies
and that additional work is urgently needed in this area. Moreover, the distinctions we
draw from these metrics are rather qualitative in nature (e.g., do these curves have roughly
the same shape?) and thus are subject to different interpretations.
These caveats notwithstanding, we use these metrics to compare the generated and
measured networks. Our results, presented in Section 3.5 and augmented by additional results
(see Appendix B), suggest two findings. First, we find that the AS and router graphs
have similar properties. One might expect (as did we) that, since they describe the Internet
at such different levels, the AS and router graphs would have quite different characteristics;
our results indicate otherwise. Second, we find that the degree-based generators are
significantly better at representing the large-scale properties of the Internet, at both the
AS and router levels, than the structural generators. Since our metrics measure large-scale
structure and the degree-based generators focus only on very local properties, we expected
the structural generators would easily be superior; again, our results indicate otherwise.
This leaves us with the seeming paradox that while the Internet certainly has hierarchy, it
appears that the large-scale structure of the Internet graphs is better modeled by network
generators that completely ignore hierarchy! We resolve this paradox in Chapter 5.
3.2 Related Work
We have already mentioned several important areas of related work: the Waxman, Transit-Stub
and Tiers topology generators, and Faloutsos et al.’s observations of power-law degree
distributions in the Internet. We have also mentioned in passing several new degree-based
generators [35, 43, 2, 50, 1]. They all attempt to generate networks with power-law degree
distributions, but differ in the way in which nodes are connected. We describe some of
these generators in slightly more detail in Section 3.6.
Perhaps closest in spirit to the work presented in this chapter is the pioneering exploration
of topology properties by Zegura et al. [81]. Their study considered various properties
(biconnectivity and various kinds of network diameters) of random graphs (and variants
thereof) and structural generators. We follow their lead but extend their study using a
larger collection of metrics, adding measured networks and degree-based generators (this
chapter), and explicitly analyzing the degree of hierarchy (in the chapter on hierarchy).
More recently, Barabasi et al. [4] have attempted to quantify the attack and error tolerance
of random graphs and real-world “scale-free” networks. Finally, van Mieghem et al. [72]
have shown that the Internet’s hop count distribution (the distribution of path lengths in
hops) is well modeled by that of a random graph with uniformly or exponentially assigned
link weights. Some of the topology metrics used in our work are based on the metrics
introduced in these papers.
Also directly relevant is the work of Medina et al. [44]. They too compare random
graph generators (such as Waxman), and hierarchical generators (such as Transit-Stub) to
degree-based generators (such as the BRITE generator [43]). Their metrics for comparison
include the tests in [26] for power law exponents of the degree distribution, the degree rank,
the hop-plot and the eigenvalue distribution. They conclude that the degree and degree-
rank exponents are the best discriminators between topologies among the metrics they
considered. Using these metrics, they conclude that the BRITE generator was better than
the Transit-Stub and Waxman generators in modeling the Internet. However, using the
degree and degree-rank exponents as metrics means that topologies are evaluated solely on
how well their degree distribution matches the degree distribution of the Internet. It is well
known that Transit-Stub and other structural generators do not produce power-law degree
distributions, and so it is no mystery that BRITE and other degree-based generators do a
better job of matching the degree and degree-ranked exponents. However, the question we
pose in this chapter is: which class of generators most closely resembles the Internet when
looking at the large-scale properties of the Internet? We believe this question has not been
addressed by the work in [44] or elsewhere in the literature because networks with similar
degree distributions can have very different large-scale properties (Section 3.1).
Two other recent pieces of work examine local properties of network topologies. Bu
and Towsley [10] find that degree-based generators differ significantly in their clustering
coefficients [75]. Their work proposes an alternative degree-based generator that more
closely matches the clustering behavior of the measured AS graph. For completeness, we
have incorporated both the clustering metric and the proposed generator in our analyses
(Section 3.5). Vukadinovic et al. [74] evaluate the Laplacian eigenvalue spectrum of a
variety of graphs, and conclude that the multiplicity of eigenvalues of value 1 differentiates
AS graphs from grids and random trees. However, as claimed in [74], this measure of the
spectrum reflects purely local properties of the graph (the number of degree 1 nodes, the
number of nodes attached to degree 1 nodes etc.), while our work focuses on the large-scale
structure. However, their result is consistent with our findings (and with the commonly
held intuition that the AS graph is neither mesh-like nor tree-like).
Also relevant to our work is recent work on the analysis of graph measurements. Broido
and Claffy [9] find that various properties of real-world graphs, including the degree distribution,
are well-modeled by a Weibull distribution. Using extensive measurements of the
AS graph, Chang et al. [15] show that the degree distribution of the AS graph deviates
significantly from a strict power-law fit. As we have discussed in Section 3.1, our work
merely assumes that the degree distribution is well approximated by a heavy tail and does
not depend on the exact mathematical form of the distribution.
Our work would not have been possible without developments in Internet router-level
topology discovery. Early work in this area used traceroutes from a small set of sources to
several thousand hosts to compute a router-level map [51]. Subsequent work improved the
coverage of the Internet address space by randomly selecting IP addresses [58], randomly
selecting addresses from route entries in BGP tables [11], using a precomputed set of Web
sites [18], or using heuristics to infer addressable parts of the IP space [32]. This last work
also documents several techniques for improving completeness of the inferred topologies.
Several papers have addressed the impact of topology on protocol performance. For
example, Phillips et al. [56] showed that graphs with exponentially increasing neighborhood
sizes (i.e., the number of nodes within a certain radius increases exponentially with radius)
approximately obey the Chuang-Sirbu multicast scaling law [17]. In closely related work,
Almeroth and Chambers [13] considered a variety of metrics for the efficiency of multicast
trees. Wong and Katz [79] found that the amount of multicast state from randomly placed
receivers differs qualitatively with different topologies. Radoslavov et al. [59] found similar
results for other kinds of protocol performance questions.
Somewhat orthogonal to the questions considered in this chapter is recent work
attempting to explain the origin of power-law degree distributions. Ferrer i Cancho et al. [34]
and Fabrikant et al. [24] have independently shown that, under certain conditions, power-
law degree distributions can arise as a consequence of optimizing an objective function. We
argue in Chapter 6 (also in [65]) that, for the AS graph, the high variability of the degree
distribution follows from the high variability of the distribution of AS sizes.
There has also been significant work in the non-networking literature exploring the
properties of real-world networks. We do not intend to be exhaustive in our coverage
of this work, but will mention some oft-referenced work. Watts and Strogatz [75] found
that many real-world networks, such as the actor collaboration network and a section of
the power grid, are well-modeled by the small-world phenomenon. Kleinberg et al. [38]
analyzed properties of the World-Wide Web graph and proposed a new family of random
graph models. Aiello et al. [1] proposed a random graph model for massive graphs and
showed that this model captures some aspects of the AT&T call graph. Our work has
been influenced by some of this work, but focuses primarily on communication network
topologies.
3.3 Networks
We analyze three categories of network graphs: measured networks, generated networks,
and canonical networks.
3.3.1 Measured Networks
We use two measured network topologies. Our first is the AS topology, representing
inter-autonomous system (AS) connectivity, obtained from AS path information in backbone
BGP routing tables. Nodes in this topology represent ASs, and links represent peering
relationships between them. The particular topology we present in this chapter was obtained
from the routing table at a router^4 that peers with more than 20 other backbone routers.
Our second measured topology is the Internet router-level (RL) topology. This is
derived by inferring router adjacencies [32] in the Internet from traceroutes to carefully chosen
sections of the IP address space. Nodes in this topology represent routers, and links connect
routers that are one IP-level hop from each other. In passing, we note that this definition
of a link does not distinguish shared media from point-to-point links. The former usually
appear as completely connected subgraphs in the network topology.
Although these topologies are related, they reflect Internet connectivity at rather
different scales. For example, the AS topology abstracts many details of physical connectivity
between ASs and each AS represents a grouping of several (sometimes hundreds of)
topologically contiguous routers. Thus, these two graphs could have had very different properties,
but, as we show in Section 3.5, they behave quite similarly with respect to our topology
metrics.
Both these topologies may be incomplete, to different degrees. They may not capture
all the nodes in the network and, for the nodes that do appear in the topology, they may not
include all adjacencies at each node. We hope, however, that the qualitative conclusions we
draw in this chapter will be fairly robust to minor methodological improvements in topology
^4 route-views.oregon-ix.net
collection. A more serious problem is that these measured networks merely represent
connectivity between nodes. In particular, neither the RL nor the AS graph
contains any indication of the capacity of the underlying transmission link (or shared
medium). Although techniques for estimating link capacities along a path are known
([22, 41]), they are reported to be fairly time consuming and, to our knowledge, no one
has attempted to annotate the router-level graph of the entire Internet with link capacity
information. We don’t know how our conclusions would change if such an annotated graph
were available.
These topologies are also, obviously, time varying. We have computed our topology
metrics for at least three different snapshots of both topologies, each snapshot separated
from the next by several months.^5 We find that the qualitative conclusions we draw in
this chapter hold across these different snapshots. Finally, we have also been careful to
incorporate the effects of policy routing in computing our topology metrics. We use a
variant of a simple routing policy (Section 3.4) that has been shown to match actual
routing path lengths reasonably well [69, 68]. In Section 3.5, we describe the impact of
policy on our conclusions.
3.3.2 Generators
We consider three classes of network generators in our work. The first category, random
graph generators, is represented by the Waxman [76] generator. The classical Erdos-Renyi
random graph model [8] assigns a uniform probability for creating a link between any pair
of nodes. The Waxman generator extends the classical model by randomly assigning nodes
^5 Aug 1999, April 2000 and May 2001 for the RL maps. March 1999, December 2000, April 2000, and
May 2001 for the AS maps.
to locations on a plane and making the link creation probability a function of the Euclidean
distance between the nodes.
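
As a rough illustration of the Waxman construction described above, the following sketch places nodes uniformly at random on the unit square and links each pair with probability alpha * exp(-d / (beta * L)), where d is the Euclidean distance between the pair and L is the largest possible distance. This functional form is the one commonly attributed to Waxman and is an assumption here, since the text only states that the probability is a function of distance. The names `waxman_graph`, `alpha`, and `beta` and their default values are illustrative and are not the settings used for the graphs in this chapter.

```python
import math
import random


def waxman_graph(n, alpha=0.4, beta=0.1, seed=None):
    """Return (positions, edges) for a Waxman-style random graph.

    Assumed link probability: alpha * exp(-d / (beta * L)), with d the
    Euclidean distance between two nodes and L the diagonal of the unit square.
    """
    rng = random.Random(seed)
    # Assign every node a random location on the plane (here: the unit square).
    pos = [(rng.random(), rng.random()) for _ in range(n)]
    max_dist = math.sqrt(2.0)
    edges = set()
    for u in range(n):
        for v in range(u + 1, n):
            d = math.dist(pos[u], pos[v])
            # Nearby pairs are more likely to be linked than distant pairs.
            if rng.random() < alpha * math.exp(-d / (beta * max_dist)):
                edges.add((u, v))
    return pos, edges
```

Setting `beta` very large removes the distance dependence, so every pair is linked with probability close to `alpha`, which recovers the uniform-probability random graph model mentioned above.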
The second category, the structural generators, contains the Transit-Stub [12] and
Tiers [20] generators. Transit-Stub creates a number of top-level transit domains within
which nodes are connected randomly. Attached to each transit domain are several similarly
generated stub domains. Additional stub-to-transit and stub-to-stub links are added
randomly based upon specified parameters. Tiers uses a somewhat different procedure. First,
it creates a number of top-level networks, to each of which are attached several
intermediate tier networks. Similarly, several LANs are randomly attached to each intermediate
tier network. Within each tier (except the LAN), Tiers uses a minimum spanning tree to
connect all the nodes, then adds additional links in order of increasing inter-node Euclidean
distance. LAN nodes are connected using a star topology. Additional inter-tier links are
added randomly based upon specified parameters.
Both Transit-Stub and Tiers have a wide variety of parameters. Although we present
our results for one instance of these topologies, Appendix D lists the sets of parameters we
have explored. Section 3.5.4 discusses the impact of our parameter space exploration on
our conclusions.
The third category is that of degree-based generators. The simplest degree-based
generator, called the power-law random graph (PLRG) [1], works as follows. Given a target
number of nodes N and an exponent β, it first assigns degrees to N nodes drawn from a
power-law distribution with exponent β (i.e., the probability of a degree of k is proportional
to $k^{-\beta}$). Let $v_i$ denote the degree assigned to node i. Solely for the purposes of assigning
links between nodes, the PLRG generator makes $v_i$ copies of each node i. Links are then
assigned by randomly picking two node copies and assigning a link between them, until
no more copies remain.^6 For most of the rest of the chapter, we focus almost exclusively
on PLRG as the sole degree-based generator. However, the results for other degree-based
generators, presented in Section 3.6, are qualitatively similar to those of PLRG.
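
The PLRG procedure can be sketched in a few lines. The version below is a minimal illustration under stated assumptions, not the generator used for the results in this chapter: degrees are capped at n - 1 (the hypothetical `max_degree` parameter), random pairing of node copies is implemented by shuffling the copy list and pairing adjacent entries, and a single leftover copy is dropped when the total degree is odd. Self-loops and duplicate links are discarded, and `largest_component` extracts the component on which analyses would be run.

```python
import random


def plrg(n, beta, max_degree=None, seed=None):
    """Return (assigned_degrees, edges) for a power-law random graph (n >= 2)."""
    rng = random.Random(seed)
    max_degree = max_degree or n - 1
    candidates = range(1, max_degree + 1)
    # P(degree = k) is proportional to k^(-beta).
    weights = [k ** -beta for k in candidates]
    degrees = rng.choices(candidates, weights=weights, k=n)
    # Make v_i copies of node i, then pair copies at random.
    copies = [i for i, d in enumerate(degrees) for _ in range(d)]
    rng.shuffle(copies)
    edges = set()
    for a, b in zip(copies[0::2], copies[1::2]):
        if a != b:  # drop self-loops; the set drops duplicate links
            edges.add((min(a, b), max(a, b)))
    return degrees, edges


def largest_component(n, edges):
    """Return the node set of the largest connected component."""
    adj = {i: set() for i in range(n)}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen, best = set(), set()
    for start in range(n):
        if start in seen:
            continue
        comp, stack = {start}, [start]
        while stack:
            u = stack.pop()
            for w in adj[u]:
                if w not in comp:
                    comp.add(w)
                    stack.append(w)
        seen |= comp
        if len(comp) > len(best):
            best = comp
    return best
```

Because self-loops and duplicate links are removed, the realized degree of a node can be smaller than its assigned degree, but never larger.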
3.3.3 Canonical Networks
Finally, our study also includes three canonical networks: the k-ary Tree, the rectangular
grid or Mesh, and an Erdos-Renyi Random graph. We include these admittedly unrealistic
networks because they help calibrate, and explain, our results on measured and generated
networks.
3.4 Metrics
The goal of topology generators is not to produce exact replicas of the current Internet,
but instead to produce graphs whose properties are similar to the Internet graph. In
this chapter we evaluate the quality of a topology generator by how well its generated
networks match the large-scale properties of the Internet (both the AS and RL topologies)
as measured by several topology metrics. The hard question, though, is: what properties
are relevant to this comparison?
There is no single answer to this question, as the relevant properties may well depend
on how the generated networks are being used. Moreover, even for a given purpose it is
^6 This generator is not guaranteed to give a connected graph although, for reasonable values of β, it
produces one large connected component. We pick this connected component for our analyses. Furthermore,
this procedure can produce self-loops and multiple links between nodes. We ignore these superfluous links
in our graphs.
a matter of judgement as to what network properties are the most relevant. Thus, we
recognize that the metrics we chose are in no way definitive, but merely reflect our own
intuition.
Our metrics, which include many that have been reported in the networking
literature and some graph-theoretic metrics that have plausible networking interpretations,
are listed below:
• Neighborhood size (or expansion) [56].
• Resilience, the size of a cut-set for a balanced bi-partition [36].
• Distortion, or the minimum communication cost spanning tree [33].
• Node diameter distribution^7 [81].
• Eigenvalue distribution [26].
• Size of a vertex cover [52].
• Biconnectivity (number of biconnected components) [81].
• The average pairwise shortest path between nodes in the largest component under
random failure (when nodes are removed from the graph randomly) or under attack
(when nodes are removed in order of decreasing degree) [4].
After computing these metrics on our topologies, we found that three (expansion,
resilience and distortion) formed the smallest set of metrics that qualitatively distinguished
our set of topologies into well-defined categories. We describe these metrics in this section,
^7 Node diameter is synonymous with eccentricity.
and discuss these qualitative distinctions in Section 3.5. We present the results for all of
our other metrics in Appendix B. The fact that these three metrics also qualitatively
differentiate between our canonical graphs (mesh, tree and the random graph; Section 3.4)
serves as a simple sanity check for our methodology. Intuitively, we know that these
canonical graphs are quite different from each other in ways that would be very important to
networks, and therefore it is important that our metrics at least clearly differentiate them.^8
We made one important assumption in deciding how to compute these metrics on our
topologies—that they should be designed to ignore superficial differences, like differences
in size. Our two measured topologies differ by an order of magnitude in size, and it is
more convenient to compare the two against a set of generated and canonical networks.
We describe our approach to this, a technique called ball-growing, in the next section.
The Three Basic Metrics
3.4.1 Rate of spreading: Expansion
One key aspect of a tree is that the number of sites you can reach by traversing h hops
grows exponentially in h. We capture this behavior with our expansion metric, denoted
by E(h). E(h) is the average fraction of nodes in the graph that fall within a ball of
radius h centered at a node in the topology. More precisely, for a given originating node
v we compute the number of nodes that can be reached in h hops (the reachable set). We
calculate the size of the reachable set for each node in the graph, average the result, and
then normalize by the total number of nodes in the graph.
^8 Many of the other metrics used in the literature are not as successful in differentiating these three
canonical graphs.
This definition is similar^9 to the reachability function described in [56] and to the
hop-pair distribution defined in [26]. In fact, [56] has analyzed the expansion of some,
but not all, of the topologies described in Section 3.3. We repeat those analyses here for
completeness.
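
The definition of E(h) translates directly into a breadth-first search from every node. The sketch below is illustrative only: it assumes an undirected, unweighted graph given as an adjacency mapping, counts the center itself as part of its ball (distance 0), and ignores policy routing. The function names are hypothetical.

```python
from collections import deque


def hop_distances(adj, source):
    """Breadth-first search: hop count from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def expansion(adj, h):
    """E(h): average ball size at radius h, normalized by the number of nodes."""
    n = len(adj)
    total = 0
    for v in adj:
        dist = hop_distances(adj, v)
        # Size of the reachable set within h hops of v (v itself included).
        total += sum(1 for d in dist.values() if d <= h)
    return total / (n * n)
```

For a four-node chain, `expansion(chain, 1)` averages ball sizes 2, 3, 3 and 2 over four nodes and divides by 4, giving 0.625. For large graphs one would sample centers rather than visit every node.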
For our other metrics we use a technique, called ball-growing, based on these balls of
radius h. We measure some quantity in a ball of radius h and then consider how that
quantity grows as a function of h. This allows us to compare graphs of different sizes
because, for each h, we are measuring the same sized ball in both networks. The result of
each such metric is not a single value but a function of h, and the dependence on h reflects
the behavior of the quantity in question at different scales. We will use this technique in
our other two metrics; expansion is merely the measure of the size (in terms of the number
of nodes that reside in the ball), and our other two metrics will measure other properties
of the subgraph that resides within balls of radius h.
Implicitly, in computing balls of radius h, our definition includes all nodes to whom the
shortest path from the center of the ball is less than or equal to h. For the AS and RL
graphs, we extended this in a simple way to account for policy routing. In computing a
policy-induced ball of radius h, we include all nodes to whom the policy path from the center
of the ball is less than or equal to h, and only include links that lie on policy-compliant paths
to those nodes. To do so, we use a policy model that is slightly more sophisticated than
the one reported in [69], specifically, the shortest valid AS path routing in Section 2.4.2.
At the AS level, this policy model computes the shortest AS path between two nodes that
^9 Unlike [56], E(h) is expressed as a fraction of the total number of nodes in the graph, thus making it
easier to compare graphs of different sizes in Section 3.5.
does not violate provider-customer relationships (an example of a path that would violate
these relationships is one that traverses a provider, followed by a customer and then back
to another provider). We use the results in [28] to infer provider-customer relationships.
To compute the policy path in the RL graph, we first compute the corresponding AS level
policy path, and then use shortest-paths within the sequence of ASs to determine a router-
level policy path. We discuss policy-induced ball growing in greater detail in Appendix E.
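
The validity condition on AS paths can be illustrated with a small check. This is a simplified sketch under stated assumptions, not the policy model of Section 2.4.2: it only knows provider-customer relationships (supplied as a hypothetical `providers` mapping), treats any hop without such a relationship as invalid, and rejects a path as soon as it climbs to a provider after having descended to a customer.

```python
def is_valid_as_path(path, providers):
    """Check that an AS path never goes provider -> customer -> provider.

    providers[a] is the set of ASs that are providers of AS a.
    """
    going_down = False
    for a, b in zip(path, path[1:]):
        if b in providers.get(a, set()):
            # Customer-to-provider hop (uphill): not allowed after a descent.
            if going_down:
                return False
        elif a in providers.get(b, set()):
            # Provider-to-customer hop (downhill).
            going_down = True
        else:
            # No known provider-customer relationship on this hop.
            return False
    return True
```

With `providers = {"C": {"P1", "P2"}}`, the path `["P1", "C", "P2"]` is rejected, which is exactly the violation described in the text. Finding the shortest valid path would additionally require a search over such paths, which is not shown here.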
There is an important caveat about ball growing that is worth mentioning. We have
said that ball growing allows us to study a graph at different scales. However, for some
graphs, computing a metric on balls of different sizes is not equivalent to evaluating the
metric on graphs of comparable sizes. A random graph is a good example of this; a ball
of size N of a random graph may not itself be a random graph. However, balls of radius h
from, respectively, a random network of size N and a random network of size 2N will be
similar, as long as the diameters of both networks are larger than h. This is why we adopted
the ball-growing approach.
The expansion metric allows us to easily distinguish the mesh from our other two
canonical networks. For a mesh with N nodes, $E(h) \propto h^2/N$, while for the k-ary tree or a
random graph of average degree k, $E(h) \propto k^h/N$. Thus, the mesh has a qualitatively lower
expansion than the tree and the random graph. In passing, we note that our definition of
expansion is different from the traditional graph-theoretic definition of expander graphs^10
which is not appropriate for the task at hand.
^10 An N-node bipartite graph from a vertex set A to a vertex set B is said to be an (a, b) expander if
every set of n < aN nodes in A has at least m > bN neighbors in B [55].
3.4.2 Existence of alternate paths: Resilience
If you cut a single link in a tree, the graph is no longer connected. In contrast, it typically
requires many cut links to disconnect a random graph. Our second metric, resilience,
measures the robustness of the graph to link failures. In its definition we use a standard
graph-theoretic quantity: the minimum cut-set size for a balanced bi-partition of a graph.^11
We define the resilience R(n) to be the average minimum cut-set size within an n-node
ball around any node in the topology.^12 We make R a function of n, not h (the number of
nodes in the ball, not the radius of the ball itself), to factor out the fact that graphs with
high expansion will have more nodes in balls of the same radius.
Computing the minimal cut-set size for a balanced bi-partition of a graph is
NP-hard [36]. We use the well-tested heuristics described in [36] for our computations of
R(n).
A random graph with average degree k has $R(n) \propto kn$ and a mesh has $R(n) \propto \sqrt{n}$.
The tree, of course, has R(n) = 1. Thus, the tree has qualitatively lower resilience than
the other two graphs.
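
To make the underlying quantity concrete, the sketch below computes the minimum cut-set size of a balanced bi-partition exactly, by enumerating every balanced split. It is an illustration of the definition only: it is exponential in the number of nodes and therefore usable on tiny graphs, and it is not the heuristic of [36] used for the results in this chapter. The function name is hypothetical.

```python
from itertools import combinations


def min_balanced_cut(nodes, edges):
    """Exact minimum number of links cut by any balanced bi-partition."""
    nodes = sorted(nodes)
    n = len(nodes)
    if n < 2:
        return 0
    # Pin the first node to one side so mirrored partitions are not repeated.
    first, rest = nodes[0], nodes[1:]
    best = None
    # For odd n the pinned side may hold either floor(n/2) or ceil(n/2) nodes.
    for size in {n // 2, n - n // 2}:
        for others in combinations(rest, size - 1):
            side = {first, *others}
            # A link is cut when exactly one endpoint lies in `side`.
            cut = sum((u in side) != (v in side) for u, v in edges)
            if best is None or cut < best:
                best = cut
    return best
```

On a four-node chain a single cut link suffices, whereas a four-node ring needs two and a four-node complete graph needs four, which mirrors the qualitative ordering discussed above.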
3.4.3 Tree-like behavior: Distortion
While it appears somewhat unnatural and unmotivated, our final metric, distortion, comes
from the graph theory literature [33]. Consider any spanning tree T on a graph G, and
^11 For a graph with n nodes, this is the minimal number of links that must be cut so that the two resulting
components have approximately n/2 nodes.
^12 For each node in the network, we grow balls with increasing radius. For the subgraph formed by nodes
within a ball, we compute the number of nodes n as well as the resilience of the subgraph. We repeat
this computation for all (for larger subgraphs, we repeated the computation for sufficiently large number
of randomly chosen nodes, in order to keep computation times reasonable) other nodes, then average the
sizes and resilience values of all subgraphs of the same radius.
compute the average distance on T between any two vertices that share an edge in G. This
number measures how T distorts edges in G, i.e., it measures how many extra hops are
required to go from one side of an edge in G to the other, if we are restricted to using
T. We define the distortion^13 of G to be the smallest such average over all possible Ts.
Intuitively, distortion measures how tree-like a graph is.
For a given graph, distortion is a single number. As we did with resilience, we define the
distortion D(n) for a topology to be the average distortion of a subgraph of n nodes within
a “ball” around a node in the topology. Computing the distortion can be NP-hard [61]. For
the results described in this chapter, we use the smallest distortion obtained by applying
our own heuristics.^14 We also use a simple divide and conquer algorithm suggested by
Bartal [7].^15
The tree has D(n) = 1. The random graph and the mesh each have $D(n) \propto \log n$ [29].
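
The distortion of one particular spanning tree is easy to compute, and it gives an upper bound on the distortion of the graph. The sketch below evaluates the breadth-first-search tree rooted at a chosen node, in the spirit of the BFS-tree heuristic described in the footnotes. It is a minimal illustration under stated assumptions: the graph is connected, undirected and given as an adjacency mapping with comparable node labels; the root is supplied by the caller rather than selected as a "center"; and the function names are hypothetical.

```python
from collections import deque


def bfs_tree(adj, root):
    """Return parent and depth maps of the BFS spanning tree rooted at root."""
    parent, depth = {root: None}, {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in parent:
                parent[w] = u
                depth[w] = depth[u] + 1
                queue.append(w)
    return parent, depth


def tree_distance(u, v, parent, depth):
    """Number of tree hops between u and v, found by climbing toward the root."""
    hops = 0
    while u != v:
        if depth[u] < depth[v]:
            u, v = v, u
        u = parent[u]  # move the deeper (or tied) endpoint one level up
        hops += 1
    return hops


def bfs_tree_distortion(adj, root):
    """Average tree distance between the endpoints of every link of the graph."""
    parent, depth = bfs_tree(adj, root)
    edges = {(min(u, w), max(u, w)) for u in adj for w in adj[u]}
    return sum(tree_distance(u, w, parent, depth) for u, w in edges) / len(edges)
```

A graph that is already a tree yields exactly 1, while a four-node ring yields 1.5 because the one link left out of the tree is stretched to three hops.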
3.4.4 Summary
To more fully understand the distinctions made by our three metrics, we consider two
other standard networks: a fully-connected network and a linear chain. A fully-connected
network has extremely high expansion (E(h) = 1) and resilience ($R(n) \propto n$), and low
distortion (D(n) = 2). A chain (linear) network (with N nodes) has extremely low values
^13 This definition is a special case of minimum communication cost spanning trees defined in [33].
^14 For each node in the network, we grow balls with increasing radius. For the subgraph formed by
nodes within a ball, we compute the number of nodes in the ball. We then use an all-pairs shortest path
computation on the ball. The node through which the highest number of pairs traverse is deemed to be the
“center” of the ball. The subgraph’s distortion value is determined by the distortion of the BFS tree rooted
at the center. We repeat this computation for all (for larger subgraphs, we repeated the computation for
sufficiently large number of randomly chosen nodes, in order to keep computation times reasonable) other
nodes, then average the sizes and distortion values of all subgraphs of the same radius.
^15 This approach is known to compute distortions to within O(log n) of the optimal solution. We should
note that for all the topologies except mesh our own heuristics resulted in smaller distortion values than
that obtained using this heuristic.
on all three: $E(h) \propto h/N$, R(n) = 1, and D(n) = 1. We don’t use these for calibration
because they have trivial expansion properties (all nodes within one hop, or one node at
each hop) that do not work well with our ball-growing metric, but they are useful here.
If we divide behavior for each metric into high (H) and low (L), we can construct the
following table which lists the properties of our five representative networks:
Topology    Expansion   Resilience   Distortion
Mesh        L           H            H
Random      H           H            H
Tree        H           L            L
Complete    H           H            L
Linear      L           L            L
Notice that each of the five networks has its own low/high signature. Thus, this set of
metrics is successful at distinguishing between the canonical networks.
We have not been able to find a canonical network with the LHL pattern. In fact, the
complete graph is the only example we have of any network with high-resilience and low-
distortion. The complete graph shows that these two properties (resilience and distortion)
are not redundant (i.e., they refer to different aspects of network structure). However, the
artificiality of the complete graph, and the lack of simple examples of high-resilience and
low-distortion networks might lead us to suspect that networks with high-resilience and
low-distortion are unlikely to occur in practice. In fact, we find in Section 3.5 that the two
Internet graphs have these properties.
Also missing are the combinations LLH and HLH. We conjecture that high distortion
implies high resilience so these combinations are impossible.
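
The bookkeeping behind these observations is small enough to check mechanically. The sketch below encodes each network's signature as a three-letter string in the order expansion, resilience, distortion (an encoding chosen here for illustration) and lists the combinations that no representative network exhibits.

```python
from itertools import product

# Signatures in the order (expansion, resilience, distortion), from the table above.
SIGNATURES = {
    "Mesh": "LHH",
    "Random": "HHH",
    "Tree": "HLL",
    "Complete": "HHL",
    "Linear": "LLL",
}


def missing_signatures(table):
    """Return the low/high combinations that no network in the table has."""
    seen = set(table.values())
    every = ("".join(p) for p in product("LH", repeat=3))
    return sorted(s for s in every if s not in seen)


# All five signatures are distinct, and three of the eight combinations are unused.
assert len(set(SIGNATURES.values())) == len(SIGNATURES)
print(missing_signatures(SIGNATURES))  # ['HLH', 'LHL', 'LLH']
```

The three unused combinations are exactly the LHL, LLH and HLH patterns discussed above.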
Type        Topology       Number of Nodes   Avg. Degree   Comment
Measured    RL             170589            2.53          May 2001
            AS             10941             4.13          May 2001
Generated   PLRG           9230              4.46          2.246
            Transit-Stub   1008              2.78          3 0 0 6 0.55 6 0.32 9 0.248
            Tiers          5000              2.83          1 50 10 500 40 5 20 20 1 20 1
            Waxman         5000              7.22          5000 0.005 0.30
Canonical   Mesh           900               3.87          30x30 grid
            Random         5018              4.18          Link prob = 0.0008
            Tree           1093              2.00          k = 3, D = 6
Figure 3.1: Table of network topologies used. See Appendix D for a description of parameters
for the generated networks.
3.5 Results
We now describe the results of applying our three basic metrics to specific instances of
measured, canonical, and generated networks (Figure 3.1). Some of the network generators
allow a variety of input parameters. For these, we use particular instances of generated
networks, whose parameters are described in Figure 3.1. In Section 3.5.4 we discuss the
sensitivity of our results to parameter variations.
[Figure 3.2: Degree Distributions for various graphs. Panels: (a) Canonical, (b) Real, (c) Generated. Horizontal axis: Degree.]
We present the degree distributions for our canonical, measured and generated networks in
Figure 3.2. Of the generated and canonical networks, only the PLRG qualitatively captures
the degree distribution of the measured networks.
3.5.1 Expansion
[Figure 3.3: Expansion. Panels: (a) Canonical, (b) Measured, (c) Generated.]
Figure 3.3 plots the expansion E(h) for our measured, generated, and canonical
networks. Following our discussion in Section 3.4, Figure 3.3(a) shows that Tree and Random
expand exponentially (up until the regime where almost all nodes are reached), although
at slightly different rates. Mesh exhibits a qualitatively slower expansion. AS and RL
also expand exponentially,^16 and their behavior doesn’t qualitatively change when policy
is considered. Of the generated networks, Transit-Stub (TS), PLRG, and Waxman expand
exponentially, but Tiers shows a markedly slower expansion similar to Mesh.
In summary, then, we can categorize our networks into two classes, those that expand
exponentially, and those that expand more slowly. Using our low/high terminology of
^16 The finding that the expansion of the RL graph is exponential is not universally accepted [26]. However,
at least two other studies agree with our conclusions [56, 71].
Section 3.4, we say that Mesh and Tiers have low expansion, and all other networks exhibit
high expansion.
We emphasize that, in drawing these distinctions, we have made qualitative (and
therefore somewhat subjective) comparisons. We ignore quantitative differences in metric values,
such as different constants or slopes. We also do not use sophisticated curve-fitting
techniques to infer the mathematical form of E(h) for some of the measured and generated
networks. Our emphasis on qualitative comparison is consistent with our initial
assumption (see Section 3.4) that the goal of topology generators is not to produce exact replicas
of the Internet, but to produce graphs that have similar large-scale properties. It is also
consistent with the unquantifiable incompleteness of our Internet graphs.
3.5.2 Resilience
[Figure 3.4: Resilience. Panels: (a) Canonical, (b) Measured, (c) Generated.]
Figure 3.4 plots the resilience function R(n) for our measured, generated, and canonical
networks. Of our canonical networks, Tree has the lowest resilience (Figure 3.4(a)). The
minor variations in this function can be attributed to the heuristics we use to determine
the cut-set. The resilience of Mesh increases with ball size, but more slowly than Random.
The measured networks exhibit a high resilience that is comparable with that of
Random. However, RL and AS differ from each other quantitatively. Also, when policy routing
is taken into account, the resilience of the RL and AS graphs decreases (the former by
almost a factor of two), although its qualitative behavior as a function of ball size remains
unchanged for both graphs. Of the generated networks, Waxman closely resembles
Random, and Tiers closely resembles Mesh. TS has low R(n),^17 similar to Tree.^18 Finally,
PLRG has high resilience, like Random, although it does not match Random as closely as
Waxman does.
Following our low/high classification of Section 3.4, we then say that TS and Tree have
low resilience, and all the other networks have high resilience.
3.5.3 Distortion
Figure 3.5 plots D(n) for our measured, generated and canonical networks. The distortion
of the Tree is low, whereas that of Mesh and Random is high.
By our reckoning, the measured networks (Figure 3.5(b)) have low distortion, more so
when policy routing is taken into account. Their distortion, although it increases with n,
^17 TS has many parameters, one of which is the fraction of redundant transit-to-stub or stub-to-stub
links. We tried varying this parameter (from 1% to 60%) in an attempt to increase the resilience of TS.
When we do so, however, the distortion of TS increases to match that of the random graph.
^18 Notice that there are minor irregularities in R(n) for TS. We attribute this to the observation that,
of two balls of slightly differing size, a larger ball can have a lower resilience. For example, consider this
contrived example of two completely connected networks each with n nodes joined by a single link. A
ball of radius 1 centered on any node has a resilience of n; a ball of radius 3 centered on any node has a
resilience of 1.
[Figure 3.5: Distortion. Panels: (a) Canonical, (b) Measured, (c) Generated. Horizontal axis: Ball Size.]
appears qualitatively different from Mesh or Random. The same is true of most of the
generated networks, with the sole exception of Waxman.
From this discussion, we conclude that Random, Mesh and Waxman all have high
distortion. All other networks have low distortion.
3.5.4 Discussion
The preceding discussion reveals the following low/high classifications for our measured
and generated networks:
Topology        Expansion   Resilience   Distortion   Comment
Mesh            L           H            H
Random          H           H            H
Tree            H           L            L
Complete        H           H            L
Linear          L           L            L
AS, RL, PLRG    H           H            L            Like complete graph!
Tiers           L           H            L            No counterpart
TS              H           L            L            Like Tree
Waxman          H           H            H            Like Random
Both measured graphs have rapid expansion, high resilience, and relatively low
distortion; that is, these networks can be seen as tree-like, except that they are resilient. Policy
routing does not change this classification. Even though there is no a priori reason to
assume that the AS and RL topologies would be qualitatively similar, our metrics suggest
that they are quite similar, at least in terms of the properties measured by our metrics.
Among the standard graphs, only the complete graph has the same low-high signature^20
as these measured graphs. Moreover, two of the generated graphs resemble a canonical
network. TS resembles the Tree, and Waxman closely models Random. Tiers does not
have a canonical counterpart; it resembles Mesh in two metrics, but has low distortion
unlike the Mesh.
When comparing our measured graphs to the generated ones, we find that three of
the generated graphs differ from the measured graphs in one particular metric: Tiers has
low expansion, TS has low resilience, and Waxman has high distortion. Only the PLRG
matches the measured graphs in all three metrics. Thus, we contend that PLRG produces
graphs that are better qualitative matches to the Internet graphs than those produced by
the other generators.
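
The comparison in the preceding paragraph amounts to checking each generator's signature against the measured one, metric by metric. A minimal sketch, using the classifications from the table above and an illustrative string encoding in the order expansion, resilience, distortion:

```python
METRICS = ("expansion", "resilience", "distortion")
MEASURED = "HHL"  # shared signature of the AS and RL graphs

# Generator signatures as classified in the table above.
GENERATED = {"PLRG": "HHL", "Tiers": "LHL", "TS": "HLL", "Waxman": "HHH"}


def mismatches(signature, reference=MEASURED):
    """Return the metrics on which a signature differs from the reference."""
    return [m for m, a, b in zip(METRICS, signature, reference) if a != b]


for name, signature in GENERATED.items():
    print(name, mismatches(signature))
```

Each of Tiers, TS and Waxman differs from the measured graphs in exactly one metric, and PLRG in none. The check is only as good as the qualitative low/high judgments that feed it.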
This conclusion holds for all other degree-based generators we tested. Figure 3.7 in
Section 3.6 shows our three metrics for four other proposed degree-based generators: Brite
version 1.0 [43], BA [6], BT [10] and Inet [35]. All of these can be classified, along with
^19 The results presented here contain one instance of each of the AS and RL graphs. In fact, we computed
these metrics for at least two other instances, generated more than six months apart from each other (see
footnote 5 for dates). Moreover, the RL graph of August 1999 was approximately a factor of two larger
than the later graphs (the size difference is due to the difference in the duration of execution of the topology
discovery software). Despite the differences in size and time of generation, these other measured graphs
did not change our conclusions.
^°We should hasten to add, of course, that we do not mean to suggest that the AS and RL graphs
resemble the complete graph. The latter exhibits an extreme expansion behavior (all nodes are reachable
within one hop) that the AS and RL do not.
the PLRG, as having high expansion and resilience, and low distortion.^^ These generators
all produce graphs with a power-law degree distribution, but differ in the way nodes are
connected together. In Appendix C we investigate other ways of connecting nodes, and find
that our conclusions are robust to variations in node connectivity, provided the connectivity
method incorporates some notion of random connectivity and the generated graph’s degree
distribution is qualitatively similar to that of the measured graphs.
These conclusions about generated networks hold for a wide variety of parameters. We
list the various parameter settings that we have explored, for each of these generators in
Appendix D. While for most parameter values the results are in agreement with what we
have presented here, it is possible to drive these generators to different operating regimes
using extreme choices for parameters. For the Waxman generator, it is possible to
introduce extreme geographic bias, thereby dramatically reducing the likelihood of having
links between two nodes that are far apart. This also reduces the likelihood of obtaining a
connected graph. In this regime, the largest connected component of the Waxman network
has low expansion, low resilience and low distortion. It then resembles a minimum
spanning tree overlaid on points on a plane, where edge weights are proportional to Euclidean
distance. For two-level TS hierarchies with a large transit portion, TS tends toward a
random graph. Finally, with Tiers, the average degree parameter can be reduced to the
point where it starts to resemble a minimum spanning tree.
It would be interesting to find metrics that distinguish power law generators. In fact, there is some
work that has already examined this question [10]. That our metrics do not do so is not a flaw of our
methodology. It merely reflects the fact that these degree-based generators seem to produce the same
large-scale structure.
3.5.5 Other metrics
In addition to our three basic metrics, we have shown results in Appendix B for six other
metrics. Some of these were of our own devising, but many were taken from the literature.
In all cases the results were consistent with the findings above. In many cases the metrics
did not distinguish between different graphs, but whenever there was a clear distinction it
was consistent with the grouping found by our three basic metrics. In fact, the three
metrics stood out clearly because of their superior ability to distinguish between the various
networks. We conclude that, even by these additional metrics, the PLRG resembles the
AS and the RL graphs, the Waxman resembles Random, and TS qualitatively matches
the tree.^^ Looking at the graphs in Appendix B in more detail, the PLRG is the only
generator with a power-law distribution of the rank of positive eigenvalues, a signature of
the AS topology [26].^® The diameter distributions have a similar bell-curve shape (with
the Tree as the sole exception, as discussed in footnote 23), although with different
magnitudes. The error tolerance [4] plots for all the graphs are qualitatively similar, but with
different magnitudes. However, the measured networks have a peaked attack tolerance [4],
a characteristic shared by PLRG and Tiers. The vertex cover metrics of all graphs are quite
similar to each other, and the biconnectivity metric of all graphs has a similar behavior
with the exception of Mesh, Random, and Waxman.
In addition to the metrics described in Appendix B, we also tested many others (of our own devising),
including the average path length between any two nodes in a ball of size n, and the expected max-flow
between the center of a ball of size n and any node on the surface of the ball. These metrics, too, do not
contradict our findings but do not add to them either.
The diameter distribution for the tree is one-sided, but nevertheless resembles Transit-Stub.
Modulo the observation that extreme choices of parameters can alter the properties of the generated
graphs.
^^The RL graph was too large to obtain its eigenvalue spectrum.
In addition to these various metrics that are intended to measure large-scale structure,
we did compute the clustering metric used in [10] on our various graphs. Using our ball-
growing technique and looking at the overall curve’s behavior, the PLRG graph had a
behavior similar to that of the AS graph, but different from that of all other graphs including
the RL. However, when merely looking at the value of the clustering coefficient computed on
the whole graph, the PLRG (and the structural generators) exhibited significantly different
clustering coefficients compared to either the AS or the RL. We conclude that while PLRG
captures the large-scale properties of our measured graphs, it may not capture the local
properties of these graphs.
3.6 Other Power-Law Network Generators: Does Connectivity
Matter?
In this chapter, we have used a single degree-based generator, the PLRG. The PLRG
generator first generates a set of nodes with a particular degree distribution (such as a
power-law distribution), then uses a particularly simple technique for connecting nodes
(Section 3.3.2). It clones each node as many times as the degree assigned to it, then
uniformly randomly connects the clones. In addition to PLRG, there are other generators
that generate networks with power-law degree distributions.
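
The connectivity step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in this study: the function names `plrg_connect` and `largest_component` are hypothetical, and leaving an odd leftover clone unmatched is an assumption. Ignoring self-loops and duplicate links, and keeping only the largest component, follows the convention this chapter states for its reconnection experiments.

```python
import random
from collections import defaultdict


def plrg_connect(degrees, seed=None):
    """Clone each node once per assigned degree, then pair clones at random."""
    rng = random.Random(seed)
    clones = [node for node, d in enumerate(degrees) for _ in range(d)]
    rng.shuffle(clones)
    edges = set()
    # Pair consecutive clones; an odd leftover clone stays unmatched (assumption).
    for i in range(0, len(clones) - 1, 2):
        u, v = clones[i], clones[i + 1]
        if u != v:  # ignore self-loops
            edges.add((min(u, v), max(u, v)))  # the set ignores duplicate links
    return edges


def largest_component(edges):
    """Return the edges that belong to the largest connected component."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, best = set(), set()
    for start in adj:
        if start in seen:
            continue
        comp, stack = {start}, [start]
        while stack:
            x = stack.pop()
            for y in adj[x]:
                if y not in comp:
                    comp.add(y)
                    stack.append(y)
        seen |= comp
        if len(comp) > len(best):
            best = comp
    return {(u, v) for u, v in edges if u in best}
```

Because self-loops and duplicates are dropped, the realized degree of a node can be smaller than its assigned degree, so the output degree distribution only approximates the input one.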
One class of generators is exemplified by the model proposed by Barabasi and
Albert [6] (we call this the B-A model) and the Brite generator. The B-A model is an
evolutionary process that generates graphs with power-law degree distributions. The graph
is grown incrementally, with newly appearing nodes randomly connecting to already
existing nodes, but in proportion to their degrees. The Brite [43] generator incorporates the
B-A model with additional features, such as node placement (random or heavy-tailed) and
geographic bias in establishing links. We used the heavy-tailed option when generating a
network in our study. However, we did not explore the latter feature. A slight variant of the
B-A model proposed by the same authors incorporates link addition and re-wiring [2]; with
a small but uniform probability, a link can be added between two nodes, or an existing
link can reattach from one endpoint to another based on preferential connectivity. This
variant was later modified by Bu and Towsley (we call the modified version the BT
model) to allow more flexibility in specifying how the nodes are connected.
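
The incremental growth process of the B-A model can be sketched as below. This is a simplified illustration, not the Brite or BT code: the name `preferential_attachment`, the parameter `m` (links added per new node), and the small seed clique are all assumptions, and the link-addition and re-wiring steps of the variants are omitted.

```python
import random


def preferential_attachment(n, m, seed=None):
    """Grow a graph of n nodes; each new node links to m existing nodes
    chosen with probability proportional to their current degree.
    Requires n > m >= 1."""
    rng = random.Random(seed)
    # Assumed seed graph: a clique on m + 1 nodes, so every node has degree > 0.
    edges = [(i, j) for i in range(m + 1) for j in range(i + 1, m + 1)]
    # Each node appears in `endpoints` once per incident link, so a uniform
    # draw from this list is a degree-proportional draw over nodes.
    endpoints = [x for e in edges for x in e]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))
        for t in targets:
            edges.append((t, new))
            endpoints.extend((t, new))
    return edges
```

Sampling from the endpoint list is a common way to obtain degree-proportional selection without recomputing probabilities after every added link.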
Figure 3.6: Degree Distributions of PLRG-Variant Networks (x-axis: Degree; curves: B-A, Brite, BT, PLRG)
Another class of generators initially assigns node degrees from a power-law degree
distribution, similar to the PLRG. Unlike the PLRG, however, these approaches connect
nodes using different rules. For example, after conducting a feasibility test on the generated
degree distribution to see if the resulting graph would be connected, the Inet [35] generator
creates a spanning tree among nodes of degree larger than one, connects degree one nodes to
this spanning tree with proportional connectivity,^® then satisfies the degrees of remaining
nodes in decreasing degree order. Another generator [50] connects the nodes randomly,
without cloning.
Figure 3.7: PLRG Variants. Panels: (a) Expansion, (b) Resilience, (c) Distortion (curves: B-A, Brite, BT, Inet, PLRG)
In this section, we have computed our three metrics for some other power-law network
generators, specifically, B-A, Brite, BT, and Inet. All the networks except BT have average
degrees around 4 while BT (with the default setting) has an average degree of 9.25. We
include the performance of PLRG in this section for comparison. Figures 3.6 and 3.7 plot the
degree distributions and our three metrics of those PLRG variant networks, respectively.
We conclude that they are all qualitatively similar with respect to our metrics.
®The likelihood of attaching to a node is proportional to its degree
3.6.1 Does Connectivity Matter?
Though all the PLRG variants have qualitatively similar performance, the B-A, Brite
and BT generators have a slightly different distortion curve. In examining their degree
distributions (Figure 3.6), we noticed that the largest degree in these generators is often
significantly less than that in other variants. Furthermore, these generators also have
fewer low-degree nodes. To test whether their connectivity methods are responsible for the
difference, we reconnected links in the B-A and Brite graphs using the PLRG connectivity
method. To do this, we created two new graphs by first assigning degrees to nodes in
each graph using the degree distributions of the B-A and Brite graphs, respectively. Once
each node is assigned a degree, we connect them together using the PLRG connectivity
algorithm described in Section 3.3.2.^^
Figure 3.8: PLRG Variants. Panels: (a) Expansion, (b) Resilience, (c) Distortion (curves: B-A, Modified B-A, Brite, Modified Brite)
^^Self-loops and duplicate links are ignored. If the final graph is disconnected, the biggest component is
returned.
In Figure 3.8, we show the result for B-A and Brite graphs reconnected using the PLRG
connectivity method (we call this the modified B-A and modified Brite graph). We find
that both networks resemble their original networks with respect to the distortion metric.
The same conclusion holds for a BT network reconnected using the PLRG connectivity
method.
How do different random connectivity methods compare?
Given a set of nodes with a particular degree distribution, nodes can be connected in
different ways to satisfy the degree requirements. In addition to the random connectivity
approaches taken by these degree-based generators, there are other variants of these random
connectivity techniques for power-law degree distributions. Examples include starting with
the highest degree nodes and connecting them to other nodes either uniformly or in proportion
to each node’s “unsatisfied” degree (its assigned degree minus the number of links already
assigned to it).
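
The two example variants just described can be sketched in a single function. This is a hedged illustration only: the name `connect_by_unsatisfied_degree` and the `proportional` flag are hypothetical, and restricting the uniform choice to nodes that still have unsatisfied degree (and forbidding duplicate links) is an assumption that the text does not spell out.

```python
import random


def connect_by_unsatisfied_degree(degrees, proportional=True, seed=None):
    """Process nodes from highest to lowest assigned degree; link each one to
    partners drawn uniformly or in proportion to their unsatisfied degree."""
    rng = random.Random(seed)
    remaining = list(degrees)  # unsatisfied degree of every node
    edges = set()
    for u in sorted(range(len(degrees)), key=lambda i: -degrees[i]):
        while remaining[u] > 0:
            # Assumption: only nodes with unsatisfied degree are eligible,
            # and a pair of nodes is linked at most once.
            candidates = [
                v for v in range(len(degrees))
                if v != u and remaining[v] > 0
                and (min(u, v), max(u, v)) not in edges
            ]
            if not candidates:
                break  # the degree of u cannot be fully satisfied
            weights = [remaining[v] if proportional else 1 for v in candidates]
            v = rng.choices(candidates, weights=weights, k=1)[0]
            edges.add((min(u, v), max(u, v)))
            remaining[u] -= 1
            remaining[v] -= 1
    return edges
```

The candidate list is rebuilt for every link, which is quadratic in the number of nodes; that is acceptable for a sketch but not for graphs of the sizes studied here.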
Figure 3.9: Degree Distributions of various random connectivity methods using PLRG
degree distribution as the initial distribution (x-axis: Degree; curves: R1 to R7)
Figure 3.10: Expansion, Resilience and Distortion of various random connectivity networks
using PLRG degree distribution as the initial distribution. Panels: (a) Expansion, (b) Resilience, (c) Distortion (curves: R1 to R7)
We have conducted an extensive experiment to study the effect of various connectivity
methods on degree-based generators. The details of the experiments are included in
Appendix C. In summary, we have computed our three metrics for many random connectivity
variants, including the methods (denoted R1 and R5 respectively) described above.
Figures 3.9 and 3.10 show the degree distributions and our three metrics of various random
connectivity networks using PLRG degree distribution as the initial distribution. We find
that they all have qualitatively similar behavior with respect to our metrics.
In addition to these random connectivity variants, there exist deterministic connectivity
variants. One such variant is as follows. Start with the highest degree node, add one link
each from this node to each lower degree node in decreasing degree order (skipping nodes
whose degree has already been satisfied), then repeat for the next highest degree node whose
degree has not been satisfied. In our connectivity study (Appendix C), we have computed
our three basic metrics for these variants of power-law degree-distribution graphs. Not
surprisingly, deterministic connectivity results in graphs that are quite different from the
PLRG (and thus different from the AS and RL graphs).
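
The deterministic variant described above can be sketched as follows. The function name `connect_deterministic` is hypothetical, and two details are assumptions: ties in degree are broken by node index, and a node stops adding links as soon as its own degree is satisfied.

```python
def connect_deterministic(degrees):
    """Link each node, in decreasing degree order, to the following nodes in
    that order, skipping nodes whose degree is already satisfied."""
    # Python's sort is stable, so equal-degree nodes keep index order.
    order = sorted(range(len(degrees)), key=lambda i: -degrees[i])
    remaining = list(degrees)
    edges = []
    for pos, u in enumerate(order):
        for v in order[pos + 1:]:
            if remaining[u] == 0:
                break      # degree of u is satisfied; move to the next node
            if remaining[v] == 0:
                continue   # skip nodes whose degree is already satisfied
            edges.append((u, v))
            remaining[u] -= 1
            remaining[v] -= 1
    return edges


# Example: degrees 3, 2, 2, 1 yield a hub linked to every node plus one extra link.
print(connect_deterministic([3, 2, 2, 1]))  # [(0, 1), (0, 2), (0, 3), (1, 2)]
```

Since no randomness is involved, high-degree nodes always attach to each other first, which is one plausible reason the resulting graphs differ from those of the random methods.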
3.6.2 Answers
From these experiments, we conclude that what seems to determine the qualitative behavior
of these degree-based generators is the degree distribution, not the connectivity method. In
particular, slight variations in degree distribution (such as having too few low degree nodes,
or having a largest degree that is not high enough) result in significant metric differences. In
contrast, we found that the metric properties are essentially the same for all of the random
connectivity methods we explored. Even for the uniformly random connectivity method,
where nodes are not necessarily connected in proportion to their degrees, the large-scale
metrics are qualitatively similar to the PLRG.
3.6.3 Conclusion
In summary, degree-based generators seem qualitatively similar (in the sense of Section 3.5)
to the RL and AS topologies regardless of connectivity method, so long as that method
incorporates some notion of random connectivity and the generated graph’s degree
distribution is qualitatively similar to that of the measured graphs.
3.7 Discussion
We began this chapter by questioning the widely accepted belief that degree-based
generators, by the very fact that they match the degree distribution of the Internet, are superior to
structural generators. We claimed, as a matter of faith not fact, that it is more important
that topology generators capture the large-scale structure of the Internet than to
reproduce the purely local properties such as the degree distribution. We further argued that,
despite the widespread acceptance of degree-based generators, it was still an open question
as to which family of generators—structural or degree-based—would better capture these
large-scale properties. The goal of this chapter was to answer this question.
The work presented here is only a first step in that direction. The data on which we
based our analysis—the measured network graphs—have several methodological drawbacks.
They are incomplete, in that some nodes and links are missing. Moreover, the graphs only
show connectivity, and do not reflect the link speeds nor policy routing (although we have
attempted to approximate policy routing).
Our topology metrics also present problems. The selection of metrics is inherently
arbitrary, and our choices may not reflect the most relevant aspects of networks. However,
the results from our chosen set of three metrics appear to be consistent with those from
the larger set of metrics we studied. The analysis of all of these metrics is qualitative, and
therefore somewhat subjective. Subsequent work from other researchers will be needed to
ensure that our own private biases did not distort the results.
With these caveats duly noted, our results suggest that degree-based generators capture
the large-scale structure of the measured networks surprisingly well, at least according to
our metrics, and are significantly better than structural generators. These results, however,
should not be interpreted as obviating the structural generators. The focus in this chapter
has been on which family of generators best models the large-scale structure of the Internet,
which has restricted our attention to rather large graphs (the smallest generated graph had
1000 nodes). Choosing a small (less than, say, 100 node) topology on which to run network
simulations is an entirely separate question. As noted in [80], a power-law distribution is
almost meaningless if the number of nodes is small. With only a few nodes, it is unlikely
that the degree distribution will be able to create the implicit hierarchy necessary for
modeling networks. It may well be that the current structural generators, or ones yet to
be devised, are better choices for small-scale simulation studies.
Chapter 4
Other Networks
Do our findings in Chapter 3 apply to other “real” network structures? In this chapter,
we apply our topology metrics to several other network structures, such as the web and
the set of airline routes. We find that several of these other networks do indeed resemble
the Internet, at least in a very rough way, and are well-modeled by the same degree-based
generators. Thus, when searching for explanations for why the Internet has its current
structure, and why the degree-based generators are such good models of the Internet, we
would be well-advised to look for explanations that have fairly general applicability and
are not restricted to the particular details of the Internet.
4.1 Introduction
Some of the degree-based generators are quite general in nature; once their degree
distribution has been set, the nodes are connected essentially at random.^ Such generators
do not embody any Internet-specific heuristics, and thus might be reasonable models of a
larger class of networks. To investigate this issue, we ask the following question: do our
^ Some of the degree-based generators do not precompute a degree distribution but instead generate one
through a growth process (e.g., [43]) and others do not connect the nodes randomly (e.g., [35]).
conclusions in Chapter 3 (e.g., degree-based generators generate networks that resemble
the large scale properties of the Internet) apply to other “real” networks?
To answer this question, we obtained instances of the following networks:
• The power grid network of the Western United States and Canada. Nodes on this
network are generating stations and distribution stations, and links represent
transmission lines. (4941 nodes, 2.67 average degree)
• The airline network in which nodes are airports, and a link between two nodes
represents the existence of a direct flight between them. (3965 nodes, 11.70 average
degree)
• The actors graph representing collaborations between actors in movies. (225226
nodes, 73.71 average degree)
• A six-hour segment of a call-graph from the AT&T telephone network. Nodes are
telephone numbers and links represent a connection made between the two numbers
during the six-hour period. (47118 nodes, 2.11 average degree)
• The Usenet news server network. Nodes are news servers and links represent server
feeds. This network was inferred from server-level path information in Usenet message
headers. (46065 nodes, 3.11 average degree)
• The network of connections between gnutella users. (4736 nodes, 5.50 average degree)
• The network in which nodes are Web servers, and a link between two nodes denotes
the existence of a page in one that references a page in the other.^ We call this the
Webserver network. (118588 nodes, 4.32 average degree)
• Finally, the Web-document network, in which nodes are documents, and two nodes
are linked if one contains a hyperlink to the other. (710526 nodes, 4.71 average
degree)
4.2 Results
Figure 4.1: Degree Distributions of other real networks. Panels: (a) Actor-graph, Web-documents, Web-servers, Usenet, PLRG; (b) Gnutella, Airline, Call-graph, Power-grid, PLRG (x-axis: Degree)
^Although this graph is directed, by definition, we consider the undirected version in computing our
metrics.
Figures 4.1 (a) and (b) plot the degree distributions of these networks. All networks,
with the exception of the power-grid and possibly gnutella, have long-tailed degree
distributions. The web-document and web-server graphs, as well as the call graph, appear to
exhibit power-law degree distributions. The rest have other forms. The PLRG is included
for calibration. Most of the graphs included in the study have, roughly speaking, a power-
law degree distribution.^ There are some exceptions: the power-grid which is definitely not
power-law, and the actors graph which has a power-law regime, but a sharper tail. The
actors graph is generally considered to be a truncated power-law [5]. Finally, the gnutella
network seems to have two power-law regimes.
The expansion plots (Figures 4.2 (a) and (b)) reveal that the power-grid clearly has low
(Section 3.4) expansion. Similarly, the resilience graphs (Figures 4.2 (c) and (d)) indicate
that the power-grid has low resilience. Of the graphs shown above to have long-tailed
degree distributions, the call-graph is the only one that is not qualitatively consistent with
the PLRG. Its distortion (Figures 4.2 (e) and (f)) is significantly lower than the others,
and its resilience is slightly different from that of the other networks.
4.3 Conclusion
From this analysis, we conclude that many other real-world networks may be well-modeled
by the PLRG generator. This is interesting, and suggests that, in looking for the reasons
why the Internet is well-modeled by degree-based generators, we should perhaps look for
explanations that are not necessarily specific to data communication networks.
^These graphs plot, on the y-axis, the fraction of nodes whose degree is ≥ x. A power-law degree
distribution corresponds to a straight line with a negative slope on such a plot.
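
The quantity plotted in these degree-distribution figures, the fraction of nodes whose degree is at least x, can be computed with a short sketch like the one below. The function name `degree_ccdf` is an assumption made for illustration, and the input is simply a list with one degree per node.

```python
from collections import Counter


def degree_ccdf(degrees):
    """Return (x, fraction of nodes with degree >= x) for each observed degree x."""
    n = len(degrees)
    counts = Counter(degrees)
    ccdf = []
    at_least = n  # number of nodes with degree >= the current x
    for x in sorted(counts):
        ccdf.append((x, at_least / n))
        at_least -= counts[x]
    return ccdf


# Example: four nodes with degrees 1, 1, 2 and 3.
print(degree_ccdf([1, 1, 2, 3]))  # [(1, 1.0), (2, 0.5), (3, 0.25)]
```

If this fraction behaves like x raised to a negative power, then its logarithm is linear in log x, which is why a power law appears as a straight line with negative slope when both axes are logarithmic.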
Figure 4.2: Our metrics for other real networks. Panels: (a), (b) Expansion versus Ball Radius; (c), (d) Resilience versus Ball Size; (e), (f) Distortion versus Ball Size. Curves in (a), (c), (e): Actor-graph, Web-documents, Web-servers, Usenet, PLRG; curves in (b), (d), (f): Gnutella, Airline, Call-graph, Power-grid, PLRG.
Chapter 5
Network Hierarchy
The Internet is believed to have a hierarchical structure. However, our work presented
in Chapter 3 has revealed that degree-based generators that do not attempt to create
a hierarchical structure best model the large scale properties of the Internet [67]. This
seeming paradox motivates us to investigate the network hierarchy. In this chapter, we
propose to use two metrics to measure the level of hierarchy in many networks including
two representations of the Internet: the Autonomous System and router connectivity maps.
We then apply these two metrics to many networks and classify them into three
categories (strict, moderate and loose) based on their level of hierarchy. Surprisingly, we
found that there is a moderate level of hierarchy in degree-based networks, similar to the
level of hierarchy observed in the Internet. We have also further examined the hierarchical
characteristics of these networks. We found that the nature of hierarchy in degree-based
networks is more similar to the Internet at the AS level than the router level. Moreover,
the hierarchy in the router graph is due to the deliberate placement of links in the
Internet, while in the AS graph the hierarchy is more related to the long-tailed nature of node
degrees.
5.1 Introduction
Modern network topology generators can be classified into two families: structural
generators that treat hierarchy as fundamental and degree-based generators that treat the degree
distribution as fundamental. Structural generators, such as Tiers [20] and Transit-stub [12],
create networks with a deliberately hierarchical structure. On the other hand, degree-based
generators, such as Brite [43] and Inet [35], focus solely on generating networks with
power-law degree distributions to match the degree distribution of the Internet [26]. Our work
in topology characterization (see Chapter 3 or [67]) has shown that degree-based
generators that make no attempt to create a hierarchical structure best resemble the large scale
structure of the Internet. However, the Internet is believed to have a significant degree
of hierarchy; at the router level, network engineers routinely speak of backbones and at
the AS level ISPs are broken into different “tiers.” This seeming paradox motivates us to
examine the hierarchical structure of network topologies. Specifically, in this chapter, we
ask the following questions: How can we measure the hierarchy inside a network? Do the
degree-based generators produce networks with hierarchy and, if so, how?
Although there is a large literature on routing hierarchies, we are not aware of much
work that has attempted to measure (as opposed to create, or utilize) hierarchy in network
topologies. For example, in the area and landmark hierarchy [70], the hierarchy is created
on top of the underlying topology. A few nodes are recursively selected to represent a
group of nodes at various levels depending on the radius of the area. Two recent and
related examples [28, 64] utilize the hierarchical property of AS paths to infer business
relationships (e.g., provider-customer) in the AS topology. The latter work also classifies
ASs into a five-level hierarchy based on their business relationships, not the topological
structure of the network.
The investigation of network hierarchy is made more difficult by the lack of a concrete
definition of hierarchy that we could apply, not to mention the metrics for measuring it.
Therefore, our first task is to define what a hierarchy is. A concept of hierarchy is often
associated with a tree-like property, i.e., there is a small set of nodes or links that are
more important than others. Similarly, our notion of network hierarchy revolves around
the intuition that there is a set of backbone links that carry the traffic from many more
source-destination pairs than other links; that is, the traffic is not evenly spread out among
the links but instead is funneled into more central backbones. Based on this intuition, we
propose two metrics—weighted traversal set and weighted vertex cover—as two measures
of hierarchy, and then use them to investigate the nature of hierarchy in the generated and
measured graphs.
We find that while the degree-based generators do not explicitly inject hierarchy into
the network, the power-law nature of the degree distribution results in a substantial level of
hierarchy—not as strict as the hierarchy present in structural generators, but significantly
more hierarchical than, say, random graphs. This relatively moderate form of hierarchy,
produced merely by the presence of the power-law degree distribution, more accurately
reflects the nature of hierarchy in the Internet than the strict hierarchy produced by the
structural generators. Moreover, based on the path characteristics (Section 5.5), we find
that the hierarchy in degree-based networks more closely resembles the AS hierarchy than
the RL hierarchy. Similar to the Tree network, the hierarchy in AS, degree-based, and
Transit-stub networks is strictly-structured while very little structure is found in Tiers and
RL map.
This chapter is organized as follows. Section 5.2 presents related work. Section 5.3
describes the two metrics for computing the link usage value, including the explanation of
their difference. Section 5.4 presents the results of applying the simple metric to various
networks. We discuss the results and point out issues that we encountered searching for
hierarchy metrics. Section 5.5 explores the characteristics of network hierarchy. Section 5.7
presents the hierarchy analysis on other degree-based generators. Finally, Section 5.8
concludes this chapter.
5.2 Related Work
As mentioned in passing, we are not aware of much work that has attempted to measure
hierarchy in network topologies. Probably closer in spirit to hierarchy measurement is the
classification of nodes into different levels of hierarchy based on different criteria.
Subramanian et al. [64] describe a technique for inferring business relationships (e.g.,
provider-customer) in the AS topology and further classify ASs into a five-level hierarchy based on
the business relationships between them. Vukadinovic et al. [74] evaluate the Laplacian
eigenvalue spectrum of a variety of graphs and classify AS nodes into 5 different levels
based on the node’s adjacency list. Finally, Govindan et al. [30] classify AS nodes into
different groups based on their outdegree.
Our work would not have been possible without developments in Internet router-level
topology discovery. Early work in this area used traceroutes from a small set of sources to
several thousand hosts to compute a router-level map [51]. Subsequent work improved the
coverage of the Internet address space by randomly selecting IP addresses [58], randomly
selecting addresses from route entries in BGP tables [11], using a precomputed set of Web
sites [18], or using heuristics to infer addressable parts of the IP space [32]. This last work
also documents several techniques for improving completeness of the inferred topologies.
5.3 Hierarchy Metrics
Our first task is to better understand what hierarchy is and how it might be measured.
Our notion of hierarchy revolves around the intuition that there is a set of important links
or backbone links that carry the traffic from many source-destination pairs. We implicitly
assume that the traffic is not evenly spread out among the links but instead is funneled
into more central backbones. We found evidence of path concentration around large
ASs in Chapter 2.3.3. We therefore conjecture that a symptom of hierarchical structure
is that some links are used more often than others. Here we are not referring to the level
of traffic, which is a function of the sending patterns of individual hosts, but rather usage
as measured by the set of node pairs (source-destination pairs) whose traffic traverses the
link; we call this the link's traversal set.¹
¹ Recall that a "link" in a topology graph might represent various forms of shared media in the
underlying Internet.
Implicitly, in computing the traversal set for a link l, we include all source-destination
pairs whose shortest paths include the link l. For the AS and RL graphs, we extended
this in a simple way to account for policy routing. To do so, we use the shortest valid AS
path routing model. At the AS level, this policy model computes the shortest AS path
between two nodes that does not violate provider-customer relationships (an example of a
path that would violate these relationships is one that traverses a provider, followed by a
customer and then back to another provider). We use the results in [28] to infer
provider-customer relationships. Chapter 2.4.2.1 describes this sophisticated policy routing
in more detail. To compute the policy path in the RL graph, we first compute the corresponding
AS level policy path, and then use shortest paths within the sequence of ASs to determine
a router-level policy path.
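The validity condition on AS paths described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the `providers` input format, and the choice to ignore links without an inferred provider-customer relationship are hypothetical and are not taken from the routing model of Chapter 2.4.2.1.

```python
def is_valid_policy_path(path, providers):
    """Return True if the AS path never climbs to a provider after descending to a customer.

    path      -- list of AS identifiers, in traversal order
    providers -- dict mapping an AS to the set of its providers (hypothetical input format)
    """
    descended = False
    for u, v in zip(path, path[1:]):
        if v in providers.get(u, set()):
            # Step from a customer up to its provider.
            if descended:
                return False  # provider -> customer -> provider: violates the policy
        elif u in providers.get(v, set()):
            # Step from a provider down to its customer.
            descended = True
        # Assumption: links with no inferred provider-customer relationship are ignored.
    return True


# Hypothetical relationships: C is a customer of P1 and P2; P1 is a customer of T.
providers = {"C": {"P1", "P2"}, "P1": {"T"}}
uphill = is_valid_policy_path(["C", "P1", "T"], providers)
valley = is_valid_policy_path(["P1", "C", "P2"], providers)
```

Here `uphill` is a path that only climbs, while `valley` is the violating pattern named in the text (a provider, then a customer, then another provider).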
In this section, we propose to use two metrics, weighted traversal set and weighted
vertex cover, to compute the usage value (or value) of each link in the topology. The
link's value is used to indicate its level of importance compared to other links in the
topology.
5.3.1 Weighted Traversal Set (WTSET)
The most natural metric for measuring the hierarchy is the number of node pairs that use
a particular link to carry their traffic, i.e., the size of the traversal set of a particular link.
However, for any source-destination pair in the topology, there may be multiple equal-cost
paths connecting them. Therefore, we assign a weight to each node pair in the traversal
set. For each node pair (u, v) in the traversal set of link l, we associate the weight
w(u, v, l), which is the fraction of the total number of equal-cost paths between u and v
that traverse link l. Thus, if there are multiple shortest paths between a node pair, the
contribution of the node pair is weighted accordingly. We then modify our hierarchy metric
to measure the link's value based on the weight of each node pair in the traversal set
instead of its size. Specifically, for any link l, we define l's value to be the sum of
w(u, v, l) where (u, v) are included in link l's traversal set. See Figure 5.1 for an example.
[Figure 5.1: An illustration of link value computation. The figure shows a graph G, the
traversal set for link l, and the bipartite graph H_l formed by that traversal set.
  Traversal set for link l: {(a,d),(a,e),(a,f),(a,g),(b,d),(b,e),(b,f),(b,g),(c,d),(c,e),(c,f),(c,g)}
  w(u,v,l) = \frac{\#\text{ paths } u \leftrightarrow v \text{ through } l}{\#\text{ paths } u \leftrightarrow v}
  W(u,l) = \frac{\sum_{v : (u,v) \in H_l} w(u,v,l)}{\mathrm{degree}(u) \text{ in } H_l}
  Weighted traversal set: link value of l = \sum_{(u,v) \in H_l} w(u,v,l) = 12
  Min weighted vertex cover: vertex cover C = {a,b,c}; link value of l = \sum_{u \in C} W(u,l) = 3]
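As a rough illustration of the weighted traversal set, the following Python sketch counts equal-cost shortest paths with breadth-first search and derives w(u, v, l) for every link. It assumes an unweighted, undirected graph given as an adjacency dictionary and plain shortest-path routing (no policy); the function names are illustrative, and the seven-node tree is an assumed topology chosen to be consistent with the traversal set listed in Figure 5.1.

```python
from collections import deque


def shortest_path_counts(adj, src):
    """BFS from src: return (dist, count); count[v] is the number of shortest src-v paths."""
    dist, count = {src: 0}, {src: 1}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v], count[v] = dist[u] + 1, 0
                queue.append(v)
            if dist[v] == dist[u] + 1:
                count[v] += count[u]
    return dist, count


def weighted_traversal_sets(adj):
    """Map each link (x, y), x < y, to {(u, v): w(u, v, l)} over unordered node pairs u < v."""
    info = {s: shortest_path_counts(adj, s) for s in adj}
    nodes = sorted(adj)
    tsets = {}
    for i, u in enumerate(nodes):
        du, cu = info[u]
        for v in nodes[i + 1:]:
            if v not in du:
                continue  # u and v are disconnected
            dv, cv = info[v]
            for x in nodes:
                for y in adj[x]:
                    # Link x-y, crossed from x to y, lies on a shortest u-v path.
                    if x in du and y in dv and du[x] + 1 + dv[y] == du[v]:
                        link = (min(x, y), max(x, y))
                        # Fraction of the shortest u-v paths that cross this link.
                        tsets.setdefault(link, {})[(u, v)] = cu[x] * cv[y] / cu[v]
    return tsets


def link_value(tsets, link):
    """WTSET value of a link: the sum of w(u, v, l) over its traversal set."""
    return sum(tsets.get(link, {}).values())


# Assumed Figure 5.1 topology: a and b hang off c; e, f and g hang off d; l is the link c-d.
fig51 = {"a": ["c"], "b": ["c"], "c": ["a", "b", "d"],
         "d": ["c", "e", "f", "g"], "e": ["d"], "f": ["d"], "g": ["d"]}
fig51_value = link_value(weighted_traversal_sets(fig51), ("c", "d"))

# A 4-node ring has two equal-cost paths between opposite corners, so those pairs weigh 0.5.
ring = {1: [2, 4], 2: [1, 3], 3: [2, 4], 4: [3, 1]}
ring_value = link_value(weighted_traversal_sets(ring), (1, 2))
```

The all-pairs loop is cubic in the worst case, so this form is only practical for small graphs; it is meant to make the definition concrete rather than to reproduce the computation used in the study.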
5.3.2 Weighted Vertex Cover (WVC)
A slightly more complicated way of computing a link's value is based on the vertex cover²
of the link's traversal set. The vertex cover of a traversal set is the minimum number
of nodes that need to be removed to eliminate at least one node from each pair in the
traversal set. For instance, access links have a vertex cover of 1, since eliminating the
singleton node eliminates all pairs from the set. Intuitively, the vertex cover counts the
smallest set of nodes affected by removal of the link. A link for which this number is high
is more important (i.e., more nodes depend on this link) than links for which the number
is low.
² The formal definition of the vertex cover problem is as follows: for a given graph G(V, E),
we want to find C ⊆ V such that for all e = {u, v} ∈ E, e ∩ C ≠ ∅ and C is the smallest such
set (i.e., |C| is minimum).
To use this metric in the presence of multiple shortest paths, we had to use a weighted
vertex cover.³ We generalize the definition of the traversal set of link l to include weights
w(u, v, l) associated with node pairs (u, v), as described in Section 5.3.1. Consider now
the bipartite graph (H_l) formed by the traversal set (see Figure 5.1). To each vertex u
in this graph, we assign a vertex weight W(u, l), which is simply the average of w(u, v, l)
over all (u, v) that belong to the traversal set, i.e., W(u, l) is the average weight of the
links that are incident on node u in H_l. We define a link's value to be the weight of the
minimum weighted vertex cover in the bipartite graph. We use well-known approximation
algorithms [46] for computing weighted vertex covers. We tested this hierarchy metric on
several small example networks, and it produced results which coincided with our intuitive
notion of the hierarchy in those graphs.
³ For a graph G(V, E) and a positive weight function w : V → R⁺ on the vertices, we want to
find C ⊆ V such that for all e = {u, v} ∈ E, e ∩ C ≠ ∅ and w(C) is minimum.
Example. Figure 5.1 illustrates the link value computation on a small graph. The topology
of interest is depicted by the graph G. Suppose we want to compute the usage value of
link l connecting nodes c and d. The first step is to compute the traversal set of
link l. We then generate a bipartite graph H_l according to the traversal set. Each link
connecting a node pair (u, v) in the bipartite graph H_l is then associated with the
weight w(u, v, l), which is the ratio of the number of paths between u and v that traverse
link l to the total number of paths between u and v. Since there is only a single path
connecting any two nodes, all the links weigh 1. After assigning a
weight to each link in H_l, we compute the weight W(u, l) of each node u, which is
the average link weight of u. In this case, the average link weight of all the nodes in H_l
is 1. Now we are ready to compute the usage value of link l. According to the weighted
traversal set, the link value is the total weight of all the links in H_l, which is 12. According
to the weighted vertex cover metric, the link value is the minimum weight of the vertex cover
set, which is 3 in this example.
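The worked example can be checked with a short Python sketch. Note the swap: the text relies on approximation algorithms [46], whereas the code below uses an exhaustive search, which is exact but exponential in the number of nodes and therefore only suitable for toy cases like this one. The function names and the dictionary representation of the traversal set are illustrative assumptions.

```python
from itertools import combinations


def vertex_weights(tset):
    """W(u, l): the average weight of the H_l edges incident on each endpoint u."""
    total, degree = {}, {}
    for (u, v), w in tset.items():
        for node in (u, v):
            total[node] = total.get(node, 0.0) + w
            degree[node] = degree.get(node, 0) + 1
    return {node: total[node] / degree[node] for node in total}


def min_weighted_vertex_cover(tset):
    """Exhaustive minimum weighted vertex cover of H_l; returns (weight, cover)."""
    W = vertex_weights(tset)
    nodes = sorted(W)
    best_weight, best_cover = float("inf"), None
    for k in range(len(nodes) + 1):
        for cover in combinations(nodes, k):
            chosen = set(cover)
            # A cover must contain at least one endpoint of every pair in the traversal set.
            if all(u in chosen or v in chosen for (u, v) in tset):
                weight = sum(W[n] for n in cover)
                if weight < best_weight:
                    best_weight, best_cover = weight, chosen
    return best_weight, best_cover


# Traversal set of link l from Figure 5.1; every pair has a single path, so every weight is 1.
tset = {(u, v): 1.0 for u in "abc" for v in "defg"}
wvc_value, wvc_cover = min_weighted_vertex_cover(tset)
```

For this traversal set the search selects the three-node side of the bipartite graph, which agrees with the cover C = {a, b, c} and the link value of 3 given in the example.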
5.3.3 WTSET vs. WVC
[Figure 5.2 topology: three domains a, b and c with 83, 9 and 8 nodes, respectively.]
Link                              WTSET   WVC
a0-b0                             747     9
a0-c0                             664     8
b0-c0                             72      8
ai-bj, i=1,...,83; j=1,...,9      99      1
bi-cj, i=1,...,9; j=1,...,8       99      1
ai-cj, i=1,...,83; j=1,...,8      99      1
Figure 5.2: Example topology
We now explain the subtle difference between the two metrics by using an example.
Consider the example topology shown in Figure 5.2. The dummy topology consists of three
domains with 8, 9 and 83 nodes. The table beneath the topology shows the link values
according to the two metrics.
Based on the weighted traversal set metric, the access links (such as c0-c1) are more
important than the inter-domain link b0-c0. This is because the link b0-c0 is only used
between the two small domains and therefore does not have much traffic going through
it. However, the weighted vertex cover metric gives more value to the inter-domain link
b0-c0 than the access links because it is used to serve a larger variety of nodes than the
access links. For example, if we remove one of the access links, say c0-c1, c1 is the only
one that loses connectivity to the rest of the world. However, the disappearance of the
link b0-c0 affects all the nodes in the two domains; they are forced to use longer paths to
communicate between them.
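The numbers in the Figure 5.2 table can be reproduced with a small closed-form calculation. The sketch below rests on an assumption that is inferred from the table values rather than stated in the text: each domain is a hub-and-spoke cluster (hub a0, b0 or c0 plus its access nodes), and the three hubs are directly connected to each other. The function name and the "access" label are illustrative.

```python
def figure_5_2_values(sizes):
    """Return {link: (WTSET value, WVC value)} for hub-and-spoke domains with meshed hubs.

    sizes -- dict mapping a domain name to its number of nodes, hub included
    """
    n = sum(sizes.values())
    names = sorted(sizes)
    values = {}
    for i, p in enumerate(names):
        for q in names[i + 1:]:
            # Every node in p reaches every node in q over the single hub-to-hub link,
            # so the traversal set is the complete bipartite set of |p| x |q| pairs and
            # its minimum vertex cover is the smaller of the two sides.
            values[f"{p}0-{q}0"] = (sizes[p] * sizes[q], min(sizes[p], sizes[q]))
    # An access link carries the pairs between its one access node and all other nodes.
    values["access"] = (n - 1, 1)
    return values


values = figure_5_2_values({"a": 83, "b": 9, "c": 8})
```

Under these assumptions the inter-domain link b0-c0 has a smaller weighted traversal set (72) than an access link (99) but a larger vertex cover (8 versus 1), which is the contrast the paragraph above describes.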
In summary, the weighted traversal set metric emphasizes the links with a large volume
of traffic, i.e., the weight of the source-destination pairs that traverse the link, independent
of who the senders or receivers are. On the other hand, the weighted vertex cover metric
examines the list of node pairs associated with the link and emphasizes links that are used
by a variety of nodes. Another question, not of primary concern in this paper, is which
set of these links are considered more realistic backbones in the Internet? To answer this
question, we would need to have network properties such as link bandwidth information
or node geographic location, none of which we currently have. Therefore, we leave this
question for future work.
5.4 Results & Discussion
We first describe the networks that we include in our analysis in Section 5.4.1. We then show
the results of applying the two metrics to specific instances of networks in Section 5.4.2.
The validation of the two metrics and discussion are provided in Section 5.4.3 and 5.4.4.
5.4.1 Networks
We include three categories of network graphs in our study: measured networks, generated
networks, and canonical networks. Chapter 3.3 describes these networks in more detail.
5.4.2 Results
We now describe the results of applying our two hierarchy metrics to compute link values
of specific instances of measured, generated and canonical networks (Figure 5.3). Some of
the network generators allow a variety of input parameters. For these, we use particular
instances of generated networks, whose parameters are described in Figure 5.3. Note that
these instances of networks are the same set of networks that we include in our analysis in
Chapter 3.
Type        Topology            Number of Nodes   Avg. Degree   Comment
Measured    RL                  170589            2.53          May 2001
            AS                  10941             4.13          May 2001
Generated   Transit-Stub (TS)   1008              2.78          3 0 0 6 0.55 6 0.32 9 0.248
            Tiers               5000              2.83          1 50 10 500 40 5 20 20 1 20 1
            Waxman              5000              7.22          5000 0.005 0.30
            PLRG                9230              4.46          2.246
Canonical   Tree                1093              2.00          k = 3, D = 6
            Mesh                900               3.87          30x30 grid
            Random              5018              4.18          Link prob = 0.0008
Figure 5.3: Table of network topologies used. See Appendix D for a description of
parameters for the generated networks.
If a network is hierarchical, we would expect different sets of links to have different
link values, e.g., the backbone links will have higher values than peripheral links. The
distribution of these link values is our measure of hierarchy; if all links have similar values
then there is no hierarchy because usage is spread out evenly, and if only a few links
have high link values then there is a small and well-defined backbone on which usage is
concentrated (where, again, usage is not measured by the level of traffic but by the nature
of the traversal set).
[Figure 5.4: The normalized weighted traversal set distribution. Panels (a) Canonical,
(b) Measured and (c) Generated plot the normalized link value against the normalized link
rank with the x-axis on a log scale; panels (d) Canonical, (e) Measured and (f) Generated
plot the same data with the y-axis on a log scale.]
5.4.2.1 WTSET
Figures 5.4 (a)-(c) show the link value distributions for the canonical, measured, and
generated networks. In these plots, the x-axis plots the rank of a link according to its
value (a higher rank indicating a higher value), normalized by the number of links in the
topology. The y-axis depicts the link value normalized by n², where n is the number of
nodes in the network; the largest traversal set size is on the order of O(n²). Figures 5.4
(d)-(f) plot the same data but on a different scale (the x-axis is on a normal scale and the
y-axis is on a log scale).
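The plotted quantities can be sketched as follows. This is a minimal illustration that assumes rank 1 is assigned to the link with the highest value (the text does not spell out the direction of the ranking) and that the WTSET normalization by n² is used; the function name is hypothetical.

```python
def normalized_rank_distribution(link_values, num_nodes):
    """Return (x, y) points with x = rank / number of links and y = value / n**2.

    Assumption: links are ranked from the highest value (rank 1) to the lowest.
    """
    ranked = sorted(link_values, reverse=True)
    num_links = len(ranked)
    return [((rank + 1) / num_links, value / num_nodes ** 2)
            for rank, value in enumerate(ranked)]


# WTSET values for a 7-node tree with 6 links: one backbone link (12) and five access links (6).
points = normalized_rank_distribution([6, 12, 6, 6, 6, 6], num_nodes=7)
```

For the WVC plots of Section 5.4.2.2 the same ranking applies, but the text normalizes the link value by the number of nodes rather than by n².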
By examining Figure 5.4, we conclude that there exist three classes of hierarchy in
our graphs: strict, moderate, and loose. Figure 5.4 (a)-(c) emphasize the distribution of
the high valued links in each network. In terms of the magnitude of link values, the data
reveals that the highest link values in Tree, TS, and Tiers are significantly higher than
all the other topologies; they are at least five times higher. In addition, their link value
distributions fall off rapidly suggesting that within the same topology, there is a small set
of backbone links that have much higher values than the rest of the links. We say, by this
measure, that these topologies have a strict hierarchy.
By examining the range of the distribution in Figure 5.4 (d)-(f), we can further classify
the remaining networks into two other groupings. Even though the magnitudes of the
highest value links in these networks are comparable, it is clear that the link value
distributions of the Mesh, Random and Waxman graphs are very flat, e.g., their ranges of link
values only span one or two orders of magnitude. This indicates that most of the links
within the same network have similar values. Therefore, we classify them into a group
of networks that have loose or no hierarchy. This is consistent with generally accepted
wisdom about the lack of hierarchy in the mesh and the random graph.
Measured networks and PLRG, on the other hand, have a wider range of values than
that of Mesh, Random and Waxman; they span 4 to 5 orders of magnitude. Moreover,
similar to strict hierarchical networks, we also observed rapid fall-offs in these networks.
For example, over the first 10% of the links in the AS graph, the link value falls off from
0.01 to 0.0001. Therefore, we describe AS, RL⁴ and PLRG as having a moderate hierarchy. In
fact, the other degree-based generators that we evaluate in Section 5.7 also fall into this
category.
5.4.2.2 WVC
Figures 5.5 (a)-(c) show the link value distributions for the canonical, measured, and
generated networks. In these plots, the x-axis plots the rank of a link according to its
value normalized by the number of links in the topology. The y-axis depicts the link value
normalized by the number of nodes in the network. Figures 5.5 (d)-(f) plot the same data
but on a different scale. By examining these figures, similar to the results based on weighted
traversal set, we conclude that there exist three classes of hierarchy in our graphs: strict,
moderate, and loose.
Consider Figure 5.5 (a)-(c) first. Again, these plots emphasize the distribution of the
highest valued links in the network. In terms of the magnitude of link values, the data
reveals that the highest link values in Tree, TS, and Tiers are significantly higher than all
the other topologies, and their link value distributions fall off rapidly. For the Tree and
TS some links have link values above 0.3 but only about 10% have link values above 0.005.
The distribution in Tiers falls off equally sharply, even though the highest link value is only
0.25. We say, by this measure, that these topologies have a strict hierarchy.
⁴ Computing the weight associated with each link in the bipartite graph of the full RL graph is
computationally expensive. Therefore, we compute the link values of the core topology (generated
by recursively removing degree 1 nodes) instead. In previous work, we have found that link values
(computed in a slightly different way) computed on the core map correlate well with link values
obtained from a full map.
[Figure 5.5: The link value rank distribution. Panels (a) Canonical, (b) Measured and
(c) Generated plot the normalized link value against the normalized link rank with the x-axis
on a log scale; panels (d) Canonical, (e) Measured and (f) Generated plot the same data with
the y-axis on a log scale.]
By examining Figure 5.5 (d)-(f), our two other groupings become evident. From these
figures, we see that RL,⁵ AS, and PLRG can be well described as having a moderate
hierarchy. In fact, the other degree-based generators that we evaluated in Section 5.4 also fall
into this category (see Section 5.7). These graphs have the property that, like the strict
hierarchy graphs, the distribution of link values falls off quickly (less than 10% of the links
have link values greater than 0.005) but the highest value links are significantly lower than
those in the strict hierarchy graphs.
In contrast, the mesh, random graph and Waxman have a significantly more evenly spread
link value distribution. Even though the highest link values are comparable to those of
graphs in the previous category, almost 70% of the links in these graphs have link values
about 0.05 and the distribution is very flat. We say that graphs in this category have a
loose hierarchy (at best). Again, this is consistent with generally accepted wisdom about
the lack of significant hierarchy in the mesh and the random graph.
5.4.2.3 Summary
In summary, the hierarchy analyses based on the two metrics agree with each other. Based
on both metrics, we can classify networks into 3 different groups: strict, moderate and
loose. The table below depicts the qualitative groupings.
⁵ Similar to WTSET, computing the WVC link values for the full RL graph is computationally
expensive. Therefore, we compute the link values of the core topology (generated by recursively
removing degree 1 nodes) instead.
Topology        Strict   Moderate   Loose
Mesh                                X
Random                              X
Tree            X
AS, RL, PLRG             X
Tiers           X
TS              X
Waxman                              X
From these groupings we make three important observations.
• Accounting for policy routing in computing the link values does not qualitatively alter
our groupings. As expected, with policy routing since paths are more concentrated,
the highest link values are larger than with the shortest path routing, both for AS
and RL.
• The structural generators construct a much stricter form of hierarchy than is present
in the measured graphs. This suggests a possible explanation for why they do not
qualitatively match the measured networks by our topology metrics (Section 3.5).
• PLRG qualitatively models the hierarchy present in AS and RL graphs, even with
policy routing accounted for. This resolves our paradox to some extent. Although
not explicitly hierarchically constructed, PLRG does capture the moderate hierarchy
in our measured networks. A question remains: what aspect of PLRG graphs is
responsible for this hierarchy? We address this in Section 5.6.
5.4.3 Validation
The backbone links are expected to have higher values than peripheral links. We have
verified, for several of our topologies, that this expectation holds for both metrics. In the
tree topology, the highest valued links are located near the root of the tree. As we traverse
from the root to any leaf, the links' values get smaller. In TS, the highest valued links are
located in the transit cloud. In Tiers they are in the WAN. In the AS graph, high value
links are those that connect well-known national backbones together. Figure 5.6 shows the
top ten highest value links in the AS topology. Finally, in the RL graph they occur in, or
between, these backbone ASs, as shown in Figure 5.7.
(a) Shortest Path Routing     (b) Policy Routing
Alternet - UUNET              Alternet - UUNET
Alternet - SprintLink         SprintLink - Alternet
CW - Alternet                 CW - Alternet
Alternet - AT&T               Alternet - AT&T
Rostelecom - CW               SprintLink - CW
Nacamar - Alternet            CW - Rostelecom
Globix - Alternet             CW - AT&T
Alternet - UUNET              Ebone - Alternet
Abovenet - Alternet           SprintLink - Teleglobe
Teleglobe - Alternet          Alternet - Teleglobe
Figure 5.6: Top 10 highest links on the AS map based on WTSET
(a) Shortest Path Routing     (b) Policy Routing
Alternet - Alternet           Alternet - Alternet
CW - AT&T                     Alternet - Alternet
Alternet - TELSTRA-AS         Alternet - Alternet
Alternet - Alternet           Alternet - Alternet
SprintLink - SprintLink       Alternet - Alternet
Qwest - KDDI                  SprintLink - SprintLink
SprintLink - SprintLink       Alternet - Alternet
CW - CW                       Alternet - Alternet
Qwest - Teleglobe             SprintLink - SprintLink
CW - CW                       SprintLink - SprintLink
Figure 5.7: Top 10 highest links on the RL map based on WTSET
We have done the same validation on the weighted vertex cover metric and found similar
results. The top ten highest value links of the AS and RL graphs according to the weighted
vertex cover are shown in Figures 5.8 and 5.9, respectively. This provides a
sanity check on our approach to measuring hierarchy.
(a) Shortest Path Routing     (b) Policy Routing
CW - SprintLink               CW - SprintLink
Teleglobe - AT&T              Teleglobe - AT&T
Alternet - SprintLink         Alternet - SprintLink
GTE - AT&T                    GTE - AT&T
Level 3 - UUNET               Level 3 - UUNET
Teleglobe - Qwest             Teleglobe - Qwest
Alternet - UUNET              Alternet - UUNET
CW - AT&T                     CW - AT&T
Abovenet - Qwest              Abovenet - Qwest
CW - Alternet                 CW - Alternet
Figure 5.8: Top 10 highest links on the AS map based on WVC.
(a) Shortest Path Routing     (b) Policy Routing
Alternet - Alternet           Alternet - Alternet
SprintLink - SprintLink       Alternet - Alternet
Alternet - TELSTRA            Alternet - Alternet
SprintLink - SprintLink       SprintLink - SprintLink
Alternet - Alternet           Alternet - Alternet
Qwest - Teleglobe             SprintLink - SprintLink
SprintLink - Teleglobe        SprintLink - SprintLink
Alternet - TELSTRA            SprintLink - SprintLink
CW - CW                       CW - CW
CW - CW                       SprintLink - SprintLink
Figure 5.9: Top 10 highest links on the RL map based on WVC.
5.4.4 Discussion
This section discusses the results of the two metrics and the necessity of appropriately
accounting for multiple paths in the hierarchy metrics.
5.4.4.1 Weighted Traversal Set versus Weighted Vertex Cover
Despite the difference between the two metrics, we found that both of them satisfy our
validation tests. Moreover, the qualitative network classifications according to both metrics
are also the same. This motivates us to look at the correlation between the two metrics.
[Figure 5.10: Correlation between the two metrics]
Figure 5.10 shows the correlation between the two hierarchy metrics. We found relatively
high correlation between the two metrics on networks in the strict and loose groups;
e.g., links that have high/low value based on the WTSET metric tend to have high/low
value based on the WVC metric. Networks in the moderate hierarchical group, especially
PLRG, seem to have relatively lower correlation. For these networks, the two metrics
probably select different sets of links as backbone links.⁶ However, since the measure of network
hierarchy is based on a macroscopic property such as the distribution of the link values,
the difference between the two metrics does not affect our qualitative classification.
5.4.4.2 Unweighted versus Weighted Metrics
Properly accounting for multiple paths between any source-destination pair turns out to
be essential in our study. Without appropriate weights, these two simple measures are
misleading. We explain the flaws of the unweighted version of the two metrics below.
[Figure 5.11: The normalized traversal set distribution (x-axis on log scale). Panels:
(a) Canonical, (b) Measured, (c) Generated; x-axis: Normalized Link Rank.]
⁶ We verify this by examining the cardinality of the intersection of the sets of links that are in
the top 1%, 5% and 10% according to both metrics. The cardinality of these intersection sets is
about 50-60% for measured networks and 20% for PLRG. The cardinality for the other networks is
relatively higher, in the range of 70% to 100%. This confirms that the backbone links according to
the two metrics are relatively different.
[Figure 5.12: The link value rank distribution (x-axis on log scale). Panels: (a) Canonical,
(b) Measured, (c) Generated; x-axis: Normalized Link Rank.]
Traversal Set versus Weighted Traversal Set. Figure 5.11 shows the distribution of
traversal set sizes (with no accounting for multiple paths). This metric does not
discriminate the Mesh from the strict hierarchical networks; in fact, the highest value in the
Mesh is higher than in the strict hierarchical networks. The number of paths between any
two nodes in the Mesh is quite large; the farther apart the end nodes are, the larger the
number of paths between them. Without appropriately weighting the contribution of each
node pair on the link, we observe a large number of pairs, i.e., a large traversal set size,
especially at the center of the Mesh. Moreover, according to this metric, the two measured
networks with shortest path routing show a higher level of hierarchy than those
with policy routing; this result is counter-intuitive. Policy routing funnels many paths of
some source-destination pairs through particular parts of the network, resulting in larger
traversal set sizes for backbone links. However, policy routing also restricts the possible paths
between any two nodes, resulting in a reduction of the traversal set size of some links.
Figure 5.11 shows that, without appropriately weighting the contribution of each node pair,
the traffic reduction effect is more dominant than the traffic increase due to policy routing.
Taken at face value, this result would indicate that policy routing reduces the level of
hierarchy. This is inaccurate; therefore, we disregard this metric in the study.
Vertex Cover versus Weighted Vertex Cover. Figure 5.12 shows the distribution of
the vertex cover size. According to this metric, each source-destination pair in the traversal
set is assigned equal weight. Therefore, for each link, instead of computing the weight of
the vertex cover set that yields minimum weight, we compute the size of the vertex cover.
The flaw of this metric is similar to that of the (unweighted) traversal set. The metric does
not discriminate the Mesh from the strict hierarchical networks, and it indicates that
policy routing reduces the level of hierarchy in the AS network. Thus, we also disregard
this metric in our study.
5.5 Hierarchy Characteristic
What is the hierarchical characteristic of these networks? To answer this question, we
investigate the path characteristics between node pairs in the network. For any
source-destination pair, we examine the sequence of link values along the path. Intuitively, if a
network has a strictly-structured hierarchy, such as in a tree topology, we would expect
these sequences to be rising as we traverse from a leaf to the root of the tree, falling as
we traverse from the root of the tree to any leaf, or rising and then falling as we traverse
between any leaf or internal nodes. Based on this intuition, we classify a path into one of
two categories.
• valid paths: A path is valid if the sequence of link values is flat, rising, falling, or
rising and then falling.
• invalid paths: Any path that is not valid is classified as an invalid path. An example
of an invalid path is a path whose values fall and then rise.
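The two categories above can be expressed as a short check on the sequence of link values along a path. This is a sketch under one stated assumption: consecutive equal values are treated as flat steps, so plateaus are allowed inside a rising or falling run. The function names are illustrative.

```python
def is_valid_path(values):
    """True if the link values are flat, rising, falling, or rising and then falling."""
    i, n = 0, len(values)
    # Consume the (possibly empty) non-decreasing prefix.
    while i + 1 < n and values[i] <= values[i + 1]:
        i += 1
    # Consume the (possibly empty) non-increasing remainder.
    while i + 1 < n and values[i] >= values[i + 1]:
        i += 1
    # The path is valid only if the two phases cover the whole sequence.
    return i >= n - 1


def fraction_valid(paths):
    """Fraction of paths (each a list of link values) that are valid."""
    return sum(is_valid_path(p) for p in paths) / len(paths)


leaf_to_leaf = is_valid_path([6, 12, 6])    # rising and then falling
dip = is_valid_path([12, 6, 12])            # falling and then rising
```

A path that dips and recovers, like `dip`, is the invalid case named in the second bullet.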
[Figure 5.13: Fraction of valid paths, shown for the WTSET and WVC metrics]
Figure 5.13 shows the fraction of paths that are valid based on the two metrics. As
expected, all paths on the Tree are valid. TS, AS and PLRG have a relatively high fraction
of paths that are valid. This result indicates that the hierarchy in these networks is quite
strictly-structured and the arrangement of high value links is approximately tree-like. In
contrast, Tiers and RL have very low fractions of valid paths. This indicates that the
hierarchy in these networks is not strictly-structured. High value links in these networks
are distributed across the network.
In summary, we found that:
• the hierarchical structures of the AS and RL maps are quite different even though both
of them are classified as having a moderate level of hierarchy, and
• PLRG (along with other degree-based networks; Section 5.7) resembles the hierarchical
nature of the AS map better than that of the RL map.
Note that the fraction of valid paths in the RL map with policy routing is less than that
with shortest path routing. This is because a policy path might not exist between
some source-destination pairs in the network, owing to the incompleteness of the
graph and errors in the heuristic that identifies peering relationships on the AS
map [32, 69].
5.6 Correlation between link usage and degree
[Figure 5.14: Correlation between minimum degree and link value based on WTSET]
To better understand the hierarchical structure of these graphs, we compute the cor
relation between a link’s value and the lower degree of the nodes at the end of the link.
100
Reproduced with permission of the copyright owner. Further reproduction prohibited without permission.
1
0.9
0.8
0.7
g 0.6
* X 3
3 oj
5 04
0.3
0 .2
0 .1
0
I
O <
Figure 5.15: Corrélation between minimum degree and link value based on WVC
A high correlation between these two indicates that high-value links connect high-degree
nodes. Figures 5.14 and 5.15 show the correlations for the nine networks under consideration
according to the WTSET and WVC metrics, respectively.
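
As an illustration of this computation, the following minimal Python sketch correlates each link's value with the smaller degree of its two endpoints. It assumes that link values have already been computed by one of the two metrics and are supplied as a dictionary; the function names, the use of the Pearson coefficient, and the toy graph are assumptions made for this sketch, not the tooling used in the study.

```python
from collections import Counter
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: treat a constant series as uncorrelated
    return cov / sqrt(var_x * var_y)


def min_degree_correlation(link_values):
    """link_values maps an undirected link (u, v) to its link value."""
    degree = Counter()
    for u, v in link_values:
        degree[u] += 1
        degree[v] += 1
    links = list(link_values)
    # For each link, take the lower degree of its two endpoints.
    min_degrees = [min(degree[u], degree[v]) for u, v in links]
    values = [link_values[link] for link in links]
    return pearson(min_degrees, values)


# Hypothetical toy graph: two hubs (a, b) joined by one high-value link,
# each hub serving two leaves over low-value links.
toy_link_values = {
    ("a", "b"): 0.9,
    ("a", "c"): 0.1, ("a", "d"): 0.1,
    ("b", "e"): 0.1, ("b", "f"): 0.1,
}
toy_correlation = min_degree_correlation(toy_link_values)
```

In the toy graph the only high-value link is the one joining the two hubs, so the coefficient is at its maximum; a graph whose high-value links attach to low-degree nodes would score much lower.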
The PLRG has a relatively high correlation compared with the other networks. There is
absolutely no explicit structure built into this graph. The only links that have (relatively)
high values are the ones that connect two nodes with (relatively) high degrees. In the
PLRG graph, the long-tailed nature of the power-law degree distribution means that there
are numerous nodes with very high degrees. One can think of these high-degree nodes as
“hubs”, and the high-value links (the backbone links) are those that connect two hubs. In
this sense, the hierarchy in a PLRG arises entirely from the long-tailed nature of its degree
distribution.
The Random graph also has a relatively high correlation. In this graph as well, there is
absolutely no explicit structure built in. The only links that have (relatively) high values
are the ones that connect two nodes with (relatively) high degrees. However, the Random
graph has a very limited distribution of degrees, and so the spread of link values is similarly
limited, resulting in very limited hierarchy.
In contrast, the Tree has relatively low correlation. Unlike the PLRG, the Tree’s hierarchy
comes from the structure (the deliberate way in which the nodes are connected)
and not from the degree distribution. The correlation that is present is because the leaves
have a lower degree than the other nodes, and the associated links have the lowest link
values in the tree.
The AS and Waxman graphs have relatively high correlation, while the Mesh, TS, Tiers,
and RL have relatively low levels of correlation. This is consistent with our reasoning above,
that the hierarchy in the structural generators (Tiers and TS) arises, like the Tree, from
the deliberate placement of links. The fact that the AS graph has higher correlation than
the RL graph, even though they have very similar levels of hierarchy, may indicate that
the hierarchy in the RL graph is due to the deliberate placement of links while in the
AS graph the hierarchy is more related to the degrees of the nodes (that is, to the peering
relationships between the highly connected ASs that form the “backbone” of the AS graph).
In summary, given the high correlation between link value and degree of the attached
nodes, we surmise that the hierarchy in degree-based generators arises from their long-tailed
degree distribution. Structural generators show no such correlation, and the hierarchy
arises from explicit construction. The RL graph shows less correlation, suggesting that its
hierarchy is deliberately constructed, even though its link value characteristics are quite
similar to the PLRG.
5.7 Other power-law variant generators?
So far in this chapter we have used a single degree-based generator, the PLRG. The PLRG
generator uses a particularly simple technique for connecting nodes (Section 5.4.1): it
clones each node as many times as the degree assigned to it, then connects the clones
uniformly at random. However, as mentioned earlier in Section 3.6, there are many more
degree-based generators. In this section, we apply our two hierarchy metrics to other
degree-based generators; specifically, B-A, Brite, BT and Inet (the same instances that we
use in Section 3.6).
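
The clone-and-connect step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name is hypothetical, the degree assignment is taken as given rather than drawn from a power law, and self-loops and duplicate links are simply dropped, which the text above does not specify.

```python
import random


def clone_and_match(degrees, seed=None):
    """degrees maps node -> assigned degree; returns a set of undirected links."""
    rng = random.Random(seed)
    # Clone each node as many times as its assigned degree.
    clones = [node for node, d in degrees.items() for _ in range(d)]
    # Shuffling and pairing consecutive clones yields a uniform random matching.
    rng.shuffle(clones)
    links = set()
    for i in range(0, len(clones) - 1, 2):
        u, v = clones[i], clones[i + 1]
        if u == v:
            continue  # assumption: drop self-loops
        links.add(tuple(sorted((u, v))))  # assumption: the set drops duplicate links
    return links


# Hypothetical degree assignment for four nodes.
example_degrees = {0: 3, 1: 2, 2: 2, 3: 1}
example_links = clone_and_match(example_degrees, seed=1)
```

Because self-loops and duplicates are discarded, a node's realized degree can fall below its assigned degree; the sketch makes no attempt to repair this or to keep the graph connected.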
Figure 5.16: The link value (WTSET) rank distribution. Panels: (a) Measured Networks; (b) PLRG Variants. x-axis: Normalized Link Rank (log scale).
5.7.1 Results
Figure 5.16 and Figure 5.17 show the link value distributions for the PLRG-variant networks
and the measured networks based on the weighted traversal set and the weighted vertex cover,
respectively. As with the measured networks, the distributions of the PLRG-variant
networks fall off quickly. The highest link values are in approximately the same range as
those of the measured networks, with BT as an exception under the weighted traversal set:
the highest link value in the BT network is lower than in the other degree-based networks.
This is because the average degree of BT is twice as large as that of the other degree-based
networks; the average degree of BT is about 9, while the other networks have an average
degree of 4. There are therefore many more paths connecting any two nodes, which reduces
the weight of each source-destination pair in the traversal sets and, as a result,
lowers the value of the links in the network. However, as in some other degree-based
networks, the range of link values in the BT network is wider than that of the
Random graph or the Mesh. We therefore conclude that, like the AS and RL networks, the
PLRG-variant networks can be described as having a moderate hierarchy. Furthermore,
the fractions of valid paths in these networks are in the same range as that of the AS, which
is much higher than that observed in the RL map. We therefore conclude that the nature
of these networks better resembles that of the AS than that of the RL map.
Figure 5.17: The link value rank (WVC) distribution. Panels: (a) Measured Networks; (b) PLRG Variants. x-axis: Normalized Link Rank (log scale).
5.8 Conclusion
We began this chapter with the goal of understanding the seeming paradox that, while the
Internet is believed to have a hierarchical structure, its large-scale properties are well
modeled by degree-based generators that completely ignore hierarchy. This paradox led
us to investigate network hierarchy. Although there is a large literature on routing
hierarchies, we are not aware of many efforts that attempt to measure network hierarchy
based on topological structure. The hierarchy analysis is made more difficult by the
lack of a concrete definition of hierarchy that we could apply, not to mention metrics
for measuring it.
In this study, we based the hierarchy analysis on the intuition that there
are different levels of importance among the links in a hierarchical network. Specifically,
one symptom of hierarchy is that there is a small set of backbone links that are used to
carry traffic from many more source-destination pairs than other links. With this
intuition, we proposed two metrics that measure network hierarchy based on the
link usage value.
Based on the distributions of link usage values of various networks, we classify topologies
into three different classes—strict, moderate and loose hierarchy. We found that degree-
based generators model the moderate hierarchy that is present in the Internet topology.
Our further examination of the path characteristic reveals that the structure of hierarchy
in degree-based networks is more similar to that of the AS than the RL network.
In summary, then, we find that the prevailing wisdom that degree-based generators are
better models for Internet topologies, to which we had taken exception, is indeed correct.
However, these degree-based generators are better models of the Internet (Chapter 3)
not just because they slavishly imitate the degree-distribution but because this degree
distribution (and the fairly random connection of nodes) leads to a moderate form of
hierarchy very similar to that in the Internet.
Chapter 6
Does AS Size determine degree in AS topology?
As presented in Chapter 2, we can extract
an AS-overlay graph from an Internet router-level topology. This extraction allows us to
estimate the AS size distribution, where size is defined by the number of routers in the AS.
This information can be used to gain some insight about the power-law degree distribution
observed in the AS topology. We find that the distribution of AS sizes exhibits a power law.
Moreover, there is a strong correlation between AS size and degree. Based on the ubiquity
of highly-variable size distributions in real-world entities such as city populations,
file sizes and web document sizes, we conjecture that the power-law degree distribution in the
AS topology may simply follow from its power-law size distribution.
6.1 Introduction
In a recent and much celebrated paper, Faloutsos et al. [26] found that the inter-Autonomous
System (AS) topology exhibits a power-law degree distribution. This result was
quite unexpected in the networking community, and stirred significant interest in exploring
the possible causes of this phenomenon.^1 The work of Barabasi et al. [6], and its application
to network topology generation in the work of Medina et al. [44], have explored a promising
class of models that yield strict power-law degree distributions. These models, which
we refer to collectively as the B-A model, describe the detailed dynamics of the network
growth process, modeling the way in which connections are made between ASs. Two
simple connectivity rules define the evolution of AS connectivity over time:
incremental growth, where a new AS connects to existing ASs, and preferential connectivity,
where the likelihood of connecting to an AS is proportional to the vertex outdegree of the
target AS. These simple rules, which are similar to the classical “rich get richer” model
originally proposed by Simon [63], lead to power-law degree distributions.
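
To make the two rules concrete, the following Python sketch grows a graph by incremental growth and preferential connectivity. It is a simplified illustration, not the generator of [6] or [44]: the function name, the number of links added per new node, the small fully connected seed graph, and the use of undirected degree in place of outdegree are all assumptions made here.

```python
import random


def grow_preferential(num_nodes, links_per_node=2, seed=None):
    """Return a set of undirected links (existing_node, new_node)."""
    rng = random.Random(seed)
    m = links_per_node
    links = set()
    # Each node appears in this list once per incident link, so a uniform
    # draw from it selects a node with probability proportional to its degree.
    endpoints = []
    # Assumption: start from a fully connected seed graph of m + 1 nodes.
    for u in range(m + 1):
        for v in range(u + 1, m + 1):
            links.add((u, v))
            endpoints += [u, v]
    # Incremental growth: nodes join one at a time.
    for new in range(m + 1, num_nodes):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))  # preferential connectivity
        for t in sorted(targets):
            links.add((t, new))
            endpoints += [t, new]
    return links


example_graph = grow_preferential(50, links_per_node=2, seed=7)
```

Nodes that join early accumulate links over many rounds, which is the "rich get richer" effect described above.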
While the B-A model provably yields power-law vertex degree distributions, recent
empirical evidence indicates that the model may not be consistent with the dynamics
underlying the evolution of the actual AS topology. First, there is strong evidence [9, 14]
that the degree distribution of the actual AS topology does not conform to a strict power
law. However, the distribution is certainly heavy-tailed or highly-variable in the sense that
the observed vertex degrees typically range over three or four orders of magnitude; in some
cases, the tail of the degree distribution may fit a power law. These observations were
gleaned from more complete pictures of AS-level connectivity (obtained by augmenting
BGP route tables with peering relationships from other sources) than those used by earlier
work [6, 26, 44]. Second, the B-A model’s AS connectivity evolution rules can be shown
to be inconsistent with empirical AS growth measurements [78]. As such, while the B-A
model appears to produce topologies whose degree distributions exhibit
power-law behavior, it cannot be a valid explanation for the connectivity evolution in the
AS topology.
^1 As an aside, note that we do not discuss the degree distribution of the router-level Internet topology:
there seems to be some debate about the characteristics of that distribution [9].
Clearly, some of these empirical observations don’t corroborate the claim that the B-A
model explains the phenomenon of highly variable vertex degrees in the Internet’s AS
topology [6]. However, the B-A model was originally proposed as a simple illustration
of how some elementary mechanisms or rules can give rise to power law vertex degree
distributions. As such, it is likely that the B-A model can be modified to accommodate
these more recent findings [3], but we will neither discuss here such modifications nor
comment on their possibility for success. Instead, we merely note that any such resulting
model would seek, as does the original B-A model, to explain the highly variable degree
distribution of the AS topology through the detailed dynamics of how connections between
ASs are established.
The purpose of this section is to raise the question, motivated by the B-A approach,
of whether the underlying cause of the high variability phenomenon of the vertex degree
distribution lies in the detailed dynamics of network growth, or if there are alternative
explanations. To that end, we briefly outline an alternative explanation for the AS topology
degree distribution. We do not claim to have proven that this explanation holds; our
purpose here is merely to expand the dialog to a larger class of explanations for the variability
of the AS topology degree distribution.
6.2 An Alternative Explanation
To motivate our explanation, we first note that high variability is the norm in the
distribution of sizes of many real-world entities. Cities by population size [63, 82], companies
by size of income [49] or by size of assets [60] are all known to exhibit power-law tails.
The distribution of countries or oil reserves by size appears to exhibit a Weibullian
distribution [40]. In the computing literature, file [73] and Web document [19] sizes have been
known to have heavy tails.
Next, we find that AS sizes are no exception to this rule; in Section 6.3, we show that
the distribution of the size of an AS (as measured by the number of routers^2 in the AS)
exhibits high variability.^3
Finally, we observe, as described in Section 6.3, that AS sizes are highly correlated with
degree. That is, large ASs tend to have large degrees and small ASs tend to have small
degrees.
These observations suggest one possible, and quite general, explanation for the AS degree
distribution: rather than arising from connectivity dynamics, the highly variable degree
distribution may arise merely from its correlation with a highly variable size distribution.
Assuming each individual AS corresponds to a business entity,^4 that degree follows size
captures the intuition that large businesses, by setting up a large initial capital investment
and building out a nationwide network, are able to attract more customers and peers than
smaller businesses. If our explanation turns out to be correct, it removes the mystery of
the AS topology degree distribution, but replaces it with the much older mystery of company
size distribution. However, given that highly-variable size distributions are quite common,
the reason why AS size exhibits a power law distribution may have little to do with the
fact that the AS topology represents connectivity in a data communication network.
^2 We looked at other measures of AS size, including revenue, number of employees, and market
capitalization. All these measures exhibit heavy tails, and were correlated with AS degree. As an aside, it
appears that for ASs, the number of routers in an AS is a good surrogate for any notion of “size”.
^3 We emphasize that the exact form of this distribution is not of concern for the purposes of this paper.
Of relevance is the qualitative observation that the distribution is highly-variable (or heavy-tailed).
^4 This assumption is only approximately true. In practice, some ISPs configure their networks to have
several AS numbers.
From the correlation between degree and size, and from the high variability in degree,
there is an alternative conclusion that we could draw, namely that the degree of an AS
determines its size. In this case, a growth model like the one described in [44, 6] might
be a plausible explanation for how such degree distributions arise. However, the ubiquity
of highly-variable size distributions suggests to us that size variability is likely the cause,
not the effect, of high variability in AS degree. It is not clear to us how to establish the
validity of this suggestion.
In summary, then, we raise the possibility that there exists an alternative explanation
for the highly-variable degree distribution in the AS topology—namely, that (1) AS size
determines AS degree and (2) AS sizes are highly-variable. This latter phenomenon, which
may appear as mysterious as the original highly-variable degree distribution, is just another
instance of the ubiquitous high-variability size distributions of various real-world entities.
Given that several other real-world networks exhibit highly variable degree distributions
(Chapter 4 and [66]), one might assume that degree follows size more generally than just for
the AS topology. Our initial findings in Appendix F suggest otherwise. In our experiments
with the graph of actor collaborations (and airport connectivity), where the measure of an
actor’s size is the number of movies the actor had participated in (and the airport’s size
is the total number of flights per day that originated or ended in the airport), we found
only mild correlation between size and degree (around 0.6 to 0.8), even though the size and
degree distributions were highly-variable.
6.3 Methodology and Results
To determine the size distribution of ASs (recall that our definition of the size of an AS is
the number of routers in the AS), we computed an AS overlay from a router-level topology.
Our router-level topology was collected using Mercator. The topology discovery methods
employed by Mercator, and their limitations, are documented in [32]. Briefly, Mercator
randomly probes addressable parts of the IP address space and, using traceroutes, infers
adjacencies. It is also able to resolve aliases (interfaces belonging to the same router). The
map inferred by Mercator is not complete, but we believe it captures a significant part
of the transit portion of the network. In this note we present results from three different
snapshots of the router-level topology, each taken more than six months after the previous
one.^5 These topologies vary widely in size, an artifact of the different durations of each
run of Mercator.
From the IP addresses obtained by Mercator, we then labelled each router with the AS
it belonged to, thereby generating an AS overlay. The techniques we used for associating a
router with an AS, together with their limitations, are described in Section 2.2 (and [69]).
Briefly, the AS overlay technique uses the BGP routing tables to infer the ASs to which
routers belong. In Section 2.2 we validate this technique and show that it appears to give
AS maps that are qualitatively consistent with those obtained from BGP routing tables.
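
Once every router carries an AS label, AS size and AS degree follow directly from the router-level map. The Python sketch below illustrates that final step only; it assumes the router-to-AS mapping has already been inferred (the BGP-based inference itself is not shown), and the function name, data layout, and toy topology are hypothetical.

```python
from collections import defaultdict


def as_overlay(router_as, router_links):
    """router_as: router -> AS label; router_links: iterable of (router, router).

    Returns (size, degree): routers per AS and neighboring ASs per AS.
    """
    size = defaultdict(int)
    for asn in router_as.values():
        size[asn] += 1
    neighbors = defaultdict(set)
    for r1, r2 in router_links:
        a1, a2 = router_as[r1], router_as[r2]
        if a1 != a2:  # only links that cross an AS boundary create AS adjacencies
            neighbors[a1].add(a2)
            neighbors[a2].add(a1)
    degree = {asn: len(neighbors[asn]) for asn in size}
    return dict(size), degree


# Hypothetical toy topology: one three-router AS and two single-router ASs.
toy_router_as = {"r1": "AS1", "r2": "AS1", "r3": "AS1", "r4": "AS2", "r5": "AS3"}
toy_router_links = [("r1", "r2"), ("r2", "r3"), ("r3", "r4"), ("r1", "r5"), ("r2", "r4")]
toy_size, toy_degree = as_overlay(toy_router_as, toy_router_links)
```

Two router links between the same pair of ASs count as a single AS adjacency here, matching a definition of AS degree as the number of neighboring ASs.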
^5 We have only computed correlations on three router-level topology snapshots. It is, of course,
entirely possible that other snapshots might disprove our findings. However, we believe that the remarkably
consistent results from our three snapshots argue that the likelihood of this is small.
Having described our methodology, we now describe our main findings.
Figure 6.1: Complementary cumulative distributions of AS size. Panels: (a) August 1999; (b) April 2000; (c) May 2001. x-axis: AS Size (log scale).
Figure 6.2: Complementary cumulative distributions of AS degree. Panels: (a) August 1999; (b) April 2000; (c) May 2001. x-axis: AS Degree (log scale).
AS Sizes are Highly Variable. Figure 6.1 depicts log-log plots of the complementary
cumulative distribution of ASs by size. Note that the three distributions are
quantitatively different, which can be attributed to the variation in size of our three
router-level topologies. In each case, however, the complementary cumulative distribution of ASs
by size is highly variable, spanning 3-4 orders of magnitude.
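
For readers who wish to reproduce plots of this kind, a complementary cumulative distribution can be computed from a list of AS sizes as in the Python sketch below. The function name and the sample sizes are hypothetical, and the convention P[X >= x] is an assumption chosen so that every point is positive and can be drawn on log-log axes; the text does not state which convention the figures use.

```python
from collections import Counter


def ccdf(values):
    """Return sorted (x, P[X >= x]) pairs for the distinct values observed."""
    n = len(values)
    counts = Counter(values)
    points = []
    remaining = n  # number of observations that are >= the current x
    for x in sorted(counts):
        points.append((x, remaining / n))
        remaining -= counts[x]
    return points


# Hypothetical AS sizes (routers per AS) spanning three orders of magnitude.
sample_sizes = [1, 1, 1, 1, 2, 2, 10, 1000]
sample_ccdf = ccdf(sample_sizes)
```

Plotting the resulting pairs with both axes on a logarithmic scale gives the kind of curve shown in Figures 6.1 and 6.2.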
Date           Coefficient of Correlation
August 1999    0.941
April 2000     0.936
May 2001       0^159
Figure 6.3: Correlation between size and degree (table of correlation coefficients, with a log-log scatterplot against AS Size for the May 2001 snapshot)
Degrees and Sizes are Well-Correlated. Figure 6.2 shows the log-log plots of the
complementary cumulative distribution of ASs by degree, where the AS degree is measured
by the number of neighbors of each AS as inferred from our AS overlay. Figure 6.3 describes
the coefficient of correlation between AS size and degree. For three topologies separated
by several months, the correlations are high. The figure also depicts, for the topology
snapshot of May 2001, a scatterplot revealing this correlation visually (the scatterplots for
other snapshots are similar).
AS Ages and Degrees are Not Well-Correlated. Our final result considers the correlation
between an AS’s age and its degree. As we have alluded to in Section 2.1, if there
is indeed a correlation between age and degree, this would lend credence to growth models
that imply that degree determines size. To a first approximation, the AS number determines
its age.^6 Figure 6.4 describes the coefficient of correlation between age and degree
in our three maps. We note that there is essentially no correlation between age and degree.
Date           Coefficient of Correlation
August 1999    -0.129
May 2000       -0.127
April 2001     - ffi098
Figure 6.4: Correlation between age and degree
6.4 Conclusions
In summary, we question whether the highly variable vertex degree distribution of the
Internet AS topology can indeed be attributed to connectivity dynamics, as envisioned
by the B-A model. Based on the ubiquity of highly-variable size distributions, and on
our observed correlation between AS size and AS degree, we ask whether there exists an
alternative explanation—namely, that AS size determines degree and the high-variability
in degree follows naturally from the observed high-variability in AS size.
^6 This is only approximately true. IANA used to hand out AS numbers sequentially, but they
would also delegate blocks of AS numbers to the regional registries [?]. However, at any given time,
each delegated block was small (a few hundred AS numbers initially, and up to 1024 of late; see
http://www.iana.org/assignments/as-numbers).
Chapter 7
Conclusions & Future Directions
In this thesis, we have used the Internet Autonomous System (AS) and Router-level (RL)
maps to investigate the overall efficiency of the Internet’s routing infrastructure and
topology. In particular, we use these maps to examine the impact of routing protocols and to
investigate how realistic synthetic network models are in capturing the large-scale
structure of the Internet. We summarize our conclusions and possible future directions in this
section.
7.1 Conclusions and Contributions
In Chapter 2, we proposed two policy routing models (the shortest AS path model, and
the shortest valid AS path model, which considers the peering relationships between ASs) to
study the impact of policy routing on Internet paths. Our analysis is based on a snapshot
of the Internet router-level map that we collected in 1999. Our work reveals that routing
policy impacts the length of Internet paths significantly. For example, 80% of
Internet paths are inflated, i.e., these paths are longer than the shortest router-hop paths.
In addition, many of these paths (e.g., 50%) are suboptimal, i.e., there exists at least one
shorter detour path for each of these suboptimal policy paths. We also show that our
findings are robust across different snapshots of the Internet topology and different policy models.
We then devoted the rest of the thesis to investigating the fundamental structure of the
Internet topology. We started by examining the widely accepted belief that degree-based
generators, by the very fact that they better match the degree distribution of the Internet,
are superior to structural generators that attempt to mimic the hierarchical structure of the
Internet. The structural generators do not produce power-law degree distributions. Thus,
many in the field seem to have concluded that this disparity, by itself, proved that structural
generators were unsuitable models for the Internet. We believed that it was more important
for topology generators to accurately model the large-scale properties of the Internet (such
as its hierarchical structure) than to faithfully reproduce its local properties (such as the
degree distribution). Therefore, it was still an open question as to which network topology
generators best model the Internet.
In Chapter 3, we proposed several large-scale topology metrics to answer the question
“Which generated networks most closely model the large-scale structure of the Internet?”
According to our metrics, our findings agreed with the prevailing wisdom that degree-based
generators are better models for Internet topologies. Our findings in Chapter 4 also showed
similar results for other real-world networks such as web documents and airline networks.
In Chapter 5, we solved this seeming paradox by proposing a few hierarchy metrics to
analyze the hierarchical structure of these networks. The results led to the conclusion
that these degree-based generators are better models of the Internet not just because they
imitate the degree-distribution but because this degree distribution (and the fairly random
connection of nodes) leads to a moderate form of hierarchy very similar to that found in
the Internet.
Finally, in Chapter 6, we again used the Internet router-level maps to study the AS
size distribution, where size is determined by the number of routers within an AS. This
information helped us gain some insight about the power-law degree distribution observed
in the AS topology. For example, we found that the distribution of AS sizes exhibited a
power-law property. We also found that there is a strong correlation between
AS size and degree; that is, ASs that are bigger in size tend to have larger degrees. These
findings lead us to conjecture that the power-law degree distribution in the AS topology
may simply follow from its power-law size distribution.
7.2 Future Directions
Our work in topology characterization and hierarchy is an initial step in trying to
understand the fundamental structure of the Internet. Obviously, an extensive amount of work
is required before we can begin to really understand the Internet structure and its impact.
For example, in this thesis, we did not try to define additional metrics that distinguished
between the various degree-based generators. This is a useful goal and should be subject
to future work. We list a subset of possible future work in this section.
7.2.1 Network Hierarchy Analysis
As an initial step to understand the characteristics of the Internet structure, we have
treated both the AS and RL maps as independent maps and analyzed them individually.
However, since both AS and RL maps reflect the overall structure of the Internet, and the
AS topology is a logical overlay on the router-level topology, one would expect that there
is some hierarchical relationship between the two maps. But what is this relationship and
how can we find it? These questions remain for future investigation.
In this thesis, we have left out the link bandwidth in our analysis due to the extreme
difficulty in acquiring or collecting such information, especially in a large-scale network.
Efficient techniques for collecting network bandwidth information in the large-scale Internet
are by themselves future research. Nevertheless, in the future, when the bandwidth
information associated with the Internet topology becomes available, we will also need a new
framework for analyzing network hierarchy that takes into account network bandwidth.
7.2.2 Inferring backbones
We have shown the link value distributions of PLRG, AS and RL in Section 5.4.2. Unlike
Tree, TS or Tiers, there is no clear cut-off for picking out the backbone links. We conjecture
that there is a continuum of levels of hierarchy in these networks. Since our study revealed
that PLRG, AS and RL have a moderate level of hierarchy, and network researchers often
mention the backbone links in the Internet, there should be backbones in these networks.
The question is then how to identify these backbones.
Even though people have used the word backbone extensively in the networking context,
the definition of backbone is still unclear. In this thesis, our notion of backbone links is
comparative. We refer to backbone links as a small set of links that carry the traffic
from many more source-destination pairs than other links (see Section 5.3). However,
there might be other definitions, and therefore other metrics, for analyzing network hierarchy,
especially those that incorporate network bandwidth into the analysis. Coming up with
a good definition of backbone links (with and without bandwidth information) and novel
techniques for identifying them in these networks remain for future work.
7.2.3 The Impact of Topology on Protocol Performance
Several papers have observed the impact of topology on protocol performance. For example,
Radoslavov et al. [59] found that topologies affect the amount of multicast state. Earlier
work by Doar and Leslie [21], Wei and Estrin [77], and Mitzel and Shenker [45] found
similarly that topology affects the performance metrics they studied. However, apart
from the work by Phillips et al. [56], there has not been much work that attempts to address
the fundamental questions of which topology properties or metrics are responsible for the
impact of topology on protocols, and how they affect protocol performance. These
questions remain open research problems.
7.2.4 Topology Analysis
In this thesis, we analyzed the large-scale structure of the Internet router-level topology
as a whole. However, the Internet is composed of a collection of networks of various sizes.
Studying the structural properties of these individual networks, especially when more
detailed maps of individual ASs become available [47], might help us gain insight into the
overall structure of the Internet. Moreover, this insight might help researchers come up
with a better model for generating a router-level graph that closely resembles the Internet.
An important step in analyzing AS networks is to acquire individual AS topologies.
One could potentially start with an Internet router-level map (collected by a tool like
Mercator [32]), then apply an AS overlay technique (for an example, see Chapter 2) to
extract individual AS maps from the router-level map. An alternative is to apply
a recent measurement tool proposed by Spring et al. to measure specific ISP topologies.
The quality of the AS maps probably differs depending on the measurement technique,
the location of the AS in the overall Internet, etc. Apart from providing some insight into the
Internet structure, the results of the AS topology analysis can be used to study the quality
or correlation of Internet/AS maps generated by different techniques. This research might
lead to improvements in topology measurement and, as a result, better-quality maps.
7.2.5 Internet Topology Modeling
Chapter 3 shows that degree-based generators, such as PLRG, model the Internet better
than structural generators. However, the hierarchy in the router-level Internet is a result
of careful structural construction rather than of the power-law degree distribution, as is the
case in PLRG (refer to Chapter 5).
A network model that captures both the large-scale properties of the Internet and its
hierarchical arrangement will be useful for many network simulations, and in particular
for routing-related simulations. The information about the AS overlay and individual ASs
that can be gathered from topology analysis work might lead to a new network model
that incorporates both the power-law random graph model and a structural model. This hybrid
model would generate both the router-level topology and the corresponding AS overlay on
top of it. To date, we are not aware of any generator that does this. Realistic
topology modeling therefore remains a subject for future work.
7.2.6 The Origin of the AS Power-law Size Distributions
Finally, in this thesis we conjecture that it might be the power-law size distribution of ASs
that is responsible for the power-law degree distribution in the AS map. Nevertheless, the
origin of the AS power-law size distribution remains an open question.
Reference List
[1] Aiello, W., Chung, F., and Lu, L. A Random Graph Model for Massive Graphs.
In Proc. of the 32nd Annual Symposium on Theory of Computing (2000).
[2] Albert, R., and Barabasi, A.-L. Topology of Evolving Networks: Local Events
and Universality. Physical Review Letters 85 (2000), 5234-5237.
[3] Albert, R., and Barabasi, A.-L. Statistical Mechanics of Complex Networks.
cond-mat/0106096 (June 2001).
[4] Albert, R., Jeong, H., and Barabasi, A.-L. Attack and Error Tolerance of
Complex Networks. Nature 406 (2000).
[5] Amaral, L. A. N., Scala, A., Barthélémy, M., and Stanley, H. E. Classes
of Small-World Networks. Proceedings of the National Academy of Sciences 97, 21
(October 2000).
[6] Barabasi, A.-L., and Albert, R. Emergence of Scaling in Random Networks.
Science 286 (1999), 509-512.
[7] Bartal, Y. Probabilistic Approximations of Metric Spaces and its Algorithmic
Applications. In Proc. 37th IEEE Symposium on Foundations of Computer Science
(October 1996), pp. 184-193.
[8] Bollobás, B. Random Graphs. Academic Press, Inc., Orlando, Florida, 1985.
[9] Broido, A., and Claffy, K. C. Internet Topology: Local Properties. In Proceedings
of SPIE ITCom 2001 (Denver, CO, August 2001).
[10] Bu, T., and Towsley, D. On Distinguishing Between Internet Power-Law
Generators. In Proc. of IEEE Infocom (2002).
[11] Burch, H., and Cheswick, B. Mapping the Internet. IEEE Computer 32, 4 (April
1999), 97-98.
[12] Calvert, K., Doar, M., and Zegura, E. Modelling Internet Topology. IEEE
Communications Magazine (June 1997).
[13] Chalmers, R. C., and Almeroth, K. C. Modeling the Branching Characteristics
and Efficiency Gains in Global Multicast Trees. In Proceedings of the IEEE Infocom
2001 (to appear) (Anchorage, Alaska, USA, April 2001).
[14] Chang, H., Govindan, R., Jamin, S., Shenker, S., and Willinger, W. On
Inferring AS-level Connectivity from BGP Route Tables. Submitted to the ACM
Internet Measurements Workshop 2001.
[15] Chang, H., Govindan, R., Jamin, S., Willinger, W., and Shenker, S. On
Inferring AS-Level Connectivity from BGP Routing Tables. In Proc. of IEEE Infocom
(2002).
[16] Cheswick, B., Burch, H., and Branigan, S. Mapping and Visualizing the
Internet. In USENIX (San Diego, CA, June 2000).
[17] Chuang, J., and Sirbu, M. Pricing multicast communications: A cost-based
approach. In Proceedings of the INET '98 (1998).
[18] Claffy, K. C., and McRobb, D. Measurement and Visualization of Internet
Connectivity and Performance. http://www.caida.org/Tools/Skitter/.
[19] Crovella, M. E., and Bestavros, A. Self-Similarity in World-Wide Web Traffic:
Evidence and Possible Causes. IEEE/ACM Transactions on Networking 5, 6
(December 1997), 835-846.
[20] Doar, M. A Better Model for Generating Test Networks. In Proceeding of IEEE
Global Telecommunications Conference (GLOBECOM) (November 1996).
[21] Doar, M., and Leslie, I. How Bad is Naive Multicast Routing? In Proceedings of
the IEEE Infocom (1993).
[22] Downey, A. B. Using pathchar to Estimate Link Characteristics. In Proceedings of
the ACM SIGCOMM (1999).
[23] et al., C. C. C. A Loop-Free Extended Bellman-Ford Routing Protocol Without
Bouncing Effect. In Proceedings of ACM SIGCOMM (1989), pp. 224-236.
[24] Fabrikant, A., Koutsoupias, E., and Papadimitriou, C. Heuristically
Optimized Trade-offs. http://www.cs.berkeley.edu/~christos/.
[25] Faloutsos, C., Faloutsos, M., and Faloutsos, P. What does Internet look
like? Empirical Laws of the Internet Topology. In Proceedings of ACM SIGCOMM
1999, Boston, MA, September 1999.
[26] Faloutsos, C., Faloutsos, P., and Faloutsos, M. On Power-Law Relationships
of the Internet Topology. In Proceedings of the ACM SIGCOMM (Sept. 1999).
[27] for Advanced Networking Research, N. L. AS Connectivity Information.
http://moat.nlanr.net/Routing/rawdata/.
[28] Gao, L. Inferring autonomous system relationships in the internet. In Proc. IEEE
Globecom (San Francisco, CA, 2000).
[29] Goel, A., and Munagala, K. Extending Greedy Multicast Routing to Delay
Sensitive Applications. Tech. rep., Stanford Univ. Tech Note STAN-CS-TN-99-89,
July 1999. Short abstract appeared in the Symposium on Discrete Algorithms, 2000.
[30] Govindan, R., and Reddy, A. An Analysis of Internet Inter-Domain Topology
and Route Stability. In Proc. IEEE INFOCOM '97 (Kobe, Japan, Apr 1997).
[31] Govindan, R., and Tangmunarunkit, H. Heuristics for Internet Map Discovery.
In Proceedings of IEEE Infocom (Tel-Aviv, Israel, April 2000).
[32] Govindan, R., and Tangmunarunkit, H. Heuristics for Internet Map Discovery.
In Proceedings of the IEEE Infocom (Tel-Aviv, Israel, March 2000).
[33] Hu, T. C. Optimum Communication Spanning Trees. SIAM Journal of Computing
3 (1974), 188-195.
[34] i Cancho, R. F., and Sole, R. V. Optimization in Complex Networks. Condensed
Matter Archives, http://xxx.lanl.gov/abs/cond-mat, November 2001.
[35] Jin, C., Chen, Q., and Jamin, S. Inet: Internet Topology Generator. Tech. Rep.
CSE-TR-433-00, EECS Department, University of Michigan, 2000.
[36] Karypis, G., and Kumar, V. A Fast and High Quality Multilevel Scheme for
Partitioning Irregular Graphs. SIAM Journal on Scientific Computing 20, 1 (1998),
359-92.
[37] Khanna, A., and Zinky, J. A Revised ARPANET Routing Metric. In Proceedings
of ACM SIGCOMM (1989).
[38] Kleinberg, J., Kumar, S. R., Rajagopalan, S., Raghavan, P., and Tomkins,
A. The Web as a Graph: Measurements, Models and Methods. In International
Conference on Combinatorics and Computing (1999).
[39] Kleinrock, L., and Kamoun, F. Hierarchical Routing for Large Networks:
Performance Evaluation and Optimization. Computer Networks 1 (1977), 155-174.
[40] Laherrere, J., and Sornette, D. Stretched Exponential Distributions in Nature
and Economy: Fat tails with characteristic scales. European Physics Journal B, 2
(1998), 525-539.
[41] Lai, K., and Baker, M. G. Measuring Link Bandwidths Using a Deterministic
Model of Packet Delay. In Proceedings of the ACM SIGCOMM (2000).
[42] McQuillan, J. M., Richer, I., and Rosen, E. C. The New Routing Algorithm
for the ARPANET. IEEE Transactions on Communications 7, 1 (5 1980), 1-7.
[43] Medina, A., Lakhina, A., Matta, I., and Byers, J. BRITE: An Approach to
Universal Topology Generation. In Proceedings of MASCOTS 2001 (Cincinnati, OH,
August 2001).
[44] Medina, A., Matta, I., and Byers, J. On the Origin of Power-Laws in Internet
Topologies. ACM Computer Communications Review 30, 2 (April 2000).
[45] Mitzel, D. J., and Shenker, S. Asymptotic resource consumption in multicast
reservation styles. In SIGCOMM Symposium on Communications Architectures
and Protocols (London, UK, September 1994), pp. 226-233.
[46] Motwani, R. Lecture Notes on Approximation Algorithms - Vol I. Department of
Computer Science, Stanford University.
[47] Neil Spring, R. M., and Wetherall, D. Measuring ISP topologies with Rocketfuel.
In Proceedings of ACM SIGCOMM 2002 (Pittsburgh, PA, August 2002).
[48] of Oregon Route Views Project, U. BGP routing table dump archive.
http://rv-archive.uoregon.edu/oix-route-views/.
[49] Okuyama, K., Takayasu, M., and Takayasu, H. Zipf's Law in income
distribution of companies. Physica A 269 (1999), 125-131.
[50] Palmer, D., and Steffen, G. On Power-Laws In Network Topologies. In
Proceedings of IEEE Globecom (2000).
[51] Pansiot, J.-J., and Grad, D. On routes and multicast trees in the Internet. ACM
SIGCOMM Computer Communication Review 28, 1 (January 1998), 41-50.
[52] Park, K. Impact of topology on traceback techniques. Private communication.
[53] Paxson, V. End-to-end Routing Behavior in the Internet. In Proceedings of the
ACM SIGCOMM Symposium on Communication Architectures and Protocols (San
Francisco, CA, September 1996).
[54] Paxson, V. End-to-end Internet Packet Dynamics. In Proceedings of the 1997 ACM
SIGCOMM Conference on Communication Architectures and Protocols (September
1997).
[55] Peleg, D., and Upfal, E. Constructing disjoint paths on expander graphs. In
STOC: ACM Symposium on Theory of Computing (STOC) (1987).
[56] Phillips, G., Shenker, S., and Tangmunarunkit, H. Scaling of Multicast Trees:
Comments on the Chuang-Sirbu Scaling Law. In Proceedings of the ACM SIGCOMM
(Sept. 1999).
[57] Phillips, G., Tangmunarunkit, H., and Shenker, S. Scaling of Multicast Trees:
Comments on the Chuang-Sirbu Scaling Law. In Proceedings of ACM SIGCOMM
1999, Boston, MA, September 1999.
[58] R. Siamwalla and R. Sharma and S. Keshav. Discovering internet topology.
http://www.cs.cornell.edu/skeshav/papers/discovery.pdf.
[59] Radoslavov, P., Tangmunarunkit, H., Yu, H., Govindan, R., Shenker, S.,
and Estrin, D. On Characterizing Network Topologies and Analyzing Their Impact
on Protocol Design. Tech. Rep. 00-731, University of Southern California, Dept. of
CS, February 2000.
[60] Ramsden, J. J., and Kiss-Haypal, G. Company size distribution in different
countries. Physica A 277 (2000), 220-227.
[61] Rényi, A. On the Enumeration of Trees. In Combinatorial Structures and Their
Applications (June 1969), Gordon and Breach, Science Publishers, pp. 355-360.
[62] Savage, S., Collins, A., Hoffman, E., Snell, J., and Anderson, T. The
End-to-End Effects of Internet Path Selection. In Proceedings of ACM SIGCOMM
(Boston, MA, September 1999).
[63] Simon, H. On a Class of Skew Distribution Functions. Biometrika (1953).
[64] Subramanian, L., Agarwal, S., Rexford, J., and Katz, R. Characterizing the
Internet Hierarchy from Multiple Vantage Points. In Proc. of IEEE Infocom (2002).
[65] Tangmunarunkit, H., Doyle, J., Govindan, R., Jamin, S., Willinger, W.,
and Shenker, S. Does AS Size Determine AS Degree? ACM Computer
Communication Review (October 2001).
[66] Tangmunarunkit, H., Govindan, R., Jamin, S., Shenker, S., and Willinger,
W. On network topologies, power laws and hierarchy. Tech. Rep. 01-746, University
of Southern California, Computer Science Department, 2001.
[67] Tangmunarunkit, H., Govindan, R., Jamin, S., and Willinger, S. S. W.
Network Topology Generators: Degree-Based vs. Structural. In Proc. of ACM
SIGCOMM (Pittsburgh, PA, 2002).
[68] Tangmunarunkit, H., Govindan, R., and Shenker, S. Internet Path Inflation
Due to Policy Routing. In Proc. of SPIE ITCom (Denver, CO, 2001), pp. 188-195.
[69] Tangmunarunkit, H., Govindan, R., Shenker, S., and Estrin, D. The Impact
of Policy on Internet Paths. In Proc. of IEEE INFOCOM (Anchorage, AK, 2001).
[70] Tsuchiya, P. P. Landmark Hierarchy: A New Hierarchy for Routing in Very Large
Networks. In Proceedings of ACM SIGCOMM (Stanford, CA, August 1988).
[71] van der Hofstad, R., Hooghiemstra, G., and Mieghem, P. V. On the
efficiency of multicast. Submitted for publication.
[72] van Mieghem, P., Hooghiemstra, G., and van der Hofstad, R. A scaling law
for the hop count. Tech. rep., Delft University of Technology, 2000.
[73] Vogels, W. File System Usage in Windows NT 4.0. In Proceedings of the ACM
Symposium on Operating Systems Principles (Kiawah Island, SC, December 1999).
[74] Vukadinovic, D., Huang, P., and Erlebach, T. A Spectral Analysis of the
Internet Topology. Tech. rep., ETH Zurich, 2001.
[75] Watts, D. J., and Strogatz, S. H. Collective Dynamics of Small-World Networks.
Nature 393 (1998), 202-204.
[76] Waxman, B. M. Routing of Multipoint Connections. IEEE Journal of Selected Areas
in Communication 6, 9 (December 1988), 1617-1622.
[77] Wei, L., and Estrin, D. A Comparison of Multicast Trees and Algorithms. In
Proceedings of the IEEE Infocom (Toronto, Canada, June 1994).
[78] Willinger, W., Govindan, R., Jamin, S., Paxson, V., and Shenker, S.
Scaling Phenomena in the Internet: Critically Examining Criticality. Proceedings
of the National Academy of Sciences. 2001 (to appear); also available at
http://topology.eecs.umich.edu.
[79] Wong, T., and Katz, R. An Analysis of Multicast Forwarding State Scalability.
In Proceedings of the 8th IEEE International Conference on Network Protocols (ICNP
2000) (Osaka, Japan, November 2000).
[80] Zegura, E. Thoughts on Router-level Topology Modeling. The End-to-end interest
mailing list.
[81] Zegura, E., Calvert, K. L., and Donahoo, M. J. A Quantitative Comparison of
Graph-Based Models for Internet Topology. IEEE/ACM Transactions in Networking
5, 6 (1997).
[82] Zipf, G. K. Human Behavior and the Principle of Least Effort. Addison-Wesley,
1949.
Appendix A
Peering Relationship Identification on the AS overlay map
/* Given a link(A,B), Get_Peering_Relationship returns the peering
   relationship between A and B.
   Assumption: the peering relationship of the actual AS map (obtained
   from the BGP table) is already determined.
   degree(X): returns the degree of node X.
*/
Get_Peering_Relationship(link(A,B)) {
    if link(A,B) exists in the actual AS map
        then link_type = the relationship of (A,B) in the actual AS map
    else if node A and node B are not in the actual AS map
        then if degree(A) > degree(B)
            then link_type = PROVIDER_CUSTOMER
            else link_type = CUSTOMER_PROVIDER
    else if node A is not in the actual AS map
        then link_type = CUSTOMER_PROVIDER
    else if node B is not in the actual AS map
        then link_type = PROVIDER_CUSTOMER
    else /* node A and B are in the actual AS map but there is no link
            connecting A and B in the actual AS map */
        if ( ((degree(A) > 60) && (degree(B) > 60)) &&
             ((A has more than 15 peers) && (B has more than 15 peers)) )
            then link_type = PROVIDER_PROVIDER
        else if (degree(A) > degree(B))
            then link_type = PROVIDER_CUSTOMER
            else link_type = CUSTOMER_PROVIDER
    return(link_type);
}
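The decision procedure above can be sketched in Python as follows. This is an illustrative transcription rather than the code used in the thesis: the names `LinkType`, `by_degree`, `actual_links`, `actual_nodes`, `degree` and `peers` are assumptions, and the inputs are assumed to be plain dictionaries and sets prepared by the caller. The thresholds 60 and 15 are taken from the pseudocode.

```python
from enum import Enum


class LinkType(Enum):
    PROVIDER_CUSTOMER = "provider-customer"   # A is the provider, B the customer
    CUSTOMER_PROVIDER = "customer-provider"   # A is the customer, B the provider
    PROVIDER_PROVIDER = "provider-provider"


def by_degree(a, b, degree):
    """Treat the higher-degree endpoint as the provider."""
    if degree[a] > degree[b]:
        return LinkType.PROVIDER_CUSTOMER
    return LinkType.CUSTOMER_PROVIDER


def get_peering_relationship(a, b, actual_links, actual_nodes, degree, peers):
    """Classify link (a, b).

    actual_links: dict mapping (a, b) to a LinkType for links in the actual AS map
    actual_nodes: set of nodes present in the actual AS map
    degree, peers: dicts mapping a node to its degree and to its number of peers
    """
    # Case 1: the link is already classified in the actual (BGP-derived) AS map.
    if (a, b) in actual_links:
        return actual_links[(a, b)]
    a_known, b_known = a in actual_nodes, b in actual_nodes
    # Case 2: neither endpoint appears in the actual AS map.
    if not a_known and not b_known:
        return by_degree(a, b, degree)
    # Cases 3 and 4: exactly one endpoint is missing; it is treated as the customer.
    if not a_known:
        return LinkType.CUSTOMER_PROVIDER
    if not b_known:
        return LinkType.PROVIDER_CUSTOMER
    # Case 5: both endpoints are known, but the link between them is not.
    if degree[a] > 60 and degree[b] > 60 and peers[a] > 15 and peers[b] > 15:
        return LinkType.PROVIDER_PROVIDER
    return by_degree(a, b, degree)
```

The sketch looks up a link only in the order (a, b); a caller that stores links in one direction would need to normalize the pair first.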
Appendix B
Results for Other Metrics
[Plot panels: (a) Canonical, (b) Measured, (c) Generated]
Figure B.1: The distribution of eigenvalues of a graph plotted against their rank [26].
[Plot panels: (a) Canonical, (b) Measured, (c) Generated; x-axis: Normalized Eccentricity (hops)]
Figure B.2: The distribution of node diameters. This is a modified version of the graph
diameter metric proposed in [81].
[Plot panels: (a) Canonical, (b) Measured, (c) Generated]
Figure B.3: The vertex cover of the subgraphs within balls of size n, as a function of ball
size.
[Plot panels: (a) Canonical, (b) Measured, (c) Generated]
Figure B.4: The number of biconnected components within a subgraph defined by a ball
of size n, as a function of ball size.
[Plot panels: (a) Canonical, attack; (b) Measured, attack; (c) Generated, attack;
(d) Canonical, error; (e) Measured, error; (f) Generated, error. x-axis: Error Rate f.
Legible legend entries: Tree, Mesh, Random (canonical); RL.core, AS, PLRG (measured);
TS, Tiers, Waxman (generated)]
Figure B.5: Figures (a)-(c) depict the attack tolerance [4] of our networks. This measures
the average path length of the largest connected component when increasingly larger
fractions of nodes are removed, in order of decreasing degree. Figures (d)-(f) plot the error
tolerance: the average path length when nodes are removed randomly.
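As a rough illustration of the two measurements described in the caption, the following Python sketch removes a fraction f of nodes (by decreasing degree for attack, uniformly at random for error) and reports the average path length of the largest remaining connected component. It is a minimal sketch under stated assumptions, not the code behind the figures: the graph is an adjacency dictionary of sets, degrees are ranked once on the original graph, and all function names are hypothetical.

```python
import random
from collections import deque


def remove_nodes(adj, victims):
    """Return a copy of the graph with the given nodes (and their links) removed."""
    victims = set(victims)
    return {u: {v for v in nbrs if v not in victims}
            for u, nbrs in adj.items() if u not in victims}


def largest_component(adj):
    """Return the node set of the largest connected component."""
    seen, best = set(), set()
    for s in adj:
        if s in seen:
            continue
        comp, queue = {s}, deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in comp:
                    comp.add(v)
                    queue.append(v)
        seen |= comp
        if len(comp) > len(best):
            best = comp
    return best


def average_path_length(adj, nodes):
    """Mean hop count over all ordered pairs of distinct nodes in one component."""
    total = pairs = 0
    for s in nodes:
        dist, queue = {s: 0}, deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


def tolerance(adj, f, attack=True, rng=None):
    """Average path length of the largest component after removing a fraction f of nodes."""
    k = int(f * len(adj))
    if attack:
        # Attack: remove the k highest-degree nodes (degrees taken from the original graph).
        order = sorted(adj, key=lambda u: len(adj[u]), reverse=True)
    else:
        # Error: remove k nodes chosen uniformly at random.
        order = list(adj)
        (rng or random).shuffle(order)
    g = remove_nodes(adj, order[:k])
    return average_path_length(g, largest_component(g))
```

Recomputing degrees after each removal is an equally plausible reading of "in order of decreasing degree"; the sketch uses the simpler static ranking.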
[Plot panels: (a) Canonical, (b) Measured, (c) Generated; x-axis: Ball Size.
Legible legend entries: Random (canonical); RL, AS (measured); TS, Tiers, Waxman (generated)]
Figure B.6: Clustering coefficient of a subgraph defined by a ball of size n, as a function
of ball size.
Appendix C
Connectivity Sensitivity Analysis
Figures 3.6 and 3.7 in Chapter 3.6 show that many PLRG variant networks have qualitatively
similar characteristics according to our metrics. We also observed that the B-A, Brite and BT
generators have a slightly different distortion curve. In examining their degree distributions we
noticed that the largest degree in these generators is often significantly less than that in other
variants, and they have fewer low-degree nodes (i.e., degree-1 and degree-2 nodes). As explained
in Chapter 3.6, each of these networks deploys a different random connectivity method. To check
whether the connectivity methods are responsible for the difference, we conducted the following
experiments.
For the same initial degree distribution, we generate networks using various connectivity
methods and measure their characteristics according to our metrics. These connectivity methods
can be classified as deterministic connectivity methods or random connectivity methods. We list
below the details of the methods belonging to both categories.
Deterministic Connectivity Methods
• D1: Start with the highest degree node, add one link each from this node to each lower degree
node in decreasing degree order (skipping nodes whose degree has already been satisfied).
Then repeat for the next highest degree node whose degree has not been satisfied.
• D2: Start with the highest degree node, add one link each from this node to each lower degree
node in increasing degree order (skipping nodes whose degree has already been satisfied).
Then repeat for the next highest degree node whose degree has not been satisfied.
• D3: Start with the highest degree node with degree j, add one link each from this node
to the next j/2 unsatisfied nodes in decreasing degree order, and j/2 unsatisfied nodes in
increasing degree order. Then repeat for the next highest degree node whose degree has not been
satisfied.
• D4: Start with the highest degree node with degree j, add one link each from this node to
the next 5 unsatisfied nodes in decreasing degree order, and j - 5 unsatisfied nodes in
increasing degree order. Then repeat for the next highest degree node whose degree has not
been satisfied.
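To make the deterministic methods concrete, here is a minimal Python sketch of D1, with D2 available through a flag. It is an illustration under stated assumptions rather than the thesis implementation: `target` is a hypothetical dictionary mapping each node to its assigned degree, ties in degree are broken by sort order, and the function name `connect_deterministic` is invented for this example.

```python
def connect_deterministic(target, increasing=False):
    """Wire up a degree sequence as in D1 (increasing=False) or D2 (increasing=True).

    target: dict mapping node -> assigned degree.
    Returns (edges, residual), where residual holds each node's unsatisfied degree.
    """
    # Nodes ranked from highest to lowest assigned degree.
    order = sorted(target, key=lambda n: target[n], reverse=True)
    residual = dict(target)
    edges = set()
    for i, u in enumerate(order):
        # Candidate partners are the nodes ranked below u.
        lower = order[i + 1:]
        candidates = reversed(lower) if increasing else lower
        for v in candidates:
            if residual[u] == 0:
                break              # u is satisfied; move on to the next node
            if residual[v] > 0:    # skip nodes whose degree is already satisfied
                edges.add((u, v))
                residual[u] -= 1
                residual[v] -= 1
    return edges, residual
```

For example, with `target = {"a": 3, "b": 2, "c": 2, "d": 1}`, D1 links a to b, c and d and then b to c, leaving no unsatisfied degree. Because each pair of nodes is considered only once, the sketch never creates duplicate links or self-loops.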
Random Connectivity Methods
• R1: Start with the highest degree node, add one link each from this node to a randomly
selected unsatisfied node, until all the links belonging to this node are satisfied. Repeat for
the next highest degree node whose degree has not been satisfied. Note that a node whose
degree has not been satisfied is an unsatisfied node.
• R2 (INET-like connectivity method): Generate a full mesh (i.e., clique) among the first 5
highest degree nodes. Then randomly connect 10% of the unsatisfied links belonging to the
clique nodes to degree-2 nodes. Then starting from the highest degree node whose degree has
not been satisfied, connect one link each from this node to a randomly selected unsatisfied
node. Repeat for the next highest degree node whose degree has not been satisfied.
• R3: Randomly select 2 distinct unsatisfied nodes and connect them together. The connection
is ignored if both nodes are degree-1 nodes (to minimize the number of disconnected
components). Repeat this process until all the nodes are satisfied or no more connections
can be made.
• R4 (PLRG connectivity method): Let d_i denote the degree assigned to node i. Solely for
the purposes of assigning links between nodes, make d_i copies of each node i. Links are
then assigned by randomly picking two node copies and assigning a link between them. A
copy of each node is taken off the pool after a link between them is assigned (i.e., nodes
are selected in proportion to their "unsatisfied" degree, that is, the assigned degree minus
the number of links already assigned to the node). We repeat this process until no more copies remain.
Self-loop links and links connecting two degree-1 nodes are ignored.
• R5: Similar to R4, R5 makes d_i copies of each node i. Then starting with the highest degree
node, add one link each from this node to a randomly selected node until all the links belonging
to this node are satisfied. A copy of each node is taken off the pool after a link between
them is assigned. In this method, nodes are selected in proportion to the "unsatisfied" degree.
Repeat for the next highest degree node whose degree has not been satisfied.
• R6: Start with the lowest degree node, add one link each from this node to a randomly
selected node. Repeat for the next lowest degree node whose degree has not been satisfied.
• R7: R7 is similar to R5. However, instead of starting with the highest degree node, R7
starts with the lowest degree node.
Note: Self-loop and duplicate links are ignored. Also, to minimize the number of disconnected
components, a link connecting two degree-1 nodes is ignored as well. If the final graph is disconnected,
the biggest component is returned.
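As an example of the random methods, R4 (the PLRG connectivity method) can be sketched in Python as below. This is a minimal sketch, assuming `target` is a hypothetical dictionary from node to assigned degree; shuffling the pool of copies and pairing neighbors is used here as an equivalent of repeatedly drawing two random copies without replacement. Extracting the largest connected component, as the note above requires, is left out.

```python
import random


def connect_plrg(target, seed=None):
    """Randomly match node copies as in R4.

    target: dict mapping node -> assigned degree d_i.
    Returns a set of undirected links, each a frozenset of two nodes.
    """
    rng = random.Random(seed)
    # Make d_i copies of each node i.
    copies = [node for node, d in target.items() for _ in range(d)]
    rng.shuffle(copies)
    links = set()
    # Pair consecutive copies; an odd leftover copy stays unmatched.
    for u, v in zip(copies[0::2], copies[1::2]):
        if u == v:
            continue  # ignore self-loops
        if target[u] == 1 and target[v] == 1:
            continue  # ignore links between two degree-1 nodes
        links.add(frozenset((u, v)))  # the set drops duplicate links
    return links
```

Because ignored and duplicate pairings still consume copies, the realized degree of a node can fall below its assigned degree, which is consistent with the observation that the final degree distributions differ from the initial one.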
C.0.7 Results
[Plot: degree distributions for D1, D2, D3, D4 and PLRG; x-axis: Degree]
Figure C.1: Degree distributions of various deterministic connectivity methods using the
PLRG degree distribution (5969 nodes and 10001 links) as the initial distribution
[Plot: degree distributions for the random connectivity methods; x-axis: Degree]
Figure C.2: Degree distributions of various random connectivity methods using the PLRG
degree distribution as the initial distribution
[Plot panels: (a) Expansion, (b) Resilience, (c) Distortion; curves: D1, D2, D3, D4, PLRG]
Figure C.3: Expansion, Resilience and Distortion of various deterministic connectivity
networks using the PLRG degree distribution as the initial distribution
Figures C.1 and C.2 show the final degree distributions of the deterministic connectivity
networks and the random connectivity networks, respectively. The initial degree distribution used is
a power-law degree distribution with 5969 nodes and 10001 links. The final degree distributions of
these networks are different. They also have different sizes, since applying a connectivity method does not
always result in a single connected component but sometimes in a forest of connected components.
In the case of multiple components, we always select the biggest one to study.
Figure C.3 shows the expansion, resilience and distortion of the deterministic connectivity
networks. The network with D2 connectivity results in a star topology. Even though the rest of
the networks seem to have degree distributions that follow power laws, we found that the large-scale
properties of these networks are different from those of the PLRG (and thus different from the AS and RL
graphs). For example, except for D2, all the networks have low expansion compared to PLRG. From
this experiment, we conclude that a power-law degree distribution alone does
not necessarily mean that a topology captures the large-scale properties of the Internet topology.
[Plot panels: (a) Expansion, (b) Resilience, (c) Distortion; curves: R1 through R7]
Figure C.4: Expansion, Resilience and Distortion of various random connectivity networks
using the PLRG degree distribution as the initial distribution
Figure C.4 shows the expansion, resilience and distortion of the random connectivity networks.
We found that, according to our metrics, they all have qualitatively similar behavior. Their
behavior is also similar to that of PLRG, AS and RL. Among this group, R3 and R6 have slightly
(quantitatively) different expansion, resilience and distortion from the rest of the group. This is due
to the difference in the resulting degree distributions (Figure ??). On closer examination, we also
found that R1 and R2, which connect links based on uniform random node selection, have slightly
(quantitatively) different distortion compared to R4, R5 and R7, which connect nodes in proportion to
their assigned node degrees. We observed the same behavior using the degree distributions of another
instance of PLRG (Figure C.5) and of other degree-based topologies such as BA (Figure C.6) and
Brite (Figure C.7).
In summary, degree-based generators seem qualitatively similar to the RL and AS topologies
regardless of connectivity method, so long as that method incorporates some notion of random
connectivity and the generated graph’s degree distribution is qualitatively similar to that of the
measured graphs.
[Plot panels: (a) Degree Distribution, (b) Expansion, (c) Resilience, (d) Distortion;
curves: R1 through R7]
Figure C.5: Degree Distribution, Expansion, Resilience and Distortion of various random
connectivity networks using a different instance of the PLRG degree distribution (6280 nodes
and 10698 links) as the initial distribution
[Plot panels: (a) Degree Distribution, (b) Expansion, (c) Resilience, (d) Distortion;
curves: R1 through R7]
Figure C.6: Degree Distribution, Expansion, Resilience and Distortion of various random
connectivity networks using the BA degree distribution (20000 nodes and 99975 links) as the
initial distribution
[Plot panels: (a) Degree Distribution, (b) Expansion, (c) Resilience, (d) Distortion;
curves: R1 through R7]
Figure C.7: Degree Distribution, Expansion, Resilience and Distortion of various random
connectivity networks using the Brite degree distribution (5000 nodes and 9996 links) as the
initial distribution
Appendix D
Parameter Space Exploration
For a graph of a given size, the power-law random graph (PLRG) takes a single parameter: β
(Section 3.1.2 of the paper).
The parameters of Transit-Stub (TS) are listed in the order they appear in the table: the
number of stub domains per transit node, the number of random transit-to-stub edges, the number
of random stub-to-stub edges, the number of transit domains, the edge probability among transit
domains, the number of nodes per transit domain, the edge probability among nodes in a transit
domain, the number of nodes per stub domain, and the edge probability among nodes in a stub
domain.
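As a quick sanity check on how these parameters fix the graph size, the sketch below computes the node count implied by four of them. It assumes that every transit node is attached to exactly the stated number of stub domains and that every domain contains exactly the stated number of nodes; the function name is ours and is not part of the TS generator.

```python
def transit_stub_nodes(stubs_per_transit_node, transit_domains,
                       nodes_per_transit_domain, nodes_per_stub_domain):
    """Node count implied by the TS size parameters (illustrative helper)."""
    # Every transit domain contributes the same number of transit nodes.
    transit_nodes = transit_domains * nodes_per_transit_domain
    # Each transit node carries a fixed number of stub domains of fixed size.
    stub_nodes = transit_nodes * stubs_per_transit_node * nodes_per_stub_domain
    return transit_nodes + stub_nodes


# Size parameters taken from four TS rows of Figure D.1.
print(transit_stub_nodes(3, 6, 6, 9))
print(transit_stub_nodes(1, 1, 50, 50))
print(transit_stub_nodes(3, 10, 15, 12))
print(transit_stub_nodes(1, 1, 100, 100))
```

Under these assumptions the four calls give 1008, 2550, 5550 and 10100, which agree with the node counts listed for the corresponding TS rows in Figure D.1.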
The parameters of Tiers are listed in the order they appear in the table: the number of WANs
(limited to 1 in the current implementation), the number of MANs per WAN, the number of LANs
per MAN, the number of nodes per WAN, the number of nodes per MAN, the number of nodes per
LAN, the intranetwork redundancy for WAN nodes, the intranetwork redundancy for MAN nodes,
the intranetwork redundancy for LAN nodes, the internetwork redundancy for MAN to WAN, and
the internetwork redundancy for LAN to MAN.
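The first six Tiers parameters determine the size of the generated graph. The sketch below is a minimal illustration of that arithmetic, assuming every WAN, MAN and LAN contains exactly the stated number of nodes; the function name is hypothetical and not part of the Tiers generator.

```python
def tiers_nodes(wans, mans_per_wan, lans_per_man,
                nodes_per_wan, nodes_per_man, nodes_per_lan):
    """Node count implied by the Tiers size parameters (illustrative helper)."""
    mans = wans * mans_per_wan   # total number of MANs
    lans = mans * lans_per_man   # total number of LANs
    return (wans * nodes_per_wan
            + mans * nodes_per_man
            + lans * nodes_per_lan)


# Size parameters taken from three Tiers rows of Figure D.1.
print(tiers_nodes(1, 20, 4, 200, 20, 5))
print(tiers_nodes(1, 100, 10, 1000, 50, 4))
print(tiers_nodes(1, 50, 50, 500, 100, 2))
```

Under these assumptions the three calls give 1000, 10000 and 10500, matching the node counts listed for those Tiers rows in Figure D.1.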
The parameters of the Waxman generator include the number of nodes in the topology, an
α value, and a β value (the latter governs the extent of geographic bias and the former the link
probability).
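To make the roles of the two Waxman parameters concrete, here is a minimal sketch. It assumes the commonly used form of the Waxman link probability, P(u,v) = α · exp(−d(u,v) / (β · L)), where d is the Euclidean distance between two nodes placed uniformly at random in the unit square and L is the maximum possible distance. This form, the unit-square placement and the function name are assumptions made for illustration; the generator used in this study may differ in detail.

```python
import math
import random


def waxman_graph(n, alpha, beta, seed=0):
    """Sample a Waxman-style random graph (illustrative sketch).

    alpha scales the overall link probability; beta controls how quickly
    the probability decays with distance (the geographic bias).
    """
    rng = random.Random(seed)
    points = [(rng.random(), rng.random()) for _ in range(n)]
    max_dist = math.sqrt(2.0)  # L: diagonal of the unit square
    edges = []
    for u in range(n):
        for v in range(u + 1, n):
            d = math.dist(points[u], points[v])
            p = alpha * math.exp(-d / (beta * max_dist))
            if rng.random() < p:
                edges.append((u, v))
    return points, edges


points, edges = waxman_graph(200, alpha=0.05, beta=0.20)
print("average degree:", 2 * len(edges) / len(points))
```

With this form, a larger α raises the probability of every link equally, while a smaller β penalizes long links more heavily.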
Topology  Number of Nodes  Average Degree           Parameters
PLRG      8037             2J9                      2.550144
          9114             3.47                     2.358213
          9230             4.46                     2.246677
          10091            4.61                     2.253182
TS        1008             2J8                      3 0 0 6 0.55 6 0.32 9 0.248
          1008             2.51                     3 0 0 6 0.6 6 0.45 9 0.57
          1008             2^1                      3 5 10 6 0.55 6 0.32 9 0.248
          1008             2^4                      3 10 20 6 0.55 6 0.32 9 0.248
          1008             2^7                      3 20 40 6 0.55 6 0.32 9 0.248
          1008             2.96                     3 40 80 6 0.55 6 0.32 9 0.248
          1008             2^«                      3 50 100 6 0.55 6 0.32 9 0.248
          1008             3.14                     3 75 200 6 0.55 6 0.32 9 0.248
          1008             3.38                     3 100 400 6 0.55 6 0.32 9 0.248
          1008             3.99                     3 200 800 6 0.55 6 0.32 9 0.248
          1008             {2.84,2.89,3.00,3.19}    3 0 {50,100,200,400} 6 0.55 6 0.32 9 0.248
          1008             {2.90,3.00,3.19,3.59}    3 {50,100,200,400} 0 6 0.55 6 0.32 9 0.248
          2550             2^9                      1 0 0 1 0.5 50 0.05 50 0.05
          2550             2jW                      1 5 5 1 0.5 50 0.05 50 0.05
          2550             2jW                      1 10 10 1 0.5 50 0.05 50 0.05
          2550             5.01                     1 0 0 1 0.5 50 0.1 50 0.1
          5550             3.44                     3 8 12 10 0.4 15 0.25 12 0.27
          10100            4.98                     1 0 0 1 0.2 100 0.05 100 0.05
Tiers     1000             2.81                     1 20 4 200 20 5 9 9 1 9 1
          5000             2^3                      1 50 10 500 40 5 20 20 1 20 1
          10000            2^7                      1 100 10 1000 50 4 3 3 1 3 3
          10000            2.47                     1 100 10 1000 50 4 6 6 1 3 3
          10000            2.68                     1 100 10 1000 50 4 10 10 1 10 3
          10000            3.09                     1 100 10 1000 50 4 20 20 1 20 3
          10000            2^5                      1 50 20 1000 100 4 3 3 1 3 3
          10500            2.72                     1 50 50 500 100 2 3 3 1 3 3
          10500            2.12                     1 100 0 500 100 0 6 6 1 3 3
Waxman    1000             5.06                     1000 0.050 0.20
          1762             2T#                      5000 0.005 0.05
          4476             2j&                      5000 0.005 0.10
          5000             7j%                      5000 0.005 0.30
          5000             10.82                    5000 0.005 0.50
          4444             2.79                     5000 0.010 0.05
          4967             5.03                     5000 0.010 0.10
          5000             14.42                    5000 0.010 0.30
Figure D.1: Parameters explored for structural generators
Appendix E
Policy-induced ball growing
In computing balls of radius h, our definition includes all nodes (and links) whose shortest-path
distance from the center of the ball is less than or equal to h (Section 3.2.1). For the AS and RL
graphs, we extend the definition to account for policy routing; we call this extension
policy-induced ball growing. In computing a policy-induced ball of radius h, we include all nodes
whose policy-path distance from the center of the ball is less than or equal to h, and include only
links that lie on policy-compliant paths to those nodes. We use the policy model reported in [68]
to determine the policy paths. We describe this policy model briefly here; the reader is referred
to [68] for more detail.
[Diagram omitted: AS graph with provider-customer links, each node labeled with its policy distance h from A, e.g. C (h=1), H (h=1), D (h=2), E (h=2), G (h=3).]
Figure E.1: AS annotated graph with A as the center of the ball
At the AS level, an AS map is first obtained from BGP routing tables. We then use the technique
proposed by Gao [28] to infer the relationships between ASs, e.g., whether a link between two ASs
is a provider-customer, peer-peer or sibling-sibling link. After the AS map is annotated with
relationships, the policy path between any two nodes is the shortest path that does not violate
any provider-customer relationship. In other words, once a path traverses down to a customer AS,
it never traverses back up to a provider AS. In computing a policy-induced ball of radius h, after
a node is randomly selected as the center of the ball, the distance between the center node and
every other node is determined according to their shortest policy paths. The subgraph within a
ball of radius h then comprises the nodes whose distance is less than or equal to h and the links
that lie on their policy paths to the center node. For example, suppose node A in Figure E.1 is
selected as the center of a ball. Then a ball of radius 3 includes nodes A, B, C, D,
E, G and H and links (A,B), (A,C), (A,H), (B,E), (C,D) and (E,G). A ball of radius 4 includes all
nodes and links in the ball of radius 3 plus node F and links (D,E) and (E,F).
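The AS-level rule described above (once a path has gone down to a customer, it may not go back up to a provider) can be sketched as a breadth-first search over (node, phase) states. This is a simplified illustration and not the policy model of [68]: it handles only provider-customer links, ignores peer-peer and sibling-sibling links, returns only the nodes of the ball (not the links), and all function names and the toy graph are hypothetical.

```python
from collections import deque


def policy_distances(provider_customer_links, center):
    """Shortest policy distance from center to every reachable node.

    provider_customer_links: iterable of (provider, customer) pairs.
    Phase 0 means the path may still climb to a provider; phase 1 means
    it has already descended to a customer and may only keep descending.
    """
    providers, customers = {}, {}
    for provider, customer in provider_customer_links:
        customers.setdefault(provider, set()).add(customer)
        providers.setdefault(customer, set()).add(provider)

    dist = {(center, 0): 0}
    queue = deque([(center, 0)])
    while queue:
        node, phase = queue.popleft()
        d = dist[(node, phase)]
        # Going down is always allowed and locks the path into phase 1.
        steps = [(c, 1) for c in customers.get(node, ())]
        # Going up is allowed only if the path has never gone down.
        if phase == 0:
            steps += [(p, 0) for p in providers.get(node, ())]
        for state in steps:
            if state not in dist:
                dist[state] = d + 1
                queue.append(state)

    best = {}
    for (node, _), d in dist.items():
        best[node] = min(d, best.get(node, d))
    return best


def policy_ball(provider_customer_links, center, h):
    """Nodes whose policy distance from center is at most h."""
    dist = policy_distances(provider_customer_links, center)
    return {node for node, d in dist.items() if d <= h}


# Toy graph: P is the provider of A and B; A of C; B and Q of D.
links = [("P", "A"), ("P", "B"), ("A", "C"), ("B", "D"), ("Q", "D")]
print(policy_distances(links, "A"))
print(policy_ball(links, "A", 2))
```

In the toy graph, Q is connected to the rest of the graph but is not reachable from A by any policy path, because reaching it would require climbing from D to its provider Q after the path has already descended. A policy-induced ball can therefore be smaller than the ordinary ball of the same radius.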
In the RL graph, we generate an AS overlay graph on top of the RL graph and annotate the
peering relationships between ASs using the method described in [68]. Our previous paper [68]
contains the detailed methodology for generating an annotated AS overlay map. To compute the
policy path between any two RL nodes, we first compute the corresponding AS-level policy paths
between them, then select the shortest router-hop paths within these sequences of AS paths. A
subgraph within a ball of radius h on the RL graph includes all router nodes whose distance from
the center node is less than or equal to h router hops and the links that lie on their policy
paths to the center node.
Appendix F
Degree vs. Size
To understand the relationship between size and degree, in addition to the analysis of the correlation
between size and degree of the AS map (Chapter 6), we also include the analysis of the following
maps in our study.
• The actors map. In this map, a node represents an actor, a link exists between two actors
if both actors appeared in the same movie, and the size associated with each actor is the
number of movies that the actor participated in.
• The actresses map. In this map, a node represents an actress, a link exists between two
actresses if both actresses appeared in the same movie, and the size associated with each
actress is the number of movies that the actress participated in.
• The airline map (04/17/2000). In this map, a node represents an airport, a link between
two nodes represents the existence of a direct flight between them, and the size associated
with each airport is the number of flights originating or ending at that airport.
[Plots omitted. Panels: (a) Size (x-axis: Size (Number of Movies)), (b) Degree (x-axis: Degree).]
Figure F.1: Actors Map: Correlation of size and degree = 0.77
Figures F.1 and F.3 show the size and degree distributions of the actors map and the airline
map, respectively. The distributions of both size and degree of the two maps are highly variable.
[Plots omitted. Panels: (a) Size (x-axis: Size (Number of Movies)), (b) Degree (x-axis: Degree).]
Figure F.2: Actresses Map: Correlation of size and degree = 0.63
[Plots omitted. Panels: (a) Size (x-axis: Size (Number of Flights)), (b) Degree (x-axis: Degree).]
Figure F.3: Airline Map: Correlation of size and degree = 0.80
However, the correlation between size and degree in those maps is not as strong as that
discovered in the Internet map.
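As an illustration of how the size and degree of each node in a map such as the actors map can be derived and compared, the sketch below builds both quantities from a toy movie-to-cast table and computes their correlation. It assumes the Pearson correlation coefficient, which may not be the exact statistic used in this study; the data and function names are hypothetical.

```python
import math
from itertools import combinations


def build_map(movies):
    """Return (size, degree) dictionaries for a movie -> cast mapping.

    size[a]   = number of movies in which a appeared
    degree[a] = number of distinct people who shared a movie with a
    """
    size, neighbors = {}, {}
    for cast in movies.values():
        members = set(cast)
        for a in members:
            size[a] = size.get(a, 0) + 1
            neighbors.setdefault(a, set())
        for a, b in combinations(sorted(members), 2):
            neighbors[a].add(b)
            neighbors[b].add(a)
    degree = {a: len(nb) for a, nb in neighbors.items()}
    return size, degree


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences.

    Both sequences must have nonzero variance.
    """
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Toy data, not taken from the actors map used in the study.
movies = {"m1": ["a", "b", "c"], "m2": ["a", "d"], "m3": ["a", "b"]}
size, degree = build_map(movies)
names = sorted(size)
print(pearson([size[a] for a in names], [degree[a] for a in names]))
```

The same two functions apply to the airline map if each flight is treated as a two-element "cast" of its origin and destination airports.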
Asset Metadata
Creator
Tangmunarunkit, Hongsuda
(author)
Core Title
Characterizing Internet topology, routing and hierarchy
School
Graduate School
Degree
Doctor of Philosophy
Degree Program
Computer Science
Publisher
University of Southern California
(original),
University of Southern California. Libraries
(digital)
Tag
Computer Science,OAI-PMH Harvest
Language
English
Contributor
Digitized by ProQuest
(provenance)
Advisor
Govindan, Ramesh (committee chair), Shenker, Scott (committee member), Estrin, Deborah (committee member)
Permanent Link (DOI)
https://doi.org/10.25549/usctheses-c16-528992
Unique identifier
UC11336026
Identifier
3140561.pdf (filename),usctheses-c16-528992 (legacy record id)
Legacy Identifier
3140561.pdf
Dmrecord
528992
Document Type
Dissertation
Rights
Tangmunarunkit, Hongsuda
Type
texts
Source
University of Southern California
(contributing entity),
University of Southern California Dissertations and Theses
(collection)
Access Conditions
The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the au...
Repository Name
University of Southern California Digital Library
Repository Location
USC Digital Library, University of Southern California, University Park Campus, Los Angeles, California 90089, USA