Overview
Global reachability -- when every address is reachable from every other
address -- is the most basic goal of the Internet. It was specified as a top
priority in the original design of the Internet protocols,
ahead of high performance or good quality of service, with the philosophy that
"there is only one failure, and it is complete partition."
In practice, however, global reachability does not always hold; traffic may
disappear into black holes and consistently fail to reach its destination.
This is especially problematic when outages are not simply transient, because
an operator generally has little visibility into other ASes to discern the
nature of an outage and little ability to check whether the problem exists
from other vantage points.
We present Hubble, a system that operates continuously to find Internet
reachability problems in which routes exist to a destination but packets are
unable to reach the destination. Hubble allows us to characterize global
Internet reachability by identifying how many prefixes are reachable from
some vantages and not others, how often these problems occur, and how long
they persist. Whereas previous work focused on reachability within the
narrower context of an AS, testbed, or set of clients, or obtained breadth by
monitoring routes only via BGP, Hubble monitors the data path to prefixes that
cover 89% of the Internet's edge address space at a 15-minute granularity.
Key enabling techniques include a hybrid passive/active monitoring approach
and the synthesis of multiple information sources, including historical
data and spoofed probes to isolate failures.
Papers
- Studying Black Holes in the Internet with Hubble [ pdf , html ]
E. Katz-Bassett, H. V. Madhyastha, J. P. John, A. Krishnamurthy, D. Wetherall, T. Anderson.
USENIX Symposium on Networked Systems Design & Implementation (NSDI), 2008.
Talks
- Monitoring Internet Reachability Problems with Hubble
E. Katz-Bassett.
Invited talk, Gnomedex, August 2008.
- Studying Black Holes on the Internet with Hubble
E. Katz-Bassett.
Invited talk, 10th CAIDA-WIDE Workshop, August 2008.
- Hubble: Monitoring Internet Reachability in Real Time
E. Katz-Bassett.
Invited talk, Réseaux IP Européens (RIPE) 56, May 2008.
- Real-time Blackhole Analysis with Hubble
E. Katz-Bassett.
North American Network Operators Group, June 2007.
Video and PDF slides available.
Press
- Internet Full of 'Black Holes'
LiveScience.com, www.msnbc.com, www.foxnews.com, and other sites.
April 2008.
- 'Connected' interview, 95.7 KJR FM, April 2008.
- Researchers map Internet's 'black holes', ComputerWorld, April 2008.
- Researchers map Internet black holes, Ars Technica, April 2008.
- Boing Boing, Digg, and Slashdot, April 2008.
- Hubble maps the changing constellation of Internet 'black holes', University of Washington News, April 2008.
- Researchers Chart Internet's 'Black Holes', Wired.com, June 2007.
System Description
Hubble consists of three high-level components, each of which
employs various network measurements and techniques:
- Target Identification - Using both active and passive monitoring,
Hubble identifies prefixes likely to be experiencing
problems as targets for further investigation.
- Distributed monitors running on PlanetLab report when a
previously responsive IP stops responding to pings.
- The system monitors RouteViews BGP updates and reports prefixes
experiencing path changes at multiple RouteViews peers.
- Reachability analysis - Hubble assesses the
reachability of the identified target prefixes.
- The system launches traceroutes to destinations in the prefixes
from PlanetLab sites around the world.
- It compares these traceroutes to current BGP snapshots from
RouteViews and to iPlane alias information to determine at which
router, prefix, and AS each probe terminates.
- It schedules any prefixes it finds to be experiencing problems
for probing again in the next round.
- Problem Classification - To aid operators and others in understanding
the problem, Hubble automatically classifies problems
according to three questions:
- Which AS contains the problem?
- The system groups the failed traceroutes and determines in
which AS(s) a substantial number terminate.
- Which routers might be causing the problem?
- The system assesses whether all traceroutes that reach the
AS terminate, or only those through certain routers.
- For each suspect router, the system examines its historical
records to see if the prefix used to be reachable through the
router. If so, it checks if the next router along the path
responds to pings.
- Which destinations are affected?
- Internet routes are often asymmetric, differing in the forward
and reverse direction. A failed traceroute signals that at least
one direction is not functioning, but leaves it difficult or
impossible to infer which.
- We employ an innovative technique using spoofed probes to
isolate the direction of failure five times more frequently than
previous techniques.
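
The "Which AS contains the problem?" step above can be illustrated with a short sketch. This is not Hubble's implementation; it is a minimal Python example under stated assumptions. The function names, the `ip_to_as` lookup table (standing in for the mapping derived from BGP snapshots), the traceroute representation, and the 50% threshold used for "a substantial number" are all hypothetical.

```python
from collections import Counter


def last_responsive_as(hops, ip_to_as):
    """Return the AS of the last hop that responded and maps to an AS.

    `hops` is a list of hop IPs, with None for hops that did not respond.
    `ip_to_as` is a hypothetical IP-to-AS lookup table.
    """
    for ip in reversed(hops):
        if ip is not None and ip in ip_to_as:
            return ip_to_as[ip]
    return None


def suspect_ases(traceroutes, ip_to_as, min_fraction=0.5):
    """Group failed traceroutes by the AS in which they terminate.

    `traceroutes` is a list of (hops, reached_destination) pairs.
    An AS is reported when at least `min_fraction` of the failed
    traceroutes terminate in it (the threshold is an assumption).
    """
    failed = [hops for hops, reached in traceroutes if not reached]
    if not failed:
        return []
    counts = Counter(last_responsive_as(hops, ip_to_as) for hops in failed)
    counts.pop(None, None)  # ignore probes that could not be mapped to an AS
    return sorted(
        asn for asn, n in counts.items() if n / len(failed) >= min_fraction
    )


# Example data: private addresses and documentation-range AS numbers.
example_ip_to_as = {
    "10.0.0.1": 64500,
    "10.0.0.2": 64500,
    "10.1.0.1": 64501,
    "10.1.0.2": 64501,
    "10.2.0.1": 64502,
    "10.3.0.1": 64503,
}
example_traceroutes = [
    (["10.0.0.1", "10.1.0.1", None], False),
    (["10.0.0.2", "10.1.0.2", None], False),
    (["10.0.0.1", "10.2.0.1"], False),
    (["10.0.0.1", "10.3.0.1"], True),
]

print(suspect_ases(example_traceroutes, example_ip_to_as))  # prints [64501]
```

In this example two of the three failed traceroutes end in AS 64501, so it is the only AS reported. A real system would also need to handle alias resolution and hops that map to more than one AS, which this sketch ignores.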