When Peering Goes Wrong: Notable Outages
- by Staff
Peering is the foundation of the interconnected internet, enabling networks to exchange traffic efficiently and cost-effectively. When implemented correctly, peering relationships optimize performance, reduce latency, and enhance the reliability of global connectivity. However, when peering goes wrong, the consequences can ripple across the internet, resulting in significant outages, degraded service quality, and financial losses. Examining notable peering failures provides valuable insights into the challenges and complexities of maintaining a stable and secure internet.
One of the most infamous peering-related outages occurred in February 2008, when Pakistan Telecom unintentionally caused a global YouTube outage. The incident began with an attempt to block access to YouTube within Pakistan by announcing a more specific BGP route for part of YouTube’s IP space. This route was meant to redirect local traffic to a null destination. However, due to a misconfiguration, the route was leaked to Pakistan Telecom’s upstream provider, PCCW Global, which propagated it to the rest of the internet. Because routers prefer the most specific matching prefix, networks worldwide began sending YouTube traffic to Pakistan Telecom, where it was discarded. The result was a global hijack of YouTube traffic that made the platform unreachable for much of the world for roughly two hours. This incident highlighted the dangers of improper route filtering and the cascading effects of BGP misconfigurations.
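The reason a more specific announcement wins can be shown with a minimal longest-prefix-match sketch. It is not a model of a real router: the `best_route` helper is hypothetical, and the prefixes and AS numbers are the ones commonly cited in public analyses of the incident, so treat them as illustrative.

```python
import ipaddress


def best_route(routes, destination):
    """Return the (network, origin) pair with the longest prefix covering destination.

    routes is a list of (prefix, origin) string pairs. Returns None when no
    prefix covers the destination.
    """
    addr = ipaddress.ip_address(destination)
    matches = []
    for prefix, origin in routes:
        net = ipaddress.ip_network(prefix)
        if addr in net:
            matches.append((net, origin))
    if not matches:
        return None
    # Longest prefix (largest prefix length) wins, regardless of who announced it.
    return max(matches, key=lambda m: m[0].prefixlen)


# Illustrative values: a covering /22 and a more specific /24 inside it.
routes = [("208.65.152.0/22", "AS36561")]
before = best_route(routes, "208.65.153.238")

routes.append(("208.65.153.0/24", "AS17557"))
after = best_route(routes, "208.65.153.238")

print("before leak:", before)
print("after leak: ", after)
```

Once the /24 is in the table, it is selected for every address it covers, even though the legitimate /22 is still present. This is why a leaked more-specific route can draw traffic away from its rightful destination almost immediately.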
Another high-profile case occurred in 2010, involving China Telecom. For approximately 18 minutes, China Telecom advertised routes for over 37,000 IP prefixes, many of which belonged to networks outside of China. These announcements redirected significant portions of global traffic through China Telecom’s network, raising concerns about security and data interception. While the incident was later attributed to a configuration error rather than malicious intent, it underscored the vulnerabilities of the BGP system and the potential for even unintentional peering issues to disrupt global connectivity.
In 2014, an interconnection dispute between Verizon and a major transit provider, Level 3, led to congestion and degraded performance for end users. The disagreement centered on the imbalance of traffic exchange, as Level 3 was sending significantly more traffic to Verizon’s network than it was receiving. The resulting stalemate left interconnection points congested, affecting services such as Netflix that relied heavily on Level 3’s network. This incident highlighted the complexities of managing traffic imbalances in peering relationships and the potential impact on end users when disputes remain unresolved.
In 2018, an improper route announcement by a small ISP in Nigeria, MainOne Cable, caused significant traffic disruptions. MainOne advertised incorrect routes for Google’s IP space, redirecting traffic destined for Google services through its network. The incident affected services such as Google Search, YouTube, and G Suite for users worldwide. While the disruption lasted little more than an hour, it highlighted the risks posed by smaller networks inadvertently affecting global internet traffic. This case also underscored the importance of route validation mechanisms such as RPKI to prevent unauthorized announcements.
Another notable incident occurred in June 2019, when Cloudflare, one of the largest content delivery networks, experienced a significant outage caused by a route leak that a third-party transit provider, Verizon, failed to filter. Misconfigured BGP routes leaked from a smaller network, DQE Communications, and were propagated by Verizon. This pulled traffic destined for Cloudflare and other networks onto paths that could not handle the load, leading to severe congestion and service disruptions. The incident revealed weaknesses in route filtering practices and the need for stricter controls at both the origin and transit levels of peering arrangements.
Peering failures are not always the result of misconfigurations; they can also stem from deliberate actions. In 2020, an intentional BGP hijack by a malicious actor targeted cryptocurrency services. By announcing false routes for IP addresses associated with major cryptocurrency exchanges, the attacker was able to intercept traffic and potentially exploit it for financial gain. This incident highlighted the growing threat of malicious route announcements and the need for enhanced security measures, such as widespread adoption of RPKI and real-time monitoring of BGP announcements.
The lessons learned from these outages emphasize the importance of robust safeguards and best practices in peering relationships. Effective route filtering is a cornerstone of preventing unauthorized or incorrect route announcements. Network operators must ensure that their routers are configured to accept only authorized prefixes from peers and transit providers, reducing the risk of accidental or malicious route propagation. The adoption of RPKI further enhances this capability by enabling cryptographic validation of route origins.
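The two safeguards above, prefix filtering and RPKI origin validation, can be sketched together. The snippet below is a simplified illustration, not a router implementation: `rov_state` follows the valid / invalid / not-found classification described in RFC 6811, while `accept_from_peer`, the ROA list, the allow-list, and all prefixes and AS numbers (drawn from documentation ranges) are hypothetical.

```python
import ipaddress


def rov_state(prefix, origin_asn, roas):
    """Classify an announcement as 'valid', 'invalid', or 'not-found'.

    roas is a list of (roa_prefix, max_length, asn) tuples.
    """
    net = ipaddress.ip_network(prefix)
    covered = False
    for roa_prefix, max_length, roa_asn in roas:
        roa_net = ipaddress.ip_network(roa_prefix)
        # Only ROAs whose prefix covers the announced prefix are relevant.
        if net.version != roa_net.version or not net.subnet_of(roa_net):
            continue
        covered = True
        if roa_asn == origin_asn and net.prefixlen <= max_length:
            return "valid"
    # Covered by at least one ROA but matched by none: invalid.
    return "invalid" if covered else "not-found"


def accept_from_peer(prefix, origin_asn, allowed_prefixes, roas):
    """Accept only allow-listed prefixes that are not RPKI-invalid."""
    if ipaddress.ip_network(prefix) not in allowed_prefixes:
        return False
    return rov_state(prefix, origin_asn, roas) != "invalid"


# Hypothetical data: one ROA, and a peer allowed to send exactly one prefix.
roas = [("203.0.113.0/24", 24, 64500)]
allowed = {ipaddress.ip_network("203.0.113.0/24")}

for prefix, asn in [
    ("203.0.113.0/24", 64500),    # expected origin and length
    ("203.0.113.0/24", 64501),    # wrong origin
    ("203.0.113.128/25", 64500),  # more specific than the ROA permits
    ("198.51.100.0/24", 64500),   # no covering ROA
]:
    print(prefix, asn, rov_state(prefix, asn, roas),
          accept_from_peer(prefix, asn, allowed, roas))
```

In this sketch the more specific /25 is rejected twice over: it is absent from the allow-list and exceeds the maximum length in the ROA. Layering the two checks means a gap in one can be caught by the other.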
Monitoring and visibility are equally critical. Tools that provide real-time insights into BGP activity, such as route collectors and anomaly detection systems, allow operators to quickly identify and respond to irregularities. These tools enable proactive measures to mitigate the impact of peering issues before they escalate into widespread outages.
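As a rough illustration of what such anomaly detection looks for, the sketch below scans a list of observed announcements for two common warning signs: an unexpected origin AS for a monitored prefix, and an unexpected more-specific announcement inside it. The `find_anomalies` function, the input format, and all prefixes and AS numbers are assumptions made for this example; a real deployment would consume a live feed from route collectors instead of a static list.

```python
import ipaddress


def find_anomalies(monitored, announcements):
    """Return alerts for announcements that overlap monitored prefixes.

    monitored maps a prefix string to the set of origin ASNs expected for it.
    announcements is an iterable of (prefix, origin_asn) pairs.
    """
    alerts = []
    for prefix, origin in announcements:
        net = ipaddress.ip_network(prefix)
        for mon_prefix, expected_origins in monitored.items():
            mon_net = ipaddress.ip_network(mon_prefix)
            # Ignore announcements that fall outside the monitored prefix.
            if net.version != mon_net.version or not net.subnet_of(mon_net):
                continue
            if origin not in expected_origins:
                alerts.append((prefix, origin, "unexpected origin"))
            elif net.prefixlen > mon_net.prefixlen:
                alerts.append((prefix, origin, "unexpected more-specific"))
    return alerts


# Hypothetical monitored prefix and observed announcements.
monitored = {"203.0.113.0/24": {64500}}
observed = [
    ("203.0.113.0/24", 64500),    # normal
    ("203.0.113.0/25", 64511),    # more specific, from an unknown origin
    ("203.0.113.128/25", 64500),  # more specific, from the expected origin
    ("198.51.100.0/24", 64511),   # unrelated prefix, ignored
]

alerts = find_anomalies(monitored, observed)
for alert in alerts:
    print(alert)
```

Both kinds of alert matter: the first pattern matches a hijack or origin leak, while the second can indicate either a deliberate traffic-engineering change or a leak of internally generated routes.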
Collaboration and communication among networks also play a crucial role in mitigating peering failures. Peering agreements should include clear policies for traffic exchange, dispute resolution, and incident response. Regular communication between peering partners ensures alignment on technical and operational practices, reducing the likelihood of misunderstandings or misconfigurations.
In conclusion, while peering is essential for the efficient operation of the internet, it is not immune to challenges. Notable outages caused by misconfigurations, disputes, or malicious activities underscore the need for vigilance, robust safeguards, and collaboration among network operators. By learning from these incidents and adopting best practices, the internet community can build a more resilient and secure peering ecosystem, ensuring reliable connectivity for users worldwide.