On November 7, 2017, a number of blogs wrote about a route leak caused by Level3 that affected a significant number of users. Route leaks happen all the time, and we continuously monitor them around the world. This one, however, was not detected by our system, so for the Qrator.Radar team it was vital to dig into the details of this particular incident and understand why our detectors missed it. The analysis took us some time, but here is the result.
At first glance, some prefixes of Level3 clients were suddenly advertised through Level3. The AS_PATH attribute by itself seemed correct: client, then peer, then everything else, which looks legitimate. However, what caught our attention was that the prefixes were announced in a limited number of directions and were gone after the incident ended. For example, there were some new prefixes announced by Bell Canada (AS577), but they were seen only through links between Level3 and its peers: AS2914 (NTT), AS2828 (XO), AS12956 (Telefonica), AS5511 (Orange), AS6830 (Liberty Global).
Such a pattern of announcements looks very odd: it is hard to imagine Level3 announcing prefixes to its peering partners without announcing them to a single customer.
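To see why a path like this looks legitimate at first glance, it helps to check it against the usual "valley-free" rule: a route may climb from customers to providers, cross at most one peering link, and then only descend to customers. The sketch below is only an illustration under that assumption, not a description of how the Radar detectors work; the function name and the hop labels are made up for this example.

```python
# Hop labels describe how a route moves outward from its origin:
#   "c2p": a customer announces to its provider
#   "p2p": a peer announces to a peer
#   "p2c": a provider announces to its customer
# The labels and the function name are assumptions made for this sketch.

def is_valley_free(hops):
    """Return True if hops are: any number of c2p, at most one p2p, then only p2c."""
    climbing = True  # True until the route crosses a peer link or starts going down
    for hop in hops:
        if hop == "c2p":
            if not climbing:
                return False  # going up again after a peer or downhill hop
        elif hop == "p2p":
            if not climbing:
                return False  # a second peer link, or a peer link after going down
            climbing = False
        elif hop == "p2c":
            climbing = False
        else:
            raise ValueError(f"unknown hop type: {hop!r}")
    return True


# The incident: a Level3 customer -> Level3 -> a Level3 peer -> the peer's customers.
incident = ["c2p", "p2p", "p2c"]
# A textbook leak: a route learned from a peer is passed up to a provider.
classic_leak = ["c2p", "p2p", "c2p"]

print(is_valley_free(incident))      # True
print(is_valley_free(classic_leak))  # False
```

The incident path passes the check, so a test based on AS_PATH shape alone has nothing to complain about. What gives the anomaly away is the set of directions in which the prefixes were seen, not the shape of any single path.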
Using the Level3 looking glass, we took a look at the prefixes that appeared at the moment of the anomaly and then disappeared, and… surprise, they are still there!
All these prefixes are marked with the “no export” community. What is that? “No export” is a well-known community that forbids an ISP from advertising the marked prefixes over eBGP sessions. Since it is a well-known community, routing software usually honors it automatically.
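As a rough illustration of what honoring “no export” means, here is a minimal sketch of the export decision. The community value 65535:65281 (0xFFFFFF01) is the one RFC 1997 defines for NO_EXPORT; the function name, the session labels, and the example community 64500:100 are assumptions made for this example, and real routing software handles more cases (confederations, for instance).

```python
# NO_EXPORT as defined in RFC 1997: 0xFFFFFF01, usually written 65535:65281.
NO_EXPORT = (65535, 65281)

def may_advertise(communities, session):
    """Decide whether a route may be sent over a session.

    communities: set of (high, low) pairs attached to the route
    session: "ibgp" or "ebgp" (hypothetical labels used in this sketch)
    """
    if session not in ("ibgp", "ebgp"):
        raise ValueError(f"unknown session type: {session!r}")
    # A route tagged NO_EXPORT stays inside the AS: fine over iBGP, blocked over eBGP.
    return not (session == "ebgp" and NO_EXPORT in communities)


tagged = {NO_EXPORT, (64500, 100)}  # 64500:100 is an arbitrary example community

print(may_advertise(tagged, "ibgp"))  # True
print(may_advertise(tagged, "ebgp"))  # False
print(may_advertise(set(), "ebgp"))   # True
```

A router that behaves like this never sends the tagged prefixes to a peer. The incident is what it looks like when at least one router skips this check.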
However, somehow one of the Level3 routers (apparently not all of them, maybe a few) malfunctioned. As a result, prefixes marked with the “no export” community, belonging to Bell Canada and 60 other ISPs, were announced to other Tier-1 ISPs. It is hard to say whether Level3 stripped “no export” when these prefixes were announced, but in practice this does not matter: Tier-1 ISPs have to delete all communities they receive from their peering partners, because otherwise those communities could threaten their network visibility. Either way, as a result of the export and import between Level3 and its partners, the community was removed from the route advertisement, and the routes started spreading globally.
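The chain of events can be sketched as three steps: the export decision on the originating router, the import at the peer, and the peer's own export. This is a toy model under the assumptions stated in the text (the peer strips all communities on import); every function and parameter name is hypothetical.

```python
NO_EXPORT = (65535, 65281)  # well-known community from RFC 1997

def export_ok(communities, honors_no_export=True):
    """eBGP export check; a faulty router is modeled as one that ignores NO_EXPORT."""
    return not (honors_no_export and NO_EXPORT in communities)

def import_at_peer(communities, peer_strips=True):
    """Model of the peer's import policy: drop every community if it strips them."""
    return set() if peer_strips else set(communities)

def spreads_beyond_peer(communities, faulty_router, peer_strips=True):
    # Step 1: the originating network decides whether to send the route to its peer.
    if not export_ok(communities, honors_no_export=not faulty_router):
        return False
    # Step 2: the peer imports the route, possibly stripping its communities.
    at_peer = import_at_peer(communities, peer_strips)
    # Step 3: the peer, working correctly, decides whether to pass the route on.
    return export_ok(at_peer)


route = {NO_EXPORT}
print(spreads_beyond_peer(route, faulty_router=False))  # False
print(spreads_beyond_peer(route, faulty_router=True))   # True
```

With a healthy router the route never leaves the network. With a faulty one, the peer's import policy removes the only marker that could have stopped the route at the next hop, so it keeps going. In this model, a peer that kept the community (`peer_strips=False`) would have contained the route.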
The “no export” community is merely a traffic management tool whose primary goal is to lower the volume of traffic on dedicated interfaces. So Bell Canada and the other affected ISPs may have experienced a traffic overload, which increased latency and may have led to denial of service for their services and problems for their own end users. The increased volume on a single router may also have hit Level3 itself. According to our statistics, more than 9000 IPv4 prefixes and 2500 IPv6 prefixes were affected in the same way. The only difference is the quantity, which can be explained by the difference in the total number of v4 and v6 prefixes. All of those prefixes belong to Level3 customers.
Was this a route leak? Yes and no.
On the one hand, the routes were announced in directions they were never meant to go. On the other hand, if we look at RFC 7908, which describes the different types of route leaks, this case is not covered. Maybe we should call it a routing policy incident. Qrator.Radar wants your opinion: is a significant change in routing policy something that we should monitor too? Should we treat it as a security incident? Feel free to use the ‘Contact Us’ button and share your thoughts.