BGP Route Leak prevention and detection with the help of the RFC9234
All the credit is due to the RFC’s authors: A. Azimov (Qrator Labs & Yandex), E. Bogomazov (Qrator Labs), R. Bush (IIJ & Arrcus), K. Patel (Arrcus), K. Sriram.
What are route leaks in the context of BGP routing
According to RFC7908: “A route leak is the propagation of routing announcement(s) beyond their intended scope. That is, an announcement from an Autonomous System (AS) of a learned BGP route to another AS is in violation of the intended policies of the receiver, the sender, and/or one of the ASes along the preceding AS path. The intended scope is usually defined by a set of local redistribution/filtering policies distributed among the ASes involved. Often, these intended policies are defined in terms of the pair-wise peering business relationship between ASes (e.g., customer, transit provider, peer).”
Based on that definition, RFC9234 takes “route leak” a step further to explain how exactly BGP route leaks occur in the wild: “Route leaks are the propagation of BGP prefixes that violate assumptions of BGP topology relationships, e.g., announcing a route learned from one transit provider to another transit provider or a lateral (i.e., non-transit) peer or announcing a route learned from one lateral peer to another lateral peer or a transit provider.”
Combining both descriptions with the accumulated experience, we can state that a BGP route leak is an unintentional propagation of BGP prefixes beyond the intended scope that could result in a redirection of traffic through an unintended path that may enable eavesdropping or traffic analysis, and may or may not result in an overload or complete drop (black hole) of the traffic. Route leaks can be accidental or malicious but most often arise from accidental misconfigurations.
By creating a route leak, the leaker becomes a transit link between different regions without earning any income from it. But the lost profit is not even the main problem.
First, when a route leak happens in a third country, packets must travel a longer distance, which results in much bigger delays. Second, packets can be lost because of inadequate configuration along the leaked path and never reach the receiving party.
How often do they occur and what are the risks
According to Qrator Labs’ BGP incidents report for Q3 2022, there were 12 103 554 BGP Route Leaks, originated by 3030 unique route leakers in Q3.
Of course, not all of them had enough propagation to be visible globally, although according to Qrator.Radar data, in Q3 2022 there were 6 global route leaks, in Q2 2022 - 5 global route leaks, and 4 happened in Q1 2022. And global route leaks occur much more frequently than BGP global hijacks, at least since 2021.
The common effects of an active BGP route leak vary from increased network delays for the victim (the originator of a prefix) to DoS for both the victim and the leaker. The exact scope and scale of the consequences are impossible to confirm without being a party to the leaked BGP session, but some clues can be gathered with the help of traffic monitoring for the affected autonomous systems and their resources.
“BGP routing incidents can be problematic for a range of reasons. In some cases, they simply disrupt the flow of legitimate Internet traffic while others can result in the misdirection of communications, posing a security risk from interception or manipulation. Routing incidents occur with some regularity and can vary greatly in operational impact.” - Doug Madory, Kentik
If you think that only small ISPs fail - it is untrue. Here is an example from the beginning of August - a classical situation when traffic between two Tier-1 ISPs was rerouted, which lasted for several hours. If the leaker ISP were a small network, the amount of traffic would be so immense that it could cause a DoS on a global scale. We have seen this type of situation many times previously. But luckily enough, in this particular situation presented on the slide, the ISP was big enough to keep up with all the traffic rerouted its way.
What are current options in preventing and remediating route leaks
Existing approaches to leak prevention rely on marking routes by operator configuration, with no check that the configuration corresponds to that of the eBGP neighbor, or enforcement of the two eBGP speakers agreeing on their peering relationship.
Right now, except for AS-SETs, there is virtually no option for dealing with route leaks. This is why BGP Roles have three “dimensions”, if we can call them that: preventing, detecting, and checking the third-party configuration. Currently, with the help of BGP Communities, we can only try to prevent.
Can we measure the effects of the route leaks? If you have different data monitoring tools, you can correlate the impact on the data and ongoing BGP incidents.
On the left part of this slide, you can see how a spike in traffic volume coincides with the route leak. On the right, you can see that they will be different if you try to visualize traceback during and after the incident; it could include additional countries and other parameters. You can see the increased RTT during the incident and monitor the number of dropped packets.
What to do if you don't want to be affected by route leaks? First, establish whether your data is indeed affected by a BGP incident. Then try to find the guilty party. After that, with the help of Whois or a similar service, try to find the email contacts of that actor. Then write them a complaint and wait for an answer. This method usually works, but you should also understand that it takes a lot of time to solve the problem.
What are the tweaks? The first one is to abuse the BGP route loop prevention mechanism - prepend your leaked prefix with the leaker ASN between your ASNs. How does it work? First, you will pass a neighbor check, and then you will pass the origin ASN check. Lastly, the leaker won't accept this route because of the BGP route loop prevention mechanism. So, by doing this, your reannounced prefix won't be leaked anymore.
There is also a practical solution if you have a big peering network. First of all, try to understand what your region of interest is. Then find the most significant ISPs in this region and connect to them. If one of them accepts the route leak, send them the sub-prefixes directly. It doesn't solve the whole problem, but a big percentage of your traffic will be returned to you, so you won't suffer as much as you could.
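The loop-prevention trick can be illustrated with a minimal Python sketch. It is only a model of the idea, not router code: the ASNs come from the documentation range, and the helper names `poisoned_path` and `accepts` are hypothetical.

```python
# Hypothetical ASNs from the documentation range (64496-64511).
ORIGIN, LEAKER, OTHER = 64500, 64501, 64502


def poisoned_path(origin_asn: int, leaker_asn: int) -> list[int]:
    """AS_PATH announced by the origin with the leaker's ASN in the middle.

    The first ASN is still the origin's (neighbor check passes) and the
    last ASN is still the origin's (origin ASN check passes).
    """
    return [origin_asn, leaker_asn, origin_asn]


def accepts(local_asn: int, as_path: list[int]) -> bool:
    """eBGP loop prevention: reject a route whose AS_PATH contains our ASN."""
    return local_asn not in as_path


path = poisoned_path(ORIGIN, LEAKER)
print(accepts(LEAKER, path))  # the leaker sees its own ASN and drops the route
print(accepts(OTHER, path))   # any other AS still accepts it
```

Because the leaker never installs the route, it has nothing to leak, while every other network sees a slightly longer but otherwise valid path.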
How the route leak problem changes with RFC9234 adoption
In the old world of route leaks, their detection relied on communities. They were set on ingress and checked on egress - relatively simple. But the problem was that this solution was always one mistake away from failure. From an ISP's point of view, a route leak occurs if your customer forgets to create the ingress filter or forgets to create the egress filter. The customer may also forget both, and a route leak still occurs.
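To show why the community-based approach is "one mistake away from failure", here is a small Python model. It is a sketch under assumptions: the community value `64501:100` and the function names are made up for illustration, and real routers express this logic as routing policy, not as code.

```python
# Hypothetical community meaning "learned from a provider or peer".
UPSTREAM_TAG = "64501:100"


def ingress_tag(communities: frozenset, from_role: str, tagging_enabled: bool = True) -> frozenset:
    """Mark routes learned from providers and peers, if the operator configured it."""
    if tagging_enabled and from_role in ("provider", "peer"):
        return communities | {UPSTREAM_TAG}
    return communities


def egress_allows(communities: frozenset, to_role: str, filter_enabled: bool = True) -> bool:
    """Refuse to send marked routes to providers and peers, if the filter exists."""
    if not filter_enabled:
        return True
    return not (to_role in ("provider", "peer") and UPSTREAM_TAG in communities)


def leaks(tagging_enabled: bool, filter_enabled: bool) -> bool:
    """Does a route learned from one provider reach another provider?"""
    tagged = ingress_tag(frozenset(), "provider", tagging_enabled)
    return egress_allows(tagged, "provider", filter_enabled)


for tagging in (True, False):
    for filtering in (True, False):
        print(f"tagging={tagging} filter={filtering} leak={leaks(tagging, filtering)}")
```

Only the combination where both halves are configured correctly stops the leak; nothing on the neighbor's side can notice or compensate for a missing half.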
RFC9234 gives a meaningful tool to prevent and detect BGP route leaks by enhancing the BGP OPEN message to establish an agreement on the peering relationship on each eBGP session in order to enforce appropriate configuration on both sides. Propagated routes are then marked according to the agreed relationship - an in-band method with the new configuration parameter - BGP Role, which is negotiated using a BGP Role Capability in the OPEN message. An eBGP speaker may require the use of this capability and confirmation of the BGP Role with a neighbor for the BGP OPEN to succeed.
There is also an optional, transitive BGP Path Attribute, called “Only to Customer” or OTC, which prevents ASes from creating leaks and detects leaks created by the ASes in the middle of an AS path.
What is a BGP Role? It is your peering relationship with your neighbor. You only have a few peering relations, which could be: provider, customer, peer, route server and route server client. That's mostly all. You can mark all your neighbors with these easily. And in code, this configuration parameter is translated into BGP capability code, and this code is negotiated during the BGP session establishment process.
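As a rough illustration of how a role becomes a capability, the sketch below encodes and decodes the capability as a code/length/value triple. The capability code (9) and the numeric role values are taken from RFC9234 rather than from this article, so verify them against the RFC; the function names are hypothetical, and the surrounding OPEN message framing is deliberately left out.

```python
from enum import IntEnum

# Values as listed in RFC9234 (assumption: check against the RFC text).
BGP_ROLE_CAPABILITY_CODE = 9


class Role(IntEnum):
    PROVIDER = 0
    RS = 1
    RS_CLIENT = 2
    CUSTOMER = 3
    PEER = 4


def encode_role_capability(role: Role) -> bytes:
    """Capability TLV: code (1 byte), length (1 byte), role value (1 byte)."""
    return bytes([BGP_ROLE_CAPABILITY_CODE, 1, int(role)])


def decode_role_capability(data: bytes) -> Role:
    """Parse a single BGP Role capability TLV and return the advertised role."""
    if len(data) != 3 or data[0] != BGP_ROLE_CAPABILITY_CODE or data[1] != 1:
        raise ValueError("not a BGP Role capability")
    return Role(data[2])  # raises ValueError for an unassigned role value


wire = encode_role_capability(Role.CUSTOMER)
print(wire.hex(), decode_role_capability(wire).name)
```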
During the OPEN exchange there is a check that the two roles form a correct pair, such as provider and customer. But what happens if one side configures the provider role and its counterpart configures the peer role? If someone misclicks, the BGP session won't establish.
Now let's get back to the route leaks. As already mentioned, route leaks are straightforward. They happen when a prefix received from one provider or peer is advertised to another provider or peer. In other words, we can transform it into the following rule. Once a prefix is advertised to a customer, it should go only downstream to the customer, to an indirect customer, and so on. And to guarantee that this rule is not violated, we added a new BGP attribute called Only-To-Customer. How does it work?
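The pair check can be sketched in a few lines of Python. The table of matching roles follows RFC9234 as we understand it; the `strict` flag models a speaker that requires the capability from its neighbor, and the function name is hypothetical.

```python
from typing import Optional

# For each local role, the only role the neighbor may announce.
EXPECTED_REMOTE_ROLE = {
    "provider": "customer",
    "customer": "provider",
    "rs": "rs-client",
    "rs-client": "rs",
    "peer": "peer",
}


def session_may_establish(local_role: str, remote_role: Optional[str], strict: bool = False) -> bool:
    """Decide whether the OPEN exchange may succeed.

    remote_role is None when the neighbor did not send the BGP Role
    capability; that is tolerated unless the local side is strict.
    """
    if remote_role is None:
        return not strict
    return EXPECTED_REMOTE_ROLE[local_role] == remote_role


print(session_may_establish("provider", "customer"))            # correct pair
print(session_may_establish("provider", "peer"))                # misconfiguration
print(session_may_establish("customer", None, strict=True))     # capability required
```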
When a provider sends a prefix to its customer, it sets the OTC attribute with the value of its own autonomous system. If this attribute is not set, the customer also adds it with the value of its neighbor autonomous system. Important point - it doesn't matter who sets the attribute; the value is the same.
The OTC attribute does not change during its lifetime. And on the other side, on the right side of the slide, it is double-checked. First, the customer checks that, if OTC is set, it does not send the prefix to other providers or peers. Then the provider on the other side performs the matching check: a route that arrives from a customer with OTC already set is a leak.
So it is a double set, double-checked. And if this time the customer fails to configure his filters - nothing will happen because the provider will be able to detect route leak instantly.
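The "double set, double-checked" behavior can be modeled with a short Python sketch. It follows the ingress and egress procedures of RFC9234 as we read them, with a simplified `Route` type and hypothetical function names; returning `None` stands for "reject or do not send". Roles are always the role of the neighbor on the session.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Route:
    prefix: str
    otc: Optional[int] = None  # Only-To-Customer value (an ASN) or absent


def ingress(route: Route, neighbor_role: str, neighbor_asn: int) -> Optional[Route]:
    """Check and, if needed, set OTC on a received route."""
    if route.otc is not None:
        if neighbor_role in ("customer", "rs-client"):
            return None  # leak: an OTC route must never come from downstream
        if neighbor_role == "peer" and route.otc != neighbor_asn:
            return None  # leak: the peer passed on a route it learned from upstream
    elif neighbor_role in ("provider", "peer", "rs"):
        return replace(route, otc=neighbor_asn)  # sender did not set it, so we do
    return route


def egress(route: Route, neighbor_role: str, local_asn: int) -> Optional[Route]:
    """Check and, if needed, set OTC on a route about to be advertised."""
    if route.otc is not None and neighbor_role in ("provider", "peer", "rs"):
        return None  # prevention: OTC routes go only to customers
    if route.otc is None and neighbor_role in ("customer", "peer", "rs-client"):
        return replace(route, otc=local_asn)
    return route


P1, C, P2 = 64500, 64501, 64502  # hypothetical provider, customer, second provider
sent = egress(Route("192.0.2.0/24"), "customer", P1)  # P1 marks the route
at_customer = ingress(sent, "provider", P1)           # value is unchanged
print(at_customer.otc)
print(egress(at_customer, "provider", C))   # compliant customer refuses to leak
print(ingress(at_customer, "customer", C))  # P2 rejects it if the customer leaked anyway
```

Note that the value is the same whether the provider sets it on egress or the customer sets it on ingress, which is why either side alone is enough to mark the route.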
Here are some formal slides about how OTC works. You can check them later in the RFC document. It's not complex, but there is one crucial point. You may skip them because you don't want to mess with the OTC. OTC is set automatically upon setting the roles. You're setting the roles - OTC works in code. You may look at how it works, but you don't need to configure it. That is the slide that describes how OTC is set.
This slide describes how OTC is checked.
And now we can talk about what we do with route leaks. The document is quite precise about what to do when you detect a route leak. You need to reject the route. All other techniques are flawed, so please don't try to use local preference - if you don’t want to be abused.
Here you can see how hard it is to configure BGP roles on some open-source software. The yellow part is what you need to configure BGP roles with, and OTC will do all the work for you. I hope it's not that hard.
At the bottom, you can see what happens if the roles are configured with mistakes - when a corresponding role does not match. The BGP session won't come up.
And this is what is happening behind the scenes. An OTC attribute is emerging in the route, but you are not configuring it - it's done in the code for you. It's simple.
So, BGP roles and OTC allow you to control your neighbor's configuration. OTC is a transitive attribute, a transit signal, that may go from the Tier-1 network or IX to all its direct and indirect downstreams. It is set twice, on the sender's egress and on the receiver's ingress, and checked twice, on the customer's egress and on the next provider's ingress. And OTC is an attribute that, compared with a community, is highly unlikely to be stripped. And one of the most critical points - it gives a chance to detect route leaks even several hops away from the leaking AS.
Right now and in the future
“MANRS welcomes the announcement of RFC9234 and we strongly believe that implementation of BGP-role will help the community further protect the Internet. The BGP-role mechanism proposed in RFC9234 will help in preventing and detecting most inadvertent BGP misconfigurations that create route leaks. While RPKI can safeguard against route origin hijacks, we also need a mechanism to secure the path and protect against route leaks. Whether it is ASPA, AS-Cone, BGP-role or BGPSec, they all provide necessary mechanisms to safeguard Internet routing.” - MANRS
Of course, the real-world implementation of RFC9234 will differ depending on the role of the “Local AS” adopting BGP Roles. If you’re an Internet Exchange - you can use it right now. If you’re an ISP using hardware provided by one or more vendors - ask each vendor about its plans for implementing RFC9234. Unfortunately, there is no other way.
We consider BGP Roles sufficient to cover (prevent and detect) 80% of BGP route leaks in case major Internet Exchanges and the world’s largest operators (Tier-1 in the first place) adopt RFC9234. 20% would probably remain: broken cases, BGP optimizers that could simply delete the attribute if they want to, something else we haven’t thought about. But the majority of problems with the BGP route leaks should and probably would be solved.
What BGP Roles are not intended to deal with is the hacking activity - they are focused on preventing and detecting errors/misconfigurations.
ASPA and ROA in combination are able to cover the hacking activity related to the BGP routing, in our opinion. ASPA is complementary to BGPSec, although both have a way to go before we see a wide adoption among Internet Service Providers.
At the moment, we are aware that the patches were applied to the three major open-source implementations. What can we say? We are not at the end; maybe that's the end of the beginning. To get rid of route leaks, if we don't really love them, we need as a community to show a desire to get rid of these routing incidents, with the same passion that the community shows in eliminating BGP hijacks with ROA.
If you're using open-source tools, you can already try to set up BGP roles. There is nothing that prevents you from doing this. If you are using some vendor software - great! Send a request for BGP roles support to your vendor of choice. And if you're a developer - even greater! There is a vast space for improvement if you can contribute to other BGP implementations. You can contribute to BMP parsers, the tcpdump implementation, BGP dumps, etc.
Let us hope that with the help of RFC9234 we can at least get started with eliminating BGP anomalies for a better Internet.