pimpmynetwork.org

Global Internet Exchange

Version 0.2.0
Adam Armstrong

Contents

1. Introduction
2. Overall Design
3. Multiple RIB route-server
4. Ingress Community Tagging
5. Egress Engineering Filtering
6. Completed Configuration

1. Introduction

This document is intended to be an exercise in the creation of a commercial global peering platform, using an MPLS network to create a global VPLS instance across which traffic can be passed at layer 2 between peers on metro Ethernet networks.

The Basics

There exist today numerous Internet Exchanges around the world. They're predominantly Ethernet-based, such as LINX, DE-CIX and AMS-IX. Ethernet has been standardised upon because of its low cost, reliability, ease of implementation and almost unmatched speeds. Scalability is of paramount importance to most existing exchanges, as their traffic has grown enormously over the past 12 months and continues to do so.

Because of the ubiquity of Ethernet, we'll use it as the access medium. The MPLS network may be constructed of either Ethernet or SDH technologies, depending upon the distances involved and the amount of traffic the platform is required to carry.

Most IXes have traditionally left session configuration to individual members, with members contacting each other and configuring individual sessions between themselves. Many IXes have also implemented Multi-Lateral Peering (MLP), where each member peers with a route-server which automatically distributes their routes to every other member that has peered with the route-server.

As this peering platform is commercial and we want as much traffic as possible to flow across it, it therefore makes sense to take the MLP approach, and require the use of route-servers.

The Design

The obvious design for the platform is to take a globally provisioned ethernet VPLS instance as the base and configure two or more route servers to distribute routes across the network. It's a relatively standard design, very much following the form of existing metro-scale peering platforms.

Additional security could be added to the platform by introducing prefix-lists either statically maintained, or automatically generated from RIPE or another RIR's routing registry. Customers could also be restricted to only establishing peering sessions with the route-server.

This basic design allows each customer to send/receive routes to/from each other customer.

The Flaws In The Design

Several serious flaws will become apparent with the design once a number of peers are connected to it.

A primary flaw is the lack of flexibility in route-selection due to limitations in BGP when using traditional implementations. Let's use AS286 as an example.

AS286 is present in both London and Frankfurt and announces the same prefixes to the route-server from both locations. When the route-servers receive the prefixes from the London and Frankfurt peers, they use the standard BGP route-selection process to choose a single best route for each prefix. This means that one of the locations will never be used as a destination for traffic, because it will always lose the route-selection process!

The issue can be further highlighted using AS20000 and AS286 as an example. AS286 sends us a route in both Frankfurt and London; the Frankfurt route takes precedence and is sent to all peers. AS20000, present in both London and Amsterdam, then starts sending traffic across Europe to reach AS286 in Frankfurt, ignoring London! Similarly, routes announced by AS20000 in Amsterdam may win the route-selection process and receive traffic from both the London and Frankfurt AS286 peers.

As the diagram below shows, this would cause unnecessary utilisation of costly intercity and international connectivity rather than relatively cheap metro connectivity.

The same flaw would affect peers present on different continents. Assume AS286 is present in Seattle and London and AS20000 is present in Frankfurt. If the Seattle route took precedence over the London route, AS20000 would send its traffic all the way to Seattle to reach AS286, rather than taking the much shorter London route!
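The behaviour described above can be sketched in a few lines of Python. This is only an illustrative model, not a BGP implementation: the prefix 192.0.2.0/24 and the router_id tie-breaker values are assumptions made up for the example, and the real decision process compares many more attributes before reaching a tie-breaker.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    prefix: str
    origin_as: int
    city: str       # where the route entered the platform
    router_id: int  # hypothetical final tie-breaker value


def single_best(routes):
    """Pick one route per prefix, as a single-RIB route-server would.

    Identical announcements from two locations tie on the usual BGP
    attributes, so only a final tie-breaker (modelled here as the
    lowest router_id) decides which location wins for everybody.
    """
    best = {}
    for route in routes:
        current = best.get(route.prefix)
        if current is None or route.router_id < current.router_id:
            best[route.prefix] = route
    return best


# AS286 announces the same (example) prefix in London and Frankfurt.
routes = [
    Route("192.0.2.0/24", 286, "London", router_id=2),
    Route("192.0.2.0/24", 286, "Frankfurt", router_id=1),
]
best = single_best(routes)

# Every peer receives the same winner, wherever that peer is located.
for peer_city in ("London", "Amsterdam", "Frankfurt"):
    chosen = best["192.0.2.0/24"]
    print(f"peer in {peer_city} -> AS{chosen.origin_as} via {chosen.city}")
```

Because only one route per prefix survives, the London announcement is never offered to anyone, including peers sitting in London.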

Correcting The Design

A number of things have been assumed :

  1. Peers can and will be present at multiple locations.
  2. Peers wish to be able to control to whom and where their routes are announced.
  3. Peers wish to know geographically where and from whom routes have originated.
  4. Route distribution should be based on a route-server model.

In order to achieve this, we need to do a number of things on the route-servers:

  1. Use a separate RIB on the route-server for each peer, to allow shortest-route selection and ensure traffic is directed to the peer's geographically closest neighbour.
  2. Tag routes with communities identifying their peer-as, region, country and city to allow customers to control which routes they accept from us and at what preference.
  3. Filter outgoing announcements based on customer-set communities to allow customers to control where their routes are announced.

4. Ingress Community Tagging

In order to provide peers with the information they need to filter routes and to provide the route-server with the information it needs to build the individual RIBs, we'll tag each route as it's announced to us with a series of communities :

45666:<Peer AS>
45666:<Region ID>
45666:<Country ID>
45666:<City ID>

For example, a peer in London would be tagged with the following communities :

45666:8575
45666:64500
45666:65000
45666:64900

The downstream peer could then use this information to raise or lower the preference of routes based on their geographic origin. For example, a UK peer that does not want to use any routes originating in its own country would filter routes carrying the UK country community (45666:65000 in the example above).

router bgp 45666
  neighbor 10.0.0.43 route-map as8575-in in
!
route-map as8575-in permit 20
  set community 45666:8575 45666:64500 45666:65000 45666:64900 additive
!
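As a rough sketch of the tagging scheme, the Python below builds the four ingress communities and shows how a downstream peer might map them to a preference. The function names, the policy format and the preference values are assumptions for illustration only; the IDs are the ones used in the London example above.

```python
PLATFORM_AS = 45666  # the platform's AS, as used in the communities above


def ingress_tags(peer_as, region_id, country_id, city_id):
    """Return the communities attached to a route as it is received."""
    ids = (peer_as, region_id, country_id, city_id)
    return [f"{PLATFORM_AS}:{value}" for value in ids]


def local_pref(communities, policy, default=100):
    """Downstream peer's view: the first matching policy entry wins.

    `policy` is an ordered list of (community, preference) pairs; this
    format is hypothetical and stands in for a router's route-map.
    """
    tagged = set(communities)
    for community, preference in policy:
        if community in tagged:
            return preference
    return default


tags = ingress_tags(8575, 64500, 65000, 64900)
print(tags)

# A peer that wants to de-prefer routes tagged with country ID 65000.
policy = [("45666:65000", 50)]
print(local_pref(tags, policy))
```

A peer that wanted to drop such routes entirely, rather than de-prefer them, would filter on the same community instead of setting a preference.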

3. Multiple RIB Route-Server

The use of multiple routing information bases (RIBs) within the route server is the most important correction to be made.

As previously explained, a traditional BGP speaker has only a single 'live' BGP database, and it can only store a single version of each prefix in that database. This causes the entire peering platform's BGP configuration to be sub-optimal. This was clearly demonstrated in the above examples and diagram.

This is solved by creating a separate routing table for each peer connected to the route-server, so that routes can be filtered and ranked for each peer before they are announced. This must be done on the route-server because BGP is unable to transmit two versions of the same prefix; any new version merely overwrites the previous one.

Multiple RIBs allow us to build complex filters within the route-server to select routes based on their suitability for the peer they'll be announced to using information from communities imposed at ingress. This means that provided we have suitably configured filters, peers will always talk to the closest exit point to them, approximating hot-potato routing.
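A minimal sketch of per-peer route selection follows, assuming the location information is available as region, country and city (in practice it would be read from the ingress communities). The scoring scheme, the class names and the example prefix are assumptions chosen for illustration; a real route-server would express this as per-peer filters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    region: str
    country: str
    city: str


@dataclass(frozen=True)
class Route:
    prefix: str
    origin_as: int
    location: Location  # where the route entered the platform


def proximity(a, b):
    """Higher is closer: 3 same city, 2 same country, 1 same region."""
    if a == b:
        return 3
    if (a.region, a.country) == (b.region, b.country):
        return 2
    if a.region == b.region:
        return 1
    return 0


def build_rib(peer_location, routes):
    """Build one peer's RIB, keeping the closest route for each prefix."""
    rib = {}
    for route in routes:
        current = rib.get(route.prefix)
        if current is None or (
            proximity(peer_location, route.location)
            > proximity(peer_location, current.location)
        ):
            rib[route.prefix] = route
    return rib


seattle = Location("North America", "US", "Seattle")
london = Location("Europe", "UK", "London")
frankfurt = Location("Europe", "DE", "Frankfurt")

# AS286 announces the same (example) prefix in Seattle and London.
routes = [
    Route("192.0.2.0/24", 286, seattle),
    Route("192.0.2.0/24", 286, london),
]

frankfurt_rib = build_rib(frankfurt, routes)  # RIB for a Frankfurt peer
seattle_rib = build_rib(seattle, routes)      # RIB for a Seattle peer
print(frankfurt_rib["192.0.2.0/24"].location.city)
print(seattle_rib["192.0.2.0/24"].location.city)
```

With one RIB per peer, the Frankfurt peer is handed the London route and the Seattle peer the Seattle route, which a single shared RIB cannot do.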

 

5. Egress Engineering Filtering

For much the same reasons that we'd provide peers with the information to filter the routes we advertise to them, we'd provide them with a mechanism to prevent their routes being used by peers they don't want to reach across the peering platform. This could take the form of completely blocking the announcement based on peer AS, region, country or city, or of prepending announcements based on the same attributes.

For example, to block announcing a route to an entire geographic region, the peer would tag their routes with a community of the form 65000:<region id>. They could then selectively allow announcements to cities or countries within it by using a community of the form 64999:<city/country id>.

This allows for a great deal of flexibility for the customer without the intervention of the platform's engineering resource.

router bgp 45666
  neighbor 10.0.0.43 route-map as8575-out out
!
! Communities in the platform's own 45666:* range may not be set by customers
ip community-list expanded illegal-communities permit _45666:
!
! Customer-set controls: 65000:<id> denies, 64999:<id> permits
ip community-list standard as8575-deny permit 65000:8575
ip community-list standard as8575-permit permit 64999:8575
!
ip community-list standard london permit 45666:65000
ip community-list standard london-deny permit 65000:65000
ip community-list standard london-permit permit 64999:65000
!
ip community-list standard uk permit 45666:65500
ip community-list standard uk-deny permit 65000:65500
ip community-list standard uk-permit permit 64999:65500
!
ip community-list standard europe permit 45666:64900
ip community-list standard europe-deny permit 65000:64900
ip community-list standard europe-permit permit 64999:64900
!
ip community-list standard global-deny permit 65000:0
!
route-map as8575-out deny 10
  match community as8575-deny
!
route-map as8575-out permit 20
  match community as8575-permit
  continue 200
!
route-map as8575-out deny 30
  match community london-deny
!
route-map as8575-out permit 40
  match community london-permit
  continue 200
!
route-map as8575-out deny 50
  match community uk-deny
!
route-map as8575-out permit 60
  match community uk-permit
  continue 200
!
route-map as8575-out deny 70
  match community europe-deny
!
route-map as8575-out permit 80
  match community europe-permit
  continue 200
!
route-map as8575-out deny 90
  match community global-deny
!
route-map as8575-out permit 100
  continue 200
!
route-map as8575-out permit 200
  match community london
  set metric 50
!
route-map as8575-out permit 210
  match community uk
  set metric 100
!
route-map as8575-out permit 220
  match community europe
  set metric 150
!
route-map as8575-out permit 230
  set metric 200
!
route-map as8575-in permit 10
  match community illegal-communities
  set community no-advertise
!

An example of egress filters for a London-based peer, AS8575.
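The order of evaluation in the route-map above can be modelled in Python to make the precedence easier to follow: the most specific customer-set community wins, the global deny is checked last, and the metric is then chosen from the origin tags. This is a sketch of the logic only. The function names are hypothetical, and the IDs are taken from the community-lists in the example configuration.

```python
PLATFORM_AS = 45666  # tags applied by the platform at ingress
DENY_AS = 65000      # customer-set: do not announce to <id>
PERMIT_AS = 64999    # customer-set: announce to <id>


def should_announce(route_communities, peer_ids):
    """Decide whether a route is announced to one particular peer.

    `peer_ids` lists the receiving peer's identifiers from most to
    least specific: peer AS, city, country, region.
    """
    tagged = set(route_communities)
    for ident in peer_ids:
        if f"{DENY_AS}:{ident}" in tagged:
            return False
        if f"{PERMIT_AS}:{ident}" in tagged:
            return True
    # No specific match: fall back to the global deny, else announce.
    return f"{DENY_AS}:0" not in tagged


def egress_metric(route_communities, location_ids,
                  steps=(50, 100, 150), default=200):
    """Lower metric for routes that originated closer to the peer."""
    tagged = set(route_communities)
    for ident, metric in zip(location_ids, steps):
        if f"{PLATFORM_AS}:{ident}" in tagged:
            return metric
    return default


# Receiving peer: AS8575 in London (65000), UK (65500), Europe (64900).
peer_ids = (8575, 65000, 65500, 64900)
location_ids = peer_ids[1:]

print(should_announce({"65000:0", "64999:65000"}, peer_ids))
print(should_announce({"65000:64900", "64999:65500"}, peer_ids))
print(should_announce({"65000:8575"}, peer_ids))
print(egress_metric({"45666:64900"}, location_ids))
```

The first two calls show a selective permit overriding a broader deny, which is what lets a customer block a whole region and then re-open individual cities or countries.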

The system could be extended further to allow prepending of the peering point's AS onto announcements, perhaps using 65001:<id> to prepend once and 65003:<id> to prepend three times.
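A possible shape for the prepending extension is sketched below. It assumes that 65002:<id> would mean two prepends (the text only names 65001 and 65003) and that the most specific identifier wins, mirroring the deny/permit ordering; both are assumptions, as are the function names.

```python
PLATFORM_AS = 45666
PREPEND_BASE = 65000  # 65001:<id> = once ... 65003:<id> = three times


def prepend_count(route_communities, peer_ids, max_prepends=3):
    """Return how many times to prepend for the receiving peer.

    `peer_ids` runs from most to least specific (peer AS, city,
    country, region); the first identifier with a match decides.
    """
    tagged = set(route_communities)
    for ident in peer_ids:
        for count in range(max_prepends, 0, -1):
            if f"{PREPEND_BASE + count}:{ident}" in tagged:
                return count
    return 0


def apply_prepend(as_path, count, platform_as=PLATFORM_AS):
    """Prepend the peering point's AS `count` times to an AS path."""
    return [platform_as] * count + list(as_path)


peer_ids = (8575, 65000, 65500, 64900)
count = prepend_count({"65003:64900"}, peer_ids)
print(apply_prepend([286], count))
```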

6. Completed configuration

The configuration below shows neighbour statements, community-lists, prefix-lists and route-maps for an imaginary peer with AS 8575 located in London, UK.

router bgp 45666
  neighbor 10.0.0.43 remote-as 8575
  neighbor 10.0.0.43 description FooBar Networks (AS-FOOBAR)
  neighbor 10.0.0.43 send-community
  neighbor 10.0.0.43 soft-reconfiguration inbound
  neighbor 10.0.0.43 maximum-prefix 100
  neighbor 10.0.0.43 prefix-list AS-FOOBAR in
  neighbor 10.0.0.43 route-map as8575-in in
  neighbor 10.0.0.43 route-map as8575-out out
  neighbor 10.0.0.43 attribute-unchanged
!
ip prefix-list AS-FOOBAR seq 5 permit 10.0.0.0/22
ip prefix-list AS-FOOBAR seq 10 permit 10.0.2.0/24
ip prefix-list AS-FOOBAR seq 15 permit 10.0.0.0/24
ip prefix-list AS-FOOBAR seq 20 permit 10.32.0.0/19
!
! Communities in the platform's own 45666:* range may not be set by customers
ip community-list expanded illegal-communities permit _45666:
!
! Customer-set controls: 65000:<id> denies, 64999:<id> permits
ip community-list standard as8575-deny permit 65000:8575
ip community-list standard as8575-permit permit 64999:8575
!
ip community-list standard london permit 45666:65000
ip community-list standard london-deny permit 65000:65000
ip community-list standard london-permit permit 64999:65000
!
ip community-list standard uk permit 45666:65500
ip community-list standard uk-deny permit 65000:65500
ip community-list standard uk-permit permit 64999:65500
!
ip community-list standard europe permit 45666:64900
ip community-list standard europe-deny permit 65000:64900
ip community-list standard europe-permit permit 64999:64900
!
ip community-list standard global-deny permit 65000:0
!
route-map as8575-out deny 10
  match community as8575-deny
!
route-map as8575-out permit 20
  match community as8575-permit
  continue 200
!
route-map as8575-out deny 30
  match community london-deny
!
route-map as8575-out permit 40
  match community london-permit
  continue 200
!
route-map as8575-out deny 50
  match community uk-deny
!
route-map as8575-out permit 60
  match community uk-permit
  continue 200
!
route-map as8575-out deny 70
  match community europe-deny
!
route-map as8575-out permit 80
  match community europe-permit
  continue 200
!
route-map as8575-out deny 90
  match community global-deny
!
route-map as8575-out permit 100
  continue 200
!
route-map as8575-out permit 200
  match community london
  set metric 50
!
route-map as8575-out permit 210
  match community uk
  set metric 100
!
route-map as8575-out permit 220
  match community europe
  set metric 150
!
route-map as8575-out permit 230
  set metric 200
!
route-map as8575-in permit 10
  match community illegal-communities
  set community no-advertise
!
route-map as8575-in permit 20
  set community 45666:8575 45666:64500 45666:65000 45666:64900 additive
!

A complete configuration example for peer AS8575

The diagram below charts the flow of prefixes through the various route-maps required to create the desired peer-specific routing tables.

We've now provided our customers with both an optimal routing table based on their geographic location and the tools to fully customise their service and enforce their own peering policies. This could help overcome some resistance from organisations with non-open peering policies.

This allows people to connect to the peering platform without automatically accepting routes from, or announcing their routes to, every other peer. They would achieve this by setting the global deny community on their outgoing announcements, selectively adding communities identifying the locations or individual peers they would like to peer with, and configuring ingress filters on their own routers to deny all prefixes not bearing communities identifying the areas or peers they wish to peer with.

Additional Considerations

As this platform is a commercial operation, it'll be charged accordingly. Transporting traffic from Frankfurt to Seattle is a costly business, and sufficient profit will have to be made to justify the creation of the platform. However, many potential customers of the peering platform might baulk at the prospect of paying international prices for traffic merely traversing the UK or Frankfurt metro rings, or even for traffic going from Amsterdam to Frankfurt.

To remove this problem, the peering platform could be split up into two or three components:

  1. The primary global peering platform. Utilised by every customer of the service, this would be the flagship of the system.
  2. Metro-based peering across each city ring, available only to subscribers of the other peering platforms, allowing peers within the same city to exchange traffic at very low cost. This would be the 'carrot' to encourage potential customers onto the peering platform.
  3. Several smaller scale platforms encompassing geographic regions, priced proportionally lower than the global platform, based on reduced distance. This would allow the platform to be marketed as a product to increase quality whilst reducing prices.

I would envisage either a single dedicated port for low-usage customers or multiple ports for high-usage customers being used for all three services. This would allow a tiered billing system to be used, where the entire port usage has a low base cost per Mbps, regional usage attracts a premium on top of the base cost (totalling less than transit costs), and international traffic attracts a premium that brings the total to the cost of transit. I believe this would create a product capable of appealing to the widest possible range of potential customers.
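The tiered billing idea can be written out as simple arithmetic. All of the rates and usage figures below are invented purely to show the structure and do not come from any real price list; the only constraints taken from the text are that base plus regional premium stays below the transit rate, and base plus international premium equals it.

```python
def monthly_charge(total_mbps, regional_mbps, international_mbps,
                   base_rate, regional_premium, international_premium):
    """Charge for one port under the tiered model described above.

    The base rate applies to all usage; regional and international
    traffic each attract a premium on top of it.
    """
    if regional_mbps + international_mbps > total_mbps:
        raise ValueError("tiered usage cannot exceed total port usage")
    return (base_rate * total_mbps
            + regional_premium * regional_mbps
            + international_premium * international_mbps)


# Hypothetical per-Mbps rates, in arbitrary currency units.
TRANSIT_RATE = 10
BASE_RATE = 2
REGIONAL_PREMIUM = 4       # 2 + 4 = 6, less than transit
INTERNATIONAL_PREMIUM = 8  # 2 + 8 = 10, equal to transit

# Hypothetical usage: 1000 Mbps in total, of which 300 Mbps is
# regional and 100 Mbps international; the remainder stays in-metro.
charge = monthly_charge(1000, 300, 100,
                        BASE_RATE, REGIONAL_PREMIUM, INTERNATIONAL_PREMIUM)
print(charge)
```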

Diagram showing the completed, multi-tiered peering platform in operation. Not all traffic has been shown for simplicity.

 


©2006 all rights reserved adam armstrong