For the past few months, this blog has covered the different elements that can make up an infrastructure: reverse proxies, hypervisors… Today, discover routers: their role, the way they work, and the choices NBS System has made about using them in our infrastructure.
When you want to visit a website, you type its address in the address bar of your browser. The browser sends a request (in the form of a packet) which has to reach the web server of the requested website for you to be able to access it. This server then sends the requested page back to your browser, where it is displayed.
For the requests and their responses to reach the right recipient, the protocol suite used is TCP/IP. Each service on the Internet can thus be identified by an IP address/port pair, whether fixed or not, which ensures continuity while browsing. In this article, only the IP address is of interest.
The Internet is thus a huge set of IP addresses, gathered in “IP address spaces” or “IP ranges”. One (or several) of these IP ranges constitutes a zone; the official name of these zones is AS, for Autonomous System. The Internet is thus actually not a single united entity, but rather a set of different networks.
However, these different networks raise a problem: if your IP address and that of the website you want to visit are not in the same zone, how does your request know which zone to go to, and which way to take?
That is the mission of routers and of BGP (Border Gateway Protocol), which interconnect these zones and make them communicate. To understand how, we will follow the path of a request sent from one IP address to another. But first, we will focus on the zones themselves.
ASs: who do they belong to?
We explained in the introduction what the zones, or ASs, are, but we must dig a little deeper into the subject. These zones can be managed by any entity providing Internet services: Internet Service Providers (ISPs), hosting providers, etc. Each of these entities can own one or several zones.
To get an AS, an entity has to contact an organization such as RIPE (Réseaux IP Européens, French for European IP Networks) or ARIN (American Registry for Internet Numbers), which gives the entity an IP address space and an AS number (for instance, NBS System’s is 51335).
The IP addresses contained in this space now belong to the entity, which can distribute them to its clients: that is how an ISP can assign an IP address to your box, and how a hosting provider can provide an IP address for the server of its client’s website.
You thus benefit from an address which identifies you during your browsing and enables you to surf the web; on the other side, the web server of the website you visit also has an IP address.
Let us now take an example: you are a client of ISP A, which provides you with a box associated with an IP address S (source) contained in the AS AAAA, which belongs to the provider. You wish to visit a website whose IP address D (destination) is contained in the AS ZZZZ, which belongs to hosting provider Z. How can your request reach the zone ZZZZ from the zone AAAA?
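To make the example more concrete, here is a minimal Python sketch of the idea that each AS owns an address space and that any IP address can be matched to its zone. The prefixes are assumptions chosen for illustration (they are ranges reserved for documentation, not real allocations), and the function name find_as is hypothetical.

```python
import ipaddress

# Hypothetical address spaces for the two zones of the example.
# These prefixes are reserved for documentation and are NOT real allocations.
AS_PREFIXES = {
    "AAAA": [ipaddress.ip_network("192.0.2.0/24")],
    "ZZZZ": [ipaddress.ip_network("203.0.113.0/24")],
}


def find_as(address):
    """Return the name of the AS whose address space contains `address`, or None."""
    ip = ipaddress.ip_address(address)
    for as_name, prefixes in AS_PREFIXES.items():
        if any(ip in prefix for prefix in prefixes):
            return as_name
    return None


source = "192.0.2.10"         # address S, assigned by ISP A
destination = "203.0.113.80"  # address D, assigned by hosting provider Z

print(find_as(source))        # AAAA
print(find_as(destination))   # ZZZZ
```

The two addresses fall in different zones, which is exactly the situation the routers have to solve.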
Routers: the link between the zones
The first prerequisite is as follows: the zones have to be linked together, which is the role of the routers. These devices are present in all Internet zones, and they communicate with each other to channel data. A router can be a dedicated physical device (such as Cisco or Juniper products), or a simple server running a service dedicated to routing (Bird or Quagga on a Linux server, for instance).
To continue with our example: the AS AAAA and the AS ZZZZ have at least one router each, which we will call router A and router Z respectively. ISP A provides Internet access to hosting provider Z, which means that it links the zone ZZZZ to the rest of the Internet, i.e. to all other zones. To do that, a connection is established between the two routers, through the BGP protocol.
The BGP protocol is the language spoken by the routers. One of its missions is to set up and maintain this connection between two ASs: this is the BGP session, which links two routers. During the initialization of this session (the setup of the connection), router Z tells router A which IP address space is contained in its AS (i.e. the AS ZZZZ). Router A thus knows which IP addresses can be reached through this AS. In return, router A sends router Z everything it already knows: the IP address space of its own AS AAAA, those of all the other ASs it is linked to, those of the ASs these are linked to in turn, etc. In the end, this covers the whole Internet.
Once this initialization is done, updates are exchanged between the routers whenever something changes. The information about the new AS ZZZZ, recorded by router A, is thus passed on by router A to all the routers it is connected to, which pass it on in turn, so the information spreads very quickly. All connected routers thus know, almost in real time, which AS each IP address on the Internet belongs to.
To get back to our example: when your request, sent from your IP address S contained in the AS AAAA, reaches router A, the router knows that the IP address D to which the request has to be sent is in the AS ZZZZ. Your request is then sent to router Z, which forwards it within its own network towards the IP address D. The answer to your request follows the reverse path, and that is how you obtain the requested page in your browser.
It is now time to focus on the second function of the BGP protocol: determining the best path, i.e. the shortest one. This optimizes the time a request takes to arrive, for faster browsing.
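The announcement and forwarding steps described above can be sketched as a small simulation. This is a simplified model, not how a real BGP implementation works: the session list is an assumption, the zone CCCC is a hypothetical addition to show propagation beyond direct neighbours, and each zone simply remembers the first neighbour from which it heard about the destination.

```python
from collections import deque

# Hypothetical BGP sessions between zones; each pair is one session.
# CCCC is an invented zone, added only to show multi-hop propagation.
SESSIONS = [("AAAA", "ZZZZ"), ("AAAA", "BBBB"), ("BBBB", "CCCC")]


def peers(as_name):
    """Return the zones that share a session with `as_name`."""
    result = []
    for left, right in SESSIONS:
        if left == as_name:
            result.append(right)
        elif right == as_name:
            result.append(left)
    return result


def announce(origin):
    """Spread the announcement of `origin` from neighbour to neighbour.

    Returns, for every zone reached, the neighbour it forwards to
    in order to reach `origin` (None for the origin itself).
    """
    next_hop = {origin: None}
    queue = deque([origin])
    while queue:
        current = queue.popleft()
        for peer in peers(current):
            if peer not in next_hop:  # keep only the first announcement heard
                next_hop[peer] = current
                queue.append(peer)
    return next_hop


def route(source_as, next_hop):
    """List the zones a request crosses from `source_as` to the origin."""
    hops = [source_as]
    while next_hop[hops[-1]] is not None:
        hops.append(next_hop[hops[-1]])
    return hops


table = announce("ZZZZ")
print(route("AAAA", table))  # ['AAAA', 'ZZZZ']
print(route("CCCC", table))  # ['CCCC', 'BBBB', 'AAAA', 'ZZZZ']
```

Once ZZZZ has announced itself, every connected zone knows which neighbour to hand a request to, even if it has no direct session with ZZZZ.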
Let us go back to our example. You are still on this IP address S, belonging to the AS AAAA containing the router A; you still wish to visit the website on the IP address D, belonging to the AS ZZZZ containing the router Z.
We simplified the process earlier by imagining a direct connection between hosting provider Z and ISP A; in reality, the zones of these two entities can be separated by one or several intermediary zones. There can thus be several available paths from router A to router Z, as we see in the picture on the right.
For each possible path, router A gathers the list of routers the request will have to go through. It thus obtains what is called an ASPATH (AS path) for each possibility, as in the following example (where we limited ourselves to 3 possibilities):
Path 1: AS AAAA – AS BBBB – AS ZZZZ
ASPATH: B, Z
Path 2: AS AAAA – AS iiii – AS jjjj – AS ZZZZ
ASPATH: i, j, Z
Path 3: AS AAAA – AS BBBB – AS iiii – AS jjjj – AS ZZZZ
ASPATH: B, i, j, Z
The first path only goes through one intermediary zone, while the second one goes through two zones and the third one through three; the first path is thus clearly the shortest. In the same way, the first ASPATH shows that there is only one intermediary router, while the last one shows three.
The shortest path, in this case Path 1, is the one chosen by default by the BGP protocol on router A, which has to forward the request. This optimizes the delivery of the request (and thus of its answer), to serve the Internet user as quickly as possible.
Of course, in reality there is an even greater number of different paths a request can follow. This ensures the continuity of the connections even if a zone or a router is unavailable: the request is then rerouted through the next shortest path. In our example, if router B is no longer accessible, the first path will not be chosen, and the second path will be used instead.
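The default choice can be illustrated with a few lines of Python. This is only a sketch of the "shortest ASPATH wins" rule described here, using the three candidate paths of the example; real BGP implementations also weigh other attributes and local policies, which are left out. The function name best_path is hypothetical.

```python
# Candidate AS paths from router A to the AS ZZZZ, taken from the example.
CANDIDATES = {
    "Path 1": ["B", "Z"],
    "Path 2": ["i", "j", "Z"],
    "Path 3": ["B", "i", "j", "Z"],
}


def best_path(candidates):
    """Return the (name, as_path) pair with the shortest AS path.

    If two paths have the same length, the first one listed is kept.
    """
    return min(candidates.items(), key=lambda item: len(item[1]))


name, as_path = best_path(CANDIDATES)
print(name, as_path)  # Path 1 ['B', 'Z']
```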
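The rerouting behaviour can be sketched in the same spirit: discard every path that crosses an unavailable router, then keep the shortest of the remaining ones. As before, this is a simplified illustration under assumptions (the function name best_available_path and the way unavailability is represented are hypothetical), not a description of how a BGP daemon detects failures.

```python
# Candidate AS paths from router A to the AS ZZZZ, taken from the example.
CANDIDATES = {
    "Path 1": ["B", "Z"],
    "Path 2": ["i", "j", "Z"],
    "Path 3": ["B", "i", "j", "Z"],
}


def best_available_path(candidates, unavailable):
    """Return the shortest path that avoids every router in `unavailable`.

    Returns None when no candidate path remains usable.
    """
    usable = {
        name: path
        for name, path in candidates.items()
        if not set(path) & set(unavailable)
    }
    if not usable:
        return None
    return min(usable.items(), key=lambda item: len(item[1]))


print(best_available_path(CANDIDATES, unavailable=set()))  # ('Path 1', ['B', 'Z'])
print(best_available_path(CANDIDATES, unavailable={"B"}))  # ('Path 2', ['i', 'j', 'Z'])
```

With router B out of the picture, Paths 1 and 3 both disappear and Path 2 becomes the shortest usable path, as in the example.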
NBS System’s routers
Most ASs contain several routers, which are usually linked to several other routers. That is notably what multiplies the paths between two points of the Internet and ensures service continuity for the clients of the entities responsible for the ASs.
NBS System only uses two routers, each linked to a single Internet Service Provider. These routers are connected to a switch, which is itself linked to our firewalls, which also have a routing function. The firewalls decide which path to take, upstream from the routers that are directly linked to the ISPs.
It is a deliberate choice, made for two simple reasons:
- The link between a router and an ISP is billed: it is thus less expensive to maintain only 2 links (one per router) rather than 4 (two per router).
- Most importantly, this configuration simplifies the administration of the routers. Indeed, our two “front-end” routers are Juniper equipment, and thus run proprietary software. The firewalls, however, are Linux servers on which Quagga handles the BGP part, and thus the choice of the path. These are open-source tools, of the kind we use a lot in our company, and they are easier to administer than proprietary software because they are more flexible.
Technical source: Anthony Le Berre