Introduction To Linux Routers by Andrew Gideon, CTO/VP
Introduction
A few years ago, I had the good fortune to move my company’s data center. This permitted us to revisit all aspects of our approach to running a data center, from the basics of power and air conditioning to the switch fabric used to provide connectivity to all of our devices. One of the more interesting choices we made was our switch from dedicated hardware routers to routers built from commodity hardware. Almost three years later, that choice is still looking like an excellent one.
In this and subsequent articles, I’m going to describe how we constructed the routers that we use today. It turns out that there are many different aspects to a routing device, some of which are only peripherally related to the actual movement of packets from one network to another. There are decisions to be made with respect to how the local area network is partitioned, how fault-tolerance is introduced and what connectivity to the wider world of the Internet will be used. We’ll explore these issues as they pertain to the construction and deployment of routers built from commodity hardware.
These articles will take the reader through the actual construction and testing of a router. But I’m hoping to go beyond merely providing a set of recipes. The best network engineer is one who truly understands what routers do and why they do it. This prepares one for the inevitable problems, unexpected cases, and growth opportunities to come.
Why Bother?
The first issue to address is why we’d bother to do such a thing. After all, numerous companies sell perfectly good routers. We’d had good experience with Cisco routers for years. What could motivate making such a large change?
First, a caveat: If we had to move huge volumes of packets as rapidly as possible, we’d probably continue to use dedicated routers. Maybe. When I last researched this subject, the cut-over point was in volumes carried by major backbones. Above that bandwidth, the dedicated devices outperformed commodity hardware. But I’m of the opinion that economics drives the performance of commodity hardware to improve more rapidly than hardware dedicated to a smaller community. Thus, I would expect that cut-over point to increase over time. Will it increase more quickly than the bandwidth needs of the major backbones? It will be interesting to watch and learn.
But that’s not the type of volume our data center produces. We carry tens or hundreds of megabits, not gigabits.
So what was our motivation? There were two prime motivations: cost and cost.
The matter of cost becomes obvious when comparing commodity hardware capable of carrying a full Internet routing table with the equivalent dedicated device. Our strong desire for redundancy at the hardware level only exacerbates this cost difference. Simply put: one can throw a lot of well-endowed general purpose computers at the routing problem for the same price as a pair of capable routers.
The other matter of cost is only slightly less obvious. The size of the population of system administrators comfortable administering a box with a commodity operating system such as Linux is quite a bit larger than the size of the population comfortable administering a box running Cisco’s IOS. That translates into easier and less expensive hiring.
Admittedly, this is not precisely an apples-to-apples comparison. Many of those with experience administering Linux, for example, won’t be very experienced in the capabilities of iproute2. However, many do have a start in this direction, perhaps beginning with a familiarity with iptables for firewalling of individual machines. Since iptables is also the tool of choice for firewalling within a router, experience on individual servers will be applicable to router management. This doesn’t happen with routers running IOS, for example.
Even where new hires do require additional training to extend their system knowledge to the movement of packets, this still offers the advantage that the knowledge base used in managing the routers is roughly the same as the knowledge base used to manage the servers. If nothing else, this simplifies staffing choices by letting the same people work in both areas of the data center.
The benefit of this expands further when one introduces virtualization, where individual servers actually act as self-contained networks of multiple virtual machines.
Another aspect of this benefit arises when we consider again our desire for hardware redundancy. Cisco provides this capability, as one would expect. But it operates differently than the equivalent mechanism in commodity environments. Today, we use the same clustering technology for both our servers and our routers.
Yet another aspect of this benefit arises when we expand our communication needs beyond basic routing. Rather than running different hardware for a VPN, or running a VPN on one of our routers, we simply have yet another commodity device running VPN software. The same applies to firewalls. Using commodity hardware reduces the cost both of introducing such capabilities and of maintaining them over time.
First Step
We’re going to start by building a very basic router, and then adding necessary features one-by-one. This will make the process as simple and clear as possible while still ending up with a working router.
To get the most out of these articles, you’ll need to establish a test environment in which you can work. To begin with, we’ll use three computers: the router plus two test devices that will be communicating through the router. For a clear nomenclature, I’ll call the router we’re building SUN and the two test devices MERCURY and VENUS.
For now, we’re going to exchange packets between MERCURY and VENUS through SUN. Later, we’ll introduce using additional routers to reach other networks such as those on the Internet.
These need not be physical computers. This will work just as well if all three devices are guests of your virtualization mechanism of choice. All that is required is that you have the ability to add additional virtual network interfaces (i.e., an eth1) to the guests and connect these to network interfaces on the actual computer (the virtual equivalent of plugging a physical network interface into a switch).
At least for now, we’ll assume that all network ports – physical and virtual – are on the same “Ethernet broadcast domain”. That is, they’re all on the same unmanaged switch, or the same bridge on a computer on which the virtualized guests are running.
Needless to say, these devices are expected to be running Linux. The distribution I’ll use for this is Fedora, but I’ll try to remain as agnostic as possible.
For the remainder of this article, our goal is to get MERCURY and VENUS communicating through SUN. To achieve this, we’re going to have to touch upon a fairly wide variety of topics. We haven’t the space to cover each in any depth in this article; we’ll just get enough done to achieve the immediate goal.
In subsequent articles, we’ll discuss each of the topics in far greater depth.
Choosing Addresses
We need to assign IP addresses to each of our two test devices and our router. The two test devices, MERCURY and VENUS, must be on separate networks; otherwise, they’ll expect to be able to reach one another without the aid of a router.
RFC1918 defines address spaces for private use. These are to be used locally, with no expectation that these addresses can be used over the Internet at large. To avoid the chance of conflict with working IP addresses, I’m going to use addresses from this private space for these articles. There is still the chance, though, that one or more of the networks I’m using will conflict with networks already in use within your network.
If that is the case, simply switch the examples I provide to a different network. All the examples will be organized to make this as easy as possible.
The two networks I’m going to use will be 192.168.10.0/24 and 192.168.100.0/24. In this article, SUN will have an IP on each network while MERCURY and VENUS will have an IP on only one network each. This will require SUN to pass packets between MERCURY and VENUS for them to communicate.
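The requirement that the test devices sit on different networks can be made concrete. A host decides whether a destination is “local” by masking both its own address and the destination with its network prefix and comparing the results. The following sketch is purely illustrative, plain POSIX shell; the same_network helper is invented for this demonstration and is not part of any router setup:

```shell
#!/bin/sh
# Illustrative only: how a host decides whether a destination is on its own
# network. Convert dotted-quad addresses to 32-bit integers, apply the
# prefix mask, and compare the network portions.
to_int() {
    oldifs=$IFS; IFS=.
    set -- $1
    IFS=$oldifs
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

same_network() {    # usage: same_network IP1 IP2 PREFIXLEN
    mask=$(( (0xFFFFFFFF << (32 - $3)) & 0xFFFFFFFF ))
    [ $(( $(to_int "$1") & mask )) -eq $(( $(to_int "$2") & mask )) ]
}

# MERCURY and SUN's 192.168.10.20 share 192.168.10.0/24: no router needed.
same_network 192.168.10.21 192.168.10.20 24 && echo "same network"
# MERCURY and VENUS differ within their /24 prefixes: a router is required.
same_network 192.168.10.21 192.168.100.22 24 || echo "different networks"
```

Because 192.168.10.21 and 192.168.100.22 fall into different /24 networks, neither host will attempt direct delivery; each must hand the packet to a router.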
Preventing the Firewall from Blocking our Testing
A basic component of Linux is iptables, a sophisticated and powerful firewalling mechanism. This can control the packets sent, received and forwarded by any Linux machine. Forwarding is what routers do: move packets from one network to another.
There’s a fair chance that iptables is installed and configured on all three of your devices such that it would block our testing. It is almost certainly configured on SUN to prevent forwarding for the same reason, discussed below, that forwarding is disabled in the kernel by default. It is also fairly likely that MERCURY and VENUS have firewalls that would block our test traffic.
In subsequent articles, we’re going to cover iptables in greater depth, and discuss using its control over packet receipt and forwarding to permit exactly what traffic we want and no more. For now, we’re simply going to assure that iptables isn’t blocking anything so as to let our tests proceed.
Iptables is controlled by rules organized into rule sets stored in tables. The basic control over packets received is accomplished by the INPUT rule set in the filter table. The basic control over packets sent is accomplished by the OUTPUT rule set in the filter table. The basic control over packets forwarded – or routed – is accomplished by the FORWARD rule set in the filter table.
To examine the INPUT rule set:
/sbin/iptables -t filter -nL INPUT
Similarly, one could:
/sbin/iptables -t filter -nL OUTPUT
or
/sbin/iptables -t filter -nL FORWARD
to examine the OUTPUT or FORWARD rule sets respectively.
Each rule set consists of zero or more rules. Each rule set also has a policy: this policy defines what happens to packets that are not matched by any rules.
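That ordering, rules first and then the policy as a fallback, is worth internalizing. As a toy illustration in plain shell (this is not iptables itself; the evaluate helper is invented for this sketch), the first matching rule wins and the policy applies only when nothing matched:

```shell
#!/bin/sh
# Toy model of rule-set evaluation, NOT iptables itself: try each rule in
# order; the first match decides the verdict; if no rule matches, fall
# through to the chain's policy.
evaluate() {
    src=$1; policy=$2; shift 2
    for rule in "$@"; do            # each rule looks like "SOURCE=VERDICT"
        if [ "$src" = "${rule%%=*}" ]; then
            echo "${rule#*=}"       # first match wins
            return
        fi
    done
    echo "$policy"                  # no rule matched: the policy decides
}

evaluate 192.168.10.21 DROP "192.168.10.21=ACCEPT"   # prints ACCEPT
evaluate 10.0.0.5      DROP "192.168.10.21=ACCEPT"   # prints DROP
```

This is the behavior we’ll rely on below: flushing all rules leaves every packet subject to the policy alone, while inserting a rule at position 1 guarantees it is consulted first.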
Firewall on MERCURY and VENUS
The first step is to get the firewalls on MERCURY and VENUS out of the way of our testing.
The simplest solution is to simply remove all rules from the rule set, and to set the policies such that all packets are permitted. This is accomplished on both MERCURY and VENUS via:
/sbin/iptables -t filter -P INPUT ACCEPT
/sbin/iptables -t filter -P OUTPUT ACCEPT
/sbin/iptables -t filter -F INPUT
/sbin/iptables -t filter -F OUTPUT
The -P option sets the policies for the INPUT and OUTPUT rule sets to ACCEPT. The -F option removes – flushes – all rules in the rule set, leaving all packets subject to the policy of that rule set.
If this is too open, an alternative for now is simply to add rules that permit the devices to send and receive packets from one another. On MERCURY:
VENUS=192.168.100.22
SUN=192.168.10.20
NIC=eth1
/sbin/iptables -t filter -I INPUT 1 -i ${NIC} -s ${VENUS} -j ACCEPT
/sbin/iptables -t filter -I OUTPUT 1 -o ${NIC} -d ${VENUS} -j ACCEPT
/sbin/iptables -t filter -I INPUT 1 -i ${NIC} -s ${SUN} -j ACCEPT
/sbin/iptables -t filter -I OUTPUT 1 -o ${NIC} -d ${SUN} -j ACCEPT
This adds to the top of the INPUT ruleset a rule which will accept packets on the NIC being used for our testing and with a source IP address that belongs to VENUS. This also adds to the top of the OUTPUT ruleset a rule which will permit transmission of packets to the NIC being used for testing when those packets have a destination IP address that belongs to VENUS.
It then does the same thing for packets from and to SUN. While this isn’t strictly required to exchange packets with VENUS, it will aid testing by letting us examine connectivity between SUN and MERCURY.
Similarly, on VENUS:
MERCURY=192.168.10.21
SUN=192.168.100.20
NIC=eth1
/sbin/iptables -t filter -I INPUT 1 -i ${NIC} -s ${MERCURY} -j ACCEPT
/sbin/iptables -t filter -I OUTPUT 1 -o ${NIC} -d ${MERCURY} -j ACCEPT
/sbin/iptables -t filter -I INPUT 1 -i ${NIC} -s ${SUN} -j ACCEPT
/sbin/iptables -t filter -I OUTPUT 1 -o ${NIC} -d ${SUN} -j ACCEPT
Firewall on SUN
On MERCURY and VENUS, we permitted packets in and out using the INPUT and OUTPUT rule sets. For SUN, we need to cause it to permit packets to be forwarded. This is our next step, and is done with the FORWARD rule set.
To look at the FORWARD rule set:
/sbin/iptables -t filter -nL FORWARD
This will probably look something like:
Chain FORWARD (policy DROP)
target     prot opt source               destination
This indicates that there are no rules in this rule set. It also informs us that the default policy for this rule set is to drop any packets that would otherwise be forwarded. To change this so that packets are forwarded:
/sbin/iptables -t filter -P FORWARD ACCEPT
If there are any rules, then we’ll want to remove them for now. This is accomplished via:
/sbin/iptables -t filter -F FORWARD
This flushes (or removes) all rules in the FORWARD rule set in the filter table. The result of an empty rule set and a policy of ACCEPT is that the firewall will not prevent the forwarding of any packets.
We’re also going to want to permit packets from both MERCURY and VENUS to reach SUN by changing the INPUT rule set. Strictly speaking, this is not required for SUN to route packets between MERCURY and VENUS. Packets that are being forwarded follow a different path through iptables than packets that are actually received: they are subjected only to the FORWARD rule set and not to either INPUT or OUTPUT. The INPUT and OUTPUT rule sets are involved only in “local” traffic, not traffic that is being routed.
Though letting packets from VENUS or MERCURY through SUN’s INPUT and OUTPUT rule sets isn’t strictly required, letting SUN receive these packets will simplify our testing by permitting us to test direct connectivity between SUN and the two test devices.
As before, the simplest solution is to simply remove all rules from the INPUT and OUTPUT rule sets, and set the policy to permit any packets:
/sbin/iptables -t filter -P INPUT ACCEPT
/sbin/iptables -t filter -P OUTPUT ACCEPT
/sbin/iptables -t filter -F INPUT
/sbin/iptables -t filter -F OUTPUT
If this is too open, an alternative for now is to simply add rules that permit only the packets needed for our testing. This is essentially what we did for MERCURY and VENUS, but for a pair of IP addresses:
MERCURY=192.168.10.21
MERCURY_NIC=eth1
VENUS=192.168.100.22
VENUS_NIC=eth1
/sbin/iptables -t filter -I INPUT 1 -i ${VENUS_NIC} -s ${VENUS} -j ACCEPT
/sbin/iptables -t filter -I INPUT 1 -i ${MERCURY_NIC} -s ${MERCURY} -j ACCEPT
/sbin/iptables -t filter -I OUTPUT 1 -o ${VENUS_NIC} -d ${VENUS} -j ACCEPT
/sbin/iptables -t filter -I OUTPUT 1 -o ${MERCURY_NIC} -d ${MERCURY} -j ACCEPT
This adds to the top of the INPUT ruleset a pair of rules, one of which accepts packets from VENUS and the other from MERCURY. It then adds a pair of rules to the OUTPUT ruleset which permit packets to be sent to VENUS and MERCURY.
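One caution that applies to every iptables command in this section: changes take effect immediately but live only in the running kernel, and vanish at the next reboot. On Fedora and its relatives, one way to persist the current rules (an assumption about your distribution; others use different mechanisms) is:

```shell
# Save the currently loaded rules where Fedora's iptables init script
# will restore them at boot. The path is Red Hat/Fedora specific.
/sbin/iptables-save > /etc/sysconfig/iptables
```

For the experiments in these articles, the transience is actually convenient: a reboot returns you to your original firewall configuration.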
Connecting SUN to MERCURY
To connect SUN to MERCURY, there must be a physical Ethernet connection between them (or the virtualized equivalent connecting two virtualized guests). This can be as simple as a crossover cable between a physical NIC on each device, or perhaps both devices are connected to ports on the same switch.
Once a physical connection is in place, the logical connection must be established.
On SUN:
NIC=eth1
IP=192.168.10.20/24
/sbin/ip link set ${NIC} up
/sbin/ip addr add ${IP} broadcast + dev ${NIC}
On MERCURY:
NIC=eth1
IP=192.168.10.21/24
/sbin/ip link set ${NIC} up
/sbin/ip addr add ${IP} broadcast + dev ${NIC}
Once this is done correctly, each machine should be able to ping the other. On MERCURY:
SUN=192.168.10.20
ping -n ${SUN}
On SUN:
MERCURY=192.168.10.21
ping -n ${MERCURY}
Connecting SUN to VENUS
This is quite similar to SUN’s connection to MERCURY. It can even occur on the same network interface (though this is unlikely to be desirable in any real-world solution). In the recipes to follow, I’m going to assume that eth1 on SUN connects to both MERCURY and VENUS. If you can use separate interfaces, that’s even better.
We must be sure, though, even if MERCURY and VENUS are on the same physical Ethernet, that they cannot exchange IP packets directly. This is accomplished by placing the two test devices on separate networks as far as IP is concerned: giving them IP addresses that are mutually unreachable.
The first step, as before, is assuring that there is physical connectivity between the router and this test device. Once a physical connection is in place, the logical connection must be established.
On SUN:
NIC=eth1
IP=192.168.100.20/24
/sbin/ip link set ${NIC} up
/sbin/ip addr add ${IP} broadcast + dev ${NIC}
On VENUS:
NIC=eth1
IP=192.168.100.22/24
/sbin/ip link set ${NIC} up
/sbin/ip addr add ${IP} broadcast + dev ${NIC}
Once this is done correctly, each machine should be able to ping the other. On VENUS:
SUN=192.168.100.20
ping -n ${SUN}
On SUN:
VENUS=192.168.100.22
ping -n ${VENUS}
Forwarding Packets on your Router
At the most fundamental level, a router is a device that receives packets and passes them on. This is called forwarding the packets. While this is an easy thing for a computer to do, Linux is typically delivered with this feature turned off. This is for security reasons – one wouldn’t want one’s computer being used as a router unexpectedly.
Enabling this feature is straightforward. To enable it immediately:
/sbin/sysctl -w net.ipv4.ip_forward=1
To enable this for subsequent boots, add to the file /etc/sysctl.conf:
net.ipv4.ip_forward = 1
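Either way, it’s worth confirming that the setting took effect. The kernel exposes the live value through the /proc filesystem, so a quick check is:

```shell
# Read the live forwarding switch: 1 means the kernel will forward
# packets between interfaces, 0 means it will not.
cat /proc/sys/net/ipv4/ip_forward
```

On SUN this should print 1 once forwarding has been enabled; on MERCURY and VENUS it will normally remain 0.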
No Cheating!
In this article, as I wrote previously, I am assuming that you have both MERCURY and VENUS on the same Ethernet. While this makes testing cheaper, because it doesn’t require an extra NIC, it does mean that SUN can cheat. And Linux is smart enough to cheat when it is possible.
In this case, if MERCURY and VENUS are on the same Ethernet, SUN will send a special kind of packet – an ICMP redirect – which tells each that the other can be reached directly. From a performance perspective, this is a good thing: If the two devices can communicate directly, they’ll communicate more quickly. But because this bypasses SUN, this defeats some of the purpose of our testing.
To prevent SUN from sending an ICMP redirect packet:
/sbin/sysctl -w net.ipv4.conf.all.send_redirects=0
To enable this for subsequent boots, add to the file /etc/sysctl.conf:
net.ipv4.conf.all.send_redirects = 0
Directing Packets
At this point, SUN is willing and able to forward packets between the two test devices MERCURY and VENUS. If you were to try to ping from one test device to the other, however, the packets would not reach the other device. If your configuration has closely followed what I’ve described, you’ll likely see the error connect: Network is unreachable if you attempt this.
This is a symptom of the Route Discovery Problem, and I’ll have a lot to say about this in subsequent articles. The issue is that MERCURY doesn’t know how to send packets to VENUS, or vice versa. Neither knows that the other can be reached by passing packets through SUN.
For now, we’re going to handle this in the simplest way possible. Each test device needs to know to where it should send packets destined for the other device. We’ll simply hardcode that knowledge for now within each test device. That is, we’ll tell MERCURY how to reach VENUS and VENUS how to reach MERCURY. Later, we’ll explore more scalable solutions to this problem.
Note that we don’t need to tell SUN anything. This is because SUN, thanks to commands we’ve run previously, has IP addresses active on both the network containing MERCURY and the network containing VENUS. When an IP address is added to an interface, a routing table is automatically updated so SUN knows how to reach any other address on the network to which that IP address belongs.
Routing tables are used by a computer/router to know to where packets must be sent – to where they should be routed – to reach their destination. Routing tables are clearly needed by routers. They are also needed by computers – even computers with but a single interface. Most of us deal with computers that have a default route: a route that indicates that packets with destinations otherwise unknown should be sent in the default direction.
Later, we’ll see how to use default routes. For now, we’re going to carefully avoid them, both to simplify our testing and to assure that we gain a complete understanding of how routing tables work. Default routes are wonderful things when used properly, but they can hide a lot of mistakes.
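While we’re on the subject of routing tables, one more command is worth knowing: /sbin/ip route get asks the kernel which route it would actually pick for a single destination, a handy sanity check throughout these tests. On MERCURY, once it knows how to reach VENUS, /sbin/ip route get 192.168.100.22 should name SUN (192.168.10.20) as the next hop. As a demonstration that works on any Linux box, the loopback address always resolves locally:

```shell
# Ask the kernel to resolve a single destination against its routing
# tables. The loopback address is always reachable via the lo interface.
/sbin/ip route get 127.0.0.1
```

Unlike ping, this consults only the routing tables; it tells you what the kernel would do without sending any packets.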
Shortly, we’re going to tell MERCURY how to reach VENUS. This will be accomplished by adding an entry directly to a routing table on MERCURY.
But let’s first look at how SUN already knows how to reach both MERCURY and VENUS and how these test devices know how to reach SUN. Let’s reexamine the command where we added an IP address to VENUS. Before we ran /sbin/ip addr add to add that IP address to VENUS, the routing table on VENUS looked like:
# /sbin/ip route show
192.168.7.160/27 dev eth0 proto kernel scope link src 192.168.7.176
169.254.0.0/16 dev eth0 scope link
After we run /sbin/ip addr add…, the table looks like:
# /sbin/ip route show
192.168.7.160/27 dev eth0 proto kernel scope link src 192.168.7.176
192.168.100.0/24 dev eth1 proto kernel scope link src 192.168.100.22
169.254.0.0/16 dev eth0 scope link
The new entry in the routing table indicates that VENUS can reach any IP address on the network 192.168.100.0/24 by sending the packet out interface eth1.
Before the addresses 192.168.10.20 and 192.168.100.20 are added to SUN, the routing table on SUN looks like:
# /sbin/ip route show
192.168.7.160/27 dev eth0 proto kernel scope link src 192.168.7.174
169.254.0.0/16 dev eth0 scope link
Once both /sbin/ip addr add … commands have been executed, the table looks like:
# /sbin/ip route show
192.168.7.160/27 dev eth0 proto kernel scope link src 192.168.7.174
192.168.100.0/24 dev eth1 proto kernel scope link src 192.168.100.20
192.168.10.0/24 dev eth1 proto kernel scope link src 192.168.10.20
169.254.0.0/16 dev eth0 scope link
Two new routes have appeared, telling SUN how to reach IP addresses on the 192.168.100.0/24 and 192.168.10.0/24 networks.
So how do we tell VENUS how to reach MERCURY, and vice versa? We make entries in these routing tables. Recall that MERCURY has an IP address 192.168.10.21 and VENUS has an IP address 192.168.100.22.
On MERCURY:
VENUS=192.168.100.22
SUN=192.168.10.20
/sbin/ip route add ${VENUS} via ${SUN}
What’s occurring is pretty much what the command says: This adds to a routing table the fact that VENUS can be reached via SUN. The routing table on MERCURY is now:
# /sbin/ip route show
192.168.100.22 via 192.168.10.20 dev eth1
192.168.7.160/27 dev eth0 proto kernel scope link src 192.168.7.175
192.168.10.0/24 dev eth1 proto kernel scope link src 192.168.10.21
169.254.0.0/16 dev eth0 scope link
The new entry is exactly what we expect: VENUS is reached via SUN using eth1.
We do roughly the same on VENUS:
MERCURY=192.168.10.21
SUN=192.168.100.20
/sbin/ip route add ${MERCURY} via ${SUN}
At this point, packets between MERCURY and VENUS will be directed along their proper routes.
Testing
Having reached this point, MERCURY should be able to send packets to VENUS, and vice versa, through SUN. This can be tested by running, on MERCURY:
VENUS=192.168.100.22
ping -n ${VENUS}
To see the router in use:
VENUS=192.168.100.22
traceroute -In ${VENUS}
To test from VENUS:
MERCURY=192.168.10.21
ping -n ${MERCURY}
To see the router in use:
MERCURY=192.168.10.21
traceroute -In ${MERCURY}
Future Topics
In this article, we’ve lightly touched upon a number of subjects. We’ve seen how to get the firewall out of our way. We’ve seen how to cause packets to be forwarded by a computer, turning it into a router. And, we’ve seen a simple way to tell computers how other computers on different networks can be reached. Each of these is just the beginning of a much larger topic that will be explored in future articles.
In addition, we’re also going to address such important issues as redundancy and traffic control in future articles.
Ultimately, the reader of these articles will find him or herself able to populate a data center with a collection of inexpensive yet powerful routers built from commodity hardware and running Linux, likely the same operating system in use on the other servers within the data center. This will save money on both hardware and staff, and also blur the line between “network” and “system” administration.