IP-in-IP Tunneling Over Road Runner With Linux
(Advanced Users Only)
Phil Karn
17 February 1998

If you happened on this paper without first reading "Configuring LINUX for San Diego Road Runner", please go there first.

You should set up IP-in-IP tunneling in /etc/dhcpsetup rather than in any other startup script, because that is the only way to guarantee it will not execute until dhcpcd has obtained an IP address and default gateway -- and you need this information to configure the tunnel interface.

The following sections will take you through the configuration of an IP-in-IP tunnel.

Configuring the kernel

You must have IP-in-IP tunneling configured into the Linux kernel. Go into /usr/src/linux, run make menuconfig, select "Network options", enable "IP: tunneling", save the new configuration, then rebuild and boot the new kernel.
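
For a 2.0-series kernel tree the whole sequence is roughly the following (the exact make targets and the boot-loader step depend on your installation):

cd /usr/src/linux
make menuconfig            # under "Network options", enable "IP: tunneling" (CONFIG_NET_IPIP)
make dep && make clean && make zImage
make modules && make modules_install
# install the new zImage with lilo (or your boot loader) and reboot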

Setting up the remote tunnel

The Internet as a whole must route the block of IP addresses you will use on your LAN to the remote tunnel machine. You might obtain a distinct block of IP addresses from IANA for this purpose, or you might take a chunk of the tunnel machine's local network address space and use proxy arp to get it to the tunnel machine -- the details are beyond the scope of this note.
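
Very roughly, the proxy-arp approach means publishing an ARP entry on the remote tunnel machine for each borrowed address, so that the routers on its LAN hand those packets to it; something like the following, where the Ethernet address shown is only a stand-in for the tunnel machine's own:

# on the remote tunnel machine, one published entry per borrowed address
arp -s 199.106.106.2 00:60:08:12:34:56 pub
arp -s 199.106.106.3 00:60:08:12:34:56 pub
# ...and so on for the rest of the block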

You must also set up the remote tunnel machine to encapsulate packets addressed to this block and send them to your current RR-assigned IP address. For example, if the block of addresses assigned to your home LAN is 199.106.106.0-15 and your current RR-assigned IP address is 204.210.36.102, then if the remote tunnel machine is running Linux you'd issue the command

# route add -net 199.106.106.0 netmask 255.255.255.240 dev tunl0 gw 204.210.36.102
You must update this entry whenever your RR IP address changes, so that your incoming packets are tunneled to the correct place. (I'm currently doing this manually, but I'm working on an automatic mechanism.)
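
Until then, the manual update on the remote tunnel machine amounts to something like this (say the RR-assigned address has just changed to 204.210.36.117):

route del -net 199.106.106.0 netmask 255.255.255.240
route add -net 199.106.106.0 netmask 255.255.255.240 dev tunl0 gw 204.210.36.117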

IP source address ingress filtering

Road Runner ingress-filters IP source addresses. That is, all packets you send into the RR network must have your RR-assigned IP address in the source field or they will be silently discarded. This means you must also tunnel your upstream packets, or at least those from the other machines on your LAN.

The kernel gives you virtual tunnel interfaces with names tunl0, tunl1, etc. You generally need only tunl0, regardless of how many remote tunnel sites you talk to. You must configure the tunnel interface's local address to be your RR-assigned IP address, as this is the address that will appear in the IP source address field of tunneled packets:

# reset the tunnel driver, in case we're restarting
ifconfig tunl0 down
ifconfig tunl0 $IPADDR
The $IPADDR variable will have been set by dhcpcd to the IP address assigned by Road Runner. Note that the standard version of dhcpcd does not set this variable -- it's only in my version.

Next we want to get rid of the default route given to us by the RR DHCP server and replace it with one of our own that points to the tunnel interface. The gateway parameter becomes the destination IP address in the outer IP header, so it must be the IP address of the remote tunnel machine:

route delete default gw $ROUTER
route add default gw $TUNNEL dev tunl0
where $ROUTER is set by dhcpcd to the local RR router and $TUNNEL has been explicitly set at the top of the script to the IP address of the remote tunnel system.

But this creates a problem. As our routing table currently stands, the encapsulated packets will match the default route and be encapsulated again and again until they grow too large and are discarded by the kernel -- this is called an "encapsulation loop". To break this loop, we need a host route for the tunnel system that points to the RR default gateway:

route add -host $TUNNEL gw $ROUTER
where $TUNNEL is again set to the IP address of the remote tunnel system. The $ROUTER variable is automatically set by dhcpcd to the default Road Runner gateway.

Now your Linux system and all other systems behind it can send packets over Road Runner. To the external Internet, however, they will appear to have come from the network on which the remote tunnel sits.

Accessing Road Runner servers

Road Runner's servers (such as they are) are generally configured to block or reject requests from outside Road Runner. For example, a firewall keeps you from connecting to port 119 (NNTP - netnews) on 204.210.0.2 from a non-RR IP address. And that's exactly the position our tunnel puts us in. Not only will the hosts on your LAN behind your Linux box be unable to talk to the Road Runner NNTP server, but the same is true for your Linux box itself -- even though it is directly connected to the Road Runner modem and is using the RR IP address in its upstream packets! You get an "ICMP Destination Unreachable - admin prohibited filter" packet, which telnet shows as a "No route to host" message. Why? Because the default route we just created tunnels all our outgoing packets through the remote tunnel, even those with our RR IP address in their source fields. And the remote tunnel machine then tries to route these packets back to Road Runner from outside its network -- and an overly paranoid firewall router at Road Runner blocks the attempt.

The ideal solution to this problem would be policy-based routing, where the Linux kernel could encapsulate only those upstream packets that did not already have the RR IP address in their source field. I understand that policy routing is available as an add-on for Linux, but I haven't installed it yet. So as an imperfect stopgap, I install several more host-specific routes:

route add -host 204.210.0.2 dev eth1 gw $ROUTER
route add -host 204.210.7.25 dev eth1 gw $ROUTER
route add -host 166.48.172.14 dev eth1 gw $ROUTER
The first entry is the RR NNTP server. The second entry is the University City TAS (Toshiba Authentication Server), needed by rrlogin and also for DNS queries (the TAS is also a DNS server -- and it too refuses to answer queries from non-RR IP addresses). You must use the IP address of the TAS in your area; you can discover it by running the rrlogin program with the -d (debug) flag.
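
For example, something like

rrlogin -d rr_login_name rr_password

should show the TAS address in its debugging output (the exact invocation is an assumption based on the startup command shown later).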

The last entry is the IP address of the MCI interface on RR's main Cisco router; it is also periodically used by rrlogin to verify that it is still logged in. The test involves trying to connect to TCP port 8080. If the connection is refused or accepted, this is taken as confirmation that we're still logged in. If the connection attempt times out, however, rrlogin assumes we've been quietly logged out and tries to log in again. Why this particular IP address? Because it was the closest entity within the RR system with an unused TCP port that isn't blocked by the internal RR firewalls. All other internal RR hosts, such as the proxy servers, are heavily firewalled to allow connections only to their service ports. I originally opened test connections to one of the RR web proxy servers, but opening and closing a TCP connection seemed like more overhead on both ends than issuing a connection request to an unused port that simply responds with a TCP reset. Cisco routers are also more stable than UNIX hosts running web proxies, so this helps avoid false alarms.

Note that these ad-hoc routing entries still only allow the Linux system to talk to the RR servers. If you want to use them from your other machines, you'll have to install an application relay on the Linux gateway so the connections will appear to RR to have come from the Linux system with the RR address. SSH TCP connection forwarding is one way to do this.
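
For example, with ssh's TCP forwarding a LAN host can reach the RR news server through the gateway roughly like this (the local port, user name and gateway name are illustrative):

# on a LAN host: forward local port 1119 to the RR NNTP server via the Linux gateway
ssh -L 1119:204.210.0.2:119 user@linux-gateway

and then point the newsreader at port 1119 on localhost.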

Talking to other local Road Runner users

The RR DHCP server assigns a "Class C" subnet mask along with an IP address, so dhcpcd automatically installs the appropriate subnet route. Unfortunately, this keeps hosts on your LAN (other than the Linux host acting as router) from sending packets to any other RR users on your same subnet. That's because these packets will be routed directly to the Ethernet without being encapsulated first, so they are dropped by the RR ingress filter.

Again, source policy routing would be the ideal solution here, but an expedient is to delete the route to the RR subnet and replace it with a host-specific route to the cable router:

route delete -net $SUBNET netmask $NETMASK
route add -host $ROUTER dev $DHCP_DEVICE
$NETMASK, $ROUTER and $DHCP_DEVICE are set by dhcpcd, while $SUBNET must be set to the class-C subnet assigned by Road Runner. Since you don't know what this will be in advance, you have to compute it from $ROUTER and $NETMASK. Here's mksubnet, a trivial program that will do just that.
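
If you don't have mksubnet handy, a rough shell equivalent looks like this. It relies on the fact that for a contiguous netmask octet m (all that the RR DHCP server hands out), a AND m equals a minus (a modulo (256 - m)):

# compute $SUBNET by ANDing each octet of $ROUTER with the matching octet of $NETMASK
OIFS="$IFS"; IFS=.
set -- $ROUTER;  R1=$1; R2=$2; R3=$3; R4=$4
set -- $NETMASK; M1=$1; M2=$2; M3=$3; M4=$4
IFS="$OIFS"
S1=`expr $R1 - $R1 % \( 256 - $M1 \)`
S2=`expr $R2 - $R2 % \( 256 - $M2 \)`
S3=`expr $R3 - $R3 % \( 256 - $M3 \)`
S4=`expr $R4 - $R4 % \( 256 - $M4 \)`
SUBNET=$S1.$S2.$S3.$S4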

Now the only host that remains directly inaccessible from your LAN is the remote tunnel system; you have to log into your Linux system first and connect from there. (The local RR router is similarly inaccessible, but that doesn't matter since you don't log directly into it anyway.)

So now we're just about done. Finally we have to start the rrlogin daemon:

if [ -f /var/run/rrlogin.pid ]
then
        kill `cat /var/run/rrlogin.pid`
        rm /var/run/rrlogin.pid
fi
rrlogin -t 1000 rr_login_name rr_password

I have ordered the various routing commands above for purposes of explanation; in actuality, there are certain dependencies that must be respected. For example, you cannot route to a device until it has been configured, and you cannot specify a gateway unless it already has a route. So for completeness, here is my /etc/dhcpsetup file with a few minor edits for security.
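
In outline, a sketch of such a file -- the commands above, assembled in one ordering that respects those dependencies -- looks like this (the $TUNNEL address, the mksubnet argument order and the rrlogin credentials are placeholders):

#!/bin/sh
# /etc/dhcpsetup -- run by dhcpcd after it obtains (or renews) a lease.
# dhcpcd supplies $IPADDR, $ROUTER, $NETMASK and $DHCP_DEVICE in the environment.
TUNNEL=192.0.2.1                # address of the remote tunnel machine (placeholder)

# configure the tunnel interface with the RR-assigned address
ifconfig tunl0 down             # reset the tunnel driver, in case we're restarting
ifconfig tunl0 $IPADDR

# host route to the remote tunnel machine via the RR router; this breaks the
# encapsulation loop and must exist before the default route below can name
# $TUNNEL as its gateway
route add -host $TUNNEL gw $ROUTER

# replace the RR default route with one through the tunnel
route delete default gw $ROUTER
route add default gw $TUNNEL dev tunl0

# direct routes to the RR servers that refuse connections from outside RR
route add -host 204.210.0.2 dev eth1 gw $ROUTER         # RR NNTP server
route add -host 204.210.7.25 dev eth1 gw $ROUTER        # TAS (use the one for your area)
route add -host 166.48.172.14 dev eth1 gw $ROUTER       # rrlogin keepalive target

# replace the RR subnet route with a host route to the RR router
SUBNET=`mksubnet $ROUTER $NETMASK`      # argument order is a guess
route delete -net $SUBNET netmask $NETMASK
route add -host $ROUTER dev $DHCP_DEVICE

# (re)start the login daemon
if [ -f /var/run/rrlogin.pid ]
then
        kill `cat /var/run/rrlogin.pid`
        rm /var/run/rrlogin.pid
fi
rrlogin -t 1000 rr_login_name rr_password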

I have spent considerable effort on making dhcpcd, the Linux kernel, rrlogin and /etc/dhcpsetup robust enough to recover automatically from a Road Runner outage that forces an IP address change. I'm not there yet; I still need to write an automatic "registration" program that changes the routing entry on the remote tunnel, and there seem to be some unresolved dependencies that cause dhcpsetup to hang. So I'm still working on them.

Performance

The tunnel machine I use at Qualcomm is a humble 486-50 running Linux. It has been surprisingly reliable, staying up for months at a time. And the link from RR to CERFnet has also been pretty solid, so reliability has not been much of an issue. CPU loading also seems to be not much of an issue, even on such a slow machine. Since the tunnel machine has only one 10Mb/s Ethernet interface, every tunneled packet has to travel over this interface twice -- thus limiting the theoretical peak speed to 5 Mb/s. But the path bottleneck is usually elsewhere, so this hasn't been a problem for me.

Because room must be left in every tunneled packet for an outer IP header, the MTU of the tunnel interface is reduced to 1480 bytes. 1500-byte packets (the usual maximum size for an Ethernet) must therefore be fragmented at the IP level before they can be tunneled. This makes more work for the two tunnel machines, though they seem to be able to handle it. Also, the network path between my two machines is not too lossy, which helps considerably. You can minimize fragmentation by having the machines on your LAN that use the tunnel reduce their TCP MSS values to allow for the encapsulation header added at the tunnel. Linux provides per-routing-entry MTUs, which makes this easy. For example, if you have a Linux system on your LAN (not the tunnel machine) you could install your default route like this:

route add default gw local_ip_address_of_gateway mss 1480
(note the use of 'mss' where the route command's authors really meant 'mtu'.)

Windows 95 also lets you set the IP MTU and/or the TCP MSS by hacking the system registry; I don't think it can be done on a per-destination basis, though.

In any event, IP fragmentation is becoming less of an issue these days now that Path MTU Discovery is pretty widely implemented.

Concluding thoughts

It's clear from the complexity of the foregoing that while IP-in-IP tunneling is architecturally very clean and simple, in practice it can get rather messy. Here are the problems as I see them:

Tunneling is most useful to support servers on the LAN behind the Linux tunnel system that must accept traffic initiated from the external Internet (this is the definition of a server). It is not really necessary to support clients on those same machines, as IP masquerading could be used instead. Masquerading has the distinct advantage over tunneling of reduced overhead and increased reliability, as every packet no longer has to go to the remote tunnel and back.
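
For comparison, under a 2.0 kernel with IP forwarding and masquerading compiled in, the masquerading alternative amounts to little more than one ipfwadm rule (the source block here is just the example block from above; a private address block would do as well):

# masquerade everything forwarded from the LAN block behind the RR-assigned address
ipfwadm -F -p deny
ipfwadm -F -a m -S 199.106.106.0/28 -D 0.0.0.0/0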

Masquerading does break certain applications, specifically those that relay IP address information in the application layer, (e.g., in the FTP PORT command) or that encrypt the higher layer information that masquerading "snoops" (such as IPSEC). But most of the "important" applications (in the sense of accounting for the majority of network traffic) have either been modified to work with masquerading, or are already insensitive to it. So it seems worthwhile to build a hybrid scheme that would use masquerading whenever possible, and fall back to tunneling when necessary. I believe this can be done in Linux with, at most, minor changes to existing mechanisms. That's my next project.

Last modified: 18 Feb 1998
