Setting up a failover system with OpenWRT

OK, this is less a tutorial and more a “I did it this way, let me document it before I forget” thingie, but let's try it anyway, it might help someone out there.

Here's what I'm trying to do:

I'm assuming we want everything to go through the primary unless it's down, because for most of us on a regular budget the secondary service is going to be metered, for example T-Mobile's back-up service (which is what I use.)

Why not failover IPv6? Because it doesn't work like that alas. If your secondary service supports IPv6 properly, you might be able to set up a system so your computers failover automatically based upon... metrics? I'm not entirely sure. But IPv6 itself doesn't lend itself to failover without the cooperation of both ISPs – that is you can't have the routing for a prefix suddenly change ISP. At best you can use NAT with the ULA range which might help IPv6-only hosts use the Internet, but most of us don't have those.

I would have looked into it more, but the secondary service is T-Mobile, and T-Mobile doesn't give out prefixes. I don't know why. They're using CGNAT for everything, so they should. I suspect it's because some marketing guy looked at the fact they're providing mobile Internet and said “Why not provide at home Internet in exactly the same way” which is horrible.

Anyway if you feel the above simplification is wrong, feel free to let me know by mentioning @poundquerydotinfo@forum.virctuary.com on your favorite ActivityPub based social media site. I'm open to being educated, and if there is a good way to use the limited IPv6 service provided by T-Mo I'm all ears.

While on the subject of IPv6, I'm assuming here you're using an IP connection to your primary ISP that uses DHCP-PD to provide IPv6 services. If your ISP uses PPP/PPPoE/etc, then look elsewhere. I'm also assuming regular DHCP is used to provide a real, routable, IP.

The basic steps we're going to take with this:

1. What you need.

You will need an OpenWRT router with multiple LAN ports and a WAN port.

You will need to copy some stuff from your old router. The obvious ones are things like your current port forwarding rules. The less obvious ones are:

That last one may be rough to get from the router itself, but you can look at things connected to it and see what prefix they have. For example, on GNU/Linux, typing 'ip -6 addr' will list some addresses that look something like 2608:329:129a:9312:cc99:1924:3eff:ff00. You're looking for something that doesn't begin with fe or fd, and not 2002 either. If you find the key address starts with 2002, then you're not using DHCP-PD, you're using something called 6to4, which is a great technology, but alas not relevant here.
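If you'd rather not eyeball the list, here's a rough sketch of the same sorting rule using Python's standard ipaddress module. The function name classify and the labels it returns are my own inventions for illustration; all it does is apply the "not fe, not fd, not 2002" rule from above to an address copied out of 'ip -6 addr'.

```python
import ipaddress

# Ranges taken from the rule of thumb above. fc00::/7 is the ULA block
# (in practice the addresses you see start with "fd"), and 2002::/16 is 6to4.
ULA = ipaddress.IPv6Network("fc00::/7")
SIX_TO_FOUR = ipaddress.IPv6Network("2002::/16")


def classify(address: str) -> str:
    """Return a rough label for an IPv6 address as printed by 'ip -6 addr'."""
    # Drop any "/64" style suffix and any "%eth0" style zone index.
    addr = ipaddress.IPv6Address(address.split("/")[0].split("%")[0])
    if addr.is_loopback:
        return "loopback"
    if addr.is_link_local:
        return "link-local"   # fe80::/10, never routed
    if addr.is_site_local:
        return "site-local"   # fec0::/10, deprecated, also starts with "fe"
    if addr in ULA:
        return "ULA"          # internal-only addressing
    if addr in SIX_TO_FOUR:
        return "6to4"         # not DHCP-PD, so not relevant here
    return "candidate"        # probably the address you want


for text in ("fe80::1/64", "fd12:3456:789a::1/64",
             "2608:329:129a:9312:cc99:1924:3eff:ff00/64"):
    print(text, "->", classify(text))
```

Anything labeled "candidate" is worth a closer look. It's only a filter, so it won't tell you whether the address actually came from your ISP's delegation.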

Anyway, once you've found an address, the prefix bit is easy. If you look at it, you'll see it's divided into chunks separated by colons. There might be eight chunks, in which case you're interested in the first four, or there may be a double colon, in which case you're interested in the part before the double colon.

Either way, make a note of what that prefix is, put a double colon after it, and write “/64” after that, eg:

2608:329:129a:9312::/64
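If counting chunks by hand feels error prone, the same calculation can be sketched with Python's standard ipaddress module. The helper name slash64_prefix is made up for this example, and it assumes your network really is a /64, which is the usual case.

```python
import ipaddress


def slash64_prefix(address: str) -> str:
    """Return the /64 network containing the given IPv6 address."""
    # Ignore any prefix length already attached and force /64.
    bare = address.split("/")[0]
    return str(ipaddress.IPv6Interface(bare + "/64").network)


print(slash64_prefix("2608:329:129a:9312:cc99:1924:3eff:ff00"))
# prints 2608:329:129a:9312::/64
print(slash64_prefix("2001:db8:1:2::5"))
# prints 2001:db8:1:2::/64
```

Note the module always writes the result with a double colon before the “/64”, which is the standard way of saying “the rest is zeros”.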

Why do we need the current IPv6 settings? Well, because some ISPs don't react kindly to being asked for a prefix delegation when they've already sent one to your old router. So we're going to set this up statically, and when it stops working, we'll change it to use DHCP-PD.

2. Checking the router works and can be used for this project.

Set up the new router and hook a spare computer directly up to it, plugged into the LAN port furthest from the WAN port (the reason for this will become clear in a bit.) You may need to manually give the spare computer an IP address, 192.168.1.20 is usually fine.

Log in using 'luci' (the web interface at https://192.168.1.1; be prepared to tell the web browser to trust the self-signed certificate for that site). The username is root and there's no password, so immediately go into System –> Administration and set one.

Make it all look nice, and navigate to the Network –> Interfaces tab. Click on “Devices” and you should see a device called “br-lan”. This is basically the “internal” side of your network, and the first thing we're going to do is see if this project is even possible by removing one of the LAN ports from it.

So go to “Configure” and you should see a “Bridge Ports” drop down. Deselect the port physically nearest the WAN port on your router (I deselected LAN4 on mine and will refer to that port as LAN4 from here out.)

Hit SAVE and then SAVE AND APPLY.

Were you able to get this far? Then you're probably OK. If you couldn't remove a LAN port from br-lan, then at this point you need to try a different router. There may be other solutions, I just don't know them.

3. Configure the LAN

My LAN uses its own DHCP server and DNS server and is on the 10.0.x.x private IP range. Your needs may vary from that. I'll try to make these instructions generic but bear in mind my assumptions might not fit your local network.

Go back to the Interfaces tab, and edit LAN. Make the following changes (or check they're already set):

Save, and if you're ready to make this live Save and Apply. If you run your own DHCP server and cleared the Dynamic DHCP setting you can now unplug your computer and plug the router into your main network, otherwise you'll need to figure out how to make sure your spare computer can connect. That's if you hit Save and Apply. Which you don't need to yet.

4. Configure WAN and WAN6

Go back to the Devices tab, and hit “Configure” next to “wan”. Change the MAC address to the MAC address of your old router's WAN port (See above!), and make sure “Enable IPv6” is checked. Save (and Apply if you want.)

You'll notice the MAC address is now listed on the Devices tab in bold.

Go to the Interfaces tab, and hit Edit on WAN.

For “Use gateway metric”, set it to “10”.

I set Client ID to the MAC address of the old router's WAN port but honestly I don't know if that's necessary. (Remove the colons if you do that.) Use Default Gateway should also be set.

On the DHCP server, make sure “Ignore interface” is on.

Save (and apply if you want.)

Now for WAN6! This part of the process WILL involve taking your connection to the Internet down. Although if things work then, well, you won't need to restore the old router.

So first, let's check the WAN6 settings are reasonable.

So... first, go to your existing router and change its IP address to something else.

Then go to Interfaces on the OpenWRT router, and edit WAN6:

The first page should be “Protocol: DHCPv6 client”, device “wan”, bring up on boot checked, request IPv6 address “try”, and “Request IPv6 prefix of length: Automatic”.

For most people, those should be fine. I'm told you can set “Request IPv6 prefix of length” to “60” for Comcast, but 64 should work and most readers of this blog will have no reason to set it to anything lower than 64.
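To put numbers on the 60 versus 64 thing: each LAN needs one /64, so the prefix length decides how many separate networks you could carve out. Here's a small sketch of the arithmetic in Python. The function name lan_subnets is mine, and the 2001:db8 addresses are just the documentation range, not anything an ISP will hand you.

```python
import ipaddress


def lan_subnets(delegated_prefix: str) -> int:
    """Count the /64 networks that fit inside a delegated prefix."""
    net = ipaddress.IPv6Network(delegated_prefix)
    if net.prefixlen > 64:
        return 0  # smaller than a /64, not enough for a normal LAN
    return 2 ** (64 - net.prefixlen)


for example in ("2001:db8:0:10::/60", "2001:db8:0:10::/64"):
    print(example, "->", lan_subnets(example), "x /64")
# prints:
# 2001:db8:0:10::/60 -> 16 x /64
# 2001:db8:0:10::/64 -> 1 x /64
```

With a single LAN, one /64 is all you need, which is why asking for 64 is fine for this setup.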

On Advanced Settings, use default gateway should be checked, as should IPv6 source routing and Delegate IPv6 prefixes. IPv6 assignment length should be disabled. IPv6 Prefix filter should remain at “please choose”. The other settings aren't important here.

Firewall settings should put it in the WAN and WAN6 group (or WAN, WAN6, and WANB if you skipped ahead in these instructions.)

Save if need be, and now edit LAN. On the IP address section, add the IP of your old router. You don't have to change the IP of the new one, just add the new IP. This means your current web session will remain up.

Save and Apply everything at this stage. We're going live.

Unplug the plug from the old router. Plug it into the OpenWRT router.

You should:

If this is what you see, skip the next section. You may want to do some pinging etc from your computers. Be aware pinging IPv6 addresses may be an issue if your old router is still online; you may have to manually reset your computer's route. 2001:4860:4860::8888 is a good address to test with. Pinging IPv4 addresses might also not work right away as the routing information your computer caches may still point at the older router. Give it a few minutes, reboot if you have to.

5. WAN6 IPv6-PD issues

First, to be clear, if you can't see an IPv6 address, I don't really have any help to offer. If you don't see one, but do see an IPv6-PD, then you probably don't need to worry anyway, just verify IPv6 is working for your network clients. Again, you may need to reset the computer you're testing it with to make sure it picks up the new delegation.

But as for the IPv6 PD being missing, that's the issue I ran into:

There are two good reasons why the default, automatic, configuration for WAN6 might fail. The first is a bug in some versions of OpenWRT that means if you set a ULA (an IPv6 address allocation intended for internal networks, similar to 192.168.x.x or 10.x.x.x) then in some cases OpenWRT won't try to get a prefix delegation from the ISP. The second is uglier, and is the reason why I told you to note the router's IPv6 information.

How do we find out?

Go to the Global Network Options tab and clear the IPv6 ULA-Prefix. Save and Apply.

Now reboot the router, log back in, and go to Network –> Interfaces. Do you now see IPv6-PD information for WAN6? If so, great, the ULA was the issue.

If not, don't put the ULA back yet, as you might have both problems. Instead you're going to change WAN6 to a static configuration.

From the Interfaces tab, edit WAN6 and in the General Settings tab change the protocol to “Static address”. It'll come up with a “Really change protocol?” question, click on the Blue button to change it.

Leave all the IPv4 settings alone, and put the settings you gathered in section 1 in for the IPv6 address, the IPv6 gateway, and the IPv6 routed prefix. The prefix should be formatted with a trailing double colon and /64, eg:

2608:329:129a:9312::/64

Now save, Save and Apply, and restart the Interface. At this point it may look like it's working. But you don't know yet, because all those settings will show as the status even if the cable's unplugged. So you'll need to test connectivity, and again, you may have to reboot the computer or something similar but more complicated to explain to confirm IPv6 connectivity is working.

If this doesn't work, I don't have any other options.

If this works, then the likelihood is that your ISP won't give out prefixes until they expire and that's the problem. To fix this you'll just have to wait until your IPv6 connectivity stops working. That'll mean the PD has expired, and you can then change your settings back to use DHCPv6 to configure WAN6 and it should work at this point. This will probably take several hours, days, or possibly weeks or months.

6. mwan3

At this point, your network should be correctly set up. You should have IPv4 connectivity, you might have IPv6 connectivity or you might have given up on it but decided to plod on regardless. You now need to install a package called 'mwan3' which does the failover (or load balancing if you prefer, but we're doing failover.) To install this:

Go to System –> Software, hit the Update Lists... button, and then when it's finished search for 'luci-app-mwan3' and hit install. Installing this installs the web interface and also installs the core mwan3 package.

Go back to Network, and look at the Interfaces tab. There's a good chance it's added 'WANB' and 'WANB6' there. If so, delete WANB6, and edit WANB to match the settings you'll be using for your secondary service.

In particular:

* In General Settings, set the Device to the physical LAN port closest to the WAN port.
* In Advanced Settings, set the “Use gateway metric” to 20.
* In Firewall Settings, add WANB to the firewall zone containing WAN and WAN6.

If you don't see WANB, then create one using the “ADD NEW INTERFACE” button.

* In General Settings, the protocol will normally be DHCP client, and the Device is the physical LAN port closest to the WAN port.
* Bring up on boot should be checked.
* In Advanced Settings, set the “Use gateway metric” to 20.
* Again, Firewall Settings should assign it to the same zone as WAN and WAN6.

The very, very, important stuff is that WANB has the Device being the physical LAN port closest to the WAN port, the gateway metric should be 20, and the Firewall settings should assign it to the same zone as WAN and WAN6.

7. Configure mwan3 for failover

The default configuration for mwan3 has a lot of choices that aren't really necessary and may interfere with what you actually want to do. Despite what you may have read elsewhere, mwan3 does not appear to use each interface's metric setting, but its own metric configuration instead. So let's quickly run through the MultiWAN configuration at Network –> MultiWAN Manager.

The first, default, Globals tab doesn't actually have anything on it you need to worry about.

The Interfaces tab should show the two WAN devices you'll be using, WAN and WANB. You can delete WAN6 if it appears, and WANB6 which you probably haven't set up.

Both interfaces should be enabled; if either isn't (WANB wasn't in my case), enable it using EDIT.

The Member tab specifies multiple versions of each Interface with a different metric and weight attached. You should, by default, see wan_m1_w3 (a configuration of the WAN device with a metric of 1 and a weight of 3) and wanb_m2_w2 (WANB, metric 2, weight 2.) Those are the only two we care about. Create them if they don't exist.

(The “metric” here is the one mwan3 actually seems to care about. The lower the number, the greater priority is given to using the interface over other interfaces. If two members have the same metric, the weight is used to determine what fraction of the traffic goes over it. For our purposes we only care about the metric. So what we're making sure of is that we have a member for wan with a metric of one, and a member for wanb that has a metric of two, so we can make sure traffic always runs over wan if it's available and only over wanb if wan isn't.)
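If the metric and weight business is still fuzzy, here's a toy model of the rule as I've described it, written in Python. To be clear, this is not mwan3's code, and the Member class and traffic_shares function are names I made up. It just picks the lowest metric among the interfaces that are up and splits traffic by weight between any members that tie.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Member:
    name: str       # e.g. "wan_m1_w3"
    interface: str  # e.g. "wan"
    metric: int     # lower wins
    weight: int     # only matters between members with equal metric


def traffic_shares(members, online):
    """Return {interface: fraction of traffic} for the interfaces in use."""
    usable = [m for m in members if m.interface in online]
    if not usable:
        return {}  # nothing is up, so the policy's last resort applies
    best = min(m.metric for m in usable)
    chosen = [m for m in usable if m.metric == best]
    total = sum(m.weight for m in chosen)
    return {m.interface: m.weight / total for m in chosen}


members = [Member("wan_m1_w3", "wan", 1, 3),
           Member("wanb_m2_w2", "wanb", 2, 2)]
print(traffic_shares(members, {"wan", "wanb"}))  # {'wan': 1.0}
print(traffic_shares(members, {"wanb"}))         # {'wanb': 1.0}
```

With both links up, everything goes over wan because its metric is lower, and wanb's weight never comes into play. Take wan away and wanb gets everything. That's failover. If you gave both members the same metric you'd get a 3:2 split instead, which is load balancing.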

You can ignore the other members, you don't have to delete them. Just make sure wan_m1_w3 and wanb_m2_w2 exist.

The Policy tab lists policies, these are basically a group of members to route stuff to (taking into account their metrics and weights) and a last resort action if it can't route data via the members. The only thing you need to make sure of here is that wan_m1_w3 and wanb_m2_w2 are in a policy, which they should be: there should be one called wan_wanb. It groups those together, and says the router should respond the network is unreachable if both are unable to route anything. If it doesn't exist, you can create it. If it contains IPv6 members, feel free to delete them.

Finally, how are these policies used? Well, they're implemented by the Rules, the penultimate tab. The Rules tab should show three rules, only one of which you actually want. One is an IPv6 rule (why?), one is an example HTTPS rule. You can delete both. The only one you actually want (assuming it's there) is “default_rule_v4”. Right now it's almost certainly set up for load balancing, so edit it and change the Policy (now you see how they're used!) to the wan_wanb policy. (Once everything is working, if there's actually anything you want to be specifically routed differently, you can return here and set that up.)

Save and Apply.

8. Check MultiWAN configuration

At this point, in all honesty, it probably all works already. You can verify by visiting http://checkip.amazonaws.com/. Assuming the primary WAN is working, you should get the IP of the primary WAN returned back to you.
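If you want to script that check from a computer on your LAN (handy when you're unplugging cables to test failover), something like this Python sketch would do it. The function names are my own, and it assumes the service keeps replying with nothing but the address as plain text, which is what it does at the time of writing.

```python
import ipaddress
import urllib.request

CHECK_URL = "http://checkip.amazonaws.com/"


def parse_reply(body: bytes):
    """Turn the raw reply (an address plus a newline) into an address object."""
    return ipaddress.ip_address(body.decode("ascii").strip())


def fetch_public_ip(timeout: float = 10.0):
    """Ask the check service which address our traffic appears to come from."""
    with urllib.request.urlopen(CHECK_URL, timeout=timeout) as response:
        return parse_reply(response.read())


def went_out_via_primary(seen, primary_wan_ip: str) -> bool:
    """Compare the address the service saw with the primary WAN's address."""
    return seen == ipaddress.ip_address(primary_wan_ip)
```

Call fetch_public_ip() and pass the result to went_out_via_primary() along with the IPv4 address shown for WAN on the Interfaces tab. If you pull the primary's cable and the answer changes, failover is doing its job.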

If it's not, you can verify the settings are correct:

Go to Network –> MultiWAN Manager –> Interface

It should show metrics of 10 and 20 for WAN and WANB respectively. If it suggests the metrics are missing, go back to the Network –> Interfaces and edit the respective interface.

Go to the Rules tab:

Make sure there is only one rule, default_rule_v4, and that it's set up with the wan_wanb policy.