Building a WAN — Full mesh, dynamic tunnels and horses

I have been in network technology longer than I care to mention, the bulk of that time spent as a network architect at Cisco Systems. The principles of networking, in a Cisco world, have been drilled into me.

Now that I am working in the industry, outside the traditional approaches, I get asked a lot about why I do not believe in dynamic tunnels and full mesh for enterprise architecture. Both of these related approaches do have advantages, and they have their place; however, times have changed in the next-gen WAN world we are in now.

Full disclosure: I now work with Turnium Technology Group. However, I do not want this to be a promotion of what they do; rather, these are my observations on the architecture options available to service providers and enterprises.

Let’s start with dynamic tunnels. They sound like something you have to have; however, they are not as useful as they first seem, and they create more complexity than can be justified. In other words, the return for the effort is low.

Some history first.

One of the first dynamic tunnel technologies, on which most are based in one way or another, is Cisco DMVPN. It was developed for hub-and-spoke tunnel architectures, such as IPSec, so that tunnels did not always have to terminate on a central hub. The main issue with hub-and-spoke IPSec architectures was that there would be a limited number of hubs, i.e. IPSec access servers, creating the likelihood of hair-pinning of traffic. So, for example, if the IPSec aggregator was in New York and a voice call was established between San Francisco and San Jose, you would not want the call path to go through New York.

This was a concern for voice traffic in particular, and so DMVPN was born as a workaround. This approach solved the hairpin issue by recognising that there was traffic that needed to go spoke to spoke; a new IPSec tunnel would then be established between the branches for that voice traffic. However, it came at a cost. The hub now had to act as a control point (a Next Hop Server), and new protocols needed to be configured to communicate the need for the tunnel dynamically. And as anyone who has done it can attest, it was quite a tricky thing to configure and manage.

Cisco DMVPN

In today’s world, however, using a software-based architecture, hair-pinning of voice traffic is not an issue: the trip to a hub (aggregator) and on to another branch can be kept to a minimum by smart aggregator placement, since aggregators are just software components.

More on this in a moment. But first let’s discuss the cousin of dynamic tunnels: full-mesh tunnels. This seems like a no-brainer. Let’s connect every site to every other site with a secure, encrypted tunnel. This looks like a great way to build a private network over public infrastructure. You can even do QoS, of a sort, on these tunnels. I recall Cisco’s V3PN story: Voice and Video over VPN.

However, how many tunnels are required to realise this architecture? Thankfully there is a formula, which I had to look up: the number of links is N*(N-1)/2, where N is the number of nodes.

If there were 100 sites, that would mean 100*99/2 = 4,950 tunnels. Wow. And that assumes only one link per site. Most sites would have at least two links, so any given site could terminate around 200 tunnels.
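The arithmetic above can be checked with a quick sketch. This is illustrative only; the per-site figure assumes one tunnel per local link to each remote site, which is how the "around 200 tunnels" number falls out:

```python
def full_mesh_tunnels(n_sites: int) -> int:
    """Total tunnels in a full mesh: one per pair of sites, N*(N-1)/2."""
    return n_sites * (n_sites - 1) // 2

def tunnels_per_site(n_sites: int, links_per_site: int = 1) -> int:
    """Tunnels terminating at one site, assuming one tunnel per
    local link to each of the other sites (a simplifying assumption)."""
    return (n_sites - 1) * links_per_site

print(full_mesh_tunnels(100))     # 4950 tunnels network-wide
print(tunnels_per_site(100, 2))   # 198 tunnels at one dual-link site
```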

OK, now that you have all those tunnels, you need policies at each site to decide which tunnel to use: policy-based routing (PBR), in other words. That is not going to be simple to set up. Add to that link preferences for types of traffic. Then add queuing per tunnel to the mix. It becomes overwhelming very quickly. Oh, and then when links fail, the network has to reconverge. What about degraded links? What about flapping links? It quickly becomes a mess.

Well, the reality is that this approach does work and, despite the complexity, it has been one of the only viable ways to build private networks without using MPLS IP-VPN solutions. That is why it exists.

However, there is another way to build networks: have ONE tunnel per site, going to a hub site placed close to the sites in latency terms. This tunnel can run across multiple links and is preserved regardless of which links are working. No IP address changes means no routing reconvergence when links fail, and one tunnel means one set of QoS policies. This is something the Turnium software can achieve, I am proud to say. Not a plug for a product, more a recognition of a newer and smarter way to build large private networks over public infrastructure.
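To make the contrast concrete, here is a small sketch (function names are my own, for illustration) comparing how tunnel counts grow in the two designs as the network scales:

```python
def full_mesh_tunnels(n_sites: int) -> int:
    """Full mesh: one tunnel per pair of sites -- grows quadratically."""
    return n_sites * (n_sites - 1) // 2

def hub_and_spoke_tunnels(n_sites: int) -> int:
    """One tunnel per site to a nearby hub -- grows linearly."""
    return n_sites

for n in (10, 100, 1000):
    print(f"{n} sites: full mesh = {full_mesh_tunnels(n)}, "
          f"one tunnel per site = {hub_and_spoke_tunnels(n)}")
# 10 sites:   45 vs 10
# 100 sites:  4950 vs 100
# 1000 sites: 499500 vs 1000
```

The quadratic-versus-linear growth is the whole argument in miniature: at 1,000 sites the mesh needs nearly half a million tunnels, each wanting its own policy, while the single-tunnel design needs one per site.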

OK, remember the horses? What’s that about? Well, customers asking me for full-mesh support or dynamic tunnels when they are not needed remind me of an old-timer seeing a Tesla and asking where the hitches are for the horses. When it is explained that horses are not needed, the old-timer gets stubborn and insists that there has to be a hitch, otherwise it is a poor product. I feel that way every time I have to explain why our solution does not do dynamic or full-mesh tunnels. They, and the associated complexities, are no longer relevant; it’s that simple!

This article is a new venture for me. I would welcome your feedback, and please share it if you found it useful. I am also open to ideas for future articles. Maybe next time: what is the difference between speed and ‘bandwidth’…