
Cluster Mode


What is Cluster Mode?

One of the most important things to understand about dns_docker: never run more than one group per physical host. Don't do it. If you do, you are doing it wrong, end of story. Cluster Mode is designed to take advantage of multiple physical hosts to provide increased capacity and resilience. dns_docker is designed to be extremely efficient, scaling out through the use of many small hosts (e.g. RPi3, RPi4, small VMs, etc.) and scaling up in large environments through the use of Anycast.

Every installation is, by default, both Standalone and Cluster. Any host can be either, or both at once.

Are there any special hardware requirements for Cluster Mode?

The requirements for Cluster Mode are the same as for standalone mode: 512MB of RAM and a stable network. Again, do not attempt to run dns_docker over WiFi. You will have a bad time no matter how good you think your AP is. Remember the all-important haiku:

It's not DNS.
There's no way it's DNS.
It was DNS.

How do I use Cluster Mode?

dnsdist is always strictly standalone; you should never balance dnsdist to dnsdist. That is wrong and will result in nothing but problems. nsd and unbound are the core 'cluster' components. To support cluster mode, you only need to define a pool using the cluster mode ports (which are exposed on the host) instead of the compose ports (which are isolated to the dns_docker network):

  • dnsdist: 53, 443, 853, 8053
  • nsd: 53 (compose), 10530 (cluster)
  • unbound: 53 (compose), 10531 (cluster)
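
For reference, a minimal compose sketch of how those cluster ports might be published (the service names and mappings here are illustrative, not copied from the project; your actual compose file is authoritative, and this assumes nsd and unbound listen on the cluster ports directly, per the bindings shown later on this page):

# Illustrative compose excerpt only - service names and mappings are assumptions.
services:
  nsd:
    ports:
      - "10530:10530/udp"   # cluster port, exposed on the host
      - "10530:10530/tcp"
  unbound:
    ports:
      - "10531:10531/udp"   # cluster port, exposed on the host
      - "10531:10531/tcp"

dnsdist's ports stay however the stock compose file defines them; remember, you never balance dnsdist to dnsdist.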

This Visio diagram will make it only slightly more confusing: A Horrible Visio Diagram Phil Made

(This was made in about 5 minutes and I'm bad at freehanding.)

The tl;dr is basically this: "Host!11one" can operate all by itself just fine without touching anything. You can also connect "Host!11one" and Another Host to a BGP Anycast, and they will each operate independently. You can ALSO connect "Host!11one" and an essentially unlimited number of additional hosts using server pools and rules within dnsdist. In every case, though, the configurations and DNS caches are isolated to the host servicing the query.

The 3 C's: Cache and Configuration Consistency

Caches and configurations are always isolated to each host, even when constructing a cluster. You should use configuration management to ensure consistency across multiple hosts in a cluster. Cache is also not replicated, which can impact queries in some situations. If dnsdist on "Host!11one" knows all about bobswidgets.comnetorg, then it can serve all requests for it from its own cache (until expiry). The unbound that answered dnsdist will also likely have bobswidgets.comnetorg in its cache, so if other dnsdist instances query that unbound instance, they will be served from cache as well.

The caveat is that none of these components know about each other's cache contents. So, if you ask Host 2 for bobswidgets.comnetorg and its dnsdist does not have it cached, it will select a backend based on the pool's policy. If the responding unbound does not have it in its cache either, the query will go to full recursion - even if "Host!11one" already has it in both caches. Similarly, if you configure one dnsdist to send bobswidgets.comnetorg to pool bobspool and neglect to create the same pools and rules on the other dnsdist, it will have no idea about bobspool or the rule for bobswidgets.comnetorg.

Configuring the Cluster Mode pool

newServer( { address="host1:10530", pool="cluster.auth", name="host1", tcpFastOpen=true, qps=100} )
newServer( { address="host2:10530", pool="cluster.auth", name="host2", tcpFastOpen=true, qps=100} )
newServer( { address="host1:10531", pool="cluster.recursor", name="host1", tcpFastOpen=true, qps=100} )
newServer( { address="host2:10531", pool="cluster.recursor", name="host2", tcpFastOpen=true, qps=100} )

setPoolServerPolicy(leastOutstanding, "cluster.auth")
setPoolServerPolicy(leastOutstanding, "cluster.recursor")

authClusterCache     = newPacketCache(10240, {numberOfShards=8})
getPool("cluster.auth"):setCache(authClusterCache)
recursorClusterCache = newPacketCache(10240, {numberOfShards=8})
getPool("cluster.recursor"):setCache(recursorClusterCache)

addAction(RegexRule("\.*myauthoritative\.*"), PoolAction("cluster.auth")) -- Note: regex is case-insensitive!
addAction(".", PoolAction("cluster.recursor")

NOTE While it is possible to also cluster the localroot pool by adding a cluster member to the existing pool, it is not recommended. The localroot pool is intended specifically to serve as a fast-updating, fast-responding replacement for root-hints. Sending those queries across the wire defeats the purpose.

IMPORTANT You must add or update ACLs within dnsdist, nsd, and unbound appropriately!
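
As a sketch, with placeholder networks (substitute the subnets your clients and cluster peers actually live in):

-- dnsdist: the ACL controls who may query at all
setACL({"127.0.0.0/8", "10.10.10.0/24"})   -- placeholder: local plus cluster subnet

## unbound: allow the cluster subnet to query the cluster port
server:
    access-control: 10.10.10.0/24 allow

nsd has no global query ACL; restrict transfer peers per pattern using provide-xfr and allow-notify instead.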

Configuring an authoritative NSD cluster of secondaries

When configuring nsd as a secondary, most NOTIFY, AXFR, and IXFR traffic will be sent to port 53 by default, which results in it landing on dnsdist, which will not pass it through by default. It is not recommended to use dnsdist to pass through NOTIFY, AXFR, or IXFR, as this invariably results in extremely complicated rule maps.

Instead, when configuring your pattern in nsd you should make use of the outgoing-interface directive within each pattern like so:

pattern:
  name: "myzone"
  multi-master-check: yes
  allow-axfr-fallback: yes
  outgoing-interface: 0.0.0.0@10530    # This is the important part; any interface, port 10530
  request-xfr: <primary-ip> NOKEY      # Your usual secondary directives still apply;
  allow-notify: <primary-ip> NOKEY     #   replace <primary-ip> with your actual primary.

zone:
  name: "mydomain.com"
  include-pattern: "myzone"

If you find you are not receiving NOTIFY, you may need to create a TeeAction in dnsdist; nsd will generally then make its requests using the correct ports. Anything past the NOTIFY will make your life miserable, though.

-- Forward only our NOTIFY to nsd and nsd alone. ONLY ever forward NOTIFY to the local nsd, 
--   never to a remote nsd. Even in cluster mode. It is STRONGLY recommended that you not 
--   actually use 0.0.0.0/0 and instead restrict it to known upstreams.
addAction(AndRule({OpcodeRule(4), makeRule("0.0.0.0/0")}), TeeAction("172.16.53.11:10530"))

But I absolutely HAVE to make my zone transfers go through dnsdist!

Sigh. Don't say I didn't warn you. Because I warned you. You brought this on yourself, and it will not work as well as you think it will. I should know, because I've done it myself. (For one thing, TeeAction only duplicates UDP queries, so the TCP side of a transfer never gets teed.)

-- OpcodeRule(4) means NOTIFY; dnsdist does not have a NOTIFY type because it recommends AGAINST doing this.
addAction(AndRule({OpcodeRule(4), makeRule("10.10.10.10/32")}), TeeAction("172.16.53.11:10530"))
-- You must tee the AXFR because the upstream may send to 53 anyways. That's where it's supposed to go.
addAction(AndRule({QTypeRule(dnsdist.AXFR), makeRule("10.10.10.10/32")}), TeeAction("172.16.53.11:10530"))
-- And again for IXFRs.
addAction(AndRule({QTypeRule(dnsdist.IXFR), makeRule("10.10.10.10/32")}), TeeAction("172.16.53.11:10530"))
-- BUT WE'RE NOT DONE YET. Because you must repeat ALL THREE RULES for every upstream authoritative you 
--   wish to be a secondary of, replacing the IP address in the makeRule section for each server.
-- ALSO, do not EVER TeeAction to a non-local nsd! ONLY to the local. Otherwise the NOTIFY will go to the 
--   wrong nsd instance and your transfer won't happen at all.

I TOLD YOU YOU WOULDN'T LIKE IT.
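
If you are absolutely committed, a small Lua helper at least keeps the copy-paste down. This is a hypothetical convenience function, not something dns_docker ships; it just wraps the three rules above so you can call it once per upstream:

-- Hypothetical helper: tee NOTIFY/AXFR/IXFR from one upstream primary to the
--   LOCAL nsd. Call once per upstream; never point it at a remote nsd.
function teeXfrFrom(upstream, localNsd)
  addAction(AndRule({OpcodeRule(4), makeRule(upstream)}), TeeAction(localNsd))
  addAction(AndRule({QTypeRule(dnsdist.AXFR), makeRule(upstream)}), TeeAction(localNsd))
  addAction(AndRule({QTypeRule(dnsdist.IXFR), makeRule(upstream)}), TeeAction(localNsd))
end

teeXfrFrom("10.10.10.10/32", "172.16.53.11:10530")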

Scaling with Anycast

Most of this will be left as an exercise for the reader, because it depends entirely on what distribution you are running Docker on. You should NEVER run routing sessions within Docker itself. This is bad and wrong, and you will get your hand slapped sooner or later. Run your routing on the host, not in the containers!

Note that if you are using Anycast, you must decide in advance whether you will run a 'compose' architecture or a 'host' architecture. A 'compose' architecture necessitates the use of iptables and results in reduced performance. A 'host' architecture may require significant modifications to the base system, has implications for all other compose and swarm containers, and involves the metaphorical voiding of (non-existent) warranties. (GO AWAY, DNSMASQ, YOU ARE NOT WANTED RIGHT NOW.)

Who's down with BGP? (Ee Cee Em Pee.)

(I make no apologies for the joke, and will judge you if you didn't get it. -Phil)

This is the preferred and generally best method of scaling dns_docker quickly and efficiently. It can generally be accomplished using quagga (strongly recommended against, as it is unmaintained), frr, or bird. You can use OpenBGPd as well if it is available for your distribution. And because we just said "use one of four options with incompatible configuration formats," we are very much not going to tell you how to configure it in detail. Because it will be wrong. (If you want pre-configured Anycast stuff, that's DNSecure. And it's complicated.)
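
Purely to show the shape of the thing, here is roughly what announcing an anycast service address looks like in frr's bgpd. Every value (ASNs, neighbor address) is a placeholder, and your network will differ; this is not a supported or complete configuration:

! Illustrative frr/bgpd sketch only - placeholder ASNs and neighbor
router bgp 64512
 neighbor 192.0.2.1 remote-as 64511
 address-family ipv4 unicast
  network 10.10.10.53/32
 exit-address-family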

OSPF Joke Goes Here

I couldn't think of one. Sorry.

We don't not support OSPF, but we generally don't recommend it, as BGP ECMP tends to do a better job of load distribution. However, where you are running multiple clusters in multiple data centers, OSPF may be the better choice for your environment. Generally we recommend consulting with your network administrator either way before picking a direction. (And if you are the network administrator and you aren't sure, BGP. It's easier to find educational materials.)

The Common Basics

You will need to direct all port 53 traffic specifically to dnsdist. If you are using a compose architecture, this requires hairpinning traffic from your network interface to the dnsdist compose IP (172.16.53.10). If you are using a host architecture, you will need to ensure that nsd and unbound are not binding to port 53 on any address available to dnsdist.
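
For the compose case, the hairpin is a standard DNAT. A minimal sketch, assuming the stock dnsdist compose IP of 172.16.53.10:

# Hypothetical DNAT: steer inbound port 53 on the host to the dnsdist compose IP.
iptables -t nat -A PREROUTING -p udp --dport 53 -j DNAT --to-destination 172.16.53.10:53
iptables -t nat -A PREROUTING -p tcp --dport 53 -j DNAT --to-destination 172.16.53.10:53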

<lots of things still to explain, we'll get to it.>

Alias vs. Binding vs. Loopback

Look. We're just going to tell it straight: use loopback interfaces! And if you have any questions, the answer is USE LOOPBACK INTERFACES. For additional information, see USE LOOPBACK INTERFACES!!

Seriously. Use loopback interfaces for your anycast addresses. Failing to do so on Linux will result in wire leaks and unpredictable problems. Consult your distribution's documentation for how to configure loopback interfaces.
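
On most modern Linux distributions that means something like the following, using iproute2 (the lo:1 label matches the interface name used in the dnsdist example below, and the address is this page's running example):

# Add the anycast service address to loopback, with an alias label.
ip addr add 10.10.10.53/32 dev lo label lo:1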

When using loopback interfaces, you must configure every module with explicit bindings; wildcard binding is only acceptable on deliberately chosen alternate ports!

-- This is wildcard binding to an alternate port in dnsdist
addLocal('0.0.0.0:5353', { reusePort=true, tcpFastOpenQueueSize=16 })
-- This is an explicit binding in dnsdist that will listen on port 53 (tcp/udp)
addLocal('10.10.10.53', { reusePort=true, tcpFastOpenQueueSize=16 })
-- This is an explicit IPv6 binding in dnsdist that will listen on port 53 (tcp/udp)
addLocal('[2001:1002::d34d:b33f]', { reusePort=true, tcpFastOpenQueueSize=16 })
-- This is an explicit TLS binding in dnsdist that will listen on port 853 (tcp)
--   TLS bindings must always be explicit!
addTLSLocal('10.10.10.53', '/usr/local/etc/dnsdist/ssl/my.cert.pem', '/usr/local/etc/dnsdist/my.cert.key', { ... options ... })
-- This is an explicit DoH binding in dnsdist that will listen on port 443 (tcp)
--   DoH bindings must always be explicit!
addDOHLocal('10.10.10.53', '/usr/local/etc/dnsdist/ssl/my.cert.pem', '/usr/local/etc/dnsdist/my.cert.key', '', {reusePort=true, tcpFastOpenQueueSize=16, minTLSVersion="tls1.2" })
-- This DoH binding uses the *interface* option to pin a wildcard listener to a loopback alias
addDOHLocal('0.0.0.0', '/usr/local/etc/dnsdist/ssl/my.cert.pem', '/usr/local/etc/dnsdist/my.cert.key', '', {reusePort=true, tcpFastOpenQueueSize=16, minTLSVersion="tls1.2", interface="lo:1" })
-- This is a wildcard DoH binding in dnsdist that will listen on port 80; with no
--   certificates it serves plain HTTP, for use behind a TLS-terminating reverse proxy
addDOHLocal('0.0.0.0:80', '', '', '', { ... options ... })
## unbound configuration
server:
    interface: 0.0.0.0@10531    # Non-explicit binding to all IPs on port 10531
    interface: lo:1@53          # ERROR: unbound doesn't work like that!
    interface: 10.10.10.55      # Explicitly bind to 10.10.10.55, port 53 (tcp/udp)
    ip-address: 10.10.10.55     # ... but it does work like this! This is identical to the preceding line.
    interface: 10.10.10.55@853  # REQUIRED to bind to port 853 for TLS (tcp) serving
                                # Note that unbound will ALWAYS attempt to query other servers with TLS.
## nsd configuration
server:
    ip-address: 0.0.0.0@10530   # Non-explicit binding to all IPs on port 10530
    ip-address: lo:1@53         # ERROR: nsd doesn't work like that!
    ip-address: 10.10.10.54     # Explicitly bind to 10.10.10.54, port 53 (tcp/udp)
    interface: 10.10.10.54      # ... but it does work like this! This is identical to the preceding line.
                                # 'interface' and 'ip-address' are entirely interchangeable.
    interface: 10.10.10.54@853  # REQUIRED to bind to port 853 to serve TLS (tcp) in addition to TLS settings.
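
The 'TLS settings' mentioned above are the service key and certificate directives, which conveniently use the same names in both nsd and unbound. A minimal sketch with placeholder paths:

## applies to both nsd and unbound - certificate paths are placeholders
server:
    tls-service-key: "/path/to/my.cert.key"
    tls-service-pem: "/path/to/my.cert.pem"
    tls-port: 853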