Hi,
Until now I had three secondaries and one hidden primary running, and this worked perfectly well.
Now I'd like to add some fallback functionality to cope with a potentially longer downtime of my hidden primary. I therefore added two more hidden primaries, so that each of the three hosts now runs a hidden primary capable of serving every secondary. However, only one of them should be active at a time! Zone and database data are rsynced frequently to the two inactive primaries. If there were a downtime, I would start one of the others to continue.
Based on my reading of https://www.knot-dns.cz/docs/3.5/html/configuration.html#secondary-slave-zo… I naively assumed that a configuration like ...
remote:
  - id: primaryMWN                               # MWN hidden primary (running)
    address: 10.0.1.203@5333
  - id: primaryKBN                               # KBN hidden primary (not running, standby)
    address: 10.0.2.203@5333
  - id: primaryEDN                               # EDN hidden primary (not running, standby)
    address: 10.0.3.203@5333

template:
  - id: default
    master: [primaryMWN, primaryKBN, primaryEDN] # queried in that order
… would work, because of:
"Note that the master option accepts a list of remotes, which are queried for a zone refresh sequentially in the specified order. When the server receives a zone change notification from a listed remote, only that remote is used for a subsequent zone transfer."
But I get error messages like:
edn.ellael.lan (ns3) knot[29856]: warning: [ellael.org.] refresh, remote primaryKBN not usable
edn.ellael.lan (ns3) knot[29856]: info: [ellael.org.] refresh, remote primaryEDN, address 10.0.3.203@5333, failed (connection reset)
edn.ellael.lan (ns3) knot[29856]: warning: [ellael.org.] refresh, remote primaryEDN not usable
edn.ellael.lan (ns3) knot[29856]: error: [ellael.org.] refresh, failed (no usable master), next retry at 2026-04-27T19:03:03+0200, expires in 1119353 seconds
edn.ellael.lan (ns3) knot[29856]: error: [ellael.org.] zone event 'refresh' failed (no usable master)
If I use "master: primaryMWN" only, everything runs as expected.
I must have misunderstood something ...
OK, so I will have to modify the knot.conf of every remaining secondary if disaster strikes and another primary has to take over.
BTW: I wanted to avoid a multi-primary setup as described in https://www.knot-dns.cz/docs/3.5/singlehtml/#multi-primary because it feels like overkill for hosting only 5 domains ;-)
Are there other ways to achieve my goal? ;-)
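One idea I've been toying with (just a sketch, and it assumes the server runs from a configuration database so that the knotc conf-* transaction commands are available; the remote id is taken from the example above): instead of editing every knot.conf, repoint the template's master at runtime on each secondary:

```
# begin a configuration transaction, repoint the default template, commit
knotc conf-begin
knotc conf-unset 'template[default].master'
knotc conf-set 'template[default].master' 'primaryKBN'
knotc conf-commit
```

That would still have to be run on every secondary, but at least without restarts or file edits.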
Thanks and regards,
Michael
Hi,
Fastmail has been running Knot for a few years now. Thank you for such excellent software!
I'm new to this list, and new to the Knot codebase, but I'm an experienced C developer and have been working on Cyrus IMAPd (a mostly C codebase) for many years.
We have hundreds of thousands of domains, and currently they all have the same set of service IPs compiled into them. This has generally been fine - setting up a new server takes an hour or so to build all the domains, but we just wait until it's done then bring it into rotation.
Our current challenge -- we want to be able to transfer everything to a new IP range quickly for datacenter failover. Rebuilding every zone is too expensive for this. I looked at a few different issues and (along with Claude) figured that it wasn't much work to extend the ALIAS type to follow the pointer to another zone inside the same server and return the records from that. I have an initial pass at:
https://github.com/fastmail/knot-dns/tree/local-alias-synth
For now I've kept it as separate commits showing the evolution of the idea as I've tested it and thought through how I want it to interact (basically any ALIAS gets substituted with the contents of the name it points to, so you can mix and match them in all sorts of ways).
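To illustrate the intended behaviour (all names here are made up): a single internal zone on the same server holds the shared service address set, and every customer zone points at it, so retargeting to a new IP range means changing one zone instead of rebuilding hundreds of thousands:

```
; shared address set, maintained once, on the same server
svc.internal.   300  A     192.0.2.10
svc.internal.   300  AAAA  2001:db8::10

; in each customer zone: answers are synthesized from svc.internal.
example.com.    300  ALIAS svc.internal.
```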
I'm very happy to engage on testing and modifying this code to match what the upstream project wants; or revisit the approach if this doesn't match your vision. I just need something that has these properties, and this seemed a good way to get there.
Thanks,
Bron.
Hello,
In this other use case, described in the thread "IXFR commit time scaling", there was a reply referring to
https://www.knot-dns.cz/docs/3.5/singlehtml/#signing-threads and
https://www.knot-dns.cz/docs/3.5/singlehtml/#adjust-threads
Which made me wonder...
a] you can have an external networked HSM, which sounds promising for speeding up signing a lot...
b] nowadays there are even 128-core processors, and even multiple CPU sockets, which sounds like an immense boost for parallel processing...
c] you could combine those...
Hard data would probably be difficult to come by, but hypothetically/estimated:
what would be wise/pointless/smart/insane for speeding up the signing of large zones?
I'd expect that RAM speed is also a major factor.
What would be an ideal setup today?
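For concreteness, the knobs I mean would be set roughly like this (the values are placeholders, not recommendations; I've sketched only the two options I'm sure of from those links):

```
server:
    background-workers: 16     # zone events processed in parallel across zones

policy:
  - id: example
    signing-threads: 32        # parallel signature computation within one zone
```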
--
With kind regards,
Met vriendelijke groet,
Mit freundlichen Grüßen,
Leo Vandewoestijne
<***(a)dns.company>
<www.dns.company>
Hi Knot team,
I'm running Knot as an Auth secondary receiving IXFR from a BIND 9 primary.
To isolate bottlenecks I've stripped the config down as far as I know how.
Here's what I'm using.
zonefile-sync: -1
zonefile-load: none
journal-content: none
There is no DNSSEC and no downstream IXFR serving happening. The logs
confirm that these are genuine IXFRs, with no sign of any AXFR fallback.
"semantic-checks" is off, and knotd is linked against jemalloc. I'm really
trying to make this as fast as possible by avoiding the disk.
The pattern:
IXFR processing time scales roughly proportionally with total zone size,
even when the changeset is small, for example, a few hundred RRs out of
several hundred thousand.
There is what appears to be a full zone walk on every IXFR commit in the
adjust logic, with single-threaded execution due to parent-before-child
ordering requirements. Although I'd want your confirmation before reading
too much into it.
Questions:
1. With journal-content: none, does applying an IXFR trigger a full in-memory
tree walk of the QP-trie, rather than an isolated, incremental record-level
update? If so, is that a necessary consequence of running without a journal
to maintain state?
2. For a secondary with no NSEC/NSEC3, no wildcards, and no downstream
IXFR serving, could a "lightweight secondary" mode bypass post-apply
bookkeeping that may only be relevant to primaries and signers?
3. Could it re-walk only the subtrees affected by adds or removes, rather
than the full zone? If NSEC is absent, is the prev-pointer chain actually
used at query time, or can it be skipped entirely?
Our use case is secondary-only, with large zones and high-frequency
updates. We're hoping there is something on the configuration or roadmap
side that might help, and we're ultimately not sure whether we're just
bumping up against a realistic constraint.
Thanks for the great software btw, loving it.
Thanks!
Hi all,
I just set up catalog zones for the first time. I'm using a conf file
with my list of zones. After creating a catalog zone and adding member
zones to it, I executed 'knotc reload'. The catalog zone then appeared
in the output of 'zone-status', and member zones were listed with the
catalog zone name. However, 'zone-read <catalog.zone>' showed no PTR
records. I tried 'zone-reload <catalog.zone>', updated serials on the
member zones and such, but the catalog zone remained empty until knotd
was restarted. I saw this behavior on both 3.4.4 and 3.5.2. Is this
the intended behavior? Is there a way to generate the catalog without
restarting knotd?
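For reference, my configuration follows the documented generated-catalog pattern, roughly like this (zone names are placeholders):

```
zone:
  - domain: catalog.example.
    catalog-role: generate

  - domain: member1.example.
    catalog-role: member
    catalog-zone: catalog.example.
```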
Thanks in advance,
Bill
Greetings,
I have tried using QUIC for zone transfers, and I hit an error with a
bigger zone.
The master's log shows:
2026-01-04T17:32:19+0800 debug: [foo.] ACL, allowed, action transfer,
remote 10.0.0.147@60880 QUIC cert-key
xJKsDkUqpl6orXeTwsrDgDvgZ/PiYxOSVlOkVdn5EOU=
2026-01-04T17:32:19+0800 info: [foo.] IXFR, outgoing, remote
10.0.0.147@60880 QUIC, incomplete history, serial 2026010403, fallback
to AXFR
2026-01-04T17:32:19+0800 debug: [foo.] ACL, allowed, action transfer,
remote 10.0.0.147@60880 QUIC cert-key
xJKsDkUqpl6orXeTwsrDgDvgZ/PiYxOSVlOkVdn5EOU=
2026-01-04T17:32:19+0800 info: [foo.] AXFR, outgoing, remote
10.0.0.147@60880 QUIC, started, serial 2026010404
2026-01-04T17:32:20+0800 info: [foo.] AXFR, outgoing, remote
10.0.0.147@60880 QUIC, buffering finished, 0.87 seconds, 7390 messages,
124493148 bytes
2026-01-04T17:32:20+0800 notice: QUIC, terminated connections, outbuf
limit 1
On the slave side, the log shows:
2026-01-04T17:32:18+0800 info: [foo.] zone file loaded, serial 2026010403
2026-01-04T17:32:19+0800 info: [foo.] loaded, serial none -> 2026010403,
92000117 bytes
2026-01-04T17:32:19+0800 info: [foo.] refresh, remote 10.0.0.151@853,
remote serial 2026010404, zone is outdated
2026-01-04T17:32:19+0800 info: server started
(And knotd on the slave then goes down without logging anything.)
Thanks in advance.
My testing environment:
the zone contains 1,000,000 x ( 2 NS + 2 A ) records, such as,
domain00000000 3600 NS ns1.domain00000000
3600 NS ns2.domain00000000
ns1.domain00000000 3600 A 10.0.0.1
ns2.domain00000000 3600 A 10.0.0.2
...
domain00999999 3600 NS ns1.domain00999999
3600 NS ns2.domain00999999
ns1.domain00999999 3600 A 10.0.0.1
ns2.domain00999999 3600 A 10.0.0.2
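A minimal sketch (Python; names and file path are made up) of how I generate such a test zone body:

```python
def make_records(count):
    """Yield zone-file record lines: for each delegation,
    2 NS records plus 2 glue A records, as in the listing above."""
    for i in range(count):
        name = f"domain{i:08d}"
        yield f"{name} 3600 NS ns1.{name}"
        yield f"{name} 3600 NS ns2.{name}"
        yield f"ns1.{name} 3600 A 10.0.0.1"
        yield f"ns2.{name} 3600 A 10.0.0.2"

# e.g. write the full 1,000,000-delegation body (4,000,000 records):
# with open("foo.zone.body", "w") as f:
#     f.write("\n".join(make_records(1_000_000)))
```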
If I decrease the record count to 500,000 x ( 2 NS + 2 A ), the zone
transfers over QUIC successfully.
With traditional TCP and TLS, the zone transfer completes without
error, even for larger zones.
The Knot version on both master and slave is 3.5.2, installed from copr.
The OS on both sides is Rocky 9 x86_64.
Best Regards,
SUN Guonian
Greetings,
If I do not configure a "notify" statement in the zone section, I notice
that Knot DNS still sends NOTIFY messages to all servers in the NS records.
How can I disable NOTIFY messages on a server that sits at the end of the
zone transfer chain (e.g., a stealth, receive-only secondary)?
Best Regards,
SUN Guonian
Hello Knot DNS users,
Knot DNS has supported TCP Fast Open (when configured) in both the server and client roles for several years.
However, we have not observed any performance or other improvements from this technology so far. Since
removing it would simplify the code, I'm considering dropping support for it. Is there anyone who would
miss TFO in Knot DNS?
For better XFR efficiency between Knots, https://www.knot-dns.cz/docs/latest/singlehtml/index.html#remote-pool-limit
works much better.
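For reference, the feature in question is the one enabled via the server option (shown as I recall it; on Linux the kernel must also permit TFO, e.g. sysctl net.ipv4.tcp_fastopen=3):

```
server:
    tcp-fastopen: on
```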
Thanks,
Daniel