Hi Jonathan,
Thank you for reaching out to us and for such deep insight into Knot DNS.
Let me start by explaining some history.
Knot DNS was designed around two main pillars: 1) query answering is fast
(partly thanks to carefully pre-adjusting the zone contents), and 2)
updating the zone does not affect the answering speed.
As a consequence, there was less motivation to speed up zone update
processing -- historically, it was always single-threaded and always
proportional to the zone size (not the update size).
Several years ago, a big supporter of ours operating a large zone asked
us to improve this, and we did what we could at the time -- many parts
of the zone update processing became incremental (proportional to the
update size), especially those that took the most time (like the
NSEC3-related cross-pointers, which demanded two hash computations per
domain). This included the introduction of incremental DNSSEC signing
and validation (including dedicated NSEC(3) chain processing routines).
Where things couldn't be made truly incremental, we also introduced
parallel processing:
https://www.knot-dns.cz/docs/3.5/singlehtml/#signing-threads and
https://www.knot-dns.cz/docs/3.5/singlehtml/#adjust-threads (the latter
might be interesting to you!).
However, some parts of update processing remained proportional to the
zone size (and your observations confirm this). Yes, the whole QP-trie
is always iterated. We hope those procedures are fast enough in general
that it doesn't really hurt (a few seconds for a zone with a million
RRs?).
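To make the cost argument concrete, here is a toy Python model of a whole-tree adjust pass. It is hypothetical and not Knot's actual data structures or code: `Node` and `full_adjust` are illustrative names, a name is a tuple of labels from the root down, and empty non-terminals are assumed to exist as nodes. One sorted walk recomputes each node's parent, "prev" pointer and non-authoritative flag. Sorting the tuples puts parents before children, which roughly approximates DNSSEC canonical order for lowercase ASCII labels.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set, Tuple

# Hypothetical toy model, not Knot's internals: a name is a tuple of labels
# from the root down, e.g. ("cz", "example", "www"). Empty non-terminals are
# assumed to be present as nodes with no RR types.
Name = Tuple[str, ...]


@dataclass
class Node:
    rrtypes: Set[str]
    parent: Optional[Name] = None
    prev: Optional[Name] = None  # canonical predecessor (chain not closed)
    non_auth: bool = False  # True if the node lies below a zone cut


def full_adjust(zone: Dict[Name, Node], apex: Name) -> int:
    """Recompute parent, prev and non_auth for every node; return nodes visited."""
    visited = 0
    prev: Optional[Name] = None
    for name in sorted(zone):  # tuple order puts parents before children
        node = zone[name]
        node.parent = None if name == apex else name[:-1]
        node.prev = prev
        parent = zone.get(node.parent) if node.parent is not None else None
        # Below a zone cut if the parent already is, or if the parent holds
        # an NS RRset and is not the apex (i.e. the parent is a delegation).
        node.non_auth = parent is not None and (
            parent.non_auth or ("NS" in parent.rrtypes and node.parent != apex)
        )
        prev = name
        visited += 1
    return visited


# Every commit pays for the whole loop, however small the changeset was.
apex = ("cz", "example")
zone = {
    apex: Node({"SOA", "NS"}),
    apex + ("www",): Node({"A"}),
    apex + ("dept",): Node({"NS"}),
    apex + ("dept", "ns1"): Node({"A"}),
}
print(full_adjust(zone, apex))  # 4
```

The parent-before-child ordering is what makes this loop naturally sequential: a node's flag depends on its parent's flag having been computed already.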
I can't really say whether it is possible to make this more incremental.
For us, correct Knot DNS behavior in all cases is most important, so we
can't just say "we don't need this and that in the simple case, so
let's skip some edge-case correctness for the sake of speed". Just for
illustration, imagine a deep and branchy zone where an incremental
update adds a single NS, which occludes many subordinate RRs that
become non-authoritative, with many consequences...
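A small sketch of that occlusion case, using the same kind of hypothetical label-tuple names (`occluded_by_new_cut` is an illustrative helper, not Knot's API). Adding one NS RRset at a non-apex name makes every name below it non-authoritative, so a one-record changeset can flip the state of an arbitrarily large number of nodes.

```python
from typing import Set, Tuple

Name = Tuple[str, ...]  # hypothetical: labels from the root down


def occluded_by_new_cut(names: Set[Name], apex: Name, cut: Name) -> Set[Name]:
    """Names that become non-authoritative when an NS RRset is added at `cut`."""
    if cut == apex:
        return set()  # NS at the apex does not create a zone cut
    depth = len(cut)
    return {n for n in names if len(n) > depth and n[:depth] == cut}


apex = ("cz", "example")
names = {apex, apex + ("www",), apex + ("dept",)}
# A deep, branchy subtree under "dept": 10 hosts in each of 100 branches.
for b in range(100):
    names.add(apex + ("dept", f"b{b}"))
    for h in range(10):
        names.add(apex + ("dept", f"b{b}", f"h{h}"))

hidden = occluded_by_new_cut(names, apex, apex + ("dept",))
print(len(hidden))  # 1100
```

Here a single added NS affects 1100 nodes. Note also that this naive lookup still scans every name in the zone.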
Yes, in theory those adjustments could be performed only on the affected
subtrees, and the "prev" pointer might not really be needed without NSEC
records and wildcards -- but I'd be really afraid to modify the code in
this manner :( Also, since I personally advocate DNSSEC, I'm less
motivated to pursue optimizations that only apply without DNSSEC...
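For illustration only, a subtree-limited re-adjust might look roughly like this sketch. It uses hypothetical helper names and a toy data model, and is not a proposal for Knot's code. With names kept in sorted order, each subtree is a contiguous range, so only the ranges under changed names are walked, in parent-before-child order. The sketch deliberately ignores effects that propagate upward or sideways, such as wildcard flags on parents, empty non-terminals appearing or disappearing, and NSEC/NSEC3 chains. Those are exactly the edge cases that make such a change risky. Maintaining "prev" pointers would additionally require touching the canonical neighbors of each added or removed name.

```python
from bisect import bisect_left
from typing import List, Set, Tuple

Name = Tuple[str, ...]  # hypothetical: labels from the root down


def subtree_roots(changed: Set[Name]) -> List[Name]:
    """Keep only changed names that are not below another changed name."""
    roots: List[Name] = []
    for name in sorted(changed):  # ancestors sort before their descendants
        if roots and name[: len(roots[-1])] == roots[-1]:
            continue  # already covered by the previous root's subtree
        roots.append(name)
    return roots


def subtree_adjust_order(sorted_names: List[Name], changed: Set[Name]) -> List[Name]:
    """Nodes a subtree-limited adjust would visit, parent before child."""
    visited: List[Name] = []
    for root in subtree_roots(changed):
        i = bisect_left(sorted_names, root)  # start of root's contiguous range
        while i < len(sorted_names) and sorted_names[i][: len(root)] == root:
            visited.append(sorted_names[i])
            i += 1
    return visited


apex = ("cz", "example")
zone = sorted([
    apex,
    apex + ("a",),
    apex + ("a", "x"),
    apex + ("aa",),
    apex + ("b",),
    apex + ("b", "y"),
    apex + ("b", "y", "z"),
])
print(subtree_adjust_order(zone, {apex + ("b",), apex + ("b", "y")}))
# [('cz', 'example', 'b'), ('cz', 'example', 'b', 'y'), ('cz', 'example', 'b', 'y', 'z')]
```

The cost becomes roughly proportional to the size of the affected subtrees plus a logarithmic lookup per root. That is small for scattered leaf changes, but still large when an update lands near the top of a big subtree.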
Anyway, I'd be really interested if you could run your tests under a
profiler, to see what the concrete bottlenecks are in your case. Would
you be able and willing to do this for us?
I'd also like to know what your goals are. Do you really need to apply
the updates instantly (and what is the target time versus the current
time?), or are you just observing choking and resource exhaustion, in
which case you might actually benefit from slowing down the pace of
update processing, e.g. by artificially limiting the frequency of
updates?
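Here is a back-of-the-envelope model of why limiting the update frequency can help when each commit has a per-zone cost. If every commit pays a fixed pass over all N nodes plus a cost per changed record, then merging m consecutive updates into one commit pays the N-term once instead of m times. The function names and all constants below are made-up placeholders for illustration, not measurements of Knot.

```python
def commit_seconds(zone_nodes: int, changed_rrs: int,
                   per_node_s: float, per_rr_s: float) -> float:
    """Hypothetical cost of one commit: full-zone pass + per-change work."""
    return zone_nodes * per_node_s + changed_rrs * per_rr_s


def load(zone_nodes: int, rrs_per_update: int, updates_per_s: float,
         batch: int, per_node_s: float, per_rr_s: float) -> float:
    """Seconds of processing needed per wall-clock second (>1.0 = falling behind)."""
    commits_per_s = updates_per_s / batch
    cost = commit_seconds(zone_nodes, rrs_per_update * batch, per_node_s, per_rr_s)
    return commits_per_s * cost


# Illustrative placeholders only: 500k nodes, 2 us per node, 10 us per
# changed RR, two updates of 300 RRs each per second.
for batch in (1, 10):
    print(batch, round(load(500_000, 300, 2.0, batch, 2e-6, 1e-5), 3))
# 1 2.006
# 10 0.206
```

Under this model the saving comes entirely from paying the full-zone term less often. The per-record term stays the same.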
By the way, I'm a bit surprised that BIND 9 is not the bottleneck in
this case :)
Thanks!
Libor
On 12. 03. 26 0:15, Jonathan Reed wrote:
Hi Knot team,
I'm running Knot as an authoritative secondary receiving IXFR from a
BIND 9 primary. To isolate bottlenecks, I've stripped the config down as
far as I know how. Here's what I'm using:
zonefile-sync: -1
zonefile-load: none
journal-content: none
There is no DNSSEC and no downstream IXFR serving. The logs confirm
that it is genuine IXFR, with no signs of any AXFR fallback.
"semantic-checks" is off, and knotd is linked against jemalloc. I'm
really trying to make this as fast as possible by avoiding the disk.
The pattern:
IXFR processing time scales roughly in proportion to the total zone
size, even when the changeset is small (for example, a few hundred RRs
out of several hundred thousand).
There appears to be a full zone walk in the adjust logic on every IXFR
commit, executed single-threaded due to parent-before-child ordering
requirements, although I'd want your confirmation before reading too
much into it.
Questions:
1. With journal-content: none, does applying an IXFR trigger a full
in-memory walk of the QP-trie, rather than an isolated, incremental
record-level update? If so, is that a necessary consequence of running
without a journal to maintain state?
2. For a secondary with no NSEC/NSEC3, no wildcards and no downstream
IXFR, could a "lightweight secondary" mode bypass post-apply
bookkeeping that might only be relevant to primaries and signers?
3. Could it re-walk only the subtrees below names where records were
added or removed, rather than the full zone? If NSEC is absent, is the
prev pointer chain actually used at query time, or can it be skipped
entirely?
Our use case is secondary-only, with large zones and high-frequency
updates. We're hoping there is something on the configuration or
roadmap side that might help, but ultimately we're not sure whether
we're just bumping up against a realistic constraint.
Thanks for the great software btw, loving it.
Thanks!
--