Thanks for looking into this, definitely appreciate it!
We'll upgrade and give it a go, and look forward to trying out 3.5.0 when it
arrives.
Regards
Rob
On Tue, 29 Jul 2025, at 18:27, Daniel Salzman wrote:
Hi,
The currently released 3.4.8 improves the zone-commit performance.
Further optimizations (including catalog commit) will be released in
3.5.0 in September.
Daniel
On 7/17/25 07:47, robm(a)fastmail.com wrote:
> Hi
>
> We're noticing that as our list of zones gets larger (about 480k right now), adding a new zone or deleting an existing zone keeps getting slower. We always make our modifications as part of a transaction, and the time appears to be spent in the commit phase.
>
> An example timing.
>
> # time /opt/knot/sbin/knotc ... conf-begin
> OK
>
> real 0m0.010s
> user 0m0.000s
> sys 0m0.010s
> # time /opt/knot/sbin/knotc ... conf-unset zone.domain example.com
> OK
>
> real 0m0.010s
> user 0m0.000s
> sys 0m0.010s
> # time /opt/knot/sbin/knotc ... conf-commit
> OK
>
> real 0m2.330s
> user 0m0.000s
> sys 0m0.009s
> #
>
> As you can see, it took over 2 seconds to commit the transaction that removes just the example.com zone. Similarly, it takes over 2 seconds to commit the transaction that adds the zone back.
>
> Given the time is real time and not sys/user, I presume knotc is waiting on knotd to complete the work. I used perf to record a CPU profile of knotd while the commit was running, but nothing hugely stuck out at me.
>
> 10.75% knotd libc.so.6 [.] __memcmp_avx2_movbe
> 6.03% knotd knotd [.] __popcountdi2
> 5.89% knotd knotd [.] ns_first_leaf
> 5.25% knotd libc.so.6 [.] pthread_mutex_lock@@GLIBC_2.2.5
> 3.85% knotd liblmdb.so.0.0.0 [.] 0x0000000000003706
> 3.72% knotd knotd [.] ns_find_branch.part.0
> 2.76% knotd knotd [.] trie_get_try
> 2.63% knotd liblmdb.so.0.0.0 [.] 0x00000000000069d2
> 2.34% knotd libknot.so.14.0.0 [.] knot_dname_lf
> 1.92% knotd liblmdb.so.0.0.0 [.] mdb_cursor_get
> 1.72% knotd knotd [.] create_zonedb
> 1.68% knotd knotd [.] twigbit.isra.0
> 1.68% knotd knotd [.] catalogs_generate
> 1.36% knotd knotd [.] twigoff.isra.0
> 1.28% knotd knotd [.] hastwig.isra.0
> 1.28% knotd knotd [.] db_code
> 1.27% knotd libknot.so.14.0.0 [.] find_item
> 1.11% knotd libknot.so.14.0.0 [.] knot_dname_size
> 1.04% knotd knotd [.] zonedb_reload
> 0.99% knotd libc.so.6 [.] _int_free
> 0.99% knotd liblmdb.so.0.0.0 [.] 0x0000000000003ce8
> 0.96% knotd liblmdb.so.0.0.0 [.] memcmp@plt
> 0.95% knotd liblmdb.so.0.0.0 [.] mdb_cursor_open
> 0.88% knotd libc.so.6 [.] malloc
> 0.88% knotd knotd [.] conf_db_get
> 0.87% knotd knotd [.] ns_next_leaf
> 0.82% knotd libknot.so.14.0.0 [.] iter_set
> 0.75% knotd knotd [.] evsched_cancel
> 0.73% knotd libknot.so.14.0.0 [.] find
> ...
>
> Our config is pretty simple, conf-export looks like:
>
> server:
>     rundir: "/local/knot_dns/run/"
>     user: "nobody"
>     pidfile: "/local/knot_dns/run/knot.pid"
>     listen: [ ... ]
>
> log:
>   - target: "syslog"
>     any: "info"
>
> statistics:
>     timer: "10"
>     file: "/tmpfs/knot_dns_stats.yaml"
>
> database:
>     storage: "/local/knot_dns/data"
>
> mod-stats:
>   - id: "default"
>     request-protocol: "on"
>     server-operation: "on"
>     request-bytes: "on"
>     response-bytes: "on"
>     edns-presence: "on"
>     flag-presence: "on"
>     response-code: "on"
>     request-edns-option: "on"
>     response-edns-option: "on"
>     reply-nodata: "on"
>     query-type: "on"
>     query-size: "on"
>     reply-size: "on"
>
> template:
>   - id: "default"
>     global-module: "mod-stats/default"
>     storage: "/local/knot_dns/zones/"
>
> zone:
>   - domain: "example.com."
>     template: "default"
>
> ... 478,000 more domains all the same ...
>
> Current files on disk are:
>
> # ls -l /local/knot_dns/data/*
> /local/knot_dns/data/catalog:
> total 0
>
> /local/knot_dns/data/journal:
> total 0
>
> /local/knot_dns/data/keys:
> total 0
>
> /local/knot_dns/data/timers:
> total 75880
> -rw-rw---- 1 root root 77697024 Jun 24 09:26 data.mdb
> -rw-rw---- 1 root root 2432 Jul 17 01:05 lock.mdb
>
> /local/knot_dns/data/timing:
> total 0
>
>
> This machine is not slow or constrained in any way. It's 24 core, 3.6 GHz, 64 GB, NVMe drives, etc. Load is very low (<1) with plenty of free resources.
>
> So what I'm wondering is:
> 1. Is this normal? It doesn't feel right that adding/removing a single domain takes > 2 seconds regardless of the size of the existing zone database.
> 2. Is there any way to improve this? Doing multiple adds/deletes at once within a transaction works and we do that where we can, but there are cases where we can't do that and I'd really like to understand why this is as slow as it is.
>
> Thanks in advance
>
> Rob
> --
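
The batching described in point 2 of the quoted message can be sketched in
shell roughly as follows. This is a minimal sketch and not taken from the
thread: the function name batch_unset and the KNOTC variable are hypothetical,
the conf-unset form is copied from the timing example, and any knotc options
elided as "..." in the original are left for the reader to supply via KNOTC.

```shell
#!/bin/sh
# Hypothetical helper: remove several zones in ONE configuration transaction.
# KNOTC can be overridden (e.g. to add the options elided in the original).
KNOTC="${KNOTC:-/opt/knot/sbin/knotc}"

batch_unset() {
    # Open the transaction; stop if that fails.
    $KNOTC conf-begin || return 1
    for zone in "$@"; do
        # Same command form as in the quoted timing example.
        if ! $KNOTC conf-unset zone.domain "$zone"; then
            # Roll back so that no partial change gets committed.
            $KNOTC conf-abort
            return 1
        fi
    done
    # A single commit covers the whole batch.
    $KNOTC conf-commit
}
```

Called as `batch_unset a.example b.example`, it issues one conf-commit for the
whole batch, so the commit cost should be paid once rather than once per zone.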