Hi Rob!
We had the same problems some years ago. Our workaround was to use
catalog zones. Knot has a catalog zone from which it automatically
creates or deletes the member zones.
We add or delete member zones by sending UPDATEs to the catalog zone.
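
A minimal sketch of what such an UPDATE could look like, assuming the RFC 9432 layout where each member zone is a PTR record at <unique-label>.zones.<catalog>. The catalog name "catalog.example.", the server address, the hash-derived label, and both function names are hypothetical, chosen only for illustration. The generated text is in nsupdate/knsupdate input syntax.

```python
import hashlib

CATALOG = "catalog.example."  # hypothetical catalog zone name


def member_owner(zone: str, catalog: str = CATALOG) -> str:
    """Return the owner name of the member's PTR record in the catalog."""
    fqdn = zone.lower().rstrip(".") + "."
    # Assumption: derive a stable unique label from a hash of the zone name.
    label = hashlib.sha1(fqdn.encode("utf-8")).hexdigest()[:16]
    return f"{label}.zones.{catalog}"


def build_update(adds, deletes, catalog: str = CATALOG, server: str = "127.0.0.1") -> str:
    """Build one nsupdate-style script that applies all adds and deletes."""
    lines = [f"server {server}", f"zone {catalog}"]
    for zone in deletes:
        # Deleting the PTR RRset removes the member zone from the catalog.
        lines.append(f"update delete {member_owner(zone, catalog)} PTR")
    for zone in adds:
        fqdn = zone.lower().rstrip(".") + "."
        # Adding the PTR record makes the zone a member of the catalog.
        lines.append(f"update add {member_owner(zone, catalog)} 0 PTR {fqdn}")
    lines.append("send")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_update(adds=["example.com"], deletes=["old.example"]), end="")
```

The output could be piped to an update client. Authentication (for example a TSIG key) and the ACL on the catalog zone are left out here.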
Catalog processing is quite fast. We also queue events: if several
adds/deletes arrive within 10 seconds, we collect them and send a
single UPDATE to the catalog zone.
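
The queueing idea can be sketched roughly as follows. This is an illustration under assumptions, not the actual implementation: the names `coalesce` and `Batcher` are hypothetical, the window is counted from the first queued event, and the last action per zone wins (so a delete followed by a re-add is collapsed into a plain add).

```python
def coalesce(events):
    """Collapse (action, zone) events into net (adds, deletes) lists."""
    final = {}
    for action, zone in events:
        if action not in ("add", "delete"):
            raise ValueError(f"unknown action: {action!r}")
        # Normalize the zone name so duplicates collapse; last action wins.
        final[zone.lower().rstrip(".") + "."] = action
    adds = sorted(z for z, a in final.items() if a == "add")
    deletes = sorted(z for z, a in final.items() if a == "delete")
    return adds, deletes


class Batcher:
    """Collect events and release them as one batch after `window` seconds."""

    def __init__(self, window=10.0):
        self.window = window
        self.pending = []
        self.first_at = None

    def submit(self, action, zone, now):
        # Remember when the first event of the current batch arrived.
        if not self.pending:
            self.first_at = now
        self.pending.append((action, zone))

    def flush_if_due(self, now):
        """Return (adds, deletes) once the window has elapsed, else None."""
        if not self.pending or now - self.first_at < self.window:
            return None
        batch = coalesce(self.pending)
        self.pending, self.first_at = [], None
        return batch


if __name__ == "__main__":
    b = Batcher(window=10.0)
    b.submit("add", "a.example", now=0.0)
    b.submit("add", "b.example", now=3.0)
    b.submit("delete", "a.example", now=4.0)
    print(b.flush_if_due(now=10.0))
```

Each batch returned by `flush_if_due` would then be turned into one UPDATE message for the catalog zone. The clock is passed in explicitly so the logic can be tested without sleeping.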
This works quite well for us: our biggest Knot instance hosts 1.4 million
zones with a single catalog. I described our setup here:
  Hi
 We're noticing that as our list of zones gets larger (about 480k right
 now), adding a new zone or deleting an existing zone seems to continue
 to get slower. We are always doing our modifications as part of a
 transaction, and the time appears to occur in the commit phase.
 An example timing.
 # time /opt/knot/sbin/knotc ... conf-begin
 OK
 real   0m0.010s
 user   0m0.000s
 sys    0m0.010s
 # time /opt/knot/sbin/knotc ... conf-unset zone.domain example.com
 OK
 real   0m0.010s
 user   0m0.000s
 sys    0m0.010s
 # time /opt/knot/sbin/knotc ... conf-commit
 OK
 real   0m2.330s
 user   0m0.000s
 sys    0m0.009s
 #
 As you can see, it took > 2 seconds to commit the transaction that
 removes just the example.com zone. Similarly, it takes > 2 seconds to
 commit the transaction that adds the zone back.
 Given the time is real time and not sys/user, I presume knotc is
 waiting on knotd to complete the work. I used perf to record a CPU
 profile of knotd while the commit was running, but nothing hugely stuck
 out at me.
   10.75%  knotd    libc.so.6               [.] __memcmp_avx2_movbe
    6.03%  knotd    knotd                   [.] __popcountdi2
    5.89%  knotd    knotd                   [.] ns_first_leaf
    5.25%  knotd    libc.so.6               [.] pthread_mutex_lock@@GLIBC_2.2.5
    3.85%  knotd    liblmdb.so.0.0.0        [.] 0x0000000000003706
    3.72%  knotd    knotd                   [.] ns_find_branch.part.0
    2.76%  knotd    knotd                   [.] trie_get_try
    2.63%  knotd    liblmdb.so.0.0.0        [.] 0x00000000000069d2
    2.34%  knotd    libknot.so.14.0.0       [.] knot_dname_lf
    1.92%  knotd    liblmdb.so.0.0.0        [.] mdb_cursor_get
    1.72%  knotd    knotd                   [.] create_zonedb
    1.68%  knotd    knotd                   [.] twigbit.isra.0
    1.68%  knotd    knotd                   [.] catalogs_generate
    1.36%  knotd    knotd                   [.] twigoff.isra.0
    1.28%  knotd    knotd                   [.] hastwig.isra.0
    1.28%  knotd    knotd                   [.] db_code
    1.27%  knotd    libknot.so.14.0.0       [.] find_item
    1.11%  knotd    libknot.so.14.0.0       [.] knot_dname_size
    1.04%  knotd    knotd                   [.] zonedb_reload
    0.99%  knotd    libc.so.6               [.] _int_free
    0.99%  knotd    liblmdb.so.0.0.0        [.] 0x0000000000003ce8
    0.96%  knotd    liblmdb.so.0.0.0        [.] memcmp@plt
    0.95%  knotd    liblmdb.so.0.0.0        [.] mdb_cursor_open
    0.88%  knotd    libc.so.6               [.] malloc
    0.88%  knotd    knotd                   [.] conf_db_get
    0.87%  knotd    knotd                   [.] ns_next_leaf
    0.82%  knotd    libknot.so.14.0.0       [.] iter_set
    0.75%  knotd    knotd                   [.] evsched_cancel
    0.73%  knotd    libknot.so.14.0.0       [.] find
 ...
 Our config is pretty simple, conf-export looks like:
 server:
     rundir: "/local/knot_dns/run/"
     user: "nobody"
     pidfile: "/local/knot_dns/run/knot.pid"
     listen: [ ... ]
 log:
   - target: "syslog"
     any: "info"
 statistics:
     timer: "10"
     file: "/tmpfs/knot_dns_stats.yaml"
 database:
     storage: "/local/knot_dns/data"
 mod-stats:
   - id: "default"
     request-protocol: "on"
     server-operation: "on"
     request-bytes: "on"
     response-bytes: "on"
     edns-presence: "on"
     flag-presence: "on"
     response-code: "on"
     request-edns-option: "on"
     response-edns-option: "on"
     reply-nodata: "on"
     query-type: "on"
     query-size: "on"
     reply-size: "on"
 template:
   - id: "default"
     global-module: "mod-stats/default"
     storage: "/local/knot_dns/zones/"
 zone:
   - domain: "example.com."
     template: "default"
 ... 478,000 more domains all the same ...
 Current files on disk are:
 # ls -l /local/knot_dns/data/*
 /local/knot_dns/data/catalog:
 total 0
 /local/knot_dns/data/journal:
 total 0
 /local/knot_dns/data/keys:
 total 0
 /local/knot_dns/data/timers:
 total 75880
 -rw-rw---- 1 root root 77697024 Jun 24 09:26 data.mdb
 -rw-rw---- 1 root root     2432 Jul 17 01:05 lock.mdb
 /local/knot_dns/data/timing:
 total 0
 This machine is not slow or constrained in any way. It has 24 cores at
 3.6 GHz, 64 GB of RAM, NVMe drives, etc. Load is very low (<1) with
 plenty of free resources.
 So what I'm wondering is:
 1. Is this normal? It doesn't feel right that adding/removing a single
 domain takes > 2 seconds, regardless of the size of the existing zone
 database.
 2. Is there any way to improve this? Doing multiple adds/deletes at
 once within a transaction works and we do that where we can, but there
 are cases where we can't do that and I'd really like to understand why
 this is as slow as it is.
 Thanks in advance
 Rob