Hi,
Recently, we noticed a few of our Knot slaves repeatedly re-transferring the same zones. After
we enabled zone-related logging, these messages helped narrow down the problem:
  Aug 8 17:42:14 f2 knot[31343]: warning: [our.zone.] journal: unable to make free space for insert
  Aug 8 17:42:14 f2 knot[31343]: warning: [our.zone.] IXFR, incoming, 1.2.3.4@53: failed to write changes to journal (not enough space provided)
These failures apparently caused the transfers to be retried over and over. Not all of the zones
being served showed up in these messages, but I'm fairly sure the ones with a high rate of
change were more likely to. I do know there was plenty of free disk space. A couple of the
tunables looked relevant:
max-journal-db-size: we didn't hit this limit (we were using ~450M of the 20G default).
max-journal-usage: we might have hit this limit; the default is 100M. I increased it a
couple of times (see the snippet below), but the problem didn't go away.
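For concreteness, here is roughly how the relevant parts of our knot.conf looked after raising
the limit. The 500M figure and the zone name are just illustrative, and I'm assuming the
2.5-era layout where max-journal-db-size is set in the default template while max-journal-usage
can be set per zone:

  template:
    - id: default
      # cap on the shared journal database; 20G is the documented default
      max-journal-db-size: 20G

  zone:
    - domain: our.zone
      # per-zone cap on journal occupancy; the default is 100M
      max-journal-usage: 500M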
Eventually, we simply removed the journal database and restarted the server, and the
repeated transfers stopped.
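For the record, the removal was roughly the following (paths assume the default storage
directory /var/lib/knot, and the systemd unit name may differ on your system):

  # stop the server so the journal LMDB files are not in use
  systemctl stop knot.service
  # remove the shared journal database directory
  rm -rf /var/lib/knot/journal
  # knotd recreates an empty journal on startup
  systemctl start knot.service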
At first I suspected the server had somehow lost track of how much space was being
allocated, but that's a flimsy theory: I don't really have any hard evidence, and these
processes had run under high load for months without trouble.
On reflection, hitting the max-journal-usage limit seems more likely. With that in mind:
1. Are the messages above indeed evidence of hitting the max-journal-usage limit?
2. Is there a way to see how much journal space each zone occupies, so we can tune the
threshold for individual zones?
On the off chance that there is a bug in this area: we are running a slightly older
development variant, a branch off 2.5.0-dev that adds some non-standard, minimal EDNS0
Client Subnet support we were interested in. The branch is
https://github.com/CZ-NIC/knot/tree/ecs-patch.
Thanks,
Chuck