On Aug 23, 2017, at 11:57 AM, libor.peltan(a)nic.cz wrote:
Hello Chuck,
you have a setup with 'zonefile-sync: -1', right?
Currently it's set to the default value (0, flush immediately). Would -1, or a large value,
be better for performance reasons, though?
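For reference, disabling the flush would be a one-line change in knot.conf, e.g. in the
default template (a minimal sketch; whether to set it per template or per zone is up to you):

    template:
      - id: default
        zonefile-sync: -1    # -1 = never flush the zone file; all history stays in the journal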
Yes, it seems that max-journal-usage was hit here. It's not clear why enlarging
this limit didn't help.
I may not have increased it enough.
Actually, with sync disabled like this, the journal has to keep track of all the changes. It
manages to "compress" the history by merging the changesets into one, but this
mostly works when the same records are added and deleted over and over; after a long
time, it may get stuck anyway. That is a consequence of the design.
Deleting the journal was actually a good move here; on the other hand, it made further
observations very difficult.
It's also possible that there was some bug in computing the used space; in that case,
let me know if it happens again.
Yeah, we’ll be keeping tabs on it. The symptom
(an increase in incoming transfer traffic at the master server) is easy enough to detect.
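One cheap way to keep tabs on it is to watch the slave's logs for the same journal warning
quoted below (a rough sketch; the log file path depends on your syslog setup):

    # count occurrences of the journal-full warning seen earlier in this thread
    grep -c 'journal: unable to make free space for insert' /var/log/syslog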
Newer Knot versions introduced the option 'kjournalprint -d', which displays
brief information about saved changesets including their size.
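For example, something like the following (the journal path here is an assumption; by
default it lives under Knot's storage directory):

    kjournalprint -d /var/lib/knot/journal our.zone.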
You/we may try to rebase your patch on top of newer Knot code so that your Knot gets
"updated".
Yes, I'll try that. This was a patch that Vítězslav Kříž created
and it looks like a straightforward addition to the answer_edns_init() function. I can try
applying that to the current released version.
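A rough sketch of what I have in mind (the release tag below is just a placeholder for
whatever the current version is):

    git clone https://github.com/CZ-NIC/knot.git
    cd knot
    git checkout ecs-patch
    git rebase v2.5.4    # placeholder tag; fix up any conflicts around answer_edns_init()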
BR,
Libor
On 23.8.2017 at 20:45, Chuck Musser wrote:
Hi,
Recently, we noticed a few of our Knot slaves repeatedly doing zone transfers. After
enabling zone-related logging, these messages helped narrow down the problem:
Aug 8 17:42:14 f2 knot[31343]: warning: [our.zone.] journal: unable to make free space
for insert
Aug 8 17:42:14 f2 knot[31343]: warning: [our.zone.] IXFR, incoming, 1.2.3.4@53: failed
to write changes to journal (not enough space provided)
These failures apparently caused the transfers to occur over and over. Not all the zones
being served showed up in these messages, but I'm pretty sure that the ones with a
high rate of change were more likely to do so. I do know there was plenty of disk space. A
couple of the tunables looked relevant:
max-journal-db-size: we didn't hit this limit (used ~450M of 20G limit, the default)
max-journal-usage: we might have hit this limit. The default is 100M. I increased it a
couple of times, but the problem didn't go away.
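(For reference, a per-zone override of this limit looks roughly like the following in
knot.conf; the zone name and the 500M value are illustrative only.)

    zone:
      - domain: our.zone.
        max-journal-usage: 500M    # per-zone journal space limit (default 100M)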
Eventually, we simply removed the journal database and restarted the server, and the
repeated transfers stopped. At first I suspected that it was somehow losing track of how
much space was being allocated, but that's a flimsy theory: I don't really have
any hard evidence and these processes had run at a high load for months without trouble.
On reflection, hitting the max-journal-usage limit seems more likely. Given that:
1. Are the messages above indeed evidence of hitting the max-journal-usage limit?
2. Is there a way to see the space occupancy of each zone in the journal, so we might
tune the threshold for individual zones?
On the off chance that there is a bug in this area: we are using a slightly older dev
variant, a branch off 2.5.0-dev that has some non-standard, minimal EDNS0 client-subnet
support we were interested in. The branch is:
https://github.com/CZ-NIC/knot/tree/ecs-patch.
Thanks,
Chuck
_______________________________________________
knot-dns-users mailing list
knot-dns-users(a)lists.nic.cz
https://lists.nic.cz/cgi-bin/mailman/listinfo/knot-dns-users