Hello,

We have a setup where a Knot DNS server is a slave and a DNS
provisioning system acts as a hidden master. The DNS server process in
the hidden master is a vendor-specific implementation, not NSD, BIND,
or anything else well known. For random zones, we see the following
errors in Knot when it tries to update the zone:

Jan 31 10:33:23 host knotd[14148]: info: [xxxxx.] notify, incoming, a:b:c:d::e@51097: received, serial none
Jan 31 10:33:23 host knotd[14148]: info: [xxxxx.] refresh, outgoing, a:b:c:d::e@8054: remote serial 2017013116, zone is outdated
Jan 31 10:33:23 host knotd[14148]: info: [xxxxx.] IXFR, incoming, a:b:c:d::e@8054: starting
Jan 31 10:33:23 host knotd[14148]: warning: [xxxxx.] IXFR, incoming, a:b:c:d::e@8054: failed (malformed data)
Jan 31 10:33:23 host knotd[14148]: warning: [xxxxx.] refresh, outgoing, a:b:c:d::e@8054: fallback to AXFR
Jan 31 10:33:23 host knotd[14148]: warning: [xxxxx.] refresh, remote '....' not usable

As the log shows, Knot first receives a NOTIFY message, which triggers
an IXFR. For an as-yet unknown reason, the IXFR fails with "malformed
data", after which Knot falls back to AXFR. However, a tcpdump capture
(I can share the pcap off-list if needed) shows that Knot reuses the
same TCP socket for the AXFR as it used for the IXFR. Immediately after
sending the AXFR query, Knot sends a TCP RST to the hidden master. This
closes the TCP connection, and the remote (master) server becomes
unusable from Knot's point of view.

The problem is that after this failure, Knot gives up trying to update
the zone and leaves it at its old SOA serial, possibly until the zone
expires. We also don't yet know what causes the IXFR to fail in the
first place. As far as we can tell, the zone data is valid, so it's
unclear why Knot reports "malformed data". However, after we manually
run "knotc zone-retransfer <zone>" once, subsequent IXFRs succeed.
Unfortunately, we have very limited options for configuring the hidden
master because, as mentioned, it is a vendor-specific implementation.

So we have two issues here: the failing IXFR, and then the failing AXFR
fallback caused by the TCP connection reset on the Knot side. Do you
have any ideas? Oh, I forgot to mention that we are running Knot 2.4.0.

Thank you in advance for any help,
Antti