Hello Antti,
first of all, thank you for the report.
As we can see, Knot first receives a NOTIFY message that triggers IXFR.
For a yet unknown reason, IXFR fails due to "malformed data", after which
Knot falls back to AXFR. However, from the tcpdump capture (I can share
the pcap off-list, if needed) we can see that Knot reuses the same TCP
socket for AXFR as it used for IXFR, but immediately after sending the
AXFR query Knot sends a TCP RST to the hidden master, thus closing the
TCP connection and making the remote/master server unusable from Knot's
point of view.
Reusing the TCP connection for AXFR after a failed IXFR is incorrect,
and it's a bug we introduced in the last release. I didn't anticipate
this situation when implementing TCP connection reuse. Sorry for that.
We are going to fix it.
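To illustrate the intended behaviour, here is a minimal Python sketch.
It is not Knot's actual code: MalformedData, refresh_zone and the
connect/ixfr/axfr callables are hypothetical names used only for this
example. The point is simply that the AXFR fallback gets a fresh
connection instead of the one on which IXFR failed.

```python
class MalformedData(Exception):
    """Hypothetical error: an IXFR response could not be parsed."""


def refresh_zone(connect, ixfr, axfr):
    """Try IXFR first; on failure, fall back to AXFR on a NEW connection.

    connect, ixfr and axfr are hypothetical callables standing in for the
    real transport and transfer logic; they are not Knot's API.
    """
    conn = connect()
    try:
        return ixfr(conn)
    except MalformedData:
        # The stream state after a failed IXFR is unknown, so this
        # connection must not be reused for the fallback.
        pass
    finally:
        conn.close()

    # Fallback: full zone transfer over a freshly opened connection.
    conn = connect()
    try:
        return axfr(conn)
    finally:
        conn.close()
```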
As for the IXFR "malformed data" error, the pcap would be really
helpful. Could you please send it to our internal list
knot-dns(a)labs.nic.cz? We will investigate.
The negative thing is that after the failure Knot gives up trying to
update the zone, leaving the zone at its old SOA serial, maybe until it
expires. So far we also don't know what causes the IXFR to fail in the
first place. From what we can see, the zone data seems to be valid, so
it's unclear why Knot fails with "malformed data". However, after
manually running "knotc zone-retransfer <zone>" once, subsequent IXFRs
succeed. Unfortunately, we have very limited options to configure the
hidden master because, as mentioned, it is a vendor-specific
implementation.
Knot should give up only for the SOA retry interval, but then it will
probably fail again the same way. The retransfer is likely to work
because it forces AXFR.
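As a rough illustration of the timing, the following Python sketch lists
when retries would happen before the zone expires. It assumes the
simplest model of the SOA timers (a fixed retry interval after a failed
refresh, expiry counted from the last successful refresh). The function
name and the numbers in the usage note are made up for the example, and
a real server may add backoff or jitter.

```python
def retry_times(last_success, failed_at, retry, expire):
    """Return the times of retry attempts made before the zone expires.

    All arguments are in seconds. `retry` and `expire` correspond to the
    SOA retry and expire fields; the other two are timestamps.
    """
    deadline = last_success + expire  # zone expires at this time
    times = []
    t = failed_at + retry             # first retry after the failure
    while t < deadline:
        times.append(t)
        t += retry
    return times
```

For example, with retry = 600 and expire = 7200, a refresh that fails
one hour after the last success leaves room for five more attempts
before the zone expires. If each of them fails the same way, the zone
stays at the old serial until then.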
Cheers,
Jan