Hi Jan,

On 04/02/16 14:40, Jan Včelak wrote:

> As for 2.1.1, we did some changes in the networking code and we want
> to make sure that everything is working correctly. If you can help us
> with testing we would be very happy. The tarball with sources for
> testing is available on our server:
> https://secure.nic.cz/files/knot-dns/knot-2.1.1-test.tar.xz
First of all thank you for all the good work that you guys are putting
into this server!
I've just compiled this, installed it on a test server, and started it.
Its configuration has 256 "listen" lines. Previously this crashed Knot,
but now it's fine. So this test passed.
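
For reference, the listen setup is shaped like this (the addresses here
are placeholders, not the real ones):

  server:
      listen: 192.0.2.1@53
      listen: 192.0.2.2@53
      # ... and 254 more listen lines like these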
Next, I edited the config file and added 4682 slave zones to it. They
all share the "default" template, which defines one master server.
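
The config shape is roughly this (the remote and zone names are
placeholders):

  remote:
    - id: master_ns
      address: 192.0.2.10@53

  template:
    - id: default
      master: master_ns

  zone:
    - domain: zone0001.example.
    - domain: zone0002.example.
    # ... and 4680 more zones, all picking up the "default" template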
I called "knotc reload". Knot logged all the zones, and said it was
going to bootstrap them. But then it just sat there, doing *something*,
and it was a full 118 seconds later, when it started to check the master
for updates. Here's the log snippet showing this:
2016-02-04T15:20:06 info: [ZONE4681] zone will be bootstrapped, serial 0
2016-02-04T15:20:06 info: [ZONE4682] zone will be bootstrapped, serial 0
2016-02-04T15:20:06 info: configuration reloaded
2016-02-04T15:22:04 info: [ZONE0001] AXFR, incoming, X.X.X.X@53: starting
2016-02-04T15:22:04 info: [ZONE0002] AXFR, incoming, X.X.X.X@53: starting
Note the 118-second delay before the zone refreshes start. During this
delay, Knot made hundreds of DNS queries (A and AAAA) towards the
locally-configured caching resolver (Google DNS in this case) for its
own hostname, for example:
15:48:11.474032 IP6 2001:67c:2e8:11::c100:13bf.34815 >
2001:4860:4860::8888.domain: 7665+ A?
ns1.nl-ams.testdns.ripe.net. (45)
15:48:11.474050 IP6 2001:67c:2e8:11::c100:13bf.34815 >
2001:4860:4860::8888.domain: 52542+ AAAA?
ns1.nl-ams.testdns.ripe.net. (45)
I can't see any reason for all these queries, so it looks like a bug.
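
In case it helps with reproducing this, a capture along these lines
shows the queries clearly (the interface name here is a placeholder):

  tcpdump -n -i eth0 'port 53 and host 2001:4860:4860::8888'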
Next up, when the refreshes started, Knot went and pummelled the master
server. Several zones on the master had expired, so Knot logged this:
2016-02-04T15:22:10 warning: [ZONENNNN] AXFR, incoming, X.X.X.X@53:
server responded with SERVFAIL
2016-02-04T15:22:10 warning: [ZONENNNN] AXFR, incoming, X.X.X.X@53:
failed (processing layer error)
2016-02-04T15:22:10 warning: [ZONENNNN] AXFR, incoming, remote 'hidden'
not available
2016-02-04T15:22:10 error: [ZONENNNN] AXFR, incoming, failed (no active
master)
So, the remote *is* available. It's just telling Knot that it can't
provide a zone transfer (SERVFAIL), probably because the zone has
expired on the master. The "remote 'hidden' not available" message is
therefore a bit confusing. And what does "processing layer error" mean?
Finally, I have some comments about the various parameters that "knotc"
takes. The explanation for them in the knotc man page (and in the online
documentation) is rather terse. For example, what does "zone-check"
actually check? It might be nice if the man page gave a bit more
information about this.
Next up, "zone-status" prints some status. However, it would be useful
if it also explains the output of "zone-status" to the operator (such as
what is "refresh in" or "journal flush" in for).
About "zone-reload": is that only for master zones, or slave zones, or
both? If it's for master zones, will they be reloaded based on the zone
file's mtime? Or will Knot look at the serial number in the SOA record?
About "zone-refresh": I assume that this makes knot immediately query
the master, and if the serial numbers are the same, then no transfer is
done (like BIND's "rndc refresh"). This could be made explicit.
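
In other words, I'd expect it to be roughly equivalent to this manual
check (placeholders as in the logs above), with a transfer happening
only when the master's serial is the newer one:

  dig +short @X.X.X.X zoneNNNN.example SOA    # master's SOA; the serial is the third field
  dig +short @localhost zoneNNNN.example SOA  # the local copy's SOA, for comparison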
About "zone-retransfer": I assume that this makes knot ignore any serial
number on the master, and transfer the zone anyway, like BIND's "rndc
retransfer". Again, this could be made explicit.
About "zone-sign": the word "resign" usually means "leave your
job", so
it's probably best spelled as "re-sign" for clarity :)
Regards,
Anand Buddhdev
RIPE NCC