Hello Anand.
Thank you for a complex write-up. :)
On 4.2.2016 17:03, Anand Buddhdev wrote:
> Next, I edited the config file and added 4682 slave zones to it. They
> all share the "default" template, which defines one master server. Then
> I called "knotc reload". Knot logged all the zones and said it was going
> to bootstrap them. But then it just sat there, doing *something*, and it
> was a full 118 seconds later that it started to check the master for
> updates. Here's the log snippet showing this:
>
> 2016-02-04T15:20:06 info: [ZONE4681] zone will be bootstrapped, serial 0
> 2016-02-04T15:20:06 info: [ZONE4682] zone will be bootstrapped, serial 0
> 2016-02-04T15:20:06 info: configuration reloaded
> 2016-02-04T15:22:04 info: [ZONE0001] AXFR, incoming, X.X.X.X@53: starting
> 2016-02-04T15:22:04 info: [ZONE0002] AXFR, incoming, X.X.X.X@53: starting
Hm. We will investigate this a little more. It's quite possible that
it's related to the next problem.
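For reference, a slave setup of the shape described above looks roughly
like this in knot.conf; the remote name, address, and zone names below
are placeholders:

  remote:
    - id: primary
      address: 192.0.2.1@53

  template:
    - id: default
      master: primary

  zone:
    - domain: zone0001.example
    - domain: zone0002.example
    # ... one entry per slave zone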
> Note the 118-second delay before the zone refreshes start. During this
> delay, Knot made hundreds of DNS queries (A and AAAA) towards the
> locally-configured caching resolver (Google DNS in this case) for its
> own hostname.
Yes, this is a bug. Knot tries to get the host's canonical name so that
it can answer hostname.bind CH TXT queries. This lookup happens whenever
any event starts, which is wrong. We will fix it.
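For reference, hostname.bind is the standard CHAOS-class server identity
query, which you can try with dig; the server address here is a
placeholder:

  # ask the server to identify itself (CH-class TXT query)
  dig @192.0.2.53 hostname.bind chaos txt +short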
> Next up, when the refreshes started, Knot went and pummelled the master
> server. Several zones on the master have expired, so Knot logged this:
>
> 2016-02-04T15:22:10 warning: [ZONENNNN] AXFR, incoming, X.X.X.X@53: server responded with SERVFAIL
> 2016-02-04T15:22:10 warning: [ZONENNNN] AXFR, incoming, X.X.X.X@53: failed (processing layer error)
> 2016-02-04T15:22:10 warning: [ZONENNNN] AXFR, incoming, remote 'hidden' not available
> 2016-02-04T15:22:10 error: [ZONENNNN] AXFR, incoming, failed (no active master)
>
> So, the remote *is* available. It's just telling Knot that it can't
> provide a zone transfer (SERVFAIL), because the zone has probably
> expired on the master. The log message is therefore a bit confusing.
> And what does "processing layer error" mean?
Outgoing queries are handled by a state machine, and we use layers to
stack the processing steps. So this error just means that something went
wrong during the transfer.

We could improve this, but probably no earlier than 2.2.0.
> Finally, I have some comments about the various parameters that "knotc"
> takes. The explanation for them in the knotc man page (and in the online
> documentation) is rather terse. For example, what does "zone-check"
> actually check? It might be nice if the man page gave a bit more
> information about this.
We will improve the documentation before the release, and we will try to
address your points about any functional changes in the next feature
release.
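The zone-* commands take an optional list of zone names; with no names
given they apply to all configured zones (that is how the bare
"knotc zone-refresh" mentioned below behaves). For example, with a
placeholder zone name:

  # check a single zone, or every configured zone if no name is given
  knotc zone-check example.com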
Next up, "zone-status" prints some status.
However, it would be useful
if it also explains the output of "zone-status" to the operator (such as
what is "refresh in" or "journal flush" in for).
We are aware of this. The zone-status output is the top candidate for
upcoming improvements.
About "zone-reload": is that only for master
zones, or slave zones, or
both? If it's for master zones, will they be reloaded based on the zone
file's mtime? Or will knot look at the serial number in the SOA record?
For both. The reload checks the zone file's mtime and reloads the zone
from disk if necessary. This applies to both master and slave zones.
For slave zones, a refresh/bootstrap is scheduled in addition to that.
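So, after editing a zone file in place (the path and zone name here are
placeholders), the change is picked up with:

  # the edit updates the file's mtime, which zone-reload checks
  vi /var/lib/knot/example.com.zone
  knotc zone-reload example.com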
About "zone-refresh": I assume that this
makes knot immediately query
the master, and if the serial numbers are the same, then no transfer is
done (like BIND's "rndc refresh"). This could be made explicit.
Right.
About "zone-retransfer": I assume that this
makes knot ignore any serial
number on the master, and transfer the zone anyway, like BIND's "rndc
retransfer". Again, this could be made explicit.
Exactly.
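Side by side, with a placeholder zone name:

  # query the master's SOA; no transfer if the serials match
  knotc zone-refresh example.com
  # transfer unconditionally, ignoring the serial on the master
  knotc zone-retransfer example.com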
About "zone-sign": the word
"resign" usually means "leave your job", so
it's probably best spelled as "re-sign" for clarity :)
I already resigned on naming commands. ;-) This one was originally named
'sign', but we changed it since signing is automatic and this command
just forces Knot to drop all existing signatures.
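In other words (zone name is a placeholder):

  # drop all existing signatures; automatic signing then re-signs the zone
  knotc zone-sign example.com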
> I have one more observation. The test server has been running for about
> 5 hours now. Of the 4682 zones configured, 2891 have still not been
> transferred in. I ran "knotc zone-refresh" and Knot appears to be trying
> to refresh zones. It appears to be doing some kind of batching: it does
> a bunch of zones, and then waits 5 seconds before doing another batch.
> The number of untransferred zones is going down, albeit slowly.
>
> I think Knot 2's slave zone refresh strategy and timing still need more
> work if it's to work effectively for a configuration with lots of slave
> zones.
The thing is that we didn't change anything in the transfer scheduling
between 1.6 and 2.0. I'll investigate this. If you find any additional
hints, please let us know.
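One quick way to see whether a particular zone has been transferred yet
is to compare SOA serials on the master and on the slave; the addresses
and zone name here are placeholders:

  # serial on the master vs. serial on the local slave
  dig @192.0.2.1 zone0001.example soa +short
  dig @127.0.0.1 zone0001.example soa +short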
Thank you again. :)
Cheers,
Jan