Hello Knot developers,
I'm testing 1.3.0-rc4, and have found something that looks like a bug.
I'm running knot using the CentOS upstart supervisor, and in the upstart
script, I have:
pre-stop exec knotc -c $CONF -w stop
This means that when I run "initctl stop knot", upstart will run "knotc
-c /etc/knot/knot.conf -w stop". The "-w" is supposed to make knotc wait
until the server has stopped.
However, in reality this is not happening. When the stop command is
given, Knot logs this:
2013-07-17T22:48:23 Stopping server...
2013-07-17T22:48:23 Server finished.
2013-07-17T22:48:23 Shut down.
And knotc returns *immediately*. However, if I examine the process
table, I see the knotd process still running. It takes knotd about 10
more seconds to actually exit, at 22:48:33. This is problematic for
upstart: since knotc has returned but the knotd process hasn't yet
died, upstart concludes that knotd has not responded to the stop request,
and so it uses the sledgehammer (kill -9) to stop the knotd process.
My assumption is that the knotd process is still doing housekeeping
work, so a KILL signal is not a good idea. By the looks of it, the
"-w" flag to knotc isn't doing what it's supposed to, i.e. waiting for the
server to exit. Could you please investigate this and fix it?
(As an aside, I can work around this in upstart by using the option
"kill timeout 60" which will make upstart wait at least 60 seconds
before trying a KILL signal, by which time knotd should have exited. But
this is just a work-around, not a solution).
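For reference, the behaviour I'd expect from "-w" is roughly: after asking
the server to stop, keep polling the daemon's PID until the process has
actually gone away, and only then return. This is just a sketch of that
idea, not Knot's actual code; the timeout and polling interval are made up:

/* Sketch: wait until a daemon PID has really exited.
 * Illustrative only -- not Knot's actual implementation. */
#include <errno.h>
#include <signal.h>
#include <unistd.h>

static int wait_for_exit(pid_t pid, int timeout_s)
{
    for (int i = 0; i < timeout_s * 10; i++) {
        /* kill() with signal 0 only checks whether the process still exists. */
        if (kill(pid, 0) == -1 && errno == ESRCH) {
            return 0;               /* process is gone */
        }
        usleep(100 * 1000);         /* poll every 100 ms */
    }
    return -1;                      /* still running after the timeout */
}

If knotc only returned once knotd's process had really disappeared, upstart
would never need to reach for kill -9.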
Regards,
Anand Buddhdev
RIPE NCC
Hello,
it seems that knotd suffers from the same issue as described here:
http://lists.scusting.com/index.php?t=msg&th=244420
I have Debian 7.0 with
http://deb.knot-dns.cz/debian/dists/wheezy/main/binary-i386/net/knot_1.2.0-…
and this is in /var/log/syslog after reboot:
Jun 3 22:37:43 ns knot[2091]: Binding to interface 2xxx:xxxx:xxxx:xxxx::1
port 53.
Jun 3 22:37:43 ns knot[2091]: [error] Cannot bind to socket (errno 99).
Jun 3 22:37:43 ns knot[2091]: [error] Could not bind to UDP
interface 2xxx:xxxx:xxxx:xxxx::1 port 53.
I have a static IPv6 address configured in /etc/network/interfaces.
Restarting knot later binds to this IPv6 address without any problem; it
is only the first start, during OS boot, that fails. What do you think
is the proper way of making knotd reliably listen on a static IPv6
address? I would prefer to avoid restarting knotd.
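For what it's worth, errno 99 on Linux is EADDRNOTAVAIL, which suggests the
IPv6 address simply isn't configured on the interface yet when knotd starts
(duplicate address detection can delay it). One possible way for a daemon to
cope with this on Linux is the IP_FREEBIND socket option, which lets bind()
succeed on an address that is not (yet) assigned. I don't know whether knotd
could or should do this, so the following is only a sketch of the idea, not
knotd's code:

/* Sketch: bind a UDP socket to a static IPv6 address that may not be
 * configured yet at boot time. IP_FREEBIND is Linux-specific, and this
 * is only an illustration of a possible approach. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

int main(void)
{
    int fd = socket(AF_INET6, SOCK_DGRAM, 0);
    int on = 1;

    /* Allow bind() even if the address has not been assigned yet,
     * avoiding EADDRNOTAVAIL (errno 99) during boot. */
    setsockopt(fd, IPPROTO_IP, IP_FREEBIND, &on, sizeof(on));

    struct sockaddr_in6 sa;
    memset(&sa, 0, sizeof(sa));
    sa.sin6_family = AF_INET6;
    sa.sin6_port   = htons(53);
    inet_pton(AF_INET6, "2001:db8::1", &sa.sin6_addr); /* example address */

    if (bind(fd, (struct sockaddr *)&sa, sizeof(sa)) == -1) {
        perror("bind");
        return 1;
    }
    return 0;
}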
Leos Bitto
Hello Knot folks:
"The 'rundir' obsoletes 'pidfile' config option, as the PID file will
always be placed in 'rundir'."
This is cool, unless you want to run multiple instances of KNOT on a single
machine. Can you reconsider this?
Jonathan
Hello KNOT folks,
We've found an issue in 1.3 with bootstrapping. We're using FreeBSD 9.x, but
we also quickly reproduced it on Ubuntu 12.x to confirm it was not
isolated to FreeBSD. We're testing with about 3000 to 4000 zones, so our
environment is not even very large at this point and the bootstrapping
failures are very problematic. There are three causes that we've seen thus
far:
1. If the AXFR TCP connect() is interrupted by a signal, the whole AXFR is
aborted and the bootstrap is rescheduled, instead of selecting on the socket
until the connection either succeeds or times out/fails. This can result in
a flood of connects with little to no progress in the bootstrapping.
2. When connected, if a recv() is interrupted by a signal, it isn't retried
(see the retry sketch below). This results in connections being dropped that
don't need to be dropped.
3. If a successful connect is made, but the remote end subsequently drops it
(e.g., resets the connection), then the bootstrap fails without being
rescheduled. This was found when slaving from a non-KNOT DNS server that may
have TCP rate limiting enabled, or something of that nature. Either way, the
fact that it is not rescheduled is very undesirable.
I suspect that there are other cases of interrupted system calls not being
handled correctly.
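To illustrate points 1 and 2, the usual pattern for a blocking recv() is to
retry when errno is EINTR rather than tearing the connection down. This is
only a generic sketch of that pattern, not a patch against the Knot sources:

/* Sketch: EINTR-safe recv() -- a generic retry pattern, not Knot code. */
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>

static ssize_t recv_retry(int fd, void *buf, size_t len)
{
    ssize_t n;
    do {
        n = recv(fd, buf, len, 0);
    } while (n == -1 && errno == EINTR);   /* interrupted by a signal: retry */
    return n;
}

connect() is trickier: once it has been interrupted by a signal, the TCP
handshake continues in the background, so the right reaction is to poll the
socket for writability and then check SO_ERROR, rather than giving up and
scheduling a brand-new connection attempt.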
Here is some additional info that may help find the root cause:
- The greater the latency between the master and slave, the worse the
problem is. We tested with a slave 80 ms RTT away and it was very bad.
- The more worker threads you have, the worse the problem is. So even
locally (slave 0 ms away from master) we could reproduce the issue fairly
easily.
Hopefully this can be remedied!
Cheers,
Jonathan
Hi Knot people,
I've been trying out rc3, and I've found a few issues, as documented below:
1. In the knotc man page, the "-s" option shows the default as
@rundir@/knot.sock. I guess that should have been substituted with the
compile-time setting, but wasn't.
2. knotc start doesn't do anything now. It should be removed.
3. knotc restart only stops the server, but does not start it. It should
be removed.
4. When Knot is configured as a slave and receives a zone via AXFR
containing the following record:
<owner-obscured>. 86400 IN TXT "Alliance Fran\195\167aise de Kotte"
it serves this record correctly when queried over DNS:
dig @a.b.c.d +norec +short txt <owner-obscured>.
"Alliance Fran\195\167aise de Kotte"
But when saving the zone to disk, this record gets written out as:
<owner-obscured>. 86400 IN TXT "Alliance Fran\4294967235\4294967207aise
de Kotte"
So when Knot restarts and tries to load this zone, it gives an error:
Error in zone file /var/lib/knot/XX:4875: Number is bigger than 8 bits.
(This looks like sign-extension of the escaped bytes; see the sketch after
this list.)
5. I have configured Knot to log to the file /var/log/knot/knot.log.
Most logging goes into it. However, some error messages are still
leaking into syslog; for example, the following appears both in Knot's
log file and in /var/log/messages (via syslog):
logfile:
2013-06-28T20:17:25 [error] Incoming AXFR of 'XX.' with 'a.b.c.d@53':
General error.
/var/log/messages:
Jun 28 20:17:25 admin knot[1630]: [error] Incoming AXFR of 'XX.' with
'a.b.c.d@53': General error.
I would expect Knot to send logging to syslog only while it is starting
and hasn't yet set up its file logging. Once logging to a file has
started, there should be no more logs to syslog, so this looks like a
bug, albeit a harmless one.
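Coming back to item 4: the corrupted escapes look like a classic signed-char
problem. The byte 195 stored in a plain char and then widened to an unsigned
32-bit integer sign-extends to 4294967235, and 167 becomes 4294967207, which
is exactly what ends up in the dumped zone file. A minimal sketch of the
effect (just the arithmetic, not Knot's code):

/* Sketch: how signed-char sign-extension produces the huge escape
 * values seen in the dumped zone file. Not Knot's code. */
#include <stdio.h>

int main(void)
{
    char c1 = (char)0xC3;   /* byte 195 of the UTF-8 sequence */
    char c2 = (char)0xA7;   /* byte 167 */

    /* Where char is signed (as on x86), widening to unsigned int gives: */
    printf("\\%u\\%u\n", (unsigned int)c1, (unsigned int)c2);
    /* prints \4294967235\4294967207 instead of \195\167 */

    /* Going through unsigned char keeps the intended values: */
    printf("\\%u\\%u\n", (unsigned char)c1, (unsigned char)c2);
    /* prints \195\167 */
    return 0;
}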
That's about it for Friday evening :)
Regards,
Anand Buddhdev
RIPE NCC