Hello Jonathan,
thanks for the comprehensive bug report. I have created an issue in our new issue tracking
system and we will have a fix ready in -rc4.
The link to the issue: 
https://gitlab.labs.nic.cz/knot/issues/65
Ondrej
On 3. 7. 2013, at 15:39, <jh(a)netriplex.com> wrote:
  Hello KNOT folks,
 We've found an issue in 1.3 with bootstrapping. We're using FreeBSD 9.x, but we
 also quickly reproduced it on Ubuntu 12.x to confirm that it is not
 isolated to FreeBSD. We're testing with about 3000 to 4000 zones, so our
 environment is not even very large at this point and the bootstrapping
 failures are very problematic. There are three causes that we've seen thus
 far:
 1. If the AXFR TCP connect is interrupted by a signal, the whole AXFR is
 aborted and the bootstrap is rescheduled, instead of selecting on the socket
 until the connection either succeeds or times out/fails. This
 can result in a flood of connects, with little to no progress in the
 bootstrapping.
 2. When connected, if a recv() is interrupted by a signal, it isn't retried.
 This results in connections being dropped that don't need to be dropped.
 3. If a successful connect is made, but the remote end subsequently drops it
 (e.g., resets the connection), then the bootstrap fails without being
 rescheduled. This was found when slaving from a non-KNOT DNS server that may
 have TCP rate limiting enabled, or something of that nature. Either way, the
 fact that it is not rescheduled is very undesirable.
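 To illustrate the behaviour asked for in case 1, here is a minimal sketch in C. It is not Knot DNS code: `connect_wait` and its `timeout_ms` parameter are hypothetical names chosen for this example. It assumes POSIX semantics, where a connect() interrupted by a signal keeps progressing in the background, so the caller can wait for writability and then read SO_ERROR instead of starting over.

```c
#include <errno.h>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper (not from Knot DNS): finish a connect() that was
 * interrupted by a signal (EINTR) or is still in progress (EINPROGRESS)
 * instead of aborting the transfer. Returns 0 on success, -1 with errno
 * set on failure or timeout. */
int connect_wait(int fd, const struct sockaddr *addr, socklen_t len,
                 int timeout_ms)
{
    if (connect(fd, addr, len) == 0) {
        return 0; /* connected immediately */
    }
    if (errno != EINTR && errno != EINPROGRESS) {
        return -1; /* genuine failure, e.g. ECONNREFUSED */
    }

    /* The connection attempt continues asynchronously; wait until the
     * socket becomes writable. Note that each retry after a signal
     * restarts the full timeout in this simplified version. */
    struct pollfd pfd = { .fd = fd, .events = POLLOUT };
    int rc;
    do {
        rc = poll(&pfd, 1, timeout_ms);
    } while (rc < 0 && errno == EINTR);

    if (rc == 0) {
        errno = ETIMEDOUT;
        return -1;
    }
    if (rc < 0) {
        return -1;
    }

    /* Writable does not mean connected: fetch the final status. */
    int err = 0;
    socklen_t errlen = sizeof(err);
    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &errlen) < 0) {
        return -1;
    }
    if (err != 0) {
        errno = err;
        return -1;
    }
    return 0;
}
```

 With a helper like this, a caller would reschedule the bootstrap only when the function returns -1, rather than on every interrupted connect.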
 I suspect that there are other cases of interrupted system calls not being
 handled correctly.
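 For case 2, and for interrupted system calls in general, the usual pattern is a retry loop around the call. The sketch below is an illustration only, not the actual Knot DNS fix; `recv_retry` is a hypothetical name.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper (not from Knot DNS): call recv() again whenever it
 * was interrupted by a signal before any data arrived. Any other result
 * (data, orderly shutdown, or a real error) is returned unchanged. */
ssize_t recv_retry(int fd, void *buf, size_t len, int flags)
{
    ssize_t n;
    do {
        n = recv(fd, buf, len, flags);
    } while (n < 0 && errno == EINTR);
    return n;
}
```

 Installing signal handlers with SA_RESTART can reduce how often EINTR is seen, but it does not cover every call on every platform, so an explicit loop is the more defensive choice.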
 Here is some additional info that may help find the root cause:
 - The greater the latency between the master and slave, the worse the
 problem is. We tested with a slave 80 ms RTT away and it was very bad.
 - The more worker threads you have, the worse the problem is. So even
 locally (slave 0 ms away from master) we could reproduce the issue fairly
 easily.
 Hopefully this can be remedied!
 Cheers,
 Jonathan
 _______________________________________________
 knot-dns-users mailing list
 knot-dns-users(a)lists.nic.cz
 
https://lists.nic.cz/cgi-bin/mailman/listinfo/knot-dns-users 
--
 Ondřej Surý -- Chief Science Officer
 -------------------------------------------
 CZ.NIC, z.s.p.o.    --    Laboratoře CZ.NIC
 Americka 23, 120 00 Praha 2, Czech Republic
 mailto:ondrej.sury@nic.cz    http://nic.cz/
 tel:+420.222745110       fax:+420.222745112
 -------------------------------------------