Hi again Anand,
On 9 March 2013 03:07, Anand Buddhdev <anandb(a)ripe.net> wrote:
Hello Knot developers,
I configured an instance of Knot 1.2.0rc3 with 5174 slave zones, and
started it. I wanted to see how long it would take to transfer all the
zones in. It has been running for about an hour, and it still hasn't
managed to load all the zones from its master. I'll pick just one zone
as an example to show how long it took:
# grep apnic.net /var/log/knot/knot.log
2013-03-09T00:50:19.361489-00:00 Will attempt to bootstrap zone apnic.net. from AXFR master in 37s.
2013-03-09T01:06:31.330875-00:00 Will attempt to bootstrap zone apnic.net. from AXFR master in 35s.
2013-03-09T01:16:19.741644-00:00 Incoming AXFR transfer of 'apnic.net.' with '193.0.0.198@53' key 'ripencc-20110222.': Started.
2013-03-09T01:16:35.282203-00:00 Will attempt to bootstrap zone apnic.net. from AXFR master in 52s.
2013-03-09T01:52:16.477212-00:00 Will attempt to bootstrap zone apnic.net. from AXFR master in 37s.
2013-03-09T01:56:02.806828-00:00 Will attempt to bootstrap zone apnic.net. from AXFR master in 13s.
2013-03-09T01:57:00.901518-00:00 Incoming AXFR transfer of 'apnic.net.' with '193.0.0.198@53' key 'ripencc-20110222.': Started.
2013-03-09T01:57:01.096493-00:00 Incoming AXFR transfer of 'apnic.net.' with '193.0.0.198@53' key 'ripencc-20110222.': Finished.
From the time Knot decided it needed to bootstrap apnic.net to the time
it actually transferred the zone in, it took 67 minutes! All that while,
it was responding with SERVFAIL for the zone.
Well, you have probably hit the weak spot of the current implementation.
We regularly test bootstrapping about 5k small zones, and it finishes in
about a minute or so, but that is over 1GbE. The thing is, we do not
handle congestion very efficiently when there are many larger zones or
the line is slower.
In the current implementation, bootstrap requests are scheduled with a
jittered timer and some stepping, but over non-ideal links the transfer
rate may be lower, packets may be lost, connections may be interrupted,
and so on.
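For readers unfamiliar with the scheme, a jittered, stepped retry timer
of this general shape can be sketched as follows. All names and constants
here are illustrative, not Knot's actual code or values:

```python
import random

def next_bootstrap_delay(attempt, base=30, step=10, jitter=15, cap=300):
    """Compute the delay in seconds before the next bootstrap attempt.

    The delay grows by a fixed step per failed attempt (capped), and a
    random jitter is added so that thousands of zones retrying against
    the same master do not fire in lockstep. Hypothetical constants.
    """
    delay = min(base + attempt * step, cap)
    return delay + random.randint(0, jitter)
```

The jitter spreads out retries; the stepping backs off a master that
keeps failing, which matches the growing "in Ns" intervals in the log
above.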
We are working on a new implementation with a fixed queue that should
handle this situation efficiently (it will be self-throttling), but it
probably won't make it into 1.2.0.
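The fixed-queue idea, bounding how many transfers are in flight so the
link is never oversubscribed, might look roughly like this. This is a
sketch with made-up names, not the planned Knot code; a bounded worker
pool is itself the throttle:

```python
import queue
import threading

def run_transfers(zones, transfer, limit=4):
    """Transfer `zones` with at most `limit` concurrent connections.

    However many zones are pending, only `limit` transfers touch the
    wire at once; the rest wait in a FIFO queue. The limit and names
    are illustrative only.
    """
    pending = queue.Queue()
    for zone in zones:
        pending.put(zone)

    def worker():
        while True:
            try:
                zone = pending.get_nowait()
            except queue.Empty:
                return          # queue drained: worker exits
            transfer(zone)

    threads = [threading.Thread(target=worker) for _ in range(limit)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

Compared with per-zone timers, a scheme like this is self-throttling:
a slow or lossy link simply keeps the workers busy longer instead of
piling up thousands of concurrent retries.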
For what it's worth, the problem is most evident on bootstrap; once you
have most of the zones and reasonable refresh timers, it will get up to
speed again. Sorry about that.
While the zone "apnic.net" has now been loaded, Knot is still busy
loading many of the other zones. I can't tell how far along it is.
Perhaps there could be a command for knotc, called "zonestatus", which
would print out a list of all configured zones and their statuses.
This is a good idea, I'll add this.
I understand that transferring in lots of zones takes time. However, if
I start an instance of BIND with the same slave zones, it can transfer
them all in within about 15-20 minutes. Knot appears to be much slower
in comparison.
Do you have any suggestions for speeding things up?
As I said earlier, please bear with the initial bootstrap; after that,
things should get faster.
I'll let you know as soon as the new scheduler makes it to the release.
Kind regards,
Marek
Regards,
Anand
_______________________________________________
knot-dns-users mailing list
knot-dns-users(a)lists.nic.cz
https://lists.nic.cz/cgi-bin/mailman/listinfo/knot-dns-users