Hi André,
thanks a lot for your interest in the operational aspects of Knot DNS.
The point of the recent change to the rrsig-refresh default is to align
it with other closely related options, so that it suits a wider
range of configurations. A fixed value of 7d may often be too high or
too low.
What rrsig-refresh is actually for is refreshing RRSIGs early
enough that they don't expire due to delays in:
1) propagation among authoritative servers, i.e. synchronization
of secondaries with primaries, including e.g. the lengthy process of
signing itself (in case of a huge zone)
2) propagation to resolvers' caches
When I thought about this, I actually saw that (1) is exactly
propagation-delay and (2) is exactly the RRSIG's TTL. Setting
rrsig-refresh default to the sum of both values was a logical conclusion.
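To make the arithmetic concrete, here is a minimal Python sketch of how
the new default is derived. It is only an illustration, not Knot's code:
the function name is made up, the one-hour zone maximum TTL is an assumed
example value, and it uses the zone maximum TTL as the changelog words it.

```python
from datetime import timedelta


def default_rrsig_refresh(propagation_delay, zone_max_ttl):
    """Illustrative only: new default = propagation-delay + zone maximum TTL."""
    return propagation_delay + zone_max_ttl


# propagation-delay default of one hour, as mentioned in this mail
propagation_delay = timedelta(hours=1)
# Assumed example value; a real zone may have a much larger maximum TTL
zone_max_ttl = timedelta(hours=1)

refresh = default_rrsig_refresh(propagation_delay, zone_max_ttl)
print(refresh)  # 2:00:00
```

With a zone maximum TTL of one day instead, the same sum gives 25 hours.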
I'd say that the setting of propagation-delay is still in your hands, as
is setting a non-default rrsig-refresh. The only disadvantage of too
high an rrsig-refresh is that zone signing takes place more often and
creates larger changesets to be propagated to secondaries. In other
words, it uses more of all resources (CPU, memory, disk, network).
Anyway, we have already received other concerns about this change. One
argument was that, in their case, rrsig-refresh ought to also cover
possible outages of the otherwise regular signing process (signer server
down). It is a philosophical question whether propagation-delay should
cover those as well. All this makes me wonder whether the one-hour
default of propagation-delay is perhaps not optimal.
Please let me know your ideas/opinions in more detail. Any real
operational experience is very valuable to us!
Thank you,
Libor
On 31. 08. 22 at 10:47, André Keller wrote:
Good morning,
In Knot 3.2.0 the rrsig-refresh default changed. Excerpt from the changelog:
knotd: default value for 'policy.rrsig-refresh' is propagation delay +
zone maximum TTL
I'd like to understand the rationale behind this change and whether or
not we should tune this parameter in our deployment.
We currently have monitoring in place to ensure that we always serve
valid signatures. In my understanding, with the old (pre-3.2.0) defaults
of rrsig-refresh 7d and rrsig-lifetime 14d, we always ended up with
signatures that were valid for at least 7 more days. As I understand it,
with the new defaults, signatures might be refreshed much closer to their
expiry date. This makes me a bit uneasy: if there are issues with
signing, this gives us hardly any time to react and fix them before the
current signatures expire.
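A rough Python sketch of the worry above, under a simplifying assumption:
signatures are re-signed as soon as their remaining validity drops to
rrsig-refresh, and the signer fails at exactly that moment (worst case).
The function name, the 2h "new default" figure (1h propagation delay plus
an assumed 1h zone maximum TTL) and the one-day outage are illustrative
values, not measurements.

```python
from datetime import timedelta


def hours_left_after_outage(rrsig_refresh, outage):
    """Hours of signature validity left once the signer is back.

    Assumes the outage starts exactly when the refresh was due, so the
    oldest signature had rrsig_refresh of validity left at that point.
    A negative result means signatures expired during the outage.
    """
    return (rrsig_refresh - outage).total_seconds() / 3600


outage = timedelta(days=1)  # hypothetical signer downtime

# Old default: rrsig-refresh 7d
print(hours_left_after_outage(timedelta(days=7), outage))   # 144.0
# New default with assumed 1h propagation delay + 1h zone maximum TTL
print(hours_left_after_outage(timedelta(hours=2), outage))  # -22.0
```

Under these assumptions, the value of rrsig-refresh is effectively the
time available to notice and repair a signing failure.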
I assume setting rrsig-refresh explicitly to 7d would restore the old
behavior, but I'm wondering if this is somehow bad practice and if we
are overly paranoid with our monitoring.
How do other people handle this? Are there any downsides of setting a
higher value of rrsig-refresh that we are not aware of?
Regards
André
--