Can you say
anything about when this feature will be available in a
released version of kresd?
I personally can't promise anything. In any case, you can watch the
issue:
https://gitlab.labs.nic.cz/knot/knot-resolver/issues/257
Thanks for the reference; we have enabled notifications on this issue
so we get updated automatically if something happens there.
Yes, the few seconds after clearing will be noticeably
slower (I
believe).
I'm wondering why just a "few seconds".
It will take much longer than a few seconds to restore the cache after
a complete flush, no? (Basically the same time it originally took to
fill it to the point it was at before the flush.)
Records of the same name and type get updated in place (sometimes with
a short overlap), so for each GB you'll typically need millions of
*different* records to fill it - and that doesn't happen as fast as one
might expect, because people mostly visit a not-too-huge set of sites,
and those sets largely overlap. You could try it with your traffic and
see how fast the cache grows (the `du` command should be accurate until
we have the GC).
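For illustration, a minimal sketch of the pieces involved (the path and
size below are placeholder assumptions, not your actual setup):

    -- kresd.conf excerpt: bound the cache and point it at a tmpfs-backed directory
    cache.size = 4 * GB
    cache.storage = 'lmdb:///var/cache/knot-resolver'
    -- growth can then be watched from the shell, e.g.
    --   du -sh /var/cache/knot-resolver
    -- and a full flush (the operation we're discussing) is
    --   cache.clear()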
We get such graphs out of the box since the tmpfs is monitored as just
another partition by our system monitoring. It shows mostly linear
growth until it drops to 0 when the cache clear occurs.
The resulting graph looks like a sawtooth wave with a period of 3-4 days.
https://en.wikipedia.org/wiki/Sawtooth_wave#/media/File:Waveforms.svg
We could stretch that period to maybe 2 weeks by increasing cache.size,
but that isn't a solution either, since we would still start with an
empty cache regularly - just less often.
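To put rough numbers on it (purely illustrative, not our measured rates):
if the cache grows by about 1 GB per day, a cache.size of 3-4 GB matches
the 3-4 day sawtooth period we see, and a 2-week period would need a
cache.size around 14 GB - while each flush would still drop us back to
an empty cache.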
> How do
people work around this limitation?
With sufficient memory for the cache it is very unlikely to happen, so
it has not been a practical problem until now.
I'm curious:
How much memory are you using and how big is your user population?
We don't know how big our user population is, but I assume that public
resolvers (like us) have a much more diverse user base than typical
ISP-run resolvers that serve their own customers in a geographically
limited setting, where users are more likely to visit overlapping
destinations since most of them share a single native language (so
fewer cache entries are required).
We are getting queries from over 50 countries (ignoring countries with
limited usage).
https://twitter.com/applied_privacy/status/1127151981632606208
I assume that is a bad combination with a DNS resolver that cannot
delete individual cache entries, only flush all of them.
> I certainly haven't heard of anyone doing
something similar. It sounds
> possible, but I don't think it's worth it.
Yes, we agree. So for the time being we have switched back to unbound,
where we get around a 50% cache hit rate with less than 1 GB of cache,
but we are looking forward to testing your future version that comes
with a garbage collector.
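(In unbound terms, the relevant limits would be msg-cache-size and
rrset-cache-size; the exact values aren't important here, just that
they stay well under 1 GB in total.)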
Testing it would tremendously help to expedite the process, because
garbage collection is very dependent on the deployment, so a first user
is exactly what we need at the moment!
Yes, we can test it if there is a Debian repository for it (otherwise
we would wait for the releases to reach your stable repo).
Your support via this mailing list is exceptional, thank you!
Christoph