Can you say
anything about when this feature will be available in a
released version of kresd?
I personally can't promise anything. In any case, you can watch the
issue:
https://gitlab.labs.nic.cz/knot/knot-resolver/issues/257
Thanks for the reference; we have enabled notifications on this issue
so we get updated automatically if something happens there.
Yes, the few seconds after clearing will be noticeably
slower (I
believe).
I'm wondering why just a "few seconds".
It will take much longer than a few seconds to restore the cache after
a complete flush, no? (Basically the same time it originally took to
fill it to the point it was at before the flush.)
Records of the same name and type get updated in place (sometimes with
a short overlap), so for each GB you'll typically need millions of
*different* records to fill it - and that doesn't happen as fast as one
might expect, because people mostly visit a not-too-huge set of sites,
and those sets largely overlap. You could try it with your traffic and
see how fast the cache grows (the `du` command should be accurate until
we have the GC).
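For illustration, a minimal sketch of the pieces involved (the path and
size below are placeholder assumptions, not your actual setup):

    -- kresd.conf excerpt: bound the cache and point it at a tmpfs-backed directory
    cache.size = 4 * GB
    cache.storage = 'lmdb:///var/cache/knot-resolver'
    -- growth can then be watched from the shell, e.g.
    --   du -sh /var/cache/knot-resolver
    -- and a full flush (the operation we're discussing) is
    --   cache.clear()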
We get such graphs out of the box since the tmpfs is monitored as just
another partition by our system monitoring. It shows mostly linear
growth until it drops to 0 when the cache clear occurs.
The resulting graph looks like a sawtooth wave with a period of 3-4 days.
https://en.wikipedia.org/wiki/Sawtooth_wave#/media/File:Waveforms.svg
We could stretch that period to maybe 2 weeks by increasing cache.size,
but that isn't a solution either, since we would still start with an
empty cache regularly - just less often.
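To put rough numbers on it (purely illustrative, not our measured rates):
if the cache grows by about 1 GB per day, a cache.size of 3-4 GB matches
the 3-4 day sawtooth period we see, and a 2-week period would need a
cache.size around 14 GB - while each flush would still drop us back to
an empty cache.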
> How do
people work around this limitation?
With sufficient memory for the cache it is very unlikely to happen, so
it has not been a practical problem until now.
I'm curious:
How much memory are you using and how big is your user population?
We don't know how big our user population is, but I assume that public
resolvers (like us) have a much more diverse user base than typical
ISP-run resolvers that serve their own customers in a geographically
limited setting, where users are more likely to visit overlapping
destinations since most of them share a single native language (so
fewer cache entries are required).
We are getting queries from over 50 countries (ignoring countries with
limited usage).
https://twitter.com/applied_privacy/status/1127151981632606208
I assume that is a bad combination with a DNS resolver that cannot
delete individual cache entries, only flush all of them.
> I certainly haven't heard of anyone doing
something similar. It sounds
> possible, but I don't think it's worth it.
Yes, we agree. So for the time being we have switched back to unbound,
where we get around a 50% cache hit rate with less than 1 GB of cache,
but we are looking forward to testing your future version that comes
with a garbage collector.
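(In unbound terms, the relevant limits would be msg-cache-size and
rrset-cache-size; the exact values aren't important here, just that
they stay well under 1 GB in total.)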
Testing it would tremendously help to expedite the process, because
garbage collection is very dependent on the deployment, so a first user
is exactly what we need at the moment!
Yes, we can test it if there is a Debian repository for it (otherwise
we would wait for the releases to reach your stable repo).
Your support via this mailing list is exceptional, thank you!
Christoph