On 5/9/19 11:04 PM, Christoph wrote:
This time, kresd produced the following logs when it reached
cache.size (the tmpfs still had lots of free space).
22:00:15 kresd[11750]: [cache] MDB_BAD_TXN, probably overfull
22:00:15 kresd[11750]: [cache] clearing error, falling back
22:00:15 kresd[11750]: [cache] MDB_BAD_TXN, probably overfull
22:00:15 kresd[11750]: [cache] clearing because overfull, ret = 0
When this happened, kresd lost all its cache (this is an assumption,
but at the time it happened the usage level of the tmpfs partition
dropped to 0 before starting to increase again at the usual rate).
Yes, that's been the "normal" behavior of kresd, at least so far. I don't
think it's documented, but when the cache is full (i.e. a write fails
because of that), the only way to cope is cache.clear() - and in that
state LMDB typically isn't able to commit *any* changes (not even the
clearing itself), so there's the "falling back" path that removes the
files and starts a new cache.
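For completeness, cache.clear() is also callable from the Lua
configuration, so a blunt manual workaround is to clear proactively on a
timer. Just a sketch - the 12-hour interval is an arbitrary pick of mine:

-- drop the whole cache periodically, long before it gets near
-- cache.size, so the MDB_BAD_TXN fallback path is never hit
event.recurrent(12 * hour, function ()
    cache.clear()
end)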
We have a WIP "garbage-collecting" daemon that tries to remove records
estimated as less useful once the cache is getting large, but so far
typical deployments can afford to set the limit so large that the cache
only fills up very rarely.
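In other words, something like this in the config is usually enough (a
sketch only - the 4096 MB figure and the path are just placeholders,
size it to what your tmpfs can actually hold):

-- generous limit on a tmpfs-backed directory, so "overfull" is rare
cache.open(4096 * MB, 'lmdb:///var/cache/knot-resolver')
-- or, keeping the default storage path:
-- cache.size = 4096 * MB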
I'm wondering if there is something special here in our setup with
tmpfs? Is anyone else putting their cache on tmpfs with kresd 4.0.0?
(and can reproduce this, or also: can not reproduce it?)
I think it's actually typical to use tmpfs, at least for more "serious"
deployments. We certainly do that on our Turris routers (Omnia and
MOX), as the writes could wear out the flash storage rather soon. I
suppose persisting the cache across system restarts isn't too useful :-)
and tmpfs is enough to persist it across daemon restarts.
--Vladimir