Thank you for your reply. 

OS:
# uname -a
Linux dns-cache-2 4.19.97-v7l+ #1294 SMP Thu Jan 30 13:21:14 GMT 2020 armv7l GNU/Linux

knot-resolver:

# dpkg-query -l | grep knot
ii  knot-resolver                  5.1.0-1                             armhf        caching, DNSSEC-validating DNS resolver
ii  knot-resolver-module-http      5.1.0-1                             all          HTTP/2 module for Knot Resolver
ii  knot-resolver-release          1.7-1                               all          Knot Resolver official upstream repositories
ii  libknot10:armhf                2.9.4-1                             armhf        DNS shared library from Knot DNS
ii  libknot8:armhf                 2.7.6-2                             armhf        Authoritative domain name server (shared library)

Source points to:

I hope it is not an issue to run this build on Raspbian.

I will modify kres-cache-gc.service as you suggested and let you know the result, but this will take a few days.

Just FYI: we are running the same setup on two other servers with the same knot-resolver version (5.1.0), but on CentOS 7 (64-bit), and it works just fine there.

On Fri, 5 Jun 2020 at 14:28, Tomas Krizek <tomas.krizek@nic.cz> wrote:
Hi,

thanks for the detailed report.

On 05/06/2020 13.46, Vladimír Čunát wrote:
> Increasing your tmpfs size would probably avoid the
> out-of-space, but I'm way more concerned about the GC not working well
> and those SIGSEGVs (80% is way too low to cause problems).

I think there might be some issue with the usage calculation in GC.
Perhaps you could try to lower the limit to trigger GC to something like
60% (down from the default 80%), and see if the issue persists? It can
be done by passing "-u 60" when launching kres-cache-gc.

On 05/06/2020 09.29, Petr Kyselák wrote:
> Jun  4 16:32:05 dns-cache-2 kres-cache-gc[548]: Cache analyzed in 1.41
> secs, 1038386 records, limit category is 100.

This shouldn't be related to the issue, but 1.4 secs is a long time to
analyze the cache, considering there's just a 1-second delay before it is
triggered again. It's understandable given you're running it on a
Raspberry Pi, but I'd consider changing the delay to something like 1
minute to prevent GC from consuming too much CPU. This can be done with
the "-d 60000" argument to kres-cache-gc.

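As a rough illustration of why the longer delay helps, here is a small
sketch of the arithmetic. It assumes every GC pass takes about as long as
the 1.41 secs from the log, and that the delay counts from the end of one
pass to the start of the next; gc_duty_cycle is just a helper name made up
for this example.

```shell
#!/bin/sh
# Rough estimate of the share of time the GC spends analyzing the cache.
# Assumptions: constant pass duration, delay measured between passes.
gc_duty_cycle() {
    # $1 = analysis time in seconds, $2 = delay in milliseconds (as for -d)
    awk -v run="$1" -v delay_ms="$2" 'BEGIN {
        delay = delay_ms / 1000
        printf "%.1f\n", 100 * run / (run + delay)
    }'
}

gc_duty_cycle 1.41 1000    # 1 second delay, as in the report
gc_duty_cycle 1.41 60000   # delay set with "-d 60000"
```

Under these assumptions the GC would be busy for roughly 58.5% of the time
with the 1 second delay, and about 2.3% with a 1 minute delay.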

You can check how kres-cache-gc is executed from systemd by running:

$ systemctl cat kres-cache-gc.service | grep ExecStart

Then you can override the arguments for kres-cache-gc by creating a
systemd drop-in for the kres-cache-gc.service unit. The empty ExecStart=
line is needed to clear the packaged command before setting the new one:

$ systemctl edit kres-cache-gc.service

[Service]
ExecStart=
ExecStart=<original_command> -u 60 -d 60000

Then restart kres-cache-gc.service. I'd be interested to know if the
issue persists with the lower GC limit.
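
If it helps, the override text can also be produced by a small shell
helper. This is only a sketch: make_gc_override is a hypothetical name,
and the command passed to it below is a placeholder, not the real one from
the package; copy the actual command from the "systemctl cat" output. It
emits an empty ExecStart= first, since systemd otherwise treats the new
line as a second command for the service.

```shell
#!/bin/sh
# Print a drop-in for kres-cache-gc.service that appends GC options.
# Assumption: $1 is the full original ExecStart command, copied by hand.
make_gc_override() {
    orig_cmd=$1
    usage_pct=$2   # value for -u (cache usage percentage that triggers GC)
    delay_ms=$3    # value for -d (delay between GC runs, in milliseconds)
    printf '[Service]\n'
    # An empty ExecStart= clears the command defined by the packaged unit.
    printf 'ExecStart=\n'
    printf 'ExecStart=%s -u %s -d %s\n' "$orig_cmd" "$usage_pct" "$delay_ms"
}

# Placeholder command for illustration only:
make_gc_override '/path/to/kres-cache-gc <original arguments>' 60 60000
```

The printed text can be pasted into the editor opened by "systemctl edit".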
--
Tomas Krizek
PGP: 4A8B A48C 2AED 933B D495  C509 A1FB A5F7 EF8C 4869