Hi,
First, please try to use more descriptive e-mail subjects. It helps
others find solutions to the same or similar issues in the future.

On 12/12/2019 14.29, Milan Jeskynka Kazatel wrote:
> I'm still facing the service kresd@1 crashes without any obvious reasons.
> Today I did a second try to upgrade Knot Resolver to version 4.2.2 and the
> upgrade seems to be ok, service can start without any difficulties.

The latest released version is 4.3.0. Before any further debugging,
please ensure you're using the latest version. EPEL repositories lag
behind the upstream releases, but there's usually an update waiting
shortly after our upstream release. You can install it using:
$ yum update knot-resolver --enablerepo epel-testing
Alternatively, you can use our upstream package repositories to get the
updates right as they're released:
https://www.knot-resolver.cz/download/

> It runs as expected for more than 3.5 hours, but unfortunately, it then
> starts to write the same messages to the log as were reported in my
> previous post, and the service gets restarted by itself.

The auto-restart is a systemd feature we're using to recover from
crashes/failures. It's preferable to a dead service.
However, it'd be interesting to find out the cause of these crashes.
Could you explore the errors in the journal and post the output?
$ journalctl -u kresd@1 -p notice --since -2w

> Every restart causes a new service PID in /var/cache/knot-resolver/tty;
> the old one was not correctly finished

This is an unfortunate state of things in CentOS 7 right now. We have a
solution for it in an upcoming 5.0 release. Each instance will have
exactly one deterministic control socket.
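
Until then, stale entries can be spotted by hand. Here is a minimal sketch, assuming each entry under /var/cache/knot-resolver/tty is named after the PID of the kresd process that created it (as your report suggests). The name list_stale_sockets is just a hypothetical helper, not something shipped with Knot Resolver:

```shell
# List control sockets whose owning process no longer exists.
# Assumption: each entry in the tty directory is named after a kresd PID.
list_stale_sockets() {
    dir="${1:-/var/cache/knot-resolver/tty}"
    for sock in "$dir"/*; do
        [ -e "$sock" ] || continue        # skip the unexpanded glob in an empty dir
        pid="${sock##*/}"                 # file name without the directory part
        case "$pid" in
            ''|*[!0-9]*) continue ;;      # ignore names that are not plain numbers
        esac
        # /proc/<pid> exists only while some process with that PID is alive
        [ -d "/proc/$pid" ] || printf '%s\n' "$sock"
    done
}
```

Calling list_stale_sockets without an argument checks the default path. Keep in mind that PIDs get reused, so an entry that is not reported may still belong to a long-gone kresd whose PID was later taken by an unrelated process.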

> and the whole operating system visibly slows down.

I don't see how a knot-resolver crash under systemd would cause any
slowdown. Do you have any evidence of that? Are there any hanging kresd
processes in ps that weren't correctly terminated? What system resources
are they using?
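
One possible way to collect that information is sketched below. It reads /proc directly, so it needs nothing beyond a POSIX shell and awk. The function name procs_by_name is hypothetical; the fields come from /proc/<pid>/status on Linux:

```shell
# Print PID, state, resident memory and thread count of processes named "$1".
# A state of D (uninterruptible sleep) or Z (zombie) hints at a stuck process.
procs_by_name() {
    name="${1:-kresd}"
    for d in /proc/[0-9]*; do
        [ -r "$d/comm" ] || continue
        [ "$(cat "$d/comm" 2>/dev/null)" = "$name" ] || continue
        awk -v pid="${d#/proc/}" '
            BEGIN       { rss = "n/a" }          # zombies have no VmRSS line
            /^State:/   { state = $2 }
            /^VmRSS:/   { rss = $2 " " $3 }
            /^Threads:/ { threads = $2 }
            END { if (state != "") printf "%s state=%s rss=%s threads=%s\n", pid, state, rss, threads }
        ' "$d/status" 2>/dev/null
    done
}
```

Running procs_by_name kresd shortly after a restart should show whether more kresd processes exist than you have instances configured.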

> I don't know how to produce an exact service crash dump file, but I can
> provide any log messages if needed.

If the crashes keep happening after upgrade to 4.3.0 and the journal
messages don't help with debugging, this is how I managed to turn on
coredump collection on CentOS 7:
1. install debug symbols
$ debuginfo-install knot knot-resolver luajit
2. create /etc/sysctl.d/50-core.conf with the following content:
kernel.core_pattern=|/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h
3. modify/uncomment the following parameters in /etc/systemd/system.conf
DumpCore=yes
DefaultLimitCORE=infinity
4. reboot
Please refer to man systemd-coredump for more details.
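
After the reboot, it may be worth checking that the settings above actually took effect. This is only a sketch: show_core_setup is a hypothetical helper that prints the active core pattern and the core size limits of one running process, both read from /proc:

```shell
# Show the active core pattern and the core-size limits of one process.
show_core_setup() {
    pid="$1"
    printf 'core_pattern: %s\n' "$(cat /proc/sys/kernel/core_pattern)"
    # Line format: "Max core file size  <soft>  <hard>  bytes"
    awk '/^Max core file size/ { print "core limit (soft hard): " $5, $6 }' \
        "/proc/$pid/limits"
}
```

For example, show_core_setup "$(systemctl show -p MainPID kresd@1 | cut -d= -f2)" should report the systemd-coredump pipe as the pattern and "unlimited" for both limits if steps 2 and 3 were applied.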
The next time kresd crashes, its PID should be listed in
$ coredumpctl list
which can be used to display some information about the coredump:
$ coredumpctl info $PID
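
If you have gdb installed, a full backtrace can also be extracted without an interactive session. The following is a rough sketch, assuming the coredumpctl tool from systemd is available and that the kresd binary is at /usr/sbin/kresd (adjust the path if your package installs it elsewhere). kresd_backtrace is a hypothetical helper name:

```shell
# Extract a core by PID and print backtraces of all threads non-interactively.
# Assumption: the kresd binary lives at /usr/sbin/kresd unless given as "$2".
kresd_backtrace() {
    pid="$1"
    binary="${2:-/usr/sbin/kresd}"
    core="$(mktemp)" || return 1
    # Write the stored core to a temporary file, then let gdb read it in batch mode
    coredumpctl dump "$pid" -o "$core" \
        && gdb -batch -ex 'thread apply all bt' "$binary" "$core"
    status=$?
    rm -f "$core"
    return "$status"
}
```

The output of kresd_backtrace $PID can be attached to your reply as plain text. With the debug symbols from step 1 installed, the frames should include function names and line numbers.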
Even just the stack trace could help us track down the root of the issue. If you
believe you've found a security issue, please report it via a
*confidential* issue at
https://gitlab.labs.nic.cz/knot/knot-resolver/issues or to
knot-resolver(a)labs.nic.cz (non-public list).
Thanks!
--
Tomas Krizek
PGP: 4A8B A48C 2AED 933B D495 C509 A1FB A5F7 EF8C 4869