Hi Thomas,
It looks like /var/lib/knot/keys/keys/24991855db1c06b29b9e78ef8d6fddf7c5f07cd5.pem
doesn't exist.
What does keymgr say about this key?
Daniel
On 12/9/20 6:35 PM, Thomas E. wrote:
Hi Libor,
you are right, I now started strace with the "-f" option. Now I have the
full trace. Sorry, working with strace has been a long time ago...
I think this is interesting:
This key is written to the backup:
[pid 9783] openat(AT_FDCWD,
"/var/lib/knot/keys/keys/087cc573318e070befff1d9cbcf07e3b5cf5444d.pem",
O_RDONLY) = 23
[pid 9783] lseek(23, 0, SEEK_END) = 1704
[pid 9783] lseek(23, 0, SEEK_SET) = 0
[pid 9783] read(23, "-----BEGIN PRIVATE KEY-----\nMIIE"..., 1704) = 1704
[pid 9783] close(23) = 0
[pid 9783] openat(AT_FDCWD,
"/tmp/backup/keys/keys/087cc573318e070befff1d9cbcf07e3b5cf5444d.pem",
O_WRONLY|O_CREAT|O_EXCL, 0640) = 23
[pid 9783] write(23, "-----BEGIN PRIVATE KEY-----\nMIIE"..., 1704) = 1704
[pid 9783] close(23) = 0
This key has not been written:
[pid 9783] openat(AT_FDCWD,
"/var/lib/knot/keys/keys/24991855db1c06b29b9e78ef8d6fddf7c5f07cd5.pem",
O_RDONLY) = -1 ENOENT (No such file or directory)
[pid 9783] munmap(0x7f53c042a000, 1142784) = 0
[pid 9783] close(28) = 0
[pid 9783] close(27) = 0
[pid 9783] munmap(0x7f5909418000, 8192) = 0
[pid 9783] close(26) = 0
[pid 9783] close(24) = 0
[pid 9783] unlink("/tmp/backup/knot.backup.lockfile") =
Thanks a lot,
Thomas
On 09.12.20 17:49, libor.peltan wrote:
Thomas, don't you call strace in a way which
only tracks one thread of
knotd process? Knot is a multithreaded application and the backup is
done by different threads than the one handling knotc requests. It's
still visible that the straces you provided so far don't cover any key
file copying etc...
I'm not very surprised that on another machine the backup works. This is
a nasty issue and therefore we need your help with debugging it on the
exact environment it fails on.
Thanks!
Dne 09. 12. 20 v 17:17 Thomas E. napsal(a):
> Hi Libor,
>
>
> I performed the same actions on a test machine with 2 test zones. Same
> Knot version, same OS.
>
> root@signer-test:/var/lib/knot/keys/keys# strace -p 3424
>
>
> strace: Process 3424 attached
> restart_syscall(<... resuming interrupted poll ...>) = 1
> accept4(13, NULL, NULL, SOCK_NONBLOCK) = 14
> recvmsg(14, {msg_namelen=0}, MSG_DONTWAIT|MSG_NOSIGNAL) = -1 EAGAIN
> (Resource temporarily unavailable)
> poll([{fd=14, events=POLLIN}], 1, 5000) = 1 ([{fd=14, revents=POLLIN}])
> recvmsg(14, {msg_name=NULL, msg_namelen=0,
> msg_iov=[{iov_base="\1\20\0\vzone-backup\21\0\0\32\0\v/tmp/backup"...,
> iov_len=262144}], msg_iovlen=1, msg_controllen=0, msg_flags=0},
> MSG_DONTWAIT|MSG_NOSIGNAL) = 37
> write(18, "2020-12-09T16:13:06+0000 info: c"..., 71) = 71
> stat("/var/lib/knot/catalog/data.mdb", 0x7ffd37b72bb0) = -1 ENOENT (No
> such file or directory)
> stat("/var/lib/knot/journal/data.mdb", {st_mode=S_IFREG|0660,
> st_size=94208, ...}) = 0
> stat("/var/lib/knot/keys/data.mdb", {st_mode=S_IFREG|0660,
> st_size=45056, ...}) = 0
> mkdir("/tmp/backup", 0770) = 0
> openat(AT_FDCWD, "/tmp/backup/knot.backup.lockfile",
> O_RDONLY|O_CREAT|O_EXCL, 0600) = 21
> mkdir("/tmp/backup", 0750) = -1 EEXIST (File exists)
> stat("/var/lib/knot/catalog/data.mdb", 0x7ffd37b72ac0) = -1 ENOENT (No
> such file or directory)
> futex(0x7ffd37b73400, FUTEX_WAKE_PRIVATE, 1) = 1
> futex(0x7ffd37b733b0, FUTEX_WAKE_PRIVATE, 1) = 1
> futex(0x7ffd37b73404, FUTEX_WAKE_PRIVATE, 1) = 1
> futex(0x7ffd37b733b0, FUTEX_WAKE_PRIVATE, 1) = 1
> sendmsg(14, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="\3",
> iov_len=1}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL)
> = 1
> recvmsg(14, {msg_namelen=0}, MSG_DONTWAIT|MSG_NOSIGNAL) = -1 EAGAIN
> (Resource temporarily unavailable)
> poll([{fd=14, events=POLLIN}], 1, 5000) = 1 ([{fd=14, revents=POLLIN}])
> recvmsg(14, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="\0",
> iov_len=262144}], msg_iovlen=1, msg_controllen=0, msg_flags=0},
> MSG_DONTWAIT|MSG_NOSIGNAL) = 1
>
> close(14) = 0
> poll([{fd=13, events=POLLIN}], 1, -1
>
>
> 2020-12-09T16:13:06+0000 info: control, received command 'zone-backup'
> 2020-12-09T16:13:06+0000 info: [xx.de.] zone backed up to '/tmp/backup'
> 2020-12-09T16:13:06+0000 info: [yy.de.] zone backed up to '/tmp/backup'
>
> All 4 keys have been successfully backed up.
>
>
>
> Thanks,
> Thomas
>
>
> On 09.12.20 16:05, libor.peltan wrote:
>> Some explanation: zone backup or restore is performed as a zone
"event".
>> When zone events are invoked via the knotc interface, it typically works
>> in the way that the event is planned to be performed later and "OK" is
>> returned just as an info that the command has been registered. The event
>> (zone backup in this case) happens per-zone and with some delay, when
>> some background-worker becomes available, and the outcome appears in the
>> log. On the contrary, if knotc is invoked with `-b` parameter, it waits
>> for the event to be performed synchronously, and collects the outcome of
>> the whole process (some error code handligs of this are currently
>> subject to ongoing improvements by my colleagues).
>>
>> So, from the strace you sent me, I only see that some initialization has
>> been done and that the early "OK" has been returned, but there are no
>> traits of the backup process as such :(
>>
>> Could you please collect somewhat longer strace, by watching when the
>> backup finishes in the logfile?
>>
>> Thanks!
>>
>> Libor
>>
>> Dne 09. 12. 20 v 15:56 Thomas E. napsal(a):
>>> Hi Libor,
>>>
>>> yes, the backup command was executed and confirmed with "OK":
>>>
>>> root@signer-0:/# knotc zone-backup +backupdir /tmp/backup
>>> OK
>>>
>>> I did it again and the output is the same.
>>>
>>> Thanks,
>>> Thomas
>>>
>>> On 09.12.20 15:42, libor.peltan wrote:
>>>> Hi Thomas,
>>>>
>>>> are you sure this strace has been collected while the issue was
>>>> reproduced and half of the keys copied again?
>>>>
>>>> Maybe you just have deattached strace too early(?).
>>>>
>>>> Libor
>>>>
>>>> Dne 09. 12. 20 v 15:00 Thomas E. napsal(a):
>>>>> Hi Libor,
>>>>>
>>>>> sure, here is the output:
>>>>>
>>>>>
>>>>> root@signer-0:/tmp/backup/keys/keys# strace -p 9781
>>>>> strace: Process 9781 attached
>>>>> restart_syscall(<... resuming interrupted restart_syscall ...>)
= 1
>>>>> accept4(16, NULL, NULL, SOCK_NONBLOCK) = 23
>>>>> recvmsg(23, {msg_namelen=0}, MSG_DONTWAIT|MSG_NOSIGNAL) = -1 EAGAIN
>>>>> (Resource temporarily unavailable)
>>>>> poll([{fd=23, events=POLLIN}], 1, 5000) = 1 ([{fd=23,
>>>>> revents=POLLIN}])
>>>>> recvmsg(23, {msg_name=NULL, msg_namelen=0,
>>>>>
msg_iov=[{iov_base="\1\20\0\vzone-backup\21\0\0\32\0\v/tmp/backup"...,
>>>>>
>>>>> iov_len=262144}], msg_iovlen=1, msg_controllen=0, msg_flags=0},
>>>>> MSG_DONTWAIT|MSG_NOSIGNAL) = 37
>>>>> write(6, "2020-12-09T13:57:12+0000 info: c"..., 71) = 71
>>>>> stat("/var/lib/knot/catalog/data.mdb", 0x7ffd0fb9f090) = -1
ENOENT
>>>>> (No
>>>>> such file or directory)
>>>>> stat("/var/lib/knot/journal/data.mdb",
{st_mode=S_IFREG|0660,
>>>>> st_size=9936896, ...}) = 0
>>>>> stat("/var/lib/knot/keys/data.mdb", {st_mode=S_IFREG|0660,
>>>>> st_size=45056, ...}) = 0
>>>>> mkdir("/tmp/backup", 0770) = 0
>>>>> openat(AT_FDCWD, "/tmp/backup/knot.backup.lockfile",
>>>>> O_RDONLY|O_CREAT|O_EXCL, 0600) = 24
>>>>> mkdir("/tmp/backup", 0750) = -1 EEXIST (File
exists)
>>>>> stat("/var/lib/knot/catalog/data.mdb", 0x7ffd0fb9efa0) = -1
ENOENT
>>>>> (No
>>>>> such file or directory)
>>>>> futex(0x7ffd0fb9f8e0, FUTEX_WAKE_PRIVATE, 1) = 1
>>>>> futex(0x7ffd0fb9f890, FUTEX_WAKE_PRIVATE, 1) = 1
>>>>> futex(0x7ffd0fb9f8e4, FUTEX_WAKE_PRIVATE, 1) = 1
>>>>> futex(0x7ffd0fb9f890, FUTEX_WAKE_PRIVATE, 1) = 1
>>>>> sendmsg(23, {msg_name=NULL, msg_namelen=0,
msg_iov=[{iov_base="\3",
>>>>> iov_len=1}], msg_iovlen=1, msg_controllen=0, msg_flags=0},
>>>>> MSG_NOSIGNAL) = 1
>>>>> recvmsg(23, {msg_namelen=0}, MSG_DONTWAIT|MSG_NOSIGNAL) = -1 EAGAIN
>>>>> (Resource temporarily unavailable)
>>>>> poll([{fd=23, events=POLLIN}], 1, 5000) = 1 ([{fd=23,
>>>>> revents=POLLIN}])
>>>>> recvmsg(23, {msg_name=NULL, msg_namelen=0,
msg_iov=[{iov_base="\0",
>>>>> iov_len=262144}], msg_iovlen=1, msg_controllen=0, msg_flags=0},
>>>>> MSG_DONTWAIT|MSG_NOSIGNAL) = 1
>>>>> close(23) = 0
>>>>> poll([{fd=16, events=POLLIN}], 1, -1
>>>>>
>>>>>
>>>>> Thomas
>>>>>
>>>>>
>>>>>
>>>>> On 09.12.20 13:34, libor.peltan wrote:
>>>>>> Hi Thomas,
>>>>>>
>>>>>> would you try attaching `strace` to the running Knot and try the
>>>>>> backup
>>>>>> again, in order to see if some syscall(s) return error codes?
>>>>>>
>>>>>> Thanks much!
>>>>>>
>>>>>> Libor
>>>>>>
>>>>>> Dne 09. 12. 20 v 12:25 Thomas E. napsal(a):
>>>>>>> Hi Libor,
>>>>>>>
>>>>>>>
>>>>>>> # knotc -V
>>>>>>> knotc (Knot DNS), version 3.0.2
>>>>>>>
>>>>>>>
>>>>>>> Thomas
>>>>>>>
>>>>>>> On 09.12.20 12:16, libor.peltan wrote:
>>>>>>>> Hi Thomas,
>>>>>>>>
>>>>>>>> aren't you using Knot version 3.0.0 (thus lower than
3.0.1) ?
>>>>>>>>
>>>>>>>> Libor
>>>>>>>>
>>>>>>>> Dne 09. 12. 20 v 11:39 Thomas E. napsal(a):
>>>>>>>>> Hi Libor,
>>>>>>>>>
>>>>>>>>> I deleted the backup directory a few times, but I
receive the
>>>>>>>>> same
>>>>>>>>> error
>>>>>>>>> every time:
>>>>>>>>>
>>>>>>>>> 2020-12-09T10:31:52+0000 info: control, received
command
>>>>>>>>> 'zone-backup'
>>>>>>>>> 2020-12-09T10:31:52+0000 warning: [xxx.] zone backup
failed (not
>>>>>>>>> exists)
>>>>>>>>> 2020-12-09T10:31:52+0000 error: [xxx.] zone event
>>>>>>>>> 'backup/restore'
>>>>>>>>> failed (not exists)
>>>>>>>>> 2020-12-09T10:31:52+0000 warning: [yyy.] zone backup
failed (not
>>>>>>>>> exists)
>>>>>>>>> 2020-12-09T10:31:52+0000 error: [yyy.] zone event
>>>>>>>>> 'backup/restore'
>>>>>>>>> failed (not exists)
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> One of the keys is the ZSK of zone xxx and the oterh
the KSKS of
>>>>>>>>> zone
>>>>>>>>> yyy.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> root@signer-0:/# keymgr xxx list human | grep
>>>>>>>>> 1fb3900b2e5ac72d30f927016ea4546ca561a5da
>>>>>>>>>
>>>>>>>>> 1fb3900b2e5ac72d30f927016ea4546ca561a5da ksk=no
zsk=yes
>>>>>>>>> tag=39969
>>>>>>>>> algorithm=7 size=1024 public-only=no pre-active=0
>>>>>>>>> publish=-1M3D17h49m4s ready=0 active=-1M3D17h49m4s
>>>>>>>>> retire-active=0
>>>>>>>>> retire=0 post-active=0 revoke=0 remove=0
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> root@signer-0:/# keymgr yyy list human | grep
>>>>>>>>> 087cc573318e070befff1d9cbcf07e3b5cf5444d
>>>>>>>>>
>>>>>>>>> 087cc573318e070befff1d9cbcf07e3b5cf5444d ksk=yes
zsk=no
>>>>>>>>> tag=37419
>>>>>>>>> algorithm=7 size=2048 public-only=no pre-active=0
>>>>>>>>> publish=-1M3D18h11m7s ready=-1M3D18h11m7s
active=-26D12h49m4s
>>>>>>>>> retire-active=0 retire=0 post-active=0 revoke=0
remove=0
>>>>>>>>>
>>>>>>>>> Best regards,
>>>>>>>>> Thomas
>>>>>>>>>
>>>>>>>>> On 09.12.20 11:15, libor.peltan wrote:
>>>>>>>>>> Hi Thomas,
>>>>>>>>>>
>>>>>>>>>> could you please try if this issue is
reproducible: if whenever
>>>>>>>>>> you
>>>>>>>>>> attempt the backup (to a fresh empty target
directory), it fails
>>>>>>>>>> with
>>>>>>>>>> "not exists"?
>>>>>>>>>>
>>>>>>>>>> Could you please check if the keys that happen to
make it to the
>>>>>>>>>> backup
>>>>>>>>>> belong to the same zone, or that it's one
from each zone? (You
>>>>>>>>>> light use
>>>>>>>>>> `keymgr list` to check which key ID belongs to
which zone.)
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>>
>>>>>>>>>> Libor
>>>>>>>>>>
>>>>>>>>>> Dne 08. 12. 20 v 21:36 Thomas napsal(a):
>>>>>>>>>>> Hi Libor,
>>>>>>>>>>>
>>>>>>>>>>> sorry, I was really too unspecific.
>>>>>>>>>>>
>>>>>>>>>>> I'm hosting 2 zones. These 4 keys are on
the production
>>>>>>>>>>> machine:
>>>>>>>>>>>
>>>>>>>>>>> root@signer-0:/var/lib/knot/keys/keys# ls
-alh
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> -rw-r----- 1 knot knot 1,7K Nov 5 16:22
>>>>>>>>>>> 087cc573318e070befff1d9cbcf07e3b5cf5444d.pem
>>>>>>>>>>> -rw-r----- 1 knot knot 916 Nov 5 16:44
>>>>>>>>>>> 1fb3900b2e5ac72d30f927016ea4546ca561a5da.pem
>>>>>>>>>>> -rw-r----- 1 knot knot 916 Nov 5 16:22
>>>>>>>>>>> 6ebb8eb3ec2ddaf150119b4bc11b47dcec91621a.pem
>>>>>>>>>>> -rw-r----- 1 knot knot 1,7K Nov 5 16:44
>>>>>>>>>>> d7e47e2909f4d5947d8fb8684cb79ed06feb4b0a.pem
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Performing a backup with the following
command:
>>>>>>>>>>>
>>>>>>>>>>> # knotc zone-backup +backupdir /tmp/backup
>>>>>>>>>>>
>>>>>>>>>>> Backup directory after performing the backup
shows:
>>>>>>>>>>>
>>>>>>>>>>> root@signer-0:/tmp/backup/keys/keys# ls -ahl
>>>>>>>>>>>
>>>>>>>>>>> -rw-r----- 1 knot knot 1,7K Dez 8 20:21
>>>>>>>>>>> 087cc573318e070befff1d9cbcf07e3b5cf5444d.pem
>>>>>>>>>>> -rw-r----- 1 knot knot 916 Dez 8 20:21
>>>>>>>>>>> 1fb3900b2e5ac72d30f927016ea4546ca561a5da.pem
>>>>>>>>>>>
>>>>>>>>>>> 2 keys are missing.
>>>>>>>>>>>
>>>>>>>>>>> Hhmm ok, there is an error in the log:
>>>>>>>>>>>
>>>>>>>>>>> 2020-12-08T20:26:43+0000 info: control,
received command
>>>>>>>>>>> 'zone-backup'
>>>>>>>>>>> 2020-12-08T20:26:43+0000 warning: [xxx.] zone
backup failed
>>>>>>>>>>> (not
>>>>>>>>>>> exists)
>>>>>>>>>>> 2020-12-08T20:26:43+0000 error: [xxx.] zone
event
>>>>>>>>>>> 'backup/restore'
>>>>>>>>>>> failed (not exists)
>>>>>>>>>>> 2020-12-08T20:26:43+0000 warning: [yyy.] zone
backup failed
>>>>>>>>>>> (not
>>>>>>>>>>> exists)
>>>>>>>>>>> 2020-12-08T20:26:43+0000 error: [yyy.] zone
event
>>>>>>>>>>> 'backup/restore'
>>>>>>>>>>> failed (not exists)
>>>>>>>>>>>
>>>>>>>>>>> I'm using the latest knot version.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Best regards,
>>>>>>>>>>>
>>>>>>>>>>> Thomas
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Am 08.12.20 um 16:56 schrieb libor.peltan:
>>>>>>>>>>>> Hi Thomas,
>>>>>>>>>>>>
>>>>>>>>>>>> could you be more specific about
"half of private keys were
>>>>>>>>>>>> in the
>>>>>>>>>>>> backup" ? How many were, how many
weren't, and was there some
>>>>>>>>>>>> obvious
>>>>>>>>>>>> difference between them?
>>>>>>>>>>>>
>>>>>>>>>>>> Could you share the log snippets covering
the backup and the
>>>>>>>>>>>> restore
>>>>>>>>>>>> procedures?
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>
>>>>>>>>>>>> Libor
>>>>>>>>>>>>
>>>>>>>>>>>> Dne 08. 12. 20 v 16:48 Thomas E.
napsal(a):
>>>>>>>>>>>>> Hi (again),
>>>>>>>>>>>>>
>>>>>>>>>>>>> I was trying to backup and restore a
server with the new
>>>>>>>>>>>>> knotc
>>>>>>>>>>>>> zone-backup/restore command.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I recognized that only half of the
private keys were in the
>>>>>>>>>>>>> backup,
>>>>>>>>>>>>> which leads to an error:
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2020-12-08T14:44:00+0100 error:
[xxx.] DNSSEC, failed to load
>>>>>>>>>>>>> private
>>>>>>>>>>>>> keys (not exists)
>>>>>>>>>>>>> 2020-12-08T14:44:00+0100 error:
[xxx.] DNSSEC, failed to load
>>>>>>>>>>>>> keys
>>>>>>>>>>>>> (not
>>>>>>>>>>>>> exists)
>>>>>>>>>>>>>
>>>>>>>>>>>>> Shouldn't the backup contain all
private keys?
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Thomas