On 4 Apr 2025, at 15:50, oui.mages_0w--- via knot-resolver-users <knot-resolver-users_at_lists_nic_cz_48qbhjm2vj0347_06aede6e@icloud.com> wrote:

Vladimir,

Huge progress!

During the night, our VM still on v6.0.8 crashed. Thankfully, exabgp did its job and the anycast moved directly to another VM. This time, the second VM hit the memory leak and rebooted (and the third took over while the second restarted).

You can see the RAM usage increasing quite fast on VM1, then on VM2; generally it is cleared in time to prevent a crash, except around 6 am.

[Attachment: Capture d'écran 2025-04-04 à 15.25.18.png]
[Attachment: Capture d'écran 2025-04-04 à 15.25.50.png]

Now, the good news is that we have identified the source :)

We don't know what the queries were, because it was DoH, but one of our customers was using this: https://github.com/0xERR0R/blocky
As soon as he stopped using it, there were no more memory leaks or sawtooth graphs.

Sorry for the quick and dirty graph superposition below, but it shows the correlation:
– the blue line is the answer rate to this particular customer, in pps,
– the purple line is the memory usage of the VM running Knot Resolver,
– before 10 am this VM was offline, so anything earlier is irrelevant.

[Attachment: Capture d'écran 2025-04-04 à 15.36.30.png]

The customer stopped his blocky around 10:40 (I believe he might have restarted it briefly between 10:50 and 11:35).

So in our case, blocky was the culprit behind the Knot Resolver memory leaks. Other Knot Resolver users experiencing memory leaks should check whether any requests are coming from a blocky instance.
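A rough way to do that check, assuming the DoH listener runs on the usual port 443 (adjust the port to your setup), is to see which client IPs hold connections to it and then ask the owners of the top talkers whether they run blocky or a similar forwarding proxy. Something like:

    # established connections to the DoH listener, grouped by client IP
    # (port 443 and the output layout are assumptions; check against your ss version)
    ss -tn state established '( sport = :443 )' \
      | awk 'NR>1 {print $NF}' | sed 's/:[0-9]*$//' \
      | sort | uniq -c | sort -rn | head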
My next step will be to upgrade this VM and confirm that there are no memory leaks anymore, even with a version > 6.0.8 and libknot15.

All the Best,
Gabriel

On 3 Apr 2025, at 11:43, Vladimír Čunát via knot-resolver-users <knot-resolver-users_at_lists_nic_cz_48qbhjm2vj0347_06aede6e@icloud.com> wrote:

On 02/04/2025 23.19, oui.mages_0w@icloud.com wrote:
> So knot-resolver 6.0.8 with libknot15 seems to also trigger the memory leak I was experiencing with knot-resolver 6.0.9+ by the unidentified traffic pattern (or whatever is causing this).

Thanks, this is very interesting. I confirm that (for our Ubuntu 24.04 packages) libknot15 (i.e. Knot 3.4) is used exactly since 6.0.9, so the timing checks out, too. That's just a matter of binary builds; even the latest versions can still be built with libknot14 (3.3.x).
Have you looked into which libdnssec and libzscanner you have there? The thing is that these two didn't change soname between knot 3.3 and 3.4, so here I see larger risks than with libknot itself.
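For example, on a packaged Debian/Ubuntu install (paths and package names may differ elsewhere), something like this shows what the kresd binary actually links against and which library packages are installed:

    # sonames the kresd binary is linked against (adjust the path if kresd lives elsewhere)
    ldd /usr/sbin/kresd | grep -E 'libknot|libdnssec|libzscanner'

    # installed library package versions (Debian/Ubuntu)
    dpkg -l | grep -E 'libknot[0-9]+|libdnssec[0-9]+|libzscanner[0-9]+'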