Hello,
as you have already found out, it is complicated ;-)
The Linux kernel has its own magic algorithms for scheduling work on
multi-core/multi-socket/NUMA machines, and DNS benchmarking also depends very
much on the network card, its drivers, etc.
If we were going to fine-tune your setup, we would have to go into the details
(example inspection commands follow right after this list):
What is your CPU architecture? Number of sockets, CPUs in them, etc.?
How is the main memory (RAM) connected to the CPUs?
Is it NUMA?
Do you have irqbalance enabled?
Have you somehow configured IRQ affinity?
What is your network card (how many I/O queues does it have)?
Did you configure network card queues and other driver settings explicitly?
etc.
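
Most of these can be checked with standard tools; for example (here <iface> and
<irq> are placeholders for your actual network interface name and IRQ number):

  lscpu                                  # CPU model, sockets, cores, NUMA nodes
  numactl --hardware                     # NUMA topology and memory per node
  systemctl is-active irqbalance         # is irqbalance running?
  grep <iface> /proc/interrupts          # how the NIC queue IRQs are spread over cores
  cat /proc/irq/<irq>/smp_affinity_list  # CPUs a particular IRQ is allowed to run on
  ethtool -l <iface>                     # number of RX/TX queues on the NIC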
Fine-tuning always has to take your specific environment into account,
and it is hard to provide general advice.
If you find a specific, reproducible problem, please report it to our GitLab:
https://gitlab.labs.nic.cz/knot/knot-dns/issues/
Please understand that the amount of time and hardware we can allocate to
free support is limited. In case you require fine-tuning for your
specific deployment, please consider buying professional support:
https://www.knot-dns.cz/support/
Thank you for understanding.
Petr Špaček @ CZ.NIC
On 03. 04. 19 16:37, Sergey Petrov wrote:
I reversed the client and the server, so the server is now a 36-core Intel
box (72 HT cores).
Starting with small loads, I see Knot using the lower-numbered cores, except core 0.
When adding more load, I see cores 0-17 AND 37-54 being used, but not at the
100% level. At maximum load, I see all cores at about 100% usage.
It seems to me to be a system scheduler feature: first it starts with the
lower-numbered cores, then adds cores from the second CPU socket, and finally
the HT cores.
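
(For reference, the mapping of logical CPU numbers to sockets and HT siblings
can be checked with, e.g.:

  lscpu -e    # the CPU, CORE, SOCKET and NODE columns show which logical CPUs
              # are HT siblings and which socket/NUMA node they sit on

so the pattern above can be matched against the actual topology.)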
On Wed, 2019-04-03 at 12:53 +0300, Sergey Petrov wrote:
> On Wed, 2019-04-03 at 10:52 +0200, Petr Špaček wrote:
>> On 03. 04. 19 10:45, Sergey Petrov wrote:
>>> I am performing benchmarks with knot-dns as an authoritative server and dnsperf
>>> as a workload client. The Knot server has 32 cores. Interrupts from the 10Gb
>>> network card are spread across all 32 cores. Knot is configured with
>>> 64 udp-workers. Each Knot thread is assigned to one core, so there are at
>>> least two Knot threads assigned to each core. Then I start dnsperf with the
>>> command
>>>
>>> ./dnsperf -s 10.0.0.4 -d out -n 20 -c 103 -T 64 -t 500 -S 1 -q 1000 -D
>>>
>>> htop on the Knot server shows 3-4 cores completely unused. When I restart
>>> dnsperf, the unused cores change.
>>>
>>> What is the reason for the unused cores?
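
For the setup described above, a quick sanity check of the configured worker
count and the actual per-thread pinning could look like the following (the
daemon name knotd and the config path /etc/knot/knot.conf are assumptions,
adjust to your installation):

  grep 'udp-workers' /etc/knot/knot.conf    # should show the 64 UDP workers
  for t in /proc/$(pidof knotd)/task/*; do taskset -cp "${t##*/}"; done
  # the second command prints the CPU affinity of every Knot thread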
>>
>> Well, sometimes dnsperf is too slow :-)
>>
>> I recommend checking the following:
>> - Make sure dnsperf ("source machine") is not 100 % utilized.
>> - Try to increase the number of sockets used by dnsperf, i.e. the -c parameter.
>> I would also try values like 500 and 1000 to see if it makes any
>> difference. It might change the results significantly because the Linux kernel
>> uses hashes over some packet fields, and a low number of sockets might
>> result in uneven query distribution.
>>
>> Please let us know what your new results are.
>>
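
One place where such hashing happens is the NIC's receive-side scaling (RSS).
Which header fields are hashed for UDP, and whether source/destination ports
are included, can be inspected and changed with ethtool, e.g. (eth0 is a
placeholder; support for these options depends on the driver):

  ethtool -n eth0 rx-flow-hash udp4        # show fields hashed for IPv4/UDP
  ethtool -N eth0 rx-flow-hash udp4 sdfn   # hash on src/dst IP and src/dst port
  ethtool -x eth0                          # show the RSS indirection table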
>
> The source machine is about 15% utilized.
>
> ./dnsperf -s 10.0.0.4 -d out -n 20 -c 512 -T 512 -t 500 -S 1 -q 1000 -D
>
> gives us a performance penalty (260000 rps vs 310000 rps) and a more even
> distribution across all cores, with 100% usage of all eight cores on the
> last CPU socket, while the other CPU socket cores are approximately 60%
> loaded.
>
> Using "-c 1000 -T 1000" parameters of dnsperf i see practicaly the same
> core load distribution and even more performance penalty.
>
> Using "-c 16 -T 16" parameters i see 14 0% utilized cores, 16 100%
> utilized cores and 2 50% utilized cores with about 300000 rps
>
> The question is: what prevents a Knot thread on a 0%-utilized core from
> serving a packet that arrived via an IRQ bound to another core? Maybe you
> have some developer guide that can answer this question?
>
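
Regarding the last question: if knotd binds one UDP socket per worker with
SO_REUSEPORT (which, as far as I know, recent Knot versions do by default),
the kernel by default picks the receiving socket by hashing the packet's
source/destination addresses and ports, independently of which core handled
the NIC interrupt, so a worker whose socket never wins that hash stays idle.
Whether there really is one socket per worker can be seen with, e.g.:

  ss -ulpn 'sport = :53'    # lists one bound UDP socket per worker when
                            # SO_REUSEPORT is in use (run as root for -p)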