[Runtime] notify only one sleeping worker on new work

Florian Fischer requested to merge notify_one_sleeping_worker into master

This prevents a thundering herd effect on the AnywhereQueue and the workerSleep mutex.

On workloads where not all workers are busy, it greatly reduces the CPU time used, because workers no longer all wake up just to go back to sleep.

For high-intensity workloads, when all workers are busy handling their own work, this change should not have any impact, because sleeping workers are rare then.
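The effect described above can be reproduced in isolation. The following is a minimal sketch, assuming a plain std::condition_variable in place of emper's actual workerSleep logic; countWakeups and all variable names are hypothetical and not taken from the emper code base. It puts a number of workers to sleep, publishes exactly one work item, and counts how many workers wake up for it.

```cpp
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical illustration, not emper's code: count how many sleeping
// workers wake up when a single work item is published.
std::size_t countWakeups(std::size_t workers, bool notifyAll) {
  std::mutex mutex;
  std::condition_variable workAvailable;  // idle workers sleep here
  std::condition_variable progress;       // the producer waits here
  std::size_t sleeping = 0, wakeups = 0, pending = 0, consumed = 0;
  bool shutdown = false;

  auto worker = [&] {
    std::unique_lock<std::mutex> lock(mutex);
    while (!shutdown) {
      if (pending > 0) {  // take the work item
        --pending;
        ++consumed;
        progress.notify_one();
        continue;
      }
      ++sleeping;  // nothing to do: go to sleep
      progress.notify_one();
      workAvailable.wait(lock);
      --sleeping;
      if (!shutdown) ++wakeups;  // woken for new work (or spuriously)
      progress.notify_one();
    }
  };

  std::vector<std::thread> threads;
  for (std::size_t i = 0; i < workers; ++i) threads.emplace_back(worker);

  std::size_t result = 0;
  {
    std::unique_lock<std::mutex> lock(mutex);
    // Wait until every worker is asleep, then publish exactly one item.
    progress.wait(lock, [&] { return sleeping == workers; });
    pending = 1;
    if (notifyAll) {
      workAvailable.notify_all();  // every sleeper wakes, one finds work
    } else {
      workAvailable.notify_one();  // a single sleeper wakes and finds work
    }
    const std::size_t expected = notifyAll ? workers : 1;
    progress.wait(lock, [&] { return consumed == 1 && wakeups >= expected; });
    result = wakeups;
    shutdown = true;
  }
  workAvailable.notify_all();  // release all workers so they can exit
  for (auto& t : threads) t.join();
  return result;
}
```

With notify_all, every sleeping worker wakes up and contends for the mutex although only one of them finds work; with notify_one, typically a single worker wakes up. Spurious wakeups are permitted by the standard, so the counts are lower bounds rather than exact values.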

This claim is backed by the experiments I ran on faui49big02 (40 cores / 80 hw threads). I measured the time and resources used by our tests/EchoServer while handling an increasing number of connections (one connection per client process, each connection issuing 100000 echoes).

           con    100k echos time[ns]    user-time[s]    sys-time[s]     cpu-used
notify_all   1            49008685297          217.86         626.26        1650%
notify_one   1            31304750273            9.40           8.33          53%

notify_all  10            76487793595          665.45        1295.19        2484%
notify_one  10            35674140605          188.77          68.26         656%

...

notify_all  40           102469333659         4255.30         363.86        4399%
notify_one  40           105289161995         4167.43         322.69        4169%

notify_all  80            76883202092         3418.44         409.64        4762%
notify_one  80            68856748614         2878.56         397.66        4548%

I would not fully trust the numbers, because they are from only one run and quite a bit of randomness is inherent to emper because of its work-stealing scheduling. Nonetheless, they show three interesting points:

  1. CPU usage for low-intensity workloads is drastically reduced.
  2. The impact of notify_one gets smaller the more intense the workload gets.
  3. Somehow emper performs significantly worse for 40 than for 80 connections.
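To put the first two points into numbers, the CPU time (user-time plus sys-time) of both variants can be compared row by row. The sketch below only redoes arithmetic on the values from the table above; the struct and function names are chosen for this example and are not part of emper.

```cpp
#include <cstdio>

// One row of the table above; field names are chosen for this sketch.
struct Run {
  int connections;
  double userSeconds;
  double sysSeconds;
};

// Total CPU time consumed by a run.
double cpuSeconds(const Run& r) { return r.userSeconds + r.sysSeconds; }

// How many times more CPU time notify_all used than notify_one.
double cpuTimeRatio(const Run& notifyAll, const Run& notifyOne) {
  return cpuSeconds(notifyAll) / cpuSeconds(notifyOne);
}

// Values copied from the table above (only the rows shown there).
const Run kNotifyAll[] = {
    {1, 217.86, 626.26}, {10, 665.45, 1295.19},
    {40, 4255.30, 363.86}, {80, 3418.44, 409.64}};
const Run kNotifyOne[] = {
    {1, 9.40, 8.33}, {10, 188.77, 68.26},
    {40, 4167.43, 322.69}, {80, 2878.56, 397.66}};

// Print the CPU time ratio for every connection count in the table.
void printCpuTimeRatios() {
  for (int i = 0; i < 4; ++i) {
    std::printf("%2d connections: notify_all used %.2fx the CPU time\n",
                kNotifyAll[i].connections,
                cpuTimeRatio(kNotifyAll[i], kNotifyOne[i]));
  }
}
```

For a single connection notify_all used roughly 48 times the CPU time of notify_one, for 10 connections roughly 8 times, and for 40 and 80 connections the ratio is close to 1. As with the raw numbers, these ratios come from a single run each.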

Command used to generate results:

for i in 1 10 20 30 40 80; do /usr/bin/time -v build-release/tests/EchoServer 2>> notify_all.txt & sleep 2; tests/echo_client.py -p ${i} -c 1 -i 100000 >> notify_all.txt && echo "quit" | nc localhost 12345; done

Full results can be found here:
notify_all: https://termbin.com/6zba
notify_one: https://termbin.com/3bsi

Edited by Florian Fischer
