This prevents a thundering herd effect on the AnywhereQueue and the workerSleep mutex.
On workloads where not all workers are busy, it greatly reduces the CPU time used because not all workers wake up just to go back to sleep.
For high-intensity workloads, where all workers are busy handling their own work, this change should not have any impact because sleeping workers are rare.
This claim is backed by the experiments I did on faui49big02 (40 cores / 80 hw threads). I measured the time and resources used by our tests/Echoserver while handling an increasing number of connections (one connection per client process, each connection issuing 100000 echos).
variant     con  100k echos time[ns]  user-time[s]  sys-time[s]  cpu-used
notify_all    1          49008685297        217.86       626.26     1650%
notify_one    1          31304750273          9.40         8.33       53%
notify_all   10          76487793595        665.45      1295.19     2484%
notify_one   10          35674140605        188.77        68.26      656%
...
notify_all   40         102469333659       4255.30       363.86     4399%
notify_one   40         105289161995       4167.43       322.69     4169%
notify_all   80          76883202092       3418.44       409.64     4762%
notify_one   80          68856748614       2878.56       397.66     4548%
I would not fully trust the numbers, because they are from only one run and quite a bit of randomness is inherent to emper because of the work-stealing scheduling. Nonetheless, they show three interesting points:
Command used to generate results:
for i in 1 10 20 30 40 80; do /usr/bin/time -v build-release/tests/EchoServer 2>> notify_all.txt & sleep 2; tests/echo_client.py -p ${i} -c 1 -i 100000 >> notify_all.txt && echo "quit" | nc localhost 12345; done
Full results can be found here:
notify_all: https://termbin.com/6zba
notify_one: https://termbin.com/3bsi
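
The difference between the two measured variants can be illustrated with a minimal C++ sketch. It assumes workers sleep on a std::condition_variable guarded by a single mutex. The names SleepDemoState, Result, worker and run are hypothetical and are not emper's actual code; the queue and mutex here only play the roles of the AnywhereQueue and the workerSleep mutex.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Hypothetical stand-in for the shared scheduler state (not emper's code).
struct SleepDemoState {
    std::mutex sleepMutex;            // plays the role of the workerSleep mutex
    std::condition_variable sleepCv;  // workers sleep on this
    std::queue<int> work;             // plays the role of the AnywhereQueue
    bool done = false;
    std::size_t handled = 0;          // items processed by all workers
    std::size_t wakeups = 0;          // how often a worker returned from wait()
};

struct Result {
    std::size_t handled;
    std::size_t wakeups;
};

void worker(SleepDemoState& s) {
    std::unique_lock<std::mutex> lock(s.sleepMutex);
    for (;;) {
        // Sleep while there is nothing to do. Every return from wait()
        // means this worker had to re-acquire the mutex.
        while (s.work.empty() && !s.done) {
            s.sleepCv.wait(lock);
            ++s.wakeups;
        }
        if (s.work.empty()) {
            assert(s.done);  // only leave once shutdown was requested
            return;
        }
        s.work.pop();
        ++s.handled;
    }
}

Result run(bool useNotifyAll, int numWorkers, int numItems) {
    SleepDemoState s;
    std::vector<std::thread> pool;
    for (int i = 0; i < numWorkers; ++i) {
        pool.emplace_back([&s] { worker(s); });
    }

    for (int i = 0; i < numItems; ++i) {
        {
            std::lock_guard<std::mutex> lock(s.sleepMutex);
            s.work.push(i);
        }
        if (useNotifyAll) {
            s.sleepCv.notify_all();  // every sleeper wakes and contends for the mutex
        } else {
            s.sleepCv.notify_one();  // at most one sleeper is woken per new item
        }
    }

    {
        std::lock_guard<std::mutex> lock(s.sleepMutex);
        s.done = true;
    }
    s.sleepCv.notify_all();  // shutdown must reach every worker
    for (auto& t : pool) {
        t.join();
    }
    return Result{s.handled, s.wakeups};
}
```

Calling run(false, 4, 100) and run(true, 4, 100) processes all 100 items in both cases. The wakeups count depends on timing and may include spurious wakeups, so it is only a rough indicator, but with notify_all it tends to be higher when most workers are idle, which matches the low-connection rows of the table.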