Implement a pipe-based sleep strategy using the IO subsystem
- Wake up either on external newWork notifications or on local IO completions, so the sleep strategy is sound without the IO completer
- Do as little as possible in a system saturated with work
- Pass suspended workers a hint where to find new work
Data:
    Global:
        hint pipe
        sleepers count
    Per worker:
        dispatch hint buffer
        in flight flag

Sleep:
    if we have no sleep request in flight:
        atomically increment the sleepers count
        if the observed count was negative, a notification is pending: return
        remember that we are sleeping (set the in flight flag)
        prepare a read sqe from the hint pipe into the dispatch hint buffer
    prevent the completer from reaping completions on this worker's IoContext
    wait until IO completions occurred

NotifyEmper(n):
    do
        if observed sleepers <= 0: return
        // Determine how many we are responsible to wake
        toWakeup = min(observed sleepers, n)
    while (!CAS(sleepers, observed sleepers - toWakeup))
    write toWakeup hints to the hint pipe

NotifyAnywhere(n):
    // Ensure all n notifications take effect
    do
        if observed sleepers <= -n: return
    while (!CAS(sleepers, observed sleepers - n))
    toWakeup = min(observed sleepers, n)
    if toWakeup > 0: write toWakeup hints to the hint pipe

onNewWorkCompletion:
    reset the in flight flag
    allow the completer to reap completions on this IoContext
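The counter protocol above can be modeled in C++ with std::atomic. This is a minimal single-process sketch, not the actual implementation: the class PipeSleepCounter and its methods announceSleep, notifyFromEmper, notifyFromAnywhere and writeHints are hypothetical names, the hint pipe is replaced by a plain counter of written hints, the memory orders are an assumption, and the io_uring read, the in flight flag and the completer interaction are left out. It only shows how the sleepers count moves on the sleeper and notifier sides.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical model of the sleepers counter. The real hint pipe and the
// io_uring read on it are replaced by a counter of written hints.
class PipeSleepCounter {
 public:
  // Sleeper side: announce the intent to sleep. Returns false if the old
  // count was negative, i.e. a notification is already pending and the
  // worker must not go to sleep.
  bool announceSleep() {
    return sleepers_.fetch_add(1, std::memory_order_acq_rel) >= 0;
  }

  // Notify from a worker: wake at most n of the currently observed sleepers.
  void notifyFromEmper(int64_t n) {
    int64_t observed = sleepers_.load(std::memory_order_relaxed);
    int64_t toWakeup;
    do {
      if (observed <= 0) return;  // nobody to wake
      toWakeup = std::min(observed, n);
    } while (!sleepers_.compare_exchange_weak(observed, observed - toWakeup,
                                              std::memory_order_acq_rel,
                                              std::memory_order_relaxed));
    writeHints(toWakeup);
  }

  // Notify from anywhere: subtract the full n so that all n notifications
  // take effect, even for workers that have not announced themselves yet.
  void notifyFromAnywhere(int64_t n) {
    int64_t observed = sleepers_.load(std::memory_order_relaxed);
    do {
      if (observed <= -n) return;  // enough notifications already pending
    } while (!sleepers_.compare_exchange_weak(observed, observed - n,
                                              std::memory_order_acq_rel,
                                              std::memory_order_relaxed));
    // Only sleepers that were actually observed need a hint.
    const int64_t toWakeup = std::min(observed, n);
    if (toWakeup > 0) writeHints(toWakeup);
  }

  int64_t sleepers() const { return sleepers_.load(); }
  int64_t hintsWritten() const { return hints_.load(); }

 private:
  // Stand-in for writing `count` hints to the hint pipe.
  void writeHints(int64_t count) {
    assert(count > 0);
    hints_.fetch_add(count);
  }

  std::atomic<int64_t> sleepers_{0};
  std::atomic<int64_t> hints_{0};
};
```

For example, with two announced sleepers, notifyFromEmper(3) writes two hints and leaves the count at 0. A following notifyFromAnywhere(1) drives the count to -1 without writing a hint, so the next worker calling announceSleep() sees the pending notification, skips sleeping, and brings the count back to 0.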
- We must decrement the sleepers count on the notifier side to prevent multiple notifiers from all observing the same number of sleepers and trying to wake up the same sleepers. Otherwise they would write more hints to the pipe than there are sleepers to consume them, jamming it up with unconsumed hints until a notifier blocks in its write to the pipe, resulting in a deadlock.
- The CAS loops on the notifier side are needed because decrementing the count and then incrementing it again by the excess is racy: two notifiers can both observe the sum of their excess decrements and each add it back, incrementing the counter too much and leaving it broken.
- Add the dispatch hint code in AbstractWorkStealingScheduler::nextFiber. This allows workers to check the dispatch hint after they found no local work to execute. This is a trade-off: we accept a slower wakeup (a just-awoken worker first checks for local work) in exchange for a faster dispatch hot path when there is work in our local WSQ.
- The completer thread must not reap completions on the IoContexts of sleeping workers, because this introduces a race for cqes and a possible lost wakeup if the completer consumes the completions before the worker is actually waiting for them.
- When notifying sleeping workers from anywhere, we must ensure that all notifications take effect. This is needed, for example, when terminating the runtime, to prevent sleep attempts by worker threads that are about to sleep but have not yet incremented the sleepers count. We achieve this by always decrementing the sleepers count by the notification count, even if it becomes negative, so that a worker incrementing it afterwards observes the pending notification and does not go to sleep.
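The race behind the CAS loops on the notifier side can be replayed deterministically on one thread. This sketch assumes the rejected scheme works as "subtract n unconditionally, then read the counter and add back whatever lies below zero"; that reading, the function name replayRacyRestore and the chosen interleaving are illustrative assumptions, not code from this change.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical rejected scheme, replayed in one fixed interleaving of two
// notifiers: each subtracts n, then reads the counter and restores any
// negative excess.
int64_t replayRacyRestore() {
  std::atomic<int64_t> sleepers{0};  // nobody is sleeping
  const int64_t n = 2;               // each notifier wants to wake two workers

  sleepers.fetch_sub(n);  // notifier A: 0 -> -2
  sleepers.fetch_sub(n);  // notifier B: -2 -> -4

  // Both read before either restores, so each sees the combined excess.
  const int64_t seenByA = sleepers.load();
  const int64_t seenByB = sleepers.load();
  assert(seenByA == -4 && seenByB == -4);

  if (seenByA < 0) sleepers.fetch_add(-seenByA);  // -4 -> 0
  if (seenByB < 0) sleepers.fetch_add(-seenByB);  // 0 -> 4
  return sleepers.load();  // counter now claims 4 sleepers that do not exist
}
```

With a CAS loop, each notifier commits a new value computed from the value it observed, and the commit fails if another notifier changed the counter in between, so no excess can be counted twice.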
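The ordering described for AbstractWorkStealingScheduler::nextFiber (local work first, the dispatch hint only when the local queue is empty) can be sketched as follows. This is not the actual scheduler code: Fiber, DispatchHint and Scheduler are hypothetical, the dispatch hint is assumed to be the index of a worker whose queue probably has work, a std::deque stands in for each WSQ, and the regular work-stealing fallback is omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <optional>
#include <vector>

using Fiber = int;  // hypothetical stand-in for a fiber handle

// Hypothetical dispatch hint: which worker's queue probably has new work.
struct DispatchHint {
  std::size_t worker;
};

struct Scheduler {
  std::vector<std::deque<Fiber>> queues;           // one WSQ stand-in per worker
  std::vector<std::optional<DispatchHint>> hints;  // filled by the hint-pipe read

  std::optional<Fiber> nextFiber(std::size_t self) {
    assert(self < queues.size() && self < hints.size());
    // Hot path: local work first; the hint is not even looked at.
    std::deque<Fiber>& local = queues[self];
    if (!local.empty()) {
      Fiber f = local.back();
      local.pop_back();
      return f;
    }
    // No local work: consume the dispatch hint, if one arrived.
    if (std::optional<DispatchHint> hint = hints[self]) {
      hints[self].reset();
      std::deque<Fiber>& victim = queues[hint->worker];
      if (!victim.empty()) {
        Fiber f = victim.front();  // steal from the opposite end
        victim.pop_front();
        return f;
      }
    }
    // Regular work stealing would follow here (omitted in this sketch).
    return std::nullopt;
  }
};
```

A worker that still has local fibers never touches its hint, which keeps the dispatch hot path unchanged; the cost is that a freshly woken worker pays for one empty local-queue check before following the hint.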
Thanks to Florian Schmaus email@example.com for spotting bugs and suggesting improvements.