Commit e6cc92f1 authored 4 years ago by Florian Fischer

[IO] fix the possible lost wakeup for the IoContext::cq_lock race

Our current naive try lock protecting a worker's IoContext's cq is racy.
This fact alone is no problem: a try lock is racy by design, in the sense
that two threads race to take the lock.

The actual problem is:

While a worker is holding the lock, additional completions could arrive
that the worker does not observe because it may already have finished
iterating the CQ.

If the worker still holds the lock at that point, it prevents the
globalCompleter from reaping the additional completions. The result is a
lost wakeup, possibly leading to a completely sleeping runtime with runnable
completions in a worker's IoContext.

To prevent this lost wakeup the cq_lock now counts the unsuccessful
lock attempts from the globalCompleter.

If a worker observes that the globalCompleter tried to reapCompletions
more than once, we know that a lost wakeup could have occurred and we try to
reap again.
Observing one attempt is normal, since we know the globalCompleter and the
worker owning the IoContext race for the cq_lock required to reap completions.
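
The counting scheme described above can be sketched roughly as follows. This
is an illustration only, not the code from this commit: the class name
CountingTryLock, its methods, and the helper reapAsWorker are hypothetical,
and default (sequentially consistent) atomics are used for simplicity.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical sketch of a try lock that counts failed attempts.
// state == 0: free. state == n >= 1: held, with n - 1 failed attempts
// recorded since the lock was taken.
class CountingTryLock {
	std::atomic<uint32_t> state{0};

 public:
	// Plain try lock, as the worker owning the IoContext would use it.
	bool try_lock() {
		uint32_t expected = 0;
		return state.compare_exchange_strong(expected, 1);
	}

	// Try lock for the contender (the globalCompleter in the commit message).
	// A failed attempt is recorded in the lock itself.
	bool try_lock_or_count() {
		uint32_t cur = state.load();
		for (;;) {
			if (cur == 0) {
				if (state.compare_exchange_weak(cur, 1)) return true;
			} else {
				if (state.compare_exchange_weak(cur, cur + 1)) return false;
			}
			// On CAS failure cur was reloaded; retry with the fresh value.
		}
	}

	// Release the lock and report how many attempts failed while it was held.
	uint32_t unlock() { return state.exchange(0) - 1; }
};

// The holder reaps again if more than one attempt failed while it held the
// lock, mirroring the "more than once" threshold from the text above.
template <typename ReapFn>
void reapAsWorker(CountingTryLock& lock, ReapFn reap) {
	if (!lock.try_lock()) return;
	for (;;) {
		reap();
		if (lock.unlock() <= 1) return;
		// If someone else took the lock in the meantime, they will reap.
		if (!lock.try_lock()) return;
	}
}
```

Because the failed attempts are stored in the same atomic word as the lock
state, the holder learns about them in the very operation that releases the
lock, so there is no window between "check for contenders" and "unlock".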

Additionally:

* Reduce the critical section in which the cq_lock is held by copying all
  seen cqes and completing the Futures after the lock was released.

* Don't immediately schedule blocked Fibers or Callbacks; rather, collect them
  and return them as a batch. Maybe the caller knows better what to do with a
  batch of runnable Fibers.
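
The two points above could look roughly like this. It is a simplified sketch
under stated assumptions, not the actual change: Runnable, Future, Cqe, and
IoContextSketch are hypothetical stand-ins, the CQ is modeled as a plain
vector, and a std::mutex with try_to_lock stands in for the cq_lock.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <vector>

struct Runnable {  // stand-in for a blocked Fiber or Callback
	int id = 0;
};

struct Future {  // stand-in for the Future waiting on an IO request
	int32_t result = 0;
	bool completed = false;
	Runnable waiter;
};

struct Cqe {  // stand-in for a completion queue entry
	Future* future;
	int32_t res;
};

class IoContextSketch {
	std::mutex cq_lock;   // assumption: the real cq_lock is a counting try lock
	std::vector<Cqe> cq;  // assumption: models the io_uring CQ

 public:
	// Models a completion arriving in the CQ.
	void post(Cqe cqe) {
		std::lock_guard<std::mutex> guard(cq_lock);
		cq.push_back(cqe);
	}

	// Returns the runnables unblocked by the reaped completions as one batch.
	std::vector<Runnable> reapCompletions() {
		std::vector<Cqe> seen;
		{
			// Critical section: only copy the cqes out of the CQ.
			std::unique_lock<std::mutex> guard(cq_lock, std::try_to_lock);
			if (!guard.owns_lock()) return {};
			seen.swap(cq);
		}

		// The lock is released; now complete the Futures.
		std::vector<Runnable> batch;
		batch.reserve(seen.size());
		for (const Cqe& cqe : seen) {
			cqe.future->result = cqe.res;
			cqe.future->completed = true;
			// Collect instead of scheduling; the caller decides what to do.
			batch.push_back(cqe.future->waiter);
		}
		return batch;
	}
};
```

Returning the batch leaves the scheduling policy to the caller, which can for
example enqueue all runnables at once instead of one at a time.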

parent 19bd1fbf


1 merge request: !117 Fix reap completion race


Showing with 208 additions and 104 deletions
