    implement a pipe based sleep strategy using the IO subsystem
    Florian Fischer authored
    Design goals
    ============
    
    * Wakeup either on external newWork notifications or on local IO completions
      -> Sleep strategy is sound without the IO completer
    * Do as little as possible in a system saturated with work
    * Pass a hint where to find new work to suspended workers
    
    Algorithm
    =========
    
    Data:
    	Global:
    		hint pipe
    		sleepers count
    	Per worker:
    		dispatch hint buffer
    		in flight flag
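
    The following is a minimal C++ sketch of this state; all names are
    illustrative and not EMPER's actual identifiers:

    	// Illustrative sketch of the global and per-worker sleep state (hypothetical names).
    	#include <atomic>
    	#include <cstdint>

    	struct GlobalSleepState {
    		int hintPipe[2];                  // hintPipe[0] read end, hintPipe[1] write end
    		std::atomic<int64_t> sleepers{0}; // may drop below zero, see NotifyAnywhere
    	};

    	struct WorkerSleepState {
    		uint64_t dispatchHintBuffer = 0;   // target buffer of the pending pipe read
    		bool sleepRequestInFlight = false; // is a sleep request already prepared?
    	};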
    
    Sleep:
    	if we have no sleep request in flight
    		Atomically increment the sleepers count
    		Remember that we are sleeping
    		Prepare a read sqe from the hint pipe into the dispatch hint buffer
    	Prevent the completer from reaping completions on this worker's IoContext
    	Wait until IO completions occurred
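
    A hedged C++ sketch of the sleep path, using liburing directly where EMPER
    goes through its IoContext (all names are illustrative):

    	// Illustrative only: EMPER prepares the read via its IoContext, not raw liburing.
    	#include <liburing.h>
    	#include <atomic>
    	#include <cstdint>

    	void sleepOnHintPipe(io_uring& ring, std::atomic<int64_t>& sleepers,
    	                     int hintPipeReadFd, uint64_t& dispatchHintBuffer,
    	                     bool& sleepRequestInFlight) {
    		if (!sleepRequestInFlight) {
    			sleepers.fetch_add(1, std::memory_order_acq_rel); // announce the sleeper
    			sleepRequestInFlight = true;
    			// Prepare a read of one hint from the hint pipe into the dispatch hint buffer.
    			io_uring_sqe* sqe = io_uring_get_sqe(&ring);
    			io_uring_prep_read(sqe, hintPipeReadFd, &dispatchHintBuffer,
    			                   sizeof(dispatchHintBuffer), 0);
    		}
    		// The real implementation additionally blocks the completer from reaping
    		// completions on this worker's IoContext before going to sleep.
    		io_uring_submit_and_wait(&ring, 1); // sleep until at least one completion arrives
    	}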
    
    NotifyEmper(n):
    	if observed sleepers <= 0
    		return
    
    	// Determine how many we are responsible to wake
    	do
    		toWakeup = min(observed sleepers, n)
    	while (!CAS(sleepers, observed sleepers - toWakeup))
    
    	write toWakeup hints to the hint pipe
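
    A C++ sketch of this CAS loop, continuing the hypothetical state and includes
    from the sketches above (the hint encoding is made up for illustration):

    	// Illustrative sketch: claim exactly the sleepers we will wake, then write hints.
    	#include <unistd.h>
    	#include <algorithm>

    	void notifyEmper(std::atomic<int64_t>& sleepers, int hintPipeWriteFd,
    	                 uint64_t hint, int64_t n) {
    		int64_t observed = sleepers.load(std::memory_order_acquire);
    		int64_t toWakeup;
    		do {
    			if (observed <= 0) return; // nobody is sleeping
    			toWakeup = std::min(observed, n);
    		} while (!sleepers.compare_exchange_weak(observed, observed - toWakeup,
    		                                         std::memory_order_acq_rel));

    		// Every hint written wakes exactly one pending pipe read of a sleeper.
    		for (int64_t i = 0; i < toWakeup; ++i)
    			(void)write(hintPipeWriteFd, &hint, sizeof(hint));
    	}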
    
    NotifyAnywhere(n):
    	// Ensure all n notifications take effect
    	while (!CAS(sleepers, observed sleepers - n))
    		if observed sleepers <= -n
    			return
    
    	toWakeup = min(observed sleepers, n)
    	write toWakeup hints to the hint pipe
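
    A sketch of the anywhere variant, again with hypothetical names; the count is
    always decremented by n so the notification cannot be missed:

    	// Illustrative sketch: unconditionally decrement by n so notifications also reach
    	// workers that are about to sleep but have not incremented the sleepers count yet.
    	void notifyAnywhere(std::atomic<int64_t>& sleepers, int hintPipeWriteFd,
    	                    uint64_t hint, int64_t n) {
    		int64_t observed = sleepers.load(std::memory_order_acquire);
    		do {
    			if (observed <= -n) return; // enough notifications are already in flight
    		} while (!sleepers.compare_exchange_weak(observed, observed - n,
    		                                         std::memory_order_acq_rel));

    		// Only workers already counted as sleeping consume hints from the pipe.
    		int64_t toWakeup = std::min(observed, n);
    		for (int64_t i = 0; i < toWakeup; ++i)
    			(void)write(hintPipeWriteFd, &hint, sizeof(hint));
    	}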
    
    onNewWorkCompletion:
    	reset in flight flag
    	allow completer to reap completions on this IoContext
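
    The completion handler itself stays tiny; a sketch using the hypothetical
    per-worker state from above:

    	// Illustrative sketch: the hint was consumed, so the sleep request is done and
    	// the completer may reap completions on this worker's IoContext again.
    	void onNewWorkCompletion(WorkerSleepState& worker) {
    		worker.sleepRequestInFlight = false;
    		// In EMPER this point also releases the worker's IoContext for the completer.
    	}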
    
    Notes
    =====
    
    * We must decrement the sleepers count on the notifier side to prevent
      multiple notifiers from all observing the same number of sleepers and
      trying to wake up the same sleepers by writing to the pipe, jamming it
      up with unconsumed hints, and thus blocking in the notify write,
      resulting in a deadlock.
    * The CAS loops on the notifier side are needed because decrementing and
      then incrementing back the excess is racy: two notifiers can both observe
      the sum of their excess decrements and both increment back too much,
      resulting in a broken counter.
    * The dispatch hint code is added in AbstractWorkStealingScheduler::nextFiber.
      This allows workers to check the dispatch hint after there
      was no local work to execute (see the sketch after these notes).
      This is a trade-off: we trade a slower wakeup - a just awoken worker
      will first check for local work - against a faster dispatch hot path when
      we have work to do in our local WSQ.
    * The completer thread must not reap completions on the IoContexts of
      sleeping workers because this introduces a race for cqes and a possible
      lost wakeup if the completer consumes the completions before the worker
      is actually waiting for them.
    * When notifying sleeping workers from anywhere we must ensure that all
      notifications take effect. This is needed for example when terminating
      the runtime to prevent sleep attempts from worker threads which are
      about to sleep but have not incremented the sleeper count yet.
      We achieve this by always decrementing the sleeper count by the notification
      count.
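
    As referenced above, a hypothetical sketch of where the dispatch hint check
    sits in the dispatch path; it only loosely follows
    AbstractWorkStealingScheduler::nextFiber, and all helpers are made up for
    illustration:

    	// Hypothetical dispatch order: local queue first, then the dispatch hint, then stealing.
    	Fiber* nextFiber() {
    		if (Fiber* fiber = popLocalWorkQueue()) return fiber;   // hot path stays unchanged
    		if (Fiber* fiber = followDispatchHint()) return fiber;  // hint read by the sleep cqe
    		return stealFromOtherWorkers();
    	}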
    
    Thanks to Florian Schmaus <flow@cs.fau.de> for spotting bugs and suggesting
    improvements.