Commits · 7505df47e5d8891df4c8f5c25d98af0d4b40e010 · Lehrstuhl für Informatik 4 (Systemsoftware) / manycore / emper

Apr 24, 2022
- make sleep semaphore threshold configurable for mazstab · 0da9832c
  Florian Fischer authored 2 years ago
  
  0da9832c
Apr 10, 2022

Florian Fischer authored 2 years ago

When EMPER is built with -Dio_synchronous, each Future will be
completed synchronously when calling Future::wait().

94a00a78

Mar 24, 2022
- [meson] Add build_only_emper_dep option · c119e77f
  Florian Schmaus authored 2 years ago
  
  c119e77f
Feb 28, 2022

Add stats_blocked_context(_count) · 2d43c259
Florian Schmaus authored 3 years ago
```
This further splits up the stats machinery into smaller parts.
```
2d43c259

Add worker sleep stats, rework stats machinery · 46302a0f

Florian Schmaus authored 3 years ago

The idea of the new stats machinery is that 'stats' becomes an option
that enables the basic stats gathering infrastructure in EMPER. At
some point, it should become a non-user option, i.e., it should be
removed from meson_options.txt. Then comes a layer of fine-grained
stats control switches, which default to 'auto'. Third, a new option
called 'stats_all' is added, which, if enabled, activates all
fine-grained stats knobs that are set to 'auto'.

46302a0f
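
The interaction between the fine-grained knobs and 'stats_all' described in 46302a0f can be sketched as follows. This is only an illustration: `Knob` and `resolveStatsKnob` are hypothetical names, and treating 'auto' as off when stats_all is disabled is an assumption that is not taken from the EMPER build files.

```cpp
#include <cassert>

// Hypothetical tri-state mirroring a fine-grained stats switch.
enum class Knob { Disabled, Enabled, Auto };

// Resolve one fine-grained stats knob.
//   statsAll: value of the 'stats_all' option
//   knob:     value of the fine-grained switch (defaults to Auto)
constexpr bool resolveStatsKnob(bool statsAll, Knob knob) {
  switch (knob) {
    case Knob::Enabled:
      return true;  // explicitly requested
    case Knob::Disabled:
      return false;  // explicitly turned off, stats_all does not override it
    case Knob::Auto:
      return statsAll;  // assumption: 'auto' follows stats_all
  }
  return false;
}
```

With this model, `resolveStatsKnob(true, Knob::Auto)` is enabled, while an explicitly disabled knob stays off even if stats_all is set.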

Feb 27, 2022
- Fix stats_stack_usage option definition · d9966312
  Florian Schmaus authored 3 years ago
  
  d9966312
Feb 24, 2022
- Add WsClv3Queue and WsClv4Queue · 85b0451c
  Florian Schmaus authored 3 years ago
  
  85b0451c
Feb 18, 2022

io: make max unbounded io worker configurable · 81c45366

Florian Fischer authored 3 years ago

Since Linux 5.15 io_uring can limit the number of iow threads created
using IORING_REGISTER_IOWQ_MAX_WORKERS.

Bump liburing wrap to version 2.1 to use
io_uring_register_iowq_max_workers.

Expose this via the meson variable io_unbounded_iow_max and the
environment variable EMPER_IO_UNBOUNDED_IOW_MAX.

For a detailed explanation, see:
https://blog.cloudflare.com/missing-manuals-io_uring-worker-pool

81c45366
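
A minimal sketch of how the environment variable from 81c45366 could be turned into the argument for io_uring_register_iowq_max_workers. The helper names `parseWorkerLimit` and `iowqLimits` are hypothetical and the parsing rules are an assumption. The array layout (index 0 for bounded workers, index 1 for unbounded workers, 0 meaning "leave unchanged") follows the liburing man page.

```cpp
#include <array>
#include <cassert>
#include <cctype>
#include <cstdlib>
#include <limits>
#include <optional>

// Parse a worker limit from an environment variable.
// Returns std::nullopt if the variable is unset, empty, not a plain
// decimal number, or does not fit into an unsigned int.
inline std::optional<unsigned> parseWorkerLimit(const char* name) {
  const char* raw = std::getenv(name);
  if (raw == nullptr || *raw == '\0') return std::nullopt;
  for (const char* c = raw; *c != '\0'; ++c) {
    if (!std::isdigit(static_cast<unsigned char>(*c))) return std::nullopt;
  }
  const unsigned long value = std::strtoul(raw, nullptr, 10);
  if (value > std::numeric_limits<unsigned>::max()) return std::nullopt;
  return static_cast<unsigned>(value);
}

// Build the two-element array for io_uring_register_iowq_max_workers():
// only the unbounded limit is set, the bounded limit stays untouched.
inline std::array<unsigned, 2> iowqLimits(std::optional<unsigned> unboundedMax) {
  return {0U, unboundedMax.value_or(0U)};
}

// With liburing >= 2.1 and an initialized ring, the call would look like:
//   auto limits = iowqLimits(parseWorkerLimit("EMPER_IO_UNBOUNDED_IOW_MAX"));
//   io_uring_register_iowq_max_workers(&ring, limits.data());
```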

Make the implementations of the work-stealing queue(s) selectable · c7f953bf
Florian Schmaus authored 3 years ago
```
This required breaking an include cycle between Fibril and
LockedQueue.
```
c7f953bf

Feb 16, 2022
- [MemoryManager] Add threshold to abort memory-stealing, set to 10% · 99092a76
  Florian Schmaus authored 3 years ago
  
  99092a76
Feb 15, 2022
- Add stack-usage stats · 47d253d1
  Florian Schmaus authored 3 years ago
  
  47d253d1
Feb 11, 2022
- [Context] Add option for guard page at the bottom of stack · 7fb99a46
  Florian Schmaus authored 3 years ago
  
  7fb99a46
- [meson] set default context_alignment to cache_line_size · c9eab3b7
  Florian Fischer authored 3 years ago
  
  c9eab3b7
Feb 10, 2022
- [Context] Add option for context alignment · 6d3b525a
  Florian Schmaus authored 3 years ago
  
  6d3b525a
- [Context] Make context size and assumed page size configurable · 64916077
  Florian Schmaus authored 3 years ago
  
  Also keep the context size at 64 KiB (there was a comment that erroneously indicated that the context size is 4 MiB).
  64916077
Feb 09, 2022
- Use constexpr for memory manager in context manager · 44f4599a
  Florian Schmaus authored 3 years ago
  
  44f4599a
Feb 07, 2022

Add support for continuation stealing · cf3ac3ed

Florian Schmaus authored 3 years ago


Thanks to Nicolas Pfeiffer for writing the initial prototypical
implementation of continuation stealing and the cactus stack
mechanism, on which this is based.

Co-authored-by: Nicolas Pfeiffer <pfeiffer@cs.fau.de>

cf3ac3ed

Jan 23, 2022

[meson] add option to ignore wakeup hints · 7f6fb152

Florian Fischer authored 3 years ago

I think wakeup hints should never be ignored, but having the option
seems useful to observe their benefits/costs.

7f6fb152

Jan 21, 2022

disable throttle wakeup strategy until it works with scheduleOn · 8cb084c9
Florian Fischer authored 3 years ago

8cb084c9

add semaphore using futex_waitv(2) supporting notify_specific · 96a846a1

Florian Fischer authored 3 years ago

The SpuriousFutex2Semaphore is able to notify a specific worker
by using two futexes to wait on.

The first works like a normal semaphore and is used for global,
non-specific notifications via notify() and notify_many().

The second is per worker and is based on a SleeperState.
To notify a specific worker, we change its SleeperState to Notified
and call FUTEX_WAKE if needed.

96a846a1
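
The per-worker state transition behind notify_specific can be sketched with a plain atomic. This is not the SpuriousFutex2Semaphore itself: the state names `Running` and `Sleeping` and both helper functions are assumptions for illustration, and the futex_waitv(2)/FUTEX_WAKE system calls are left out.

```cpp
#include <atomic>
#include <cassert>

// Only Notified is named in the commit message; the other states are assumed.
enum class SleeperState : int { Running, Sleeping, Notified };

// Notifier side: mark a specific worker as notified.
// Returns true if the worker was sleeping, i.e. the caller would have to
// follow up with a FUTEX_WAKE on that worker's futex word.
inline bool markNotified(std::atomic<SleeperState>& state) {
  const SleeperState previous =
      state.exchange(SleeperState::Notified, std::memory_order_acq_rel);
  return previous == SleeperState::Sleeping;
}

// Worker side: announce the intent to sleep.
// Returns false if a notification arrived first, in which case the
// worker must not go to sleep.
inline bool tryEnterSleep(std::atomic<SleeperState>& state) {
  SleeperState expected = SleeperState::Running;
  return state.compare_exchange_strong(expected, SleeperState::Sleeping,
                                       std::memory_order_acq_rel);
}
```

Because the notifier uses an exchange, a notification that races with a worker going to sleep is either seen by `tryEnterSleep` or results in a wake request, so it is not lost in this simplified model.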

Jan 14, 2022
- [meson] Add use_bundled_deps option · 76f4eafd
  Florian Schmaus authored 3 years ago
  
  76f4eafd
Dec 14, 2021

[IO] support one sq poller thread per numa node · fdff953f
Florian Fischer authored 3 years ago

fdff953f

[IO] overhaul SQPOLL support · 50c965e4

Florian Fischer authored 3 years ago

Two meson options control the io_uring sqpoll feature:
* io_uring_sqpoll - enable sq polling
* io_uring_shared_poller - share the polling thread between all io_urings

Since Linux 5.12, IORING_SETUP_ATTACH_WQ only causes sharing of
poller threads, not of the work queues.
See: https://github.com/axboe/liburing/issues/349

When using SQPOLL, userspace has no good way to know how many
SQEs the kernel has consumed. Therefore, we wait for available
SQEs using io_uring_sqring_wait if there was no usable SQE.

Remove the GlobalIoContext::registerLock and register all worker
io_uring eventfd reads at the beginning of the completer function.
Also register all the worker io_uring eventfds since they never change
and it hopefully reduces overhead in the global io_uring.

50c965e4

Dec 10, 2021

[meson] introduce dependencies to io configuration options · 5c7e3e9b
Florian Fischer authored 3 years ago

5c7e3e9b

Introduce waitfree workstealing · 1c538024

Florian Fischer authored 3 years ago

Waitfree work stealing is configured with the meson option
'waitfree_work_stealing'.

The retry logic is intentionally left in the Queues and not lifted to
the scheduler to reuse the load of an unsuccessful CAS.

Consider the following pseudo code examples:

steal() -> bool:                       steal() -> res
  load                                   load
loop:                                    if empty return EMPTY
  if empty return EMPTY                  cas
  cas                                    return cas ? STOLEN : LOST_RACE
  if not WAITFREE and not cas:
    goto loop                          outer():
  return cas ? STOLEN : LOST_RACE      loop:
                                         res = steal()
outer():                                 if not WAITFREE and res == LOST_RACE:
  steal()                                  goto loop

In the right-hand example, the value loaded by an unsuccessful CAS
cannot be reused, so a loop of unsuccessful CASes results in
double loads.

The retries are configurable through a template variable maxRetries.
* maxRetries < 0: retry indefinitely
* maxRetries >= 0: retry up to maxRetries times

1c538024
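
The left-hand variant from 1c538024 can be sketched in C++ as follows. `ToyQueue` is a hypothetical stand-in that only tracks indices, not EMPER's actual work-stealing queue. The point it illustrates is that a failed compare_exchange already writes the current value into the expected variable, so the retry inside the queue needs no second load.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

enum class StealResult { Stolen, Empty, LostRace };

// Toy queue: thieves advance 'top'; 'bottom' is fixed for this sketch.
struct ToyQueue {
  std::atomic<std::uint64_t> top{0};
  std::uint64_t bottom{0};

  // maxRetries < 0:  retry until the CAS succeeds or the queue is empty.
  // maxRetries >= 0: retry a lost CAS up to maxRetries times.
  template <int maxRetries>
  StealResult steal(std::uint64_t& stolenIndex) {
    std::uint64_t observedTop = top.load(std::memory_order_acquire);
    [[maybe_unused]] int retries = 0;
    while (true) {
      if (observedTop >= bottom) return StealResult::Empty;
      // On failure, observedTop is updated to the current value of top,
      // which is the "reused load" of the unsuccessful CAS.
      if (top.compare_exchange_strong(observedTop, observedTop + 1,
                                      std::memory_order_acq_rel,
                                      std::memory_order_acquire)) {
        stolenIndex = observedTop;
        return StealResult::Stolen;
      }
      if constexpr (maxRetries >= 0) {
        if (retries++ >= maxRetries) return StealResult::LostRace;
      }
    }
  }
};
```

`steal<0>` corresponds to the wait-free configuration (a single CAS attempt), while `steal<-1>` keeps retrying inside the queue.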

Dec 06, 2021

[meson] set check_anywhere_queue_while_stealing automatic · 7da8e687

Florian Fischer authored 3 years ago

We introduced the check_anywhere_queue_while_stealing configuration
as an optimization to get the IO completions reaped by the completer
faster into the normal WSQ.
But EMPER now has configurations where we don't use a completer,
making this optimization useless or even harmful.

By default automatically decide the value of
check_anywhere_queue_while_stealing based on the value of
io_completer_behavior.

7da8e687

Nov 10, 2021

make the victim count in work-stealing configurable · cd06496d

Florian Fischer authored 3 years ago

Add two new mutually exclusive meson options:
* work_stealing_victim_count: sets an absolute number of victims
* work_stealing_victim_denominator: sets the victim count to #workers/denominator

cd06496d
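
A small sketch of how the two options from cd06496d could translate into a victim count. The function name, the use of 0 as "not set", the clamping, and the fallback to all workers are assumptions of this example, not EMPER's actual defaults.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Number of victims a thief considers per steal attempt.
//   victimCount:       mirrors work_stealing_victim_count (0 = not set)
//   victimDenominator: mirrors work_stealing_victim_denominator (0 = not set)
// The options are mutually exclusive, so at most one of them is non-zero.
constexpr std::size_t victimsPerRound(std::size_t workers, std::size_t victimCount,
                                      std::size_t victimDenominator) {
  if (victimCount != 0) {
    return std::min(victimCount, workers);  // assumption: clamp to the worker count
  }
  if (victimDenominator != 0) {
    // #workers/denominator, but never fewer than one victim (assumption).
    return std::max<std::size_t>(1, workers / victimDenominator);
  }
  return workers;  // assumption: without either option, consider every worker
}
```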

Oct 13, 2021

[meson] introduce lockless memory order and rename lockless option · 67b0c77a

Florian Fischer authored 3 years ago

The lockless algorithm can now be configured by setting -Dio_lockless_cq=true
and the used memory ordering by setting -Dio_lockless_memory_order={weak,strong}.

io_lockless_memory_order=weak:
    read with acquire
    write with release

io_lockless_memory_order=strong:
    read with seq_cst
    write with seq_cst

67b0c77a
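
The mapping from io_lockless_memory_order to C++ memory orders can be expressed at compile time. The names `LocklessMemoryOrder`, `CqOrders`, and `consumeOne` are hypothetical; only the weak/strong mapping itself is taken from the commit message above.

```cpp
#include <atomic>
#include <cassert>

enum class LocklessMemoryOrder { Weak, Strong };

// Compile-time selection of the memory orders used for CQ accesses.
template <LocklessMemoryOrder order>
struct CqOrders {
  static constexpr std::memory_order read = order == LocklessMemoryOrder::Weak
                                                ? std::memory_order_acquire
                                                : std::memory_order_seq_cst;
  static constexpr std::memory_order write = order == LocklessMemoryOrder::Weak
                                                 ? std::memory_order_release
                                                 : std::memory_order_seq_cst;
};

// Illustrative single-consumer helper: read the head index with the
// configured read order and publish the advanced head with the write order.
// Returns the index that was consumed.
template <LocklessMemoryOrder order>
unsigned consumeOne(std::atomic<unsigned>& head) {
  const unsigned observed = head.load(CqOrders<order>::read);
  head.store(observed + 1, CqOrders<order>::write);
  return observed;
}
```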

Oct 11, 2021

[IoContext] implement lockless CQ reaping · d9d350d9
Florian Fischer authored 3 years ago
```
TODO: think about stats and possible ring buffer pointers overflow and ABA.
```
d9d350d9

implement IO stealing · 0abc29ad

Florian Fischer authored 3 years ago

IO stealing is analogous to work-stealing and means that worker threads
without work will try to steal IO completions (CQEs) from other workers'
IoContexts. The work-stealing algorithm is modified to check a victim's
CQ after finding their work queue empty.

This approach, in combination with future additions (global notifications
on IO completions and lock-free CQE consumption), is a realistic candidate
to replace the completer thread without losing its benefits.

To allow IO stealing the CQ must be synchronized which is already the
case with the IoContext::cq_lock.
Currently stealing workers always try to pop a single CQE (this could
be configurable).
Steal attempts are recorded in the IoContext's Stats object and
successfully stolen IO continuations in the AbstractWorkStealingWorkerStats.

I moved the code transforming CQEs into continuation Fibers from
reapCompletions into a separate function to make the rather complicated
function more readable and thus easier to understand.

Remove the default CallerEnvironment template arguments to make
the code more explicit and prevent easy errors (not propagating
the caller environment or forgetting the function takes a caller environment).

io::Stats now needs to use atomics because multiple threads may increment
its counters in parallel from EMPER and the OWNER.
And since using std::atomic<T*> in std::map is not easily possible, we
use the compiler __atomic_* builtins.

Add, adjust and fix some comments.

0abc29ad
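
A minimal sketch of the __atomic_* approach mentioned in 0abc29ad: plain integers stored in a std::map are incremented through the GCC/Clang builtins instead of std::atomic members. `StatsSketch` and its counter names are hypothetical and unrelated to the real io::Stats layout.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

class StatsSketch {
  // Plain integers as mapped values; std::atomic is neither copyable nor
  // movable, which makes it awkward to use as a map value.
  std::map<std::string, std::uint64_t> counters;

 public:
  // Must be called before any concurrent use: inserting into the map
  // is not thread-safe, only the increments below are.
  void registerCounter(const std::string& key) { counters.emplace(key, 0); }

  // Atomic increment via the compiler builtin (GCC and Clang).
  void increment(const std::string& key) {
    __atomic_fetch_add(&counters.at(key), 1, __ATOMIC_RELAXED);
  }

  // Atomic read via the compiler builtin.
  std::uint64_t read(const std::string& key) {
    return __atomic_load_n(&counters.at(key), __ATOMIC_RELAXED);
  }
};
```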

Sep 27, 2021

[log] improve timestamp scalability and increase LogBuffer size · 442ead84

Florian Fischer authored 3 years ago

std::localtime takes a global lock, so it does not scale and is
unsuitable for analyzing timing-sensitive bugs.
Introduce a new option to add UTC timestamps. On my system, this allows
doubling the CPU load while using mmapped logging.

Also increase the LogBuffer size from 1MB to 1GB because I had some
crashes where a renewed buffer was still used.

442ead84
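
One way to produce a UTC time-of-day stamp without any time zone lookup is plain integer arithmetic on the epoch seconds. This is a sketch under that assumption and not necessarily how the EMPER logger formats its timestamps; `utcTimeOfDay` is a hypothetical helper and expects a non-negative input.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Format the UTC time of day (HH:MM:SS) from seconds since the Unix epoch.
// Uses only integer arithmetic, so no shared libc state is involved.
inline std::string utcTimeOfDay(std::int64_t epochSeconds) {
  const std::int64_t secondsOfDay = epochSeconds % 86400;  // 24 * 60 * 60
  const int hours = static_cast<int>(secondsOfDay / 3600);
  const int minutes = static_cast<int>((secondsOfDay / 60) % 60);
  const int seconds = static_cast<int>(secondsOfDay % 60);
  char buffer[16];
  std::snprintf(buffer, sizeof(buffer), "%02d:%02d:%02d", hours, minutes, seconds);
  return buffer;
}
```

The input could come, for example, from `std::time(nullptr)` on a POSIX system.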

Sep 22, 2021
- [IoContext] replace fancy CountingTryLock with simple CQ emptiness check · 9f545ba0
  Florian Fischer authored 3 years ago
  
  9f545ba0
Sep 20, 2021

[WakeupStrategy] introduce a new class to model our wakeup strategies · 37143de2

Florian Fischer authored 3 years ago

Add a new 'throttle' wakeup strategy inspired by the algorithm used
by zap, go and tokio. This tries to prevent a possible thundering herd
problem and reduce contention on the scheduler by only waking a single
worker at a time. It further ensures that the next worker is only notified
if the previous one successfully found work.

37143de2

Aug 19, 2021

[GlobalIoContext] Add CompleterSchedParam option · c0cf0f8d

Florian Schmaus authored 3 years ago

This adds an option to make the scheduling parameters of the completer
thread configurable via a meson option.

c0cf0f8d

Aug 18, 2021

[IO] add "try nonblocking syscall" optimization for send and recv · 0539a922
Florian Fischer authored 3 years ago

0539a922

[IO] Implement configurable "simple architecture" · 06b5bf0f

Florian Fischer authored 3 years ago

Introduce a new meson option io_single_uring which causes EMPER
to only use the GlobalIoContexts for all IO.

A SubmitActor is used to submit SQEs to the io_uring SQ.

Futures can be in a new state where they are submitted to the SubmitActor
but not to the io_uring yet.
In this state (isSubmitted && !isPrepared) the Future must not be destroyed.
To ensure this, we yield when forgetting a Future until it is prepared
and thus safe to destroy.

This commit contains no optimizations (no batching, no try non blocking
syscall first, ...)

Refactor GlobalIoContext.cpp:

* rename globalCompleter to completer
* make the completer loop non-static

06b5bf0f

Aug 02, 2021
- Add check_anywhere_queue_while_stealing meson option · 94c099e2
  Florian Schmaus authored 3 years ago
  
  94c099e2
Jul 14, 2021

implement a pipe based sleep strategy using the IO subsystem · 4ec30fd4

Florian Fischer authored 3 years ago

Design goals
============

* Wake up either on external newWork notifications or on local IO completions
  -> Sleep strategy is sound without the IO completer
* Do as little as possible in a system saturated with work
* Pass a hint where to find new work to suspended workers

Algorithm
=========

Data:
	Global:
		hint pipe
		sleepers count
	Per worker:
		dispatch hint buffer
		in flight flag

Sleep:
	if we have no sleep request in flight
		Atomic increment sleep count
		Remember that we are sleeping
		Prepare read cqe from the hint pipe to dispatch hint buffer
	Prevent the completer from reaping completions on this worker's IoContext
	Wait until IO completions occurred

NotifyEmper(n):
	if observed sleepers <= 0
		return

	// Determine how many we are responsible to wake
	do
		toWakeup = min(observed sleepers, n)
	while (!CAS(sleepers, toWakeup))

	write toWakeup hints to the hint pipe

NotifyAnywhere(n):
	// Ensure all n notifications take effect
	while (!CAS(sleepers, observed sleepers - n))
		if observed sleeping <= -n
			return

	toWakeup = min(observed sleeping, n)
	write toWakeup hints to the hint pipe

onNewWorkCompletion:
	reset in flight flag
	allow completer to reap completions on this IoContext

Notes
=====

* We must decrement the sleepers count on the notifier side to
  prevent multiple notifiers from all observing the same number of sleepers,
  trying to wake up the same sleepers by writing to the pipe, jamming it up
  with unconsumed hints, and thus blocking in the notify write, resulting
  in a deadlock.
* The CAS loops on the notifier side are needed because decrementing
  and then incrementing the excess is racy: two notifiers can each observe the
  sum of both their excess decrements and increment too much, resulting in a
  broken counter.
* Add the dispatch hint code in AbstractWorkStealingScheduler::nextFiber.
  This allows workers to check the dispatch hint after there
  was no local work to execute.
  This is a trade-off where we trade slower wakeup (a just-awoken worker
  will first check for local work) against a faster dispatch hot path when
  we have work to do in our local WSQ.
* The completer thread must not reap completions on the IoContexts of
  sleeping workers because this introduces a race for CQEs and a possible
  lost wakeup if the completer consumes the completions before the worker
  is actually waiting for them.
* When notifying sleeping workers from anywhere, we must ensure that all
  notifications take effect. This is needed, for example, when terminating
  the runtime to prevent sleep attempts by worker threads that are
  about to sleep but have not incremented the sleeper count yet.
  We achieve this by always decrementing the sleeper count by the notification
  count.

Thanks to Florian Schmaus <flow@cs.fau.de> for spotting bugs and suggesting
improvements.

4ec30fd4
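
The two notifier-side CAS loops from 4ec30fd4 can be sketched on a plain atomic counter. Both function names are hypothetical, and the pipe writes are replaced by a return value that says how many hints the caller would write. Reading `CAS(sleepers, toWakeup)` as "replace the observed value with observed - toWakeup", and clamping the result in the second function to zero, are interpretations of the pseudocode rather than confirmed details.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstdint>

// NotifyEmper(n): claim at most n of the currently observed sleepers.
// Returns the number of hints the caller should write to the hint pipe.
inline std::int64_t claimSleepersFromEmper(std::atomic<std::int64_t>& sleepers,
                                           std::int64_t n) {
  std::int64_t observed = sleepers.load(std::memory_order_relaxed);
  std::int64_t toWakeup = 0;
  do {
    if (observed <= 0) return 0;  // nobody to wake
    toWakeup = std::min(observed, n);
    // A failed CAS refreshes 'observed', so toWakeup is recomputed.
  } while (!sleepers.compare_exchange_weak(observed, observed - toWakeup,
                                           std::memory_order_acq_rel,
                                           std::memory_order_relaxed));
  return toWakeup;
}

// NotifyAnywhere(n): always subtract n so that workers which are about to
// sleep but have not incremented the counter yet still see the notification.
inline std::int64_t claimSleepersFromAnywhere(std::atomic<std::int64_t>& sleepers,
                                              std::int64_t n) {
  std::int64_t observed = sleepers.load(std::memory_order_relaxed);
  while (!sleepers.compare_exchange_strong(observed, observed - n,
                                           std::memory_order_acq_rel,
                                           std::memory_order_relaxed)) {
    if (observed <= -n) return 0;  // enough notifications are already recorded
  }
  // 'observed' is the value before the decrement; never report a negative count.
  return std::clamp<std::int64_t>(observed, 0, n);
}
```

In this model a negative counter records notifications that have not been consumed yet, so a worker incrementing the counter to a non-positive value would skip sleeping.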

May 05, 2021
- [Blockable] Set affinity on block · 2680c470
  Florian Schmaus authored 3 years ago
  
  2680c470
Mar 23, 2021

[IO] make the behavior of the completer thread configurable · 5ea44519

Florian Fischer authored 4 years ago

Available behaviors:
  * none - the completer thread is not started

  * schedule (default) - the completer thread will reap and schedule available
                         completions from worker IoContexts

  * wakeup - the completer thread will wake up all workers if it observes completions
             in a worker IoContext. The Fiber produced by the completion will
             be scheduled when the worker whose IoContext holds the CQE
             reaps its completions.

5ea44519