- Feb 07, 2022
Florian Schmaus authored
Thanks to Nicolas Pfeiffer for writing the initial prototypical implementation of continuation stealing and the cactus stack mechanism, on which this is based. Co-authored-by:
Nicolas Pfeiffer <pfeiffer@cs.fau.de>
- Jan 23, 2022
Florian Fischer authored
I think wakeup hints should never be ignored, but having the option seems useful to observe their benefits and costs.
- Jan 14, 2022
Florian Schmaus authored
- Jan 11, 2022
Florian Fischer authored
I set up a new development environment and emper did not compile because emper::io::Stats uses the circular_buffer provided by boost. Boost was not installed and our build system failed to detect it. This change adds the header-only boost dependency to emper. https://mesonbuild.com/Dependencies.html#boost The header-only dependency is enough to build emper's default configuration. When linking against boost is required we use the 'modules' kwarg.
- Dec 14, 2021
Florian Fischer authored
Florian Fischer authored
Two meson options control the io_uring sqpoll feature:
* io_uring_sqpoll - enable sq polling
* io_uring_shared_poller - share the polling thread between all io_urings

Since 5.12 the IORING_SETUP_ATTACH_WQ only causes sharing of poller threads, not the work queues. See: https://github.com/axboe/liburing/issues/349

When using SQPOLL the userspace has no good way to know how many sqes the kernel has consumed, therefore we wait for available sqes using io_uring_sqring_wait if there was no usable sqe.

Remove the GlobalIoContext::registerLock and register all worker io_uring eventfd reads at the beginning of the completer function. Also register all the worker io_uring eventfds since they never change and it hopefully reduces overhead in the global io_uring.
- Dec 10, 2021
Florian Fischer authored
Florian Fischer authored
Waitfree work stealing is configured with the meson option 'waitfree_work_stealing'. The retry logic is intentionally left in the Queues and not lifted to the scheduler to reuse the load of an unsuccessful CAS. Consider the following pseudo code examples:

Left example (retry inside the queue):

    steal() -> bool:
      load
    loop:
      if empty return EMPTY
      cas
      if not WAITFREE and not cas:
        goto loop
      return cas ? STOLEN : LOST_RACE

    outer():
      steal()

Right example (retry lifted to the caller):

    steal() -> res
      load
      if empty return EMPTY
      cas
      return cas ? STOLEN : LOST_RACE

    outer():
    loop:
      res = steal()
      if not WAITFREE and res == LOST_RACE:
        goto loop

In the right example the value loaded by a possibly unsuccessful CAS cannot be reused, and a loop of unsuccessful CASes will result in double loads.

The retries are configurable through a template variable maxRetries.
* maxRetries < 0: retry indefinitely
* maxRetries >= 0: retry at most maxRetries times
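The point about reusing the load of a failed CAS can be illustrated with a minimal C++ sketch. It is not EMPER's queue: the class `StealSketch`, its fixed capacity, and the chosen memory orders are assumptions made for illustration. Only the idea from the commit message is kept, namely that a failed `compare_exchange` already yields the current value, so the retry loop inside `steal()` needs no second load of `top`.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstdint>

enum class StealResult { Stolen, Empty, LostRace };

// Hypothetical, simplified queue; only the steal path is the point here.
template <int maxRetries>  // < 0: retry indefinitely, >= 0: at most maxRetries retries
class StealSketch {
  static constexpr uint64_t capacity = 64;
  std::array<std::atomic<int>, capacity> items{};
  std::atomic<uint64_t> top{0};
  std::atomic<uint64_t> bottom{0};

 public:
  // Owner-side push; assumes a single pushing thread.
  void push(int value) {
    uint64_t b = bottom.load(std::memory_order_relaxed);
    assert(b - top.load(std::memory_order_acquire) < capacity);
    items[b % capacity].store(value, std::memory_order_relaxed);
    bottom.store(b + 1, std::memory_order_release);
  }

  StealResult steal(int& out) {
    // The only explicit load of top.
    uint64_t t = top.load(std::memory_order_acquire);
    for (int retries = 0;; ++retries) {
      if (t >= bottom.load(std::memory_order_acquire)) return StealResult::Empty;
      int candidate = items[t % capacity].load(std::memory_order_relaxed);
      if (top.compare_exchange_strong(t, t + 1, std::memory_order_acq_rel,
                                      std::memory_order_acquire)) {
        out = candidate;
        return StealResult::Stolen;
      }
      // The failed CAS stored the current value of top in t, so the next
      // iteration continues without reloading it.
      if (maxRetries >= 0 && retries >= maxRetries) return StealResult::LostRace;
    }
  }
};
```

With `StealSketch<0>` a lost race is reported immediately, which corresponds to the waitfree configuration; `StealSketch<-1>` keeps retrying until the queue is empty or the steal succeeds.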
- Dec 06, 2021
Florian Fischer authored
We introduced the check_anywhere_queue_while_steal configuration as an optimization to get the IO completions reaped by the completer into the normal WSQ faster. But EMPER now has configurations where we don't use a completer, which makes this optimization useless or even harmful. By default, automatically decide the value of check_anywhere_queue_while_stealing based on the value of io_completer_behavior.
- Nov 10, 2021
Florian Fischer authored
Add two new mutually exclusive meson options:
* work_stealing_victim_count: sets an absolute number of victims
* work_stealing_victim_denominator: sets the victim count to #workers/denominator
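How the two options might translate into a victim count can be sketched as follows. The function `victimCount`, the convention that 0 means "option not set", and the fallback to the worker count are all assumptions for illustration; the commit message only states the two formulas.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper modelling the two mutually exclusive options.
// A value of 0 means "option not set" (an assumption of this sketch).
std::size_t victimCount(std::size_t workers, std::size_t absoluteCount,
                        std::size_t denominator) {
  // Setting both options at once is rejected.
  assert(absoluteCount == 0 || denominator == 0);
  if (absoluteCount != 0) return absoluteCount;        // work_stealing_victim_count
  if (denominator != 0) return workers / denominator;  // work_stealing_victim_denominator
  return workers;  // assumed fallback; the actual default is not stated
}
```

For 16 workers, a denominator of 4 yields 4 victims, while an absolute count of 3 yields 3 regardless of the worker count.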
- Oct 13, 2021
Florian Fischer authored
The lockless algorithm can now be configured by setting -Dio_lockless_cq=true and the used memory ordering by setting -Dio_lockless_memory_order={weak,strong}.
* io_lockless_memory_order=weak: read with acquire, write with release
* io_lockless_memory_order=strong: read with seq_cst, write with seq_cst
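The mapping from the option value to C++ memory orders could look like the sketch below. The names `LocklessMemoryOrder`, `Orders`, and `Index` are hypothetical; only the weak/strong mapping itself is taken from the commit message.

```cpp
#include <atomic>

// Hypothetical compile-time switch mirroring io_lockless_memory_order.
enum class LocklessMemoryOrder { weak, strong };

template <LocklessMemoryOrder order>
struct Orders {
  // weak: acquire/release, strong: seq_cst for both reads and writes.
  static constexpr std::memory_order read =
      order == LocklessMemoryOrder::weak ? std::memory_order_acquire
                                         : std::memory_order_seq_cst;
  static constexpr std::memory_order write =
      order == LocklessMemoryOrder::weak ? std::memory_order_release
                                         : std::memory_order_seq_cst;
};

// Stand-in for a ring index that is read and written with the chosen orders.
template <LocklessMemoryOrder order>
class Index {
  std::atomic<unsigned> value{0};

 public:
  unsigned get() const { return value.load(Orders<order>::read); }
  void set(unsigned v) { value.store(v, Orders<order>::write); }
};
```

Selecting the orders through a template parameter keeps the choice at compile time, so neither configuration pays for a runtime branch.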
- Oct 11, 2021
Florian Fischer authored
TODO: think about stats and possible ring buffer pointers overflow and ABA.
Florian Fischer authored
IO stealing is analogous to work stealing and means that worker threads without work will try to steal IO completions (CQEs) from other workers' IoContexts. The work stealing algorithm is modified to check a victim's CQ after finding their work queue empty. This approach, in combination with future additions (global notifications on IO completions, and lock-free CQE consumption), is a realistic candidate to replace the completer thread without losing its benefits.

To allow IO stealing the CQ must be synchronized, which is already the case with the IoContext::cq_lock. Currently stealing workers always try to pop a single CQE (this could be configurable). Steal attempts are recorded in the IoContext's Stats object and successfully stolen IO continuations in the AbstractWorkStealingWorkerStats.

I moved the code transforming CQEs into continuation Fibers from reapCompletions into a separate function to make the rather complicated function more readable and thus easier to understand.

Remove the default CallerEnvironment template arguments to make the code more explicit and prevent easy errors (not propagating the caller environment or forgetting that the function takes a caller environment).

io::Stats now needs to use atomics because multiple threads may increment them in parallel from EMPER and the OWNER. And since using std::atomic<T*> in std::map is not easily possible, we use the compiler's __atomic_* builtins.

Add, adjust and fix some comments.
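The use of compiler builtins for counters stored in a std::map can be sketched like this. The `Op` enum and the `StatsSketch` class are hypothetical and do not reflect the layout of io::Stats; the sketch assumes GCC or Clang and assumes the map is fully populated before any concurrent access, so only the mapped values change afterwards.

```cpp
#include <cstdint>
#include <map>

// Hypothetical operation kinds.
enum class Op { Read, Write };

class StatsSketch {
  // Plain integers as mapped values; all keys are inserted up front so the
  // map structure itself is never modified concurrently.
  std::map<Op, uint64_t> counters{{Op::Read, 0}, {Op::Write, 0}};

 public:
  // May be called from several threads in parallel.
  void record(Op op) { __atomic_fetch_add(&counters.at(op), 1, __ATOMIC_RELAXED); }

  uint64_t get(Op op) { return __atomic_load_n(&counters.at(op), __ATOMIC_RELAXED); }
};
```

Relaxed ordering is used here on the assumption that the counters are pure statistics and do not synchronize any other data.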
- Sep 22, 2021
Florian Fischer authored
- Aug 19, 2021
Florian Schmaus authored
Florian Fischer authored
Florian Schmaus authored
This adds an option to make the scheduling parameters of the completer thread configurable via a meson option.
- Aug 18, 2021
Florian Fischer authored
Florian Fischer authored
Introduce a new meson option io_single_uring which causes EMPER to only use the GlobalIoContext for all IO. To submit SQEs to the io_uring SQ, the SubmitActor is used.

Futures can be in a new state where they are submitted to the SubmitActor but not to the io_uring yet. In this state (isSubmitted && !isPrepared) the Future must not be destroyed. To ensure this, we yield when forgetting a Future until it is prepared and thus safe to destroy.

This commit contains no optimizations (no batching, no trying a non-blocking syscall first, ...).

Refactor GlobalIoContext.cpp:
* rename globalCompleter to completer
* make the completer loop non-static
- Aug 02, 2021
Florian Schmaus authored
- Jul 14, 2021
Florian Fischer authored
Design goals
============
* Wake up either on external newWork notifications or on local IO completions
  -> Sleep strategy is sound without the IO completer
* Do as little as possible in a system saturated with work
* Pass a hint where to find new work to suspended workers

Algorithm
=========
Data:
  Global:
    hint pipe
    sleepers count
  Per worker:
    dispatch hint buffer
    in flight flag

Sleep:
  if we have no sleep request in flight
    Atomic increment sleep count
    Remember that we are sleeping
    Prepare read cqe from the hint pipe to dispatch hint buffer
  Prevent the completer from reaping completions on this worker's IoContext
  Wait until IO completions occurred

NotifyEmper(n):
  if observed sleepers <= 0
    return
  // Determine how many we are responsible to wake
  do
    toWakeup = min(observed sleepers, n)
  while (!CAS(sleepers, toWakeup))
  write toWakeup hints to the hint pipe

NotifyAnywhere(n):
  // Ensure all n notifications take effect
  while (!CAS(sleepers, observed sleepers - n))
    if observed sleeping <= -n
      return
  toWakeup = min(observed sleeping, n)
  write toWakeup hints to the hint pipe

onNewWorkCompletion:
  reset in flight flag
  allow completer to reap completions on this IoContext

Notes
=====
* We must decrement the sleepers count on the notifier side to prevent multiple notifiers from observing the same number of sleepers, trying to wake up the same sleepers by writing to the pipe, jamming it up with unconsumed hints and thus blocking in the notify write, resulting in a deadlock.
* The CAS loops on the notifier side are needed because decrementing and incrementing the excess is racy: two notifiers can observe the sum of both their excess decrements and increment too much, resulting in a broken counter.
* Add the dispatch hint code in AbstractWorkStealingScheduler::nextFiber. This allows workers to check the dispatch hint after there was no local work to execute. This is a trade-off where we trade slower wakeup - a just awoken worker will check for local work - against a faster dispatch hot path when we have work to do in our local WSQ.
* The completer thread must not reap completions on the IoContexts of sleeping workers because this introduces a race for cqes and a possible lost wakeup if the completer consumes the completions before the worker is actually waiting for them.
* When notifying sleeping workers from anywhere we must ensure that all notifications take effect. This is needed, for example, when terminating the runtime to prevent sleep attempts from worker threads which are about to sleep but have not incremented the sleeper count yet. We achieve this by always decrementing the sleeper count by the notification count.

Thanks to Florian Schmaus <flow@cs.fau.de> for spotting bugs and suggesting improvements.
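The CAS loop of NotifyEmper can be illustrated with a small C++ sketch. It is an interpretation, not the EMPER code: the global `sleepers` counter and the function `claimSleepers` are hypothetical names, the CAS is assumed to replace the observed count with the observed count minus `toWakeup`, and writing to the hint pipe is left to the caller.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>

// Hypothetical stand-in for the global sleepers count.
std::atomic<long> sleepers{0};

// Claims up to n sleepers and returns how many hints the caller is
// responsible for writing to the hint pipe.
long claimSleepers(long n) {
  assert(n > 0);
  long observed = sleepers.load(std::memory_order_relaxed);
  long toWakeup = 0;
  do {
    if (observed <= 0) return 0;  // nobody is sleeping, nothing to do
    toWakeup = std::min(observed, n);
    // On failure compare_exchange_weak refreshes `observed`, so the next
    // iteration recomputes toWakeup from the current count.
  } while (!sleepers.compare_exchange_weak(observed, observed - toWakeup,
                                           std::memory_order_acq_rel,
                                           std::memory_order_relaxed));
  return toWakeup;
}
```

Because each notifier only subtracts what it successfully claimed, two concurrent notifiers never claim the same sleeper, which is the property the notes above rely on to keep the pipe free of unconsumed hints.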
- May 05, 2021
Florian Fischer authored
- Mar 23, 2021
Florian Fischer authored
Available behaviors:
* none - the completer thread is not started
* schedule (default) - the completer thread will reap and schedule available completions from worker IoContexts
* wakeup - the completer thread will wake up all workers if it observes completions in a worker IoContext. The Fiber produced by the completion will be scheduled when the worker in whose IoContext the cqe lies reaps its completions.
- Mar 12, 2021
Florian Fischer authored
- Mar 09, 2021
Florian Fischer authored
This change introduces a new synchronization primitive "PseudoCountingTryLock" which takes an actual lock as template and provides a CountingTryLock interface. By using a PseudoCountingTryLock we don't have to change any synchronization code in IoContext::reapCompletion. Since all PseudoCountingTryLock code is defined in a header the compiler should see our constant return values and hopefully optimize away any check depending on those constant return values.

Options:
* spin_lock - naive CAS spin lock
* mutex - std::mutex
* counting_try_lock (default) - our own lightweight special purpose synchronization primitive
Florian Schmaus authored
The run_target() function requires an absolute path in meson >= 0.57.
- Mar 08, 2021
Florian Fischer authored
Since 8f38dbed the globalCompleter always reaps and schedules in batches through IoContext::reapAndSchedule<CallerEnvironment::ANYWHERE> -> Runtime::scheduleFromAnywhere(InputIt begin, InputIt end) -> AnywhereQueue::insert(InputIt begin, InputIt end)
- Mar 01, 2021
Florian Schmaus authored
Florian Schmaus authored
- Feb 26, 2021
Florian Fischer authored
Available implementations, selectable through the meson option 'locked_unbounded_queue_implementation', are:
* mutex - our current LockedUnboundedQueue implementation using std::mutex
* rwlock - an implementation with pthread_rwlock. The implementation tries to upgrade its rdlock and, on failure, drops it and acquires a wrlock
* shared_mutex - an implementation using std::shared_mutex. dequeue() acquires a shared lock at first, drops it and acquires a unique lock
* boost_shared_mutex - an implementation using boost::shared_mutex. dequeue() acquires an upgradable lock and upgrades it to a unique lock if necessary
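The shared_mutex variant described above can be sketched as follows. The class `SharedMutexQueue` is a hypothetical illustration built on std::queue, not the LockedUnboundedQueue code. The emptiness check is repeated under the unique lock because another thread may have drained the queue between dropping the shared lock and acquiring the unique one.

```cpp
#include <mutex>
#include <optional>
#include <queue>
#include <shared_mutex>
#include <utility>

// Hypothetical queue illustrating the "shared lock first" dequeue.
template <typename T>
class SharedMutexQueue {
  std::queue<T> queue;
  std::shared_mutex mutex;

 public:
  void enqueue(T item) {
    std::unique_lock<std::shared_mutex> lock(mutex);
    queue.push(std::move(item));
  }

  std::optional<T> dequeue() {
    {
      // Cheap path: many threads may check for emptiness concurrently.
      std::shared_lock<std::shared_mutex> lock(mutex);
      if (queue.empty()) return std::nullopt;
    }
    // The shared lock was dropped; take the unique lock and check again.
    std::unique_lock<std::shared_mutex> lock(mutex);
    if (queue.empty()) return std::nullopt;
    T item = std::move(queue.front());
    queue.pop();
    return item;
  }
};
```

The design favors the common case of an empty queue, where no dequeuing thread ever needs exclusive access.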
Florian Fischer authored
This change introduces new scheduleFromAnywhere methods which take a range of Fibers to schedule. Blockable gets a new method returning the fiber used to start the unblocked context, which is used by Future/PartialCompletableFuture to provide a way of completing and returning the continuation Fiber to the caller so they may schedule the continuation how they want.

If the meson option io_batch_anywhere_completions is set, the global completer will collect all callback and continuation fibers before scheduling them at once when it is done reaping the completions. The idea is that taking the AnywhereQueue write lock and calling onNewWork must only be done once.

TODO: investigate if onNewWork should be extended by an amountOfWork argument which determines how many workers can be awoken and have work to do. This should be trivial since our WorkerWakeupSemaphore implementations already support notify_many(), which may be implemented in terms of notify_all though.
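The benefit of a range-based insert can be sketched with a hypothetical queue. `AnywhereQueueSketch`, the use of `int` in place of Fiber pointers, and the `notifications` counter standing in for onNewWork calls are all assumptions for illustration.

```cpp
#include <cstddef>
#include <deque>
#include <mutex>

// Hypothetical queue; int stands in for a Fiber pointer.
class AnywhereQueueSketch {
  std::deque<int> fibers;
  std::mutex mutex;
  std::size_t notifications = 0;  // counts simulated onNewWork calls

 public:
  // Inserts a whole batch while taking the lock and notifying only once.
  template <class InputIt>
  void insert(InputIt begin, InputIt end) {
    std::lock_guard<std::mutex> lock(mutex);
    fibers.insert(fibers.end(), begin, end);
    ++notifications;
  }

  std::size_t size() {
    std::lock_guard<std::mutex> lock(mutex);
    return fibers.size();
  }

  std::size_t notificationCount() {
    std::lock_guard<std::mutex> lock(mutex);
    return notifications;
  }
};
```

Inserting three fibers as one batch costs one lock acquisition and one notification, whereas three single inserts would cost three of each.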
- Feb 23, 2021
Florian Fischer authored
LockedSemaphore is the already existing Semaphore using a mutex and a condition variable. PosixMutex is a thin wrapper around a POSIX semaphore. SpuriousFutexSemaphore is an atomic/futex based implementation prone to spurious wakeups, which is fine for the worker wakeup use case.
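A semaphore built from a mutex and a condition variable, as LockedSemaphore is described, might look like the sketch below. The class name `LockedSemaphoreSketch` and its method names are assumptions; the actual EMPER interface is not shown in this log.

```cpp
#include <condition_variable>
#include <cstddef>
#include <mutex>

// Hypothetical counting semaphore based on a mutex and a condition variable.
class LockedSemaphoreSketch {
  std::mutex mutex;
  std::condition_variable cv;
  std::size_t count = 0;

 public:
  void notify() {
    {
      std::lock_guard<std::mutex> lock(mutex);
      ++count;
    }
    cv.notify_one();
  }

  void notify_many(std::size_t n) {
    {
      std::lock_guard<std::mutex> lock(mutex);
      count += n;
    }
    // Waking everyone is simple; waiters re-check the count under the lock.
    cv.notify_all();
  }

  void wait() {
    std::unique_lock<std::mutex> lock(mutex);
    cv.wait(lock, [this] { return count > 0; });
    --count;
  }

  bool try_wait() {
    std::lock_guard<std::mutex> lock(mutex);
    if (count == 0) return false;
    --count;
    return true;
  }
};
```

The predicate passed to `cv.wait` filters out spurious wakeups, so this variant never returns from `wait()` without having consumed a notification.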
- Feb 22, 2021
Florian Fischer authored
- Feb 10, 2021
Florian Fischer authored
- Jan 26, 2021
Florian Fischer authored
EMPER's IO design is based on a proactor pattern where each worker can issue IO requests through its exclusive IoContext object which wraps an io_uring instance.

IO completions are reaped at 4 places:
1. After a submit to collect inline completions
2. Before dispatching a new Fiber
3. When no new IO can be submitted because the completion queue is full
4. And by a global completer thread which gets notified about completions on worker IoContexts through registered eventfds

All IO requests are modeled as Future objects which can be either instantiated and submitted manually, retrieved by POSIX-like non-blocking functions, or implicitly used by POSIX-like blocking functions.

The user-facing API is exported in the following headers:
* emper/io.hpp (POSIX-like)
* emper.h (POSIX-like)
* emper/io/Future.hpp

Catching short writes/reads/sends and resubmitting the request without unblocking the Fiber is supported.

Using AlarmFuture objects, Fibers have an EMPER-native way to sleep for a given time.

IO request timeouts are supported with the TimeoutWrapper class. Request cancellation is supported with Future::cancel() or the CancelWrapper() Future class.

A proactor design demands that buffers are committed to the kernel as long as the request is active. To guarantee memory safety, Futures get canceled in their destructor, which will only return after the committed memory is free to use.

Linking Futures to chains is supported using the Future::SetDependency() method. Futures are submitted when their last Future gets submitted. A linked request will start once the previous one has finished. Errors or partial completions will cancel the not yet started tail of a chain.

TODO: Handle possible situations where the CQ of the global completer is full and no more sqes can be submitted to the SQ.
Florian Fischer authored
This feature must be activated using the blocked_context_set meson option.
- Jan 22, 2021
Florian Schmaus authored
This introduces AbstractWorkStealingScheduler which holds the common work-stealing scheduling strategy.
- Jan 13, 2021
Florian Schmaus authored
This also changes emper_log so that a std::ostringstream is used to assemble the log message.
- Jan 11, 2021
Florian Schmaus authored
Initialize the WORKER_WAKEUP_STRATEGY via the contents of the EMPER_WORKER_WAKEUP_STRATEGY macro. This makes it easier to add additional strategies later on.
- Jan 05, 2021
Florian Fischer authored