# emper issues
https://gitlab.cs.fau.de/i4/manycore/emper/-/issues

## Issue #1: Emper is not usable as a meson subproject
https://gitlab.cs.fau.de/i4/manycore/emper/-/issues/1 (Florian Fischer, 2020-08-22)

I tried to use emper as a dependency through a meson wrap.
```
[wrap-git]
url = git@gitlab.cs.fau.de:i4/emper.git
revision = head
```
But meson fails to build emper as a subproject with this message:
```
subprojects/emper/meson.build:12:0: ERROR: Function 'add_global_arguments' cannot be used in subprojects because there is no way to make that reliable.
Please only call this if is_subproject() returns false. Alternatively, define a variable that
contains your language-specific arguments and add it to the appropriate *_args kwarg in each target.
```

## Issue #2: Merge async_* network support into emper
https://gitlab.cs.fau.de/i4/manycore/emper/-/issues/2 (Florian Fischer, 2021-01-26)

For [beehive](https://gitlab.cs.fau.de:aj46ezos/beehive) I would like to use the thread blocking anomaly avoidance mechanism introduced by Burak.
I don't care about detection or mitigation, nor about disk I/O. All I need are the epoll-based async_* network functions.
Unfortunately Burak's branch still uses cmake, and I don't understand the individual commits well enough to cherry-pick only my desired functionality.
What is the best way to get async network support into master? Should I try to rebase Burak's whole branch? Or try to separate and prepare only the bits I actually care about?

## Issue #3: Cleanly shutdown the actor
https://gitlab.cs.fau.de/i4/manycore/emper/-/issues/3 (Florian Schmaus, 2020-11-20)

It should be possible to cleanly shut down the Actor without leaking any resources. This likely means that we have to free a potentially blocked Context within the Actor if we shut the Actor down.
The question is also whether an Actor that is in state "PerformingShutdown" should process remaining queue items (and we only close the input side of the queue), or whether it should close right away (and potentially return the remaining queue items to the closing entity).

## Issue #4: UnboundedBlockingMpscQueue not working properly
https://gitlab.cs.fau.de/i4/manycore/emper/-/issues/4 (Florian Fischer, 2020-12-17)

Under high IO load, or with the AlarmActorTest from the [unblock_on_main_thread](https://gitlab.cs.fau.de/i4/manycore/emper/-/tree/unblock_on_main_thread) branch, the assert in UnboundedBlockingMpscQueue.hpp:97 triggers.

## Issue #5: Anywhere queue could be implemented with reader/writer locks
https://gitlab.cs.fau.de/i4/manycore/emper/-/issues/5 (Florian Schmaus, 2022-01-19)

The "anywhere queue" currently uses a mutex for enqueue and dequeue operations, which are fully locked using a `std::mutex`. It would be possible to use a reader/writer lock ([`std::shared_mutex`](https://en.cppreference.com/w/cpp/thread/shared_mutex) in C++): this would allow the dequeue operation to only take the reader lock first, and bail out if the queue is empty.
We could even consider using boost's [`UpgradeLockable`](https://www.boost.org/doc/libs/1_74_0/doc/html/thread/synchronization.html#thread.synchronization.mutex_concepts.upgrade_lockable), which is an upgradable reader/writer lock.

## Issue #6: Investigate and fix SimpleActorTest when using LAWS scheduling
https://gitlab.cs.fau.de/i4/manycore/emper/-/issues/6 (Florian Fischer, 2021-01-14)

See: [failed laws job](https://gitlab.cs.fau.de/i4/manycore/emper/-/jobs/274754) triggered by !60.
On my development machine the SimpleActorTest fails 3 out of 100 runs on 997386e73bdc455b9fa4751b2cf4a9c3a8c7ac2f.

## Issue #7: Add (configurable) guard page to the end of the stacks
https://gitlab.cs.fau.de/i4/manycore/emper/-/issues/7 (Florian Schmaus, 2021-01-11)

## Issue #8: Add fix-includes Makefile target
https://gitlab.cs.fau.de/i4/manycore/emper/-/issues/8 (Florian Schmaus, 2021-01-25)

Basically
```
ninja -C build iwyu > iwyu.out
(cd build && fix_includes.py < ../iwyu.out)
```
- should place `iwyu.out` in a temp dir and delete afterwards
- create tools/fix_includes shell script?
- run clang-format afterwards

## Issue #9: Allow to build with "-stdlib libc++"
https://gitlab.cs.fau.de/i4/manycore/emper/-/issues/9 (Florian Schmaus, 2021-01-26)

This is potentially a way forward to enable building on I4's lab infrastructure. We do not have a GCC/libstdc++ there that has all the headers we currently need (`<compare>` ATM), but we have libc++-11-dev installed (thanks to apt.llvm.org). Maybe this way we can build EMPER in the I4 lab (without docker) again.
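A possible configuration (a sketch, assuming clang and libc++ are installed; `build-libcxx` is just an example build directory name):

```shell
# Configure a separate build directory that compiles and links against libc++.
CXX=clang++ meson setup build-libcxx \
    -Dcpp_args=-stdlib=libc++ \
    -Dcpp_link_args=-stdlib=libc++
meson compile -C build-libcxx
```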
See https://libcxx.llvm.org/docs/UsingLibcxx.html

## Issue #10: Consider removing feature_flags from tests/meson.build
https://gitlab.cs.fau.de/i4/manycore/emper/-/issues/10 (Florian Schmaus, 2021-01-29)

Ideally a test is always attempted, and if the feature is not available, the test exits with 77 so that it is skipped.

## Issue #11: LinkedFuture and SimpleDiskAndNetwork Tests fail on buildtype=release
https://gitlab.cs.fau.de/i4/manycore/emper/-/issues/11 (Florian Fischer, 2021-09-13)

Both fail only in release builds in the Future destructor.
```
--- command ---
16:42:45 MALLOC_PERTURB_='1' EMPER_STACKTRACE_ON_ABORTS='true' /home/fischerling/code/emper/build-release/tests/LinkFutureTest
--- stderr ---
1742.698396097 EMPER_STACKTRACE_ON_ABORTS set to 'true': Enabling stacktrace on abort
Error: close Future 0x7fcba0080de0 created but never awaited
-------
20/22 EMPER:io / SimpleDiskAndNetworkTest FAIL 0.32s (killed by signal 11 SIGSEGV)
--- command ---
16:42:48 MALLOC_PERTURB_='1' EMPER_STACKTRACE_ON_ABORTS='true' /home/fischerling/code/emper/build-release/tests/SimpleDiskAndNetworkTest
--- stderr ---
1742.858450978 EMPER_STACKTRACE_ON_ABORTS set to 'true': Enabling stacktrace on abort
Error: accept Future 0x7f7d4c010ec0 created but never awaited
-------
```

## Issue #12: Improve worker wakeup
https://gitlab.cs.fau.de/i4/manycore/emper/-/issues/12 (Florian Fischer, 2021-02-25)

The fact that we are worse than golang or tokio when the system is not saturated with work in our echo server benchmark bothers me.
That's why I experimented with different ways of suspending and reactivating worker threads.
1. [Blocking IO](https://gitlab.cs.fau.de/aj46ezos/emper/-/tree/wakeup_worker_per_eventfd): Read and write on an eventfd opened with EFD_SEMAPHORE
2. [Signals](https://gitlab.cs.fau.de/aj46ezos/emper/-/tree/wakeup_worker_per_signal): sigwait and pthread_kill
I am not convinced that my implementations are sound and do not produce lost wakeups.
If we decide to change the synchronization primitive they should be thoroughly checked.
I did 5 runs of the echo test but haven't done any real statistics.
Here are the results: [wakeup.pdf](/uploads/c1c680c841ee30f69b3b913facdd2086/wakeup.pdf)
| connections | 500 | 1000 | 5000 | 10000 | 25000 |
| --- | --- | --- | --- | --- | --- |
| signal | 220.5737 (+8%) | 257.5492 (+14%) | 325.7909 (+12%) | 313.9826 (+4%) | 288.9554 (+8%) |
| mutex + condvar | 202.7931 (0%) | 225.6365 (0%) | 289.4035 (0%) | 300.4161 (0%) | 267.1373 (0%) |
| eventfd | 162.5336 (-20%) | 209.8668 (-7%) | 319.8507 (+7%) | 311.5511 (+3%) | 286.3498 (+7%) |
## My interpretation
* BlockingIO with eventfd is the worst of the three variants.
* Using signals seems interesting.
* 5000 connections with eventfd has a really huge variance (not visible here) and is not really trustworthy.
* I can't really explain why both new variants perform better in a saturated system. I would expect that they do less in `onNewWork`, which is executed in each `dispatchLoop` iteration. This would also explain why they behave very similarly.

## Issue #13: Linked Futures can result in a BinaryPrivateSemaphore being signaled twice
https://gitlab.cs.fau.de/i4/manycore/emper/-/issues/13 (Florian Fischer, 2022-01-18)

Since !120 I observe that the echo server benchmark does not terminate.
I suspect the EchoClient because the server does work with the rust echo client and accepts new connections.
After reverting afafc7073f3fa2493fd21a346b2034b0cbfcdf80 the benchmark works again, therefore I am quite sure that the issue must be somewhere in !120.

## Issue #14: SIGSEGV during exit (Safely exit the program from within the Runtime)
https://gitlab.cs.fau.de/i4/manycore/emper/-/issues/14 (Florian Fischer, 2021-03-24)

I can reproduce this [crash](https://gitlab.cs.fau.de/i4/manycore/emper/-/jobs/337220).
It happens when one thread calls `exit` and the cleanup code runs while another thread is still using the runtime under destruction.
Thread 5 receives SIGSEGV in `Scheduler::schedule` because the Runtime object and its Scheduler member are garbage values.
```
(gdb) i threads
  Id  Target Id                                          Frame
  1   Thread 0x7ffff79b5280 (LWP 444113) "TellActorFromAn" 0x00007ffff7ce79ba in __futex_abstimed_wait_common64 () from /usr/lib/libpthread.so.0
  2   Thread 0x7ffff79af640 (LWP 444148) "TellActorFromAn" 0x00007ffff7bfca9d in syscall () from /usr/lib/libc.so.6
  3   Thread 0x7ffff71ae640 (LWP 444149) "TellActorFromAn" 0x00007ffff7ce79ba in __futex_abstimed_wait_common64 () from /usr/lib/libpthread.so.0
  4   Thread 0x7ffff69ad640 (LWP 444150) "TellActorFromAn" 0x00007ffff7ce79ba in __futex_abstimed_wait_common64 () from /usr/lib/libpthread.so.0
* 5   Thread 0x7ffff61ac640 (LWP 444153) "TellActorFromAn" 0x0000555555560640 in Scheduler::schedule (this=0xfd284c0940fe485, fiber=...) at ../emper/Scheduler.hpp:60
  6   Thread 0x7ffff59ab640 (LWP 444156) "TellActorFromAn" 0x00007ffff7fdc272 in _dl_fini () from /lib64/ld-linux-x86-64.so.2
```
runtime and scheduler objects seen by Thread 5
Scheduler object in `Scheduler::schedule`
```
(gdb) p *this
Cannot access memory at address 0xfd284c0940fe485
```
Runtime object in `Runtime::schedule`
```
(gdb) up
#1  0x0000555555560786 in Runtime::schedule (this=0x7ffff7fdc0e7 <_dl_fini+119>, fiber=...) at ../emper/Runtime.hpp:168
168             scheduler.schedule(fiber);
(gdb) p *this
$1 = {<Logger<(LogSubsystem)6>> = {<No data fields>}, static currentRuntimeMutex =
    {<std::__mutex_base> = {_M_mutex = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0,
      __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}},
      __size = '\000' <repeats 39 times>, __align = 0}}, <No data fields>},
  static currentRuntime = 0x7fffffffe040, workerCount = 19339,
  newWorkerHooks = std::vector of length 132845363851615715, capacity -267351304115112441 = {
    <error reading variable>
(gdb) p this
$2 = (Runtime * const) 0x7ffff7fdc0e7 <_dl_fini+119>
```
Thread 6 is tearing down the process, resulting in an invalid Runtime object:

```
(gdb) thread 6
[Switching to thread 6 (Thread 0x7ffff59ab640 (LWP 444156))]
#0  0x00007ffff7fdc272 in _dl_fini () from /lib64/ld-linux-x86-64.so.2
(gdb) bt
#0  0x00007ffff7fdc272 in _dl_fini () from /lib64/ld-linux-x86-64.so.2
#1  0x00007ffff7b42697 in __run_exit_handlers () from /usr/lib/libc.so.6
#2  0x00007ffff7b4283e in exit () from /usr/lib/libc.so.6
#3  0x00007ffff7f26edb in invokeTest () at ../tests/test-runner/test-runner.cpp:14
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
```

## Issue #15: Consider using "Fast Random Integer Generation in an Interval", Lemire (2019), e.g. for random victim selection when stealing work
https://gitlab.cs.fau.de/i4/manycore/emper/-/issues/15 (Florian Schmaus, 2021-03-24)

Right now, EMPER's random victim selection is even biased, due to https://gitlab.cs.fau.de/i4/manycore/emper/-/blob/33cad423220b3fc3c4a0f0202a61d45d104b0a19/emper/strategies/AbstractWorkStealingScheduler.cpp#L71
- https://arxiv.org/abs/1805.10941
- https://dl.acm.org/doi/10.1145/3230636

## Issue #16: CancelFutureTest is unreliable
https://gitlab.cs.fau.de/i4/manycore/emper/-/issues/16 (Florian Schmaus, 2021-04-14)

We observe CancelFutureTest failing with SIGABRT, caused by an assert firing, or running into the test timeout.

## Issue #17: emper::IoCompleterBehavior::wakeup fails in IncrementalCompletionTest
https://gitlab.cs.fau.de/i4/manycore/emper/-/issues/17 (Florian Fischer, 2021-04-20)

With !172 the IncrementalCompletionTest fails due to a memory corruption, most of the time in the allocator.
But I could not find any invalid/double frees in the allocation trace using chattymalloc.

## Issue #18: Shrink AnywhereQueue / LockedUnboundedQueue
https://gitlab.cs.fau.de/i4/manycore/emper/-/issues/18 (Florian Schmaus, 2021-05-08)

As of now, at least some incarnations of LockedUnboundedQueue do not release memory. We should change that.

## Issue #19: Incorrect worker count in qemu using kvm
https://gitlab.cs.fau.de/i4/manycore/emper/-/issues/19 (Florian Fischer, 2021-07-05)

When I test emper with a custom kernel in qemu, `Runtime::getDefaultWorkerCount()` returns the number of CPUs available to the host, not to the guest running in qemu.
But running `nproc(1)` in the guest reports the correct value.
That led me to [the coreutils nproc source code](https://github.com/coreutils/gnulib/blob/90e79512d8b385801218d6e9c4d88ff77186560b/lib/nproc.c#L206); they use `min(configured-cpus, online-cpus, cpus-available-to-process)`.
We should use something similar.

## Issue #20: Invalid link behaviour changed in io_uring
https://gitlab.cs.fau.de/i4/manycore/emper/-/issues/20 (Florian Fischer, 2021-07-08)

For a couple of months I have noticed that our `LinkFutureTest` fails in the `Valid->Invalid->Valid` case.
We expected io_uring to submit broken chains up to the invalid request, but since [cf10960426515](https://github.com/torvalds/linux/commit/cf109604265156bb22c45e0c2aa62f53a697a3f4) `io_uring` no longer submits broken links at all, which seems reasonable.

## Issue #21: Create a variant where exactly one io_uring is used
https://gitlab.cs.fau.de/i4/manycore/emper/-/issues/21 (Florian Schmaus, 2021-08-23)

For comparison, it would be great to have a variant where only exactly one io_uring is used (instead of one per worker).

## Issue #22: Alternative completer behavior: Only remove one CQE from the CQ at a time
https://gitlab.cs.fau.de/i4/manycore/emper/-/issues/22 (Florian Schmaus, 2021-08-12)

Instead of having the completer drain the whole CQ at one time, have it only remove one item, then proceed to the next ready CQ. After 8 (or 16) CQEs have been obtained, push those into the AnywhereQueue. Continue doing so until all CQs are drained.

## Issue #23: Provide EMPER-native buffered I/O
https://gitlab.cs.fau.de/i4/manycore/emper/-/issues/23 (Florian Schmaus, 2021-08-12)

## Issue #24: gitlab-ci tests once we can use io_uring from docker
https://gitlab.cs.fau.de/i4/manycore/emper/-/issues/24 (Florian Schmaus, 2021-09-24)

It should be possible to use io_uring from gitlab runner's docker with the upcoming Debian bullseye update.
Then we should:
- classify unit tests as 'io' tests
- create jobs for io_uring and SINGLE_URING in the 'test' stage

## Issue #25: SimpleDiskAndNetworkTest hangs after terminating the runtime
https://gitlab.cs.fau.de/i4/manycore/emper/-/issues/25 (Florian Fischer, 2021-09-24)

I observed the `SimpleDiskAndNetworkTest` hang with only the main thread left waiting on the `successSem` in the test-runner's main.
It is reproducible for me on master: consistently using release builds, and occasionally using debugoptimized builds with an mmapped log file.
### Steps to reproduce

```
make release
meson test -C build SimpleDiskAndNetworkTest --repeat=10
```
### First observed
I remember seeing it somewhere in our CI but I cannot find it anymore.

## Issue #26: Wakeup-Strategy 'throttle' is unsound
https://gitlab.cs.fau.de/i4/manycore/emper/-/issues/26 (Florian Fischer, 2021-10-11)

Job [#449233](https://gitlab.cs.fau.de/i4/manycore/emper/-/jobs/449233) failed for d8434b57e971136bb8376ce06a04f79a3c100318.

## Issue #27: The introduction of WakeupStrategy introduces/reveals a memory corruption
https://gitlab.cs.fau.de/i4/manycore/emper/-/issues/27 (Florian Fischer, 2021-09-24)

This bug happens in master in debugoptimized builds.
Responsible commit 37143de207fce79a76d6187f7056149b0e9f19f5.
Found using:
```
git bisect start master ad10eb3a00493a12045dd251fa75dc32205ae80b --
git bisect run meson test -C build-debugoptimized/ c_api_test
```
The bug results in this stacktrace:
```
#0  0x00007ffff7dead22 in raise () from /usr/lib/libc.so.6
#1  0x00007ffff7dd4862 in abort () from /usr/lib/libc.so.6
#2  0x00007ffff7e2cd28 in __libc_message () from /usr/lib/libc.so.6
#3  0x00007ffff7e3492a in malloc_printerr () from /usr/lib/libc.so.6
#4  0x00007ffff7e35826 in unlink_chunk.constprop () from /usr/lib/libc.so.6
#5  0x00007ffff7e3607b in _int_free () from /usr/lib/libc.so.6
#6  0x00007ffff7e399e8 in free () from /usr/lib/libc.so.6
#7  0x00007ffff7ded553 in __run_exit_handlers () from /usr/lib/libc.so.6
#8  0x00007ffff7ded64e in exit () from /usr/lib/libc.so.6
#9  0x00005555555552ca in check_fun () at ../tests/c_api_test.c:25
#10 0x00007ffff7d69947 in std::function<void (void*)>::operator()(void*) const (__args#0=<optimized out>, this=0x7fffdc0018c8) at /usr/include/c++/11.1.0/bits/std_function.h:560
#11 Fiber::run (this=0x7fffdc0018c0) at ../emper/Fiber.cpp:13
#12 0x00007ffff7d943ff in Dispatcher::dispatch (fiber=0x7fffdc0018c0, this=0x55555556b6b8) at ../emper/Dispatcher.hpp:30
#13 WsDispatcher::dispatchLoop (this=0x55555556b6b8) at ../emper/strategies/ws/WsDispatcher.cpp:20
```
The stacktrace is from thread 13 and apparently it is currently exiting.
I find this behavior only in the `c_api_test`.

## Issue #28: Find a better name for WakeupStrategy::ThrottleState::pending
https://gitlab.cs.fau.de/i4/manycore/emper/-/issues/28 (Florian Fischer, 2021-10-11)

## Issue #29: [Job Failed #463590] LinkFutureTest hangs without worker suspension
https://gitlab.cs.fau.de/i4/manycore/emper/-/issues/29 (Florian Fischer, 2022-01-18)

Job [#463590](https://gitlab.cs.fau.de/aj46ezos/emper/-/jobs/463590) failed for 964278bc2d6fd0a66b4654e909f4f4d5512fe4c4.
I cannot reproduce this on my machine.
What kernel version is the CI using?
* jenkins2
  * https://gitlab.cs.fau.de/aj46ezos/emper/-/jobs/477320
  * https://gitlab.cs.fau.de/aj46ezos/emper/-/jobs/479439
  * https://gitlab.cs.fau.de/aj46ezos/emper/-/jobs/481977
* phi01
  * https://gitlab.cs.fau.de/i4/manycore/emper/-/jobs/482393
* i4cinode15
  * https://gitlab.cs.fau.de/aj46ezos/emper/-/jobs/483469
  * https://gitlab.cs.fau.de/i4/manycore/emper/-/jobs/490542

## Issue #30: BinaryPrivateSemaphoreTest timeout for pipe sleep strategy
https://gitlab.cs.fau.de/i4/manycore/emper/-/issues/30 (Florian Fischer, 2022-01-14)

Job [#485907](https://gitlab.cs.fau.de/aj46ezos/emper/-/jobs/485907) failed for f133174a.
Emper config:
```
-Dworker_sleep_strategy=pipe
```
* i4cinode15
  * [#485907](https://gitlab.cs.fau.de/aj46ezos/emper/-/jobs/485907)
  * [#507415](https://gitlab.cs.fau.de/aj46ezos/emper/-/jobs/507415)
  * [#508657](https://gitlab.cs.fau.de/flow/emper/-/jobs/508657) (pipe-no-completer)
* faui49phi01
  * [#488767](https://gitlab.cs.fau.de/aj46ezos/emper/-/jobs/488767)

## Issue #31: Future cancellation is broken by design
https://gitlab.cs.fau.de/i4/manycore/emper/-/issues/31 (Florian Fischer, 2022-01-21)

The design decision of distributed independent `io_uring`s per worker, which allows us to not synchronize request submission, prevents us from having simple cancellation logic.
An `io_uring` can only cancel a request it actually knows about. But in the EMPER design it is totally possible that a Fiber
submits a request, blocks, is continued on a different worker and then wants to cancel the original request.
This will fail because the `io_uring` of the current worker does not know about the request submitted on the previous worker.
The cancellation of the partially completed future chain in our CancelFutureTest is the perfect example:
```
cancelPartialCompletedChain();
004 1535.46106302149 PS 0x7ff904082460 constructed by fiber Fiber [ptr=0x562abe4b9c40 func=0 arg=0 aff=nullptr]
004 1535.46106309403 PS 0x7ff9040824f0 constructed by fiber Fiber [ptr=0x562abe4b9c40 func=0 arg=0 aff=nullptr]
IOC 1535.46106308601 IO 0x7ff904000900 Reaping completions for worker 4
004 1535.46106317007 PS 0x7ff904082580 constructed by fiber Fiber [ptr=0x562abe4b9c40 func=0 arg=0 aff=nullptr]
004 1535.46106367300 IO 0x7ff904082570 submit read Future 0x7ff904082570 to IoContext 0x7ff904000900
004 1535.46106374043 IO 0x7ff904000900 submitting read Future 0x7ff904082570 and it's dependencies
004 1535.46106381056 IO 0x7ff904000900 Prepare read Future 0x7ff904082450 as a dependency
```
Worker 4 has prepared a chain of two read futures and submitted those.
```
004 1535.46106389892 IO 0x7ff904000900 Reaping completions for worker 4
004 1535.46106396855 IO 0x7ff9040824e0 submit write Future 0x7ff9040824e0 to IoContext 0x7ff904000900
004 1535.46106403487 IO 0x7ff904000900 submitting write Future 0x7ff9040824e0
004 1535.46106416892 IO 0x7ff904000900 Reaping completions for worker 4
004 1535.46106423745 IO 0x7ff9040824e0 Waiting on write Future 0x7ff9040824e0
004 1535.46106431088 PS 0x7ff9040824f0 block() blockedContext is 0x7ff904073a00
004 1535.46106443040 C 0x7ff904073a00 saving and switching to Context 0x7ff904083ac0 [tos: 0x7ff904093b30 bos: 0x7ff904083b40]
004 1535.46106451175 IO 0x7ff904000900 Reaping completions for worker 4
004 1535.46106462526 SLEEP_S 0x7ffedd48c478 going to sleep
```
Worker 4 has submitted the write request completing one of the reads, and the Fiber blocks until the write is completed.
Thus there is no more work in the system and Worker 4 goes to sleep.
```
IOC 1535.46106575295 IO 0x7ff904000900 Reaping completions for worker 4
IOC 1535.46106629826 IO 0x7ff904000900 got 2 cqes from worker 4's io_uring
IOC 1535.46106651016 IO 0x7ff9040824e0 Complete write Future 0x7ff9040824e0 with result 8
IOC 1535.46106676874 PS 0x7ff9040824f0 unblock in fast path
IOC 1535.46106689126 IO 0x7ff904082450 Complete read Future 0x7ff904082450 with result 8
IOC 1535.46106698343 PS 0x7ff904082460 no unblock in slow path
IOC 1535.46106711087 SLEEP_S 0x7ffedd48c478 NotifyMany 1 from ANYWHERE
000 1535.46106848001 SLEEP_S 0x7ffedd48c478 awoken
000 1535.46106878818 IO 0x7ff91c000900 Reaping completions for worker 0
000 1535.46106890039 DISP 0x562abe4a0dd8 executing fiber 0x7ff8d0001140
000 1535.46106899837 F 0x7ff8d0001140 run() calling 0 (ZN5FiberC4ERKSt8functionIFvvEEPiEUlPvE_) with arg 0
000 1535.46106908683 C 0x7ff91c073a00 discarding and switching to 0x7ff904073a00
000 1535.46106916628 CM 0x562abe4a0c40 Freeing context 0x7ff91c073a00
000 1535.46106932126 PS 0x7ff9040821b0 constructed by fiber Fiber [ptr=0x562abe4b9c40 func=0 arg=0 aff=nullptr]
```
The completer thread does its job and reaps the completions of the sleeping Worker 4 and
notifies a sleeping Worker.
Worker 0 is awoken and resumes the blocked Fiber.
```
000 1535.46106940171 IO 0x7ff9040821a0 submit cancel Future 0x7ff9040821a0 to IoContext 0x7ff91c000900
000 1535.46106947505 IO 0x7ff91c000900 submitting cancel Future 0x7ff9040821a0
000 1535.46106967592 IO 0x7ff91c000900 Reaping completions for worker 0
000 1535.46106975317 IO 0x7ff91c000900 got 1 cqes from worker 0's io_uring
000 1535.46106983652 IO 0x7ff9040821a0 Complete cancel Future 0x7ff9040821a0 with result -2
```
The cancellation fails with result -2 (`-ENOENT`) because the io_uring of Worker 0 does not know about the future submitted on Worker 4.
```
000 1535.46106991056 PS 0x7ff9040821b0 no unblock in slow path
000 1535.46106998640 IO 0x7ff9040821a0 Waiting on cancel Future 0x7ff9040821a0
000 1535.46107005452 IO 0x7ff904082570 Waiting on read Future 0x7ff904082570
000 1535.46107012415 PS 0x7ff904082580 block() blockedContext is 0x7ff904073a00
000 1535.46107024698 C 0x7ff904073a00 saving and switching to Context 0x7ff91c073a00 [tos: 0x7ff91c083a70 bos: 0x7ff91c073a80]
000 1535.46107032232 IO 0x7ff91c000900 Reaping completions for worker 0
000 1535.46107040748 SLEEP_S 0x7ffedd48c478 going to sleep
IOC 1535.46107059783 IO 0x7ff91c000900 Reaping completions for worker 0
```
The result is a sleeping emper.
The problem is that this is not trivially fixable.

## Issue #32: fsearch with continuation stealing fails
https://gitlab.cs.fau.de/i4/manycore/emper/-/issues/32 (Florian Fischer, 2022-03-30)

I wanted to add fsearch using continuation stealing to the EMPER variant of our fs evaluation, but fsearch using continuation stealing fails with:
```
munmap_chunk(): invalid pointer
```

## Issue #33: Add scheduleIn(std::uint64_t ns, emper::io::Future::Callback fun)
https://gitlab.cs.fau.de/i4/manycore/emper/-/issues/33 (Florian Schmaus, 2022-05-27)

Possible implementation:
```
void scheduleIn(std::uint64_t ns, emper::io::Future::Callback fun) {
	emper::io::AlarmFuture::Timespec ts = {.tv_sec = 0, .tv_nsec = ns};
	emper::io::AlarmFuture alarmFuture(ts);
	alarmFuture.setCallback(fun);
	alarmFuture.submit();
}
```