emper merge requests
https://gitlab.cs.fau.de/i4/manycore/emper/-/merge_requests

!60 [Runtime] notify only one sleeping worker on new work (2021-01-04, Florian Fischer)
https://gitlab.cs.fau.de/i4/manycore/emper/-/merge_requests/60

This prevents a thundering herd effect on the AnywhereQueue and the workerSleep mutex.
On workloads where not all workers are busy, it greatly reduces the used
CPU time, because not all workers wake up just to sleep again.
For high-intensity workloads where all workers are busy handling their own work,
this change should have no impact, because sleeping workers are then rare.
This claim is backed by experiments I did on faui49big02 (40 cores / 80 hardware threads).
I measured the time and resources used by our tests/EchoServer handling
an increasing number of connections (one connection per client process,
each connection issuing 100000 echos):
```
con 100k echos time[ns] user-time[s] sys-time[s] cpu-used
notify_all 1 49008685297 217.86 626.26 1650%
notify_one 1 31304750273 9.40 8.33 53%
notify_all 10 76487793595 665.45 1295.19 2484%
notify_one 10 35674140605 188.77 68.26 656%
...
notify_all 40 102469333659 4255.30 363.86 4399%
notify_one 40 105289161995 4167.43 322.69 4169%
notify_all 80 76883202092 3418.44 409.64 4762%
notify_one 80 68856748614 2878.56 397.66 4548%
```
I would not fully trust these numbers, because they are from only one
run, and quite a bit of randomness is inherent to EMPER due to its work-stealing scheduling.
Nonetheless they show three interesting points:
1. CPU usage for low-intensity workloads is drastically reduced.
2. The impact of notify_one gets smaller the more intense the workload gets.
3. Somehow EMPER performs significantly worse for 40 than for 80 connections.
Command used to generate results:
```bash
for i in 1 10 20 30 40 80; do
    /usr/bin/time -v build-release/tests/EchoServer 2>> notify_all.txt &
    sleep 2
    tests/echo_client.py -p ${i} -c 1 -i 100000 >> notify_all.txt \
        && echo "quit" | nc localhost 12345
done
```
Full results can be found here:
notify_all: https://termbin.com/6zba
notify_one: https://termbin.com/3bsi

!59 Fix headers when EMPER_LOG_OFF is defined (2021-01-06, Florian Fischer)
https://gitlab.cs.fau.de/i4/manycore/emper/-/merge_requests/59

!58 [Runtime] don't allocate threads array twice (2020-12-18, Florian Fischer)
https://gitlab.cs.fau.de/i4/manycore/emper/-/merge_requests/58

The threads array is initialized using the Runtime::Runtime initializer list
and then again in the constructor.

!57 handle UnboundedBlockingMpscQueue spurious wake-ups (2020-12-17, Florian Fischer)
https://gitlab.cs.fau.de/i4/manycore/emper/-/merge_requests/57

A spurious wake-up can be produced by the new UnblockOnMainActorTest, which
triggers the `assert(!mpscQueue.empty())` in `UnboundedBlockingMpscQueue::get`.
Such spurious wake-ups are possible because the push and the wake-up in
`UnboundedBlockingMpscQueue::put` are not performed atomically.
The following sequence diagram demonstrates a spurious wake-up:
```
T1 T2 Q
. . { }
put(e) . { }
push 54-57 . {e}
. get() {e}
. consume e { }
. . { }
. get() { }
. block { }
unblock . { }
. . { }
. wakeup { }
. . { }
X
assert(!mpscQueue.empty())
```
To deal with spurious wake-ups we recheck the wake-up condition (a non-empty queue)
and block again if we find the queue empty.
We assume spurious wake-ups are rare, because they were difficult to reproduce
even with a dedicated test (the new UnblockOnMainActorTest); therefore we mark
the empty-queue branch as unlikely.
Fixes #4.

!55 Various small changes (2020-12-14, Florian Schmaus)
https://gitlab.cs.fau.de/i4/manycore/emper/-/merge_requests/55

!54 Worker exclusive uring (2021-01-26, Florian Fischer)
https://gitlab.cs.fau.de/i4/manycore/emper/-/merge_requests/54

Alternative IO design to the [io_uring_network](https://gitlab.cs.fau.de/i4/manycore/emper/-/merge_requests/6) branch.
Supersedes !6.

## TODO:

- [x] handle submit errors in Future chains. Supported since dff6c4c3a9482a69abd405fa8d82bfdf3576e902
- [x] handle partial completions in Future chains (short reads, for example, terminate sqe chains). We just disable partial completion for all Futures prepared as a dependency of others.
- [x] implement timeouts. Supported since 179b87b653c3a296e2287a18a212d29df5183ea7
- [x] ~~remove SQPOLL or implement `io_uring_register` for each fildes~~. Linux 5.11 supports SQPOLL with non-registered file descriptors.
- [x] make worker cq size configurable
- [x] ~~handle full sq.~~ sqes are consumed by the kernel and thus the sq can't be full
- [x] ~~fix the race between `submit<ANYWHERE>` and `submit<EMPER>` both possibly accessing the IoContext's sq in parallel; decide between the [mutex-based approach](https://gitlab.cs.fau.de/i4/manycore/emper/-/commit/31de63c7fe4d20cb6b45e585e226181f39f15e14) and the [API-change approach](https://gitlab.cs.fau.de/i4/manycore/emper/-/commit/6bd13e43bb0401e9211c8e5c396e79b994a71e4c)~~ Fixed by 33150a26db13bf8b62a4409a756fb347c89e0177.
## Notes about invalid Future chains
A chain of Futures which cannot be fully submitted because of an invalid request fails to be submitted to the io_uring.
```
req1 -> invalid_req -> req3
```
Calling io_submit after preparing this chain of sqes will submit only two sqes and leave the last one in the SQ.
[related liburing issue](https://github.com/axboe/liburing/issues/186).
Should we cancel and signal all dependent Futures ourselves?
This would break the memory-safety guarantee of awaiting the last Future in the link:
req1 could still be processed by the kernel while the user invalidates its
memory, because the dependent Future was already signaled.
This concern is unfounded: if the last Future in a chain was canceled, this means
some previous request was not completed as expected, and the user has to walk down the
chain and check for the failure.
This holds for chains canceled because of partial completions or errors as well as
for chains that were not fully submitted.
```C++
{
	char buf[33], buf3[32];
	ReadFuture r1 = ReadFuture(0, buf, sizeof(buf), 0);
	ReadFuture invalid = ReadFuture(42, nullptr, 1337, -5);
	invalid.addDependency(r1);
	ReadFuture r3 = ReadFuture(0, buf3, sizeof(buf3), 0);
	r3.addDependency(invalid);
	r3.submit();  // prepares three sqes, but only two will be submitted

	// this will immediately return with -ECANCELED
	int32_t r = r3.wait();
	if (r == -ECANCELED) {
		r = invalid.wait();
		// r will be a normal error indicating the invalid request in the chain
		if (r == -EBADF || r == -EINVAL) {
			r = r1.wait();  // await the remaining valid Future to handle the whole chain
		}
	}
}
```
## Notes about timeouts
Timeouts are issued as a separate sqe from the actual request, which must be linked
to its timeout by setting IOSQE_IO_LINK in sqe->flags.
See: [liburing timeout connect test](https://github.com/axboe/liburing/pull/254/files#diff-52d1636196831889ccbfd1a8ad80fb3fe3575115dfe091a1621f1dd4e51a3003)
The timeout generates a cqe with `res == -ETIME` when it expires, and the
actual request then results in a cqe with `res == -ECANCELED`.
To reference the wrapping Future object, both sqes would contain a pointer to the
Future object.
Because a single Future can now be fulfilled by two cqes, we cannot immediately signal
the Future upon seeing either one of the completion events:
the Future object's memory may be invalid after it was signaled, making reads through the pointer stored in both cqes undefined behavior once one of them has been consumed.
A possible solution would be to signal a Future only once both sqes' cqes have been seen.

!53 Add meson option for scheduling strategy and according CI jobs (2020-12-17, Florian Schmaus)
https://gitlab.cs.fau.de/i4/manycore/emper/-/merge_requests/53

!52 Improve code readability of UnboundedBlockingMpscQueue (2020-12-14, Florian Schmaus)
https://gitlab.cs.fau.de/i4/manycore/emper/-/merge_requests/52

!51 Fix memory leak: Ensure that the current fiber is recycled in discardAndResume() (2020-12-14, Florian Schmaus)
https://gitlab.cs.fau.de/i4/manycore/emper/-/merge_requests/51

The current Fiber is now always stored in the context, not just on
debug builds. This also means that we can remove the currentFiber
thread local variable, as using only a thread local storage would
yield wrong results in case a blocked context is resumed on another
worker thread.

!50 [doc] make doxygen search for input recursively (2020-12-10, Florian Fischer)
https://gitlab.cs.fau.de/i4/manycore/emper/-/merge_requests/50

If RECURSIVE is set to NO, doxygen will only process files in emper/;
when set to YES, it will see all files reachable from emper/.

!49 Add emper::getFullVersion() (2020-12-10, Florian Schmaus)
https://gitlab.cs.fau.de/i4/manycore/emper/-/merge_requests/49

This also solves a dependency declaration issue in WorkerSleepExample:
Prior to this change, a clean build could potentially result in

```
ninja -C build
ninja: Entering directory `build'
[13/57] Compiling C++ object apps/worker_sleep_example.p/WorkerSleepExample.cpp.o
FAILED: apps/worker_sleep_example.p/WorkerSleepExample.cpp.o
ccache c++ -Iapps/worker_sleep_example.p -Iapps -I../apps -Iemper -I../emper -Iemper/include -I../emper/include -fdiagnostics-color=always -pipe -D_FILE_OFFSET_BITS=64 -Wall -Winvalid-pch -Wnon-virtual-dtor -Wextra -Wpedantic -Werror -std=c++17 -O2 -g -Wno-non-virtual-dtor -MD -MQ apps/worker_sleep_example.p/WorkerSleepExample.cpp.o -MF apps/worker_sleep_example.p/WorkerSleepExample.cpp.o.d -o apps/worker_sleep_example.p/WorkerSleepExample.cpp.o -c ../apps/WorkerSleepExample.cpp
../apps/WorkerSleepExample.cpp:12:10: fatal error: emper-version.h: No such file or directory
   12 | #include "emper-version.h" // for EMPER_FULL_VERSION
      |          ^~~~~~~~~~~~~~~~~
compilation terminated.
[17/57] Generating emper-version.h with a custom command
ninja: build stopped: subcommand failed.
make: *** [Makefile:23: build] Error 1
```
because worker_sleep_example_exec should also have depended on
emper_version_h. However, this is obviously error-prone, as users
easily forget to add this dependency. Instead we add
emper::getFullVersion() which is part of the EMPER shared object (not
just of a single header).

!48 [test] Add EMPER test runner (2021-01-14, Florian Schmaus)
https://gitlab.cs.fau.de/i4/manycore/emper/-/merge_requests/48

!47 [build] Switch to C++2a (2020-12-14, Florian Schmaus)
https://gitlab.cs.fau.de/i4/manycore/emper/-/merge_requests/47

We are unable to switch to C++20 until
https://github.com/mesonbuild/meson/issues/8084 is fixed.

!45 Use lib::adt::LockedUnboundedQueue in Scheduler (2020-12-09, Florian Schmaus)
https://gitlab.cs.fau.de/i4/manycore/emper/-/merge_requests/45

This class was previously unused, but can be used in Scheduler after
minor modifications.

!44 Re-use Runtime::getWorkerId() when possible (2020-12-09, Florian Schmaus)
https://gitlab.cs.fau.de/i4/manycore/emper/-/merge_requests/44

!43 Allow an actor to be started from anywhere (2020-12-09, Florian Fischer)
https://gitlab.cs.fau.de/i4/manycore/emper/-/merge_requests/43

!42 Improve debug output, create Worker class (2020-12-14, Florian Schmaus)
https://gitlab.cs.fau.de/i4/manycore/emper/-/merge_requests/42

!41 Schedule from anywhere (2020-12-09, Florian Schmaus)
https://gitlab.cs.fau.de/i4/manycore/emper/-/merge_requests/41

!40 [gitlab-ci] Bump CI container to flowdalic/debian-testing-dev:1.4 (2020-12-05, Florian Schmaus)
https://gitlab.cs.fau.de/i4/manycore/emper/-/merge_requests/40

!38 prevent data races when initializing the workers PRNG seeds (2020-12-03, Florian Fischer)
https://gitlab.cs.fau.de/i4/manycore/emper/-/merge_requests/38

Each worker currently calls uniformIntDistribution(randomEngine),
which modifies the randomEngine internally and thus produces data races
when the threads run in parallel.
This change calls uniformIntDistribution(randomEngine) on the main thread
once for each worker and passes the resulting seeds to the workerLoop.
The data race was found by GCC's and Clang's TSan.