
Worker exclusive uring

Florian Fischer requested to merge worker_exclusive_uring into master

Alternative IO design to the io_uring_network branch.
Supersedes !6 (closed).


  • Handle submit errors in Future chains. Supported since dff6c4c3.
  • Handle partial completions in Future chains (short reads, for example, terminate sqe chains). We simply disable partial completion for all Futures that are prepared as a dependency of another Future.
  • Implement timeouts. Supported since 179b87b6.
  • Remove SQPOLL or implement io_uring_register for each file descriptor. Linux 5.11 supports SQPOLL with unregistered file descriptors.
  • Make the worker cq size configurable.
  • Handle a full sq. Sqes are consumed by the kernel, so the sq cannot be full.
  • Fix the race between submit<ANYWHERE> and submit<EMPER>, which both access the IoContext's sq, possibly in parallel. Decide between a mutex-based approach and an API-change approach. Fixed by 33150a26.

Notes about invalid Future chains

A chain of Futures that contains an invalid request cannot be fully submitted to the io_uring.

req1 -> invalid_req -> req3

Calling io_uring_submit after preparing this chain of sqes submits only two sqes and leaves the last one in the SQ. See the related liburing issue.
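The partial submission described above can be detected by comparing the number of prepared sqes with the return value of the submit call. The following is a minimal sketch, not code from this merge request: `ChainSubmitOutcome` and `classifySubmit` are hypothetical names, and the sketch assumes the submit call returns the number of consumed sqes or a negative errno.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical summary of a chain submission; not part of liburing or this MR.
struct ChainSubmitOutcome {
	bool fullySubmitted;   // every prepared sqe was consumed
	std::size_t consumed;  // sqes taken from the SQ
	std::size_t leftInSq;  // prepared sqes still sitting in the SQ
};

// prepared:  number of sqes prepared for the chain
// submitRet: return value of the submit call
//            (assumed: count of consumed sqes, or a negative errno)
ChainSubmitOutcome classifySubmit(std::size_t prepared, int submitRet) {
	// A negative return value means nothing was consumed.
	std::size_t consumed = submitRet > 0 ? static_cast<std::size_t>(submitRet) : 0;
	if (consumed > prepared) {
		consumed = prepared;
	}
	return {consumed == prepared, consumed, prepared - consumed};
}
```

For the chain req1 -> invalid_req -> req3, `classifySubmit(3, 2)` reports one sqe left in the SQ. Going by the behavior described above, the last consumed sqe would be the invalid request and everything after it is the unsubmitted remainder.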

Should we cancel and signal all dependent Futures ourselves? This breaks the memory safety guarantee of awaiting the last Future in the link: req1 could still be in processing by the kernel while the user invalidates its memory, because the dependent Future was already signaled.

This concern is unfounded: if the last Future in a chain was canceled, some previous request did not complete as expected, and the user has to walk back down the chain and check for the failure. This holds for chains canceled because of partial completions or errors as well as for chains that were not fully submitted.

	char buf[33], buf3[32];
	// r1 -> invalid -> r3 are assumed to be linked as a chain; the linking calls are not shown here
	ReadFuture r1 = ReadFuture(0, buf, sizeof(buf), 0);

	// deliberately invalid request in the middle of the chain
	ReadFuture invalid = ReadFuture(42, nullptr, 1337, -5);

	ReadFuture r3 = ReadFuture(0, buf3, sizeof(buf3), 0);

	r3.submit(); // <- this prepares 3 sqes, but only 2 will be submitted

	// this will immediately return -ECANCELED
	int32_t r = r3.wait();
	if (r == -ECANCELED) {
		r = invalid.wait();
		// r will be a normal error indicating the invalid request in the chain

		if (r == -EBADF || r == -EINVAL) {
			r = r1.wait(); // await the last correct Future to handle the whole chain
		}
	}

Notes about timeouts

Timeouts are issued as a separate sqe from the actual request. The request must be linked to its timeout by setting IOSQE_IO_LINK in sqe->flags.
See: liburing timeout connect test
When the timeout expires, it generates a cqe with res == -ETIME, and the actual request results in a cqe with res == -ECANCELED. To reference the wrapping Future object, both sqes would carry a pointer to it. Because a single Future can now be fulfilled by two cqes, we cannot immediately signal the Future on seeing either one of the completion events: the Future object's memory may be invalid after it was signaled, so reading through the pointer stored in the second cqe after the first one was handled would be undefined behavior.

A possible solution would be to signal a Future only after both cqes were seen.
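One way to sketch this idea is a per-Future counter of outstanding completions: the handler that reaps the last completion signals the Future, and nothing touches the Future afterwards. This is a minimal illustration, not the implementation in this merge request. `TimedFuture`, `userDataFor`, `handleCqe` and the low-bit tag used to tell the timeout completion apart from the request completion are all assumptions made for the example, and a plain flag stands in for the real signaling mechanism.

```cpp
#include <atomic>
#include <cassert>
#include <cerrno>
#include <cstdint>

// Hypothetical stand-in for a Future with a linked timeout.
struct TimedFuture {
	std::atomic<int> pendingCqes{2};  // request cqe + timeout cqe
	std::int32_t requestRes = 0;      // res of the actual request
	bool timedOut = false;            // timeout cqe reported -ETIME
	bool signaled = false;            // stands in for waking the waiter
};

// Assumption: Future objects are at least 2-byte aligned, so the lowest
// pointer bit is free to mark the cqe that belongs to the timeout.
constexpr std::uintptr_t TIMEOUT_TAG = 1;

// Value to store as user data in the request sqe or the timeout sqe.
std::uint64_t userDataFor(TimedFuture* f, bool isTimeout) {
	auto bits = reinterpret_cast<std::uintptr_t>(f);
	return bits | (isTimeout ? TIMEOUT_TAG : 0);
}

// Called once per reaped cqe with its user data and res fields.
void handleCqe(std::uint64_t userData, std::int32_t res) {
	auto bits = static_cast<std::uintptr_t>(userData);
	auto* f = reinterpret_cast<TimedFuture*>(bits & ~TIMEOUT_TAG);

	// Safe to touch *f here: it has not been signaled yet.
	if (bits & TIMEOUT_TAG) {
		f->timedOut = (res == -ETIME);
	} else {
		f->requestRes = res;
	}

	// Only the handler of the second cqe signals the Future.
	if (f->pendingCqes.fetch_sub(1, std::memory_order_acq_rel) == 1) {
		f->signaled = true;
		// *f must not be accessed after this point.
	}
}
```

With this scheme the order in which the two completions arrive does not matter: the first one only records its result, and the waiter can release the Future's memory as soon as it is signaled, because no further cqe refers to it.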

Edited by Florian Fischer
