add baseline evaluation

* increase iterations to 10K; this produces more stable averages
* add options to dump all measurements
* add baseline benchmark using two threads and read/write syscalls
* add iouring benchmark using two threads and io_uring for reads