I am conducting an experiment related to SSDs and have run into an issue I would like to ask about.
Here are the details of the experiment:
1. Started blktrace on the target device to capture a trace:
sudo blktrace -d /dev/nvme1n1 -o test
2. Ran RocksDB's db_bench workload against /dev/nvme1n1 while the trace was running.
3. After the workload finished, stopped blktrace and merged the trace into a binary log:
sudo blkparse -i test.blktrace.* -d test.bin
4. Configured a jobfile that uses fio's read_iolog option and ran the replay (the full sequence is sketched right after this list; the db_bench options and the jobfile follow below).
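For completeness, the end-to-end sequence looked roughly like this. It is a minimal sketch rather than the exact script I ran; in particular, backgrounding blktrace and stopping it with a signal is just one way to do it:

# 1) start tracing the target device (kept running while db_bench is active)
sudo blktrace -d /dev/nvme1n1 -o test &

# 2) run the workload (full db_bench option list below)
./db_bench --benchmarks=fillrandom ...

# 3) stop blktrace once the workload has finished
sudo kill -SIGINT "$(pgrep -x blktrace)"

# 4) merge the per-CPU trace files into a binary log that fio can replay
sudo blkparse -i test.blktrace.* -d test.bin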
[db_bench workload]
DB_PATH=test_db
./db_bench \
  --benchmarks=fillrandom \
  --num=100000000 \
  --value_size=90 \
  --db=$DB_PATH \
  --write_buffer_size=67108864 \
  --max_write_buffer_number=3 \
  --target_file_size_base=67108864 \
  --max_bytes_for_level_base=536870912
[fio jobfile]
[global]
ioengine=xnvme
xnvme_async=io_uring_cmd
direct=1
thread=1
read_iolog=test_db/test.bin
read_iolog_chunked=1
replay_redirect=/dev/ng1n1
[replay]
filename=/dev/ng1n1
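I then launched the replay with a plain fio invocation (the jobfile name replay.fio is simply what I happened to call the file above):

sudo fio replay.fio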
However, the experiment terminated with the following error:
replay: (g=0): rw=read, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=xnvme, iodepth=1
fio-3.36
Starting 1 thread
xnvme_cmd_ctx: {cdw0: 0x0, sc: 0xea, sct: 0x7}
fio: io_u error on file /dev/ng1n1: Input/output error: write offset=16862150656, buflen=1048576
fio: pid=86163, err=5/file:io_u.c:1896, func=io_u error, error=Input/output error
replay: (groupid=0, jobs=1): err= 5 (file:io_u.c:1896, func=io_u error, error=Input/output error): pid=86163: Thu Aug 1 09:30:51 2024
read: IOPS=15, BW=62.0KiB/s (63.5kB/s)(160KiB/2579msec)
slat (nsec): min=1969, max=14743, avg=2638.15, stdev=2096.79
clat (nsec): min=71080, max=78905, avg=71708.18, stdev=1653.64
lat (nsec): min=73086, max=93406, avg=74346.32, stdev=3467.87
clat percentiles (nsec):
| 1.00th=[71168], 5.00th=[71168], 10.00th=[71168], 20.00th=[71168],
| 30.00th=[71168], 40.00th=[71168], 50.00th=[71168], 60.00th=[71168],
| 70.00th=[71168], 80.00th=[71168], 90.00th=[71168], 95.00th=[72192],
| 99.00th=[79360], 99.50th=[79360], 99.90th=[79360], 99.95th=[79360],
| 99.99th=[79360]
bw ( KiB/s): min= 320, max= 320, per=100.00%, avg=320.00, stdev= 0.00, samples=1
iops : min= 80, max= 80, avg=80.00, stdev= 0.00, samples=1
write: IOPS=2528, BW=10.00MiB/s (10.5MB/s)(25.8MiB/2579msec); 0 zone resets
slat (usec): min=2, max=380, avg= 2.82, stdev= 4.72
clat (nsec): min=125, max=23130, avg=7813.61, stdev=674.52
lat (nsec): min=9561, max=26209, avg=10572.89, stdev=935.21
clat percentiles (nsec):
| 1.00th=[ 7392], 5.00th=[ 7456], 10.00th=[ 7520], 20.00th=[ 7584],
| 30.00th=[ 7648], 40.00th=[ 7712], 50.00th=[ 7776], 60.00th=[ 7776],
| 70.00th=[ 7840], 80.00th=[ 7904], 90.00th=[ 8032], 95.00th=[ 8160],
| 99.00th=[ 8896], 99.50th=[10688], 99.90th=[19584], 99.95th=[21120],
| 99.99th=[23168]
bw ( KiB/s): min=52800, max=52800, per=100.00%, avg=52800.00, stdev= 0.00, samples=1
iops : min=13040, max=13040, avg=13040.00, stdev= 0.00, samples=1
lat (nsec) : 250=0.02%
lat (usec) : 4=0.02%, 10=98.84%, 20=0.41%, 50=0.09%, 100=0.61%
cpu : usr=98.76%, sys=1.20%, ctx=4, majf=0, minf=831
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=100.0%, 4=0.1%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=40,6521,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=1
Run status group 0 (all jobs):
READ: bw=62.0KiB/s (63.5kB/s), 62.0KiB/s-62.0KiB/s (63.5kB/s-63.5kB/s), io=160KiB (164kB), run=2579-2579msec
WRITE: bw=10.00MiB/s (10.5MB/s), 10.00MiB/s-10.00MiB/s (10.5MB/s-10.5MB/s), io=25.8MiB (27.0MB), run=2579-2579msec
It seems to be an NVMe command-related error, but I have no idea why it occurs. My goal is to replay the db_bench workload with fio, but the replay fails as shown above.
Could you please let me know if there is a mistake anywhere in my experiment procedure?
Thank you.