It is understood that, in exceptional circumstances, a read may block after select(2) declares that data is available for reading on something like a network socket, where checksum failures or other buffer manipulations may cause data to be discarded between select and read.
However, I would not expect that to happen for a Python program when dealing only with standard pipes.
Consider the following:
with subprocess.Popen(["openssl", "speed"], text=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE) as process:
    # Something weird is going on. `fd.read()` below blocks even if `select` declares
    # that there is data available. The `set_blocking` calls below partially help this,
    # but now there is a strange delay at program start.
    os.set_blocking(process.stdout.fileno(), False)
    os.set_blocking(process.stderr.fileno(), False)
    buffer = StringIO()
    while process.poll() is None:
        rfds, _, _ = select.select([process.stdout, process.stderr], [], [])
        for fd in rfds:
            if fd == process.stdout:
                chunk = fd.read()
                if DEBUG:
                    sys.stdout.write(chunk)
                buffer.write(chunk)
            elif fd == process.stderr:
                sys.stdout.write(fd.read())
I would expect this to work without the set_blocking calls, but it doesn’t. I have also tried with standard binary streams (text=False) and the results are the same.
Why is read blocking when select says it shouldn’t?
Note that openssl speed outputs exclusively to stderr.
I think your problem is that you’re not specifying a size in your calls to read(). From the documentation:
| read(self, size=-1, /)
| Read and return up to n bytes.
|
| If the size argument is omitted, None, or negative, read and
| return all data until EOF.
By not specifying a read size, the read() method will keep reading until EOF, so it blocks until the child process closes its end of the pipe.
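To see the difference in isolation, here is a minimal sketch that is not part of the original question. It uses a plain os.pipe() as a stand-in for the child's output (an assumption made only for illustration) and shows that a sized, low-level os.read returns the bytes that are already buffered instead of waiting for EOF. It assumes a Unix-like system, where select works on pipes.

```python
import os
import select

# A plain pipe stands in for the child's stdout/stderr (illustrative only).
read_fd, write_fd = os.pipe()

# The writer sends some data but keeps its end open, so there is no EOF yet.
os.write(write_fd, b"partial output\n")

# select reports the read end as ready because some bytes are buffered.
ready, _, _ = select.select([read_fd], [], [], 1.0)

# os.read issues a single read and returns at most 1024 of the available
# bytes; it does not wait for the writer to close the pipe.
chunk = os.read(read_fd, 1024) if read_fd in ready else b""
print(chunk)

os.close(write_fd)
os.close(read_fd)
```

A read-until-EOF call at the same point would not return until write_fd had been closed, which is the behaviour described above.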
If we write your code like this instead:
import sys
import os
import subprocess
import select

with subprocess.Popen(
    ["openssl", "speed"], text=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE
) as process:
    os.set_blocking(process.stdout.fileno(), False)
    os.set_blocking(process.stderr.fileno(), False)
    while process.poll() is None:
        rfds, _, _ = select.select([process.stdout, process.stderr], [], [])
        for fd in rfds:
            chunk = fd.read(1024)
            sys.stdout.write(chunk)
            sys.stdout.flush()
Then we see “live” output from openssl speed:
Doing md5 ops for 3s on 16 size blocks: 19116296 md5 ops in 2.99s
Doing md5 ops for 3s on 64 size blocks: 13388928 md5 ops in 3.00s
Doing md5 ops for 3s on 256 size blocks: 7106066 md5 ops in 2.99s
.
.
.
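As a possible variation, the same loop can be sketched with os.read on the raw file descriptors. Because os.read performs a single read and returns whatever is available, this version should not need the set_blocking calls. The helper name stream_output and the 4096-byte chunk size are my own choices for illustration, not something from the question, and the sketch assumes a Unix-like system where select works on pipes.

```python
import os
import select
import subprocess
import sys


def stream_output(cmd):
    """Run cmd, echo its output as it arrives, and return (stdout, stderr) as bytes.

    Hypothetical helper for illustration; not part of the original code.
    """
    with subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE) as process:
        out_fd = process.stdout.fileno()
        err_fd = process.stderr.fileno()
        collected = {out_fd: [], err_fd: []}
        open_fds = {out_fd, err_fd}

        # Loop until both pipes report EOF rather than until the process
        # exits, so output written just before exit is not lost.
        while open_fds:
            ready, _, _ = select.select(list(open_fds), [], [])
            for fd in ready:
                chunk = os.read(fd, 4096)  # returns what is available, up to 4096 bytes
                if not chunk:
                    # An empty result means the child closed this pipe (EOF).
                    open_fds.discard(fd)
                    continue
                collected[fd].append(chunk)
                # Decoding per chunk can split a multi-byte character;
                # errors="replace" keeps the echo from raising in that case.
                sys.stdout.write(chunk.decode(errors="replace"))
                sys.stdout.flush()

        return b"".join(collected[out_fd]), b"".join(collected[err_fd])
```

It could be called as stream_output(["openssl", "speed"]). Reading until both pipes reach EOF, instead of polling the process, is a design choice that avoids dropping output that arrives between the last select and process exit.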