Question

I've implemented a non-blocking reader in Python, and I need to make it more efficient.

The background: I have massive amounts of output that I need to read from one subprocess (started with Popen()) and pass to another thread. Reading the output from that subprocess must not block for more than a few ms (preferably for as little time as is necessary to read available bytes).

Currently, I have a utility class which takes a file descriptor (stdout) and a timeout. I select() and readline(1) until one of three things happens:

  1. I read a newline
  2. my timeout (a few ms) expires
  3. select tells me there's nothing to read on that file descriptor.

Then I return the buffered text to the calling method, which does stuff with it.

Now, for the real question: because I'm reading so much output, I need to make this more efficient. I'd like to do that by asking the file descriptor how many bytes are pending and then readline([that many bytes]). It's supposed to just pass stuff through, so I don't actually care where the newlines are, or even if there are any. Can I ask the file descriptor how many bytes it has available for reading, and if so, how?

I've done some searching, but I'm having a really hard time figuring out what to search for, let alone if it's possible.

Even just a point in the right direction would be helpful.

Note: I'm developing on Linux, but that shouldn't matter for a "Pythonic" solution.


Solution

On Linux, os.pipe() is just a wrapper around pipe(2). Both return a pair of file descriptors. Normally one would use lseek(2) (os.lseek() in Python) to reposition the offset of a file descriptor as a way to find out how much data is available. However, not all file descriptors are capable of seeking.

On Linux, trying lseek(2) on a pipe returns an error; see the manual page. That's because a pipe is essentially a buffer between a producer and a consumer of data, and the size of that buffer is system-dependent.

On Linux, a pipe's buffer is 64 KiB by default, so that is the most data you can have waiting to be read.
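As a side note: since lseek(2) fails on pipes, the usual way on Linux to ask a pipe how many bytes are pending is the FIONREAD ioctl. This is a sketch, not part of the original answer; it is Unix-specific and will not work on Windows:

```python
import array
import fcntl
import os
import termios

def bytes_available(fd):
    """Return the number of bytes waiting to be read from fd.

    Uses the FIONREAD ioctl, which works on pipes (and sockets) on
    Linux and most Unix-like systems; it is not portable to Windows.
    """
    buf = array.array('i', [0])          # mutable int buffer for the result
    fcntl.ioctl(fd, termios.FIONREAD, buf)
    return buf[0]

r, w = os.pipe()
os.write(w, b'hello pipe')
print(bytes_available(r))                # -> 10
```

With the count in hand, a single `os.read(fd, bytes_available(fd))` drains everything that is currently buffered without blocking.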

Edit: If you can change the way your subprocess works, you might consider using a memory mapped file, or a nice big piece of shared memory.

Edit2: Using polling objects is probably faster than select.
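A minimal sketch of that poll-based variant (assuming a readable pipe fd; note that poll() takes its timeout in milliseconds, whereas select() uses seconds):

```python
import os
import select

r, w = os.pipe()
os.write(w, b'ready')

# A poll object is registered once and reused, which avoids rebuilding
# select() argument lists on every call.
poller = select.poll()
poller.register(r, select.POLLIN)

events = poller.poll(10)          # timeout in milliseconds
for fd, event in events:
    if event & select.POLLIN:
        print(os.read(fd, 1024))  # -> b'ready'
```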

Other tips

This question seems to offer a possible solution, though it may require retooling.

Non-blocking read on a subprocess.PIPE in python

Otherwise, I assume you know about reading data N bytes at a time:

all_data = b''                   # Popen pipes yield bytes by default
while True:
    data = pipe.read(1024)       # reads up to 1024 bytes; b'' at end of pipe
    if not data:
        break
    all_data += data
    # add your timeout break here

You can find this out by calling os.fstat(file_descriptor) and checking the st_size property, which for a pipe reports the number of unread bytes sitting in it (this behavior is platform-dependent).

import os
reader_file_descriptor, writer_file_descriptor = os.pipe()
os.write(writer_file_descriptor, b'I am some data')
# st_size on a pipe reflects the buffered bytes on some platforms (e.g. macOS)
readable_bytes = os.fstat(writer_file_descriptor).st_size

I've implemented this based on the idea from spacether's answer:

import select
import os

def readLen(p):
    # works on macOS; might work on Linux, probably not on Windows
    # (you could fall back to returning 1 there)
    size = os.fstat(p.fileno()).st_size
    return size

def readIfAny(p, timeout=1, default=None):
    if select.select([p], [], [], timeout)[0]:
        size = readLen(p)
        if size:
            return p.read(size)
    return default

....

import sys
data = readIfAny(sys.stdin)

Note that I've read in some places that you should avoid reading from and writing to a subprocess pipe directly like this, because it can deadlock; still, this is the safest way I've found so far.
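One common way to sidestep those deadlocks (a sketch under my own assumptions, not part of the answers above) is to drain the pipe from a dedicated thread and hand chunks to the consumer over a queue; the child process used here is just an illustrative stand-in:

```python
import queue
import subprocess
import sys
import threading

def drain(pipe, q):
    # Blocking reads are fine here: they happen off the main thread,
    # so the main thread never blocks on the pipe itself.
    for chunk in iter(lambda: pipe.read(4096), b''):
        q.put(chunk)
    pipe.close()

# Hypothetical child that just writes a few bytes to stdout.
proc = subprocess.Popen(
    [sys.executable, '-c', "import sys; sys.stdout.write('hello')"],
    stdout=subprocess.PIPE)
q = queue.Queue()
threading.Thread(target=drain, args=(proc.stdout, q), daemon=True).start()

proc.wait()
first = q.get(timeout=1)   # -> b'hello'
```

The main thread can then poll the queue with `q.get_nowait()` or a short timeout, so it never waits on the pipe directly.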

Note 2: sys.stdin.read will return b'' or '' on EOF, I think. That doesn't seem to raise any exception, and I still don't really know how to tell when the stream is finished.

Note 3: depending on the mode in which the streams are open, you get bytes or a string. This also works with stdin, stdout, and stderr.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow