Question

I'm writing a Python program that runs arbitrary user-uploaded code (which, in the worst case, is unsafe, erroneous and crashing) on a Linux server. Security questions aside, my objective is to determine whether the code (which may be in any language, compiled or interpreted) writes the correct things to stdout, stderr and other files when a given input is fed into the program's stdin. After this, I need to display the results to the user.

The current solution

Currently, my solution is to spawn the child process using subprocess.Popen(...) with file handles for the stdout, stderr and stdin. The file behind the stdin handle contains the inputs that the program reads during operation, and after the program has terminated, the stdout and stderr files are read and checked for correctness.
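
A minimal sketch of the setup described above, written for Python 3 (so the child calls input() rather than raw_input()). The helper name run_with_files and the inline CHILD_SOURCE program are illustrative assumptions, not the asker's actual code:

```python
import subprocess
import sys
import tempfile

# Hypothetical child program, the Python 3 equivalent of the example in the question.
CHILD_SOURCE = (
    'print("Hello.")\n'
    'name = input("Type your name: ")\n'
    'print("Nice to meet you, %s!" % name)\n'
)


def run_with_files(argv, stdin_text):
    """Run argv with stdin, stdout and stderr all backed by temporary files."""
    with tempfile.TemporaryFile("w+") as fin, \
         tempfile.TemporaryFile("w+") as fout, \
         tempfile.TemporaryFile("w+") as ferr:
        fin.write(stdin_text)
        fin.seek(0)  # flushes the data and rewinds, so the child reads from the start
        proc = subprocess.Popen(argv, stdin=fin, stdout=fout, stderr=ferr)
        returncode = proc.wait()
        fout.seek(0)
        ferr.seek(0)
        return returncode, fout.read(), ferr.read()


if __name__ == "__main__":
    code, out, err = run_with_files([sys.executable, "-c", CHILD_SOURCE], "Anonymous\n")
    print(repr(out))
```

The captured stdout contains the prompt but not the typed name, because nothing echoes the contents of the stdin file into the output.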

The problem

This approach works otherwise perfectly, but when I display the results, I can't combine the given inputs and outputs so that the inputs would appear in the same places as they would when running the program from a terminal. I.e. for a program like

print "Hello."
name = raw_input("Type your name: ")
print "Nice to meet you, %s!" % (name)

the contents of the file containing the program's stdout would, after running, be:

Hello.
Type your name: 
Nice to meet you, Anonymous!

given that the contents of the file behind stdin were Anonymous<LF>. So, in short, for the given example code (and, equivalently, for any other code) I want to achieve a result like:

Hello.
Type your name: Anonymous
Nice to meet you, Anonymous!

Thus, the problem is to detect when the program is waiting for input.

Tried methods

I've tried the following methods for solving the problem:

Popen.communicate(...)

This allows the parent process to separately send data along a pipe, but can only be called once, and is therefore not suitable for programs with multiple outputs and inputs - just as can be inferred from the documentation.

Directly reading from Popen.stdout and Popen.stderr and writing to Popen.stdin

The documentation warns against this, and the .read() and .readline() calls on Popen.stdout seem to block indefinitely once the program starts to wait for input.

Using select.select(...) to see if the file handles are ready for I/O

This doesn't seem to improve anything. Apparently the pipes are always ready for reading or writing, so select.select(...) doesn't help much here.

Using a different thread for non-blocking reading

As suggested in this answer, I have tried creating a separate Thread() that stores results from reading from the stdout into a Queue(). The output lines before a line demanding user input are displayed nicely, but the line on which the program starts to wait for user input ("Type your name: " in the example above) never gets read.
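
One possible reason the prompt never arrives is that readline() waits for a newline, and that a child writing to a pipe usually buffers its output. Below is a sketch of the thread-and-queue approach that reads raw chunks with os.read() instead. It assumes the child can be made unbuffered (here with python -u, which is not an option for arbitrary programs) and that the prompt text is known in advance. The names start_reader, collect_until and demo are hypothetical:

```python
import os
import queue
import subprocess
import sys
import threading


def start_reader(fd, out_queue):
    """Pump raw chunks from fd into out_queue; None marks end of stream."""
    def pump():
        while True:
            chunk = os.read(fd, 1024)  # returns as soon as any data is available
            if not chunk:
                break
            out_queue.put(chunk)
        out_queue.put(None)

    thread = threading.Thread(target=pump, daemon=True)
    thread.start()
    return thread


def collect_until(out_queue, marker, timeout=5.0):
    """Accumulate chunks until marker appears, the stream ends, or timeout expires."""
    data = b""
    while marker not in data:
        try:
            chunk = out_queue.get(timeout=timeout)
        except queue.Empty:
            break
        if chunk is None:
            break
        data += chunk
    return data


def demo():
    child = (
        'print("Hello.")\n'
        'name = input("Type your name: ")\n'
        'print("Nice to meet you, %s!" % name)\n'
    )
    proc = subprocess.Popen(
        [sys.executable, "-u", "-c", child],  # -u: unbuffered child output (assumption)
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
    )
    chunks = queue.Queue()
    reader = start_reader(proc.stdout.fileno(), chunks)
    before = collect_until(chunks, b"Type your name: ")
    proc.stdin.write(b"Anonymous\n")
    proc.stdin.close()  # closing flushes the pending input to the child
    after = collect_until(chunks, b"!\n")
    proc.wait()
    reader.join(timeout=5.0)
    proc.stdout.close()
    return before, after
```

Without -u (or an equivalent flush in the child) the prompt stays in the child's own buffer, and no reading strategy in the parent can retrieve it.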

Using a PTY slave as the child process' file handles

As directed here, I've tried pty.openpty() to create a pseudo terminal with master and slave file descriptors. After that, I've given the slave file descriptor as an argument for the subprocess.Popen(...) call's stdout, stderr and stdin parameters. Reading through the master file descriptor opened with os.fdopen(...) yields the same result as using a different thread: the line demanding input doesn't get read.

Edit: Using @Antti Haapala's example of pty.fork() for child process creation instead of subprocess.Popen(...) seems to allow me to also read the output created by raw_input(...).

Using pexpect

I've also tried the read(), read_nonblocking() and readline() methods (documented here) of a process spawned with pexpect, but the best result, which I got with read_nonblocking(), is the same as with a PTY created with pty.fork(): the line demanding input does get read.

Edit: Using sys.stdout.write(...) and sys.stdout.flush() instead of print in my master program, which creates the child, seemed to fix the prompt line not getting displayed. It actually got read in both cases, though.

Others

I've also tried select.poll(...), but it seemed that the pipe or PTY master file descriptors are always ready for writing.

Notes

Other solutions

  • What also crossed my mind is to try feeding the input when some time has passed without new output having been generated. This, however, is risky, because there's no way to know if the program is just in the middle of doing a heavy calculation.
  • As @Antti Haapala mentioned in his answer, the read() system call wrapper from glibc could be replaced to communicate the inputs to the master program. However, this doesn't work with statically linked or assembly programs. (Although, now that I think of it, any such calls could be intercepted from the source code and replaced with the patched version of read() - could be painstaking to implement still.)
  • Modifying the Linux kernel code to communicate the read() syscalls to the program is probably insane...

PTYs

I think the PTY is the way to go, since it fakes a terminal and interactive programs are run on terminals everywhere. The question is, how?

Solution

Have you noticed that raw_input writes the prompt string to stderr if stdout is a terminal (isatty)? If stdout is not a terminal, the prompt is written to stdout as well, but stdout will then be in fully buffered mode.

With stdout on a tty

write(1, "Hello.\n", 7)                  = 7
ioctl(0, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 opost isig icanon echo ...}) = 0
ioctl(1, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 opost isig icanon echo ...}) = 0
ioctl(0, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 opost isig icanon echo ...}) = 0
ioctl(1, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 opost isig icanon echo ...}) = 0
write(2, "Type your name: ", 16)         = 16
fstat(0, {st_mode=S_IFCHR|0600, st_rdev=makedev(136, 3), ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fb114059000
read(0, "abc\n", 1024)                   = 4
write(1, "Nice to meet you, abc!\n", 23) = 23

With stdout not on a tty

ioctl(0, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 opost isig icanon echo ...}) = 0
ioctl(1, SNDCTL_TMR_TIMEBASE or TCGETS, 0x7fff8d9d3410) = -1 ENOTTY (Inappropriate ioctl for device)
# oops, python noticed that stdout is NOTTY.
fstat(0, {st_mode=S_IFCHR|0600, st_rdev=makedev(136, 3), ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f29895f0000
read(0, "abc\n", 1024)                     = 4
rt_sigaction(SIGINT, {SIG_DFL, [], SA_RESTORER, 0x7f29891c4bd0}, {0x451f62, [], SA_RESTORER, 0x7f29891c4bd0}, 8) = 0
write(1, "Hello.\nType your name: Nice to m"..., 46) = 46
# squeeze all output at the same time into stdout... pfft.

Thus all the output is squeezed into stdout in a single write and, what is worse, only after the input has been read.

The real solution is thus to use the pty. However, you are doing it wrong. For the pty to work, you must use the pty.fork() function, not subprocess. (This will be very tricky.) I have some working code that goes like this:

import os
import tty
import pty

program = "python"

# command name in argv[0]
argv = [ "python", "foo.py" ]

pid, master_fd = pty.fork()

# we are in the child process
if pid == pty.CHILD:
    # execute the program
    os.execlp(program, *argv)

# else we are still in the parent, and pty.fork returned the pid of 
# the child. Now you can read, write in master_fd, or use select:
# rfds, wfds, xfds = select.select([master_fd], [], [], timeout)

Notice that depending on the terminal mode set by the child program there might be different kinds of linefeeds coming out, etc.
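
The parent side of this can be sketched as a select() loop on master_fd. The following is a Python 3 sketch under stated assumptions, not a definitive implementation: the function name run_in_pty and its parameters are hypothetical, and it sidesteps the detection problem by waiting for a known prompt_marker before sending each answer.

```python
import os
import pty
import select
import signal
import sys


def run_in_pty(argv, answers, prompt_marker, timeout=5.0):
    """Run argv on a pseudoterminal and return (transcript, wait status).

    Each item of answers is written once another prompt_marker has appeared.
    """
    pid, master_fd = pty.fork()
    if pid == pty.CHILD:
        try:
            os.execvp(argv[0], argv)
        finally:
            os._exit(127)  # only reached if exec failed

    transcript = b""
    pending = list(answers)
    consumed = 0  # offset up to which prompt markers have been answered
    while True:
        ready, _, _ = select.select([master_fd], [], [], timeout)
        if not ready:
            os.kill(pid, signal.SIGKILL)  # no output for too long: give up
            break
        try:
            chunk = os.read(master_fd, 1024)
        except OSError:
            break  # Linux reports EIO once the child has closed the slave side
        if not chunk:
            break
        transcript += chunk
        pos = transcript.find(prompt_marker, consumed)
        if pending and pos != -1:
            consumed = pos + len(prompt_marker)
            os.write(master_fd, pending.pop(0))  # the terminal echoes this back
    os.close(master_fd)
    _, status = os.waitpid(pid, 0)
    return transcript, status


if __name__ == "__main__":
    child = (
        'print("Hello.")\n'
        'name = input("Type your name: ")\n'
        'print("Nice to meet you, %s!" % name)\n'
    )
    output, _ = run_in_pty([sys.executable, "-c", child],
                           [b"Anonymous\n"], b"Type your name: ")
    sys.stdout.write(output.decode().replace("\r\n", "\n"))
```

Because the terminal's echo is part of what comes out of master_fd, the typed input lands right after the prompt in the transcript. Sending the input before the prompt has appeared would put the echo in the wrong place.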

As for the "waiting for input" problem, that cannot really be helped, because one can always write to a pseudoterminal; the characters will simply wait in the buffer. Likewise, a pipe always lets one write some implementation-defined amount (4K, 32K or so) before blocking. One ugly way is to strace the program and notice whenever it enters the read system call with fd = 0. Another is to write a C module with a replacement read() and have the dynamic linker load it before glibc (this fails if the executable is statically linked or makes system calls directly from assembly); the replacement would then signal Python whenever read(0, ...) is executed. All in all, probably not worth the trouble.
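
For the strace route, the parent would have to recognise reads on fd 0 in the trace. A small sketch of that recognition step only, based on the trace format shown earlier; the names STDIN_READ, is_stdin_read and stdin_reads are hypothetical, and actually launching strace and following its output live is left out:

```python
import re

# Matches a read() on fd 0. With strace -f, lines carry a pid prefix:
# "[pid 1234] " on the terminal, or "1234 " when tracing to a file with -o.
STDIN_READ = re.compile(r"^(?:\[pid\s+\d+\]\s+|\d+\s+)?read\(0,")


def is_stdin_read(line):
    """Return True if a strace output line is a read() call on stdin."""
    return STDIN_READ.match(line) is not None


def stdin_reads(trace_lines):
    """Return the indexes of the lines on which the traced program read from stdin."""
    return [n for n, line in enumerate(trace_lines) if is_stdin_read(line)]
```

A blocked read shows up as an unfinished line in the trace, so a live monitor would have to treat "read(0," without a result yet as the signal that the program is waiting.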

OTHER TIPS

Instead of trying to detect when the child process is waiting for input, you can use the script command. From the man page for script:

The script utility makes a typescript of everything printed on your terminal.

You can use it like this if you were using it on a terminal:

$ script -q <outputfile> <command>

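So in Python you can try giving this command line to Popen instead of just <command>. Note that this is the BSD argument order; the util-linux version of script found on most Linux systems takes the command through its -c option instead, as in script -q -c "<command>" <outputfile>.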

Edit: I made the following program:

#include <stdio.h>
int main() {
    int i;
    scanf("%d", &i);
    printf("i + 1 = %d\n", i+1);
}

and then ran it as follows:

$ echo 9 > infile
$ script -q output ./a.out < infile
$ cat output
9
i + 1 = 10

So I think it can be done in Python this way instead of using the stdout, stderr and stdin arguments of Popen to capture the output.
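
A hedged sketch of how this might look from Python 3.8+, assuming the util-linux script (command passed with -c). The helper names script_argv and run_under_script are hypothetical, and the behaviour of script with a non-terminal stdin varies between versions, so treat this as a starting point:

```python
import shlex
import subprocess


def script_argv(command, typescript_path):
    """Build a util-linux style command line: script -q -c "<command>" <file>."""
    return ["script", "-q", "-c", shlex.join(command), typescript_path]


def run_under_script(command, stdin_path, typescript_path):
    """Run command under script, feeding stdin from a file; return the exit status."""
    with open(stdin_path, "rb") as fin:
        return subprocess.call(
            script_argv(command, typescript_path),
            stdin=fin,
            stdout=subprocess.DEVNULL,  # the typescript file holds the transcript
        )
```

Depending on the version, the typescript may also contain header or footer lines added by script itself, which would need to be stripped before comparing outputs.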

Licensed under: CC-BY-SA with attribution