OCaml - Fatal error: exception Sys_error("Broken pipe") when using `| head` on output containing many lines

StackOverflow https://stackoverflow.com/questions/22137535

  •  19-10-2022

Question

I have a text file with many lines. I want to write a simple OCaml program that processes this file line by line and optionally prints each line.

While writing this program, I first created a smaller file with fewer lines, so that the program would finish executing faster.

$ wc -l input/master 
214745 input/master
$ head -50 input/master > input/small-master

Here is the simple boilerplate filter.ml program I wrote:

open Core.Std;;
open Printf;;
open Core.In_channel;;

if Array.length Sys.argv >= 2 then begin
  let rec process_lines ?ix master_file  =
    let ix = match ix with
      | None -> 0
      | Some x -> x
    in
    match input_line master_file with
    | Some line -> (
      if ix > 9 then printf "%d == %s\n" ix line;
      process_lines ~ix:(ix+1) master_file
    )
    | None -> close master_file
  in
  let master_file = create Sys.argv.(1) in
    process_lines master_file
end

It takes the input file's location as a command line argument, creates a file-handle for reading this file and calls the recursive function process_lines with this file-handle as an argument.

process_lines uses the optional argument ix to count line numbers as it reads from the file-handle line by line. It skips the first ten lines (ix 0 to 9) and prints every later line, prefixed with its index, to standard output.

Then, when I execute the program on the smaller input file and pipe the output to the Linux head command, everything works fine:

$ ./filter.native input/small-master |head -2
10 == 1000032|BINCH JAMES G|4|2012-11-13|edgar/data/1000032/0001181431-12-058269.txt
11 == 1000032|BINCH JAMES G|4|2012-12-03|edgar/data/1000032/0001181431-12-061825.txt

But when I execute the program on the larger file, I see a broken-pipe error:

$ ./filter.native input/master |head -2
10 == 1000032|BINCH JAMES G|4|2012-11-13|edgar/data/1000032/0001181431-12-058269.txt
11 == 1000032|BINCH JAMES G|4|2012-12-03|edgar/data/1000032/0001181431-12-061825.txt
Fatal error: exception Sys_error("Broken pipe")
Raised by primitive operation at file "pervasives.ml", line 264, characters 2-40
Called from file "printf.ml", line 615, characters 15-25
Called from file "find.ml", line 13, characters 21-48
Called from file "find.ml", line 19, characters 2-27

I learnt that such broken-pipe errors occur when the reader of a pipe (the head command in this case) exits before the writer of the pipe (my OCaml program in this case) has finished writing. That is why I would never get such an error if I used the tail command as the reader.

However, why didn't the broken-pipe error occur when the file had fewer lines?


Solution

The broken pipe signal (SIGPIPE) is a basic part of the Unix design. When you have a pipeline a | b where b reads only a small amount of data, you don't want a to waste its time writing after b has read all it needs. To make this happen, Unix sends SIGPIPE to a process that writes to a pipe that nobody is reading. In the usual case, this kills the program silently, which is just what you want.

In this hypothetical example, b exits after reading a few lines, which means nobody is reading the pipe. The next time a tries to write more output, it gets sent the broken pipe signal and exits.

In your case a is your program and b is head.

It appears that the OCaml runtime is noticing the broken pipe and, instead of exiting silently, is reporting it as a Sys_error exception. You could consider this a flaw, or maybe it's good to know whenever a broken pipe has terminated your program. The best way to fix it would be to handle the condition yourself and exit silently.
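One possible way to get that silent exit is sketched below. It is only a sketch: it uses the OCaml standard library rather than Core (an assumption that differs from the question's code), and the helper names use_default_sigpipe and print_lines_until_reader_gone are hypothetical. The first helper restores the default SIGPIPE action so the process is terminated quietly; the second treats a Sys_error from a failed write as a sign that the reader has gone away.

```ocaml
(* Sketch using only the OCaml standard library; names are illustrative. *)

(* Restore the default SIGPIPE action. With it, a write to a pipe that has
   no reader terminates the process without an OCaml exception message. *)
let use_default_sigpipe () =
  Sys.set_signal Sys.sigpipe Sys.Signal_default

(* Alternative: write lines until a write fails with Sys_error, which is how
   a broken pipe shows up when SIGPIPE does not kill the process.
   Returns true if everything was written and flushed, false otherwise.
   Simplification: every Sys_error is treated as "reader gone". *)
let print_lines_until_reader_gone oc lines =
  try
    List.iter (fun line -> output_string oc line; output_char oc '\n') lines;
    flush oc;
    true
  with Sys_error _ -> false
```

In a program like the one in the question, use_default_sigpipe () would be called once at startup, before any output. With the second helper, the caller could call exit 0 when it returns false.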

The reason it doesn't happen for the small file is that the whole output fits into the pipe. (A pipe has a buffer of 64K bytes or so.) Your program just writes its data into that buffer and exits, so it never gets to the point of writing to a pipe with no reader.
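As a rough check of the sizes involved, here is a back-of-the-envelope calculation in OCaml. It assumes every printed line is about 85 bytes, like the two sample lines shown in the question, and that the pipe buffer is 64 KiB. Both numbers are assumptions, and output_bytes and fits_in_pipe are hypothetical helpers.

```ocaml
let pipe_capacity = 64 * 1024   (* assumed pipe buffer size, in bytes *)
let bytes_per_line = 85         (* assumed; matches the sample output lines *)

(* The program skips lines with index 0..9, so it prints (total - 10) lines. *)
let output_bytes total_lines = max 0 (total_lines - 10) * bytes_per_line

let fits_in_pipe total_lines = output_bytes total_lines <= pipe_capacity

let () =
  Printf.printf "small: %d bytes, fits: %b\n"
    (output_bytes 50) (fits_in_pipe 50);
  Printf.printf "large: %d bytes, fits: %b\n"
    (output_bytes 214745) (fits_in_pipe 214745)
(* prints:
   small: 3400 bytes, fits: true
   large: 18252475 bytes, fits: false *)
```

Under those assumptions the 50-line file produces about 3.4 KB of output, well under the buffer size, while the full file produces about 18 MB, so the writer is still producing output long after head has exited.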

Licensed under: CC-BY-SA with attribution