I have a text file with many lines. I want to write a simple OCaml program that processes this file line by line and possibly prints each line.
While writing this program, I first created a smaller file with fewer lines so that the program would finish executing faster.
$ wc -l input/master
214745 input/master
$ head -50 input/master > input/small-master
Here is the simple boilerplate program, filter.ml, that I wrote:
open Core.Std;;
open Printf;;
open Core.In_channel;;  (* create, input_line, close *)

if Array.length Sys.argv >= 2 then begin
  (* ix counts lines from 0; only lines with ix > 9 are printed *)
  let rec process_lines ?ix master_file =
    let ix = match ix with
      | None -> 0
      | Some x -> x
    in
    match input_line master_file with
    | Some line -> (
        if ix > 9 then printf "%d == %s\n" ix line;
        process_lines ~ix:(ix+1) master_file
      )
    | None -> close master_file  (* end of file: release the handle *)
  in
  let master_file = create Sys.argv.(1) in
  process_lines master_file
end
It takes the input file's location as a command-line argument, creates a file handle for reading this file, and calls the recursive function process_lines with this file handle as an argument.
process_lines uses the optional argument ix to count the line numbers as it reads from the file handle line by line. For every line whose index ix is greater than 9 (that is, every line after the first ten), it prints the index and the line to standard output.
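For comparison, here is a minimal sketch of the same filter using only the OCaml standard library, assuming OCaml 4.08 or later (for Fun.protect). It is not the program above: it swaps Core's option-returning input_line for the stdlib input_line, which raises End_of_file at the end of input, and gives the optional ix argument a default value instead of matching on None/Some.

```ocaml
(* Sketch of the same filter using only the OCaml standard library.
   Like the original, it numbers lines from 0 and prints only lines
   whose index is greater than 9. *)

(* [?(ix = 0)] gives the optional counter a default value, replacing the
   explicit [match ix with None -> 0 | Some x -> x]. *)
let rec process_lines ?(ix = 0) ic oc =
  match input_line ic with
  | line ->
      if ix > 9 then Printf.fprintf oc "%d == %s\n" ix line;
      (* Tail call: the [exception] branch only guards [input_line],
         so this recursion runs in constant stack space. *)
      process_lines ~ix:(ix + 1) ic oc
  | exception End_of_file -> ()

let () =
  if Array.length Sys.argv >= 2 then begin
    let ic = open_in Sys.argv.(1) in
    (* Close the file even if processing raises. *)
    Fun.protect ~finally:(fun () -> close_in ic)
      (fun () -> process_lines ic stdout)
  end
```

It takes the same single command-line argument as the original and writes to standard output, so it can be piped into head in the same way.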
When I run the program on the smaller input file and pipe its output to the Linux head command, everything works fine:
$ ./filter.native input/small-master |head -2
10 == 1000032|BINCH JAMES G|4|2012-11-13|edgar/data/1000032/0001181431-12-058269.txt
11 == 1000032|BINCH JAMES G|4|2012-12-03|edgar/data/1000032/0001181431-12-061825.txt
But when I run it on the larger file, I get a broken-pipe error:
$ ./filter.native input/master |head -2
10 == 1000032|BINCH JAMES G|4|2012-11-13|edgar/data/1000032/0001181431-12-058269.txt
11 == 1000032|BINCH JAMES G|4|2012-12-03|edgar/data/1000032/0001181431-12-061825.txt
Fatal error: exception Sys_error("Broken pipe")
Raised by primitive operation at file "pervasives.ml", line 264, characters 2-40
Called from file "printf.ml", line 615, characters 15-25
Called from file "find.ml", line 13, characters 21-48
Called from file "find.ml", line 19, characters 2-27
I learned that a broken-pipe error occurs when the reader of a pipe (the head command in this case) exits before the writer of the pipe (my OCaml program in this case) has finished writing. That is why I never get this error when I use the tail command as the reader: tail has to read all of its input before it can print the last lines.
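In the trace above, the broken pipe reached the program as a Sys_error raised by an ordinary write (the frames point into printf.ml). As a hedged sketch, assuming a Unix-like system and using only the standard library, this is one way a filter could treat such a failed write as "the reader has exited" rather than crashing. The names copy_lines and outcome are hypothetical, and treating every write-side Sys_error as a broken pipe is a simplifying assumption.

```ocaml
(* Sketch: a line copier that stops quietly when the reader of its output
   (e.g. `head`) has exited. [copy_lines] and [outcome] are hypothetical
   names used only for illustration. *)
type outcome = Finished | Reader_gone

(* Copy each line of [ic] to [oc]. A [Sys_error] raised while writing or
   flushing [oc] is treated as "the reader has gone away"; this assumes the
   only likely write failure is a broken pipe. Read errors still propagate. *)
let copy_lines ic oc =
  let rec loop () =
    match input_line ic with
    | exception End_of_file -> Finished
    | line ->
        match output_string oc (line ^ "\n") with
        | () -> loop ()
        | exception Sys_error _ -> Reader_gone
  in
  match loop () with
  | Reader_gone -> Reader_gone
  | Finished ->
      (* Flush any output still held in the channel buffer. *)
      (match flush oc with
       | () -> Finished
       | exception Sys_error _ -> Reader_gone)

let () =
  if Array.length Sys.argv >= 2 then begin
    (* By default SIGPIPE terminates the process; ignoring it makes a write
       to a closed pipe fail with EPIPE, which OCaml raises as Sys_error. *)
    Sys.set_signal Sys.sigpipe Sys.Signal_ignore;
    let ic = open_in Sys.argv.(1) in
    ignore (copy_lines ic stdout : outcome);
    close_in ic
  end
```

Returning an outcome instead of exiting directly keeps the decision about what a vanished reader means (success, warning, or error) with the caller.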
However, why doesn't the broken-pipe error occur when the file has fewer lines?