Question

I'm attempting to read in a file via ARGF and perform some operations on it. Before doing anything else I need to read in a specific line and validate its contents.

I'm calling my script like so:

./main.rb < input.txt

I'm attempting to access the specific line (let's say line 10) like so:

if __FILE__ == $0

    ARGF.lineno= 10
    puts "lineno: #{ARGF.lineno}" # Prints 10 (as expected)
    puts "readline: #{ARGF.readline}" # Prints contents of line 0 instead of 10!

end

I am able to manually set ARGF.lineno= per the docs, and this seems to work. However when I then attempt to read the line I just set, I get the contents of line 0. What, if anything, am I doing wrong?

Note that looping through the lines in order to get to the given line is not an option: my input data may be hundreds of thousands of lines long.

Thanks in advance for any help.


Solution

If you look at the source for the lineno= method, you'll see that it doesn't affect the input stream in any way - it just overwrites the automatic line number with the given value. If you want to skip to a certain line you'll need to write your own method.
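
A minimal sketch of that behaviour, assuming a StringIO as a stand-in for the real input (its lineno= behaves the same way as far as the read position is concerned); the sample data is made up for illustration:

```ruby
require "stringio"

# StringIO stands in for the input stream (an assumption for this demo).
io = StringIO.new("first\nsecond\nthird\n")

# Overwrite the line counter only; the read position stays at byte 0.
io.lineno = 10

line = io.gets        # still reads the first line of the data
counter = io.lineno   # the counter simply keeps counting up from 10

puts "read: #{line.inspect}, counter: #{counter}"
```

The counter and the read position are independent: setting one does not move the other.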

Note that files are stored as sequences of bytes, not as lines. To skip to a specific line you need to scan the file for line separators.

For example:

# Consume and discard `num` lines so the next read starts at line num + 1.
def ARGF.skip_lines(num)
  enum = each_line
  num.times { enum.next }
  self
end

I tested this with a 36 MB file of 600,000 lines, and it skipped from the first line to the last in about 1 second.
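
For the validation case in the question, the same scan can return the wanted line directly. This is only a sketch: line_at is a hypothetical helper name, and the StringIO input is sample data standing in for ARGF (passing ARGF itself should work the same way, since it also provides each_line).

```ruby
require "stringio"

# Hypothetical helper: returns the line at the 1-based `line_number`,
# or nil if the input ends before that line is reached.
def line_at(io, line_number)
  io.each_line.with_index(1) do |line, index|
    return line if index == line_number
  end
  nil
end

# Sample data for illustration; in the real script this would be ARGF.
input = StringIO.new((1..20).map { |i| "line #{i}\n" }.join)
puts line_at(input, 10)
```

After the call, the stream is positioned just past the returned line, so later reads continue from the following line.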

If you have control over the input format, you could pad each line to a specific length and then use IO#seek to jump to a certain one. But that has other downsides.
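
A rough sketch of the padded-line idea, assuming every line is padded with spaces to a fixed width and ends in a single "\n". RECORD_WIDTH, RECORD_SIZE and read_fixed_line are hypothetical names chosen for illustration, and the temporary file is generated sample data:

```ruby
require "tempfile"

RECORD_WIDTH = 16               # assumed padded width of each line, in bytes
RECORD_SIZE  = RECORD_WIDTH + 1 # padded text plus the trailing newline

# Jump straight to the 1-based `line_number` without reading earlier lines.
def read_fixed_line(io, line_number)
  io.seek((line_number - 1) * RECORD_SIZE, IO::SEEK_SET)
  io.readline.rstrip # drop the padding and the newline
end

Tempfile.create("padded") do |f|
  # Write sample lines, each padded to exactly RECORD_WIDTH bytes.
  1.upto(1000) { |i| f.write("row #{i}".ljust(RECORD_WIDTH) + "\n") }
  f.flush
  puts read_fixed_line(f, 10)
end
```

This only works on seekable input such as a regular file, and only if no line exceeds the padded width and all characters are single-byte.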

OTHER TIPS

You want to use the pos= accessor. According to the docs, lineno= only sets the line counter; it does not move the read position.

pos= jumps to a byte offset, so this only works if every line has the same fixed length.

When you think about it, this makes sense: the stream can't tell how many bytes are on each line of a file it hasn't read yet.

Licensed under: CC-BY-SA with attribution