Question
I have a huge file that looks like this:
7
bla1
blala
blabla
blab
blals
blable
bla
more here..
The first number tells how many values follow. The thing is, I just want to jump directly to the line after the values (the text "more here..", line 9 in this example) without having to read all the values first. In my case there is a huge number of values, so it has to be efficient.
What would you recommend?
Solution
You can make a file-like object that reads N from the first line and skips past the N lines that follow:
SkipFile.open("/tmp/frarees") do |ln|
  puts ln # "more here.." and so on
end
puts SkipFile.new("/tmp/frarees").readline # "more here.."
Like so:
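class SkipFile
  # With a block, yields each remaining line and closes the file afterwards;
  # without a block, returns the SkipFile and the caller must call #close.
  def self.open(fn, &block)
    sf = SkipFile.new(fn)
    return sf unless block
    begin
      sf.each(&block)
    ensure
      sf.close
    end
  end

  def initialize(fn)
    @f = File.open(fn)
    skip = @f.readline.to_i    # first line: how many values to skip
    skip.times { @f.readline } # skip them (this could be done lazily)
  end

  def each(&block)
    @f.each(&block)
  end

  def readline
    @f.readline
  end

  def close
    @f.close
  end
end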
This is easy if you only want to iterate forward through the lines of a file. It becomes arduous, however, if you want to mimic the File or IO interface exactly (but see Delegate), and especially if you want to support rewinding back to the fake start of your file.
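A minimal sketch of the Delegate route, assuming Ruby's standard `delegate` library: `SimpleDelegator` forwards every IO method to the wrapped `File`, so only `rewind` needs overriding to go back to the fake start. The class name `SkippingFile` is hypothetical, and other position-related methods (`pos`, `seek`, `lineno`) are still forwarded unchanged, so they report real-file values.

```ruby
require "delegate"

# Wraps a File in SimpleDelegator so IO methods (each_line, gets, readline,
# read, eof?, close, ...) are forwarded to it; only rewind is overridden so
# that it respects the "fake" start after the values.
class SkippingFile < SimpleDelegator
  def initialize(path)
    file = File.open(path)
    count = file.readline.to_i    # first line: number of values to skip
    count.times { file.readline } # skip the values themselves
    @start = file.pos             # byte offset of the first line we keep
    super(file)
  end

  # Rewind to the fake start rather than to byte 0 of the real file.
  def rewind
    __getobj__.seek(@start)
    __getobj__.lineno = 0
    0
  end
end
```

Usage would be `SkippingFile.new(path).readline`, and after reading further, `rewind` puts you back on the first line after the values rather than on the count line.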
OTHER TIPS
You could probably use File#seek to randomly access the file.
The problem with that approach is that it seeks to a byte offset, not a line offset. If the file stored, at its start, the byte offset where the list ends, you could seek straight there.
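If you read the same file many times, one way to obtain that byte offset is to scan once, note `IO#pos` after skipping the values, and pass it to `IO#seek` on later opens. This is a sketch assuming the file does not change between runs and that you have somewhere to keep the offset; `data_offset` and `read_after_values` are hypothetical helper names.

```ruby
# Pass 1 (done once): skip the count line and the values, then remember
# the byte offset where the interesting part starts.
def data_offset(path)
  File.open(path) do |f|
    f.readline.to_i.times { f.readline }
    f.pos
  end
end

# Later passes: jump straight to that byte offset, with no line scanning.
def read_after_values(path, offset)
  File.open(path) do |f|
    f.seek(offset) # absolute offset from the start of the file (IO::SEEK_SET)
    f.readline
  end
end
```

The first pass still costs a full scan of the values; the saving only appears on the second and later reads.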
Here's a concise way to do it, though probably not very efficient, since it loads the whole file into memory at once:
lines = File.readlines(file_path)
lines.drop(lines.first.to_i + 1) # skip the count line plus that many values
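For comparison, a streaming sketch that avoids loading the whole file: it reads the count, then lazily drops that many lines with `Enumerator::Lazy#drop` and yields the rest one at a time. The method name `each_line_after_values` is hypothetical.

```ruby
# Streams the lines that follow the values instead of loading the whole
# file the way File.readlines does.
def each_line_after_values(path)
  File.open(path) do |f|
    count = f.readline.to_i # first line: number of values
    f.each_line.lazy.drop(count).each do |line|
      yield line            # lines are read and yielded one at a time
    end
  end
end
```

Without `.lazy`, `drop` would build an array of all remaining lines, which brings back the memory cost.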
I don't think you're going to get any more efficient than this, since you have to read the bytes in the file to figure out what a "line" is.
File.open('./data') do |f|
  f.readline.to_i.times { f.readline } # skip the count line and the N values
  p f.readline                          # first line after the values
end