Question

I have the following recursive function written in Ruby, but it runs too slowly. I am unsure whether this is the correct way to do it, so please suggest how to improve the performance of this code. The total file count, including subdirectories, is 4,535,347.

    def start(directory)
      Dir.foreach(directory) do |file|
        next if file == '.' || file == '..'
        full_file_path = "#{directory}/#{file}"
        if File.directory?(full_file_path)
          start(full_file_path)
        elsif File.file?(full_file_path)
          extract(full_file_path)
        else
          raise "Unexpected input type neither file nor folder"
        end
      end
    end

Solution 2

I don't think there's a way to speed up your start method by much; it does the right thing by walking through your files and processing each one as soon as it encounters it. You can probably simplify it with a single Dir.glob call that takes a block, but it will still be slow. I suspect that this is not where most of the time is spent.
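For what it's worth, the Dir.glob simplification might look roughly like the sketch below. The method name each_file is made up for illustration, and the base: keyword assumes Ruby 2.5 or newer. It still has to visit every entry, so don't expect it to be faster than the original; it is only shorter.

```ruby
# Yield the path of every regular file under +directory+, subdirectories included.
# NOTE: each_file is a hypothetical helper name, not part of the original code.
def each_file(directory)
  # FNM_DOTMATCH makes the pattern match hidden (dot) files as well,
  # which the Dir.foreach version in the question also visits.
  Dir.glob('**/*', File::FNM_DOTMATCH, base: directory) do |relative|
    path = File.join(directory, relative)
    yield path if File.file?(path) # skip directories and other entry types
  end
end

# Usage, assuming the asker's extract method exists:
#   each_file('/some/root') { |path| extract(path) }
```

Unlike the original, this version silently skips entries that are neither files nor directories instead of raising.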

There may well be a way to speed up your extract method, but it's impossible to know without seeing the code.

Another way to speed this up might be to split the work across multiple processes. Since reading and writing is probably what slows you down, the Ruby code in one process could keep executing while another process is waiting on I/O.
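A minimal sketch of the multi-process idea, assuming a platform where fork is available (so not Windows) and assuming the work can be divided by top-level subdirectory. The name process_in_children and the workers: parameter are invented for this example.

```ruby
# Split +directories+ into roughly +workers+ slices and handle each slice
# in its own forked child process. The block runs inside the child.
# NOTE: process_in_children is a hypothetical helper, not an existing API.
def process_in_children(directories, workers: 4)
  return if directories.empty?

  slice_size = (directories.size / workers.to_f).ceil
  pids = directories.each_slice(slice_size).map do |slice|
    fork do
      slice.each { |dir| yield dir }
    end
  end

  # Block until every child has finished.
  pids.each { |pid| Process.wait(pid) }
end

# Usage, assuming the asker's start method exists:
#   subdirs = Dir.children(root).map { |n| File.join(root, n) }
#                .select { |p| File.directory?(p) }
#   process_in_children(subdirs, workers: 4) { |dir| start(dir) }
```

Children do not share memory with the parent, so anything they produce has to go to disk, a pipe, or a database. If one subdirectory holds most of the files, the slices will be unbalanced and the gain will be small.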

OTHER TIPS

With 4.5M files, you might be better off working with a lazy enumerator so that you process only the entries you actually need, rather than generating the whole list of 4.5M entries, returning the entire set, and iterating through all of it.

Here's the example from the docs:

    class Enumerator::Lazy
      def filter_map
        Lazy.new(self) do |yielder, *values|
          result = yield *values
          yielder << result if result
        end
      end
    end

    (1..Float::INFINITY).lazy.filter_map{|i| i*i if i.even?}.first(5)

http://ruby-doc.org/core-2.1.1/Enumerator/Lazy.html

It's not a very good example, btw: the important part is Lazy.new() rather than the fact that Enumerator::Lazy gets monkey-patched. There's a much better example, imho, in this Stack Overflow question:

"What's the best way to return an Enumerator::Lazy when your class doesn't define #each?"

Further reading on the topic:

http://patshaughnessy.net/2013/4/3/ruby-2-0-works-hard-so-you-can-be-lazy
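Applied to the directory walk in the question, a lazy enumerator could look something like the sketch below. The method name lazy_files is made up, and Dir.each_child assumes Ruby 2.5 or newer.

```ruby
# Return a lazy enumerator over every regular file below +directory+.
# Directories are only read as the caller consumes entries.
# NOTE: lazy_files is a hypothetical helper name.
def lazy_files(directory)
  Enumerator.new do |yielder|
    pending = [directory] # directories still waiting to be read
    until pending.empty?
      current = pending.pop
      # Dir.each_child skips "." and ".." for us.
      Dir.each_child(current) do |name|
        path = File.join(current, name)
        if File.directory?(path)
          pending.push(path)
        elsif File.file?(path)
          yielder << path
        end
      end
    end
  end.lazy
end

# Usage: stops walking as soon as ten matches are found.
#   lazy_files('/some/root').select { |p| p.end_with?('.zip') }.first(10)
```

Keep in mind that laziness only pays off if you can stop early or want to avoid holding the full list in memory. If every one of the 4.5M files has to be extracted, the total amount of I/O stays the same.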

Another option you might want to consider is processing the list across multiple threads.
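A rough sketch of the threaded variant, using a Queue as the work list. The name process_with_threads and the thread_count: parameter are invented here, and Queue#close assumes Ruby 2.3 or newer.

```ruby
# Run the given block once per path, spread over a fixed pool of threads.
# NOTE: process_with_threads is a hypothetical helper, not an existing API.
def process_with_threads(paths, thread_count: 8)
  queue = Queue.new
  paths.each { |path| queue << path }
  queue.close # once closed and drained, pop returns nil

  threads = Array.new(thread_count) do
    Thread.new do
      # Each worker pulls paths until the queue is exhausted.
      while (path = queue.pop)
        yield path
      end
    end
  end

  threads.each(&:join)
end

# Usage, assuming the asker's extract method exists and is thread-safe:
#   process_with_threads(all_paths, thread_count: 8) { |path| extract(path) }
```

On MRI, threads only run Ruby code one at a time, so this helps mainly when extract spends most of its time waiting on disk or network I/O. This version also loads every path into the queue up front, which may be too much memory for 4.5M entries.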

Licensed under: CC-BY-SA with attribution