How to find all referenced files in Ruby

Answer

There's some fuzziness here about what 'used' means. Clearly d.rb is used, since b.rb (which is also used) calls D.new at the end. If we restrict 'used' to mean "code from that file was executed, other than during the require process", then the following code is as close as I can get on Ruby 1.9.3:

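require 'set'

# Returns the files that had code executed while +filename+ ran,
# ignoring anything executed inside a call to Kernel#require.
def analyze(filename)
  require_depth = 0
  files = Set.new
  set_trace_func(lambda do |event, file, line, id, binding, classname|
    case event
    # RubyGems replaces Kernel#require with a Ruby method ('call'/'return');
    # without RubyGems, require is a C function ('c-call'/'c-return').
    when 'call', 'c-call'
      require_depth += 1 if id == :require && classname == Kernel
    when 'return', 'c-return'
      require_depth -= 1 if id == :require && classname == Kernel
    when 'line'
      # Only record lines executed outside of any require.
      files << file if require_depth == 0
    end
  end)
  load filename
  set_trace_func nil
  # Drop this file and the RubyGems code that patches require.
  files.reject { |f| f == __FILE__ || f =~ %r{/lib/ruby/site_ruby} }
end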

You'd use it by running analyze 'a.rb' (assuming that all the files involved are on the load path). It uses Ruby's set_trace_func to listen to what the interpreter is doing. The first part is a crude attempt to ignore everything that happens during a call to require. Then we accumulate the filename of every line of Ruby that gets executed. The last line just clears up junk (e.g. the RubyGems file that patches require).
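The require-depth idea can be tried in isolation with TracePoint, which replaced set_trace_func in Ruby 2.0; it is swapped in here because set_trace_func is obsolete in current Ruby. This is a minimal sketch rather than the code above: it assumes a hypothetical fixture file dep_fixture.rb (defining a hypothetical DepFixture module) written to a temporary directory that is put on $LOAD_PATH. It counts both the Ruby-level require that RubyGems installs and the C-level one, so the depth should stay balanced either way, and it keeps only lines executed while the depth is zero.

```ruby
require 'tmpdir'

# Hypothetical fixture standing in for one of the question's files.
# Lines 1-3 run while it is being required; line 4 runs only when
# DepFixture.answer is called later.
DEP_SOURCE = <<~'RUBY'
  module DepFixture
    VALUE = 21
    def self.answer
      VALUE * 2
    end
  end
RUBY

# True for events belonging to Kernel#require, whether it is the Ruby
# method RubyGems installs or the original C function.
def require_event?(tp)
  tp.method_id == :require && tp.defined_class == Kernel
end

# Returns [path, lineno] for every line executed outside any require.
def lines_outside_require
  depth = 0
  seen = []
  trace = TracePoint.new(:call, :c_call, :return, :c_return, :line) do |tp|
    case tp.event
    when :call, :c_call then depth += 1 if require_event?(tp)
    when :return, :c_return then depth -= 1 if require_event?(tp)
    when :line then seen << [tp.path, tp.lineno] if depth.zero?
    end
  end
  trace.enable { yield }
  seen
end

fixture_lines = Dir.mktmpdir do |dir|
  File.write(File.join(dir, 'dep_fixture.rb'), DEP_SOURCE)
  $LOAD_PATH.unshift(dir)
  seen = lines_outside_require do
    require 'dep_fixture'
    DepFixture.answer
  end
  $LOAD_PATH.delete(dir)
  # Keep only the line numbers that belong to the fixture file.
  seen.select { |path, _| File.basename(path) == 'dep_fixture.rb' }.map(&:last)
end

p fixture_lines
```

Lines 1 to 3 of the fixture run while it is being required and are skipped; line 4 runs only when DepFixture.answer is called afterwards, so it should be the only fixture line left in the result.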

This doesn't actually work for the test example: when B.new runs, no lines of code from b.rb are executed. However, if B (and C, D, etc.) have initialize methods (or some other line of code that gets called), then you should get the desired result. It's pretty simplistic and could be fooled in all sorts of ways. In particular, if you call a method on (say) B but the implementation of that method isn't in b.rb (e.g. an accessor defined with attr_accessor), then b.rb isn't logged.
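To see this limitation concretely, the sketch below counts :line events fired in a hypothetical widget.rb fixture standing in for b.rb. It uses TracePoint rather than set_trace_func, and the Widget class and line_events_in helper are illustrative names, not part of the original code. Calling Widget.new on a class with no initialize method, or calling an accessor generated by attr_accessor, should fire no line events in widget.rb; only the hand-written label method does.

```ruby
require 'tmpdir'

# Hypothetical stand-in for b.rb: no initialize method, one accessor
# generated by attr_accessor, and one hand-written method.
WIDGET_SOURCE = <<~'RUBY'
  class Widget
    attr_accessor :size

    def label
      "size=#{@size}"
    end
  end
RUBY

# Counts :line events fired in files named +basename+ while the block runs.
def line_events_in(basename)
  count = 0
  trace = TracePoint.new(:line) do |tp|
    count += 1 if File.basename(tp.path) == basename
  end
  trace.enable { yield }
  count
end

counts = Dir.mktmpdir do |dir|
  path = File.join(dir, 'widget.rb')
  File.write(path, WIDGET_SOURCE)
  load path
  widget = nil
  new_count = line_events_in('widget.rb') { widget = Widget.new }
  accessor_count = line_events_in('widget.rb') { widget.size = 3; widget.size }
  method_count = line_events_in('widget.rb') { widget.label }
  { new: new_count, accessor: accessor_count, method: method_count }
end

p counts
```

Any tracer that relies only on executed lines therefore cannot credit widget.rb for the first two calls.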

You might be able to make better use of the call event, but I don't think much more can be done with set_trace_func.

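If you are using Ruby 2.0 or later, you can use TracePoint, the replacement for set_trace_func. Its semantics are slightly different; in particular, when tracking a method call it's easier to get the object (and hence the class) it was called on. Combined with the :class event, which reports the file in which a class body is opened, that means the following code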

require 'set'

# Like the set_trace_func version, but also credits a file when a method
# is called on a class (or an instance of a class) defined in that file.
def analyze(filename)
  require_depth = 0
  files = Set.new
  classes_to_files = {}
  trace = TracePoint.new(:call, :line, :return, :c_call, :c_return, :class) do |tp|
    case tp.event
    when :class
      # tp.self is the class being opened; remember where it was defined.
      classes_to_files[tp.self] = tp.path
    when :call, :c_call
      if tp.method_id == :require && tp.defined_class == Kernel
        require_depth += 1
      elsif require_depth == 0
        # Credit the file that defined the receiver (a class) or its class.
        if (path = classes_to_files[tp.self] || classes_to_files[tp.self.class])
          files << path
        end
      end
    when :return, :c_return
      # Without :c_return, a C-level require would never be balanced.
      require_depth -= 1 if tp.method_id == :require && tp.defined_class == Kernel
    when :line
      files << tp.path if require_depth == 0
    end
  end

  trace.enable
  load filename
  trace.disable
  files.reject { |f| f == __FILE__ || f =~ %r{/lib/ruby/site_ruby} }
end

does return a, b and c for the test example. It's still subject to the fundamental limitation that it only knows about code that actually gets executed.
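The key addition in the TracePoint version is crediting a file through the receiver of a call rather than through executed lines. The sketch below isolates just that idea, assuming a hypothetical gadget.rb fixture that defines an empty Gadget class; the helper name files_credited_via_receivers is also illustrative. The :class event records which file opened Gadget, and each later :c_call looks up its receiver (or the receiver's class) in that table, so gadget.rb should be reported even though none of its lines run after loading.

```ruby
require 'set'
require 'tmpdir'

# Hypothetical stand-in for a file like c.rb: an empty class, so no line
# of this file ever runs after it has been loaded.
GADGET_SOURCE = <<~'RUBY'
  class Gadget
  end
RUBY

# Collects the files credited via the receivers of C-level method calls
# made while the block runs.
def files_credited_via_receivers
  classes_to_files = {}
  files = Set.new
  trace = TracePoint.new(:class, :c_call) do |tp|
    case tp.event
    when :class
      # tp.self is the class or module whose body is being opened.
      classes_to_files[tp.self] = tp.path
    when :c_call
      # tp.self is the receiver: either a class or one of its instances.
      path = classes_to_files[tp.self] || classes_to_files[tp.self.class]
      files << path if path
    end
  end
  trace.enable { yield }
  files
end

touched_names = Dir.mktmpdir do |dir|
  path = File.join(dir, 'gadget.rb')
  File.write(path, GADGET_SOURCE)
  touched = files_credited_via_receivers do
    load path
    Gadget.new.to_s
  end
  touched.map { |f| File.basename(f) }.sort
end

p touched_names
```

Only C-implemented calls are checked here to keep the sketch small; the full version above also checks Ruby-level :call events.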