Question

I'm having trouble with a crawler I'm building with rubyXL. It traverses my file system correctly, but I get an Errno::ENOENT error. I've read through the rubyXL code and everything appears to check out. My output and code are below. Any suggestions?

/Users/.../testdata.xlsx
/Users/.../moretestdata.xlsx
/Users/.../Lab 1 Data.xlsx
/Users/Dylan/.rvm/gems/ruby-1.9.3-p327/gems/rubyXL-1.2.10/lib/rubyXL/parser.rb:404:in `initialize': No such file or directory - /Users/Dylan/.../sheet6.xml (Errno::ENOENT)
    from /Users/Dylan/.rvm/gems/ruby-1.9.3-p327/gems/rubyXL-1.2.10/lib/rubyXL/parser.rb:404:in `open'
    from /Users/Dylan/.rvm/gems/ruby-1.9.3-p327/gems/rubyXL-1.2.10/lib/rubyXL/parser.rb:404:in `block in decompress'
    from /Users/Dylan/.rvm/gems/ruby-1.9.3-p327/gems/rubyXL-1.2.10/lib/rubyXL/parser.rb:402:in `upto'
    from /Users/Dylan/.rvm/gems/ruby-1.9.3-p327/gems/rubyXL-1.2.10/lib/rubyXL/parser.rb:402:in `decompress'
    from /Users/Dylan/.rvm/gems/ruby-1.9.3-p327/gems/rubyXL-1.2.10/lib/rubyXL/parser.rb:47:in `parse'
    from xlcrawler.rb:9:in `block in xlcrawler'
    from /Users/Dylan/.rvm/rubies/ruby-1.9.3-p327/lib/ruby/1.9.1/find.rb:41:in `block in find'
    from /Users/Dylan/.rvm/rubies/ruby-1.9.3-p327/lib/ruby/1.9.1/find.rb:40:in `catch'
    from /Users/Dylan/.rvm/rubies/ruby-1.9.3-p327/lib/ruby/1.9.1/find.rb:40:in `find'
    from xlcrawler.rb:6:in `xlcrawler'
    from xlcrawler.rb:22:in `<main>'

require 'find'
require 'rubyXL'

def xlcrawler(path)
  count = 0
  Find.find(path) do |file|                                # begin iteration of each file of a specified directory
    if file =~ /\.xlsx\z/                                  # check if a given file is xlsx format (literal dot, anchored at end of string)
      puts file                                            # ensure crawler is traversing the file system
      workbook = RubyXL::Parser.parse(file).worksheets     # creates an object containing all worksheets of an excel workbook
      workbook.each do |worksheet|                         # begin iteration over each worksheet
        data = worksheet.extract_data.to_s                 # extract data of a given worksheet - must be converted to a string in order to match a regex
        if data =~ /regex/
          puts file
          count += 1
        end      
      end
    end
  end
  puts "#{count} files were found"
end

xlcrawler('/Users/')

Solution

I did some digging through the rubyXL code on GitHub, and it looks like there is a bug in the decompress method:

  files['styles'] = Nokogiri::XML.parse(File.open(File.join(dir_path,'xl','styles.xml'),'r'))
  @num_sheets = files['workbook'].css('sheets').children.size
  @num_sheets = Integer(@num_sheets)

  #adds all worksheet xml files to files hash
  i=1
  1.upto(@num_sheets) do
    filename = 'sheet'+i.to_s # <----- BUG IS HERE
    files[i] = Nokogiri::XML.parse(File.open(File.join(dir_path,'xl','worksheets',filename+'.xml'),'r'))
    i=i+1
  end

This block of code assumes that Excel numbers its worksheet files consecutively, which is not true. The code simply counts the sheets and then opens sheet1.xml through sheetN.xml. However, if you delete a sheet and then create a new one, the numerical sequence is broken.
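One way around the numbering assumption is to list the worksheet files that actually exist instead of counting upward. The following is only a minimal sketch, not rubyXL's own fix: `worksheet_files` is a hypothetical helper, and the temporary directory merely imitates the unpacked `xl/worksheets` layout seen in the snippet above. The authoritative mapping from sheets to files lives in the workbook's relationship part (`xl/_rels/workbook.xml.rels`), which this sketch does not read.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical helper (not part of rubyXL): returns the worksheet XML files
# that are really present under dir_path/xl/worksheets, ordered by the number
# embedded in the file name, so gaps in the numbering are harmless.
def worksheet_files(dir_path)
  pattern = File.join(dir_path, 'xl', 'worksheets', 'sheet*.xml')
  Dir.glob(pattern).sort_by { |path| File.basename(path)[/\d+/].to_i }
end

# Demonstration with a fake unpacked workbook in which sheet6.xml is missing.
Dir.mktmpdir do |dir|
  worksheets_dir = File.join(dir, 'xl', 'worksheets')
  FileUtils.mkdir_p(worksheets_dir)
  [1, 2, 3, 4, 5, 7].each do |n|
    FileUtils.touch(File.join(worksheets_dir, "sheet#{n}.xml"))
  end

  puts worksheet_files(dir).map { |path| File.basename(path) }.join(', ')
  # prints: sheet1.xml, sheet2.xml, sheet3.xml, sheet4.xml, sheet5.xml, sheet7.xml
end
```

Each returned path could then be opened in place of the computed `'sheet' + i.to_s` name, so a missing sheet6.xml is never requested.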

If you check your Lab 1 Data.xlsx file, you will see that there is no sheet6. If you pull up the VBA editor (by pressing Alt+F11), you should see something like this:

[Screenshot: sheet list]

As you can see, this arrangement defeats the loop: when i = 6 it tries to open sheet6.xml, which does not exist, and that raises the Errno::ENOENT exception.
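Until the library itself is fixed, the crawler could skip workbooks that trigger this error rather than abort the whole run. This is a sketch of a workaround, not a fix for the underlying bug: `parse_or_skip` is a hypothetical helper name, and the only rubyXL call assumed is the `RubyXL::Parser.parse(file)` call already used in the question.

```ruby
# Hypothetical helper: runs the given block for one file and returns its
# result, or nil (after printing a warning) if the block raises Errno::ENOENT.
def parse_or_skip(file)
  yield file
rescue Errno::ENOENT => e
  warn "Skipping #{file}: #{e.message}"
  nil
end

# Intended use inside the Find.find block from the question (left as a
# comment so this sketch runs without rubyXL installed):
#
#   book = parse_or_skip(file) { |f| RubyXL::Parser.parse(f) }
#   next if book.nil?
#   book.worksheets.each { |worksheet| ... }
```

Workbooks with gaps in their sheet numbering are then reported and skipped, so their contents are not searched.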

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow