Question

I have a rake task that runs great on my local machine, but after deploying my app to a VPS, I can no longer run it.

I run the task using --

RAILS_ENV=production bundle exec rake db:insert_properties

and the output I get is --

(in /home/deployer/apps/nsrosu/releases/20131230151646)
Killed

Anyone have an idea as to why this may be happening? I have double and triple checked that the XML file I am using to pull data for the rake task does exist in the proper directory.

Additionally, I have tried, instead of using a file stored on the server, to pull it from an external source that is stored elsewhere, but nokogiri says that the file does not exist when I try it this way. A solution to either one of these problems would be excellent :)

Also, here is the rake task, in case that will help answer any questions --

# SET RAKE TASK NAMESPACE
namespace :db do
# RAKE TASK DESCRIPTION
desc "Fetch property information and insert it into the database"

# RAKE TASK NAME    
task :insert_properties => :environment do

    # REQUIRE LIBRARIES
    require 'nokogiri'
    require 'open-uri'

    # OPEN THE XML FILE
    mits_feed = File.open("app/assets/xml/mits.xml")

    # OUTPUT THE XML DOCUMENT
    doc = Nokogiri::XML(mits_feed)

    # FIND PROPERTIES OWNED BY NORTHSTEPPE AND CYCLE THROUGH THEM
    doc.xpath("//Property[PropertyID/Identification/@OrganizationName = 'northsteppe' ]").each do |property|

        # SET UP EMPTY IMAGES ARRAY
        @images =[]

        # INSERT EACH IMAGE INTO THE IMAGES ARRAY
        property.xpath("File").each do |image|
            @images << image.at_xpath("Src/text()").to_s
        end

        # SET UP EMPTY AMENITIES ARRAY
        @amenities = []

        # INSERT EACH AMENITY DESCRIPTION INTO THE AMENITIES ARRAY
        property.xpath("ILS_Unit/Amenity").each do |image|
            @amenities << image.at_xpath("Description/text()").to_s
        end

        # GATHER EACH PROPERTY'S INFORMATION
        information = {
            "street_address" => property.at_xpath("PropertyID/Address/AddressLine1/text()").to_s,
            "city" => property.at_xpath("PropertyID/Address/City/text()").to_s,
            "zipcode" => property.at_xpath("PropertyID/Address/PostalCode/text()").to_s,
            "short_description" => property.at_xpath("PropertyID/MarketingName/text()").to_s,
            "long_description" => property.at_xpath("Information/LongDescription/text()").to_s,
            "rent" => property.at_xpath("Information/Rents/StandardRent/text()").to_s,
            "application_fee" => property.at_xpath("Fee/ApplicationFee/text()").to_s,
            "bedrooms" => property.at_xpath("Floorplan/Room[@RoomType='Bedroom']/Count/text()").to_s,
            "bathrooms" => property.at_xpath("Floorplan/Room[@RoomType='Bathroom']/Count/text()").to_s,
            "vacancy_status" => property.at_xpath("ILS_Unit/Availability/VacancyClass/text()").to_s,
            "month_available" => property.at_xpath("ILS_Unit/Availability/MadeReadyDate/@Month").to_s,
            "latitude" => property.at_xpath("ILS_Identification/Latitude/text()").to_s,
            "longitude" => property.at_xpath("ILS_Identification/Longitude/text()").to_s,
            "images" => @images,
            "amenities" => @amenities
        }

        # SHOW RAW DATA IN TERMINAL TO MAKE SURE EVERYTHING IS WORKING
        p information


        # CREATE NEW PROPERTY WITH INFORMATION HASH CREATED ABOVE
        if Property.create!(information)
            puts "yay!"
        else
            puts "oh no! this sucks!"
        end

    end # ENDS XPATH EACH LOOP

end # ENDS INSERT_PROPERTIES RAKE TASK

end # ENDS NAMESPACE DECLARATION

================================ UPDATE =================================

So it seems that the best approach is to run this through a SAX parser, and SAXMachine is all set to work with Nokogiri, but the documentation for both of these technologies is pretty horrible. I was hoping to get some direction on how to set up a task that does the same thing as my task above, but using SAXMachine. Please :)

I've posted an example of one of the XML entries below --

<Property IDValue="642da00e-9be3-4a7c-bd50-66a4f0d70af8">
<PropertyID>
  <Identification IDValue="642da00e-9be3-4a7c-bd50-66a4f0d70af8" OrganizationName="northsteppe" IDType="property"/>
  <Identification IDValue="6e1e61523972d5f0e260e3d38eb488337424f21e" OrganizationName="northsteppe" IDType="Company"/>
  <MarketingName>Spacious House Central Campus OSU, available fall</MarketingName>
  <WebSite>http://northsteppe.appfolio.com/listings/listings/642da00e-9be3-4a7c-bd50-66a4f0d70af8</WebSite>
  <Address AddressType="property">
    <Description>Address of Available Listing</Description>
    <AddressLine1>1689 N 4th St </AddressLine1>
    <City>Columbus</City>
    <State>OH</State>
    <PostalCode>43201</PostalCode>
    <Country>US</Country>
  </Address>
  <Phone PhoneType="office">
    <PhoneNumber>(614) 299-4110</PhoneNumber>
  </Phone>
  <Email>northsteppe.nsr@gmail.com</Email>
</PropertyID>
<ILS_Identification ILS_IdentificationType="Apartment" RentalType="Market Rate">
  <Latitude>39.997694</Latitude>
  <Longitude>-82.99903</Longitude>
  <LastUpdate Month="11" Day="11" Year="2013"/>
</ILS_Identification>
<Information>
  <StructureType>Standard</StructureType>
  <UnitCount>1</UnitCount>
  <ShortDescription>Spacious House Central Campus OSU, available fall</ShortDescription>
  <LongDescription>One of our favorites! This great house is perfect for students or a single family. With huge living and sleeping rooms, there is plenty of space. The kitchen is totally modernized with new appliances, and the bathroom has been updated. Natural woodwork and brick accents are seen within the house, and the decorative mantles. Ceiling fans and mini-blinds are included, as well as a FREE stack washer and dryer. The front and side deck. On site parking available.</LongDescription>
  <Rents>
    <StandardRent>2000.00</StandardRent>
  </Rents>
  <PropertyAvailabilityURL>http://northsteppe.appfolio.com/listings/listings/642da00e-9be3-4a7c-bd50-66a4f0d70af8</PropertyAvailabilityURL>
</Information>
<Fee>
  <ProrateType>Standard</ProrateType>
  <LateType>Standard</LateType>
  <LatePercent>0</LatePercent>
  <LateMinFee>0</LateMinFee>
  <LateFeePerDay>0</LateFeePerDay>
  <NonRefundableHoldFee>0</NonRefundableHoldFee>
  <AdminFee>0</AdminFee>
  <ApplicationFee>30.00</ApplicationFee>
  <BrokerFee>0</BrokerFee>
</Fee>
<Deposit DepositType="Security Deposit">
  <Amount AmountType="Actual">
    <ValueRange Exact="2000.00" Currency="USD"/>
  </Amount>
</Deposit>
<Policy>
  <Pet Allowed="false"/>
</Policy>
<Phase IDValue="642da00e-9be3-4a7c-bd50-66a4f0d70af8">
  <Name/>
  <Description/>
  <UnitCount>1</UnitCount>
  <RentableUnits>1</RentableUnits>
  <TotalSquareFeet>0</TotalSquareFeet>
  <RentableSquareFeet>0</RentableSquareFeet>
</Phase>
<Building IDValue="642da00e-9be3-4a7c-bd50-66a4f0d70af8">
  <Name/>
  <Description/>
  <UnitCount>1</UnitCount>
  <SquareFeet>0</SquareFeet>
</Building>
<Floorplan IDValue="642da00e-9be3-4a7c-bd50-66a4f0d70af8">
  <Name/>
  <UnitCount>1</UnitCount>
  <Room RoomType="Bedroom">
    <Count>4</Count>
    <Comment/>
  </Room>
  <Room RoomType="Bathroom">
    <Count>1</Count>
    <Comment/>
  </Room>
  <SquareFeet Min="0" Max="0"/>
  <MarketRent Min="2000" Max="2000"/>
  <EffectiveRent Min="2000" Max="2000"/>
</Floorplan>
<ILS_Unit IDValue="642da00e-9be3-4a7c-bd50-66a4f0d70af8">
  <Units>
    <Unit>
      <Identification IDValue="642da00e-9be3-4a7c-bd50-66a4f0d70af8" OrganizationName="UL Portfolio"/>
      <MarketingName>Spacious House Central Campus OSU, available fall</MarketingName>
      <UnitBedrooms>4</UnitBedrooms>
      <UnitBathrooms>1.0</UnitBathrooms>
      <MinSquareFeet>0</MinSquareFeet>
      <MaxSquareFeet>0</MaxSquareFeet>
      <SquareFootType>internal</SquareFootType>
      <UnitRent>2000.00</UnitRent>
      <MarketRent>2000.00</MarketRent>
      <Address AddressType="property">
        <AddressLine1>1689 N 4th St </AddressLine1>
        <City>Columbus</City>
        <PostalCode>43201</PostalCode>
        <Country>US</Country>
      </Address>
    </Unit>
  </Units>
  <Availability>
    <VacateDate Month="7" Day="23" Year="2014"/>
    <VacancyClass>Unoccupied</VacancyClass>
    <MadeReadyDate Month="7" Day="23" Year="2014"/>
  </Availability>
  <Amenity AmenityType="Other">
    <Description>All new stainless steel appliances!  Refinished hardwood floors</Description>
  </Amenity>
  <Amenity AmenityType="Other">
    <Description>Ceramic tile</Description>
  </Amenity>
  <Amenity AmenityType="Other">
    <Description>Ceiling fans</Description>
  </Amenity>
  <Amenity AmenityType="Other">
    <Description>Wrap-around porch</Description>
  </Amenity>
  <Amenity AmenityType="Dryer">
    <Description>Free Washer and Dryer</Description>
  </Amenity>
  <Amenity AmenityType="Washer">
    <Description>Free Washer and Dryer</Description>
  </Amenity>
  <Amenity AmenityType="Other">
    <Description>off-street parking available</Description>
  </Amenity>
</ILS_Unit>
<File Active="true" FileID="820982141">
  <FileType>Photo</FileType>
  <Description>Unit Photo</Description>
  <Name/>
  <Caption/>
  <Format>image/jpeg</Format>
  <Src>http://pa.cdn.appfolio.com/northsteppe/images/31077069-6e81-4373-8a89-508c57585543/medium.jpg</Src>
  <Width>360</Width>
  <Height>300</Height>
  <Rank>1</Rank>
</File>
<File Active="true" FileID="820982145">
  <FileType>Photo</FileType>
  <Description>Unit Photo</Description>
  <Name/>
  <Caption/>
  <Format>image/jpeg</Format>
  <Src>http://pa.cdn.appfolio.com/northsteppe/images/84e1be40-96fd-4717-b75d-09b39231a762/medium.jpg</Src>
  <Width>350</Width>
  <Height>265</Height>
  <Rank>2</Rank>
</File>
<File Active="true" FileID="820982149">
  <FileType>Photo</FileType>
  <Description>Unit Photo</Description>
  <Name/>
  <Caption/>
  <Format>image/jpeg</Format>
  <Src>http://pa.cdn.appfolio.com/northsteppe/images/cd419635-c37f-4676-a43e-c72671a2a748/medium.jpg</Src>
  <Width>350</Width>
  <Height>265</Height>
  <Rank>3</Rank>
</File>
<File Active="true" FileID="820982152">
  <FileType>Photo</FileType>
  <Description>Unit Photo</Description>
  <Name/>
  <Caption/>
  <Format>image/jpeg</Format>
  <Src>http://pa.cdn.appfolio.com/northsteppe/images/6b68dbd5-2cde-477c-99d7-3ca33f03cce8/medium.jpg</Src>
  <Width>350</Width>
  <Height>265</Height>
  <Rank>4</Rank>
</File>
<File Active="true" FileID="820982155">
  <FileType>Photo</FileType>
  <Description>Unit Photo</Description>
  <Name/>
  <Caption/>
  <Format>image/jpeg</Format>
  <Src>http://pa.cdn.appfolio.com/northsteppe/images/17b6c7c0-686c-4e46-865b-11d80744354a/medium.jpg</Src>
  <Width>350</Width>
  <Height>265</Height>
  <Rank>5</Rank>
</File>
<File Active="true" FileID="820982157">
  <FileType>Photo</FileType>
  <Description>Unit Photo</Description>
  <Name/>
  <Caption/>
  <Format>image/jpeg</Format>
  <Src>http://pa.cdn.appfolio.com/northsteppe/images/3545ac8b-471f-404a-94b2-fcd00dd16e25/medium.jpg</Src>
  <Width>350</Width>
  <Height>265</Height>
  <Rank>6</Rank>
</File>
<File Active="true" FileID="820982160">
  <FileType>Photo</FileType>
  <Description>Unit Photo</Description>
  <Name/>
  <Caption/>
  <Format>image/jpeg</Format>
  <Src>http://pa.cdn.appfolio.com/northsteppe/images/02471172-2183-4bf1-a3d7-33415f902c1c/medium.jpg</Src>
  <Width>350</Width>
  <Height>265</Height>
  <Rank>7</Rank>
</File>

Solution 2

Here's a start for a conversion; it should be enough to get you going. It's untested, and it's been a long time since I've written SAX code, so beware.

The first part is a clean-up of your original code to make it more like I'd write DOM code:

require 'nokogiri'
require 'open-uri'

# doc = Nokogiri::XML(File.open("app/assets/xml/mits.xml"))

# doc.xpath("//Property[PropertyID/Identification/@OrganizationName = 'northsteppe']").each do |property|

#   images = property.xpath("File").map { |image|
#     image.at_xpath("Src/text()").to_s 
#   }

#   amenities = property.xpath("ILS_Unit/Amenity").map { |image|
#     image.at_xpath("Description/text()").to_s 
#   }

#   information = {
#     "street_address"    => property.at_xpath("PropertyID/Address/AddressLine1/text()").to_s,
#     "city"              => property.at_xpath("PropertyID/Address/City/text()").to_s,
#     "zipcode"           => property.at_xpath("PropertyID/Address/PostalCode/text()").to_s,
#     "short_description" => property.at_xpath("PropertyID/MarketingName/text()").to_s,
#     "long_description"  => property.at_xpath("Information/LongDescription/text()").to_s,
#     "rent"              => property.at_xpath("Information/Rents/StandardRent/text()").to_s,
#     "application_fee"   => property.at_xpath("Fee/ApplicationFee/text()").to_s,
#     "bedrooms"          => property.at_xpath("Floorplan/Room[@RoomType='Bedroom']/Count/text()").to_s,
#     "bathrooms"         => property.at_xpath("Floorplan/Room[@RoomType='Bathroom']/Count/text()").to_s,
#     "vacancy_status"    => property.at_xpath("ILS_Unit/Availability/VacancyClass/text()").to_s,
#     "month_available"   => property.at_xpath("ILS_Unit/Availability/MadeReadyDate/@Month").to_s,
#     "latitude"          => property.at_xpath("ILS_Identification/Latitude/text()").to_s,
#     "longitude"         => property.at_xpath("ILS_Identification/Longitude/text()").to_s,
#     "images"            => images,
#     "amenities"         => amenities
#   }

#   p information


#   if Property.create!(information)
#     puts "yay!"
#   else
#     puts "oh no! this sucks!"
#   end

# end

This is the start of SAX code:

class MitsDocument < Nokogiri::XML::SAX::Document

I define some class variables to keep track of the images and amenities:

  @@images = []
  @@amenities = []

Each time Nokogiri descends into a tag it calls start_element:

  def start_element(tag_name, attributes=[])

    # Nokogiri passes attributes as an array of [name, value] pairs
    tag_attributes = Hash[attributes]

    # set up some flags to track the current state...
    @in_property                 = true if (tag_name == 'Property')
    @in_property_id              = true if (tag_name == 'PropertyID')

    @in_identification           = true if (tag_name == 'Identification')
    @organization_is_northsteppe = true if (tag_attributes['OrganizationName'] == 'northsteppe')

    @in_file                     = true if (tag_name == 'File')
    @in_source                   = true if (tag_name == 'Src')

    @in_ils_unit                 = true if (tag_name == 'ILS_Unit')
    @in_amentiy                  = true if (tag_name == 'Amenity')
    @in_description              = true if (tag_name == 'Description')

  end

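When a text node is encountered, characters gets called. Note that it can be called more than once for a single run of text, so long values may arrive in pieces. If Nokogiri has descended far enough, which we can check by testing for certain flag combinations, the text will be pushed onto the appropriate array: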

  def characters(str)
    if [@in_file, @in_source].all?
      @@images << str
    end

    if [@in_ils_unit, @in_amentiy, @in_description].all?
      @@amenities << str
    end
  end

When Nokogiri exits a node it calls end_element with the name of the tag:

  def end_element(tag_name)
    @in_property                 = false if (tag_name == 'Property')
    @in_property_id              = false if (tag_name == 'PropertyID')

    @in_identification           = false if (tag_name == 'Identification')
    @organization_is_northsteppe = false if (tag_name == 'Identification')

If Nokogiri is ready to exit a particular tag, it's time to do something with the aggregated results of its sub-tags. This is how to deal with the class variables being tracked:

    if (tag_name == 'File')

      # do something with @@images

      @in_file = false 
    end
    @in_source = false if (tag_name == 'Src')

    if (tag_name == 'ILS_Unit')

      # do something with @@amenities

      @in_ils_unit = false 
    end
    @in_amentiy     = false if (tag_name == 'Amenity')
    @in_description = false if (tag_name == 'Description')

  end

You'd clean up DB connections, or files, or wherever you're storing your content, when the end of the document is reached:

  def end_document
  end
end

parser = Nokogiri::XML::SAX::Parser.new(MitsDocument.new)

# Feed the parser some XML
parser.parse(File.open("app/assets/xml/mits.xml"))

It's late, and I'm tired, so that might not be right, but it looks like the beginnings. You'll need to add code to process tracking the tags in your information hash, but that will be similar to what's above. I'd also probably switch to using case/when statements instead of lists of if statements, to try to make the set/clear of flags a bit more clean, but, like I said, I'm tired so I won't bother right now.
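
As a rough illustration of the case/when idea and of filling the information hash, here is a minimal sketch that replaces the individual flags with a stack of tag names. PropertyCollector and its open_tag/text/close_tag methods are hypothetical names, not part of Nokogiri. The assumption is that a Nokogiri::XML::SAX::Document subclass would forward start_element(name, attrs) as open_tag(name, Hash[attrs]), characters(str) as text(str), and end_element(name) as close_tag(name). Only some of the fields from the question are mapped.

```ruby
# Hypothetical state tracker for SAX-style events (plain Ruby, no Nokogiri).
class PropertyCollector
  # Path below <Property> => key in the information hash.
  TEXT_FIELDS = {
    'PropertyID/Address/AddressLine1'    => 'street_address',
    'PropertyID/Address/City'            => 'city',
    'PropertyID/Address/PostalCode'      => 'zipcode',
    'PropertyID/MarketingName'           => 'short_description',
    'Information/LongDescription'        => 'long_description',
    'Information/Rents/StandardRent'     => 'rent',
    'Fee/ApplicationFee'                 => 'application_fee',
    'ILS_Unit/Availability/VacancyClass' => 'vacancy_status'
  }.freeze

  def initialize(&on_property)
    @on_property = on_property # called once per finished northsteppe property
    @info = nil                # nil means "not inside a <Property>"
  end

  def open_tag(name, attrs = {})
    if @info.nil?
      return unless name == 'Property'
      @info = { 'images' => [], 'amenities' => [] }
      @path = []
      @northsteppe = false
      @text = String.new
      return
    end
    @path << name
    @text = String.new # start a fresh buffer for this element's text
    case @path.join('/')
    when 'PropertyID/Identification'
      @northsteppe ||= attrs['OrganizationName'] == 'northsteppe'
    when 'ILS_Unit/Availability/MadeReadyDate'
      @info['month_available'] ||= attrs['Month']
    when 'Floorplan/Room'
      @room_type = attrs['RoomType']
    end
  end

  # Text can arrive in several pieces, so append instead of assigning.
  def text(str)
    @text << str if @info
  end

  def close_tag(_name)
    return unless @info
    if @path.empty? # this is the closing </Property>
      @on_property.call(@info) if @northsteppe
      @info = nil
      return
    end
    value = @text
    case (path = @path.join('/'))
    when 'File/Src'                     then @info['images'] << value
    when 'ILS_Unit/Amenity/Description' then @info['amenities'] << value
    when 'Floorplan/Room/Count'
      @info['bedrooms']  ||= value if @room_type == 'Bedroom'
      @info['bathrooms'] ||= value if @room_type == 'Bathroom'
    else
      key = TEXT_FIELDS[path]
      @info[key] ||= value if key # keep the first match, like at_xpath
    end
    @path.pop
    @text = String.new
  end
end
```

The block passed to PropertyCollector.new is where something like Property.create!(info) would go, so only one property's data is held in memory at a time.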

On "real iron" vs. working on a virtual machine, you'd possibly be able to get enough RAM added to it to handle loading a 7M+ line XML file. Without the whole file I can't begin to guess how much RAM that'd take up in real life, but that's somewhat beside the point. SAX is designed to handle files of arbitrary size, since SAX processing really is breaking down the overall XML into smaller chunks you can more easily process.
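
To make the "smaller chunks" idea a bit more concrete, one possible approach is to hand each finished record to a small buffer that writes in batches, so memory use stays roughly constant regardless of file size. This is only a sketch: BatchWriter and batch_size are hypothetical names, and the block stands in for whatever actually stores the records.

```ruby
# Hypothetical helper: buffers records and hands them to a block in batches.
class BatchWriter
  attr_reader :written

  def initialize(batch_size = 100, &sink)
    raise ArgumentError, 'a sink block is required' unless sink
    @batch_size = batch_size
    @sink = sink
    @buffer = []
    @written = 0
  end

  # Queue one record; flush automatically once the batch is full.
  def <<(record)
    @buffer << record
    flush if @buffer.size >= @batch_size
    self
  end

  # Write out whatever is still buffered (for example from end_document).
  def flush
    return if @buffer.empty?
    @sink.call(@buffer)
    @written += @buffer.size
    @buffer = [] # new array, so the batch just handed over is left intact
  end
end

# Example: collect batches in memory just to show the flow.
batches = []
writer = BatchWriter.new(2) { |batch| batches << batch }
writer << { 'city' => 'Columbus' } << { 'city' => 'Dayton' } << { 'city' => 'Akron' }
writer.flush
p batches.map(&:size)
```

In the rake task, the sink block could wrap Property.create! calls in a transaction; that part depends on the app and is left out here.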

DOM is convenient for most things; a lot of the time we see XML representing a single object, or a small extract from a database. I'm guessing you're dealing with a large, to huge, extract, or maybe even a complete database dump. DOM isn't really the tool to use in that case, but SAX is.

Having the capability in Nokogiri to handle both is the nice thing.

OTHER TIPS

You're probably going over your allotted resource limit for your VPS, and your task is being killed as a result.

Options for improving the memory footprint of the XML reading portion of the rake task include using a SAX or pull parser instead of loading the entire file into memory. Check out "How can I read a large XML file in Ruby with libxml-ruby?" for more details.

Licensed under: CC-BY-SA with attribution