Is seeding data with fixtures dangerous in Ruby on Rails

https://stackoverflow.com/questions/984479

13-09-2019
|

Question

I have fixtures with initial data that needs to reside in my database (countries, regions, carriers, etc.). I have a task rake db:seed that will seed a database.

namespace :db do
  desc "Load seed fixtures (from db/fixtures) into the current environment's database." 
  task :seed => :environment do
    require 'active_record/fixtures'

    Dir.glob(RAILS_ROOT + '/db/fixtures/yamls/*.yml').each do |file|
      Fixtures.create_fixtures('db/fixtures/yamls', File.basename(file, '.*'))
    end
  end
end

I am a bit worried because this task wipes my database clean and loads the initial data. The fact that this is even possible to do more than once on production scares the crap out of me. Is this normal and do I just have to be cautious? Or do people usually protect a task like this in some way?

Solution

Seeding data with fixtures is an extremely bad idea.

Fixtures are not validated and since most Rails developers don't use database constraints this means you can easily get invalid or incomplete data inserted into your production database.

Fixtures also set strange primary key ids by default, which is not necessarily a problem but is annoying to work with.

There are a lot of solutions for this. My personal favorite is a rake task that runs a Ruby script that simply uses ActiveRecord to insert records. This is what Rails 3 will do with db:seed, but you can easily write this yourself.

I complement this with a method I add to ActiveRecord::Base called create_or_update. Using this I can run the seed script multiple times, updating old records instead of throwing an exception.

I wrote an article about these techniques a while back called Loading seed data.

OTHER TIPS

For the first part of your question, yes I'd just put some precaution for running a task like this in production. I put a protection like this in my bootstrapping/seeding task:

task :exit_or_continue_in_production? do
  if Rails.env.production?
    puts "!!!WARNING!!! This task will DESTROY " +
         "your production database and RESET all " +
         "application settings"
    puts "Continue? y/n"
    continue = STDIN.gets.chomp
    unless continue == 'y'
      puts "Exiting..."
      exit! 
    end
  end
end

I have created this gist for some context.

For the second part of the question -- usually you really want two things: a) very easily seeding the database and setting up the application for development, and b) bootstrapping the application on production server (like: inserting admin user, creating folders application depends on, etc).

I'd use fixtures for seeding in development -- everyone from the team then sees the same data in the app and what's in app is consistent with what's in tests. (Usually I wrap rake app:bootstrap, rake app:seed rake gems:install, etc into rake app:install so everyone can work on the app by just cloning the repo and running this one task.)

I'd however never use fixtures for seeding/bootstrapping on production server. Rails' db/seed.rb is really fine for this task, but you can of course put the same logic in your own rake app:seed task, like others pointed out.

Rails 3 will solve this for you using a seed.rb file.

http://github.com/brynary/rails/commit/4932f7b38f72104819022abca0c952ba6f9888cb

We've built up a bunch of best practices that we use for seeding data. We rely heavily on seeding, and we have some unique requirements since we need to seed multi-tenant systems. Here's some best practices we've used:

Fixtures aren't the best solution, but you still should store your seed data in something other than Ruby. Ruby code for storing seed data tends to get repetitive, and storing data in a parseable file means you can write generic code to handle your seeds in a consistent fashion.
If you're going to potentially update seeds, use a marker column named something like code to match your seeds file to your actual data. Never rely on ids being consistent between environments.
Think about how you want to handle updating existing seed data. Is there any potential that users have modified this data? If so, should you maintain the user's information rather than overriding it with seed data?

If you're interested in some of the ways we do seeding, we've packaged them into a gem called SeedOMatic.

How about just deleting the task off your production server once you have seeded the database?

I just had an interesting idea...

what if you created \db\seeds\ and added migration-style files:

file: 200907301234_add_us_states.rb

class AddUsStates < ActiveRecord::Seeds

  def up
    add_to(:states, [
      {:name => 'Wisconsin', :abbreviation => 'WI', :flower => 'someflower'},
      {:name => 'Louisiana', :abbreviation => 'LA', :flower => 'cypress tree'}
      ]
    end
  end

  def down
    remove_from(:states).based_on(:name).with_values('Wisconsin', 'Louisiana', ...)
  end
end

alternately:

  def up
    State.create!( :name => ... )
  end

This would allow you to run migrations and seeds in an order that would allow them to coexist more peaceably.

thoughts?

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow