How to Apply What You've Learned from TDD to Writing Data Migrations

Joe Ferris

After doing TDD full time for years, I have a hard time writing code without a test. One example that I find particularly difficult is writing data migrations.

Some schema changes require more than just setting a default value for all existing rows. For example, let’s say you have this schema:

create_table :users do |table|
  table.string :email
  table.string :encrypted_password
end

create_table :posts do |table|
  table.integer :user_id
  table.boolean :published
  table.string :message
end

If you want to find the top ten users based on the number of published posts, you can do a JOIN with a COUNT and a GROUP BY clause, but over time that could get slow or just annoying, so you decide to add a cache column:

add_column :users, :published_posts_count, :integer, :default => 0, :null => false

You add code to populate the column and all the tests pass, but of course there’s a big problem: every existing user in production will report zero published posts. That means it’s time to break out a data migration. Running migrations over and over with different data or comparing lots of queries before and after migrating production data is tedious and error-prone, so let’s write a throw-away test for this migration. You can save this as spec/migration_spec.rb:

require 'spec_helper'
require Dir.glob(Rails.root.join("db", "migrate", "*_set_published_posts_for_existing_users.rb")).first

describe SetPublishedPostsForExistingUsers do
  it "counts only published posts" do
    user = FactoryBot.create(:user)
    FactoryBot.create_list(:post, 3, :user => user, :published => true)
    reset_cache_and_run_migration
    user.reload.published_posts_count.should == 3
  end

  def reset_cache_and_run_migration
    User.update_all("published_posts = 0")
    SetPublishedPostsForExistingUsers.new.up
  end
end

Add an empty migration, and the test fails because the user is reporting no published posts. We can get this test passing with a simple migration:

class SetActivatedFlagForExistingUsers < ActiveRecord::Migration
  def up
    connection.update(<<-SQL)
      UPDATE users
      SET published_posts_count = (
        SELECT COUNT(*) FROM posts
      )
    SQL
  end

  def down
    # No problem
  end
end

Next up, we need to make sure it’s only counting published posts:

it "counts only published posts" do
  user = FactoryBot.create(:user)
  FactoryBot.create_list(:post, 3, :user => user, :published => true)
  FactoryBot.create(:post, :user => user, :published => false)
  reset_cache_and_run_migration
  user.reload.published_posts_count.should == 3
end

That will fail because the migration counts the published posts, ending up with a total of four. We can fix that easily:

def up
  connection.update(<<-SQL)
    UPDATE users
    SET published_posts_count = (
      SELECT COUNT(*) FROM posts
      WHERE posts.published = true
    )
  SQL
end

Next up, we need to make sure each user only counts their own posts, so we add a post for a different user:

it "counts only published posts" do
  user = FactoryBot.create(:user)
  FactoryBot.create_list(:post, 3, :user => user, :published => true)
  FactoryBot.create(:post, :user => user, :published => false)
  other_user = FactoryBot.create(:user)
  FactoryBot.create(:post, :user => other_user, :published => true)
  reset_cache_and_run_migration
  user.reload.published_posts_count.should == 3
end

The test fails again with a count of four, since it picked up the other user’s post. Getting this test to pass leads to our final migration:

def up
  connection.update(<<-SQL)
    UPDATE users
    SET published_posts_count = (
      SELECT COUNT(*) FROM posts
      WHERE posts.published = true
      AND posts.user_id = users.id
    )
  SQL
end

At this point, I just delete the spec. Since migrations should never be edited after they run, there’s little reason to test for regressions. Inevitably the schema will change, which will mean the spec no longer applies.

The spec provides no value after the migration is committed, but I still find writing specs like these well worth the time. It’s easier for me to think like I’m used to, by writing tests first, and it makes me confident that the migration actually covers the cases it’s supposed to.