Have Some (Referential) Integrity with Foreign Keys

Referential integrity is a relational database concept that states implied relationships among data should be enforced. Referential integrity ensures that the relationship between rows in two tables will remain synchronized during all updates and deletes.

Rails allows us to easily set up these implied relationships, but does nothing to help us enforce referential integrity. It’s very simple to accidentally or intentionally break referential integrity in most Rails applications.

An Example Scenario

Consider the following minimal set of models describing a blogging engine:

class User < ActiveRecord::Base
  has_many :posts
  validates :name, presence: true
end

class Post < ActiveRecord::Base
  belongs_to :user
  validates :user, presence: true
end

Our blogging platform has taken off, but we’ve received requests from some users to delete their accounts. We add an interface for administrators to delete users and everything works fine.

A few days later we receive a report that we’re getting 500s on our “Popular Posts” page. Looking into it, we find that we’re getting:

undefined method `name' for nil:NilClass

This is happening when we render the name of the user associated with each post. Somehow we’ve got a post that has no associated user even though Post has a validation that requires a user.

We quickly realize that we allowed administrators to delete users but never cleaned up the deleted users’ posts. We manually clean the data and make the following change to our model to prevent this in the future:

class User < ActiveRecord::Base
  has_many :posts, dependent: :destroy
end

The addition of dependent: :destroy means when a user is destroyed their posts will be as well. Administrators can now delete users without fear of orphaned posts causing problems.

Months pass and our now-venture-backed blogging engine has attracted millions of users. Unfortunately, lots of those users are spammers. We’re told we’ll be given a daily list of user_ids corresponding to spammers and need to write a job to delete them. We know this list could include thousands of ids on any given day, so we write the following code to avoid instantiating those objects and issuing thousands of queries to destroy them:

user_ids = CSV.read(csv_path).flatten
User.where(id: user_ids).delete_all

We soon receive a call telling us we’re getting 500s on the “Recent Posts” page. You guessed it; we violated referential integrity once again and we’re seeing the same NoMethodError as before.

Why didn’t dependent: :destroy save us here? Well, delete_all doesn’t instantiate the objects it is deleting and thus does not fire any after_destroy callbacks. The dependent options work via that callback.

Add Foreign Key Constraints

Fool me once, shame on me. Fool me thrice and I gotta find a new job. We can’t let this happen again. Rails can’t be trusted to maintain referential integrity, but you know what’s really good at doing that? Our relational database.

We can add foreign key constraints at the database level and ensure that the database will reject any operation that would violate referential integrity. Until Rails 4.2 ships with native support for foreign keys, we’ll need to add the Foreigner gem in order to do this. We add Foreigner and run the following migration:

def change
  add_foreign_key :posts, :users
end

This will run the following SQL if you’re using Postgres and Foreigner:

ALTER TABLE `posts`
ADD CONSTRAINT `posts_user_id_fk`
FOREIGN KEY (`user_id`) REFERENCES `users`(id);

With the foreign key in place, any operation that causes a post to point to a non-existent user will fail. It’s important to realize that a user_id of NULL is allowed, so we still need appropriate presence validations and NOT NULL constraints.

Now our nightly job is failing due to the foreign key constraint. The database is preventing us from deleting any users that still have associated posts. Does this mean we have to go back to the dreaded N+1 query scenario to destroy individual users?

Cascading Deletes

With a slight tweak to our foreign key, we can have the database, rather than ActiveRecord callbacks, handle the cascading deletes. Let’s change our foreign key just a bit:

def change
  remove_foreign_key :posts, :users
  add_foreign_key :posts, :users, dependent: :delete

  # or in the upcoming native support in Rails 4.2
  # add_foreign_key :posts, :users, on_delete: :cascade
end

This will run the following SQL when creating the foreign key:

ALTER TABLE `posts`
ADD CONSTRAINT `posts_user_id_fk`
FOREIGN KEY (`user_id`) REFERENCES `users`(id)
ON DELETE CASCADE;

With the dependent option functionality now moved to our foreign key, the database can now handle cleaning up the associated records. We no longer need to rely on callbacks for this behavior, so let’s remove the option.

class User
  # Old association:
  # has_many :posts, dependent: :destroy

  has_many :posts
end

Foreigner and the native support in Rails 4.2 both support options that cascade, nullify, and restrict changes. See the documentation for Foreigner and Rails 4.2.

Adding Foreign Key Constraints to an Existing Application

With immigrant, you can automatically generate a migration that will add any foreign key constraints your application is missing. With immigrant added to your Gemfile, run rails generate immigration add_foreign_keys to create the migration.

If you’re working with an application of any substantial size that has been running for some time, you are very likely to encounter errors applying this migration to your production data. Foreign key constraints cannot be applied if they are not valid for all current data.

I suggest downloading a copy of your production data and trying to run the migration on that data to surface any issues you will have at deployment time. Once the data is fixed and your migration applied in production the actions that were causing the invalid data will result in errors, which you can then target for fixes.

Caveats

Polymorphic associations are maintained by Rails; the database knows nothing about them. Foreign key constraints cannot help you here so you must keep logic in your Rails application to try to maintain referential integrity.

The Lesson

Foreign key constraints help us maintain valid data and are yet another way of helping us to avoid unexpected nil values in our applications. It’s unrealistic to expect application logic alone to provide the same level of protection.

Enforcing referential integrity is another job relational databases are better prepared to handle than Rails application code. Be a Juke Box Hero and check out Foreigner (or Rails 4.2) today.