How Rails' Type Casting Works

Have you ever noticed that when you assign a property to an Active Record model and read it back, the value isn’t always the same? Here’s an example:

class StoreListing < ActiveRecord::Base
  connection.create_table :store_listings, force: true do |t|
    t.integer :price_in_cents
  end
end

store_listing = StoreListing.new
store_listing.price_in_cents = "100" # Note, this is a string
store_listing.price_in_cents # => 100

This is because Active Record automatically type casts all input so that it matches the database schema. Depending on the type, this may be incredibly simple, or extremely complex. Let’s take a look into how the internals work in 4.1.

The first method we need to understand in the above code is where the price_in_cents method is defined. In older versions of Rails, your models would go look up the database schema and define the attribute methods as soon as it was loaded. However, this caused problems on platforms like Heroku, where you might want to load the application when you don’t have a real database connection.

Today, the loading is lazy, and happens in a call to method_missing (source). The important line here is the call to define_attribute_methods.

def method_missing(method, *args, &block) # :nodoc:
  self.class.define_attribute_methods
  if respond_to_without_attributes?(method)
    # make sure to invoke the correct attribute method, as we might have gotten here via a `super`
    # call in a overwritten attribute method
    if attribute_method = self.class.find_generated_attribute_method(method)
      # this is probably horribly slow, but should only happen at most once for a given AR class
      attribute_method.bind(self).call(*args, &block)
    else
      return super unless respond_to_missing?(method, true)
      send(method, *args, &block)
    end
  else
    super
  end
end

Active Record’s definition of define_attribute_methods does little of note, other than call super with column_names (source).

def define_attribute_methods # :nodoc:
  return false if @attribute_methods_generated
  # Use a mutex; we don't want two thread simultaneously trying to define
  # attribute methods.
  generated_attribute_methods.synchronize do
    return false if @attribute_methods_generated
    superclass.define_attribute_methods unless self == base_class
    super(column_names)
    @attribute_methods_generated = true
  end
  true
end

We won’t look into how column_names gets determined today, but that method call is what causes Rails to go perform the SQL query that loads information about the model’s schema. Inside of Active Model, we’ll do some metaprogramming magic and ultimately end up calling define_method_attribute (source). Finally, in the body of define_method_attribute, we can see the method that gets called is read_attribute (source). Quite a bit of legwork!

If you decide to read along with us, make sure you’re on the 4-1-stable branch. A lot of this code has changed significantly on master. One of the most important changes to keep in mind is that @attributes_cache has been renamed to @attributes, and @attributes has been renamed to @raw_attributes.

The body of read_attribute looks like this:

def read_attribute(attr_name)
  # If it's cached, just return it
  # We use #[] first as a perf optimization for non-nil values. See https://gist.github.com/jonleighton/3552829.
  name = attr_name.to_s
  @attributes_cache[name] || @attributes_cache.fetch(name) {
    column = @column_types_override[name] if @column_types_override
    column ||= @column_types[name]

    return @attributes.fetch(name) {
      if name == 'id' && self.class.primary_key != name
        read_attribute(self.class.primary_key)
      end
    } unless column

    value = @attributes.fetch(name) {
      return block_given? ? yield(name) : nil
    }

    if self.class.cache_attribute?(name)
      @attributes_cache[name] = column.type_cast(value)
    else
      column.type_cast value
    end
  }
end

Let’s go through each segment and understand what it’s doing. First we call to_s on the argument, as it’s possible we were passed a symbol (this method is part of the public API). Next we check to see if we’ve already type cast this attribute, as we cache the results. The next line is not always obvious.

column = @column_types_override[name] if @column_types_override
column ||= @column_types[name]

@column_types_override is sometimes given to us when the model in question was built as part of the result of a SQL query. If you’ve done something like

Post
  .joins(:comments)
  .select('posts.*, COUNT(comments.*) AS comments_count')
  .group('comments.post_id')

then we sometimes have to do additional leg work to type cast the count to an integer. If you ran that code while using the MySQL adapter or PostgreSQL adapter (keep in mind that most MySQL users are using the MySQL2 adapter), then @column_types_override would look like: { 'comments_count' => an_object_that_type_casts_to_int }. Continuing to the next line, @column_types will contain the column object that is crucial to this behavior, except for a few special cases (which we will have to cover another time).

The next block of code causes model.id to return the primary key, even if the primary key for the table is a column other than id.

return @attributes.fetch(name) {
  if name == 'id' && self.class.primary_key != name
    read_attribute(self.class.primary_key)
  end
} unless column

Next we need to grab the raw, un-typecast version of the attribute, which came either from user input, or from the database (“user” in this case refers to you, the programmer using Rails). However, there’s an interesting fork in behavior here.

value = @attributes.fetch(name) {
  return block_given? ? yield(name) : nil
}

The first question is whether or not a block was given. This is based on how read_attribute ended up being called. If you called it as post.title, no block would have been given. If you called it as post[:title], then a block would have been given to raise an exception. The reason that the title method would exist in this case, even if we don’t have a 'title' key in our attributes hash is: you performed a custom select statement. (This is an excellent example of how one feature can cause a surprising amount of complexity if not sufficiently isolated).

The conditional around caching attributes is actually bugged, and will always return true for most users, so we’ll ignore it. This leaves us with the line of importance:

@attributes_cache[name] = column.type_cast(value)

column in this case, will be an instance of ActiveRecord::ConnectionAdapters::Column, or one of its adapter specific subclasses. The behavior in question lives on the type_cast method (source).

# Casts value (which is a String) to an appropriate instance.
def type_cast(value)
  return nil if value.nil?
  return coder.load(value) if encoded?

  klass = self.class

  case type
  when :string, :text        then value
  when :integer              then klass.value_to_integer(value)
  when :float                then value.to_f
  when :decimal              then klass.value_to_decimal(value)
  when :datetime, :timestamp then klass.string_to_time(value)
  when :time                 then klass.string_to_dummy_time(value)
  when :date                 then klass.value_to_date(value)
  when :binary               then klass.binary_to_string(value)
  when :boolean              then klass.value_to_boolean(value)
  else value
  end
end

As any good method should, we start with an extremely misleading comment (the value will be anything you passed to the writer, not a string). Like many methods in Column, we also have a case statement based on type, and will call one of many helper methods based on which type it is. type will have been set back in the constructor, by using a regex from the sql_type source. sql_type will be the raw type string from the database schema, such as varchar(255).

At this point, the behavior is linear. All of the helper methods called exist in the class, and most are no more than a few lines long.

Also of note is the method type_cast_for_write, which gets called during the writer, before we store the attributes for type casting later. (Note: Anything that happens in this method will be applied to the _before_type_cast version of the attribute as well.)

If you’ve been cringing looking through the Column class, you’re justified. Luckily, it’s gotten much better. In preparation for adding a public API to hook into the type casting behavior in 4.2, this class has been heavily refactored to focus on polymorphism, rather than conditionals and regular expressions. In part 2, we’ll dig into some of the refactoring that’s been done, and the decisions behind it.

What’s next

Does your code base resemble the code we looked at today? Learn about common smells and refactorings with Ruby Science.

What’s next

Sign up to receive a weekly recap from thoughtbot

1-on-1 and Group Mentoring