ints or strings

Jared Carroll

Occasionally when developing an app you have to break out an enumeration. These usually have a name ending in ‘type’, e.g. group type, event type, user type, etc. As far as the rules of normalization go, I believe these enumerations should not be classes until someone wants to CRUD them. When you make them classes prematurely, you also end up with brittle code such as the following:

class ArticleType < ActiveRecord::Base

  class << self

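    # Hash conditions let ActiveRecord quote the value itself; double-quoted
    # strings inside raw SQL are treated as identifiers by some databases.
    def news
      find :first,
        :conditions => { :name => 'News' }
    end

    def essay
      find :first,
        :conditions => { :name => 'Essay' }
    end

    def review
      find :first,
        :conditions => { :name => 'Review' }
    end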

  end

end

class Article < ActiveRecord::Base

  belongs_to :article_type

end

Schema:

article_types (id, name)

articles (id, title, article_type_id)

By defining class methods, we’re basically making it convenient to say things like:

ArticleType.news
ArticleType.essay
ArticleType.review

but having queries based on the value of a varchar column seems brittle to me. Code like this is a smell that you have some unnecessary classes i.e. classes with no behavior.
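
To make the brittleness concrete, here is a minimal plain-Ruby sketch. It assumes an in-memory array as a stand-in for the article_types table; ArticleTypeRow, ROWS, and find_news are hypothetical names for illustration, not ActiveRecord APIs. Once someone edits the text stored in a row, the name-based finder quietly stops matching:

```ruby
# Hypothetical stand-in for a row of the article_types table (no ActiveRecord).
ArticleTypeRow = Struct.new(:id, :name)

# In-memory stand-in for the table itself.
ROWS = [
  ArticleTypeRow.new(1, 'News'),
  ArticleTypeRow.new(2, 'Essay'),
  ArticleTypeRow.new(3, 'Review')
]

# Mirrors ArticleType.news: look a row up by the text in its name column.
def find_news(rows)
  rows.find { |row| row.name == 'News' }
end

before = find_news(ROWS) # finds the row with id 1

# Someone edits the label in the data...
ROWS.first.name = 'Latest News'

after = find_news(ROWS) # nil: the finder no longer matches any row
```

Nothing in the code changed, yet every caller of the finder now gets nil back.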

Let’s refactor them into an attribute and corresponding constants on the Article class:

class Article < ActiveRecord::Base

  NEWS = 'News'
  ESSAY = 'Essay'
  REVIEW = 'Review'

end

Our schema changes to move the previous article_types name column into articles:

articles (id, title, article_type)

And we reference them in code like:

Article::NEWS
Article::ESSAY
Article::REVIEW

On articles#new we’ll show a form to create a new Article, with a drop-down of the available article types for setting the Article’s type, so we’ll add another constant to Article for convenience:

class Article < ActiveRecord::Base

  NEWS = 'News'
  ESSAY = 'Essay'
  REVIEW = 'Review'

  TYPES = NEWS, ESSAY, REVIEW

end

Now we can reference it in a view easily like:

<% form_for :article, :url => articles_path do |form| %>
  <!-- other article attributes -->
  <%= form.select :article_type, Article::TYPES, :include_blank => true %>
  <!-- other article attributes -->
<% end %>

OK, that’s nice.

But let’s take a look at those constants defined in Article. For example:

class Article < ActiveRecord::Base

  ESSAY = 'Essay'

end

There’s a constant named ESSAY whose value is 'Essay'. That doesn’t feel good.

Now back in the day, in C/C++ I’d write enumerations like:

enum { NEWS, ESSAY, REVIEW };

This defines named constants whose values are integers starting at 0. So the value of NEWS was 0, ESSAY was 1, and REVIEW was 2. The bottom line is that they were numbers, not strings.

Let’s rewrite the Article class making the constants numbers instead of strings:

class Article < ActiveRecord::Base

  NEWS = 0
  ESSAY = 1
  REVIEW = 2

  # or as a 1-liner
  # NEWS, ESSAY, REVIEW = 0, 1, 2

  TYPES = NEWS, ESSAY, REVIEW

end
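
Ruby has no enum keyword, but the C-style automatic numbering can be approximated. The following is a minimal sketch, assuming a plain Ruby class rather than an ActiveRecord model so it runs on its own; generating the constants with const_set is one possible approach, not something the code above requires:

```ruby
# Plain Ruby class (no ActiveRecord) so the sketch is self-contained.
class Article
  # Number the names in order, like a C enum: NEWS = 0, ESSAY = 1, REVIEW = 2.
  %w[NEWS ESSAY REVIEW].each_with_index do |name, index|
    const_set(name, index)
  end

  # Frozen so the list of valid values cannot be modified by accident.
  TYPES = [NEWS, ESSAY, REVIEW].freeze
end
```

Adding a new type then means appending one name to the list. Reordering the names changes the numbers, so it is not safe once values have been stored in the database.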

OK.

That’s less strange and redundant than constants whose values are basically the same as their names, e.g. NEWS having the value 'News'. However, we now need something to convert these numbers to strings for display in the drop-down list in the form for creating an Article.

I’ll put it in app/helpers/articles_helper.rb:

module ArticlesHelper

  def article_types
    [['News', Article::NEWS],
     ['Essay', Article::ESSAY],
     ['Review', Article::REVIEW]]
  end

end
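
The same number-to-text conversion is also needed when showing an existing Article. Here is a minimal sketch of one way to serve both cases from a single mapping, assuming a plain Ruby Article class so it runs without Rails. ARTICLE_TYPE_LABELS and article_type_label are hypothetical additions, not part of the helper above:

```ruby
# Plain Ruby stand-in for the model, holding only the numeric constants.
class Article
  NEWS, ESSAY, REVIEW = 0, 1, 2
end

module ArticlesHelper
  # Hypothetical mapping from each numeric value to its display text.
  ARTICLE_TYPE_LABELS = {
    Article::NEWS   => 'News',
    Article::ESSAY  => 'Essay',
    Article::REVIEW => 'Review'
  }.freeze

  # Builds [[text_to_display, value_to_POST], ...] for a select tag.
  def article_types
    ARTICLE_TYPE_LABELS.map { |value, label| [label, value] }
  end

  # Display text for one stored value; raises KeyError for an unknown value.
  def article_type_label(value)
    ARTICLE_TYPE_LABELS.fetch(value)
  end
end

# Outside Rails, mix the helper into an object in order to call it.
helper = Object.new.extend(ArticlesHelper)
pairs  = helper.article_types
label  = helper.article_type_label(Article::ESSAY)
```

The display text still lives in the helper, so the model keeps nothing but the numbers.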

Our view barely changes: we pass the helper’s result instead of Article::TYPES, because #select’s 2nd argument also accepts an array of 2-element arrays ([text_to_display, value_to_POST]):

<% form_for :article, :url => articles_path do |form| %>
  <!-- other article attributes -->
  <%= form.select :article_type, article_types, :include_blank => true %>
  <!-- other article attributes -->
<% end %>

So by turning the enumeration into a 'classic' enumeration using numbers, we needed to add one method, ArticlesHelper#article_types. It belongs in a helper because it’s view logic. I did this because it felt redundant to have enumeration constants whose values are the same as their names, e.g. ESSAY’s value being 'Essay'.

Is there another reason behind the tendency to use numbers instead of strings as enumeration values, besides this strange feeling of redundancy? Looking at performance, I bet that in SQL it’s quicker to do:

select *
from articles
where article_type = 1

than

select *
from articles
where article_type = 'News'

In other words, a number comparison is faster than a string comparison in a ‘where’ clause. I’ve noticed the tendency in older apps to represent enumerations as numbers, usually called codes, and thought maybe it was a performance optimization. There’d typically be a lookup table for the text instead of a method in the application like our ArticlesHelper#article_types, probably because the database was being used by more than 1 app. So there’d be a schema like:

articles (id, article_type_code)

article_types (article_type_code, name)

That article_types table would be a lookup table.

I really doubt you’d feel any performance gains from turning a string based enumeration into a number based enumeration though.

To me it’s starting to feel more natural to write ‘classic’ enumerations, whose values are numbers, and to have a view method handle the conversion when displaying the enumeration values to the end user. The redundancy of a constant whose name is the same as its value is really starting to get to me.