giant robots smashing into other giant robots

Written by thoughtbot

jayroh

Fight back UTF-8 Invalid Byte Sequences

Chances are, some of you have run into the issue with the invalid byte sequence in UTF-8 error when dealing with user-submitted data. A Google search shows that my hunch isn’t off.

Among the search results are plenty of answers—some using the deprecated iconv library—that might lead you to a sufficient fix. However, among the slew of queries are few answers on how to reliably replicate and test the issue.

In developing the Griddler gem we ran into some cases where the data being posted back to our controller had invalid UTF-8 bytes. For Griddler, our failing case needs to simulate the body of an email having an invalid byte, and encoded as UTF-8.

What are valid and invalid bytes? This table on Wikipedia tells us bytes 192, 193, and 245-255 are off limits. In ruby’s string literal we can represent this by escaping one of those numbers:

> "hi \255"
 => "hi \xAD"

There’s our string with the invalid byte! How do we know for sure? In that IRB session we can simulate a comparable issue by sending a message to the string it won’t like - like split or gsub.

> "hi \255".split(' ')
ArgumentError: invalid byte sequence in UTF-8
    from (irb):9:in `split'
    from (irb):9
    from /Users/joel/.rvm/rubies/ruby-1.9.3-p125/bin/irb:16:in `<main>'

Yup. It certainly does not like that.

Let’s create a very real-world, enterprise-level, business-critical test case:

invalid_byte_spec.rb

require 'rspec'

def replace_name(body, name)
  body.gsub(/joel/, name)
end

describe 'replace_name' do
  it 'removes my name' do
    body = "hello joel"

    replace_name(body, 'hank').should eq "hello hank"
  end

  it 'clears out invalid UTF-8 bytes' do
    body = "hello joel\255"

    replace_name(body, 'hank').should eq "hello hank"
  end
end

The first test passes as expected, and the second will fail as expected but not with the error we want. By adding that extra byte we should see an exception raised similar to what we simulated in IRB. Instead it’s failing in the comparison with the expected value.

1) replace_name clears out invalid UTF-8 bytes
   Failure/Error: replace_name(body, 'hank').should eq "hello hank"

     expected: "hello hank"
          got: "hello hank\xAD"

     (compared using ==)
   # ./invalid_byte_spec.rb:17:in `block (2 levels) in <top (required)>'

Why isn’t it failing properly? If we pry into our running test we find out that inside our file the strings being passed around are encoded as ASCII-8BIT instead of UTF-8.

[2] pry(#<RSpec::Core::ExampleGroup::Nested_1>)> body.encoding
=> #<Encoding:ASCII-8BIT>

As a result we’ll have to force that string’s encoding to UTF-8:

it 'clears out invalid UTF-8 bytes' do
  body = "hello joel\255".force_encoding('UTF-8')

  replace_name(body, 'hank').should_not raise_error(ArgumentError)
  replace_name(body, 'hank').should eq "hello hank"
end

By running the test now we will see our desired exception

1) replace_name clears out invalid UTF-8 bytes
   Failure/Error: body.gsub(/joel/, name)
   ArgumentError:
     invalid byte sequence in UTF-8
   # ./invalid_byte_spec.rb:4:in `gsub'
   # ./invalid_byte_spec.rb:4:in `replace_name'
   # ./invalid_byte_spec.rb:17:in `block (2 levels) in <top (required)>'

Finished in 0.00426 seconds
2 examples, 1 failure

Now that we’re comfortably in the red part of red/green/refactor we can move on to getting this passing by updating our replace_name method.

def replace_name(body, name)
  body
    .encode('UTF-8', 'binary', invalid: :replace, undef: :replace, replace: '')
    .gsub(/joel/, name)
end

And the test?

Finished in 0.04252 seconds
2 examples, 0 failures

For such a small piece of code we admittedly had to jump through some hoops. Through that process, however, we learned a bit about character encoding and how to put ourselves in the right position—through the red/green/refactor cycle—to fix bugs we will undoubtedly run into while writing software.

adarshp

Testing a before_filter

Recently, we implemented a feature that required a before_filter in ApplicationController and whitelisting some other controllers using skip_before_filter.

We couldn’t actually test something like this directly because callbacks aren’t really methods. They’re entirely used for their side effects so we can only test what happens when we invoke one. Let’s check out an example:

require 'test_helper'

class ApplicationControllerTest < ActionController::TestCase
  context 'ensure_manually_set_password' do
    setup do
      class ::TestingController < ApplicationController
        def hello
          render :nothing => true
        end
      end

      ActionController::Routing::Routes.draw do |map|
        map.hello '', :controller => 'testing', :action => 'hello'
      end
    end

    teardown do
      Object.send(:remove_const, :TestingController)
    end

    context 'when user is logged in' do
      setup do
        @controller = TestingController.new
      end

      context 'and user has not manually set their password' do
        setup do
          @user = Factory(:user, :manually_set_password => false)
          login_as @user
          get :hello
        end

        should 'redirect user to set their password' do
          assert_redirected_to new_password_path(@user.password_token)
        end
      end
    end
  end
end

Note the use of the double-colon prepended to the TestingController, which ensures the class is top-level, not an inner class of ApplicationControllerTest. That way we can just do :controller => 'testing' and not have to write :controller => 'application_controller_test/testing'. We also use the private method remove_const to remove the class after we’re done, so we don’t litter the namespace.

Co-written with Gabe Berke-Williams

dancroak

Short, explicit test setups

You probably know about this Factory Girl definition syntax:

FactoryGirl.define do
  factory :user do
    name 'Connie Customer'
  end
end

But did you know about this Factory Girl invocation syntax?

setup do
  @user = create(:user)
end

Or:

setup do
  @user = build(:user)
end

Or:

setup do
  post :create, user: attributes_for(:user)
end

It’s in there.

Configuration for Test::Unit / Shoulda:

class ActiveSupport::TestCase
  include FactoryGirl::Syntax::Methods
end

Configuration for RSpec:

RSpec.configure do |config|
  config.include FactoryGirl::Syntax::Methods
end

Configuration for Cucumber:

World FactoryGirl::Syntax::Methods

Written by .

jferris

How to apply what you’ve learned from TDD to writing data migrations

After doing TDD full time for years, I have a hard time writing code without a test. One example that I find particularly difficult is writing data migrations.

Some schema changes require more than just setting a default value for all existing rows. For example, let’s say you have this schema:

create_table :users do |table|
  table.string :email
  table.string :encrypted_password
end

create_table :posts do |table|
  table.integer :user_id
  table.boolean :published
  table.string :message
end

If you want to find the top ten users based on the number of published posts, you can do a JOIN with a COUNT and a GROUP BY clause, but over time that could get slow or just annoying, so you decide to add a cache column:

add_column :users, :published_posts_count, :integer, :default => 0, :null => false

You add code to populate the column and all the tests pass, but of course there’s a big problem: every existing user in production will report zero published posts. That means it’s time to break out a data migration. Running migrations over and over with different data or comparing lots of queries before and after migrating production data is tedious and error-prone, so let’s write a throw-away test for this migration. You can save this as spec/migration_spec.rb:

require 'spec_helper'
require Dir.glob(Rails.root.join("db", "migrate", "*_set_published_posts_for_existing_users.rb")).first

describe SetPublishedPostsForExistingUsers do
  it "counts only published posts" do
    user = FactoryGirl.create(:user)
    FactoryGirl.create_list(:post, 3, :user => user, :published => true)
    reset_cache_and_run_migration
    user.reload.published_posts_count.should == 3
  end

  def reset_cache_and_run_migration
    User.update_all("published_posts = 0")
    SetPublishedPostsForExistingUsers.new.up
  end
end

Add an empty migration, and the test fails because the user is reporting no published posts. We can get this test passing with a simple migration:

class SetActivatedFlagForExistingUsers < ActiveRecord::Migration
  def up
    connection.update(<<-SQL)
      UPDATE users
      SET published_posts_count = (
        SELECT COUNT(*) FROM posts
      )
    SQL
  end

  def down
    # No problem
  end
end

Next up, we need to make sure it’s only counting published posts:

it "counts only published posts" do
  user = FactoryGirl.create(:user)
  FactoryGirl.create_list(:post, 3, :user => user, :published => true)
  FactoryGirl.create(:post, :user => user, :published => false)
  reset_cache_and_run_migration
  user.reload.published_posts_count.should == 3
end

That will fail because the migration counts the published posts, ending up with a total of four. We can fix that easily:

def up
  connection.update(<<-SQL)
    UPDATE users
    SET published_posts_count = (
      SELECT COUNT(*) FROM posts
      WHERE posts.published = true
    )
  SQL
end

Next up, we need to make sure each user only counts their own posts, so we add a post for a different user:

it "counts only published posts" do
  user = FactoryGirl.create(:user)
  FactoryGirl.create_list(:post, 3, :user => user, :published => true)
  FactoryGirl.create(:post, :user => user, :published => false)
  other_user = FactoryGirl.create(:user)
  FactoryGirl.create(:post, :user => other_user, :published => true)
  reset_cache_and_run_migration
  user.reload.published_posts_count.should == 3
end

The test fails again with a count of four, since it picked up the other user’s post. Getting this test to pass leads to our final migration:

def up
  connection.update(<<-SQL)
    UPDATE users
    SET published_posts_count = (
      SELECT COUNT(*) FROM posts
      WHERE posts.published = true
      AND posts.user_id = users.id
    )
  SQL
end

At this point, I just delete the spec. Since migrations should never be edited after they run, there’s little reason to test for regressions. Inevitably the schema will change, which will mean the spec no longer applies.

The spec provides no value after the migration is committed, but I still find writing specs like these well worth the time. It’s easier for me to think like I’m used to, by writing tests first, and it makes me confident that the migration actually covers the cases it’s supposed to.