Chances are, some of you have run into the issue with the invalid byte sequence in UTF-8 error when dealing with user-submitted data. A Google search shows that my hunch isn’t off.
Among the search results are plenty of answers—some using the deprecated iconv library—that might lead you to a sufficient fix. However, among the slew of queries are few answers on how to reliably replicate and test the issue.
In developing the Griddler gem we ran into some cases where the data being posted back to our controller had invalid UTF-8 bytes. For Griddler, our failing case needs to simulate the body of an email having an invalid byte, and encoded as UTF-8.
What are valid and invalid bytes? This table on Wikipedia tells us bytes 192, 193, and 245-255 are off limits. In ruby’s string literal we can represent this by escaping one of those numbers:
> "hi \255"
=> "hi \xAD"
There’s our string with the invalid byte! How do we know for sure? In that IRB session we can simulate a comparable issue by sending a message to the string it won’t like - like split or gsub.
> "hi \255".split(' ')
ArgumentError: invalid byte sequence in UTF-8
from (irb):9:in `split'
from (irb):9
from /Users/joel/.rvm/rubies/ruby-1.9.3-p125/bin/irb:16:in `<main>'
Yup. It certainly does not like that.
Let’s create a very real-world, enterprise-level, business-critical test case:
invalid_byte_spec.rb
require 'rspec'
def replace_name(body, name)
body.gsub(/joel/, name)
end
describe 'replace_name' do
it 'removes my name' do
body = "hello joel"
replace_name(body, 'hank').should eq "hello hank"
end
it 'clears out invalid UTF-8 bytes' do
body = "hello joel\255"
replace_name(body, 'hank').should eq "hello hank"
end
end
The first test passes as expected, and the second will fail as expected but not with the error we want. By adding that extra byte we should see an exception raised similar to what we simulated in IRB. Instead it’s failing in the comparison with the expected value.
1) replace_name clears out invalid UTF-8 bytes
Failure/Error: replace_name(body, 'hank').should eq "hello hank"
expected: "hello hank"
got: "hello hank\xAD"
(compared using ==)
# ./invalid_byte_spec.rb:17:in `block (2 levels) in <top (required)>'
Why isn’t it failing properly? If we pry into our running test we find out that inside our file the strings being passed around are encoded as ASCII-8BIT instead of UTF-8.
[2] pry(#<RSpec::Core::ExampleGroup::Nested_1>)> body.encoding
=> #<Encoding:ASCII-8BIT>
As a result we’ll have to force that string’s encoding to UTF-8:
it 'clears out invalid UTF-8 bytes' do
body = "hello joel\255".force_encoding('UTF-8')
replace_name(body, 'hank').should_not raise_error(ArgumentError)
replace_name(body, 'hank').should eq "hello hank"
end
By running the test now we will see our desired exception
1) replace_name clears out invalid UTF-8 bytes
Failure/Error: body.gsub(/joel/, name)
ArgumentError:
invalid byte sequence in UTF-8
# ./invalid_byte_spec.rb:4:in `gsub'
# ./invalid_byte_spec.rb:4:in `replace_name'
# ./invalid_byte_spec.rb:17:in `block (2 levels) in <top (required)>'
Finished in 0.00426 seconds
2 examples, 1 failure
Now that we’re comfortably in the red part of red/green/refactor we can move on to getting this passing by updating our replace_name method.
def replace_name(body, name)
body
.encode('UTF-8', 'binary', invalid: :replace, undef: :replace, replace: '')
.gsub(/joel/, name)
end
And the test?
Finished in 0.04252 seconds
2 examples, 0 failures
For such a small piece of code we admittedly had to jump through some hoops. Through that process, however, we learned a bit about character encoding and how to put ourselves in the right position—through the red/green/refactor cycle—to fix bugs we will undoubtedly run into while writing software.

Recently, we implemented a feature that required a before_filter in ApplicationController and whitelisting some other controllers using skip_before_filter.
We couldn’t actually test something like this directly because callbacks aren’t really methods. They’re entirely used for their side effects so we can only test what happens when we invoke one. Let’s check out an example:
require 'test_helper'
class ApplicationControllerTest < ActionController::TestCase
context 'ensure_manually_set_password' do
setup do
class ::TestingController < ApplicationController
def hello
render :nothing => true
end
end
ActionController::Routing::Routes.draw do |map|
map.hello '', :controller => 'testing', :action => 'hello'
end
end
teardown do
Object.send(:remove_const, :TestingController)
end
context 'when user is logged in' do
setup do
@controller = TestingController.new
end
context 'and user has not manually set their password' do
setup do
@user = Factory(:user, :manually_set_password => false)
login_as @user
get :hello
end
should 'redirect user to set their password' do
assert_redirected_to new_password_path(@user.password_token)
end
end
end
end
end
Note the use of the double-colon prepended to the TestingController, which ensures the class is top-level, not an inner class of ApplicationControllerTest. That way we can just do :controller => 'testing' and not have to write :controller =>
'application_controller_test/testing'. We also use the private method remove_const to remove the class after we’re done, so we don’t litter the namespace.
You probably know about this Factory Girl definition syntax:
FactoryGirl.define do
factory :user do
name 'Connie Customer'
end
end
But did you know about this Factory Girl invocation syntax?
setup do
@user = create(:user)
end
Or:
setup do
@user = build(:user)
end
Or:
setup do
post :create, user: attributes_for(:user)
end
It’s in there.
Configuration for Test::Unit / Shoulda:
class ActiveSupport::TestCase
include FactoryGirl::Syntax::Methods
end
Configuration for RSpec:
RSpec.configure do |config|
config.include FactoryGirl::Syntax::Methods
end
Configuration for Cucumber:
World FactoryGirl::Syntax::Methods
Written by Dan Croak.
After doing TDD full time for years, I have a hard time writing code without a test. One example that I find particularly difficult is writing data migrations.
Some schema changes require more than just setting a default value for all existing rows. For example, let’s say you have this schema:
create_table :users do |table|
table.string :email
table.string :encrypted_password
end
create_table :posts do |table|
table.integer :user_id
table.boolean :published
table.string :message
end
If you want to find the top ten users based on the number of published posts, you can do a JOIN with a COUNT and a GROUP BY clause, but over time that could get slow or just annoying, so you decide to add a cache column:
add_column :users, :published_posts_count, :integer, :default => 0, :null => false
You add code to populate the column and all the tests pass, but of course there’s a big problem: every existing user in production will report zero published posts. That means it’s time to break out a data migration. Running migrations over and over with different data or comparing lots of queries before and after migrating production data is tedious and error-prone, so let’s write a throw-away test for this migration. You can save this as spec/migration_spec.rb:
require 'spec_helper'
require Dir.glob(Rails.root.join("db", "migrate", "*_set_published_posts_for_existing_users.rb")).first
describe SetPublishedPostsForExistingUsers do
it "counts only published posts" do
user = FactoryGirl.create(:user)
FactoryGirl.create_list(:post, 3, :user => user, :published => true)
reset_cache_and_run_migration
user.reload.published_posts_count.should == 3
end
def reset_cache_and_run_migration
User.update_all("published_posts = 0")
SetPublishedPostsForExistingUsers.new.up
end
end
Add an empty migration, and the test fails because the user is reporting no published posts. We can get this test passing with a simple migration:
class SetActivatedFlagForExistingUsers < ActiveRecord::Migration
def up
connection.update(<<-SQL)
UPDATE users
SET published_posts_count = (
SELECT COUNT(*) FROM posts
)
SQL
end
def down
# No problem
end
end
Next up, we need to make sure it’s only counting published posts:
it "counts only published posts" do
user = FactoryGirl.create(:user)
FactoryGirl.create_list(:post, 3, :user => user, :published => true)
FactoryGirl.create(:post, :user => user, :published => false)
reset_cache_and_run_migration
user.reload.published_posts_count.should == 3
end
That will fail because the migration counts the published posts, ending up with a total of four. We can fix that easily:
def up
connection.update(<<-SQL)
UPDATE users
SET published_posts_count = (
SELECT COUNT(*) FROM posts
WHERE posts.published = true
)
SQL
end
Next up, we need to make sure each user only counts their own posts, so we add a post for a different user:
it "counts only published posts" do
user = FactoryGirl.create(:user)
FactoryGirl.create_list(:post, 3, :user => user, :published => true)
FactoryGirl.create(:post, :user => user, :published => false)
other_user = FactoryGirl.create(:user)
FactoryGirl.create(:post, :user => other_user, :published => true)
reset_cache_and_run_migration
user.reload.published_posts_count.should == 3
end
The test fails again with a count of four, since it picked up the other user’s post. Getting this test to pass leads to our final migration:
def up
connection.update(<<-SQL)
UPDATE users
SET published_posts_count = (
SELECT COUNT(*) FROM posts
WHERE posts.published = true
AND posts.user_id = users.id
)
SQL
end
At this point, I just delete the spec. Since migrations should never be edited after they run, there’s little reason to test for regressions. Inevitably the schema will change, which will mean the spec no longer applies.
The spec provides no value after the migration is committed, but I still find writing specs like these well worth the time. It’s easier for me to think like I’m used to, by writing tests first, and it makes me confident that the migration actually covers the cases it’s supposed to.