Chances are, some of you have run into the invalid byte sequence in UTF-8 error when dealing with user-submitted data. A Google search shows that my hunch isn’t off.
Among the search results are plenty of answers—some using the deprecated iconv library—that might lead you to a sufficient fix. However, among the slew of queries are few answers on how to reliably replicate and test the issue.
In developing the Griddler gem we ran into some cases where the data being posted back to our controller had invalid UTF-8 bytes. For Griddler, our failing case needs to simulate an email body that contains an invalid byte yet is encoded as UTF-8.
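What are valid and invalid bytes? This table on Wikipedia tells us bytes 192, 193, and 245-255 are off limits, and a continuation byte (128-191) is just as invalid when it does not follow a lead byte. In Ruby’s string literals we can produce such a byte with an escape. Note that \255 is an octal escape, so it yields byte 173 (0xAD), a stray continuation byte, rather than byte 255:

    > "hi \255"
    => "hi \xAD"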
There’s our string with the invalid byte! How do we know for sure? In that IRB session we can simulate a comparable issue by sending the string a message it won’t like, such as split:

    > "hi \255".split(' ')
    ArgumentError: invalid byte sequence in UTF-8
      from (irb):9:in `split'
      from (irb):9
      from /Users/joel/.rvm/rubies/ruby-1.9.3-p125/bin/irb:16:in `<main>'
Yup. It certainly does not like that.
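If you would rather confirm the problem without triggering an exception, String#valid_encoding? reports whether a string's bytes are well-formed for its declared encoding. The sketch below is an illustration rather than part of Griddler, and the helper name invalid_utf8? is hypothetical. It also shows that Ruby reads \255 as an octal escape, so the byte that lands in the string is 173 (0xAD).

```ruby
# Hypothetical helper (not part of Griddler): true when the bytes of
# +string+ do not form well-formed UTF-8.
def invalid_utf8?(string)
  # dup so the caller's string keeps its original encoding label
  !string.dup.force_encoding('UTF-8').valid_encoding?
end

sample = "hi \255"

p sample.bytes.to_a       # the octal escape \255 is byte 173 (0xAD)
p invalid_utf8?(sample)   # true: 0xAD is a continuation byte with no lead byte
p invalid_utf8?("hi")     # false: plain ASCII is always valid UTF-8
```

Because the helper relabels a copy, it gives the same answer whether the literal started out tagged as ASCII-8BIT or as UTF-8.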
Let’s create a very real-world, enterprise-level, business-critical test case:
    require 'rspec'

    def replace_name(body, name)
      body.gsub(/joel/, name)
    end

    describe 'replace_name' do
      it 'removes my name' do
        body = "hello joel"
        replace_name(body, 'hank').should eq "hello hank"
      end

      it 'clears out invalid UTF-8 bytes' do
        body = "hello joel\255"
        replace_name(body, 'hank').should eq "hello hank"
      end
    end
The first test passes as expected, and the second will fail as expected but not with the error we want. By adding that extra byte we should see an exception raised similar to what we simulated in IRB. Instead it’s failing in the comparison with the expected value.
    1) replace_name clears out invalid UTF-8 bytes
       Failure/Error: replace_name(body, 'hank').should eq "hello hank"

         expected: "hello hank"
              got: "hello hank\xAD"

         (compared using ==)
       # ./invalid_byte_spec.rb:17:in `block (2 levels) in <top (required)>'
Why isn’t it failing properly? If we pry into our running test we find out that inside our file the strings being passed around are encoded as ASCII-8BIT instead of UTF-8:

    pry(#<RSpec::Core::ExampleGroup::Nested_1>)> body.encoding
    => #<Encoding:ASCII-8BIT>
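As a result we’ll have to force that string’s encoding to UTF-8:

    it 'clears out invalid UTF-8 bytes' do
      body = "hello joel\255".force_encoding('UTF-8')

      # raise_error is meant to wrap a block (lambda { ... }.should_not raise_error).
      # Written this way, the call itself raises before the matcher is evaluated,
      # which is still enough to surface the exception.
      replace_name(body, 'hank').should_not raise_error(ArgumentError)
      replace_name(body, 'hank').should eq "hello hank"
    end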
Running the test now shows our desired exception:

    1) replace_name clears out invalid UTF-8 bytes
       Failure/Error: body.gsub(/joel/, name)
       ArgumentError:
         invalid byte sequence in UTF-8
       # ./invalid_byte_spec.rb:4:in `gsub'
       # ./invalid_byte_spec.rb:4:in `replace_name'
       # ./invalid_byte_spec.rb:17:in `block (2 levels) in <top (required)>'

    Finished in 0.00426 seconds
    2 examples, 1 failure
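It helps to remember that force_encoding changes only the encoding label on the string; it does not transcode any bytes. That is why the same bytes pass quietly as ASCII-8BIT and are rejected once labeled UTF-8. A small sketch (the variable names are illustrative):

```ruby
# The same bytes under two different encoding labels.
binary = "hello joel\255".dup.force_encoding('ASCII-8BIT')
utf8   = binary.dup.force_encoding('UTF-8')

p binary.bytes.to_a == utf8.bytes.to_a  # true: the bytes are untouched
p binary.valid_encoding?                # true: any byte is acceptable as binary
p utf8.valid_encoding?                  # false: 0xAD cannot stand alone in UTF-8
```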
Now that we’re comfortably in the red part of red/green/refactor we can move on to getting this passing by updating our replace_name method:

    def replace_name(body, name)
      body
        .encode('UTF-8', 'binary', invalid: :replace, undef: :replace, replace: '')
        .gsub(/joel/, name)
    end
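One caveat worth knowing about this encode call: because it treats the input as binary, every byte above 0x7F has no mapping and gets replaced, so valid multibyte characters are dropped along with the stray byte. On Ruby 2.1 or newer (not the 1.9.3 used in this post), String#scrub removes only the malformed sequences. The sketch below compares the two; the method names strip_via_binary and strip_invalid_only are hypothetical.

```ruby
# Same call as in replace_name: reinterpret the input as binary, then drop
# every byte that has no mapping from binary to UTF-8 (all bytes above 0x7F).
def strip_via_binary(text)
  text.encode('UTF-8', 'binary', invalid: :replace, undef: :replace, replace: '')
end

# Alternative on Ruby 2.1 or newer: remove only the malformed sequences.
def strip_invalid_only(text)
  text.scrub('')
end

# "caf\u00E9 joel" followed by a stray 0xAD byte, labeled as UTF-8.
dirty = ("caf\u00E9 joel".b + "\xAD".b).force_encoding('UTF-8')

p strip_via_binary(dirty)    # the accented e is dropped along with 0xAD
p strip_invalid_only(dirty)  # the accented e survives; only 0xAD is removed
```

Which behavior is right depends on the data: for bodies expected to be plain ASCII the binary approach is fine, while user-submitted text in other languages is probably better served by scrub.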
And the test?
    Finished in 0.04252 seconds
    2 examples, 0 failures
For such a small piece of code we admittedly had to jump through some hoops. Through that process, however, we learned a bit about character encoding and how to put ourselves in the right position—through the red/green/refactor cycle—to fix bugs we will undoubtedly run into while writing software.
Recently, we implemented a feature that required a before_filter in ApplicationController, with some other controllers whitelisted to skip it.
We couldn’t actually test something like this directly because callbacks aren’t really methods. They’re entirely used for their side effects so we can only test what happens when we invoke one. Let’s check out an example:
    require 'test_helper'

    class ApplicationControllerTest < ActionController::TestCase
      context 'ensure_manually_set_password' do
        setup do
          class ::TestingController < ApplicationController
            def hello
              render :nothing => true
            end
          end

          ActionController::Routing::Routes.draw do |map|
            map.hello '', :controller => 'testing', :action => 'hello'
          end
        end

        teardown do
          Object.send(:remove_const, :TestingController)
        end

        context 'when user is logged in' do
          setup do
            @controller = TestingController.new
          end

          context 'and user has not manually set their password' do
            setup do
              @user = Factory(:user, :manually_set_password => false)
              login_as @user
              get :hello
            end

            should 'redirect user to set their password' do
              assert_redirected_to new_password_path(@user.password_token)
            end
          end
        end
      end
    end
Note the use of the double-colon prepended to the TestingController, which ensures the class is top-level, not an inner class of ApplicationControllerTest. That way we can just do :controller => 'testing' and not have to write 'application_controller_test/testing'. We also use the private method remove_const to remove the class after we’re done, so we don’t litter the namespace.
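The effect of the leading double-colon, and of remove_const, can be seen in plain Ruby without Rails. This is a minimal sketch; OuterTest, TopLevelHelper, and NestedHelper are made-up names for illustration.

```ruby
class OuterTest
  # The leading :: anchors the constant at the top level (on Object).
  class ::TopLevelHelper; end

  # Without it, the class is nested inside the enclosing one.
  class NestedHelper; end
end

p TopLevelHelper.name            # "TopLevelHelper"
p OuterTest::NestedHelper.name   # "OuterTest::NestedHelper"

# remove_const is private on Module, hence the send.
Object.send(:remove_const, :TopLevelHelper)
p Object.const_defined?(:TopLevelHelper)  # false
```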
Co-written with Gabe Berke-Williams
You probably know about this Factory Girl definition syntax:
    FactoryGirl.define do
      factory :user do
        name 'Connie Customer'
      end
    end
But did you know about this Factory Girl invocation syntax?
    setup do
      @user = create(:user)
    end

    setup do
      @user = build(:user)
    end

    setup do
      post :create, user: attributes_for(:user)
    end
It’s in there.
Configuration for Test::Unit / Shoulda:
    class ActiveSupport::TestCase
      include FactoryGirl::Syntax::Methods
    end
Configuration for RSpec:
    RSpec.configure do |config|
      config.include FactoryGirl::Syntax::Methods
    end
Configuration for Cucumber:
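For Cucumber, the same module can be mixed into the World object that step definitions run in. A minimal sketch, assuming a conventional support file such as features/support/env.rb (the file location is an assumption, not something this post specifies):

```ruby
# features/support/env.rb (assumed location)
# World accepts modules and mixes them into every scenario's World object,
# so create, build, and attributes_for become available in step definitions.
World(FactoryGirl::Syntax::Methods)
```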
Written by Dan Croak.
After doing TDD full time for years, I have a hard time writing code without a test. One example that I find particularly difficult is writing data migrations.
Some schema changes require more than just setting a default value for all existing rows. For example, let’s say you have this schema:
    create_table :users do |table|
      table.string :email
      table.string :encrypted_password
    end

    create_table :posts do |table|
      table.integer :user_id
      table.boolean :published
      table.string :message
    end
If you want to find the top ten users based on the number of published posts, you can do a JOIN with a COUNT and a GROUP BY clause, but over time that could get slow or just annoying, so you decide to add a cache column:
add_column :users, :published_posts_count, :integer, :default => 0, :null => false
You add code to populate the column and all the tests pass, but of course there’s a big problem: every existing user in production will report zero published posts. That means it’s time to break out a data migration. Running migrations over and over with different data or comparing lots of queries before and after migrating production data is tedious and error-prone, so let’s write a throw-away test for this migration. You can save this as spec/migration_spec.rb:
    require 'spec_helper'
    require Dir.glob(Rails.root.join("db", "migrate", "*_set_published_posts_for_existing_users.rb")).first

    describe SetPublishedPostsForExistingUsers do
      it "counts only published posts" do
        user = FactoryGirl.create(:user)
        FactoryGirl.create_list(:post, 3, :user => user, :published => true)

        reset_cache_and_run_migration

        user.reload.published_posts_count.should == 3
      end

      def reset_cache_and_run_migration
        # Zero out the cache column so the migration has to do the real work.
        User.update_all("published_posts_count = 0")
        SetPublishedPostsForExistingUsers.new.up
      end
    end
Add an empty migration, and the test fails because the user is reporting no published posts. We can get this test passing with a simple migration:
    class SetPublishedPostsForExistingUsers < ActiveRecord::Migration
      def up
        connection.update(<<-SQL)
          UPDATE users
          SET published_posts_count = (
            SELECT COUNT(*)
            FROM posts
          )
        SQL
      end

      def down
        # No problem
      end
    end
Next up, we need to make sure it’s only counting published posts:
    it "counts only published posts" do
      user = FactoryGirl.create(:user)
      FactoryGirl.create_list(:post, 3, :user => user, :published => true)
      FactoryGirl.create(:post, :user => user, :published => false)

      reset_cache_and_run_migration

      user.reload.published_posts_count.should == 3
    end
That will fail because the migration counts every post, published or not, ending up with a total of four. We can fix that easily:
    def up
      connection.update(<<-SQL)
        UPDATE users
        SET published_posts_count = (
          SELECT COUNT(*)
          FROM posts
          WHERE posts.published = true
        )
      SQL
    end
Next up, we need to make sure each user only counts their own posts, so we add a post for a different user:
    it "counts only published posts" do
      user = FactoryGirl.create(:user)
      FactoryGirl.create_list(:post, 3, :user => user, :published => true)
      FactoryGirl.create(:post, :user => user, :published => false)
      other_user = FactoryGirl.create(:user)
      FactoryGirl.create(:post, :user => other_user, :published => true)

      reset_cache_and_run_migration

      user.reload.published_posts_count.should == 3
    end
The test fails again with a count of four, since it picked up the other user’s post. Getting this test to pass leads to our final migration:
    def up
      connection.update(<<-SQL)
        UPDATE users
        SET published_posts_count = (
          SELECT COUNT(*)
          FROM posts
          WHERE posts.published = true
          AND posts.user_id = users.id
        )
      SQL
    end
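If the correlated subquery is hard to picture, its logic can be mirrored in plain Ruby: for each user, count the posts that are both published and owned by that user. This is only a sketch of the intent, not how the migration runs; the Post struct and the published_counts method are hypothetical stand-ins for the real tables.

```ruby
# Plain-Ruby stand-in for rows in the posts table (illustrative only).
Post = Struct.new(:user_id, :published)

# Mirrors the correlated subquery: for each user, count the posts that are
# published AND belong to that user.
def published_counts(user_ids, posts)
  user_ids.each_with_object({}) do |user_id, counts|
    counts[user_id] = posts.count do |post|
      post.published && post.user_id == user_id
    end
  end
end

# Same fixture shape as the spec: user 1 has three published posts and one
# unpublished post, user 2 has one published post.
posts = [
  Post.new(1, true), Post.new(1, true), Post.new(1, true),
  Post.new(1, false),
  Post.new(2, true)
]

p published_counts([1, 2], posts)  # user 1 maps to 3, user 2 maps to 1
```

Dropping either condition from the block reproduces the two failures seen along the way: without the published check user 1 gets four, and without the user_id check every user gets four.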
At this point, I just delete the spec. Since migrations should never be edited after they run, there’s little reason to test for regressions. Inevitably the schema will change, which will mean the spec no longer applies.
The spec provides no value after the migration is committed, but I still find writing specs like these well worth the time. It’s easier for me to think like I’m used to, by writing tests first, and it makes me confident that the migration actually covers the cases it’s supposed to.