Simple Test Metrics in Your Rails App, and What They Mean

There are two low-barrier ways to get quick metrics about your application’s test code and the coverage it provides. There are others, but today we’re going to focus on the two that are easiest to run, and on what they mean: rake stats and rcov.

The first tool available to us comes built into Rails, and that’s rake stats.

rake stats

If you haven’t used it before, rake stats outputs a quick summary of the lines of code, lines of test code, number of classes, number of methods, the ratio of methods to classes, and the number of lines of code per method.

Let’s take a look at the output from the application Joe, Mike, Micah, and I built for the Rails Rumble, Where’s the Milk At?.

+----------------------+-------+-------+---------+---------+-----+-------+
| Name                 | Lines |   LOC | Classes | Methods | M/C | LOC/M |
+----------------------+-------+-------+---------+---------+-----+-------+
| Controllers          |   176 |   149 |      10 |      18 |   1 |     6 |
| Helpers              |    38 |    35 |       0 |       4 |   0 |     6 |
| Models               |   183 |   147 |       5 |      20 |   4 |     5 |
| Libraries            |     0 |     0 |       0 |       0 |   0 |     0 |
| Integration tests    |     0 |     0 |       0 |       0 |   0 |     0 |
| Functional tests     |   855 |   686 |       9 |       3 |   0 |   226 |
| Unit tests           |   684 |   568 |       7 |       0 |   0 |     0 |
+----------------------+-------+-------+---------+---------+-----+-------+
| Total                |  1936 |  1585 |      31 |      45 |   1 |    33 |
+----------------------+-------+-------+---------+---------+-----+-------+
  Code LOC: 331     Test LOC: 1254     Code to Test Ratio: 1:3.8

When looking at the output from rake stats, the most important bits of information to look at first are all in the final summary line. In this case:

Lines of Code (Excluding test code): 331
Lines of Test Code: 1254
Code to Test Ratio: 1:3.8

A Code to Test Ratio of 1 to 3.8 is somewhat ridiculous. It’s incredibly high, and when you see something like this, it’s important to ask why. Prompting that question is pretty much the entire usefulness of rake stats as a metric. Here are some guidelines I’ve devised, based on looking at a bunch of applications I consider well tested and poorly tested.

Anything less than 1:1 probably means the code lacks sufficient tests.
Anything more than 1:2 is open to question, but could turn out to be perfectly reasonable upon investigation.
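
Here’s a minimal sketch of those two guidelines as code. The method name and the verdict symbols are made up for illustration; only the 1:1 and 1:2 thresholds come from the guidelines above.

```ruby
# Hypothetical helper: applies the two rule-of-thumb thresholds to the
# "Code LOC" and "Test LOC" figures from the rake stats summary line.
def assess_code_to_test_ratio(code_loc, test_loc)
  raise ArgumentError, "code_loc must be positive" unless code_loc.positive?

  ratio = test_loc.to_f / code_loc
  verdict =
    if ratio < 1.0
      :probably_undertested   # fewer test lines than code lines
    elsif ratio > 2.0
      :worth_investigating    # not wrong, but ask why
    else
      :unremarkable
    end

  { ratio: format("1:%.1f", ratio), verdict: verdict }
end

result = assess_code_to_test_ratio(331, 1254)
puts "#{result[:ratio]} is #{result[:verdict]}"
```

A verdict of :worth_investigating is a prompt to go and read the tests, not a judgment on them.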

There are a few other nice things in the output from rake stats that are helpful for a bird’s-eye view of the application. For example, you can tell that we didn’t write integration tests, and that our application has 5 models and 10 controllers.

Let’s investigate why we have the 1:3.8 ratio in Where’s the Milk At?. Before doing any actual investigation, I have some initial hunches as to why the application has the ratio it does. Those are:

Given a rapid development schedule of 48 hours, we didn’t have any opportunity to refactor tests
Our Shoulda macros are being counted as lines of test code (LOTC)
We have several complex named scopes that count as 1 to 3 lines of code, but have many more lines of test code

We didn’t have any opportunity to refactor tests

We were on a 48 hour clock.

Refactoring tests, like refactoring code, is an essential part of real TDD. Skip this step and it’s only natural for the tests to become repetitive and for the lines of test code to grow. It’s difficult to present a brief example, but here are some typical things to look for in your tests that would be candidates for refactoring:

Duplicated setup code that can be moved into a common context
Multiple contexts that do the same thing
Unnecessary tests
Duplicated test code that can be moved into a macro

Upon inspection of the Where’s the Milk At? test code, I actually found very few, if any, instances of any of the above. In fact, I found that we made extensive use of the macros Shoulda provides, wrote our own application-specific macros, such as should_have_map and should_display, and made good use of shared contexts.

So, I put this aside as a possible cause, but now that I’ve started to review the test code, I’ve started to develop some new ideas about our code to test ratio that I’ll come back to later on.

Our Shoulda macros are being counted as LOTC

We used several helpful Shoulda test macros to speed up development. My initial suspicion was that these macros were being counted as lines of test code. After investigating, I was able to determine that rake stats only looks in test/unit, test/functional, and test/integration, so the macros, which live outside those directories, aren’t counted. I put this aside for now, and pocket the info about how rake stats works internally for possible future use.
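
To make the directory point concrete, here’s a rough sketch of a test line counter that only walks those three directories. It is an approximation for illustration, not the code behind rake stats: the method names are hypothetical, and the counting rule (skip blank lines and lines that are only a # comment) is an assumption about what counts as a line of code.

```ruby
# Directories rake stats scans for test code, per the investigation above.
TEST_DIRECTORIES = %w[test/unit test/functional test/integration].freeze

# Approximation: a line counts unless it is blank or is only a `#` comment.
def count_loc(source)
  source.each_line.count do |line|
    stripped = line.strip
    !stripped.empty? && !stripped.start_with?("#")
  end
end

# Sums the counted lines of every .rb file under the listed directories.
# Anything elsewhere under test/ (macros, helpers, factories) is ignored.
def count_test_loc(root, directories = TEST_DIRECTORIES)
  directories.sum do |directory|
    Dir.glob(File.join(root, directory, "**", "*.rb")).sum do |path|
      count_loc(File.read(path))
    end
  end
end
```

Running count_test_loc(Dir.pwd) from the root of an application with this layout gives a figure to compare against the Test LOC that rake stats reports.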

We have several complex named scopes

The last of my initial assumptions about our ratio (the astute reader will notice I’m 0 for 2 so far) is that we have several complex named scopes that are only 1 to 3 lines of code, but have many more lines of test code. Upon inspection, this is the case. Let’s take a look at an example.

We have a named scope which returns all of the Purchases that were made in a specific set of stores. Here’s what it looks like:

named_scope :in_stores, lambda {|stores|
  { :conditions => ['purchases.store_id IN(?)', stores] }
}

And here is the accompanying test (this was pure TDD: the tests were written a little bit at a time, before the named scope itself).

context "looking for purchases in stores" do
  setup do
    @stores = [Factory(:store), Factory(:store)]

    @in_store_purchases = []
    @stores.each do |store|
      2.times do
        @in_store_purchases << Factory(:purchase, :store => store)
      end
    end

    Factory(:purchase) # purchase at another store

    @result = Purchase.in_stores(@stores)
  end

  should "not return any purchases for other stores" do
    assert_all @result do |purchase|
      @stores.include?(purchase.store)
    end
  end

  should "return every purchase for the specified stores" do
    assert_all @in_store_purchases do |purchase|
      @result.include?(purchase)
    end
  end
end

You can see that for our 3-line named_scope, we have 23 lines of test code. That’s a ratio of roughly 1:8, and this is one of the simpler named scopes in the application (assert_all is an assertion we wrote).

Additionally, we could make this ratio slightly worse (or better, depending on how you’re looking at it) by putting the named scope all on one line, instead of 3.

There are quite a few of these finders and accompanying tests, and I feel confident after investigating that this is one of the reasons for the ratio.
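
The assert_all assertion used in the test above isn’t shown here. As a rough idea of what a helper like it could look like, here’s a minimal, framework-free sketch that assumes it simply requires the block to be truthy for every element. The module name, the error class, and the failure message are all hypothetical; this is not the code from the application.

```ruby
# Hypothetical stand-in for an assert_all-style helper.
module AssertAllSketch
  class AssertionFailed < StandardError; end

  # Passes only if the block returns a truthy value for every element.
  def assert_all(collection)
    failures = collection.reject { |element| yield(element) }
    return true if failures.empty?

    raise AssertionFailed,
          "expected block to hold for every element, failed for: #{failures.inspect}"
  end
end

checker = Object.new.extend(AssertAllSketch)
checker.assert_all([2, 4, 6]) { |number| number.even? }
```

Collecting the failing elements, rather than stopping at the first one, keeps the failure message useful when several records are wrong at once.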

Other causes

In reviewing the test code, I started to notice a few other things that contribute to the ratio.

Take the following test, for example:

logged_in_user_context do
  context "with at least one purchase" do
    setup do
      @purchases = paginate([Factory(:purchase)])
      @store     = @purchases.last.store

      @user.     stubs(:purchases).returns(@purchases)
      @purchases.stubs(:latest).   returns(@purchases)
      @purchases.stubs(:paginate). returns(@purchases)
    end

    context "on GET to index" do
      setup do
        get :index
      end

      before_should "find the user's purchases" do
        @user.expects(:purchases).with().returns(@purchases)
      end

      before_should "find the latest purchases" do
        @purchases.expects(:latest).with().returns(@purchases)
      end

      before_should "paginate the purchases" do
        @purchases.expects(:paginate).returns(@purchases)
      end
    end
  end
end

When you use stubbing in tests, it’s best practice to write the stubs and then write expectations for what you’ve stubbed. We’re doing this in the above code by putting the stubs in the setup (3 lines of test code) and then using Shoulda’s before_should to declare the expectations (9 lines of test code). That’s 12 lines of test code for what is ultimately 1 line of code.

Now, there isn’t anything necessarily wrong with this; again, we’re only investigating causes of the ratio here. But it’s something to note, and perhaps to consider either for test refactoring or for incorporating into your test framework somehow.

Finally, I also noticed a lot of tests like this:

should "crown the best store" do
  assert_select 'a', "#{assigns(:stores)[0].name}" do
    assert_select 'span[class=crown]'
  end
end

should "rerender the purchase form" do
  assert_select_rjs :replace, 'new_purchase' do
    assert_select '#purchase_store_id[value=?]', @store.id
    assert_match @focus_quantity, @response.body
  end
end

should "remove the purchase from the list" do
  assert_match /new Effect.Fade\("#{dom_id(@purchase)}"/,
               @response.body
end

In short, we’re testing the views, markup, JavaScript (some of it), and RJS, as we should be. And we’re doing it quite extensively: there are 45 calls to assert_select and assert_select_rjs in the functional tests. rake stats doesn’t count the lines in the views. If you consider that most of the calls to assert_select and its ilk will be surrounded by a should and an end, that’s at least 3 lines of test code per call for code that isn’t showing up as lines of code at all in our rake stats.

If we modify the rake stats task to include the views (which we can’t seriously do without taking other things into account, like JavaScript, but bear with me here), here is the new output of rake stats:

+----------------------+-------+-------+---------+---------+-----+-------+
| Name                 | Lines |   LOC | Classes | Methods | M/C | LOC/M |
+----------------------+-------+-------+---------+---------+-----+-------+
| Controllers          |   176 |   149 |      10 |      18 |   1 |     6 |
| Helpers              |    38 |    35 |       0 |       4 |   0 |     6 |
| Models               |   183 |   147 |       5 |      20 |   4 |     5 |
| Views                |   605 |   545 |       0 |       0 |   0 |     0 |
| Libraries            |     0 |     0 |       0 |       0 |   0 |     0 |
| Integration tests    |     0 |     0 |       0 |       0 |   0 |     0 |
| Functional tests     |   852 |   683 |       9 |       3 |   0 |   225 |
| Unit tests           |   684 |   568 |       7 |       0 |   0 |     0 |
+----------------------+-------+-------+---------+---------+-----+-------+
| Total                |  2538 |  2127 |      31 |      45 |   1 |    45 |
+----------------------+-------+-------+---------+---------+-----+-------+
  Code LOC: 876     Test LOC: 1251     Code to Test Ratio: 1:1.4
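
The arithmetic behind that change is simple enough to check by hand. Here’s a small sketch using the figures from the table above; the constant and method names are made up for illustration.

```ruby
# Figures taken from the rake stats table above.
CODE_LOC = 331    # controllers + helpers + models
VIEW_LOC = 545
TEST_LOC = 1251

# Formats test lines per line of code the way the summary line presents it.
def code_to_test_ratio(code_loc, test_loc)
  format("1:%.1f", test_loc.to_f / code_loc)
end

puts code_to_test_ratio(CODE_LOC, TEST_LOC)             # views ignored
puts code_to_test_ratio(CODE_LOC + VIEW_LOC, TEST_LOC)  # views counted as code
```

The test code hasn’t changed at all between the two lines of output. Only the denominator has, which is why the ratio on its own says so little.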

I’ve spent a lot of time talking about rake stats, but here’s the rub: it’s worthless for telling you the really important metric, which is how good your test code is. Or, said differently, how much coverage your tests provide for your actual code. You really only want to use rake stats for a high-level assessment of your code, and as one tool in the arsenal you’ll use to investigate how to improve your tests.

The guidelines I outlined above are the extent of how you should use rake stats for judging your test code. And as I’ve illustrated here, your assumptions about your test code, and even my guidelines, may be wrong or flexible.

In fact, based on what I’ve uncovered about the view LOC and the stub/expectations, I may begin to reevaluate my 1:2 guideline.

The second tool you can get up and running with easily, and one that is even more valuable than rake stats, is rcov.

rcov

rcov executes your tests and does the best job it can of telling which lines of code were executed by them. The theory is that if a line of code was executed, then there was a test for it. rcov provides C0 (line) coverage, so it cannot tell whether both branches of a conditional were hit; a line being executed at all counts as that line being covered.
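
To see what that limitation looks like in practice, here’s a small sketch. It uses Ruby’s standard-library Coverage module as a stand-in for rcov, since both report per-line execution. The shipping_cost method and the line_counts_for helper are hypothetical, invented for this example.

```ruby
require "coverage"
require "tempfile"

# Hypothetical code under test: one line holding both sides of a conditional.
SOURCE = <<~RUBY
  def shipping_cost(total)
    total > 50 ? 0 : 5
  end
RUBY

# Loads `source` from a temp file with line coverage switched on, runs the
# block (standing in for the test suite), and returns the per-line counts.
def line_counts_for(source)
  file = Tempfile.new(["c0_example", ".rb"])
  file.write(source)
  file.close

  Coverage.start
  load file.path
  yield
  _path, counts = Coverage.result.find do |path, _|
    File.basename(path) == File.basename(file.path)
  end
  counts
ensure
  file&.unlink
end

# The "suite" only ever exercises the free-shipping side of the ternary.
counts     = line_counts_for(SOURCE) { shipping_cost(100) }
executable = counts.compact # nil marks lines with nothing to execute
covered    = executable.count(&:positive?)
puts format("%.1f%% of executable lines were hit", 100.0 * covered / executable.size)
```

Every executable line is hit, so a line-based report has nothing to complain about, yet the branch that returns 5 never ran.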

You should get the latest rcov from GitHub; it crashes less. To easily run rcov on your Rails app, you can use this rake task, which is included in limerick_rake, our plugin that provides standard tasks, which is in turn included in Suspenders, our Rails application template.

Running rcov on Where’s the Milk At? provides the following information:

+----------------------------------------------------+-------+-------+--------+
|                  File                              | Lines |  LOC  |  COV   |
+----------------------------------------------------+-------+-------+--------+
|app/controllers/application.rb                      |    14 |    11 | 100.0% |
|app/controllers/confirmations_controller.rb         |     3 |     3 | 100.0% |
|app/controllers/items_controller.rb                 |    15 |    11 | 100.0% |
|app/controllers/openid_controller.rb                |    27 |    25 | 100.0% |
|app/controllers/passwords_controller.rb             |     3 |     3 | 100.0% |
|app/controllers/purchases_controller.rb             |    48 |    40 | 100.0% |
|app/controllers/sessions_controller.rb              |     7 |     6 | 100.0% |
|app/controllers/stores_controller.rb                |    21 |    18 | 100.0% |
|app/controllers/users_controller.rb                 |    28 |    23 | 100.0% |
|app/helpers/application_helper.rb                   |    38 |    35 | 100.0% |
|app/models/item.rb                                  |    22 |    17 | 100.0% |
|app/models/purchase.rb                              |    55 |    43 | 100.0% |
|app/models/quantity.rb                              |    28 |    27 | 100.0% |
|app/models/store.rb                                 |    10 |     7 | 100.0% |
|app/models/user.rb                                  |    63 |    49 | 100.0% |
|app/models/user_mailer.rb                           |     5 |     4 | 100.0% |
+----------------------------------------------------+-------+-------+--------+
|Total                                               |   387 |   322 | 100.0% |
+----------------------------------------------------+-------+-------+--------+
100.0%   16 file(s)   387 Lines   322 LOC

This shows us that, according to rcov, 100% of the lines of code in our application were executed when our tests were run. This is great but, as with most things, it isn’t the whole story and should be taken with a grain of salt. Here are some guidelines and principles you should take into consideration for rcov.

As we discovered with rake stats, rcov doesn’t check coverage on the views (this includes JavaScript!), so it’s very possible to have 100% coverage and still have functionality that is uncovered.
Since rcov only provides C0 coverage reports, 100% doesn’t mean that you don’t have bugs or that you’re even perfectly tested.
If you’re doing real, actual, TATFT TDD, then reaching 100% coverage (as reported by rcov) should be a reachable goal; in fact, if you have less than 80% and you think you’ve been doing TDD, something is not right and you should investigate.

The most important lesson we can take away from rcov is that it’s not perfect, but it provides a good benchmark. When it’s not reporting 100%, you can click through and see exactly which lines of code were not executed by your tests. So, in short, it’s great at identifying deficiencies in your test suite, but it should not be treated as a safety net: don’t assume that with 90-100% coverage you’re all good, because there can be big holes in your coverage while rcov still reports 100%.

What All This Means

Hopefully you’ve gotten a good idea of what to look for and how to use these two simple tools to investigate the quality of your tests. The benchmarks and guidelines I’ve presented here are based on my experience developing over 30 Rails applications and reviewing the different stats and coverage reports I’ve seen from them, but that doesn’t mean they are inflexible or infallible.

Also, these metrics, these tools, and the others that exist out there are meant to assist you as a developer, not replace you. It’s still your job to correctly understand the problem domain, to have confidence in the code itself and in the test suite, and to keep in mind the obvious fact that these tools do not analyze the logical correctness of anything you’ve done.

Here are the guidelines again, in summary.

Anything less than 1:1 code to test ratio from rake stats probably lacks sufficient tests.
Anything more than 1:2 is open to question, but could turn out to be perfectly reasonable upon investigation.
Neither rcov nor rake stats checks the views (this includes JavaScript!), so it’s very possible to have 100% coverage and still have functionality that is uncovered, or to have a very high code to test ratio.
Since rcov only provides C0 coverage reports, 100% doesn’t mean that you don’t have bugs or that you’re even perfectly tested.
If you’re doing real, actual, TATFT TDD, then reaching 100% coverage (as reported by rcov) should be a reachable goal; in fact, if you have less than 80% and you think you’ve been doing TDD, something is not right and you should investigate.