Ben is joined by Bryan Helmkamp, the founder of CodeClimate. In Bryan’s second appearance on the podcast, Ben and Bryan discuss the architecture behind CodeClimate, scaling the service, and growing the business. They also discuss speaking at conferences, proposal selection, two factor authentication and adding it to CodeClimate, marketing and content marketing, how to decide what to build and proving that it was worthwhile, strategies for testing at the beginning when you have few users, and Bryan reveals CodeClimate next big upcoming feature.
People frequently use lines of code as a metric. How many times have you heard someone say, “it’s only 10 lines?” or “This class is almost 200 lines?” Have you ever heard somebody boast that they could write something in fewer lines of code? Do you talk about your test to code ratio?
How many lines of code something takes doesn’t tell us much about the code. Let’s take a closer look at some better metrics.
Focusing on lines of code encourages developers to jam as much complexity as they can into each line.
Compare this example:
def credit
user && coupon = user.coupons.where(:product_id => product.id).first.try(:amount) || 0
end
To this one:
def credit
user && user.coupon_for(product).amount || 0
end
# On User
def coupon_for(product)
coupons.where(:product_id => product.id).first.try(:amount)
end
Although the first example is only three lines of code, it has a lot of complexity in one method. That means this method is harder to understand, reuse, modify, or extract. Flog rates this method at 13.9.
The second example example has twice as many lines, but each method is less complex. Flog rates the second example at 8.1 per method.
Even in the second example, the methods are dense. Further refactoring would result in more lines of code, but smaller, simpler methods.
The previous example also improves readability. Every method you extract is an opportunity to clarify something by naming it, while every piece of complexity you add to a line is anonymous.
Code is frequently more readable when it expands over multiple lines of code. Here’s a common Ruby idiom:
users.each(&:friend_added) if new_record?
I like to call this the “if surprise.” It’s easy to scan down a method and miss that dangling if, not realizing that the method you’re debugging actually has a logical fork.
I prefer this:
if new_record?
users.each(&:friend_added)
end
It’s three times as many lines, but the same amount of complexity, and more readable.
Another place that people obsess over lines is their test to code ratio. Rails even comes with a task to print this stat out for you, and many presentations use this stat when bragging about their test coverage (or lack thereof). There are a few things that make this metric meaningless:
Here are some questions that can be useful when measuring your test suite:
rcov or simplecov.Merb popularized this saying and encouraged users not to use code they didn’t need. However, it’s important not to misinterpret this by assuming that fewer lines of code means less code, and therefore faster code.
Here’s an example of a minimal Fibonacci sequence:
class Fibber
def fib(n)
if n < 2
n
else
fib(n - 1) + fib(n - 2)
end
end
end
This implementation is pretty slow. It took almost five seconds to print out the first 35 numbers in the sequence on my machine. You can improve the performance dramatically with just a few more lines:
class Fibber
def initialize
@cache = {}
end
def fib(n)
if n < 2
n
else
@cache[n] ||= fib(n - 1) + fib(n - 2)
end
end
end
This takes around a millisecond on my machine.
Your application will perform better if you remove functionality you don’t need, but removing lines and watching the tests still pass doesn’t mean you haven’t lost anything.
In version 2.4.0 of factory_girl, we added almost three hundred lines of code and almost no features. That seems like a lot of code for no new functionality, but what actually happened is that we fixed a number of annoying bugs. Fixing the last few edge cases for a library can be a huge undertaking; the last ten percent frequently takes as long as the first ninety percent. There have been times that we’ve been tempted to rewrite factory_girl, because it doesn’t seem that hard to implement at a glance. However, a new implementation would reintroduce bugs that have been fixed for years.
If you’re thinking of rewriting something, remember that it’s probably not as easy as it looks. You don’t just have to reimplement the features; you have to fix the edge cases, too.
Let’s stop thinking in terms of lines of code. Here are a few tips:
flog or rcov, not wc -lAs a community, we’ve developed a number of techniques and tools for evaluating code. We can do better than just counting lines.
How important is the refactoring you’re working on? Michael Feathers has a metric you should consider when deciding: plot the complexity against the churn.
Chad Fowler’s turbulence gem does just that.
The “complexity” is some arbitrary number of how tricky the code is to read; this can range from the amount of indentation to the amount of metaprogramming, and everything in between. This is computed using flog.
“Churn” simply refers to how often a file changes; a file that’s changed a bunch has high churn. Highly-churned files should be easy to modify, since people do it a ton. This is computing using git.
Using it is simple: drop this in your Gemfile:
group :development do
gem 'turbulence'
end
Then run it:
bundle exec bule
Watch out, it’ll open a new tab in the browser for you:

There are two, low barrier to entry ways to get some quick metrics about your application’s test code and the coverage it provides. Of course there are others, but today we’re just going to focus on the two that are easiest to run and on what they mean: rake stats and rcov.
The first tool available to us comes built into Rails, and that’s rake stats.
If you haven’t used it before, rake stats, when run, outputs a quick summary of the lines of code, lines of test code, number of classes, number of methods, the ratio of methods to classes, and the ratio of lines of code per method.
Lets take a look at the output from the application Joe, Mike, Micah, and myself just built for the Rails Rumble, Where’s the Milk At?.
+----------------------+-------+-------+---------+---------+-----+-------+
| Name | Lines | LOC | Classes | Methods | M/C | LOC/M |
+----------------------+-------+-------+---------+---------+-----+-------+
| Controllers | 176 | 149 | 10 | 18 | 1 | 6 |
| Helpers | 38 | 35 | 0 | 4 | 0 | 6 |
| Models | 183 | 147 | 5 | 20 | 4 | 5 |
| Libraries | 0 | 0 | 0 | 0 | 0 | 0 |
| Integration tests | 0 | 0 | 0 | 0 | 0 | 0 |
| Functional tests | 855 | 686 | 9 | 3 | 0 | 226 |
| Unit tests | 684 | 568 | 7 | 0 | 0 | 0 |
+----------------------+-------+-------+---------+---------+-----+-------+
| Total | 1936 | 1585 | 31 | 45 | 1 | 33 |
+----------------------+-------+-------+---------+---------+-----+-------+
Code LOC: 331 Test LOC: 1254 Code to Test Ratio: 1:3.8
Ok, when looking at the output from rake stats, there are a few important bits of information that you should look at first, and that are all in the final summary line, in this case:
A Code to Test Ratio of 1 to 3.8 is somewhat ridiculous. Its incredibly high, and when you see something like this, its important to ask why? That’s pretty much the entire usefulness of the output of rake stats as a metric. Here are some guidelines I’ve devised, based on the experience of looking at a bunch of applications I consider “well tested” and “poorly tested”.
There are a few other nice things in the output from rake stats that are helpful for a birds eye view of the application. For example, you can tell that we didn’t write integration tests, and our application has 5 models and 10 controllers.
Lets investigate why the 1:3.8 ratio we have in Where’s the Milk At. Going in, and before doing any actual investigation, I have some initial hunches as to why the application has the ratio it does. Those are
Refactoring tests, just like refactoring code is an essential part of real TDD. Without taking this step, it’d only be natural that our tests would be repetitive, and the lines of test code would be increased. It’s difficult to present a brief example, but here are some typical things that you’ll want to look for in your tests that would be candidates for refactoring
Upon inspection of the Where’s the Milk At test code, I actually found very few, if any, instances of any of the above. In fact, I found that we used extensive use of the macros shoulda provides, we wrote our application specific macros, such as should_have_map and should_display, and we used good practice of shared contexts.
So, I put this aside as a possible cause, but now that I’ve started to review the test code, I’ve started to develop some new ideas about our code to test ratio that I’ll come back to later on.
We used several helpful shoulda test macros to speed up development. My initial suspicion was that these macros were being counted as lines of test code. After investigating, I was able to determine that rake stats only looks in test/unit, test/functional, and test/integration, so this isn’t the case. I put this aside for now, and pocket the info about how rake stats works internally for possible future use some time down the road.
The last of my initial assumptions about our ratio (the astute reader will notice I’m 0 for 2 now) is that we have several complex named scopes that are only 1 to 3 lines of code, but have many more lines of test code. Upon inspection, this is clearly the case. Lets take a look at an example.
We have a named scope which returns all of the Purchases that were made in a specific set of stores. Here’s what it looks like:
named_scope :in_stores, lambda {|stores|
{ :conditions => ['purchases.store_id IN(?)', stores] }
}
And here is the accompanying test (this test was pure TDD, the tests were written a little bit at a time before the named scope was actually written).
context "looking for purchases in stores" do
setup do
@stores = [Factory(:store), Factory(:store)]
@in_store_purchases = []
@stores.each do |store|
2.times do
@in_store_purchases << Factory(:purchase, :store => store)
end
end
Factory(:purchase) # purchase at another store
@result = Purchase.in_stores(@stores)
end
should "not return any purchases for other stores" do
assert_all @result do |purchase|
@stores.include?(purchase.store)
end
end
should "return every purchase for the specified stores" do
assert_all @in_store_purchases do |purchase|
@result.include?(purchase)
end
end
end
You can see that for our 3 line named_scope, we have 23 lines of test code. That’s a ratio of 1:8, and this is an example of one of the simpler named scopes in the the application (assert_all is an assertion we wrote).
Additionally, we could make this ratio slightly worse (or better, depending on how you’re looking at it) by putting the named scope all on one line, instead of 3.
There are quite a few of these finders and accompanying tests, and I feel confident after investigating that this is one of the reasons for the ratio.
In reviewing the test code, I started to notice a few other things the contribute to the ratio.
Take the following test, for example:
logged_in_user_context do
context "with at least one purchase" do
setup do
@purchases = paginate([Factory(:purchase)])
@store = @purchases.last.store
@user. stubs(:purchases).returns(@purchases)
@purchases.stubs(:latest). returns(@purchases)
@purchases.stubs(:paginate). returns(@purchases)
end
context "on GET to index" do
setup do
get :index
end
before_should "find the user's purchases" do
@user.expects(:purchases).with().returns(@purchases)
end
before_should "find the latest purchases" do
@purchases.expects(:latest).with().returns(@purchases)
end
before_should "paginate the purchases" do
@purchases.expects(:paginate).returns(@purchases)
end
When you use stubbing for tests, its best practice to write the stubs and then write expectations for what you’ve stubbed. We’re doing this in the above code by putting the stubs in the setup (3 lines of test code) and then using shoulda’s before_should to declare the expectations (9 lines of test code). That’s 12 lines of test code for what is ultimately 1 line of code.
Now, there isn’t anything necessarily wrong with this, again, we’re only investigating causes of the ratio here. But its something to note and perhaps consider for either test refactoring or to somehow incorporate in your test framework.
Finally, I also noticed a lots of tests like this:
should "crown the best store" do
assert_select 'a', "#{assigns(:stores)[0].name}" do
assert_select 'span[class=crown]'
end
end
should "rerender the purchase form" do
assert_select_rjs :replace, 'new_purchase' do
assert_select '#purchase_store_id[value=?]', @store.id
assert_match @focus_quantity, @response.body
end
end
should "remove the purchase from the list" do
assert_match /new Effect.Fade\("#{dom_id(@purchase)}"/,
@response.body
end
In short, we’re testing the views, markup, javascript (some of it), and RJS – as we should be. And we’re doing it quite extensively, there are 45 calls to assert_select and assert_select_rjs in the functional tests. However, rake stats doesn’t count the lines in the views. If you consider that most of the calls to assert_select and its ilk will be surrounded by a should and an end, that’s 3 lines of test code, that aren’t showing up at all as lines of code at all in our rake stats.
If we modify the rake stats task to include the views (which we can’t seriously do without taking other things into account, like javascript, but bare with me here), here is the new output of rake stats:
+----------------------+-------+-------+---------+---------+-----+-------+
| Name | Lines | LOC | Classes | Methods | M/C | LOC/M |
+----------------------+-------+-------+---------+---------+-----+-------+
| Controllers | 176 | 149 | 10 | 18 | 1 | 6 |
| Helpers | 38 | 35 | 0 | 4 | 0 | 6 |
| Models | 183 | 147 | 5 | 20 | 4 | 5 |
| Views | 605 | 545 | 0 | 0 | 0 | 0 |
| Libraries | 0 | 0 | 0 | 0 | 0 | 0 |
| Integration tests | 0 | 0 | 0 | 0 | 0 | 0 |
| Functional tests | 852 | 683 | 9 | 3 | 0 | 225 |
| Unit tests | 684 | 568 | 7 | 0 | 0 | 0 |
+----------------------+-------+-------+---------+---------+-----+-------+
| Total | 2538 | 2127 | 31 | 45 | 1 | 45 |
+----------------------+-------+-------+---------+---------+-----+-------+
Code LOC: 876 Test LOC: 1251 Code to Test Ratio: 1:1.4
I’ve spent a lot of time talking about rake stats, but here’s the rub. It’s worthless to tell you the real important metric, how good your test code is. Or, said differently, how much coverage your tests provide for your actual code. You really only want to use rake stats for a high level assessment of your code and as one tool in the arsenal you’ll use for investigation in how to improve your tests.
The guidelines I outlined above are basically the extent of how you should use rake stats for judging your test code. And as I’ve illustrated here, your assumptions about your test code, and even my guidelines may be wrong or flexible.
In fact, based on what I’ve uncovered about the view LOC and the stub/expectations, I may begin to reevaluate my 1:2 guideline.
The second tool you can get up and running with easily, and one that is even more valuable than rake stats is rcov
rcov executes your tests and does the best job it can telling which lines of code were executed by your tests. The theory being, that if the line of code is executed, then there was a test for it. Rcov provides C0 coverage, so it cannot tell if two parts of a conditional were both hit, the line being executed means that that line had coverage (See a full definition of C0 and the other types of test coverage measures here).
You should get the latest rcov from github, it crashes less. In order to easily run rcov on your rails app, you can use this rake task, which is included in our plugin that provides standard tasks, limerick_rake, which is in turn included in our Rails application template, Suspenders.
Running rcov on Where’s the Milk At? provides the following information:
+----------------------------------------------------+-------+-------+--------+
| File | Lines | LOC | COV |
+----------------------------------------------------+-------+-------+--------+
|app/controllers/application.rb | 14 | 11 | 100.0% |
|app/controllers/confirmations_controller.rb | 3 | 3 | 100.0% |
|app/controllers/items_controller.rb | 15 | 11 | 100.0% |
|app/controllers/openid_controller.rb | 27 | 25 | 100.0% |
|app/controllers/passwords_controller.rb | 3 | 3 | 100.0% |
|app/controllers/purchases_controller.rb | 48 | 40 | 100.0% |
|app/controllers/sessions_controller.rb | 7 | 6 | 100.0% |
|app/controllers/stores_controller.rb | 21 | 18 | 100.0% |
|app/controllers/users_controller.rb | 28 | 23 | 100.0% |
|app/helpers/application_helper.rb | 38 | 35 | 100.0% |
|app/models/item.rb | 22 | 17 | 100.0% |
|app/models/purchase.rb | 55 | 43 | 100.0% |
|app/models/quantity.rb | 28 | 27 | 100.0% |
|app/models/store.rb | 10 | 7 | 100.0% |
|app/models/user.rb | 63 | 49 | 100.0% |
|app/models/user_mailer.rb | 5 | 4 | 100.0% |
+----------------------------------------------------+-------+-------+--------+
|Total | 387 | 322 | 100.0% |
+----------------------------------------------------+-------+-------+--------+
100.0% 16 file(s) 387 Lines 322 LOC
This shows us that, according to rcov, 100% of the lines of code in our application were executed when our tests were run. This is great, but as with most things, isn’t the whole story and should be taken with a grain of salt. Here are some guidelines/principals you should take into consideration for rcov.
rake stats, rcov doesn’t check coverage on the views (this includes javascript!), so its very possible to have 100% coverage and still have functionality that is uncovered.The most important lesson we can take away from rcov is that its not perfect, but it provides a good benchmark. When its not reporting 100%, you can click through and see exactly which lines of code were not executed by your tests. So, in short, its great at identifying deficiencies in your test suite, but should not be taken as a false safety net, thinking that with 90-100% coverage you’re all good because there can be big holes in your coverage and you’d still be reporting 100%.
Hopefully you’ve gotten a good idea of what to look for and how to use these two simple tools to investigate the quality of your tests. The benchmarks and guidelines I’ve presented here are based on my experience developing over 30 rails applications and reviewing the different stats and coverage reports I’ve seen from them, but that doesn’t mean they are inflexible or infallible.
Also, these metrics, the tools, and other ones that exist out there are meant to assist, but not replace your role as a developer. To correctly understand the problem domain and have confidence in the code itself and the test suite, and to realize the obvious fact that these tools do not analyze the logical correctness of anything you’ve done.
Here are the guidelines again, in summary.
rake stats probably lacks sufficient testsSee, the nasty thing about metrics is that as soon as you start measuring a system, you start changing it. It’s the Heisenberg Uncertainty Principle. Apparently studies have been done that indicate that in successful, maintainable, long lived applications, the average length of methods was 10 lines (I believe this number was derived from Smalltalk applications, but we’re arguing this particular number because we don’t have numbers for Ruby at the moment— this info is also second hand, so we can assume for the sake of argument it’s just arbitrary). The argument is now that we should try to keep our methods to no more than 10 lines, because going above 10 lines means we’re doing something wrong.
I have issues with this mentality. To wit:
end”? Does things.uniq.sort.map(&:to_s) count as more of a line than options ||= {}? Should I write more dense code so I can get more lines into a method?One of my biggest problems with the practice is that it takes the art out of coding. If you want a coder to make smaller methods that do less and are easier to read, tell him that, don’t tell him to make the methods 10 lines or less. The number conveys the means, not the end, in this case, and that means the message is lost. You want smaller methods to mean more maintainable code, not simply shorter methods. By focusing on the numbers, you remember the what but forget the why.
Conscious or not, once you start taking metrics, and placing any kind of value judgment on them, you will find that your team will start to value the metrics more than they value writing code that’s good for the sake of being good. I argue against taking or caring about metrics at all, because while some proponents suggest that you take metrics and then not care about them, I don’t think that’s really possible. Once metrics are taken, coders will want to “improve” them. I can hardly blame them; any good coder will invariably want to get better. But like the problem with Amazon’s Mechanical Turk, coders will change their reward from writing good code to writing code with good metrics.