GIANT ROBOTS SMASHING INTO OTHER GIANT ROBOTS

Written by thoughtbot

Get Your C On

This year, whyday happened to fall on the first day of Capeco.de. I'd been interested in playing with C for a while, so I decided to sit down and start learning. The K&R helped a lot, but digging through both Redis and Potion gave me much more insight into how C is actually used.

Since then, I've been trying to get some C in my Ruby by writing gems with C extensions. This isn't hard to do, per se, but it was difficult to find a comprehensive list of requirements (as well as guidelines for organizing the code). I ended up looking at one of the most-used Ruby gems with C - Nokogiri.

Where to Start

You'll want a way to test the code, as well as the rake-compiler gem. I don't test my C explicitly - think of it as a handful of private methods. You'll want to test the public methods on whatever classes and modules you write, so a C testing framework is overkill. Finally, you'll want to become acquainted with ruby.h.

I decided to start simple and write a Sieve of Eratosthenes. I'd written one in pure Ruby and wanted a basic translation to C. I was also interested in benchmarking the code since I knew C would be a lot faster in this instance.

The sieve gem can be found here and its source here.

Directory Structure

A C extension's directory structure is very similar to other Ruby gems; the only addition is an ext directory that will store files necessary for generating a Makefile and compiling the code. This is where your C files and their headers will go. In my case, these files are located in ext/sieve.

extconf.rb

extconf.rb is what will generate your Makefile. You'll need to require "mkmf" and then call create_makefile("your_gem/your_gem"). The mkmf documentation is an excellent resource if you want to include other libraries or customize anything. My gem is straightforward so all I did was ensure the Makefile was created.
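For a gem named sieve with its C sources in ext/sieve, a minimal extconf.rb might look like the sketch below. The "sieve/sieve" target is an assumption based on the layout described in this post, not a copy of the gem's actual file.

```ruby
# ext/sieve/extconf.rb
require "mkmf"

# Generates a Makefile that compiles the C sources in this directory
# into a shared library that can be loaded with require "sieve/sieve".
create_makefile("sieve/sieve")
```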

sieve.h

My sieve.h is very straightforward and doesn't really need any explanation.

sieve.c

This is the meat and potatoes of the gem.

At the top of the file, you'll need to #include <ruby.h> as well as any other headers you need.

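You'll also need an Init_your_gem() function, which Ruby calls when the compiled extension is required, much like main() is the entry point of a C program. The name after Init_ has to match the name of the compiled library. This is where I create the structure of my classes and modules for my sieve. I create a Sieve module, add a sieve instance method, and then have the Numeric class include Sieve.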
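In Ruby terms, the structure described here (a Sieve module with a sieve instance method, included into Numeric) is roughly equivalent to the sketch below. The method body is a pure-Ruby stand-in written for illustration only; in the gem, the body is the C function.

```ruby
# Rough Ruby equivalent of the module structure the C Init function builds.
module Sieve
  # Returns every prime less than or equal to the receiver.
  # Stand-in body (an assumption): the real gem implements this in C.
  def sieve
    limit = to_i
    return [] if limit < 2

    candidates = (0..limit).to_a
    candidates[0] = candidates[1] = nil
    candidates.each do |num|
      next unless num
      break if num * num > limit
      (num * num).step(limit, num) { |idx| candidates[idx] = nil }
    end
    candidates.compact
  end
end

# Including the module in Numeric makes the method available on every number.
class Numeric
  include Sieve
end
```

With this in place, a call such as 30.sieve goes through the mixed-in method, which is how the C version is reached in the gem as well.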

Finally, there's the sieve function itself. It accepts the receiver (self) as a VALUE, the C type used for every Ruby object, and returns a VALUE as well. Remember to free any memory you allocate, just as you would in any C program, or your gem will leak memory.

The /lib directory

Although most of the actual work is done in C, we'll want to have a bit of Ruby in the /lib directory. I've written a scaffold of the module at lib/sieve.rb, which has a require "sieve/sieve" at the top of the file. (Remember the string we passed to create_makefile in extconf.rb? That's it.)

The Rakefile

Being able to compile the gem and run the tests is important, which is why I mentioned the rake-compiler gem earlier. After requiring rubygems, rake, and your library, you'll want to require "rake/extensiontask" and define a Rake::ExtensionTask for your extension. That'll give you a couple of handy rake tasks, namely clean and compile.
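A minimal sketch of that part of the Rakefile, assuming the extension is named sieve and lives in ext/sieve. Setting lib_dir is my assumption about the layout, so that the compiled library lands where require "sieve/sieve" expects it.

```ruby
# Rakefile
require "rake/extensiontask"

# Defines the compile task for ext/sieve; clean and clobber come along with it.
Rake::ExtensionTask.new("sieve") do |ext|
  ext.lib_dir = "lib/sieve" # copy the compiled library into lib/sieve/
end
```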

I like to set my default task to run tests, but you'll want a couple of prerequisites for that task: clean and compile. This ensures that you're rebuilding your gem and running the tests against the latest compiled version.

Since I'm using Cucumber, it looks like this:

require "cucumber/rake/task"
Cucumber::Rake::Task.new(:cucumber => [:clean, :compile]) do |t|
  t.rcov = true
end

task :default => :cucumber

I also have my benchmark task here so I can find out how much more performant this library is compared to a pure Ruby implementation.

Testing

You'll want to test your C extension just like any other Ruby gem. I prefer Cucumber but anything will do. I used a scenario outline for some of the basic primes and then found a file of the first one million primes for some heavy-duty lifting. I also tested that if enough memory couldn't be allocated, it would raise a Ruby NoMemoryError exception.

Since you're going to want to run the features against the latest changes of your gem (and not a version of the gem that's installed), you'll want to modify the load path within your test helper (test/test_helper.rb, spec/spec_helper.rb, or features/support/env.rb). My env.rb looks like this:

$LOAD_PATH.unshift(File.dirname(__FILE__) + '/../../lib')
require "sieve"
require "spec/expectations"

Building the gem

The gemspec for a Ruby C extension is fairly straightforward. The only addition is setting the spec's extensions attribute to the path of the extconf.rb file.

Gem::Specification.new do |s|
  s.require_paths = ["lib"]
  s.extensions = ["ext/sieve/extconf.rb"]
  # ... rest of the gemspec
end

As with any gemspec, you'll want to make sure that you list the .c and .h files within files.

Results

Armed with this, you should be able to go and write Ruby C extensions to your heart's content. As for my Sieve experiment, here's the pure-Ruby implementation of the sieve:

# usage:
#   >> sieve 100
#   => [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97]
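def sieve(n)
  # numbers[i] holds i until i is crossed out by setting it to nil
  numbers = (0..n).to_a
  numbers[0] = numbers[1] = nil # 0 and 1 are not prime
  numbers.each do |num|
    next unless num        # already crossed out
    break if num**2 > n    # no multiples left to cross out
    (num**2).step(n, num) {|idx| numbers[idx] = nil }
  end
  numbers.compact
end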

My benchmarks ran the sieve on numbers from zero to one million in steps of 100,000. Neither implementation uses memoization.
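The benchmark task itself isn't shown in this post. The sketch below is one way such a comparison could be set up with Ruby's Benchmark module; benchmark_sieves and LIMITS are names I made up for illustration.

```ruby
require "benchmark"

# Runs each implementation over every limit and prints a Benchmark.bm report.
# `implementations` maps a report label to a callable that takes the limit.
def benchmark_sieves(implementations, limits)
  Benchmark.bm(14) do |bm|
    implementations.each do |label, impl|
      bm.report(label) { limits.each { |n| impl.call(n) } }
    end
  end
end

# Zero to one million in steps of 100,000, as described above.
LIMITS = 0.step(1_000_000, 100_000).to_a

# Example call, assuming both implementations are loaded:
#   benchmark_sieves({ "sieve method"  => method(:sieve),
#                      "Numeric#sieve" => lambda { |n| n.sieve } }, LIMITS)
```

Passing callables keeps the harness independent of whether the compiled extension is available, so the same task can time either implementation on its own.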

On Ruby 1.8.7, here are the results from my benchmark:

                   user     system      total        real
sieve method   4.460000   0.060000   4.520000 (  4.522069)
Numeric#sieve  0.040000   0.000000   0.040000 (  0.046349)

Ruby 1.9.2 is significantly faster, but still doesn't hold a candle to the C extension:

                   user     system      total        real
sieve method   2.410000   0.060000   2.470000 (  2.468430)
Numeric#sieve  0.050000   0.000000   0.050000 (  0.049053)

What I Learned

Writing C is fun, and it can improve the performance of number-crunching and other fun things. It has its place and is a great addition to any Rubyist's toolbox. Have you written any C extensions purely for performance gains? If you're open to sharing the context, I'd love to hear about it!