Splitting an open source project in two with Git submodules

Tute Costa

doorkeeper is an OAuth provider rubygem for Ruby applications. It historically solved at least two problems:

  1. Handling the data and logic of an OAuth server (set up the resource and authorization server as defined in the spec).
  2. Knowing how to persist its data in different ORMs and databases: SQL-like through ActiveRecord, and MongoDB through MongoMapper and Mongoid.

There were a few issues with having both responsibilities in the same codebase:

  • For the past two years, I’ve been the only consistent maintainer, and I’ve been using only ActiveRecord. I can’t guarantee the features I add or bugs I fix work well with MongoDB.
  • Configuring the test matrix was time-consuming. Not all versions of Mongoid run well with all versions of Ruby and Rails.
  • Setting up dependencies and running the test suite locally was complex.
  • At its peak, our test suite took 40 minutes to run in Travis CI. The feedback loop felt too slow for us.
  • Adding features that required model changes was harder than needed: we needed to make sure the changes to the gem would work in every single ORM version (and across Ruby and Rails versions).
  • Users of other ORMs would try to extend doorkeeper with their own, following current architecture: adding yet another ORM into the repository.

Extracting ORM specifics into their own repositories has been on our roadmap for a long time. But we couldn’t find a way to test both projects that guaranteed they would always integrate well with each other, and that kept test coverage and reliability at least as healthy as they already were.

Sweeping cruft under the rug doesn’t solve all issues

Splitting the core doorkeeper functionality and its ORM adapters might solve most of the issues above, but it’s not free. A set of libraries is harder to work on, to run integration tests on, and to release than a single one.

Our primary issue was testing:

  • Relational databases differ from each other, and any relational database works very differently from NoSQL databases. Unit tests that spec out the interface between doorkeeper and data stores are not reliable for us.
  • Test coverage and integration tests are already good in the original test suite, and we don’t want to lose that.
  • Copying specs from the main project into the ORM repository would result in verbatim duplicates that get out of sync as soon as there’s a commit changing any project’s specs, effectively forking doorkeeper’s test suite.
  • Including doorkeeper as a gem dependency didn’t work because it doesn’t allow us to run its tests as part of the extension’s suite.

The best we could come up with during these discussions was to organize ORMs in subdirectories in doorkeeper’s repository. It resulted in an acceptable compromise: we wouldn’t split doorkeeper, but boundaries between shared models code and ORM specifics were explicit, and doorkeeper was reasonably decoupled from the ORM of choice. The project was open for extension, with the ability to accept new ORMs without needing to change existing files. I didn’t take advantage of this fact though and rejected new ORMs, due to the reasons detailed above.
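As a rough illustration of what "open for extension" means here, the sketch below shows one way a library can let each ORM live in its own file and register itself, so that adding an ORM never touches existing files. The OrmRegistry module, its methods, and the adapter names are hypothetical and are not doorkeeper's actual API.

```ruby
# Hypothetical sketch: OrmRegistry is an illustrative name, not doorkeeper code.
module OrmRegistry
  class UnknownOrm < StandardError; end

  @adapters = {}

  class << self
    # Each ORM-specific file calls this once to announce itself.
    def register(name, &loader)
      @adapters[name.to_sym] = loader
    end

    # Run the loader for the ORM the host application configured.
    def use(name)
      loader = @adapters.fetch(name.to_sym) do
        raise UnknownOrm, "no adapter registered for #{name.inspect}"
      end
      loader.call
    end

    def registered
      @adapters.keys
    end
  end
end

# In a real project each call would live in its own ORM subdirectory.
OrmRegistry.register(:active_record) { "ActiveRecord-backed models" }
OrmRegistry.register(:mongoid) { "Mongoid-backed models" }
```

With this shape, supporting another ORM means adding one new file that calls register; the core and the other adapters stay unchanged.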

We still needed to give developers a way to extend doorkeeper with the ORM they want.

The best of both worlds: git submodule

We knew we wanted a doorkeeper-mongodb project, but we didn’t know how to test it. git submodule was the tool we needed.

As described in the git-submodule man page, submodules allow other repositories to be embedded within a subdirectory of the current repository, always pointed at a particular commit. Submodules are meant for different projects you would like to make part of your source tree while the histories of the two projects stay independent.

A submodule is recorded in the main repository as a special entry (a gitlink) at the submodule’s path, which refers to a particular SHA within the inner repository. A record in the .gitmodules file at the root of the source tree assigns a logical name to the submodule and describes the default URL the submodule shall be cloned from. The contents of doorkeeper-mongodb’s .gitmodules are:

[submodule "doorkeeper"]
    path = doorkeeper
    url = https://github.com/doorkeeper-gem/doorkeeper.git
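To make the structure of that file concrete, here is a minimal sketch of a parser for the small subset of the format shown above (one section header per submodule, followed by key = value lines). The parse_gitmodules helper is an illustrative name; in practice git itself can read the file, for example with git config -f .gitmodules --list.

```ruby
# Minimal sketch: handles only [submodule "name"] headers and key = value lines.
def parse_gitmodules(text)
  submodules = {}
  current = nil

  text.each_line do |line|
    line = line.strip
    next if line.empty? || line.start_with?("#", ";") # blank lines and comments

    if (header = line.match(/\A\[submodule "(.+)"\]\z/))
      # The quoted string is the submodule's logical name.
      current = submodules[header[1]] = {}
    elsif current && (pair = line.match(/\A(\w+)\s*=\s*(.*)\z/))
      current[pair[1]] = pair[2]
    end
  end

  submodules
end

GITMODULES = <<~CONFIG
  [submodule "doorkeeper"]
      path = doorkeeper
      url = https://github.com/doorkeeper-gem/doorkeeper.git
CONFIG

SUBMODULES = parse_gitmodules(GITMODULES)
```

SUBMODULES maps the logical name to its settings, so SUBMODULES["doorkeeper"]["path"] is the directory where the inner repository is checked out.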

We can initialize and update submodules with the git submodule init and git submodule update commands:

doorkeeper-mongodb master % git submodule init && git submodule update
Submodule path 'doorkeeper': checked out 'b62dcad046564a0e535e6ac17226fc33778a2cde'

It checks out the reference the submodule was committed with, in this case the latest commit to doorkeeper’s master branch. We can check out another reference. Step-by-step details follow:

Go into the submodule’s directory:

doorkeeper-mongodb master % cd doorkeeper

We are now in the doorkeeper repository, so we can check out another reference in that project:

doorkeeper HEAD % git checkout 2.2-stable
Previous HEAD position was b62dcad... Release version 3.0.0.rc1
Switched to branch '2.2-stable'
Your branch is up-to-date with 'origin/2.2-stable'.
doorkeeper 2.2-stable %

We go back to doorkeeper-mongodb and check the difference with the latest commit:

doorkeeper 2.2-stable % cd ..
doorkeeper-mongodb master % git diff
diff --git a/doorkeeper b/doorkeeper
index b62dcad..9c8ba77 160000
--- a/doorkeeper
+++ b/doorkeeper
@@ -1 +1 @@
-Subproject commit b62dcad046564a0e535e6ac17226fc33778a2cde
+Subproject commit 9c8ba7705a0af17b76990f4fbd83f5fbe5c3f9bf

If we were to commit in doorkeeper-mongodb now, the only change recorded would be that SHA reference, not all the changes that happened between master and 2.2-stable. The next time anyone updates the submodule, it will be checked out at that revision.
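The 160000 mode in the diff above is how git marks a submodule entry. As a sketch of how the pinned commit could be read programmatically, the helper below parses the output of git ls-tree, whose lines have the form "mode type sha<TAB>path". The submodule_commits name is an assumption for illustration, and it is fed a string here so it does not need a real repository.

```ruby
GITLINK_MODE = "160000" # mode git uses for submodule (gitlink) entries

# Returns { path => pinned SHA } for every submodule entry in `git ls-tree` output.
def submodule_commits(ls_tree_output)
  ls_tree_output.each_line.with_object({}) do |line, commits|
    next if line.strip.empty?

    meta, path = line.chomp.split("\t", 2)
    mode, type, sha = meta.split(" ")
    commits[path] = sha if mode == GITLINK_MODE && type == "commit"
  end
end

# Inside a checkout, the input could come from: `git ls-tree HEAD`
```

Comparing that SHA before and after a commit shows exactly what the parent repository records: one pointer per submodule, and nothing from the inner project's history.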

To run the specs as part of the extension’s suite, before the spec task a new load_doorkeeper task is run. We make that happen with these additions to the Rakefile:

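task :load_doorkeeper do
  # Fetch the doorkeeper commit this repository is pinned to.
  `git submodule init`
  `git submodule update`
  # Copy upstream specs into ./spec; -n keeps files that already exist locally.
  `cp -r -n doorkeeper/spec .`
  # Run the combined suite.
  `bundle exec rspec`
end

# load_doorkeeper runs before the spec task.
task spec: :load_doorkeeper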

After the submodule initialization, it copies doorkeeper’s specs into the extension’s root path. The copy happens with the -n flag, which prevents cp from overwriting files that already exist, allowing overrides. The User model from the dummy test app, for example, needs to stay configured with MongoDB rather than upstream’s ActiveRecord.
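The no-overwrite behavior is what makes local overrides possible, so here is a sketch of the same idea in plain Ruby. It is not the code the project uses (the Rakefile shells out to cp); copy_without_overwrite and the file names in the demo are illustrative assumptions. Unlike cp -r, this version skips dotfiles, because Dir.glob does not match them by default.

```ruby
require "fileutils"
require "tmpdir"

# Copies every file under source_dir into target_dir, skipping files that
# already exist in the target. Returns the relative paths that were copied.
def copy_without_overwrite(source_dir, target_dir)
  copied = []

  Dir.glob("**/*", base: source_dir).sort.each do |relative|
    source = File.join(source_dir, relative)
    target = File.join(target_dir, relative)

    if File.directory?(source)
      FileUtils.mkdir_p(target)
    elsif !File.exist?(target)
      FileUtils.mkdir_p(File.dirname(target))
      FileUtils.cp(source, target)
      copied << relative
    end
  end

  copied
end

# Demo in a throwaway directory: the local dummy/user.rb must survive.
Dir.mktmpdir do |root|
  upstream = File.join(root, "doorkeeper", "spec")
  local = File.join(root, "spec")
  FileUtils.mkdir_p(File.join(upstream, "dummy"))
  FileUtils.mkdir_p(File.join(local, "dummy"))
  File.write(File.join(upstream, "dummy", "user.rb"), "# upstream ActiveRecord model")
  File.write(File.join(upstream, "token_spec.rb"), "# upstream spec")
  File.write(File.join(local, "dummy", "user.rb"), "# local MongoDB override")

  puts copy_without_overwrite(upstream, local).inspect
  puts File.read(File.join(local, "dummy", "user.rb"))
end
```

In the demo only token_spec.rb is copied; the existing dummy/user.rb keeps its local contents, which mirrors how the extension keeps its MongoDB-configured User model.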

See the code

The project split took two Pull Requests, one against doorkeeper and one against doorkeeper-mongodb.

Both have several hundred lines of deletions: ORM specifics from the former, and the preexisting spec/ from the latter.

What’s next

The new doorkeeper (version 3.0.0.rc1 as of today) works in the same way for ActiveRecord and MongoDB projects, with a slightly different code loading behavior for MongoDB users. If you use ActiveRecord and would like to upgrade, just bump the major version! If you are a MongoDB user, replace the doorkeeper gem in your Gemfile with doorkeeper-mongodb, like so:

diff --git a/Gemfile b/Gemfile
index b23e48a..84a4dac 100644
--- a/Gemfile
+++ b/Gemfile
@@ -12,7 +12,7 @@ gem "bourbon", "~> 3.2.1"
 gem "clearance", "~> 1.8.0"
 gem "coffee-rails"
 gem "paperclip", "~> 4.2.1"
-gem "doorkeeper", "2.0.0"
+gem "doorkeeper-mongodb", "~> 3.0.0.rc1"
 gem "dynamic_form", "~> 1.1.4"
 gem "flutie"
 gem "font-awesome-rails"
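If the "~> 3.0.0.rc1" constraint in that diff looks unfamiliar, the sketch below uses RubyGems' own Gem::Requirement and Gem::Version classes to show which versions it accepts. The allows? helper and the version strings passed to it are only illustrative; they are not a list of actual releases.

```ruby
require "rubygems" # normally loaded already; explicit for clarity

REQUIREMENT = Gem::Requirement.new("~> 3.0.0.rc1")

# True when the given version string satisfies the Gemfile constraint.
def allows?(version)
  REQUIREMENT.satisfied_by?(Gem::Version.new(version))
end

%w[2.2.0 3.0.0.rc1 3.0.0 3.0.4 3.1.0].each do |version|
  puts "#{version}: #{allows?(version)}"
end
```

The constraint accepts the release candidate and any later 3.0.x version, including a final 3.0.0, but not 3.1.0 or anything older than 3.0.0.rc1.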

Please let us know if you run into any issues, so we can release a stable 3.0.0 version. The NEWS file lists other changes you might need to make to run on the latest version. It should be a seamless upgrade for most users.

doorkeeper is now (really) open to extension: alongside the default ActiveRecord choice, we add the preexisting MongoDB ORM code as a plugin, which in turn sets an example for how to add new non-OAuth features to doorkeeper. Looking forward to seeing and helping with new doorkeeper extensions!