giant robots smashing into other giant robots

We are thoughtbot. We make web & mobile apps.

Hoptoad, the cloud, and the pond ahead

For the last couple of months, we’ve seen considerable growth in Hoptoad accounts and traffic. Thank you all! But this growth introduced new traffic patterns and challenges. During this time we’ve mostly been keeping up with it and making sure we can provide as reliable a service as possible. There have been some bumps along the way. This is what has happened, what we’ve done about it, and what is yet to come.

Hoptoad

The error process queue

For over a year, Hoptoad stored exception details as gzipped XML on Amazon S3. When an error was POSTed to our API endpoint, we validated it, grouped it with similar errors, and stored it on the app server’s file system. Every five minutes, a cron job uploaded all these XML files to S3. The details were only available for viewing in the UI after they made it to S3. This is why, more often than we would have liked, you would see the dreaded message “Details for this error are still being processed”. This served us well for some time, but we knew it was time to rethink this architecture.

There were many problems with this approach. The most obvious was that this “still processing” message was becoming more and more common, which degraded the experience of viewing error messages for our users (us included). The first thing we did to improve that experience was rather simple and did not require wholesale architectural changes: instead of trying to display the last notice that we got for that error group, we showed you the last processed notice for that group. So instead of seeing the processing message, you would see actionable data for that exception and could get back to work fixing bugs.
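As a rough sketch of that change, here is the idea in plain Ruby. The Notice struct and the displayable_notice method are made up for illustration and are not Hoptoad’s actual code:

```ruby
# Hypothetical stand-in for a notice record; not Hoptoad's real model.
Notice = Struct.new(:id, :received_at, :processed)

# Return the newest notice whose details are already viewable.
# Returns nil when nothing in the group has been processed yet.
def displayable_notice(notices)
  notices.select(&:processed).max_by(&:received_at)
end

group = [
  Notice.new(1, Time.at(100), true),
  Notice.new(2, Time.at(200), true),
  Notice.new(3, Time.at(300), false) # newest, but still waiting on upload
]

puts displayable_notice(group).id # prints 2
```

The newest notice (3) is skipped because its details are not viewable yet, so the group falls back to notice 2.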

Even though this helped the situation and the number of support requests greatly decreased, we always knew this was a temporary solution and we could do better. We needed a way to store error details in the life cycle of the request, in such a way that it was available immediately afterwards for viewing. Uploading to S3 became too slow for our needs.

Furthermore, this was not the only problem with this architecture. The larger problem was that, because of our high traffic, we started running into all sorts of issues: disk space filling up before our workers were able to push notice details to S3 or, even worse, an application instance failing completely. In those rare cases, another application instance would be automatically provisioned, but the unprocessed XML on the failed instance’s file system would be lost.

Enter MongoDB

In order to display exception details quickly, we decided to make use of MongoDB, removing temporary file system and S3 storage altogether. When an exception hits our API, we do the same processing we’ve always done but store it in a MongoDB collection instead. The three main advantages to you are:

  • Error details are always available, immediately after we receive them. Therefore you can click on the error URL that you receive on the notification emails and start seeing details for the error with no delay.
  • A more robust storage approach, where app instance failures will never cause details to be completely lost. With careful planning, disk space is not an issue either.
  • Better response times: A nice by-product of this change is that storing and reading the data this way has improved the application’s response time by roughly 30%.

A hybrid future

We can’t stop here. We have encountered numerous problems with our current environment, and we are working to improve our infrastructure. This has been our primary focus for the last couple of months.

We plan on migrating our application to a more traditional hosting environment. While we will continue to use virtualization for application servers and other utilities, our databases will now run on bare metal. We are confident that this will increase our overall performance even more, and provide a predictable path for growth. Among other things, this solves:

  • The bad neighbor problem, where other instances in the cloud steal precious CPU cycles. For high traffic applications like Hoptoad the cost of this problem is very real. On our planned setup, our app servers will run under our own hypervisor, so it is impossible for other applications to steal our CPU.
  • I/O contention - while most apps can run just fine on the cloud, its I/O is underprovisioned for an application like Hoptoad. We will gain superior I/O bandwidth by designing an infrastructure with faster disks that can support our needs.

Looking forward to a brighter pond

We have been forced to focus our efforts on performance improvements and architectural changes that can support the growth we’ve seen. We are very sorry for the bumps on the road along the way. We are also tired of feeling apologetic. Enough is enough. We have made changes to improve your experience as a customer, and we will continue to do so. Please bear with us until we’ve migrated our infrastructure. We’ll keep you updated as to the timeline for the hosting move. We look forward to being able to stop worrying about performance, and start worrying about how to improve the service by providing better features that make more use of the data, and help you handle your app’s bugs efficiently.

Recipe: Delivering email on behalf of users

A recipe for a better user experience in emails sent between users via my Rails app.

Why?

When I receive an email from an automated system like a Rails app, it is disorienting if the sender shows up in my email program as “admin” or “donotreply”.

What I want is for the sender to show up as the name of the user who triggered the email.

Ingredients

Install email-spec

I’m a fan of Ben Mabey’s email-spec gem, so I’ll install that:

group :test do
  gem 'email_spec'
end

I create a features/support/email.rb file:

require 'email_spec' # add this line if you use spork
require 'email_spec/cucumber'

Then generate some step definitions into features/step_definitions/email_steps.rb:

rails generate email_spec:steps

Feature

Now I’ll write my user story:

Scenario: Guitarist shares song with guitarist
  Given the following user exists:
    | name         | email            |
    | Eric Clapton | eric@example.com |
  And I sign in as "eric@example.com/password"
  And I am on the share page for "Layla"
  When I fill in "Share with" with "jimi@example.com"
  And I press "Share Song"
  And "jimi@example.com" opens the email
  Then he should see "Eric Clapton <admin@goodsongs.com>" in the email "From" header
  And he should see "eric@example.com" in the email "Reply-To" header

The “From” and “Reply-To” headers

I think the “From” and “Reply-To” headers can provide a better user experience.

I don’t set the author’s email as the “From” header because I hear it’s bad spam practice to send email on behalf of users in that way. ISPs use the From header (among other things) to determine if the originator is sending spam.

Making the feature pass

Ease my worried mind:

class Mailer < ActionMailer::Base
  def share_song(song, friend)
    mail :to       => friend.email,
         :from     => %{"#{song.artist.name}" <admin@goodsongs.com>},
         :reply_to => song.artist.email,
         :subject  => "Good song"
  end
end

I’ve used this format so the sender’s name shows up in the receiver’s email program:

"Name" <email@example.com>
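One thing to watch with this format is a name that itself contains a double quote or a backslash, which would break the quoted part. A small helper can escape those first. This is only a sketch, and named_address is a made-up method name, not something ActionMailer provides:

```ruby
# Build a display-name address such as: "Eric Clapton" <admin@goodsongs.com>
# Backslashes and double quotes in the name are escaped so the quoted
# part of the header stays well-formed.
def named_address(name, email)
  escaped = name.gsub(/["\\]/) { |char| "\\#{char}" }
  %{"#{escaped}" <#{email}>}
end

puts named_address("Eric Clapton", "admin@goodsongs.com")
puts named_address('Eric "Slowhand" Clapton', "admin@goodsongs.com")
```

In the mailer above, the :from value could then be written as named_address(song.artist.name, "admin@goodsongs.com").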

In this case, I want Jimi to be able to reply directly to Eric, so I’ve set the “Reply-To” header to be the sender’s address. I’ve explicitly not put the sender’s name in the “Reply-To” header because that doesn’t work.

In other cases, I want the receiver to reply to the email and have that sent through the Rails app, but that’s a story for another day.

Delivering email with Amazon SES in a Rails 3 app

We’ve been using and loving Sendgrid on all our apps. However, Amazon SES came out last week and… you know… shiny.

Why use Amazon SES?

Right now, price. At our current email rates, we would save more than $10,000 in 2011 using Amazon SES over Sendgrid for Hoptoad.

However, Sendgrid’s a reliable entity with more features (analytics, spam reports, etc.) so even with that dollar figure staring us in the face, we’re not jumping ship quite yet on Hoptoad.

In the meantime, we’re trying Amazon SES on another project that is in private beta to see how well it performs in terms of deliverability, blacklisting, etc.

Already plenty of open source libraries

A week after Amazon announced the service, there were plenty of libraries on Github for Amazon SES. I chose to use drewblas/aws-ses (the aws-ses gem) for the usual reasons:

  • It works.
  • It has a decent-looking test suite that makes me believe it will keep working.

It’s got some fairly intrusive monkey-patching but hey, it’s only a few days old.

Comparing Sendgrid implementation in a Rails app

Another thing that rocks about Sendgrid is how simple it is to use in a Rails app:

ActionMailer::Base.smtp_settings = {
  :address        => "smtp.sendgrid.net",
  :port           => "25",
  :authentication => :plain,
  :user_name      => ENV['SENDGRID_USERNAME'],
  :password       => ENV['SENDGRID_PASSWORD'],
  :domain         => ENV['SENDGRID_DOMAIN']
}

You don’t need any special gem, it’s just SMTP.

Using the aws-ses gem in a Rails app

Amazon SES requires some HMAC’ing and other stuff, but when using a library, it’s still pretty easy, and the gem has the same dependencies as Rails.

Add the gem to your Gemfile:

gem "aws-ses", "~> 0.3.2", :require => 'aws/ses'

Extend ActionMailer in config/initializers/amazon_ses.rb:

ActionMailer::Base.add_delivery_method :ses, AWS::SES::Base,
  :access_key_id     => ENV['AMAZON_ACCESS_KEY'],
  :secret_access_key => ENV['AMAZON_SECRET_KEY']

Set the delivery method in config/environments/*.rb:

config.action_mailer.delivery_method = :ses

That’ll do it. Happy emailing!

Fetching source index for http://rubygems.org/

Like you, I’ve sat at my terminal watching Bundler emit this post’s title and do nothing for quite a while. Imagine what we could be doing instead of waiting for dependencies to resolve! I’m out of ideas already, I love resolving dependencies.

Why it’s slow

It’s actually not Bundler that is slow…it’s RubyGems itself. To understand why this process takes a long time, you need a bit of a history lesson with how RubyGems handles its index of gems. There are three indexes available:

  • Latest index (newest versions for a given gem on a given platform)
  • Big index (all versions for all gems on all platforms)
  • Prerelease index (only prerelease gems for all gems on all platforms)

Usually RubyGems only needs to request the “latest” index when you gem install something. Bundler, however, needs the big index, and there is a serious size difference between the two:

% wget http://rubygems.org/latest_specs.4.8.gz
% wget http://rubygems.org/specs.4.8.gz
% du -h *
172K    latest_specs.4.8.gz
436K    specs.4.8.gz

These indexes are big, gzipped, Marshal’d arrays of gem name, version, and platform. Our first slowdown is actually in parsing this huge array.
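To make the format concrete, here is a minimal round trip using a tiny made-up index in the same [name, version, platform] shape. It is a sketch of the encoding only, not of how RubyGems builds the real file, and it assumes Ruby 2.4 or newer for Zlib.gzip and Zlib.gunzip:

```ruby
require "rubygems"
require "zlib"

# A tiny, made-up index with the same entry shape as specs.4.8.
entries = [
  ["rails", Gem::Version.new("3.0.3"), "ruby"],
  ["rake",  Gem::Version.new("0.8.7"), "ruby"]
]

# Serialize the way the post describes: Marshal first, then gzip.
packed = Zlib.gzip(Marshal.dump(entries))

# Reading it back reverses both steps.
unpacked = Marshal.load(Zlib.gunzip(packed))

unpacked.each do |name, version, platform|
  puts "#{name} #{version} #{platform}"
end
```

With two entries this is instant. The cost in the benchmark comes from doing the same Marshal.load over every version of every gem.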

% irb -rubygems -rbenchmark
>> Benchmark.bmbm { |x| x.report { Marshal.load(Gem.gunzip(File.read("specs.4.8.gz"))) } }
Rehearsal ------------------------------------
   2.250000   0.050000   2.300000 (  2.321536)
--------------------------- total: 2.300000sec

       user     system      total        real
   2.280000   0.030000   2.310000 (  2.299291)

Once unzipped/unpacked, the entries in that array usually look like:

["rails", Gem::Version.new("3.0.3"), "ruby"]

Bundler also needs a given gem’s dependencies. If you haven’t noticed already, those dependencies aren’t in the index at all. They’re in the gemspecs, which are stored individually at a completely different location, also compressed and Marshal’d.

% irb -rubygems -ropen-uri -rpp
>> open("http://rubygems.org/quick/Marshal.4.8/rails-3.0.0.gemspec.rz").read
=> "x\234\225\223Mo\323@\020\206\vT\37..."
>> Gem.inflate(_)
=> "\004\bu:\027Gem::Specification\002..."
>> Marshal.load(_)
=> #<Gem::Specification:0x101377830 @license=[], @extensions=[], ...
>> pp _.dependencies
[Gem::Dependency.new("activesupport", Gem::Requirement.new(["= 3.0.0"]), :runtime),
 Gem::Dependency.new("actionpack", Gem::Requirement.new(["= 3.0.0"]), :runtime),
 Gem::Dependency.new("activerecord", Gem::Requirement.new(["= 3.0.0"]), :runtime),
 Gem::Dependency.new("activeresource", Gem::Requirement.new(["= 3.0.0"]), :runtime),
 Gem::Dependency.new("actionmailer", Gem::Requirement.new(["= 3.0.0"]), :runtime),
 Gem::Dependency.new("railties", Gem::Requirement.new(["= 3.0.0"]), :runtime),
 Gem::Dependency.new("bundler", Gem::Requirement.new(["~> 1.0.0"]), :runtime)]

So that’s basically how RubyGems figures out dependencies N levels deep: it has to make a separate request for each gemspec and keep jumping through them until all possibilities are exhausted. Next time you gem install a gem, add -V and you’ll see all of these requests happening.
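Here is a toy version of that walk. The GEMSPECS hash is made-up data standing in for the individual gemspec files, fetch_dependencies stands in for one HTTP request, and version constraints are ignored entirely, so treat it as an illustration of the request pattern rather than the real resolver:

```ruby
require "set"

# Made-up dependency data standing in for per-gem gemspec files.
GEMSPECS = {
  "rails"         => ["activesupport", "actionpack", "railties"],
  "actionpack"    => ["activesupport", "rack"],
  "railties"      => ["activesupport", "rake"],
  "activesupport" => [],
  "rack"          => [],
  "rake"          => []
}

# Stand-in for one HTTP request per gemspec; records each lookup.
def fetch_dependencies(name, log)
  log << name
  GEMSPECS.fetch(name)
end

# Walk the tree depth-first, fetching each gem's spec once.
def resolve(name, log, seen = Set.new)
  return seen if seen.include?(name)
  seen << name
  fetch_dependencies(name, log).each { |dep| resolve(dep, log, seen) }
  seen
end

requests = []
resolved = resolve("rails", requests)
puts "#{resolved.size} gems resolved with #{requests.size} requests"
# prints "6 gems resolved with 6 requests"
```

Even in the best case there is one request per gem, and each one can only be issued after its parent’s gemspec has come back.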

Those requests obviously take a lot of time, no matter how good Bundler’s resolver algorithm gets. I think we’ve pushed this system to its limits, and the fact that it does complete resolves in a reasonable amount of time is impressive.

What you can do

So it’s still slow. My general advice is to:

  • Check in your vendor/cache directory with your .gem files. If bundle install doesn’t make one, force it with bundle pack.
  • On new installs, CI runs, and deploys, use bundle --local, which will attempt to resolve using only vendor/cache.
  • Lock down to specific versions (or use the twiddle-wakka) in your Gemfile
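If the twiddle-wakka (~>) is new to you, you can ask RubyGems directly which versions a requirement allows. This sketch uses Gem::Requirement and Gem::Version; the version numbers are just examples:

```ruby
require "rubygems"

# "~> 3.0.3" means ">= 3.0.3" and "< 3.1".
pessimistic = Gem::Requirement.new("~> 3.0.3")

["3.0.3", "3.0.9", "3.1.0"].each do |version|
  allowed = pessimistic.satisfied_by?(Gem::Version.new(version))
  puts "#{version}: #{allowed}"
end
# 3.0.3: true
# 3.0.9: true
# 3.1.0: false
```

In a Gemfile that would be written as gem "rails", "~> 3.0.3".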

What we have done about it

From the RubyGems side, I think we’ve done a good thing by making the long requests go out to CloudFront, so big gems get a CDN boost. However, all requests are still being made to the Gemcutter server at Rackspace before being redirected to S3/CloudFront, so the network latency on that first request doesn’t help those outside of the US get their gems faster.

At Cape Code, Matt and I worked on a new resolver endpoint for Bundler. The idea was that Bundler could make a request to this new API that would return one level of dependencies for a given set of gems. We can’t move the entire Bundler resolver algorithm to the server side, but this could cut down the number of requests it needs to make out for gemspecs.
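To show why one level at a time helps, here is a toy model. The DEPENDENCIES hash is made-up data, and dependencies_for is a hypothetical stand-in for the new endpoint, not its real interface:

```ruby
require "set"

# Made-up dependency data; versions are ignored for simplicity.
DEPENDENCIES = {
  "rails"         => ["activesupport", "actionpack", "railties"],
  "actionpack"    => ["activesupport", "rack"],
  "railties"      => ["activesupport", "rake"],
  "activesupport" => [],
  "rack"          => [],
  "rake"          => []
}

# Hypothetical endpoint: one round trip returns the direct
# dependencies for every gem name in the request.
def dependencies_for(names)
  names.each_with_object({}) do |name, result|
    result[name] = DEPENDENCIES.fetch(name)
  end
end

# Resolve level by level, asking about a whole level per round trip.
def resolve_in_batches(roots)
  seen = Set.new
  frontier = roots
  round_trips = 0
  until frontier.empty?
    round_trips += 1
    seen.merge(frontier)
    found = dependencies_for(frontier).values.flatten.uniq
    frontier = found.reject { |name| seen.include?(name) }
  end
  [seen, round_trips]
end

gems, trips = resolve_in_batches(["rails"])
puts "#{gems.size} gems resolved in #{trips} round trips"
# prints "6 gems resolved in 3 round trips"
```

The number of round trips now follows the depth of the dependency tree instead of the number of gems in it.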

This will speed things up a bit, but it doesn’t solve the root problem here.

What needs to happen

What we really need is:

  1. A better indexing scheme
  2. A mirroring system that isn’t horrible (read: round robin DNS)

RubyGems definitely needs a better indexing scheme, but this is difficult since making the client support it is going to be rough (and we have to worry about backwards compatibility!).

Thankfully, our server is now in Ruby (one of the first goals of the Gemcutter project) so we can iterate rapidly and drop the changes into a gem plugin (think gem fast_install rails). I’ve been talking to some fellow robots here about some possibilities (differential indices for one) but we need to bang some code out soon.

I’m looking into getting a mirroring system set up, but as always, we need contributors to help. My first stop has been with MirrorBrain, but I’m open to anything that works and will be easy to set up. My only real requirement is that it takes < 1 minute to get a gem distributed. Perhaps we need BitTorrent? The gem files are small (most are way under 1MB) so I can’t see that as being hard to accomplish.

My goal is to get rid of at least one of these problems in 2011. Want to help? Hop on IRC (#rubygems on irc.freenode.net) and the Gemcutter mailing list as well.