Handling API Rate Limits by Retrying Requests in Background Jobs

Greg Lazarev

Has your app ever encountered a 429 (Too Many Requests) status code when making requests to a third-party API? Getting rate limited can be a nuisance, and if not handled properly it can result in a negative user experience. One solution is to catch the exception and ignore it, but a better solution is to retry the request.

Let’s take a look at how we can alleviate rate-limiting woes by utilizing a background job system. In this example we’ll use delayed_job, since it provides the ability to retry failed jobs.

We are going to assume that we are accessing an API of a Popular Website. First, we’ll create a background job that makes a request to that API.

class MyCustomJob < Struct.new(:username)
  def perform
    PopularSiteApi.get("/feed/#{username}")
  end
end
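To see what the Struct-based job does when a worker calls it, here is a self-contained sketch: the `PopularSiteApi` module below is a hypothetical stand-in for the real API client, so the job can be exercised without any network access or the delayed_job gem.

```ruby
# Hypothetical stand-in for the real API client, so we can call the
# job without hitting the network.
module PopularSiteApi
  def self.get(path)
    "GET #{path}"
  end
end

# Same shape as the job above: a Struct subclass whose #perform makes
# the API request. delayed_job serializes the struct's members.
class MyCustomJob < Struct.new(:username)
  def perform
    PopularSiteApi.get("/feed/#{username}")
  end
end

MyCustomJob.new("some_user").perform  # => "GET /feed/some_user"
```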

When this job gets executed many times in a row, we will eventually reach the limit on how many requests the Popular Website allows us to make. When that happens, an exception will be raised and our background job will fail. That’s okay: delayed_job will retry any failed job (up to 25 times by default).

Rate limits can vary from a number of requests per day to a number of requests per minute. For the sake of this example, let’s assume the latter. Now, delayed_job retries failed jobs in the following manner (from the docs):

On failure, the job is scheduled again in 5 seconds + N ** 4, where N is the number of retries.
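Plugging in the first few values of N shows how quickly that default backoff grows (plain Ruby, no delayed_job required):

```ruby
# Seconds to wait before retry N under delayed_job's default backoff,
# 5 + N ** 4, for the first five retries.
delays = (1..5).map { |n| n**4 + 5 }
delays  # => [6, 21, 86, 261, 630]
```

A fixed one-minute window is a better fit for a per-minute rate limit than this polynomial curve, which is why we override the schedule below.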

In our case, we want to retry our jobs every minute if they fail due to rate limiting. delayed_job provides a method called error which we can define to inspect the exception.

def error(job, exception)
  @rate_limited = at_rate_limit?(exception)
end

def at_rate_limit?(exception)
  exception.is_a?(Faraday::Error::ClientError) && exception.response[:status] == 429
end
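Here is the same check in a runnable form. `FakeClientError` is a hypothetical stand-in for `Faraday::Error::ClientError` (which carries the HTTP response hash on `#response`), and the class check is swapped for a `respond_to?` check so the sketch runs without Faraday installed.

```ruby
# Hypothetical stand-in for Faraday::Error::ClientError: a minimal
# exception class exposing the response hash on #response.
class FakeClientError < StandardError
  attr_reader :response

  def initialize(response)
    @response = response
    super("client error")
  end
end

# Rate limited if the exception carries a response with a 429 status.
def at_rate_limit?(exception)
  exception.respond_to?(:response) && exception.response[:status] == 429
end

at_rate_limit?(FakeClientError.new({ status: 429 }))  # => true
at_rate_limit?(FakeClientError.new({ status: 500 }))  # => false
at_rate_limit?(RuntimeError.new("boom"))              # => false
```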

Now, we can retry this job at our known time interval by overriding the reschedule_at method, which delayed_job calls with the current time and the attempt count to decide when to re-run a particular job. We can also override the number of times the job is retried (if we want it to be different from the default 25).

def reschedule_at(time, attempts)
  if @rate_limited
    next_rate_limit_window
  else
    # Fall back to delayed_job's default backoff for other failures.
    time + (attempts ** 4) + 5.seconds
  end
end

def max_attempts
  if @rate_limited
    10
  else
    Delayed::Worker.max_attempts
  end
end

def next_rate_limit_window
  1.minute.from_now
end

Once our custom job is configured this way, it will be retried every minute, up to ten times, until it succeeds. If the job is still encountering a 429 status code after those retries, it will fail permanently. At that point, we’ll send out a notification of the failure (using Airbrake) and consider upgrading our API rate plan.

Here’s the full code example:

class MyCustomJob < Struct.new(:param1, :param2)
  def perform
    PopularSiteApi.get('/posts')
  end

  def error(job, exception)
    @exception = exception
  end

  def reschedule_at(time, attempts)
    if at_rate_limit?
      next_rate_limit_window
    else
      # Fall back to delayed_job's default backoff for other failures.
      time + (attempts ** 4) + 5.seconds
    end
  end

  def failure(job)
    Airbrake.notify(error_message: "Job failure: #{job.last_error}")
  end

  def max_attempts
    if at_rate_limit?
      10
    else
      Delayed::Worker.max_attempts
    end
  end

  private

  def at_rate_limit?
    @exception.is_a?(Faraday::Error::ClientError) && @exception.response[:status] == 429
  end

  def next_rate_limit_window
    1.minute.from_now
  end
end
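To see the retry policy in isolation, here is a pure-Ruby simulation of the decision the job makes after each failure. The 60-second window mirrors next_rate_limit_window, and the else branch mirrors delayed_job's default backoff; `next_run_at` is a hypothetical helper, not part of delayed_job.

```ruby
RETRY_WINDOW = 60 # seconds, mirroring next_rate_limit_window

# Returns when the next attempt should run: a fixed window when we are
# rate limited, delayed_job's polynomial backoff otherwise.
def next_run_at(now, attempts, rate_limited:)
  if rate_limited
    now + RETRY_WINDOW
  else
    now + attempts**4 + 5
  end
end

now = Time.now
next_run_at(now, 3, rate_limited: true) - now   # => 60.0
next_run_at(now, 3, rate_limited: false) - now  # => 86.0
```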

Look inside app/jobs of this open source repository for a real-world example.
