Redis Set Intersection - Using Sets to Filter Data

Harlow Ward

Redis is a fantastic lightweight key-value store. It also acts as a data structure server for storing data types like lists, sets, and hashes.

With Redis sets we can store records in groups based on time. The sunion operator then enables us to group multiple sets together based on a particular time frame.

Example application

Search flights based on a departure city, arrival city, and the date/time of departure.

Redis set intersection

|Id|Departure Time|Airline|Departure City|Arrival City| |1|03/23/2013 1:30 pm|Virgin America|Los Angeles|New York| |2|03/23/2013 1:45 pm|Virgin America|San Francisco|New York| |3|03/23/2013 1:45 pm|American Airlines|San Francisco|New York| |4|03/23/2013 1:50 pm|American Airlines|Los Angeles|Boston| |5|03/23/2013 2:45 pm|Southwest|San Francisco|New York| |6|03/23/2013 3:30 pm|Southwest|San Francisco|New York|

The flight data is stored as strings in Redis using key-value pairs with the following key structure:

get flights:1
"1, Virgin America, 03/23/2013 1:30 pm, San Francisco, San Diego"
get flights:2
"2, Virgin America, 03/23/2013 1:45 pm, San Francisco, New York"
...
get flights:6
"6, Southwest, 03/23/2013 3:30 pm, San Francisco, New York"

Group flights by departure time

To group flights by departure time we’ll create a Redis set for each hour of the day, and flights will be added to the appropriate set based on its epoch departure time.

smembers departure_time:1364068800:flights
1) "1"
2) "2"
3) "3"
4) "4"
smembers departure_time:1364072400:flights
1) "5"
smembers departure_time:1364076000:flights
1) "6"

Group flights by city

Flights will also be grouped into sets based on their departure and arrival cities. The city names have been parameterized to keep a consistent key structure.

Departing flights:

smembers cities:los-angeles:departures
1) "1"
2) "4"
smembers cities:san-francisco:departures
1) "2"
2) "3"
3) "5"
4) "6"

Arriving flights:

smembers cities:boston:arrivals
1) "4"
smembers cities:new-york:arrivals
1) "1"
2) "2"
3) "3"
4) "5"
5) "6"

Find flights from San Francisco to New York

To find all flights departing from San Francisco and arriving in New York we can use sinter to get the intersection of two sets.

sinter cities:san-francisco:departures cities:new-york:arrivals
1) "2"
2) "3"
3) "5"
4) "6"

However, this technique does not allow us to filter flights over a range in time.

To find flights departing on 03/23/2013 between 1:00 pm and 3:00 pm, we’ll need to group the sets of departure times, and then perform an intersection with the resulting set.

Redis set intersection

Redis set intersection

Here are the steps we’ll take:

  1. Create a new temp_set using sunionstore to group flights by departure time.
  2. Intersect the temporary set with the departure and arrival sets.
  3. Loop over the results of the intersection and generate an array of flight keys.
  4. Use mget to fetch all the flight data.

We’ll use a combination of Ruby and Redis to achieve this:

# flight_finder.rb
class FlightFinder
  def initialize(attrs)
    @arrival_city = attrs.fetch(:arrival_city).parameterize
    @departure_city = attrs.fetch(:departure_city).parameterize
    @window = attrs.fetch(:window)
  end

  def flights
    redis.mget(*flight_keys)
  end

  private

  attr_reader :arrival_city, :departure_city, :window

  def flight_keys
    flight_ids.map { |id| "flights:#{id}" }
  end

  def flight_ids
    redis.multi do
      redis.sunionstore('temp_set', *departure_time_keys)
      redis.sinter('temp_set', departure_cities_key, arrival_cities_key)
    end.last
  end

  def departure_cities_key
    "cities:#{departure_city}:departures"
  end

  def arrival_cities_key
    "cities:#{arrival_city}:arrivals"
  end

  def departure_time_keys
    window.range_keys { |epoch_time| "departure_time:#{epoch_time}:flights" }
  end

  def redis
    Redis.current
  end
end

We’ll introduce the concept of a window to avoid a long parameter list, and group parameters that naturally fit together. This allows us to encapsulate behavior between the start_at and end_at attributes.

# window.rb
class Window
  ONE_HOUR_IN_SECONDS = 3600

  def initialize(start_at, end_at)
    @start_at = convert_to_epoch(start_at)
    @end_at = convert_to_epoch(end_at)
  end

  def range_keys
    (start_at...end_at).step(hour).map do |epoch_time|
      yield(epoch_time)
    end
  end

  private

  attr_reader :start_at, :end_at

  def convert_to_epoch(string_or_time)
    DateTime.parse(string_or_time).to_time.to_i
  end

  def hour
    ONE_HOUR_IN_SECONDS
  end
end

Create a command line script.

# flights.rb
require 'flight_finder'
require 'window'

window = Window.new(
  start_at: 'March 23, 2013 1:00 pm',
  end_at: 'March 23, 2013 3:00 pm'
)

flight_finder = FlightFinder.new(
  departure_city: 'San Francisco',
  arrival_city: 'New York',
  window: window
)

puts flight_finder.flights

Run the command line script.

$ ruby flights.rb
2, Virgin America, 03/23/2013 1:45 pm, San Francisco, New York
3, American Airlines, 03/23/2013 1:45 pm, San Francisco, New York
5, Southwest, 03/23/2013 2:45 pm, San Francisco, New York

Takeaways

  • Redis set intersection is a powerful filtering technique.
  • Use sunionstore to create new sets on the fly.
  • Redis data structures are fantastic for slicing and dicing data.

Redis partial word match — you (auto)complete me

Redis Pub/Sub…how does it work?