Lazy Refactoring

Joël Quenneville and Matt Sumner

On the first Tuesday of each month our Boston office hosts Boston.rb’s project night. This is an opportunity for the community to get together and hack away on each other’s projects. At a recent project night we encountered an interesting problem. We went through several iterations, before we found a solution that satisfied us.

Initial problem

We’re implementing the ability to search events. Search terms should be matched against a set of columns. If no matches are found in the first column, search the next column. If no results are found in any column, fall back to a fuzzy search.

Here is the initial implementation:

def self.search(query)
  matching_column = [:name, :city, :title].detect do |column|
    Event.where(column => query).presence
  end

  if matching_column
    Event.where(matching_column => query)
  else
    Event.fuzzy_search(query)
  end
end

First we query all the columns and find the first column that contains results. Then query the column again and return the results. If no column contains any results, perform a fuzzy search.

The presence method returns the object it is called on or nil if it is empty.

This solution performs a duplicate query after finding which column to search. In addition the implementation is a little awkward.

First try

Our first attempt was to refactor the original implementation to avoid the extra queries.

def self.search(query)
  result = nil
  [:name, :city, :title].detect do |column|
    result = Event.where(column => query).presence
  end

  if result
    result
  else
    Event.fuzzy_search(query)
  end
end

We’ve added the local variable result to cache the query results. This takes care of the extra query. However, by initializing an empty variable and mutating it over the course of a loop we’ve introduced an anti-pattern.

This code is still tough to read.

Booleans

We decided to go back to the drawing board and think about the problem in abstract terms. This led to:

def self.search(query)
  exact_search(:name, query).presence ||
    exact_search(:city, query).presence ||
    exact_search(:title, query).presence ||
    fuzzy_search(query)
end

def self.exact_search(field, query)
  Event.where(field => query)
end

It would be tempting here to see all these ||s and attempt to construct a SQL OR statement.

def self.search(query)
  Event.where("name = :query OR city = :query OR title = :query", query: query)
end

However, SQL OR is not equivalent to logical ||. || will only return the first non-nil set. OR will return all results that match any of the criteria. The SQL query above is equivalent to:

def self.search(query)
  exact_search(:name, query) +
    exact_search(:city, query) +
    exact_search(:title, query)
end

def self.exact_search(field, query)
  Event.where(field => query)
end

This would return a mix of name, city, and title results rather than results for the first field that returned any values.

The boolean approach was a clean solution to our problem. However, a new requirement emerged that the exact search should occur on an arbitrary array of columns.

Procedural approach

We could have iterated on the previous solution by throwing in some meta programming but decided to explore some other approaches. Here we pull out some old-style procedural programming:

def self.search(query, columns)
  index = 0
  results = Event.none

  while index < columns.length
    results = Event.where(columns[index] => query)
    break if results.exists?
    index += 1
  end

  results.presence || Event.fuzzy_search(query)
end

This solution is very un-Ruby like but does meet our new requirement. This uses a while loop to iterate over each element in the columns array and breaks if results are found.

This reintroduces the temporary variable + loop anti-pattern mentioned above.

Custom enumerator

We discovered that we really wanted was an array of queries that would only be evaluated as they were accessed. This led to building a custom Enumerator:

def self.search(query, column_names)
  results(column_names).detect(&:exists?)
end

def self.results(column_names)
  columns = column_names.each

  Enumerator.new do |yielder|
    loop do
      yielder << Event.where(columns.next => query)
    end

    yielder << Event.fuzzy_search(query)
  end
end

Looking at Enumerator

To understand the code above, let’s dig into Ruby’s Enumerator. Enumerator implements the internal and external iterator patterns.

[1] pry(main)> results = Enumerator.new do |yielder|
[1] pry(main)*   yielder << "first"
[1] pry(main)*   yielder << "second"
[1] pry(main)* end
=> #<Enumerator: ...>
[2] pry(main)> results.next
=> "first"
[3] pry(main)> results.next
=> "second"
[4] pry(main)> results.next
StopIteration: iteration reached an end
from (pry):7:in `next'

Calling next on an enumerator will run the code inside its block until a value is sent to the yielder. The code block is then suspended. Subsequent invocations of next will continue running the block from the previous location. When the block reaches its end, a StopIteration error is raised.

Some collections are not just a set of hard-coded values.

[1] pry(main)> even_numbers = Enumerator.new do |yielder|
[1] pry(main)*   n = 1
[1] pry(main)*   loop do
[1] pry(main)*     yielder << n if n.even?
[1] pry(main)*     n += 1
[1] pry(main)*   end
[1] pry(main)* end
=> #<Enumerator: ...>
[2] pry(main)> even_numbers.next
=> 2
[3] pry(main)> even_numbers.next
=> 4
[4] pry(main)> even_numbers.next
=> 6
[5] pry(main)> even_numbers.take(10)
=> [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]

The code inside the loop block will be run until a value is sent to the yielder. The code inside loop could be executed multiple times before a value gets sent to yielder. In this case, it only yields on every other run. We’ve created an infinite series of even numbers. take will calculate the first n numbers of the series.

[1] pry(main)> evens = Enumerator.new do |yielder|
[1] pry(main)*   yielder << 'first'
[1] pry(main)*   n = 1
[1] pry(main)*   loop do
[1] pry(main)*     if n < 7
[1] pry(main)*       yielder << n if n.even?
[1] pry(main)*     else
[1] pry(main)*       raise StopIteration
[1] pry(main)*     end
[1] pry(main)*     n += 1
[1] pry(main)*   end
[1] pry(main)*   yielder << 'last'
[1] pry(main)* end
=> #<Enumerator: ...>
[2] pry(main)> evens.to_a
=> ["first", 2, 4, 6, "last"]

This combines the two examples above and introduces the StopIteration error. This allows use to break our loop giving us a finite enumerator.

The final piece needed to understand our solution is to realize that calling each without a block returns an Enumerator.

[1] pry(main)> enum = [1,2].each
=> #<Enumerator: ...>
[2] pry(main)> enum.next
=> 1
[3] pry(main)> enum.next
=> 2
[4] pry(main)> enum.next
StopIteration: iteration reached an end
from (pry):4:in `next'

Back to the custom enumerator solution

Now that we know about enumerators, let’s take a closer look at the proposed solution:

def self.search(query, column_names)
  results(column_names).detect(&:exists?)
end

def self.results(column_names)
  columns = column_names.each

  Enumerator.new do |yielder|
    loop do
      yielder << Event.where(columns.next => query)
    end

    yielder << Event.fuzzy_search(query)
  end
end

By calling each without a block on column_names we get an enumerator. Inside the loop we call next on that enumerator to retrieve a column name. Once there are no more column names columns.next will raise a StopIteration error, breaking the loop. We add our default fuzzy search as the last call to yielder.

results now behaves like an array of queries that only get evaluated if they are accessed. Just what we set out to build.

We can now use detect to find the first result that exists. detect evaluates each value until one matches the given expression. The remaining values are not evaluted.

Lazy enumeration

It turns out the solution above is quite similar to lazy enumerators introduced by Ruby 2.0:

def self.search(query, column_names)
  queries = column_names.lazy.map do |column|
    Event.where(column => query)
  end

  queries.detect(&:exists?) || Event.fuzzy_search(query)
end

Lazy enumerators will pass a value through the entire chain of methods before evaluating the next value.

[1] pry(main)> def query(q)
[1] pry(main)*   puts "querying #{q}"
[1] pry(main)*   q
[1] pry(main)* end
=> :query
[2] pry(main)> ['title', 'name', 'city'].map { |n| query(n) }.detect { true }
querying title
querying name
querying city
=> "title"
[3] pry(main)> ['title', 'name', 'city'].lazy.map { |n| query(n) }.detect { true }
querying title
=> "title"

As we can see above, lazy.map.detect only calls query with ‘title’ while map.detect calls query for every element in the array.

lazy vs non-lazy

What’s next