Anonymizing User, Company, and Location Data Using Faker

During development, it’s often useful to have realistic data to get a sense of how an app will behave in the wild. Seed data is a good way to get going before launch, but once an app is live, a copy of production data is usually more representative.

However, production data can contain sensitive user information. A copy of it is useful to the dev team but nerve-wracking for anyone hoping to avoid a PR disaster, say, if a device holding that data is left lying around in a bar.

One solution we’ve used recently to anonymize client data is to overwrite the sensitive fields with fake values generated by the Faker gem and Ruby’s rand method.

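genders = ['male', 'female']

# find_each loads users in batches instead of all at once
User.find_each do |user|
  user.update!(
    :born_on => rand(50 * 365).days.ago,  # random date within the last ~50 years
    :email => (rand(1000) + 100).to_s + Faker::Internet.email,  # numeric prefix reduces duplicates
    :first_name => Faker::Name.first_name,
    :gender => genders.sample,  # Array#sample picks one element at random
    :last_name => Faker::Name.last_name)
end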
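The `born_on` line depends on ActiveSupport’s `days.ago`. Outside Rails, roughly the same values can be produced with Ruby’s standard library alone. The sketch below uses hypothetical helper names (`random_born_on`, `random_gender`) and passes an explicit `Random` instance, on the assumption that being able to repeat a run with the same seed is useful.

```ruby
require 'date'

GENDERS = %w[male female].freeze
MAX_AGE_DAYS = 50 * 365

# Hypothetical helper: a date between 0 and MAX_AGE_DAYS - 1 days before `today`.
def random_born_on(today, rng)
  today - rng.rand(MAX_AGE_DAYS)
end

# Hypothetical helper: picks one of GENDERS using the supplied generator.
def random_gender(rng)
  GENDERS.sample(random: rng)
end

rng = Random.new(42) # fixed seed so repeated runs give identical fake data
today = Date.new(2024, 1, 1)
born_on = random_born_on(today, rng)
gender = random_gender(rng)
puts "#{born_on} #{gender}"
```

Seeding is optional; drop the fixed seed if every run should produce different data.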

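Faker’s name lists are limited, so duplicates appear quickly, which matters for fields with uniqueness constraints such as email. Prepending a random number, as in the email line above, makes collisions much less likely, though it does not rule them out.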
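If emails must be strictly unique, one alternative to a random prefix (not the approach used above) is to derive the prefix from something already unique, such as the record’s position or id. A minimal sketch with a hypothetical `indexed_email` helper and made-up base addresses standing in for Faker output:

```ruby
# Hypothetical helper: prefixes a fake address with a unique index so the
# result stays distinct even when the base address repeats.
def indexed_email(index, base_email)
  "#{index}.#{base_email}"
end

# Stand-ins for Faker::Internet.email output, including a duplicate.
bases = %w[jane@example.com john@example.com jane@example.com]
emails = bases.each_with_index.map { |base, i| indexed_email(i, base) }
puts emails.inspect
```

This prints `["0.jane@example.com", "1.john@example.com", "2.jane@example.com"]`. In the User loop, `user.id` could play the role of `index`.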

Anonymizing company names is also straightforward. Faker even provides a catch-phrase generator that produces fake business jargon such as “Inverse 24/7 utilisation”.

# update! raises on validation failure instead of silently skipping a record
Company.find_each do |company|
  company.update!(
    :description_html => Faker::Company.catch_phrase,
    :name => Faker::Company.name,
    :twitter_username => Faker::Internet.user_name,
    :url => 'http://' + Faker::Internet.domain_name)
end
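The catch-phrase idea is easy to approximate: pick one word from each of a few jargon lists and join them. The sketch below is a toy illustration with hypothetical word lists and a hypothetical `toy_catch_phrase` method, not Faker’s actual implementation or vocabulary.

```ruby
# Hypothetical word lists; Faker ships its own, much larger vocabulary.
ADJECTIVES = %w[Inverse Synergistic Proactive Scalable].freeze
DESCRIPTORS = %w[24/7 cross-platform mission-critical holistic].freeze
NOUNS = %w[utilisation paradigm framework bandwidth].freeze

# Builds a phrase by sampling one word from each list, in order.
def toy_catch_phrase(rng = Random.new)
  [ADJECTIVES, DESCRIPTORS, NOUNS].map { |words| words.sample(random: rng) }.join(' ')
end

puts toy_catch_phrase
```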

Anonymizing location-based data is trickier: randomizing latitude and longitude values independently would scatter points unrealistically across a map view. One approach is to keep each (lat, lng) pair together but shuffle the pairs across rows: load them into an array, shuffle it, then write the pairs back.

Here we use a modified version of the previously mentioned inject method:

location_array = Location.all.inject([]) do |result, location|
  result << [location.lat, location.lng]
end.shuffle

Location.find_each do |location|
  lat, lng = location_array.pop  # take the next shuffled (lat, lng) pair
  # update! raises on validation failure instead of silently skipping the row
  location.update!(
    :city => Faker::Address.city,
    :extended_address => Faker::Address.secondary_address,
    :lat => lat,
    :lng => lng,
    :phone => Faker::PhoneNumber.phone_number,
    :postal_code => Faker::Address.zip_code,
    :state => Faker::Address.state_abbr,
    :street_address => Faker::Address.street_address
  )
end
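Stripped of ActiveRecord, the shuffle step comes down to the sketch below, which uses a hypothetical `Row` struct in place of `Location` and a hypothetical `shuffle_coordinates` method. Each row ends up with a real (lat, lng) pair, just not necessarily its own.

```ruby
# Hypothetical stand-in for a Location record.
Row = Struct.new(:id, :lat, :lng)

# Returns new rows whose coordinate pairs have been shuffled across rows.
# Latitude and longitude travel together, so every point is still a real one.
def shuffle_coordinates(rows, rng = Random.new)
  pairs = rows.map { |row| [row.lat, row.lng] }.shuffle(random: rng)
  rows.zip(pairs).map { |row, (lat, lng)| Row.new(row.id, lat, lng) }
end

rows = [
  Row.new(1, 37.7749, -122.4194),
  Row.new(2, 40.7128, -74.0060),
  Row.new(3, 42.3601, -71.0589)
]
shuffled = shuffle_coordinates(rows, Random.new(3))
shuffled.each { |row| puts "#{row.id}: #{row.lat}, #{row.lng}" }
```

A plain shuffle can leave some rows with their original coordinates, especially in small tables. If that matters, a derangement (a permutation with no fixed points) would be needed instead.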

Faker::PhoneNumber.phone_number can return numbers with country codes and extensions, such as “+1 (877) 976-2687 x1234”. For a strict XXX-XXX-XXXX format, use:

:phone => (rand(900) + 100).to_s + "-" + (rand(900) + 100).to_s + "-" + (rand(9000) + 1000).to_s
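A slightly tidier way to build the same strict format is `Kernel#format` with integer ranges passed to `Random#rand`. The method name `strict_phone_number` is hypothetical.

```ruby
# Hypothetical helper: returns a string shaped like XXX-XXX-XXXX.
# Each range starts at 100 or 1000, so no group begins with a zero.
def strict_phone_number(rng = Random.new)
  format('%d-%d-%d', rng.rand(100..999), rng.rand(100..999), rng.rand(1000..9999))
end

puts strict_phone_number
```

Inside the location loop, this would be used as `:phone => strict_phone_number`.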

Complete code available at this Gist.

Adarsh Pandit, Developer
