Anonymizing User, Company, and Location Data Using Faker
Often during development it’s useful to have realistic data to get a sense of how an app would behave in the wild. Seed data is one useful method to get going pre-launch, but production data is always preferable.
However, production data can contain sensitive user information, which is useful for the dev team but nerve-wracking for anyone looking to avoid a PR disaster, say, if sensitive equipment is left lying around in a bar.
One solution we’ve used recently to anonymize client data is to obscure the relevant content using the Faker gem and the rand function.
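genders = ['male', 'female']

User.all.each do |user|
  user.update_attributes!(
    # Random birth date within roughly the last 50 years
    :born_on    => rand(50 * 365).days.ago,
    # A numeric prefix reduces duplicate addresses
    :email      => (rand(1000) + 100).to_s + Faker::Internet.email,
    :first_name => Faker::Name.first_name,
    # Array#sample (Ruby 1.9+) replaces the older Array#rand helper
    :gender     => genders.sample,
    :last_name  => Faker::Name.last_name)
end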
Faker's seed name list is limited and leads to duplicates quickly. You can further randomize fields by prepending random numbers as above.
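A random numeric prefix makes collisions less likely but does not rule them out. If a field has to be unique, one option is to prefix it with a value that is already unique per record, such as the primary key. The following is a minimal sketch in plain Ruby; the helper name `unique_email` and the sample addresses are hypothetical stand-ins for whatever Faker returns.

```ruby
# Hypothetical helper: prefix a generated address with a value that is
# already unique per record (for example, the record's primary key).
def unique_email(fake_email, record_id)
  "#{record_id}.#{fake_email}"
end

# Stand-ins for generated addresses, including a deliberate duplicate.
fake_emails = ['jane@example.com', 'jane@example.com', 'sam@example.org']

emails = fake_emails.each_with_index.map do |email, index|
  unique_email(email, index + 1) # index + 1 plays the role of user.id
end

puts emails
```

In the loop above, this would amount to passing `user.id` in place of the random number.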
Anonymizing company names is also straightforward. Faker provides a fun catch-phrase generator that produces fake business jargon such as "Inverse 24/7 utilisation".
Company.all.each do |company|
  company.update_attributes!(
    :description_html => Faker::Company.catch_phrase,
    :name             => Faker::Company.name,
    :twitter_username => Faker::Internet.user_name,
    :url              => 'http://' + Faker::Internet.domain_name)
end
Anonymizing location-based data is trickier: randomizing the latitude and longitude values independently would scatter the points across a map view. One method is to keep each (lat, lng) pair together but shuffle the pairs across rows by loading them into an array, shuffling it, then writing them back over the existing data.
Here we use a modified version of the previously mentioned inject method.
# inject needs an empty array as its starting value to collect the pairs
location_array = Location.all.inject([]) do |result, location|
  result << [location.lat, location.lng]
end.shuffle

Location.all.each do |location|
  # Each record takes one shuffled pair, so lat and lng stay together
  lat, lng = location_array.pop
  location.update_attributes(
    :city             => Faker::Address.city,
    :extended_address => Faker::Address.secondary_address,
    :lat              => lat,
    :lng              => lng,
    :phone            => Faker::PhoneNumber.phone_number,
    :postal_code      => Faker::Address.zip_code,
    :state            => Faker::Address.state_abbr,
    :street_address   => Faker::Address.street_address
  )
end
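To see the shuffle in isolation, here is a minimal sketch that runs outside Rails. `Place` is a hypothetical plain-Ruby stand-in for the `Location` model, and `shuffle_coordinates` is an illustrative helper name; it uses `map` and `zip` rather than `inject` and `pop`, but the idea is the same: the set of real coordinate pairs is preserved, only their owners change.

```ruby
# Hypothetical stand-in for an ActiveRecord Location model.
Place = Struct.new(:name, :lat, :lng)

# Reassign coordinate pairs among the given places. Each (lat, lng) pair
# stays intact, so every point remains a real location from the data set.
def shuffle_coordinates(places)
  pairs = places.map { |place| [place.lat, place.lng] }.shuffle
  places.zip(pairs).each do |place, (lat, lng)|
    place.lat = lat
    place.lng = lng
  end
  places
end

places = [
  Place.new('Office', 43.65, -79.38),
  Place.new('Cafe',   45.50, -73.57),
  Place.new('Depot',  49.28, -123.12)
]

shuffle_coordinates(places).each do |place|
  puts "#{place.name}: #{place.lat}, #{place.lng}"
end
```

With only a handful of rows a shuffle can leave some pairs where they started, so this approach obscures more on larger tables.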
Faker generates phone numbers with prefixes and extensions, such as "+1 (877) 976-2687 x1234". For a strict XXX-XXX-XXXX format, use:
:phone => (rand(900) + 100).to_s + "-" + (rand(900) + 100).to_s + "-" + (rand(9000) + 1000).to_s
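The same format can also be produced with `format` and range arguments to `rand`, which may be easier to read than string concatenation. This is a small sketch; `random_phone` is a hypothetical helper name, and like the expression above it never generates a group that starts with zero.

```ruby
# Hypothetical helper: build a strict XXX-XXX-XXXX phone number.
def random_phone
  format('%03d-%03d-%04d', rand(100..999), rand(100..999), rand(1000..9999))
end

puts random_phone
```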
Complete code available at this Gist.