giant robots smashing into other giant robots

Written by thoughtbot

dancroak

Trail Map

> How do I learn Ruby on Rails? Vim? Test-Driven Development?

Someone asks us these questions weekly. We think we finally have good answers.

Extracting answers from apprentice.io

apprentice.io is a program designed around 1-to-1 mentor-to-apprentice relationships with a heavy emphasis on pair programming.

However, each apprentice additionally has extra time each week to study topics of their choice. They set goals with their mentors and are held accountable to reaching them by publicizing the goals in an internal wiki.

Example goals include:

  • Read chapters 14 and 17 of The RSpec Book.
  • Review and merge a pull request on an open source project.
  • Write blog post about anonymizing data.

One curriculum does not fit all

We’ve been calling each apprentice’s wiki page their “trail map”.

To us, the “trail map” metaphor relates to hikers, bikers, and skiiers:

  • start in different places
  • want to go to different places
  • often change direction mid-journey

Likewise, apprentices (and anyone learning a topic):

  • have different past experiences
  • have different learning styles
  • change their goals mid-process

Trail map

Announcement!

With 12 apprentices in the apprentice.io program, we’ve noticed common patterns in each apprentice’s trail map.

So, we’ve consolidated trails into a default trail map and we’re pleased to now announce its release under a Creative Commons Atribution license.

You’re free to use the trail map however you’d like, even commercial training.

Initial trails

The trails exist as a single git repository on Github named Trail Map:

We hope learners everywhere will fork these trails for their own learning purposes and submit improvements via pull requests.

Each trail has three sections:

  • Critical learning
  • Validation
  • Ongoing reference

Critical learning

This section lists things like books or blog posts to read, screencasts to watch, code to read or write, and koans or tutorials to complete.

In each topic, we aren’t aiming for greatest depth, but rather the most efficient way for the learner to become productive.

For example: we suggest chapters, rather than entire books, to read.

Validation

This section lists simple tasks the learner should be able to perform during routine development. We’ve never liked quizzes or certifications, but some hueristic is useful for assessment. We think self-assessment is a simple, fast, and low-stress approach.

For example: we say you know everyday git when you can (among other things), “stage a file, create a commit, and push to a remote branch.”

Ongoing reference

This section lists things like man pages and API documentation which we’ll always reference regardless of experience. Many things are not worth memorizing.

For example, we suggest that a developer refers to man git-rebase during a project.

What’s next

This is a work in progress. We plan to add and edit trails as new resources are released or people tell us better ways they’re learning a topic.

We’d love to get your feedback in the form of Github Issues.

Written by .

dancroak

Global min_messages

When running Rails apps with Postgres in development mode, you might notice output like this when running tests:

NOTICE:  CREATE TABLE will create implicit sequence "users_id_seq" for serial column "users.id"
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "users_pkey" for table "users"

It’s noisier than you need for day-to-day development. To quiet the noise, you could change a setting in your config/database.yml file:

min_messages: warning

However, we can set this machine-wide instead of project-wide, similar to the global .gitignore file and set noswapfile vim setting.

psql -d rails_app_development
ALTER ROLE  SET client_min_messages TO WARNING;

Change “ to be the user account for your machine.

Bueno.

Written by .

hgimenez

The Durable Document Store You Didn’t Know You Had, But Did

As it turns out, PostgreSQL has a number of ways of storing loosely structured data — documents — in a column on a table.

  • hstore is a data type available as a contrib package that allows you to store key/value structures just like a dictionary or hash.
  • You can store data in JSON format on a text field, and then use PLV8 to JSON.parse() it right in the database.
  • There is a native xml data type, along with a few interesting query functions that allow you to extract and operate on data that sits deep in an XML structure.

XML Storage is the topic of this post.

The XML data type has been Postgres core since version 8.3 (and as part of a contrib package prior to that). Documents that span more than one data block will be TOASTed as usual, so storing big documents should not be a problem. In essence, it is a text data type, but it does bring a few features to the table.

For starters, it will perform basic balance integrity checks on the data. After all, this is Postgres, where data integrity is high up in the priority.

Postgres also supports a number of utility functions for extracting data from XML data. Most notably, you can use XPath expressions to extract portions of the XML document. You can use the extracted data in queries as you would anything else. For example:

xml_test=# CREATE TABLE beers(
  id serial primary key,
  data xml
)
[...]
CREATE TABLE
xml_test=# \d beers
                         Table "public.beers"
 Column |  Type   |                     Modifiers
--------+---------+----------------------------------------------------
 id     | integer | not null default nextval('beers_id_seq'::regclass)
 data   | xml     |
Indexes:
    "beers_pkey" PRIMARY KEY, btree (id)
xml_test=# INSERT INTO beers (data) values ('<beer><name>Harpoon</name><location>Boston, USA</location></beer>');
INSERT 0 1
xml_test=# INSERT INTO beers (data) values ('<beer><name>Guinness</name><location>Dublin, Ireland</location></beer>');
INSERT 0 1
xml_test=# INSERT INTO beers (data) values ('<beer><name>Polar</name><location>Caracas, Venezuela</location><type>Pilsner</type></beer>');
INSERT 0 1
xml_test=# select * from beers;
 id |                                            data
----+--------------------------------------------------------------------------------------------
  1 | <beer><name>Harpoon</name><location>Boston, USA</location></beer>
  2 | <beer><name>Guinness</name><location>Dublin, Ireland</location></beer>
  3 | <beer><name>Polar</name><location>Caracas, Venezuela</location><type>Pilsner</type></beer>
(3 rows)

We’ve basically created a beers table with three entries containing some XML data. You could use that data in a number of ways, including bringing it into your application, deserializing and parsing it there. This may be acceptable in many cases, especially if you have also included foreign keys to other relations in your data model, or any other data that you can use to filter data down. However Postgres offers the ability to use XPath to extract data from the XML type directly:

xml_test=# SELECT xpath('//beer/name', data) from beers;
         xpath
------------------------
 {<name>Harpoon</name>}
 {<name>Guinness</name>}
 {<name>Polar</name>}
(3 rows)

The curly braces in Postgres indicate an array, so the xpath function returns an array of elements that match your query. You can index the first element of that array with square bracket notation:

xml_test=# SELECT (xpath('//beer/name', data))[1]::text from beers;
        xpath
----------------------
 <name>Harpoon</name>
 <name>Guinness</name>
 <name>Polar</name>
(3 rows)

And further, use the XPath text() function to extract the text within the matched XML nodes:

xml_test=# SELECT (xpath('//beer/name/text()', data))[1]::text from beers;
  xpath
---------
 Harpoon
 Guinness
 Polar
(3 rows)

This is useful for extracting data from your XML. Now let’s use it to query for beers in Venezuela:

xml_test=# SELECT * from beers where (xpath('//beer/location/text()', data))[1]::text = 'Caracas, Venezuela'::text;
 id |                                            data
----+--------------------------------------------------------------------------------------------
  3 | <beer><name>Polar</name><location>Caracas, Venezuela</location><type>Pilsner</type></beer>
(1 row)

Now we’re getting somewhere. With a bit of creativity we can create a generic finder for your ORM of choice that finds records by XPath expression. Here’s a simple one for ActiveRecord

class Beer < ActiveRecord::Base
  def self.by_xpath(xpaths)
    xpaths.inject(scoped) do |s, (xpath, value)|
      s.where("(xpath(?, data))[1]::text = ?", xpath, value)
    end
  end
end

Use it:

Beer.by_xpath('//beer/location' => 'Caracas, Venezuela', '//beer/name' => 'Polar')

You may be thinking this can’t possibly be fast to query. Postgres doesn’t allow you to add any indexes on the XML data type itself, but on the other hand any SQL expression can be used in an index. Therefore you can choose to index part of the XML document using the xpath function:

xml_test=# create index index_beers_on_location ON beers USING GIN ( CAST(xpath('//beer/location/text()', data) as TEXT[]) );
CREATE INDEX

Because we’re indexing an array value, we use a Generalized Inverted index (GIN).

When should I use this?

Some may argue that it does not make any sense to store XML data directly in Postgres, after all it is a relational database and there are databases like CouchDB or MongoDB that are designed to store document data. Storing it in your main data store has its advantages though:

  • You don’t need to maintain a new database in your infrastructure. Any service that is required to run your app adds complexity that is not always justified.
  • XML data backups continue to occur as part of your main database backups.
  • The XML data can be used to reference and JOIN other data in your data model.

Anecdotically, I recently used this strategy for storing raw XML data that we are receiving in the background from an API. We parse it, normalize some of it out to our own data model, but keep the raw XML around. This is useful because we can easily refer back to the source data to verify things when they seem off, or to extract and normalize more data out of it. However we don’t use it to display data back to the user or for reporting — we just keep it around and refer back to it from the backend. It’s working great!

hgimenez

See you at PostgreSQL West

The PostgreSQL West conference will take place in San Jose, California this upcoming September 27 through the 30th.

PostgreSQL

There are a few interesting talks for the Ruby crowd. Peter van Hardenberg has a talk titled “PostgreSQL and Ruby: Libraries and their Gaps,” Jeremy Evans will talk about the excellent Sequel library, Will Leinweber’s talk is about “Exposing the power of PostgreSQL to Ruby”, and Ryan Smith will talk about his work on Queue Classic, the PostgreSQL powered queueing library for ruby. I will also be speaking about bacon Redis, as well as giving a full day workshop on Ruby on Rails development with PostgreSQL on September 27th. At $349.00, it’s a bargain!

Additionally, there are a bunch of interesting talks about PostgreSQL and its ecosystem, including topics like replication and management tools like repmgr and Tungsten, Bucardo and so much more.

Hope to see you there!

dancroak

How to back up a Heroku production database to staging

It’s right there in the docs but I didn’t notice it until recently:

heroku pgbackups:restore DATABASE `heroku pgbackups:url --remote production` --remote staging

Boom! It transfers the production Postgres database to staging.

It’s much faster than db:pull, then db:push, which is what I used to do (like a sucker).

Setup:

git remote add staging git@heroku.com:my-staging-app.git
git remote add production git@heroku.com:my-production-app.git
heroku addons:add pgbackups --remote staging
heroku addons:add pgbackups --remote production

Create a database backup at any time:

heroku pgbackups:capture --remote production

View backups:

heroku pgbackups --remote production

Destroy a backup:

heroku pgbackups:destroy b003 --remote production