As it turns out, PostgreSQL has a number of ways of storing loosely structured data — documents — in a column on a table.
hstore is a data type available as a contrib package that allows you to store key/value structures just like a dictionary or hash.text field, and then use PLV8 to JSON.parse() it right in the database.xml data type, along with a few interesting query functions that allow you to extract and operate on data that sits deep in an XML structure.XML Storage is the topic of this post.
The XML data type has been Postgres core since version 8.3 (and as part of a contrib package prior to that). Documents that span more than one data block will be TOASTed as usual, so storing big documents should not be a problem. In essence, it is a text data type, but it does bring a few features to the table.
For starters, it will perform basic balance integrity checks on the data. After all, this is Postgres, where data integrity is high up in the priority.
Postgres also supports a number of utility functions for extracting data from XML data. Most notably, you can use XPath expressions to extract portions of the XML document. You can use the extracted data in queries as you would anything else. For example:
xml_test=# CREATE TABLE beers(
id serial primary key,
data xml
)
[...]
CREATE TABLE
xml_test=# \d beers
Table "public.beers"
Column | Type | Modifiers
--------+---------+----------------------------------------------------
id | integer | not null default nextval('beers_id_seq'::regclass)
data | xml |
Indexes:
"beers_pkey" PRIMARY KEY, btree (id)
xml_test=# INSERT INTO beers (data) values ('<beer><name>Harpoon</name><location>Boston, USA</location></beer>');
INSERT 0 1
xml_test=# INSERT INTO beers (data) values ('<beer><name>Guinness</name><location>Dublin, Ireland</location></beer>');
INSERT 0 1
xml_test=# INSERT INTO beers (data) values ('<beer><name>Polar</name><location>Caracas, Venezuela</location><type>Pilsner</type></beer>');
INSERT 0 1
xml_test=# select * from beers;
id | data
----+--------------------------------------------------------------------------------------------
1 | <beer><name>Harpoon</name><location>Boston, USA</location></beer>
2 | <beer><name>Guinness</name><location>Dublin, Ireland</location></beer>
3 | <beer><name>Polar</name><location>Caracas, Venezuela</location><type>Pilsner</type></beer>
(3 rows)
We’ve basically created a beers table with three entries containing some XML data. You could use that data in a number of ways, including bringing it into your application, deserializing and parsing it there. This may be acceptable in many cases, especially if you have also included foreign keys to other relations in your data model, or any other data that you can use to filter data down. However Postgres offers the ability to use XPath to extract data from the XML type directly:
xml_test=# SELECT xpath('//beer/name', data) from beers;
xpath
------------------------
{<name>Harpoon</name>}
{<name>Guinness</name>}
{<name>Polar</name>}
(3 rows)
The curly braces in Postgres indicate an array, so the xpath function returns an array of elements that match your query. You can index the first element of that array with square bracket notation:
xml_test=# SELECT (xpath('//beer/name', data))[1]::text from beers;
xpath
----------------------
<name>Harpoon</name>
<name>Guinness</name>
<name>Polar</name>
(3 rows)
And further, use the XPath text() function to extract the text within the matched XML nodes:
xml_test=# SELECT (xpath('//beer/name/text()', data))[1]::text from beers;
xpath
---------
Harpoon
Guinness
Polar
(3 rows)
This is useful for extracting data from your XML. Now let’s use it to query for beers in Venezuela:
xml_test=# SELECT * from beers where (xpath('//beer/location/text()', data))[1]::text = 'Caracas, Venezuela'::text;
id | data
----+--------------------------------------------------------------------------------------------
3 | <beer><name>Polar</name><location>Caracas, Venezuela</location><type>Pilsner</type></beer>
(1 row)
Now we’re getting somewhere. With a bit of creativity we can create a generic finder for your ORM of choice that finds records by XPath expression. Here’s a simple one for ActiveRecord
class Beer < ActiveRecord::Base
def self.by_xpath(xpaths)
xpaths.inject(scoped) do |s, (xpath, value)|
s.where("(xpath(?, data))[1]::text = ?", xpath, value)
end
end
end
Use it:
Beer.by_xpath('//beer/location' => 'Caracas, Venezuela', '//beer/name' => 'Polar')
You may be thinking this can’t possibly be fast to query. Postgres doesn’t allow you to add any indexes on the XML data type itself, but on the other hand any SQL expression can be used in an index. Therefore you can choose to index part of the XML document using the xpath function:
xml_test=# create index index_beers_on_location ON beers USING GIN ( CAST(xpath('//beer/location/text()', data) as TEXT[]) );
CREATE INDEX
Because we’re indexing an array value, we use a Generalized Inverted index (GIN).
Some may argue that it does not make any sense to store XML data directly in Postgres, after all it is a relational database and there are databases like CouchDB or MongoDB that are designed to store document data. Storing it in your main data store has its advantages though:
Anecdotically, I recently used this strategy for storing raw XML data that we are receiving in the background from an API. We parse it, normalize some of it out to our own data model, but keep the raw XML around. This is useful because we can easily refer back to the source data to verify things when they seem off, or to extract and normalize more data out of it. However we don’t use it to display data back to the user or for reporting — we just keep it around and refer back to it from the backend. It’s working great!
It’s right there in the docs but I didn’t notice it until recently:
heroku pgbackups:restore DATABASE `heroku pgbackups:url --remote production` --remote staging
Boom! It transfers the production Postgres database to staging.
It’s much faster than db:pull, then db:push, which is what I used to do (like a sucker).
Setup:
git remote add staging git@heroku.com:my-staging-app.git
git remote add production git@heroku.com:my-production-app.git
heroku addons:add pgbackups --remote staging
heroku addons:add pgbackups --remote production
Create a database backup at any time:
heroku pgbackups:capture --remote production
View backups:
heroku pgbackups --remote production
Destroy a backup:
heroku pgbackups:destroy b003 --remote production
Limerick Rake is a collection of useful rake tasks for Rails apps. Some we wrote ourselves, many we’ve collected from others in the community. We’re always on the lookout for helpful rake tasks so if you have anything, please post them in the comments. To use in a Rails app:
script/plugin install git://github.com/thoughtbot/limerick_rake.git
Read tasks/database.rake for details for configuration. Load initial database fixtures (in db/bootstrap/*.yml) into the current environment’s database. Load specific fixtures using FIXTURES=x,y:
rake db:bootstrap:load
Prints a list of unindexed foreign keys so you can index them:
rake db:indexes:missing
Run model validations on all model records in database:
rake db:validate_models
Merge a branch into the origin/staging branch:
rake git:push:staging
Merge the staging branch into origin/production for launch:
rake git:push:production
Show the difference between current branch and origin/staging:
rake git:diff:staging
Show the difference between origin/staging and origin/production:
rake git:diff:production
Pull updates from suspenders, the thoughtbot rails template:
rake git:pull:suspenders
Branch origin/production into BRANCH locally:
rake git:branch:production
Backup the current database. Timestamped file is created as :rails_root/../db-name-timestamp.sql:
rake backup:db
Backup all assets under public/system. File is created as :rails_root/../system.tgz:
rake backup:assets
Convert all CSS files in public/stylesheets to Sass:
rake sass:all_css2sass
Convert all Sass files to CSS:
rake sass:all_sass2css
Convert all HTML files to Haml:
rake haml:all_html2haml
The Limerick Rake is a traditional Irish song.
I am a young fellow that’s easy and bold,
In Castletown conners I’m very well known.
In Newcastle West I spent many a note,
With Kitty and Judy and Mary.
My father rebuked me for being such a rake,
And spending me time in such frolicsome ways,
But I ne’er could forget the good nature of Jane,
Agus fágaimíd siúd mar atá sé.
My parents had reared me to shake and to mow,
To plough and to harrow, to reap and to sow.
But my heart being airy to drop it so low,
I set out on high speculation.
On paper and parchment they taught me to write,
In Euclid and grammar they opened my eyes,
And in multiplication in truth I was bright,
Agus fágaimíd siúd mar atá sé.
If I chance for to go to the town of Rathkeale, The girls all round me do flock on the square.
Some give me a bottle and others sweet cakes,
To treat me unknown to their parents.
There is one from Askeaton and one from the Pike,
Another from Arda, my heart was beguiled,
Tho’ being from the mountains her stockings are white,
Agus fágaimíd siúd mar atá sé.
To quarrel for riches I ne’er was inclined,
For the greatest of misers that must leave all behind.
I’ll purchase a cow that will never run dry,
And I’ll milk her by twisting her horn.
John Damer of Shronel had plenty of gold,
And Lord Devonshire’s treasures are twenty times more,
But he’s laid on his back among nettles and stones,
Agus fágaimíd siúd mar atá sé.
The old cow could be milked without clover or grass,
She’d be pampered with corn, good barley and hops.
She’s warm and stout, and she’s free in the paps,
And she’ll milk without spancil or halter.
The man that will drink it will cock his caubeen,
And if anyone laughs there’d be wigs on the green,
And the feeble old hag will get supple and free,
Agus fágaimíd siúd mar atá sé.
There’s some say I’m foolish and more say I’m wise,
But being fond of the women I think is no crime,
For the son of King David had ten hundred wives,
And his wisdom was highly regarded.
I’ll take a good garden and live at my ease,
And each woman and child can partake of the same,
If there’d be war in the cabin, themselves they’d be to blame,
Agus fágaimíd siúd mar atá sé.
And now for the future I mean to be wise,
And I’ll send for the women that acted so kind,
I’d marry them all on the morrow by and by,
If the clergy agree to the bargain.
And when I’d be old and my soul is at peace,
These women will crowd for to cry at my wake,
And their sons and their daughters will offer their prayer,
To the Lord for the soul of their father.