jq is sed for JSON

Gabe Berke-Williams

sed is a useful tool that reformats and transforms plain text. But sed is not a good match for structured data like JSON. jq is a sed-like tool that is specifically built to deal with JSON.

Installation

Install jq on OS X:

brew install jq

Or on Ubuntu:

sudo apt-get install jq

Your packaging system probably has jq available.

In this post, I’m assuming you’re using jq version 1.5 or newer.
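
If you’re not sure which version you have, jq can tell you. This is a small sketch rather than anything from the original post; the variable name jq_version is just for illustration, and the exact string printed depends on your install (something like jq-1.6).

```shell
# Ask jq for its version string; on 1.5 and newer it has the form "jq-<version>".
jq_version=$(jq --version)
printf '%s\n' "$jq_version"
```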

Usage

jq is built around filters. The simplest filter is ., which echoes its input, but pretty-printed:

$ echo '{"hello":{ "greetings":"to you"}}' | jq .
{
  "hello": {
    "greetings": "to you"
  }
}

It’s useful for pretty-printing output from curl when you hit a JSON API, for example.

As the well-written manual points out, the simplest useful filter is .field, which pulls field out of each record:

$ echo '{"hello":{ "greetings":"to you"}}' | jq .hello
{
  "greetings": "to you"
}
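
One thing worth knowing about .field: asking for a key that isn’t there does not fail. The sketch below reuses the sample input from above; the .farewell key and the shell variable names are made up for illustration.

```shell
# Sample input reused from the examples above.
json='{"hello":{ "greetings":"to you"}}'

# A key that exists returns its value.
found=$(printf '%s\n' "$json" | jq .hello)
printf '%s\n' "$found"

# A key that does not exist yields null instead of an error.
missing=$(printf '%s\n' "$json" | jq .farewell)
printf '%s\n' "$missing"    # null
```

So a typo in a field name shows up as null in the output, not as an error message.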

We can also do .field1.field2 for nested hashes:

$ echo '{"hello":{ "greetings":"to you"}}' | jq .hello.greetings
"to you"

Let’s use jq on some real input. Here’s a Slack log in a file named 1.json:

[
  {
    "type": "message",
    "user": "U024HFHU5",
    "text": "hey there",
    "ts": "1385407681.000003"
  },
  {
    "type": "message",
    "user": "U024HGJ4E",
    "text": "right back at you",
    "ts": "1385407706.000006"
  }
]

To pull the text out of each Slack message from 1.json, we need to go “into” the array first with .[]. Then we can pass each object in the array to the next filter with |, and grab the text field from each object using .text:

$ jq ".[] | .text" 1.json
"hey there"
"right back at you"

Note that each value is now printed on its own line as a JSON string (quotes included), so we could go back to sed:

$ jq ".[] | .text" 1.json | sed 's/h/H/g'
"Hey tHere"
"rigHt back at you"

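The quotes in that output are there because jq prints strings as JSON. If we’d rather hand sed the bare text, jq’s --raw-output option (-r for short) drops them. Here’s a sketch that recreates 1.json in a scratch directory so it can run on its own; the workdir, raw, and shouted names are only for this example.

```shell
# Recreate the two Slack messages from 1.json in a scratch directory.
workdir=$(mktemp -d)
cat > "$workdir/1.json" <<'EOF'
[
  {"type": "message", "user": "U024HFHU5", "text": "hey there", "ts": "1385407681.000003"},
  {"type": "message", "user": "U024HGJ4E", "text": "right back at you", "ts": "1385407706.000006"}
]
EOF

# --raw-output (-r) prints strings without the surrounding JSON quotes.
raw=$(jq -r '.[] | .text' "$workdir/1.json")
printf '%s\n' "$raw"

# The unquoted lines can then go through sed as before.
shouted=$(printf '%s\n' "$raw" | sed 's/h/H/g')
printf '%s\n' "$shouted"

# Clean up the scratch directory.
rm -rf "$workdir"
```
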
Transforming input

So far we’ve been pulling out fields and displaying them. Now let’s transform the JSON before displaying it. Let’s show the user and the text:

$ jq ".[] | { the_user: .user, the_text: .text }" 1.json
{
  "the_user": "U024HFHU5",
  "the_text": "hey there"
}
{
  "the_user": "U024HGJ4E",
  "the_text": "right back at you"
}

The .[] goes into the array and emits each object in it, one at a time. Then we pass those objects to the next filter using |, and use the {} filter to construct a new object from the fields of each one. Along the way we rename user to the_user and text to the_text.

To wrap something in an array, put brackets around an expression:

$ jq "[.[] | { the_user: .user, the_text: .text }]" 1.json
[
  {
    "the_user": "U024HFHU5",
    "the_text": "hey there"
  },
  {
    "the_user": "U024HGJ4E",
    "the_text": "right back at you"
  }
]
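
The “unpack with .[], transform, wrap in []” pattern is common enough that jq has a builtin for it: map(f) behaves like [.[] | f]. The sketch below, which is an addition and not part of the original walkthrough, runs both forms against a recreated 1.json and checks that they agree. The --compact-output (-c) option is used only to keep the output on one line, and the variable names are illustrative.

```shell
# Recreate 1.json in a scratch directory so the example is self-contained.
workdir=$(mktemp -d)
cat > "$workdir/1.json" <<'EOF'
[
  {"type": "message", "user": "U024HFHU5", "text": "hey there", "ts": "1385407681.000003"},
  {"type": "message", "user": "U024HGJ4E", "text": "right back at you", "ts": "1385407706.000006"}
]
EOF

# The bracketed form from above.
bracketed=$(jq -c '[.[] | { the_user: .user, the_text: .text }]' "$workdir/1.json")

# The same transformation written with map.
mapped=$(jq -c 'map({ the_user: .user, the_text: .text })' "$workdir/1.json")
printf '%s\n' "$mapped"

# Both filters should produce identical output.
[ "$bracketed" = "$mapped" ] && echo "same output"

# Clean up the scratch directory.
rm -rf "$workdir"
```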

Note that we’re wrapping the whole expression in [], not just the {} part. If we wrapped the {} part in square brackets, that would put each object in a separate array, which we usually don’t want:

$ jq ".[] | [{ the_user: .user, the_text: .text }]" 1.json
[
  {
    "the_user": "U024HFHU5",
    "the_text": "hey there"
  }
]
[
  {
    "the_user": "U024HGJ4E",
    "the_text": "right back at you"
  }
]

Dealing with more than one file

Let’s say we have another JSON file, 2.json:

[
  {
    "type": "message",
    "user": "U028H5EBL",
    "text": "<@U02A8N1DS>: Can I get some help with a domain registration?",
    "ts": "1418301403.001783"
  },
  {
    "type": "message",
    "user": "U02A8N1DS",
    "text": "Sure thing.",
    "ts": "1418301427.001784"
  }
]

We want to print out all of the messages from 1.json and 2.json, wrapped in an array, just like we did before. Let’s run the same filter over both files:

$ jq "[.[] | { the_user: .user, the_text: .text }]" 1.json 2.json
[
  {
    "the_user": "U024HFHU5",
    "the_text": "hey there"
  },
  {
    "the_user": "U024HGJ4E",
    "the_text": "right back at you"
  }
]
[
  {
    "the_user": "U028H5EBL",
    "the_text": "<@U02A8N1DS>: Can I get some help with a domain registration?"
  },
  {
    "the_user": "U02A8N1DS",
    "the_text": "Sure thing."
  }
]

Hey, that’s no good: the output from each file is wrapped in its own array! We want one array around the whole output, not one array per file. Fortunately, jq has an option for that: jq --slurp reads every input into one big array, so the arrays of messages from 1.json and 2.json become the elements of a single outer array.

Let’s see what --slurp generates before we do anything with its output:


$ jq --slurp . 1.json 2.json
[
  [
    {
      "type": "message",
      "user": "U024HFHU5",
      "text": "hey there",
      "ts": "1385407681.000003"
    },
    {
      "type": "message",
      "user": "U024HGJ4E",
      "text": "right back at you",
      "ts": "1385407706.000006"
    }
  ],
  [
    {
      "type": "message",
      "user": "U028H5EBL",
      "text": "<@U02A8N1DS>: Can I get some help with a domain registration?",
      "ts": "1418301403.001783"
    },
    {
      "type": "message",
      "user": "U02A8N1DS",
      "text": "Sure thing.",
      "ts": "1418301427.001784"
    }
  ]
]

As we can see, it wraps all of the input in one big array, so we’ll add an extra .[] when filtering to get at the objects inside:

$ jq --slurp "[.[] | .[] | { text: .text }]" 1.json 2.json
[
  {
    "text": "hey there"
  },
  {
    "text": "right back at you"
  },
  {
    "text": "<@U02A8N1DS>: Can I get some help with a domain registration?"
  },
  {
    "text": "Sure thing."
  }
]

There we go. Regardless of how many files come in, the output is contained in only one array.
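
If the double .[] feels awkward, one alternative is to flatten the slurped array of arrays first with jq’s add builtin, which concatenates arrays, and then transform the result with map. This is a sketch of that variation and not something the walkthrough above relies on. It recreates both files in a scratch directory, and the nested, flat, and count names are only for illustration.

```shell
# Recreate 1.json and 2.json in a scratch directory.
workdir=$(mktemp -d)
cat > "$workdir/1.json" <<'EOF'
[
  {"type": "message", "user": "U024HFHU5", "text": "hey there", "ts": "1385407681.000003"},
  {"type": "message", "user": "U024HGJ4E", "text": "right back at you", "ts": "1385407706.000006"}
]
EOF
cat > "$workdir/2.json" <<'EOF'
[
  {"type": "message", "user": "U028H5EBL", "text": "<@U02A8N1DS>: Can I get some help with a domain registration?", "ts": "1418301403.001783"},
  {"type": "message", "user": "U02A8N1DS", "text": "Sure thing.", "ts": "1418301427.001784"}
]
EOF

# The filter from above: unpack the outer array, then each inner array.
nested=$(jq -c --slurp '[.[] | .[] | { text: .text }]' "$workdir/1.json" "$workdir/2.json")

# Alternative: add concatenates the inner arrays, then map transforms each message.
flat=$(jq -c --slurp 'add | map({ text: .text })' "$workdir/1.json" "$workdir/2.json")
printf '%s\n' "$flat"

# Count the messages in the combined array.
count=$(printf '%s\n' "$flat" | jq length)
printf '%s\n' "$count"    # 4

# Clean up the scratch directory.
rm -rf "$workdir"
```

Both filters should give the same single array; which one reads better is mostly a matter of taste.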

Further reading

The jq tutorial and manual are very well written and offer a gentle introduction to jq’s filters.