jq: Ad hoc JSON processing made easy

Nowadays, a developer needs to deal with JSON quite frequently. RESTful web services are everywhere. Whether you need to work with Elasticsearch, DynamoDB, Neo4j, Titan, or an internal Restful API for a web application you will probably end up dealing with JSON.

Similarly if you are trying to interact with a web service API like Evernote, Pocket or Twitter you will need to deal with JSON all the same.

There are numerous libraries available in almost all programming languages to work with JSON (See json.org for a list). These libraries are the way to go when we are building production code, using the programming language of choice. Sometimes, however, all we need is to run some ad hoc requests and check the results. It is too much pain to just go ahead and write a program, or script for such cases. For instance, sometimes we just need to run a query against Elasticsearch and then get a specific field value out of the JSON response. Or maybe all we need is to dump a few field values from a query result in CSV format. Well, jq comes to the rescue!

jq - not to be confused with jQuery - is a command line tool that lets you manipulate JSON. As we will see in examples below, it is a very sophisticated tool for JSON processing with capabilites ranging from selecting fields, to filtering to exporting to CSV, and a lot more.

Enough said, let’s dive in.

Basic Usage

We will download USGS Earthquake data in this section to show a few examples of how jq can be used to process JSON.

curl 'https://earthquake.usgs.gov/fdsnws/event/1/query?format=geojson&starttime=2014-01-01&endtime=2014-01-02' > earthquake.json

With the above command, we download information about a list of earthquakes using curl and save into the file earthquake.json.

The file contains 290 lines, and a sample line is shown below.

{"type":"FeatureCollection","metadata":{"generated":1513583848000,"url":"https://earthquake.usgs.gov/fdsnws/event/1/query?format=geojson&starttime=2014-01-01&endtime=2014-01-02","title":"USGS Earthquakes","status":200,"api":"1.5.8","count":291},"features":[{"type":"Feature","properties":{"mag":1.29,"place":"10km SSW of Idyllwild, CA","time":1388620296020,"updated":1457728844428,"tz":-480,"url":"https://earthquake.usgs.gov/earthquakes/eventpage/ci11408890","detail":"https://earthquake.usgs.gov/fdsnws/event/1/query?eventid=ci11408890&format=geojson","felt":null,"cdi":null,"mmi":null,"alert":null,"status":"reviewed","tsunami":0,"sig":26,"net":"ci","code":"11408890","ids":",ci11408890,","sources":",ci,","types":",cap,focal-mechanism,general-link,geoserve,nearby-cities,origin,phase-data,scitech-link,","nst":39,"dmin":0.06729,"rms":0.09,"gap":51,"magType":"ml","type":"earthquake","title":"M 1.3 - 10km SSW of Idyllwild, CA"},"geometry":{"type":"Point","coordinates":[-116.7776667,33.6633333,11.008]},"id":"ci11408890"},

Pretty Printing JSON

Note that the result above is a very long line of JSON, and it is not really easy to see the contents this way. It is probably better to take a look at some portion of it pretty formatted. Just using jq '.' gives us the formatting needed as shown below.

cat earthquake.json | jq '.' | head -n 20
{
  "type": "FeatureCollection",
  "metadata": {
    "generated": 1513583848000,
    "url": "https://earthquake.usgs.gov/fdsnws/event/1/query?format=geojson&starttime=2014-01-01&endtime=2014-01-02",
    "title": "USGS Earthquakes",
    "status": 200,
    "api": "1.5.8",
    "count": 291
  },
  "features": [
    {
      "type": "Feature",
      "properties": {
        "mag": 1.29,
        "place": "10km SSW of Idyllwild, CA",
        "time": 1388620296020,
        "updated": 1457728844428,
        "tz": -480,
        "url": "https://earthquake.usgs.gov/earthquakes/eventpage/ci11408890",

Selecting a field

This is probably the most common use case for jq. We just want to show the titles of all the earthquakes in the file.

Here is how we do it.

cat earthquake.json | jq '.features[].properties.title'

Showing 3 lines of total 292 lines below:

"M 1.3 - 10km SSW of Idyllwild, CA"
"M 1.1 - 117km NW of Talkeetna, Alaska"
"M 1.3 - 6km SSW of Big Lake, Alaska"

This is something I need when dealing with Elasticsearch query results a lot. In those cases, just something like jq .hits.hits[]._source.<field_name> is a huge life saver.

Selecting a field using Python

For comparison, here I also included how you would do this in Python.

import json

file = open('earthquake.json', 'r')
js = json.load(file)
file.close()

for feature in js['features']:
  print feature['properties']['title']

Not that bad actually, but still it is not comparable to the beautifully concise jq one liner.

Printing more than one field (as an array)

Now let’s get this more sophisticated. Say, we would like to select two fields now, and show them as a JSON array.

cat earthquake.json | jq '.features[].properties | [.mag, .place]' | head

[
  1.29,
  "10km SSW of Idyllwild, CA"
]
[
  1.1,
  "117km NW of Talkeetna, Alaska"
]
[
  1.3,

This looks fine, but a bit ugly. Why not just show an array per line? We can do this with the -c argument of jq. This argument tells jq to print the results in a compact way.

cat earthquake.json | jq -c '.features[].properties | [.mag, .place]' | head

[1.29,"10km SSW of Idyllwild, CA"]
[1.1,"117km NW of Talkeetna, Alaska"]
[1.3,"6km SSW of Big Lake, Alaska"]
[1.4,"63km NW of Talkeetna, Alaska"]
[4,"27km WNW of Coquimbo, Chile"]
[0.53,"4km NW of The Geysers, California"]
[0.7,"6km E of Mammoth Lakes, California"]
[0.47,"11km ESE of Anza, CA"]
[4.2,"South of the Fiji Islands"]
[1.8,"74km SW of Unalaska, Alaska"]

Printing more than one field (as an object)

Sometimes just printing fields side by side is not good enough, you might need the field names for the fields. One of the better ways to do that is to output a JSON object instead of an array in these cases.

cat earthquake.json | jq -c '.features[].properties | {"magnitude": .mag, "location": .place}' | head

{"magnitude":1.29,"location":"10km SSW of Idyllwild, CA"}
{"magnitude":1.1,"location":"117km NW of Talkeetna, Alaska"}
{"magnitude":1.3,"location":"6km SSW of Big Lake, Alaska"}
{"magnitude":1.4,"location":"63km NW of Talkeetna, Alaska"}
{"magnitude":4,"location":"27km WNW of Coquimbo, Chile"}
{"magnitude":0.53,"location":"4km NW of The Geysers, California"}
{"magnitude":0.7,"location":"6km E of Mammoth Lakes, California"}
{"magnitude":0.47,"location":"11km ESE of Anza, CA"}
{"magnitude":4.2,"location":"South of the Fiji Islands"}
{"magnitude":1.8,"location":"74km SW of Unalaska, Alaska"}

More Advanced Examples

Filtering data

What if you want to just list earthquakes with a magnitude larger than 4.5? Simple, use select:

cat earthquake.json | jq -c '.features[].properties |select(.mag > 4.5) | {"magnitude": .mag, "location": .place}'

{"magnitude":4.6,"location":"36km WNW of Sola, Vanuatu"}
{"magnitude":4.7,"location":"69km ENE of Amahai, Indonesia"}
{"magnitude":4.7,"location":"46km ESE of Farallon de Pajaros, Northern Mariana Islands"}
{"magnitude":4.6,"location":"21km ESE of Pospelikha, Russia"}
{"magnitude":4.9,"location":"55km NW of Abra Pampa, Argentina"}
{"magnitude":6.5,"location":"32km W of Sola, Vanuatu"}
{"magnitude":4.6,"location":"61km WNW of Port-Vila, Vanuatu"}
{"magnitude":4.9,"location":"21km ESE of Takahagi, Japan"}
{"magnitude":4.9,"location":"East of the North Island of New Zealand"}
{"magnitude":4.6,"location":"88km SSW of Masachapa, Nicaragua"}
{"magnitude":4.6,"location":"West Chile Rise"}
{"magnitude":4.6,"location":"258km NNE of Ndoi Island, Fiji"}
{"magnitude":4.9,"location":"260km W of Kamiiso, Japan"}
{"magnitude":4.7,"location":"Southern Mid-Atlantic Ridge"}
{"magnitude":5.1,"location":"76km NNW of Davila, Philippines"}

Exporting data as a tsv file

Ok, you got what you are looking for, but you actually need to export the data in tsv format maybe. Just add |@tsv at the end:

cat earthquake.json| jq -r '.features[].properties |select(.mag > 4.5)| [.mag, .place]|@tsv'

4.6 36km WNW of Sola, Vanuatu
4.7 69km ENE of Amahai, Indonesia
4.7 46km ESE of Farallon de Pajaros, Northern Mariana Islands
4.6 21km ESE of Pospelikha, Russia
4.9 55km NW of Abra Pampa, Argentina
6.5 32km W of Sola, Vanuatu
4.6 61km WNW of Port-Vila, Vanuatu
4.9 21km ESE of Takahagi, Japan
4.9 East of the North Island of New Zealand
4.6 88km SSW of Masachapa, Nicaragua
4.6 West Chile Rise
4.6 258km NNE of Ndoi Island, Fiji
4.9 260km W of Kamiiso, Japan
4.7 Southern Mid-Atlantic Ridge
5.1 76km NNW of Davila, Philippines

There is also @csv which would allow exporting the data in CSV format.

Note that for TSV/CSV export you need an array, rather than a JSON object. A final note on this one is about the -r flag. This flag tells jq to generate raw output, which in this case means generating data without quotation marks.

Adding headers to a tsv exported data

The previous results look great, but there is an annoyance, right? You don’t have the headers for the fields. In order to fix this, you would simply need to add your headers/column names into the beginning of the data as follows:

cat earthquake.json| jq -r '["Magnitude", "Place"],(.features[].properties |select(.mag > 4.5)| [.mag, .place])|@tsv'

Magnitude Place
4.6 36km WNW of Sola, Vanuatu
4.7 69km ENE of Amahai, Indonesia
4.7 46km ESE of Farallon de Pajaros, Northern Mariana Islands
4.6 21km ESE of Pospelikha, Russia
4.9 55km NW of Abra Pampa, Argentina
6.5 32km W of Sola, Vanuatu
4.6 61km WNW of Port-Vila, Vanuatu
4.9 21km ESE of Takahagi, Japan
4.9 East of the North Island of New Zealand
4.6 88km SSW of Masachapa, Nicaragua
4.6 West Chile Rise
4.6 258km NNE of Ndoi Island, Fiji
4.9 260km W of Kamiiso, Japan
4.7 Southern Mid-Atlantic Ridge
5.1 76km NNW of Davila, Philippines

Looks much better now!

Seeing the fields available to use

Sometimes we just need to know the attribute names at some point in the JSON nested object hierarchy. One way to do that is trying to look into the raw dump, but there is a much better way using jq.

Say we would like to see all the attributes under features.properties. Here is how we do it:

cat earthquake.json | jq '.features[].properties|to_entries|.[].key'|sort -u
"alert"
"cdi"
"code"
"detail"
"dmin"
"felt"
"gap"
"ids"
"mag"
"magType"
"mmi"
"net"
"nst"
"place"
"rms"
"sig"
"sources"
"status"
"time"
"title"
"tsunami"
"type"
"types"
"tz"
"updated"
"url"

This might seem like a lot of work to do just to show the available attributes, but as this is a pretty common functionality, you probably would define an alias/function to do this for you.

I have the following function, for instance, defined in my .zhrc file:

function jqKeys(){
  jq 'to_entries|.[].key'|sort -u
}

Now, I can simply do the following:

cat earthquake.json | jq '.features[].properties' | jqKeys

Conclusion

jq is a little-known beast for interacting with JSON. This post is definitely far from covering all of what jq has to offer. If you need to work with JSON a lot, especially for ad hoc queries, operations or troubleshooting work, I would strongly suggest you to take a look at jq Manual.

comments powered by Disqus