Nowadays, a developer needs to deal with JSON quite frequently. RESTful web services are everywhere. Whether you need to work with Elasticsearch, DynamoDB, Neo4j, Titan, or an internal Restful API for a web application you will probably end up dealing with JSON.
Similarly if you are trying to interact with a web service API like Evernote, Pocket or Twitter you will need to deal with JSON all the same.
There are numerous libraries available in almost all programming languages to work with JSON (See json.org for a list). These libraries are the way to go when we are building production code, using the programming language of choice. Sometimes, however, all we need is to run some ad hoc requests and check the results. It is too much pain to just go ahead and write a program, or script for such cases. For instance, sometimes we just need to run a query against Elasticsearch and then get a specific field value out of the JSON response. Or maybe all we need is to dump a few field values from a query result in CSV format. Well, jq
comes to the rescue!
jq
- not to be confused with jQuery
- is a command line tool that lets you manipulate JSON. As we will see in examples below, it is a very sophisticated tool for JSON processing with capabilites ranging from selecting fields, to filtering to exporting to CSV, and a lot more.
Enough said, let’s dive in.
Basic Usage
We will download USGS Earthquake data in this section to show a few examples of how jq
can be used to process JSON.
curl 'https://earthquake.usgs.gov/fdsnws/event/1/query?format=geojson&starttime=2014-01-01&endtime=2014-01-02' > earthquake.json
With the above command, we download information about a list of earthquakes using curl
and save into the file earthquake.json
.
The file contains 290
lines, and a sample line is shown below.
{"type":"FeatureCollection","metadata":{"generated":1513583848000,"url":"https://earthquake.usgs.gov/fdsnws/event/1/query?format=geojson&starttime=2014-01-01&endtime=2014-01-02","title":"USGS Earthquakes","status":200,"api":"1.5.8","count":291},"features":[{"type":"Feature","properties":{"mag":1.29,"place":"10km SSW of Idyllwild, CA","time":1388620296020,"updated":1457728844428,"tz":-480,"url":"https://earthquake.usgs.gov/earthquakes/eventpage/ci11408890","detail":"https://earthquake.usgs.gov/fdsnws/event/1/query?eventid=ci11408890&format=geojson","felt":null,"cdi":null,"mmi":null,"alert":null,"status":"reviewed","tsunami":0,"sig":26,"net":"ci","code":"11408890","ids":",ci11408890,","sources":",ci,","types":",cap,focal-mechanism,general-link,geoserve,nearby-cities,origin,phase-data,scitech-link,","nst":39,"dmin":0.06729,"rms":0.09,"gap":51,"magType":"ml","type":"earthquake","title":"M 1.3 - 10km SSW of Idyllwild, CA"},"geometry":{"type":"Point","coordinates":[-116.7776667,33.6633333,11.008]},"id":"ci11408890"},
Pretty Printing JSON
Note that the result above is a very long line of JSON, and it is not really easy to see the contents this way. It is probably better to take a look at some portion of it pretty formatted. Just using jq '.'
gives us the formatting needed as shown below.
cat earthquake.json | jq '.' | head -n 20
{
"type": "FeatureCollection",
"metadata": {
"generated": 1513583848000,
"url": "https://earthquake.usgs.gov/fdsnws/event/1/query?format=geojson&starttime=2014-01-01&endtime=2014-01-02",
"title": "USGS Earthquakes",
"status": 200,
"api": "1.5.8",
"count": 291
},
"features": [
{
"type": "Feature",
"properties": {
"mag": 1.29,
"place": "10km SSW of Idyllwild, CA",
"time": 1388620296020,
"updated": 1457728844428,
"tz": -480,
"url": "https://earthquake.usgs.gov/earthquakes/eventpage/ci11408890",
Selecting a field
This is probably the most common use case for jq
. We just want to show the titles of all the earthquakes in the file.
Here is how we do it.
cat earthquake.json | jq '.features[].properties.title'
Showing 3 lines of total 292 lines below:
"M 1.3 - 10km SSW of Idyllwild, CA"
"M 1.1 - 117km NW of Talkeetna, Alaska"
"M 1.3 - 6km SSW of Big Lake, Alaska"
This is something I need when dealing with Elasticsearch query results a lot. In those cases, just something like jq .hits.hits[]._source.<field_name>
is a huge life saver.
Selecting a field using Python
For comparison, here I also included how you would do this in Python.
import json
file = open('earthquake.json', 'r')
js = json.load(file)
file.close()
for feature in js['features']:
print feature['properties']['title']
Not that bad actually, but still it is not comparable to the beautifully concise jq
one liner.
Printing more than one field (as an array)
Now let’s get this more sophisticated. Say, we would like to select two fields now, and show them as a JSON array.
cat earthquake.json | jq '.features[].properties | [.mag, .place]' | head
[
1.29,
"10km SSW of Idyllwild, CA"
]
[
1.1,
"117km NW of Talkeetna, Alaska"
]
[
1.3,
This looks fine, but a bit ugly. Why not just show an array per line? We can do this with the -c
argument of jq
. This argument tells jq
to print the results in a compact way.
cat earthquake.json | jq -c '.features[].properties | [.mag, .place]' | head
[1.29,"10km SSW of Idyllwild, CA"]
[1.1,"117km NW of Talkeetna, Alaska"]
[1.3,"6km SSW of Big Lake, Alaska"]
[1.4,"63km NW of Talkeetna, Alaska"]
[4,"27km WNW of Coquimbo, Chile"]
[0.53,"4km NW of The Geysers, California"]
[0.7,"6km E of Mammoth Lakes, California"]
[0.47,"11km ESE of Anza, CA"]
[4.2,"South of the Fiji Islands"]
[1.8,"74km SW of Unalaska, Alaska"]
Printing more than one field (as an object)
Sometimes just printing fields side by side is not good enough, you might need the field names for the fields. One of the better ways to do that is to output a JSON object instead of an array in these cases.
cat earthquake.json | jq -c '.features[].properties | {"magnitude": .mag, "location": .place}' | head
{"magnitude":1.29,"location":"10km SSW of Idyllwild, CA"}
{"magnitude":1.1,"location":"117km NW of Talkeetna, Alaska"}
{"magnitude":1.3,"location":"6km SSW of Big Lake, Alaska"}
{"magnitude":1.4,"location":"63km NW of Talkeetna, Alaska"}
{"magnitude":4,"location":"27km WNW of Coquimbo, Chile"}
{"magnitude":0.53,"location":"4km NW of The Geysers, California"}
{"magnitude":0.7,"location":"6km E of Mammoth Lakes, California"}
{"magnitude":0.47,"location":"11km ESE of Anza, CA"}
{"magnitude":4.2,"location":"South of the Fiji Islands"}
{"magnitude":1.8,"location":"74km SW of Unalaska, Alaska"}
More Advanced Examples
Filtering data
What if you want to just list earthquakes with a magnitude larger than 4.5? Simple, use select
:
cat earthquake.json | jq -c '.features[].properties |select(.mag > 4.5) | {"magnitude": .mag, "location": .place}'
{"magnitude":4.6,"location":"36km WNW of Sola, Vanuatu"}
{"magnitude":4.7,"location":"69km ENE of Amahai, Indonesia"}
{"magnitude":4.7,"location":"46km ESE of Farallon de Pajaros, Northern Mariana Islands"}
{"magnitude":4.6,"location":"21km ESE of Pospelikha, Russia"}
{"magnitude":4.9,"location":"55km NW of Abra Pampa, Argentina"}
{"magnitude":6.5,"location":"32km W of Sola, Vanuatu"}
{"magnitude":4.6,"location":"61km WNW of Port-Vila, Vanuatu"}
{"magnitude":4.9,"location":"21km ESE of Takahagi, Japan"}
{"magnitude":4.9,"location":"East of the North Island of New Zealand"}
{"magnitude":4.6,"location":"88km SSW of Masachapa, Nicaragua"}
{"magnitude":4.6,"location":"West Chile Rise"}
{"magnitude":4.6,"location":"258km NNE of Ndoi Island, Fiji"}
{"magnitude":4.9,"location":"260km W of Kamiiso, Japan"}
{"magnitude":4.7,"location":"Southern Mid-Atlantic Ridge"}
{"magnitude":5.1,"location":"76km NNW of Davila, Philippines"}
Exporting data as a tsv file
Ok, you got what you are looking for, but you actually need to export the data in tsv format maybe. Just add |@tsv
at the end:
cat earthquake.json| jq -r '.features[].properties |select(.mag > 4.5)| [.mag, .place]|@tsv'
4.6 | 36km WNW of Sola, Vanuatu |
---|---|
4.7 | 69km ENE of Amahai, Indonesia |
4.7 | 46km ESE of Farallon de Pajaros, Northern Mariana Islands |
4.6 | 21km ESE of Pospelikha, Russia |
4.9 | 55km NW of Abra Pampa, Argentina |
6.5 | 32km W of Sola, Vanuatu |
4.6 | 61km WNW of Port-Vila, Vanuatu |
4.9 | 21km ESE of Takahagi, Japan |
4.9 | East of the North Island of New Zealand |
4.6 | 88km SSW of Masachapa, Nicaragua |
4.6 | West Chile Rise |
4.6 | 258km NNE of Ndoi Island, Fiji |
4.9 | 260km W of Kamiiso, Japan |
4.7 | Southern Mid-Atlantic Ridge |
5.1 | 76km NNW of Davila, Philippines |
There is also @csv
which would allow exporting the data in CSV format.
Note that for TSV/CSV export you need an array, rather than a JSON object. A final note on this one is about the -r
flag. This flag tells jq
to generate raw output, which in this case means generating data without quotation marks.
Adding headers to a tsv exported data
The previous results look great, but there is an annoyance, right? You don’t have the headers for the fields. In order to fix this, you would simply need to add your headers/column names into the beginning of the data as follows:
cat earthquake.json| jq -r '["Magnitude", "Place"],(.features[].properties |select(.mag > 4.5)| [.mag, .place])|@tsv'
Magnitude | Place |
---|---|
4.6 | 36km WNW of Sola, Vanuatu |
4.7 | 69km ENE of Amahai, Indonesia |
4.7 | 46km ESE of Farallon de Pajaros, Northern Mariana Islands |
4.6 | 21km ESE of Pospelikha, Russia |
4.9 | 55km NW of Abra Pampa, Argentina |
6.5 | 32km W of Sola, Vanuatu |
4.6 | 61km WNW of Port-Vila, Vanuatu |
4.9 | 21km ESE of Takahagi, Japan |
4.9 | East of the North Island of New Zealand |
4.6 | 88km SSW of Masachapa, Nicaragua |
4.6 | West Chile Rise |
4.6 | 258km NNE of Ndoi Island, Fiji |
4.9 | 260km W of Kamiiso, Japan |
4.7 | Southern Mid-Atlantic Ridge |
5.1 | 76km NNW of Davila, Philippines |
Looks much better now!
Seeing the fields available to use
Sometimes we just need to know the attribute names at some point in the JSON nested object hierarchy. One way to do that is trying to look into the raw dump, but there is a much better way using jq
.
Say we would like to see all the attributes under features.properties
. Here is how we do it:
cat earthquake.json | jq '.features[].properties|to_entries|.[].key'|sort -u
"alert"
"cdi"
"code"
"detail"
"dmin"
"felt"
"gap"
"ids"
"mag"
"magType"
"mmi"
"net"
"nst"
"place"
"rms"
"sig"
"sources"
"status"
"time"
"title"
"tsunami"
"type"
"types"
"tz"
"updated"
"url"
This might seem like a lot of work to do just to show the available attributes, but as this is a pretty common functionality, you probably would define an alias/function to do this for you.
I have the following function, for instance, defined in my .zhrc
file:
function jqKeys(){
jq 'to_entries|.[].key'|sort -u
}
Now, I can simply do the following:
cat earthquake.json | jq '.features[].properties' | jqKeys
Conclusion
jq
is a little-known beast for interacting with JSON. This post is definitely far from covering all of what jq
has to offer. If you need to work with JSON a lot, especially for ad hoc queries, operations or troubleshooting work, I would strongly suggest you to take a look at jq Manual.