
JSON and PostgreSQL: A Powerful Combination

JSON is a data interchange format designed to be lightweight and easy to work with. It’s quite popular in web applications, and it can be considerably more flexible than a traditional relational data model. PostgreSQL has supported JSON since version 9.2 (with most of the JSON functions and operators arriving in 9.3, and the binary JSONB type in 9.4), so you can store JSON data and use native Postgres functions to operate on it. This includes decomposing, transforming, or even creating JSON data from regular relational data.

What Is JSON?

Note: If you have experience using JSON, you can skip to the next section.

JSON is an acronym for JavaScript Object Notation. It was born as an XML alternative for message interchange between web application clients and servers. Let’s look at an example of the JSON format:

{ "name": "France", "population": 65000000, "flag_colours": ["red", "blue", "white"] }

The contents are fairly self-explanatory. The [ square brackets ] define a set of values – an array of the colors red, blue, and white – for the flag_colours element. There’s an important difference between the relational model and JSON here: JSON doesn’t need to adhere to any schema. For example, we can have another JSON value like:

{ "name": "Spain", "population": 47000000, "borders": ["Portugal", "France"] }

Using JSON in PostgreSQL

Storing JSON Values

In order to store a JSON value in a PostgreSQL table we need to use a JSON data type column. Later, we’ll compare and contrast the two JSON data types that Postgres provides. Right now, we’ll simply begin with an example that creates and populates a table (country) with a JSON column (extra_info).

CREATE TABLE country (
	id 		INTEGER NOT NULL PRIMARY KEY ,
	name		VARCHAR(50),
	extra_info	JSON
);

INSERT INTO country(id,extra_info)
VALUES (1, '{ "name" : "France", "population" : "65000000", "flag_colours": ["red", "blue","white"]}');

INSERT INTO country(id,extra_info)
VALUES (2, '{ "name": "Spain", "population" : "47000000", "borders": ["Portugal", "France"] }');

Extracting Element Values from JSON Data

Usually we need to extract an individual element from a JSON value; there are two important operators that do this: -> and ->>.

->> Extracts an element and returns its content as plain text
-> Extracts an element and returns it in its original JSON type (which might be an object or an array)
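The difference is easiest to see on a literal value, where a scalar result keeps its JSON quoting with -> but loses it with ->>:

```sql
-- -> keeps the JSON representation (quotes included for a string);
-- ->> returns the bare text
SELECT '{"name": "France"}'::json ->  'name' AS as_json,  -- "France"
       '{"name": "France"}'::json ->> 'name' AS as_text;  -- France
```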

Let’s put the ->> operator into use and get some text showing the population of two countries:

SELECT extra_info->>'name' as Name, extra_info->>'population' as Population
FROM country

RESULTS

"France"	"65000000"
"Spain"		"47000000"

If we use the -> operator, we obtain the field “as-is” in the JSON format (not as plain text). For example:

SELECT 	extra_info->'name' as Name,
		extra_info->'population' as Population,
		extra_info->'flag_colours' as colours
FROM 		country

RESULTS

"France"	"65000000"	"["red", "blue","white"]"
"Spain"	"47000000"	""

Note that the colours value is still in its original JSON format. In this particular case, the format is an array of three elements.

If we want to extract a single element from an array, we have two more operators: #> and #>>.

#>> Extracts the element at a specified path and returns it as plain text
#> Extracts the element at a specified path and returns it as a JSON object
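Before applying them to our table, here is a minimal sketch of both operators on a literal value; the path '{flag_colours,0}' means "the element at index 0 of the flag_colours array":

```sql
-- #> returns the element as JSON (quotes kept), #>> as plain text
SELECT '{"flag_colours": ["red", "blue", "white"]}'::json #>  '{flag_colours,0}' AS as_json,  -- "red"
       '{"flag_colours": ["red", "blue", "white"]}'::json #>> '{flag_colours,0}' AS as_text;  -- red
```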

So if we want to obtain the element at index 2 of the flag_colours array (JSON arrays are zero-indexed, so this is the third colour, white), we use the following SQL:

SELECT extra_info, extra_info#>'{flag_colours,2}' AS third_colour FROM country

RESULTS

"{ "name" : "France", "population" : "65000000", "flag_colours": 
["red", "blue","white"]}"	"white"
"{ "name": "Spain", "population" : "47000000", "borders": ["Portugal", 
"France"] }"	""

Populating Relational Data with JSON Elements

This is an easy one. In the country table we have a column called name, which is empty (NULL). In order to populate this column, we only need to execute a really simple UPDATE using the ->> operator. (We use ->> rather than -> because name is a VARCHAR column and we want the plain text value, without the surrounding JSON quotes.)

UPDATE country
SET name = extra_info->>'name';

Creating JSON Objects from Relational Data

Postgres also provides a function called json_build_object() to do the opposite. In other words, json_build_object() takes relational data as input and returns a JSON object. It accepts a variadic list of arguments, interpreted as alternating keys and values, and is very easy to understand. To demonstrate, let’s use it to create two JSON objects that hold population statistics:

CREATE TABLE country2 (
name varchar(50),
population integer
);

INSERT INTO country2 VALUES ('Poland',38000000);
INSERT INTO country2 VALUES ('Argentina',40000000);

SELECT json_build_object('name',name,'population',population) FROM country2

We can see that two JSON objects have been built:

RESULTS

"{"name" : "Poland", "population" : 38000000}"
"{"name" : "Argentina", "population" : 40000000}"

A Real Life Example: Weather Station Data

Our next scenario is a database that stores data from geographically distributed weather stations. Every weather station collects different measures – e.g. temperature, wind speed, wind direction, moisture, pressure and cloud level. The data structures sent by each station can vary quite a bit, and stations don’t always send all the measures. Because of its schema-less nature, JSON is a good way to represent this data.

In this scenario, we have a staging table that loads raw JSON data on arrival. Then we will split the JSON data into several fields, which we will use to populate a second table called weather_events. However, we will keep the JSON sub-element measures as a JSON data type.

First, let’s create the tables:

DROP TABLE IF EXISTS staging;
CREATE TABLE staging (
	weather_info JSON
);

DROP TABLE IF EXISTS weather_events;
CREATE TABLE weather_events (
	place VARCHAR(50),
	taken_at TIMESTAMP,
	measures JSON
);

Then we populate the staging table:

INSERT INTO staging VALUES (
'{"name": "red river", "taken_at" : "2016-07-01 11:00:00", "measures":{ "temperature":7.6 ,"moisture":37.0 ,"wind_speed":27.8, "wind_direction": "NorthWest"}}'
);

INSERT INTO staging VALUES (
'{"name" : "red river", "taken_at" : "2016-07-01 17:00:00", "measures" : { "temperature": 11.8 ,"moisture":39.0 ,"wind_speed":23.5, "wind_direction": "NorthWest"}}'
);

INSERT INTO staging VALUES (
'{"name" : "sunny coast", "taken_at" : "2016-07-01 11:00:00", "measures" : { "temperature": 10.2 ,"moisture":28.2 ,"wind_speed":20.1, "wind_direction": "North"}}'
);

INSERT INTO staging VALUES (
'{"name" : "sunny coast", "taken_at" : "2016-07-01 17:00:00", "measures" : { "temperature": 11.6 ,"moisture":30.0 ,"wind_speed":25.4, "wind_direction": "NorthWest"}}'
);

The next SQL will be used to populate the weather_events table with some relational data (stored in the place and taken_at columns) and with JSON data that will be stored in the measures column.

INSERT INTO weather_events
SELECT 	weather_info->>'name',			/* extract as text */
 	(weather_info->>'taken_at')::timestamp,	/* extract as text, cast to timestamp */
	weather_info->'measures' 		/* extract as JSON object */
FROM staging

If we take a look at weather_events, it is populated and the measures column still contains JSON data.

Managing JSONs with Native PostgreSQL Functions

There are several PostgreSQL functions for JSON management (here’s the official list of functions). An interesting one is the json_each() function. This works like a bridge, connecting the database world with the JSON world.

Here’s how it works. In a database, we usually have result sets composed of records. In JSON, we usually have repetitive elements of the same type: one after the other at the same level – {"a":8, "b":10}. We can use json_each() to translate from the JSON format to the result set format:

SELECT json_each('{"a":8, "b":10, "c":"John", "d":{"d1":10,"d2":"hello"}}'::json)

RESULTS

"(a,8)"
"(b,10)"
"(c,"John")"
"(d,"{"d1":10,"d2":"hello"}")"

In the next example, we’ll use the json_each_text() function. This is similar to json_each(), but it returns text instead of a JSON element. Let’s apply it to the measures column:

SELECT place, taken_at, json_each_text(measures) FROM weather_events

RESULTS

"red river";	"2016-07-01 11:00:00";	"(temperature,7.6)"
"red river";	"2016-07-01 11:00:00";	"(moisture,37.0)"
"red river";	"2016-07-01 11:00:00";	"(wind_speed,27.8)"
"red river";	"2016-07-01 11:00:00";	"(wind_direction,"""NorthWest""")"
"red river";	"2016-07-01 17:00:00";	"(temperature,11.8)"
"red river";	"2016-07-01 17:00:00";	"(moisture,39.0)"
"red river";	"2016-07-01 17:00:00";	"(wind_speed,23.5)"
"red river";	"2016-07-01 17:00:00";	"(wind_direction,"""NorthWest""")"
"sunny coast";	"2016-07-01 11:00:00";	"(temperature,10.2)"
"sunny coast";	"2016-07-01 11:00:00";	"(moisture,28.2)"
"sunny coast";	"2016-07-01 11:00:00";	"(wind_speed,20.1)"
"sunny coast";	"2016-07-01 11:00:00";	"(wind_direction,"""North""")"
"sunny coast";	"2016-07-01 17:00:00";	"(temperature,11.6)"
"sunny coast";	"2016-07-01 17:00:00";	"(moisture,30.0)"
"sunny coast";	"2016-07-01 17:00:00";	"(wind_speed,25.4)"
"sunny coast";	"2016-07-01 17:00:00";	"(wind_direction,"""NorthWest""")"

JSON and JSONB PostgreSQL Data Types

PostgreSQL provides two data types to store JSON elements: JSON and JSONB. Both accept an identical set of values; the main difference is their efficiency level. While JSON is stored as text, JSONB is first decomposed into binary components and then stored in a more efficient way.

To understand the difference, remember that a JSON value stored as text must be re-parsed every time one of its elements is accessed; a JSONB value has already been decomposed, so parsing it is faster.

In our previous examples, we used the JSON data type exclusively. However, we could use JSONB and results would be exactly the same. In the next paragraphs, we will focus on the differences between both data types to find the strong points of each.
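One difference is already visible when a value is stored: JSON keeps the input text verbatim, while JSONB normalizes it – insignificant whitespace is dropped, keys are reordered, and duplicate keys keep only the last value. A small sketch on literals:

```sql
-- JSON preserves the text as-is; JSONB stores a normalized binary form
SELECT '{"b": 2, "a": 1, "a": 3}'::json  AS as_json,   -- {"b": 2, "a": 1, "a": 3}
       '{"b": 2, "a": 1, "a": 3}'::jsonb AS as_jsonb;  -- {"a": 3, "b": 2}
```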

The first difference is how these data types are stored:


Data Type   Storage Format             Advantage
JSON        Simple text string         Faster insertion and simple storage
JSONB       Decomposed binary format   Faster parsing of sub-elements


Now let’s create two tables, one using JSON and the other using JSONB. We want to compare how much disk space each type consumes, so we will set the fill factor to 100. (This is actually PostgreSQL’s default for tables; we set it explicitly to make clear that no free space is reserved inside the data pages.) We want both tables compacted as much as possible so that we will know how much disk space each needs. So:

CREATE TABLE weather_jsonb (info jsonb) with (fillfactor=100);
CREATE TABLE weather_json (info json) with (fillfactor=100);

Next, we will insert 800,000 JSON records into both tables and compare insert durations. As a side point, please note the cross join (a join with no condition) against the generate_series(1,200000) function. The idea behind this join is to increase the number of resulting rows from four to 800,000. (In fact, we are doing a Cartesian product between the four-row staging table and a 200,000-row series.)

Let’s focus on the INSERTs:

postgres=# INSERT INTO weather_json SELECT weather_info FROM staging, generate_series(1,200000);
INSERT 0 800000	
Duration: 7593.142 ms

postgres=# INSERT INTO weather_jsonb SELECT weather_info FROM staging, generate_series(1,200000);
INSERT 0 800000
Duration: 12558.682 ms

From the previous SQL statements, we can conclude that JSON inserts are faster than JSONB inserts. The reason is that a JSONB value must be parsed and converted to its binary format at insertion time.

Let’s compare how much disk space is consumed by each data type:

SELECT pg_size_pretty(pg_relation_size('weather_json'::regclass)) AS json,
	   pg_size_pretty(pg_relation_size('weather_jsonb'::regclass)) AS jsonb;

RESULTS

JSON        JSONB
------------------------------
152MB       184MB

The following table sums up our conclusions:


Data Type   Storage Space
JSON        Occupies less disk space
JSONB       Occupies more disk space


Indexes on JSON Sub-Elements

Both JSON and JSONB support indexes using JSON sub-elements as the index key. (Spoiler alert: This round will be won by JSONB, as it supports more types of indexes.) Anyway, let’s begin by explaining similarities: which indexes are supported by both data types, how we can create them, and when a query can use them.

Before we can appreciate the advantages of using indexes, we need a slow query. So we will insert one specific row into each of our 800,000-record tables. Then we will execute a simple query to search and return this new row. Note the response time for both queries.

INSERT INTO weather_json VALUES ('{"name": "new york", "taken_at" : "2015-07-01 11:00:00", "measures":{ "temperature":7.6 ,"moisture":37.0 ,"wind_speed":27.8, "wind_direction": "NorthWest"}}');

INSERT INTO weather_jsonb VALUES ('{"name": "new york", "taken_at" : "2015-07-01 11:00:00", "measures":{ "temperature":7.6 ,"moisture":37.0 ,"wind_speed":27.8, "wind_direction": "NorthWest"}}');

SELECT * FROM weather_json WHERE info->>'name' = 'new york';

RESULTS

{"name": "new york", "taken_at" : "2015-07-01 11:00:00", "measures":{ "temperature":7.6 ,"moisture":37.0 ,"wind_speed":27.8, "wind_direction": "NorthWest"}}

Duration: 1635.235 ms

SELECT * FROM weather_jsonb WHERE info->>'name' = 'new york';

RESULTS

{"name": "new york", "measures": {"moisture": 37.0, "wind_speed": 27.8, 
"temperature": 7.6, "wind_direction": "NorthWest"}, "taken_at": 
"2015-07-01 11:00:00"}

Duration: 255.581 ms

The unindexed JSONB duration (0.25 secs) is considerably faster than using an unindexed JSON (1.6 secs). Now let’s try using an index to see how much faster the same queries can be.

We will create two indexes, one on the weather_json table and one on the weather_jsonb table. Remember, we are running all commands twice because we are showing the features of the JSON and JSONB data types.

As the index key and search criteria, we will use the JSON name element. Look for this code: info->>'name'.

Here is the snippet:

CREATE INDEX jsonb_idx ON weather_jsonb ((info->>'name'));
CREATE INDEX json_idx ON weather_json ((info->>'name'));

Now we will run the queries again and see if the response is better:

SELECT * FROM weather_jsonb WHERE info->>'name' = 'new york';

{"name": "new york", "measures": {"moisture": 37.0, "wind_speed": 27.8, 
"temperature": 7.6, "wind_direction": "NorthWest"}, "taken_at": 
"2015-07-01 11:00:00"}

Duration: 1.540 ms

SELECT * FROM weather_json WHERE info->>'name' = 'new york';

....{"name": "new york", "taken_at" : "2015-07-01 11:00:00", "measures":{ 
"temperature":7.6 ,"moisture":37.0 ,"wind_speed":27.8, "wind_direction": 
"NorthWest"}}

Duration: 1.277 ms


Data Type   Time without index   Time with index
JSON        1635.23 ms           1.27 ms
JSONB       255.58 ms            1.54 ms


The above table shows one difference and one similarity between JSON and JSONB. Both data types perform comparably (and really well) when there is an index on the search criteria. The difference is how they perform when indexes are not present. In that case, JSONB has a sizeable advantage over JSON; this fact becomes more important when we can’t have indexes on all JSON sub-elements.

Differences Between JSON and JSONB

There are some operators that are only available for JSONB. (Here’s a link to Postgres’ documentation for JSON and JSONB operators.) One of them is @>, which is a containment Boolean operator. The following table shows how it works:


EXPRESSION USING @> OR <@                                RESULT
'{"a":1, "b":2}'::jsonb @> '{"b":2}'::jsonb              TRUE
'{"a":1, "b":2}'::jsonb @> '{"b":3}'::jsonb              FALSE
'{"a":1, "b":2}'::jsonb @> '{"a":1, "b":2}'::jsonb       TRUE
'{"b":2}'::jsonb @> '{"a":1, "b":2}'::jsonb              FALSE
'{"b":2}'::jsonb <@ '{"a":1, "b":2}'::jsonb              TRUE
'{"a":1, "c":{"b":2}}'::jsonb @> '{"b":2}'::jsonb        FALSE
'{"a":1, "c":{"b":2}}'::jsonb @> '{"c":{"b":2}}'::jsonb  TRUE


The following three SQL statements use @> on the weather_jsonb table. If we try the same expression using the weather_json table, it will fail. This is because @> is only available for JSONB.

SELECT max(((info->'measures')->>'wind_speed')::numeric) from weather_jsonb WHERE info @> '{"name": "new york"}' ;

max
------
27.8
Duration: 1223.014 ms

SELECT count(*) from weather_jsonb WHERE info @> '{"measures": {"moisture": 37.0, "wind_speed": 27.8, "temperature": 7.6, "wind_direction": "NorthWest"}}' ;

count
--------
300002
Duration: 1377.369 ms

SELECT info->'name' as placename from weather_jsonb WHERE info @> '{"taken_at": "2015-07-01 11:00:00"}' ;

placename
------------
"new york"
Duration: 1071.939 ms

However, we don’t yet have an index to accelerate these queries. Let’s create one to see how the @> operator performs with it. We’ll use the following CREATE INDEX statement and then re-execute the three SQL statements to compare response times.

CREATE INDEX weather_gin ON weather_jsonb USING GIN (info);

Note that the index is a GIN index, not the usual BTREE. A BTREE index cannot accelerate containment (@>) searches; a GIN index, which indexes the individual keys and values inside each JSONB document, can. GIN is simply another index type, like BTREE or HASH.

Let’s see how the previous queries perform now:

SELECT info->'name' as placename from weather_jsonb WHERE info @> '{"taken_at": "2015-07-01 11:00:00"}' ;

placename
------------
"new york"
Duration: 0.671 ms

SELECT count(*) from weather_jsonb WHERE info @> '{"measures": {"moisture": 37.0, "wind_speed": 27.8, "temperature": 7.6, "wind_direction": "NorthWest"}}' ;

 count
--------
 300002
Duration: 707.941 ms

SELECT max(((info->'measures')->>'wind_speed')::numeric) from weather_jsonb WHERE info @> '{"name": "new york"}' ;

 max
------
 27.8
Duration: 0.626 ms

To determine if the index improved performance, we will compare response times in the following table:


Query @> expression                                                                                                 Time without index   Time with index
info @> '{"taken_at": "2015-07-01 11:00:00"}'                                                                       1071.939 ms          0.671 ms
info @> '{"measures": {"moisture": 37.0, "wind_speed": 27.8, "temperature": 7.6, "wind_direction": "NorthWest"}}'   1377.369 ms          707.941 ms
info @> '{"name": "new york"}'                                                                                      1223.014 ms          0.626 ms


We can clearly see that the index greatly improves response times no matter which JSON element the search is based on (name, taken_at, measures). In fact, two of the queries improve by roughly 1,000 ms. The measures query doesn’t show as great an improvement, because the engine still needed to traverse about 300,000 rows to compute the count.

We haven’t covered some other operators, like ?, ?| and ?&. These check whether given keys exist in a JSONB element. We leave it to you to research these operators!
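As a starting point for that research, here is a minimal sketch of what the existence operators return on a literal JSONB value:

```sql
SELECT '{"a":1, "b":2}'::jsonb ?  'a'              AS has_key_a,    -- true:  key "a" exists
       '{"a":1, "b":2}'::jsonb ?| array['a', 'x']  AS has_any_key,  -- true:  at least one of the keys exists
       '{"a":1, "b":2}'::jsonb ?& array['a', 'x']  AS has_all_keys; -- false: "x" is missing
```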

Try It Yourself

Execute the queries searching for 'new york', but precede them with the clause EXPLAIN ANALYZE. Study the output; you will find interesting information about the query planner, including which index is being used, the query duration, and many other details. Try to familiarize yourself with the planner output if you want to know more about query optimization. It is really interesting – try it yourself!
