How to work with ARRAYs in Postgres

Working with ARRAYS

Arrays in PostgreSQL are very similar to arrays in most programming languages. You can create multi-dimensional arrays of varying lengths for any native data type in PostgreSQL.

Most data science requires getting data out of a database using SQL queries with SELECT statements. But before you are able to extract data, someone needs to create the database, add atleast one table with atleast one column and then insert some records.

The CREATE TABLE command as you can see in the example, will create an empty table called my_first_table.

CREATE TABLE my_first_table (
    first_column text,
    second_column int
);

The INSERT statement will add one record to this table.

INSERT INTO my_first_table
    (first_column, second_column) VALUES ('Text value', 12);

OK, so now that we know this simple stuff, let’s now see how we can extend this syntax into creating an ARRAY type. To create an ARRAY type, you simply need to add [] to the end of the data type that you want to make an array. Let’s now create a simple table with two array columns to illustrate how this is done.

CREATE TABLE grades (
    student_id int,
    email text[][],
    test_scores int[]
);

The grades table has an email column which will be a nested array of text data to store the email type and the address for a given student_id. The test_scores column will contain an array of integer values representing the numeric test scores. Once the table is created we can use the INSERT statement to add a couple of records to the table.

INSERT INTO grades
    VALUES(1, '{ {"work", "work1@gmail.com"}, {"home", "home1@gmail.com"} }',
            '{92, 58,98, 100}');

Notice how the arrays are represented in the SQL with curly brackets and a single quotes for email and a comma separated list of whole numbers for test_scores.

Now that we have data in our table, let’s see how we can access array data in a SELECT statement. Accessing arrays in PostgreSQL is very similar to accessing arrays in other programming languages.

SELECT
    email[1][1] as type,
    email[1][2] as address,
    test_scores[1]
FROM
    grades;

Example, we can get the first element of the first array by using the array notation you see here with index values of 1. Note: PostgreSQL array indexes start with one and not zero.

The same notation used to access ARRAYs in the SELECT statement, can also be used in the WHERE clause as a filter. Here we look for records that have work as a value in the first index of the email array. Using standard syntax for non-array columns like WHERE email = 'work' will generate an error.

Array functions and operators

ANY(): The ANY() function allows you to search an array for a value and return a record if it finds a match. In the following example, we want to query all records where the email address contains 'other' in any value of the array.

Accessing data in an ARRAY

In our DVD Rentals database, the film table contains an ARRAY for special_features which has a type of TEXT[]. Much like any ARRAY data type in PostgreSQL, a TEXT[] array can store an array of TEXT values. This comes in handy when you want to store things like phone numbers or email addresses as we saw above.

Let’s take a look at the special_features column and also practice accessing data in the ARRAY.

%load_ext sql
%sql postgresql://shravan:shravan@127.0.0.1/dvdrental
'Connected: shravan@dvdrental'

Select the title and special_features from the film table and compare the results between the two columns.

%%sql
-- Select the title and special features column
SELECT
  title,
  special_features
FROM film
LIMIT 5;
 * postgresql://shravan:***@127.0.0.1/dvdrental
5 rows affected.
title special_features
Chamber Italian ['Trailers']
Grosse Wonderful ['Behind the Scenes']
Airport Pollock ['Trailers']
Bright Encounters ['Trailers']
Academy Dinosaur ['Deleted Scenes', 'Behind the Scenes']

Select all films that have a special feature Trailers by filtering on the first index of the special_features ARRAY.

%%sql
-- Select the title and special features column
SELECT
  title,
  special_features
FROM film
-- Use the array index of the special_features column
WHERE special_features[1] = 'Trailers'
LIMIT 5;
 * postgresql://shravan:***@127.0.0.1/dvdrental
5 rows affected.
title special_features
Chamber Italian ['Trailers']
Airport Pollock ['Trailers']
Bright Encounters ['Trailers']
Ace Goldfinger ['Trailers', 'Deleted Scenes']
Adaptation Holes ['Trailers', 'Deleted Scenes']

Now let’s select all films that have Deleted Scenes in the second index of the special_features ARRAY.

%%sql
-- Select the title and special features column
SELECT
  title,
  special_features
FROM film
-- Use the array index of the special_features column
WHERE special_features[2] = 'Deleted Scenes'
LIMIT 5;
 * postgresql://shravan:***@127.0.0.1/dvdrental
5 rows affected.
title special_features
Ace Goldfinger ['Trailers', 'Deleted Scenes']
Adaptation Holes ['Trailers', 'Deleted Scenes']
Airplane Sierra ['Trailers', 'Deleted Scenes']
Alabama Devil ['Trailers', 'Deleted Scenes']
Aladdin Calendar ['Trailers', 'Deleted Scenes']

Understanding how to access ARRAY data types in PostgreSQL is an important skill in your SQL arsenal.

Searching an ARRAY with ANY

As you can see from the above examples, it is quite limiting to use first index, second index etc. PostgreSQL also provides the ability to filter results by searching for values in an ARRAY. The ANY function allows you to search for a value in any index position of an ARRAY.

When using the ANY function, the value you are filtering on appears on the left side of the equation with the name of the ARRAY column as the parameter in the ANY function. Now, lets match 'Trailers' in any index of the special_features ARRAY regardless of position.

%%sql
SELECT
  title,
  special_features
FROM film
-- Modify the query to use the ANY function
WHERE 'Trailers' = ANY (special_features)
LIMIT 5;
 * postgresql://shravan:***@127.0.0.1/dvdrental
5 rows affected.
title special_features
Chamber Italian ['Trailers']
Airport Pollock ['Trailers']
Bright Encounters ['Trailers']
Ace Goldfinger ['Trailers', 'Deleted Scenes']
Adaptation Holes ['Trailers', 'Deleted Scenes']

The ANY function is a flexible tool that you will use often when searching an ARRAY data type in PostgreSQL.

Searching an ARRAY with @>

The contains operator @> operator is alternative syntax to the ANY function and matches data in an ARRAY using the following syntax.

WHERE array_name @> ARRAY['search text'] :: type[]

Use the contains operator to match the text 'Deleted Scenes' in the special_features column.

%%sql
SELECT
  title,
  special_features
FROM film
-- Filter where special_features contains 'Deleted Scenes'
WHERE special_features @> ARRAY['Deleted Scenes']
LIMIT 5;
 * postgresql://shravan:***@127.0.0.1/dvdrental
5 rows affected.
title special_features
Academy Dinosaur ['Deleted Scenes', 'Behind the Scenes']
Ace Goldfinger ['Trailers', 'Deleted Scenes']
Adaptation Holes ['Trailers', 'Deleted Scenes']
African Egg ['Deleted Scenes']
Agent Truman ['Deleted Scenes']

Example to calculate percentage

What percentage of films have trailers as special features?

%%sql
SELECT
    CASE WHEN 'Trailers' = ANY(special_features) THEN 1 ELSE 0 END has_trailer,
    COUNT(*) as count
FROM film
GROUP BY has_trailer
 * postgresql://shravan:***@127.0.0.1/dvdrental
2 rows affected.
has_trailer count
1 535
0 465
%%sql
WITH has_trailers as (
    SELECT
        film,
        CASE WHEN 'Trailers' = ANY(special_features) THEN 1 ELSE 0 END has_trailer
    FROM film
)
SELECT
    ROUND(COUNT(*) * 100.0/(SELECT COUNT(*) from film),2) as percent_trailers
FROM has_trailers
WHERE has_trailer = 1
 * postgresql://shravan:***@127.0.0.1/dvdrental
1 rows affected.
percent_trailers
53.50

OK, this could have been done without a CASE statement, here is a much simpler way.

%%sql
SELECT ROUND(COUNT(*)*100.0/(SELECT COUNT(*) FROM film), 2) as percent_trailers
FROM film
WHERE 'Trailers' = ANY(special_features)
percent_trailers
53.50