How to work with JSON data using PostgreSQL

Postgres and JSON

How to work with JSON data using PostgreSQL?

I will address this question in this post. We will look at a real-world scenarios of a health-care company data and learn how to query JSON data using normal PostgreSQL.

History of JSON and Postgres: Postgres has had JSON support for a while, but to be honest it wasn’t that great due to a lack of indexing and key extractor methods. With the release of version 9.2, Postgres added native JSON support. You could finally use Postgres as a “NoSQL” database. In version 9.3, Postgres improved on that by adding additional constructor and extractor methods. 9.4 added the ability to store JSON as “Binary JSON” (or JSONB), which strips out insignificant whitespace (not a big deal), adds a tiny bit of overhead when inserting data, but provides a huge benefit when querying it: indexes.

What about MongoDB? I’m not going to argue that Postgres handles JSON as well as MongoDB. MongoDB was, afterall, specifically made as a JSON document store and has some pretty great features like the aggregation pipeline. But the truth is that Postgres now handles JSON pretty well.

Need for storing JSON data in the db

A lot of data is modeled quite well using a relational database. The reason for this is because website data tends to be relational. A user makes purchases and leaves reviews, a movie has actors which act in various movies, etc. However, there are use cases where it makes a lot of sense to incorporate a JSON document into your model. For example, the provider information is all stored inside one record for each provider. Some pros of this approach:

  1. One benefit of storing it this way, is that we can avoid complicated joins on data that is siloed or isolated. Having the data denormalized makes it possible to fetch a provider and it’s data with a single query.
  2. Maintain data that comes from an external service in the same structure and format (as JSON) that it arrived to you as. What ends up in the database is exactly what the API provided. For instance, look at the response object that we got from the API below. It’s nested and has arrays. Instead of trying to normalize this data across five or more tables, you can store it as it is (and still query against it).

Now that we have gone over some of the benefits and use-cases for storing JSON data in Postgres, let’s take a look at how it’s actually done.

Connect to the database

Get your creds for connecting to the Postgres database and store them in a file like ribbon_db.cfg. The contents of this file are shown here:

ribbon_db.cfg

[DB_CREDS]
USERNAME=
PASSWD=
HOST=
DB_NAME=
import configparser
config = configparser.ConfigParser()
config.read('ribbon_db.cfg')
conn = "postgresql://{}:{}@{}/{}".format(*config['DB_CREDS'].values())
%load_ext sql
%sql postgresql://user:passwd@dbserver.com/dbname
'Connected: user@dbname'

Data Schema

The schema mainly contains 4 tables.

  • provider_json (npi bigint, record json): This table represents the most accurate data we have for a given provider. The NPI is the national provider identifier. The NPI is how we identify providers across the country. The record is a JSON blob representing all data we have for this provider. In order to access this data you will need to be able to query JSON data.
  • insurances (uuid uuid, display text): This table contains the unique insurance plans in our DB.
  • specialties (uuid uuid, display text): This table contains the unique specialties in our DB.
  • provider_puf_data (npi bigint, source text, address json, plans json, created_at timestamp): This table contains a couple snapshots of government data we ingest. For this table, we ingested data in January and June of this year, you can leverage the “created_at” field to differentiate these times. The source field here represents a unique insurance broker. The address and plans fields represent the address and plans the broker has for this NPI. You can assume there is a uniqueness constraint on (npi, source, address, created_at). Thus, one source/broker could have many addresses and plan combinations for each NPI.

First, let’s do some simple Exploratory data analysis on the given tables.

%%sql
SELECT column_name, data_type
FROM information_schema.columns
WHERE table_name = 'provider_json' AND table_schema = 'public';
 * postgresql://read_user:***@server/dbname
2 rows affected.
column_name data_type
npi bigint
record json
%%sql
SELECT  *
FROM provider_json
LIMIT 2;
 * postgresql://read_user:***@server/dbname
2 rows affected.
npi record
1669554267 {'npi': '1669554267', 'first_name': 'Hyung', 'middle_name': 'Soon', 'last_name': 'Park', 'age': 83, 'gender': 'm', 'ratings_count': 1, 'ratings_avg': 2.0, 'degrees': [], 'specialties': ['c741df78-8460-46f6-a2ce-93de36eb7205'], 'languages': [], 'educations': [], 'insurances': [], 'provider_types': ['Doctor'], 'locations': [{'uuid': '8d96e755-65d3-470f-baaf-c20c73d673bb', 'name': None, 'address': '2296 Opitz Blvd # 255, Woodbridge, VA 22191, US', 'address_details': {'street': '2296 Opitz Blvd # 255', 'address_line_1': '2296 Opitz Blvd', 'address_line_2': '# 255', 'city': 'Woodbridge', 'state': 'VA', 'zip': '22191'}, 'latitude': 38.63739, 'longitude': -77.28514129999999, 'google_maps_link': 'https://www.google.com/maps/@38.63739,-77.28514129999999?q=2296%20Opitz%20Blvd%20%23%20255%2C%20Woodbridge%2C%20VA%2022191%2C%20US', 'phone_numbers': [{'phone': '7036704986', 'details': 'primary'}], 'confidence': 2}], 'online_profiles': [{'url': 'https://www.healthgrades.com/physician/dr-hyung-park-x3syc'}]}
1467524520 {'npi': '1467524520', 'first_name': 'Rohit', 'middle_name': None, 'last_name': 'Bhaskar', 'age': 57, 'gender': 'm', 'ratings_count': 35, 'ratings_avg': 9.07, 'degrees': ['MD'], 'specialties': ['dff54cb3-c9b0-43a0-8ca8-26c7c05bee83', '1de33770-eb1c-47fa-ab3e-f9a4ab924d9d', '76126f5d-723a-40bc-b4c1-7dcda477797a', '475b20fc-c188-4ca1-8e3a-b0614b8f8ce2'], 'languages': ['english'], 'educations': [{'education': {'name': 'Cornell University', 'uuid': 'c4588ea7-567d-43dc-a0e5-67937fb77f5c'}, 'type': 'Undergraduate School', 'year': 1982}, {'education': {'name': 'Baylor College of Medicine', 'uuid': 'b2ed4e48-6cb0-42d2-8804-7557b6452e8d'}, 'type': 'Residency', 'year': None}, {'education': {'name': 'Icahn School of Medicine at Mount Sinai', 'uuid': '16e70085-6b81-4581-b903-dce9d9acf74a'}, 'type': 'Medical School', 'year': None}, {'education': {'name': 'Baylor College of Medicine', 'uuid': 'b2ed4e48-6cb0-42d2-8804-7557b6452e8d'}, 'type': 'Internship', 'year': None}], 'insurances': ['5ad8ff7d-5227-46d7-a392-d28665610257', '3a56d135-dded-48c3-90db-a4f5d9da9786', 'dc059078-9115-488f-bdf6-e9106822225a', 'daaa1ac3-eb9d-424b-a249-8d7c51d39ce1', 'ae0b1dd0-7356-41ee-8df4-871a52451eb7', 'd280371e-f9f1-42a2-aed5-fcb0fe89c219', '3e1ac59e-223a-469d-bb30-63f6988e4134', '5da15840-6f46-4978-8747-8be580d48079', '9eb760da-dade-475d-9390-0c998ff50fcc', 'ad3204b8-1355-48f4-9ee3-194fc5d970f1', '2d5231c0-2d29-4026-9138-5dcbd0591404', '6ab9d1d3-b969-4352-9fac-68b6494cbe7b', '67f5d31d-88f6-4f7a-9b1d-74667ca37f08', '90e00746-69ed-430e-98f9-f637abcbe144', 'b10209f5-96d9-4bbc-8fda-38590460a34a', 'cfb5d792-2d33-444f-a1e8-5487de63e9a0', '51a98931-aace-42c1-8890-66ce53eb1971', 'a6e318c6-cf38-4d65-8bd0-6976b647974b', 'ba8cc746-665a-4925-8f5f-3b213dad238d', '36f607e8-2785-411e-8384-f17195f09468', 'cbbc9743-5cdf-4df2-92ca-fcde6a536160', 'a003a89c-eecb-470b-938d-1b97225f3d81', '791d592b-d6c0-4576-99d0-d78dac8496a1', 'f6a0b45a-042e-4886-82c6-b3b9e086cf83', '57ecd933-4bb8-4289-9080-a8259a6d683c', '6d2b08ec-2f67-4a5b-ac89-642f15e032b8', '749b0b00-ddf8-4cef-bc5b-5694cccbad0f', 'a7edd72c-4010-448e-8624-c92f997fb2a2', '730a2f48-0cb6-49b8-a5bb-807534ef0f24', '9d5266c8-a7b3-4d27-ab96-3a524b4cc491', 'f7627ef2-7d1d-4974-b934-2fdb29c0ff4f', '69af8b6c-06b3-4b66-91c5-ede90be026d8', '96292303-3e6e-4758-a851-f049d7956345', '61a6872e-3c89-4c93-b9b2-ca6ca1db7a8e', '2fdd9e5c-840a-40f1-a575-8ac6ec6094eb', 'd4357c57-76d9-4fda-94b6-16ca8869cd45', 'b1085d28-d17c-42c1-8448-fc56f5bf8ff8', 'ebf487b7-36a0-4ba9-84f6-c2f5f004e06a', '5b7ba09b-9a42-454a-84a6-bc1ba7449379', '84d17e89-55e7-4a08-8468-f52780b926e4', '2e7f2d0a-308e-43d4-9bfb-227a7571550f', 'e364db55-c32d-4e9d-957b-84bfbf62dfea', 'b7f67654-891f-4152-acc8-def976e30826', 'c94db5a7-e5f7-4d1f-9ff9-7ba8e683b57d', 'f2466b1a-782f-4cac-a204-cacb50b8f322', '6417d256-4fdb-4a60-903b-8f42b0405653', '4a1b53fe-79e5-464c-83d9-14143306c02c', 'eecb0179-615f-42cd-9fc0-8f7570f4de63', '0bcbfcff-0e3f-47bc-a32e-a52add577d5b', '9d4642de-8626-4889-8315-9e7ec4f1cd1f', '71e54292-85b7-4cff-a1f0-e06e92f64b5e', '672921ed-9c68-45a5-b808-3df8261d8ff0', 'de8b20b6-a2e3-460a-9018-9d89c009cef8', '40e72184-ff85-456b-a794-35967bc8cfb2', '4d893957-8038-409e-8c96-2e5edd626d45', 'b70e5e27-f14b-4a83-84d8-bb711b70886f', 'd122f267-a255-4eed-afa1-8252e2cbdc17', '1d76bc02-2621-4041-914a-9b9e28babeff', 'd308de64-b0a3-4a50-96b1-4aa813794a1d', '49841d97-6ede-48af-841f-8548d2930e18', 'ee9db5f9-de43-4c3e-a156-d5ba8ba1f6b3', '931810bc-cd6d-4bb6-a77a-7c979fbac3fd', 'c5734165-3e62-4f14-9559-1e5caec18179', 'a0dd1e6d-ccf8-4f09-bbd1-e4fcbdf438ac', 'c1ba934d-cf6c-462d-af28-9cc92ab24937', '4c388b98-8aca-4c28-bd43-74ab84d438f2', '01047fb0-4036-4dcd-81e0-62e6cef4e3e8', '1c27a9d5-3412-41f7-bde2-87eba4abcb5e', '9161f466-8e7e-493d-bd61-d647755dbc53', '46db02d5-a040-4ca5-9510-ef0591feca22', '3a4b2c17-4235-4982-b275-4018c5283ab4', 'ed3f6f88-a4ad-4b56-9d31-8d086ddd1521', 'dae8c52c-b0e0-4688-9078-6e5dcef9de03', '87842763-d404-418e-a265-fea0ef2cac50', '413e5c40-5f12-47c2-9142-4ee27ff61372', 'd54100c0-7990-4638-b57d-79df580f192f', '5b20ceac-5463-4255-b55d-c1a639cc3cc7', '239f0c59-4f28-43ab-9fde-ded6ad8e6cfe', '837c4cd1-4324-44eb-8af3-5bcf2622d8be', '1ec09209-7e22-4698-981d-9cbb150c5746', 'd1c4b24d-2136-4e8d-8497-8ff1fced234d', 'f67a5513-7c63-4243-afcf-84684202f901', '7dca5f3b-e88c-4c32-858f-f21f6fb1ff85', 'a21318a9-a835-4e15-87b2-da893031b6ce', '0486ad3e-5eb2-4589-8383-24bfcec56b47', 'faa0d1f5-388e-4bc4-b60f-baeab55a9930', '4762f4c5-411c-4c86-8af4-324355546ab9', '6a76f092-a95b-40ed-9bbc-c5f5dffb6495', '1038bfd5-c637-493f-95c7-f294b8a84fd6', '81d688b5-d3c2-4c83-8572-b82423525f5b', '05ac5752-da26-4347-a3d2-8b13b008fd70', 'd61b4d0a-4897-4403-afd6-d84fc909caa6', '5c1b23af-e735-408d-94c3-c0146b19b180', '2b8503a8-9ed9-4481-9dd3-23beeee297e9', 'e560745b-6da7-4b5f-b147-7b4f1fea3cd8', '2040857e-d4ad-417d-92ab-52e3c95e5cba', '6584c17c-b804-4f94-ad34-cd8e63f76162', '79cff8b6-cf98-4dcf-8b54-383fa2c43689', 'ea604298-bb16-4d84-b76a-ce6d7c8c3262', 'd4805da1-cb4a-420a-95a4-daee6f7f0fd9', '5278b602-0ea2-4716-af69-9b1560a56ef6', 'b8581687-2b4e-4f2f-91a9-9128f5bec540', '94eaf60d-67fa-4081-a37e-048dfa107f36', '787fa83d-b044-4c30-a7dd-50551c9badb1', '4c3c10e6-6371-463a-af5f-3bb5212aa05e', 'a0587784-a863-47c3-84ae-0dfda49a4be0', '80a261e0-4d0f-4e91-8a70-0c761fbe40ca'], 'provider_types': ['Doctor'], 'locations': [{'uuid': '0decdff6-1703-4d4a-9821-caad045c2f70', 'name': None, 'address': '3941 J St # 260, Sacramento, CA 95819, US', 'address_details': {'street': '3941 J St # 260', 'address_line_1': '3941 J St', 'address_line_2': '# 260', 'city': 'Sacramento', 'state': 'CA', 'zip': '95819'}, 'latitude': 38.5706552, 'longitude': -121.4535451, 'google_maps_link': 'https://www.google.com/maps/@38.5706552,-121.4535451?q=3941%20J%20St%20%23%20260%2C%20Sacramento%2C%20CA%2095819%2C%20US', 'phone_numbers': [{'phone': '9167362323', 'details': 'primary'}, {'phone': '9169845318', 'details': 'secondary'}], 'confidence': 4}, {'uuid': '7974eec5-2760-4642-9ce8-b2401f33c61f', 'name': None, 'address': '1580 Creekside Dr # 250, Folsom, CA 95630, US', 'address_details': {'street': '1580 Creekside Dr # 250', 'address_line_1': '1580 Creekside Dr', 'address_line_2': '# 250', 'city': 'Folsom', 'state': 'CA', 'zip': '95630'}, 'latitude': 38.6687282, 'longitude': -121.1479608, 'google_maps_link': 'https://www.google.com/maps/@38.6687282,-121.1479608?q=1580%20Creekside%20Dr%20%23%20250%2C%20Folsom%2C%20CA%2095630%2C%20US', 'phone_numbers': [{'phone': '9169845318', 'details': 'primary'}], 'confidence': 1}, {'uuid': '3840e421-0e5b-498b-b937-2eb75ef3e09b', 'name': None, 'address': '8120 Timberlake Way # 108, Sacramento, CA 95823, US', 'address_details': {'street': '8120 Timberlake Way # 108', 'address_line_1': '8120 Timberlake Way', 'address_line_2': '# 108', 'city': 'Sacramento', 'state': 'CA', 'zip': '95823'}, 'latitude': 38.4620284, 'longitude': -121.4144321, 'google_maps_link': 'https://www.google.com/maps/@38.4620284,-121.4144321?q=8120%20Timberlake%20Way%20%23%20108%2C%20Sacramento%2C%20CA%2095823%2C%20US', 'phone_numbers': [{'phone': '9166885551', 'details': 'primary'}], 'confidence': 1}, {'uuid': '48a62df8-f2f4-4320-905a-c67600804be6', 'name': 'Mercy Hospital of Folsom', 'address': '1650 Creekside Dr, Folsom, CA 95630, US', 'address_details': {'street': '1650 Creekside Dr', 'address_line_1': '1650 Creekside Dr', 'address_line_2': None, 'city': 'Folsom', 'state': 'CA', 'zip': '95630'}, 'latitude': 38.670333, 'longitude': -121.1458461, 'google_maps_link': 'https://www.google.com/maps/@38.670333,-121.1458461?q=1650%20Creekside%20Dr%2C%20Folsom%2C%20CA%2095630%2C%20US', 'phone_numbers': [{'phone': '9169837400', 'details': 'primary'}], 'confidence': 1}, {'uuid': 'f41bc9cd-e982-4022-9427-3a6bad872a2d', 'name': None, 'address': '7500 Hospital Dr, Sacramento, CA 95823, US', 'address_details': {'street': '7500 Hospital Dr', 'address_line_1': '7500 Hospital Dr', 'address_line_2': None, 'city': 'Sacramento', 'state': 'CA', 'zip': '95823'}, 'latitude': 38.4638056, 'longitude': -121.4164937, 'google_maps_link': 'https://www.google.com/maps/@38.4638056,-121.4164937?q=7500%20Hospital%20Dr%2C%20Sacramento%2C%20CA%2095823%2C%20US', 'phone_numbers': [{'phone': '8006774491', 'details': 'primary'}], 'confidence': 1}], 'online_profiles': [{'url': 'https://www.healthgrades.com/physician/dr-rohit-bhaskar-2hspx'}, {'url': 'https://www.vitals.com/doctors/Dr_Rohit_Bhaskar.html'}]}

We will now see how to work with PostgreSQL JSON data type. In addition, we will learn some of the most common PostgreSQL JSON operators and functions for handling JSON data.

To query JSON data, you use the SELECT statement, which is similar to querying other native data types. The output shown above is the result set returned by Postgres in the form of JSON.

Operators -> and ->>

Postgres provides two native operators -> and ->> to help query JSON data.

  • -> operator returns JSON object field by key.
  • ->> operator returns JSON object field by text.

The following query uses the operator -> to get the education and age of providers in the form of JSON.

%%sql
SELECT
    record -> 'educations' as education,
    record -> 'age' as age
FROM provider_json
LIMIT 2;
 * postgresql://read_user:***@server/dbname
2 rows affected.
education age
[] 83
[{'education': {'name': 'Cornell University', 'uuid': 'c4588ea7-567d-43dc-a0e5-67937fb77f5c'}, 'type': 'Undergraduate School', 'year': 1982}, {'education': {'name': 'Baylor College of Medicine', 'uuid': 'b2ed4e48-6cb0-42d2-8804-7557b6452e8d'}, 'type': 'Residency', 'year': None}, {'education': {'name': 'Icahn School of Medicine at Mount Sinai', 'uuid': '16e70085-6b81-4581-b903-dce9d9acf74a'}, 'type': 'Medical School', 'year': None}, {'education': {'name': 'Baylor College of Medicine', 'uuid': 'b2ed4e48-6cb0-42d2-8804-7557b6452e8d'}, 'type': 'Internship', 'year': None}] 57

Notice, that we are getting the data in the form of JSON field.

Next, the following operator ->> gets the education and age of the providers in the form of Text.

%%sql
SELECT
    record ->> 'educations' as education,
    record ->> 'age' as age
FROM provider_json
LIMIT 2;
 * postgresql://read_user:***@server/dbname
2 rows affected.
education age
[] 83
[{"education": {"name": "Cornell University", "uuid": "c4588ea7-567d-43dc-a0e5-67937fb77f5c"}, "type": "Undergraduate School", "year": 1982}, {"education": {"name": "Baylor College of Medicine", "uuid": "b2ed4e48-6cb0-42d2-8804-7557b6452e8d"}, "type": "Residency", "year": null}, {"education": {"name": "Icahn School of Medicine at Mount Sinai", "uuid": "16e70085-6b81-4581-b903-dce9d9acf74a"}, "type": "Medical School", "year": null}, {"education": {"name": "Baylor College of Medicine", "uuid": "b2ed4e48-6cb0-42d2-8804-7557b6452e8d"}, "type": "Internship", "year": null}] 57

The output at first glance looks the same, but notice the year has null value instead of None, this is because, it is returned as Text and not JSON object.

Chaining operators -> and ->>

Because the -> operator returns a JSON object, we can chain it with the operator ->> to retrieve nested elements. For example, the following statement returns all educations.

%%sql
SELECT
    record -> 'educations' ->> 'education' as colleges
FROM provider_json
LIMIT 2;
 * postgresql://read_user:***@server/dbname
2 rows affected.
colleges
None
None
%%sql
SELECT
    record -> 'educations' as educations
FROM provider_json
LIMIT 2;
 * postgresql://read_user:***@server/dbname
2 rows affected.
educations
[]
[{'education': {'name': 'Cornell University', 'uuid': 'c4588ea7-567d-43dc-a0e5-67937fb77f5c'}, 'type': 'Undergraduate School', 'year': 1982}, {'education': {'name': 'Baylor College of Medicine', 'uuid': 'b2ed4e48-6cb0-42d2-8804-7557b6452e8d'}, 'type': 'Residency', 'year': None}, {'education': {'name': 'Icahn School of Medicine at Mount Sinai', 'uuid': '16e70085-6b81-4581-b903-dce9d9acf74a'}, 'type': 'Medical School', 'year': None}, {'education': {'name': 'Baylor College of Medicine', 'uuid': 'b2ed4e48-6cb0-42d2-8804-7557b6452e8d'}, 'type': 'Internship', 'year': None}]

The reason we got None when we were running the chain command is because we got back a list of educations. We need some other feature to access elements inside a list. We will learn how to deal with this later.

Functions

Use json_array_elements_text() function to unnest the array

The specialties column is a list of specialties for a given provider. In the below output you can see that we have a list of specialties. There another table which describes what each of these specialties mean. Our goal is to find out the most popular specialty of the providers in provider_json table.

%%sql
SELECT
    record -> 'npi' as npi,
    record -> 'specialties' as specialties
FROM provider_json
LIMIT 10;
 * postgresql://read_user:***@server/dbname
10 rows affected.
npi specialties
1669554267 ['c741df78-8460-46f6-a2ce-93de36eb7205']
1467524520 ['dff54cb3-c9b0-43a0-8ca8-26c7c05bee83', '1de33770-eb1c-47fa-ab3e-f9a4ab924d9d', '76126f5d-723a-40bc-b4c1-7dcda477797a', '475b20fc-c188-4ca1-8e3a-b0614b8f8ce2']
1598949232 ['8257ce19-3245-47b0-bdd5-eb07ca777513', '50b212c8-1888-492f-95db-9620ce29fd4e']
1538557251 ['dc211bc8-16f5-4f55-bb90-df37d1f1a620']
1841451556 ['8257ce19-3245-47b0-bdd5-eb07ca777513', '79d8040f-a85f-417f-8926-d00093b2860c']
1700847084 ['c7fdb5ef-516c-46fa-986c-9ba13d3b072c', 'c72c0b39-6ded-46ca-ac15-e663ea7f2dec']
1295776458 ['a77d23ba-29f1-4afd-a0c4-62d2f0444cf7']
1528440773 ['9b9a5c64-3d99-4f23-b12c-a61c33ca9dc0']
1275517336 ['475b20fc-c188-4ca1-8e3a-b0614b8f8ce2', '69562903-0290-41fd-9af5-049916ceee13']
1871502948 ['d8f7fff0-9c5b-42e2-b011-376315d4be3d', '9c470211-e3ea-4963-9568-8d8e9b8f5c44']
%%sql
SELECT
    record -> 'npi' as npi,
    json_array_elements_text(record -> 'specialties') as specialties
FROM provider_json
LIMIT 10;
 * postgresql://read_user:***@server/dbname
10 rows affected.
npi specialties
1669554267 c741df78-8460-46f6-a2ce-93de36eb7205
1467524520 dff54cb3-c9b0-43a0-8ca8-26c7c05bee83
1467524520 1de33770-eb1c-47fa-ab3e-f9a4ab924d9d
1467524520 76126f5d-723a-40bc-b4c1-7dcda477797a
1467524520 475b20fc-c188-4ca1-8e3a-b0614b8f8ce2
1598949232 8257ce19-3245-47b0-bdd5-eb07ca777513
1598949232 50b212c8-1888-492f-95db-9620ce29fd4e
1538557251 dc211bc8-16f5-4f55-bb90-df37d1f1a620
1841451556 8257ce19-3245-47b0-bdd5-eb07ca777513
1841451556 79d8040f-a85f-417f-8926-d00093b2860c

NOTE: json_array_elements_text() creates a temporary table

This unnests each of these array into its own row. But we are not done yet, we need to group these values and count them, so that we can find out the most popular specialty. What we are essentially doing here is that we are creating a table on the fly using the json_array_elements_text() function.

Next, we need to get the specialty name by joining this temp table with specialities table. Shown below are the column types for this table.

%%sql
SELECT * FROM specialties LIMIT 5;
 * postgresql://read_user:***@server/dbname
5 rows affected.
uuid display
2a6ea574-686a-4c60-b215-508c7c192203 Foster Care Agency
fa710988-0955-422d-9ffd-0924c2b9e16c Home Health Aide Agency
ad840de3-8ec1-45ad-a3d5-858669a0f3b6 Multi-Specialty
875a24ea-9748-439a-8dcb-4aea82c18883 In Home Supportive Care Specialist
9b9a5c64-3d99-4f23-b12c-a61c33ca9dc0 Student
%%sql
SELECT column_name, data_type
FROM information_schema.columns
WHERE table_name = 'specialties' AND table_schema = 'public';
 * postgresql://read_user:***@server/dbname
2 rows affected.
column_name data_type
uuid uuid
display text

Cast to UUID type before joining: In order to calculate the most popular specialty, we first create the temp table using json_array_elements_text() and alias the column as specialty and the temp table as s. We then JOIN this temp table with specialties table aliased as sp.

NOTE: The specialties column in our temp table contains a text representation of the UUID, so in order for us to join, we need to cast this into a UUID type. Finally, we group by the display name and show only the most popular specialty.

%%sql
SELECT
    sp.display as most_popular_specialty
FROM (SELECT
        json_array_elements_text(record -> 'specialties') as specialty
    FROM provider_json) as s
JOIN specialties as sp
ON s.specialty::uuid = sp.uuid
GROUP BY 1
ORDER BY COUNT(*) DESC
LIMIT 1
 * postgresql://read_user:***@server/dbname
1 rows affected.
most_popular_specialty
Internal Medicine

json_array_elements() to expand the array

As I have alluded to above, working with arrays comes up time and again when dealing with JSON objects. A lot of many-to-one relations are captured this way. Now we will dive into looking at locations array which contains location details for each provider. As you can see, locations is an array which contains location json objects.

Now let’s take a look at the locations field of providers_json table.

%%sql
SELECT
    record -> 'locations' as locations
FROM provider_json
LIMIT 2;
 * postgresql://read_user:***@server/dbname
2 rows affected.
locations
[{'uuid': '8d96e755-65d3-470f-baaf-c20c73d673bb', 'name': None, 'address': '2296 Opitz Blvd # 255, Woodbridge, VA 22191, US', 'address_details': {'street': '2296 Opitz Blvd # 255', 'address_line_1': '2296 Opitz Blvd', 'address_line_2': '# 255', 'city': 'Woodbridge', 'state': 'VA', 'zip': '22191'}, 'latitude': 38.63739, 'longitude': -77.28514129999999, 'google_maps_link': 'https://www.google.com/maps/@38.63739,-77.28514129999999?q=2296%20Opitz%20Blvd%20%23%20255%2C%20Woodbridge%2C%20VA%2022191%2C%20US', 'phone_numbers': [{'phone': '7036704986', 'details': 'primary'}], 'confidence': 2}]
[{'uuid': '0decdff6-1703-4d4a-9821-caad045c2f70', 'name': None, 'address': '3941 J St # 260, Sacramento, CA 95819, US', 'address_details': {'street': '3941 J St # 260', 'address_line_1': '3941 J St', 'address_line_2': '# 260', 'city': 'Sacramento', 'state': 'CA', 'zip': '95819'}, 'latitude': 38.5706552, 'longitude': -121.4535451, 'google_maps_link': 'https://www.google.com/maps/@38.5706552,-121.4535451?q=3941%20J%20St%20%23%20260%2C%20Sacramento%2C%20CA%2095819%2C%20US', 'phone_numbers': [{'phone': '9167362323', 'details': 'primary'}, {'phone': '9169845318', 'details': 'secondary'}], 'confidence': 4}, {'uuid': '7974eec5-2760-4642-9ce8-b2401f33c61f', 'name': None, 'address': '1580 Creekside Dr # 250, Folsom, CA 95630, US', 'address_details': {'street': '1580 Creekside Dr # 250', 'address_line_1': '1580 Creekside Dr', 'address_line_2': '# 250', 'city': 'Folsom', 'state': 'CA', 'zip': '95630'}, 'latitude': 38.6687282, 'longitude': -121.1479608, 'google_maps_link': 'https://www.google.com/maps/@38.6687282,-121.1479608?q=1580%20Creekside%20Dr%20%23%20250%2C%20Folsom%2C%20CA%2095630%2C%20US', 'phone_numbers': [{'phone': '9169845318', 'details': 'primary'}], 'confidence': 1}, {'uuid': '3840e421-0e5b-498b-b937-2eb75ef3e09b', 'name': None, 'address': '8120 Timberlake Way # 108, Sacramento, CA 95823, US', 'address_details': {'street': '8120 Timberlake Way # 108', 'address_line_1': '8120 Timberlake Way', 'address_line_2': '# 108', 'city': 'Sacramento', 'state': 'CA', 'zip': '95823'}, 'latitude': 38.4620284, 'longitude': -121.4144321, 'google_maps_link': 'https://www.google.com/maps/@38.4620284,-121.4144321?q=8120%20Timberlake%20Way%20%23%20108%2C%20Sacramento%2C%20CA%2095823%2C%20US', 'phone_numbers': [{'phone': '9166885551', 'details': 'primary'}], 'confidence': 1}, {'uuid': '48a62df8-f2f4-4320-905a-c67600804be6', 'name': 'Mercy Hospital of Folsom', 'address': '1650 Creekside Dr, Folsom, CA 95630, US', 'address_details': {'street': '1650 Creekside Dr', 'address_line_1': '1650 Creekside Dr', 'address_line_2': None, 'city': 'Folsom', 'state': 'CA', 'zip': '95630'}, 'latitude': 38.670333, 'longitude': -121.1458461, 'google_maps_link': 'https://www.google.com/maps/@38.670333,-121.1458461?q=1650%20Creekside%20Dr%2C%20Folsom%2C%20CA%2095630%2C%20US', 'phone_numbers': [{'phone': '9169837400', 'details': 'primary'}], 'confidence': 1}, {'uuid': 'f41bc9cd-e982-4022-9427-3a6bad872a2d', 'name': None, 'address': '7500 Hospital Dr, Sacramento, CA 95823, US', 'address_details': {'street': '7500 Hospital Dr', 'address_line_1': '7500 Hospital Dr', 'address_line_2': None, 'city': 'Sacramento', 'state': 'CA', 'zip': '95823'}, 'latitude': 38.4638056, 'longitude': -121.4164937, 'google_maps_link': 'https://www.google.com/maps/@38.4638056,-121.4164937?q=7500%20Hospital%20Dr%2C%20Sacramento%2C%20CA%2095823%2C%20US', 'phone_numbers': [{'phone': '8006774491', 'details': 'primary'}], 'confidence': 1}]
%%sql
SELECT
    record->'locations' as locations
FROM provider_json
LIMIT 2;
 * postgresql://read_user:***@server/dbname
2 rows affected.
locations
[{'uuid': '8d96e755-65d3-470f-baaf-c20c73d673bb', 'name': None, 'address': '2296 Opitz Blvd # 255, Woodbridge, VA 22191, US', 'address_details': {'street': '2296 Opitz Blvd # 255', 'address_line_1': '2296 Opitz Blvd', 'address_line_2': '# 255', 'city': 'Woodbridge', 'state': 'VA', 'zip': '22191'}, 'latitude': 38.63739, 'longitude': -77.28514129999999, 'google_maps_link': 'https://www.google.com/maps/@38.63739,-77.28514129999999?q=2296%20Opitz%20Blvd%20%23%20255%2C%20Woodbridge%2C%20VA%2022191%2C%20US', 'phone_numbers': [{'phone': '7036704986', 'details': 'primary'}], 'confidence': 2}]
[{'uuid': '0decdff6-1703-4d4a-9821-caad045c2f70', 'name': None, 'address': '3941 J St # 260, Sacramento, CA 95819, US', 'address_details': {'street': '3941 J St # 260', 'address_line_1': '3941 J St', 'address_line_2': '# 260', 'city': 'Sacramento', 'state': 'CA', 'zip': '95819'}, 'latitude': 38.5706552, 'longitude': -121.4535451, 'google_maps_link': 'https://www.google.com/maps/@38.5706552,-121.4535451?q=3941%20J%20St%20%23%20260%2C%20Sacramento%2C%20CA%2095819%2C%20US', 'phone_numbers': [{'phone': '9167362323', 'details': 'primary'}, {'phone': '9169845318', 'details': 'secondary'}], 'confidence': 4}, {'uuid': '7974eec5-2760-4642-9ce8-b2401f33c61f', 'name': None, 'address': '1580 Creekside Dr # 250, Folsom, CA 95630, US', 'address_details': {'street': '1580 Creekside Dr # 250', 'address_line_1': '1580 Creekside Dr', 'address_line_2': '# 250', 'city': 'Folsom', 'state': 'CA', 'zip': '95630'}, 'latitude': 38.6687282, 'longitude': -121.1479608, 'google_maps_link': 'https://www.google.com/maps/@38.6687282,-121.1479608?q=1580%20Creekside%20Dr%20%23%20250%2C%20Folsom%2C%20CA%2095630%2C%20US', 'phone_numbers': [{'phone': '9169845318', 'details': 'primary'}], 'confidence': 1}, {'uuid': '3840e421-0e5b-498b-b937-2eb75ef3e09b', 'name': None, 'address': '8120 Timberlake Way # 108, Sacramento, CA 95823, US', 'address_details': {'street': '8120 Timberlake Way # 108', 'address_line_1': '8120 Timberlake Way', 'address_line_2': '# 108', 'city': 'Sacramento', 'state': 'CA', 'zip': '95823'}, 'latitude': 38.4620284, 'longitude': -121.4144321, 'google_maps_link': 'https://www.google.com/maps/@38.4620284,-121.4144321?q=8120%20Timberlake%20Way%20%23%20108%2C%20Sacramento%2C%20CA%2095823%2C%20US', 'phone_numbers': [{'phone': '9166885551', 'details': 'primary'}], 'confidence': 1}, {'uuid': '48a62df8-f2f4-4320-905a-c67600804be6', 'name': 'Mercy Hospital of Folsom', 'address': '1650 Creekside Dr, Folsom, CA 95630, US', 'address_details': {'street': '1650 Creekside Dr', 'address_line_1': '1650 Creekside Dr', 'address_line_2': None, 'city': 'Folsom', 'state': 'CA', 'zip': '95630'}, 'latitude': 38.670333, 'longitude': -121.1458461, 'google_maps_link': 'https://www.google.com/maps/@38.670333,-121.1458461?q=1650%20Creekside%20Dr%2C%20Folsom%2C%20CA%2095630%2C%20US', 'phone_numbers': [{'phone': '9169837400', 'details': 'primary'}], 'confidence': 1}, {'uuid': 'f41bc9cd-e982-4022-9427-3a6bad872a2d', 'name': None, 'address': '7500 Hospital Dr, Sacramento, CA 95823, US', 'address_details': {'street': '7500 Hospital Dr', 'address_line_1': '7500 Hospital Dr', 'address_line_2': None, 'city': 'Sacramento', 'state': 'CA', 'zip': '95823'}, 'latitude': 38.4638056, 'longitude': -121.4164937, 'google_maps_link': 'https://www.google.com/maps/@38.4638056,-121.4164937?q=7500%20Hospital%20Dr%2C%20Sacramento%2C%20CA%2095823%2C%20US', 'phone_numbers': [{'phone': '8006774491', 'details': 'primary'}], 'confidence': 1}]

Each of these location objects have the following fields:

%%sql
SELECT
    json_object_keys(record->'locations'->-1)
FROM provider_json
LIMIT 10;
 * postgresql://read_user:***@server/dbname
10 rows affected.
json_object_keys
uuid
name
address
address_details
latitude
longitude
google_maps_link
phone_numbers
confidence
uuid

Similar to how we expanded the specialties array, we will be expanding this locations array into their own records using json_array_elements() function. Now, for each provider, we have one row for each of the locations where that provider works.

Let’s say our objective is to keep the providers who have atleast 2 locations with confidence >= 4. More specifically: How many providers have at least 2 locations with confidence of 4 or higher?

Using json_array_elements() on the locations field, we first build out our intermediate table. As shown here are the first 5 rows. Observe that we have locations with confidence scores ranging from 0-5.

%%sql
SELECT
    record->'npi' as provider,
    json_array_elements(record->'locations') as location
FROM provider_json
LIMIT 5;
 * postgresql://read_user:***@server/dbname
5 rows affected.
provider location
1669554267 {'uuid': '8d96e755-65d3-470f-baaf-c20c73d673bb', 'name': None, 'address': '2296 Opitz Blvd # 255, Woodbridge, VA 22191, US', 'address_details': {'street': '2296 Opitz Blvd # 255', 'address_line_1': '2296 Opitz Blvd', 'address_line_2': '# 255', 'city': 'Woodbridge', 'state': 'VA', 'zip': '22191'}, 'latitude': 38.63739, 'longitude': -77.28514129999999, 'google_maps_link': 'https://www.google.com/maps/@38.63739,-77.28514129999999?q=2296%20Opitz%20Blvd%20%23%20255%2C%20Woodbridge%2C%20VA%2022191%2C%20US', 'phone_numbers': [{'phone': '7036704986', 'details': 'primary'}], 'confidence': 2}
1467524520 {'uuid': '0decdff6-1703-4d4a-9821-caad045c2f70', 'name': None, 'address': '3941 J St # 260, Sacramento, CA 95819, US', 'address_details': {'street': '3941 J St # 260', 'address_line_1': '3941 J St', 'address_line_2': '# 260', 'city': 'Sacramento', 'state': 'CA', 'zip': '95819'}, 'latitude': 38.5706552, 'longitude': -121.4535451, 'google_maps_link': 'https://www.google.com/maps/@38.5706552,-121.4535451?q=3941%20J%20St%20%23%20260%2C%20Sacramento%2C%20CA%2095819%2C%20US', 'phone_numbers': [{'phone': '9167362323', 'details': 'primary'}, {'phone': '9169845318', 'details': 'secondary'}], 'confidence': 4}
1467524520 {'uuid': '7974eec5-2760-4642-9ce8-b2401f33c61f', 'name': None, 'address': '1580 Creekside Dr # 250, Folsom, CA 95630, US', 'address_details': {'street': '1580 Creekside Dr # 250', 'address_line_1': '1580 Creekside Dr', 'address_line_2': '# 250', 'city': 'Folsom', 'state': 'CA', 'zip': '95630'}, 'latitude': 38.6687282, 'longitude': -121.1479608, 'google_maps_link': 'https://www.google.com/maps/@38.6687282,-121.1479608?q=1580%20Creekside%20Dr%20%23%20250%2C%20Folsom%2C%20CA%2095630%2C%20US', 'phone_numbers': [{'phone': '9169845318', 'details': 'primary'}], 'confidence': 1}
1467524520 {'uuid': '3840e421-0e5b-498b-b937-2eb75ef3e09b', 'name': None, 'address': '8120 Timberlake Way # 108, Sacramento, CA 95823, US', 'address_details': {'street': '8120 Timberlake Way # 108', 'address_line_1': '8120 Timberlake Way', 'address_line_2': '# 108', 'city': 'Sacramento', 'state': 'CA', 'zip': '95823'}, 'latitude': 38.4620284, 'longitude': -121.4144321, 'google_maps_link': 'https://www.google.com/maps/@38.4620284,-121.4144321?q=8120%20Timberlake%20Way%20%23%20108%2C%20Sacramento%2C%20CA%2095823%2C%20US', 'phone_numbers': [{'phone': '9166885551', 'details': 'primary'}], 'confidence': 1}
1467524520 {'uuid': '48a62df8-f2f4-4320-905a-c67600804be6', 'name': 'Mercy Hospital of Folsom', 'address': '1650 Creekside Dr, Folsom, CA 95630, US', 'address_details': {'street': '1650 Creekside Dr', 'address_line_1': '1650 Creekside Dr', 'address_line_2': None, 'city': 'Folsom', 'state': 'CA', 'zip': '95630'}, 'latitude': 38.670333, 'longitude': -121.1458461, 'google_maps_link': 'https://www.google.com/maps/@38.670333,-121.1458461?q=1650%20Creekside%20Dr%2C%20Folsom%2C%20CA%2095630%2C%20US', 'phone_numbers': [{'phone': '9169837400', 'details': 'primary'}], 'confidence': 1}

Now, our next task is to peek into each of these location objects and filter this intermediate table for rows containing confidence >= 4.

%%sql
SELECT
    l.provider,
    l.location
FROM (SELECT
    record->'npi' as provider,
    json_array_elements(record->'locations') as location
    FROM provider_json) as l
WHERE (l.location->>'confidence')::int >= 4
LIMIT 5;
 * postgresql://read_user:***@server/dbname
5 rows affected.
provider location
1467524520 {'uuid': '0decdff6-1703-4d4a-9821-caad045c2f70', 'name': None, 'address': '3941 J St # 260, Sacramento, CA 95819, US', 'address_details': {'street': '3941 J St # 260', 'address_line_1': '3941 J St', 'address_line_2': '# 260', 'city': 'Sacramento', 'state': 'CA', 'zip': '95819'}, 'latitude': 38.5706552, 'longitude': -121.4535451, 'google_maps_link': 'https://www.google.com/maps/@38.5706552,-121.4535451?q=3941%20J%20St%20%23%20260%2C%20Sacramento%2C%20CA%2095819%2C%20US', 'phone_numbers': [{'phone': '9167362323', 'details': 'primary'}, {'phone': '9169845318', 'details': 'secondary'}], 'confidence': 4}
1598949232 {'uuid': 'b4ffbd01-7c97-4d05-8b9f-33c883beb9cc', 'name': None, 'address': '1932 Alcoa Hwy # 270, Knoxville, TN 37920, US', 'address_details': {'street': '1932 Alcoa Hwy # 270', 'address_line_1': '1932 Alcoa Hwy', 'address_line_2': '# 270', 'city': 'Knoxville', 'state': 'TN', 'zip': '37920'}, 'latitude': 35.9391103, 'longitude': -83.94478629999999, 'google_maps_link': 'https://www.google.com/maps/@35.9391103,-83.94478629999999?q=1932%20Alcoa%20Hwy%20%23%20270%2C%20Knoxville%2C%20TN%2037920%2C%20US', 'phone_numbers': [{'phone': '8652514658', 'details': 'primary'}, {'phone': '8653056058', 'details': 'secondary'}, {'phone': '8656706199', 'details': 'secondary'}], 'confidence': 4}
1598949232 {'uuid': '0b922f49-2e0c-4964-9066-d371fc91d0ee', 'name': 'University of Tennessee Medical Center', 'address': '1924 Alcoa Hwy, Knoxville, TN 37920, US', 'address_details': {'street': '1924 Alcoa Hwy', 'address_line_1': '1924 Alcoa Hwy', 'address_line_2': None, 'city': 'Knoxville', 'state': 'TN', 'zip': '37920'}, 'latitude': 35.9398588, 'longitude': -83.9430702, 'google_maps_link': 'https://www.google.com/maps/@35.9398588,-83.9430702?q=1924%20Alcoa%20Hwy%2C%20Knoxville%2C%20TN%2037920%2C%20US', 'phone_numbers': [{'phone': '8652514658', 'details': 'primary'}, {'phone': '8653056058', 'details': 'secondary'}], 'confidence': 4}
1598949232 {'uuid': '08626d97-25f7-40b8-9d84-5561abf8bcf7', 'name': None, 'address': '5779 Creekwood Park Blvd, Lenoir City, TN 37772, US', 'address_details': {'street': '5779 Creekwood Park Blvd', 'address_line_1': '5779 Creekwood Park Blvd', 'address_line_2': None, 'city': 'Lenoir City', 'state': 'TN', 'zip': '37772'}, 'latitude': 35.837749, 'longitude': -84.271581, 'google_maps_link': 'https://www.google.com/maps/@35.837749,-84.271581?q=5779%20Creekwood%20Park%20Blvd%2C%20Lenoir%20City%2C%20TN%2037772%2C%20US', 'phone_numbers': [{'phone': '8652514658', 'details': 'primary'}, {'phone': '8652514659', 'details': 'secondary'}], 'confidence': 4}
1700847084 {'uuid': '8407a81c-d85d-4994-8c4e-3807869a0737', 'name': "North Hawaii Community Hospital, an Affiliate of The Queen's Health Systems", 'address': '67-1125 Mamalahoa Hwy # 110, Waimea, HI 96743, US', 'address_details': {'street': '67-1125 Mamalahoa Hwy # 110', 'address_line_1': '67-1125 Mamalahoa Hwy', 'address_line_2': '# 110', 'city': 'Waimea', 'state': 'HI', 'zip': '96743'}, 'latitude': 20.02287, 'longitude': -155.6650027, 'google_maps_link': 'https://www.google.com/maps/@20.02287,-155.6650027?q=67-1125%20Mamalahoa%20Hwy%20%23%20110%2C%20Waimea%2C%20HI%2096743%2C%20US', 'phone_numbers': [{'phone': '8085504939', 'details': 'primary'}], 'confidence': 4}

The next constraint or requirement is that we want to keep only providers that have atleast 2 locations with high confidence. This can be achieved using a GROUP BY provider and then keeping only groups that have atleast 2 locations with high confidence by using the HAVING clause.

%%sql
SELECT
    provider::text
FROM (SELECT
    record->'npi' as provider,
    json_array_elements(record->'locations') as location
    FROM provider_json) as l
WHERE (location->>'confidence')::int>=4
GROUP BY provider::text
HAVING COUNT(location)>=2
LIMIT 5;
 * postgresql://read_user:***@server/dbname
5 rows affected.
provider
"1194791335"
"1508295841"
"1407907256"
"1669436309"
"1821139510"

Finally, we will count the number of providers that meet this criteria.

%%sql
SELECT
    COUNT(DISTINCT provider) as num_providers
FROM (SELECT
            provider::text
        FROM (SELECT
                record->'npi' as provider,
                json_array_elements(record->'locations') as location
              FROM provider_json) as l
      WHERE (location->>'confidence')::int>=4
      GROUP BY provider::text
      HAVING COUNT(location)>=2) as num_p
 * postgresql://read_user:***@server/dbname
1 rows affected.
num_providers
2054

Exploring arrays further

In all of these provider records, we see that there is an insurances field. Some providers have a lot of insurances that they accept. This is the unique list of all insurance plans a provider accepts (we represent this as UUID which connects to the insurances table).

Find the total number of unique insurance plans accepted by all providers at the most popular location of this data set. (Popular = the most providers practice there)

Answer: My first thought is to calculate the most popular location and then use that to filter rows containing that location. This will leave us with a list of providers practicing at that location and the insurances that they accept.

First, let’s take a peek at the insurances list for each provider.

%%sql
SELECT
    npi,
    record->'insurances' as insurances
FROM provider_json
LIMIT 5;
 * postgresql://read_user:***@server/dbname
5 rows affected.
npi insurances
1669554267 []
1467524520 ['5ad8ff7d-5227-46d7-a392-d28665610257', '3a56d135-dded-48c3-90db-a4f5d9da9786', 'dc059078-9115-488f-bdf6-e9106822225a', 'daaa1ac3-eb9d-424b-a249-8d7c51d39ce1', 'ae0b1dd0-7356-41ee-8df4-871a52451eb7', 'd280371e-f9f1-42a2-aed5-fcb0fe89c219', '3e1ac59e-223a-469d-bb30-63f6988e4134', '5da15840-6f46-4978-8747-8be580d48079', '9eb760da-dade-475d-9390-0c998ff50fcc', 'ad3204b8-1355-48f4-9ee3-194fc5d970f1', '2d5231c0-2d29-4026-9138-5dcbd0591404', '6ab9d1d3-b969-4352-9fac-68b6494cbe7b', '67f5d31d-88f6-4f7a-9b1d-74667ca37f08', '90e00746-69ed-430e-98f9-f637abcbe144', 'b10209f5-96d9-4bbc-8fda-38590460a34a', 'cfb5d792-2d33-444f-a1e8-5487de63e9a0', '51a98931-aace-42c1-8890-66ce53eb1971', 'a6e318c6-cf38-4d65-8bd0-6976b647974b', 'ba8cc746-665a-4925-8f5f-3b213dad238d', '36f607e8-2785-411e-8384-f17195f09468', 'cbbc9743-5cdf-4df2-92ca-fcde6a536160', 'a003a89c-eecb-470b-938d-1b97225f3d81', '791d592b-d6c0-4576-99d0-d78dac8496a1', 'f6a0b45a-042e-4886-82c6-b3b9e086cf83', '57ecd933-4bb8-4289-9080-a8259a6d683c', '6d2b08ec-2f67-4a5b-ac89-642f15e032b8', '749b0b00-ddf8-4cef-bc5b-5694cccbad0f', 'a7edd72c-4010-448e-8624-c92f997fb2a2', '730a2f48-0cb6-49b8-a5bb-807534ef0f24', '9d5266c8-a7b3-4d27-ab96-3a524b4cc491', 'f7627ef2-7d1d-4974-b934-2fdb29c0ff4f', '69af8b6c-06b3-4b66-91c5-ede90be026d8', '96292303-3e6e-4758-a851-f049d7956345', '61a6872e-3c89-4c93-b9b2-ca6ca1db7a8e', '2fdd9e5c-840a-40f1-a575-8ac6ec6094eb', 'd4357c57-76d9-4fda-94b6-16ca8869cd45', 'b1085d28-d17c-42c1-8448-fc56f5bf8ff8', 'ebf487b7-36a0-4ba9-84f6-c2f5f004e06a', '5b7ba09b-9a42-454a-84a6-bc1ba7449379', '84d17e89-55e7-4a08-8468-f52780b926e4', '2e7f2d0a-308e-43d4-9bfb-227a7571550f', 'e364db55-c32d-4e9d-957b-84bfbf62dfea', 'b7f67654-891f-4152-acc8-def976e30826', 'c94db5a7-e5f7-4d1f-9ff9-7ba8e683b57d', 'f2466b1a-782f-4cac-a204-cacb50b8f322', '6417d256-4fdb-4a60-903b-8f42b0405653', '4a1b53fe-79e5-464c-83d9-14143306c02c', 'eecb0179-615f-42cd-9fc0-8f7570f4de63', '0bcbfcff-0e3f-47bc-a32e-a52add577d5b', '9d4642de-8626-4889-8315-9e7ec4f1cd1f', '71e54292-85b7-4cff-a1f0-e06e92f64b5e', '672921ed-9c68-45a5-b808-3df8261d8ff0', 'de8b20b6-a2e3-460a-9018-9d89c009cef8', '40e72184-ff85-456b-a794-35967bc8cfb2', '4d893957-8038-409e-8c96-2e5edd626d45', 'b70e5e27-f14b-4a83-84d8-bb711b70886f', 'd122f267-a255-4eed-afa1-8252e2cbdc17', '1d76bc02-2621-4041-914a-9b9e28babeff', 'd308de64-b0a3-4a50-96b1-4aa813794a1d', '49841d97-6ede-48af-841f-8548d2930e18', 'ee9db5f9-de43-4c3e-a156-d5ba8ba1f6b3', '931810bc-cd6d-4bb6-a77a-7c979fbac3fd', 'c5734165-3e62-4f14-9559-1e5caec18179', 'a0dd1e6d-ccf8-4f09-bbd1-e4fcbdf438ac', 'c1ba934d-cf6c-462d-af28-9cc92ab24937', '4c388b98-8aca-4c28-bd43-74ab84d438f2', '01047fb0-4036-4dcd-81e0-62e6cef4e3e8', '1c27a9d5-3412-41f7-bde2-87eba4abcb5e', '9161f466-8e7e-493d-bd61-d647755dbc53', '46db02d5-a040-4ca5-9510-ef0591feca22', '3a4b2c17-4235-4982-b275-4018c5283ab4', 'ed3f6f88-a4ad-4b56-9d31-8d086ddd1521', 'dae8c52c-b0e0-4688-9078-6e5dcef9de03', '87842763-d404-418e-a265-fea0ef2cac50', '413e5c40-5f12-47c2-9142-4ee27ff61372', 'd54100c0-7990-4638-b57d-79df580f192f', '5b20ceac-5463-4255-b55d-c1a639cc3cc7', '239f0c59-4f28-43ab-9fde-ded6ad8e6cfe', '837c4cd1-4324-44eb-8af3-5bcf2622d8be', '1ec09209-7e22-4698-981d-9cbb150c5746', 'd1c4b24d-2136-4e8d-8497-8ff1fced234d', 'f67a5513-7c63-4243-afcf-84684202f901', '7dca5f3b-e88c-4c32-858f-f21f6fb1ff85', 'a21318a9-a835-4e15-87b2-da893031b6ce', '0486ad3e-5eb2-4589-8383-24bfcec56b47', 'faa0d1f5-388e-4bc4-b60f-baeab55a9930', '4762f4c5-411c-4c86-8af4-324355546ab9', '6a76f092-a95b-40ed-9bbc-c5f5dffb6495', '1038bfd5-c637-493f-95c7-f294b8a84fd6', '81d688b5-d3c2-4c83-8572-b82423525f5b', '05ac5752-da26-4347-a3d2-8b13b008fd70', 'd61b4d0a-4897-4403-afd6-d84fc909caa6', '5c1b23af-e735-408d-94c3-c0146b19b180', '2b8503a8-9ed9-4481-9dd3-23beeee297e9', 'e560745b-6da7-4b5f-b147-7b4f1fea3cd8', '2040857e-d4ad-417d-92ab-52e3c95e5cba', '6584c17c-b804-4f94-ad34-cd8e63f76162', '79cff8b6-cf98-4dcf-8b54-383fa2c43689', 'ea604298-bb16-4d84-b76a-ce6d7c8c3262', 'd4805da1-cb4a-420a-95a4-daee6f7f0fd9', '5278b602-0ea2-4716-af69-9b1560a56ef6', 'b8581687-2b4e-4f2f-91a9-9128f5bec540', '94eaf60d-67fa-4081-a37e-048dfa107f36', '787fa83d-b044-4c30-a7dd-50551c9badb1', '4c3c10e6-6371-463a-af5f-3bb5212aa05e', 'a0587784-a863-47c3-84ae-0dfda49a4be0', '80a261e0-4d0f-4e91-8a70-0c761fbe40ca']
1598949232 ['2f18249b-fa80-48bd-bdf0-c02d13639fc9', '4a3d26e7-f309-4100-baed-60f39e0db3c9', '08533208-cdf3-4649-b0a3-3f05ca9f93be', '5ad8ff7d-5227-46d7-a392-d28665610257', '32fbf66a-07e0-412f-944c-4238c1366303', '0374d936-3b01-451c-bb4e-3d872830d202', 'dc059078-9115-488f-bdf6-e9106822225a', 'daaa1ac3-eb9d-424b-a249-8d7c51d39ce1', 'ae0b1dd0-7356-41ee-8df4-871a52451eb7', 'c656776f-2979-463a-9de3-1e5ae99d8ad2', '5495056f-faff-4144-b5f8-7fef1705f5ff', 'd280371e-f9f1-42a2-aed5-fcb0fe89c219', '3e1ac59e-223a-469d-bb30-63f6988e4134', '5da15840-6f46-4978-8747-8be580d48079', '9eb760da-dade-475d-9390-0c998ff50fcc', 'ad3204b8-1355-48f4-9ee3-194fc5d970f1', '2d5231c0-2d29-4026-9138-5dcbd0591404', '6ab9d1d3-b969-4352-9fac-68b6494cbe7b', 'd6ea2513-25df-41f8-9dbf-c56916e98041', 'b10209f5-96d9-4bbc-8fda-38590460a34a', '1e41cffb-e231-41a5-b088-5f356177d721', 'db401fa1-945b-4f0e-ac38-fdbbe590a798', '8517ef20-844d-4152-8690-1b8dda67e159', 'cfb5d792-2d33-444f-a1e8-5487de63e9a0', 'ba8cc746-665a-4925-8f5f-3b213dad238d', 'a003a89c-eecb-470b-938d-1b97225f3d81', '6fc564dc-38c4-4e83-8d4e-f991d17f3323', '842eb14d-997f-4036-bdd6-8bd6db0b9c9f', '5933a762-9e94-4b8b-bcc6-2d29d34acb7b', '3262782b-54c0-4d9d-b46e-9ea4ddfe11f5', 'ab996d6b-a0d7-4a21-91fb-498f2f446b6c', '5bea2eb5-a841-46b1-8952-0cfd3bdb1708', 'a7edd72c-4010-448e-8624-c92f997fb2a2', '730a2f48-0cb6-49b8-a5bb-807534ef0f24', '12403618-49d5-43ee-99ad-5e99194fe05c', 'f7627ef2-7d1d-4974-b934-2fdb29c0ff4f', '96292303-3e6e-4758-a851-f049d7956345', 'd4357c57-76d9-4fda-94b6-16ca8869cd45', '7dce5e40-f667-4b7c-b411-dc505f8d4143', '3577526f-463a-4af7-8696-4e31408c51e8', '534fe74b-3ce4-4bc1-8fba-ec5836f37d48', 'a3c11024-243e-41fb-94ba-2bba1de12e1c', '2e7f2d0a-308e-43d4-9bfb-227a7571550f', 'ac593265-f7d1-4088-8e39-5febc9ed3537', '7479e338-e4f2-43c2-b1ca-d33fb6727961', 'f5b8e6f3-7311-4d71-aec4-02088111f7b0', 'f2466b1a-782f-4cac-a204-cacb50b8f322', '6417d256-4fdb-4a60-903b-8f42b0405653', '4a1b53fe-79e5-464c-83d9-14143306c02c', 'eecb0179-615f-42cd-9fc0-8f7570f4de63', 'cebd0ac8-2c0e-4ee5-8ec7-f2480bc09fe8', '71e54292-85b7-4cff-a1f0-e06e92f64b5e', '672921ed-9c68-45a5-b808-3df8261d8ff0', 'eda43d1e-88e5-440e-8570-6f8197037ec8', '89209861-c3b0-4a50-9ccb-bd21eae71fbf', 'de8b20b6-a2e3-460a-9018-9d89c009cef8', 'e55400ce-a20a-4bc3-8792-b8e3f5449248', '4bb4a413-a23c-49fe-bec7-d8621560205b', '63642695-9e9a-458c-97a2-a9db8cc1c9ed', '512ccd7d-a2ed-4613-9c53-232c85a50878', 'ee9db5f9-de43-4c3e-a156-d5ba8ba1f6b3', 'c5734165-3e62-4f14-9559-1e5caec18179', '931810bc-cd6d-4bb6-a77a-7c979fbac3fd', 'f8af8717-40ea-40f1-b721-b627163f9efb', 'c912b33b-c624-41e5-987e-125d7f78f1f5', '0b94bcdf-dc51-495a-b7bd-2c317f2778a8', '78799ee1-c078-4dc8-a843-1df3aaad5960', 'a0dd1e6d-ccf8-4f09-bbd1-e4fcbdf438ac', 'd8addf29-1054-4ccb-b179-dda65f7fefdd', 'c1ba934d-cf6c-462d-af28-9cc92ab24937', 'a2291231-46dd-407e-a228-135ad229e056', '4c388b98-8aca-4c28-bd43-74ab84d438f2', '01047fb0-4036-4dcd-81e0-62e6cef4e3e8', '9161f466-8e7e-493d-bd61-d647755dbc53', '5a7af96b-f6b1-4b6c-9f6b-c5f33377b1df', '3a4b2c17-4235-4982-b275-4018c5283ab4', 'ed3f6f88-a4ad-4b56-9d31-8d086ddd1521', '40e1a551-3a18-4483-9f36-435ccbae03d6', 'b0c8f4fc-ec98-493b-a560-3c21e652319d', '413e5c40-5f12-47c2-9142-4ee27ff61372', '4493db22-1f7c-4e10-89c4-9663133d87ef', '5b7af866-43fc-4071-9878-b366df20993e', 'c1710528-c7c8-4118-b8a3-5c041eb7e9ab', '837c4cd1-4324-44eb-8af3-5bcf2622d8be', '9fe2c75c-eef5-477b-af9e-1f0e8ef180a0', '7534e133-0828-4dca-b121-778e5550e736', 'f67a5513-7c63-4243-afcf-84684202f901', 'd1c4b24d-2136-4e8d-8497-8ff1fced234d', '1ec09209-7e22-4698-981d-9cbb150c5746', '7dca5f3b-e88c-4c32-858f-f21f6fb1ff85', 'a21318a9-a835-4e15-87b2-da893031b6ce', '297392f2-c6f3-4fef-8f8a-a4b5a11a93ba', '8ae49175-1941-4266-9ec4-20488802e24c', 'f2d90ee2-d68f-4445-8c64-bf87ab01fe0e', 'd6ffded9-3495-4b70-8fda-cf88d95b19b9', '4762f4c5-411c-4c86-8af4-324355546ab9', 'a4cd92f7-5f0b-43a0-a9fa-c13fab0c12ff', '1b13bd4b-724a-4900-92ac-58d1f19a2f6f', 'cc730ea4-d7ee-4558-a2bd-0f50aea6e6eb', '53e4b27d-e8a2-4733-8427-29355443ba48', '1038bfd5-c637-493f-95c7-f294b8a84fd6', '81d688b5-d3c2-4c83-8572-b82423525f5b', '5c1b23af-e735-408d-94c3-c0146b19b180', 'd61b4d0a-4897-4403-afd6-d84fc909caa6', 'e560745b-6da7-4b5f-b147-7b4f1fea3cd8', '8b9795dc-4b56-4271-afe8-ef382ea6bff2', '99f5e2be-1ceb-459b-b5fa-cdd468cb6197', 'd4805da1-cb4a-420a-95a4-daee6f7f0fd9', 'ea604298-bb16-4d84-b76a-ce6d7c8c3262', 'b8581687-2b4e-4f2f-91a9-9128f5bec540', '5a456d15-0fdd-4418-a149-b13ebe2e3738', '94eaf60d-67fa-4081-a37e-048dfa107f36', '787fa83d-b044-4c30-a7dd-50551c9badb1', '4c3c10e6-6371-463a-af5f-3bb5212aa05e']
1538557251 []
1841451556 ['c4ab979a-e2ca-45f6-9e5d-07a49bd52d9e', '5ad8ff7d-5227-46d7-a392-d28665610257', '32fbf66a-07e0-412f-944c-4238c1366303', '0374d936-3b01-451c-bb4e-3d872830d202', 'dc059078-9115-488f-bdf6-e9106822225a', 'daaa1ac3-eb9d-424b-a249-8d7c51d39ce1', 'ae0b1dd0-7356-41ee-8df4-871a52451eb7', '5495056f-faff-4144-b5f8-7fef1705f5ff', 'c656776f-2979-463a-9de3-1e5ae99d8ad2', 'd280371e-f9f1-42a2-aed5-fcb0fe89c219', '3e1ac59e-223a-469d-bb30-63f6988e4134', '5da15840-6f46-4978-8747-8be580d48079', '9eb760da-dade-475d-9390-0c998ff50fcc', 'ad3204b8-1355-48f4-9ee3-194fc5d970f1', '2d5231c0-2d29-4026-9138-5dcbd0591404', 'e81b5005-c300-4b73-9c30-75a0a438e312', 'edfdd7b3-5732-48eb-9814-b0f28a6c90a4', '6ab9d1d3-b969-4352-9fac-68b6494cbe7b', '67f5d31d-88f6-4f7a-9b1d-74667ca37f08', '181a3fbb-0b05-4fd1-83ff-c8c596563534', 'a1ad7230-1275-447e-96e5-1ebfa1a9022b', 'd6ea2513-25df-41f8-9dbf-c56916e98041', 'b10209f5-96d9-4bbc-8fda-38590460a34a', '1e41cffb-e231-41a5-b088-5f356177d721', 'db401fa1-945b-4f0e-ac38-fdbbe590a798', '7ee3d179-6f3f-42fd-a466-0a42c36c3532', '8517ef20-844d-4152-8690-1b8dda67e159', 'cfb5d792-2d33-444f-a1e8-5487de63e9a0', '51a98931-aace-42c1-8890-66ce53eb1971', 'a6e318c6-cf38-4d65-8bd0-6976b647974b', 'adbfbd00-de76-4694-9786-3c53ba965936', 'ba8cc746-665a-4925-8f5f-3b213dad238d', '36f607e8-2785-411e-8384-f17195f09468', 'a003a89c-eecb-470b-938d-1b97225f3d81', '5933a762-9e94-4b8b-bcc6-2d29d34acb7b', '6d2b08ec-2f67-4a5b-ac89-642f15e032b8', '3262782b-54c0-4d9d-b46e-9ea4ddfe11f5', '6fa0da45-27ad-4f64-84c6-454c28e6d9bb', 'ab996d6b-a0d7-4a21-91fb-498f2f446b6c', 'c6df80aa-a974-433b-9d95-13fc3bbe3480', '5bea2eb5-a841-46b1-8952-0cfd3bdb1708', '730a2f48-0cb6-49b8-a5bb-807534ef0f24', '12403618-49d5-43ee-99ad-5e99194fe05c', '38c1e68c-7591-4244-a1a4-8ac014da340c', 'f5b0dad0-cbf2-4ad6-bdf7-cf5f26743b4c', '69af8b6c-06b3-4b66-91c5-ede90be026d8', '008e2415-00a0-4ff5-acea-ba8f4f136db3', '61a6872e-3c89-4c93-b9b2-ca6ca1db7a8e', 'd4357c57-76d9-4fda-94b6-16ca8869cd45', '7dce5e40-f667-4b7c-b411-dc505f8d4143', '3577526f-463a-4af7-8696-4e31408c51e8', 'ebf487b7-36a0-4ba9-84f6-c2f5f004e06a', '534fe74b-3ce4-4bc1-8fba-ec5836f37d48', '5b7ba09b-9a42-454a-84a6-bc1ba7449379', '84d17e89-55e7-4a08-8468-f52780b926e4', 'a3c11024-243e-41fb-94ba-2bba1de12e1c', '2e7f2d0a-308e-43d4-9bfb-227a7571550f', 'c94db5a7-e5f7-4d1f-9ff9-7ba8e683b57d', '7479e338-e4f2-43c2-b1ca-d33fb6727961', 'df6374fa-0f70-4fa7-a0b9-47c9233f9f14', 'f5b8e6f3-7311-4d71-aec4-02088111f7b0', '1584fed3-37f3-46a9-822e-b56f5c3a7367', 'f2466b1a-782f-4cac-a204-cacb50b8f322', '6417d256-4fdb-4a60-903b-8f42b0405653', '4a1b53fe-79e5-464c-83d9-14143306c02c', '2b9309d3-f2da-4d39-8384-187cf33b2a95', '37bc5e7f-9655-416a-9938-b10ea5ce3e9a', 'eecb0179-615f-42cd-9fc0-8f7570f4de63', '0bcbfcff-0e3f-47bc-a32e-a52add577d5b', '9d4642de-8626-4889-8315-9e7ec4f1cd1f', '71e54292-85b7-4cff-a1f0-e06e92f64b5e', '672921ed-9c68-45a5-b808-3df8261d8ff0', 'eda43d1e-88e5-440e-8570-6f8197037ec8', '89209861-c3b0-4a50-9ccb-bd21eae71fbf', 'de8b20b6-a2e3-460a-9018-9d89c009cef8', 'e55400ce-a20a-4bc3-8792-b8e3f5449248', '40e72184-ff85-456b-a794-35967bc8cfb2', '4bb4a413-a23c-49fe-bec7-d8621560205b', 'd122f267-a255-4eed-afa1-8252e2cbdc17', '1d76bc02-2621-4041-914a-9b9e28babeff', '63642695-9e9a-458c-97a2-a9db8cc1c9ed', '512ccd7d-a2ed-4613-9c53-232c85a50878', 'ee9db5f9-de43-4c3e-a156-d5ba8ba1f6b3', 'd8addf29-1054-4ccb-b179-dda65f7fefdd', '931810bc-cd6d-4bb6-a77a-7c979fbac3fd', 'f8af8717-40ea-40f1-b721-b627163f9efb', 'c5734165-3e62-4f14-9559-1e5caec18179', '0b94bcdf-dc51-495a-b7bd-2c317f2778a8', 'c912b33b-c624-41e5-987e-125d7f78f1f5', 'a0dd1e6d-ccf8-4f09-bbd1-e4fcbdf438ac', '78799ee1-c078-4dc8-a843-1df3aaad5960', 'c1ba934d-cf6c-462d-af28-9cc92ab24937', 'a2291231-46dd-407e-a228-135ad229e056', '4c388b98-8aca-4c28-bd43-74ab84d438f2', 'dd7d906c-8a34-4ae9-ac75-633150b39869', '01047fb0-4036-4dcd-81e0-62e6cef4e3e8', '1c27a9d5-3412-41f7-bde2-87eba4abcb5e', '9161f466-8e7e-493d-bd61-d647755dbc53', '5a7af96b-f6b1-4b6c-9f6b-c5f33377b1df', '3a4b2c17-4235-4982-b275-4018c5283ab4', 'ed3f6f88-a4ad-4b56-9d31-8d086ddd1521', '40e1a551-3a18-4483-9f36-435ccbae03d6', '9c236fd5-0cd3-49dd-9cbb-500905805c74', 'b0c8f4fc-ec98-493b-a560-3c21e652319d', '413e5c40-5f12-47c2-9142-4ee27ff61372', '4493db22-1f7c-4e10-89c4-9663133d87ef', '5b7af866-43fc-4071-9878-b366df20993e', 'c1710528-c7c8-4118-b8a3-5c041eb7e9ab', '1d5c6bc2-1f15-447a-a139-0437720034dd', '837c4cd1-4324-44eb-8af3-5bcf2622d8be', '9fe2c75c-eef5-477b-af9e-1f0e8ef180a0', 'e31e5a23-68bf-4675-8f8d-054c99031df3', 'f67a5513-7c63-4243-afcf-84684202f901', 'd1c4b24d-2136-4e8d-8497-8ff1fced234d', '1ec09209-7e22-4698-981d-9cbb150c5746', 'a21318a9-a835-4e15-87b2-da893031b6ce', '297392f2-c6f3-4fef-8f8a-a4b5a11a93ba', '8ae49175-1941-4266-9ec4-20488802e24c', 'f2d90ee2-d68f-4445-8c64-bf87ab01fe0e', '9b9a9438-7ab1-4193-a5db-8d7b13c7cfca', 'a8cd5919-d517-4a37-90f5-e6b472eb0ac5', '4762f4c5-411c-4c86-8af4-324355546ab9', '54ade7ac-1df4-4ba5-b162-36d7e392f8cf', '5b78d187-c9db-477e-97ed-54574053ab2b', 'a4cd92f7-5f0b-43a0-a9fa-c13fab0c12ff', '1b13bd4b-724a-4900-92ac-58d1f19a2f6f', '9e2ddab2-fa29-4077-823c-40aaefe01299', '53e4b27d-e8a2-4733-8427-29355443ba48', '1038bfd5-c637-493f-95c7-f294b8a84fd6', 'ab8e7509-0aca-447d-939d-0f80da0b1bf9', '81d688b5-d3c2-4c83-8572-b82423525f5b', '05ac5752-da26-4347-a3d2-8b13b008fd70', '5c1b23af-e735-408d-94c3-c0146b19b180', '2b8503a8-9ed9-4481-9dd3-23beeee297e9', '8b9795dc-4b56-4271-afe8-ef382ea6bff2', '99f5e2be-1ceb-459b-b5fa-cdd468cb6197', 'a207d83f-6117-4874-80aa-44958693b8b6', '6584c17c-b804-4f94-ad34-cd8e63f76162', '8652d100-4702-42c7-9bdd-308d43a04282', 'd4805da1-cb4a-420a-95a4-daee6f7f0fd9', 'ea604298-bb16-4d84-b76a-ce6d7c8c3262', '5278b602-0ea2-4716-af69-9b1560a56ef6', 'b8581687-2b4e-4f2f-91a9-9128f5bec540', '5a456d15-0fdd-4418-a149-b13ebe2e3738', '94eaf60d-67fa-4081-a37e-048dfa107f36', '787fa83d-b044-4c30-a7dd-50551c9badb1', '4c3c10e6-6371-463a-af5f-3bb5212aa05e', 'a0587784-a863-47c3-84ae-0dfda49a4be0']

Next, how do we calculate the most popular location? in this dataset. Remember, popular here means the most number of providers practicing at that location.

%%sql
SELECT
    record->'npi' as provider,
    json_array_elements(record->'locations')->'uuid' as location
FROM provider_json
LIMIT 5;
 * postgresql://read_user:***@server/dbname
5 rows affected.
provider location
1669554267 8d96e755-65d3-470f-baaf-c20c73d673bb
1467524520 0decdff6-1703-4d4a-9821-caad045c2f70
1467524520 7974eec5-2760-4642-9ce8-b2401f33c61f
1467524520 3840e421-0e5b-498b-b937-2eb75ef3e09b
1467524520 48a62df8-f2f4-4320-905a-c67600804be6
%%sql
SELECT
    location::text,
    COUNT(provider) as num_providers
FROM (SELECT
        npi as provider,
        json_array_elements(record->'locations')->'uuid' as location
      FROM provider_json) as l
GROUP BY 1
ORDER BY COUNT(provider) DESC
LIMIT 1;
 * postgresql://read_user:***@server/dbname
1 rows affected.
location num_providers
"50c425a1-4cdd-49fe-9ce6-25fc85938262" 45

Now that we found the most popular location in this dataset, we are going to find all the providers who practise at this location and list out their insurances.

%%sql
SELECT
    provider
FROM(
SELECT
    record->'npi' as provider,
    json_array_elements(record->'locations') as location
FROM provider_json) as l
WHERE location->>'uuid' = '50c425a1-4cdd-49fe-9ce6-25fc85938262'
 * postgresql://read_user:***@server/dbname
45 rows affected.
provider
1104849025
1831410646
1417986258
1568649556
1023502028
1609040443
1669670618
1760408900
1861987174
1588158570
1043500259
1013320191
1588953020
1295932960
1497043533
1750427480
1518983733
1871511329
1124513841
1467455964
1497240899
1952854002
1316963630
1659471795
1134146632
1780776799
1871014654
1841786894
1063443570
1275641813
1588046437
1306367644
1760592794
1841546876
1114367794
1083137764
1639338114
1194246314
1992227045
1417185588
1689024234
1316236482
1518173194
1629096177
1922237858
%%sql
SELECT
    COUNT(distinct ins.insurances::text)
FROM(SELECT
        record->>'npi' as provider,
        json_array_elements(record->'insurances') as insurances
     FROM provider_json) as ins
INNER JOIN (SELECT
                provider
            FROM(SELECT
                    record->>'npi' as provider,
                    json_array_elements(record->'locations') as location
                 FROM provider_json) as l
                 WHERE location->>'uuid' = '50c425a1-4cdd-49fe-9ce6-25fc85938262'
           ) as loc
ON ins.provider = loc.provider
 * postgresql://read_user:***@server/dbname
1 rows affected.
count
361

Let’s clean this up a bit, by putting them into CTEs

%%sql
WITH location_cte as
(
    SELECT
        npi as provider,
        json_array_elements(record->'locations')->>'uuid' as location
    FROM provider_json
),
popular_location as
(
    SELECT
        location
    FROM location_cte as l
    GROUP BY 1
    ORDER BY COUNT(provider) DESC
    LIMIT 1  
),
providers_at_popular_location as
(
    SELECT
        provider
    FROM location_cte
    WHERE location IN (SELECT location FROM popular_location)  
)

SELECT
    COUNT(distinct ins.insurances::text)
FROM(SELECT
        record->>'npi' as provider,
        json_array_elements(record->'insurances') as insurances
     FROM provider_json) as ins
INNER JOIN providers_at_popular_location as pop_prov
ON ins.provider = pop_prov.provider::text
 * postgresql://read_user:***@server/dbname
1 rows affected.
count
361

Exploring providers_json table

What is the average age of the providers in providers_json?

%%sql
SELECT
    FLOOR(AVG((record ->> 'age')::int)) as age
FROM provider_json;
 * postgresql://read_user:***@server/dbname
1 rows affected.
age
57

Note: 68% of records have missing values for age.

%%sql
SELECT (COUNT(*) - COUNT(record ->> 'age')) * 1.0/ COUNT(*) as perc_missing
FROM provider_json;
 * postgresql://read_user:***@server/dbname
1 rows affected.
perc_missing
0.68327291558777835318
%%sql
SELECT
    sp.display as most_popular_specialty
FROM (SELECT
        json_array_elements_text(record -> 'specialties') as specialty
    FROM provider_json) as s
JOIN specialties as sp
ON s.specialty::uuid = sp.uuid
GROUP BY 1
ORDER BY COUNT(*) DESC
LIMIT 1
 * postgresql://read_user:***@server/dbname
1 rows affected.
most_popular_specialty
Internal Medicine

Providers practising at locations with high confidence score

Question: You’ll notice within all locations objects of the providers_json record field, there exists a “confidence” score. This is a representation of our model deciding the probability that a provider is indeed practicing at this location. We allow scores between 0 and 5. How many providers have at least 2 locations with confidence of 4 or higher?

%%sql
SELECT
    COUNT(DISTINCT provider) as num_providers_high_conf_score
FROM (SELECT
            provider::text
      FROM (SELECT
                record->'npi' as provider,
                json_array_elements(record->'locations') as location
            FROM provider_json) as l
      WHERE (location->>'confidence')::int>=4
      GROUP BY provider::text
      HAVING COUNT(location)>=2) as num_p
 * postgresql://read_user:***@server/dbname
1 rows affected.
num_providers_high_conf_score
2054

Question: In all provider records, you’ll see a field called “insurances”. This is the unique list of all insurance plans a provider accepts (we represent this as UUID which connects to the insurances table). For now, let’s assume a provider accepts all of these insurance plans at all locations they practice at. Find the total number of unique insurance plans accepted by all providers at the most ​popular​ location of this data set. (Popular = the most providers practice there)

%%sql
WITH location_cte as
(
    SELECT
        npi as provider,
        json_array_elements(record->'locations')->>'uuid' as location
    FROM provider_json
),
popular_location as
(
    SELECT
        location
    FROM location_cte as l
    GROUP BY 1
    ORDER BY COUNT(provider) DESC
    LIMIT 1  
),
providers_at_popular_location as
(
    SELECT
        provider
    FROM location_cte
    WHERE location IN (SELECT location FROM popular_location)  
)

SELECT
    COUNT(distinct ins.insurances::text) as total_unique_insurances_at_most_popular_location
FROM providers_at_popular_location as pop_prov
LEFT JOIN (SELECT
            record->>'npi' as provider,
            json_array_elements(record->'insurances') as insurances
           FROM provider_json) as ins
ON pop_prov.provider::text = ins.provider
 * postgresql://read_user:***@server/dbname
1 rows affected.
total_unique_insurances_at_most_popular_location
361

Exploring provider_puf_data

Now let’s take a look at another dataset which contains the following data.

  • provider_puf_data (npi bigint, source text, address json, plans json, created_at timestamp): This table contains a couple snapshots of government data we ingest. For this table, we ingested data in January and June of this year, you can leverage the “created_at” field to differentiate these times. The source field here represents a unique insurance broker. The address and plans fields represent the address and plans the broker has for this NPI. You can assume there is a uniqueness constraint on (npi, source, address, created_at). Thus, one source/broker could have many addresses and plan combinations for each NPI.
%%sql
SELECT column_name, data_type
FROM information_schema.columns
WHERE table_name = 'provider_puf_data' AND table_schema = 'public';
 * postgresql://read_user:***@server/dbname
5 rows affected.
column_name data_type
npi bigint
source text
address json
plans json
created_at timestamp without time zone
%%sql
SELECT * FROM provider_puf_data LIMIT 2;
 * postgresql://read_user:***@server/dbname
2 rows affected.
npi source address plans created_at
1003038803 https://api.centene.com/ambetter/reference/cms-data-index.json {'zip': '98837', 'city': 'Moses Lake', 'phone': '5097655606', 'state': 'WA', 'address': '801 East Wheeler Road'} [{'years': [2018, 2019], 'plan_id': '61836WA0050002', 'network_tier': 'PREFERRED', 'plan_id_type': 'HIOS-PLAN-ID'}, {'years': [2018, 2019], 'plan_id': '61836WA0050003', 'network_tier': 'PREFERRED', 'plan_id_type': 'HIOS-PLAN-ID'}, {'years': [2018, 2019], 'plan_id': '61836WA0050006', 'network_tier': 'PREFERRED', 'plan_id_type': 'HIOS-PLAN-ID'}, {'years': [2018, 2019], 'plan_id': '61836WA0050007', 'network_tier': 'PREFERRED', 'plan_id_type': 'HIOS-PLAN-ID'}, {'years': [2018, 2019], 'plan_id': '61836WA0090002', 'network_tier': 'PREFERRED', 'plan_id_type': 'HIOS-PLAN-ID'}, {'years': [2018, 2019], 'plan_id': '61836WA0090003', 'network_tier': 'PREFERRED', 'plan_id_type': 'HIOS-PLAN-ID'}, {'years': [2018, 2019], 'plan_id': '61836WA0090005', 'network_tier': 'PREFERRED', 'plan_id_type': 'HIOS-PLAN-ID'}, {'years': [2018, 2019], 'plan_id': '61836WA0100001', 'network_tier': 'PREFERRED', 'plan_id_type': 'HIOS-PLAN-ID'}] 2019-06-25 00:00:00
1003038803 https://esbgatewaypub.medica.com:443/rest/QHP/cms-data-index.json?HIOSID=20305&fmt=json {'zip': '55009', 'city': 'Cannon Falls', 'phone': '5072634221', 'state': 'MN', 'address': '32021 County 24 Blvd'} [{'years': [2019], 'plan_id': '20305NE0040001', 'network_tier': 'PREFERRED', 'plan_id_type': 'HIOS-PLAN-ID'}, {'years': [2019], 'plan_id': '20305NE0040003', 'network_tier': 'PREFERRED', 'plan_id_type': 'HIOS-PLAN-ID'}, {'years': [2019], 'plan_id': '20305NE0040005', 'network_tier': 'PREFERRED', 'plan_id_type': 'HIOS-PLAN-ID'}, {'years': [2019], 'plan_id': '20305NE0040011', 'network_tier': 'PREFERRED', 'plan_id_type': 'HIOS-PLAN-ID'}, {'years': [2019], 'plan_id': '20305NE0040013', 'network_tier': 'PREFERRED', 'plan_id_type': 'HIOS-PLAN-ID'}, {'years': [2019], 'plan_id': '20305NE0040023', 'network_tier': 'PREFERRED', 'plan_id_type': 'HIOS-PLAN-ID'}] 2019-01-08 00:00:00
%%sql
SELECT COUNT(DISTINCT source) FROM provider_puf_data;
 * postgresql://read_user:***@server/dbname
1 rows affected.
count
114
%%sql
SELECT
    source,
    COUNT(*)
FROM provider_puf_data
GROUP BY source
ORDER BY count desc
LIMIT 5;
 * postgresql://read_user:***@server/dbname
5 rows affected.
source count
https://www.bcbsil.com/ 6748
https://www.bcbstx.com/forms/tx/index_tx.JSON 6520
https://www.bcbsmt.com/forms/mt/index_mt.JSON 5669
https://www.bcbsnm.com/forms/nm/index_nm.JSON 5652
https://api.centene.com/ambetter/reference/cms-data-index.json 3225

Question: Who are the top 3 sources/brokers from ​provider_puf_data​ which had the most rows added or dropped between January and June. NOTE: If a source didn’t exist in January or June, treat that source like it has 0 rows for that date.

Check my other post

Additional PostgreSQL JSON functions

json_each() function

The json_each() function allows us to expand the outermost JSON object into a set of key-value pairs. See the following statement:

%%sql
SELECT * FROM provider_json LIMIT 1;
 * postgresql://read_user:***@server/dbname
1 rows affected.
npi record
1669554267 {'npi': '1669554267', 'first_name': 'Hyung', 'middle_name': 'Soon', 'last_name': 'Park', 'age': 83, 'gender': 'm', 'ratings_count': 1, 'ratings_avg': 2.0, 'degrees': [], 'specialties': ['c741df78-8460-46f6-a2ce-93de36eb7205'], 'languages': [], 'educations': [], 'insurances': [], 'provider_types': ['Doctor'], 'locations': [{'uuid': '8d96e755-65d3-470f-baaf-c20c73d673bb', 'name': None, 'address': '2296 Opitz Blvd # 255, Woodbridge, VA 22191, US', 'address_details': {'street': '2296 Opitz Blvd # 255', 'address_line_1': '2296 Opitz Blvd', 'address_line_2': '# 255', 'city': 'Woodbridge', 'state': 'VA', 'zip': '22191'}, 'latitude': 38.63739, 'longitude': -77.28514129999999, 'google_maps_link': 'https://www.google.com/maps/@38.63739,-77.28514129999999?q=2296%20Opitz%20Blvd%20%23%20255%2C%20Woodbridge%2C%20VA%2022191%2C%20US', 'phone_numbers': [{'phone': '7036704986', 'details': 'primary'}], 'confidence': 2}], 'online_profiles': [{'url': 'https://www.healthgrades.com/physician/dr-hyung-park-x3syc'}]}
%%sql
SELECT
    json_each(record)
FROM
    provider_json
LIMIT 20;
 * postgresql://read_user:***@server/dbname
20 rows affected.
json_each
(npi,"""1669554267""")
(first_name,"""Hyung""")
(middle_name,"""Soon""")
(last_name,"""Park""")
(age,83)
(gender,"""m""")
(ratings_count,1)
(ratings_avg,2.0)
(degrees,[])
(specialties,"[""c741df78-8460-46f6-a2ce-93de36eb7205""]")
(languages,[])
(educations,[])
(insurances,[])
(provider_types,"[""Doctor""]")
(locations,"[{""uuid"": ""8d96e755-65d3-470f-baaf-c20c73d673bb"", ""name"": null, ""address"": ""2296 Opitz Blvd # 255, Woodbridge, VA 22191, US"", ""address_details"": {""street"": ""2296 Opitz Blvd # 255"", ""address_line_1"": ""2296 Opitz Blvd"", ""address_line_2"": ""# 255"", ""city"": ""Woodbridge"", ""state"": ""VA"", ""zip"": ""22191""}, ""latitude"": 38.63739, ""longitude"": -77.28514129999999, ""google_maps_link"": ""https://www.google.com/maps/@38.63739,-77.28514129999999?q=2296%20Opitz%20Blvd%20%23%20255%2C%20Woodbridge%2C%20VA%2022191%2C%20US"", ""phone_numbers"": [{""phone"": ""7036704986"", ""details"": ""primary""}], ""confidence"": 2}]")
(online_profiles,"[{""url"": ""https://www.healthgrades.com/physician/dr-hyung-park-x3syc""}]")
(npi,"""1467524520""")
(first_name,"""Rohit""")
(middle_name,null)
(last_name,"""Bhaskar""")

json_each_text() function

If you want to get a set of key-value pairs as text, you use the json_each_text() function instead.

%%sql
SELECT
    json_each_text(record)
FROM
    provider_json
LIMIT 20;
 * postgresql://read_user:***@server/dbname
20 rows affected.
json_each_text
(npi,1669554267)
(first_name,Hyung)
(middle_name,Soon)
(last_name,Park)
(age,83)
(gender,m)
(ratings_count,1)
(ratings_avg,2.0)
(degrees,[])
(specialties,"[""c741df78-8460-46f6-a2ce-93de36eb7205""]")
(languages,[])
(educations,[])
(insurances,[])
(provider_types,"[""Doctor""]")
(locations,"[{""uuid"": ""8d96e755-65d3-470f-baaf-c20c73d673bb"", ""name"": null, ""address"": ""2296 Opitz Blvd # 255, Woodbridge, VA 22191, US"", ""address_details"": {""street"": ""2296 Opitz Blvd # 255"", ""address_line_1"": ""2296 Opitz Blvd"", ""address_line_2"": ""# 255"", ""city"": ""Woodbridge"", ""state"": ""VA"", ""zip"": ""22191""}, ""latitude"": 38.63739, ""longitude"": -77.28514129999999, ""google_maps_link"": ""https://www.google.com/maps/@38.63739,-77.28514129999999?q=2296%20Opitz%20Blvd%20%23%20255%2C%20Woodbridge%2C%20VA%2022191%2C%20US"", ""phone_numbers"": [{""phone"": ""7036704986"", ""details"": ""primary""}], ""confidence"": 2}]")
(online_profiles,"[{""url"": ""https://www.healthgrades.com/physician/dr-hyung-park-x3syc""}]")
(npi,1467524520)
(first_name,Rohit)
(middle_name,)
(last_name,Bhaskar)

json_object_keys() function

To get a set of keys in the outermost JSON object, you use the json_object_keys() function. The following query returns all keys of the nested record object. NOTE: While you may think that you can get the keys inside the locations field using this function, we can’t, since locations and specialties are both arrays. We will learn about arrays in detail.

%%sql
SELECT
    json_object_keys(record)
FROM
    provider_json
LIMIT 20;
 * postgresql://read_user:***@server/dbname
20 rows affected.
json_object_keys
npi
first_name
middle_name
last_name
age
gender
ratings_count
ratings_avg
degrees
specialties
languages
educations
insurances
provider_types
locations
online_profiles
npi
first_name
middle_name
last_name

json_typeof() function

The json_typeof() function returns type of the outermost JSON value as a string. It can be number, boolean, null, object, array, and string.

The following query return the data type of the record:

%%sql
SELECT
    json_typeof(record->'locations')
FROM provider_json
LIMIT 10;
 * postgresql://read_user:***@server/dbname
10 rows affected.
json_typeof
array
array
array
array
array
array
array
array
array
array

Appendix

Useful links:

Tags:

Updated: