How to work with JSON data using PostgreSQL
Postgres and JSON
How to work with JSON data using PostgreSQL?
I will address this question in this post. We will look at a real-world scenarios of a health-care company data and learn how to query JSON data using normal PostgreSQL.
History of JSON and Postgres: Postgres has had JSON support for a while, but to be honest it wasn’t that great due to a lack of indexing and key extractor methods. With the release of version 9.2, Postgres added native JSON support. You could finally use Postgres as a “NoSQL” database. In version 9.3, Postgres improved on that by adding additional constructor and extractor methods. 9.4 added the ability to store JSON as “Binary JSON” (or JSONB), which strips out insignificant whitespace (not a big deal), adds a tiny bit of overhead when inserting data, but provides a huge benefit when querying it: indexes.
What about MongoDB? I’m not going to argue that Postgres handles JSON as well as MongoDB. MongoDB was, afterall, specifically made as a JSON document store and has some pretty great features like the aggregation pipeline. But the truth is that Postgres now handles JSON pretty well.
Need for storing JSON data in the db
A lot of data is modeled quite well using a relational database. The reason for this is because website data tends to be relational. A user makes purchases and leaves reviews, a movie has actors which act in various movies, etc. However, there are use cases where it makes a lot of sense to incorporate a JSON document into your model. For example, the provider information is all stored inside one record for each provider. Some pros of this approach:
- One benefit of storing it this way, is that we can avoid complicated joins on data that is siloed or isolated. Having the data denormalized makes it possible to fetch a
provider
and it’s data with a single query. - Maintain data that comes from an external service in the same structure and format (as JSON) that it arrived to you as. What ends up in the database is exactly what the API provided. For instance, look at the response object that we got from the API below. It’s nested and has arrays. Instead of trying to normalize this data across five or more tables, you can store it as it is (and still query against it).
Now that we have gone over some of the benefits and use-cases for storing JSON data in Postgres, let’s take a look at how it’s actually done.
Connect to the database
Get your creds for connecting to the Postgres database and store them in a file like ribbon_db.cfg
. The contents of this file are shown here:
ribbon_db.cfg
[DB_CREDS]
USERNAME=
PASSWD=
HOST=
DB_NAME=
import configparser
config = configparser.ConfigParser()
config.read('ribbon_db.cfg')
conn = "postgresql://{}:{}@{}/{}".format(*config['DB_CREDS'].values())
%load_ext sql
%sql postgresql://user:passwd@dbserver.com/dbname
'Connected: user@dbname'
Data Schema
The schema mainly contains 4 tables.
provider_json (npi bigint, record json):
This table represents the most accurate data we have for a given provider. The NPI is the national provider identifier. The NPI is how we identify providers across the country. The record is a JSON blob representing all data we have for this provider. In order to access this data you will need to be able to query JSON data.insurances (uuid uuid, display text):
This table contains the unique insurance plans in our DB.specialties (uuid uuid, display text):
This table contains the unique specialties in our DB.provider_puf_data (npi bigint, source text, address json, plans json, created_at timestamp):
This table contains a couple snapshots of government data we ingest. For this table, we ingested data in January and June of this year, you can leverage the “created_at” field to differentiate these times. The source field here represents a unique insurance broker. The address and plans fields represent the address and plans the broker has for this NPI. You can assume there is a uniqueness constraint on (npi, source, address, created_at). Thus, one source/broker could have many addresses and plan combinations for each NPI.
First, let’s do some simple Exploratory data analysis on the given tables.
%%sql
SELECT column_name, data_type
FROM information_schema.columns
WHERE table_name = 'provider_json' AND table_schema = 'public';
* postgresql://read_user:***@server/dbname
2 rows affected.
column_name | data_type |
---|---|
npi | bigint |
record | json |
%%sql
SELECT *
FROM provider_json
LIMIT 2;
* postgresql://read_user:***@server/dbname
2 rows affected.
npi | record |
---|---|
1669554267 | {'npi': '1669554267', 'first_name': 'Hyung', 'middle_name': 'Soon', 'last_name': 'Park', 'age': 83, 'gender': 'm', 'ratings_count': 1, 'ratings_avg': 2.0, 'degrees': [], 'specialties': ['c741df78-8460-46f6-a2ce-93de36eb7205'], 'languages': [], 'educations': [], 'insurances': [], 'provider_types': ['Doctor'], 'locations': [{'uuid': '8d96e755-65d3-470f-baaf-c20c73d673bb', 'name': None, 'address': '2296 Opitz Blvd # 255, Woodbridge, VA 22191, US', 'address_details': {'street': '2296 Opitz Blvd # 255', 'address_line_1': '2296 Opitz Blvd', 'address_line_2': '# 255', 'city': 'Woodbridge', 'state': 'VA', 'zip': '22191'}, 'latitude': 38.63739, 'longitude': -77.28514129999999, 'google_maps_link': 'https://www.google.com/maps/@38.63739,-77.28514129999999?q=2296%20Opitz%20Blvd%20%23%20255%2C%20Woodbridge%2C%20VA%2022191%2C%20US', 'phone_numbers': [{'phone': '7036704986', 'details': 'primary'}], 'confidence': 2}], 'online_profiles': [{'url': 'https://www.healthgrades.com/physician/dr-hyung-park-x3syc'}]} |
1467524520 | {'npi': '1467524520', 'first_name': 'Rohit', 'middle_name': None, 'last_name': 'Bhaskar', 'age': 57, 'gender': 'm', 'ratings_count': 35, 'ratings_avg': 9.07, 'degrees': ['MD'], 'specialties': ['dff54cb3-c9b0-43a0-8ca8-26c7c05bee83', '1de33770-eb1c-47fa-ab3e-f9a4ab924d9d', '76126f5d-723a-40bc-b4c1-7dcda477797a', '475b20fc-c188-4ca1-8e3a-b0614b8f8ce2'], 'languages': ['english'], 'educations': [{'education': {'name': 'Cornell University', 'uuid': 'c4588ea7-567d-43dc-a0e5-67937fb77f5c'}, 'type': 'Undergraduate School', 'year': 1982}, {'education': {'name': 'Baylor College of Medicine', 'uuid': 'b2ed4e48-6cb0-42d2-8804-7557b6452e8d'}, 'type': 'Residency', 'year': None}, {'education': {'name': 'Icahn School of Medicine at Mount Sinai', 'uuid': '16e70085-6b81-4581-b903-dce9d9acf74a'}, 'type': 'Medical School', 'year': None}, {'education': {'name': 'Baylor College of Medicine', 'uuid': 'b2ed4e48-6cb0-42d2-8804-7557b6452e8d'}, 'type': 'Internship', 'year': None}], 'insurances': ['5ad8ff7d-5227-46d7-a392-d28665610257', '3a56d135-dded-48c3-90db-a4f5d9da9786', 'dc059078-9115-488f-bdf6-e9106822225a', 'daaa1ac3-eb9d-424b-a249-8d7c51d39ce1', 'ae0b1dd0-7356-41ee-8df4-871a52451eb7', 'd280371e-f9f1-42a2-aed5-fcb0fe89c219', '3e1ac59e-223a-469d-bb30-63f6988e4134', '5da15840-6f46-4978-8747-8be580d48079', '9eb760da-dade-475d-9390-0c998ff50fcc', 'ad3204b8-1355-48f4-9ee3-194fc5d970f1', '2d5231c0-2d29-4026-9138-5dcbd0591404', '6ab9d1d3-b969-4352-9fac-68b6494cbe7b', '67f5d31d-88f6-4f7a-9b1d-74667ca37f08', '90e00746-69ed-430e-98f9-f637abcbe144', 'b10209f5-96d9-4bbc-8fda-38590460a34a', 'cfb5d792-2d33-444f-a1e8-5487de63e9a0', '51a98931-aace-42c1-8890-66ce53eb1971', 'a6e318c6-cf38-4d65-8bd0-6976b647974b', 'ba8cc746-665a-4925-8f5f-3b213dad238d', '36f607e8-2785-411e-8384-f17195f09468', 'cbbc9743-5cdf-4df2-92ca-fcde6a536160', 'a003a89c-eecb-470b-938d-1b97225f3d81', '791d592b-d6c0-4576-99d0-d78dac8496a1', 'f6a0b45a-042e-4886-82c6-b3b9e086cf83', '57ecd933-4bb8-4289-9080-a8259a6d683c', '6d2b08ec-2f67-4a5b-ac89-642f15e032b8', '749b0b00-ddf8-4cef-bc5b-5694cccbad0f', 'a7edd72c-4010-448e-8624-c92f997fb2a2', '730a2f48-0cb6-49b8-a5bb-807534ef0f24', '9d5266c8-a7b3-4d27-ab96-3a524b4cc491', 'f7627ef2-7d1d-4974-b934-2fdb29c0ff4f', '69af8b6c-06b3-4b66-91c5-ede90be026d8', '96292303-3e6e-4758-a851-f049d7956345', '61a6872e-3c89-4c93-b9b2-ca6ca1db7a8e', '2fdd9e5c-840a-40f1-a575-8ac6ec6094eb', 'd4357c57-76d9-4fda-94b6-16ca8869cd45', 'b1085d28-d17c-42c1-8448-fc56f5bf8ff8', 'ebf487b7-36a0-4ba9-84f6-c2f5f004e06a', '5b7ba09b-9a42-454a-84a6-bc1ba7449379', '84d17e89-55e7-4a08-8468-f52780b926e4', '2e7f2d0a-308e-43d4-9bfb-227a7571550f', 'e364db55-c32d-4e9d-957b-84bfbf62dfea', 'b7f67654-891f-4152-acc8-def976e30826', 'c94db5a7-e5f7-4d1f-9ff9-7ba8e683b57d', 'f2466b1a-782f-4cac-a204-cacb50b8f322', '6417d256-4fdb-4a60-903b-8f42b0405653', '4a1b53fe-79e5-464c-83d9-14143306c02c', 'eecb0179-615f-42cd-9fc0-8f7570f4de63', '0bcbfcff-0e3f-47bc-a32e-a52add577d5b', '9d4642de-8626-4889-8315-9e7ec4f1cd1f', '71e54292-85b7-4cff-a1f0-e06e92f64b5e', '672921ed-9c68-45a5-b808-3df8261d8ff0', 'de8b20b6-a2e3-460a-9018-9d89c009cef8', '40e72184-ff85-456b-a794-35967bc8cfb2', '4d893957-8038-409e-8c96-2e5edd626d45', 'b70e5e27-f14b-4a83-84d8-bb711b70886f', 'd122f267-a255-4eed-afa1-8252e2cbdc17', '1d76bc02-2621-4041-914a-9b9e28babeff', 'd308de64-b0a3-4a50-96b1-4aa813794a1d', '49841d97-6ede-48af-841f-8548d2930e18', 'ee9db5f9-de43-4c3e-a156-d5ba8ba1f6b3', '931810bc-cd6d-4bb6-a77a-7c979fbac3fd', 'c5734165-3e62-4f14-9559-1e5caec18179', 'a0dd1e6d-ccf8-4f09-bbd1-e4fcbdf438ac', 'c1ba934d-cf6c-462d-af28-9cc92ab24937', '4c388b98-8aca-4c28-bd43-74ab84d438f2', '01047fb0-4036-4dcd-81e0-62e6cef4e3e8', '1c27a9d5-3412-41f7-bde2-87eba4abcb5e', '9161f466-8e7e-493d-bd61-d647755dbc53', '46db02d5-a040-4ca5-9510-ef0591feca22', '3a4b2c17-4235-4982-b275-4018c5283ab4', 'ed3f6f88-a4ad-4b56-9d31-8d086ddd1521', 'dae8c52c-b0e0-4688-9078-6e5dcef9de03', '87842763-d404-418e-a265-fea0ef2cac50', '413e5c40-5f12-47c2-9142-4ee27ff61372', 'd54100c0-7990-4638-b57d-79df580f192f', '5b20ceac-5463-4255-b55d-c1a639cc3cc7', '239f0c59-4f28-43ab-9fde-ded6ad8e6cfe', '837c4cd1-4324-44eb-8af3-5bcf2622d8be', '1ec09209-7e22-4698-981d-9cbb150c5746', 'd1c4b24d-2136-4e8d-8497-8ff1fced234d', 'f67a5513-7c63-4243-afcf-84684202f901', '7dca5f3b-e88c-4c32-858f-f21f6fb1ff85', 'a21318a9-a835-4e15-87b2-da893031b6ce', '0486ad3e-5eb2-4589-8383-24bfcec56b47', 'faa0d1f5-388e-4bc4-b60f-baeab55a9930', '4762f4c5-411c-4c86-8af4-324355546ab9', '6a76f092-a95b-40ed-9bbc-c5f5dffb6495', '1038bfd5-c637-493f-95c7-f294b8a84fd6', '81d688b5-d3c2-4c83-8572-b82423525f5b', '05ac5752-da26-4347-a3d2-8b13b008fd70', 'd61b4d0a-4897-4403-afd6-d84fc909caa6', '5c1b23af-e735-408d-94c3-c0146b19b180', '2b8503a8-9ed9-4481-9dd3-23beeee297e9', 'e560745b-6da7-4b5f-b147-7b4f1fea3cd8', '2040857e-d4ad-417d-92ab-52e3c95e5cba', '6584c17c-b804-4f94-ad34-cd8e63f76162', '79cff8b6-cf98-4dcf-8b54-383fa2c43689', 'ea604298-bb16-4d84-b76a-ce6d7c8c3262', 'd4805da1-cb4a-420a-95a4-daee6f7f0fd9', '5278b602-0ea2-4716-af69-9b1560a56ef6', 'b8581687-2b4e-4f2f-91a9-9128f5bec540', '94eaf60d-67fa-4081-a37e-048dfa107f36', '787fa83d-b044-4c30-a7dd-50551c9badb1', '4c3c10e6-6371-463a-af5f-3bb5212aa05e', 'a0587784-a863-47c3-84ae-0dfda49a4be0', '80a261e0-4d0f-4e91-8a70-0c761fbe40ca'], 'provider_types': ['Doctor'], 'locations': [{'uuid': '0decdff6-1703-4d4a-9821-caad045c2f70', 'name': None, 'address': '3941 J St # 260, Sacramento, CA 95819, US', 'address_details': {'street': '3941 J St # 260', 'address_line_1': '3941 J St', 'address_line_2': '# 260', 'city': 'Sacramento', 'state': 'CA', 'zip': '95819'}, 'latitude': 38.5706552, 'longitude': -121.4535451, 'google_maps_link': 'https://www.google.com/maps/@38.5706552,-121.4535451?q=3941%20J%20St%20%23%20260%2C%20Sacramento%2C%20CA%2095819%2C%20US', 'phone_numbers': [{'phone': '9167362323', 'details': 'primary'}, {'phone': '9169845318', 'details': 'secondary'}], 'confidence': 4}, {'uuid': '7974eec5-2760-4642-9ce8-b2401f33c61f', 'name': None, 'address': '1580 Creekside Dr # 250, Folsom, CA 95630, US', 'address_details': {'street': '1580 Creekside Dr # 250', 'address_line_1': '1580 Creekside Dr', 'address_line_2': '# 250', 'city': 'Folsom', 'state': 'CA', 'zip': '95630'}, 'latitude': 38.6687282, 'longitude': -121.1479608, 'google_maps_link': 'https://www.google.com/maps/@38.6687282,-121.1479608?q=1580%20Creekside%20Dr%20%23%20250%2C%20Folsom%2C%20CA%2095630%2C%20US', 'phone_numbers': [{'phone': '9169845318', 'details': 'primary'}], 'confidence': 1}, {'uuid': '3840e421-0e5b-498b-b937-2eb75ef3e09b', 'name': None, 'address': '8120 Timberlake Way # 108, Sacramento, CA 95823, US', 'address_details': {'street': '8120 Timberlake Way # 108', 'address_line_1': '8120 Timberlake Way', 'address_line_2': '# 108', 'city': 'Sacramento', 'state': 'CA', 'zip': '95823'}, 'latitude': 38.4620284, 'longitude': -121.4144321, 'google_maps_link': 'https://www.google.com/maps/@38.4620284,-121.4144321?q=8120%20Timberlake%20Way%20%23%20108%2C%20Sacramento%2C%20CA%2095823%2C%20US', 'phone_numbers': [{'phone': '9166885551', 'details': 'primary'}], 'confidence': 1}, {'uuid': '48a62df8-f2f4-4320-905a-c67600804be6', 'name': 'Mercy Hospital of Folsom', 'address': '1650 Creekside Dr, Folsom, CA 95630, US', 'address_details': {'street': '1650 Creekside Dr', 'address_line_1': '1650 Creekside Dr', 'address_line_2': None, 'city': 'Folsom', 'state': 'CA', 'zip': '95630'}, 'latitude': 38.670333, 'longitude': -121.1458461, 'google_maps_link': 'https://www.google.com/maps/@38.670333,-121.1458461?q=1650%20Creekside%20Dr%2C%20Folsom%2C%20CA%2095630%2C%20US', 'phone_numbers': [{'phone': '9169837400', 'details': 'primary'}], 'confidence': 1}, {'uuid': 'f41bc9cd-e982-4022-9427-3a6bad872a2d', 'name': None, 'address': '7500 Hospital Dr, Sacramento, CA 95823, US', 'address_details': {'street': '7500 Hospital Dr', 'address_line_1': '7500 Hospital Dr', 'address_line_2': None, 'city': 'Sacramento', 'state': 'CA', 'zip': '95823'}, 'latitude': 38.4638056, 'longitude': -121.4164937, 'google_maps_link': 'https://www.google.com/maps/@38.4638056,-121.4164937?q=7500%20Hospital%20Dr%2C%20Sacramento%2C%20CA%2095823%2C%20US', 'phone_numbers': [{'phone': '8006774491', 'details': 'primary'}], 'confidence': 1}], 'online_profiles': [{'url': 'https://www.healthgrades.com/physician/dr-rohit-bhaskar-2hspx'}, {'url': 'https://www.vitals.com/doctors/Dr_Rohit_Bhaskar.html'}]} |
We will now see how to work with PostgreSQL JSON data type. In addition, we will learn some of the most common PostgreSQL JSON operators and functions for handling JSON data.
To query JSON data, you use the SELECT
statement, which is similar to querying other native data types. The output shown above is the result set returned by Postgres in the form of JSON.
Operators -> and ->>
Postgres provides two native operators ->
and ->>
to help query JSON data.
->
operator returns JSON object field by key.->>
operator returns JSON object field by text.
The following query uses the operator ->
to get the education and age of providers in the form of JSON.
%%sql
SELECT
record -> 'educations' as education,
record -> 'age' as age
FROM provider_json
LIMIT 2;
* postgresql://read_user:***@server/dbname
2 rows affected.
education | age |
---|---|
[] | 83 |
[{'education': {'name': 'Cornell University', 'uuid': 'c4588ea7-567d-43dc-a0e5-67937fb77f5c'}, 'type': 'Undergraduate School', 'year': 1982}, {'education': {'name': 'Baylor College of Medicine', 'uuid': 'b2ed4e48-6cb0-42d2-8804-7557b6452e8d'}, 'type': 'Residency', 'year': None}, {'education': {'name': 'Icahn School of Medicine at Mount Sinai', 'uuid': '16e70085-6b81-4581-b903-dce9d9acf74a'}, 'type': 'Medical School', 'year': None}, {'education': {'name': 'Baylor College of Medicine', 'uuid': 'b2ed4e48-6cb0-42d2-8804-7557b6452e8d'}, 'type': 'Internship', 'year': None}] | 57 |
Notice, that we are getting the data in the form of JSON field.
Next, the following operator ->>
gets the education and age of the providers in the form of Text.
%%sql
SELECT
record ->> 'educations' as education,
record ->> 'age' as age
FROM provider_json
LIMIT 2;
* postgresql://read_user:***@server/dbname
2 rows affected.
education | age |
---|---|
[] | 83 |
[{"education": {"name": "Cornell University", "uuid": "c4588ea7-567d-43dc-a0e5-67937fb77f5c"}, "type": "Undergraduate School", "year": 1982}, {"education": {"name": "Baylor College of Medicine", "uuid": "b2ed4e48-6cb0-42d2-8804-7557b6452e8d"}, "type": "Residency", "year": null}, {"education": {"name": "Icahn School of Medicine at Mount Sinai", "uuid": "16e70085-6b81-4581-b903-dce9d9acf74a"}, "type": "Medical School", "year": null}, {"education": {"name": "Baylor College of Medicine", "uuid": "b2ed4e48-6cb0-42d2-8804-7557b6452e8d"}, "type": "Internship", "year": null}] | 57 |
The output at first glance looks the same, but notice the year
has null
value instead of None
, this is because, it is returned as Text and not JSON object.
Chaining operators ->
and ->>
Because the ->
operator returns a JSON object, we can chain it with the operator ->>
to retrieve nested elements. For example, the following statement returns all educations.
%%sql
SELECT
record -> 'educations' ->> 'education' as colleges
FROM provider_json
LIMIT 2;
* postgresql://read_user:***@server/dbname
2 rows affected.
colleges |
---|
None |
None |
%%sql
SELECT
record -> 'educations' as educations
FROM provider_json
LIMIT 2;
* postgresql://read_user:***@server/dbname
2 rows affected.
educations |
---|
[] |
[{'education': {'name': 'Cornell University', 'uuid': 'c4588ea7-567d-43dc-a0e5-67937fb77f5c'}, 'type': 'Undergraduate School', 'year': 1982}, {'education': {'name': 'Baylor College of Medicine', 'uuid': 'b2ed4e48-6cb0-42d2-8804-7557b6452e8d'}, 'type': 'Residency', 'year': None}, {'education': {'name': 'Icahn School of Medicine at Mount Sinai', 'uuid': '16e70085-6b81-4581-b903-dce9d9acf74a'}, 'type': 'Medical School', 'year': None}, {'education': {'name': 'Baylor College of Medicine', 'uuid': 'b2ed4e48-6cb0-42d2-8804-7557b6452e8d'}, 'type': 'Internship', 'year': None}] |
The reason we got None when we were running the chain command is because we got back a list of educations. We need some other feature to access elements inside a list. We will learn how to deal with this later.
Functions
Use json_array_elements_text()
function to unnest the array
The specialties
column is a list of specialties for a given provider. In the below output you can see that we have a list of specialties. There another table which describes what each of these specialties mean. Our goal is to find out the most popular specialty of the providers in provider_json
table.
%%sql
SELECT
record -> 'npi' as npi,
record -> 'specialties' as specialties
FROM provider_json
LIMIT 10;
* postgresql://read_user:***@server/dbname
10 rows affected.
npi | specialties |
---|---|
1669554267 | ['c741df78-8460-46f6-a2ce-93de36eb7205'] |
1467524520 | ['dff54cb3-c9b0-43a0-8ca8-26c7c05bee83', '1de33770-eb1c-47fa-ab3e-f9a4ab924d9d', '76126f5d-723a-40bc-b4c1-7dcda477797a', '475b20fc-c188-4ca1-8e3a-b0614b8f8ce2'] |
1598949232 | ['8257ce19-3245-47b0-bdd5-eb07ca777513', '50b212c8-1888-492f-95db-9620ce29fd4e'] |
1538557251 | ['dc211bc8-16f5-4f55-bb90-df37d1f1a620'] |
1841451556 | ['8257ce19-3245-47b0-bdd5-eb07ca777513', '79d8040f-a85f-417f-8926-d00093b2860c'] |
1700847084 | ['c7fdb5ef-516c-46fa-986c-9ba13d3b072c', 'c72c0b39-6ded-46ca-ac15-e663ea7f2dec'] |
1295776458 | ['a77d23ba-29f1-4afd-a0c4-62d2f0444cf7'] |
1528440773 | ['9b9a5c64-3d99-4f23-b12c-a61c33ca9dc0'] |
1275517336 | ['475b20fc-c188-4ca1-8e3a-b0614b8f8ce2', '69562903-0290-41fd-9af5-049916ceee13'] |
1871502948 | ['d8f7fff0-9c5b-42e2-b011-376315d4be3d', '9c470211-e3ea-4963-9568-8d8e9b8f5c44'] |
%%sql
SELECT
record -> 'npi' as npi,
json_array_elements_text(record -> 'specialties') as specialties
FROM provider_json
LIMIT 10;
* postgresql://read_user:***@server/dbname
10 rows affected.
npi | specialties |
---|---|
1669554267 | c741df78-8460-46f6-a2ce-93de36eb7205 |
1467524520 | dff54cb3-c9b0-43a0-8ca8-26c7c05bee83 |
1467524520 | 1de33770-eb1c-47fa-ab3e-f9a4ab924d9d |
1467524520 | 76126f5d-723a-40bc-b4c1-7dcda477797a |
1467524520 | 475b20fc-c188-4ca1-8e3a-b0614b8f8ce2 |
1598949232 | 8257ce19-3245-47b0-bdd5-eb07ca777513 |
1598949232 | 50b212c8-1888-492f-95db-9620ce29fd4e |
1538557251 | dc211bc8-16f5-4f55-bb90-df37d1f1a620 |
1841451556 | 8257ce19-3245-47b0-bdd5-eb07ca777513 |
1841451556 | 79d8040f-a85f-417f-8926-d00093b2860c |
NOTE: json_array_elements_text() creates a temporary table
This unnests each of these array into its own row. But we are not done yet, we need to group these values and count them, so that we can find out the most popular specialty. What we are essentially doing here is that we are creating a table on the fly using the json_array_elements_text()
function.
Next, we need to get the specialty name by joining this temp table with specialities
table. Shown below are the column types for this table.
%%sql
SELECT * FROM specialties LIMIT 5;
* postgresql://read_user:***@server/dbname
5 rows affected.
uuid | display |
---|---|
2a6ea574-686a-4c60-b215-508c7c192203 | Foster Care Agency |
fa710988-0955-422d-9ffd-0924c2b9e16c | Home Health Aide Agency |
ad840de3-8ec1-45ad-a3d5-858669a0f3b6 | Multi-Specialty |
875a24ea-9748-439a-8dcb-4aea82c18883 | In Home Supportive Care Specialist |
9b9a5c64-3d99-4f23-b12c-a61c33ca9dc0 | Student |
%%sql
SELECT column_name, data_type
FROM information_schema.columns
WHERE table_name = 'specialties' AND table_schema = 'public';
* postgresql://read_user:***@server/dbname
2 rows affected.
column_name | data_type |
---|---|
uuid | uuid |
display | text |
Cast to UUID
type before joining: In order to calculate the most popular specialty, we first create the temp table using json_array_elements_text()
and alias the column as specialty and the temp table as s. We then JOIN this temp table with specialties
table aliased as sp.
NOTE: The specialties column in our temp table contains a text representation of the UUID, so in order for us to join, we need to cast this into a UUID type. Finally, we group by the display name and show only the most popular specialty.
%%sql
SELECT
sp.display as most_popular_specialty
FROM (SELECT
json_array_elements_text(record -> 'specialties') as specialty
FROM provider_json) as s
JOIN specialties as sp
ON s.specialty::uuid = sp.uuid
GROUP BY 1
ORDER BY COUNT(*) DESC
LIMIT 1
* postgresql://read_user:***@server/dbname
1 rows affected.
most_popular_specialty |
---|
Internal Medicine |
json_array_elements()
to expand the array
As I have alluded to above, working with arrays comes up time and again when dealing with JSON objects. A lot of many-to-one relations are captured this way. Now we will dive into looking at locations array which contains location details for each provider. As you can see, locations is an array which contains location json objects.
Now let’s take a look at the locations field of providers_json
table.
%%sql
SELECT
record -> 'locations' as locations
FROM provider_json
LIMIT 2;
* postgresql://read_user:***@server/dbname
2 rows affected.
locations |
---|
[{'uuid': '8d96e755-65d3-470f-baaf-c20c73d673bb', 'name': None, 'address': '2296 Opitz Blvd # 255, Woodbridge, VA 22191, US', 'address_details': {'street': '2296 Opitz Blvd # 255', 'address_line_1': '2296 Opitz Blvd', 'address_line_2': '# 255', 'city': 'Woodbridge', 'state': 'VA', 'zip': '22191'}, 'latitude': 38.63739, 'longitude': -77.28514129999999, 'google_maps_link': 'https://www.google.com/maps/@38.63739,-77.28514129999999?q=2296%20Opitz%20Blvd%20%23%20255%2C%20Woodbridge%2C%20VA%2022191%2C%20US', 'phone_numbers': [{'phone': '7036704986', 'details': 'primary'}], 'confidence': 2}] |
[{'uuid': '0decdff6-1703-4d4a-9821-caad045c2f70', 'name': None, 'address': '3941 J St # 260, Sacramento, CA 95819, US', 'address_details': {'street': '3941 J St # 260', 'address_line_1': '3941 J St', 'address_line_2': '# 260', 'city': 'Sacramento', 'state': 'CA', 'zip': '95819'}, 'latitude': 38.5706552, 'longitude': -121.4535451, 'google_maps_link': 'https://www.google.com/maps/@38.5706552,-121.4535451?q=3941%20J%20St%20%23%20260%2C%20Sacramento%2C%20CA%2095819%2C%20US', 'phone_numbers': [{'phone': '9167362323', 'details': 'primary'}, {'phone': '9169845318', 'details': 'secondary'}], 'confidence': 4}, {'uuid': '7974eec5-2760-4642-9ce8-b2401f33c61f', 'name': None, 'address': '1580 Creekside Dr # 250, Folsom, CA 95630, US', 'address_details': {'street': '1580 Creekside Dr # 250', 'address_line_1': '1580 Creekside Dr', 'address_line_2': '# 250', 'city': 'Folsom', 'state': 'CA', 'zip': '95630'}, 'latitude': 38.6687282, 'longitude': -121.1479608, 'google_maps_link': 'https://www.google.com/maps/@38.6687282,-121.1479608?q=1580%20Creekside%20Dr%20%23%20250%2C%20Folsom%2C%20CA%2095630%2C%20US', 'phone_numbers': [{'phone': '9169845318', 'details': 'primary'}], 'confidence': 1}, {'uuid': '3840e421-0e5b-498b-b937-2eb75ef3e09b', 'name': None, 'address': '8120 Timberlake Way # 108, Sacramento, CA 95823, US', 'address_details': {'street': '8120 Timberlake Way # 108', 'address_line_1': '8120 Timberlake Way', 'address_line_2': '# 108', 'city': 'Sacramento', 'state': 'CA', 'zip': '95823'}, 'latitude': 38.4620284, 'longitude': -121.4144321, 'google_maps_link': 'https://www.google.com/maps/@38.4620284,-121.4144321?q=8120%20Timberlake%20Way%20%23%20108%2C%20Sacramento%2C%20CA%2095823%2C%20US', 'phone_numbers': [{'phone': '9166885551', 'details': 'primary'}], 'confidence': 1}, {'uuid': '48a62df8-f2f4-4320-905a-c67600804be6', 'name': 'Mercy Hospital of Folsom', 'address': '1650 Creekside Dr, Folsom, CA 95630, US', 'address_details': {'street': '1650 Creekside Dr', 'address_line_1': '1650 Creekside Dr', 'address_line_2': None, 'city': 'Folsom', 'state': 'CA', 'zip': '95630'}, 'latitude': 38.670333, 'longitude': -121.1458461, 'google_maps_link': 'https://www.google.com/maps/@38.670333,-121.1458461?q=1650%20Creekside%20Dr%2C%20Folsom%2C%20CA%2095630%2C%20US', 'phone_numbers': [{'phone': '9169837400', 'details': 'primary'}], 'confidence': 1}, {'uuid': 'f41bc9cd-e982-4022-9427-3a6bad872a2d', 'name': None, 'address': '7500 Hospital Dr, Sacramento, CA 95823, US', 'address_details': {'street': '7500 Hospital Dr', 'address_line_1': '7500 Hospital Dr', 'address_line_2': None, 'city': 'Sacramento', 'state': 'CA', 'zip': '95823'}, 'latitude': 38.4638056, 'longitude': -121.4164937, 'google_maps_link': 'https://www.google.com/maps/@38.4638056,-121.4164937?q=7500%20Hospital%20Dr%2C%20Sacramento%2C%20CA%2095823%2C%20US', 'phone_numbers': [{'phone': '8006774491', 'details': 'primary'}], 'confidence': 1}] |
%%sql
SELECT
record->'locations' as locations
FROM provider_json
LIMIT 2;
* postgresql://read_user:***@server/dbname
2 rows affected.
locations |
---|
[{'uuid': '8d96e755-65d3-470f-baaf-c20c73d673bb', 'name': None, 'address': '2296 Opitz Blvd # 255, Woodbridge, VA 22191, US', 'address_details': {'street': '2296 Opitz Blvd # 255', 'address_line_1': '2296 Opitz Blvd', 'address_line_2': '# 255', 'city': 'Woodbridge', 'state': 'VA', 'zip': '22191'}, 'latitude': 38.63739, 'longitude': -77.28514129999999, 'google_maps_link': 'https://www.google.com/maps/@38.63739,-77.28514129999999?q=2296%20Opitz%20Blvd%20%23%20255%2C%20Woodbridge%2C%20VA%2022191%2C%20US', 'phone_numbers': [{'phone': '7036704986', 'details': 'primary'}], 'confidence': 2}] |
[{'uuid': '0decdff6-1703-4d4a-9821-caad045c2f70', 'name': None, 'address': '3941 J St # 260, Sacramento, CA 95819, US', 'address_details': {'street': '3941 J St # 260', 'address_line_1': '3941 J St', 'address_line_2': '# 260', 'city': 'Sacramento', 'state': 'CA', 'zip': '95819'}, 'latitude': 38.5706552, 'longitude': -121.4535451, 'google_maps_link': 'https://www.google.com/maps/@38.5706552,-121.4535451?q=3941%20J%20St%20%23%20260%2C%20Sacramento%2C%20CA%2095819%2C%20US', 'phone_numbers': [{'phone': '9167362323', 'details': 'primary'}, {'phone': '9169845318', 'details': 'secondary'}], 'confidence': 4}, {'uuid': '7974eec5-2760-4642-9ce8-b2401f33c61f', 'name': None, 'address': '1580 Creekside Dr # 250, Folsom, CA 95630, US', 'address_details': {'street': '1580 Creekside Dr # 250', 'address_line_1': '1580 Creekside Dr', 'address_line_2': '# 250', 'city': 'Folsom', 'state': 'CA', 'zip': '95630'}, 'latitude': 38.6687282, 'longitude': -121.1479608, 'google_maps_link': 'https://www.google.com/maps/@38.6687282,-121.1479608?q=1580%20Creekside%20Dr%20%23%20250%2C%20Folsom%2C%20CA%2095630%2C%20US', 'phone_numbers': [{'phone': '9169845318', 'details': 'primary'}], 'confidence': 1}, {'uuid': '3840e421-0e5b-498b-b937-2eb75ef3e09b', 'name': None, 'address': '8120 Timberlake Way # 108, Sacramento, CA 95823, US', 'address_details': {'street': '8120 Timberlake Way # 108', 'address_line_1': '8120 Timberlake Way', 'address_line_2': '# 108', 'city': 'Sacramento', 'state': 'CA', 'zip': '95823'}, 'latitude': 38.4620284, 'longitude': -121.4144321, 'google_maps_link': 'https://www.google.com/maps/@38.4620284,-121.4144321?q=8120%20Timberlake%20Way%20%23%20108%2C%20Sacramento%2C%20CA%2095823%2C%20US', 'phone_numbers': [{'phone': '9166885551', 'details': 'primary'}], 'confidence': 1}, {'uuid': '48a62df8-f2f4-4320-905a-c67600804be6', 'name': 'Mercy Hospital of Folsom', 'address': '1650 Creekside Dr, Folsom, CA 95630, US', 'address_details': {'street': '1650 Creekside Dr', 'address_line_1': '1650 Creekside Dr', 'address_line_2': None, 'city': 'Folsom', 'state': 'CA', 'zip': '95630'}, 'latitude': 38.670333, 'longitude': -121.1458461, 'google_maps_link': 'https://www.google.com/maps/@38.670333,-121.1458461?q=1650%20Creekside%20Dr%2C%20Folsom%2C%20CA%2095630%2C%20US', 'phone_numbers': [{'phone': '9169837400', 'details': 'primary'}], 'confidence': 1}, {'uuid': 'f41bc9cd-e982-4022-9427-3a6bad872a2d', 'name': None, 'address': '7500 Hospital Dr, Sacramento, CA 95823, US', 'address_details': {'street': '7500 Hospital Dr', 'address_line_1': '7500 Hospital Dr', 'address_line_2': None, 'city': 'Sacramento', 'state': 'CA', 'zip': '95823'}, 'latitude': 38.4638056, 'longitude': -121.4164937, 'google_maps_link': 'https://www.google.com/maps/@38.4638056,-121.4164937?q=7500%20Hospital%20Dr%2C%20Sacramento%2C%20CA%2095823%2C%20US', 'phone_numbers': [{'phone': '8006774491', 'details': 'primary'}], 'confidence': 1}] |
Each of these location objects have the following fields:
%%sql
SELECT
json_object_keys(record->'locations'->-1)
FROM provider_json
LIMIT 10;
* postgresql://read_user:***@server/dbname
10 rows affected.
json_object_keys |
---|
uuid |
name |
address |
address_details |
latitude |
longitude |
google_maps_link |
phone_numbers |
confidence |
uuid |
Similar to how we expanded the specialties array, we will be expanding this locations array into their own records using json_array_elements()
function. Now, for each provider, we have one row for each of the locations where that provider works.
Let’s say our objective is to keep the providers who have atleast 2 locations with confidence >= 4. More specifically: How many providers have at least 2 locations with confidence of 4 or higher?
Using json_array_elements()
on the locations field, we first build out our intermediate table. As shown here are the first 5 rows. Observe that we have locations with confidence scores ranging from 0-5.
%%sql
SELECT
record->'npi' as provider,
json_array_elements(record->'locations') as location
FROM provider_json
LIMIT 5;
* postgresql://read_user:***@server/dbname
5 rows affected.
provider | location |
---|---|
1669554267 | {'uuid': '8d96e755-65d3-470f-baaf-c20c73d673bb', 'name': None, 'address': '2296 Opitz Blvd # 255, Woodbridge, VA 22191, US', 'address_details': {'street': '2296 Opitz Blvd # 255', 'address_line_1': '2296 Opitz Blvd', 'address_line_2': '# 255', 'city': 'Woodbridge', 'state': 'VA', 'zip': '22191'}, 'latitude': 38.63739, 'longitude': -77.28514129999999, 'google_maps_link': 'https://www.google.com/maps/@38.63739,-77.28514129999999?q=2296%20Opitz%20Blvd%20%23%20255%2C%20Woodbridge%2C%20VA%2022191%2C%20US', 'phone_numbers': [{'phone': '7036704986', 'details': 'primary'}], 'confidence': 2} |
1467524520 | {'uuid': '0decdff6-1703-4d4a-9821-caad045c2f70', 'name': None, 'address': '3941 J St # 260, Sacramento, CA 95819, US', 'address_details': {'street': '3941 J St # 260', 'address_line_1': '3941 J St', 'address_line_2': '# 260', 'city': 'Sacramento', 'state': 'CA', 'zip': '95819'}, 'latitude': 38.5706552, 'longitude': -121.4535451, 'google_maps_link': 'https://www.google.com/maps/@38.5706552,-121.4535451?q=3941%20J%20St%20%23%20260%2C%20Sacramento%2C%20CA%2095819%2C%20US', 'phone_numbers': [{'phone': '9167362323', 'details': 'primary'}, {'phone': '9169845318', 'details': 'secondary'}], 'confidence': 4} |
1467524520 | {'uuid': '7974eec5-2760-4642-9ce8-b2401f33c61f', 'name': None, 'address': '1580 Creekside Dr # 250, Folsom, CA 95630, US', 'address_details': {'street': '1580 Creekside Dr # 250', 'address_line_1': '1580 Creekside Dr', 'address_line_2': '# 250', 'city': 'Folsom', 'state': 'CA', 'zip': '95630'}, 'latitude': 38.6687282, 'longitude': -121.1479608, 'google_maps_link': 'https://www.google.com/maps/@38.6687282,-121.1479608?q=1580%20Creekside%20Dr%20%23%20250%2C%20Folsom%2C%20CA%2095630%2C%20US', 'phone_numbers': [{'phone': '9169845318', 'details': 'primary'}], 'confidence': 1} |
1467524520 | {'uuid': '3840e421-0e5b-498b-b937-2eb75ef3e09b', 'name': None, 'address': '8120 Timberlake Way # 108, Sacramento, CA 95823, US', 'address_details': {'street': '8120 Timberlake Way # 108', 'address_line_1': '8120 Timberlake Way', 'address_line_2': '# 108', 'city': 'Sacramento', 'state': 'CA', 'zip': '95823'}, 'latitude': 38.4620284, 'longitude': -121.4144321, 'google_maps_link': 'https://www.google.com/maps/@38.4620284,-121.4144321?q=8120%20Timberlake%20Way%20%23%20108%2C%20Sacramento%2C%20CA%2095823%2C%20US', 'phone_numbers': [{'phone': '9166885551', 'details': 'primary'}], 'confidence': 1} |
1467524520 | {'uuid': '48a62df8-f2f4-4320-905a-c67600804be6', 'name': 'Mercy Hospital of Folsom', 'address': '1650 Creekside Dr, Folsom, CA 95630, US', 'address_details': {'street': '1650 Creekside Dr', 'address_line_1': '1650 Creekside Dr', 'address_line_2': None, 'city': 'Folsom', 'state': 'CA', 'zip': '95630'}, 'latitude': 38.670333, 'longitude': -121.1458461, 'google_maps_link': 'https://www.google.com/maps/@38.670333,-121.1458461?q=1650%20Creekside%20Dr%2C%20Folsom%2C%20CA%2095630%2C%20US', 'phone_numbers': [{'phone': '9169837400', 'details': 'primary'}], 'confidence': 1} |
Now, our next task is to peek into each of these location
objects and filter this intermediate table for rows containing confidence >= 4.
%%sql
SELECT
l.provider,
l.location
FROM (SELECT
record->'npi' as provider,
json_array_elements(record->'locations') as location
FROM provider_json) as l
WHERE (l.location->>'confidence')::int >= 4
LIMIT 5;
* postgresql://read_user:***@server/dbname
5 rows affected.
provider | location |
---|---|
1467524520 | {'uuid': '0decdff6-1703-4d4a-9821-caad045c2f70', 'name': None, 'address': '3941 J St # 260, Sacramento, CA 95819, US', 'address_details': {'street': '3941 J St # 260', 'address_line_1': '3941 J St', 'address_line_2': '# 260', 'city': 'Sacramento', 'state': 'CA', 'zip': '95819'}, 'latitude': 38.5706552, 'longitude': -121.4535451, 'google_maps_link': 'https://www.google.com/maps/@38.5706552,-121.4535451?q=3941%20J%20St%20%23%20260%2C%20Sacramento%2C%20CA%2095819%2C%20US', 'phone_numbers': [{'phone': '9167362323', 'details': 'primary'}, {'phone': '9169845318', 'details': 'secondary'}], 'confidence': 4} |
1598949232 | {'uuid': 'b4ffbd01-7c97-4d05-8b9f-33c883beb9cc', 'name': None, 'address': '1932 Alcoa Hwy # 270, Knoxville, TN 37920, US', 'address_details': {'street': '1932 Alcoa Hwy # 270', 'address_line_1': '1932 Alcoa Hwy', 'address_line_2': '# 270', 'city': 'Knoxville', 'state': 'TN', 'zip': '37920'}, 'latitude': 35.9391103, 'longitude': -83.94478629999999, 'google_maps_link': 'https://www.google.com/maps/@35.9391103,-83.94478629999999?q=1932%20Alcoa%20Hwy%20%23%20270%2C%20Knoxville%2C%20TN%2037920%2C%20US', 'phone_numbers': [{'phone': '8652514658', 'details': 'primary'}, {'phone': '8653056058', 'details': 'secondary'}, {'phone': '8656706199', 'details': 'secondary'}], 'confidence': 4} |
1598949232 | {'uuid': '0b922f49-2e0c-4964-9066-d371fc91d0ee', 'name': 'University of Tennessee Medical Center', 'address': '1924 Alcoa Hwy, Knoxville, TN 37920, US', 'address_details': {'street': '1924 Alcoa Hwy', 'address_line_1': '1924 Alcoa Hwy', 'address_line_2': None, 'city': 'Knoxville', 'state': 'TN', 'zip': '37920'}, 'latitude': 35.9398588, 'longitude': -83.9430702, 'google_maps_link': 'https://www.google.com/maps/@35.9398588,-83.9430702?q=1924%20Alcoa%20Hwy%2C%20Knoxville%2C%20TN%2037920%2C%20US', 'phone_numbers': [{'phone': '8652514658', 'details': 'primary'}, {'phone': '8653056058', 'details': 'secondary'}], 'confidence': 4} |
1598949232 | {'uuid': '08626d97-25f7-40b8-9d84-5561abf8bcf7', 'name': None, 'address': '5779 Creekwood Park Blvd, Lenoir City, TN 37772, US', 'address_details': {'street': '5779 Creekwood Park Blvd', 'address_line_1': '5779 Creekwood Park Blvd', 'address_line_2': None, 'city': 'Lenoir City', 'state': 'TN', 'zip': '37772'}, 'latitude': 35.837749, 'longitude': -84.271581, 'google_maps_link': 'https://www.google.com/maps/@35.837749,-84.271581?q=5779%20Creekwood%20Park%20Blvd%2C%20Lenoir%20City%2C%20TN%2037772%2C%20US', 'phone_numbers': [{'phone': '8652514658', 'details': 'primary'}, {'phone': '8652514659', 'details': 'secondary'}], 'confidence': 4} |
1700847084 | {'uuid': '8407a81c-d85d-4994-8c4e-3807869a0737', 'name': "North Hawaii Community Hospital, an Affiliate of The Queen's Health Systems", 'address': '67-1125 Mamalahoa Hwy # 110, Waimea, HI 96743, US', 'address_details': {'street': '67-1125 Mamalahoa Hwy # 110', 'address_line_1': '67-1125 Mamalahoa Hwy', 'address_line_2': '# 110', 'city': 'Waimea', 'state': 'HI', 'zip': '96743'}, 'latitude': 20.02287, 'longitude': -155.6650027, 'google_maps_link': 'https://www.google.com/maps/@20.02287,-155.6650027?q=67-1125%20Mamalahoa%20Hwy%20%23%20110%2C%20Waimea%2C%20HI%2096743%2C%20US', 'phone_numbers': [{'phone': '8085504939', 'details': 'primary'}], 'confidence': 4} |
The next constraint or requirement is that we want to keep only providers that have atleast 2 locations with high confidence. This can be achieved using a GROUP BY provider
and then keeping only groups that have atleast 2 locations with high confidence by using the HAVING
clause.
%%sql
SELECT
provider::text
FROM (SELECT
record->'npi' as provider,
json_array_elements(record->'locations') as location
FROM provider_json) as l
WHERE (location->>'confidence')::int>=4
GROUP BY provider::text
HAVING COUNT(location)>=2
LIMIT 5;
* postgresql://read_user:***@server/dbname
5 rows affected.
provider |
---|
"1194791335" |
"1508295841" |
"1407907256" |
"1669436309" |
"1821139510" |
Finally, we will count the number of providers that meet this criteria.
%%sql
SELECT
COUNT(DISTINCT provider) as num_providers
FROM (SELECT
provider::text
FROM (SELECT
record->'npi' as provider,
json_array_elements(record->'locations') as location
FROM provider_json) as l
WHERE (location->>'confidence')::int>=4
GROUP BY provider::text
HAVING COUNT(location)>=2) as num_p
* postgresql://read_user:***@server/dbname
1 rows affected.
num_providers |
---|
2054 |
Exploring arrays further
In all of these provider records, we see that there is an insurances field. Some providers have a lot of insurances that they accept. This is the unique list of all insurance plans a provider accepts (we represent this as UUID which connects to the insurances table).
Find the total number of unique insurance plans accepted by all providers at the most popular location of this data set. (Popular = the most providers practice there)
Answer: My first thought is to calculate the most popular location and then use that to filter rows containing that location. This will leave us with a list of providers practicing at that location and the insurances that they accept.
First, let’s take a peek at the insurances list for each provider.
%%sql
SELECT
npi,
record->'insurances' as insurances
FROM provider_json
LIMIT 5;
* postgresql://read_user:***@server/dbname
5 rows affected.
npi | insurances |
---|---|
1669554267 | [] |
1467524520 | ['5ad8ff7d-5227-46d7-a392-d28665610257', '3a56d135-dded-48c3-90db-a4f5d9da9786', 'dc059078-9115-488f-bdf6-e9106822225a', 'daaa1ac3-eb9d-424b-a249-8d7c51d39ce1', 'ae0b1dd0-7356-41ee-8df4-871a52451eb7', 'd280371e-f9f1-42a2-aed5-fcb0fe89c219', '3e1ac59e-223a-469d-bb30-63f6988e4134', '5da15840-6f46-4978-8747-8be580d48079', '9eb760da-dade-475d-9390-0c998ff50fcc', 'ad3204b8-1355-48f4-9ee3-194fc5d970f1', '2d5231c0-2d29-4026-9138-5dcbd0591404', '6ab9d1d3-b969-4352-9fac-68b6494cbe7b', '67f5d31d-88f6-4f7a-9b1d-74667ca37f08', '90e00746-69ed-430e-98f9-f637abcbe144', 'b10209f5-96d9-4bbc-8fda-38590460a34a', 'cfb5d792-2d33-444f-a1e8-5487de63e9a0', '51a98931-aace-42c1-8890-66ce53eb1971', 'a6e318c6-cf38-4d65-8bd0-6976b647974b', 'ba8cc746-665a-4925-8f5f-3b213dad238d', '36f607e8-2785-411e-8384-f17195f09468', 'cbbc9743-5cdf-4df2-92ca-fcde6a536160', 'a003a89c-eecb-470b-938d-1b97225f3d81', '791d592b-d6c0-4576-99d0-d78dac8496a1', 'f6a0b45a-042e-4886-82c6-b3b9e086cf83', '57ecd933-4bb8-4289-9080-a8259a6d683c', '6d2b08ec-2f67-4a5b-ac89-642f15e032b8', '749b0b00-ddf8-4cef-bc5b-5694cccbad0f', 'a7edd72c-4010-448e-8624-c92f997fb2a2', '730a2f48-0cb6-49b8-a5bb-807534ef0f24', '9d5266c8-a7b3-4d27-ab96-3a524b4cc491', 'f7627ef2-7d1d-4974-b934-2fdb29c0ff4f', '69af8b6c-06b3-4b66-91c5-ede90be026d8', '96292303-3e6e-4758-a851-f049d7956345', '61a6872e-3c89-4c93-b9b2-ca6ca1db7a8e', '2fdd9e5c-840a-40f1-a575-8ac6ec6094eb', 'd4357c57-76d9-4fda-94b6-16ca8869cd45', 'b1085d28-d17c-42c1-8448-fc56f5bf8ff8', 'ebf487b7-36a0-4ba9-84f6-c2f5f004e06a', '5b7ba09b-9a42-454a-84a6-bc1ba7449379', '84d17e89-55e7-4a08-8468-f52780b926e4', '2e7f2d0a-308e-43d4-9bfb-227a7571550f', 'e364db55-c32d-4e9d-957b-84bfbf62dfea', 'b7f67654-891f-4152-acc8-def976e30826', 'c94db5a7-e5f7-4d1f-9ff9-7ba8e683b57d', 'f2466b1a-782f-4cac-a204-cacb50b8f322', '6417d256-4fdb-4a60-903b-8f42b0405653', '4a1b53fe-79e5-464c-83d9-14143306c02c', 'eecb0179-615f-42cd-9fc0-8f7570f4de63', '0bcbfcff-0e3f-47bc-a32e-a52add577d5b', '9d4642de-8626-4889-8315-9e7ec4f1cd1f', '71e54292-85b7-4cff-a1f0-e06e92f64b5e', '672921ed-9c68-45a5-b808-3df8261d8ff0', 'de8b20b6-a2e3-460a-9018-9d89c009cef8', '40e72184-ff85-456b-a794-35967bc8cfb2', '4d893957-8038-409e-8c96-2e5edd626d45', 'b70e5e27-f14b-4a83-84d8-bb711b70886f', 'd122f267-a255-4eed-afa1-8252e2cbdc17', '1d76bc02-2621-4041-914a-9b9e28babeff', 'd308de64-b0a3-4a50-96b1-4aa813794a1d', '49841d97-6ede-48af-841f-8548d2930e18', 'ee9db5f9-de43-4c3e-a156-d5ba8ba1f6b3', '931810bc-cd6d-4bb6-a77a-7c979fbac3fd', 'c5734165-3e62-4f14-9559-1e5caec18179', 'a0dd1e6d-ccf8-4f09-bbd1-e4fcbdf438ac', 'c1ba934d-cf6c-462d-af28-9cc92ab24937', '4c388b98-8aca-4c28-bd43-74ab84d438f2', '01047fb0-4036-4dcd-81e0-62e6cef4e3e8', '1c27a9d5-3412-41f7-bde2-87eba4abcb5e', '9161f466-8e7e-493d-bd61-d647755dbc53', '46db02d5-a040-4ca5-9510-ef0591feca22', '3a4b2c17-4235-4982-b275-4018c5283ab4', 'ed3f6f88-a4ad-4b56-9d31-8d086ddd1521', 'dae8c52c-b0e0-4688-9078-6e5dcef9de03', '87842763-d404-418e-a265-fea0ef2cac50', '413e5c40-5f12-47c2-9142-4ee27ff61372', 'd54100c0-7990-4638-b57d-79df580f192f', '5b20ceac-5463-4255-b55d-c1a639cc3cc7', '239f0c59-4f28-43ab-9fde-ded6ad8e6cfe', '837c4cd1-4324-44eb-8af3-5bcf2622d8be', '1ec09209-7e22-4698-981d-9cbb150c5746', 'd1c4b24d-2136-4e8d-8497-8ff1fced234d', 'f67a5513-7c63-4243-afcf-84684202f901', '7dca5f3b-e88c-4c32-858f-f21f6fb1ff85', 'a21318a9-a835-4e15-87b2-da893031b6ce', '0486ad3e-5eb2-4589-8383-24bfcec56b47', 'faa0d1f5-388e-4bc4-b60f-baeab55a9930', '4762f4c5-411c-4c86-8af4-324355546ab9', '6a76f092-a95b-40ed-9bbc-c5f5dffb6495', '1038bfd5-c637-493f-95c7-f294b8a84fd6', '81d688b5-d3c2-4c83-8572-b82423525f5b', '05ac5752-da26-4347-a3d2-8b13b008fd70', 'd61b4d0a-4897-4403-afd6-d84fc909caa6', '5c1b23af-e735-408d-94c3-c0146b19b180', '2b8503a8-9ed9-4481-9dd3-23beeee297e9', 'e560745b-6da7-4b5f-b147-7b4f1fea3cd8', '2040857e-d4ad-417d-92ab-52e3c95e5cba', '6584c17c-b804-4f94-ad34-cd8e63f76162', '79cff8b6-cf98-4dcf-8b54-383fa2c43689', 'ea604298-bb16-4d84-b76a-ce6d7c8c3262', 'd4805da1-cb4a-420a-95a4-daee6f7f0fd9', '5278b602-0ea2-4716-af69-9b1560a56ef6', 'b8581687-2b4e-4f2f-91a9-9128f5bec540', '94eaf60d-67fa-4081-a37e-048dfa107f36', '787fa83d-b044-4c30-a7dd-50551c9badb1', '4c3c10e6-6371-463a-af5f-3bb5212aa05e', 'a0587784-a863-47c3-84ae-0dfda49a4be0', '80a261e0-4d0f-4e91-8a70-0c761fbe40ca'] |
1598949232 | ['2f18249b-fa80-48bd-bdf0-c02d13639fc9', '4a3d26e7-f309-4100-baed-60f39e0db3c9', '08533208-cdf3-4649-b0a3-3f05ca9f93be', '5ad8ff7d-5227-46d7-a392-d28665610257', '32fbf66a-07e0-412f-944c-4238c1366303', '0374d936-3b01-451c-bb4e-3d872830d202', 'dc059078-9115-488f-bdf6-e9106822225a', 'daaa1ac3-eb9d-424b-a249-8d7c51d39ce1', 'ae0b1dd0-7356-41ee-8df4-871a52451eb7', 'c656776f-2979-463a-9de3-1e5ae99d8ad2', '5495056f-faff-4144-b5f8-7fef1705f5ff', 'd280371e-f9f1-42a2-aed5-fcb0fe89c219', '3e1ac59e-223a-469d-bb30-63f6988e4134', '5da15840-6f46-4978-8747-8be580d48079', '9eb760da-dade-475d-9390-0c998ff50fcc', 'ad3204b8-1355-48f4-9ee3-194fc5d970f1', '2d5231c0-2d29-4026-9138-5dcbd0591404', '6ab9d1d3-b969-4352-9fac-68b6494cbe7b', 'd6ea2513-25df-41f8-9dbf-c56916e98041', 'b10209f5-96d9-4bbc-8fda-38590460a34a', '1e41cffb-e231-41a5-b088-5f356177d721', 'db401fa1-945b-4f0e-ac38-fdbbe590a798', '8517ef20-844d-4152-8690-1b8dda67e159', 'cfb5d792-2d33-444f-a1e8-5487de63e9a0', 'ba8cc746-665a-4925-8f5f-3b213dad238d', 'a003a89c-eecb-470b-938d-1b97225f3d81', '6fc564dc-38c4-4e83-8d4e-f991d17f3323', '842eb14d-997f-4036-bdd6-8bd6db0b9c9f', '5933a762-9e94-4b8b-bcc6-2d29d34acb7b', '3262782b-54c0-4d9d-b46e-9ea4ddfe11f5', 'ab996d6b-a0d7-4a21-91fb-498f2f446b6c', '5bea2eb5-a841-46b1-8952-0cfd3bdb1708', 'a7edd72c-4010-448e-8624-c92f997fb2a2', '730a2f48-0cb6-49b8-a5bb-807534ef0f24', '12403618-49d5-43ee-99ad-5e99194fe05c', 'f7627ef2-7d1d-4974-b934-2fdb29c0ff4f', '96292303-3e6e-4758-a851-f049d7956345', 'd4357c57-76d9-4fda-94b6-16ca8869cd45', '7dce5e40-f667-4b7c-b411-dc505f8d4143', '3577526f-463a-4af7-8696-4e31408c51e8', '534fe74b-3ce4-4bc1-8fba-ec5836f37d48', 'a3c11024-243e-41fb-94ba-2bba1de12e1c', '2e7f2d0a-308e-43d4-9bfb-227a7571550f', 'ac593265-f7d1-4088-8e39-5febc9ed3537', '7479e338-e4f2-43c2-b1ca-d33fb6727961', 'f5b8e6f3-7311-4d71-aec4-02088111f7b0', 'f2466b1a-782f-4cac-a204-cacb50b8f322', '6417d256-4fdb-4a60-903b-8f42b0405653', '4a1b53fe-79e5-464c-83d9-14143306c02c', 'eecb0179-615f-42cd-9fc0-8f7570f4de63', 'cebd0ac8-2c0e-4ee5-8ec7-f2480bc09fe8', '71e54292-85b7-4cff-a1f0-e06e92f64b5e', '672921ed-9c68-45a5-b808-3df8261d8ff0', 'eda43d1e-88e5-440e-8570-6f8197037ec8', '89209861-c3b0-4a50-9ccb-bd21eae71fbf', 'de8b20b6-a2e3-460a-9018-9d89c009cef8', 'e55400ce-a20a-4bc3-8792-b8e3f5449248', '4bb4a413-a23c-49fe-bec7-d8621560205b', '63642695-9e9a-458c-97a2-a9db8cc1c9ed', '512ccd7d-a2ed-4613-9c53-232c85a50878', 'ee9db5f9-de43-4c3e-a156-d5ba8ba1f6b3', 'c5734165-3e62-4f14-9559-1e5caec18179', '931810bc-cd6d-4bb6-a77a-7c979fbac3fd', 'f8af8717-40ea-40f1-b721-b627163f9efb', 'c912b33b-c624-41e5-987e-125d7f78f1f5', '0b94bcdf-dc51-495a-b7bd-2c317f2778a8', '78799ee1-c078-4dc8-a843-1df3aaad5960', 'a0dd1e6d-ccf8-4f09-bbd1-e4fcbdf438ac', 'd8addf29-1054-4ccb-b179-dda65f7fefdd', 'c1ba934d-cf6c-462d-af28-9cc92ab24937', 'a2291231-46dd-407e-a228-135ad229e056', '4c388b98-8aca-4c28-bd43-74ab84d438f2', '01047fb0-4036-4dcd-81e0-62e6cef4e3e8', '9161f466-8e7e-493d-bd61-d647755dbc53', '5a7af96b-f6b1-4b6c-9f6b-c5f33377b1df', '3a4b2c17-4235-4982-b275-4018c5283ab4', 'ed3f6f88-a4ad-4b56-9d31-8d086ddd1521', '40e1a551-3a18-4483-9f36-435ccbae03d6', 'b0c8f4fc-ec98-493b-a560-3c21e652319d', '413e5c40-5f12-47c2-9142-4ee27ff61372', '4493db22-1f7c-4e10-89c4-9663133d87ef', '5b7af866-43fc-4071-9878-b366df20993e', 'c1710528-c7c8-4118-b8a3-5c041eb7e9ab', '837c4cd1-4324-44eb-8af3-5bcf2622d8be', '9fe2c75c-eef5-477b-af9e-1f0e8ef180a0', '7534e133-0828-4dca-b121-778e5550e736', 'f67a5513-7c63-4243-afcf-84684202f901', 'd1c4b24d-2136-4e8d-8497-8ff1fced234d', '1ec09209-7e22-4698-981d-9cbb150c5746', '7dca5f3b-e88c-4c32-858f-f21f6fb1ff85', 'a21318a9-a835-4e15-87b2-da893031b6ce', '297392f2-c6f3-4fef-8f8a-a4b5a11a93ba', '8ae49175-1941-4266-9ec4-20488802e24c', 'f2d90ee2-d68f-4445-8c64-bf87ab01fe0e', 'd6ffded9-3495-4b70-8fda-cf88d95b19b9', '4762f4c5-411c-4c86-8af4-324355546ab9', 'a4cd92f7-5f0b-43a0-a9fa-c13fab0c12ff', '1b13bd4b-724a-4900-92ac-58d1f19a2f6f', 'cc730ea4-d7ee-4558-a2bd-0f50aea6e6eb', '53e4b27d-e8a2-4733-8427-29355443ba48', '1038bfd5-c637-493f-95c7-f294b8a84fd6', '81d688b5-d3c2-4c83-8572-b82423525f5b', '5c1b23af-e735-408d-94c3-c0146b19b180', 'd61b4d0a-4897-4403-afd6-d84fc909caa6', 'e560745b-6da7-4b5f-b147-7b4f1fea3cd8', '8b9795dc-4b56-4271-afe8-ef382ea6bff2', '99f5e2be-1ceb-459b-b5fa-cdd468cb6197', 'd4805da1-cb4a-420a-95a4-daee6f7f0fd9', 'ea604298-bb16-4d84-b76a-ce6d7c8c3262', 'b8581687-2b4e-4f2f-91a9-9128f5bec540', '5a456d15-0fdd-4418-a149-b13ebe2e3738', '94eaf60d-67fa-4081-a37e-048dfa107f36', '787fa83d-b044-4c30-a7dd-50551c9badb1', '4c3c10e6-6371-463a-af5f-3bb5212aa05e'] |
1538557251 | [] |
1841451556 | ['c4ab979a-e2ca-45f6-9e5d-07a49bd52d9e', '5ad8ff7d-5227-46d7-a392-d28665610257', '32fbf66a-07e0-412f-944c-4238c1366303', '0374d936-3b01-451c-bb4e-3d872830d202', 'dc059078-9115-488f-bdf6-e9106822225a', 'daaa1ac3-eb9d-424b-a249-8d7c51d39ce1', 'ae0b1dd0-7356-41ee-8df4-871a52451eb7', '5495056f-faff-4144-b5f8-7fef1705f5ff', 'c656776f-2979-463a-9de3-1e5ae99d8ad2', 'd280371e-f9f1-42a2-aed5-fcb0fe89c219', '3e1ac59e-223a-469d-bb30-63f6988e4134', '5da15840-6f46-4978-8747-8be580d48079', '9eb760da-dade-475d-9390-0c998ff50fcc', 'ad3204b8-1355-48f4-9ee3-194fc5d970f1', '2d5231c0-2d29-4026-9138-5dcbd0591404', 'e81b5005-c300-4b73-9c30-75a0a438e312', 'edfdd7b3-5732-48eb-9814-b0f28a6c90a4', '6ab9d1d3-b969-4352-9fac-68b6494cbe7b', '67f5d31d-88f6-4f7a-9b1d-74667ca37f08', '181a3fbb-0b05-4fd1-83ff-c8c596563534', 'a1ad7230-1275-447e-96e5-1ebfa1a9022b', 'd6ea2513-25df-41f8-9dbf-c56916e98041', 'b10209f5-96d9-4bbc-8fda-38590460a34a', '1e41cffb-e231-41a5-b088-5f356177d721', 'db401fa1-945b-4f0e-ac38-fdbbe590a798', '7ee3d179-6f3f-42fd-a466-0a42c36c3532', '8517ef20-844d-4152-8690-1b8dda67e159', 'cfb5d792-2d33-444f-a1e8-5487de63e9a0', '51a98931-aace-42c1-8890-66ce53eb1971', 'a6e318c6-cf38-4d65-8bd0-6976b647974b', 'adbfbd00-de76-4694-9786-3c53ba965936', 'ba8cc746-665a-4925-8f5f-3b213dad238d', '36f607e8-2785-411e-8384-f17195f09468', 'a003a89c-eecb-470b-938d-1b97225f3d81', '5933a762-9e94-4b8b-bcc6-2d29d34acb7b', '6d2b08ec-2f67-4a5b-ac89-642f15e032b8', '3262782b-54c0-4d9d-b46e-9ea4ddfe11f5', '6fa0da45-27ad-4f64-84c6-454c28e6d9bb', 'ab996d6b-a0d7-4a21-91fb-498f2f446b6c', 'c6df80aa-a974-433b-9d95-13fc3bbe3480', '5bea2eb5-a841-46b1-8952-0cfd3bdb1708', '730a2f48-0cb6-49b8-a5bb-807534ef0f24', '12403618-49d5-43ee-99ad-5e99194fe05c', '38c1e68c-7591-4244-a1a4-8ac014da340c', 'f5b0dad0-cbf2-4ad6-bdf7-cf5f26743b4c', '69af8b6c-06b3-4b66-91c5-ede90be026d8', '008e2415-00a0-4ff5-acea-ba8f4f136db3', '61a6872e-3c89-4c93-b9b2-ca6ca1db7a8e', 'd4357c57-76d9-4fda-94b6-16ca8869cd45', '7dce5e40-f667-4b7c-b411-dc505f8d4143', '3577526f-463a-4af7-8696-4e31408c51e8', 'ebf487b7-36a0-4ba9-84f6-c2f5f004e06a', '534fe74b-3ce4-4bc1-8fba-ec5836f37d48', '5b7ba09b-9a42-454a-84a6-bc1ba7449379', '84d17e89-55e7-4a08-8468-f52780b926e4', 'a3c11024-243e-41fb-94ba-2bba1de12e1c', '2e7f2d0a-308e-43d4-9bfb-227a7571550f', 'c94db5a7-e5f7-4d1f-9ff9-7ba8e683b57d', '7479e338-e4f2-43c2-b1ca-d33fb6727961', 'df6374fa-0f70-4fa7-a0b9-47c9233f9f14', 'f5b8e6f3-7311-4d71-aec4-02088111f7b0', '1584fed3-37f3-46a9-822e-b56f5c3a7367', 'f2466b1a-782f-4cac-a204-cacb50b8f322', '6417d256-4fdb-4a60-903b-8f42b0405653', '4a1b53fe-79e5-464c-83d9-14143306c02c', '2b9309d3-f2da-4d39-8384-187cf33b2a95', '37bc5e7f-9655-416a-9938-b10ea5ce3e9a', 'eecb0179-615f-42cd-9fc0-8f7570f4de63', '0bcbfcff-0e3f-47bc-a32e-a52add577d5b', '9d4642de-8626-4889-8315-9e7ec4f1cd1f', '71e54292-85b7-4cff-a1f0-e06e92f64b5e', '672921ed-9c68-45a5-b808-3df8261d8ff0', 'eda43d1e-88e5-440e-8570-6f8197037ec8', '89209861-c3b0-4a50-9ccb-bd21eae71fbf', 'de8b20b6-a2e3-460a-9018-9d89c009cef8', 'e55400ce-a20a-4bc3-8792-b8e3f5449248', '40e72184-ff85-456b-a794-35967bc8cfb2', '4bb4a413-a23c-49fe-bec7-d8621560205b', 'd122f267-a255-4eed-afa1-8252e2cbdc17', '1d76bc02-2621-4041-914a-9b9e28babeff', '63642695-9e9a-458c-97a2-a9db8cc1c9ed', '512ccd7d-a2ed-4613-9c53-232c85a50878', 'ee9db5f9-de43-4c3e-a156-d5ba8ba1f6b3', 'd8addf29-1054-4ccb-b179-dda65f7fefdd', '931810bc-cd6d-4bb6-a77a-7c979fbac3fd', 'f8af8717-40ea-40f1-b721-b627163f9efb', 'c5734165-3e62-4f14-9559-1e5caec18179', '0b94bcdf-dc51-495a-b7bd-2c317f2778a8', 'c912b33b-c624-41e5-987e-125d7f78f1f5', 'a0dd1e6d-ccf8-4f09-bbd1-e4fcbdf438ac', '78799ee1-c078-4dc8-a843-1df3aaad5960', 'c1ba934d-cf6c-462d-af28-9cc92ab24937', 'a2291231-46dd-407e-a228-135ad229e056', '4c388b98-8aca-4c28-bd43-74ab84d438f2', 'dd7d906c-8a34-4ae9-ac75-633150b39869', '01047fb0-4036-4dcd-81e0-62e6cef4e3e8', '1c27a9d5-3412-41f7-bde2-87eba4abcb5e', '9161f466-8e7e-493d-bd61-d647755dbc53', '5a7af96b-f6b1-4b6c-9f6b-c5f33377b1df', '3a4b2c17-4235-4982-b275-4018c5283ab4', 'ed3f6f88-a4ad-4b56-9d31-8d086ddd1521', '40e1a551-3a18-4483-9f36-435ccbae03d6', '9c236fd5-0cd3-49dd-9cbb-500905805c74', 'b0c8f4fc-ec98-493b-a560-3c21e652319d', '413e5c40-5f12-47c2-9142-4ee27ff61372', '4493db22-1f7c-4e10-89c4-9663133d87ef', '5b7af866-43fc-4071-9878-b366df20993e', 'c1710528-c7c8-4118-b8a3-5c041eb7e9ab', '1d5c6bc2-1f15-447a-a139-0437720034dd', '837c4cd1-4324-44eb-8af3-5bcf2622d8be', '9fe2c75c-eef5-477b-af9e-1f0e8ef180a0', 'e31e5a23-68bf-4675-8f8d-054c99031df3', 'f67a5513-7c63-4243-afcf-84684202f901', 'd1c4b24d-2136-4e8d-8497-8ff1fced234d', '1ec09209-7e22-4698-981d-9cbb150c5746', 'a21318a9-a835-4e15-87b2-da893031b6ce', '297392f2-c6f3-4fef-8f8a-a4b5a11a93ba', '8ae49175-1941-4266-9ec4-20488802e24c', 'f2d90ee2-d68f-4445-8c64-bf87ab01fe0e', '9b9a9438-7ab1-4193-a5db-8d7b13c7cfca', 'a8cd5919-d517-4a37-90f5-e6b472eb0ac5', '4762f4c5-411c-4c86-8af4-324355546ab9', '54ade7ac-1df4-4ba5-b162-36d7e392f8cf', '5b78d187-c9db-477e-97ed-54574053ab2b', 'a4cd92f7-5f0b-43a0-a9fa-c13fab0c12ff', '1b13bd4b-724a-4900-92ac-58d1f19a2f6f', '9e2ddab2-fa29-4077-823c-40aaefe01299', '53e4b27d-e8a2-4733-8427-29355443ba48', '1038bfd5-c637-493f-95c7-f294b8a84fd6', 'ab8e7509-0aca-447d-939d-0f80da0b1bf9', '81d688b5-d3c2-4c83-8572-b82423525f5b', '05ac5752-da26-4347-a3d2-8b13b008fd70', '5c1b23af-e735-408d-94c3-c0146b19b180', '2b8503a8-9ed9-4481-9dd3-23beeee297e9', '8b9795dc-4b56-4271-afe8-ef382ea6bff2', '99f5e2be-1ceb-459b-b5fa-cdd468cb6197', 'a207d83f-6117-4874-80aa-44958693b8b6', '6584c17c-b804-4f94-ad34-cd8e63f76162', '8652d100-4702-42c7-9bdd-308d43a04282', 'd4805da1-cb4a-420a-95a4-daee6f7f0fd9', 'ea604298-bb16-4d84-b76a-ce6d7c8c3262', '5278b602-0ea2-4716-af69-9b1560a56ef6', 'b8581687-2b4e-4f2f-91a9-9128f5bec540', '5a456d15-0fdd-4418-a149-b13ebe2e3738', '94eaf60d-67fa-4081-a37e-048dfa107f36', '787fa83d-b044-4c30-a7dd-50551c9badb1', '4c3c10e6-6371-463a-af5f-3bb5212aa05e', 'a0587784-a863-47c3-84ae-0dfda49a4be0'] |
Next, how do we calculate the most popular location? in this dataset. Remember, popular here means the most number of providers practicing at that location.
%%sql
SELECT
record->'npi' as provider,
json_array_elements(record->'locations')->'uuid' as location
FROM provider_json
LIMIT 5;
* postgresql://read_user:***@server/dbname
5 rows affected.
provider | location |
---|---|
1669554267 | 8d96e755-65d3-470f-baaf-c20c73d673bb |
1467524520 | 0decdff6-1703-4d4a-9821-caad045c2f70 |
1467524520 | 7974eec5-2760-4642-9ce8-b2401f33c61f |
1467524520 | 3840e421-0e5b-498b-b937-2eb75ef3e09b |
1467524520 | 48a62df8-f2f4-4320-905a-c67600804be6 |
%%sql
SELECT
location::text,
COUNT(provider) as num_providers
FROM (SELECT
npi as provider,
json_array_elements(record->'locations')->'uuid' as location
FROM provider_json) as l
GROUP BY 1
ORDER BY COUNT(provider) DESC
LIMIT 1;
* postgresql://read_user:***@server/dbname
1 rows affected.
location | num_providers |
---|---|
"50c425a1-4cdd-49fe-9ce6-25fc85938262" | 45 |
Now that we found the most popular location in this dataset, we are going to find all the providers who practise at this location and list out their insurances.
%%sql
SELECT
provider
FROM(
SELECT
record->'npi' as provider,
json_array_elements(record->'locations') as location
FROM provider_json) as l
WHERE location->>'uuid' = '50c425a1-4cdd-49fe-9ce6-25fc85938262'
* postgresql://read_user:***@server/dbname
45 rows affected.
provider |
---|
1104849025 |
1831410646 |
1417986258 |
1568649556 |
1023502028 |
1609040443 |
1669670618 |
1760408900 |
1861987174 |
1588158570 |
1043500259 |
1013320191 |
1588953020 |
1295932960 |
1497043533 |
1750427480 |
1518983733 |
1871511329 |
1124513841 |
1467455964 |
1497240899 |
1952854002 |
1316963630 |
1659471795 |
1134146632 |
1780776799 |
1871014654 |
1841786894 |
1063443570 |
1275641813 |
1588046437 |
1306367644 |
1760592794 |
1841546876 |
1114367794 |
1083137764 |
1639338114 |
1194246314 |
1992227045 |
1417185588 |
1689024234 |
1316236482 |
1518173194 |
1629096177 |
1922237858 |
%%sql
SELECT
COUNT(distinct ins.insurances::text)
FROM(SELECT
record->>'npi' as provider,
json_array_elements(record->'insurances') as insurances
FROM provider_json) as ins
INNER JOIN (SELECT
provider
FROM(SELECT
record->>'npi' as provider,
json_array_elements(record->'locations') as location
FROM provider_json) as l
WHERE location->>'uuid' = '50c425a1-4cdd-49fe-9ce6-25fc85938262'
) as loc
ON ins.provider = loc.provider
* postgresql://read_user:***@server/dbname
1 rows affected.
count |
---|
361 |
Let’s clean this up a bit, by putting them into CTEs
%%sql
WITH location_cte as
(
SELECT
npi as provider,
json_array_elements(record->'locations')->>'uuid' as location
FROM provider_json
),
popular_location as
(
SELECT
location
FROM location_cte as l
GROUP BY 1
ORDER BY COUNT(provider) DESC
LIMIT 1
),
providers_at_popular_location as
(
SELECT
provider
FROM location_cte
WHERE location IN (SELECT location FROM popular_location)
)
SELECT
COUNT(distinct ins.insurances::text)
FROM(SELECT
record->>'npi' as provider,
json_array_elements(record->'insurances') as insurances
FROM provider_json) as ins
INNER JOIN providers_at_popular_location as pop_prov
ON ins.provider = pop_prov.provider::text
* postgresql://read_user:***@server/dbname
1 rows affected.
count |
---|
361 |
Exploring providers_json
table
What is the average age of the providers in providers_json
?
%%sql
SELECT
FLOOR(AVG((record ->> 'age')::int)) as age
FROM provider_json;
* postgresql://read_user:***@server/dbname
1 rows affected.
age |
---|
57 |
Note: 68% of records have missing values for age
.
%%sql
SELECT (COUNT(*) - COUNT(record ->> 'age')) * 1.0/ COUNT(*) as perc_missing
FROM provider_json;
* postgresql://read_user:***@server/dbname
1 rows affected.
perc_missing |
---|
0.68327291558777835318 |
What is the most popular specialty of the providers in providers_json
?
%%sql
SELECT
sp.display as most_popular_specialty
FROM (SELECT
json_array_elements_text(record -> 'specialties') as specialty
FROM provider_json) as s
JOIN specialties as sp
ON s.specialty::uuid = sp.uuid
GROUP BY 1
ORDER BY COUNT(*) DESC
LIMIT 1
* postgresql://read_user:***@server/dbname
1 rows affected.
most_popular_specialty |
---|
Internal Medicine |
Providers practising at locations with high confidence score
Question: You’ll notice within all locations objects of the providers_json record field, there exists a “confidence” score. This is a representation of our model deciding the probability that a provider is indeed practicing at this location. We allow scores between 0 and 5. How many providers have at least 2 locations with confidence of 4 or higher?
%%sql
SELECT
COUNT(DISTINCT provider) as num_providers_high_conf_score
FROM (SELECT
provider::text
FROM (SELECT
record->'npi' as provider,
json_array_elements(record->'locations') as location
FROM provider_json) as l
WHERE (location->>'confidence')::int>=4
GROUP BY provider::text
HAVING COUNT(location)>=2) as num_p
* postgresql://read_user:***@server/dbname
1 rows affected.
num_providers_high_conf_score |
---|
2054 |
Number of unique insurances at most popular location
Question: In all provider records, you’ll see a field called “insurances”. This is the unique list of all insurance plans a provider accepts (we represent this as UUID which connects to the insurances table). For now, let’s assume a provider accepts all of these insurance plans at all locations they practice at. Find the total number of unique insurance plans accepted by all providers at the most popular location of this data set. (Popular = the most providers practice there)
%%sql
WITH location_cte as
(
SELECT
npi as provider,
json_array_elements(record->'locations')->>'uuid' as location
FROM provider_json
),
popular_location as
(
SELECT
location
FROM location_cte as l
GROUP BY 1
ORDER BY COUNT(provider) DESC
LIMIT 1
),
providers_at_popular_location as
(
SELECT
provider
FROM location_cte
WHERE location IN (SELECT location FROM popular_location)
)
SELECT
COUNT(distinct ins.insurances::text) as total_unique_insurances_at_most_popular_location
FROM providers_at_popular_location as pop_prov
LEFT JOIN (SELECT
record->>'npi' as provider,
json_array_elements(record->'insurances') as insurances
FROM provider_json) as ins
ON pop_prov.provider::text = ins.provider
* postgresql://read_user:***@server/dbname
1 rows affected.
total_unique_insurances_at_most_popular_location |
---|
361 |
Exploring provider_puf_data
Now let’s take a look at another dataset which contains the following data.
provider_puf_data (npi bigint, source text, address json, plans json, created_at timestamp)
: This table contains a couple snapshots of government data we ingest. For this table, we ingested data in January and June of this year, you can leverage the “created_at” field to differentiate these times. The source field here represents a unique insurance broker. The address and plans fields represent the address and plans the broker has for this NPI. You can assume there is a uniqueness constraint on (npi, source, address, created_at). Thus, one source/broker could have many addresses and plan combinations for each NPI.
%%sql
SELECT column_name, data_type
FROM information_schema.columns
WHERE table_name = 'provider_puf_data' AND table_schema = 'public';
* postgresql://read_user:***@server/dbname
5 rows affected.
column_name | data_type |
---|---|
npi | bigint |
source | text |
address | json |
plans | json |
created_at | timestamp without time zone |
%%sql
SELECT * FROM provider_puf_data LIMIT 2;
* postgresql://read_user:***@server/dbname
2 rows affected.
npi | source | address | plans | created_at |
---|---|---|---|---|
1003038803 | https://api.centene.com/ambetter/reference/cms-data-index.json | {'zip': '98837', 'city': 'Moses Lake', 'phone': '5097655606', 'state': 'WA', 'address': '801 East Wheeler Road'} | [{'years': [2018, 2019], 'plan_id': '61836WA0050002', 'network_tier': 'PREFERRED', 'plan_id_type': 'HIOS-PLAN-ID'}, {'years': [2018, 2019], 'plan_id': '61836WA0050003', 'network_tier': 'PREFERRED', 'plan_id_type': 'HIOS-PLAN-ID'}, {'years': [2018, 2019], 'plan_id': '61836WA0050006', 'network_tier': 'PREFERRED', 'plan_id_type': 'HIOS-PLAN-ID'}, {'years': [2018, 2019], 'plan_id': '61836WA0050007', 'network_tier': 'PREFERRED', 'plan_id_type': 'HIOS-PLAN-ID'}, {'years': [2018, 2019], 'plan_id': '61836WA0090002', 'network_tier': 'PREFERRED', 'plan_id_type': 'HIOS-PLAN-ID'}, {'years': [2018, 2019], 'plan_id': '61836WA0090003', 'network_tier': 'PREFERRED', 'plan_id_type': 'HIOS-PLAN-ID'}, {'years': [2018, 2019], 'plan_id': '61836WA0090005', 'network_tier': 'PREFERRED', 'plan_id_type': 'HIOS-PLAN-ID'}, {'years': [2018, 2019], 'plan_id': '61836WA0100001', 'network_tier': 'PREFERRED', 'plan_id_type': 'HIOS-PLAN-ID'}] | 2019-06-25 00:00:00 |
1003038803 | https://esbgatewaypub.medica.com:443/rest/QHP/cms-data-index.json?HIOSID=20305&fmt=json | {'zip': '55009', 'city': 'Cannon Falls', 'phone': '5072634221', 'state': 'MN', 'address': '32021 County 24 Blvd'} | [{'years': [2019], 'plan_id': '20305NE0040001', 'network_tier': 'PREFERRED', 'plan_id_type': 'HIOS-PLAN-ID'}, {'years': [2019], 'plan_id': '20305NE0040003', 'network_tier': 'PREFERRED', 'plan_id_type': 'HIOS-PLAN-ID'}, {'years': [2019], 'plan_id': '20305NE0040005', 'network_tier': 'PREFERRED', 'plan_id_type': 'HIOS-PLAN-ID'}, {'years': [2019], 'plan_id': '20305NE0040011', 'network_tier': 'PREFERRED', 'plan_id_type': 'HIOS-PLAN-ID'}, {'years': [2019], 'plan_id': '20305NE0040013', 'network_tier': 'PREFERRED', 'plan_id_type': 'HIOS-PLAN-ID'}, {'years': [2019], 'plan_id': '20305NE0040023', 'network_tier': 'PREFERRED', 'plan_id_type': 'HIOS-PLAN-ID'}] | 2019-01-08 00:00:00 |
%%sql
SELECT COUNT(DISTINCT source) FROM provider_puf_data;
* postgresql://read_user:***@server/dbname
1 rows affected.
count |
---|
114 |
%%sql
SELECT
source,
COUNT(*)
FROM provider_puf_data
GROUP BY source
ORDER BY count desc
LIMIT 5;
* postgresql://read_user:***@server/dbname
5 rows affected.
source | count |
---|---|
https://www.bcbsil.com/ | 6748 |
https://www.bcbstx.com/forms/tx/index_tx.JSON | 6520 |
https://www.bcbsmt.com/forms/mt/index_mt.JSON | 5669 |
https://www.bcbsnm.com/forms/nm/index_nm.JSON | 5652 |
https://api.centene.com/ambetter/reference/cms-data-index.json | 3225 |
Question: Who are the top 3 sources/brokers from provider_puf_data which had the most rows added or dropped between January and June. NOTE: If a source didn’t exist in January or June, treat that source like it has 0 rows for that date.
Check my other post
Additional PostgreSQL JSON functions
json_each()
function
The json_each() function allows us to expand the outermost JSON object into a set of key-value pairs. See the following statement:
%%sql
SELECT * FROM provider_json LIMIT 1;
* postgresql://read_user:***@server/dbname
1 rows affected.
npi | record |
---|---|
1669554267 | {'npi': '1669554267', 'first_name': 'Hyung', 'middle_name': 'Soon', 'last_name': 'Park', 'age': 83, 'gender': 'm', 'ratings_count': 1, 'ratings_avg': 2.0, 'degrees': [], 'specialties': ['c741df78-8460-46f6-a2ce-93de36eb7205'], 'languages': [], 'educations': [], 'insurances': [], 'provider_types': ['Doctor'], 'locations': [{'uuid': '8d96e755-65d3-470f-baaf-c20c73d673bb', 'name': None, 'address': '2296 Opitz Blvd # 255, Woodbridge, VA 22191, US', 'address_details': {'street': '2296 Opitz Blvd # 255', 'address_line_1': '2296 Opitz Blvd', 'address_line_2': '# 255', 'city': 'Woodbridge', 'state': 'VA', 'zip': '22191'}, 'latitude': 38.63739, 'longitude': -77.28514129999999, 'google_maps_link': 'https://www.google.com/maps/@38.63739,-77.28514129999999?q=2296%20Opitz%20Blvd%20%23%20255%2C%20Woodbridge%2C%20VA%2022191%2C%20US', 'phone_numbers': [{'phone': '7036704986', 'details': 'primary'}], 'confidence': 2}], 'online_profiles': [{'url': 'https://www.healthgrades.com/physician/dr-hyung-park-x3syc'}]} |
%%sql
SELECT
json_each(record)
FROM
provider_json
LIMIT 20;
* postgresql://read_user:***@server/dbname
20 rows affected.
json_each |
---|
(npi,"""1669554267""") |
(first_name,"""Hyung""") |
(middle_name,"""Soon""") |
(last_name,"""Park""") |
(age,83) |
(gender,"""m""") |
(ratings_count,1) |
(ratings_avg,2.0) |
(degrees,[]) |
(specialties,"[""c741df78-8460-46f6-a2ce-93de36eb7205""]") |
(languages,[]) |
(educations,[]) |
(insurances,[]) |
(provider_types,"[""Doctor""]") |
(locations,"[{""uuid"": ""8d96e755-65d3-470f-baaf-c20c73d673bb"", ""name"": null, ""address"": ""2296 Opitz Blvd # 255, Woodbridge, VA 22191, US"", ""address_details"": {""street"": ""2296 Opitz Blvd # 255"", ""address_line_1"": ""2296 Opitz Blvd"", ""address_line_2"": ""# 255"", ""city"": ""Woodbridge"", ""state"": ""VA"", ""zip"": ""22191""}, ""latitude"": 38.63739, ""longitude"": -77.28514129999999, ""google_maps_link"": ""https://www.google.com/maps/@38.63739,-77.28514129999999?q=2296%20Opitz%20Blvd%20%23%20255%2C%20Woodbridge%2C%20VA%2022191%2C%20US"", ""phone_numbers"": [{""phone"": ""7036704986"", ""details"": ""primary""}], ""confidence"": 2}]") |
(online_profiles,"[{""url"": ""https://www.healthgrades.com/physician/dr-hyung-park-x3syc""}]") |
(npi,"""1467524520""") |
(first_name,"""Rohit""") |
(middle_name,null) |
(last_name,"""Bhaskar""") |
json_each_text()
function
If you want to get a set of key-value pairs as text, you use the json_each_text()
function instead.
%%sql
SELECT
json_each_text(record)
FROM
provider_json
LIMIT 20;
* postgresql://read_user:***@server/dbname
20 rows affected.
json_each_text |
---|
(npi,1669554267) |
(first_name,Hyung) |
(middle_name,Soon) |
(last_name,Park) |
(age,83) |
(gender,m) |
(ratings_count,1) |
(ratings_avg,2.0) |
(degrees,[]) |
(specialties,"[""c741df78-8460-46f6-a2ce-93de36eb7205""]") |
(languages,[]) |
(educations,[]) |
(insurances,[]) |
(provider_types,"[""Doctor""]") |
(locations,"[{""uuid"": ""8d96e755-65d3-470f-baaf-c20c73d673bb"", ""name"": null, ""address"": ""2296 Opitz Blvd # 255, Woodbridge, VA 22191, US"", ""address_details"": {""street"": ""2296 Opitz Blvd # 255"", ""address_line_1"": ""2296 Opitz Blvd"", ""address_line_2"": ""# 255"", ""city"": ""Woodbridge"", ""state"": ""VA"", ""zip"": ""22191""}, ""latitude"": 38.63739, ""longitude"": -77.28514129999999, ""google_maps_link"": ""https://www.google.com/maps/@38.63739,-77.28514129999999?q=2296%20Opitz%20Blvd%20%23%20255%2C%20Woodbridge%2C%20VA%2022191%2C%20US"", ""phone_numbers"": [{""phone"": ""7036704986"", ""details"": ""primary""}], ""confidence"": 2}]") |
(online_profiles,"[{""url"": ""https://www.healthgrades.com/physician/dr-hyung-park-x3syc""}]") |
(npi,1467524520) |
(first_name,Rohit) |
(middle_name,) |
(last_name,Bhaskar) |
json_object_keys()
function
To get a set of keys in the outermost JSON object, you use the json_object_keys() function. The following query returns all keys of the nested record
object. NOTE: While you may think that you can get the keys inside the locations
field using this function, we can’t, since locations and specialties are both arrays. We will learn about arrays in detail.
%%sql
SELECT
json_object_keys(record)
FROM
provider_json
LIMIT 20;
* postgresql://read_user:***@server/dbname
20 rows affected.
json_object_keys |
---|
npi |
first_name |
middle_name |
last_name |
age |
gender |
ratings_count |
ratings_avg |
degrees |
specialties |
languages |
educations |
insurances |
provider_types |
locations |
online_profiles |
npi |
first_name |
middle_name |
last_name |
json_typeof()
function
The json_typeof() function returns type of the outermost JSON value as a string. It can be number
, boolean
, null
, object
, array
, and string
.
The following query return the data type of the record:
%%sql
SELECT
json_typeof(record->'locations')
FROM provider_json
LIMIT 10;
* postgresql://read_user:***@server/dbname
10 rows affected.
json_typeof |
---|
array |
array |
array |
array |
array |
array |
array |
array |
array |
array |
Appendix
Useful links: