Working with Dates and Times in Postgresql

Working with dates and times in Postgres

Here we will explore how to work with Date-Time data. As the name suggests, date/time refers to columns that store dates and/or times. There are two main types:

  • date: Dates only include year, month and day.
  • timestamp: Timestamps include a date plus a time. Times are specified in terms of hours(0-24), minutes, and seconds.

There is also a third date/time type: interval. Intervals represent time durations. Examples: 6 days 01:48:06, 00:51:03 (51 minutes 3 seconds). Columns could be of type interval, but it more common to find them as the result of subtracting one date or timestamp from another. Intervals will default to display the number of days, if any, and the time. Eg: 1 day 21:57:30 or 07:48:34.

Need for ISO 8601

Date/time data can be diffcult to work with because people have multiple ways of recording dates. Take a look at the formats here:

To address this ambiguity, Postgres stores date/time information according to something called the ISO 8601 standard. ISO 8601 specifies one way to record date/time information. The units are listed in order from the largest to the smallest - just like we write numbers. YYYY-MM-DD HH:MM:SS.

Each component has a fixed number of digits, so smaller values must be padded with a leading zero.

Comparisons

Date/time entries can be compared with each other just as numbers can. We can get the current timestamp with now() function. This can be useful when comparing values to the current date and time.

Subtraction

In addition to comparing date, we can also subtract them from each other. The result type is interval.

Addition

We can also add integer values to dates. But we need to first let Postgres know that the string is a date by casting it.

Examples and tips

Cast to timestamp when comparing dates to timestamps

When you have a timestamp column like lets say: date_created, and we are trying to fetch all the records on a given date by comparing date_created with a date, then if you specify only a date in a comparison, you may get unexpected results. This query:

SELECT count(*)
  FROM evanston311
 WHERE date_created = '2018-01-02';

returns 0, even though there were 49 requests on January 2, 2018. This is because dates are automatically converted to timestamps when compared to a timestamp. The time fields are all set to zero:

SELECT '2018-01-02'::timestamp;

Output:

2018-01-02 00:00:00

When working with both timestamps and dates, you’ll need to keep this in mind.

Completion time by category

The evanston311 data includes a date_created timestamp from when each request was created and a date_completed timestamp for when it was completed. The difference between these tells us how long a request was open.

Which category of Evanston 311 requests takes the longest to complete?

-- Select the category and the average completion time by category
SELECT category,
       AVG(date_completed - date_created) AS completion_time
  FROM evanston311
 GROUP BY category
-- Order the results
 ORDER BY completion_time desc;

Date/time components and aggregation

As with numerical and character data, sometimes we need to extract components of a date/time, or truncate the value so that we can aggregate the data in a meaningful way. Functions exist to extract individual components of date/time data. These components are called fields. The fields are defined in the Postgres documentation. Many are based on the ISO 8601 standard. Let’s look at some common fields starting with the largest unit of time.

First, we can get the century or decade that a timestamp belongs in. Next, we can get the year, month, and day fields that make up a date. We can also get the hour, minute and second fields that make up a time. Week is the week number in the year - based on the ISO 8601 definition. DOW is day of the week - the week starts with Sunday (with a value of 0) and ends on Saturday (with a value of 6).

To extract these fields from a date/timestamp you can use either of these functions. Expects a timestamp, but they automatically convert dates to timestamps. These two functions give the same output. They just have different syntax.

  • date_part(): It uses a comma to separate arguments, the name of the field should be surrounded by single quotes.
  • EXTRACT(): It uses the FROM keyword, the name of the field can be unquoted.

Extracting fields from dates can be useful when looking at how data varies by one unit of time across a larger unit of time. For example, how does sales vary by month across years? Suppose that we have individual sales from year 2010-2016, then we can answer questions like: Are sales in January usually higher than those in March?

Instead of extracting single fields, you can also truncate dates and timestamps to a specified level of precision. We can use the date_trunc() function to specify how much of a timestamp to keep.

date_trunc() replaces fields smaller than, or less significant than, the one specified with either zero or one, as appropriate. Month and day are set to 1, while times are set to 0.

Truncating dates is useful when you want to count, average, or sum data associated with timestamps or dates by larger units of time. For example, starting with individual timestamped sales transactions, what is the monthly trend in sales from June 2017 to January 2019?.

Example: Variation by day of week

Does the time required to complete a request vary by the day of the week on which the request was created?

We can get the name of the day of the week by converting a timestamp to character data:

to_char(date_created, 'day')

But character names for the days of the week sort in alphabetical, not chronological, order. To get the chronological order of days of the week with an integer value for each day, we can use:

EXTRACT(DOW FROM date_created)

Example:

-- Select name of the day of the week the request was created
SELECT to_char(date_created, 'day') AS day,
       -- Select avg time between request creation and completion
       AVG(date_completed - date_created) AS duration
  FROM evanston311
 -- Group by the name of the day of the week and
 -- integer value of day of week the request was created
 GROUP BY day, EXTRACT(DOW FROM date_created)
 -- Order by integer value of the day of the week
 -- the request was created
 ORDER BY EXTRACT(DOW FROM date_created);
day	        duration
sunday   	9 days, 1:47:22.572982
monday   	7 days, 0:56:40.041519
tuesday  	7 days, 2:56:21.726767
wednesday	7 days, 12:07:08.185632
thursday 	7 days, 10:23:30.633975
friday   	8 days, 10:44:09.025246
saturday 	7 days, 14:37:00.356259

date_trunc() example

Unlike date_part() or EXTRACT(), date_trunc() keeps date/time units larger than the field you specify as part of the date. So instead of just extracting one component of a timestamp, date_trunc() returns the specified unit and all larger ones as well.

Using date_trunc(), find the average number of Evanston 311 requests created per day for each month of the data. Ignore days with no requests when taking the average.

-- Aggregate daily counts by month
SELECT date_trunc('month', day) AS month,
       AVG(count)
  -- Subquery to compute daily counts
  FROM (SELECT date_trunc('day', date_created) AS day,
               COUNT(*) AS count
          FROM evanston311
         GROUP BY day) AS daily_count
 GROUP BY month
 ORDER BY month;
month	                    avg
2016-01-01 00:00:00+00:00	23.5161290322580645
2016-02-01 00:00:00+00:00	30.7241379310344828
2016-03-01 00:00:00+00:00	35.5483870967741935
2016-04-01 00:00:00+00:00	37.3000000000000000
2016-05-01 00:00:00+00:00	40.7666666666666667

Aggregating with date/time series

When counting observations by month or day, the result only includes rows for values that appear in your data. So then, How do you find periods of time with no observations?

  • generate_series() function can be used to generate a series of timestamps. It expects two timestamps: from and to. Dates will automatically be cast to a timestamp. The last argument is the interval. For example, here we have an interval of two days:

The result is a series of timestamps between start and end values separated by the interval. Here is another example of interval of hours.