Resampling a time series: Downsampling

3 minute read

Downsampling & aggregation

In this post, we will learn how to downsample, that is, how to reduce the frequency of a time series. This includes, for instance, converting hourly data to daily data, or daily data to monthly data. In these cases, we need to decide how to summarize the existing data as, say, 24 hours become a single day. Our options are familiar aggregation statistics such as the mean or median, or simply the last value, and our choice will depend on the context.
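
To see how the choice of aggregation matters, here is a minimal sketch using a small, made-up hourly series (not one of the datasets below) that downsamples the same data to daily frequency in three different ways:

import pandas as pd

# A small, hypothetical hourly series to illustrate the aggregation options
hourly = pd.Series(range(48), index=pd.date_range('2017-01-01', periods=48, freq='H'))

# The same downsampling step with three different summaries
print(hourly.resample('D').mean())    # daily average
print(hourly.resample('D').median())  # daily median
print(hourly.resample('D').last())    # last value of each day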

We will first look at ozone data for both NYC and LA since 2000 to compare the air quality trend at weekly, monthly and annual frequencies and explore how different resampling periods impact the visualization.

import pandas as pd
import matplotlib.pyplot as plt

# Import and inspect data here
ozone = pd.read_csv('data/air_quality_data/ozone_nyla.csv', index_col='date', parse_dates=True)
print(ozone.info())

# Calculate and plot the weekly average ozone trend
ozone.resample('W').mean().plot(figsize=(8, 8))
plt.show()

# Calculate and plot the monthly average ozone trend
ozone.resample('M').mean().plot(figsize=(8, 8))
plt.show()

# Calculate and plot the annual average ozone trend
ozone.resample('A').mean().plot(figsize=(8, 8))
plt.show()
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 6291 entries, 2000-01-01 to 2017-03-31
Data columns (total 2 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   Los Angeles  5488 non-null   float64
 1   New York     6167 non-null   float64
dtypes: float64(2)
memory usage: 147.4 KB
None

[Plot: weekly average ozone trend for Los Angeles and New York]

[Plot: monthly average ozone trend for Los Angeles and New York]

[Plot: annual average ozone trend for Los Angeles and New York]

See how changing the resampling period changes the plot of the time series? Longer resampling periods smooth out short-term fluctuations, which can make the underlying trend easier to observe.

Compare monthly average stock prices for Facebook and Google

Now we will apply our new resampling skills to the daily stock price series for Facebook and Google over 2015-2016 and compare the trends of their monthly averages.

# Import and inspect data here
stocks = pd.read_csv('data/stock_data/goog_fb.csv', index_col='date', parse_dates=True)
print(stocks.info())

# Calculate and plot the monthly averages
monthly_average = stocks.resample('M').mean()
monthly_average.plot(subplots=True, figsize=(8,8))
plt.show()
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 504 entries, 2015-01-02 to 2016-12-30
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   FB      504 non-null    float64
 1   GOOG    504 non-null    float64
dtypes: float64(2)
memory usage: 11.8 KB
None

[Plot: monthly average stock prices for FB and GOOG, shown as subplots]

Compare quarterly GDP growth rate and stock returns

With our new downsampling and aggregation skills, we can compare a higher-frequency stock price series to a lower-frequency economic time series.

As a first example, let’s compare the quarterly GDP growth rate to the quarterly rate of return on the (resampled) Dow Jones Industrial Average, an index of 30 large US stocks.

GDP growth is reported at the beginning of each quarter for the previous quarter. To calculate matching stock returns, we’ll resample the stock index to quarter-start frequency using the alias 'QS' and aggregate with .first() to take the first observation of each quarter.

# Import and inspect gdp_growth here
gdp_growth = pd.read_csv('data/stock_data/gdp_growth.csv', index_col='date', parse_dates=True)
print(gdp_growth.info())


# Import and inspect djia here
djia = pd.read_csv('data/stock_data/djia.csv', index_col='date', parse_dates=True)
print(djia.info())

# Calculate djia quarterly returns here
djia_quarterly = djia.resample('QS').first()
djia_quarterly_return = djia_quarterly.pct_change().mul(100)

# Concatenate, rename and plot djia_quarterly_return and gdp_growth here
data = pd.concat([gdp_growth, djia_quarterly_return], axis=1)
data.columns = ['gdp', 'djia']
data.plot(figsize=(8,8))
plt.show()
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 41 entries, 2007-01-01 to 2017-01-01
Data columns (total 1 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   gdp_growth  41 non-null     float64
dtypes: float64(1)
memory usage: 656.0 bytes
None
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 2610 entries, 2007-06-29 to 2017-06-29
Data columns (total 1 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   djia    2519 non-null   float64
dtypes: float64(1)
memory usage: 40.8 KB
None

[Plot: quarterly GDP growth rate vs. quarterly DJIA returns]

Do you think there’s a relationship between GDP and the Dow Jones?
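
One quick way to put a number on it is to correlate the two series we just plotted. This is only a rough sketch reusing the data DataFrame built above, and the result will depend on the sample period:

# Correlate quarterly GDP growth with quarterly DJIA returns,
# dropping quarters where either value is missing
print(data.dropna().corr())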

Visualize monthly mean, median and standard deviation of S&P500 returns

We have also learned how to calculate several aggregate statistics from downsampled data in a single step.

Let’s use this to explore how the monthly mean, median and standard deviation of daily S&P500 returns have trended over the last 10 years.

# Import data here
sp500 = pd.read_csv('data/stock_data/sp500.csv', index_col='date', parse_dates=True)

# Convert sp500 to a pd.Series() using .squeeze(), then apply .pct_change() to calculate daily returns
daily_returns = sp500.squeeze().pct_change()

Next, we .resample() daily_returns to month-end frequency (alias 'M') and apply .agg() to calculate the ‘mean’, ‘median’, and ‘std’, assigning the result to stats.

# Resample and calculate statistics
stats = daily_returns.resample('M').agg(['mean', 'median', 'std'])

# Plot stats here
stats.plot(figsize=(8, 8))
plt.show()

[Plot: monthly mean, median and standard deviation of daily S&P 500 returns]

We have seen several downsampling techniques in this post. Next, we will see how to apply rolling window calculations.