Resampling a time series: Downsampling
Downsampling & aggregation
In this post, we will learn how to downsample, that is, how to reduce the frequency of a time series. This includes, for instance, converting hourly data to daily data, or daily data to monthly data. In either case, we need to decide how to summarize the existing data as, say, 24 hours become a single day. Our options are familiar aggregation metrics such as the mean or median, or simply the last value, and our choice will depend on the context.
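To make the choice of aggregation concrete, here is a minimal sketch on a made-up hourly series (the values are synthetic, purely for illustration) showing how the mean, median and last value summarize the same 24 hourly observations differently:

```python
import numpy as np
import pandas as pd

# Hypothetical hourly series: 48 synthetic readings spanning two days
hourly = pd.Series(
    np.arange(48, dtype=float),
    index=pd.date_range('2017-01-01', periods=48, freq='h'),
)

# Downsample to daily frequency: each aggregation condenses 24 hourly values into one
daily_mean = hourly.resample('D').mean()      # average reading per day
daily_median = hourly.resample('D').median()  # middle reading per day
daily_last = hourly.resample('D').last()      # final reading of each day

print(daily_mean.tolist())  # [11.5, 35.5]
print(daily_last.tolist())  # [23.0, 47.0]
```

For a quantity like ozone concentration, the daily mean is a natural summary; for a stock price, the last value (the close) is usually what matters.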
Compare weekly, monthly and annual ozone trends for NYC & LA
We will first look at ozone data for both NYC and LA since 2000 to compare the air quality trend at weekly, monthly and annual frequencies and explore how different resampling periods impact the visualization.
import pandas as pd
import matplotlib.pyplot as plt
# Import and inspect data here
ozone = pd.read_csv('data/air_quality_data/ozone_nyla.csv', index_col='date', parse_dates=True)
print(ozone.info())
# Calculate and plot the weekly average ozone trend
ozone.resample('W').mean().plot(figsize=(8, 8))
plt.show()
# Calculate and plot the monthly average ozone trend
ozone.resample('M').mean().plot(figsize=(8, 8))
plt.show()
# Calculate and plot the annual average ozone trend
ozone.resample('A').mean().plot(figsize=(8, 8))
plt.show()
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 6291 entries, 2000-01-01 to 2017-03-31
Data columns (total 2 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Los Angeles 5488 non-null float64
1 New York 6167 non-null float64
dtypes: float64(2)
memory usage: 147.4 KB
None
See how changing the resampling period changes the plot of the time series? Longer periods smooth out short-term fluctuations and make the underlying trend easier to observe.
Compare monthly average stock prices for Facebook and Google
Now we will apply our new resampling skills to daily stock price series for Facebook and Google for the 2015-2016 period to compare the trend of the monthly averages.
# Import and inspect data here
stocks = pd.read_csv('data/stock_data/goog_fb.csv', index_col='date', parse_dates=True)
print(stocks.info())
# Calculate and plot the monthly averages
monthly_average = stocks.resample('M').mean()
monthly_average.plot(subplots=True, figsize=(8,8))
plt.show()
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 504 entries, 2015-01-02 to 2016-12-30
Data columns (total 2 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 FB 504 non-null float64
1 GOOG 504 non-null float64
dtypes: float64(2)
memory usage: 11.8 KB
None
Compare quarterly GDP growth rate and stock returns
With our new skills in downsampling and aggregating time series, we can compare a higher-frequency stock price series to lower-frequency economic time series.
As a first example, let’s compare the quarterly GDP growth rate to the quarterly rate of return on the (resampled) Dow Jones Industrial index of 30 large US stocks.
GDP growth is reported at the beginning of each quarter for the previous quarter. To calculate matching stock returns, we'll resample the stock index to quarter-start frequency using the alias 'QS', and aggregate using the .first() observation of each quarter.
# Import and inspect gdp_growth here
gdp_growth = pd.read_csv('data/stock_data/gdp_growth.csv', index_col='date', parse_dates=True)
print(gdp_growth.info())
# Import and inspect djia here
djia = pd.read_csv('data/stock_data/djia.csv', index_col='date', parse_dates=True)
print(djia.info())
# Calculate djia quarterly returns here
djia_quarterly = djia.resample('QS').first()
djia_quarterly_return = djia_quarterly.pct_change().mul(100)
# Concatenate, rename and plot djia_quarterly_return and gdp_growth here
data = pd.concat([gdp_growth, djia_quarterly_return], axis=1)
data.columns = ['gdp', 'djia']
data.plot(figsize=(8,8))
plt.show()
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 41 entries, 2007-01-01 to 2017-01-01
Data columns (total 1 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 gdp_growth 41 non-null float64
dtypes: float64(1)
memory usage: 656.0 bytes
None
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 2610 entries, 2007-06-29 to 2017-06-29
Data columns (total 1 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 djia 2519 non-null float64
dtypes: float64(1)
memory usage: 40.8 KB
None
Do you think there’s a relationship between GDP and the Dow Jones?
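One way to go beyond eyeballing the plot is to compute the correlation between the two series. A minimal sketch on made-up quarterly figures (the numbers below are illustrative stand-ins for the gdp and djia columns above, not the actual data):

```python
import pandas as pd

# Illustrative quarterly figures standing in for the 'gdp' and 'djia' columns above
idx = pd.date_range('2007-01-01', periods=8, freq='QS')
data = pd.DataFrame({
    'gdp':  [1.0, 2.0, 1.5, 3.0, 2.5, 3.5, 3.0, 4.0],
    'djia': [2.0, 5.0, 4.0, 7.0, 6.0, 9.0, 7.5, 10.0],
}, index=idx)

# Pearson correlation between the two columns; missing values are dropped pairwise
corr = data['gdp'].corr(data['djia'])
print(round(corr, 2))
```

A value near +1 indicates the two series move together; a value near 0 indicates no linear relationship. Keep in mind correlation says nothing about which series drives the other.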
Visualize monthly mean, median and standard deviation of S&P500 returns
We have also learned how to calculate several aggregate statistics from downsampled data.
Let’s use this to explore how the monthly mean, median and standard deviation of daily S&P500 returns have trended over the last 10 years.
# Import data here
sp500 = pd.read_csv('data/stock_data/sp500.csv', index_col='date', parse_dates=True)
# Calculate daily returns: convert sp500 to a pd.Series using .squeeze(), then apply .pct_change()
daily_returns = sp500.squeeze().pct_change()
# Resample and calculate statistics
stats = daily_returns.resample('M').agg(['mean', 'median', 'std'])
# Plot stats here
stats.plot(figsize=(8, 8))
plt.show()
We have seen several downsampling techniques in this post; next, we will see how to apply rolling window calculations.