In the earlier post, we looked at the correlation of two time series.
Autocorrelation is the correlation of a single time series with a lagged copy of itself.
It’s also called “serial correlation”. Often, when we refer to a series’s autocorrelation, we mean the “lag-one” autocorrelation. So when using daily data, for example, the autocorrelation would be the correlation of the series with the same series lagged by one day.
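To make the definition concrete, here is a minimal sketch on a made-up daily series. The random-walk data are an assumption for illustration, not market data. It computes the lag-one autocorrelation by hand, correlating the series with a copy shifted by one day. It then compares the result with pandas' Series.autocorr(), which performs the same shift-and-correlate calculation.

```python
import numpy as np
import pandas as pd

# Hypothetical daily series: a random walk of 250 observations (illustrative only)
rng = np.random.default_rng(0)
series = pd.Series(rng.standard_normal(250).cumsum(),
                   index=pd.date_range("2020-01-01", periods=250, freq="D"))

# Lag-one autocorrelation by hand: correlate the series with itself shifted one day
lagged = series.shift(1)
manual = series.corr(lagged)

# pandas provides the same calculation directly
builtin = series.autocorr(lag=1)
print(f"manual: {manual:.4f}, autocorr(): {builtin:.4f}")
```

Passing a different lag, such as lag=5, gives the five-day autocorrelation in the same way.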
Positive vs Negative Autocorrelation
What does it mean when a series has a positive or negative autocorrelation?
With financial time series, when returns have a negative autocorrelation, we say the series is “mean reverting”. Alternatively, if a series has positive autocorrelation, we say it is “trend-following”.
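One way to see the two cases is to simulate returns from a simple AR(1) model, r_t = phi * r_{t-1} + e_t. This model is an assumption chosen only for illustration. A negative phi produces mean-reverting returns and a positive phi produces trend-following returns, and the sample lag-one autocorrelation recovers the sign.

```python
import numpy as np
import pandas as pd

def simulate_ar1(phi, n=5000, seed=42):
    """Simulate r_t = phi * r_{t-1} + e_t with standard normal shocks e_t."""
    rng = np.random.default_rng(seed)
    shocks = rng.standard_normal(n)
    r = np.zeros(n)
    for t in range(1, n):
        r[t] = phi * r[t - 1] + shocks[t]
    return pd.Series(r)

mean_reverting = simulate_ar1(phi=-0.5)   # assumed negative autocorrelation
trending = simulate_ar1(phi=0.5)          # assumed positive autocorrelation

print(f"phi=-0.5 -> lag-one autocorrelation {mean_reverting.autocorr():.2f}")
print(f"phi=+0.5 -> lag-one autocorrelation {trending.autocorr():.2f}")
```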
Where is autocorrelation used?
Although these concepts of autocorrelation may sound purely theoretical, they are used on Wall Street to make money. Many hedge fund strategies are only slightly more complex versions of mean reversion and momentum strategies. Since stocks have historically had negative autocorrelation over horizons of about a week, one popular strategy is to buy stocks that have dropped over the last week and sell stocks that have gone up.
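A stripped-down version of the one-week reversal idea might look like the sketch below. The weighting rule, the tickers, and the returns are all hypothetical choices for illustration. Each weight is proportional to minus the stock's return relative to the cross-sectional average.

```python
import pandas as pd

def reversal_weights(last_week_returns: pd.Series) -> pd.Series:
    """Dollar-neutral weights that buy last week's losers and sell its winners.

    Each stock's weight is the negative of its return relative to the
    cross-sectional average, scaled so that the long side sums to 1.
    """
    demeaned = last_week_returns - last_week_returns.mean()
    raw = -demeaned                     # below-average performers get positive weight
    return raw / raw[raw > 0].sum()     # long side sums to 1, short side to -1

# Hypothetical one-week returns for four tickers (illustrative numbers only)
last_week = pd.Series({"AAA": 0.05, "BBB": -0.03, "CCC": 0.01, "DDD": -0.07})
weights = reversal_weights(last_week)
print(weights.round(3))
```

The biggest loser (DDD) gets the largest long weight and the biggest winner (AAA) the largest short weight.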
Other assets, such as commodities and currencies, have historically had positive autocorrelation over horizons of several months, so the typical hedge fund strategy there is to buy commodities that have gone up in the last several months and sell those that have gone down.
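The momentum version flips the sign of the rule: buy what has risen and sell what has fallen. This sketch rests on several assumptions. The six-month lookback stands in for "several months", and the asset names and prices are hypothetical.

```python
import numpy as np
import pandas as pd

def momentum_signals(prices: pd.DataFrame, lookback: int = 6) -> pd.Series:
    """Return +1 (buy) for assets whose price rose over the last `lookback`
    rows, -1 (sell) for those whose price fell, and 0 if unchanged."""
    trailing_return = prices.iloc[-1] / prices.iloc[-1 - lookback] - 1
    return np.sign(trailing_return)

# Hypothetical month-end prices; each row is one month (illustrative numbers only)
prices = pd.DataFrame({
    "GOLD":   [100, 102, 105, 104, 108, 110, 112],
    "OIL":    [80, 78, 75, 77, 72, 70, 69],
    "COPPER": [50, 51, 50, 49, 50, 50, 50],
})

signals = momentum_signals(prices, lookback=6)
print(signals)
```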
Autocorrelation as a simple hedge fund strategy
One puzzling anomaly with stocks is that investors tend to overreact to news. Following large jumps, either up or down, stock prices tend to reverse. This is described as mean reversion in stock prices: after large moves, prices tend to bounce back, or revert, toward previous levels, an effect observed over time horizons of about a week. A more mathematical way to describe mean reversion is to say that stock returns are negatively autocorrelated.
This simple idea is actually the basis for a popular hedge fund strategy.
Example of calculating autocorrelation for a stock
We’ll look at the autocorrelation of weekly returns of MSFT stock from 2012 to 2017. We’ll start with a DataFrame MSFT of daily prices, use the .resample() method to get weekly prices, and then compute returns from prices. We’ll then use the pandas method .autocorr() to get the autocorrelation and show that it is negative. Note that the .autocorr() method only works on Series, not DataFrames (even DataFrames with one column), so you will have to select the column in the DataFrame.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

MSFT = pd.read_csv('data/MSFT.csv', index_col='Date', parse_dates=True)
MSFT.head()
# Convert the daily data to weekly data
MSFT = MSFT.resample('W').last()
# Compute the percentage change of prices
returns = MSFT.pct_change()
# Compute and print the autocorrelation of returns
autocorrelation = returns['Adj Close'].autocorr()
print("The autocorrelation of weekly returns is %4.2f" % (autocorrelation))
The autocorrelation of weekly returns is -0.16
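The MSFT file isn't included with this post, so the sketch below runs the same resample, pct_change, and autocorr pipeline on a synthetic one-column price DataFrame instead. The prices are an assumed random walk, so expect an autocorrelation near zero rather than the -0.16 above. It also illustrates the Series-versus-DataFrame point. A DataFrame has no .autocorr() method, so you select the column, or squeeze a one-column frame into a Series.

```python
import numpy as np
import pandas as pd

# Hypothetical daily prices with a single 'Adj Close' column
# (synthetic geometric random walk; not real MSFT data)
rng = np.random.default_rng(1)
days = pd.bdate_range("2012-01-02", periods=1300)
prices = pd.DataFrame(
    {"Adj Close": 30 * np.exp(np.cumsum(0.01 * rng.standard_normal(len(days))))},
    index=days,
)

# Weekly prices: last available price in each calendar week
weekly = prices.resample("W").last()
weekly_returns = weekly.pct_change().dropna()

# .autocorr() lives on Series, so select the column first
lag1 = weekly_returns["Adj Close"].autocorr()
print(f"Lag-one autocorrelation of weekly returns: {lag1:.2f}")

# A DataFrame has no .autocorr() method, even with a single column
print(hasattr(weekly_returns, "autocorr"))  # False
# squeeze() turns a one-column DataFrame into a Series
print(weekly_returns.squeeze().autocorr())
```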
# Plot the weekly returns for 2017
returns.loc['2017'].plot(figsize=(10, 10))
plt.show()
Notice that the autocorrelation of MSFT’s weekly returns is negative, which suggests the stock is ‘mean reverting’.
Autocorrelation of interest rates
When you look at daily changes in interest rates, the autocorrelation is close to zero. However, if you resample the data and look at annual changes, the autocorrelation is negative. This implies that while short-term changes in interest rates may be uncorrelated, long-term changes in interest rates are negatively autocorrelated. A daily move up or down in interest rates is unlikely to tell you anything about interest rates tomorrow, but a move in interest rates over a year can tell you something about where interest rates are going over the next year. This makes some economic sense: over long horizons, when interest rates go up, the economy tends to slow down, which in turn causes interest rates to fall, and vice versa.
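The mechanism described above, a slow pull back toward a long-run level, can be sketched with a synthetic mean-reverting "rate". The model is an AR(1) with a daily persistence of 0.99 around an assumed 4% mean, with 252 trading days per year. None of these numbers come from real data. In this setup, day-to-day changes are nearly uncorrelated, while changes between year-end values are clearly negatively autocorrelated.

```python
import numpy as np
import pandas as pd

# Hypothetical "interest rate" that slowly reverts to a 4% mean:
# level_t - 4 = phi * (level_{t-1} - 4) + noise, with phi = 0.99 per day.
# This AR(1) model is an illustrative assumption, not fitted to real rates.
rng = np.random.default_rng(7)
days_per_year = 252          # assumed trading days per year
n_years = 200
n = days_per_year * n_years
phi, mean_level = 0.99, 4.0

level = np.empty(n)
level[0] = mean_level
shocks = 0.05 * rng.standard_normal(n)   # assumed daily shock size
for t in range(1, n):
    level[t] = mean_level + phi * (level[t - 1] - mean_level) + shocks[t]

# Daily changes vs. changes between year-end observations (every 252nd day)
daily_changes = pd.Series(np.diff(level))
annual_changes = pd.Series(np.diff(level[days_per_year - 1::days_per_year]))

print(f"Daily changes:  {daily_changes.autocorr():.2f}")
print(f"Annual changes: {annual_changes.autocorr():.2f}")
```

For this model, the lag-one autocorrelation of non-overlapping k-day changes works out to -(1 - phi^k)/2. That is roughly -0.005 for daily changes and about -0.46 for annual ones, so the simulated values should land near those figures.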
daily_rates = pd.read_csv('data/daily_interest_rates_us.csv', index_col=['DATE'], parse_dates=True)
daily_rates.columns = ['APR']
daily_rates.head()
# Compute the daily change in interest rates
daily_diff = daily_rates.diff()
# Compute and print the autocorrelation of daily changes
autocorrelation_daily = daily_diff['APR'].autocorr()
print("The autocorrelation of daily interest rate changes is %4.2f" % (autocorrelation_daily))

# Convert the daily data to annual data ('A' = calendar year-end; pandas 2.2+ prefers 'YE')
yearly_rates = daily_rates.resample('A').last()
# Repeat the above for annual data
yearly_diff = yearly_rates.diff()
autocorrelation_yearly = yearly_diff['APR'].autocorr()
print("The autocorrelation of annual interest rate changes is %4.2f" % (autocorrelation_yearly))
The autocorrelation of daily interest rate changes is 0.39
The autocorrelation of annual interest rate changes is 0.25
The sample autocorrelation function, or ACF, shows not only the lag-one autocorrelation from the previous section, but the entire autocorrelation function for different lags.
Any significant non-zero autocorrelation implies that the series can be forecast from the past.
Example 1: This autocorrelation function implies that you can forecast the next value of the series from the last two values, since the lag-one and lag-two autocorrelations differ from zero.
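Here is a minimal sketch of what forecasting from the last two values can look like. The data come from an assumed AR(2) process with coefficients 0.5 and -0.3, so both the lag-one and lag-two autocorrelations are non-zero. Regressing each value on its two predecessors by least squares recovers those coefficients, which then give a one-step-ahead forecast. The process has mean zero, so no intercept is fitted.

```python
import numpy as np

# Hypothetical AR(2) process: x_t = 0.5 x_{t-1} - 0.3 x_{t-2} + e_t
rng = np.random.default_rng(5)
n, a1, a2 = 5000, 0.5, -0.3
x = np.zeros(n)
e = rng.standard_normal(n)
for t in range(2, n):
    x[t] = a1 * x[t - 1] + a2 * x[t - 2] + e[t]

# Regress x_t on its two previous values (ordinary least squares, no intercept)
X = np.column_stack([x[1:-1], x[:-2]])   # columns: x_{t-1}, x_{t-2}
y = x[2:]
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print(f"estimated coefficients: {coef.round(2)}")

# One-step-ahead forecast from the last two observations
forecast = coef[0] * x[-1] + coef[1] * x[-2]
print(f"next-value forecast: {forecast:.3f}")
```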
Example 2: Consider the time series of quarterly earnings of the company H&R Block. The vast majority of its earnings come in the quarter when taxes are due. As a result, the quarterly data on the left show a clear seasonal pattern, and the autocorrelation function on the right shows strong autocorrelation at lags 4, 8, 12, 16, and 20.
NOTE: The ACF can also be useful for model selection, which I will cover in the next post.
Understanding ACF using HRB earnings
Often we are interested in seeing the autocorrelation over many lags. The quarterly earnings for H&R Block (ticker symbol HRB) are plotted below, and you can see the extreme cyclicality of its earnings. The vast majority of its earnings come in the quarter when taxes are due.
HRB = pd.read_csv('data/HRB.csv', index_col=['Quarter'], parse_dates=True)
HRB.head()
# Import the acf function and the plot_acf function from statsmodels
from statsmodels.tsa.stattools import acf
from statsmodels.graphics.tsaplots import plot_acf
import warnings
warnings.filterwarnings('ignore')

# Compute the acf array of HRB for lags 0 through 40 (the 41 values printed below)
acf_array = acf(HRB, nlags=40)
print(acf_array)

# Plot the acf function, pass alpha=1 to suppress the confidence interval
plot_acf(HRB, lags=20, alpha=1)
plt.show()
[ 1. -0.22122696 -0.39856504 -0.26615093 0.83479804 -0.1901038 -0.3475634 -0.23140368 0.71995993 -0.15661007 -0.29766783 -0.22097189 0.61656933 -0.15022869 -0.27922022 -0.22465946 0.5725259 -0.08758288 -0.24075584 -0.20363054 0.4797058 -0.06091139 -0.20935484 -0.18303202 0.42481275 -0.03352559 -0.17471087 -0.16384328 0.34341079 -0.01734364 -0.13820811 -0.12232172 0.28407164 -0.01927656 -0.11757974 -0.10386933 0.20156485 -0.0120634 -0.07509539 -0.0707104 0.10222029]
Notice the strong positive autocorrelation at lags 4, 8, 12, 16, 20, …, that is, every fourth quarter, matching the annual tax cycle.
Are We Confident MSFT Stock Is Mean Reverting?
Earlier in the post, we saw that the autocorrelation of MSFT’s weekly stock returns was -0.16. That autocorrelation seems large, but is it statistically significant? In other words, can we say that there is less than a 5% chance that we would observe such a large negative autocorrelation if the true autocorrelation were really zero? And are there any autocorrelations at other lags that are significantly different from zero?
Even if the true autocorrelations were zero at all lags, in a finite sample of returns the estimated autocorrelations won’t be exactly zero. In fact, the standard deviation of the sample autocorrelation is approximately 1/sqrt(N), where N is the number of observations. So if N = 100, for example, the standard deviation of the ACF is 0.1. Since 95% of a normal curve lies between -1.96 and +1.96 standard deviations from the mean, the 95% confidence interval is plus or minus 1.96/sqrt(N). This approximation only holds when the true autocorrelations are all zero.
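A quick simulation can check the 1/sqrt(N) rule. The sketch below draws many white-noise samples of length N = 100, whose true autocorrelation is zero by construction. It computes each sample's lag-one autocorrelation and measures how often that value falls inside plus or minus 1.96/sqrt(N) = 0.196. The sample length and number of simulations are arbitrary choices.

```python
import numpy as np

# Simulate many white-noise samples of length N (true autocorrelation = 0)
rng = np.random.default_rng(2024)
N, n_sims = 100, 2000
samples = rng.standard_normal((n_sims, N))

# Lag-one sample autocorrelation of each simulated series
dev = samples - samples.mean(axis=1, keepdims=True)
r1 = (dev[:, 1:] * dev[:, :-1]).sum(axis=1) / (dev ** 2).sum(axis=1)

band = 1.96 / np.sqrt(N)          # 0.196 when N = 100
share_inside = np.mean(np.abs(r1) < band)
print(f"std of r1: {r1.std():.3f} (theory: {1 / np.sqrt(N):.3f})")
print(f"share inside +/-{band:.3f}: {share_inside:.3f}")
```

The share inside the band should come out close to 95%, and the spread of r1 close to 0.1.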
Next, we will compute the actual and approximate confidence intervals for the ACF and compare them to the lag-one autocorrelation of MSFT’s weekly returns.
# Load in the data
MSFT = pd.read_csv('data/MSFT.csv', index_col='Date', parse_dates=True)
# Convert the daily data to weekly data
MSFT = MSFT.resample('W').last()
# Compute the percentage change of prices
returns = MSFT.pct_change()
# Remove the first row (NaN after pct_change)
returns = returns.dropna()
# Import the plot_acf function from statsmodels and sqrt from math
from statsmodels.graphics.tsaplots import plot_acf
from math import sqrt

# Compute and print the autocorrelation of MSFT weekly returns
autocorrelation = returns['Adj Close'].autocorr()
print("The autocorrelation of weekly MSFT returns is %4.2f" % (autocorrelation))

# Find the number of observations by taking the length of the returns DataFrame
nobs = len(returns)

# Compute the approximate confidence interval
conf = 1.96 / sqrt(nobs)
print("The approximate confidence interval is +/- %4.2f" % (conf))

# Plot the autocorrelation function with 95% confidence intervals and 20 lags
plot_acf(returns['Adj Close'], alpha=0.05, lags=20)
plt.show()
The autocorrelation of weekly MSFT returns is -0.16
The approximate confidence interval is +/- 0.12
Notice that the autocorrelation with lag 1 is significantly negative, but none of the other lags are significantly different from zero.
So the lag-one autocorrelation of MSFT’s weekly returns does suggest that the stock is mean reverting (negatively autocorrelated).