White Noise, Random Walk and Stationarity

15 minute read

White Noise

White noise is a series with a mean that is constant over time, a variance that is also constant over time, and zero autocorrelation at all lags.

There are several special cases of White Noise. For example, if the data is white noise but also has a normal, or Gaussian, distribution, then it is called Gaussian White Noise.

We can’t forecast white noise

A white noise time series is simply a sequence of uncorrelated random variables that are identically distributed. Stock returns are often modeled as white noise. So, with white noise, we cannot forecast future observations based on the past since autocorrelations at all lags are zero.

We will generate a white noise series and plot the autocorrelation function to show that it is zero for all lags. We can use np.random.normal() to generate random returns. For a Gaussian white noise process, the mean and standard deviation describe the entire process.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Import the plot_acf module from statsmodels
from statsmodels.graphics.tsaplots import plot_acf

# Simulate white noise returns:
# np.random.normal() creates an array of normally distributed random numbers.
# The loc argument is the mean and the scale argument is the standard deviation.
# This is one way to generate a white noise series.
returns = np.random.normal(loc=0.02, scale=0.05, size=1000)

# Print out the mean and standard deviation of returns
mean = np.mean(returns)
std = np.std(returns)
print("The mean is %5.3f and the standard deviation is %5.3f" %(mean,std))

# Plot returns series
plt.plot(returns)
plt.show()

# Plot autocorrelation function of white noise returns
plot_acf(returns, lags=20)
plt.show()
The mean is 0.020 and the standard deviation is 0.052

[Figure: line plot of the simulated white noise returns]

[Figure: ACF of the white noise returns, near zero at all lags]

Notice that for a white noise time series, all the autocorrelations are close to zero, so the past will not help you forecast the future.

Random Walk

It is important to recognize whether a time series is a random walk, because that tells you a lot about how predictable the series is.

Random Series: A random walk is often confused with a random series. Take a look at the plot below: it is not a random walk, just a sequence of independent random numbers.

from random import seed
from random import randrange

# A sequence of independent random integers: a random series, not a random walk
seed(1)
series = [randrange(10) for i in range(1000)]

# Plot the random series
plt.plot(series)
plt.show()

[Figure: plot of the random number series]

Random Walk: A random walk is different from a list of random numbers because the next value in the sequence is a modification of the previous value in the sequence. The process used to generate the series forces dependence from one time step to the next. This dependence provides some consistency from step to step, rather than the large jumps that a series of independent random numbers provides.

It is this dependency that gives the process its name as a “random walk” or a “drunkard’s walk”.

The current observation is a random step from the previous observation.

In a random walk, today's price is equal to yesterday's price plus some noise: P(t) = P(t-1) + epsilon(t), where epsilon(t) is white noise. We will simulate and plot one below.

The change in price of a random walk is just White Noise.

Incidentally, if prices are in logs, then the difference in log prices is one way to measure returns.
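
In code, as a minimal sketch where prices is a hypothetical NumPy array of positive prices, that transformation is:

import numpy as np

# Hypothetical array of prices; log returns are differences of log prices
prices = np.array([100.0, 101.5, 100.8, 102.3])
log_returns = np.diff(np.log(prices))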

The bottom line is that if stock prices follow a random walk, then stock returns are White Noise.

You can’t forecast a random walk. The best guess for tomorrow’s price is simply today’s price.

Drift: In a random walk with drift, prices on average drift by mu every period: P(t) = mu + P(t-1) + epsilon(t). The change in price is still white noise, but with a mean of mu. So if we now think of stock prices as a random walk with drift, then returns are still white noise, but with an average return of mu instead of zero.

Statistical Test for Random Walk

To test whether a series like stock prices follows a random walk, you can regress current prices on lagged prices.

Null Hypothesis: Series is a Random Walk

  • If the slope coefficient, beta, is not significantly different from one, then we cannot (or fail to) reject the null hypothesis that the series is a random walk.
  • However, if the slope coefficient is significantly less than one, then we can reject the null hypothesis that the series is a random walk.

An equivalent way to run that test is to regress the difference in prices on the lagged price, and test whether that slope coefficient is zero instead of whether the original slope is one. This is called the Dickey-Fuller test. If you also add lagged changes in price on the right-hand side, it's called the Augmented Dickey-Fuller test.
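
To make the idea concrete, here is a minimal sketch of that regression form of the test, run on a freshly simulated random walk (the variable names are ours; in practice you would rely on the adfuller() function shown later, since the slope's test statistic does not follow the usual t-distribution):

import numpy as np
import statsmodels.api as sm

# Simulate a random walk as the cumulative sum of white noise steps
np.random.seed(42)
p = np.cumsum(np.random.normal(size=1000))

# Regress the change in price on the lagged price:
# delta_p(t) = alpha + beta * p(t-1) + noise
# Under the random walk null hypothesis, beta should be close to zero.
X = sm.add_constant(p[:-1])
model = sm.OLS(np.diff(p), X).fit()
print(model.params)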

Generate a random walk

Stock returns are often modeled as white noise, and stock prices closely follow a random walk. In other words, today’s price is yesterday’s price plus some random noise.

We will simulate the price of a stock over time that has a starting price of 100 and every day goes up or down by a random amount. Then, plot the simulated stock price.

import matplotlib.pyplot as plt
# Generate 500 random steps with mean=0 and standard deviation=1
steps = np.random.normal(loc=0, scale=1, size=500)

# Set first element to 0 so that the first price will be the starting stock price
steps[0] = 0

# Simulate stock prices, P with a starting price of 100
P = 100 + np.cumsum(steps)

# Plot the simulated stock prices
plt.plot(P)
plt.title("Simulated Random Walk")
plt.show()

[Figure: simulated random walk price series]

The simulated price series we plotted should closely resemble a random walk.

Random Walk with Drift

Above we simulated stock prices that follow a random walk. We will extend this in two ways:

  • We will look at a random walk with a drift. Many time series, like stock prices, are random walks but tend to drift up over time.
  • In the above section, the noise in the random walk was additive: random, normal changes in price were added to the last price. However, when adding noise, you could theoretically get negative prices. Now we will make the noise multiplicative: we will add one to the random, normal changes to get a total return, and multiply that by the last price.
# Generate 500 random, normal multiplicative "steps" with mean 0.1% and
# standard deviation 1% using np.random.normal(); these are returns,
# and adding one gives the gross (total) return.
steps = np.random.normal(loc=0.001, scale=0.01, size=500) + 1

# Set first element to 1
steps[0] = 1

# Simulate the stock price, P, by taking the cumulative product
P = 100 * np.cumprod(steps)

# Plot the simulated stock prices
plt.plot(P)
plt.title("Simulated Random Walk with Drift")
plt.show()

[Figure: simulated random walk with drift]

The simulated price series should closely resemble a random walk for a high-flying stock.

Are Stock Prices a Random Walk?

Most stock prices follow a random walk (perhaps with a drift). We will look at a time series of Amazon stock prices, loaded in the DataFrame AMZN, and run the Augmented Dickey-Fuller test from the statsmodels library to check whether the price series follows a random walk.

AMZN = pd.read_csv('data/AMZN.csv', index_col='Date', parse_dates=True)
AMZN.head()
            Adj Close
Date                 
1997-05-15   1.958333
1997-05-16   1.729167
1997-05-19   1.708333
1997-05-20   1.635417
1997-05-21   1.427083

Run the Augmented Dickey-Fuller test to check if the time series is a Random Walk

The null hypothesis here is that the AMZN stock price is a random walk.

  • Run the Augmented Dickey-Fuller test on the series of closing stock prices, which is the column ‘Adj Close’ in the AMZN DataFrame.

  • Print out the entire output, which includes the test statistic, the p-value, and the critical values for the 1%, 5%, and 10% significance levels.

  • Print out just the p-value of the test (results[0] is the test statistic, and results[1] is the p-value).

# Import the adfuller module from statsmodels
from statsmodels.tsa.stattools import adfuller

# Run the ADF test on the price series and print out the results
results = adfuller(AMZN['Adj Close'])
print(results)

# Just print out the p-value
print('The p-value of the test on prices is: ' + str(results[1]))
(4.02516852577074, 1.0, 33, 5054, {'1%': -3.4316445438146865, '5%': -2.862112049726916, '10%': -2.5670745025321304}, 30308.64216426981)
The p-value of the test on prices is: 1.0

According to this test, we cannot reject the hypothesis that Amazon prices follow a random walk.

In other words, the data is consistent with Amazon stock prices following a random walk.

Next we will do the same thing for Amazon returns (percent change in prices) and show that the returns do not follow a random walk.

# Import the adfuller module from statsmodels
from statsmodels.tsa.stattools import adfuller

# Create a DataFrame of AMZN returns
AMZN_ret = AMZN.pct_change()

# Eliminate the NaN in the first row of returns
AMZN_ret = AMZN_ret.dropna()

# Run the ADF test on the return series and print out the p-value
results = adfuller(AMZN_ret['Adj Close'])
print('The p-value of the test on returns is: ' + str(results[1]))
The p-value of the test on returns is: 2.5655898083476245e-22

The p-value is extremely small, so we can easily reject the hypothesis that returns are a random walk at all levels of significance.

In other words, we showed that Amazon returns do not follow a random walk.

Stationarity

There are different ways to define stationarity, but in its strictest sense, it means that the joint distribution of the observations does not depend on time.

A less restrictive version of stationarity, and one that is easier to test, is weak stationarity, which just means that the mean, variance, and autocorrelations of the observations do not depend on time.

In other words, for the autocorrelation, the correlation between X(t) and X(t-tau) is only a function of the lag tau, and not a function of the time t.
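
As a rough numerical illustration (the names here are ours), for a stationary white noise series the lag-1 autocorrelation estimated on the first and second halves of the sample should come out about the same:

import numpy as np
import pandas as pd

np.random.seed(2)
x = pd.Series(np.random.normal(size=2000))

# Lag-1 autocorrelation on each half: both should be close to zero,
# and, more to the point, close to each other
print(x.iloc[:1000].autocorr(lag=1))
print(x.iloc[1000:].autocorr(lag=1))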

Why do we care if it is stationary?:

If a process is not stationary, then it becomes difficult to model.

Modeling involves estimating a set of parameters, and if a process is not stationary, and the parameters are different at each point in time, then there are too many parameters to estimate. You may end up having more parameters than actual data!

So stationarity is necessary for a parsimonious model - one with a smaller set of parameters to estimate.

Examples of non-stationary series

A random walk is a common type of non-stationary series. The variance grows with time. For example, if stock prices are a random walk, then the uncertainty about prices tomorrow is much less than the uncertainty 10 years from now.
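
As a quick check, here is a minimal sketch (all names hypothetical) that simulates many independent random walks and measures the variance across them at several points in time; it grows roughly linearly with the number of steps:

import numpy as np

# Simulate 10,000 independent random walks, each 100 steps long
np.random.seed(0)
walks = np.cumsum(np.random.normal(size=(10000, 100)), axis=1)

# Cross-sectional variance at a few points in time grows with t
for t in [0, 9, 49, 99]:
    print(t + 1, round(np.var(walks[:, t]), 2))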

Seasonal series are also non-stationary. Consider, for example, the frequency of Google searches for the word 'diet': the mean varies with the time of year.

White noise would ordinarily be a stationary process, but white noise whose mean increases over time is non-stationary.

Transforming non-stationary series to stationary series

Many non-stationary series can be made stationary through a simple transformation. A random walk is a non-stationary series, but if you take the first differences, the new series is white noise, which is stationary. S&P 500 prices, for example, are a non-stationary random walk, but their first differences form a stationary white noise process.
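
As a sanity check, here is a minimal sketch reusing the simulated prices P and the imports (np, plt, plot_acf) from earlier: the first differences of the random walk are just the white noise steps.

# Differencing the simulated random walk recovers the white noise steps
P_diff = np.diff(P)

# The ACF of the differenced series should be near zero at all lags
plot_acf(P_diff, lags=20)
plt.show()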

The quarterly earnings for H&R Block have a large seasonal component and are therefore not stationary. If we take the seasonal difference, that is, the difference with a lag of 4, the transformed series looks stationary.

Sometimes, you may need to make two transformations. Amazon's quarterly revenue, for instance, grows exponentially and also exhibits a strong seasonal pattern. If you take only the log of the series, you eliminate the exponential growth. But if you take both the log of the series and then the seasonal difference, the transformed series looks stationary.
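
Here is a minimal sketch of that double transformation on a made-up quarterly revenue series (the numbers are invented purely for illustration):

import numpy as np
import pandas as pd

# Hypothetical quarterly revenue with exponential growth and a seasonal spike
quarters = np.arange(40)
revenue = pd.Series(np.exp(0.05 * quarters) * (1 + 0.3 * (quarters % 4 == 0)))

# The log removes the exponential growth...
log_revenue = np.log(revenue)

# ...and the seasonal (lag-4) difference removes the seasonal pattern
stationary = log_revenue.diff(4).dropna()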

Seasonal Adjustment

Many time series exhibit strong seasonal behavior. The procedure for removing the seasonal component of a time series is called seasonal adjustment. For example, most economic data published by the government is seasonally adjusted.

You saw earlier that by taking first differences of a random walk, you get a stationary white noise process. For seasonal adjustments, instead of taking first differences, you will take differences with a lag corresponding to the periodicity.

Look again at the ACF of H&R Block's quarterly earnings, loaded in the DataFrame HRB: there is a clear seasonal component. The autocorrelation is high for lags 4, 8, 12, 16, … because of the spike in earnings every four quarters during tax season. We will apply a seasonal adjustment by taking the fourth difference (four being the periodicity of the series), and then compute the autocorrelation of the transformed series.

# Import the acf module and the plot_acf module from statsmodels
from statsmodels.tsa.stattools import acf
from statsmodels.graphics.tsaplots import plot_acf
import warnings
warnings.filterwarnings('ignore')

HRB = pd.read_csv('data/HRB.csv', index_col=['Quarter'], parse_dates=True)

# Compute the acf array of HRB
acf_array = acf(HRB)

# Plot the ACF; alpha=0.05 draws 95% confidence bands around the estimates
plot_acf(HRB, lags=20, alpha=0.05)
plt.show()

[Figure: ACF of HRB quarterly earnings, with spikes at lags 4, 8, 12, 16]

# Seasonally adjust quarterly earnings
HRBsa = HRB.diff(4)

# Print the first 10 rows of the seasonally adjusted series
print(HRBsa.head(10))

# Drop the NaN data in the first four rows
HRBsa = HRBsa.dropna()

# Plot the autocorrelation function of the seasonally adjusted series
plot_acf(HRBsa)
plt.show()
            Earnings
Quarter             
2007-01-01       NaN
2007-04-01       NaN
2007-07-01       NaN
2007-10-01       NaN
2008-01-01      0.02
2008-04-01     -0.04
2008-07-01     -0.05
2008-10-01      0.26
2009-01-01     -0.05
2009-04-01      0.02

[Figure: ACF of the seasonally adjusted HRB earnings]

By seasonally adjusting the series, we eliminated the seasonal pattern in the autocorrelation function.

Summary and FAQs

A stationary time series is one whose statistical properties are not a function of time.

  • White noise is stationary.
  • A random walk is non-stationary.

1. Understanding Stationarity

Properties of data such as central tendency, dispersion, skewness, and kurtosis are called sample statistics. Mean and variance are two of the most commonly used sample statistics. In any analysis, data is collected by gathering information from a sample of the larger population. Mean, variance, and other properties are then estimated based on the sample data. Hence these are referred to as sample statistics.

An important assumption in statistical estimation theory is that, for sample statistics to be reliable, the population does not undergo any fundamental or systemic shifts over the individuals in the sample or over the time during which the data has been collected. This assumption ensures that sample statistics do not change and remain valid for entities outside the sample used for their estimation.

This assumption also applies to time series analysis, so that the mean, variance, and autocorrelation estimated from the sample can be used as reasonable estimates for future occurrences.

In time series analysis, this assumption is known as stationarity, which requires that the internal structures of the series do not change over time.

Therefore, stationarity requires mean, variance, and autocorrelation to be invariant with respect to the actual time of observation. Another way of understanding stationarity is that the series has constant mean and constant variance without any predictable and repetitive patterns.
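
One informal way to see this, a sketch rather than a formal test, is to compare rolling statistics of a stationary and a non-stationary series (the names are just for illustration):

import numpy as np
import pandas as pd

np.random.seed(1)
wn = pd.Series(np.random.normal(size=1000))  # stationary white noise
rw = wn.cumsum()                             # non-stationary random walk

# The rolling mean of white noise stays near zero, while the rolling
# mean of the random walk wanders with no fixed level.
print(wn.rolling(100).mean().std())
print(rw.rolling(100).mean().std())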

A popular example of a stationary time series is the zero-mean series: a collection of samples drawn from a normal distribution with mean zero and, say, unit variance, plotted in sequence.

Though the points are sampled sequentially from the normal distribution and plotted as a time series, the individual observations are independent and identically distributed. The zero-mean series does not show any temporal patterns such as trend, seasonality, or autocorrelation.

However, most real-life time series are not stationary. Non-stationarity mostly arises due to the presence of trend and seasonality that affects the mean, variance, and autocorrelation at different points in time.

In general, a time series with no predictable patterns in the long run is stationary.

A crucial step in time series analysis is statistically verifying stationarity and destationarizing a non-stationary time series through special mathematical operations like differencing.

We use the Augmented Dickey-Fuller (ADF) test for detecting stationarity, and the method of differencing for destationarizing a non-stationary time series.
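
As a rough sketch of that workflow (the helper name make_stationary is ours, and real data may also need log or seasonal transformations), you can difference a series until the ADF test rejects the unit-root null:

from statsmodels.tsa.stattools import adfuller

def make_stationary(series, alpha=0.05, max_diffs=3):
    # Difference until the ADF p-value drops below alpha (or give up)
    for _ in range(max_diffs):
        if adfuller(series.dropna())[1] <= alpha:
            break
        series = series.diff().dropna()
    return series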

2. Random Walk and Autocorrelation

We can calculate the correlation between each observation and the observations at previous time steps. A plot of these correlations is called an autocorrelation plot. Given the way that the random walk is constructed, we would expect a strong autocorrelation with the previous observation and a roughly linear fall-off for increasingly distant lags.
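
To see this for yourself, here is a minimal sketch (variable names are ours) that simulates a random walk and plots its ACF, which starts near one and decays slowly:

import numpy as np
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf

np.random.seed(7)
walk = np.cumsum(np.random.normal(size=1000))

# Autocorrelations start near 1 and fall off slowly with the lag
plot_acf(walk, lags=50)
plt.show()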

3. Random Walk and Stationarity

A stationary time series is one whose statistical properties are not a function of time. Given the way that the random walk is constructed, and the results of reviewing its autocorrelation, we know that the observations in a random walk are dependent on time.

The current observation is a random step from the previous observation.

Therefore we can expect a random walk to be non-stationary. In fact, all random walk processes are non-stationary. Note that not all non-stationary time series are random walks.

Also note, a non-stationary time series does not have a consistent mean and/or variance over time.

4. Predicting a Random Walk

A random walk is unpredictable; it cannot reasonably be forecast. Given the way that the random walk is constructed, the best prediction we can make is to use the observation at the previous time step as the forecast for the next time step, since the next value is simply the previous value plus a random step. This approach is called naive forecasting, or the persistence model.
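
Here is a minimal sketch of the persistence model on a simulated random walk (all names are ours); its error comes out close to the standard deviation of the steps, which is the best you can hope for:

import numpy as np

np.random.seed(3)
walk = np.cumsum(np.random.normal(size=1000))

# Persistence (naive) forecast: predict each value with the previous one
predictions = walk[:-1]
actuals = walk[1:]

# The RMSE is close to the step standard deviation of 1
rmse = np.sqrt(np.mean((actuals - predictions) ** 2))
print("Persistence RMSE:", round(rmse, 3))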

5. How to know if my time series is a Random Walk?

Your time series may be a random walk.

Some ways to check if your time series is a random walk are as follows:

  • The time series shows a strong temporal dependence that decays linearly or in a similar pattern.
  • The time series is non-stationary and making it stationary shows no obviously learnable structure in the data.
  • The persistence model provides the best source of reliable predictions.

This last point is key for time series forecasting. Baseline forecasts with the persistence model quickly reveal whether you can do significantly better. If you can't, you're probably working with a random walk.

Many time series are random walks, particularly those of security prices over time.