
ARIMA(p,d,q) forecasting equation: ARIMA models are, in theory, the most general class of models for forecasting a time series that can be made “stationary” by differencing (if necessary), perhaps in conjunction with nonlinear transformations such as logging or deflating (if necessary). A random variable that is a time series is stationary if its statistical properties are all constant over time. A stationary series has no trend, its variations around its mean have a constant amplitude, and it wiggles in a consistent fashion, i.e., its short-term random time patterns always look the same in a statistical sense. The latter condition means that its autocorrelations (correlations with its own prior deviations from the mean) remain constant over time, or equivalently, that its power spectrum remains constant over time. A random variable of this form can be viewed (as usual) as a combination of signal and noise, and the signal (if one is apparent) could be a pattern of fast or slow mean reversion, or sinusoidal oscillation, or rapid alternation in sign, and it could also have a seasonal component. An ARIMA model can be viewed as a “filter” that tries to separate the signal from the noise, and the signal is then extrapolated into the future to obtain forecasts.

The ARIMA forecasting equation for a stationary time series is a linear (i.e., regression-type) equation in which the predictors consist of lags of the dependent variable and/or lags of the forecast errors. That is:

Predicted value of Y = a constant and/or a weighted sum of one or more recent values of Y and/or a weighted sum of one or more recent values of the errors.

If the predictors consist only of lagged values of Y, it is a pure autoregressive (“self-regressed”) model, which is just a special case of a regression model and which could be fitted with standard regression software. For example, a first-order autoregressive (“AR(1)”) model for Y is a regression model in which the independent variable is simply Y lagged by one period (LAG(Y,1) in Statgraphics or Y_LAG1 in RegressIt). If some of the predictors are lags of the errors, an ARIMA model is NOT a linear regression model, because there is no way to specify “last period’s error” as an independent variable: the errors must be computed on a period-to-period basis when the model is fitted to the data. From a technical standpoint, the problem with using lagged errors as predictors is that the model’s predictions are not linear functions of the coefficients, even though they are linear functions of the past data. So, coefficients in ARIMA models that include lagged errors must be estimated by nonlinear optimization methods (“hill-climbing”) rather than by just solving a system of equations.
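To make the “standard regression software” point concrete, here is a minimal sketch of fitting an AR(1) model by ordinary least squares, using only the textbook simple-regression formulas. The function name fit_ar1 and the noise-free test series are illustrative assumptions, not part of the original notes.

```python
def fit_ar1(y):
    """Least-squares fit of y[t] = mu + phi * y[t-1] (regression of Y on its own lag)."""
    x = y[:-1]                      # predictor: Y lagged by one period
    z = y[1:]                       # response: current value of Y
    n = len(x)
    x_bar = sum(x) / n
    z_bar = sum(z) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxz = sum((xi - x_bar) * (zi - z_bar) for xi, zi in zip(x, z))
    phi = sxz / sxx                 # slope = AR(1) coefficient
    mu = z_bar - phi * x_bar        # intercept = constant term
    return mu, phi

# Hypothetical noise-free series generated by y[t] = 2 + 0.5 * y[t-1],
# so the regression should recover mu = 2 and phi = 0.5.
series = [10.0]
for _ in range(20):
    series.append(2.0 + 0.5 * series[-1])

mu_hat, phi_hat = fit_ar1(series)
print(round(mu_hat, 6), round(phi_hat, 6))
```

No such closed-form shortcut exists once lagged errors enter the equation, because the errors themselves depend on the coefficients being estimated.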

The acronym ARIMA stands for Auto-Regressive Integrated Moving Average. Lags of the stationarized series in the forecasting equation are called “autoregressive” terms, lags of the forecast errors are called “moving average” terms, and a time series that needs to be differenced to be made stationary is said to be an “integrated” version of a stationary series. Random-walk and random-trend models, autoregressive models, and exponential smoothing models are all special cases of ARIMA models.

A nonseasonal ARIMA model is classified as an “ARIMA(p,d,q)” model, where:

p is the number of autoregressive terms,

d is the number of nonseasonal differences needed for stationarity, and

q is the number of lagged forecast errors in the prediction equation.

The forecasting equation is constructed as follows. First, let y denote the dth difference of Y, which means:

If d=0: yt = Yt

If d=1: yt = Yt – Yt-1

If d=2: yt = (Yt – Yt-1) – (Yt-1 – Yt-2) = Yt – 2Yt-1 + Yt-2

Note that the second difference of Y (the d=2 case) is not the difference from 2 periods ago. Rather, it is the first-difference-of-the-first-difference, which is the discrete analog of a second derivative, i.e., the local acceleration of the series rather than its local trend.
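The distinction between the second difference and the lag-2 difference is easy to check numerically. The following sketch uses a hypothetical helper named difference and a made-up series Y = t²; neither comes from the original notes.

```python
def difference(series, d=1):
    """Apply d rounds of first differencing; each round shortens the series by one."""
    out = list(series)
    for _ in range(d):
        out = [b - a for a, b in zip(out, out[1:])]
    return out

Y = [1, 4, 9, 16, 25, 36]        # Y = t^2, so the "acceleration" is constant

first = difference(Y, 1)         # d = 1: Y[t] - Y[t-1]
second = difference(Y, 2)        # d = 2: first difference of the first difference

# The expanded formula Y[t] - 2*Y[t-1] + Y[t-2] gives the same values as `second`
expanded = [Y[t] - 2 * Y[t - 1] + Y[t - 2] for t in range(2, len(Y))]

# The difference from two periods ago is a different quantity altogether
lag2 = [Y[t] - Y[t - 2] for t in range(2, len(Y))]

print(first)     # [3, 5, 7, 9, 11]
print(second)    # [2, 2, 2, 2]
print(lag2)      # [8, 12, 16, 20]
```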

In terms of y, the overall forecasting equation is:

ŷt = μ + ϕ1 yt-1 +…+ ϕp yt-p – θ1et-1 -…- θqet-q

Here the moving average parameters (θ’s) are defined so that their signs are negative in the equation, following the convention introduced by Box and Jenkins. Some authors and software (including the R programming language) define them so that they have plus signs instead. When actual numbers are plugged into the equation, there is no ambiguity, but it is important to know which convention your software uses when you are reading the output. Often the parameters are denoted there by AR(1), AR(2), …, and MA(1), MA(2), …, etc.
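As a rough illustration of how the general equation is evaluated, the sketch below computes one-step-ahead forecasts and errors period by period, using the Box-Jenkins sign convention described above. The function name arima_one_step, the parameter values, and the treatment of lags that fall before the start of the data (they are simply dropped) are assumptions made for this example; real software handles the start-up more carefully.

```python
def arima_one_step(y, mu=0.0, phi=(), theta=()):
    """One-step-ahead forecasts of an already-differenced series y.

    yhat[t] = mu + sum(phi[i] * y[t-1-i]) - sum(theta[j] * e[t-1-j])
    Terms whose lag reaches before the first observation are omitted
    (a simplifying assumption for this sketch).
    """
    forecasts, errors = [], []
    for t in range(len(y)):
        ar = sum(p * y[t - 1 - i] for i, p in enumerate(phi) if t - 1 - i >= 0)
        ma = sum(q * errors[t - 1 - j] for j, q in enumerate(theta) if t - 1 - j >= 0)
        yhat = mu + ar - ma              # MA terms enter with a minus sign
        forecasts.append(yhat)
        errors.append(y[t] - yhat)       # error = actual minus forecast
    return forecasts, errors

# Hypothetical ARMA(1,1) example: phi1 = 0.5, theta1 = 0.4
forecasts, errors = arima_one_step([1.0, 2.0, 1.5], mu=0.0, phi=(0.5,), theta=(0.4,))
print([round(f, 4) for f in forecasts])   # [0.0, 0.1, 0.24]
print([round(e, 4) for e in errors])      # [1.0, 1.9, 1.26]
```

Software that uses the plus-sign convention would report the MA(1) coefficient of this same model as -0.4.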

To identify the appropriate ARIMA model for Y, you begin by determining the order of differencing (d) needed to stationarize the series and remove the gross features of seasonality, perhaps in conjunction with a variance-stabilizing transformation such as logging or deflating. If you stop at this point and predict that the differenced series is constant, you have merely fitted a random walk or random trend model. However, the stationarized series may still have autocorrelated errors, suggesting that some number of AR terms (p ≥ 1) and/or some number of MA terms (q ≥ 1) are also needed in the forecasting equation.

The process of determining the values of p, d, and q that are best for a given time series will be discussed in later sections of the notes (whose links are at the top of this page), but a preview of some of the types of nonseasonal ARIMA models that are commonly encountered is given below.

ARIMA(1,0,0) = first-order autoregressive model: If the series is stationary and autocorrelated, perhaps it can be predicted as a multiple of its own previous value, plus a constant. The forecasting equation in this case is

Ŷt = μ + ϕ1Yt-1

…which is Y regressed on itself lagged by one period. This is an “ARIMA(1,0,0)+constant” model. If the mean of Y is zero, then the constant term would not be included.

If the slope coefficient ϕ1 is positive and less than 1 in magnitude (it must be less than 1 in magnitude if Y is stationary), the model describes mean-reverting behavior in which next period’s value should be predicted to be ϕ1 times as far away from the mean as this period’s value. If ϕ1 is negative, it predicts mean-reverting behavior with alternation of signs, i.e., it also predicts that Y will be below the mean next period if it is above the mean this period.
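The “ϕ1 times as far away from the mean” statement follows from the forecasting equation in two short steps. In the sketch below, m is a symbol introduced here (it does not appear in the original notes) for the long-run mean of Y, obtained by setting the forecast and the previous value both equal to m:

```latex
\begin{aligned}
m &= \mu + \phi_1 m \;\Longrightarrow\; m = \frac{\mu}{1-\phi_1},\\
\hat{Y}_t - m &= \mu + \phi_1 Y_{t-1} - m = \phi_1\,(Y_{t-1} - m).
\end{aligned}
```

For example, with μ = 2 and ϕ1 = 0.5 the mean is m = 4. If this period’s value is 10 (6 above the mean), the forecast is 2 + 0.5 × 10 = 7, which is 3 above the mean, i.e., 0.5 times as far away.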

In a second-order autoregressive model (ARIMA(2,0,0)), there would be a Yt-2 term on the right as well, and so on. Depending on the signs and magnitudes of the coefficients, an ARIMA(2,0,0) model could describe a system whose mean reversion takes place in a sinusoidally oscillating fashion, like the motion of a mass on a spring that is subjected to random shocks.

ARIMA(0,1,0) = random walk: If the series Y is not stationary, the simplest possible model for it is a random walk model, which can be considered as a limiting case of an AR(1) model in which the autoregressive coefficient is equal to 1, i.e., a series with infinitely slow mean reversion. The prediction equation for this model can be written as:

Ŷt – Yt-1 = μ

or equivalently

Ŷt = μ + Yt-1

…where the constant term is the average period-to-period change (i.e., the long-term drift) in Y. This model could be fitted as a regression model containing only a constant term, in which the first difference of Y is the dependent variable. Since it includes (only) a nonseasonal difference and a constant term, it is classified as an “ARIMA(0,1,0) model with constant.” The random-walk-without-drift model would be an ARIMA(0,1,0) model without constant.
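Because the least-squares estimate of a constant on its own is just the sample mean, the drift μ can be estimated as the average of the first differences. A minimal sketch with made-up data (the series and variable names are illustrative assumptions):

```python
Y = [100.0, 102.0, 101.0, 105.0, 108.0]          # hypothetical observations

diffs = [b - a for a, b in zip(Y, Y[1:])]        # first differences: [2, -1, 4, 3]
mu = sum(diffs) / len(diffs)                     # average period-to-period change

# Equivalent shortcut: the total change divided by the number of periods
mu_check = (Y[-1] - Y[0]) / (len(Y) - 1)

# k-step-ahead forecasts extend the last value along the drift line
forecasts = [Y[-1] + k * mu for k in (1, 2, 3)]

print(mu, mu_check)    # 2.0 2.0
print(forecasts)       # [110.0, 112.0, 114.0]
```

Setting mu to zero gives the random-walk-without-drift forecast, which is simply the last observed value at every horizon.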

ARIMA(1,1,0) = differenced first-order autoregressive model: If the errors of a random walk model are autocorrelated, perhaps the problem can be fixed by adding one lag of the dependent variable to the prediction equation, i.e., by regressing the first difference of Y on itself lagged by one period. This would yield the following prediction equation:

Ŷt – Yt-1 = μ + ϕ1(Yt-1 – Yt-2)

which can be rearranged to

Ŷt = μ + Yt-1 + ϕ1 (Yt-1 – Yt-2)

This is a first-order autoregressive model with one order of nonseasonal differencing and a constant term, i.e., an ARIMA(1,1,0) model.

ARIMA(0,1,1) without constant = simple exponential smoothing: Another strategy for correcting autocorrelated errors in a random walk model is suggested by the simple exponential smoothing model. Recall that for some nonstationary time series (e.g., ones that exhibit noisy fluctuations around a slowly-varying mean), the random walk model does not perform as well as a moving average of past values. In other words, rather than taking the most recent observation as the forecast of the next observation, it is better to use an average of the last few observations in order to filter out the noise and more accurately estimate the local mean. The simple exponential smoothing model uses an exponentially weighted moving average of past values to achieve this effect. The prediction equation for the simple exponential smoothing model can be written in a number of mathematically equivalent forms, one of which is the so-called “error correction” form, in which the previous forecast is adjusted in the direction of the error it made:

Ŷt = Ŷt-1 + αet-1

Because et-1 = Yt-1 – Ŷt-1 by definition, this will be rewritten as:

Ŷt = Yt-1 – (1-α)et-1

= Yt-1 – θ1et-1

which is an ARIMA(0,1,1)-without-constant forecasting equation with θ1 = 1-α. This means that you can fit a simple exponential smoothing model by specifying it as an ARIMA(0,1,1) model without constant, and the estimated MA(1) coefficient corresponds to 1-minus-alpha in the SES formula. Recall that in the SES model, the average age of the data in the 1-period-ahead forecasts is 1/α, meaning that they will tend to lag behind trends or turning points by about 1/α periods. It follows that the average age of the data in the 1-period-ahead forecasts of an ARIMA(0,1,1)-without-constant model is 1/(1-θ1). So, for example, if θ1 = 0.8, the average age is 5. As θ1 approaches 1, the ARIMA(0,1,1)-without-constant model becomes a very-long-term moving average, and as θ1 approaches 0 it becomes a random-walk-without-drift model.
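The equivalence between the two forms can be checked numerically. The sketch below runs the SES error-correction recursion and the ARIMA(0,1,1) recursion side by side on made-up data; the function names and the start-up choice (first forecast set equal to the first observation) are assumptions for illustration only.

```python
def ses_forecasts(Y, alpha):
    """SES error-correction form: F[t] = F[t-1] + alpha * e[t-1]."""
    F = [Y[0]]                                   # assumed start-up value
    for t in range(1, len(Y)):
        e_prev = Y[t - 1] - F[t - 1]             # last period's forecast error
        F.append(F[t - 1] + alpha * e_prev)
    return F

def arima011_forecasts(Y, theta1):
    """ARIMA(0,1,1) without constant: F[t] = Y[t-1] - theta1 * e[t-1]."""
    F = [Y[0]]                                   # same assumed start-up value
    for t in range(1, len(Y)):
        e_prev = Y[t - 1] - F[t - 1]
        F.append(Y[t - 1] - theta1 * e_prev)
    return F

Y = [20.0, 23.0, 21.0, 25.0, 24.0, 28.0]         # hypothetical data
alpha = 0.2

ses = ses_forecasts(Y, alpha)
arima = arima011_forecasts(Y, 1 - alpha)         # theta1 = 1 - alpha = 0.8

max_gap = max(abs(a - b) for a, b in zip(ses, arima))
average_age = 1 / alpha                          # equals 1 / (1 - theta1)

print(max_gap < 1e-9)    # True: the two recursions agree up to rounding
print(average_age)       # 5.0
```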

What’s the best way to correct for autocorrelation: adding AR terms or adding MA terms? In the previous two models discussed above, the problem of autocorrelated errors in a random walk model was fixed in two different ways: by adding a lagged value of the differenced series to the equation or by adding a lagged value of the forecast error. Which approach is best? A rule of thumb for this situation, which will be discussed in more detail later on, is that positive autocorrelation is usually best treated by adding an AR term to the model and negative autocorrelation is usually best treated by adding an MA term. In business and economic time series, negative autocorrelation often arises as an artifact of differencing. (In general, differencing reduces positive autocorrelation and may even cause a switch from positive to negative autocorrelation.) So, the ARIMA(0,1,1) model, in which differencing is accompanied by an MA term, is more often used than an ARIMA(1,1,0) model.

ARIMA(0,1,1) with constant = simple exponential smoothing with growth: By implementing the SES model as an ARIMA model, you actually gain some flexibility. First of all, the estimated MA(1) coefficient is allowed to be negative: this corresponds to a smoothing factor larger than 1 in an SES model, which is usually not allowed by the SES model-fitting procedure. Second, you have the option of including a constant term in the ARIMA model if you wish, in order to estimate an average non-zero trend. The ARIMA(0,1,1) model with constant has the prediction equation:

Ŷt = μ + Yt-1 – θ1et-1

The one-period-ahead forecasts from this model are qualitatively similar to those of the SES model, except that the trajectory of the long-term forecasts is typically a sloping line (whose slope is equal to mu) rather than a horizontal line.
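To see where the sloping line comes from, note that beyond one step ahead the future errors are unknown and are replaced by their expected value of zero, so each further forecast just adds mu to the one before it. A minimal sketch under that assumption (the function name and the numbers are hypothetical):

```python
def arima011_path(last_Y, last_error, mu, theta1, horizon):
    """Multi-step forecasts for ARIMA(0,1,1) with constant.

    Step 1 uses the last observed value and the last one-step error;
    later steps set the unknown future errors to zero, so they add mu each time.
    """
    path = [mu + last_Y - theta1 * last_error]
    for _ in range(horizon - 1):
        path.append(path[-1] + mu)
    return path

path = arima011_path(last_Y=100.0, last_error=2.0, mu=1.5, theta1=0.5, horizon=4)
steps = [b - a for a, b in zip(path, path[1:])]

print(path)     # [100.5, 102.0, 103.5, 105.0]
print(steps)    # [1.5, 1.5, 1.5] -> a straight line with slope mu
```

With mu = 0 the same function returns a horizontal line, which is the SES forecast.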

ARIMA(0,2,1) or (0,2,2) without constant = linear exponential smoothing: Linear exponential smoothing models are ARIMA models which use two nonseasonal differences in conjunction with MA terms. The second difference of a series Y is not simply the difference between Y and itself lagged by two periods, but rather it is the first difference of the first difference, i.e., the change-in-the-change of Y at period t. Thus, the second difference of Y at period t is equal to (Yt – Yt-1) – (Yt-1 – Yt-2) = Yt – 2Yt-1 + Yt-2. A second difference of a discrete function is analogous to a second derivative of a continuous function: it measures the “acceleration” or “curvature” in the function at a given point in time.

The ARIMA(0,2,2) model without constant predicts that the second difference of the series equals a linear function of the last two forecast errors:

Ŷt – 2Yt-1 + Yt-2 = – θ1et-1 – θ2et-2

which can be rearranged as:

Ŷt = 2 Yt-1 – Yt-2 – θ1et-1 – θ2et-2

where θ1 and θ2 are the MA(1) and MA(2) coefficients. This is a general linear exponential smoothing model, essentially the same as Holt’s model, and Brown’s model is a special case. It uses exponentially weighted moving averages to estimate both a local level and a local trend in the series. The long-term forecasts from this model converge to a straight line whose slope depends on the average trend observed toward the end of the series.
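The straight-line behavior of the long-term forecasts can be seen by iterating the prediction equation forward, replacing unknown future values of Y with their forecasts and unknown future errors with zero. The sketch below does this for made-up inputs; the function name, the coefficient values, and the last two observations and errors are all illustrative assumptions.

```python
def arima022_path(Y, e, theta1, theta2, horizon):
    """Multi-step forecasts for ARIMA(0,2,2) without constant.

    Y and e hold the last two observations and one-step errors, oldest first.
    F = 2*prev1 - prev2 - theta1*e_prev1 - theta2*e_prev2, with future errors = 0.
    """
    values = list(Y)            # actual values, then forecasts as they are produced
    errs = list(e)              # known errors, then zeros for future periods
    for _ in range(horizon):
        f = 2 * values[-1] - values[-2] - theta1 * errs[-1] - theta2 * errs[-2]
        values.append(f)
        errs.append(0.0)
    return values[len(Y):]

path = arima022_path(Y=[50.0, 53.0], e=[1.0, -2.0], theta1=0.5, theta2=0.25, horizon=5)
steps = [b - a for a, b in zip(path, path[1:])]

print(path)     # [56.75, 61.0, 65.25, 69.5, 73.75]
print(steps)    # [4.25, 4.25, 4.25, 4.25] -> constant slope
```

Once the lagged errors have dropped out of the equation, the second difference of the forecasts is zero, which is what makes the trajectory a straight line.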

ARIMA(1,1,2) without constant = damped-trend linear exponential smoothing:

Ŷt = Yt-1 + ϕ1 (Yt-1 – Yt-2) – θ1et-1 – θ2et-2

This model is illustrated in the accompanying slides on ARIMA models. It extrapolates the local trend at the end of the series but flattens it out at longer forecast horizons to introduce a note of conservatism, a practice that has empirical support. See the article on “Why the Damped Trend works” by Gardner and McKenzie and the “Golden Rule” article by Armstrong et al. for details.
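The flattening of the trend can be sketched by iterating the prediction equation forward with future errors set to zero. The code assumes the two MA terms are θ1 times the error one period back and θ2 times the error two periods back, as the ARIMA(1,1,2) label implies; the function name and all numbers are hypothetical.

```python
def arima112_path(Y, e, phi1, theta1, theta2, horizon):
    """Multi-step forecasts for ARIMA(1,1,2) without constant.

    Y and e hold the last two observations and one-step errors, oldest first.
    Future values of Y are replaced by forecasts and future errors by zero.
    """
    values = list(Y)
    errs = list(e)
    for _ in range(horizon):
        f = (values[-1] + phi1 * (values[-1] - values[-2])
             - theta1 * errs[-1] - theta2 * errs[-2])
        values.append(f)
        errs.append(0.0)
    return values[len(Y):]

path = arima112_path(Y=[50.0, 54.0], e=[1.0, -2.0],
                     phi1=0.5, theta1=0.5, theta2=0.25, horizon=5)
steps = [b - a for a, b in zip(path, path[1:])]

print(path)     # [56.75, 58.625, 59.5625, 60.03125, 60.265625]
print(steps)    # [1.875, 0.9375, 0.46875, 0.234375] -> each step is phi1 times the last
```

With 0 < ϕ1 < 1 the increments shrink geometrically, so the forecasts level off instead of following a straight line indefinitely.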

It is generally advisable to stick to models in which at least one of p and q is no larger than 1, i.e., do not try to fit a model such as ARIMA(2,1,2), as this is likely to lead to overfitting and “common-factor” issues that are discussed in more detail in the notes on the mathematical structure of ARIMA models.

Spreadsheet implementation: ARIMA models such as those described above are easy to implement on a spreadsheet. The prediction equation is simply a linear equation that refers to past values of the original time series and past values of the errors. Thus, you can set up an ARIMA forecasting spreadsheet by storing the data in column A, the forecasting formula in column B, and the errors (data minus forecasts) in column C. The forecasting formula in a typical cell in column B would simply be a linear expression referring to values in preceding rows of columns A and C, multiplied by the appropriate AR or MA coefficients stored in cells elsewhere on the spreadsheet.

