Whether we wish to predict trends in financial markets or electricity consumption, time is a crucial factor that must now be considered in our models. For instance, it would be interesting to know not only whether a stock will move up in price, but also when it will do so.
Enter time series. A time series is simply a series of data points ordered in time. In a time series, time is often the independent variable, and the goal is usually to make a forecast for the future.

However, there are other aspects that come into play when dealing with time series.
Is it stationary?
Is there seasonality?
Is the target variable autocorrelated?
In this post, I will introduce different characteristics of time series and how we can model them to obtain forecasts that are as accurate as possible.
Autocorrelation

Informally, autocorrelation is the similarity between observations as a function of the time lag between them.
Above is an example of an autocorrelation plot. Looking closely, you realize that the first value and the 24th value have a high autocorrelation. Similarly, the 12th and 36th observations are highly correlated. This means that we will find a very similar value every 24 units of time.

Notice how the plot looks like a sinusoidal function. This is a hint of seasonality, and you can find its value by finding the period in the plot above, which gives 24 hours.
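As a quick sanity check, autocorrelation can be computed by hand with NumPy. The sketch below builds a hypothetical hourly series with a 24-hour cycle (the series and lags are made up for illustration) and measures its similarity to itself at different lags:

```python
import numpy as np

def autocorr(x, lag):
    """Sample autocorrelation of x at a given lag: covariance with the
    lagged copy of the series, divided by the variance at lag 0."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    return np.dot(x[:-lag], x[lag:]) / np.dot(x, x)

# Hypothetical series: two weeks of hourly data with a 24-hour cycle
t = np.arange(24 * 14)
series = np.sin(2 * np.pi * t / 24)

print(autocorr(series, 24))  # close to 1: same phase one day later
print(autocorr(series, 12))  # strongly negative: opposite phase half a day later
```

A high value at lag 24 and a trough at lag 12 is exactly the sinusoidal pattern described above.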
Seasonality

Seasonality refers to periodic fluctuations. For instance, electricity consumption is high during the day and low during the night, or online sales increase during Christmas before slowing down again.
As you can see above, there is clear daily seasonality. Every day, you see a peak towards the evening, and the lowest points are at the beginning and the end of each day.

Remember that seasonality can also be derived from an autocorrelation plot if it has a sinusoidal shape. Simply look at the period, and it gives the length of the season.
Stationarity

Stationarity is an important characteristic of time series. A time series is said to be stationary if its statistical properties do not change over time. In other words, it has constant mean and variance, and its covariance is independent of time.
Looking at the same plot again, we see that the process above is stationary: the mean and variance do not vary over time.

Often, stock prices are not a stationary process, since we might see a growing trend, or the volatility might increase over time (meaning that the variance is changing).

Ideally, we want a stationary time series for modeling. Of course, not all time series are stationary, but we can apply different transformations to make them stationary.
How to test if a process is stationary
You may have noticed "Dickey-Fuller" in the title of the plot above. This is the statistical test that we run to determine whether a time series is stationary or not.
Without going into the technicalities of the Dickey-Fuller test, it tests the null hypothesis that a unit root is present.
If a unit root is present, the p-value will exceed a chosen threshold (commonly 0.05), we cannot reject the null hypothesis, and the process is not stationary.
Otherwise, the p-value falls below the threshold, the null hypothesis is rejected, and the process is considered stationary.
As an example, the process below is not stationary. Notice how the mean is not constant through time.
Modelling time series

There are many ways to model a time series in order to make predictions. Here, I will present the moving average model, exponential smoothing, and SARIMA.
Moving average

The moving average model is probably the most naive approach to time series modeling. This model simply states that the next observation is the mean of all past observations.
Although simple, this model can be surprisingly good, and it represents a good starting point.

Otherwise, the moving average can be used to identify interesting trends in the data. We can define a window to apply the moving average model to smooth the time series and highlight different trends.

In the plot above, we applied the moving average model to a 24-hour window. The green line smoothed the time series, and we can see that there are two peaks in a 24-hour period.

Of course, the longer the window, the smoother the trend will be. Below is an example of a moving average on a smaller window.
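With pandas, the moving average is just a rolling mean. A sketch on a made-up hourly series (the window sizes are arbitrary choices for illustration):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
t = np.arange(24 * 7)  # one week of hourly data

# Hypothetical series: a daily cycle plus noise
series = pd.Series(np.sin(2 * np.pi * t / 24) + rng.normal(0, 0.3, len(t)))

smooth_wide = series.rolling(window=24).mean()   # wide window: very smooth
smooth_narrow = series.rolling(window=6).mean()  # narrow window: follows the data
```

The wider the window, the more variation is averaged away, which is exactly the trade-off described above.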
Exponential smoothing

Exponential smoothing uses a similar logic to the moving average, but this time, a different decreasing weight is assigned to each observation. In other words, less importance is given to observations as we move further from the present.
alpha is a smoothing factor that takes values between 0 and 1. It determines how fast the weight decreases for previous observations.
In the plot above, the dark blue line represents the exponential smoothing of the time series using a smoothing factor of 0.3, while the orange line uses a smoothing factor of 0.05.

As you can see, the smaller the smoothing factor, the smoother the time series will be. This makes sense, because as the smoothing factor approaches 0, we approach the moving average model.
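The recursion is only a few lines. A sketch with a made-up series:

```python
def exponential_smoothing(series, alpha):
    """Each smoothed value is alpha times the current observation
    plus (1 - alpha) times the previous smoothed value."""
    result = [series[0]]  # the first smoothed value is the first observation
    for value in series[1:]:
        result.append(alpha * value + (1 - alpha) * result[-1])
    return result

data = [3, 10, 12, 13, 12, 10, 12]
print(exponential_smoothing(data, 0.9))  # reacts quickly to changes
print(exponential_smoothing(data, 0.1))  # much smoother, changes slowly
```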
Double exponential smoothing
Double exponential smoothing is used when there is a trend in the time series. In that case, we use this technique, which is simply a recursive application of exponential smoothing twice.

Here, beta is the trend smoothing factor, and it takes values between 0 and 1.
Below, you can see how different values of alpha and beta affect the shape of the time series.
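A sketch of the double smoothing recursion, with one equation for the level and one for the trend (the initialization below is one common choice, not the only one):

```python
def double_exponential_smoothing(series, alpha, beta):
    """Holt's linear method: exponential smoothing applied to both
    the level and the trend of the series."""
    level = series[0]
    trend = series[1] - series[0]  # initialize the trend from the first two points
    result = [series[0]]
    for value in series[1:]:
        previous_level = level
        level = alpha * value + (1 - alpha) * (level + trend)
        trend = beta * (level - previous_level) + (1 - beta) * trend
        result.append(level + trend)  # one-step-ahead forecast
    return result

# On a perfectly linear series, the model locks onto the trend immediately
print(double_exponential_smoothing([1, 2, 3, 4, 5], alpha=0.5, beta=0.5))
```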
Triple exponential smoothing

This method extends double exponential smoothing by adding a seasonal smoothing factor. Of course, this is useful if you notice seasonality in your time series.
Mathematically, triple exponential smoothing is expressed as:
Where gamma is the seasonal smoothing factor and L is the length of the season.
Seasonal autoregressive integrated moving average model (SARIMA)
SARIMA is actually the combination of simpler models that together can model time series exhibiting non-stationary properties and seasonality.

First, we have the autoregression model AR(p). This is basically a regression of the time series onto itself. Here, we assume that the current value depends on its previous values with some lag. It takes a parameter p which represents the maximum lag. To find it, we look at the partial autocorrelation plot and identify the lag after which most lags are not significant.
In the example below, p would be 4.
Then, we add the moving average model MA(q). This takes a parameter q which represents the biggest lag after which other lags are not significant on the autocorrelation plot.
Below, q would be 4.
Afterwards, we add the order of integration I(d). The parameter d represents the number of differences required to make the series stationary.
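Differencing itself is one line with NumPy. The toy series below has a linear trend that a single difference (d = 1) removes:

```python
import numpy as np

# Hypothetical series with a linear trend: the mean grows over time
series = np.array([10.0, 12.0, 14.0, 16.0, 18.0, 20.0])

diff_1 = np.diff(series)  # first-order differencing
print(diff_1)  # [2. 2. 2. 2. 2.] - constant differences: the trend is gone
```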
Finally, we add the final component: seasonality S(P, D, Q, s), where s is simply the season's length. Furthermore, this component requires the parameters P and Q, which are the same as p and q, but for the seasonal component. Finally, D is the order of seasonal integration, representing the number of differences required to remove seasonality from the series.
Combining all, we get the SARIMA(p, d, q)(P, D, Q, s) model.
The main takeaway is that before modeling with SARIMA, we must apply transformations to our time series to remove seasonality and any non-stationary behaviors.