
Multicollinearity is a state of very high intercorrelation among the independent variables. It is a type of disturbance in the data, and when it is present, the statistical inferences drawn from the data may not be reliable.

There are several reasons why multicollinearity occurs:

It is caused by the inaccurate use of dummy variables.

It is caused by the inclusion of a variable calculated from other variables in the data set.

Multicollinearity can also result from the repetition of the same type of variable.

It typically occurs when the variables are highly correlated with each other.
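The second cause above, a variable computed from other variables in the data set, can be illustrated with a small sketch. The variable names and data here are hypothetical, generated only to show how a derived column ends up almost perfectly correlated with its sources:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two genuinely independent predictors (hypothetical data)
x1 = rng.normal(size=200)
x2 = rng.normal(size=200)

# A third predictor computed from the other two -- a classic
# source of multicollinearity
x3 = x1 + x2 + rng.normal(scale=0.01, size=200)

# The derived column is almost perfectly correlated with x1 + x2
corr = np.corrcoef(x1 + x2, x3)[0, 1]
print(round(corr, 4))  # a value very close to 1
```

Any regression that includes x1, x2, and x3 together will suffer from near-perfect collinearity, since x3 carries almost no information beyond the first two columns.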

Multicollinearity can cause several problems. The following are the problems:

The partial regression coefficients may be imprecise due to multicollinearity, and their standard errors are likely to be high.

Multicollinearity causes a change in signs as well as magnitudes of partial regression coefficients from one sample to another.

Multicollinearity makes it tedious to assess the relative importance of the independent variables in explaining the variation in the dependent variable.

In the presence of high multicollinearity, the confidence intervals of the coefficients tend to become very wide and the t-statistics tend to be very small. It becomes difficult to reject the null hypothesis of any study when multicollinearity is present in the study data.

There are some signs that help the researcher to detect the degree of multicollinearity.

One such signal is when an individual statistic (such as a t-test on a coefficient) is not significant, yet the overall statistic (such as the F-test) is significant. In this case, the investigator obtains a mix of significant and insignificant results that points to the presence of multicollinearity. Another sign appears when the researcher, after dividing the sample into two parts, finds that the coefficients differ drastically between the two halves: the coefficients are unstable due to the presence of multicollinearity. Likewise, if the researcher observes a drastic change in the model simply by adding or dropping a variable, this also indicates that multicollinearity is present in the data.
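The split-sample instability described above can be sketched numerically. This is a hypothetical example, not data from any real study: two nearly collinear predictors are fit by ordinary least squares on each half of the sample, and while the individual coefficients typically swing between halves, their sum stays close to the true combined effect:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100

# Two nearly collinear predictors (hypothetical data)
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.05, size=n)

# The response depends on x1 alone, with true coefficient 3
y = 3 * x1 + rng.normal(size=n)

X = np.column_stack([x1, x2])

def fit(X, y):
    # Ordinary least squares (no intercept, for simplicity)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

# Fit the same model on the two halves of the sample
b_first = fit(X[: n // 2], y[: n // 2])
b_second = fit(X[n // 2 :], y[n // 2 :])

print("first half: ", np.round(b_first, 2))
print("second half:", np.round(b_second, 2))
print("sums:", round(b_first.sum(), 2), round(b_second.sum(), 2))
```

Because x1 and x2 are almost interchangeable, the data pin down only the sum of their coefficients; how that sum is split between the two predictors varies from one subsample to the next, which is exactly the instability a researcher would observe.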

Multicollinearity can also be detected with the help of tolerance and its reciprocal, called the variance inflation factor (VIF). If the tolerance value is below 0.2 or 0.1 and, correspondingly, the VIF is 10 or above, then multicollinearity is problematic.
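The VIF of a predictor is 1 / (1 − R²), where R² comes from regressing that predictor on all the others; tolerance is simply 1 / VIF. The sketch below computes both from first principles with NumPy on hypothetical collinear data (a dedicated routine such as statsmodels' `variance_inflation_factor` would also serve):

```python
import numpy as np

def vif(X, j):
    """VIF of column j: 1 / (1 - R^2) from regressing X[:, j]
    on the remaining columns (with an intercept)."""
    y = X[:, j]
    others = np.delete(X, j, axis=1)
    A = np.column_stack([np.ones(len(y)), others])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    r2 = 1 - (resid @ resid) / ((y - y.mean()) ** 2).sum()
    return 1.0 / (1.0 - r2)

# Hypothetical data with a collinear third column
rng = np.random.default_rng(2)
x1 = rng.normal(size=300)
x2 = rng.normal(size=300)
x3 = x1 + x2 + rng.normal(scale=0.1, size=300)
X = np.column_stack([x1, x2, x3])

for j in range(3):
    v = vif(X, j)
    print(f"x{j + 1}: VIF = {v:.1f}, tolerance = {1 / v:.4f}")
```

All three columns blow well past the VIF ≥ 10 (tolerance ≤ 0.1) threshold, flagging the collinearity that was built into x3 by construction.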