History of predictive analysis and current progress

Although predictive analysis has been around for decades, it is a technology whose time has come. More and more organizations are turning to predictive analysis to increase their profits and competitive advantage. Why now?

Growing volumes and types of data, and more interest in using data to produce valuable insights.

Faster, less expensive computers.

Easier to use software.

Tougher economic conditions and a need for competitive differentiation.

With the growing popularity of interactive and easy-to-use software, predictive analysis is no longer just the domain of mathematicians and statisticians. Business analysts and line-of-business experts also use these technologies.

Why is predictive analysis important?

Organizations are turning to predictive analysis to help solve difficult problems and discover new opportunities. Common uses include:

Detecting fraud

Combining multiple methods of analysis can improve pattern detection and prevent criminal behavior. As cybersecurity becomes a growing concern, high-performance behavioral analysis examines all actions on a network in real time to spot anomalies that may indicate fraud, zero-day vulnerabilities or advanced persistent threats.

Optimizing marketing campaigns

Predictive analysis is used to determine customer responses or purchases and to promote cross-selling opportunities. Predictive models help companies attract, retain and grow their most profitable customers.

Improving operations

Many companies use predictive models to forecast inventory and manage assets. Airlines use predictive analysis to set ticket prices. Hotels try to predict the number of guests for any given night to maximize occupancy and increase revenue. Predictive analysis enables organizations to function more efficiently.

Reducing risk

Credit scores, a well-known example of predictive analysis, are used to assess the likelihood that a buyer will default on purchases. A credit score is a number generated by a predictive model that incorporates all data relevant to a person’s creditworthiness. Other risk-related uses include insurance claims and collections.

Who uses it?

Any industry can use predictive analysis to reduce risk, optimize operations and increase revenue. Here are some examples.

Banking and financial services

The financial industry, with huge amounts of data and money at stake, has long embraced predictive analysis to detect and reduce fraud, measure credit risk, maximize cross-sell/up-sell opportunities and retain valuable customers. Commonwealth Bank uses analytics to predict the likelihood of fraud activity for any transaction before it is authorized – within 40 milliseconds of the start of the transaction.

Retail

Since the now-infamous study showing that men who buy diapers often buy beer at the same time, retailers around the world have used predictive analysis for merchandise planning and price optimization, to analyze the effectiveness of promotional events, and to determine which offers are best suited to consumers. Staples gained customer insight by analyzing behavior, building a complete picture of its customers and achieving a 137 percent ROI.

Oil, Gas and Utilities

Whether it’s predicting equipment failures and future resource needs, mitigating safety and reliability risks, or improving overall performance, the energy industry has embraced predictive analysis vigorously. The Salt River Project, the second-largest public power company in the United States and one of Arizona’s largest water suppliers, analyzes machine sensor data to predict when its power-generating turbines need maintenance.

Governments and Public Sector

Governments have been key players in the advancement of information technology. The US Census Bureau has been analyzing data to understand population trends for decades. Governments now use predictive analysis the way many other industries do: to improve service and performance, detect and prevent fraud, and better understand consumer behavior. They also use predictive analysis to improve information security.

How it works

Predictive models use known results to develop (or train) a model that can be used to predict values for different or new data. Modeling produces results in the form of predictions that represent a probability of the target variable (for example, revenue) based on the estimated significance of a set of input variables.
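
To make the train-then-predict idea concrete, here is a minimal Python sketch. The records, the contract-type input and the `train`/`predict` helpers are hypothetical, and the "model" is deliberately the simplest possible one: a table of observed event rates per input value, learned from cases whose outcome is already known and then used to score a new case with a probability. Real predictive models estimate the effect of many inputs at once.

```python
from collections import defaultdict

# Hypothetical historical records with known results: (contract_type, churned)
history = [
    ("monthly", 1), ("monthly", 1), ("monthly", 0), ("monthly", 1),
    ("annual", 0), ("annual", 0), ("annual", 1), ("annual", 0),
]

def train(records):
    """Learn P(target = 1 | input value) from records whose outcome is known."""
    counts = defaultdict(lambda: [0, 0])  # input value -> [events, total]
    for value, target in records:
        counts[value][0] += target
        counts[value][1] += 1
    return {value: events / total for value, (events, total) in counts.items()}

def predict(model, value):
    """Score a new case with the learned probability of the target event."""
    return model[value]

model = train(history)
print(predict(model, "monthly"))  # 0.75
print(predict(model, "annual"))   # 0.25
```

A new customer on a monthly contract would be scored at 0.75, an estimated 75% probability of churning. An input value that never appeared in the training data raises a `KeyError` in this sketch; a real model needs a strategy for unseen values.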

This differs from descriptive models, which help you understand what happened, and diagnostic models, which help you understand key relationships and determine why something happened. Whole books are dedicated to analytical methods and techniques, and comprehensive university programs explore this topic in depth. But to begin with, here are some basics.

There are two types of predictive models. Classification models predict class membership: for example, whether someone is likely to leave, whether they will respond to a solicitation, or whether they are a good or bad credit risk. Usually the model's output is 0 or 1, where 1 is the targeted event. Regression models predict a number, for example how much revenue a customer will generate over the next year or the number of months before a component on a machine fails.
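
A regression model predicts a number rather than a class. As a minimal sketch, assuming a single input variable and made-up data, the Python below fits a simple linear regression by ordinary least squares and uses it to estimate months until a component fails. `fit_simple_linear_regression` is a hypothetical helper, not a library function.

```python
def fit_simple_linear_regression(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (one input variable)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical data: average operating temperature (°C) and months until failure
temperatures = [60, 70, 80, 90]
months_to_failure = [30, 26, 20, 16]

intercept, slope = fit_simple_linear_regression(temperatures, months_to_failure)
predicted = intercept + slope * 75  # regression output: a number, not a class
print(f"months = {intercept:.2f} + ({slope:.2f}) * temperature")
print(f"predicted months at 75 °C: {predicted:.1f}")
```

On this made-up data, each extra degree shortens the expected life by about half a month (slope of -0.48).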

The most commonly used predictive modeling techniques are decision trees, regression and neural networks.

Decision trees are classification models that partition data into subsets based on categories of input variables. They help you understand the path of someone’s decisions. A decision tree looks like a tree, with each branch representing a choice between a number of alternatives and each leaf representing a classification or decision. The algorithm looks at the data and tries to find the single variable that splits the data into logical groups that are the most different from one another. Decision trees are popular because they are easy to understand and interpret. They also handle missing values well and are useful for preliminary variable selection. So if you have a lot of missing values, or you want a quick answer that is easy to interpret, you can start with a tree.
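
The core step of a decision tree, finding the one variable and threshold that best separates the data, can be sketched in a few lines of Python. This sketch assumes numeric inputs, a 0/1 target and Gini impurity as the measure of how mixed a group is (tools also use entropy, chi-square tests and other criteria); the customer data and function names are hypothetical. A full tree would apply `best_split` again to each resulting group until the groups are pure or too small.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels: 0 means the group is pure."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def best_split(rows, labels):
    """Try every variable and threshold; return the split whose two groups are purest."""
    best = None  # (weighted impurity, variable index, threshold)
    n = len(rows)
    for var in range(len(rows[0])):
        for threshold in sorted({row[var] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[var] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[var] > threshold]
            if not left or not right:
                continue  # a split must produce two non-empty groups
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                best = (score, var, threshold)
    return best

# Hypothetical customers: (months_as_customer, support_calls) -> left (1) or stayed (0)
rows = [(2, 5), (3, 4), (30, 1), (24, 0), (5, 6), (36, 2)]
labels = [1, 1, 0, 0, 1, 0]
score, var, threshold = best_split(rows, labels)
print(score, var, threshold)
```

On this toy data, "months as a customer at or below 5" separates leavers from stayers perfectly, so the root split has an impurity of 0. The support-call count also splits perfectly here; the sketch keeps the first variable it finds.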

Regression (linear and logistic) is one of the most popular methods in statistics. Regression analysis estimates the relationships between variables. Linear regression is intended for continuous data that can be assumed to follow a normal distribution; it finds key patterns in large data sets and is often used to determine how much specific factors, such as price, affect the movement of an asset. With regression analysis, we want to predict a number, called the response or Y variable. With simple linear regression, one independent variable is used to explain and/or predict the outcome of Y. Multiple regression uses two or more independent variables to predict the outcome. With logistic regression, unknown values of a discrete variable are predicted based on the known values of other variables. The response variable is categorical, which means that it can take only a limited number of values. In binary logistic regression, the response variable has only two values, such as 0 or 1. In multinomial or ordinal logistic regression, the response variable can have more than two levels, such as low, medium and high, or 1, 2 and 3.

Neural networks are sophisticated techniques capable of modeling very complex relationships. They are popular because they are powerful and flexible. Their power comes from their ability to handle nonlinear relationships in data, which are increasingly common as more data is collected. They are often used to confirm the results of simpler techniques such as regression and decision trees. Neural networks are based on pattern recognition and some AI processes that graphically “model” parameters. They work well when no known mathematical formula relates inputs to outputs, when prediction is more important than explanation, or when there is a lot of training data. Artificial neural networks were originally developed by researchers who sought to mimic the neurophysiology of the human brain.
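
The ability of neural networks to handle nonlinear relationships is easiest to see with exclusive or (XOR): the output is 1 when exactly one input is 1, a pattern that no single straight-line boundary can separate. The Python sketch below is a tiny feedforward network with one hidden layer of two sigmoid units. To keep it short and deterministic, its weights are hand-picked assumptions; in practice a network learns its weights from training data, typically with backpropagation.

```python
import math

def sigmoid(z):
    """Smooth 0-to-1 activation used by each unit."""
    return 1.0 / (1.0 + math.exp(-z))

# Hand-picked (not trained) weights: ((weight for x1, weight for x2), bias)
HIDDEN = [
    ((20.0, 20.0), -10.0),  # close to 1 when at least one input is 1 (acts like OR)
    ((20.0, 20.0), -30.0),  # close to 1 only when both inputs are 1 (acts like AND)
]
OUTPUT = ((20.0, -20.0), -10.0)  # "OR but not AND", which is exclusive or

def forward(x1, x2):
    """Propagate the inputs through the hidden layer to the single output unit."""
    hidden = [sigmoid(w[0] * x1 + w[1] * x2 + b) for w, b in HIDDEN]
    w, b = OUTPUT
    return sigmoid(w[0] * hidden[0] + w[1] * hidden[1] + b)

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, round(forward(x1, x2)))
```

The hidden layer is what makes this possible: each hidden unit draws one straight boundary, and the output unit combines them into a region that a single linear model could not express.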

Other popular techniques you may hear about

Bayesian analysis

Bayesian methods treat parameters as random variables and define probability as “degrees of belief” (that is, the probability of an event is the degree to which the event is believed to be true). When performing a Bayesian analysis, you start with a prior belief about the probability distribution of an unknown parameter. After learning from the data you have, you update that belief about the unknown parameter.
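
A common concrete case of this prior-to-posterior update is estimating a rate, such as a response rate, with a beta prior and binomial data; the posterior is then again a beta distribution. The Python sketch below assumes that conjugate beta-binomial setup. The prior, the campaign numbers and the helper names are hypothetical.

```python
def update_beta(alpha, beta, successes, trials):
    """Conjugate update: Beta(alpha, beta) prior + binomial data -> Beta posterior."""
    return alpha + successes, beta + (trials - successes)

def beta_mean(alpha, beta):
    """Expected value of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)

# Prior belief about a campaign's response rate: Beta(2, 8), centred on 20%
prior = (2, 8)
# Hypothetical new data: 30 responses out of 100 contacts
posterior = update_beta(*prior, successes=30, trials=100)
print(posterior)  # (32, 78)
print(beta_mean(*prior), beta_mean(*posterior))
```

The belief centred on 20% shifts to a posterior mean of 32/110, about 29%, pulled toward the observed 30% rate; more data would pull it further.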

Ensemble models

Ensemble models are produced by training several similar models and combining their results to improve accuracy, reduce bias, reduce variance and identify the best model to use with new data.
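
Bagging (bootstrap aggregating) is one common way to build an ensemble of similar models: train the same kind of model on many resampled copies of the data and average their predictions, which mainly reduces variance. The Python sketch below assumes a one-variable least-squares line as the base model; `fit_line` and `bagged_predict` are hypothetical helpers, and other ensemble methods, such as boosting and stacking, combine models differently.

```python
import random

def fit_line(xs, ys):
    """Least-squares line through the points; returns (intercept, slope)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

def bagged_predict(xs, ys, x_new, n_models=25, seed=0):
    """Bagging: fit n_models lines on bootstrap resamples and average their predictions."""
    if len(set(xs)) < 2:
        raise ValueError("need at least two distinct x values")
    rng = random.Random(seed)
    predictions = []
    while len(predictions) < n_models:
        idx = [rng.randrange(len(xs)) for _ in xs]  # draw rows with replacement
        sample_x = [xs[i] for i in idx]
        if len(set(sample_x)) < 2:
            continue  # a line cannot be fit through a single x value; redraw
        intercept, slope = fit_line(sample_x, [ys[i] for i in idx])
        predictions.append(intercept + slope * x_new)
    return sum(predictions) / len(predictions)

# Hypothetical noisy observations
xs = [1, 2, 3, 4, 5, 6]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]
print(bagged_predict(xs, ys, 7))
```

A fixed `seed` keeps the resampling reproducible; changing it gives a slightly different ensemble, which is exactly the model-to-model variation that averaging smooths out.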

Gradient boosting

Gradient boosting is a boosting approach that builds a sequence of simple models, usually small decision trees, in which each new model is fit to the errors that remain from the models before it; the final prediction is a weighted sum of all the models. Stochastic variants also resample the data at each step. Like decision trees, boosting makes no assumptions about the distribution of the data. Boosting is less prone to overfitting than a single decision tree, and if a decision tree fits the data fairly well, boosting often improves the fit. (Overfitting means that you are using too many variables and the model is too complex. Underfitting means the opposite: there are not enough variables and the model is too simple. Both reduce prediction accuracy.)
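
A minimal sketch of gradient boosting for a numeric target, assuming squared-error loss and one-split trees (stumps): each new stump is fit to the residuals left by the ensemble so far, and its output is added with a small learning rate. This sketch uses no resampling (the stochastic variant fits each stump on a random subsample), and the function names, data and settings are hypothetical. It assumes at least two distinct x values.

```python
def fit_stump(xs, residuals):
    """One-split regression tree on x: returns (threshold, left_value, right_value)."""
    best = None
    for t in sorted(set(xs))[:-1]:  # the largest x would leave the right side empty
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        left_value, right_value = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((r - left_value) ** 2 for r in left)
               + sum((r - right_value) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, t, left_value, right_value)
    return best[1:]

def gradient_boost(xs, ys, n_rounds=50, learning_rate=0.1):
    """Squared-error gradient boosting with stumps; returns a prediction function."""
    base = sum(ys) / len(ys)  # start from the mean
    stumps = []

    def predict(x):
        return base + sum(learning_rate * (lv if x <= t else rv) for t, lv, rv in stumps)

    for _ in range(n_rounds):
        # For squared-error loss, the negative gradient is proportional to y - prediction
        residuals = [y - predict(x) for x, y in zip(xs, ys)]
        stumps.append(fit_stump(xs, residuals))
    return predict

# Hypothetical data with a step that no single straight line fits well
xs = [1, 2, 3, 4, 5, 6]
ys = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]
model = gradient_boost(xs, ys)
print([round(model(x), 2) for x in xs])
```

The small learning rate means each stump corrects only part of the remaining error, so the fit improves gradually over the rounds rather than in one jump.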

Incremental response

Incremental response models (also called uplift or net lift models) estimate the change in probability caused by an action. They are widely used to reduce churn and to discover the effects of different marketing programs.
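
Uplift is typically measured against a randomized control group that did not receive the action. The simplest version, sketched below in Python with hypothetical segment data and helper names, is the difference in response rates between treated and control customers within each segment; production uplift models predict this difference for each individual customer.

```python
def response_rate(outcomes):
    """Share of customers with outcome 1."""
    return sum(outcomes) / len(outcomes)

def uplift(treated, control):
    """Incremental response: P(outcome | action) - P(outcome | no action)."""
    return response_rate(treated) - response_rate(control)

# Hypothetical retention campaign, split by segment: 1 = customer stayed, 0 = churned
segments = {
    "new customers": {"treated": [1, 1, 1, 0, 1], "control": [1, 0, 0, 0, 1]},
    "loyal customers": {"treated": [1, 1, 1, 1, 0], "control": [1, 1, 1, 1, 0]},
}
for name, groups in segments.items():
    print(name, round(uplift(groups["treated"], groups["control"]), 2))
```

Here the retention offer raises the stay rate of new customers from 40% to 80% but has no effect on loyal customers, so the budget is better spent on the first segment.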

K-nearest neighbor (KNN)

K-nearest neighbor (KNN) is a nonparametric method for classification and regression that predicts an object’s value or class membership based on the k closest training examples.
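
Because k-nearest neighbor keeps the training examples and defers all work to prediction time, a minimal Python version is short. The sketch below assumes numeric features, Euclidean distance (`math.dist`, Python 3.8+) and unweighted neighbors; the function names and data are hypothetical.

```python
import math
from collections import Counter

def k_nearest(train, query, k):
    """The k training examples closest to query by Euclidean distance."""
    return sorted(train, key=lambda example: math.dist(example[0], query))[:k]

def knn_classify(train, query, k=3):
    """Class membership: majority vote among the k nearest neighbors."""
    votes = Counter(label for _, label in k_nearest(train, query, k))
    return votes.most_common(1)[0][0]

def knn_regress(train, query, k=3):
    """Numeric prediction: average target value of the k nearest neighbors."""
    neighbors = k_nearest(train, query, k)
    return sum(value for _, value in neighbors) / len(neighbors)

# Hypothetical training examples: (features, label) and (features, value)
labeled = [((1, 1), "stay"), ((1, 2), "stay"), ((2, 1), "stay"),
           ((8, 8), "leave"), ((8, 9), "leave"), ((9, 8), "leave")]
valued = [((1,), 10.0), ((2,), 20.0), ((3,), 30.0), ((10,), 100.0)]

print(knn_classify(labeled, (2, 2)))  # stay
print(knn_regress(valued, (2,)))      # 20.0
```

Ties in the vote go to whichever label `Counter.most_common` lists first, which follows the order the labels were first seen among the neighbors, so an odd k is a common choice for two-class problems.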

Memory-based reasoning

Memory-based reasoning is a k-nearest neighbor technique for classifying or predicting observations.

Partial least squares

This flexible statistical technique can be applied to data of any shape. It models the relationships between inputs and outputs even when the inputs are correlated and noisy, when there are multiple outputs, or when there are more inputs than observations. Partial least squares looks for factors that explain variation in both the response and the predictors.
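
The Python sketch below extracts one partial least squares component for a single response (PLS1) in the NIPALS style: the weight vector is the direction in the centred inputs with maximal covariance with the response, the factor scores are the inputs projected onto it, and the response is regressed on those scores. It assumes numeric inputs that are not all uncorrelated with the response, and it keeps only one component; full implementations deflate the data and extract several. The data and function names are hypothetical, and the two inputs are deliberately identical: the ordinary least-squares normal equations then have no unique solution, but PLS still fits.

```python
def center(values):
    """Return (values minus their mean, the mean)."""
    m = sum(values) / len(values)
    return [v - m for v in values], m

def pls1_one_component(X, y):
    """Fit a single-component PLS1 model on rows X and response y; returns a predictor."""
    n, p = len(X), len(X[0])
    cols = [center([row[j] for row in X]) for j in range(p)]
    Xc = [[cols[j][0][i] for j in range(p)] for i in range(n)]
    x_means = [cols[j][1] for j in range(p)]
    yc, y_mean = center(y)
    # Weight vector: direction in X space with maximal covariance with y
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = sum(v * v for v in w) ** 0.5
    w = [v / norm for v in w]
    # Scores (the latent factor) and the regression of y on that factor
    t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]
    q = sum(ti * yi for ti, yi in zip(t, yc)) / sum(ti * ti for ti in t)

    def predict(row):
        score = sum((row[j] - x_means[j]) * w[j] for j in range(p))
        return y_mean + q * score
    return predict

# Hypothetical data: two perfectly correlated inputs and a response
X = [[1, 1], [2, 2], [3, 3], [4, 4]]
y = [2, 4, 6, 8]
model = pls1_one_component(X, y)
print(model([5, 5]))
```

The single latent factor here is the shared direction of the two inputs, which is why the correlation between them causes no trouble.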

Principal component analysis

The purpose of principal component analysis is to derive a small number of uncorrelated linear combinations (principal components) of a set of variables that retain as much of the information (variance) in the original variables as possible.
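
One way to find the first principal component, the direction that retains the most variance, is power iteration on the covariance matrix. The Python sketch below assumes numeric data with nonzero variance and a starting vector that is not orthogonal to the top component, and it computes only the first component; libraries usually use a singular value decomposition and return all components. The function names and data are hypothetical.

```python
def covariance_matrix(rows):
    """Sample covariance matrix of a list of equal-length numeric rows."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [[sum((r[a] - means[a]) * (r[b] - means[b]) for r in rows) / (n - 1)
             for b in range(p)] for a in range(p)]

def first_principal_component(rows, iterations=100):
    """Power iteration: repeatedly multiply by the covariance matrix and normalize."""
    cov = covariance_matrix(rows)
    p = len(cov)
    v = [1.0] * p  # starting direction
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    # Variance of the data projected onto v
    variance = sum(v[a] * cov[a][b] * v[b] for a in range(p) for b in range(p))
    return v, variance

# Hypothetical 2-D data spread mostly along the first axis
rows = [(2, 0), (0, 1), (-2, 0), (0, -1)]
direction, variance = first_principal_component(rows)
print([round(c, 3) for c in direction], round(variance, 3))
```

Here the first component points along the first axis and captures 80% of the total variance (8/3 out of 10/3), so one component summarizes most of the information in two variables.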

Support vector machine

This supervised machine learning technique uses associated learning algorithms to analyze data and recognize patterns. It can be used for both classification and regression.
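
A linear support vector machine for classification looks for the boundary with the widest margin between two classes, which amounts to minimizing a regularization term plus the hinge loss. The Python sketch below does that with plain full-batch subgradient descent, assuming labels of +1 and -1, a linear kernel and hand-chosen `c`, `learning_rate` and `epochs` values; production SVM libraries use specialized solvers, support nonlinear kernels and also offer regression. The function names and data are hypothetical.

```python
def train_linear_svm(points, labels, c=1.0, learning_rate=0.01, epochs=200):
    """Minimize 0.5*||w||^2 + c * sum(hinge loss) by full-batch subgradient descent.
    Labels must be +1 or -1. Returns (weights, bias)."""
    p = len(points[0])
    w = [0.0] * p
    b = 0.0
    for _ in range(epochs):
        grad_w = list(w)  # gradient of the 0.5*||w||^2 regularizer
        grad_b = 0.0
        for x, y in zip(points, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:  # inside the margin or misclassified: hinge loss is active
                for j in range(p):
                    grad_w[j] -= c * y * x[j]
                grad_b -= c * y
        w = [wi - learning_rate * gi for wi, gi in zip(w, grad_w)]
        b -= learning_rate * grad_b
    return w, b

def svm_predict(w, b, x):
    """Classify by which side of the learned boundary the point falls on."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1

# Hypothetical two-class training data
points = [(2, 2), (3, 3), (2, 3), (-2, -2), (-3, -3), (-2, -3)]
labels = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(points, labels)
print([svm_predict(w, b, p) for p in [(4, 4), (-4, -3)]])
```

Only points inside the margin contribute to each update, which mirrors how the final SVM boundary depends only on the support vectors.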

Time series data mining

Time series data is time-stamped and collected at a regular interval (sales per month, calls per day, web visits per hour, and so on). Time series data mining combines traditional data mining and forecasting techniques. Data mining techniques such as sampling, clustering and decision trees are applied to data collected over time with the aim of improving forecasts.
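
One common way to apply ordinary modeling to time series data is to reshape the series into lagged rows (previous value, next value) and fit a forecasting model to them. The Python sketch below assumes a single lag and a least-squares autoregressive model, y[t] = a + b * y[t-1]; the sales figures and helper names are hypothetical, chosen so that the fit is exact.

```python
def make_lagged_rows(series, lags=1):
    """Turn a time series into (previous values, next value) training rows."""
    return [(series[i - lags:i], series[i]) for i in range(lags, len(series))]

def fit_ar1(series):
    """Least-squares fit of y[t] = a + b * y[t-1] from lagged rows."""
    rows = make_lagged_rows(series, lags=1)
    xs = [prev[0] for prev, _ in rows]
    ys = [nxt for _, nxt in rows]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return a, b

# Hypothetical monthly sales
sales = [0.0, 10.0, 15.0, 17.5, 18.75, 19.375]
a, b = fit_ar1(sales)
next_month = a + b * sales[-1]  # one-step-ahead forecast
print(a, b, next_month)
```

Fitting recovers a = 10 and b = 0.5, giving a forecast of 19.6875 for the next month. Once the series is in lagged-row form, other data mining techniques, such as decision trees, can be trained on the same rows.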