## Understanding Random Forests

Random forests are a machine learning method for classification and regression. A random forest comprises many individual decision trees, each trained on a random subset of the features and the training data, and their combined prediction is more reliable than that of any single decision tree. Every tree in the forest is a separate model: each one uses its own random subset of features to predict the target, and all these individual predictions are aggregated into a single, more accurate prediction.

## Starting from Decision Trees

Considering that not everyone reading this might be familiar with machine learning jargon, we have decided to break the concepts down into layman's terms. Everyone, knowingly or unknowingly, has used decision trees, either during their academic years or in their professional life. The concept is like a flow chart in which you break complex data down into easy steps in the form of a box diagram.

A decision tree is not quite as simple and linear as a flow chart, but the idea is similar: you start from a root node and keep splitting on variables until you reach your target. For example, suppose someone asks you to predict their favorite football team's rank in an upcoming tournament. You'll begin with an initial guess, but that guess cannot be the final answer, especially when biases are involved in the prediction process. You'll have to ask questions and crunch numbers to make your prediction as credible as possible.
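The question-by-question splitting described above can be sketched as a chain of if/else checks. This is a toy illustration for the football example, not a trained model; the inputs (`win_rate`, `goals_per_game`, `injuries`) and every threshold are invented for the sake of the sketch.

```python
# A toy decision tree for the football example. Each question splits the
# cases further until we reach a predicted rank band. All variable names
# and thresholds are hypothetical, chosen only to illustrate the structure.
def predict_rank(win_rate, goals_per_game, injuries):
    """Return a hypothetical rank band for a team."""
    if win_rate > 0.6:            # first split: strong season so far?
        if injuries < 3:          # second split: squad mostly healthy?
            return "top 4"
        return "top 8"
    if goals_per_game > 1.5:      # weaker record, but attacking well
        return "mid-table"
    return "bottom half"
```

Each `if` is one node of the tree, and each `return` is a leaf, i.e., a final prediction for the target.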

## The Correlation between the Decision Tree and Random Forest

As mentioned before, a random forest is a collection of individual decision trees. Each tree in the forest uses a different random subset of variables from the same dataset, so each one reaches the desired target through a different route. The credibility of the forest comes from this diversity: because the trees are largely uncorrelated, their individual mistakes tend to cancel out when their predictions are combined. And where some trees do follow similar routes, those repeated patterns reinforce the majority answer.

For example, a sports analyst, an ex-football player, a sports journalist, an enthusiastic fan, and a retired referee will each ask different questions to predict the result of a game. All of them have different skills, information, and knowledge of the game, so their methods for reaching the prediction will differ. Not only their knowledge of the game but also their reasoning for relating the variables in the data they draw on is different.

Now the decision trees of all these people, taken together, form a model. Collectively, this model is a 'random forest.' You have individual predictions from several uncorrelated decision trees, each of which took a unique route to the desired target, and you can combine all of them to increase the accuracy of your final prediction.

## How it Works

Creating a random forest is not just a matter of generating wildly different variables or picking random features from the available data. You need a sense of how the data maps together and a knack for asking reasonable questions to make an accurate guess. Machines can learn to do this by accumulating the information you feed them over the years, but they still cannot ask the breakthrough questions a human would when facing a dead end in a decision tree.

For a random forest to work, you need to grow several decision trees. Each tree is trained on a random sample of the training data, which it uses to learn which features to split on. In machine learning, features are the input variables a classifier uses to make its decisions, and the thing we want to predict is the target.
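The two sources of randomness described above, random samples of the training data and random subsets of the features, can be sketched in a few lines. This is a simplified illustration, not a full implementation: the "tree" here is a trivial stand-in that just remembers the majority label of its sample, where a real forest would grow a full decision tree at that step.

```python
import random

# Sketch of how each tree in a random forest gets its own view of the data:
# (1) a bootstrap sample of the rows (drawn with replacement), and
# (2) a random subset of the feature indices.
def bootstrap_sample(rows, rng):
    """Draw len(rows) examples from rows, with replacement."""
    return [rng.choice(rows) for _ in rows]

def feature_subset(n_features, k, rng):
    """Pick k distinct feature indices at random."""
    return rng.sample(range(n_features), k)

def train_forest(rows, labels, n_trees, n_features, k, seed=0):
    rng = random.Random(seed)
    trees = []
    for _ in range(n_trees):
        sample = bootstrap_sample(list(zip(rows, labels)), rng)
        feats = feature_subset(n_features, k, rng)
        # Stand-in "tree": the majority label of its bootstrap sample.
        # A real random forest would grow a decision tree on `sample`,
        # considering only the features in `feats` at each split.
        sample_labels = [lbl for _, lbl in sample]
        majority = max(set(sample_labels), key=sample_labels.count)
        trees.append((feats, majority))
    return trees
```

Because every tree sees a different sample and a different feature subset, the trees disagree in useful ways, which is exactly what makes their combined vote more accurate than any single tree.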