Deep learning model by SHAP

When you interpret a model's predictions, users need insights that are easy to understand. One way to provide them is to prefer simple models over complex ones; linear models, for instance, are easy to interpret. With large amounts of data, however, complex models usually offer better accuracy, which brings the trade-off between accuracy and interpretability to the forefront. Many methods have been proposed to explain complex models, but it is often unclear how these methods relate to each other, and there is little guidance on when one method is preferable to another.
SHAP draws on these earlier methods and brings them together in a unified framework built on Shapley values. Below, you will find a definition of SHAP and see how to apply it with the Python package.
SHapley Additive exPlanations, or SHAP, is an approach based on game theory that explains the output of a machine learning model. It connects optimal credit allocation with local explanations by using Shapley values from cooperative game theory.
The SHAP value of a feature is its average marginal contribution across all possible combinations of features. SHAP is becoming a popular technique in machine learning, and the following example illustrates the idea:
Consider the points a team scores in each match of a season. Suppose we want to know how much Player A contributes, on average, to the team's score in a match. To find out, we need to measure Player A's contribution in partnership with Player B and Player C.
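The team example can be worked through in a few lines of Python. This is a minimal sketch under stated assumptions: the scores for each combination of players are made-up numbers, and shapley_values is a hypothetical helper written for this illustration, not part of the shap package. It averages each player's marginal contribution over every order in which the players could join the team.

```python
from itertools import permutations

# Hypothetical team scores for every combination of players (assumed numbers).
scores = {
    frozenset(): 0,
    frozenset("A"): 40,
    frozenset("B"): 30,
    frozenset("C"): 20,
    frozenset("AB"): 90,
    frozenset("AC"): 70,
    frozenset("BC"): 60,
    frozenset("ABC"): 120,
}


def shapley_values(players, value):
    """Average each player's marginal contribution over all join orders."""
    orders = list(permutations(players))
    totals = {player: 0.0 for player in players}
    for order in orders:
        coalition = frozenset()
        for player in order:
            with_player = coalition | {player}
            # Marginal contribution: score with the player minus score without.
            totals[player] += value[with_player] - value[coalition]
            coalition = with_player
    return {player: totals[player] / len(orders) for player in players}


phi = shapley_values("ABC", scores)
for player, contribution in phi.items():
    print(player, round(contribution, 2))
```

The three contributions add up to the full team score of 120, which is the property that makes Shapley values a fair way to split credit.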
While you perform the experiment, you need to ensure the following conditions about the matches:
First, import all the necessary libraries with the following code:
import pandas as pd
import numpy as np
import shap
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from xgboost.sklearn import XGBRegressor
from sklearn.preprocessing import OneHotEncoder, LabelEncoder
from sklearn import tree
import matplotlib.pyplot as plt
%matplotlib inline
import warnings
warnings.filterwarnings('ignore')
The following example uses a real estate dataset, but you can apply this method to any dataset. Because this is only an illustration, imputation and pre-processing are kept to a minimum. In a real analysis, you should follow the complete procedure:
data = pd.read_csv('data.csv')
# Remove features with high null values
data.drop(['PoolQC', 'MiscFeature', 'Fence', 'FireplaceQu',
           'LotFrontage'], inplace=True, axis=1)
# Drop null values
data.dropna(inplace=True)
# Prepare X and Y
X = pd.get_dummies(data)
X.drop(['SalePrice'], inplace=True, axis=1)
y = data['SalePrice']
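To see what the one-hot encoding step produces, here is a small sketch on a toy table. The three rows and their values are hypothetical stand-ins for the real estate data, and separating the target before encoding is a design choice of this sketch rather than something the original code does.

```python
import pandas as pd

# Toy stand-in for the real estate data (hypothetical values).
data = pd.DataFrame({
    "LotArea": [8450, 9600, 11250],
    "Street": ["Pave", "Grvl", "Pave"],
    "SalePrice": [208500, 181500, 223500],
})

# Separate the target first so it cannot end up among the features.
y = data["SalePrice"]

# Numeric columns pass through; each category becomes its own indicator column.
X = pd.get_dummies(data.drop(columns=["SalePrice"]))

print(X.columns.tolist())
```

Each text column is expanded into one indicator column per category, which is why X usually has many more columns than the original table.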
In this step, fit the model to the dataset:
model = XGBRegressor(n_estimators=1000, max_depth=10, learning_rate=0.001)
# Fit the model
model.fit(X, y)
Now you can use the SHAP library, which offers a range of plots for explaining model predictions.
• First, load the JS visualization code into your notebook.
# load JS visualization code to notebook
shap.initjs()
• Now you can explain the predictions of your model.
• Start by creating the explainer and computing the SHAP values (shap_values).
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)
Force Plotting
i = 5
shap.force_plot(explainer.expected_value, shap_values[i], features=X.iloc[i], feature_names=X.columns)
The force plot shows which features contribute to the model's output for the selected row and how each one pushes the prediction above or below the base value. The base value is the average output of the model over the training data.
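The relationship between the base value, the SHAP values, and the prediction can be checked by hand on a small linear model. The sketch below is an illustration only and is not how TreeExplainer computes its values: it assumes a hypothetical two-feature linear model with made-up weights and training rows, and it relies on the closed form for linear models with independent features, where the SHAP value of feature i is w_i * (x_i - mean(x_i)).

```python
from statistics import mean

# Hypothetical training rows with two features each (assumed values).
X_train = [[1.0, 10.0], [2.0, 20.0], [3.0, 60.0]]

# Hypothetical linear model: prediction = bias + sum(weight * feature).
weights = [4.0, 0.5]
bias = 2.0


def predict(row):
    return bias + sum(w * x for w, x in zip(weights, row))


# Base value: the average model output over the training data.
base_value = mean(predict(row) for row in X_train)

# Per-feature means over the training data.
feature_means = [mean(column) for column in zip(*X_train)]


def linear_shap(row):
    """SHAP values of a linear model, assuming independent features."""
    return [w * (x - m) for w, x, m in zip(weights, row, feature_means)]


row = X_train[2]
phi = linear_shap(row)

# The base value plus all SHAP values recovers the prediction for this row.
reconstructed = base_value + sum(phi)
print(base_value, phi, reconstructed, predict(row))
```

Positive SHAP values push the prediction above the base value and negative ones push it below, which is what the force plot draws for the selected row.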