XGBoost
XGBoost is a popular Gradient Boosting library with a Python interface.
eli5 supports eli5.explain_weights() and eli5.explain_prediction()
for XGBClassifier, XGBRegressor and Booster estimators. It is tested for
xgboost >= 0.6a2 and < 2.0.0.
Versions starting from 2.0.0 are likely to produce incorrect results in
eli5.explain_prediction(), and eli5 will issue a warning.
eli5.explain_weights() uses feature importances. Additional
arguments for XGBClassifier, XGBRegressor and Booster:
importance_type is a way to get feature importance. Possible values are:
‘gain’ - the average gain of the feature when it is used in trees (default)
‘weight’ - the number of times a feature is used to split the data across all trees
‘cover’ - the average coverage of the feature when it is used in trees
target_names and targets arguments are ignored.
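To make the three importance_type values concrete, here is a minimal sketch in plain Python. It is not eli5's or XGBoost's code: the `splits` records and the `importances` helper are hypothetical, and the numbers are made up. It only illustrates how per-split statistics could be aggregated into per-feature scores according to the definitions above.

```python
from collections import defaultdict

# Hypothetical toy data: one (feature, gain, cover) record per split node,
# collected across all trees of an ensemble. The values are made up.
splits = [
    ("age", 4.0, 100.0),
    ("income", 1.0, 40.0),
    ("age", 2.0, 60.0),
    ("age", 3.0, 20.0),
]


def importances(splits, importance_type="gain"):
    """Aggregate per-split statistics into per-feature importance scores."""
    count = defaultdict(int)
    gain_sum = defaultdict(float)
    cover_sum = defaultdict(float)
    for feature, gain, cover in splits:
        count[feature] += 1
        gain_sum[feature] += gain
        cover_sum[feature] += cover

    if importance_type == "weight":
        # Number of times the feature is used to split the data.
        return {f: float(n) for f, n in count.items()}
    if importance_type == "gain":
        # Average gain over the splits that use the feature.
        return {f: gain_sum[f] / count[f] for f in count}
    if importance_type == "cover":
        # Average coverage over the splits that use the feature.
        return {f: cover_sum[f] / count[f] for f in count}
    raise ValueError("unknown importance_type: %r" % importance_type)


for kind in ("gain", "weight", "cover"):
    print(kind, importances(splits, kind))
```

In this toy data "age" is used in three splits, so its ‘weight’ is 3, while its ‘gain’ and ‘cover’ are averages over those three splits.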
Note
Top-level eli5.explain_weights() calls are dispatched
to eli5.xgboost.explain_weights_xgboost() for
XGBClassifier, XGBRegressor and Booster.
For eli5.explain_prediction() eli5 uses an approach based on ideas from
http://blog.datadive.net/interpreting-random-forests/ :
feature weights are calculated by following decision paths in trees
of an ensemble. Each node of the tree has an output score, and the
contribution of a feature on the decision path is how much the score changes
from parent to child.
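The decision-path idea can be sketched for a single tree as follows. This is an illustration under stated assumptions, not eli5's implementation: the nested-dict tree layout, the node scores and the `path_contributions` helper are all hypothetical. For an ensemble, the per-tree contributions would be summed.

```python
from collections import defaultdict

# Hypothetical toy tree. Every node carries an output score; internal nodes
# also carry the feature and threshold they split on.
tree = {
    "score": 0.5, "feature": "age", "threshold": 30,
    "left": {"score": 0.25},
    "right": {
        "score": 0.75, "feature": "income", "threshold": 50,
        "left": {"score": 0.5},
        "right": {"score": 1.0},
    },
}


def path_contributions(tree, doc):
    """Follow the decision path of doc and credit each score change
    to the feature that was used for the split."""
    contributions = defaultdict(float)
    node = tree
    bias = node["score"]  # score at the root, before any split
    while "feature" in node:
        feature = node["feature"]
        if doc[feature] < node["threshold"]:
            child = node["left"]
        else:
            child = node["right"]
        # Contribution = how much the score changes from parent to child.
        contributions[feature] += child["score"] - node["score"]
        node = child
    return bias, dict(contributions)


bias, contribs = path_contributions(tree, {"age": 40, "income": 60})
print(bias, contribs)
```

By construction, the bias plus the sum of the contributions equals the score of the leaf the document ends up in, so the explanation accounts for the whole prediction of the tree.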
Note
When explaining Booster predictions,
do not pass an xgboost.DMatrix object as doc; pass a numpy array
or a sparse matrix instead (or have vec return them).
Additional eli5.explain_prediction() keyword arguments supported
for XGBClassifier, XGBRegressor and Booster:
vec is a vectorizer instance used to transform raw features to the input of the estimator xgb (e.g. a fitted CountVectorizer instance); you can pass it instead of feature_names.
vectorized is a flag which tells eli5 if doc should be passed through vec or not. By default it is False, meaning that if vec is not None, vec.transform([doc]) is passed to the estimator. Set it to True if you’re passing vec, but doc is already vectorized.
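The interaction between vec and vectorized described above can be sketched as follows. This is a simplified illustration, not eli5's code: `ToyVectorizer` is a hypothetical stand-in for a fitted vectorizer, and `prepare_input` is a hypothetical helper that only mirrors the rule stated above.

```python
class ToyVectorizer:
    """Hypothetical stand-in for a fitted vectorizer such as CountVectorizer."""

    def __init__(self, vocabulary):
        self.vocabulary = vocabulary

    def transform(self, docs):
        # Count how often each vocabulary term occurs in each document.
        return [[doc.split().count(term) for term in self.vocabulary]
                for doc in docs]


def prepare_input(doc, vec=None, vectorized=False):
    """Decide what is handed to the estimator."""
    if vec is not None and not vectorized:
        # Raw document: pass it through the vectorizer first.
        return vec.transform([doc])
    # No vectorizer, or doc is already vectorized: use it as is.
    return doc


vec = ToyVectorizer(["good", "bad"])
print(prepare_input("good good bad", vec=vec))
print(prepare_input([[2, 1]], vec=vec, vectorized=True))
```

In the second call vec is still passed, which matches the case described above where the vectorizer is supplied (for example, so that feature names are available) but doc must not be transformed again.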
eli5.explain_prediction() for Booster estimator accepts
two more optional arguments:
is_regression - True if solving a regression problem (“objective” starts with “reg”) and False for a classification problem. If not set, regression is assumed for a single target estimator and proba will not be shown.
missing - set it to the same value as the missing argument to xgboost.DMatrix. Matters only if sparse values are used. Default is np.nan.
See the tutorial for a more detailed usage example.
Note
Top-level eli5.explain_prediction() calls are dispatched
to eli5.xgboost.explain_prediction_xgboost() for
XGBClassifier, XGBRegressor and Booster.