XGBoost

XGBoost is a popular Gradient Boosting library with a Python interface. eli5 supports eli5.explain_weights() and eli5.explain_prediction() for XGBClassifier, XGBRegressor and Booster estimators. It is tested for xgboost >= 0.6a2.

eli5.explain_weights() uses feature importances. Additional arguments for XGBClassifier, XGBRegressor and Booster:

  • importance_type selects how feature importance is computed. Possible values are:
    • ‘gain’ - the average gain of the feature when it is used in trees (default)
    • ‘weight’ - the number of times a feature is used to split the data across all trees
    • ‘cover’ - the average coverage of the feature when it is used in trees

The target_names and targets arguments are ignored.
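The three importance_type values can be illustrated with a minimal pure-Python sketch. The split records and the importances() helper below are hypothetical and exist only for illustration; they are not part of eli5 or xgboost. With a real model, comparable numbers are available from xgboost through Booster.get_score(importance_type=...).

```python
from collections import defaultdict

# Hypothetical split records collected across all trees of an ensemble:
# (feature name, gain of the split, cover of the split).
splits = [
    ("age", 10.0, 100.0),
    ("age", 4.0, 40.0),
    ("income", 6.0, 60.0),
]


def importances(splits, importance_type="gain"):
    """Toy re-implementation of the three importance definitions."""
    counts = defaultdict(int)
    gain_sum = defaultdict(float)
    cover_sum = defaultdict(float)
    for feature, gain, cover in splits:
        counts[feature] += 1
        gain_sum[feature] += gain
        cover_sum[feature] += cover

    if importance_type == "weight":
        # Number of times the feature is used to split the data.
        return dict(counts)
    if importance_type == "gain":
        # Average gain over the splits that use the feature.
        return {f: gain_sum[f] / counts[f] for f in counts}
    if importance_type == "cover":
        # Average cover over the splits that use the feature.
        return {f: cover_sum[f] / counts[f] for f in counts}
    raise ValueError("unknown importance_type: %r" % importance_type)


for kind in ("gain", "weight", "cover"):
    print(kind, importances(splits, kind))
```

Note that a feature used in many low-gain splits can rank high by ‘weight’ and low by ‘gain’, which is why the choice of importance_type can change the ordering of features.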

For eli5.explain_prediction(), eli5 uses an approach based on ideas from http://blog.datadive.net/interpreting-random-forests/ : feature weights are calculated by following decision paths in the trees of an ensemble. Each node of a tree has an output score, and the contribution of a feature on the decision path is how much the score changes from parent to child.
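The decision-path idea can be sketched for a single tree as follows. This is a simplified illustration, not eli5's actual implementation: the tree layout, the node scores, the "bias" key and the path_contributions() helper are all assumptions made for the example.

```python
# Hypothetical toy tree. Every node stores its output score; internal nodes
# also store the feature they split on, a threshold, and two children.
tree = {
    "score": 0.5, "feature": "f0", "threshold": 2.0,
    "left": {"score": 0.25},
    "right": {
        "score": 1.0, "feature": "f1", "threshold": 1.0,
        "left": {"score": 0.75},
        "right": {"score": 1.5},
    },
}


def path_contributions(tree, x):
    """Follow the decision path for x and credit each score change
    (child score minus parent score) to the feature used at the parent."""
    contributions = {"bias": tree["score"]}  # the root score acts as a bias term
    node = tree
    while "feature" in node:
        feature = node["feature"]
        child = node["left"] if x[feature] < node["threshold"] else node["right"]
        delta = child["score"] - node["score"]
        contributions[feature] = contributions.get(feature, 0.0) + delta
        node = child
    return contributions, node["score"]


contributions, leaf_score = path_contributions(tree, {"f0": 3.0, "f1": 0.5})
print(contributions)
print(leaf_score)
```

The contributions, including the bias, sum to the score of the leaf that the example reaches. For an ensemble, the same bookkeeping would be repeated for every tree and the per-feature contributions added up.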

Note

When explaining Booster predictions, do not pass an xgboost.DMatrix object as doc; pass a numpy array or a sparse matrix instead (or have vec return one of them).

Additional eli5.explain_prediction() keyword arguments supported for XGBClassifier, XGBRegressor and Booster:

  • vec is a vectorizer instance used to transform raw features to the input of the estimator xgb (e.g. a fitted CountVectorizer instance); you can pass it instead of feature_names.
  • vectorized is a flag which tells eli5 if doc should be passed through vec or not. By default it is False, meaning that if vec is not None, vec.transform([doc]) is passed to the estimator. Set it to True if you’re passing vec, but doc is already vectorized.
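The interaction between vec and vectorized described above can be sketched like this. ToyVectorizer and prepare_input() are hypothetical stand-ins written for this example (ToyVectorizer plays the role of a fitted CountVectorizer); they only mirror the rule stated in the text and are not eli5 code.

```python
class ToyVectorizer:
    """Hypothetical stand-in for a fitted vectorizer: counts vocabulary words."""

    def __init__(self, vocabulary):
        self.vocabulary = vocabulary

    def transform(self, docs):
        return [[doc.split().count(word) for word in self.vocabulary]
                for doc in docs]


def prepare_input(doc, vec=None, vectorized=False):
    """Return what would be passed on to the estimator."""
    if vec is not None and not vectorized:
        # Raw document: run it through the vectorizer first.
        return vec.transform([doc])
    # No vectorizer, or doc is already vectorized: use it as is.
    return doc


vec = ToyVectorizer(["good", "bad"])

# Raw text, vectorized=False (the default): vec.transform([doc]) is used.
print(prepare_input("good good bad", vec=vec))

# Already vectorized input: vec is not applied again.
print(prepare_input([[2, 1]], vec=vec, vectorized=True))
```

In the second call vec is not used to transform doc, but passing it can still be useful because, as noted above, it can take the place of feature_names.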

eli5.explain_prediction() for Booster estimator accepts two more optional arguments:

  • is_regression - True if solving a regression problem (“objective” starts with “reg”) and False for a classification problem. If not set, regression is assumed for a single target estimator and proba will not be shown.
  • missing - set it to the same value as the missing argument to xgboost.DMatrix. Matters only if sparse values are used. Default is np.nan.
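When working with a bare Booster, the value for is_regression usually has to come from the training parameters. The helper below is a hypothetical sketch of the rule quoted above (“objective” starts with “reg”); it assumes you still have access to the objective string used for training and is not part of eli5.

```python
def looks_like_regression(objective):
    """Apply the rule from the text: regression if "objective" starts with "reg"."""
    return objective.startswith("reg")


# Objective strings as they would appear in the training parameters.
for objective in ("reg:squarederror", "binary:logistic", "multi:softprob"):
    print(objective, looks_like_regression(objective))
```

The result could then be passed as the is_regression keyword argument of eli5.explain_prediction() when explaining a Booster.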

See the tutorial for a more detailed usage example.