eli5.xgboost

eli5 has XGBoost support: eli5.explain_weights() shows feature importances, and eli5.explain_prediction() explains individual predictions by showing feature weights. Both functions work with XGBClassifier and XGBRegressor.

explain_prediction_xgboost(xgb, doc, vec=None, top=None, top_targets=None, target_names=None, targets=None, feature_names=None, feature_re=None, feature_filter=None, vectorized=False)[source]

Return an explanation of XGBoost prediction (via scikit-learn wrapper XGBClassifier or XGBRegressor) as feature weights.

See eli5.explain_prediction() for description of top, top_targets, target_names, targets, feature_names, feature_re and feature_filter parameters.

vec is a vectorizer instance used to transform raw features to the input of the estimator xgb (e.g. a fitted CountVectorizer instance); you can pass it instead of feature_names.

vectorized is a flag which tells eli5 whether doc should be passed through vec. It is False by default, meaning that if vec is not None, vec.transform([doc]) is passed to the estimator. Set it to True if you’re passing vec but doc is already vectorized.

The method for determining feature importances follows an idea from http://blog.datadive.net/interpreting-random-forests/. Feature weights are calculated by following decision paths in trees of an ensemble. Each leaf has an output score, and expected scores can also be assigned to parent nodes. The contribution of one feature on the decision path is how much the expected score changes from parent to child. The weights of all features sum to the output score of the estimator.
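The decision-path idea can be sketched on a single hand-built tree (a toy illustration of the method, not eli5’s internal code):

```python
# Toy illustration of decision-path feature contributions on one tree.
# Each node stores its expected score; a feature's contribution at a
# split is the change in expected score from parent to child along the
# path taken, and the bias term is the expected score at the root.
#
#             root (expected 0.5)
#        x0 < 2 /        \ x0 >= 2
#     node (0.2)          leaf (0.8)
#   x1 < 1 /  \ x1 >= 1
#  leaf (0.0)  leaf (0.4)

def explain_path(x):
    contributions = {"<BIAS>": 0.5}   # expected score at the root
    expected = 0.5
    if x[0] < 2:
        contributions["x0"] = 0.2 - expected   # root -> left child
        expected = 0.2
        if x[1] < 1:
            contributions["x1"] = 0.0 - expected
            expected = 0.0
        else:
            contributions["x1"] = 0.4 - expected
            expected = 0.4
    else:
        contributions["x0"] = 0.8 - expected   # root -> right leaf
        expected = 0.8
    return contributions, expected

contribs, score = explain_path([1, 3])
# The contributions (bias included) sum to the tree's output score.
assert abs(sum(contribs.values()) - score) < 1e-9
```

For an ensemble, the same bookkeeping is done per tree and the per-feature contributions are summed across trees.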

explain_weights_xgboost(xgb, vec=None, top=20, target_names=None, targets=None, feature_names=None, feature_re=None, feature_filter=None, importance_type='gain')[source]

Return an explanation of an XGBoost estimator (via scikit-learn wrapper XGBClassifier or XGBRegressor) as feature importances.

See eli5.explain_weights() for description of top, feature_names, feature_re and feature_filter parameters.

target_names and targets parameters are ignored.

Parameters:

importance_type (str, optional) – A way to get feature importance. Possible values are:

  • ‘gain’ - the average gain of the feature when it is used in trees (default)
  • ‘weight’ - the number of times a feature is used to split the data across all trees
  • ‘cover’ - the average coverage of the feature when it is used in trees