eli5.xgboost

eli5 has XGBoost support - eli5.explain_weights() shows feature importances, and eli5.explain_prediction() explains predictions by showing feature weights. Both functions work for XGBClassifier and XGBRegressor.
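For example, a minimal sketch (synthetic data via scikit-learn's make_classification; the names X, y and clf are illustrative):

    import eli5
    from sklearn.datasets import make_classification
    from xgboost import XGBClassifier

    # Toy data; any tabular X, y works the same way.
    X, y = make_classification(n_samples=200, n_features=5, random_state=0)
    clf = XGBClassifier(n_estimators=10).fit(X, y)

    # Global feature importances of the fitted model.
    print(eli5.format_as_text(eli5.explain_weights(clf)))

    # Explanation of a single prediction (the first row of X).
    print(eli5.format_as_text(eli5.explain_prediction(clf, X[0])))

In a Jupyter notebook, eli5.show_weights() and eli5.show_prediction() render the same explanations as HTML.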

explain_prediction_xgboost(xgb, doc, vec=None, top=None, top_targets=None, target_names=None, targets=None, feature_names=None, feature_re=None, feature_filter=None, vectorized=False, is_regression=None, missing=None)

Return an explanation of an XGBoost prediction (via the scikit-learn wrapper XGBClassifier or XGBRegressor, or via xgboost.Booster) as feature weights.

See eli5.explain_prediction() for description of top, top_targets, target_names, targets, feature_names, feature_re and feature_filter parameters.

Parameters:
  • vec (vectorizer, optional) – A vectorizer instance used to transform raw features to the input of the estimator xgb (e.g. a fitted CountVectorizer instance); you can pass it instead of feature_names.
  • vectorized (bool, optional) – A flag which tells eli5 whether doc should be passed through vec or not. By default it is False, meaning that if vec is not None, vec.transform([doc]) is passed to the estimator. Set it to True if you’re passing vec but doc is already vectorized (see the sketch below).
  • is_regression (bool, optional) – Pass if an xgboost.Booster is passed as the first argument. True if solving a regression problem (“objective” starts with “reg”) and False for a classification problem. If not set, regression is assumed for a single-target estimator, and predicted probabilities (“proba”) are not shown.
  • missing (optional) – Pass if an xgboost.Booster is passed as the first argument. Set it to the same value as the missing argument to xgboost.DMatrix. Matters only if sparse values are used. Default is np.nan.

The method for determining feature importances follows an idea from http://blog.datadive.net/interpreting-random-forests/. Feature weights are calculated by following decision paths in trees of an ensemble. Each leaf has an output score, and expected scores can also be assigned to parent nodes. The contribution of one feature on the decision path is how much the expected score changes from parent to child. The weights of all features sum to the output score of the estimator.
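As an illustration of the vec and vectorized parameters, a minimal sketch with a text classifier (the tiny corpus and the names docs, labels, vec and clf are made up for the example):

    import eli5
    from sklearn.feature_extraction.text import CountVectorizer
    from xgboost import XGBClassifier

    docs = ["good movie", "bad movie", "great film", "awful boring film"]
    labels = [1, 0, 1, 0]

    vec = CountVectorizer()
    clf = XGBClassifier(n_estimators=5).fit(vec.fit_transform(docs), labels)

    # doc is a raw string, so the default vectorized=False applies:
    # eli5 calls vec.transform([doc]) itself, and feature names are
    # taken from the vectorizer.
    expl = eli5.explain_prediction(clf, "good film", vec=vec)
    print(eli5.format_as_text(expl))

Here eli5.explain_prediction() dispatches to this function because clf is an XGBClassifier; calling explain_prediction_xgboost() directly works as well.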
explain_weights_xgboost(xgb, vec=None, top=20, target_names=None, targets=None, feature_names=None, feature_re=None, feature_filter=None, importance_type='gain')

Return an explanation of an XGBoost estimator (via scikit-learn wrapper XGBClassifier or XGBRegressor, or via xgboost.Booster) as feature importances.

See eli5.explain_weights() for description of top, feature_names, feature_re and feature_filter parameters.

target_names and targets parameters are ignored.

Parameters:
  • importance_type (str, optional) – A way to get feature importance. Possible values are:
      • ‘gain’ - the average gain of the feature when it is used in trees (default)
      • ‘weight’ - the number of times a feature is used to split the data across all trees
      • ‘cover’ - the average coverage of the feature when it is used in trees
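
A minimal sketch comparing the three importance types on a regressor (synthetic data via scikit-learn's make_regression; the name reg is illustrative):

    import eli5
    from sklearn.datasets import make_regression
    from xgboost import XGBRegressor

    X, y = make_regression(n_samples=200, n_features=5, random_state=0)
    reg = XGBRegressor(n_estimators=20).fit(X, y)

    # 'gain' is the default; 'weight' and 'cover' may rank features differently.
    for importance_type in ("gain", "weight", "cover"):
        expl = eli5.explain_weights(reg, importance_type=importance_type)
        print(importance_type)
        print(eli5.format_as_text(expl))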