eli5.sklearn

eli5.sklearn.explain_prediction

explain_prediction_linear_classifier(clf, doc, vec=None, top=None, top_targets=None, target_names=None, targets=None, feature_names=None, feature_re=None, feature_filter=None, vectorized=False)[source]

Explain prediction of a linear classifier.

See eli5.explain_prediction() for description of top, top_targets, target_names, targets, feature_names, feature_re and feature_filter parameters.

vec is a vectorizer instance used to transform raw features to the input of the classifier clf (e.g. a fitted CountVectorizer instance); you can pass it instead of feature_names.

vectorized is a flag which tells eli5 if doc should be passed through vec or not. By default it is False, meaning that if vec is not None, vec.transform([doc]) is passed to the classifier. Set it to True if you’re passing vec, but doc is already vectorized.
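For a linear classifier, the contributions being explained are essentially per-feature products of weight and feature value, plus the bias. A minimal sketch of that idea with plain scikit-learn (names here are illustrative, not eli5 API):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[0., 1.], [1., 0.], [1., 1.], [0., 0.]])
y = np.array([0, 1, 1, 0])
clf = LogisticRegression().fit(X, y)

x = np.array([1., 1.])
contributions = clf.coef_[0] * x              # weight * feature value, per feature
score = contributions.sum() + clf.intercept_[0]

# The per-feature contributions plus the bias reproduce the decision score.
assert np.isclose(score, clf.decision_function(x.reshape(1, -1))[0])
```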

explain_prediction_linear_regressor(reg, doc, vec=None, top=None, top_targets=None, target_names=None, targets=None, feature_names=None, feature_re=None, feature_filter=None, vectorized=False)[source]

Explain prediction of a linear regressor.

See eli5.explain_prediction() for description of top, top_targets, target_names, targets, feature_names, feature_re and feature_filter parameters.

vec is a vectorizer instance used to transform raw features to the input of the regressor reg (e.g. a fitted CountVectorizer instance); you can pass it instead of feature_names.


vectorized is a flag which tells eli5 if doc should be passed through vec or not. By default it is False, meaning that if vec is not None, vec.transform([doc]) is passed to the regressor reg. Set it to True if you’re passing vec, but doc is already vectorized.

explain_prediction_sklearn(*args, **kw)[source]

Return an explanation of a scikit-learn estimator.

explain_prediction_tree_classifier(clf, doc, vec=None, top=None, top_targets=None, target_names=None, targets=None, feature_names=None, feature_re=None, feature_filter=None, vectorized=False)[source]

Explain prediction of a tree classifier.

See eli5.explain_prediction() for description of top, top_targets, target_names, targets, feature_names, feature_re and feature_filter parameters.

vec is a vectorizer instance used to transform raw features to the input of the classifier clf (e.g. a fitted CountVectorizer instance); you can pass it instead of feature_names.

vectorized is a flag which tells eli5 if doc should be passed through vec or not. By default it is False, meaning that if vec is not None, vec.transform([doc]) is passed to the classifier. Set it to True if you’re passing vec, but doc is already vectorized.

Method for determining feature importances follows an idea from http://blog.datadive.net/interpreting-random-forests/. Feature weights are calculated by following decision paths in trees of an ensemble (or a single tree for DecisionTreeClassifier). Each node of the tree has an output score, and contribution of a feature on the decision path is how much the score changes from parent to child. Weights of all features sum to the output score or proba of the estimator.
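The decision-path bookkeeping described above can be sketched directly against a fitted DecisionTreeClassifier. This is a simplified illustration of the idea, not eli5's actual implementation:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([0, 0, 1, 1])
clf = DecisionTreeClassifier(random_state=0).fit(X, y)
tree = clf.tree_

x = np.array([[1., 0.]])
path = clf.decision_path(x).indices           # node ids from root to leaf

def node_proba(node):
    counts = tree.value[node][0]
    return counts / counts.sum()

# Contribution of a feature = change in class probabilities from parent
# to child at each split the sample passes through.
contrib = np.zeros((tree.n_features, clf.n_classes_))
for parent, child in zip(path[:-1], path[1:]):
    contrib[tree.feature[parent]] += node_proba(child) - node_proba(parent)

bias = node_proba(path[0])                    # class probabilities at the root
assert np.allclose(bias + contrib.sum(axis=0), clf.predict_proba(x)[0])
```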

explain_prediction_tree_regressor(reg, doc, vec=None, top=None, top_targets=None, target_names=None, targets=None, feature_names=None, feature_re=None, feature_filter=None, vectorized=False)[source]

Explain prediction of a tree regressor.

See eli5.explain_prediction() for description of top, top_targets, target_names, targets, feature_names, feature_re and feature_filter parameters.

vec is a vectorizer instance used to transform raw features to the input of the regressor reg (e.g. a fitted CountVectorizer instance); you can pass it instead of feature_names.

vectorized is a flag which tells eli5 if doc should be passed through vec or not. By default it is False, meaning that if vec is not None, vec.transform([doc]) is passed to the regressor. Set it to True if you’re passing vec, but doc is already vectorized.

Method for determining feature importances follows an idea from http://blog.datadive.net/interpreting-random-forests/. Feature weights are calculated by following decision paths in trees of an ensemble (or a single tree for DecisionTreeRegressor). Each node of the tree has an output score, and contribution of a feature on the decision path is how much the score changes from parent to child. Weights of all features sum to the output score of the estimator.
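The same bookkeeping for a regressor is simpler still, because each node stores a single predicted value. Again a sketch of the idea, not eli5 internals:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

X = np.array([[0.], [1.], [2.], [3.]])
y = np.array([0., 0., 10., 10.])
reg = DecisionTreeRegressor(random_state=0).fit(X, y)
tree = reg.tree_

x = np.array([[2.5]])
path = reg.decision_path(x).indices           # node ids from root to leaf

node_value = lambda n: tree.value[n][0][0]    # mean target value at node n

bias = node_value(path[0])
contrib = sum(node_value(c) - node_value(p) for p, c in zip(path[:-1], path[1:]))

# bias plus the sum of per-step changes equals the tree's prediction
assert np.isclose(bias + contrib, reg.predict(x)[0])
```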

eli5.sklearn.explain_weights

explain_decision_tree(estimator, vec=None, top=20, target_names=None, targets=None, feature_names=None, feature_re=None, feature_filter=None, **export_graphviz_kwargs)[source]

Return an explanation of a decision tree.

See eli5.explain_weights() for description of top, target_names, feature_names, feature_re and feature_filter parameters.

targets parameter is ignored.

vec is a vectorizer instance used to transform raw features to the input of the estimator (e.g. a fitted CountVectorizer instance); you can pass it instead of feature_names.

All other keyword arguments are passed to sklearn.tree.export_graphviz function.
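For example, keyword arguments such as filled or feature_names would be forwarded to sklearn.tree.export_graphviz, whose plain scikit-learn usage looks like this:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_graphviz

iris = load_iris()
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(iris.data, iris.target)

# out_file=None makes export_graphviz return the DOT source as a string
dot = export_graphviz(clf, out_file=None, filled=True,
                      feature_names=iris.feature_names,
                      class_names=list(iris.target_names))
assert dot.startswith("digraph")
```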

explain_linear_classifier_weights(clf, vec=None, top=20, target_names=None, targets=None, feature_names=None, coef_scale=None, feature_re=None, feature_filter=None)[source]

Return an explanation of a linear classifier weights.

See eli5.explain_weights() for description of top, target_names, targets, feature_names, feature_re and feature_filter parameters.

vec is a vectorizer instance used to transform raw features to the input of the classifier clf (e.g. a fitted CountVectorizer instance); you can pass it instead of feature_names.

coef_scale is a 1D np.ndarray with a scaling coefficient for each feature; coef[i] = coef[i] * coef_scale[i] if coef_scale[i] is not nan. Use it if you want to scale coefficients before displaying them, to take input feature sign or scale into account.
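The scaling rule can be illustrated with plain NumPy (the values here are made up for illustration):

```python
import numpy as np

coef = np.array([2.0, -1.0, 0.5])
coef_scale = np.array([0.1, np.nan, 4.0])     # nan: leave that coefficient as-is

known = ~np.isnan(coef_scale)
scaled = coef.copy()
scaled[known] = coef[known] * coef_scale[known]
# scaled is now [0.2, -1.0, 2.0]
```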

explain_linear_regressor_weights(reg, vec=None, top=20, target_names=None, targets=None, feature_names=None, coef_scale=None, feature_re=None, feature_filter=None)[source]

Return an explanation of a linear regressor weights.

See eli5.explain_weights() for description of top, target_names, targets, feature_names, feature_re and feature_filter parameters.

vec is a vectorizer instance used to transform raw features to the input of the regressor reg; you can pass it instead of feature_names.

coef_scale is a 1D np.ndarray with a scaling coefficient for each feature; coef[i] = coef[i] * coef_scale[i] if coef_scale[i] is not nan. Use it if you want to scale coefficients before displaying them, to take input feature sign or scale into account.

explain_rf_feature_importance(estimator, vec=None, top=20, target_names=None, targets=None, feature_names=None, feature_re=None, feature_filter=None)[source]

Return an explanation of a tree-based ensemble estimator.

See eli5.explain_weights() for description of top, feature_names, feature_re and feature_filter parameters.

target_names and targets parameters are ignored.

vec is a vectorizer instance used to transform raw features to the input of the estimator (e.g. a fitted CountVectorizer instance); you can pass it instead of feature_names.

explain_weights_sklearn(*args, **kw)[source]

Return an explanation of an estimator.

eli5.sklearn.unhashing

Utilities to reverse transformation done by FeatureHasher or HashingVectorizer.

class FeatureUnhasher(hasher, unkn_template='FEATURE[%d]')[source]

Class for recovering a mapping used by FeatureHasher.

recalculate_attributes(force=False)[source]

Update all computed attributes. This is only needed if you access computed attributes after partial_fit() was called.

class InvertableHashingVectorizer(vec, unkn_template='FEATURE[%d]')[source]

A wrapper for HashingVectorizer which makes it possible to get meaningful feature names. Create it with an existing HashingVectorizer instance as an argument:

vec = InvertableHashingVectorizer(my_hashing_vectorizer)

Unlike HashingVectorizer it can be fit. During fitting, InvertableHashingVectorizer learns which input terms map to which feature columns/signs; this allows it to provide more meaningful get_feature_names() output. The cost is that it is no longer stateless.

You can fit InvertableHashingVectorizer on a random sample of documents (not necessarily on the whole training and testing data), and use it to inspect an existing HashingVectorizer instance.

If several features hash to the same value, they are ordered by their frequency in documents that were used to fit the vectorizer.

transform() works the same as HashingVectorizer.transform.
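The recovery idea can be sketched with a plain HashingVectorizer: hash each term you have seen on its own to learn which column (and sign) it occupies. This is a simplified illustration of what fitting does, not InvertableHashingVectorizer's actual code:

```python
from sklearn.feature_extraction.text import HashingVectorizer

vec = HashingVectorizer(n_features=16, norm=None)
docs = ["cat sat", "dog sat on the mat"]
X = vec.transform(docs)                       # stateless: no fit needed

# Build a term -> (column, sign) mapping by hashing each observed term
# individually; the sign comes from the signed hash function.
term_map = {}
for term in sorted({t for doc in docs for t in doc.split()}):
    row = vec.transform([term])
    col = row.nonzero()[1][0]
    term_map[term] = (col, int(row[0, col]))

# Every nonzero column of X is produced by at least one known term.
assert set(X.nonzero()[1]) <= {col for col, _ in term_map.values()}
```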

column_signs_

Return a numpy array with expected signs of features. Values are

  • +1 when all known terms which map to the column have positive sign;
  • -1 when all known terms which map to the column have negative sign;
  • nan when there are both positive and negative known terms for this column, or when there is no known term which maps to this column.
fit(X, y=None)[source]

Extract possible terms from documents.

get_feature_names(always_signed=True)[source]

Return feature names. This is a best-effort function which tries to reconstruct feature names based on what it has seen so far.

HashingVectorizer uses a signed hash function. If always_signed is True, each term in feature names is prepended with its sign. If it is False, signs are only shown when terms of different signs may collide in the same column.

You probably want always_signed=True if you’re checking unprocessed classifier coefficients, and always_signed=False if you’ve taken care of column_signs_.

handle_hashing_vec(vec, feature_names, coef_scale, with_coef_scale=True)[source]

Return feature_names and coef_scale (if with_coef_scale is True), calling .get_feature_names for invhashing vectorizers.

invert_hashing_and_fit(vec, docs)[source]

Create an InvertableHashingVectorizer from hashing vectorizer vec and fit it on docs. If vec is a FeatureUnion, do it for all hashing vectorizers in the union. Return an InvertableHashingVectorizer, or a FeatureUnion, or an unchanged vectorizer.