eli5.sklearn

eli5.sklearn.explain_prediction

explain_prediction_linear_classifier(clf, doc, vec=None, top=None, target_names=None, targets=None, feature_names=None, vectorized=False)[source]

Explain prediction of a linear classifier.
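
For illustration, a minimal sketch (the toy corpus and labels below are made up) that trains a LogisticRegression on CountVectorizer features and explains a single prediction:

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

from eli5.sklearn.explain_prediction import explain_prediction_linear_classifier

# toy corpus; any text classification data works the same way
docs = ["good movie", "bad movie", "great film", "awful film"]
labels = [1, 0, 1, 0]

vec = CountVectorizer()
clf = LogisticRegression().fit(vec.fit_transform(docs), labels)

# doc is a raw document; with vectorized=False (the default),
# vec is used to vectorize it first
expl = explain_prediction_linear_classifier(clf, "good film", vec=vec, top=5)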

explain_prediction_linear_regressor(reg, doc, vec=None, top=None, target_names=None, targets=None, feature_names=None, vectorized=False)[source]

Explain prediction of a linear regressor.
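
The same idea sketched for a regressor on already-vectorized numeric data (the feature names below are made up); with vectorized=True the doc is assumed to be vectorized already:

import numpy as np
from sklearn.linear_model import Ridge

from eli5.sklearn.explain_prediction import explain_prediction_linear_regressor

X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]])
y = np.array([3.0, 4.0, 7.0, 8.0])
reg = Ridge().fit(X, y)

# explain the prediction for the first row
expl = explain_prediction_linear_regressor(
    reg, X[0], feature_names=["width", "height"], vectorized=True)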

explain_prediction_sklearn(*args, **kw)[source]

Return an explanation of a scikit-learn estimator.
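
In practice you would usually call the top-level eli5.explain_prediction, which dispatches here for scikit-learn estimators. Reusing clf and vec from the classifier sketch above:

import eli5

expl = eli5.explain_prediction(clf, "good film", vec=vec)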

eli5.sklearn.explain_weights

explain_decision_tree(clf, vec=None, top=20, target_names=None, targets=None, feature_names=None, feature_re=None, **export_graphviz_kwargs)[source]

Return an explanation of a decision tree classifier in the following format (compatible with random forest explanations):

Explanation(
    estimator="<classifier repr>",
    method="<interpretation method>",
    description="<human readable description>",
    decision_tree={...tree information},
    feature_importances=[
        FeatureWeight(feature_name, importance, std_deviation),
        ...
    ]
)
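
A minimal sketch on the iris dataset (the dataset and parameters are arbitrary):

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

from eli5.sklearn.explain_weights import explain_decision_tree

iris = load_iris()
clf = DecisionTreeClassifier(max_depth=3).fit(iris.data, iris.target)

expl = explain_decision_tree(
    clf,
    feature_names=list(iris.feature_names),
    target_names=list(iris.target_names),
)
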
explain_linear_classifier_weights(clf, vec=None, top=20, target_names=None, targets=None, feature_names=None, coef_scale=None, feature_re=None)[source]

Return an explanation of linear classifier weights in the following format:

Explanation(
    estimator="<classifier repr>",
    method="<interpretation method>",
    description="<human readable description>",
    targets=[
        TargetExplanation(
            target="<class name>",
            feature_weights=FeatureWeights(
                # positive weights
                pos=[
                    (feature_name, coefficient),
                    ...
                ],

                # negative weights
                neg=[
                    (feature_name, coefficient),
                    ...
                ],

                # number of features not shown
                pos_remaining=<int>,
                neg_remaining=<int>,

                # sum of feature weights not shown
                # pos_remaining_sum=<float>,
                # neg_remaining_sum=<float>,
            ),
        ),
        ...
    ]
)

To print it, use utilities from eli5.formatters.
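
For example (the toy corpus is made up), formatting the result with format_as_text:

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

from eli5.formatters import format_as_text
from eli5.sklearn.explain_weights import explain_linear_classifier_weights

docs = ["good movie", "bad movie", "great film", "awful film"]
vec = CountVectorizer()
clf = LogisticRegression().fit(vec.fit_transform(docs), [1, 0, 1, 0])

expl = explain_linear_classifier_weights(clf, vec=vec, top=10)
print(format_as_text(expl))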

explain_linear_regressor_weights(reg, vec=None, top=20, target_names=None, targets=None, feature_names=None, coef_scale=None, feature_re=None)[source]

Return an explanation of linear regressor weights in the following format:

Explanation(
    estimator="<regressor repr>",
    method="<interpretation method>",
    description="<human readable description>",
    targets=[
        TargetExplanation(
            target="<target name>",
            feature_weights=FeatureWeights(
                # positive weights
                pos=[
                    (feature_name, coefficient),
                    ...
                ],

                # negative weights
                neg=[
                    (feature_name, coefficient),
                    ...
                ],

                # number of features not shown
                pos_remaining=<int>,
                neg_remaining=<int>,

                # sum of feature weights not shown
                # pos_remaining_sum=<float>,
                # neg_remaining_sum=<float>,
            ),
        ),
        ...
    ]
)

To print it, use utilities from eli5.formatters.
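
The same pattern for a regressor, again with made-up numeric features:

import numpy as np
from sklearn.linear_model import Ridge

from eli5.formatters import format_as_text
from eli5.sklearn.explain_weights import explain_linear_regressor_weights

X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]])
y = np.array([3.0, 4.0, 7.0, 8.0])
reg = Ridge().fit(X, y)

expl = explain_linear_regressor_weights(
    reg, feature_names=["width", "height"], top=5)
print(format_as_text(expl))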

explain_rf_feature_importance(clf, vec=None, top=20, target_names=None, targets=None, feature_names=None, feature_re=None)[source]

Return an explanation of a tree-based ensemble classifier in the following format:

Explanation(
    estimator="<classifier repr>",
    method="<interpretation method>",
    description="<human readable description>",
    feature_importances=[
        FeatureWeight(feature_name, importance, std_deviation),
        ...
    ]
)
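
A minimal sketch with a random forest on iris (the parameters are arbitrary):

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

from eli5.sklearn.explain_weights import explain_rf_feature_importance

iris = load_iris()
clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit(iris.data, iris.target)

expl = explain_rf_feature_importance(
    clf, feature_names=list(iris.feature_names), top=10)
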
explain_weights_sklearn(*args, **kw)[source]

Return an explanation of an estimator.
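
As with explain_prediction_sklearn, the usual entry point is the top-level eli5.explain_weights, which dispatches to the estimator-specific functions above. Reusing clf and vec from the sketches above:

import eli5

expl = eli5.explain_weights(clf, vec=vec, top=20)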

eli5.sklearn.unhashing

Utilities to reverse the transformation done by FeatureHasher or HashingVectorizer.

class FeatureUnhasher(hasher, unkn_template='FEATURE[%d]')[source]

Class for recovering a mapping used by FeatureHasher.

recalculate_attributes(force=False)[source]

Update all computed attributes. This is only needed if you want to access computed attributes after partial_fit() was called.

class InvertableHashingVectorizer(vec, unkn_template='FEATURE[%d]')[source]

A wrapper for HashingVectorizer which makes it possible to get meaningful feature names. Create it with an existing HashingVectorizer instance as an argument:

vec = InvertableHashingVectorizer(my_hashing_vectorizer)

Unlike HashingVectorizer, it can be fit. During fitting, InvertableHashingVectorizer learns which input terms map to which feature columns/signs; this allows it to provide a more meaningful get_feature_names(). The cost is that it is no longer stateless.

You can fit InvertableHashingVectorizer on a random sample of documents (not necessarily the whole training or test data), and use it to inspect an existing HashingVectorizer instance.

If several features hash to the same value, they are ordered by their frequency in documents that were used to fit the vectorizer.

transform() works the same as HashingVectorizer.transform.
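
Putting it together, a minimal sketch (the sample documents are made up): fit the wrapper on a handful of texts, then pass it in place of the original vectorizer, e.g. to eli5.explain_weights:

from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.linear_model import SGDClassifier

import eli5
from eli5.sklearn.unhashing import InvertableHashingVectorizer

sample_docs = ["good movie", "bad movie", "great film", "awful film"]
vec = HashingVectorizer(n_features=2 ** 10)
clf = SGDClassifier(random_state=0).fit(vec.transform(sample_docs), [1, 0, 1, 0])

ivec = InvertableHashingVectorizer(vec)
ivec.fit(sample_docs)  # learn which terms map to which columns/signs

expl = eli5.explain_weights(clf, vec=ivec)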

column_signs_

Return a numpy array with expected signs of features (see the sketch after this list). Values are:

  • +1 when all known terms which map to the column have positive sign;
  • -1 when all known terms which map to the column have negative sign;
  • nan when there are both positive and negative known terms for this column, or when there is no known term which maps to this column.
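
A short sketch of what this looks like, reusing ivec from the example above:

import numpy as np

signs = ivec.column_signs_     # +1, -1 or nan for each hashed column
consistent = ~np.isnan(signs)  # columns whose known terms agree in sign
print(int(consistent.sum()), "of", len(signs), "columns have a known sign")
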
fit(X, y=None)[source]

Extract possible terms from documents.

get_feature_names(always_signed=True)[source]

Return feature names. This is a best-effort function which tries to reconstruct feature names based on what it has seen so far.

HashingVectorizer uses a signed hash function. If always_signed is True, each term in the feature names is prepended with its sign. If it is False, signs are only shown where terms with different signs may collide.

You probably want always_signed=True if you’re checking unprocessed classifier coefficients, and always_signed=False if you’ve taken care of column_signs_.
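
A short contrast of the two modes, again reusing ivec from the example above:

# every term carries its hashing sign; matches raw coefficient values
signed_names = ivec.get_feature_names(always_signed=True)

# signs shown only where terms of different sign collide; use this
# together with column_signs_
plain_names = ivec.get_feature_names(always_signed=False)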