eli5.lime

eli5.lime.lime

An implementation of LIME (http://arxiv.org/abs/1602.04938), an algorithm to explain predictions of black-box models.

class TextExplainer(n_samples=5000, char_based=None, clf=None, vec=None, sampler=None, position_dependent=False, rbf_sigma=None, random_state=None, expand_factor=10, token_pattern=None)[source]

TextExplainer allows you to explain predictions of black-box text classifiers using the LIME algorithm.

Parameters:
  • n_samples (int) – Number of samples to generate and train on. Default is 5000.

    With larger n_samples it takes more CPU time and RAM to explain a prediction, but results may be better. Larger n_samples may also be required to get good results if you don’t want to make strong assumptions about the black-box classifier (e.g. char_based=True and position_dependent=True).

  • char_based (bool) – True if explanation should be char-based, False if it should be token-based. Default is False.

  • clf (object, optional) – White-box probabilistic classifier. It should be supported by eli5, follow the scikit-learn interface and provide a predict_proba method. When not set, a default classifier is used (logistic regression with elastic net regularization trained with SGD).

  • vec (object, optional) – Vectorizer which converts generated texts to feature vectors for the white-box classifier. When not set, a default vectorizer is used; which one depends on the char_based and position_dependent arguments.

  • sampler (MaskingTextSampler or MaskingTextSamplers, optional) – Sampler used to generate modified versions of the text.

  • position_dependent (bool) – When True, a special vectorizer is used which takes each token or character (depending on the char_based value) into account separately. When False (default), the vectorizer passed in vec or a default vectorizer is used.

    The default vectorizer converts text to a vector using a bag-of-ngrams or bag-of-char-ngrams approach (depending on the char_based argument). This means it may not be powerful enough to approximate a black-box classifier which e.g. takes into account the word FOO at the beginning of the document, but not at the end.

    When position_dependent is True the model becomes powerful enough to account for that, but it can be noisier and require a larger n_samples to get a good explanation.

    When char_based=False the default vectorizer uses word bigrams in addition to unigrams; this is less powerful than position_dependent=True, but can give similar results in practice.

  • rbf_sigma (float, optional) – Sigma parameter of the RBF kernel used to post-process cosine similarity values. Default is None, meaning no post-processing (cosine similarity is used as the sample weight as-is). Small rbf_sigma values (e.g. 0.1) tell the classifier to pay more attention to generated texts which are close to the original text. Large rbf_sigma values (e.g. 1.0) make the distance between texts irrelevant.

    Note that if you’re using large rbf_sigma it could be more efficient to use custom samplers instead, in order to generate text samples which are closer to the original text in the first place. Use e.g. max_replace parameter of MaskingTextSampler.

  • random_state (integer or numpy.random.RandomState, optional) – random state

  • expand_factor (int or None) – To approximate the output of the probabilistic classifier, the generated dataset is expanded by expand_factor (10 by default) according to the predicted label probabilities. This is a workaround for a scikit-learn limitation (no cross-entropy loss for non-1/0 labels). With larger values training takes longer, but the probability output can be approximated better.

    expand_factor=None turns this feature off; pass None when you know that the black-box classifier returns only 1.0 or 0.0 probabilities.

  • token_pattern (str, optional) – Regex which matches a token. Use it to customize tokenization. Default value depends on char_based parameter.
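Putting several of the parameters above together, a minimal construction sketch (the parameter values are illustrative only, not recommendations):

    from eli5.lime import TextExplainer

    # Illustrative settings: a character-based, position-aware explainer
    # typically needs a larger n_samples to fit well.
    te = TextExplainer(
        n_samples=5000,           # more samples: slower, but usually a better fit
        char_based=True,          # explain at the character level
        position_dependent=True,  # take token/char positions into account
        rbf_sigma=0.1,            # emphasize samples close to the original text
        random_state=42,
    )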

rng_

random state

Type:numpy.random.RandomState
samples_

A list of samples the local model is trained on. Only available after fit().

Type:list[str]
X_

A matrix with vectorized samples_. Only available after fit().

Type:ndarray or scipy.sparse matrix
similarity_

Similarity vector. Only available after fit().

Type:ndarray
y_proba_

Probabilities predicted by the black-box classifier (the result of predict_proba(self.samples_)). Only available after fit().

Type:ndarray
clf_

Trained white-box classifier. Only available after fit().

Type:object
vec_

Fitted white-box vectorizer. Only available after fit().

Type:object
metrics_

A dictionary with metrics of how well the local classification pipeline approximates the black-box pipeline. Only available after fit().

Type:dict
explain_prediction(**kwargs)[source]

Call eli5.explain_prediction() for the locally-fit classification pipeline. Keyword arguments are passed to eli5.explain_prediction().

fit() must be called before using this method.

explain_weights(**kwargs)[source]

Call eli5.explain_weights() for the locally-fit classification pipeline. Keyword arguments are passed to eli5.explain_weights().

fit() must be called before using this method.

fit(doc, predict_proba)[source]

Explain the predict_proba probabilistic classification function for the doc example. This method fits a local classification pipeline following the LIME approach.

To get the explanation use show_prediction(), show_weights(), explain_prediction() or explain_weights().

Parameters:
  • doc (str) – Text to explain
  • predict_proba (callable) – Black-box classification pipeline. predict_proba should be a function which takes a list of strings (documents) and returns a matrix of shape (n_samples, n_classes) with probability values: a row per document and a column per output label.
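A minimal end-to-end sketch; the scikit-learn pipeline and the toy data below are hypothetical stand-ins for any black box that maps a list of strings to an (n_samples, n_classes) probability matrix:

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline
    from eli5.lime import TextExplainer

    # Hypothetical black-box classifier trained on toy data (for illustration only).
    train_texts = ["good movie", "awful movie", "great film", "terrible film"]
    train_labels = [1, 0, 1, 0]
    pipe = make_pipeline(TfidfVectorizer(), LogisticRegression())
    pipe.fit(train_texts, train_labels)

    te = TextExplainer(random_state=42)
    te.fit("a really good film", pipe.predict_proba)
    te.show_prediction()   # explanation from the locally-fit white-box model
    print(te.metrics_)     # how well the local pipeline approximates the black box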
show_prediction(**kwargs)[source]

Call eli5.show_prediction() for the locally-fit classification pipeline. Keyword arguments are passed to eli5.show_prediction().

fit() must be called before using this method.

show_weights(**kwargs)[source]

Call eli5.show_weights() for the locally-fit classification pipeline. Keyword arguments are passed to eli5.show_weights().

fit() must be called before using this method.

eli5.lime.samplers

class BaseSampler[source]

Base sampler class. Sampler is an object which generates examples similar to a given example.

fit(X=None, y=None)[source]
sample_near(doc, n_samples=1)[source]

Return (examples, similarity) tuple with generated documents similar to a given document and a vector of similarity values.

class MaskingTextSampler(token_pattern=None, bow=True, random_state=None, replacement='', min_replace=1, max_replace=1.0, group_size=1)[source]

Sampler for text data. It randomly removes or replaces tokens in the text.

Parameters:
  • token_pattern (str, optional) – Regexp for token matching
  • bow (bool, optional) – Sampler could either replace all instances of a given token (bow=True, bag of words sampling) or replace just a single token (bow=False).
  • random_state (integer or numpy.random.RandomState, optional) – random state
  • replacement (str) – Default value is '', meaning tokens are removed by default. If you want to preserve the total token count, set replacement to a non-empty string, e.g. 'UNKN'.
  • min_replace (int or float) – The minimum number of tokens to replace. Default is 1, meaning 1 token. If this value is a float in the range [0.0, 1.0], it is used as a ratio. More than min_replace tokens could be replaced if group_size > 1.
  • max_replace (int or float) – The maximum number of tokens to replace. Default is 1.0, meaning all tokens can be replaced. If this value is a float in the range [0.0, 1.0], it is used as a ratio.
  • group_size (int) – When group_size > 1, groups of nearby tokens are replaced all at once (each token is still replaced with replacement). Default is 1, meaning individual tokens are replaced.
sample_near(doc, n_samples=1)[source]

Return (examples, similarity) tuple with generated documents similar to a given document and a vector of similarity values.
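For example (a small sketch; the generated variants depend on random_state):

    from eli5.lime.samplers import MaskingTextSampler

    sampler = MaskingTextSampler(random_state=42)
    docs, similarity = sampler.sample_near("This movie was not bad at all", n_samples=3)
    # docs: 3 variants of the text with some tokens removed;
    # similarity: one similarity value per generated variant.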

sample_near_with_mask(doc, n_samples=1)[source]
class MaskingTextSamplers(sampler_params, token_pattern=None, random_state=None, weights=None)[source]

Union of MaskingTextSampler objects, with weights. sample_near() or sample_near_with_mask() generates the requested number of samples using all samplers; the probability of using a sampler is proportional to its weight.

All samplers must use the same token_pattern in order for sample_near_with_mask() to work.

Create it with a list of {param: value} dicts with MaskingTextSampler parameters.
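For example, a weighted union of a bag-of-words sampler and a positional sampler which replaces at most half of the tokens (illustrative parameters):

    from eli5.lime.samplers import MaskingTextSamplers

    sampler = MaskingTextSamplers(
        [{'bow': True}, {'bow': False, 'max_replace': 0.5}],
        weights=[0.7, 0.3],   # the first sampler is chosen more often
        random_state=42,
    )
    docs, similarity = sampler.sample_near("This movie was not bad at all", n_samples=5)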

sample_near(doc, n_samples=1)[source]

Return (examples, similarity) tuple with generated documents similar to a given document and a vector of similarity values.

sample_near_with_mask(doc, n_samples=1)[source]
class MultivariateKernelDensitySampler(kde=None, metric='euclidean', fit_bandwidth=True, bandwidths=array([1.00000000e-06, 1.00000000e-03, 3.16227766e-03, 1.00000000e-02, 3.16227766e-02, 1.00000000e-01, 3.16227766e-01, 1.00000000e+00, 3.16227766e+00, 1.00000000e+01, 3.16227766e+01, 1.00000000e+02, 3.16227766e+02, 1.00000000e+03, 3.16227766e+03, 1.00000000e+04]), sigma='bandwidth', n_jobs=1, random_state=None)[source]

General-purpose sampler for dense continuous data, based on multivariate kernel density estimation.

The limitation is that a single bandwidth value is used for all dimensions, i.e. the bandwidth matrix is a positive scalar times the identity matrix. This is a problem e.g. when features have different variances (e.g. some of them are one-hot encoded and others are continuous).

fit(X, y=None)[source]
sample_near(doc, n_samples=1)[source]

Return (examples, similarity) tuple with generated documents similar to a given document and a vector of similarity values.
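A brief sketch on toy continuous data (assuming doc is a single row of the data the sampler was fit on; the values below are arbitrary):

    import numpy as np
    from eli5.lime.samplers import MultivariateKernelDensitySampler

    X = np.random.RandomState(0).normal(size=(200, 3))  # toy dense data
    sampler = MultivariateKernelDensitySampler(random_state=0)
    sampler.fit(X)
    neighbors, similarity = sampler.sample_near(X[0], n_samples=10)
    # neighbors: 10 points generated near X[0];
    # similarity: one similarity value per generated point.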

class UnivariateKernelDensitySampler(kde=None, metric='euclidean', fit_bandwidth=True, bandwidths=array([1.00000000e-06, 1.00000000e-03, 3.16227766e-03, 1.00000000e-02, 3.16227766e-02, 1.00000000e-01, 3.16227766e-01, 1.00000000e+00, 3.16227766e+00, 1.00000000e+01, 3.16227766e+01, 1.00000000e+02, 3.16227766e+02, 1.00000000e+03, 3.16227766e+03, 1.00000000e+04]), sigma='bandwidth', n_jobs=1, random_state=None)[source]

General-purpose sampler for dense continuous data, based on univariate kernel density estimation. It estimates a separate probability distribution for each input dimension.

The limitation is that variable interactions are not taken into account.

Unlike MultivariateKernelDensitySampler it uses different bandwidths for different dimensions; because of that it can handle one-hot encoded features to some extent (make sure to at least tune the default sigma parameter). Also, at sampling time it replaces only random subsets of the features instead of generating totally new examples.

fit(X, y=None)[source]
sample_near(doc, n_samples=1)[source]

Sample near the document by replacing some of its features with values sampled from the distributions found by KDE.
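Usage mirrors the multivariate sampler; the difference is that each generated point keeps most of the original feature values and replaces a random subset of them with values drawn from the per-feature KDEs. A brief sketch on the same kind of toy data:

    import numpy as np
    from eli5.lime.samplers import UnivariateKernelDensitySampler

    X = np.random.RandomState(0).normal(size=(200, 3))  # toy dense data
    sampler = UnivariateKernelDensitySampler(random_state=0)
    sampler.fit(X)
    neighbors, similarity = sampler.sample_near(X[0], n_samples=10)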

eli5.lime.textutils

Utilities for text generation.

cosine_similarity_vec(num_tokens, num_removed_vec)[source]

Return cosine similarity between a binary vector with all ones of length num_tokens and vectors of the same length with num_removed_vec elements set to zero.
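By this definition, for a document of num_tokens tokens the similarity after removing k tokens is sqrt((num_tokens - k) / num_tokens). For instance (a small sketch, assuming num_removed_vec accepts a list or array of counts):

    from eli5.lime.textutils import cosine_similarity_vec

    cosine_similarity_vec(10, [0, 4, 9])
    # expected from the definition above: roughly [1.0, 0.775, 0.316]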

generate_samples(text, n_samples=500, bow=True, random_state=None, replacement='', min_replace=1, max_replace=1.0, group_size=1)[source]

Return n_samples changed versions of text (with some words removed), along with distances between the original text and the generated examples. If bow=False, all tokens are considered unique (i.e. token position matters).