Note

This tutorial can be run as an IPython notebook.

Named Entity Recognition using sklearn-crfsuite

In this notebook we train a basic CRF model for Named Entity Recognition on CoNLL2002 data (following https://github.com/TeamHG-Memex/sklearn-crfsuite/blob/master/docs/CoNLL2002.ipynb) and check its weights to see what it learned.

To follow this tutorial you need the NLTK (3.x or later) and sklearn-crfsuite Python packages. The tutorial uses Python 3.

import nltk
import sklearn_crfsuite
import eli5

1. Training data

The CoNLL 2002 dataset contains a list of Spanish sentences with named entities annotated. It uses IOB2 encoding. The CoNLL 2002 data also provides POS tags.

train_sents = list(nltk.corpus.conll2002.iob_sents('esp.train'))
test_sents = list(nltk.corpus.conll2002.iob_sents('esp.testb'))
train_sents[0]

[('Melbourne', 'NP', 'B-LOC'),
 ('(', 'Fpa', 'O'),
 ('Australia', 'NP', 'B-LOC'),
 (')', 'Fpt', 'O'),
 (',', 'Fc', 'O'),
 ('25', 'Z', 'O'),
 ('may', 'NC', 'O'),
 ('(', 'Fpa', 'O'),
 ('EFE', 'NC', 'B-ORG'),
 (')', 'Fpt', 'O'),
 ('.', 'Fp', 'O')]
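
To make the IOB2 encoding concrete, here is a minimal sketch that groups tokens into entity spans: a B-X tag opens an entity of type X, and following I-X tags continue it. The helper name iob2_spans is hypothetical and not part of NLTK or sklearn-crfsuite; the example sentence is the one printed above.

```python
def iob2_spans(tagged_sent):
    """Return (entity_type, [tokens]) pairs from one IOB2-tagged sentence."""
    spans = []
    current_type, current_tokens = None, []
    for token, _postag, label in tagged_sent:
        if label.startswith('B-'):
            # A B- tag always starts a new entity, closing any open one.
            if current_tokens:
                spans.append((current_type, current_tokens))
            current_type, current_tokens = label[2:], [token]
        elif label.startswith('I-') and current_type == label[2:]:
            # An I- tag of the same type continues the open entity.
            current_tokens.append(token)
        else:
            # 'O', or an I- tag that does not continue the open entity.
            if current_tokens:
                spans.append((current_type, current_tokens))
            current_type, current_tokens = None, []
    if current_tokens:
        spans.append((current_type, current_tokens))
    return spans


example_sent = [
    ('Melbourne', 'NP', 'B-LOC'), ('(', 'Fpa', 'O'),
    ('Australia', 'NP', 'B-LOC'), (')', 'Fpt', 'O'),
    (',', 'Fc', 'O'), ('25', 'Z', 'O'), ('may', 'NC', 'O'),
    ('(', 'Fpa', 'O'), ('EFE', 'NC', 'B-ORG'), (')', 'Fpt', 'O'),
    ('.', 'Fp', 'O'),
]
print(iob2_spans(example_sent))
# [('LOC', ['Melbourne']), ('LOC', ['Australia']), ('ORG', ['EFE'])]
```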

2. Feature extraction

POS tags can be seen as pre-extracted features. Let’s extract more features (word parts, simplified POS tags, lower/title/upper flags, features of nearby words) and convert them to the sklearn-crfsuite format: each sentence should be converted to a list of dicts. This is a very simple baseline; you can certainly do better.

def word2features(sent, i):
    word = sent[i][0]
    postag = sent[i][1]

    features = {
        'bias': 1.0,
        'word.lower()': word.lower(),
        'word[-3:]': word[-3:],
        'word.isupper()': word.isupper(),
        'word.istitle()': word.istitle(),
        'word.isdigit()': word.isdigit(),
        'postag': postag,
        'postag[:2]': postag[:2],
    }
    if i > 0:
        word1 = sent[i-1][0]
        postag1 = sent[i-1][1]
        features.update({
            '-1:word.lower()': word1.lower(),
            '-1:word.istitle()': word1.istitle(),
            '-1:word.isupper()': word1.isupper(),
            '-1:postag': postag1,
            '-1:postag[:2]': postag1[:2],
        })
    else:
        features['BOS'] = True

    if i < len(sent)-1:
        word1 = sent[i+1][0]
        postag1 = sent[i+1][1]
        features.update({
            '+1:word.lower()': word1.lower(),
            '+1:word.istitle()': word1.istitle(),
            '+1:word.isupper()': word1.isupper(),
            '+1:postag': postag1,
            '+1:postag[:2]': postag1[:2],
        })
    else:
        features['EOS'] = True

    return features


def sent2features(sent):
    return [word2features(sent, i) for i in range(len(sent))]

def sent2labels(sent):
    return [label for token, postag, label in sent]

def sent2tokens(sent):
    return [token for token, postag, label in sent]

X_train = [sent2features(s) for s in train_sents]
y_train = [sent2labels(s) for s in train_sents]

X_test = [sent2features(s) for s in test_sents]
y_test = [sent2labels(s) for s in test_sents]

This is what the features extracted from a single token look like:

X_train[0][1]

{'+1:postag': 'NP',
 '+1:postag[:2]': 'NP',
 '+1:word.istitle()': True,
 '+1:word.isupper()': False,
 '+1:word.lower()': 'australia',
 '-1:postag': 'NP',
 '-1:postag[:2]': 'NP',
 '-1:word.istitle()': True,
 '-1:word.isupper()': False,
 '-1:word.lower()': 'melbourne',
 'bias': 1.0,
 'postag': 'Fpa',
 'postag[:2]': 'Fp',
 'word.isdigit()': False,
 'word.istitle()': False,
 'word.isupper()': False,
 'word.lower()': '(',
 'word[-3:]': '('}
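
One possible way to go beyond this baseline is a wider context window. The sketch below is only an illustration: context_features is a hypothetical helper that is not part of the original notebook, and the window size of 2 is an arbitrary choice. It produces keys in the same '-1:...' / '+1:...' style used above.

```python
def context_features(sent, i, window=2):
    """Hypothetical helper: lowercased word and POS tag of neighbours
    up to `window` positions away, keyed like '-2:word.lower()'."""
    features = {}
    for offset in range(-window, window + 1):
        j = i + offset
        # Skip the token itself and positions outside the sentence.
        if offset == 0 or j < 0 or j >= len(sent):
            continue
        word, postag = sent[j][0], sent[j][1]
        prefix = '%+d:' % offset  # '-2:', '-1:', '+1:', '+2:'
        features[prefix + 'word.lower()'] = word.lower()
        features[prefix + 'postag'] = postag
    return features


sent = [('Melbourne', 'NP', 'B-LOC'), ('(', 'Fpa', 'O'),
        ('Australia', 'NP', 'B-LOC'), (')', 'Fpt', 'O')]
print(context_features(sent, 1))
```

Such a dict could be merged into the result of word2features with features.update(context_features(sent, i)); the '-1:' and '+1:' keys it shares with the baseline carry the same values, so nothing is lost.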

3. Train a CRF model

Once we have features in the right format we can train a linear-chain CRF (Conditional Random Fields) model using sklearn_crfsuite.CRF:

crf = sklearn_crfsuite.CRF(
    algorithm='lbfgs',
    c1=0.1,
    c2=0.1,
    max_iterations=20,
    all_possible_transitions=False,
)
crf.fit(X_train, y_train);
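
The c1 and c2 arguments are the coefficients of L1 and L2 regularization. As a rough sketch of what is being minimized (the exact scaling of the penalty terms is an assumption here, so treat constant factors with care), the training objective over the weight vector w has the form:

```latex
\min_{w} \; -\sum_{(x,\, y) \,\in\, \text{train}} \log p(y \mid x;\, w)
\;+\; c_1 \lVert w \rVert_1 \;+\; c_2 \lVert w \rVert_2^2
```

Here x is a sentence (a list of feature dicts), y is its label sequence, and w collects both state and transition weights.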

4. Inspect model weights

CRFsuite CRF models use two kinds of features: state features and transition features. Let’s check their weights using eli5.show_weights, which displays the result of eli5.explain_weights in a notebook:

eli5.show_weights(crf, top=30)

From \ To	O	B-LOC	I-LOC	B-MISC	I-MISC	B-ORG	I-ORG	B-PER	I-PER
O	3.281	2.204	0.0	2.101	0.0	3.468	0.0	2.325	0.0
B-LOC	-0.259	-0.098	4.058	0.0	0.0	0.0	0.0	-0.212	0.0
I-LOC	-0.173	-0.609	3.436	0.0	0.0	0.0	0.0	0.0	0.0
B-MISC	-0.673	-0.341	0.0	0.0	4.069	-0.308	0.0	-0.331	0.0
I-MISC	-0.803	-0.998	0.0	-0.519	4.977	-0.817	0.0	-0.611	0.0
B-ORG	-0.096	-0.242	0.0	-0.57	0.0	-1.012	4.739	-0.306	0.0
I-ORG	-0.339	-1.758	0.0	-0.841	0.0	-1.382	5.062	-0.472	0.0
B-PER	-0.4	-0.851	0.0	0.0	0.0	-1.013	0.0	-0.937	4.329
I-PER	-0.676	-0.47	0.0	0.0	0.0	0.0	0.0	-0.659	3.754

Top features for each label follow, in this order: O, B-LOC, I-LOC, B-MISC, I-MISC, B-ORG, I-ORG, B-PER, I-PER.

y=O top features

Weight	Feature
+4.416	postag[:2]:Fp
+3.116	BOS
+2.401	bias
+2.297	postag[:2]:Fc
+2.297	word.lower():,
+2.297	postag:Fc
+2.297	word[-3:]:,
+2.124	postag[:2]:CC
+2.124	postag:CC
+1.984	EOS
+1.859	word.lower():y
+1.684	postag:RG
+1.684	postag[:2]:RG
+1.610	word.lower():-
+1.610	postag[:2]:Fg
+1.610	word[-3:]:-
+1.610	postag:Fg
+1.582	postag:Fp
+1.582	word[-3:]:.
+1.582	word.lower():.
+1.372	word[-3:]:y
+1.187	postag:CS
+1.187	postag[:2]:CS
+1.150	word[-3:]:(
+1.150	postag:Fpa
+1.150	word.lower():(
… 16444 more positive …
… 3771 more negative …
-2.106	postag:NP
-2.106	postag[:2]:NP
-3.723	word.isupper()
-6.166	word.istitle()

y=B-LOC top features

Weight	Feature
+2.530	word.istitle()
+2.224	-1:word.lower():en
+0.906	word[-3:]:rid
+0.905	word.lower():madrid
+0.646	word.lower():españa
+0.640	word[-3:]:ona
+0.595	word[-3:]:aña
+0.595	+1:postag[:2]:Fp
+0.515	word.lower():parís
+0.514	word[-3:]:rís
+0.424	word.lower():barcelona
+0.420	-1:postag:Fg
+0.420	-1:word.lower():-
+0.420	-1:postag[:2]:Fg
+0.413	-1:word.isupper()
+0.390	-1:postag[:2]:Fp
+0.389	-1:postag:Fpa
+0.389	-1:word.lower():(
+0.388	word.lower():san
+0.385	postag:NC
… 2282 more positive …
… 413 more negative …
-0.389	-1:word.lower():"
-0.389	-1:postag:Fe
-0.389	-1:postag[:2]:Fe
-0.406	-1:postag[:2]:VM
-0.646	word[-3:]:ión
-0.759	-1:word.lower():del
-0.818	bias
-0.986	postag:SP
-0.986	postag[:2]:SP
-1.354	-1:word.istitle()

y=I-LOC top features

Weight	Feature
+0.886	-1:word.istitle()
+0.664	-1:word.lower():de
+0.582	word[-3:]:de
+0.578	word.lower():de
+0.529	-1:word.lower():san
+0.444	+1:word.istitle()
+0.441	word.istitle()
+0.335	-1:word.lower():la
+0.262	postag:SP
+0.262	postag[:2]:SP
+0.235	word[-3:]:la
+0.228	word[-3:]:iro
+0.226	word[-3:]:oja
+0.218	word[-3:]:del
+0.215	word.lower():del
+0.213	-1:postag:NC
+0.213	-1:postag[:2]:NC
+0.205	-1:word.lower():nueva
… 1665 more positive …
… 258 more negative …
-0.206	-1:postag[:2]:Z
-0.206	-1:postag:Z
-0.213	-1:postag[:2]:CC
-0.213	-1:postag:CC
-0.219	-1:word.lower():en
-0.222	+1:word.isupper()
-0.235	+1:postag:VMI
-0.342	word.isupper()
-0.366	+1:postag[:2]:AQ
-0.366	+1:postag:AQ
-0.392	+1:postag[:2]:VM
-1.690	BOS

y=B-MISC top features

Weight	Feature
+1.770	word.isupper()
+0.693	word.istitle()
+0.606	word.lower():"
+0.606	word[-3:]:"
+0.606	postag:Fe
+0.606	postag[:2]:Fe
+0.538	+1:word.istitle()
+0.508	-1:word.lower():"
+0.508	-1:postag:Fe
+0.508	-1:postag[:2]:Fe
+0.484	-1:postag[:2]:DA
+0.484	-1:postag:DA
+0.479	+1:word.isupper()
+0.457	postag[:2]:NC
+0.457	postag:NC
+0.400	word.lower():liga
+0.399	word[-3:]:iga
+0.367	-1:word.lower():la
+0.354	postag:Z
+0.354	postag[:2]:Z
+0.332	-1:word.lower():del
+0.286	+1:postag[:2]:Z
+0.286	+1:postag:Z
+0.284	+1:postag:NC
+0.284	+1:postag[:2]:NC
… 2284 more positive …
… 314 more negative …
-0.308	BOS
-0.377	-1:postag[:2]:VM
-0.908	postag[:2]:SP
-0.908	postag:SP
-1.094	-1:word.istitle()

y=I-MISC top features

Weight	Feature
+1.364	-1:word.istitle()
+0.675	-1:word.lower():de
+0.597	+1:postag:Fe
+0.597	+1:word.lower():"
+0.597	+1:postag[:2]:Fe
+0.369	-1:postag:NC
+0.369	-1:postag[:2]:NC
+0.324	-1:word.lower():liga
+0.318	word[-3:]:de
+0.304	word.lower():de
+0.303	word.isdigit()
+0.261	-1:postag[:2]:SP
+0.261	-1:postag:SP
+0.258	-1:word.lower():copa
+0.240	word.lower():campeones
+0.235	word[-3:]:000
+0.234	+1:postag:Z
+0.234	+1:postag[:2]:Z
+0.229	word.lower():2000
… 3675 more positive …
… 573 more negative …
-0.235	EOS
-0.264	-1:word.lower():y
-0.265	word.lower():y
-0.265	+1:postag:VMI
-0.274	postag[:2]:VM
-0.306	-1:postag:CC
-0.306	-1:postag[:2]:CC
-0.320	postag:CC
-0.320	postag[:2]:CC
-0.370	+1:postag[:2]:VM
-0.641	bias

y=B-ORG top features

Weight	Feature
+2.695	word.lower():efe
+2.519	word.isupper()
+2.084	word[-3:]:EFE
+1.174	word.lower():gobierno
+1.142	word.istitle()
+1.018	-1:word.lower():del
+0.958	word[-3:]:rno
+0.671	word[-3:]:PP
+0.671	word.lower():pp
+0.667	-1:word.lower():al
+0.555	-1:word.lower():el
+0.499	word[-3:]:eal
+0.413	word.lower():real
+0.393	word.lower():ayuntamiento
+0.391	postag:AQ
+0.391	postag[:2]:AQ
… 3518 more positive …
… 619 more negative …
-0.430	-1:postag[:2]:AQ
-0.430	-1:postag:AQ
-0.450	+1:word.lower():de
-0.455	postag[:2]:Z
-0.455	postag:Z
-0.500	-1:word.istitle()
-0.642	-1:word.lower():los
-0.664	-1:word.lower():de
-0.707	-1:word.isupper()
-0.746	-1:word.lower():en
-0.747	-1:postag[:2]:VM
-1.100	bias
-1.289	postag[:2]:SP
-1.289	postag:SP

y=I-ORG top features

Weight	Feature
+1.499	-1:word.istitle()
+1.200	-1:word.lower():de
+0.539	-1:word.lower():real
+0.511	word[-3:]:rid
+0.446	word[-3:]:de
+0.433	word.lower():de
+0.428	-1:postag:SP
+0.428	-1:postag[:2]:SP
+0.399	word.lower():madrid
+0.368	word[-3:]:la
+0.365	-1:word.lower():consejo
+0.363	word.istitle()
+0.352	-1:word.lower():comisión
+0.336	postag[:2]:AQ
+0.336	postag:AQ
+0.332	+1:postag:Fpa
+0.332	+1:word.lower():(
+0.311	-1:word.lower():estados
+0.306	word.lower():unidos
… 3473 more positive …
… 703 more negative …
-0.304	postag[:2]:NP
-0.304	postag:NP
-0.306	-1:word.lower():a
-0.384	+1:postag[:2]:NC
-0.384	+1:postag:NC
-0.391	-1:word.isupper()
-0.507	+1:postag:AQ
-0.507	+1:postag[:2]:AQ
-0.535	postag[:2]:VM
-0.540	postag:VMI
-1.195	bias

y=B-PER top features

Weight	Feature
+1.698	word.istitle()
+0.683	-1:postag:VMI
+0.601	+1:postag[:2]:VM
+0.589	postag:NP
+0.589	postag[:2]:NP
+0.589	+1:postag:VMI
+0.565	-1:word.lower():a
+0.520	word[-3:]:osé
+0.503	word.lower():josé
+0.476	-1:postag[:2]:VM
+0.472	postag:NC
+0.472	postag[:2]:NC
+0.452	-1:postag[:2]:Fc
+0.452	-1:word.lower():,
+0.452	-1:postag:Fc
… 4117 more positive …
… 351 more negative …
-0.472	-1:word.lower():en
-0.475	-1:postag[:2]:Fe
-0.475	-1:word.lower():"
-0.475	-1:postag:Fe
-0.543	word.lower():la
-0.572	-1:word.lower():de
-0.693	-1:word.istitle()
-0.712	postag[:2]:SP
-0.712	postag:SP
-0.778	-1:word.lower():del
-0.818	-1:postag[:2]:DA
-0.818	-1:postag:DA
-0.923	-1:word.lower():la
-1.319	postag:DA
-1.319	postag[:2]:DA

y=I-PER top features

Weight	Feature
+2.742	-1:word.istitle()
+0.736	word.istitle()
+0.660	-1:word.lower():josé
+0.598	-1:postag[:2]:AQ
+0.598	-1:postag:AQ
+0.510	-1:postag[:2]:VM
+0.487	-1:word.lower():juan
+0.419	-1:word.lower():maría
+0.413	-1:postag:VMI
+0.345	-1:word.lower():luis
+0.319	-1:word.lower():manuel
+0.315	postag[:2]:NC
+0.315	postag:NC
+0.309	-1:word.lower():carlos
… 3903 more positive …
… 365 more negative …
-0.301	postag[:2]:NP
-0.301	postag:NP
-0.301	word[-3:]:ión
-0.305	postag[:2]:Fe
-0.305	word.lower():"
-0.305	postag:Fe
-0.305	word[-3:]:"
-0.305	+1:word.lower():que
-0.324	-1:word.lower():el
-0.377	+1:postag[:2]:Z
-0.377	+1:postag:Z
-0.396	postag:VMI
-0.433	+1:postag:SP
-0.433	+1:postag[:2]:SP
-0.485	postag[:2]:VM
-1.431	bias

Transition features make sense: at least the model learned that I-ENTITY must follow B-ENTITY. It also learned that some transitions are unlikely, e.g. it is not common in this dataset to have a location right after an organization name (I-ORG -> B-LOC has a large negative weight).
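
To see how the two kinds of weights interact, here is a simplified sketch of the unnormalized score a linear-chain CRF assigns to a label sequence: the sum of state weights for active (feature, label) pairs plus the transition weights between consecutive labels. The weights are a small subset copied from the tables above; the function sequence_score and the two-token example are hypothetical, and all other features and the normalization term are ignored.

```python
# Toy subset of weights copied from the tables above (first model).
transition = {
    ('O', 'B-ORG'): 3.468,
    ('B-ORG', 'I-ORG'): 4.739,
    ('B-ORG', 'B-LOC'): -0.242,
}
state = {
    ('word.isupper()', 'B-ORG'): 2.519,
    ('word.istitle()', 'B-ORG'): 1.142,
    ('-1:word.istitle()', 'I-ORG'): 1.499,
    ('word.istitle()', 'I-ORG'): 0.363,
    ('-1:word.istitle()', 'B-LOC'): -1.354,
    ('word.istitle()', 'B-LOC'): 2.530,
}


def sequence_score(feature_dicts, labels, state, transition):
    """Unnormalized linear-chain score: state weights plus transition weights."""
    score = 0.0
    for i, (feats, label) in enumerate(zip(feature_dicts, labels)):
        for name, value in feats.items():
            # Boolean features act as 1.0/0.0; unknown pairs weigh 0.
            score += state.get((name, label), 0.0) * float(value)
        if i > 0:
            score += transition.get((labels[i - 1], label), 0.0)
    return score


# Two title-case tokens in a row, using only boolean shape features.
tokens = [
    {'word.istitle()': True},
    {'word.istitle()': True, '-1:word.istitle()': True},
]
score_org = sequence_score(tokens, ['B-ORG', 'I-ORG'], state, transition)
score_loc = sequence_score(tokens, ['B-ORG', 'B-LOC'], state, transition)
print(score_org > score_loc)  # True
```

With these weights the B-ORG, I-ORG labeling scores higher mainly because of the large B-ORG -> I-ORG transition weight.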

The features don’t use gazetteers, so the model had to memorize some geographic names from the training data, e.g. that España is a location.

If we regularize the CRF more, we can expect that only generic features will remain and memorized tokens will go. With L1 regularization (the c1 parameter) the coefficients of most features should be driven to zero. Let’s check what effect regularization has on the CRF weights:

crf = sklearn_crfsuite.CRF(
    algorithm='lbfgs',
    c1=200,
    c2=0.1,
    max_iterations=20,
    all_possible_transitions=False,
)
crf.fit(X_train, y_train)
eli5.show_weights(crf, top=30)

From \ To	O	B-LOC	I-LOC	B-MISC	I-MISC	B-ORG	I-ORG	B-PER	I-PER
O	3.232	1.76	0.0	2.026	0.0	2.603	0.0	1.593	0.0
B-LOC	0.035	0.0	2.773	0.0	0.0	0.0	0.0	0.0	0.0
I-LOC	-0.02	0.0	3.099	0.0	0.0	0.0	0.0	0.0	0.0
B-MISC	-0.382	0.0	0.0	0.0	4.758	0.0	0.0	0.0	0.0
I-MISC	-0.256	0.0	0.0	0.0	4.155	0.0	0.0	0.0	0.0
B-ORG	0.161	0.0	0.0	0.0	0.0	0.0	3.344	0.0	0.0
I-ORG	-0.126	-0.081	0.0	0.0	0.0	0.0	4.048	0.0	0.0
B-PER	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	3.449
I-PER	-0.085	0.0	0.0	0.0	0.0	0.0	0.0	0.0	2.254

Top features for each label follow, in this order: O, B-LOC, I-LOC, B-MISC, I-MISC, B-ORG, I-ORG, B-PER, I-PER.

y=O top features

Weight	Feature
+3.363	BOS
+2.842	bias
+2.478	postag[:2]:Fp
+0.665	-1:word.isupper()
+0.439	+1:postag[:2]:AQ
+0.439	+1:postag:AQ
+0.400	postag[:2]:Fc
+0.400	word.lower():,
+0.400	word[-3:]:,
+0.400	postag:Fc
+0.391	postag:CC
+0.391	postag[:2]:CC
+0.365	EOS
+0.363	+1:postag:NC
+0.363	+1:postag[:2]:NC
+0.315	postag:SP
+0.315	postag[:2]:SP
+0.302	+1:word.isupper()
… 15 more positive …
… 14 more negative …
-0.216	postag:AQ
-0.216	postag[:2]:AQ
-0.334	-1:postag:SP
-0.334	-1:postag[:2]:SP
-0.417	postag[:2]:NP
-0.417	postag:NP
-0.547	postag[:2]:NC
-0.547	postag:NC
-0.547	word.lower():de
-0.600	word[-3:]:de
-3.552	word.isupper()
-5.446	word.istitle()

y=B-LOC top features

Weight	Feature
+1.417	-1:word.lower():en
+1.183	word.istitle()
+0.498	+1:postag[:2]:Fp
+0.150	+1:word.lower():,
+0.150	+1:postag:Fc
+0.150	+1:postag[:2]:Fc
+0.098	-1:postag[:2]:Fp
+0.081	-1:postag:Fpa
+0.081	-1:word.lower():(
+0.080	postag[:2]:NP
+0.080	postag:NP
+0.056	-1:postag:SP
+0.056	-1:postag[:2]:SP
+0.022	postag:NC
+0.022	postag[:2]:NC
+0.019	BOS
-0.008	+1:word.istitle()
-0.028	-1:word.lower():del
-0.572	-1:word.istitle()

y=I-LOC top features

Weight	Feature
+0.788	-1:word.istitle()
+0.248	word[-3:]:de
+0.237	word.lower():de
+0.199	-1:word.lower():de
+0.190	postag[:2]:SP
+0.190	postag:SP
+0.060	-1:postag:SP
+0.060	-1:postag[:2]:SP
+0.040	+1:word.istitle()

y=B-MISC top features

Weight	Feature
+0.349	word.isupper()
+0.053	-1:postag[:2]:DA
+0.053	-1:postag:DA
+0.030	word.istitle()
-0.009	-1:postag:SP
-0.009	-1:postag[:2]:SP
-0.060	bias
-0.172	-1:word.istitle()

y=I-MISC top features

Weight	Feature
+0.432	-1:word.istitle()
+0.158	-1:postag[:2]:NC
+0.158	-1:postag:NC
+0.146	+1:postag[:2]:Fe
+0.146	+1:word.lower():"
+0.146	+1:postag:Fe
+0.030	postag[:2]:SP
+0.030	postag:SP
-0.087	word.istitle()
-0.094	bias
-0.119	word.isupper()
-0.120	-1:word.isupper()
-0.121	+1:word.isupper()
-0.211	+1:word.istitle()

y=B-ORG top features

Weight	Feature
+1.681	word.isupper()
+0.507	-1:word.lower():del
+0.350	-1:postag:DA
+0.350	-1:postag[:2]:DA
+0.282	word.lower():efe
+0.234	word[-3:]:EFE
+0.195	-1:word.lower():(
+0.195	-1:postag:Fpa
+0.192	word.istitle()
+0.178	+1:postag:Fpt
+0.178	+1:word.lower():)
+0.173	-1:postag[:2]:Fp
+0.136	-1:word.lower():el
+0.110	postag[:2]:NC
+0.110	postag:NC
-0.004	+1:word.istitle()
-0.023	+1:postag[:2]:Fp
-0.041	+1:postag:NC
-0.041	+1:postag[:2]:NC
-0.210	-1:word.lower():de
-0.515	bias

y=I-ORG top features

Weight	Feature
+1.318	-1:word.istitle()
+0.762	-1:word.lower():de
+0.185	-1:postag:SP
+0.185	-1:postag[:2]:SP
+0.185	word[-3:]:de
+0.058	word.lower():de
-0.043	-1:word.isupper()
-0.267	+1:word.istitle()
-0.536	bias

y=B-PER top features

Weight	Feature
+0.800	word.istitle()
+0.463	-1:word.lower():,
+0.463	-1:postag[:2]:Fc
+0.463	-1:postag:Fc
+0.148	+1:postag:VMI
+0.125	+1:word.istitle()
+0.095	+1:postag[:2]:VM
+0.007	+1:postag:AQ
+0.007	+1:postag[:2]:AQ
-0.039	-1:word.istitle()
-0.058	postag:DA
-0.058	postag[:2]:DA
-0.063	bias
-0.067	-1:word.lower():de
-0.159	-1:postag:SP
-0.159	-1:postag[:2]:SP
-0.263	-1:postag:DA
-0.263	-1:postag[:2]:DA

y=I-PER top features

Weight	Feature
+2.127	-1:word.istitle()
+0.331	word.istitle()
+0.016	+1:postag[:2]:Fc
+0.016	+1:word.lower():,
+0.016	+1:postag:Fc
-0.089	+1:postag:SP
-0.089	+1:postag[:2]:SP
-0.648	bias

As you can see, memorized tokens are mostly gone and the model now relies on word shapes and POS tags. Only a few non-zero features remain. In our example the change probably made the quality worse, but that’s a separate question.
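
Why does a strong L1 penalty remove weights entirely instead of just shrinking them? One way to build intuition is the soft-thresholding step used by many L1 solvers, sketched below. This illustrates the general mechanism only and is not a description of how CRFsuite optimizes its objective; the threshold of 1.0 is arbitrary, and the starting weights are taken from the B-LOC table of the first model.

```python
def soft_threshold(weight, threshold):
    """Proximal step for an L1 penalty: shrink toward zero, clip at zero."""
    if weight > threshold:
        return weight - threshold
    if weight < -threshold:
        return weight + threshold
    return 0.0


weights = {
    'word.istitle()': 2.530,
    '-1:word.lower():en': 2.224,
    'word.lower():madrid': 0.905,
    'word.lower():españa': 0.646,
}
shrunk = {name: soft_threshold(w, 1.0) for name, w in weights.items()}
surviving = [name for name, w in shrunk.items() if w != 0.0]
print(surviving)
# ['word.istitle()', '-1:word.lower():en']
```

Weights below the threshold, which here are the memorized city and country names, become exactly zero, while the stronger generic features survive with reduced magnitude.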

Let’s focus on transition weights. We would expect O -> I-ENTITY transitions to have large negative weights because they are impossible. But these transitions have zero weights, not negative weights, both in the heavily regularized model and in our initial model. Something is going on here.

The reason they are zero is that crfsuite hasn’t seen these transitions in the training data and assumed there is no need to learn weights for them, to save some computation time. This is the default behavior, but it is possible to turn it off using the all_possible_transitions option of sklearn_crfsuite.CRF. Let’s check how it affects the result:

crf = sklearn_crfsuite.CRF(
    algorithm='lbfgs',
    c1=0.1,
    c2=0.1,
    max_iterations=20,
    all_possible_transitions=True,
)
crf.fit(X_train, y_train);

eli5.show_weights(crf, top=5, show=['transition_features'])

From \ To	O	B-LOC	I-LOC	B-MISC	I-MISC	B-ORG	I-ORG	B-PER	I-PER
O	2.732	1.217	-4.675	1.515	-5.785	1.36	-6.19	0.968	-6.236
B-LOC	-0.226	-0.091	3.378	-0.433	-1.065	-0.861	-1.783	-0.295	-1.57
I-LOC	-0.184	-0.585	2.404	-0.276	-0.485	-0.582	-0.749	-0.442	-0.647
B-MISC	-0.714	-0.353	-0.539	-0.278	3.512	-0.412	-1.047	-0.336	-0.895
I-MISC	-0.697	-0.846	-0.587	-0.297	4.252	-0.84	-1.206	-0.523	-1.001
B-ORG	0.419	-0.187	-1.074	-0.567	-1.607	-1.13	5.392	-0.223	-2.122
I-ORG	-0.117	-1.715	-0.863	-0.631	-1.221	-1.442	5.141	-0.397	-1.908
B-PER	-0.127	-0.806	-0.834	-0.52	-1.228	-1.089	-2.076	-1.01	4.04
I-PER	-0.766	-0.242	-0.67	-0.418	-0.856	-0.903	-1.472	-0.692	2.909

With all_possible_transitions=True the CRF learned large negative weights for impossible transitions like O -> I-ORG.
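
The difference between the two settings comes down to which label pairs occur in the training data. A minimal sketch of that bookkeeping follows; toy_y is a tiny hypothetical stand-in for y_train, and observed_transitions is an illustrative helper, not a crfsuite function.

```python
from collections import Counter
from itertools import product


def observed_transitions(label_seqs):
    """Count (previous label, next label) pairs over all sequences."""
    counts = Counter()
    for labels in label_seqs:
        counts.update(zip(labels, labels[1:]))
    return counts


# Tiny hypothetical stand-in for y_train.
toy_y = [
    ['B-LOC', 'O', 'B-LOC', 'O', 'O', 'B-ORG', 'O'],
    ['O', 'B-ORG', 'I-ORG', 'O', 'B-PER', 'I-PER'],
]
seen = observed_transitions(toy_y)
labels = sorted({label for seq in toy_y for label in seq})
# Pairs that never occur: these get no weight unless all transitions are generated.
unseen = [pair for pair in product(labels, repeat=2) if pair not in seen]
print(('O', 'I-ORG') in unseen)  # True
```

As described above, with all_possible_transitions=False only pairs like those in seen get a transition weight; with True, pairs like those in unseen get one too, which lets the model push impossible transitions such as O -> I-ORG to large negative values.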

5. Customization

The table above is large and rather hard to inspect; eli5 provides several options to look at only a part of the features. You can check only a subset of labels:

eli5.show_weights(crf, top=10, targets=['O', 'B-ORG', 'I-ORG'])

From \ To	O	B-ORG	I-ORG
O	2.732	1.36	-6.19
B-ORG	0.419	-1.13	5.392
I-ORG	-0.117	-1.442	5.141

Top features for each label follow, in this order: O, B-ORG, I-ORG.

y=O top features

Weight	Feature
+4.931	BOS
+3.754	postag[:2]:Fp
+3.539	bias
+2.328	word[-3:]:,
+2.328	word.lower():,
+2.328	postag[:2]:Fc
+2.328	postag:Fc
… 15039 more positive …
… 3905 more negative …
-2.187	postag[:2]:NP
-3.685	word.isupper()
-7.025	word.istitle()

y=B-ORG top features

Weight	Feature
+3.041	word.isupper()
+2.952	word.lower():efe
+1.851	word[-3:]:EFE
+1.278	word.lower():gobierno
+1.033	word[-3:]:rno
+1.005	word.istitle()
+0.864	-1:word.lower():del
… 3524 more positive …
… 621 more negative …
-0.842	-1:word.lower():en
-1.416	postag[:2]:SP
-1.416	postag:SP

y=I-ORG top features

Weight	Feature
+1.159	-1:word.lower():de
+0.993	-1:word.istitle()
+0.637	-1:postag[:2]:SP
+0.637	-1:postag:SP
+0.570	-1:word.lower():real
+0.547	word.istitle()
… 3517 more positive …
… 676 more negative …
-0.480	postag:VMI
-0.508	postag[:2]:VM
-0.533	-1:word.isupper()
-1.290	bias

Another option is to check only some of the features; this helps to verify that a feature function works as intended. For example, let’s check how word shape features are used by the model using the feature_re argument, and hide the transition table:

# A raw string keeps the regex backslash from being read as a string escape.
eli5.show_weights(crf, top=10, feature_re=r'^word\.is',
                  horizontal_layout=False, show=['targets'])

y=O top features

Weight	Feature
-3.685	word.isupper()
-7.025	word.istitle()

y=B-LOC top features

Weight	Feature
+2.397	word.istitle()
+0.099	word.isupper()
-0.152	word.isdigit()

y=I-LOC top features

Weight	Feature
+0.460	word.istitle()
-0.018	word.isdigit()
-0.345	word.isupper()

y=B-MISC top features

Weight	Feature
+2.017	word.isupper()
+0.603	word.istitle()
-0.012	word.isdigit()

y=I-MISC top features

Weight	Feature
+0.271	word.isdigit()
-0.072	word.isupper()
-0.106	word.istitle()

y=B-ORG top features

Weight	Feature
+3.041	word.isupper()
+1.005	word.istitle()
-0.044	word.isdigit()

y=I-ORG top features

Weight	Feature
+0.547	word.istitle()
+0.014	word.isdigit()
-0.012	word.isupper()

y=B-PER top features

Weight	Feature
+1.757	word.istitle()
+0.050	word.isupper()
-0.123	word.isdigit()

y=I-PER top features

Weight	Feature
+0.976	word.istitle()
+0.193	word.isupper()
-0.106	word.isdigit()

Looks fine - UPPERCASE and Titlecase words are likely to be entities of some kind.
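
To preview which features a given pattern will keep, you can apply the regular expression to a list of feature names yourself. The sketch below assumes the pattern is matched against each feature name with re.search; that is an assumption about eli5's behavior, and the feature names are a hand-picked sample from the tables above.

```python
import re

feature_names = [
    'bias', 'word.lower():efe', 'word.isupper()',
    'word.istitle()', '-1:word.istitle()', 'word[-3:]:EFE',
]
pattern = re.compile(r'^word\.is')
kept = [name for name in feature_names if pattern.search(name)]
print(kept)
# ['word.isupper()', 'word.istitle()']
```

The leading ^ is what excludes neighbour features such as -1:word.istitle(); dropping it would include them as well.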

6. Formatting in console

It is also possible to format the result as text, which can be useful in a console:

expl = eli5.explain_weights(crf, top=5, targets=['O', 'B-LOC', 'I-LOC'])
print(eli5.format_as_text(expl))

Explained as: CRF

Transition features:
            O    B-LOC    I-LOC
-----  ------  -------  -------
O       2.732    1.217   -4.675
B-LOC  -0.226   -0.091    3.378
I-LOC  -0.184   -0.585    2.404

y='O' top features
Weight  Feature
------  --------------
+4.931  BOS
+3.754  postag[:2]:Fp
+3.539  bias
… 15043 more positive …
… 3906 more negative …
-3.685  word.isupper()
-7.025  word.istitle()

y='B-LOC' top features
Weight  Feature
------  ------------------
+2.397  word.istitle()
+2.147  -1:word.lower():en
  … 2284 more positive …
  … 433 more negative …
-1.080  postag[:2]:SP
-1.080  postag:SP
-1.273  -1:word.istitle()

y='I-LOC' top features
Weight  Feature
------  ------------------
+0.882  -1:word.lower():de
+0.780  -1:word.istitle()
+0.718  word[-3:]:de
+0.711  word.lower():de
  … 1684 more positive …
  … 268 more negative …
-1.965  BOS