OpenAI

OpenAI provides a client library for calling Large Language Models (LLMs).

eli5 supports eli5.explain_prediction() for ChatCompletion, ChoiceLogprobs and openai.Client objects, highlighting each token proportionally to its log probability, which can help you see where the model is less confident in its predictions. More likely tokens are highlighted in green, while unlikely tokens are highlighted in red:

[Image: LLM token probabilities visualized]

Explaining with a client, invoking the model with logprobs enabled:

import eli5
import openai

client = openai.Client()  # reads the OPENAI_API_KEY environment variable
prompt = 'some string'  # or [{"role": "user", "content": "some string"}]
explanation = eli5.explain_prediction(client, prompt, model='gpt-4o')
explanation  # in a notebook, this renders the token highlighting as HTML

You may pass any extra keyword arguments to eli5.explain_prediction(); they will be passed on to client.chat.completions.create. For example, you may pass n=2 to get multiple responses and see an explanation for each of them.
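For instance, requesting two completions yields one explanation target per returned choice. A minimal sketch building on the call above:

explanation = eli5.explain_prediction(client, prompt, model='gpt-4o', n=2)
len(explanation.targets)  # 2: one explanation per response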

You’d normally want to run it in a Jupyter notebook to see the explanation formatted as HTML.
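If you need the rendered explanation outside a notebook, eli5's generic formatters may help; this is a sketch assuming eli5.format_as_html supports LLM explanations (an assumption, not confirmed by this page):

html = eli5.format_as_html(explanation)  # assumption: formatter handles LLM explanations
with open('explanation.html', 'w') as f:
    f.write(html)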

You can access the Choice object as explanation.targets[0].target:

explanation.targets[0].target.message.content
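When you requested several responses (e.g. with n=2 as above), each target wraps its own Choice, so you can iterate over all of them:

for target in explanation.targets:
    print(target.target.message.content)  # one response per target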

If you have already obtained a chat completion with logprobs from the OpenAI client, you may call eli5.explain_prediction() with the ChatCompletion or ChoiceLogprobs object like this:

chat_completion = client.chat.completions.create(
    messages=[{"role": "user", "content": prompt}],
    model="gpt-4o",
    logprobs=True,  # per-token log probabilities are required for the explanation
)
eli5.explain_prediction(chat_completion)  # or
eli5.explain_prediction(chat_completion.choices[0].logprobs)
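The highlighting is driven by these per-token log probabilities. As a rough illustration of the raw data behind it, you can convert each token's logprob to a probability yourself, using the .token and .logprob fields of the entries in the OpenAI client's ChoiceLogprobs:

import math

for token_logprob in chat_completion.choices[0].logprobs.content:
    # probability = exp(log probability); eli5 shades tokens based on this
    print(f'{token_logprob.token!r}: p={math.exp(token_logprob.logprob):.3f}')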

See the tutorial for a more detailed usage example.

Consider also checking other libraries which support explaining predictions of open-source LLMs.

Note

While token probabilities reflect model uncertainty in many cases, they are not always indicative; for example, when a Chain of Thought precedes the final response, the final tokens may look confident regardless of uncertainty earlier in the reasoning. See the tutorial's limitations section for an example of that.