Private BETA

Trustworthy Language Model (TLM)

Join our early access community to try TLM for free.

Problem

You want to use Large Language Models (LLMs) for automation and answering questions, but you don’t know when to trust their outputs, because LLMs sometimes “hallucinate”, producing incorrect or nonsensical answers.

Solution

Cleanlab Trustworthy Language Model (TLM) adds a trustworthiness score to LLM outputs so you know which outputs are more reliable and which should be double-checked.

Cleanlab TLM is a robust, reliable LLM that produces high-quality outputs and indicates when it is unsure of an answer, making it suitable for applications where unchecked hallucinations are a showstopper.

Without Cleanlab TLM (i.e. with a standard LLM):

Question: What is 100 + 300?

Answer: 400

Question: What is 57834849 + 38833747?

Answer: 96668696

It’s difficult to tell when the LLM is answering confidently and when it is not; in fact, the second answer above is wrong (the correct sum is 96668596). With Cleanlab TLM, however, each answer comes with a confidence score. This score can guide how you use the LLM’s output (e.g. use it directly if the confidence is above a certain threshold, otherwise flag the response for human review):

With Cleanlab TLM:

Question: What is 100 + 300?

Answer: 400

Confidence: 0.938

Question: What is 57834849 + 38833747?

Answer: 96668696

Confidence: 0.245

Here are a few other examples…

Question: Which part of the human body produces insulin?

Answer: the pancreas

Confidence: 0.759

Question: What color are the two stars on the national flag of Syria?

Answer: red and black

Confidence: 0.173

Question: Where was Jerome Powell born?

Answer: Washington, D.C., United States

Confidence: 0.882

Question: Where did Jerome Powell do his PhD in Economics?

Answer: Princeton University

Confidence: 0.396

Cleanlab TLM confidence scores quantify both aleatoric and epistemic uncertainty in LLM answers, and are consistently lower for answers that are incorrect or suboptimal. Boost the reliability of your Generative AI applications by adding contingency plans that override LLM answers whose confidence falls below some threshold (e.g., route the question to a human, append a disclaimer that the answer is uncertain, revert to a default baseline answer, or request a prompt with more information/context).
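
For illustration, here is a minimal Python sketch of that thresholding pattern. The helpers query_tlm and flag_for_human_review are hypothetical placeholders (not the actual TLM API): confident answers pass through unchanged, while low-confidence answers trigger a contingency plan.

```python
from typing import Tuple

CONFIDENCE_THRESHOLD = 0.7  # tune to your application's risk tolerance


def query_tlm(question: str) -> Tuple[str, float]:
    """Placeholder standing in for a real call that returns (answer, confidence)."""
    return "the pancreas", 0.759


def flag_for_human_review(question: str, answer: str, confidence: float) -> None:
    """Placeholder: enqueue the item for a human reviewer, log it, etc."""
    print(f"Needs review (confidence {confidence:.3f}): {question!r} -> {answer!r}")


def answer_with_fallback(question: str) -> str:
    answer, confidence = query_tlm(question)
    if confidence >= CONFIDENCE_THRESHOLD:
        # Confident enough: use the LLM answer directly.
        return answer
    # Below threshold: fall back to a contingency plan instead of trusting the answer.
    flag_for_human_review(question, answer, confidence)
    return "We're not fully sure about this one; a human will follow up with a verified answer."


print(answer_with_fallback("Which part of the human body produces insulin?"))
```

The right threshold depends on your application’s tolerance for errors: a customer-facing chatbot might route far more answers to human review than an internal drafting tool.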

The Trustworthy Language Model is available in beta with a Cleanlab Studio account. Sign up for a free account below. Make sure you indicate your interest in early TLM access when signing up (TLM is in private beta).