
AI Interfaces Need Receipts

When AI makes a decision, the interface needs to show the reasoning — not as a disclosure, but as primary UX. Trust comes from transparency, not magic.



When an AI surfaces a recommendation, selects a candidate, drafts a proposal, or ranks a list, the interface has a choice: show the output, or show the output and how it got there. Most interfaces show only the output. The reasoning is treated as an implementation detail, something the model does internally that the user doesn't need to see. That's a mistake: it produces interfaces that feel authoritative in a way that erodes trust the moment an output turns out to be wrong. A receipt, a visible record of what the AI used to reach a conclusion, is not a disclosure. It's part of the primary UX.


What a receipt actually is

A receipt in an AI interface is a representation of the inputs and reasoning that produced an output. It answers the question: what did the AI look at, and how much did each thing matter?

The simplest receipt is a source tag: "Based on 3 documents." The most complex is a collapsible reasoning trace that shows intermediate steps, confidence levels, and which inputs were weighted most heavily. Most production interfaces need something between those extremes.

There are four categories of information a receipt can surface:

Sources. What data, documents, or context did the AI use? In a RAG-based system, these are the retrieved chunks. In a recommendation system, they're the items or signals that drove the output. Naming the sources lets a user verify them, and it tells them what to change if the output isn't right.

Confidence. How certain is the model about this output? This isn't the model's internal probability score, which most users can't interpret. It's a calibrated signal: "high confidence" when the model has strong relevant context, "uncertain" when it's extrapolating or working with incomplete data. Without a confidence signal, users treat every output as equally reliable, which is rarely true and leads to poor decisions.

Context used. Specifically, which parts of the available context were actually relevant? A chatbot with access to a 50-page document shouldn't present its response as if it read all 50 pages if only section 3 was relevant. Surfacing which section was used tells the user what the model considered in scope.

Intermediate steps. For multi-step reasoning — a plan, a recommendation across multiple criteria, a document that synthesizes multiple sources — what were the sub-conclusions that led to the final output? This is the most complex receipt to design but the most valuable for decisions that have real consequences.
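One way to make these four categories concrete is a single receipt object that the UI reads from. The sketch below is a minimal Python illustration under stated assumptions, not a prescribed schema: the `Receipt`, `Source`, `Confidence`, and `calibrate` names, the 0-to-1 weight scale, and the two-source threshold are all hypothetical. Note that `calibrate` maps evidence coverage to a coarse label instead of exposing a raw model probability, in line with the point about confidence above.

```python
from dataclasses import dataclass, field
from enum import Enum


class Confidence(Enum):
    # Coarse, user-facing labels rather than raw model probabilities.
    HIGH = "high confidence"
    UNCERTAIN = "uncertain"


@dataclass
class Source:
    title: str
    weight: float  # hypothetical: relative influence on the output, 0 to 1


@dataclass
class Receipt:
    sources: list[Source]                           # what the AI used
    confidence: Confidence                          # calibrated label
    context_used: list[str]                         # sections actually relied on
    steps: list[str] = field(default_factory=list)  # intermediate sub-conclusions

    def top_sources(self, n: int = 3) -> list[Source]:
        """Sources ordered by how much they mattered, heaviest first."""
        return sorted(self.sources, key=lambda s: s.weight, reverse=True)[:n]


def calibrate(relevant_sources: int, min_sources: int = 2,
              extrapolating: bool = False) -> Confidence:
    """Map evidence coverage to a label. Thresholds are illustrative assumptions."""
    if extrapolating or relevant_sources < min_sources:
        return Confidence.UNCERTAIN
    return Confidence.HIGH


receipt = Receipt(
    sources=[Source("Pricing FAQ", 0.2), Source("Contract, section 3", 0.7)],
    confidence=calibrate(relevant_sources=2),
    context_used=["Contract, section 3"],
)
print(receipt.confidence.value)         # high confidence
print(receipt.top_sources(1)[0].title)  # Contract, section 3
```

Keeping all four categories on one object means the compact summary and the expanded detail view read from the same data, so they can't drift apart.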


Why hiding the reasoning erodes trust

Opaque AI outputs create a specific failure mode: the user trusts the output until it fails badly enough to be noticed, then stops trusting the system entirely.

This is different from how trust works with human expertise. When a doctor recommends a treatment, the reasoning is part of the interaction: the diagnosis, the differential, the case for preferring one option over another. A patient who understands the reasoning can ask questions, provide additional information, and participate in the decision. A patient handed a prescription with no explanation has no entry point for any of that.

AI interfaces that hide reasoning ask users to extend blanket trust to a system they can't inspect. That's a fragile foundation. The moment the AI is confidently wrong — and it will be — users have no way to diagnose whether the problem was the model, the data, or their input. They just know it was wrong. Repeated enough, this produces one of two outcomes: over-reliance (trusting the AI even when the user has information that would contradict it) or abandonment (not trusting the AI even when it's right).

A receipt doesn't eliminate the risk of AI error. It gives the user a way to engage with the output as a product of a process rather than as an oracle's pronouncement.


What this looks like in a real product

On Waco3's freelance proposal platform, the AI generates proposal sections by analyzing freelancer profiles and matching them against a client brief. The first version of the UI showed the generated proposal sections with no indication of how each section was produced. The quality was good. User trust was low. Clients kept asking which freelancers the proposals were drawing on, and the interface had no answer.

The redesign added a receipts layer. Each proposal section carries a small, unobtrusive indicator: a row of avatar thumbnails showing which freelancer profiles influenced that section, with a tooltip showing the specific signal — "strong match on timeline management, referenced in 3 proposals." Clicking through opens a drawer with the full context: the relevant excerpts from each profile, the weighting factors, and a note if the section drew on fewer profiles than expected (a signal that this part of the brief had weaker coverage in the available pool).
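A per-section receipt like this can be modeled as a list of influences plus a coverage check. This is a hypothetical Python sketch, not Waco3's actual implementation: `Influence`, `SectionReceipt`, `avatar_row`, `coverage_note`, the weight scale, and the default of three expected profiles are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Influence:
    profile_id: str
    signal: str    # tooltip text, e.g. "strong match on timeline management"
    weight: float  # hypothetical relevance score from 0 to 1


@dataclass
class SectionReceipt:
    section: str
    influences: list[Influence]


def avatar_row(receipt: SectionReceipt, max_avatars: int = 4) -> list[str]:
    """Profile IDs for the thumbnail row, strongest influence first."""
    ranked = sorted(receipt.influences, key=lambda i: i.weight, reverse=True)
    return [i.profile_id for i in ranked[:max_avatars]]


def coverage_note(receipt: SectionReceipt,
                  expected_profiles: int = 3) -> Optional[str]:
    """Flag sections that drew on fewer profiles than expected, a hint that
    this part of the brief is weakly covered by the available pool."""
    n = len(receipt.influences)
    if n >= expected_profiles:
        return None
    return (f"'{receipt.section}' drew on {n} of {expected_profiles} expected "
            "profiles; this part of the brief has weaker coverage in the pool.")


timeline = SectionReceipt("Timeline", [
    Influence("fl-12", "strong match on timeline management", 0.9),
    Influence("fl-07", "delivered similar milestones", 0.6),
])
print(avatar_row(timeline))     # ['fl-12', 'fl-07']
print(coverage_note(timeline))  # flags 2 of 3 expected profiles
```

Returning `None` when coverage is adequate keeps the note out of the drawer entirely in the common case, so it only appears when it carries information.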

The change had two effects. Clients could now understand why a proposal looked the way it did and ask targeted follow-up questions — "can we find more freelancers with X experience?" instead of "the proposal doesn't feel right." And proposal acceptance rates improved because clients were making decisions based on visible reasoning rather than having to trust an opaque output.


Designing the receipt without cluttering the primary UX

The receipt should not compete with the output for attention. A reasoning trace that takes up 30% of the page is not a receipt — it's a different document. The design goal is: the primary output is prominent and readable; the receipt is findable and complete.

Three patterns that work:

Always-visible summary, full detail on demand. A single line beneath the output — "Generated from 4 sources · 94% confidence" — gives the user enough to know a receipt exists and whether to look at it. Clicking or expanding shows the full detail. This is the right pattern for most outputs.

Inline provenance markers. For long-form outputs, such as a generated document or a multi-part recommendation, individual sections or sentences can carry markers that link to their source. Think footnote-style, but interactive. This takes more design effort, but it pays off for outputs where different sections have different sources and confidence levels.

Receipt-first disclosure for consequential decisions. If the AI is influencing a hiring decision, a financial recommendation, or a medical summary, the receipt should be part of the primary flow — not hidden behind a disclosure toggle. The output and the reasoning should appear at the same level of visual hierarchy.
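The three patterns can be sketched as small rendering helpers. In this minimal Python sketch, `summary_line` builds the always-visible one-liner, `with_markers` attaches footnote-style source markers to sentences, and `layout` decides whether the full receipt sits in the primary flow or behind a toggle. All names and the `(block, expanded)` return shape are hypothetical; a real UI would render these as components rather than strings.

```python
from dataclasses import dataclass


@dataclass
class Sentence:
    text: str
    source_ids: list[int]  # hypothetical indexes into the receipt's source list


def summary_line(n_sources: int, confidence: str) -> str:
    """Always-visible one-liner shown beneath the output."""
    noun = "source" if n_sources == 1 else "sources"
    return f"Generated from {n_sources} {noun} · {confidence}"


def with_markers(sentences: list[Sentence]) -> str:
    """Inline provenance: append footnote-style markers to each sentence."""
    return " ".join(
        s.text + "".join(f"[{i}]" for i in s.source_ids) for s in sentences
    )


def layout(output: str, summary: str, detail: str,
           consequential: bool) -> list[tuple[str, bool]]:
    """Return (block, expanded) pairs in display order.

    Consequential decisions put the full receipt in the primary flow at the
    same level as the output; everything else shows the summary line and
    keeps the detail collapsed until the user asks for it.
    """
    if consequential:
        return [(output, True), (detail, True)]
    return [(output, True), (summary, True), (detail, False)]


print(summary_line(4, "94% confidence"))
# Generated from 4 sources · 94% confidence
print(with_markers([Sentence("Costs are within budget.", [1]),
                    Sentence("The timeline is at risk.", [2, 3])]))
# Costs are within budget.[1] The timeline is at risk.[2][3]
```

Making `consequential` an explicit input, rather than something inferred inside the renderer, keeps the decision about which outputs get receipt-first treatment visible and reviewable.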


The difference between a receipt and a disclaimer

A disclaimer says: "AI can make mistakes." It's a legal hedge, not a design pattern. It asks the user to extend general skepticism to every output, which is functionally useless — general skepticism doesn't help a user evaluate a specific output.

A receipt says: "Here's what this specific output was based on." It gives the user something to reason about. They can see whether the sources are relevant, whether the confidence is appropriate, whether the context the AI used matches the context they intended to provide. They can act on that information.

The test for whether an AI interface is actually telling you what happened is whether a user can read its explanation and make a specific decision. "AI can make mistakes" doesn't help anyone decide whether to accept a recommendation. "This recommendation is based on Q3 2025 revenue data and doesn't include the Q4 figures you uploaded last week" gives the user a specific, actionable reason to verify or override.

That specificity is what separates receipts from disclaimers. Design for the specific, not the general. The trust you build with a receipt is earned trust — it compounds. The trust you ask for with a disclaimer is borrowed trust, and you'll pay it back in user abandonment when the output fails.