Contextary is in early beta. Free for anyone to use, for now. Questions? Email dan@contextary.ai
Troubleshooting

How to stop Claude from hallucinating SQL

Confidently wrong is worse than wrong. Here are the five places Claude fails on real warehouses, and the fix for each.

The short answer

Hallucinated SQL has two causes: Claude doesn't have your actual schema, or it doesn't have your team's business rules. Fix both by serving your annotated handbook over MCP, then verify every generated query against real data before trusting the answer.

Why Claude hallucinates SQL in the first place

Claude doesn't know it's wrong. From its perspective, every query it generates is a reasonable guess based on the prompt. When the prompt doesn't include your real schema or your real business rules, those reasonable guesses go off the rails in five predictable ways:

  1. 1. It invents columns that "should" exist (orders.total instead of orders.amount_cents).
  2. 2. It picks the wrong join key (the one that looks right but isn't canonical).
  3. 3. It applies a generic metric definition instead of yours (its definition of "active user" instead of your team's).
  4. 4. It misses a filter you take for granted (test accounts, soft deletes, paused customers).
  5. 5. It misreads a status enum (counts "Stage 0" as qualified pipeline).

The five failure modes, and the fix for each

1. Hallucinated columns

Symptom: The query references columns that don't exist. SQL fails to compile, or worse, it does compile because the column name happens to exist on a different table.

Fix: Give Claude your real schema, on every prompt, automatically. That's what an MCP server does: your annotated tables and columns reach the model as part of its working context, not as a once-pasted system prompt that drifts the moment you add a new column.

2. Wrong join key

Symptom: The query joins orders.account_uuid to accounts.id. The numbers come back ~10% too low because not every order has the deprecated UUID populated.

Fix: Document the canonical join keys explicitly. One annotation per table pair: "orders.account_id joins to accounts.id, never account_uuid." This is the kind of thing that lives forever in a senior analyst's head and never makes it to the schema.

3. Generic metric definition instead of yours

Symptom: Marketing reports 10K MQLs. Sales says 6K. AI used a different definition of "qualified" each time. The boardroom argument is now about which AI was wrong, not about the actual business.

Fix: Define your metrics once in a place every AI tool can read. See how to define metrics for AI. The compounding asset isn't the metric definitions; it's having one definition that humans and AI both use.

4. Missing filters

Symptom: Churn comes back at 8%. Real number is 4%. AI included paused customers and trial accounts in the denominator.

Fix: Capture filters as gotchas. "Test accounts have is_test = true and must be filtered for any reporting." "Paused subscriptions don't count as churn, only status = 'cancelled'." Once those rules are documented and reaching every prompt, the filter is applied automatically.

5. Status enum misread

Symptom: Pipeline number comes back at $12M. Real qualified pipeline is $3M. AI counted Stage 0 opportunities (inbound triage) as qualified.

Fix: Document what each enum value means, not just that it exists. Schema knows there's a stage_name column. Your handbook needs to say "Stages 1+ are qualified pipeline. Stage 0 is unqualified inbound. Exclude it from any pipeline reporting."

Verification: never trust, always test

Even with perfect annotations, AI can write SQL with subtle bugs: an off-by-one date range, a CTE that filters too early, an aggregation that double-counts. The second half of the solution is verification: run the generated SQL against real data before trusting the answer.

A useful verification flow has three checks:

Contextary's verification console runs these checks as part of the workflow: you see the SQL, the result, and the context Claude used, side by side, before anything hits a dashboard or a board deck.

Keep reading

Stop trusting confidently-wrong answers

Ground Claude in your team's context, then verify every query against real data. Free to start.