Confidently wrong is worse than wrong. Here are the five places Claude fails on real warehouses, and the fix for each.
Hallucinated SQL has two causes: Claude doesn't have your actual schema, or it doesn't have your team's business rules. Fix both by serving your annotated handbook over MCP, then verify every generated query against real data before trusting the answer.
Claude doesn't know it's wrong. From its perspective, every query it generates is a reasonable guess based on the prompt. When the prompt doesn't include your real schema or your real business rules, those reasonable guesses go off the rails in five predictable ways:
orders.total instead of orders.amount_cents).Symptom: The query references columns that don't exist. SQL fails to compile, or worse, it does compile because the column name happens to exist on a different table.
Fix: Give Claude your real schema, on every prompt, automatically. That's what an MCP server does: your annotated tables and columns reach the model as part of its working context, not as a once-pasted system prompt that drifts the moment you add a new column.
Symptom: The query joins orders.account_uuid to accounts.id. The numbers come back ~10% too low because not every order has the deprecated UUID populated.
Fix: Document the canonical join keys explicitly. One annotation per table pair: "orders.account_id joins to accounts.id, never account_uuid." This is the kind of thing that lives forever in a senior analyst's head and never makes it to the schema.
Symptom: Marketing reports 10K MQLs. Sales says 6K. AI used a different definition of "qualified" each time. The boardroom argument is now about which AI was wrong, not about the actual business.
Fix: Define your metrics once in a place every AI tool can read. See how to define metrics for AI. The compounding asset isn't the metric definitions; it's having one definition that humans and AI both use.
Symptom: Churn comes back at 8%. Real number is 4%. AI included paused customers and trial accounts in the denominator.
Fix: Capture filters as gotchas. "Test accounts have is_test = true and must be filtered for any reporting." "Paused subscriptions don't count as churn, only status = 'cancelled'." Once those rules are documented and reaching every prompt, the filter is applied automatically.
Symptom: Pipeline number comes back at $12M. Real qualified pipeline is $3M. AI counted Stage 0 opportunities (inbound triage) as qualified.
Fix: Document what each enum value means, not just that it exists. Schema knows there's a stage_name column. Your handbook needs to say "Stages 1+ are qualified pipeline. Stage 0 is unqualified inbound. Exclude it from any pipeline reporting."
Even with perfect annotations, AI can write SQL with subtle bugs: an off-by-one date range, a CTE that filters too early, an aggregation that double-counts. The second half of the solution is verification: run the generated SQL against real data before trusting the answer.
A useful verification flow has three checks:
Contextary's verification console runs these checks as part of the workflow: you see the SQL, the result, and the context Claude used, side by side, before anything hits a dashboard or a board deck.
The four layers of context AI needs and the practical order to capture them.
Stop the boardroom argument about which AI is right. Define metrics once, use them everywhere.
An explainer on how MCP gets your context to Claude reliably.
Step-by-step setup tutorial. Same shape for BigQuery, Postgres, Redshift.
Ground Claude in your team's context, then verify every query against real data. Free to start.