Schema isn't context. Here's the practical playbook for capturing the knowledge AI needs to write accurate SQL: the gotchas, the join keys, the metric definitions, the business rules.
Document four layers: what columns mean, where the gotchas live, which join keys are canonical, and what your metrics actually represent. Serve those annotations to Claude over the Model Context Protocol so they reach every prompt automatically. Never hand-write the same context twice.
Your data warehouse already has a schema. Tables, columns, types, primary keys. That's not the problem. The problem is everything not in the schema:
amount column is in cents, so you have to divide by 100 before showing dollars.accounts.id joins to orders.account_id, never to orders.account_uuid (a deprecated column nobody removed).is_test = true and need to be filtered out for any reporting.None of this is in your DDL. All of it is required for AI to give correct answers. When Claude doesn't know these things, it doesn't ask. It guesses. Sometimes the guess is right. Often it's confidently, quietly wrong.
What each column actually represents. stage_name on its own is meaningless; "Salesforce opportunity stage. Stage 0 means inbound triage, not qualified" is useful. Focus on the columns AI will actually need: identifiers, foreign keys, status fields, money columns, dates.
Reusable warnings tied to specific columns or tables. The patterns repeat across companies. Most warehouses have at least these:
deleted_at IS NULL
Spell out which keys to use and which to avoid. Most warehouses have at least one column that looks like the right join key but isn't: a deprecated identifier nobody removed, or a string that should never be matched against a UUID. Saying "orders.account_id joins to accounts.id" once is worth more than the same instruction in 200 system prompts.
What "revenue" actually means in your company. Whether MRR includes annual contracts divided by 12. How you count an "active user". Last 7 days? 30? Does a paused account count as churned? These are the questions that produce wildly different numbers if AI guesses. See how to define metrics for AI for the deeper playbook.
You don't need to document the whole warehouse. You need to document the parts AI will be asked about. A practical order that works for most analytics warehouses:
orders, accounts, opportunities, subscriptions, events, plus a few sources of truth specific to your business. Cold-start is the reason most data catalogs die. Looking at an empty handbook with 200 tables to annotate, every team blinks. The unlock is letting AI read your schema and propose the first draft (column descriptions, likely gotchas, probable join keys) for you to accept, edit, or reject in seconds instead of minutes.
That's what Contextary's AI-assisted annotations do. Connect your warehouse, the schema imports automatically, and AI suggests starter context. You spend your time on the 20% that requires real judgment, your business rules and your edge cases, instead of writing "this is a UUID identifier" four hundred times.
Documentation is only useful if it changes the answer. Keep a list of test questions, the ones your team asks every week, and run them through Claude after each round of annotations. The right ones to start with:
When an answer is wrong, the lesson is usually clear: the annotation that would've prevented it isn't there yet. Add it, re-test, move on. See how to stop Claude from hallucinating SQL for the deeper troubleshooting playbook.
The five common failure modes and the playbook for grounding AI in your team's context.
A practical guide to defining MRR, churn, and the metrics that get reported with three different numbers.
An explainer on the Model Context Protocol and why it changed how AI tools see your data.
The connection-and-setup tutorial. Snowflake first, with parallel guides for BigQuery, Postgres, Redshift.
Connect your warehouse and Contextary will draft your handbook for you in minutes. Free to get started.