What's the difference between schema and context?

Schema is what your warehouse knows: table names, column names, types, and primary keys. Context is what your team knows: which join key is canonical, which 'amount' column is in cents, what 'qualified' means in your sales motion, why you exclude is_test=true. AI needs both. Schema alone produces SQL that compiles but answers the wrong question.

Do I need to document every column in my warehouse?

No. Document the tables AI will query most, typically the analytics layer (orders, accounts, opportunities, events) plus any table with non-obvious columns. Skip raw landing tables and tables nobody queries. A good rule: if no human on the team has ever asked a question about it, AI doesn't need context for it yet.

How long does it take to document a warehouse for AI?

It depends on the size of the warehouse and how much tribal knowledge needs capturing. With AI-assisted annotation tools like Contextary, a typical analytics warehouse with 30-60 important tables can be annotated meaningfully in a few hours. Done by hand, the same rules end up scattered across weeks of prompt engineering, and those prompts rarely carry over from one team to the next.

Will Claude follow the documentation I write?

Yes, if the documentation is structured and reaches Claude reliably. The Model Context Protocol (MCP) makes this concrete: Contextary's MCP server makes your annotations available to Claude in every conversation, so the rules don't get forgotten between sessions. Without MCP, you're back to copy-pasting prompts and hoping context survives.

How to Document Your Data Warehouse for Claude: A Practical Guide

Why schema isn't enough

Your data warehouse already has a schema. Tables, columns, types, primary keys. That's not the problem. The problem is everything not in the schema:

→ The amount column is in cents, so you have to divide by 100 before showing dollars.
→ Stage 0 opportunities aren't qualified. They're junk that hasn't been triaged yet.
→ accounts.id joins to orders.account_id, never to orders.account_uuid (a deprecated column nobody removed).
→ Fiscal Q1 starts in February, not January.
→ Test accounts have is_test = true and need to be filtered out for any reporting.

None of this is in your DDL. All of it is required for AI to give correct answers. When Claude doesn't know these things, it doesn't ask. It guesses. Sometimes the guess is right. Often it's confidently, quietly wrong.
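To make the gap concrete, here is a minimal sketch in Python using an in-memory SQLite database. The orders table, its three rows, and the exact column names are hypothetical stand-ins for the examples above, not a real warehouse schema. Both queries run without error; only one answers the question.

```python
import sqlite3

# Hypothetical orders table: amount is stored in integer cents,
# and is_test marks orders from internal test accounts.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, amount INTEGER, is_test INTEGER)"
)
conn.executemany(
    "INSERT INTO orders (amount, is_test) VALUES (?, ?)",
    [(12500, 0), (9900, 0), (500000, 1)],  # the last row is a test order
)

# What the schema alone suggests: sum the amount column.
naive_revenue = conn.execute("SELECT SUM(amount) FROM orders").fetchone()[0]

# What the team's context requires: drop test rows, convert cents to dollars.
correct_revenue = conn.execute(
    "SELECT SUM(amount) / 100.0 FROM orders WHERE is_test = 0"
).fetchone()[0]

print(naive_revenue)    # 522400
print(correct_revenue)  # 224.0
```

The naive number is off by more than three orders of magnitude, and nothing in the DDL would have told you.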

The four layers of context AI actually needs

1. Plain-English column meaning

What each column actually represents. stage_name on its own is meaningless; "Salesforce opportunity stage. Stage 0 means inbound triage, not qualified" is useful. Focus on the columns AI will actually need: identifiers, foreign keys, status fields, money columns, dates.
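One way to keep these descriptions structured is a small record per column. The sketch below is an illustration only: the ColumnNote class, its fields, and the opportunities and orders table names are assumptions made for this example, not Contextary's format or any standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnNote:
    """Hypothetical record holding the plain-English meaning of one column."""
    table: str
    column: str
    meaning: str

    def render(self) -> str:
        # One line per column, easy to paste or serve as context.
        return f"{self.table}.{self.column}: {self.meaning}"


notes = [
    ColumnNote(
        "opportunities",
        "stage_name",
        "Salesforce opportunity stage. Stage 0 means inbound triage, not qualified.",
    ),
    ColumnNote(
        "orders",
        "amount",
        "Order total in integer cents; divide by 100 for dollars.",
    ),
]

context_block = "\n".join(note.render() for note in notes)
print(context_block)
```

Keeping table, column, and meaning as separate fields means the same notes can be filtered per table before they are handed to the model.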

2. Gotchas (the data hazards)

Reusable warnings tied to specific columns or tables. The patterns repeat across companies. Most warehouses have at least these:

• Cents vs. dollars: money columns stored as integer cents
• Soft deletes: rows aren't really deleted; filter deleted_at IS NULL
• Test accounts: internal accounts pollute production reports
• Status enums: "Stage 0", "Pending", "Closed Lost (no contact)". Which ones count?
• Fiscal calendar: Q1 doesn't always start January 1
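The fiscal-calendar gotcha is a good example of a rule that is trivial once written down and invisible otherwise. Here is a rough sketch, assuming the February start used earlier in this guide; the function name and the default are illustrative, so swap in your own start month.

```python
from datetime import date

# Assumption taken from the example in this guide: fiscal Q1 starts in February.
FISCAL_YEAR_START_MONTH = 2


def fiscal_quarter(d: date, start_month: int = FISCAL_YEAR_START_MONTH) -> int:
    """Return the fiscal quarter (1-4) that the given date falls in."""
    # Number of whole months since the fiscal year began, wrapping around December.
    months_into_year = (d.month - start_month) % 12
    return months_into_year // 3 + 1


print(fiscal_quarter(date(2024, 2, 1)))   # 1
print(fiscal_quarter(date(2024, 5, 1)))   # 2
print(fiscal_quarter(date(2024, 1, 15)))  # 4
```

A model that assumes calendar quarters would put January in Q1. Under this rule it belongs to Q4.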

3. Canonical join keys

Spell out which keys to use and which to avoid. Most warehouses have at least one column that looks like the right join key but isn't: a deprecated identifier nobody removed, or a string that should never be matched against a UUID. Saying "orders.account_id joins to accounts.id" once is worth more than the same instruction in 200 system prompts.
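A join registry can be as small as a dictionary. The sketch below is hypothetical: the CANONICAL_JOINS and DEPRECATED_KEYS names and both helper functions are made up for illustration, and the deprecated-key check is a plain substring match that will miss aliased references such as o.account_uuid.

```python
# Hypothetical registry: one canonical entry per table pair.
CANONICAL_JOINS = {
    ("orders", "accounts"): ("orders.account_id", "accounts.id"),
}

# Columns that look like join keys but must never be used.
DEPRECATED_KEYS = {"orders.account_uuid"}


def join_clause(left: str, right: str) -> str:
    """Build the JOIN ... ON clause for a table pair from the registry."""
    left_key, right_key = CANONICAL_JOINS[(left, right)]
    return f"JOIN {right} ON {left_key} = {right_key}"


def uses_deprecated_key(sql: str) -> bool:
    """Flag SQL that mentions a deprecated join column (substring match only)."""
    lowered = sql.lower()
    return any(key in lowered for key in DEPRECATED_KEYS)


print(join_clause("orders", "accounts"))
# JOIN accounts ON orders.account_id = accounts.id

bad_sql = "SELECT * FROM orders JOIN accounts ON orders.account_uuid = accounts.id"
print(uses_deprecated_key(bad_sql))  # True
```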

4. Business rules and metric definitions

What "revenue" actually means in your company. Whether MRR includes annual contracts divided by 12. How you count an "active user": active in the last 7 days, or the last 30? Whether a paused account counts as churned. These are the questions that produce wildly different numbers if AI guesses. See how to define metrics for AI for the deeper playbook.
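Writing a metric down as code forces the decisions into the open. The sketch below shows one possible MRR definition, not the correct one: it assumes annual contracts count at one twelfth per month, test accounts are excluded, and amounts are stored in cents. The Subscription class and its fields are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subscription:
    amount_cents: int    # contract price per billing period, in cents
    billing_period: str  # "monthly" or "annual"
    is_test: bool        # True for internal test accounts


def mrr_dollars(subscriptions: list[Subscription]) -> float:
    """MRR in dollars under the assumed definition described above."""
    total_cents = 0.0
    for sub in subscriptions:
        if sub.is_test:
            continue  # test accounts never count toward MRR
        if sub.billing_period == "annual":
            total_cents += sub.amount_cents / 12  # spread the annual price over 12 months
        else:
            total_cents += sub.amount_cents
    return total_cents / 100  # cents to dollars


subs = [
    Subscription(5000, "monthly", False),   # $50 per month
    Subscription(120000, "annual", False),  # $1,200 per year, so $100 per month
    Subscription(999900, "monthly", True),  # test account, ignored
]
print(mrr_dollars(subs))  # 150.0
```

Change any one of the three assumptions and the number moves, which is exactly why the definition has to be written down rather than guessed.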

Where to start (in this order)

You don't need to document the whole warehouse. You need to document the parts AI will be asked about. A practical order that works for most analytics warehouses:

1. Pick the 5-10 tables your team queries most. Usually orders, accounts, opportunities, subscriptions, events, plus a few sources of truth specific to your business.
2. Annotate the columns that produce wrong answers when misunderstood: money columns (cents/dollars), status enums, soft-delete flags, test-account flags. These are the gotchas.
3. Spell out the join keys between those tables. One canonical line per pair.
4. Define your top 5 metrics: MRR, churn, NRR, qualified pipeline, active users (or whatever your equivalents are). Include the formula, the filters, and the gotchas.
5. Test by asking real questions. If Claude's first attempt is wrong, the annotation that should have prevented it is missing. Add it.
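If you aren't sure which tables your team queries most, a query log can give you a rough ranking. The sketch below is a crude heuristic, assuming you can export recent queries as plain SQL strings. The regex only catches bare table names after FROM or JOIN, and the sample log is invented.

```python
import re
from collections import Counter

# Matches a bare table name directly after FROM or JOIN (skips subqueries).
TABLE_REF = re.compile(r"\b(?:from|join)\s+([a-z_][a-z0-9_.]*)", re.IGNORECASE)


def most_queried_tables(queries: list[str], top_n: int = 10) -> list[tuple[str, int]]:
    """Count table references across a list of SQL strings."""
    counts: Counter = Counter()
    for sql in queries:
        counts.update(name.lower() for name in TABLE_REF.findall(sql))
    return counts.most_common(top_n)


# Invented sample log for illustration.
query_log = [
    "SELECT * FROM orders JOIN accounts ON orders.account_id = accounts.id",
    "SELECT COUNT(*) FROM orders WHERE is_test = false",
    "SELECT * FROM opportunities",
]
print(most_queried_tables(query_log, top_n=1))  # [('orders', 2)]
```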

The shortcut

Let AI start the handbook for you

Cold-start is the reason most data catalogs die. Looking at an empty handbook with 200 tables to annotate, every team blinks. The unlock is letting AI read your schema and propose the first draft (column descriptions, likely gotchas, probable join keys) for you to accept, edit, or reject in seconds instead of minutes.

That's what Contextary's AI-assisted annotations do. Connect your warehouse, the schema imports automatically, and AI suggests starter context. You spend your time on the 20% that requires real judgment, your business rules and your edge cases, instead of writing "this is a UUID identifier" four hundred times.

How to verify your annotations work

Documentation is only useful if it changes the answer. Keep a list of test questions, the ones your team asks every week, and run them through Claude after each round of annotations. The right ones to start with:

✓ "What was revenue last quarter?" Does it apply your fiscal calendar and the cents-to-dollars conversion correctly?
✓ "How many qualified opportunities do we have?" Does it filter Stage 0 and test accounts?
✓ "What's our churn rate this month?" Does it use your definition or one it invented?
✓ "How many active users this week?" Does it apply soft-delete and test-account filters?

When an answer is wrong, the lesson is usually clear: the annotation that would've prevented it isn't there yet. Add it, re-test, move on. See how to stop Claude from hallucinating SQL for the deeper troubleshooting playbook.
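The test list can live in code so it is cheap to re-run. The sketch below is a minimal, hypothetical harness: QuestionCheck and missing_fragments are made-up names, the generated SQL is a hand-written stand-in for whatever Claude returns, and matching on text fragments is a blunt check that won't recognize an equivalent filter written differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuestionCheck:
    """A weekly question plus SQL fragments a correct answer should contain."""
    question: str
    required_fragments: tuple[str, ...]


def missing_fragments(check: QuestionCheck, generated_sql: str) -> list[str]:
    """Return the required fragments that do not appear in the generated SQL."""
    # Lowercase and collapse whitespace so formatting differences don't matter.
    normalized = " ".join(generated_sql.lower().split())
    return [f for f in check.required_fragments if f.lower() not in normalized]


check = QuestionCheck(
    question="How many qualified opportunities do we have?",
    required_fragments=("is_test = false", "stage_name <> 'Stage 0'"),
)

# Stand-in for the model's output: it filters test accounts but forgets Stage 0.
generated_sql = """
SELECT COUNT(*)
FROM opportunities
WHERE is_test = false
"""
print(missing_fragments(check, generated_sql))
# ["stage_name <> 'Stage 0'"]
```

Each missing fragment points at the annotation to add next.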

Keep reading

How to stop Claude from hallucinating SQL

How to define metrics so AI uses them consistently

What is an MCP server for your data warehouse?

How to connect Snowflake to Claude
