Contextary is in early beta. Free for anyone to use, for now. Questions? Email dan@contextary.ai
Annotation playbook

How to document your data warehouse for Claude

Schema isn't context. Here's the practical playbook for capturing the knowledge AI needs to write accurate SQL: the gotchas, the join keys, the metric definitions, the business rules.

The short answer

Document four layers: what columns mean, where the gotchas live, which join keys are canonical, and what your metrics actually represent. Serve those annotations to Claude over the Model Context Protocol so they reach every prompt automatically. Never hand-write the same context twice.

Why schema isn't enough

Your data warehouse already has a schema. Tables, columns, types, primary keys. That's not the problem. The problem is everything not in the schema:

None of this is in your DDL. All of it is required for AI to give correct answers. When Claude doesn't know these things, it doesn't ask. It guesses. Sometimes the guess is right. Often it's confidently, quietly wrong.

The four layers of context AI actually needs

1

Plain-English column meaning

What each column actually represents. stage_name on its own is meaningless; "Salesforce opportunity stage. Stage 0 means inbound triage, not qualified" is useful. Focus on the columns AI will actually need: identifiers, foreign keys, status fields, money columns, dates.

2

Gotchas (the data hazards)

Reusable warnings tied to specific columns or tables. The patterns repeat across companies. Most warehouses have at least these:

  • Cents vs. dollars: money columns stored as integer cents
  • Soft deletes: rows aren't really deleted; filter deleted_at IS NULL
  • Test accounts: internal accounts pollute production reports
  • Status enums: "Stage 0", "Pending", "Closed Lost (no contact)". Which ones count?
  • Fiscal calendar: Q1 doesn't always start January 1
3

Canonical join keys

Spell out which keys to use and which to avoid. Most warehouses have at least one column that looks like the right join key but isn't: a deprecated identifier nobody removed, or a string that should never be matched against a UUID. Saying "orders.account_id joins to accounts.id" once is worth more than the same instruction in 200 system prompts.

4

Business rules and metric definitions

What "revenue" actually means in your company. Whether MRR includes annual contracts divided by 12. How you count an "active user". Last 7 days? 30? Does a paused account count as churned? These are the questions that produce wildly different numbers if AI guesses. See how to define metrics for AI for the deeper playbook.

Where to start (in this order)

You don't need to document the whole warehouse. You need to document the parts AI will be asked about. A practical order that works for most analytics warehouses:

  1. 1 Pick the 5-10 tables your team queries most. Usually orders, accounts, opportunities, subscriptions, events, plus a few sources of truth specific to your business.
  2. 2 Annotate the columns that produce wrong answers when misunderstood: money columns (cents/dollars), status enums, soft-delete flags, test-account flags. These are the gotchas.
  3. 3 Spell out the join keys between those tables. One canonical line per pair.
  4. 4 Define your top 5 metrics: MRR, churn, NRR, qualified pipeline, active users (or whatever your equivalents are). Include the formula, the filters, and the gotchas.
  5. 5 Test by asking real questions. If Claude's first attempt is wrong, the annotation that should have prevented it is missing. Add it.
The shortcut

Let AI start the handbook for you

Cold-start is the reason most data catalogs die. Looking at an empty handbook with 200 tables to annotate, every team blinks. The unlock is letting AI read your schema and propose the first draft (column descriptions, likely gotchas, probable join keys) for you to accept, edit, or reject in seconds instead of minutes.

That's what Contextary's AI-assisted annotations do. Connect your warehouse, the schema imports automatically, and AI suggests starter context. You spend your time on the 20% that requires real judgment, your business rules and your edge cases, instead of writing "this is a UUID identifier" four hundred times.

How to verify your annotations work

Documentation is only useful if it changes the answer. Keep a list of test questions, the ones your team asks every week, and run them through Claude after each round of annotations. The right ones to start with:

When an answer is wrong, the lesson is usually clear: the annotation that would've prevented it isn't there yet. Add it, re-test, move on. See how to stop Claude from hallucinating SQL for the deeper troubleshooting playbook.

Keep reading

Skip the cold-start

Connect your warehouse and Contextary will draft your handbook for you in minutes. Free to get started.