Data Team

Beyond the Semantic Layer: The Federated Context Layer

The semantic layer solved metric consistency for dashboards. AI agents need something broader. Here is what comes next.

This is a companion piece to The Broken Promise of the Semantic Layer, which covers why the traditional approach to building semantic layers falls short and how continuous semantic mining offers a better path. This article goes deeper into the architectural question: what comes after the semantic layer?

The Semantic Layer Changed How Data Teams Work. AI Needs It to Go Further.

The semantic layer was one of the most important ideas in modern data infrastructure. Before it existed, every dashboard, every report, every analyst had their own version of "revenue." Finance calculated it one way. Marketing another. The board deck was a negotiation, not a readout.

Cube, dbt MetricFlow, and Looker's LookML changed that. They gave data teams a single place to define business logic: what MRR means, how churn is calculated, which filters apply. They decoupled it from any specific BI tool. Define once, use everywhere. It was a genuine breakthrough: consistent metrics across teams, a single source of truth for calculations, and business logic that could be version-controlled and reviewed like code. The semantic layer turned "my number is different from your number" into a solvable problem. That matters. A lot of hard work went into making that category real.

But the semantic layer was designed for a specific world: one where humans write SQL, dashboards render results, and the semantic layer sits between them to ensure consistency. In that original world, it works beautifully. As a semantic layer for AI – where agents autonomously query data – it was never built to handle the breadth of context required.

Every organization has two layers of semantics. There are the explicit semantics: the metric definitions, formulas, and hierarchies that someone formally wrote down in Cube, dbt, or LookML. And there are the implicit semantics: how the organization actually talks about and uses data. The CFO's mental model of what "revenue" means in a board context. The fact that marketing means "gross" when they say "revenue" while finance means "net." The correction someone made in Slack last Thursday. This implicit layer is culturally embedded, constantly shifting, and almost never written down.

Before AI, the gap between explicit and implicit semantics was manageable. A data analyst knows to ask "which revenue do you mean?" before running a query. They fill in the implicit context naturally, through experience, conversations, and domain knowledge. But an AI agent doesn't have that instinct. It takes the explicit definition at face value and serves a confident answer. The semantic layer captures the explicit. Nobody captures the implicit. That's about to matter a lot more.

We are moving into a different world. One where AI agents answer questions, build pipelines, and surface recommendations. And in that world, the semantic layer is necessary, but it's not enough.

The Scenario

A company has a well-maintained semantic layer. Revenue is defined as a metric. Churn is defined. The data team did their job. Then an AI agent is asked "What's our revenue this quarter?" It uses the revenue metric from the semantic layer, queries the revenue table, and returns a number. The CFO looks at it and says "That's wrong. We report recognized revenue to the board, not billed revenue."

Could the data team have encoded that rule in the semantic layer? Absolutely. Cube and dbt MetricFlow support complex business logic. But nobody told them to. The distinction between billed and recognized revenue lived in a Confluence page and the CFO's head. The semantic layer only contains what someone explicitly defines, and it has no way to discover what's missing. For every metric like revenue, there are dozens of contextual rules, team-specific interpretations, and edge cases that no data team can anticipate and pre-encode. The semantic layer was correct. It just wasn't complete. And at scale, it never will be.

The gap is not a bad semantic layer. It's that the real world has more context than anyone can manually formalize upfront. To give accurate, trustworthy answers, an AI agent needs:

  • Semantic definitions (what things mean). Metric formulas, KPI calculations, business term glossaries, dimensional hierarchies, fiscal calendars. This is what traditional semantic layers provide. The foundation.
  • Business rules and logic (how things work). Conditional logic, exceptions, domain-specific playbooks. "The board uses ASC 606 recognized revenue with a one-month lag." "EMEA includes Turkey for sales reporting but not for GDPR compliance." These live in Confluence, Notion, people's heads. No semantic layer captures them.
  • Data landscape (what exists and how it connects). Table descriptions, join paths, entity resolution across systems, source-of-truth mappings, lineage, ownership. Data catalogs cover parts of this, but no single tool covers it all.
  • Quality and trust signals (what to trust and what to doubt). Freshness, completeness, anomaly detection, certification status, known issues, expected value ranges.
  • Institutional knowledge (what the organization has learned). Corrections from conversations, historical context, team-specific conventions, usage patterns, persona-specific interpretations. The knowledge that gets you from 85% to 99% accuracy.
  • Operational rules and AI data governance (how the agent should behave). Presentation preferences, analytical conventions, hard limits, brand terminology. "Always present revenue in EUR for executive audiences." "Flag any result where sample size is below 30." These are not about what the data means. They're about how the agent should act – and how AI data governance is enforced in practice.

A traditional semantic layer provides the first item on this list. An AI agent needs all six. Notice that the last five aren't even about data semantics. They're about business logic, trust, organizational memory, and behavioral rules. This is why the answer isn't a better semantic layer. It's a broader context layer.
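To make the distinction concrete, the six categories above can be sketched as a small typed structure. This is an illustrative sketch, not a real schema; the enum names, `ContextItem` fields, and example values are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum

class ContextCategory(Enum):
    SEMANTIC_DEFINITION = "semantic_definition"          # what things mean
    BUSINESS_RULE = "business_rule"                      # how things work
    DATA_LANDSCAPE = "data_landscape"                    # what exists, how it connects
    QUALITY_SIGNAL = "quality_signal"                    # what to trust, what to doubt
    INSTITUTIONAL_KNOWLEDGE = "institutional_knowledge"  # what the org has learned
    OPERATIONAL_RULE = "operational_rule"                # how the agent should behave

@dataclass
class ContextItem:
    category: ContextCategory
    subject: str    # e.g. "revenue"
    statement: str  # the context itself
    source: str     # where it came from: "dbt", "confluence", "conversation", ...

# A traditional semantic layer populates only the first category.
revenue_context = [
    ContextItem(ContextCategory.SEMANTIC_DEFINITION, "revenue",
                "SUM(invoice_amount) FROM fct_invoices", "dbt"),
    ContextItem(ContextCategory.BUSINESS_RULE, "revenue",
                "Board reporting uses ASC 606 recognized revenue, one-month lag",
                "confluence"),
    ContextItem(ContextCategory.OPERATIONAL_RULE, "revenue",
                "Always present revenue in EUR for executive audiences",
                "conversation"),
]
```

The point the structure makes: only one of the six categories is what a semantic layer was built to hold; the rest come from sources a semantic layer never reads.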

Breadth is only half the problem. The other half is that traditional semantic layers are static. A data team builds definitions upfront, then updates them reactively when someone files a ticket. That cadence worked when dashboards were the primary consumer. But when AI agents give every employee direct access to data, the velocity of exploration explodes. New metrics emerge from how people actually use the data. New questions surface that nobody anticipated. The context layer can't be a file that gets reviewed quarterly. It needs to be a living artifact that evolves with the business.

The Two Sources of Truth Trap

Here is the trap that every AI analytics platform falls into: they realize agents need institutional context, so they build their own semantic layer. Now you have two.

Finance defines "revenue" in dbt MetricFlow. The AI platform also has a "revenue" definition in its own knowledge store. Which one wins?

Worse: the dbt definition gets updated next quarter because the CFO changes how revenue recognition works. Does the AI platform know? Does it drift silently? Who is responsible for keeping them in sync? The data team is now maintaining definitions in two places, which is strictly worse than maintaining them in one.

Nobody wants two sources of truth. And telling a data team "just maintain your definitions in our tool instead" is a non-starter when they have years of work invested in dbt, Cube, or LookML.

This problem disappears when you think about it differently.

Google doesn't host websites. It indexes the web. It aggregates information from millions of sources, makes it findable, and surfaces contradictions when two sources disagree. Google's value is not in storing content. It's in federating access to content that lives everywhere.

A data platform shouldn't try to BE the semantic layer. It should read from whatever semantic layers, glossaries, documentation, and metadata already exist, and add the intelligence on top.

What this means for your existing semantic layer

It becomes more valuable, not less. Today, your Cube or dbt MetricFlow definitions serve your dashboards. With a federated context layer reading from them, those same definitions now serve every AI agent, every Slack question, every API call, enriched with the quality signals, business rules, and institutional memory that Cube alone can't carry. You invested in your semantic layer. Now that investment works harder.

What a Federated Context Layer Actually Is

A federated context layer aggregates, reconciles, and enriches institutional context from wherever it already lives. It is broader than a semantic layer (it includes everything an AI agent needs, not just metric definitions) and it is federated (it reads from existing tools rather than replacing them).

Five capabilities define it:

Aggregates

Connects to wherever context lives today: dbt YAML files, Cube semantic models, Looker LookML, Confluence pages, Notion databases, CSV data dictionaries, warehouse information_schema, and direct user conversations. Context is never in one place. The federated context layer meets it where it is.

Reconciles

Data reconciliation across sources is one of the hardest unsolved problems in enterprise data. When two sources define the same term differently, the system detects the discrepancy and surfaces it with evidence. Not a vague alert; it provides concrete data: who uses which definition, how often, in what context, and where they diverge. Exact matches are merged. Semantic overlaps are linked. Contradictions are flagged for human resolution.
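The merge/link/flag decision can be sketched as a tiny classifier. This is a toy sketch using plain string similarity, with an assumed threshold; a real reconciliation engine would compare parsed definitions semantically, but the three-way outcome is the same.

```python
from difflib import SequenceMatcher

def reconcile(term, def_a, source_a, def_b, source_b, overlap_threshold=0.6):
    """Classify two definitions of the same term.

    Returns "merge" (exact match), "link" (semantic overlap),
    or "flag" (contradiction needing human resolution).
    """
    norm = lambda s: " ".join(s.lower().split())
    if norm(def_a) == norm(def_b):
        return {"term": term, "action": "merge"}
    similarity = SequenceMatcher(None, norm(def_a), norm(def_b)).ratio()
    if similarity >= overlap_threshold:
        return {"term": term, "action": "link", "similarity": round(similarity, 2)}
    # Contradiction: surface with evidence, never auto-resolve.
    return {"term": term, "action": "flag",
            "evidence": {source_a: def_a, source_b: def_b}}

result = reconcile(
    "revenue",
    "Sum of invoice amounts, excluding refunds", "dbt",
    "Total billed amount including refunds", "confluence",
)
```

Note that the `flag` case carries the evidence – which source says what – rather than just raising an alert, matching the "concrete data, not a vague alert" requirement above.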

Recommends

Proactively suggests what's missing: metric definitions that are referenced but never formally defined, derived tables that are reused across many questions, data sources that would fill a gap, joins that business users frequently request. The system proposes. Humans decide.

Stores

The federated context layer includes a native store so you never need to adopt a separate tool just to get started. For teams without existing semantic tools, it is the primary context layer from day one. For teams with Cube or dbt, it reads from those tools and the native store captures only what they can't: corrections from conversations, business rules surfaced through usage, quality signals from monitoring. This is not a competing source of truth. It's the layer that holds everything your semantic layer was never designed to hold.

Learns

Every conversation, correction, endorsement, and dismissal enriches the context continuously. When AI agents give every employee self-serve access to data, the velocity of new questions explodes. The context layer crowd-sources these analysis trends: what metrics get asked about most, what definitions get corrected, what tables get joined repeatedly, what questions can't be answered. It surfaces these patterns to the data team as actionable recommendations. Instead of waiting for tickets, the data team proactively formalizes what the business is already doing.
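The learning loop described above can be sketched as a usage aggregator: observe questions and corrections, then surface metrics that are asked about often but keep getting corrected as formalization candidates. The class name, threshold, and method signatures are hypothetical.

```python
from collections import Counter

class UsageLearner:
    """Sketch: turn observed usage into recommendations for the data team."""

    def __init__(self, min_signal=3):
        self.questions = Counter()    # metric -> times asked
        self.corrections = Counter()  # metric -> times corrected
        self.min_signal = min_signal

    def observe(self, metric, corrected=False):
        self.questions[metric] += 1
        if corrected:
            self.corrections[metric] += 1

    def recommendations(self):
        # Metrics asked about often, yet still being corrected, are the
        # ones the business uses but nobody has formally defined.
        return [m for m, n in self.questions.items()
                if n >= self.min_signal and self.corrections[m] > 0]

learner = UsageLearner(min_signal=3)
for _ in range(3):
    learner.observe("mrr")
learner.observe("mrr", corrected=True)  # a user corrected the agent once
learner.observe("churn")                # asked once; not enough signal yet
```

This is the inversion the section describes: instead of waiting for tickets, usage patterns push recommendations to the data team.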

The Spectrum

The federated context layer serves every team, regardless of maturity:

The spectrum runs from no existing tools to a mature semantic layer:

  • Starting from scratch – Native store holds everything. Full value from day one. No other tools required.
  • Some docs & definitions – Imports, reconciles, and enriches. Finds the contradictions nobody knew about.
  • Mature Cube/dbt setup – Reads from existing tools and adds the broader context they can't provide: quality signals, tribal knowledge, recommendations.

For all three, the outcome is the same: a unified context that is broader than any single source, actively reconciled, and continuously learning.

How It Works in Practice

Scenario A: Data Reconciliation

When you already have definitions across multiple tools

A company has metric definitions in dbt, a data glossary in Confluence, and business rules that live in people's heads.

The federated context layer connects to all three and runs data reconciliation across them. It imports 47 definitions from dbt, 23 from Confluence. It cross-references them and finds 8 contradictions the team never knew about. "Revenue" in dbt excludes refunds; "Revenue" in Confluence includes them. Both have been in use for two years. Neither team knew the other's definition was different.

As business users start asking questions through agents, the system observes what's being asked and how data is being used. From these usage patterns, it proposes 12 new definitions that nobody had formalized yet and flags 5 data gaps where a metric is defined but no data exists to compute it.

Everything flows through a single review surface. The data team resolves contradictions, either by updating the source (push the fix to dbt) or by storing the resolution directly. They accept, modify, or dismiss each recommendation. The system learns from every decision.

Scenario B: Bootstrap

When you're starting from scratch

A company with no semantic layer, no glossary, no formalized definitions.

The data team exposes their warehouse tables to the platform. Business users start asking questions through agents. Each correction becomes permanent institutional knowledge. "That number looks wrong, exclude trial accounts from MRR" is not just a one-time fix. It becomes a persistent rule that applies to every future MRR question, from every user, through every channel.

Within weeks, the data team has a working context layer with dozens of definitions, quality baselines, and recommendations. That is something that would have taken months to build manually in dbt or Cube. If they later adopt dbt, they can export these definitions to seed it. No lock-in.

Scenario C: Agent-readiness

When an external agent needs data

A Claude Code session, a custom automation, or an internal agent calls the platform's API: "find customer lifetime value data."

The federated context layer routes through all available context: the endorsed CLV definition (originally imported from dbt), the customer table documentation from the dbt project, the quality signal confirming the payments table is fresh, the business rule that CLV should use a 24-month lookback window (captured from a conversation correction last month).

The agent gets a complete, contextual answer with full provenance. Not just a table name and column list, but what the data means, how reliable it is, and where the definition came from. This is the difference between an agent that gives accurate answers and one that gives plausible guesses.
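What "a complete, contextual answer with full provenance" might look like on the wire: a sketch of a possible response shape, not Ronja's actual API. Every field name and value here is illustrative.

```python
# Hypothetical response to "find customer lifetime value data".
# Every piece of context carries provenance, so the calling agent (and the
# human reading its answer) can see where each claim came from.
context_response = {
    "metric": "customer_lifetime_value",
    "definition": {
        "formula": "AVG(revenue_per_customer) * AVG(customer_lifespan_months)",
        "provenance": {"source": "dbt", "status": "endorsed"},
    },
    "tables": [
        {"name": "payments", "quality": {"freshness": "fresh"}},
    ],
    "business_rules": [
        {"rule": "Use a 24-month lookback window",
         "provenance": {"source": "conversation"}},
    ],
}

# An agent can gate its behavior on provenance instead of guessing:
endorsed = context_response["definition"]["provenance"]["status"] == "endorsed"
```

The contrast with a bare catalog lookup is the point: a table name and column list answers "where is the data," while this shape also answers "what does it mean, how fresh is it, and who vouched for it."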

Why This Matters Now

The semantic layer for AI had its breakout moment in 2024-2025. Every major data vendor converged on the same realization: AI agents need institutional context to give accurate answers. Cube raised funding. dbt shipped MetricFlow. Snowflake launched Cortex with semantic model support. Analysts started writing about the "semantic layer imperative." The industry agreed: you can't just point an LLM at a warehouse and hope for the best. You need a layer of business meaning between the data and the agent.

They were right. And the results proved it: companies that invested in a semantic layer saw measurably better AI accuracy than those that didn't.

But a pattern emerged. Even companies with well-maintained semantic layers hit a ceiling. The agent could get metric definitions right, but it still missed business rules that lived in Confluence. It still didn't know about the data quality issue from last Tuesday. It still couldn't reconcile the fact that marketing and finance use "revenue" differently, because both definitions were technically correct in their respective contexts, and the semantic layer only had room for one.

The semantic layer solved the metric consistency problem. But agents don't just need metric consistency. They need the full picture: metrics plus business rules plus quality signals plus tribal knowledge plus data reconciliation across sources. And maintaining all of that in separate tools (a semantic layer here, a data catalog there, a quality monitor somewhere else) is not realistic for a team of 1-3 data people. AI data governance – controlling what agents can access, how they behave, and what rules they follow – cannot be bolted on after the fact. It has to be built into the context layer itself.

The emerging need is a single layer that federates all of a company's institutional context, from whatever tools already hold it, into one reconciled, continuously-learning surface that agents can query.

Why it must be federated: Because no single tool will ever hold all context. dbt holds metric definitions. Data catalogs like Databricks Unity Catalog, Alation, or Atlan hold lineage, ownership, and certifications. Confluence holds business processes. Slack holds corrections and tribal knowledge. The data warehouse holds structural metadata. A platform that tries to replace all of these will fail; there's too much surface area, too much organizational inertia. A platform that reads from all of them and reconciles them succeeds precisely because it doesn't ask anyone to change how they work.

Why This Requires Vertical Integration

Building a federated context layer is not a feature you bolt onto an existing tool. It requires three things working together:

An execution layer that sees every query

If you don't own query execution, you can't learn from usage. You don't know which definitions get used, which corrections get made, which questions can't be answered. A semantic layer like Cube defines metrics, but it doesn't see the questions. A warehouse like Snowflake runs SQL, but it doesn't see the business intent behind it.

A knowledge graph that stores and relates context

Metric definitions, business rules, quality signals, corrections, and tribal knowledge are not flat files. They relate to each other: this metric depends on that table, which was corrected by this user, and contradicts that imported definition. You need a graph structure to capture those relationships, not another YAML file.
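The relationship chain in that sentence – metric depends on table, corrected by user, contradicts an import – is exactly what a graph traversal answers and a flat YAML file cannot. A minimal adjacency-list sketch, with illustrative node and edge names:

```python
# Edges are (source_node, relation, target_node) triples.
edges = [
    ("metric:revenue", "depends_on", "table:fct_invoices"),
    ("metric:revenue", "corrected_by", "user:cfo"),
    ("metric:revenue", "contradicts", "imported:confluence/revenue"),
    ("table:fct_invoices", "owned_by", "team:finance"),
]

def related(node, relation=None):
    """One-hop traversal: 'what touches this metric, and how?'"""
    return [dst for src, rel, dst in edges
            if src == node and (relation is None or rel == relation)]
```

A query like `related("metric:revenue", "contradicts")` walks the relationships directly; expressing the same question against a directory of YAML files means parsing and cross-referencing every file on every lookup.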

An ingestion layer that connects to sources

Federation means reading from dbt, Confluence, Notion, warehouse metadata, and conversations. That requires native connectors and a reconciliation engine that can parse structured definitions, compare them against existing knowledge, and surface contradictions intelligently.

No single-purpose tool has all three. A semantic layer has definitions but not execution. A warehouse has execution but not knowledge. A data catalog has metadata but not queries. The federated context layer sits at the intersection, and that intersection requires a vertically integrated platform.

Ronja

Ronja was built this way from the start. It owns the execution layer, so it sees every query and every correction. It has a persistent knowledge graph where every definition, rule, and correction is stored with relationships to tables, users, and other definitions. And it has native connectors and a reconciliation engine that imports from dbt, Looker, Confluence, and any structured format.

These aren't features added after the fact. The federated context layer is the architecture.

What This Changes

For data teams

Stop maintaining definitions in six places and hoping they stay in sync. The federated context layer reconciles them for you, surfaces where they diverge, and recommends what's missing. More importantly, it shows you what the business actually needs based on real usage, so you work on what matters most. Your existing tools keep working, and they become more valuable.

For CTOs

The "accuracy problem" in AI analytics is actually a context problem – and AI data governance is a context problem too. Context is federated by nature; it lives in dbt, in Confluence, in people's heads, in Slack threads. The platform with the richest, most reconciled context becomes the default data backend for every agent in the company.

For the industry

The semantic layer was step one. It solved metric consistency for dashboards. The federated context layer is step two. It solves agent accuracy at the scale, breadth, and speed that AI-first analytics demands. Not by replacing what exists, but by making it all work together.

Key takeaways

  • The semantic layer solves metric consistency, but AI agents need six categories of context – not just metric definitions
  • Building a second semantic layer inside your AI platform creates a two-sources-of-truth problem that gets worse over time
  • A federated context layer reads from existing tools (dbt, Cube, Confluence, warehouse metadata) rather than replacing them
  • Five capabilities define it: aggregates, reconciles, recommends, stores, and learns
  • This requires vertical integration – an execution layer, a knowledge graph, and an ingestion layer working together

Frequently asked questions

What is a federated context layer?

A federated context layer aggregates, reconciles, and enriches institutional context from wherever it already lives – dbt definitions, Cube models, Confluence pages, warehouse metadata, and user conversations. It is broader than a semantic layer (covering business rules, quality signals, and institutional knowledge, not just metrics) and federated (it reads from existing tools rather than replacing them).

How is a federated context layer different from a semantic layer?

A semantic layer defines metric formulas and business terms. A federated context layer includes metric definitions but also business rules, quality signals, institutional knowledge, and operational governance rules – the full context an AI agent needs to give accurate answers. It also federates across multiple tools rather than requiring all definitions in one place.

Does a federated context layer replace dbt or Cube?

No. It reads from them. Your dbt or Cube definitions continue to serve your dashboards. The federated context layer imports those definitions, enriches them with context those tools can't carry (quality signals, business rules, tribal knowledge), and makes them available to every AI agent and every surface. Your existing tools become more valuable, not less.

What is the "two sources of truth" problem?

When an AI analytics platform builds its own semantic layer alongside an existing one (like dbt), the data team ends up maintaining definitions in two places. When one gets updated and the other doesn't, they silently drift apart. A federated approach avoids this by reading from existing tools rather than duplicating them.

Why does a federated context layer require vertical integration?

It requires three things working together: an execution layer that sees every query (to learn from usage), a knowledge graph that stores and relates context, and an ingestion layer that connects to external sources. No single-purpose tool – whether a semantic layer, a warehouse, or a data catalog – has all three.

What is AI data governance in this context?

AI data governance refers to the rules that control how AI agents access and present data: what they can see, how they should behave, what limits apply. In a federated context layer, these rules are part of the context itself – enforced architecturally, not through prompt instructions – so they apply consistently regardless of which agent or channel is used.

Ready to make better decisions?

Connect your data and ask questions in plain language. Get started with Ronja today.
