
The Data Context Layer: Context Engineering for AI-Native Platforms

Context engineering is the hottest term in AI right now. And the entire conversation is stuck on prompting, chatbots, and coding assistants.

“Context engineering is the delicate art and science of filling the context window with just the right information for the next step.” — Andrej Karpathy

Karpathy’s talking about prompts. But the context window is only as good as what feeds it. For AI-native platforms, that means your data infrastructure, and nobody’s engineering it with AI in mind. That’s the “data context layer”.

Your Models Are Smart. Your Data Stack Isn’t Ready.

Today’s foundation LLMs can parse your code, read your configs, and explain your logic with some clever prompts. They’re impressively capable, and that’s entirely beside the point. The models aren’t the bottleneck. The data context feeding them is.

Data engineering is more nuanced than code and queries. Effective AI needs awareness of lineage, semantics, grain, dimensions, and downstream impact. It needs to understand not only how data is structured, but what it means. A customer ID that’s a string of numbers in one system and a UUID in another isn’t a schema difference. It’s a context gap that turns inference into guesswork.
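To make that gap concrete, here’s a toy Python sketch. The records, IDs, and crosswalk are hypothetical, but the failure mode is real: without an explicit mapping, the same customer looks like two different people.

```python
# Hypothetical records: the same customer, identified differently per system.
billing = {"customer_id": "0004721"}                           # zero-padded numeric string
crm = {"customer_id": "7d9f2c1e-5b3a-4e8f-9c2d-1a6b7e4f0d21"}  # UUID

# Without shared context, a naive join treats these as different customers.
naive_match = billing["customer_id"] == crm["customer_id"]     # False

# A data context layer carries the mapping explicitly, so the link is a
# recorded fact rather than an inference.
id_crosswalk = {"0004721": "7d9f2c1e-5b3a-4e8f-9c2d-1a6b7e4f0d21"}
resolved_match = id_crosswalk.get(billing["customer_id"]) == crm["customer_id"]  # True

print(naive_match, resolved_match)  # False True
```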

This is why we’ve built our data platform around shared data context from day one. Our data pipelines are designed so that every stage understands what upstream systems produce and what downstream consumers expect. When a document enters our system, the pipeline tracks its state, processing history, extraction, and dependencies across ingestion, transformation, and scoring layers. That shared understanding enables models to understand claim state rather than guess. Without it, you get context gaps where systems operate in isolation and nobody, human or AI agent, can trace the impact of a change.
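Here’s a minimal sketch of what that shared pipeline context might look like. The class and field names are illustrative, not our production schema:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class DocumentContext:
    """Illustrative per-document context carried across pipeline stages."""
    doc_id: str
    state: str                                     # e.g. "ingested", "extracted", "scored"
    processing_history: list = field(default_factory=list)
    upstream_sources: list = field(default_factory=list)
    downstream_consumers: list = field(default_factory=list)

    def advance(self, new_state: str) -> None:
        # Record every transition so any stage, human, or agent can replay it.
        stamp = datetime.now(timezone.utc).isoformat()
        self.processing_history.append(f"{stamp}: {self.state} -> {new_state}")
        self.state = new_state

doc = DocumentContext(
    doc_id="doc-001",
    state="ingested",
    upstream_sources=["intake-api"],
    downstream_consumers=["extraction", "scoring"],
)
doc.advance("extracted")
doc.advance("scored")
print(doc.state, doc.processing_history)
```

The point isn’t the dataclass; it’s that state, history, and dependencies travel with the document instead of living in someone’s head.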

The Modern Data Stack Was Built for Humans

Everything in the modern data stack was designed for human engineers and analysts. Staging tables, canonical data layers, BI dashboards. All layers optimized for how humans think.

AI doesn’t work that way. It needs information structured for programmatic access, delivered at exactly the right moment. Not too much context. Not too little. Just enough, just in time.

This is what I mean by the data context layer: a context graph spanning your entire data system, from source to reports. It encodes lineage, semantics, runtime signals, and business meaning into a form agents can traverse and reason over.

In our world, that means models need to know how a claim entity connects to claimants, injuries, treatments, providers, and legal filings through defined relationships. We’ve invested heavily in building operational visibility across these stages precisely because, without it, neither humans nor agents can reason about the system.
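As a toy illustration, with entity and relationship names invented for this post, a context graph turns “what is attached to this claim?” into a traversal rather than a guess:

```python
# Invented entities and relationships; real schemas are far richer.
graph = {
    ("claim:123", "has_claimant"): ["claimant:7"],
    ("claim:123", "involves_injury"): ["injury:44"],
    ("claim:123", "has_filing"): ["filing:2"],
    ("injury:44", "treated_by"): ["provider:9"],
}

def neighbors(node: str):
    """Yield (relationship, target) edges an agent can follow from a node."""
    for (src, rel), targets in graph.items():
        if src == node:
            for target in targets:
                yield rel, target

# Every hop is a defined relationship, so context delivery is programmatic.
for rel, target in neighbors("claim:123"):
    print(f"claim:123 --{rel}--> {target}")
```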

Architecture Patterns That Actually Work

Here are the architecture patterns we’ve found that actually work in production.

Deterministic first, LLM where needed.

Not everything needs an LLM. SQL and rule-based engines handle grain, joins, and aggregations reliably. The LLM earns its keep on the 10% that’s genuinely unstructured and context-dependent: summarizing and extracting meaning from claim documents and free-text notes. We run deterministic pipelines for structured data processing and reserve LLMs for intelligent document extraction and data intelligence. Many of our data operations never see an LLM, but connecting predictions and explainability to document evidence is something deterministic pipelines can’t do.
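A rough sketch of the routing idea, with hypothetical task kinds and stubs standing in for the real pipeline components:

```python
# Stubs standing in for real pipeline components.
def run_sql_pipeline(task): return f"sql:{task['kind']}"
def run_rules_engine(task): return f"rules:{task['kind']}"
def call_llm(task): return f"llm:{task['kind']}"

def process(task: dict) -> str:
    """Route deterministically by default; reach for the LLM only when needed."""
    kind = task["kind"]
    if kind in {"aggregate", "join", "grain_check"}:
        return run_sql_pipeline(task)      # reliable, repeatable, cheap
    if kind == "rule_validation":
        return run_rules_engine(task)      # business rules stay deterministic
    if kind in {"summarize_notes", "extract_from_document"}:
        return call_llm(task)              # the genuinely unstructured residue
    raise ValueError(f"unknown task kind: {kind}")

print(process({"kind": "join"}))                   # -> sql:join
print(process({"kind": "extract_from_document"}))  # -> llm:extract_from_document
```

The design choice worth copying is the default: the LLM is the fallback for the unstructured residue, not the front door.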

Structure domain knowledge. 

This is our foundation. We have encoded our domain knowledge in a structured, queryable, and reusable form. Our foundation for this is a multi-layered lakehouse and MDM architecture that encodes entity relationships through dimensional models, covering claims, claimants, injuries, treatments, providers, and legal filings, with semantic search over unstructured documents on top. That combination of structured and unstructured data foundations is what we’re building on as we move toward a domain knowledge graph that makes these relationships explicitly traversable. In a domain like insurance, where entity networks are dense and context-dependent, this is the only way to scale context delivery.
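To sketch the shape of that context delivery (every function here is a hypothetical stand-in, not our platform API), a request pairs structured relationships from the dimensional model with semantic search scoped to the same entity:

```python
# Hypothetical stand-ins for the dimensional model and vector search.
def lookup_dim(dimension: str, claim_id: str) -> list:
    return [f"{dimension}-linked-to-{claim_id}"]     # stub dimensional query

def semantic_search(query: str, claim_id: str, top_k: int = 5) -> list:
    return [f"snippet about '{query}'"][:top_k]      # stub document search

def build_context(claim_id: str, question: str) -> dict:
    """Pair structured relationships with scoped unstructured evidence."""
    return {
        # Structured side: defined relationships from the dimensional model.
        "entities": {
            dim: lookup_dim(dim, claim_id)
            for dim in ("claimant", "injury", "treatment", "provider", "legal_filing")
        },
        # Unstructured side: semantic search restricted to this claim's documents.
        "evidence": semantic_search(question, claim_id),
    }

print(build_context("claim:123", "What treatment was authorized?"))
```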

Evals. Evals. Evals.

You can’t build effective AI if you can’t evaluate its accuracy. And you can’t evaluate accuracy without data context: knowing exactly what data fed the model, which pipeline version produced it, and what changed. We’ve built standardized evaluation frameworks for our predictive models, including classification and regression evaluators, shadow testing between pipeline versions, and bias detection across our product lines. Establish baselines, test for regressions repeatably, and treat evaluation as infrastructure, not an afterthought.
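As a minimal illustration, not our actual framework, an eval run that records its data context alongside the metric might look like this:

```python
# Toy regression eval: the result carries the pipeline version that produced
# the inputs, so a score is never separated from its data context.

def accuracy(preds: list, labels: list) -> float:
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

def eval_run(model_fn, dataset, pipeline_version: str, baseline: float) -> dict:
    preds = [model_fn(x) for x, _ in dataset]
    labels = [y for _, y in dataset]
    score = accuracy(preds, labels)
    return {
        "pipeline_version": pipeline_version,  # what produced the inputs
        "n_examples": len(dataset),
        "accuracy": score,
        "regression": score < baseline,        # fail loudly against the baseline
    }

# Stub model and labeled examples for demonstration only.
dataset = [({"severity": 3}, 1), ({"severity": 1}, 0)]
model = lambda x: int(x["severity"] >= 2)
print(eval_run(model, dataset, pipeline_version="v2.3.1", baseline=0.9))
```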

Guardrails are data context.

In AI-native systems, engineering is fundamentally about guardrails because the system is non-deterministic. Most guardrail conversations focus on prompt filtering and output safety. In production, guardrails are domain constraints: valid score ranges, required entity relationships, and business rules that define what a correct output looks like. Without a data context layer encoding those rules, guardrails are just string matching. This is the part most teams miss: your data engineers are already building guardrails. Every validation rule, every entity relationship, every business constraint in your data model is a guardrail.
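A toy sketch of that idea, with invented field names and ranges: guardrails become declarative domain constraints that an output either satisfies or violates.

```python
# Guardrails as domain constraints rather than string matching.
# Field names and valid ranges are illustrative only.
RULES = [
    ("score in valid range", lambda o: 0.0 <= o["risk_score"] <= 1.0),
    ("claim has a claimant", lambda o: len(o["claimant_ids"]) >= 1),
    ("closed claims need a close date",
     lambda o: o["status"] != "closed" or o.get("closed_at") is not None),
]

def check_guardrails(output: dict) -> list:
    """Return the names of violated constraints; empty means the output passes."""
    return [name for name, rule in RULES if not rule(output)]

violations = check_guardrails({
    "risk_score": 1.7,            # out of range -> caught deterministically
    "claimant_ids": ["clm-1"],
    "status": "open",
})
print(violations)  # ['score in valid range']
```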

Context Starts in the Data

Enterprises are data-rich and context-poor. We’ve spent years building the infrastructure that gives our models trusted, domain-specific context. The organizations that win in 2026 won’t be the ones with the best models alone. They’ll be the ones with the data context layer underneath them. We’ve been building that way from day one.

Doug Lawrence

Douglas Lawrence is a CTO building AI-native platforms. This blog is part of CLARA Analytics’s series on engineering leadership, AI architecture, and the realities of shipping production AI systems at scale.
