Use Case 01 — Health Systems

Epic EHR data management at academic medical center (AMC) scale.

Chronicles, Clarity, and Caboodle are three distinct data environments — each with different access patterns, refresh cadences, schema complexity, and analytic suitability. Most organizations treat them as one problem. Databasin was built knowing they aren't.

Chronicles (OLTP) Clarity (Reporting DB) Caboodle (Star Schema) HIPAA Medallion ELT Azure Private Install LLM-Agnostic AI
Epic Data Environment Map
OLTP — Not for analytics
Layer 1
Chronicles
Live operational database. Built on InterSystems Caché — hierarchical, not relational. Direct analytic queries degrade production performance.
Primary analytics source
Layer 2
Clarity
Relational reporting DB — nightly refresh from Chronicles. Core for research, finance, and operations analytics. Large, normalized, Epic-specific schema.
Not uniformly deployed
Layer 3
Caboodle / Cogito
Epic's curated star-schema warehouse layer. Not consistently implemented across sites — coverage varies significantly.
Narrow scope only
Layer 4
FHIR / App Orchard APIs
Covers standard resources only. Poor fit for bulk cohort or RCM analytics. Not a replacement for Clarity access.
The Technical Reality

What actually makes Epic data hard to use.

Not a summary — a specific breakdown of the six failure modes that show up in every AMC data engineering engagement.

01
Clarity schemas are large, normalized, and Epic-specific

Dozens of tables required to answer a single clinical question. The schema reflects Epic's internal logic and configuration, not a generic healthcare model. Generic SQL skills are necessary but not sufficient — deep institutional knowledge of workflows and build is equally critical.
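To make the join burden concrete, here is a minimal sketch using sqlite3 with drastically simplified stand-ins for a few real Clarity tables (PAT_ENC, PATIENT, CLARITY_DEP). Real Clarity versions of these tables carry hundreds of columns, and answering a real clinical question typically joins a dozen or more of them.

```python
import sqlite3

# Simplified stand-ins for Clarity tables. In production, PAT_ENC alone
# has hundreds of columns, and provider, diagnosis, and order detail
# each pull in several more joined tables.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE PATIENT     (PAT_ID TEXT PRIMARY KEY, PAT_NAME TEXT);
CREATE TABLE CLARITY_DEP (DEPARTMENT_ID INT PRIMARY KEY, DEPARTMENT_NAME TEXT);
CREATE TABLE PAT_ENC     (PAT_ENC_CSN_ID INT PRIMARY KEY, PAT_ID TEXT,
                          DEPARTMENT_ID INT, CONTACT_DATE TEXT);
INSERT INTO PATIENT VALUES ('Z1', 'DOE,JANE');
INSERT INTO CLARITY_DEP VALUES (10, 'EMERGENCY');
INSERT INTO PAT_ENC VALUES (1001, 'Z1', 10, '2025-01-15');
""")

# Even "list encounters with patient and department" needs three tables.
rows = con.execute("""
    SELECT pat.PAT_NAME, dep.DEPARTMENT_NAME, enc.CONTACT_DATE
    FROM PAT_ENC enc
    JOIN PATIENT pat     ON pat.PAT_ID = enc.PAT_ID
    JOIN CLARITY_DEP dep ON dep.DEPARTMENT_ID = enc.DEPARTMENT_ID
""").fetchall()
print(rows)  # [('DOE,JANE', 'EMERGENCY', '2025-01-15')]
```

Knowing which of those joins to write, and how the local build populates each column, is the institutional knowledge the paragraph above refers to.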

02
Every implementation has a unique local build

Different combinations of Foundation content, best-practice templates, and local customization across flowsheets, SmartForms, and order sets. SQL that works at one site requires substantial adaptation before it works at another.

03
Clarity is batch — nightly refresh, not real-time

New events are delayed until the next refresh window. Teams must explicitly communicate which dashboards show "today" vs. "yesterday" — and that distinction is rarely clear.

04
Brittle ETL and CSV-dump pipelines

Scheduled ETL and CSV exports are the pragmatic workaround for constrained API access. But small changes in report logic or Clarity upgrades silently break downstream analytics. Key measures — readmission, LOS, RVUs, denials — scatter across multiple fragile pipelines.
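A minimal sketch of the failure mode, with hypothetical column names: a Clarity upgrade renames a column in a CSV export, and without an explicit header check the downstream readmission pipeline either crashes or, worse, silently computes on missing data.

```python
import csv
import io

# Hypothetical extract contract for a readmission feed.
EXPECTED_COLUMNS = {"encounter_id", "disch_date", "readmit_flag"}

def load_readmission_extract(csv_text: str) -> list[dict]:
    """Refuse to load a CSV export whose header has drifted.

    Without this guard, a renamed or dropped column after an Epic
    upgrade flows silently into downstream readmission dashboards.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = EXPECTED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"schema drift: missing columns {sorted(missing)}")
    return list(reader)

good = "encounter_id,disch_date,readmit_flag\n1001,2025-01-15,1\n"
rows = load_readmission_extract(good)            # loads cleanly

drifted = "encounter_id,discharge_dt,readmit_flag\n1001,2025-01-15,1\n"
try:
    load_readmission_extract(drifted)            # renamed column is caught
except ValueError as e:
    print(e)  # schema drift: missing columns ['disch_date']
```

Most ad-hoc CSV pipelines skip even this check, which is why breakage surfaces as wrong dashboard numbers rather than a failed job.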

05
FHIR APIs don't cover the full custom data model

Epic's App Orchard covers standard FHIR resources but rarely exposes an institution's full customized data model. For bulk research cohorts or RCM analytics, FHIR is a complement to Clarity access — not a replacement for it.

06
Epic-literate analysts are the bottleneck

Front-line staff and researchers are told "talk to your Epic reporting team" — but those teams are oversubscribed. New extracts and analytic requests take months. New projects begin with data archaeology: exploratory queries, trial and error, consultations with analysts who are already at capacity.

How Databasin Solves It

The architecture, layer by layer.

From Epic's source environments through a governed medallion lakehouse to AI-powered querying — with every design decision explained.

Step 1 — Understand the source environments
OLTP — Avoid for analytics
Chronicles
Caché-based operational store
Epic's live operational database. Not relational — built on InterSystems Caché, a proprietary hierarchical database. Querying it directly for analytics degrades production performance and is generally prohibited by Epic and institutional policy.
Source of truth for live clinical operations
Feeds Clarity via nightly ETL process
Not suitable for direct analytic access
Primary analytics source
Clarity
Relational reporting database
The main relational reporting database, refreshed nightly from Chronicles. Core for research, revenue cycle, and operational analytics. Large, highly normalized schema — dozens of joined tables for a single clinical question.
Nightly refresh — T-1 data for most workloads
Highly normalized — many joins per query
Institution-specific schema — local build knowledge required
Primary Databasin extraction target
Star schema — varies by site
Caboodle
Epic's curated warehouse layer
Epic's data warehouse and subject-area marts — star schema format, more analytics-friendly than Clarity. Not uniformly implemented across AMCs. Coverage and fidelity vary significantly depending on institutional investment in Cogito.
Star schema — more query-friendly
Not consistent across implementations
Databasin supports when present
Step 2 — Apply medallion architecture (bronze for raw data, silver for validated and transformed data, gold for governed analytics)
Bronze
What lands here: Raw Clarity extracts, Caboodle tables, HL7 feeds, CSV dumps from Reporting Workbench — ingested without transformation, exactly as received from Epic.
What happens here: Schema capture and version tracking. Every extract is timestamped, schema-versioned, and stored immutably. When Clarity upgrades change a column or table structure, the change is logged — not silently propagated downstream.
Who uses it: Data engineers auditing extraction fidelity, lineage tracing, reprocessing from source when rules change.

Silver
What lands here: Clarity encounters, diagnoses, procedures, orders, charges, payments, flowsheets — validated, standardized, and conformance-checked.
What happens here: Validation, standardization, and Epic-specific business rule application. ICD-10 code normalization, encounter status filtering, charge/payment reconciliation, effective-date handling, and institution-specific mapping logic applied here — before data reaches analysts.
Who uses it: Data engineers building curated marts, Epic analysts validating definitions, compliance and audit teams.

Gold
What lands here: Research cohort tables, operational dashboards, RCM analytics marts, quality measure views, population health summaries.
What happens here: Business-ready, governed, trusted. Conformed dimensions (patient, provider, encounter, facility), curated subject-area marts with documented metric definitions. One definition of readmission, one definition of LOS, one definition of denial — enforced by the platform, not by individual analysts.
Who uses it: Researchers, clinical operations, finance teams, administrators — via direct SQL, BI tools, or Databasin's AI query layer.
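The bronze, silver, and gold responsibilities can be sketched end to end on a toy record. This is an illustrative simplification, not Databasin's implementation: the function and field names are invented, and real silver rules are far more extensive.

```python
from datetime import date

def to_bronze(raw_rows, extracted_on):
    # Bronze: store exactly as received, tagged with extract date
    # and the schema observed at capture time.
    return {"extracted_on": extracted_on,
            "schema": sorted(raw_rows[0]) if raw_rows else [],
            "rows": raw_rows}

def to_silver(bronze):
    # Silver: validate and standardize. Here: drop encounters with no
    # discharge date, normalize ICD-10 codes to uppercase without dots.
    out = []
    for r in bronze["rows"]:
        if not r.get("disch_date"):
            continue
        out.append({**r, "dx_code": r["dx_code"].upper().replace(".", "")})
    return out

def to_gold_los(silver):
    # Gold: one governed LOS metric, computed exactly one way.
    return {r["encounter_id"]:
            (date.fromisoformat(r["disch_date"]) -
             date.fromisoformat(r["adm_date"])).days
            for r in silver}

raw = [
    {"encounter_id": "1001", "adm_date": "2025-01-10",
     "disch_date": "2025-01-15", "dx_code": "a41.9"},   # sepsis
    {"encounter_id": "1002", "adm_date": "2025-01-12",
     "disch_date": "", "dx_code": "I10"},               # still admitted
]
bronze = to_bronze(raw, "2025-01-16")
silver = to_silver(bronze)
gold = to_gold_los(silver)
print(gold)  # {'1001': 5}
```

The key property is directional: raw data is never edited in place, and every downstream layer can be rebuilt from bronze when a rule changes.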
End-to-end data flow: Chronicles (OLTP) → Clarity (nightly refresh) → Bronze (raw + versioned) → Silver (validated + mapped) → Gold (governed + trusted) → AI Query Layer (natural language → answers). Caboodle (star schema) and Workday / REDCap / other sources (200+ connectors) feed the same Bronze layer.
Architectural Decisions

Why we built it this way — and why the alternatives fail.

Problem → Direct Clarity querying
We anchor on Clarity as the primary extraction source — not FHIR APIs
FHIR via App Orchard covers standard resources but rarely exposes an institution's full custom data model. For bulk cohort extraction, RCM analytics, or cross-domain research, FHIR is insufficient — teams that try it typically end up back at Clarity within months.
Databasin's Epic connector extracts from Clarity directly — with schema awareness of the full table structure, local customization, and Epic's internal data model. FHIR is supported as a supplemental stream, not the primary path.
Problem → Schema volatility on Epic upgrades
Schema changes in Clarity propagate to bronze — not downstream
Epic version upgrades and local build changes silently alter Clarity table structures. Organizations running ETL jobs directly against Clarity discover broken pipelines only when a dashboard shows wrong numbers — days or weeks after the fact.
Immutable bronze storage with schema versioning means every extraction is logged with its schema at time of capture. When Clarity changes, the change is detected, logged, and surfaced — before it propagates to the silver layer or reaches an analyst.
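A minimal sketch of the schema-versioning idea, with hypothetical function names: fingerprint each extract's column set, compare against the last recorded version, and flag drift before any silver transformation runs.

```python
import hashlib
import json

def schema_fingerprint(columns: dict) -> str:
    """Stable hash of a table's column names and types."""
    canon = json.dumps(sorted(columns.items()))
    return hashlib.sha256(canon.encode()).hexdigest()[:12]

schema_log = []  # illustrative in-memory version log for one table

def record_extract(table: str, columns: dict) -> bool:
    """Log this extract's schema; return True if it drifted."""
    fp = schema_fingerprint(columns)
    changed = bool(schema_log) and schema_log[-1]["fp"] != fp
    schema_log.append({"table": table, "fp": fp, "changed": changed})
    return changed

v1 = {"PAT_ENC_CSN_ID": "INT", "CONTACT_DATE": "DATE"}
v2 = {"PAT_ENC_CSN_ID": "INT", "CONTACT_DATE": "DATETIME"}  # upgrade changed a type

record_extract("PAT_ENC", v1)           # first capture: baseline, no drift
drift = record_extract("PAT_ENC", v2)   # drift detected at bronze, not in a dashboard
print(drift)  # True
```

Because bronze is immutable, the drifted extract is still stored — the point is that the change is surfaced to engineers instead of silently rewriting silver outputs.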
Problem → Inconsistent metric definitions
Key clinical and financial measures defined once at the silver layer — enforced everywhere
Readmission. Length of stay. RVUs. Denials. Every AMC has multiple definitions of each floating across individual reports, dashboards, and pipelines. When the CFO's number doesn't match the CMO's number, someone built the calculation twice in different places.
Silver-layer business rules encode institutional definitions centrally. One readmission calculation, one LOS formula, one denial status taxonomy — applied at transformation time, before data reaches any downstream consumer. Arguments about definitions stop at the platform boundary.
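The "define once, enforce everywhere" principle reduces to something very small in code. A hedged sketch, assuming a 30-day institutional readmission window (the constant and function names are illustrative):

```python
from datetime import date

READMIT_WINDOW_DAYS = 30  # institutional definition, set exactly once

def is_readmission(discharge: date, next_admit: date) -> bool:
    """The one governed readmission rule, applied at the silver layer
    so every downstream report inherits the same answer."""
    return 0 < (next_admit - discharge).days <= READMIT_WINDOW_DAYS

print(is_readmission(date(2025, 1, 15), date(2025, 2, 1)))   # day 17 -> True
print(is_readmission(date(2025, 1, 15), date(2025, 3, 1)))   # day 45 -> False
```

When the CFO's dashboard and the CMO's dashboard both read a flag computed by this one function, their numbers cannot disagree by construction.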
Problem → Epic-literate analyst bottleneck
AI query layer on governed gold data eliminates routine Clarity dependency
Researchers and operational leaders waiting months for Epic reporting team capacity is a structural problem, not a staffing problem. Adding analysts doesn't scale — the Clarity knowledge required to write correct queries is concentrated in a small team and takes years to develop.
Natural language querying against the gold layer means a researcher can ask "show me all sepsis encounters in the last 12 months by unit and disposition" and get a governed answer — without submitting a ticket, waiting for an analyst, or knowing a single Clarity table name.
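Conceptually, the AI layer plans a query over governed gold views rather than raw tables. A minimal sketch with a rule-based stub standing in for the LLM step — all names here (the catalog, the view, the planner) are invented for illustration:

```python
# Catalog of governed gold views the planner is allowed to target.
GOLD_CATALOG = {
    "sepsis_encounters": {"columns": ["unit", "disposition", "encounter_month"]},
}

def plan_query(question: str) -> dict:
    """Stand-in for the LLM step: map a question to a governed view
    and group-by columns. It can only emit views from the catalog."""
    q = question.lower()
    if "sepsis" in q:
        return {"view": "sepsis_encounters",
                "group_by": [c for c in ("unit", "disposition") if c in q]}
    raise ValueError("no governed view matches the question")

plan = plan_query("show me all sepsis encounters in the last 12 months "
                  "by unit and disposition")
print(plan)  # {'view': 'sepsis_encounters', 'group_by': ['unit', 'disposition']}
```

The researcher never needs a Clarity table name; the planner can only reach definitions the platform already governs.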
Deployment for Health Systems

How Databasin deploys in an AMC environment.

For existing Databricks or Fabric environments
BYO environment — layer on your existing platform
Already running Databricks or Microsoft Fabric in your Azure tenant? Connect Databasin's Epic connector, medallion pipeline automation, and AI query layer directly to your existing environment — without migrating your existing lakehouse infrastructure.
Epic connector and 200+ additional connectors on your Databricks or Fabric environment
Medallion pipeline automation provisioned in your existing Unity Catalog or Fabric workspace
AI query layer pointed at your existing governed tables
No data migration — Databasin works with what you've already built
A note on HIPAA and AI: Databasin's LLM-agnostic AI layer is designed for deployment behind your security boundary in all configurations. The AI model — whether GPT-5 via Azure OpenAI, Claude, or your own internally hosted model — queries your governed gold layer through a semantic translation layer. Patient data is never transmitted to an external AI service. Your PHI stays in your environment.
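The PHI boundary described above can be illustrated in a few lines: the prompt sent to the model is built from schema metadata only, never from rows. This is a hedged sketch of the pattern, not Databasin's actual semantic layer; the structures and names are hypothetical.

```python
def build_llm_prompt(question: str, catalog: dict) -> str:
    """Build a model prompt from governed view metadata only.
    No row-level data ever enters this string."""
    schema_lines = [f"- {view}({', '.join(meta['columns'])})"
                    for view, meta in catalog.items()]
    return ("You translate questions into queries over these governed views:\n"
            + "\n".join(schema_lines)
            + f"\nQuestion: {question}")

catalog = {"gold_encounters": {"columns": ["unit", "disposition", "month"]}}
prompt = build_llm_prompt("encounters by unit last month", catalog)
print(prompt)
```

The model returns a query plan; execution against gold tables happens inside the security boundary, and only aggregate results flow back to the user.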
HIMSS '25
Featured by Microsoft and Databricks — three consecutive years
WashU
Co-created at Washington University School of Medicine — built in production, not a pilot
200+
Connectors including all major Epic layers, Azure, Databricks, and AI APIs
Day 1
Time to governed Epic data in your lakehouse — not a six-month implementation
See It in Your Environment

Talk to an architect.
Not a sales rep.

Technical demos are led by Chris Lundeberg, Co-Founder & CPO. We'll walk through the Epic connector, the medallion pipeline, and the deployment architecture specific to your environment — Chronicles version, Clarity schema, and all.