Use Case 01 — Health Systems

Epic EHR data management at academic medical center (AMC) scale.

Chronicles, Clarity, and Caboodle are three distinct data environments — each with different access patterns, refresh cadences, schema complexity, and analytic suitability. Most organizations treat them as one problem. Databasin was built knowing they aren't.

Chronicles (OLTP) Clarity (Reporting DB) Caboodle (Star Schema) HIPAA Medallion ELT Azure Private Install LLM-Agnostic AI
Epic Data Environment Map
OLTP — Not for analytics
Layer 1
Chronicles
Live operational database. Built on InterSystems Caché — hierarchical, not relational. Direct analytic queries degrade production performance.
Primary analytics source
Layer 2
Clarity
Relational reporting DB — nightly refresh from Chronicles. Core for research, finance, and operations analytics. Large, normalized, Epic-specific schema.
Not uniformly deployed
Layer 3
Caboodle / Cogito
Epic's curated star-schema warehouse layer. Not consistently implemented across sites — coverage varies significantly.
Narrow scope only
Layer 4
FHIR / App Orchard APIs
Covers standard resources only. Poor fit for bulk cohort or RCM analytics. Not a replacement for Clarity access.
The Technical Reality

What actually makes Epic data hard to use.

Not a summary — a specific breakdown of the six failure modes that show up in every AMC data engineering engagement.

01
Clarity schemas are large, normalized, and Epic-specific

Dozens of tables required to answer a single clinical question. The schema reflects Epic's internal logic and configuration, not a generic healthcare model. Generic SQL skills are necessary but not sufficient — deep institutional knowledge of workflows and build is equally critical.
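To make the join burden concrete, here is a minimal sketch using sqlite3 with drastically simplified stand-ins for a few real Clarity tables (PAT_ENC, PATIENT, CLARITY_DEP). Real Clarity versions of these tables carry hundreds of columns, and answering a real clinical question typically joins a dozen or more of them.

```python
import sqlite3

# Simplified stand-ins for Clarity tables. In production, PAT_ENC alone
# has hundreds of columns, and provider, diagnosis, and order detail
# each pull in several more joined tables.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE PATIENT     (PAT_ID TEXT PRIMARY KEY, PAT_NAME TEXT);
CREATE TABLE CLARITY_DEP (DEPARTMENT_ID INT PRIMARY KEY, DEPARTMENT_NAME TEXT);
CREATE TABLE PAT_ENC     (PAT_ENC_CSN_ID INT PRIMARY KEY, PAT_ID TEXT,
                          DEPARTMENT_ID INT, CONTACT_DATE TEXT);
INSERT INTO PATIENT VALUES ('Z1', 'DOE,JANE');
INSERT INTO CLARITY_DEP VALUES (10, 'EMERGENCY');
INSERT INTO PAT_ENC VALUES (1001, 'Z1', 10, '2025-01-15');
""")

# Even "list encounters with patient and department" needs three tables.
rows = con.execute("""
    SELECT pat.PAT_NAME, dep.DEPARTMENT_NAME, enc.CONTACT_DATE
    FROM PAT_ENC enc
    JOIN PATIENT pat     ON pat.PAT_ID = enc.PAT_ID
    JOIN CLARITY_DEP dep ON dep.DEPARTMENT_ID = enc.DEPARTMENT_ID
""").fetchall()
print(rows)  # [('DOE,JANE', 'EMERGENCY', '2025-01-15')]
```

Knowing which of those joins to write, and how the local build populates each column, is the institutional knowledge the paragraph above refers to.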

02
Every implementation has a unique local build

Different combinations of Foundation content, best-practice templates, and local customization across flowsheets, SmartForms, and order sets. SQL that works at one site requires substantial adaptation before it works at another.

03
Clarity is batch — nightly refresh, not real-time

New events are delayed until the next refresh window. Teams must explicitly communicate which dashboards show "today" vs. "yesterday" — and that distinction is rarely clear.

04
Brittle ETL and CSV-dump pipelines

Scheduled ETL and CSV exports are the pragmatic workaround for constrained API access. But small changes in report logic or Clarity upgrades silently break downstream analytics. Key measures — readmission, LOS, RVUs, denials — scatter across multiple fragile pipelines.
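A minimal sketch of the failure mode, with hypothetical column names: a Clarity upgrade renames a column in a CSV export, and without an explicit header check the downstream readmission pipeline either crashes or, worse, silently computes on missing data.

```python
import csv
import io

# Hypothetical extract contract for a readmission feed.
EXPECTED_COLUMNS = {"encounter_id", "disch_date", "readmit_flag"}

def load_readmission_extract(csv_text: str) -> list[dict]:
    """Refuse to load a CSV export whose header has drifted.

    Without this guard, a renamed or dropped column after an Epic
    upgrade flows silently into downstream readmission dashboards.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = EXPECTED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"schema drift: missing columns {sorted(missing)}")
    return list(reader)

good = "encounter_id,disch_date,readmit_flag\n1001,2025-01-15,1\n"
rows = load_readmission_extract(good)            # loads cleanly

drifted = "encounter_id,discharge_dt,readmit_flag\n1001,2025-01-15,1\n"
try:
    load_readmission_extract(drifted)            # renamed column is caught
except ValueError as e:
    print(e)  # schema drift: missing columns ['disch_date']
```

Most ad-hoc CSV pipelines skip even this check, which is why breakage surfaces as wrong dashboard numbers rather than a failed job.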

05
FHIR APIs don't cover the full custom data model

Epic's App Orchard covers standard FHIR resources but rarely exposes an institution's full customized data model. For bulk research cohorts or RCM analytics, FHIR is a complement to Clarity access — not a replacement for it.

06
Epic-literate analysts are the bottleneck

Front-line staff and researchers are told "talk to your Epic reporting team" — but those teams are oversubscribed. New extracts and analytic requests take months. New projects begin with data archaeology: exploratory queries, trial and error, consultations with analysts who are already at capacity.

How Databasin Solves It

The architecture, layer by layer.

From Epic's source environments through a governed medallion lakehouse to AI-powered querying — with every design decision explained.

Step 1 — Understand the source environments
OLTP — Avoid for analytics
Chronicles
Caché-based operational store
Epic's live operational database. Not relational — built on InterSystems Caché, a proprietary hierarchical database. Querying it directly for analytics degrades production performance and is generally prohibited by Epic and institutional policy.
Source of truth for live clinical operations
Feeds Clarity via nightly ETL process
Not suitable for direct analytic access
Primary analytics source
Clarity
Relational reporting database
The main relational reporting database, refreshed nightly from Chronicles. Core for research, revenue cycle, and operational analytics. Large, highly normalized schema — dozens of joined tables for a single clinical question.
Nightly refresh — T-1 data for most workloads
Highly normalized — many joins per query
Institution-specific schema — local build knowledge required
Primary Databasin extraction target
Star schema — varies by site
Caboodle
Epic's curated warehouse layer
Epic's data warehouse and subject-area marts — star schema format, more analytics-friendly than Clarity. Not uniformly implemented across AMCs. Coverage and fidelity vary significantly depending on institutional investment in Cogito.
Star schema — more query-friendly
Not consistent across implementations
Databasin supports when present
Step 2 — Apply medallion architecture (bronze for raw data, silver for validated and transformed data, gold for governed analytics)
Bronze
What lands here: Raw Clarity extracts, Caboodle tables, HL7 feeds, CSV dumps from Reporting Workbench — ingested without transformation, exactly as received from Epic.
What happens here: Schema capture and version tracking. Every extract is timestamped, schema-versioned, and stored immutably. When Clarity upgrades change a column or table structure, the change is logged — not silently propagated downstream.
Who uses it: Data engineers auditing extraction fidelity, lineage tracing, reprocessing from source when rules change.

Silver
What lands here: Clarity encounters, diagnoses, procedures, orders, charges, payments, flowsheets — validated, standardized, and conformance-checked.
What happens here: Validation, standardization, and Epic-specific business rule application. ICD-10 code normalization, encounter status filtering, charge/payment reconciliation, effective-date handling, and institution-specific mapping logic applied here — before data reaches analysts.
Who uses it: Data engineers building curated marts, Epic analysts validating definitions, compliance and audit teams.

Gold
What lands here: Research cohort tables, operational dashboards, RCM analytics marts, quality measure views, population health summaries.
What happens here: Business-ready, governed, trusted. Conformed dimensions (patient, provider, encounter, facility), curated subject-area marts with documented metric definitions. One definition of readmission, one definition of LOS, one definition of denial — enforced by the platform, not by individual analysts.
Who uses it: Researchers, clinical operations, finance teams, administrators — via direct SQL, BI tools, or Databasin's AI query layer.
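The bronze, silver, and gold responsibilities can be sketched end to end on a toy record. This is an illustrative simplification, not Databasin's implementation: the function and field names are invented, and real silver rules are far more extensive.

```python
from datetime import date

def to_bronze(raw_rows, extracted_on):
    # Bronze: store exactly as received, tagged with extract date
    # and the schema observed at capture time.
    return {"extracted_on": extracted_on,
            "schema": sorted(raw_rows[0]) if raw_rows else [],
            "rows": raw_rows}

def to_silver(bronze):
    # Silver: validate and standardize. Here: drop encounters with no
    # discharge date, normalize ICD-10 codes to uppercase without dots.
    out = []
    for r in bronze["rows"]:
        if not r.get("disch_date"):
            continue
        out.append({**r, "dx_code": r["dx_code"].upper().replace(".", "")})
    return out

def to_gold_los(silver):
    # Gold: one governed LOS metric, computed exactly one way.
    return {r["encounter_id"]:
            (date.fromisoformat(r["disch_date"]) -
             date.fromisoformat(r["adm_date"])).days
            for r in silver}

raw = [
    {"encounter_id": "1001", "adm_date": "2025-01-10",
     "disch_date": "2025-01-15", "dx_code": "a41.9"},   # sepsis
    {"encounter_id": "1002", "adm_date": "2025-01-12",
     "disch_date": "", "dx_code": "I10"},               # still admitted
]
bronze = to_bronze(raw, "2025-01-16")
silver = to_silver(bronze)
gold = to_gold_los(silver)
print(gold)  # {'1001': 5}
```

The key property is directional: raw data is never edited in place, and every downstream layer can be rebuilt from bronze when a rule changes.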
End-to-end data flow: Chronicles (OLTP) → Clarity (nightly refresh) → Bronze (raw + versioned) → Silver (validated + mapped) → Gold (governed + trusted) → AI Query Layer (natural language → answers). Caboodle (star schema) and Workday / REDCap / other sources (200+ connectors) feed the same Bronze layer.
Architectural Decisions

Why we built it this way — and why the alternatives fail.

Problem → Direct Clarity querying
We anchor on Clarity as the primary extraction source — not FHIR APIs
FHIR via App Orchard covers standard resources but rarely exposes an institution's full custom data model. For bulk cohort extraction, RCM analytics, or cross-domain research, FHIR is insufficient — teams that try it typically end up back at Clarity within months.
Databasin's Epic connector extracts from Clarity directly — with schema awareness of the full table structure, local customization, and Epic's internal data model. FHIR is supported as a supplemental stream, not the primary path.
Problem → Schema volatility on Epic upgrades
Schema changes in Clarity propagate to bronze — not downstream
Epic version upgrades and local build changes silently alter Clarity table structures. Organizations running ETL jobs directly against Clarity discover broken pipelines only when a dashboard shows wrong numbers — days or weeks after the fact.
Immutable bronze storage with schema versioning means every extraction is logged with its schema at time of capture. When Clarity changes, the change is detected, logged, and surfaced — before it propagates to the silver layer or reaches an analyst.
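A minimal sketch of the schema-versioning idea, with hypothetical function names: fingerprint each extract's column set, compare against the last recorded version, and flag drift before any silver transformation runs.

```python
import hashlib
import json

def schema_fingerprint(columns: dict) -> str:
    """Stable hash of a table's column names and types."""
    canon = json.dumps(sorted(columns.items()))
    return hashlib.sha256(canon.encode()).hexdigest()[:12]

schema_log = []  # illustrative in-memory version log for one table

def record_extract(table: str, columns: dict) -> bool:
    """Log this extract's schema; return True if it drifted."""
    fp = schema_fingerprint(columns)
    changed = bool(schema_log) and schema_log[-1]["fp"] != fp
    schema_log.append({"table": table, "fp": fp, "changed": changed})
    return changed

v1 = {"PAT_ENC_CSN_ID": "INT", "CONTACT_DATE": "DATE"}
v2 = {"PAT_ENC_CSN_ID": "INT", "CONTACT_DATE": "DATETIME"}  # upgrade changed a type

record_extract("PAT_ENC", v1)           # first capture: baseline, no drift
drift = record_extract("PAT_ENC", v2)   # drift detected at bronze, not in a dashboard
print(drift)  # True
```

Because bronze is immutable, the drifted extract is still stored — the point is that the change is surfaced to engineers instead of silently rewriting silver outputs.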
Problem → Inconsistent metric definitions
Key clinical and financial measures defined once at the silver layer — enforced everywhere
Readmission. Length of stay. RVUs. Denials. Every AMC has multiple definitions of each floating across individual reports, dashboards, and pipelines. When the CFO's number doesn't match the CMO's number, someone built the calculation twice in different places.
Silver-layer business rules encode institutional definitions centrally. One readmission calculation, one LOS formula, one denial status taxonomy — applied at transformation time, before data reaches any downstream consumer. Arguments about definitions stop at the platform boundary.
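The "define once, enforce everywhere" principle reduces to something very small in code. A hedged sketch, assuming a 30-day institutional readmission window (the constant and function names are illustrative):

```python
from datetime import date

READMIT_WINDOW_DAYS = 30  # institutional definition, set exactly once

def is_readmission(discharge: date, next_admit: date) -> bool:
    """The one governed readmission rule, applied at the silver layer
    so every downstream report inherits the same answer."""
    return 0 < (next_admit - discharge).days <= READMIT_WINDOW_DAYS

print(is_readmission(date(2025, 1, 15), date(2025, 2, 1)))   # day 17 -> True
print(is_readmission(date(2025, 1, 15), date(2025, 3, 1)))   # day 45 -> False
```

When the CFO's dashboard and the CMO's dashboard both read a flag computed by this one function, their numbers cannot disagree by construction.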
Problem → Epic-literate analyst bottleneck
AI query layer on governed gold data eliminates routine Clarity dependency
Researchers and operational leaders waiting months for Epic reporting team capacity is a structural problem, not a staffing problem. Adding analysts doesn't scale — the Clarity knowledge required to write correct queries is concentrated in a small team and takes years to develop.
Natural language querying against the gold layer means a researcher can ask "show me all sepsis encounters in the last 12 months by unit and disposition" and get a governed answer — without submitting a ticket, waiting for an analyst, or knowing a single Clarity table name.
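Conceptually, the AI layer plans a query over governed gold views rather than raw tables. A minimal sketch with a rule-based stub standing in for the LLM step — all names here (the catalog, the view, the planner) are invented for illustration:

```python
# Catalog of governed gold views the planner is allowed to target.
GOLD_CATALOG = {
    "sepsis_encounters": {"columns": ["unit", "disposition", "encounter_month"]},
}

def plan_query(question: str) -> dict:
    """Stand-in for the LLM step: map a question to a governed view
    and group-by columns. It can only emit views from the catalog."""
    q = question.lower()
    if "sepsis" in q:
        return {"view": "sepsis_encounters",
                "group_by": [c for c in ("unit", "disposition") if c in q]}
    raise ValueError("no governed view matches the question")

plan = plan_query("show me all sepsis encounters in the last 12 months "
                  "by unit and disposition")
print(plan)  # {'view': 'sepsis_encounters', 'group_by': ['unit', 'disposition']}
```

The researcher never needs a Clarity table name; the planner can only reach definitions the platform already governs.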
Deployment for Health Systems

How Databasin deploys in an AMC environment.

For existing Databricks or Fabric environments
BYO environment — layer on your existing platform
Already running Databricks or Microsoft Fabric in your Azure tenant? Connect Databasin's Epic connector, medallion pipeline automation, and AI query layer directly to your existing environment — without migrating your existing lakehouse infrastructure.
Epic connector and 200+ additional connectors on your Databricks or Fabric environment
Medallion pipeline automation provisioned in your existing Unity Catalog or Fabric workspace
AI query layer pointed at your existing governed tables
No data migration — Databasin works with what you've already built
A note on HIPAA and AI: Databasin's LLM-agnostic AI layer is designed for deployment behind your security boundary in all configurations. The AI model — whether GPT-5 via Azure OpenAI, Claude, or your own internally hosted model — queries your governed gold layer through a semantic translation layer. Patient data is never transmitted to an external AI service. Your PHI stays in your environment.
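The PHI boundary described above can be illustrated in a few lines: the prompt sent to the model is built from schema metadata only, never from rows. This is a hedged sketch of the pattern, not Databasin's actual semantic layer; the structures and names are hypothetical.

```python
def build_llm_prompt(question: str, catalog: dict) -> str:
    """Build a model prompt from governed view metadata only.
    No row-level data ever enters this string."""
    schema_lines = [f"- {view}({', '.join(meta['columns'])})"
                    for view, meta in catalog.items()]
    return ("You translate questions into queries over these governed views:\n"
            + "\n".join(schema_lines)
            + f"\nQuestion: {question}")

catalog = {"gold_encounters": {"columns": ["unit", "disposition", "month"]}}
prompt = build_llm_prompt("encounters by unit last month", catalog)
print(prompt)
```

The model returns a query plan; execution against gold tables happens inside the security boundary, and only aggregate results flow back to the user.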
HIMSS '25
Featured by Microsoft and Databricks — three consecutive years
WashU
Co-created at Washington University School of Medicine — built in production, not a pilot
200+
Connectors including all major Epic layers, Azure, Databricks, and AI APIs
Day 1
Time to governed Epic data in your lakehouse — not a six-month implementation
See It in Your Environment

Talk to an architect.
Not a sales rep.

Technical demos are led by Chris Lundeberg, Co-Founder & CPO. We'll walk through the Epic connector, the medallion pipeline, and the deployment architecture specific to your environment — Chronicles version, Clarity schema, and all.