Pipelines

Ingestion modes

Snapshot, Delta, Historical, CDC, and Stored Procedure — when to use each.

Last updated June 29, 2026
Reading time 4 min read

Every object you sync — every artifact — carries one ingestion mode. It's the most important choice you make per table, because it decides how rows land in the target and how the next run behaves. You set it per artifact, so a single pipeline can mix modes freely.

There are five: Snapshot, Delta, Historical, CDC, and Stored Procedure.

The five modes at a glance

Mode How it writes Best for Key options
Snapshot Truncate and reload the whole table every run Small reference and dimension tables Snapshot retention
Delta Upsert new and changed rows, matched on merge keys Operational data that grows over time Watermark column, Merge keys, Backload, Detect deletes
Historical Append new rows; existing rows are never touched Event streams, time-series, ledgers Append-only
CDC Apply inserts, updates, and deletes from the source log Sources with transaction-log access Log-based capture
Stored Procedure Run a stored procedure on the source to produce the data Sources where a proc owns extraction Source-defined logic

Snapshot — full refresh every run

Snapshot replaces the target table on every run with a fresh copy of the source. Because it truncates and reloads, it's dup-safe by construction — you can't accumulate duplicate rows no matter how often it runs.

Use it for small, slow-changing tables: lookups, dimensions, reference data. It needs no watermark. The one option it exposes is snapshot retention, which controls how prior snapshots are kept.

When in doubt on a small table, Snapshot

It's the simplest mode to reason about and impossible to get duplicates from. The only cost is rereading everything each run, which is fine until the table gets large.

Delta — incremental upsert

Delta is the workhorse for data that grows. Each run reads only the rows newer than the last successful run, using a watermark column (typically something like modified_date or an auto-incrementing id), then upserts them — inserting new rows and updating existing ones matched on your merge keys.

Its options:

  • Watermark column — the field used to find new and changed rows.
  • Merge keys — the column(s) that identify a row so updates land on the right record.
  • Backload N days — re-pull the last N days on each run to catch late-arriving edits.
  • Detect deletes — also reflect rows that disappeared from the source.
A watermark has to actually mean something

If the source's modified_date only updates on insert (not on update), Delta will miss your edits. Verify it first: change one row and confirm the next run picks it up.

Historical — append-only

Historical treats the target as an append-only ledger: each run adds new rows, and existing rows are never updated or deleted. That makes it a natural fit for event streams, time-series, and audit logs where history is the point.

Two descriptions, same mode

The mode picker also describes Historical as a one-time full load that auto-disables after it completes successfully, while the artifact card frames it as an append-only ledger. Keep the append-only mental model — new rows in, nothing rewritten — and confirm the behavior on your first run.

CDC — change data capture

CDC reads inserts, updates, and deletes directly from the source's transaction log rather than re-querying tables. It's the most accurate way to mirror a fast-changing source and the lightest on the source itself, but it requires log access and is only available on supported sources (for example, MSSQL).

Stored Procedure — source-defined extract

Stored Procedure calls a procedure on the source to produce the data, instead of Databasin generating the query. Reach for it when the extraction logic is owned by the source system or a DBA — the in-app picker calls out Epic healthcare systems as the prime example.

Per-artifact options, in one place

Depending on the mode, the ingestion step exposes some of these:

  • Watermark column — the change marker for incremental modes.
  • Merge keys (Merge Columns) — identify rows for upserts.
  • Backload N days — re-pull a recent window each run.
  • Detect deletes — propagate source deletions.
  • Snapshot retention — how long prior snapshots are kept.
  • Custom SQL / Custom WHERE — an Ingestion behavior selector with three settings: Auto (read the object as-is), Custom SQL (replace the source SELECT), or Custom WHERE Clause (add a filter). This is the only in-flight shaping a pipeline does; heavier transforms belong in an automation or in Lakehouse SQL after the sync.

Coming from Full / Incremental / Append?

If you've used other tools, the names map cleanly:

You used to call it… In Databasin it's…
Full / full reload Snapshot
Incremental / merge-upsert Delta
Append Historical

CDC and Stored Procedure have no equivalent in that older vocabulary — they're additions for log-based sources and source-owned extracts.

Where to go next