Lakehouse overview

Catalogs, schemas, and the Lakehouse surface.

Last updated June 29, 2026

Lakehouse is the SQL side of Databasin — a full warehouse surface where you query data that's been pipelined in, explore it in a proper editor, and run it across multiple engines from one place.

If Databasin One is "ask a question," Lakehouse is "know the answer and want to express it precisely in SQL."

The pieces

SQL Editor

A multi-tab SQL editor built on Monaco — the same engine behind VS Code. You get:

  • Autocomplete on catalogs, schemas, tables, and columns.
  • Syntax highlighting with SQL-aware language support.
  • Query history and saved queries (toggle buttons in the editor toolbar).
  • A results panel you can sort by clicking a column header. (Sorting only — there's no row filter in the results grid; narrow with a WHERE clause instead.)

Open the SQL editor from a project

Object Explorer

A tree view of everything you can query: catalogs → schemas → tables → columns. Browse what's available without guessing. Toggle it from the editor toolbar.

Saved Queries

Name and save any query so you can rerun it later. A dedicated toolbar toggle opens the saved-queries panel.

Notebooks

An alternate surface for long-form analysis. Notebooks mix SQL cells with markdown and chart outputs — good for handoffs where the query isn't the whole story.

Catalogs and connectors

A catalog in Lakehouse maps to a connector:

  • Each lakehouse engine (Trino, Doris, Spark, DuckDB) shows up as its own catalog.
  • Inside that catalog, you see whatever schemas and tables the engine exposes.
  • Some connectors pull in external sources as read-only catalogs (e.g. Postgres, Snowflake).

So when you see warehouse.public.orders in a query, read it as catalog → schema → table: warehouse is the catalog, public is the schema, and orders is the table.
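To make the three-part naming concrete, here is a minimal Python sketch that splits a fully qualified name into its parts. The TableRef type and parse_table_ref function are hypothetical names chosen for illustration and are not part of Databasin. The sketch also assumes plain identifiers and does not handle quoted names that contain dots.

```python
from typing import NamedTuple


class TableRef(NamedTuple):
    """Hypothetical container for the three parts of a qualified table name."""
    catalog: str
    schema: str
    table: str


def parse_table_ref(name: str) -> TableRef:
    """Split 'catalog.schema.table' into its three parts."""
    parts = name.split(".")
    # Require exactly three non-empty parts; quoted identifiers are out of scope here.
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected catalog.schema.table, got {name!r}")
    return TableRef(*parts)


ref = parse_table_ref("warehouse.public.orders")
print(ref.catalog, ref.schema, ref.table)
```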

The engines

A project can have more than one lakehouse engine at a time. The native set is Trino, Doris, Spark, and DuckDB; Databricks is supported too, as an external/federated connection rather than a native lakehouse engine.

Engine       Best for
Trino        Interactive SQL and federated queries across catalogs. The default.
Doris        Real-time OLAP: low-latency, high-concurrency interactive analytics and BI.
Spark        Heavy ETL and large-scale processing.
DuckDB       Small-to-medium data; fast single-node prototyping.
Databricks   Querying an existing Databricks workspace from the same editor.

The mechanics of switching engines — and which ones stream results — live in Multi-engine SQL. Doris is new enough to have its own guide: Apache Doris.

Clusters

Behind the scenes, Trino, Spark, and Doris answer queries from a running cluster. Clusters cost credits while running and sleep when idle:

  • A cold cluster wakes on first query (roughly half a minute).
  • It stays warm for a while so follow-up queries are instant.
  • It sleeps automatically after a stretch of no activity.

DuckDB is single-node and doesn't ride a managed cluster the same way. Databasin handles wake/sleep transparently — you just see a small "Waking cluster…" indicator on the first query after a nap. See Clusters, wake and sleep.
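The wake and sleep behavior above can be pictured as a small state model. The Python sketch below is a toy illustration only, not how Databasin implements clusters. The 30-second wake delay comes from the "roughly half a minute" figure above; the 600-second idle timeout is an invented placeholder, since the real idle window is not stated here. ClusterModel and its methods are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional

WAKE_SECONDS = 30.0           # from the text: "roughly half a minute"
IDLE_TIMEOUT_SECONDS = 600.0  # hypothetical; the real idle window is not documented here


@dataclass
class ClusterModel:
    """Toy model of a cluster that wakes on demand and sleeps when idle."""
    last_activity: Optional[float] = None  # None means the cluster has never run

    def is_warm(self, now: float) -> bool:
        """A cluster is warm if it was active within the idle timeout."""
        return (
            self.last_activity is not None
            and now - self.last_activity < IDLE_TIMEOUT_SECONDS
        )

    def submit_query(self, now: float) -> float:
        """Return the startup delay in seconds that this query would see."""
        delay = 0.0 if self.is_warm(now) else WAKE_SECONDS
        self.last_activity = now + delay
        return delay


cluster = ClusterModel()
print(cluster.submit_query(now=0.0))     # cold cluster: pays the wake delay
print(cluster.submit_query(now=60.0))    # still warm: no delay
print(cluster.submit_query(now=5000.0))  # idle past the timeout: wakes again
```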

Things that differ from a generic SQL editor

Streaming results

On Trino and Doris, results stream in as they're produced, so you can start reading the shape before the query finishes. Spark, DuckDB, and Databricks return their results in a batch when the query completes.
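The difference between streaming and batch delivery is roughly the difference between a generator and a fully built list. The Python sketch below illustrates that idea under stated assumptions: produce_rows and fetch are hypothetical stand-ins, not Databasin or engine APIs, and only the engine groupings are taken from the text above.

```python
from typing import Iterable, Iterator, Tuple

Row = Tuple[int, str]

STREAMING_ENGINES = {"trino", "doris"}              # per the text above
BATCH_ENGINES = {"spark", "duckdb", "databricks"}   # per the text above


def produce_rows() -> Iterator[Row]:
    """Stand-in for an engine producing rows one at a time."""
    for i in range(3):
        yield (i, f"row-{i}")


def fetch(engine: str) -> Iterable[Row]:
    """Return rows lazily for streaming engines, all at once for batch engines."""
    engine = engine.lower()
    if engine in STREAMING_ENGINES:
        return produce_rows()        # generator: each row is usable as soon as it exists
    if engine in BATCH_ENGINES:
        return list(produce_rows())  # list: nothing is usable until every row exists
    raise ValueError(f"unknown engine: {engine!r}")


first = next(iter(fetch("trino")))  # readable before the remaining rows are produced
print(first)
print(len(fetch("spark")))          # the whole result is materialized first
```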

Query limits (optional)

A limit results toggle in the toolbar appends LIMIT 1000 to interactive queries. It only touches SELECT, WITH, and VALUES statements; DDL, DML, and SHOW run untouched. It's a cheap guardrail against accidentally running SELECT * on a huge table.
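As a rough illustration of the rule above, the Python sketch below appends a limit only to statements that start with SELECT, WITH, or VALUES. It is not Databasin's actual implementation, and apply_result_limit is a hypothetical name. Two behaviors are assumptions rather than documented facts: leaving a query alone when it already ends in a LIMIT clause, and stripping a trailing semicolon before appending. The sketch also ignores leading SQL comments.

```python
import re

DEFAULT_LIMIT = 1000
LIMITABLE = ("SELECT", "WITH", "VALUES")  # statement types named in the text above
_TRAILING_LIMIT = re.compile(r"\bLIMIT\s+\d+\s*$", re.IGNORECASE)


def apply_result_limit(sql: str, limit: int = DEFAULT_LIMIT) -> str:
    """Append LIMIT to interactive read queries; leave everything else untouched."""
    body = sql.strip().rstrip(";").rstrip()
    if not body:
        return sql
    first_word = body.split(None, 1)[0].upper()
    if first_word not in LIMITABLE:
        return sql  # DDL, DML, SHOW and similar statements pass through unchanged
    if _TRAILING_LIMIT.search(body):
        return sql  # assumption: respect a limit the user already wrote
    return f"{body} LIMIT {limit}"


print(apply_result_limit("SELECT * FROM warehouse.public.orders"))
print(apply_result_limit("SHOW TABLES"))
```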
