# Running dbt inside a Microsoft Fabric Python Notebook
This skill explains how to run a full dbt pipeline inside a Microsoft Fabric Python notebook using DuckDB as the compute engine and DuckLake for Delta Lake table management.
## Architecture

```text
Fabric Notebook (ephemeral Python session)
  -> pip install duckdb, dbt-duckdb
  -> dbt run (DuckDB in-memory + DuckLake extension)
       -> on-run-start: sets DuckLake options, downloads source data
       -> transforms with dbt models (incremental by file)
       -> DuckLake writes Parquet to abfss://.../<lakehouse>/Tables/
       -> on-run-end: CALL delta_export() writes Delta Lake _delta_log
  -> Fabric / Power BI reads Delta tables natively
```
Key insight: DuckDB runs in-memory inside the notebook. DuckLake stores data as Parquet on OneLake (abfss://) and keeps metadata in a local SQLite file. The delta_export DuckDB extension converts DuckLake metadata into Delta Lake transaction logs — no separate Python package needed. Everything runs inside dbt run.
## Notebook Template (4 cells)

### Cell 1 — Install dependencies

```python
!pip install -q duckdb==1.4.4
!pip install -q dbt-duckdb
import sys
sys.exit(0)  # Restart kernel to pick up new packages
```
### Cell 2 — Environment variables

```python
import os
os.environ['DBT_SCHEMA'] = 'my_schema'
os.environ['ROOT_PATH'] = 'abfss://<workspace>@onelake.dfs.fabric.microsoft.com/<lakehouse>.Lakehouse'
os.environ['download_limit'] = '100'
os.environ['process_limit'] = '100'
os.environ['METADATA_LOCAL_PATH'] = '/synfs/nb_resource/builtin/metadata.db'
```
| Variable | Purpose |
|---|---|
| `DBT_SCHEMA` | Target schema name in DuckLake |
| `ROOT_PATH` | `abfss://` path to the Fabric lakehouse root |
| `download_limit` | Max files to download per source per run |
| `process_limit` | Max files to process per model per run (default: 100) |
| `METADATA_LOCAL_PATH` | Local path for the DuckLake SQLite metadata DB. Use `/synfs/nb_resource/builtin/` in Fabric — this folder persists across notebook sessions |
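The limits above are read inside the dbt project with dbt's `env_var()` function. As a minimal Python sketch of its lookup semantics (a set value wins, else the supplied default, else an error):

```python
import os

# Sketch of dbt's env_var() lookup: environment value first, then default.
def env_var(name, default=None):
    value = os.environ.get(name, default)
    if value is None:
        raise KeyError(f"Env var '{name}' is required and has no default")
    return value

os.environ['process_limit'] = '250'
assert env_var('process_limit', '100') == '250'    # explicit setting wins
assert env_var('no_such_var_xyz', '100') == '100'  # default applies
```

This is why `process_limit` can be omitted from Cell 2 entirely: the default in the model code takes over.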
### Cell 3 — Clone dbt project

Public repo:

```python
!git clone --branch production --single-branch https://github.com/<your-repo>.git /tmp/dbt
```

Private repo (production):

```python
pat = mssparkutils.credentials.getSecret('https://<vault-name>.vault.azure.net/', 'github-pat')
!git clone --branch production --single-branch https://{pat}@github.com/<your-repo>.git /tmp/dbt
```
For private repos, store a GitHub fine-grained PAT (with Contents: Read permission) in Azure Key Vault. Grant the Fabric workspace identity Key Vault Secrets User access.
### Cell 4 — Run dbt

```python
!cd /tmp/dbt && dbt run && dbt test
```
Delta export happens automatically via the on-run-end hook — no separate step needed.
## profiles.yml Configuration

```yaml
my_project:
  target: dev
  outputs:
    dev:
      type: duckdb
      path: ':memory:'
      database: ducklake
      threads: 1
      schema: "{{ env_var('DBT_SCHEMA', 'default') }}"
      config_options:
        allow_unsigned_extensions: true
      settings:
        preserve_insertion_order: false
      extensions:
        - parquet
        - sqlite
        - azure
        - httpfs
        - json
        - ducklake
        - name: zipfs
          repo: community
        - name: delta_export
          repo: community
      attach:
        - path: "ducklake:sqlite:{{ env_var('METADATA_LOCAL_PATH', '/tmp/ducklake_metadata.db') }}"
          alias: ducklake
          options:
            data_path: "{{ env_var('ROOT_PATH') }}/Tables"
            data_inlining_row_limit: 0
```
Key points:

- `path: ':memory:'` — DuckDB runs fully in-memory, no local database file
- `database: ducklake` — tells dbt to target the attached DuckLake catalog
- `config_options.allow_unsigned_extensions: true` — must be set at connection creation time (not in `settings`) to allow community extensions
- `settings.preserve_insertion_order: false` — reduces memory usage by allowing DuckDB to reorder results
- `data_path` points to `abfss://.../<lakehouse>/Tables`, where the Parquet data lives
- `data_inlining_row_limit: 0` — disables data inlining (small writes go to Parquet, not the metadata DB)
- Extensions: `azure` for `abfss://`, `httpfs` for HTTP downloads, `zipfs` for reading CSVs from ZIPs
## dbt_project.yml — Delta Export & DuckLake Maintenance

```yaml
on-run-start:
  - "CALL ducklake.set_option('rewrite_delete_threshold', 0)"
  - "CALL ducklake.set_option('target_file_size', '128MB')"
  - "{{ download() }}"  # your data ingestion macro

on-run-end:
  - "CALL ducklake_rewrite_data_files('ducklake')"
  - "CALL ducklake_merge_adjacent_files('ducklake')"
  - "CALL delta_export()"
```
delta_export is now a community extension — installed via profiles.yml extensions list (no manual INSTALL/LOAD needed). At run end, DuckLake maintenance compacts data files before delta_export() writes Delta Lake transaction logs.
Why delta_export is mandatory: DuckLake is not natively supported by Spark, Power BI, or any other engine in Microsoft Fabric. Without delta_export, the Parquet files written by DuckLake are invisible to the rest of the platform. The Delta Lake transaction logs (_delta_log/) make the tables readable as standard Delta tables by Power BI, Spark, SQL Analytics, and any Delta-compatible tool.
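For intuition about what the export produces: a Delta Lake commit is just a newline-delimited JSON file under `_delta_log/`. Below is a hand-rolled sketch of the three core actions a first commit contains; all field values are illustrative, not what `delta_export` actually emits:

```python
import json

# Illustrative first commit (_delta_log/00000000000000000000.json):
# protocol, table metadata, and one "add" action per Parquet data file.
commit_actions = [
    {"protocol": {"minReaderVersion": 1, "minWriterVersion": 2}},
    {"metaData": {
        "id": "00000000-0000-0000-0000-000000000000",  # table GUID (illustrative)
        "format": {"provider": "parquet", "options": {}},
        "schemaString": json.dumps({"type": "struct", "fields": []}),
        "partitionColumns": [],
        "configuration": {},
    }},
    {"add": {"path": "part-00001.parquet", "partitionValues": {},
             "size": 1024, "modificationTime": 0, "dataChange": True}},
]

# Delta commits are newline-delimited JSON: one action per line.
commit_file = "\n".join(json.dumps(action) for action in commit_actions)
```

Readers (Power BI, Spark, SQL Analytics) replay these actions to discover which Parquet files make up the table, which is exactly the mapping `delta_export` derives from DuckLake's metadata.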
## DuckLake Maintenance (on-run-end)

| Call | Purpose |
|---|---|
| `ducklake_rewrite_data_files('ducklake')` | Rewrites files with deletes (threshold controlled by `rewrite_delete_threshold`) |
| `ducklake_merge_adjacent_files('ducklake')` | Merges small Parquet files into larger ones (target size from `target_file_size`) |
| `delta_export()` | Writes the Delta Lake `_delta_log/` so Fabric/Power BI read tables natively |
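The rewrite decision can be sketched as a simple rule (assumed semantics: a data file is rewritten when its deleted-row fraction exceeds `rewrite_delete_threshold`):

```python
def needs_rewrite(deleted_rows: int, total_rows: int, threshold: float) -> bool:
    """Sketch of the assumed rewrite rule: rewrite a data file when its
    fraction of deleted rows exceeds the configured threshold."""
    if total_rows == 0:
        return False
    return deleted_rows / total_rows > threshold

# With the threshold forced to 0 in on-run-start, any file containing
# at least one deleted row qualifies for rewriting:
assert needs_rewrite(1, 1_000_000, threshold=0.0)
assert not needs_rewrite(0, 1_000_000, threshold=0.0)
```

Forcing the threshold to 0 keeps tables fully compacted every run, at the cost of extra rewrite work.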
## DuckLake Options

| Option | Value | How to Set | Why |
|---|---|---|---|
| `data_inlining_row_limit` | 0 | ATTACH option in profiles.yml | Disable storing small inserts in the metadata DB |
| `rewrite_delete_threshold` | 0 | `set_option` in on-run-start (not an ATTACH param) | Rewrite all files with any deletes |
| `target_file_size` | 128MB | `set_option` in on-run-start (not an ATTACH param) | Target Parquet file size for inserts and compaction (default 512MB, reduced to prevent OOM) |
Note: target_file_size controls both merge_adjacent_files merge target and auto-splitting of large inserts. It cannot be set via ATTACH options — must use set_option.
## Schema Separation (raw + aemo)

Separate intermediate tables from Power BI-facing tables using dbt schema config:

- `aemo` schema (default from `DBT_SCHEMA`): `dim_calendar`, `dim_duid`, `fct_summary` — exposed to Power BI
- `raw` schema: `fct_scada`, `fct_scada_today`, `fct_price`, `fct_price_today`, `stg_csv_archive_log` — intermediate
dbt_project.yml:

```yaml
models:
  aemo_electricity:
    staging:
      +materialized: view
      +schema: raw
    dimensions:
      +materialized: table
    marts:
      +materialized: incremental
      +schema: raw
```
Override specific models back to `aemo` with `{{ config(schema='aemo') }}` in `fct_summary.sql`.
Custom schema macro required (`macros/generate_schema_name.sql`):

```sql
{% macro generate_schema_name(custom_schema_name, node) -%}
    {%- if custom_schema_name is none -%}
        {{ target.schema }}
    {%- else -%}
        {{ custom_schema_name | trim }}
    {%- endif -%}
{%- endmacro %}
```
Without this macro, dbt's default behavior prefixes the custom schema with the target schema (producing e.g. `aemo_raw` instead of `raw`).
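The difference between dbt's default naming and the override can be sketched in plain Python:

```python
def dbt_default_schema(target_schema, custom):
    # dbt's built-in generate_schema_name: <target>_<custom> when a custom
    # schema is configured, otherwise just the target schema.
    return target_schema if custom is None else f"{target_schema}_{custom.strip()}"

def overridden_schema(target_schema, custom):
    # The macro above: use the custom schema verbatim when provided.
    return target_schema if custom is None else custom.strip()

assert dbt_default_schema('aemo', 'raw') == 'aemo_raw'  # unwanted prefix
assert overridden_schema('aemo', 'raw') == 'raw'        # what we want
assert overridden_schema('aemo', None) == 'aemo'        # default unchanged
```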
## Semantic Model (model.bim)

Deploy via the Fabric REST API using TMSL format (model.bim) — TMDL create is NOT supported by the API.

Key requirements for Direct Lake on OneLake (no SQL endpoint):

- `PBI_ProTooling` annotation with `DirectLakeOnOneLakeCreatedInDesktop`
- `sourceLineageTag` on all tables (`[schema].[table]`) and columns
- `relyOnReferentialIntegrity` on all relationships
- `PBI_RemovedChildren` annotation on the expression, listing excluded tables with the correct schema prefix
- Partition `schemaName` must match the lakehouse schema folder name
## Key Patterns

### Persistent DuckLake metadata

DuckLake's SQLite metadata DB needs a local filesystem. In Fabric notebooks, use `/synfs/nb_resource/builtin/` — this is the notebook resource folder that persists across sessions. Set `METADATA_LOCAL_PATH` to a file in that folder.
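Because the metadata DB is plain SQLite, it can be sanity-checked between runs with Python's built-in `sqlite3` module. DuckLake's internal table names vary by version, so this sketch just enumerates whatever is there:

```python
import os
import sqlite3

# Open the persisted DuckLake metadata DB and list its tables.
path = os.environ.get('METADATA_LOCAL_PATH', '/tmp/ducklake_metadata.db')
con = sqlite3.connect(path)
tables = [row[0] for row in con.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
con.close()
print(tables)
```

An empty list means the catalog has not been initialized yet (the first `dbt run` creates it via the ATTACH in profiles.yml).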
### Incremental by file with process_limit

Track which source files have been processed using a `file` column in each fact table. Pre-hooks use DuckDB `SET VARIABLE` to pass only unprocessed file paths, capped by `process_limit`:

```sql
{{ config(
    materialized='incremental',
    unique_key=['file', 'DUID', 'SETTLEMENTDATE'],
    pre_hook="SET VARIABLE paths = (
        SELECT list(...)
        FROM (
            SELECT source_filename
            FROM {{ ref('stg_archive_log') }}
            WHERE source_type = 'daily'
            {% if is_incremental() %}
              AND source_filename NOT IN (SELECT DISTINCT file FROM {{ this }})
            {% endif %}
            LIMIT {{ env_var('process_limit', '100') }}
        )
    )"
) }}
```
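The pre-hook's selection logic, stripped to plain Python (function and variable names here are illustrative):

```python
import os

def next_batch(archive_log, already_processed):
    """Pick files from the archive log that have not been processed yet,
    capped by the process_limit environment variable (default 100)."""
    limit = int(os.environ.get('process_limit', '100'))
    unprocessed = [f for f in archive_log if f not in already_processed]
    return unprocessed[:limit]

log = [f"file_{i}.zip" for i in range(250)]
os.environ['process_limit'] = '100'
batch = next_batch(log, already_processed={"file_0.zip"})
assert len(batch) == 100
assert "file_0.zip" not in batch  # already-loaded files are skipped
```

Each run therefore processes at most `process_limit` new files per model; repeated runs eventually drain the backlog, which keeps individual notebook sessions within memory and time budgets.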
### Reading CSVs from ZIP archives

DuckDB can read CSVs directly from ZIP files on remote storage via the `zipfs` extension:

```sql
SELECT * FROM read_csv(
    getvariable('paths'),
    ignore_errors=true,
    null_padding=true
)
```
## Reference

See `reference.md` for full code examples.