Model Layering in dbt's Best Practice Guides

dbt

There are Best practice guides that document how dbt Labs operates their dbt projects. Within these, How we structure our dbt projects outlines what to do and what not to do, along with the reasoning behind these decisions, so you don’t have to spend decision-making resources on how to organize folders and name files.

├── models
│   ├── intermediate
│   │   └── finance
│   │       ├── _int_finance__models.yml
│   │       └── int_payments_pivoted_to_orders.sql
│   ├── marts
│   │   ├── finance
│   │   │   ├── _finance__models.yml
│   │   │   ├── orders.sql
│   │   │   └── payments.sql
│   │   └── marketing
│   │       ├── _marketing__models.yml
│   │       └── customers.sql
│   ├── staging
│   │   ├── jaffle_shop
│   │   │   ├── _jaffle_shop__docs.md
│   │   │   ├── _jaffle_shop__models.yml
│   │   │   ├── _jaffle_shop__sources.yml
│   │   │   ├── base
│   │   │   │   ├── base_jaffle_shop__customers.sql
│   │   │   │   └── base_jaffle_shop__deleted_customers.sql
│   │   │   ├── stg_jaffle_shop__customers.sql
│   │   │   └── stg_jaffle_shop__orders.sql
│   │   └── stripe
│   │       ├── _stripe__models.yml
│   │       ├── _stripe__sources.yml
│   │       └── stg_stripe__payments.sql
│   └── utilities
│       └── all_dates.sql

staging

This layer performs name changes, type and category conversions, but generally avoids JOINs and aggregations. It is materialized as views to always reference the most recent data.

Directories are organized at the system level where properties tend to be shared, rather than by data loading methods or business groups like finance and marketing.

File names follow patterns like stg_[source]__[entity]s.sql, making it clear where the model comes from without having to examine the tree structure.

Tables can be referred directly, but you can also refer to sources.yml with source() to reflect it in the document. This file and the base of models that rename columns can be automatically generated by codegen.

Automatically generate sources.yml, staging models, and schema.yml using dbt codegen - sambaiz-net

sources:
  - name: dbttest
    tables:
      - name: my_first_dbt_model
      - name: my_second_dbt_model

# select * from {{ source('dbttest', 'my_first_dbt_model') }}

intermediate

This layer has intermediate models combine staging models and perform complex transformations, existing as either views or ephemeral models. Ephemeral models generate common table expressions (CTE) and are embedded in the referencing models, which keeps the data warehouse simple, but makes troubleshooting slightly more difficult since you cannot query them directly.

Directories are organized at the business group level, and file names follow the pattern int_[entity]s_[verb]s.sql to make it clear what is happening.

marts

This layer creates tables referenced by end users from staging and intermediate models. They are denormalized to save on heavy computational costs. When there are many JOINs, grouping them in intermediate models can make the concept clearer and more readable.

Directories are organized by business group or areas of interest, and file names are entity names such as customers.

$ cat dbt_project.yml
...
models:
  dbttest:
    staging:
      +materialized: view
    intermediate:
      +materialized: ephemeral
    marts:
      +materialized: table