Your version of dbt-core is out of date! You can find instructions for upgrading here: https://docs.getdbt.com/docs/installation Plugins: - postgres: 1.5.0 - Update available! At least one plugin is out of date or incompatible with dbt-core. You can find instructions for upgrading here: https://docs.getdbt.com/docs/installation

使用以下命令启动项目 :

dbt init jaffle_shop
// 如果windows的话,需要在用户目录下建一个 .dbt 目录 
// mkdir .dbt
//  C:\Users\用户名\.dbt
(venv) PS D:\workspace\dbt> dbt init jaffle_shop
00:48:43  Running with dbt=1.5.0
Which database would you like to use?
[1] postgres
(Don't see the one you want? https://docs.getdbt.com/docs/available-adapters)
Enter a number: 1
00:48:47  Profile jaffle_shop written to C:\Users\UOS\.dbt\profiles.yml using target's sample configuration. Once updated, you'll be able to start developing with dbt.
00:48:47
Your new dbt project "jaffle_shop" was created!
For more information on how to configure the profiles.yml file,
please consult the dbt documentation here:
  https://docs.getdbt.com/docs/configure-your-profile
One more thing:
Need help? Don't hesitate to reach out to us via GitHub issues or on Slack:
  https://community.getdbt.com/
Happy modeling!

修改配置,链接数据库:

-- C:\Users\UOS\.dbt\profiles.yml
jaffle_shop:
  outputs:
      type: postgres
      threads: 1
      host: 127.0.0.1
      port: 5432
      user: postgres
      pass: haosql
      dbname: postgres
      schema: udo
    prod:
      type: postgres
      threads: 1
      host: 127.0.0.1
      port: 5432
      user: postgres
      pass: haosql
      dbname: postgres
      schema: udo
  target: dev

检测配置: dbt debug

(venv) PS D:\workspace\dbt\jaffle_shop> dbt debug
02:27:38  Running with dbt=1.5.0
02:27:38  dbt version: 1.5.0
02:27:38  python version: 3.8.8
02:27:38  python path: d:\workspace\dbt\venv\scripts\python.exe
02:27:38  os info: Windows-10-10.0.17763-SP0
02:27:38  Using profiles.yml file at C:\Users\UOS\.dbt\profiles.yml
02:27:38  Using dbt_project.yml file at D:\workspace\dbt\jaffle_shop\dbt_project.yml
02:27:38  Configuration:
02:27:38    profiles.yml file [OK found and valid]
02:27:38    dbt_project.yml file [OK found and valid]
02:27:38  Required dependencies:
02:27:38   - git [OK found]
02:27:38  Connection:
02:27:38    host: 127.0.0.1
02:27:38    port: 5432
02:27:38    user: postgres
02:27:38    database: postgres
02:27:38    schema: udo
02:27:38    search_path: None
02:27:38    keepalives_idle: 0
02:27:38    sslmode: None
02:27:38    Connection test: [OK connection ok]
02:27:38  All checks passed!

执行示例模型:dbt run

(venv) PS D:\workspace\dbt\jaffle_shop> dbt run
02:29:14  Running with dbt=1.5.0
02:29:14  Unable to do partial parsing because saved manifest not found. Starting full parse.
02:29:15  Found 2 models, 4 tests, 0 snapshots, 0 analyses, 307 macros, 0 operations, 0 seed files, 0 sources, 0 exposures, 0 metrics, 0 groups
02:29:15
02:29:15  Concurrency: 1 threads (target='dev')
02:29:15
02:29:15  1 of 2 START sql table model udo.my_first_dbt_model ............................ [RUN]
02:29:15  1 of 2 OK created sql table model udo.my_first_dbt_model ....................... [SELECT 2 in 0.17s]
02:29:15  2 of 2 START sql view model udo.my_second_dbt_model ............................ [RUN]
02:29:16  2 of 2 OK created sql view model udo.my_second_dbt_model ....................... [CREATE VIEW in 0.10s]
02:29:16
02:29:16  Finished running 1 table model, 1 view model in 0 hours 0 minutes and 0.56 seconds (0.56s).
02:29:16
02:29:16  Completed successfully
02:29:16
02:29:16  Done. PASS=2 WARN=0 ERROR=0 SKIP=0 TOTAL=2

Build DBT Project

资源Sources

可以暂时理解为是我们各个主要的业务数据库。

数据库模型 model

SQL 模型

  • run command
  • ref function
  • SQL模型是一个语句,模型在文件中定义(通常在目录中):

  • 每个文件包含一个模型/语句: .sql select
  • 模型名称继承自文件名
  • 模型可以嵌套在目录内的子目录中
  • 当执行dbt run,dbt将构建这个数据模型。

    这里新建了一个简单的模型:

    -- 计算书籍阅读人数排行
    {{ config(materialized='table') }}
    with book_ranks as(
        select u_bookid ,count(distinct u_userid) as cnt_rs from udo.t_read_history 
        group by u_bookid order  by cnt_rs desc 
    select u_bookid,cnt_rs,t2.u_bookname,t2.u_bookdesc,t2.u_booktypecode,t2.u_author
    from book_ranks t1 
    udo.t_bookstore t2
    on t1.u_bookid=t2.u_id
    

    dbt这里会对应的构建一个脚本:

    create table "postgres"."udo"."task3__dbt_tmp" -- 计算书籍阅读人数排行 with book_ranks as( select u_bookid ,count(distinct u_userid) as cnt_rs from udo.t_read_history group by u_bookid order by cnt_rs desc select u_bookid,cnt_rs,t2.u_bookname,t2.u_bookdesc,t2.u_booktypecode,t2.u_author from book_ranks t1 udo.t_bookstore t2 on t1.u_bookid=t2.u_id

    执行完成后,数据库里会新增一个task3的表,存储上面的结果。

    note:

  • 在哪里可以看到执行的sql:
  • dbt CLI:

  • 目录: target/compiled/select
  • 目录: target/run/create
  • 日志文件: logs/dbt.log
  • 是否需要提前创建结果表:
  • dbt会自动检测,如果不存在就创建目录。

    可以在dbt_project.yml 和具体的模型SQL中配置。

  • 改变模型的持久化方式
  • 将模型构建为单独的架构??
  • 创建模型的标签
  • dbt_project.yml 示例:

    name: jaffle_shop
    config-version: 2
    models:
      jaffle_shop: # this matches the `name:`` config
        +materialized: view # this applies to all models in the current project
        marts:
          +materialized: table # this applies to all models in the `marts/` directory
          marketing:
            +schema: marketing # this applies to all models in the `marts/marketing/`` directory
    

    在具体的模型里配置:

    {{ config( materialized="view", schema="marketing" with customer_orders as ...

    请务必注意,配置是分层应用的 — 应用于子目录的配置将覆盖任何常规配置。

    note:

  • dbt模型结果输出方式: view,table,incremental,ephemeral
  • 模型依赖关系

    可以使用 ref 函数代替查询中的表名来构建模型之间的依赖关系

    with customers as (
        select * from {{ ref('stg_customers') }}
    orders as (
        select * from {{ ref('stg_orders') }}
    

    note:

  • 通过创建DAG来确定运行模型的顺序
  • 推荐模块化转换,重用模型减少复用
  • dbt docs

    通过dbt docs可以生成项目的资料。

  • 首先执行:dbt docs generate
  • 启动服务: dbt docs serve
  • 可以在浏览器中查看:http://localhost:8080/#!/overview