dbt Doc Blocks | dbt Docs | dbt Documentation Best Practices

 

oms_config.yml

models:
  - name: customers_stg
    description: Staged customer data from the order management system (OMS), with minor row-level transformations.
    columns:
      - name: Email
        description: Customer's primary email address, used for promotions and offers.
        tests:
          - string_not_empty  # custom generic test; not a dbt built-in

  - name: employees_stg
    description: Staged employee data from the order management system (OMS), with minor row-level transformations.
    columns:
      - name: JobTitle
        description: Employee's job title, based on their current roles and responsibilities.
        tests:
          - string_not_empty

  - name: orders_stg
    description: Staged order data from the order management system (OMS), with minor row-level transformations.
    columns:
      - name: OrderID
        description: Primary key of the orders_stg model.
        tests:
          - unique
          - not_null

      - name: StatusCD
        description: "{{ doc('StatusCD') }}"  # text comes from the StatusCD doc block in oms_doc_blocks.md
        tests:
          - accepted_values:
              values: ['01', '02', '03']

  - name: orderitems_stg
    description: Staged order item data from the order management system (OMS), with minor row-level transformations.
    columns:
      - name: OrderID
        tests:
          - relationships:
              to: ref('orders_stg')
              field: OrderID
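
  - name: orderitems_uniq
    tests:
      # Model-level test from the dbt_expectations package: this model must
      # have the same number of rows as orders_stg.
      - dbt_expectations.expect_table_row_count_to_equal_other_table:
          compare_model: ref("orders_stg")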
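To make the tests declared above concrete, here is a minimal Python sketch that applies the same checks to a few in-memory rows. It only illustrates the logic: dbt itself compiles each generic test into a SQL query that selects failing rows, and the test passes when that query returns nothing. The sample rows and the helper names (`failing_unique`, `failing_relationships`, and so on) are hypothetical. Two further assumptions: `string_not_empty` is a custom test rather than a dbt built-in, so its meaning here (no blank strings) is a guess, and `orderitems_uniq` is assumed to hold one row per distinct `OrderID`.

```python
from collections import Counter

# Hypothetical sample rows standing in for the staged models.
orders_stg = [
    {"OrderID": 1, "StatusCD": "01"},
    {"OrderID": 2, "StatusCD": "02"},
    {"OrderID": 3, "StatusCD": "03"},
]
orderitems_stg = [
    {"OrderID": 1, "ProductID": 10},
    {"OrderID": 1, "ProductID": 11},
    {"OrderID": 2, "ProductID": 10},
    {"OrderID": 4, "ProductID": 12},  # orphan: no OrderID 4 in orders_stg
]
customers_stg = [{"Email": "a@example.com"}, {"Email": "   "}]

# Assumption: orderitems_uniq keeps one row per distinct OrderID.
orderitems_uniq = [{"OrderID": k} for k in sorted({r["OrderID"] for r in orderitems_stg})]


# Each check returns the failing values or rows; an empty list means the test passes.
def failing_unique(rows, col):
    counts = Counter(r[col] for r in rows)
    return [value for value, n in counts.items() if n > 1]


def failing_not_null(rows, col):
    return [r for r in rows if r[col] is None]


def failing_accepted_values(rows, col, values):
    # NULLs are skipped: in SQL, NULL NOT IN (...) does not evaluate to true.
    return [r[col] for r in rows if r[col] is not None and r[col] not in values]


def failing_relationships(child, col, parent, field):
    # Child values (ignoring NULLs) that have no matching key in the parent.
    parent_keys = {r[field] for r in parent}
    return [r[col] for r in child if r[col] is not None and r[col] not in parent_keys]


def failing_string_not_empty(rows, col):
    # Assumed semantics of the custom string_not_empty test: no blank strings.
    return [r for r in rows if isinstance(r[col], str) and not r[col].strip()]


def row_counts_equal(model, compare_model):
    return len(model) == len(compare_model)


if __name__ == "__main__":
    print(failing_unique(orders_stg, "OrderID"))  # []
    print(failing_not_null(orders_stg, "OrderID"))  # []
    print(failing_accepted_values(orders_stg, "StatusCD", ["01", "02", "03"]))  # []
    print(failing_relationships(orderitems_stg, "OrderID", orders_stg, "OrderID"))  # [4]
    print(failing_string_not_empty(customers_stg, "Email"))  # [{'Email': '   '}]
    # True: both have 3 rows, even though OrderID 4 is orphaned.
    print(row_counts_equal(orderitems_uniq, orders_stg))
```

Note the last line: the row counts match even though `OrderID` 4 has no parent order, which is why the row-count comparison complements, rather than replaces, the `relationships` test.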


oms_doc_blocks.md

{% docs StatusCD %}
   
One of the following values:

| status     | definition                 |
|------------|----------------------------|
| 01         | Order is in progress       |
| 02         | Order has been completed   |
| 03         | Order has been cancelled   |

{% enddocs %}
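Conceptually, dbt collects every `{% docs name %} ... {% enddocs %}` block from the project's Markdown files and substitutes its text wherever a description calls `{{ doc('name') }}`, so the StatusCD table above becomes the description of the `StatusCD` column. The Python sketch below imitates that substitution with regular expressions to show the idea. It is a simplified stand-in, not dbt's implementation (dbt renders descriptions through Jinja), and `parse_doc_blocks` and `render_description` are hypothetical names.

```python
import re

# Matches {% docs Name %} ... {% enddocs %}, capturing the name and the body.
DOC_BLOCK = re.compile(r"\{%\s*docs\s+(\w+)\s*%\}(.*?)\{%\s*enddocs\s*%\}", re.DOTALL)
# Matches {{ doc('Name') }} or {{ doc("Name") }}, capturing the name.
DOC_CALL = re.compile(r"""\{\{\s*doc\(\s*['"](\w+)['"]\s*\)\s*\}\}""")


def parse_doc_blocks(markdown):
    """Map each doc block name to its body text, stripped of outer whitespace."""
    return {name: body.strip() for name, body in DOC_BLOCK.findall(markdown)}


def render_description(description, docs):
    """Replace every doc('name') call with the text of the named block.

    An unknown name raises KeyError, loosely analogous to the error dbt
    reports when a description references a doc block that does not exist.
    """
    return DOC_CALL.sub(lambda m: docs[m.group(1)], description)


if __name__ == "__main__":
    md = """{% docs StatusCD %}
One of the following values:

| status | definition           |
|--------|----------------------|
| 01     | Order is in progress |
{% enddocs %}"""
    docs = parse_doc_blocks(md)
    print(render_description("{{ doc('StatusCD') }}", docs))
```

Keeping the table in one doc block means any other model that exposes `StatusCD` can reuse the same definition with a single `doc()` call.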






