Databricks Medallion Architecture: Data Modeling Guide, Best practices, Standards, Examples

  Hello Data Pros, and welcome back to another exciting blog in our Databricks learning series! In our last blog, we explored various platform architectures, specifically focusing on the modern Data Lakehouse architecture and its implementation using the Delta Lake framework. But today, we're moving to the next big question: With this powerful platform architecture in place, how do we organize and model our data effectively? And that’s where the Medallion Architecture comes in! So, what exactly is Medallion Architecture? It’s a data design pattern, developed to logically structure and organize your data within a Lakehouse! Its main purpose is to progressively improve the quality and usability of your data, as it moves through different stages, such as Bronze, Silver, and Gold. Think of it as a transformation journey, where raw data is refined step by step into a polished and analysis-ready state! Some people call it a multi-hop architecture because the data flows through several tr...

DBT Sources, Seeds & Analyses | Data Build Tool Demo with Real World Examples


############  seeds\salestargets.csv ############ 

StoreID,SalesTarget

1876,260000

1396,290000

1146,305000

1107,345000

1384,350000

1155,300000

1913,300000

1882,345000

1289,300000

1005,375000



############  models\storeperformance.sql ########### 

SELECT
    OS.StoreID,
    SUM(OFACT.Revenue) AS ActualSales,
    SUM(ST.SalesTarget) AS TargetSales
FROM
    {{ ref('orders_stg') }} OS
JOIN
    {{ ref('orders_fact') }} OFACT ON OS.OrderID = OFACT.OrderID
JOIN
    {{ ref('salestargets') }} ST ON ST.StoreID = OS.StoreID
GROUP BY 1

############  analyses\StoreRevenue.sql ############ 

SELECT
    OS.StoreID,
    SUM(OFACT.Revenue) AS TotalRevenue
FROM
    {{ ref('orders_stg') }} OS
JOIN
    {{ ref('orders_fact') }} OFACT ON OS.OrderID = OFACT.OrderID
GROUP BY 1



############  models\src_oms.yml ############ 

sources:
  - name: landing
    description: The primary soruce system for order management system(OMS) data anlalytics.  
    database: SLEEKMART_OMS
    schema: L1_LANDING
    freshness:
      warn_after: {count: 1, period: day}
      error_after: {count: 3, period: day}
    loaded_at_field: Updated_at

    tables:
      - name: customers
        description: The customers table from order management system(OMS).      
        identifier: customers
      - name: orders
        description: The orders table from order management system(OMS).      
        identifier: orders
      - name: orderitems
        description: The orderitems table from order management system(OMS).      
        identifier: orderitems
      - name: employees
        description: The employees table from order management system(OMS).      
        identifier: employees
        columns:
          - name: address
            description: Full Address of emplyees for Transport Arrangements.          
            tests:
              - not_null
              - string_not_empty
      - name: sales_us
      - name: sales_europe
      - name: sales_asia
      - name: stores
      - name: suppliers
      - name: products

  - name: training
    database: SLEEKMART_OMS
    schema: TRAINING
    tables:
      - name: city_temperature
      - name: sales_us
      - name: sales_uk
      - name: sales_india

############  customers_stg.sql ############ 

{{ config(materialized='table') }}    

SELECT
    CustomerID,
    FirstName,
    LastName,
    Email,
    Phone,
    Address,
    City,
    State,
    ZipCode,
    Updated_at,
    CONCAT(FirstName, ' ', LastName) AS CustomerName
FROM
    {{ source('landing', 'customers') }}
   


Comments

Popular posts from this blog

How to Install Airflow on Windows

How to Install DBT and Set Up a Project, Create Your First dbt Model

Airflow DAGs, Operators, Tasks & Providers