Databricks Medallion Architecture: Data Modeling Guide, Best practices, Standards, Examples

  Hello Data Pros, and welcome back to another exciting blog in our Databricks learning series! In our last blog, we explored various platform architectures, specifically focusing on the modern Data Lakehouse architecture and its implementation using the Delta Lake framework. But today, we're moving to the next big question: With this powerful platform architecture in place, how do we organize and model our data effectively? And that’s where the Medallion Architecture comes in! So, what exactly is Medallion Architecture? It’s a data design pattern, developed to logically structure and organize your data within a Lakehouse! Its main purpose is to progressively improve the quality and usability of your data, as it moves through different stages, such as Bronze, Silver, and Gold. Think of it as a transformation journey, where raw data is refined step by step into a polished and analysis-ready state! Some people call it a multi-hop architecture because the data flows through several tr...

SNOWFLAKE TABLES HANDS-ON LAB

-------------------------------------------------------------------------------------------------------
--SNOWFLAKE TABLES HANDS-ON LAB  (ALL COMMANDS ARE AVAILABLE IN THE LINK PROVIDED INT HE DESCRIPTION)--
-------------------------------------------------------------------------------------------------------

-- SET THE ROLE & WAREHOUSE
USE ROLE ACCOUNTADMIN;
USE WAREHOUSE COMPUTE_WH;

-- CREATE & USE DATABASE AND SCHEMAS
CREATE DATABASE IF NOT EXISTS SLEEK_OMS;
CREATE SCHEMA IF NOT EXISTS SLEEK_OMS.L1_LANDING;
USE DATABASE SLEEK_OMS;
USE SCHEMA L1_LANDING;

-- CREATE PERMANENT TABLE
CREATE TABLE EMPLOYEE_PER
(
  ID INT,
  NAME STRING,
  AGE INT,
  SALARY DECIMAL(10, 2)
);

-- CREATE TEMPORARY TABLE
CREATE TEMPORARY TABLE EMPLOYEE_TMP
(
  ID INT,
  NAME STRING,
  AGE INT,
  SALARY DECIMAL(10, 2)
);

-- CREATE TRANSIENT TABLE
CREATE TRANSIENT TABLE EMPLOYEE_TRN
(
  ID INT,
  NAME STRING,
  AGE INT,
  SALARY DECIMAL(10, 2)
);

-- LOAD SOME DATA IN TO THESE 3 TABLES
INSERT INTO EMPLOYEE_PER (ID, NAME, AGE, SALARY)
VALUES
  (1, 'John', 30, 50000.00),
  (2, 'Alice', 25, 60000.00),
  (3, 'Bob', 35, 70000.00),
  (4, 'Emily', 28, 55000.00),
  (5, 'Michael', 32, 65000.00);

INSERT INTO EMPLOYEE_TMP (ID, NAME, AGE, SALARY)
VALUES
  (1, 'Jane', 27, 52000.00),
  (2, 'David', 40, 75000.00),
  (3, 'Sarah', 33, 62000.00),
  (4, 'James', 29, 57000.00),
  (5, 'Emma', 31, 68000.00);

INSERT INTO EMPLOYEE_TRN (ID, NAME, AGE, SALARY)
VALUES
  (1, 'Matthew', 34, 72000.00),
  (2, 'Olivia', 26, 59000.00),
  (3, 'William', 37, 80000.00),
  (4, 'Sophia', 29, 56000.00),
  (5, 'Daniel', 30, 63000.00);

-- TIME TRAVEL DEMONSTRATION
SELECT CURRENT_TIMESTAMP;
--2024-05-04 07:35:34.307 -0700

UPDATE EMPLOYEE_PER SET AGE = NULL;

SELECT CURRENT_TIMESTAMP;
--2024-05-04 07:35:57.976 -0700

UPDATE EMPLOYEE_PER SET SALARY = NULL;

SELECT * FROM EMPLOYEE_PER;

SELECT * FROM EMPLOYEE_PER AT(TIMESTAMP => '2024-05-04 07:35:57.976 -0700'::timestamp_tz);
SELECT * FROM EMPLOYEE_PER AT(TIMESTAMP => '2024-05-04 07:35:34.307 -0700'::timestamp_tz);

-- CHECK YOUR TABLE TYPE AND TIME TRAVEL DURATION
SHOW TABLES LIKE '%EMPLOYEE%';
SELECT "name", "database_name", "schema_name", "kind", "retention_time", "is_external","is_hybrid","is_iceberg","is_event"  FROM TABLE(result_scan(last_query_id()));

-- CONFIGURE TIME TRAVEL DURATION
ALTER TABLE EMPLOYEE_PER SET DATA_RETENTION_TIME_IN_DAYS = 90;      -- valid
ALTER TABLE EMPLOYEE_TMP SET DATA_RETENTION_TIME_IN_DAYS = 2;       -- invalid
ALTER TABLE EMPLOYEE_TRN SET DATA_RETENTION_TIME_IN_DAYS = 2;       -- invalid





Comments

Popular posts from this blog

How to Install Airflow on Windows

How to Install DBT and Set Up a Project, Create Your First dbt Model

Airflow DAGs, Operators, Tasks & Providers