Posts

Databricks Medallion Architecture: Data Modeling Guide, Best practices, Standards, Examples

  Hello Data Pros, and welcome back to another exciting blog in our Databricks learning series! In our last blog, we explored various platform architectures, specifically focusing on the modern Data Lakehouse architecture and its implementation using the Delta Lake framework. But today, we're moving to the next big question: With this powerful platform architecture in place, how do we organize and model our data effectively? And that’s where the Medallion Architecture comes in! So, what exactly is Medallion Architecture? It’s a data design pattern, developed to logically structure and organize your data within a Lakehouse! Its main purpose is to progressively improve the quality and usability of your data, as it moves through different stages, such as Bronze, Silver, and Gold. Think of it as a transformation journey, where raw data is refined step by step into a polished and analysis-ready state! Some people call it a multi-hop architecture because the data flows through several tr...

Databricks Free Community Edition | Introduction to Databricks Notebooks | Databricks User Interface

helloword # Databricks notebook source # MAGIC %md # MAGIC # Basic Hello World Example # MAGIC This example shows how to print "Hello, World!" to the console. # COMMAND ---------- message = "Hello, World!" # COMMAND ---------- print ( message ) data-processing # Databricks notebook source # MAGIC %md # MAGIC In this notebook, we will perform a simple data processing workflow using PySpark. The process is divided into four main stages: # MAGIC # MAGIC 1. **Imports**: Import the necessary functions or libraries. # MAGIC 2. **Extract/Create Data**: Create a DataFrame with sample data. # MAGIC 3. **Transform Data**: Filter the DataFrame to include only individuals older than 23. # MAGIC 4. **Load/Display Data**: Display the filtered DataFrame. # COMMAND ---------- # Import Libraries from pyspark.sql.functions import col # COMMAND ---------- # Create Data data = [( "John" , 25 ), ( "Jane" , 30 ), ( "Sam" , 22 )] columns = [ "Nam...

How to transform data in Snowflake | Use Snowflake Dynamic Tables like a pro

  Hello Data Pros, and welcome back to another exciting episode of our Snowflake learning series! In our previous video, we discussed various options for ingesting data into Snowflake, including Tasks, Snowpipe, Snowpipe Streaming, and the Snowflake Connector for Kafka. Today, we’re diving into the world of data transformations, with a special focus on Dynamic Tables. Let’s jump right in!   -----Follow the link in the description for the SQLs used in this lab ------ --------------------------------------------------------------------------- ---------------- Setting Up Foundational Objects for the Lab--------------- --------------------------------------------------------------------------- -- Create the database CREATE DATABASE IF NOT EXISTS my_database ; -- Use the created database USE DATABASE my_database ; -- Create the schema CREATE SCHEMA IF NOT EXISTS my_schema ; -- Use the created schema USE SCHEMA my_schema ; -- Create the 'customers' table CREATE TABLE ...

Airflow Tutorial - Deferrable Operators & Triggers | example | Why to use | Reduce Cost

from airflow import DAG from airflow.utils.dates import days_ago from airflow.providers.amazon.aws.sensors.s3 import S3KeySensor from airflow.providers.snowflake.transfers.copy_into_snowflake import CopyFromExternalStageToSnowflakeOperator dag = DAG(     's3_to_snowflake_dag' ,     default_args = { 'start_date' : days_ago( 1 )},     schedule_interval = '0 23 * * *' ,     catchup = False ) # Wait for the file in S3 wait_for_file = S3KeySensor(     task_id = 'wait_for_s3_file' ,     bucket_name = 'sleekdata' ,     bucket_key = 'oms/employee_details.csv' ,     aws_conn_id = 'aws_conn' ,     poke_interval = 10 ,     timeout = 60 * 60 * 5 ,     soft_fail = True ,     deferrable = True ,     dag = dag ) # Load the file from S3 to Snowflake load_table = CopyFromExternalStageToSnowflakeOperator(     task_id = "load_s3_file_to_table" , ...

Snowflake Storage Layer Key Concepts | Micropartition | Columnar Storage | Partition pruning

  Hello Data Pros, and welcome back to another exciting episode of our Snowflake learning series! In our previous videos, we explored Snowflake's storage and processing layers, covering key concepts along the way! Now, let's shift our focus to the Cloud Services Layer - the mastermind behind everything that happens in Snowflake! --All commands and sample csv file in the video description-- create database SLEEKDATA ; create schema OMS ; create stage MY_STAGE ; create table MY_TABLE ( ID int , Name varchar ( 100 ), Age int , Region varchar ( 10 ) ); -- Run put command in SNOWSQL CLI -- put file://D:/mydata.csv @MY_STAGE; copy into MY_TABLE from @ MY_STAGE / mydata . csv . gz ; select * from MY_TABLE mydata.csv 1 , Alice , 25 , US 2 , Bob , 30 , EU 3 , Charlie , 28 , AP 4 , David , 35 , US 5 , Emma , 40 , EU 6 , Frank , 32 , AP 7 , Grace , 27 , US 8 , Henry , 45 , EU 9 , Ivan , 50 , AP 10 , Jasmine , 23 , US

Snowpipe & Snowpipe Streaming | Snowflake Pipe Example | kafka connector

Hello Data Pros, and welcome back to another exciting episode of our Snowflake learning series! In our previous video, we covered Snowflake Tasks, and demonstrated how you can automate processes within Snowflake. Today, we will explore  Snowpipes , a powerful feature in Snowflake that offers near real-time data ingestion. -----Follow the link in the description for the SQLs used in this lab ------ --------------------------------------------------------------------------- ---------------- Setting Up Foundational Objects for the Lab--------------- --------------------------------------------------------------------------- --Create Database and Schema CREATE OR REPLACE DATABASE order_db ; CREATE OR REPLACE SCHEMA order_db . order_schema ; -- Create Tables USE SCHEMA order_db . order_schema ; CREATE or replace TABLE order_raw (     order_id INT ,     order_date DATE ,     cust_fname VARCHAR ( 50 ),     cust_lname VARCHAR ( 50 ...