Posts

Showing posts from October, 2024

Databricks Medallion Architecture: Data Modeling Guide, Best practices, Standards, Examples

Hello Data Pros, and welcome back to another post in our Databricks learning series! In the last post, we explored various platform architectures, focusing on the modern Data Lakehouse architecture and its implementation with the Delta Lake framework. Today we move on to the next big question: with this platform architecture in place, how do we organize and model our data effectively? That’s where the Medallion Architecture comes in. So, what exactly is the Medallion Architecture? It’s a data design pattern for logically structuring and organizing data within a Lakehouse. Its main purpose is to progressively improve the quality and usability of data as it moves through successive layers: Bronze, Silver, and Gold. Think of it as a transformation journey in which raw data is refined step by step into a polished, analysis-ready state. Some people call it a multi-hop architecture because the data flows through several tr...
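To make the Bronze, Silver, and Gold progression concrete, here is a minimal sketch in plain Python. It is only an illustration under stated assumptions: the order records, the field names (`order_id`, `customer`, `amount`), and the helper functions `to_silver` and `to_gold` are all hypothetical, and ordinary Python lists stand in for the Delta tables you would normally use in a Lakehouse.

```python
from collections import defaultdict

# Bronze: raw records kept exactly as they arrived (hypothetical sample data).
bronze = [
    {"order_id": "1", "customer": " alice ", "amount": "10.50"},
    {"order_id": "2", "customer": "Bob", "amount": "20.00"},
    {"order_id": "2", "customer": "Bob", "amount": "20.00"},           # duplicate row
    {"order_id": "3", "customer": "alice", "amount": "not-a-number"},  # unparseable amount
    {"order_id": "4", "customer": "ALICE", "amount": "4.50"},
]


def to_silver(rows):
    """Bronze -> Silver: parse types, normalize names, drop bad and duplicate rows."""
    seen_ids = set()
    silver_rows = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # skip rows whose amount cannot be parsed
        order_id = int(row["order_id"])
        if order_id in seen_ids:
            continue  # keep only the first occurrence of each order
        seen_ids.add(order_id)
        silver_rows.append(
            {
                "order_id": order_id,
                "customer": row["customer"].strip().lower(),
                "amount": amount,
            }
        )
    return silver_rows


def to_gold(rows):
    """Silver -> Gold: aggregate into an analysis-ready total per customer."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["customer"]] += row["amount"]
    return dict(totals)


silver = to_silver(bronze)
gold = to_gold(silver)
print(gold)  # {'alice': 15.0, 'bob': 20.0}
```

Each hop reads only from the layer before it, so the raw Bronze records stay untouched and the Silver and Gold layers can be rebuilt at any time if the cleaning or aggregation rules change.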

Databricks Free Community Edition | Introduction to Databricks Notebooks | Databricks User Interface

helloword

# Databricks notebook source
# MAGIC %md
# MAGIC # Basic Hello World Example
# MAGIC This example shows how to print "Hello, World!" to the console.

# COMMAND ----------

message = "Hello, World!"

# COMMAND ----------

print(message)

data-processing

# Databricks notebook source
# MAGIC %md
# MAGIC In this notebook, we will perform a simple data processing workflow using PySpark. The process is divided into four main stages:
# MAGIC
# MAGIC 1. **Imports**: Import the necessary functions or libraries.
# MAGIC 2. **Extract/Create Data**: Create a DataFrame with sample data.
# MAGIC 3. **Transform Data**: Filter the DataFrame to include only individuals older than 23.
# MAGIC 4. **Load/Display Data**: Display the filtered DataFrame.

# COMMAND ----------

# Import Libraries
from pyspark.sql.functions import col

# COMMAND ----------

# Create Data
data = [("John", 25), ("Jane", 30), ("Sam", 22)]
columns = ["Nam...