Posts

Showing posts from February, 2024

Snowflake Overview and Its Unique Architecture

Hello Data Pros, welcome to our exciting new learning series focused on Snowflake! In this blog, we'll provide an overview of Snowflake and break down its unique architecture. Let's dive right in and explore the product that has revolutionized the way organizations manage their data!

So what exactly is Snowflake, and why is there so much hype surrounding it? Well, I would define Snowflake as a cloud-based data platform, offered as a convenient Software-as-a-Service solution. Each component of this definition holds significant importance.

Let's begin with the cloud-based aspect. Many database systems available today were initially created for on-premises deployment and only later adapted for cloud environments. Snowflake, on the other hand, was born in the cloud, and this inherent cloud-native design enables it to integrate seamlessly with a wide array of other cloud services.

Next, let's dig into the concept of a data platform. While Snowflake is of…

About us

Welcome to SleekData, the ultimate hub for all things data! Whether you're a data enthusiast, analyst, engineer, scientist, or aspiring data professional, you've come to the perfect place. On this channel, we dive deep into the world of data technologies, helping you unlock insights and make informed decisions. We strongly believe that learning should be simple, free, and designed to save your valuable time, and that's exactly why we've started this new initiative: a mission to make in-demand, high-paying data skills accessible to everyone. If you could share our videos and channel with your friends, we would be so grateful! Join us as we explore the power of Azure, AWS, Big Data, dbt (data build tool), Snowflake, Databricks, Terraform, Power BI, and more. So, if you're ready to level up your data skills and unleash the full potential of your data stack, hit that subscribe button and ring the notification bell. Let's embark on this exciting learning adventure together!

Apache Airflow TaskFlow API | Airflow Decorators

Hello Data Pros, welcome back to another episode of our Apache Airflow series! Today, we're taking a deep dive into the world of the TaskFlow API.

The TaskFlow API was introduced in Airflow 2.0 to simplify the DAG authoring experience by significantly reducing the number of code lines developers need to write. This also makes the code more modular, understandable, and easily maintainable.

The TaskFlow API is built on the concept of Python decorators, so to understand the TaskFlow API, you first need to know the basics of decorators. Think of a decorator as a special function that takes another function as its input and returns a modified version of that original function. It's like wrapping a gift: the decorator adds extra features to the original function. A decorator is applied using the @ symbol above a function definition. In our case, 'decorator_example' decorates the 'say_hello' function. Let's execute and observe…
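The excerpt cuts off before the code listing, so here is a minimal sketch of what the 'decorator_example' / 'say_hello' pair described above might look like; the function bodies and print messages are illustrative assumptions, not the post's actual code:

```python
# A minimal decorator sketch; only the names come from the transcript,
# the bodies are illustrative assumptions.
def decorator_example(func):
    def wrapper():
        print("Before calling the original function")
        func()  # call the original, undecorated function
        print("After calling the original function")
    return wrapper

@decorator_example  # equivalent to: say_hello = decorator_example(say_hello)
def say_hello():
    print("Hello, Data Pros!")

say_hello()
```

And since the same decorator idea is what powers the TaskFlow API, here is a hedged sketch of a TaskFlow-style DAG in Airflow 2.x; the DAG id, task names, and task logic are assumptions for illustration:

```python
# A minimal TaskFlow-style DAG sketch (Airflow 2.x); task logic is illustrative.
from datetime import datetime
from airflow.decorators import dag, task

@dag(schedule=None, start_date=datetime(2024, 2, 1), catchup=False)
def taskflow_demo():
    @task
    def extract():
        return {"value": 42}  # returned data is passed downstream via XCom

    @task
    def transform(payload: dict) -> int:
        return payload["value"] * 2

    @task
    def load(result: int):
        print(f"Loaded result: {result}")

    load(transform(extract()))  # dependencies inferred from the function calls

taskflow_demo()
```

Notice how the dependencies are inferred simply from calling one decorated function with another's output, instead of wiring operators together by hand.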

Manage Flow of Tasks - Airflow Tutorial: Trigger Rules, Conditional Branching, Setup/Teardown, Latest Only, Depends On Past

Hello Data Pros, and welcome back to another exciting episode of our Apache Airflow series! Today, we'll explore how to manage the flow of tasks in Airflow, a critical step in orchestrating efficient data pipelines.

With the default Airflow settings, a task is executed only when all of its upstream dependencies complete successfully. However, in real-world projects, customizing this default behaviour becomes essential to address a vast number of use cases.

For example, you might need to dynamically pick and run a specific branch depending on the outcome of a preceding task, while skipping the remaining branches. The BranchPythonOperator facilitates this by letting you select a branch through a user-defined Python function. Within this function, you implement the logic to determine the appropriate branch, and you should ensure that it returns the task ID of the downstream task to be executed next, as shown in the sketch below.

All the code lines I've demonstrated…
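The excerpt ends before the code, so here is a minimal, hedged sketch of conditional branching with the BranchPythonOperator; the DAG id, task IDs, and weekday-based branching logic are assumptions for illustration:

```python
# A minimal branching sketch; DAG id, task ids, and the choose-branch
# logic are illustrative assumptions, not the post's actual code.
from datetime import datetime
from airflow import DAG
from airflow.operators.python import BranchPythonOperator
from airflow.operators.empty import EmptyOperator

def choose_branch(**context):
    # Return the task_id of the downstream task that should run;
    # every other branch is marked as skipped.
    if datetime.now().weekday() < 5:
        return "weekday_path"
    return "weekend_path"

with DAG(
    dag_id="branching_demo",
    start_date=datetime(2024, 2, 1),
    schedule=None,
    catchup=False,
) as dag:
    branch = BranchPythonOperator(
        task_id="pick_branch",
        python_callable=choose_branch,
    )
    weekday_path = EmptyOperator(task_id="weekday_path")
    weekend_path = EmptyOperator(task_id="weekend_path")

    branch >> [weekday_path, weekend_path]
```

One design note worth remembering: if the branches later reconverge on a common downstream task, that join task usually needs a relaxed trigger rule such as none_failed_min_one_success, because the default all_success rule would see the skipped branch and skip the join as well, which ties branching directly back to the trigger rules covered in this post.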