Posts

Showing posts from November, 2023

Airflow Tutorial - Hooks | Hooks vs Operators | airflow hooks example | When and How to use

  Hello Data Pros,  In our last blog, we uncovered the need for Airflow XComs and demonstrated how to use them effectively in your DAGs. Today, we're shifting our focus to Airflow hooks. We're going to cover what hooks are, how they differ from Airflow operators, and when and how to use hooks in your DAGs. Let's dive right in!   Technically, hooks are pre-built Python classes that simplify our interactions with external systems and services. For instance, the popular S3Hook, which is part of the AWS provider package, offers various methods to interact with S3 storage.   For example, the create_bucket method creates an Amazon S3 bucket, the load_string method loads a string value as a file in S3, and the delete_objects method can be used to delete an S3 file.   Now, let's look at the source code of this hook. As you can see, it's low-level Python code. And if this hook had not been provided, you might find yourself having to write all this complex...
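To make the three S3Hook methods mentioned in this excerpt a bit more concrete, here is a minimal sketch of how they might be called together. The helper names `make_s3_hook` and `s3_round_trip`, the connection ID `aws_default`, and the bucket and key names in the usage note are assumptions made for illustration; they are not taken from the original post. The real hook is imported lazily, so it needs the `apache-airflow-providers-amazon` package only when `make_s3_hook` is actually called.

```python
def make_s3_hook(aws_conn_id="aws_default"):
    """Build a real S3Hook (requires apache-airflow-providers-amazon)."""
    # Imported lazily so the rest of this sketch can be exercised without Airflow.
    from airflow.providers.amazon.aws.hooks.s3 import S3Hook

    return S3Hook(aws_conn_id=aws_conn_id)


def s3_round_trip(hook, bucket_name, key, text):
    """Create a bucket, store a string as an object, then delete that object."""
    # create_bucket: creates an Amazon S3 bucket.
    hook.create_bucket(bucket_name=bucket_name)
    # load_string: loads a string value as a file (object) in S3.
    hook.load_string(string_data=text, key=key, bucket_name=bucket_name, replace=True)
    # delete_objects: deletes one or more objects from the bucket.
    hook.delete_objects(bucket=bucket_name, keys=[key])
```

With a configured AWS connection, a call could look like `s3_round_trip(make_s3_hook(), "my-demo-bucket", "hello.txt", "Hello Data Pros")`. Passing the hook in as an argument is just a design choice for this sketch, so that the function can be tried out with a stand-in object before pointing it at a real bucket.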

Airflow Tutorial - XCom | How to Pass data between tasks

  Hello Data Pros,  In our last blog, we covered deferrable operators and triggers. Now, it's time to explore Airflow's XCom feature. Let's dive right in! By design, Airflow tasks are isolated, which means they cannot directly exchange data with each other at run time. However, we frequently come across situations that require sharing data between tasks.   For instance, you might need to extract a value from a table and, based on that value, perform something in the next task. Or you may need to create a file with a dynamic name, such as one with a timestamp, and process the same file in the next task.   This is where XCom comes into play. XCom, short for 'cross-communication', provides a mechanism that allows tasks to exchange data with each other.   Let's consider this example DAG. In the first task, we create a file, and in the second task, we upload the same file to S3. With the current setup, this process works well because we have the 'replace' parameter s...
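The timestamped-file scenario described in this excerpt can be sketched roughly as follows. This is not the DAG from the original post: the function names, the task IDs, the `dag_id`, and the file naming pattern are all assumptions made for illustration, and the actual S3 upload is left out. The sketch relies on two standard behaviours: a `PythonOperator` pushes its callable's return value as an XCom under the key `return_value`, and `ti.xcom_pull(task_ids=...)` reads that value in a later task. The DAG wiring assumes Airflow 2.4 or later.

```python
import os
import tempfile
from datetime import datetime


def create_file(ti=None):
    """First task: write a file with a timestamped name and return its path."""
    name = f"report_{datetime.now():%Y%m%d_%H%M%S}.txt"  # hypothetical naming pattern
    path = os.path.join(tempfile.gettempdir(), name)
    with open(path, "w") as f:
        f.write("example contents\n")
    # The returned value is pushed to XCom under the key "return_value".
    return path


def upload_file(ti):
    """Second task: pull the path pushed by create_file and read that same file."""
    path = ti.xcom_pull(task_ids="create_file")
    with open(path) as f:
        data = f.read()
    # A real DAG would upload the file to S3 here; that step is omitted in this sketch.
    return len(data)


def build_dag():
    """Wire the two callables into a DAG (requires Airflow; assumed 2.4+)."""
    from airflow import DAG
    from airflow.operators.python import PythonOperator

    with DAG(
        dag_id="xcom_file_demo",  # hypothetical DAG name
        start_date=datetime(2023, 11, 1),
        schedule=None,
        catchup=False,
    ) as dag:
        first = PythonOperator(task_id="create_file", python_callable=create_file)
        second = PythonOperator(task_id="upload_file", python_callable=upload_file)
        first >> second
    return dag
```

In a real DAG file, you would assign the result at module level, for example `dag = build_dag()`, so that the scheduler can discover it. Because the second task asks XCom for the path instead of rebuilding the name, it always picks up the exact file the first task created, even though the timestamp changes on every run.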