Airflow Tutorial - Xcom | How to Pass data between tasks
Hello Data Pros,
In our last blog, we covered deferrable operators and triggers!
Now, it’s time to explore Airflow's X-com feature!
Let's dive right in!
By design, Airflow tasks are isolated! which means they cannot exchange data with each other at run time!
However, we frequently come across situations that require sharing data between tasks.
For instance, you might need to extract a value from a table, and based on that value, perform something in the next task!
or you may need to create a file with a dynamic name, such as one with a timestamp, and process the same file in the next task.
This is where X-com comes into play.
X-com, abbreviated as 'cross-communication,' provides a mechanism that allows tasks to exchange data with each other.
Let’s consider this example Dag. In the first task, we create a file!
And in the second task, we upload the same file to S3.
With the current setup, this process works well! because we have the 'replace' parameter set to true. Consequently, each time we run this Dag, the file overwrites the previous one in S3.
Suppose this Dag is run multiple times a day, and the business now requires preserving the history of exchange rate files.
As we already discussed, the file name created in Task One is not known to Task 2. However, we can use X-com to facilitate this communication.
The X-com-push in task 1, stores the exact file name to Airflow’s metadata database.
Subsequently, the X-com-pull in task 2, retrieves the same file name from Airflow’s metadata database.
It’s important to note that, X-coms should only be used to pass small amounts of data between tasks.
For example, file names, task metadata, dates, or single-value query results are all ideal data to use with X-com.
But if you have use cases that require sharing large amounts of data between tasks, then please consider using a custom X-com backend or intermediary data storage.
Let's get into the Airflow UI and run this Dag!
The execution was successful, and the necessary exchange rate files have been uploaded to my S3 bucket.
Even if I rerun the Dag, the existing file won't be overwritten; instead, a new file will be created.
This capability was achieved by passing the file name from Task 1 to Task 2 using X-com.
You can review the X-com details in the Airflow interface by navigating to Admin, and then X-com.
That's all for today! stay tuned for our next video where we’ll explore more advanced airflow features!
Please do like the video and subscribe to our channel.
If you’ve any questions or thoughts, please feel free to leave them in the comments section below.
Comments
Post a Comment