Snowflake Time Travel & Fail-safe | What is Continuous data protection | How to | with Examples

 

Hello Data Pros, and welcome back to another exciting episode of our Snowflake learning series!

In our last video, we explored various types of Snowflake views and their practical applications, demonstrated with easy-to-follow examples!

Today, we're diving deeper into Snowflake's data protection features, specifically Time Travel and Fail-safe!


-----------------------------------------------------------------------------------------------------------
--                                        SNOWFLAKE TIME TRAVEL LAB                                      --
--               All commands/statements can be found via the link in the Video Description              --
-----------------------------------------------------------------------------------------------------------

-- Create a database
CREATE database demo_db;

-- Create a schema within the database
CREATE SCHEMA demo_db.demo_schema;

-- Switch to the newly created database and schema
USE DATABASE demo_db;
USE SCHEMA demo_db.demo_schema;

-- Create a permanent table within the schema
CREATE TABLE my_table (
    emp_id INT PRIMARY KEY,
    full_name VARCHAR(100),
    salary DECIMAL(10, 2)
);

-- Insert some rows into the table
INSERT INTO demo_db.demo_schema.my_table (emp_id, full_name, salary) VALUES (1, 'Alice Smith', 50000);
INSERT INTO demo_db.demo_schema.my_table (emp_id, full_name, salary) VALUES (2, 'Bob Johnson', 60000);
INSERT INTO demo_db.demo_schema.my_table (emp_id, full_name, salary) VALUES (3, 'Charlie Brown', 55000);
INSERT INTO demo_db.demo_schema.my_table (emp_id, full_name, salary) VALUES (4, 'Daisy Clark', 58000);
INSERT INTO demo_db.demo_schema.my_table (emp_id, full_name, salary) VALUES (5, 'Eve Davis', 62000);

-- Query table data
SELECT * FROM my_table;

-- Retrieve the current timestamp
SELECT CURRENT_TIMESTAMP();
--2024-05-31 03:54:09.742 -0700

-- Update a row in the table (change last name of Alice Smith)
UPDATE my_table
SET full_name = 'Alice Johnson'
WHERE emp_id = 1;

-- Delete a row from the table
DELETE FROM my_table WHERE emp_id = 2;

-- Query table data
SELECT * FROM my_table;

-----------------------------------------------------------------------------------------------------------
--                                    TIME TRAVEL WHILE QUERYING                                         --
-----------------------------------------------------------------------------------------------------------

-- Select table records as of a specific time in the past
SELECT * FROM my_table AT (TIMESTAMP => '2024-05-31 03:54:09.742 -0700'::timestamp_tz);

-- Select table records as they existed 300 seconds (5 minutes) ago
SELECT * FROM my_table AT(OFFSET => -60*5);

-- Select table records as they existed before a particular SQL statement
SELECT * FROM my_table BEFORE (STATEMENT => '01b4b1d4-3202-9972-0001-bc96000f4126');


-----------------------------------------------------------------------------------------------------------
--                                   TIME TRAVEL WHILE CLONING                                          --
-----------------------------------------------------------------------------------------------------------

-- Create a clone of my_table as it existed before a specific SQL statement
-- While the STATEMENT clause is used here, you can also use TIMESTAMP or OFFSET for cloning

CREATE TABLE my_restored_table CLONE my_table BEFORE (STATEMENT => '01b4b1d4-3202-9972-0001-bc96000f4126');

-- Query cloned table
SELECT * FROM my_restored_table;


-----------------------------------------------------------------------------------------------------------
--                         TIME TRAVEL AFTER ACCIDENTALLY DROPPING AN OBJECT                             --
-----------------------------------------------------------------------------------------------------------
-- While the UNDROP TABLE statement is used here, you can also UNDROP DATABASE and SCHEMA.

-- Drop the table
DROP TABLE my_table;

-- Query Dropped table
SELECT * FROM my_table;

-- Undrop the table
UNDROP TABLE my_table;



--------------------------------------------------------------------------------------------------------
--                    VERIFY TIME TRAVEL DURATION OF YOUR TABLE AND CHANGE AS NEEDED                   --
--------------------------------------------------------------------------------------------------------

-- CHECK YOUR TABLE TYPE AND TIME TRAVEL DURATION
SHOW TABLES LIKE '%my_table%';
SELECT "name", "database_name", "schema_name", "kind", "retention_time"  FROM TABLE(result_scan(last_query_id()));

-- CHANGE THE DEFAULT TIME TRAVEL DURATION
ALTER TABLE my_table SET DATA_RETENTION_TIME_IN_DAYS = 90;





In this digital world, data is the lifeblood and the most valued asset for any company. However, it is highly vulnerable to accidental loss and incorrect modifications. This reality emphasizes the critical need for robust data protection and recovery mechanisms within a data platform. Hence, Snowflake provides two key data protection features: Time Travel and Fail Safe, addressing this critical need.

To understand how these data protection features work in Snowflake, you must first be familiar with micro-partitions, how they are stored, and how modifications to existing table data are managed within this micro-partitions layer.

Well, we’ve already covered these topics in our initial videos, so I recommend watching our Snowflake playlist linked in the video description below, especially the video on key concepts of the storage layer.

Essentially, Time travel is a feature that allows you to go back in time, and recover table data as of specific dates or times in the past. With Time Travel enabled, you can revert your accidental or incorrect DML operations, or you can even recover an entirely dropped table, schema, or database.

If your data changes fall within the time travel duration, you can access and restore it yourself without involving Snowflake support.

In addition to this time travel feature, Snowflake also supports a failsafe period of 7 days, providing an additional layer of data protection. During this period, you cannot recover the data by yourself, but you can contact Snowflake support to retrieve data as needed.

Let's take a closer look at micro-partitioning and columnar storage using this example!

This diagram is intended only as a small-scale conceptual representation! in real-world data projects, tables often consist of numerous micro-partitions, with each one containing a substantial number of records.

As evident in this diagram, each micro-partition belongs to a single table, and each row belongs to a single micro-partition!

Considering the disk-read-order is left-to-right, you can see that column values are stored contiguously close to each other. This type of data storage is called columnar storage, which is quite different from the row-oriented storage represented by the leftmost image in this diagram.

Snowflake's micro-partitions are immutable, meaning once written, they are never modified. You might be curious about what happens when a record is updated! Well, Snowflake typically maintains a start date and end date for each micro-partition as part of its metadata. When a record is updated, Snowflake sets an end date for the entire micro-partition containing the updated record, and then creates a new micro-partition with the updated values. This mechanism enables Snowflake to support Time-Travel and Fail-Safe features.

It's worth noting that micro-partitions without an end-date are retained for-ever. However, those with end-date are kept until the end-date plus the Time-Travel and Fail-Safe periods elapse, post which they are permanently deleted.

Please note that retaining micro-partitions beyond their end date will result in additional storage costs!

So as a data expert, understanding how to select the right table type, and how to decide optimum time travel retention days for your tables is crucial. The following flowchart offers a high-level overview to help you identify the optimal table type and potential cost implications.

At a high level, as you move from top to bottom, the retention period of end-dated micro-partitions increases, along with the associated costs!

 

Let's explore different ways to utilize the time-travel feature in Snowflake!

You can query a table as it existed at a specific time in the past by specifying the exact timestamp.

With an offset, you can query a table as it existed 5 minutes earlier.

You can get the query ID of any DML statement, and query the table exactly as it existed before that DML operation was performed.

We’ll talk about cloning in detail in another video, but for now, please be aware that we can indeed clone tables, schemas, and databases as they existed at some point in the past.

Another important Time Travel statement is "un-drop." Yes, if you accidentally dropped your table, schema, or database, you can simply restore it using these "un-drop" statements.

 

It's time for a deep dive into Snowflake's UI, and witness all these time travel aspects in action!

To start with we’re creating a database, schema, table, and loading some data in the table.

If I query the table, I get all 5 rows that I inserted.

I'm recording the current timestamp so that I can demonstrate the time travel to this point later.

After noting down the timestamp, I'm updating the last name of Alice from Smith to Johnson, and I'm also deleting the row with employee ID 2.

If I query the table, I only get 4 rows, and the last name of Alice has been changed because of the DML statements above.

 

But what if you now realize that the DML operations you performed were incorrect? This is where Snowflake's robust time travel features come into play.

With time travel, you can easily query a historical snapshot of your table based on a timestamp.

This allows you to see all the records that were originally inserted, without the effects of the incorrect updates and deletes that were performed.

But time travel isn't limited to just direct timestamps. You can also use the OFFSET clause to travel back in time by seconds, minutes, hours, or even days.

Additionally, you can get the query ID for the DML operation from your query history.

And precisely time travel to the state before that DML operation was executed.

 

In addition to querying, you can also create or clone new tables using these time travel features.

While I've demonstrated this with the STATEMENT clause, TIMESTAMP and OFFSET-based time travel can also be used for cloning.

 

Furthermore, you may find yourself in a situation where you have accidentally dropped an entire table, schema, or even a database. That’s exactly why Snowflake supports Un-drop statements. With Un-drop, you can restore your objects to their state as they existed before they were dropped!

 

Please take a look at these SQL statements, which allow you to check the time travel retention days for your specific table. You can also change the default time travel duration to up to 90 days based on your requirements.

Now that we have a solid understanding of time travel, let's quickly understand how fail-safe works!

The main difference between time travel and fail-safe is that customers can perform time travel themselves to recover data as needed. However, to recover data using fail-safe, you must engage Snowflake support.

If the data you need to recover falls outside of your time travel window, but it’s within the 7-day fail-safe period, then you can reach out to Snowflake support for data recovery.

 

Please take a look at the high-level steps involved:

Firstly, the customer submits a support case, detailing the objects and the desired recovery time.

Next, the Snowflake support team assesses the feasibility of recovery. They carefully analyze the request to determine if recovery is possible.

Once feasibility is confirmed, the support team provides a script. This script typically contains create table statements with snowflake internal object IDs.

Finally, the customer executes the provided script with the appropriate role and virtual warehouse. This completes the fail-safe process, offering an extra layer of protection beyond the time travel duration.

 

That's all for today! Please stay tuned for our next video where we’ll explore more advanced Snowflake features!

If you found this video helpful, please give it a thumbs up, and hit that subscribe button to stay up-to-date with the latest technologies!

We also welcome your questions or thoughts in the discussion section below!

Thanks for watching!

 

 

 


Comments

Popular posts from this blog

How to Install Airflow on Windows

Airflow DAGs, Operators, Tasks & Providers

How to Install DBT and Set Up a Project, Create Your First dbt Model