Snowflake replication and failover
PRIMARY
--For Failback
SECONDARY
--For Failover
Hello Data Pros, and welcome back to another interesting episode of our Snowflake learning series!
In our last videos, we explored secure data sharing and collaboration features!
Today, we're shifting our focus to replication and failover! These features are essential for organizations looking for high availability and disaster recovery setups for their Snowflake deployments.
Let’s jump right in!
Replication is a Snowflake feature that enables you to replicate, and incrementally refresh databases between Snowflake accounts, typically across different regions, but within the same organization.
We must understand the difference between Snowflake's cloning, sharing, and replication features, because they are often mistaken for one another.
In terms of feature availability: Cloning is available across all editions. Sharing is available in all editions except VPS. When it comes to replication, it's available in all editions, however failover and failback are restricted to Business Critical or higher editions.
For instance, the main purposes of cloning are to quickly set up databases, schemas, and tables for testing purposes, and to provide a cost-effective method for backing up and restoring objects.
The primary purpose of sharing is to securely share your data with other Snowflake accounts, and to optionally monetize your data assets.
On the other hand, the main purposes of replication are to protect your data from regional failures, facilitate sharing across clouds and regions, and support disaster recovery and business continuity.
Let's talk about how these features operate. In the case of cloning and sharing, the underlying data is not duplicated, they operate purely at the metadata level. However, when you replicate an object or database, the underlying data is duplicated, and thereafter refreshed incrementally at scheduled intervals.
Another key difference is that cloning is limited to operations within a single Snowflake account. Sharing operates on a provider and consumer model between Snowflake accounts, typically restricted to a single region and cloud provider. In contrast, replication works between Snowflake accounts across different regions and clouds but within the same organization.
Furthermore, cloned objects are independent of the original object, and support both read and write operations.
However, the shared objects are read-only at the consumer end.
Regarding replicated databases, they are typically read-only; however, during the primary database outages, failover can be performed to enable read-write access.
We must be familiar with some important points when it comes to replication.
Objects like Temporary tables, External tables, Hybrid tables, and a few others are not replicated when the underlying database is replicated.
The database from which data is replicated is referred to as the primary database, while the database receiving the replicated data is known as the secondary database.
Failover is essentially an extension of the replication feature. When you enable only replication, the secondary database always functions as a read-only replica, you cannot promote it to a primary role or enable write operations, even during outages.
However, enabling failover for a database allows you to promote the secondary database to the primary role during an outage in the primary region, this also enables read-write access. Consequently, the original primary database becomes read-only.
Once the outage is resolved, you can revert to the original setup. This process is essentially the reverse of failover and is known as failback.
While replication is available on all Snowflake editions, failover and failback are only available on the Business Critical and higher editions.
Replication features incur data transfer, compute, and storage costs, all of which are billed to the target account, the account that hosts the secondary database.
Let's get into Snowflake UI and see how these features work.
We’ll start with basic Database Replication.
The setup typically consists of two parts: one on the primary account side, and the other on the secondary account. Please assume that I'm on the primary account, and already have a database named "employee db".
I can configure replication for this database by executing an ALTER DATABASE command. This command enables employee db replication to one-or-more secondary accounts.
Let’s proceed to the secondary account!
If you run the SHOW REPLICATION DATABASES command, you should see the "employee db" listed here.
If no results appear, replication might not be enabled for your account. In that case, you'll need to request your org-admin to enable replication.
Now, go ahead and create the secondary database as a replica of the primary database.
The database is created, but the tables have not yet been pulled from the primary database. To accomplish this, you'll need to run the ALTER DATABASE REFRESH command.
Excellent, now I can see all the tables.
We'll cover tasks later, but it's worth noting that you can automate this refresh process using tasks.
With this solid understanding of replication, let's explore failover.
In your primary account, you should run this ENABLE FAILOVER statement. This essentially authorizes your secondary account to initiate failover at any time in the future.
Imagine a few days later, your primary region experiences an outage, and you’re unable to access any resources. This is where the failover feature steps in as a lifesaver.
In your secondary account, you can simply execute this command, which promotes your secondary database to primary status, and enables read-write capabilities.
This process is known as failover, with this, your business applications can function using this new primary database, until the issue is resolved at your original primary.
Once the outage is resolved, you can return to the previous setup by executing the ALTER DATABASE PRIMARY command in your original primary account. This process is known as failback, and is the opposite of failover.
Snowflake offers more robust replication-and-failover configurations at the account level through the use of replication groups and failover groups. This approach provides several advantages over replicating individual databases.
With this, you can replicate or failover a collection of objects as a unit, including databases, shares, warehouses, users, and other relevant resources.
Additionally, you do not need to schedule separate tasks for refreshing, as this is handled as part of the replication or failover group setup.
Please take a look at these commands used to set up a replication group!
And these are the commands to set up a failover group and, if needed, initiate the failover process.
That's all for today! Please stay tuned for our next video where we’ll explore more advanced Snowflake features!
If you found this video helpful, please give it a thumbs up, and hit that subscribe button to stay up-to-date with the latest technologies!
Thanks for watching!
Comments
Post a Comment