Snowflake Zero copy Cloning | How to Clone Database Schema Table View Stage | Examples | Tutorial

 

Hello Data Pros, and welcome back to another exciting episode of our Snowflake learning series!

In our last video, we explained what time travel is! and demonstrated how it works with appropriate examples!

Today, we'll be exploring cloning; most widely known as zero-copy cloning within the Snowflake ecosystem!

Let’s jump right in!



-----------------------------------------------------------------------------------------------------------
--                                         SNOWFLAKE CLONING LAB                                         --
--               All commands/statements can be found via the link in the Video Description              --
-----------------------------------------------------------------------------------------------------------

-- Create a database & Schema
CREATE DATABASE demo_db;
CREATE SCHEMA demo_db.demo_schema;
CREATE SCHEMA demo_db.another_demo_schema;

-- Switch to the newly created database and schema
USE DATABASE demo_db;
USE SCHEMA demo_db.demo_schema;

-- Create Table
CREATE TABLE customers (
    cust_id INT,
    name VARCHAR(255),
    city VARCHAR(255)
);

-- Create Table
CREATE TABLE orders (
    order_id INT,
    order_date DATE,
    order_amt DECIMAL(10, 2)
);

-- Insert 5 new records into the customers table
INSERT INTO customers (cust_id, name, city) VALUES
(1, 'John Doe', 'New York'),
(2, 'Jane Smith', 'Los Angeles'),
(3, 'Michael Johnson', 'Chicago'),
(4, 'Emily Davis', 'New York'),
(5, 'David Brown', 'San Francisco');

-- Retrieve the current timestamp, so that we can demo cloning with Time Travel later
SELECT CURRENT_TIMESTAMP();
--2024-06-08 08:20:52.700 -0700

-- Create View for New York customers
CREATE VIEW customers_new_york AS
SELECT * FROM customers WHERE city = 'New York';

-- Create Named Internal Stage
CREATE STAGE my_internal_stg;

-- Create Named External Stage
CREATE STAGE my_s3_external_stg
  URL='s3://s3explore/'
  CREDENTIALS=(AWS_KEY_ID='AKIA5U56BPGP3TLNXQ4S' AWS_SECRET_KEY='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');




 
-----------------------------------------------------------------------------------------------------------






-- Clone the entire Database and all supported child objects
CREATE DATABASE demo_db_clone CLONE demo_db;
DROP DATABASE demo_db_clone;

-- Clone the entire schema and all supported child objects
USE DATABASE demo_db;
CREATE SCHEMA demo_schema_clone CLONE demo_schema;
DROP SCHEMA demo_schema_clone;

-- Clone an individual object (table in this example)
USE SCHEMA demo_db.demo_schema;
CREATE TABLE customers_clone CLONE customers;

-- Select from both tables to see they have the same records
SELECT * FROM customers;
SELECT * FROM customers_clone;

-- Both tables can be independently managed (inserted, updated, or deleted) without causing any conflicts
INSERT INTO customers (cust_id, name, city) VALUES
(6, 'Alice Johnson', 'Boston'),
(7, 'Tom Brown', 'Miami'),
(8, 'Sarah Wilson', 'Seattle');

-- Select from both tables to see the above 3 records only affected the customers table
SELECT * FROM customers;
SELECT * FROM customers_clone;

-- Insert/update/delete records in customers_clone
INSERT INTO customers_clone (cust_id, name, city) VALUES
(10, 'Robert Green', 'Denver');

-- Select from both tables to see the above 1 record only affected the customers_clone table
SELECT * FROM customers;
SELECT * FROM customers_clone;

-- Time Travel and Cloning Together: Clone an object as it existed at a specific timestamp in the past
-- I have demonstrated with tables; you can use Time Travel to clone databases and schemas
CREATE TABLE customers_clone_after_first_inserts CLONE customers
     BEFORE (TIMESTAMP => '2024-06-08 08:20:52.700 -0700'::timestamp_tz);

-- Fun fact: you can indeed clone an already cloned object!
CREATE TABLE customers_clone2 CLONE customers_clone;


SELECT CURRENT_REGION();







As the name implies, cloning is a feature in Snowflake that efficiently creates an exact copy of objects such as databases, schemas, tables, and more.

Unlike cloning feature offered by other traditional databases, Snowflake does not physically duplicate the underlying data when cloning objects.

 

Let’s understand it further, imagine an employee table with 5 micro-partitions. When we clone this table, no changes occur in the storage layer. Instead, within the metadata service of the cloud services layer, another table named "employee-clone" is created, pointing to the existing micro-partitions of the employee table.

 

Since cloning is a metadata operation, and most importantly, since the underlying data is not physically duplicated, this process is popularly known as zero-copy cloning.

 

Immediately after cloning, both the base table and cloned table share the same micro-partitions, and if you run a query against, they return the same results.

 

However, please be aware that from the user perspective, these are 2 different tables, we can independently insert, update, or delete records from these two tables.

For instance, if you insert few records into the "employee-cloned" table, new micro-partitions are created, but they are associated only with the "employee-cloned" table.

 

Likewise, if some records are updated or deleted from the employee table, certain micro partitions are end-dated, and new ones are added, but that only affects the employee table, not for the cloned table.

 

It's also worth noting that the same micro-partition could be marked with an end date for one table, while it remains active for the other table.

 

 

There are some key considerations to keep in mind when cloning objects in Snowflake.

 

Snowflake supports cloning individual objects such as tables, stages, file formats, streams, tasks, etc.

In addition, container objects like databases and schemas can also be cloned.

For container objects, cloning is recursive, which means cloning a database will clone all the schemas and other supported child objects within it.

Similarly, cloning a schema will clone all the supported child objects within the schema.

 

Some objects, such as external tables and internal named stages, are never cloned, even if they are part of a database or schema that you clone.

 

On the other hand, objects such as views, UDFs, and procedures, cannot be cloned individually. However, they are implicitly cloned when you clone the schema or database they belong to.

 

With all this understanding, let's dive into the Snowflake UI and explore these cloning features in action.

 

To begin, we'll set up a database and a couple of schemas within it.

Following this, we'll switch to the newly created database and schema to carry out further operations.

Next, we'll create two tables: customers and orders, and populate the customers table with some initial data.

Let's just note down the current timestamp at this stage, so that I can later demonstrate cloning with Time Travel.

To make this demonstration broader, we'll also create a view, an internal stage, and an external stage.

So far, we've laid the groundwork by setting up the necessary objects to illustrate cloning across various scenarios.

 

Now, moving on to the cloning process, first, we clone the entire database demo-db!

When a database is cloned, all schemas and supported child objects within the schema are cloned. It's worth noting that internal named stages are not cloned as part of this. This behavior aligns with the fact that internal named stages are among the objects that do not support cloning.

 

Let's consider a scenario where I choose not to clone the entire database, but instead want to clone only a specific schema!

In this case, I create demo schema clone from demo schema. This action clones only that specific schema, along with its supported child objects.

 

But what if I do not want to clone all objects within my schema as well! But rather prefer to clone only a specific object or perhaps just a table.

Snowflake supports cloning individual objects too. For instance, this command creates a clone of the customer table with the name 'customers-clone'.

 

As we said earlier, immediately after cloning, both the original table and the cloned table, reference the same micropartitions!

If you query the tables, you will observe the same set of records in both.

 

However, it's important to note that from the user's perspective, both of these tables are considered completely separate objects. Users can independently insert, update, or delete records from these two tables without affecting the other.

 

Let’s witness this in action! I’ll insert 3 new rows into the original customers table.

 

Upon running these queries, it becomes apparent that the 3 new records are only reflected in the customers table, while the customers clone table remains unaffected.

 

Similarly, let’s attempt to insert one record into customers-clone, and rerun the select statements.

 

This aligns perfectly with our expectations!

Despite sharing some micro-partitions internally, the operations on the cloned table do not affect the original table, and vice versa.

 

Furthermore, you can indeed use the time travel feature while cloning a database, schema, or table, to clone an object as it existed at a specific timestamp in the past.

 

As we wrap up this video, here's a fun fact: you can even create a cloned object from a previously cloned one.

 

Overall, cloning enables quick backups and recovery, facilitates the rapid setup of testing environments, and offers numerous other benefits —all without physically duplicating the identical data, leading to significant reductions in storage costs!

 

That's all for today! Please stay tuned for our next video where we’ll explore more advanced Snowflake features!

If you found this video helpful, please give it a thumbs up, and hit that subscribe button to stay up-to-date with the latest technologies!

Thanks for watching!





Comments

Popular posts from this blog

How to Install Airflow on Windows

Airflow DAGs, Operators, Tasks & Providers

How to Install DBT and Set Up a Project, Create Your First dbt Model