Snowflake Advanced table types - External | Iceberg | Dynamic | Hybrid | Event | Directory
Hello Data Pros, welcome back to another exciting episode of our Snowflake learning series!
In our previous video, we took a deep dive into the three main Snowflake table types: permanent, temporary, and transient. Today, we'll move forward and learn about six additional table types available within the Snowflake platform and their specific use cases!
Snowflake External Table | Snowflake Iceberg Table | Snowflake Dynamic Table | Snowflake Hybrid Table | Snowflake Event Table | Snowflake Directory Table
Let’s jump right in, and start with external tables!
External tables in Snowflake are a feature that allows you to query raw data files stored in an external stage, as if they were native tables within Snowflake.
These tables are read-only, meaning you cannot perform insert, update, or delete operations on them. However, they support select queries and can be used in joins with other Snowflake native tables.
External tables are relatively slower than native Snowflake tables. To enhance query performance, consider creating a materialized view on top of the external table. They are best suited for use cases such as cost-effectively querying rarely used data in your cloud storage, and analyzing data in your data lake without moving it into Snowflake.
Let’s get into Snowsight and see how we create external tables!
All these statements can be found via the link in the video description!
When querying an external table, you also have the option to include three additional metadata columns: METADATA$FILENAME (the name of the file the record originated from), METADATA$FILE_ROW_NUMBER (its row number within that file), and VALUE (the complete row as a JSON-style VARIANT).
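Here's a minimal sketch of what that looks like, assuming a stage named my_ext_stage pointing at CSV files in cloud storage and an external table named sales_ext; all object names and the bucket path below are placeholders for illustration, not the exact statements from the video.

CREATE OR REPLACE STAGE my_ext_stage
  URL = 's3://my-bucket/raw/sales/'           -- placeholder bucket path
  STORAGE_INTEGRATION = my_s3_integration;    -- assumes an existing storage integration

CREATE OR REPLACE EXTERNAL TABLE sales_ext (
  order_id NUMBER AS (VALUE:c1::NUMBER),      -- derive typed columns from the raw VALUE
  amount   NUMBER AS (VALUE:c2::NUMBER)
)
LOCATION = @my_ext_stage
FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1);

-- the three extra metadata columns mentioned above
SELECT METADATA$FILENAME, METADATA$FILE_ROW_NUMBER, VALUE, order_id, amount
FROM sales_ext;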
Now, you might wonder why external tables are necessary when you can directly query data from external stages.
Well, use direct querying for one-time, ad-hoc analysis, or when you need to query files sitting in an internal stage.
On the other hand, opt for external tables for repeated queries or when you require better performance, manageability, or security.
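For comparison, a direct stage query looks roughly like this sketch, assuming the same placeholder stage my_ext_stage and a named file format called my_csv_format:

SELECT t.$1 AS order_id, t.$2 AS amount
FROM @my_ext_stage (FILE_FORMAT => 'my_csv_format') t;

This works well for quick exploration, but you lose the named columns, data types, and grants that an external table gives you.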
Up Next, we’re gonna explore Iceberg Tables!
Database systems generally store their table data in their own proprietary formats. For instance, Amazon Redshift uses its own internal format, while Snowflake stores data in micro-partitions. Accessing and processing this underlying data is typically limited to the respective compute engine: Redshift data cannot be processed by Snowflake's compute engine, and vice versa.
However, Apache Iceberg is an open table format, enabling data accessibility and management across various computing engines like Spark, Trino, PrestoDB, Flink, Hive, and even Snowflake.
In Snowflake, you can create Iceberg tables that point to an external volume, a Snowflake object that references the cloud storage location where the Iceberg table's data and metadata live. Here's a sample DDL for creating an Iceberg table in Snowflake. Unlike external tables, Iceberg tables generally support all table operations, including insert, update, and delete.
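The exact statement shown on screen is linked in the description; a simplified sketch of a Snowflake-managed Iceberg table might look like this, assuming an external volume named my_iceberg_volume has already been created (all object names here are placeholders):

CREATE OR REPLACE ICEBERG TABLE customer_iceberg (
  customer_id   NUMBER,
  customer_name STRING
)
CATALOG = 'SNOWFLAKE'                   -- Snowflake acts as the Iceberg catalog
EXTERNAL_VOLUME = 'my_iceberg_volume'   -- points to your cloud storage
BASE_LOCATION = 'customer_iceberg/';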
This diagram illustrates how Iceberg table data can be accessed and managed by both Snowflake and Spark compute engines.
Moving forward, let's explore Dynamic tables.
Dynamic tables are truly fascinating; they serve an entirely different purpose! They work as a smart replacement for traditional data engineering pipelines!
In the past, data engineers relied on Streams and Tasks to define various aspects of their data transformation pipeline. This process typically involved creating individual streams for each source table to capture data changes. Subsequently, tasks were created to run at specific schedules, processing the new data received, and merging it with the target table.
We'll be covering streams and tasks more extensively in another video. However, for our current discussion, just be aware that this process is relatively complex, involving multiple steps, and can be time-consuming in terms of data engineering development.
Snowflake addresses these challenges with Dynamic tables!
Dynamic tables eliminate the need for creating streams and tasks. With a simple CTAS-style statement (CREATE DYNAMIC TABLE ... AS SELECT), you define the end state of the target table, and Snowflake smartly handles the rest.
Here is an example that illustrates a data engineering pipeline that gets data from a single source table, transforms the data, and loads it into a target table.
First, we set up this pipeline using Streams and Tasks. Please take a look at the number of code lines required.
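As a rough sketch (the table, stream, task, and warehouse names below are placeholders, and the transformation is just an example), the Streams-and-Tasks version involves something like this:

-- capture changes on the source table
CREATE OR REPLACE STREAM orders_stream ON TABLE raw_orders;

-- scheduled task that merges the captured changes into the target
CREATE OR REPLACE TASK transform_orders_task
  WAREHOUSE = transform_wh
  SCHEDULE  = '5 MINUTE'
WHEN SYSTEM$STREAM_HAS_DATA('orders_stream')
AS
  MERGE INTO curated_orders tgt
  USING (
    SELECT order_id, amount * 1.1 AS amount_with_tax   -- sample transformation
    FROM orders_stream
  ) src
  ON tgt.order_id = src.order_id
  WHEN MATCHED THEN UPDATE SET tgt.amount_with_tax = src.amount_with_tax
  WHEN NOT MATCHED THEN INSERT (order_id, amount_with_tax)
    VALUES (src.order_id, src.amount_with_tax);

-- tasks are created in a suspended state, so they must be resumed explicitly
ALTER TASK transform_orders_task RESUME;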
Next, we set up the same data pipeline using a dynamic table! As you can clearly see, with Dynamic Tables, data pipeline development becomes much simpler to set up and easier to manage.
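Here's the equivalent sketch with a dynamic table, using the same placeholder names:

CREATE OR REPLACE DYNAMIC TABLE curated_orders
  TARGET_LAG = '5 minutes'    -- how fresh the results should be
  WAREHOUSE  = transform_wh
AS
  SELECT order_id, amount * 1.1 AS amount_with_tax
  FROM raw_orders;

No streams, no tasks, no merge logic; Snowflake works out the incremental refresh on its own.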
Shifting gears, let's talk about Hybrid Tables!
As we're aware, traditional Snowflake tables store data in columnar storage, optimized primarily for read-heavy OLAP workloads. However, they are not well-suited for OLTP processing, particularly in scenarios requiring concurrent and frequent small updates.
Snowflake introduced Hybrid Tables to address these limitations!
Unlike regular tables, hybrid tables use a row-based store as the primary data store, ensuring excellent OLTP performance. Additionally, data from the primary store is internally synchronized with secondary columnar storage and cache. This secondary columnar storage also offers decent OLAP performance. This dual functionality is why they are called hybrid tables. Because they unite OLTP and OLAP workloads on a single platform, they are also referred to as Unistore tables.
In addition to row-based storage, hybrid tables support many essential features required for transactional workloads, such as row-level locking, referential integrity, UNIQUE constraints, and indexes.
Here is a sample DDL statement for creating a hybrid table named product_inventory.
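A simplified version of that DDL might look like the sketch below; the column list is an assumption for illustration, and note that hybrid tables require a primary key:

CREATE HYBRID TABLE product_inventory (
  product_id     NUMBER PRIMARY KEY,       -- primary key is mandatory for hybrid tables
  product_name   STRING NOT NULL,
  quantity       NUMBER,
  warehouse_code STRING,
  UNIQUE (product_name),                   -- enforced UNIQUE constraint
  INDEX idx_warehouse (warehouse_code)     -- secondary index for fast point lookups
);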
Up Next, we'll delve into Event Tables!
These tables are specially designed to capture and store logs and events, generated by your Stored Procedures and User Defined Functions. These events could be errors encountered during data processing, warnings, informational messages, as well as debug and trace events.
Setting up an Event Table for logging involves three steps.
First, we create the Event Table. It's important to note that when creating an Event Table, we do not specify the column names, as it already has a predefined set of columns.
Next, we associate the Event Table with the Snowflake account.
Finally, we start pushing messages to the Event Table from Stored Procedures and User Defined Functions as required.
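Sketching those three steps (the database, schema, event table, and procedure names are placeholders, and the log level setting is shown at the session level for simplicity):

-- Step 1: create the event table; no columns are listed because the schema is predefined
CREATE EVENT TABLE my_db.my_schema.my_events;

-- Step 2: make it the active event table for the account (requires ACCOUNTADMIN)
ALTER ACCOUNT SET EVENT_TABLE = my_db.my_schema.my_events;

-- make sure INFO-level messages are actually captured
ALTER SESSION SET LOG_LEVEL = 'INFO';

-- Step 3: emit a log message from a Snowflake Scripting stored procedure
CREATE OR REPLACE PROCEDURE load_orders()
RETURNS STRING
LANGUAGE SQL
AS
$$
BEGIN
  SYSTEM$LOG('info', 'load_orders started');
  -- ... transformation logic would go here ...
  RETURN 'done';
END;
$$;

CALL load_orders();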
Let's proceed to verify the contents of our event table. Yes, it has received the info log that we pushed from the stored procedure.
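The check itself is a simple query against the predefined columns, something like this sketch (my_events is the placeholder name from above):

SELECT timestamp,
       record['severity_text'] AS severity,
       value                   AS message
FROM my_db.my_schema.my_events
WHERE record_type = 'LOG'
ORDER BY timestamp DESC;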
While it's not within the scope of our current discussion, you can even configure alerts and email notifications based on the entries in this Event Table.
We've now reached the final type of table in Snowflake: Directory Tables! These are special tables that store file-level metadata about the data files in a particular stage.
We typically create other tables using explicit CREATE TABLE statements. In contrast, a directory table is added to a stage as part of the CREATE STAGE statement, or later with the ALTER STAGE command.
Similar to event tables, you don't need to specify column names for directory tables, as they come with predefined columns, including the relative file path, the file size, a timestamp of when the file was last modified, and a downloadable Snowflake file URL.
A directory table has no grantable privileges of its own! To query a directory table, you'll need to be part of a role that has either the READ or USAGE privilege on the stage itself.
Directory tables are useful for querying a list of files on a stage, and sharing unstructured data in a stage with other users using a secure view.
Both external and internal stages support directory tables. Interestingly, directory tables don't have a name of their own; instead, you pass the stage name to the DIRECTORY() function to query them and obtain a list of the files available on the stage, as shown below.
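Putting that together, here's a minimal sketch with a placeholder internal stage named docs_stage:

-- create an internal stage with its directory table enabled
CREATE OR REPLACE STAGE docs_stage
  DIRECTORY = (ENABLE = TRUE);

-- refresh the directory table so it reflects the files currently on the stage
ALTER STAGE docs_stage REFRESH;

-- query file-level metadata through the DIRECTORY() function
SELECT relative_path, size, last_modified, file_url
FROM DIRECTORY(@docs_stage);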
That's all for today! Please stay tuned for our next video where we’ll explore more advanced Snowflake features!
If you found this video helpful, please give it a thumbs up, and hit that subscribe button to stay up-to-date with the latest technologies!
We also welcome your questions or thoughts in the discussion section below!
Thanks for watching!