Snowflake Overview and Its Unique Architecture

Hello Data Pros, and welcome to our exciting new learning series focused on Snowflake!

In this blog, we'll provide an overview of Snowflake, and break down its unique architecture!

Let’s dive right in, and explore the product that has revolutionized the way organizations manage their data!

 

So what exactly is Snowflake, and why is there so much hype surrounding it?

 

Well, I would define Snowflake as a cloud-based data platform, offered as a convenient Software as a Service solution.




Each component of this definition matters.

 

Let's begin with the cloud-based aspect. Many database systems available today were initially created for on-premises use and only later adapted for cloud environments.

Snowflake, on the other hand, was born in the cloud. This cloud-native design enables Snowflake to integrate seamlessly with a wide array of other cloud services.

 

Next, let's dig into the concept of a data platform. While Snowflake is often categorized as a data warehouse, its capabilities extend far beyond traditional warehouse functionalities, positioning it as a sophisticated data platform.

 

For instance, within Snowflake's framework, you can set up a comprehensive data lake environment. Through its stage objects, Snowflake can store a variety of data, whether structured, semi-structured, or unstructured. You're not required to define schemas upfront.

Furthermore, Snowflake provides native support for processing popular data formats such as JSON, Avro, ORC, Parquet, and XML.
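
To get a feel for what "no upfront schema" means, here is a minimal Python sketch. It is not Snowflake code; it only mimics the idea of keeping raw JSON as it arrived and picking out fields at read time. The sample records and the get_path helper are made up for illustration.

```python
import json

# Hypothetical raw events, kept exactly as they arrived (no table schema defined).
raw_events = [
    '{"user": {"id": 1, "city": "Pune"}, "action": "login"}',
    '{"user": {"id": 2}, "action": "purchase", "amount": 49.5}',
]

def get_path(record, path):
    """Walk a dotted path such as 'user.city'; return None if any key is missing."""
    value = record
    for key in path.split("."):
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value

# The structure is interpreted only when the data is read (schema-on-read).
parsed = [json.loads(line) for line in raw_events]
cities = [get_path(event, "user.city") for event in parsed]
amounts = [get_path(event, "amount") for event in parsed]
print(cities)
print(amounts)
```

Notice that the two records do not share the same fields, and nothing breaks; a missing field simply comes back as None.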

Snowflake also lets you run data engineering and data science workloads inside the platform, eliminating the need to move data out of it. With features like Snowpipe, streams, and tasks, Snowflake enables efficient data loading and transformation, including the construction of robust ETL and ELT pipelines.

 

Additionally, with Snowpark ML, users can design end-to-end machine learning workflows directly within the Snowflake environment.

These are just a few of the factors that make Snowflake a data platform; there are many more, which we'll cover in our subsequent videos.

 

Moving on to its SaaS aspect: Snowflake operates as a software-as-a-service offering. This means you no longer need to worry about infrastructure planning, data storage, operating system patches, software upgrades, performance optimizations, or query tuning. Snowflake handles it all for you!

This streamlined approach allows you and your team to focus on driving business objectives, without the burden of managing technical challenges.

 

Now that we have an understanding of what Snowflake is, it’s time to break down its architecture.

 

When it comes to parallel data processing, there are two main types of architecture: shared-disk and shared-nothing.

In a shared-disk architecture, multiple processors share the same storage. Imagine a set of compute nodes, each with its own CPU and memory, all reading from and writing to one central disk.

Conversely, in a shared-nothing architecture, each processor has its own dedicated resources. This is like a group of machines, each with its own processor and disk, connected through a high-bandwidth network and working together as one platform.
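
Here is a small Python sketch of the shared-nothing idea, under simplifying assumptions: the "nodes" are just Python lists, the node count and sample rows are hypothetical, and rows are assigned to nodes by a simple modulo on the key. Each node works only on the rows it owns, and the partial results are combined at the end.

```python
from collections import defaultdict

NUM_NODES = 3  # hypothetical cluster size

def partition(rows, num_nodes):
    """Assign each row to exactly one node, so every node owns its own slice of the data."""
    nodes = defaultdict(list)
    for key, amount in rows:
        nodes[key % num_nodes].append((key, amount))
    return nodes

def local_sum(node_rows):
    """The work a single node does, using only its own data."""
    return sum(amount for _, amount in node_rows)

rows = [(1, 10), (2, 20), (3, 30), (4, 40), (5, 50)]
nodes = partition(rows, NUM_NODES)

# Each node computes a partial result; a final step combines them.
partials = {node_id: local_sum(node_rows) for node_id, node_rows in nodes.items()}
total = sum(partials.values())
print(partials)
print(total)
```

Because every row lives on exactly one node, adding a node in this model means adding both its processor and its slice of storage together.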

 

Both of these architectures have their own pros and cons.

For example, a shared-disk architecture provides tightly coupled resources, making it easy to set up and manage, and it's also cost-effective.

However, it's not as scalable or fault-tolerant as a shared-nothing architecture.

 

On the other hand, shared-nothing architecture can scale to virtually unlimited sizes. However, since storage disks and processors are bundled on each machine, you cannot independently scale them based on your needs. If you want to scale one, you must scale both and pay for both.

 

This is where Snowflake's unique architecture comes into play.

Snowflake takes a hybrid approach, known as a multi-cluster shared data architecture.

In this approach, the data storage layer is decoupled from the processing layer. The storage can independently scale up or down based on the size of the data hosted on the platform.

You can access and process this data using processing engines available in varying sizes, depending on the workload at hand. You can even turn off this processing layer completely when it's not in use, and during that time you will only pay for storage.
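
To see why this decoupling matters for your bill, here is a toy cost model in Python. The rates are made-up placeholders, not Snowflake pricing, and the monthly_cost function is purely illustrative. The only point is that storage and compute are charged separately, so switching compute off leaves just the storage charge.

```python
# All rates below are hypothetical placeholders, not Snowflake pricing.
STORAGE_RATE_PER_TB = 25.0   # assumed monthly cost per TB stored
COMPUTE_RATE_PER_HOUR = 3.0  # assumed cost per hour a processing engine is running

def monthly_cost(storage_tb, compute_hours):
    """Storage and compute are billed separately, so each can change on its own."""
    storage_cost = storage_tb * STORAGE_RATE_PER_TB
    compute_cost = compute_hours * COMPUTE_RATE_PER_HOUR
    return storage_cost + compute_cost

# Same amount of data in both months; only the compute usage differs.
busy_month = monthly_cost(storage_tb=10, compute_hours=200)
idle_month = monthly_cost(storage_tb=10, compute_hours=0)
print(busy_month, idle_month)
```

In a shared-nothing setup, by contrast, the idle month would still carry the cost of the machines, because the disks holding the data are attached to them.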

 

Let's zoom in further. Snowflake's architecture consists of three key layers: Database Storage, Query Processing, and Cloud Services.




 

When data is loaded into Snowflake, it is reorganized into Snowflake's proprietary, optimized columnar format. This optimized data is then stored in a cloud storage service such as Amazon S3, Azure Blob Storage, or Google Cloud Storage, depending on the cloud provider chosen during account setup.
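
Snowflake's actual storage format is proprietary, so the following Python sketch only illustrates the general idea of a columnar layout. The sample rows and the to_columns helper are hypothetical.

```python
# Hypothetical rows as they might arrive from a source file.
rows = [
    {"order_id": 1, "region": "EU", "amount": 120.0},
    {"order_id": 2, "region": "US", "amount": 80.0},
    {"order_id": 3, "region": "EU", "amount": 50.0},
]

def to_columns(rows):
    """Pivot row-oriented records into one list per column."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns

columns = to_columns(rows)

# An aggregate over one column touches only that column's values.
total_amount = sum(columns["amount"])
print(columns)
print(total_amount)
```

Keeping each column's values together is what lets an analytical query read only the columns it needs, and similar values stored side by side also tend to compress well.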

 

Snowflake takes full responsibility for managing the data storage, including partitioning and compression. These data objects are not directly visible or accessible to customers; instead, they are accessed only through SQL query operations.

 

SQL query execution occurs in the processing layer, which uses virtual warehouses. Each virtual warehouse is an MPP (massively parallel processing) compute cluster, provisioned by Snowflake from the selected cloud provider.

Also, each virtual warehouse operates as an independent compute cluster, and hence the performance of one virtual warehouse remains unaffected by the activities of others.

 

The cloud services layer acts as the brain of Snowflake. It is a collection of services that coordinate activities across Snowflake, tying its components together to process user requests, from login to query dispatch.

 

Critical services managed within the cloud services layer include infrastructure management, query parsing and optimization, metadata management, authentication, and access control.

 

Overall, this unique architecture gives Snowflake its own distinctive strengths, making it a powerful choice for various data applications.

 

That's all for today! Please stay tuned for our next video where we’ll explore more advanced Snowflake features!

Please do like the video and subscribe to our channel!

If you have any questions or thoughts, please feel free to leave them in the comments section below!

Thanks for watching!





 

 

 
