Databricks Free Community Edition | Introduction to Databricks Notebooks | Databricks User Interface
hello-world
data-processing
Hello Data Pros, and welcome back to our Databricks learning series!
In our previous video, we covered the fundamentals of Databricks and discussed its key features and architecture. Today, we're diving into something more hands-on. We’ll walk you through how to set up a free Databricks account for practice and create your first Databricks notebook.
To get started, open your web browser and go to the official Databricks website at databricks.com. Once you're on the homepage, look for the “Try Databricks” button in the top right corner and click it. This’ll bring you to a sign-up page. Here, you need to enter your name, email address, and some basic details before clicking continue.
On the next page, you’ll be asked how you’ll be using Databricks; you have two options here! You can sign up for a 14-day trial with full features, or you can opt for the Community Edition, which is available for an unlimited duration but comes with some limitations. If you're looking to learn the basics and get some hands-on experience, the free Community Edition is definitely a great choice. Once you’re familiar with the core features, you can move on to the 14-day trial to explore the advanced ones.
For a trial account, you must have an account with one of the three major cloud providers: AWS, Microsoft Azure, or Google Cloud. On the other hand, a cloud account is not mandatory for the Community Edition.
Having said that, let’s sign up for the Community Edition for now. You might be presented with a simple puzzle to prove that you are a real person. After that, you should receive a verification email. Note that your email address will serve as your username.
Click on the link in the verification email to verify your account and create a new password. Please make sure to create a password that contains at least one uppercase letter, one lowercase letter, one symbol, and one number.
Congratulations! You have reached the Databricks main user interface!
This page gives you access to the core features. On the left side, you’ll find a navigation bar with various key components, including Workspace, where you can organize your notebooks and files.
Recents, which shows your recently accessed notebooks.
Search, allowing you to find specific notebooks or files quickly.
Catalog, where you can manage your data assets such as databases and tables.
Workflows, for orchestrating and scheduling your data processes and jobs.
Compute, where you create and manage your compute clusters.
and Machine Learning Experiments, designed to help you create, track, and manage your machine learning experiments through MLflow.
Before you can run anything on Databricks, you must first create a compute cluster. The Compute page allows you to create and manage these resources. There are two types of compute: all-purpose compute and job compute. We’ll explore the differences between them in more detail later, but for now, just know that job compute is used for running scheduled jobs, while all-purpose compute is typically used for all other tasks.
Let’s go ahead and create a new all-purpose compute resource.
Please feel free to customize the name and runtime version as needed. Once you're done, go ahead and click ‘Create Compute’.
The compute resource will start automatically after it's created. It may take a few moments, so please be patient.
Now that the compute resource is running, it's time for us to create our first Databricks notebook!
Please select Workspace.
The Shared folder is intended for collaboration among all users in the workspace. Files and notebooks placed here can be accessed by any user with the appropriate permissions. On the other hand, the Users Folder is a personal space for each user in the workspace. It contains files, notebooks, and resources specific to an individual user. Other users typically do not have access to this folder unless explicitly shared.
Let’s create a folder named demo for better file organization. Within this folder, I’ll create a notebook named ‘Hello World’.
Notebooks are the heart of Databricks, where you can write and execute code. Each notebook consists of cells, which can be either code cells, where you write your executable Spark code, or text cells, where you can document your code or add markdown text. Note that markdown cells are not executed when you run the notebook. Cells are executed one after another in sequence, from top to bottom.
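For example, in a Python notebook you can turn any individual cell into a text cell with the %md magic command. Here’s a minimal sketch of the two cell types, shown as two separate cells:

%md
Text cells like this one hold formatted documentation and are skipped at run time.

print("Code cells hold executable code")  # runs when the notebook runs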
At the top, you can change the default language for the notebook if needed. Now, let's go ahead and run this Hello World example.
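In our case, the default language is Python, and the cell holds a single line (the exact message text is just our example):

print("Hello World!")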
Great! The cell executed successfully, and the message was printed as expected.
Before we conclude this video, I’d like to demonstrate another notebook in which we’ll perform a simple data processing exercise.
Let’s create the first cell as a documentation or text cell.
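In a Python notebook, a text cell simply starts with the %md magic command; something like this:

%md
Simple data processing demo: build a small DataFrame of people and keep only those older than 23.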
Next, we’ll import the necessary functions for our data operations.
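Assuming a standard Databricks Python notebook, the only helper we need from PySpark is the col function:

from pyspark.sql.functions import col  # lets us reference columns inside filter expressions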
Following that, we’ll define our sample data and create a DataFrame.
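Here’s a minimal sketch; the names and ages are made-up sample values, and spark is the SparkSession that Databricks pre-creates in every notebook:

# Hypothetical sample rows of (name, age)
data = [("Alice", 25), ("Bob", 22), ("Charlie", 30), ("Diana", 21)]
df = spark.createDataFrame(data, ["name", "age"])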
Now, we’ll filter our data to keep only those individuals older than 23.
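Using the col function we imported above:

filtered_df = df.filter(col("age") > 23)  # keep only rows where age is greater than 23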
Finally, we’ll display our filtered data.
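Databricks notebooks ship with a built-in display() function that renders a DataFrame as an interactive table (df.show() would also work, but prints plain text):

display(filtered_df)

With the sample values above, only Alice and Charlie should appear in the output.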
Let’s run this and validate the output.
Great, this is perfect!
Please note that each cell can be executed individually if needed. While it’s possible to place all executable code lines in a single cell and achieve the same outcome, cells help organize your code in a more structured and efficient way, making it easier to manage and understand!
That's all for today! Please stay tuned for our next video, where we’ll explore more Databricks topics!
If you found this video helpful, please give it a thumbs up, and hit that subscribe button to stay notified of our latest videos.
One last reminder: always make sure a running compute cluster is attached to your notebook before executing any code.
Thanks for watching!