Snowflake#
Important
snowflake-sqlalchemy requires SQLAlchemy 1.x (as of version 1.4.7).
Snowflake is a cloud-based data warehousing platform that provides organizations with a powerful and flexible solution for storing, managing, and analyzing large amounts of data. Unlike traditional data warehouses, Snowflake operates entirely in the cloud, using a distributed architecture that processes and stores data across multiple computing resources.
In this guide, we’ll demonstrate how to integrate with Snowflake using JupySQL magics.
Tip
If you encounter any issues, feel free to join our community and we’ll be happy to help!
Pre-requisites#
We will need the snowflake-sqlalchemy package for connecting to the warehouse.
%pip install --upgrade snowflake-sqlalchemy 'sqlalchemy<2' --quiet
Note: you may need to restart the kernel to use updated packages.
Now let’s define the URL connection parameters and create an Engine object.
from sqlalchemy import create_engine
from snowflake.sqlalchemy import URL

# Replace each placeholder below with your own Snowflake account details
engine = create_engine(
    URL(
        drivername="driver",
        user="user",
        password="password",
        account="account",
        database="database",
        role="role",
        schema="schema",
        warehouse="warehouse",
    )
)
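If you prefer not to use the URL helper, snowflake-sqlalchemy also accepts a plain connection string of the form snowflake://<user>:<password>@<account>/<database>/<schema>?warehouse=<warehouse>&role=<role>. Below is a minimal sketch using the same placeholder values as above:
from sqlalchemy import create_engine

# Sketch: same placeholders as above; substitute your own account details
engine = create_engine(
    "snowflake://user:password@account/database/schema"
    "?warehouse=warehouse&role=role"
)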
Load sample data#
Now, let’s load the penguins dataset. We’ll convert this .csv file to a dataframe and create a table in the Snowflake database from the data.
import urllib.request

urllib.request.urlretrieve(
    "https://raw.githubusercontent.com/mwaskom/seaborn-data/master/penguins.csv",
    "penguins.csv",
)
('penguins.csv', <http.client.HTTPMessage at 0x12ea1f100>)
%load_ext sql
%sql engine --alias connection
import pandas as pd
df = pd.read_csv("penguins.csv")
connection = engine.connect()
df.to_sql(name="penguins", con=connection, index=False, if_exists="replace")
344
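Optionally, we can run a quick sanity check and read a few rows back through the same connection (a sketch using pandas.read_sql):
# Optional sanity check: fetch a few rows from the newly created table
pd.read_sql("SELECT * FROM penguins LIMIT 3", con=connection)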
Query#
List the tables in the database:
%sqlcmd tables
| Name |
|---|
| penguins |
List columns in the penguins table:
%sqlcmd columns --table penguins
| name | type | nullable | default | autoincrement | comment | primary_key |
|---|---|---|---|---|---|---|
| species | VARCHAR(16777216) | True | None | False | None | False |
| island | VARCHAR(16777216) | True | None | False | None | False |
| bill_length_mm | FLOAT | True | None | False | None | False |
| bill_depth_mm | FLOAT | True | None | False | None | False |
| flipper_length_mm | FLOAT | True | None | False | None | False |
| body_mass_g | FLOAT | True | None | False | None | False |
| sex | VARCHAR(16777216) | True | None | False | None | False |
Query our data:
%%sql
SELECT COUNT(*) FROM penguins
| COUNT(*) |
|---|
| 344 |
%%sql
SELECT species, COUNT(*) AS count
FROM penguins
GROUP BY species
ORDER BY count DESC
| species | count |
|---|---|
| Adelie | 152 |
| Gentoo | 124 |
| Chinstrap | 68 |
Parametrize queries#
JupySQL supports variable expansion in this format: {{variable}}.
dynamic_limit = 5
dynamic_column = "island, sex"
%sql SELECT {{dynamic_column}} FROM penguins LIMIT {{dynamic_limit}}
| island | sex |
|---|---|
| Torgersen | MALE |
| Torgersen | FEMALE |
| Torgersen | FEMALE |
| Torgersen | None |
| Torgersen | FEMALE |
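Variables can be expanded anywhere in the query, including the WHERE clause. The cell below is a sketch; species_filter is a name we introduce for illustration:
# Sketch: filter rows using an expanded Python variable
species_filter = "Adelie"
%sql SELECT COUNT(*) FROM penguins WHERE species = '{{species_filter}}'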
CTEs#
Using JupySQL, we can save query snippets and use them to compose larger queries. Let’s see CTEs in action:
%%sql --save no_nulls --no-execute
SELECT *
FROM penguins
WHERE body_mass_g IS NOT NULL and
sex IS NOT NULL
%%sql
SELECT island, avg(body_mass_g) as avg_body_mass_g
FROM no_nulls
GROUP BY island;
Generating CTE with stored snippets: no_nulls
| island | avg_body_mass_g |
|---|---|
| Torgersen | 3708.5106382978724 |
| Biscoe | 4719.171779141105 |
| Dream | 3718.9024390243903 |
The query gets compiled like so:
final = %sqlcmd snippets no_nulls
print(final)
WITH
SELECT *
FROM penguins
WHERE body_mass_g IS NOT NULL and
sex IS NOT NULL
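Saved snippets can also reference other saved snippets, so larger queries can be built up incrementally. The cell below is a sketch; the snippet name gentoo_only is ours, not part of the original example:
%%sql --save gentoo_only --no-execute
SELECT *
FROM no_nulls
WHERE species = 'Gentoo'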
Plotting#
%sqlplot histogram --table penguins --column bill_length_mm
<Axes: title={'center': "'bill_length_mm' from 'penguins'"}, xlabel='bill_length_mm', ylabel='Count'>
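%sqlplot supports other plot types as well; for example, a boxplot over the same table (a sketch using the same column names as above):
%sqlplot boxplot --table penguins --column body_mass_g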
Clean up#
To ensure that the Python connector closes the session properly, execute connection.close() before engine.dispose(). This prevents the garbage collector from removing the resources required to communicate with Snowflake.
connection.close()
engine.dispose()