Quick Start

Quick Start#

JupySQL allows you to run SQL and plot large datasets in Jupyter via a %sql, %%sql, and %sqlplot magics. JupySQL is compatible with all major databases (e.g., PostgreSQL, MySQL, SQL Server), data warehouses (e.g., Snowflake, BigQuery, Redshift), and embedded engines (SQLite, and DuckDB).

It is a fork of ipython-sql with many bug fixes and a lot of great new features!

Installation#

Run this on your terminal (we’ll use DuckDB for this example):

pip install jupysql duckdb-engine

Or the following in a Jupyter notebook:

%pip install jupysql duckdb-engine --quiet

Note: you may need to restart the kernel to use updated packages.

You might also install it from conda:

conda install jupysql -c conda-forge

Setup#

Tip

If you are unfamiliar with Jupyter magics, you can refer to our FAQ. Also, you can view the documentation and command line arguments of any magic command by running %magic? like %sql? or %sqlplot?.

Load the extension:

%load_ext sql

Let’s download some sample .csv data:

from pathlib import Path
from urllib.request import urlretrieve

if not Path("penguins.csv").is_file():
    urlretrieve(
        "https://raw.githubusercontent.com/mwaskom/seaborn-data/master/penguins.csv",
        "penguins.csv",
    )

Start a DuckDB in-memory database:

%sql duckdb://

Connecting to 'duckdb://'

Tip

You can create as many connections as you want. Pass an --alias {alias} to easily switch them or close them.

Querying#

For short queries, you can write them in a single line via the %sql line magic:

%sql SELECT * FROM penguins.csv LIMIT 3

Running query in 'duckdb://'

species	island	bill_length_mm	bill_depth_mm	flipper_length_mm	body_mass_g	sex
Adelie	Torgersen	39.1	18.7	181	3750	MALE
Adelie	Torgersen	39.5	17.4	186	3800	FEMALE
Adelie	Torgersen	40.3	18.0	195	3250	FEMALE

For longer queries, you can break them down into multiple lines using the %%sql cell magic:

%%sql
SELECT *
FROM penguins.csv
WHERE bill_length_mm > 40
LIMIT 3

Running query in 'duckdb://'

species	island	bill_length_mm	bill_depth_mm	flipper_length_mm	body_mass_g	sex
Adelie	Torgersen	40.3	18.0	195	3250	FEMALE
Adelie	Torgersen	42.0	20.2	190	4250	None
Adelie	Torgersen	41.1	17.6	182	3200	FEMALE

Saving queries#

%%sql --save not_nulls --no-execute
SELECT *
FROM penguins.csv
WHERE bill_length_mm IS NOT NULL
AND bill_depth_mm IS NOT NULL

Running query in 'duckdb://'

Skipping execution...

Plotting#

%sqlplot boxplot --column bill_length_mm bill_depth_mm --table not_nulls

<Axes: title={'center': "Boxplot from 'not_nulls'"}>

_images/1016d01aae2bff0e6676bcb3e4de833e8ca0e53433a67db541a823ed9b237aa9.png

%sqlplot histogram --column bill_length_mm bill_depth_mm --table not_nulls

<Axes: title={'center': "Histogram from 'not_nulls'"}, ylabel='Count'>

_images/e8359a93cc78fe2ecf5712d238bf6d4c3ca80ac982a49942acfb3e5828be6f04.png

`pandas` integration#

result = %sql SELECT * FROM penguins.csv

Running query in 'duckdb://'

df = result.DataFrame()

df.head()

	species	island	bill_length_mm	bill_depth_mm	flipper_length_mm	body_mass_g	sex
0	Adelie	Torgersen	39.1	18.7	181.0	3750.0	MALE
1	Adelie	Torgersen	39.5	17.4	186.0	3800.0	FEMALE
2	Adelie	Torgersen	40.3	18.0	195.0	3250.0	FEMALE
3	Adelie	Torgersen	NaN	NaN	NaN	NaN	NaN
4	Adelie	Torgersen	36.7	19.3	193.0	3450.0	FEMALE

Quick Start

Contents

Quick Start#

Installation#

Setup#

Querying#

Saving queries#

Plotting#

pandas integration#

`pandas` integration#