Parameterizing SQL queries#
Changed in version 0.7: JupySQL uses Jinja templates for enabling SQL query parametrization. Queries are parametrized with {{variable}}
.
Note
The legacy formats of parametrization, namely {variable}
, and $variable
from ipython-sql
have been deprecated. :variable
is turned off by default but can be enabled with %config SqlMagic.named_parameters
(requires jupysql>=0.9
).
Parametrization via {{variable}}
#
JupySQL supports variable expansion in the form of {{variable}}
. This allows the user to write a query with placeholders that can be replaced by variables dynamically.
The benefits of using parametrized SQL queries are:
They can be reused with different values and for different purposes.
Such queries can be prepared ahead of time and reused without having to create distinct SQL queries for each scenario.
Parametrized queries can be used with dynamic data also.
Let’s load some data and connect to the in-memory DuckDB instance:
%load_ext sql
%sql duckdb://
%config SqlMagic.displaylimit = 3
from pathlib import Path
from urllib.request import urlretrieve
if not Path("penguins.csv").is_file():
urlretrieve(
"https://raw.githubusercontent.com/mwaskom/seaborn-data/master/penguins.csv",
"penguins.csv",
)
The simplest use case is to use a variable to determine which data to filter:
Data filtering#
sex = "MALE"
%%sql
SELECT *
FROM penguins.csv
WHERE sex = '{{sex}}'
species | island | bill_length_mm | bill_depth_mm | flipper_length_mm | body_mass_g | sex |
---|---|---|---|---|---|---|
Adelie | Torgersen | 39.1 | 18.7 | 181 | 3750 | MALE |
Adelie | Torgersen | 39.3 | 20.6 | 190 | 3650 | MALE |
Adelie | Torgersen | 39.2 | 19.6 | 195 | 4675 | MALE |
Note that we have to add quotes around {{sex}}
, since the literal is replaced.
{{variable}}
parameters are not limited to WHERE
clauses, you can use them anywhere:
dynamic_limit = 5
dynamic_column = "island, sex"
%sql SELECT {{dynamic_column}} FROM penguins.csv LIMIT {{dynamic_limit}}
island | sex |
---|---|
Torgersen | MALE |
Torgersen | FEMALE |
Torgersen | FEMALE |
SQL generation#
Note
We use jinja to parametrize queries, to learn more about the syntax, check our their docs.
Since there are no restrictions on where you can use {{variable}}
you can use it to dynamically generate SQL if you also use advanced control structures.
Let’s look at generating SQL queries using a {% for %}
loop. First, we’ll create a set of unique sex
values. This is required since the dataset contains samples for which sex
couldn’t be determined (null
).
sex = ("MALE", "FEMALE")
Then, we’ll set a list of islands of interest, and for each island calculate the average body_mass_g
of all penguins belonging to that island.
%%sql --save avg_body_mass
{% set islands = ["Torgersen", "Biscoe", "Dream"] %}
select
sex,
{% for island in islands %}
avg(case when island = '{{island}}' then body_mass_g end) as {{island}}_body_mass_g,
{% endfor %}
from penguins.csv
where sex in {{sex}}
group by sex
sex | Torgersen_body_mass_g | Biscoe_body_mass_g | Dream_body_mass_g |
---|---|---|---|
MALE | 4034.782608695652 | 5104.518072289156 | 3987.0967741935483 |
FEMALE | 3395.8333333333335 | 4319.375 | 3446.311475409836 |
Here’s the final compiled query:
final = %sqlcmd snippets avg_body_mass
print(final)
select
sex,
avg(case when island = 'Torgersen' then body_mass_g end) as Torgersen_body_mass_g,
avg(case when island = 'Biscoe' then body_mass_g end) as Biscoe_body_mass_g,
avg(case when island = 'Dream' then body_mass_g end) as Dream_body_mass_g,
from penguins.csv
where sex in ('MALE', 'FEMALE')
group by sex
SQL generation with macros#
If {% for %}
lops are not enough, you can modularize your code generation even more with macros.
macros is a construct analogous to functions that promote re-usability. We’ll first define a macro for converting a value from millimetre
to centimetre
. And then use this macro in the query using variable expansion.
%%sql --save convert
{% macro mm_to_cm(column_name, precision=2) %}
({{ column_name }} / 10)::numeric(16, {{ precision }})
{% endmacro %}
select
sex, island,
{{ mm_to_cm('bill_length_mm') }} as bill_length_cm,
{{ mm_to_cm('bill_depth_mm') }} as bill_length_cm,
from penguins.csv
sex | island | bill_length_cm | bill_length_cm_1 |
---|---|---|---|
MALE | Torgersen | 3.91 | 1.87 |
FEMALE | Torgersen | 3.95 | 1.74 |
FEMALE | Torgersen | 4.03 | 1.80 |
Let’s see the final rendered query:
final = %sqlcmd snippets convert
print(final)
select
sex, island,
(bill_length_mm / 10)::numeric(16, 2)
as bill_length_cm,
(bill_depth_mm / 10)::numeric(16, 2)
as bill_length_cm,
from penguins.csv
Using snippets#
You can combine the snippets feature with {{variable}}
:
species = "Adelie"
%%sql --save one_species --no-execute
SELECT * FROM penguins.csv
WHERE species = '{{species}}'
%%sql
SELECT *
FROM one_species
species | island | bill_length_mm | bill_depth_mm | flipper_length_mm | body_mass_g | sex |
---|---|---|---|---|---|---|
Adelie | Torgersen | 39.1 | 18.7 | 181 | 3750 | MALE |
Adelie | Torgersen | 39.5 | 17.4 | 186 | 3800 | FEMALE |
Adelie | Torgersen | 40.3 | 18.0 | 195 | 3250 | FEMALE |
Important
When storing a snippet with {{variable}}
, the values are replaced upon saving, so assigning a new value to variable
won’t have any effect.
species = "Gentoo"
%%sql
SELECT *
FROM one_species
species | island | bill_length_mm | bill_depth_mm | flipper_length_mm | body_mass_g | sex |
---|---|---|---|---|---|---|
Adelie | Torgersen | 39.1 | 18.7 | 181 | 3750 | MALE |
Adelie | Torgersen | 39.5 | 17.4 | 186 | 3800 | FEMALE |
Adelie | Torgersen | 40.3 | 18.0 | 195 | 3250 | FEMALE |
Combining Python and {{variable}}
#
You can combine Python code with the %sql
magic to execute parametrized queries.
Let’s see how we can create multiple tables, each one containing the penguins for a given island
.
for island in ("Torgersen", "Biscoe", "Dream"):
%sql CREATE TABLE {{island}} AS (SELECT * from penguins.csv WHERE island = '{{island}}')
%sqlcmd tables
Name |
---|
Torgersen |
Biscoe |
Dream |
Let’s verify data in one of the tables:
%sql SELECT * FROM Dream;
species | island | bill_length_mm | bill_depth_mm | flipper_length_mm | body_mass_g | sex |
---|---|---|---|---|---|---|
Adelie | Dream | 39.5 | 16.7 | 178 | 3250 | FEMALE |
Adelie | Dream | 37.2 | 18.1 | 178 | 3900 | MALE |
Adelie | Dream | 39.5 | 17.8 | 188 | 3300 | FEMALE |
%sql SELECT * FROM Torgersen;
species | island | bill_length_mm | bill_depth_mm | flipper_length_mm | body_mass_g | sex |
---|---|---|---|---|---|---|
Adelie | Torgersen | 39.1 | 18.7 | 181 | 3750 | MALE |
Adelie | Torgersen | 39.5 | 17.4 | 186 | 3800 | FEMALE |
Adelie | Torgersen | 40.3 | 18.0 | 195 | 3250 | FEMALE |
Parametrization via :variable
#
Changed in version 0.10.9.
There is a second method to parametrize variables via :variable
. This method has the following limitations
Only available for SQLAlchemy connections
Only works for data filtering parameters (
WHERE
,IN
,>=
, etc.)
To enable it:
%config SqlMagic.named_parameters = "enabled"
sex = "MALE"
%%sql
SELECT *
FROM penguins.csv
WHERE sex = :sex
species | island | bill_length_mm | bill_depth_mm | flipper_length_mm | body_mass_g | sex |
---|---|---|---|---|---|---|
Adelie | Torgersen | 39.1 | 18.7 | 181 | 3750 | MALE |
Adelie | Torgersen | 39.3 | 20.6 | 190 | 3650 | MALE |
Adelie | Torgersen | 39.2 | 19.6 | 195 | 4675 | MALE |
Note that we don’t have to quote :sex
. When using :variable
, if variable
is a string, it’ll automatically be quoted.
Here’s another example where we use the parameters for an IN
and a >=
clauses:
one = "Adelie"
another = "Chinstrap"
min_body_mass_g = 4500
%%sql
SELECT *
FROM penguins.csv
WHERE species IN (:one, :another)
AND body_mass_g >= :min_body_mass_g
species | island | bill_length_mm | bill_depth_mm | flipper_length_mm | body_mass_g | sex |
---|---|---|---|---|---|---|
Adelie | Torgersen | 39.2 | 19.6 | 195 | 4675 | MALE |
Adelie | Torgersen | 42.5 | 20.7 | 197 | 4500 | MALE |
Adelie | Dream | 39.8 | 19.1 | 184 | 4650 | MALE |
Parametrizing other parts of the query like table names or column names won’t work.
tablename = "penguins.csv"
%%sql
SELECT *
FROM :tablename
RuntimeError: If using snippets, you may pass the --with argument explicitly.
For more details please refer: https://jupysql.ploomber.io/en/latest/compose.html#with-argument
Original error message from DB driver:
(duckdb.ParserException) Parser Error: syntax error at or near "?"
LINE 2: FROM ?
^
[SQL: SELECT *
FROM ?]
[parameters: ('penguins.csv',)]
(Background on this error at: https://sqlalche.me/e/20/f405)
If you need help solving this issue, send us a message: https://ploomber.io/community
Using snippets and :variable
#
Unlike {{variable}
, :variable
parameters are evaluated at execution time, meaning you can --save
a query and the output will change depending on the value of variable
when the query is executed:
sex = "MALE"
%%sql --save one_sex
SELECT *
FROM penguins.csv
WHERE sex = :sex
species | island | bill_length_mm | bill_depth_mm | flipper_length_mm | body_mass_g | sex |
---|---|---|---|---|---|---|
Adelie | Torgersen | 39.1 | 18.7 | 181 | 3750 | MALE |
Adelie | Torgersen | 39.3 | 20.6 | 190 | 3650 | MALE |
Adelie | Torgersen | 39.2 | 19.6 | 195 | 4675 | MALE |
sex = "FEMALE"
%%sql
SELECT * FROM one_sex
species | island | bill_length_mm | bill_depth_mm | flipper_length_mm | body_mass_g | sex |
---|---|---|---|---|---|---|
Adelie | Torgersen | 39.5 | 17.4 | 186 | 3800 | FEMALE |
Adelie | Torgersen | 40.3 | 18.0 | 195 | 3250 | FEMALE |
Adelie | Torgersen | 36.7 | 19.3 | 193 | 3450 | FEMALE |
Disabling named parameters#
Sometimes, valid SQL can contain instances of :x
which should not be mistaken as named parameters.
In this case, you may want to disable named parameters:
%config SqlMagic.named_parameters = "disabled"
This can be helpful when executing statements which include JSON or other DB-specific syntax.