Parameterizing SQL queries#

Changed in version 0.7: JupySQL uses Jinja templates for enabling SQL query parametrization. Queries are parametrized with {{variable}}.

Note

The legacy formats of parametrization, namely {variable}, and $variable from ipython-sql have been deprecated. :variable is turned off by default but can be enabled with %config SqlMagic.named_parameters (requires jupysql>=0.9).

Parametrization via {{variable}}#

JupySQL supports variable expansion in the form of {{variable}}. This allows the user to write a query with placeholders that can be replaced by variables dynamically.

The benefits of using parametrized SQL queries are:

  • They can be reused with different values and for different purposes.

  • Such queries can be prepared ahead of time and reused without having to create distinct SQL queries for each scenario.

  • Parametrized queries can be used with dynamic data also.

Let’s load some data and connect to the in-memory DuckDB instance:

%load_ext sql
%sql duckdb://
%config SqlMagic.displaylimit = 3
Connecting to 'duckdb://'
from pathlib import Path
from urllib.request import urlretrieve

if not Path("penguins.csv").is_file():
    urlretrieve(
        "https://raw.githubusercontent.com/mwaskom/seaborn-data/master/penguins.csv",
        "penguins.csv",
    )

The simplest use case is to use a variable to determine which data to filter:

Data filtering#

sex = "MALE"
%%sql
SELECT *
FROM penguins.csv
WHERE  sex = '{{sex}}'
Running query in 'duckdb://'
species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex
Adelie Torgersen 39.1 18.7 181 3750 MALE
Adelie Torgersen 39.3 20.6 190 3650 MALE
Adelie Torgersen 39.2 19.6 195 4675 MALE
Truncated to displaylimit of 3.

Note that we have to add quotes around {{sex}}, since the literal is replaced.

{{variable}} parameters are not limited to WHERE clauses, you can use them anywhere:

dynamic_limit = 5
dynamic_column = "island, sex"
%sql SELECT {{dynamic_column}} FROM penguins.csv LIMIT {{dynamic_limit}}
Running query in 'duckdb://'
island sex
Torgersen MALE
Torgersen FEMALE
Torgersen FEMALE
Truncated to displaylimit of 3.

SQL generation#

Note

We use jinja to parametrize queries, to learn more about the syntax, check our their docs.

Since there are no restrictions on where you can use {{variable}} you can use it to dynamically generate SQL if you also use advanced control structures.

Let’s look at generating SQL queries using a {% for %} loop. First, we’ll create a set of unique sex values. This is required since the dataset contains samples for which sex couldn’t be determined (null).

sex = ("MALE", "FEMALE")

Then, we’ll set a list of islands of interest, and for each island calculate the average body_mass_g of all penguins belonging to that island.

%%sql --save avg_body_mass
{% set islands = ["Torgersen", "Biscoe", "Dream"] %}
select
    sex,
    {% for island in islands %}
    avg(case when island = '{{island}}' then body_mass_g end) as {{island}}_body_mass_g,
    {% endfor %}
from penguins.csv
where sex in {{sex}}
group by sex 
Running query in 'duckdb://'
sex Torgersen_body_mass_g Biscoe_body_mass_g Dream_body_mass_g
MALE 4034.782608695652 5104.518072289156 3987.0967741935483
FEMALE 3395.8333333333335 4319.375 3446.311475409836

Here’s the final compiled query:

final = %sqlcmd snippets avg_body_mass
print(final)
select
    sex,

    avg(case when island = 'Torgersen' then body_mass_g end) as Torgersen_body_mass_g,

    avg(case when island = 'Biscoe' then body_mass_g end) as Biscoe_body_mass_g,

    avg(case when island = 'Dream' then body_mass_g end) as Dream_body_mass_g,

from penguins.csv
where sex in ('MALE', 'FEMALE')
group by sex

SQL generation with macros#

If {% for %} lops are not enough, you can modularize your code generation even more with macros.

macros is a construct analogous to functions that promote re-usability. We’ll first define a macro for converting a value from millimetre to centimetre. And then use this macro in the query using variable expansion.

%%sql --save convert
{% macro mm_to_cm(column_name, precision=2) %}
    ({{ column_name }} / 10)::numeric(16, {{ precision }})
{% endmacro %}

select
  sex, island,
  {{ mm_to_cm('bill_length_mm') }} as bill_length_cm,
  {{ mm_to_cm('bill_depth_mm') }} as bill_length_cm,
from penguins.csv
Running query in 'duckdb://'
sex island bill_length_cm bill_length_cm_1
MALE Torgersen 3.91 1.87
FEMALE Torgersen 3.95 1.74
FEMALE Torgersen 4.03 1.80
Truncated to displaylimit of 3.

Let’s see the final rendered query:

final = %sqlcmd snippets convert
print(final)
select
  sex, island,

    (bill_length_mm / 10)::numeric(16, 2)
 as bill_length_cm,

    (bill_depth_mm / 10)::numeric(16, 2)
 as bill_length_cm,
from penguins.csv

Using snippets#

You can combine the snippets feature with {{variable}}:

species = "Adelie"
%%sql --save one_species --no-execute
SELECT * FROM penguins.csv
WHERE species = '{{species}}'
Running query in 'duckdb://'
Skipping execution...
%%sql
SELECT *
FROM one_species
Generating CTE with stored snippets: 'one_species'
Running query in 'duckdb://'
species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex
Adelie Torgersen 39.1 18.7 181 3750 MALE
Adelie Torgersen 39.5 17.4 186 3800 FEMALE
Adelie Torgersen 40.3 18.0 195 3250 FEMALE
Truncated to displaylimit of 3.

Important

When storing a snippet with {{variable}}, the values are replaced upon saving, so assigning a new value to variable won’t have any effect.

species = "Gentoo"
%%sql
SELECT *
FROM one_species
Generating CTE with stored snippets: 'one_species'
Running query in 'duckdb://'
species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex
Adelie Torgersen 39.1 18.7 181 3750 MALE
Adelie Torgersen 39.5 17.4 186 3800 FEMALE
Adelie Torgersen 40.3 18.0 195 3250 FEMALE
Truncated to displaylimit of 3.

Combining Python and {{variable}}#

You can combine Python code with the %sql magic to execute parametrized queries.

Let’s see how we can create multiple tables, each one containing the penguins for a given island.

for island in ("Torgersen", "Biscoe", "Dream"):
    %sql CREATE TABLE {{island}} AS (SELECT * from penguins.csv WHERE island = '{{island}}')
Running query in 'duckdb://'
Running query in 'duckdb://'
Running query in 'duckdb://'
%sqlcmd tables
Name
Torgersen
Biscoe
Dream

Let’s verify data in one of the tables:

%sql SELECT * FROM Dream;
Running query in 'duckdb://'
species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex
Adelie Dream 39.5 16.7 178 3250 FEMALE
Adelie Dream 37.2 18.1 178 3900 MALE
Adelie Dream 39.5 17.8 188 3300 FEMALE
Truncated to displaylimit of 3.
%sql SELECT * FROM Torgersen;
Running query in 'duckdb://'
species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex
Adelie Torgersen 39.1 18.7 181 3750 MALE
Adelie Torgersen 39.5 17.4 186 3800 FEMALE
Adelie Torgersen 40.3 18.0 195 3250 FEMALE
Truncated to displaylimit of 3.

Parametrization via :variable#

Changed in version 0.10.9.

There is a second method to parametrize variables via :variable. This method has the following limitations

  • Only available for SQLAlchemy connections

  • Only works for data filtering parameters (WHERE, IN, >=, etc.)

To enable it:

%config SqlMagic.named_parameters = "enabled"
sex = "MALE"
%%sql
SELECT *
FROM penguins.csv
WHERE sex = :sex
Running query in 'duckdb://'
species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex
Adelie Torgersen 39.1 18.7 181 3750 MALE
Adelie Torgersen 39.3 20.6 190 3650 MALE
Adelie Torgersen 39.2 19.6 195 4675 MALE
Truncated to displaylimit of 3.

Note that we don’t have to quote :sex. When using :variable, if variable is a string, it’ll automatically be quoted.

Here’s another example where we use the parameters for an IN and a >= clauses:

one = "Adelie"
another = "Chinstrap"
min_body_mass_g = 4500
%%sql
SELECT *
FROM penguins.csv
WHERE species IN (:one, :another)
AND body_mass_g >= :min_body_mass_g
Running query in 'duckdb://'
species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex
Adelie Torgersen 39.2 19.6 195 4675 MALE
Adelie Torgersen 42.5 20.7 197 4500 MALE
Adelie Dream 39.8 19.1 184 4650 MALE
Truncated to displaylimit of 3.

Parametrizing other parts of the query like table names or column names won’t work.

tablename = "penguins.csv"
%%sql
SELECT *
FROM :tablename
Running query in 'duckdb://'
RuntimeError: If using snippets, you may pass the --with argument explicitly.
For more details please refer: https://jupysql.ploomber.io/en/latest/compose.html#with-argument


Original error message from DB driver:
(duckdb.ParserException) Parser Error: syntax error at or near "?"
LINE 2: FROM ?
             ^
[SQL: SELECT *
FROM ?]
[parameters: ('penguins.csv',)]
(Background on this error at: https://sqlalche.me/e/20/f405)

If you need help solving this issue, send us a message: https://ploomber.io/community

Using snippets and :variable#

Unlike {{variable}, :variable parameters are evaluated at execution time, meaning you can --save a query and the output will change depending on the value of variable when the query is executed:

sex = "MALE"
%%sql --save one_sex
SELECT *
FROM penguins.csv
WHERE sex = :sex
Running query in 'duckdb://'
species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex
Adelie Torgersen 39.1 18.7 181 3750 MALE
Adelie Torgersen 39.3 20.6 190 3650 MALE
Adelie Torgersen 39.2 19.6 195 4675 MALE
Truncated to displaylimit of 3.
sex = "FEMALE"
%%sql
SELECT * FROM one_sex
Generating CTE with stored snippets: 'one_sex'
Running query in 'duckdb://'
species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex
Adelie Torgersen 39.5 17.4 186 3800 FEMALE
Adelie Torgersen 40.3 18.0 195 3250 FEMALE
Adelie Torgersen 36.7 19.3 193 3450 FEMALE
Truncated to displaylimit of 3.

Disabling named parameters#

Sometimes, valid SQL can contain instances of :x which should not be mistaken as named parameters. In this case, you may want to disable named parameters:

%config SqlMagic.named_parameters = "disabled"

This can be helpful when executing statements which include JSON or other DB-specific syntax.