Parameterizing SQL queries#

Changed in version 0.7: JupySQL uses Jinja templates for enabling SQL query parametrization. Queries are parametrized with {{variable}}.

Note

The legacy formats of parametrization, namely {variable}, and $variable from ipython-sql have been deprecated. :variable is turned off by default but can be enabled with %config SqlMagic.named_parameters (requires jupysql>=0.9).

Parametrization via {{variable}}#

JupySQL supports variable expansion in the form of {{variable}}. This allows the user to write a query with placeholders that can be replaced by variables dynamically.

The benefits of using parametrized SQL queries are:

  • They can be reused with different values and for different purposes.

  • Such queries can be prepared ahead of time and reused without having to create distinct SQL queries for each scenario.

  • Parametrized queries can be used with dynamic data also.

Let’s load some data and connect to the in-memory DuckDB instance:

%load_ext sql
%sql duckdb://
%config SqlMagic.displaylimit = 3
from pathlib import Path
from urllib.request import urlretrieve

if not Path("penguins.csv").is_file():
    urlretrieve(
        "https://raw.githubusercontent.com/mwaskom/seaborn-data/master/penguins.csv",
        "penguins.csv",
    )

The simplest use case is to use a variable to determine which data to filter:

Data filtering#

sex = "MALE"
%%sql
SELECT *
FROM penguins.csv
WHERE  sex = '{{sex}}'
Running query in 'duckdb://'
species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex
Adelie Torgersen 39.1 18.7 181 3750 MALE
Adelie Torgersen 39.3 20.6 190 3650 MALE
Adelie Torgersen 39.2 19.6 195 4675 MALE
ResultSet : to convert to pandas, call .DataFrame() or to polars, call .PolarsDataFrame()
Truncated to displaylimit of 3
If you want to see more, please visit displaylimit configuration

Note that we have to add quotes around {{sex}}, since the literal is replaced.

{{variable}} parameters are not limited to WHERE clauses, you can use them anywhere:

dynamic_limit = 5
dynamic_column = "island, sex"
%sql SELECT {{dynamic_column}} FROM penguins.csv LIMIT {{dynamic_limit}}
Running query in 'duckdb://'
island sex
Torgersen MALE
Torgersen FEMALE
Torgersen FEMALE
ResultSet : to convert to pandas, call .DataFrame() or to polars, call .PolarsDataFrame()
Truncated to displaylimit of 3
If you want to see more, please visit displaylimit configuration

SQL generation#

Note

We use jinja to parametrize queries, to learn more about the syntax, check our their docs.

Since there are no restrictions on where you can use {{variable}} you can use it to dynamically generate SQL if you also use advanced control structures.

Let’s look at generating SQL queries using a {% for %} loop. First, we’ll create a set of unique sex values. This is required since the dataset contains samples for which sex couldn’t be determined (null).

sex = ("MALE", "FEMALE")

Then, we’ll set a list of islands of interest, and for each island calculate the average body_mass_g of all penguins belonging to that island.

%%sql --save avg_body_mass
{% set islands = ["Torgersen", "Biscoe", "Dream"] %}
select
    sex,
    {% for island in islands %}
    avg(case when island = '{{island}}' then body_mass_g end) as {{island}}_body_mass_g,
    {% endfor %}
from penguins.csv
where sex in {{sex}}
group by sex 
Running query in 'duckdb://'
sex Torgersen_body_mass_g Biscoe_body_mass_g Dream_body_mass_g
MALE 4034.782608695652 5104.518072289156 3987.0967741935483
FEMALE 3395.8333333333335 4319.375 3446.311475409836
ResultSet : to convert to pandas, call .DataFrame() or to polars, call .PolarsDataFrame()

Here’s the final compiled query:

final = %sqlcmd snippets avg_body_mass
print(final)
select
    sex,
    
    avg(case when island = 'Torgersen' then body_mass_g end) as Torgersen_body_mass_g,
    
    avg(case when island = 'Biscoe' then body_mass_g end) as Biscoe_body_mass_g,
    
    avg(case when island = 'Dream' then body_mass_g end) as Dream_body_mass_g,
    
from penguins.csv
where sex in ('MALE', 'FEMALE')
group by sex

SQL generation with macros#

If {% for %} lops are not enough, you can modularize your code generation even more with macros.

macros is a construct analogous to functions that promote re-usability. We’ll first define a macro for converting a value from millimetre to centimetre. And then use this macro in the query using variable expansion.

%%sql --save convert
{% macro mm_to_cm(column_name, precision=2) %}
    ({{ column_name }} / 10)::numeric(16, {{ precision }})
{% endmacro %}

select
  sex, island,
  {{ mm_to_cm('bill_length_mm') }} as bill_length_cm,
  {{ mm_to_cm('bill_depth_mm') }} as bill_length_cm,
from penguins.csv
Running query in 'duckdb://'
sex island bill_length_cm bill_length_cm_1
MALE Torgersen 3.91 1.87
FEMALE Torgersen 3.95 1.74
FEMALE Torgersen 4.03 1.80
ResultSet : to convert to pandas, call .DataFrame() or to polars, call .PolarsDataFrame()
Truncated to displaylimit of 3
If you want to see more, please visit displaylimit configuration

Let’s see the final rendered query:

final = %sqlcmd snippets convert
print(final)
select
  sex, island,
  
    (bill_length_mm / 10)::numeric(16, 2)
 as bill_length_cm,
  
    (bill_depth_mm / 10)::numeric(16, 2)
 as bill_length_cm,
from penguins.csv

Using snippets#

You can combine the snippets feature with {{variable}}:

species = "Adelie"
%%sql --save one_species --no-execute
SELECT * FROM penguins.csv
WHERE species = '{{species}}'
Running query in 'duckdb://'
Skipping execution...
%%sql
SELECT *
FROM one_species
Generating CTE with stored snippets: 'one_species'
Running query in 'duckdb://'
species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex
Adelie Torgersen 39.1 18.7 181 3750 MALE
Adelie Torgersen 39.5 17.4 186 3800 FEMALE
Adelie Torgersen 40.3 18.0 195 3250 FEMALE
ResultSet : to convert to pandas, call .DataFrame() or to polars, call .PolarsDataFrame()
Truncated to displaylimit of 3
If you want to see more, please visit displaylimit configuration

Important

When storing a snippet with {{variable}}, the values are replaced upon saving, so assigning a new value to variable won’t have any effect.

species = "Gentoo"
%%sql
SELECT *
FROM one_species
Generating CTE with stored snippets: 'one_species'
Running query in 'duckdb://'
species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex
Adelie Torgersen 39.1 18.7 181 3750 MALE
Adelie Torgersen 39.5 17.4 186 3800 FEMALE
Adelie Torgersen 40.3 18.0 195 3250 FEMALE
ResultSet : to convert to pandas, call .DataFrame() or to polars, call .PolarsDataFrame()
Truncated to displaylimit of 3
If you want to see more, please visit displaylimit configuration

Combining Python and {{variable}}#

You can combine Python code with the %sql magic to execute parametrized queries.

Let’s see how we can create multiple tables, each one containing the penguins for a given island.

for island in ("Torgersen", "Biscoe", "Dream"):
    %sql CREATE TABLE {{island}} AS (SELECT * from penguins.csv WHERE island = '{{island}}')
Running query in 'duckdb://'
Running query in 'duckdb://'
Running query in 'duckdb://'
%sqlcmd tables
Name
Torgersen
Biscoe
Dream

Let’s verify data in one of the tables:

%sql SELECT * FROM Dream;
Running query in 'duckdb://'
species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex
Adelie Dream 39.5 16.7 178 3250 FEMALE
Adelie Dream 37.2 18.1 178 3900 MALE
Adelie Dream 39.5 17.8 188 3300 FEMALE
ResultSet : to convert to pandas, call .DataFrame() or to polars, call .PolarsDataFrame()
Truncated to displaylimit of 3
If you want to see more, please visit displaylimit configuration
%sql SELECT * FROM Torgersen;
Running query in 'duckdb://'
species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex
Adelie Torgersen 39.1 18.7 181 3750 MALE
Adelie Torgersen 39.5 17.4 186 3800 FEMALE
Adelie Torgersen 40.3 18.0 195 3250 FEMALE
ResultSet : to convert to pandas, call .DataFrame() or to polars, call .PolarsDataFrame()
Truncated to displaylimit of 3
If you want to see more, please visit displaylimit configuration

Parametrization via :variable#

New in version 0.9.

There is a second method to parametrize variables via :variable. This method has the following limitations

  • Only available for SQLAlchemy connections

  • Only works for data filtering parameters (WHERE, IN, >=, etc.)

To enable it:

%config SqlMagic.named_parameters = True
sex = "MALE"
%%sql
SELECT *
FROM penguins.csv
WHERE sex = :sex
Running query in 'duckdb://'
species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex
Adelie Torgersen 39.1 18.7 181 3750 MALE
Adelie Torgersen 39.3 20.6 190 3650 MALE
Adelie Torgersen 39.2 19.6 195 4675 MALE
ResultSet : to convert to pandas, call .DataFrame() or to polars, call .PolarsDataFrame()
Truncated to displaylimit of 3
If you want to see more, please visit displaylimit configuration

Note that we don’t have to quote :sex. When using :variable, if variable is a string, it’ll automatically be quoted.

Here’s another example where we use the parameters for an IN and a >= clauses:

one = "Adelie"
another = "Chinstrap"
min_body_mass_g = 4500
%%sql
SELECT *
FROM penguins.csv
WHERE species IN (:one, :another)
AND body_mass_g >= :min_body_mass_g
Running query in 'duckdb://'
species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex
Adelie Torgersen 39.2 19.6 195 4675 MALE
Adelie Torgersen 42.5 20.7 197 4500 MALE
Adelie Dream 39.8 19.1 184 4650 MALE
ResultSet : to convert to pandas, call .DataFrame() or to polars, call .PolarsDataFrame()
Truncated to displaylimit of 3
If you want to see more, please visit displaylimit configuration

Parametrizing other parts of the query like table names or column names won’t work.

tablename = "penguins.csv"
%%sql
SELECT *
FROM :tablename
Running query in 'duckdb://'
RuntimeError: If using snippets, you may pass the --with argument explicitly.
For more details please refer: https://jupysql.ploomber.io/en/latest/compose.html#with-argument


Original error message from DB driver:
(duckdb.ParserException) Parser Error: syntax error at or near "?"
LINE 2: FROM ?
             ^
[SQL: SELECT *
FROM ?]
[parameters: ('penguins.csv',)]
(Background on this error at: https://sqlalche.me/e/20/f405)

If you need help solving this issue, send us a message: https://ploomber.io/community

Using snippets and :variable#

Unlike {{variable}, :variable parameters are evaluated at execution time, meaning you can --save a query and the output will change depending on the value of variable when the query is executed:

sex = "MALE"
%%sql --save one_sex
SELECT *
FROM penguins.csv
WHERE sex = :sex
Running query in 'duckdb://'
species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex
Adelie Torgersen 39.1 18.7 181 3750 MALE
Adelie Torgersen 39.3 20.6 190 3650 MALE
Adelie Torgersen 39.2 19.6 195 4675 MALE
ResultSet : to convert to pandas, call .DataFrame() or to polars, call .PolarsDataFrame()
Truncated to displaylimit of 3
If you want to see more, please visit displaylimit configuration
sex = "FEMALE"
%%sql
SELECT * FROM one_sex
Generating CTE with stored snippets: 'one_sex'
Running query in 'duckdb://'
species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex
Adelie Torgersen 39.5 17.4 186 3800 FEMALE
Adelie Torgersen 40.3 18.0 195 3250 FEMALE
Adelie Torgersen 36.7 19.3 193 3450 FEMALE
ResultSet : to convert to pandas, call .DataFrame() or to polars, call .PolarsDataFrame()
Truncated to displaylimit of 3
If you want to see more, please visit displaylimit configuration