Organizing large queries#
Required packages
pip install jupysql matplotlib
New in version 0.4.3
Note
This is a beta feature, please join our community and let us know how we can improve it!
JupySQL allows you to break queries into multiple cells, simplifying the process of building large queries.
As an example, we are using a sales database from a record store. We’ll find the artists that have produced the largest number of Rock and Metal songs.
Let’s load some data:
import urllib.request
from pathlib import Path
if not Path("my.db").is_file():
url = "https://raw.githubusercontent.com/lerocha/chinook-database/master/ChinookDatabase/DataSources/Chinook_Sqlite.sqlite" # noqa
urllib.request.urlretrieve(url, "my.db")
Initialize the extension and set autolimit=3
so we only retrieve a few rows.
%load_ext sql
%config SqlMagic.autolimit = 3
Let’s see the track-level information:
%%sql sqlite:///my.db
SELECT * FROM Track
Done.
TrackId | Name | AlbumId | MediaTypeId | GenreId | Composer | Milliseconds | Bytes | UnitPrice |
---|---|---|---|---|---|---|---|---|
1 | For Those About To Rock (We Salute You) | 1 | 1 | 1 | Angus Young, Malcolm Young, Brian Johnson | 343719 | 11170334 | 0.99 |
2 | Balls to the Wall | 2 | 2 | 1 | None | 342562 | 5510424 | 0.99 |
3 | Fast As a Shark | 3 | 2 | 1 | F. Baltes, S. Kaufman, U. Dirkscneider & W. Hoffman | 230619 | 3990994 | 0.99 |
Let’s join track with album and artist to get the artist name and store the query using --save tracks_with_info
.
Note: --save
stores the query, not the data
%%sql --save tracks_with_info
SELECT t.*, a.title AS album, ar.Name as artist
FROM Track t
JOIN Album a
USING (AlbumId)
JOIN Artist ar
USING (ArtistId)
* sqlite:///my.db
Done.
TrackId | Name | AlbumId | MediaTypeId | GenreId | Composer | Milliseconds | Bytes | UnitPrice | album | artist |
---|---|---|---|---|---|---|---|---|---|---|
1 | For Those About To Rock (We Salute You) | 1 | 1 | 1 | Angus Young, Malcolm Young, Brian Johnson | 343719 | 11170334 | 0.99 | For Those About To Rock We Salute You | AC/DC |
2 | Balls to the Wall | 2 | 2 | 1 | None | 342562 | 5510424 | 0.99 | Balls to the Wall | Accept |
3 | Fast As a Shark | 3 | 2 | 1 | F. Baltes, S. Kaufman, U. Dirkscneider & W. Hoffman | 230619 | 3990994 | 0.99 | Restless and Wild | Accept |
Let’s subset the genres we are interested in (Rock and Metal) and save the query.
%%sql --save genres_fav
SELECT * FROM Genre
WHERE Name
LIKE '%rock%'
OR Name LIKE '%metal%'
* sqlite:///my.db
Done.
GenreId | Name |
---|---|
1 | Rock |
3 | Metal |
5 | Rock And Roll |
Now, join genres and tracks, so we only get Rock and Metal tracks.
Note that we are using --with
; this will retrieve previously saved queries, and preprend them (using CTEs), then, we save the query in track_fav
.
%%sql --with genres_fav --with tracks_with_info --save track_fav
SELECT t.*
FROM tracks_with_info t
JOIN genres_fav
ON t.GenreId = genres_fav.GenreId
* sqlite:///my.db
Done.
TrackId | Name | AlbumId | MediaTypeId | GenreId | Composer | Milliseconds | Bytes | UnitPrice | album | artist |
---|---|---|---|---|---|---|---|---|---|---|
1 | For Those About To Rock (We Salute You) | 1 | 1 | 1 | Angus Young, Malcolm Young, Brian Johnson | 343719 | 11170334 | 0.99 | For Those About To Rock We Salute You | AC/DC |
2 | Balls to the Wall | 2 | 2 | 1 | None | 342562 | 5510424 | 0.99 | Balls to the Wall | Accept |
3 | Fast As a Shark | 3 | 2 | 1 | F. Baltes, S. Kaufman, U. Dirkscneider & W. Hoffman | 230619 | 3990994 | 0.99 | Restless and Wild | Accept |
We can now use track_fav
(which contains Rock and Metal tracks). Let’s find which artists have produced the most tracks (and save the query):
%%sql --with track_fav --save top_artist
SELECT artist, COUNT(*) FROM track_fav
GROUP BY artist
ORDER BY COUNT(*) DESC
* sqlite:///my.db
Done.
artist | COUNT(*) |
---|---|
Iron Maiden | 204 |
Led Zeppelin | 114 |
U2 | 112 |
Let’s retrieve top_artist
and plot the results:
top_artist = %sql --with top_artist SELECT * FROM top_artist
top_artist.bar()
* sqlite:///my.db
Done.
<Axes: xlabel='artist', ylabel='COUNT(*)'>

We can render the full query with the %sqlrender
magic:
final = %sqlrender top_artist
print(final)
WITH genres_fav AS (SELECT * FROM Genre WHERE Name LIKE '%rock%' OR Name LIKE '%metal%'), tracks_with_info AS (SELECT t.*, a.title AS album, ar.Name AS artist FROM Track AS t JOIN Album AS a USING (AlbumId) JOIN Artist AS ar USING (ArtistId)), track_fav AS (SELECT t.* FROM tracks_with_info AS t JOIN genres_fav ON t.GenreId = genres_fav.GenreId) SELECT artist, COUNT(*) FROM track_fav GROUP BY artist ORDER BY COUNT(*) DESC
We can verify the retrieved query returns the same result:
%%sql
$final
* sqlite:///my.db
(sqlite3.OperationalError) near "$final": syntax error
[SQL: $final]
(Background on this error at: https://sqlalche.me/e/14/e3q8)