Organizing large queries#

Required packages
pip install jupysql matplotlib

New in version 0.4.3

Note

This is a beta feature, please join our community and let us know how we can improve it!

JupySQL allows you to break queries into multiple cells, simplifying the process of building large queries.

As an example, we are using a sales database from a record store. We’ll find the artists that have produced the largest number of Rock and Metal songs.

Let’s load some data:

import urllib.request
from pathlib import Path

if not Path("my.db").is_file():
    url = "https://raw.githubusercontent.com/lerocha/chinook-database/master/ChinookDatabase/DataSources/Chinook_Sqlite.sqlite"  # noqa
    urllib.request.urlretrieve(url, "my.db")

Initialize the extension and set autolimit=3 so we only retrieve a few rows.

%load_ext sql
%config SqlMagic.autolimit = 3

Let’s see the track-level information:

%%sql sqlite:///my.db
SELECT * FROM Track
Done.
TrackId Name AlbumId MediaTypeId GenreId Composer Milliseconds Bytes UnitPrice
1 For Those About To Rock (We Salute You) 1 1 1 Angus Young, Malcolm Young, Brian Johnson 343719 11170334 0.99
2 Balls to the Wall 2 2 1 None 342562 5510424 0.99
3 Fast As a Shark 3 2 1 F. Baltes, S. Kaufman, U. Dirkscneider & W. Hoffman 230619 3990994 0.99

Let’s join track with album and artist to get the artist name and store the query using --save tracks_with_info.

Note: --save stores the query, not the data

%%sql --save tracks_with_info
SELECT t.*, a.title AS album, ar.Name as artist
FROM Track t
JOIN Album a
USING (AlbumId)
JOIN Artist ar
USING (ArtistId)
*  sqlite:///my.db
Done.
TrackId Name AlbumId MediaTypeId GenreId Composer Milliseconds Bytes UnitPrice album artist
1 For Those About To Rock (We Salute You) 1 1 1 Angus Young, Malcolm Young, Brian Johnson 343719 11170334 0.99 For Those About To Rock We Salute You AC/DC
2 Balls to the Wall 2 2 1 None 342562 5510424 0.99 Balls to the Wall Accept
3 Fast As a Shark 3 2 1 F. Baltes, S. Kaufman, U. Dirkscneider & W. Hoffman 230619 3990994 0.99 Restless and Wild Accept

Let’s subset the genres we are interested in (Rock and Metal) and save the query.

%%sql --save genres_fav
SELECT * FROM Genre
WHERE Name
LIKE '%rock%'
OR Name LIKE '%metal%' 
*  sqlite:///my.db
Done.
GenreId Name
1 Rock
3 Metal
5 Rock And Roll

Now, join genres and tracks, so we only get Rock and Metal tracks.

Note that we are using --with; this will retrieve previously saved queries, and preprend them (using CTEs), then, we save the query in track_fav .

%%sql --with genres_fav --with tracks_with_info --save track_fav
SELECT t.*
FROM tracks_with_info t
JOIN genres_fav
ON t.GenreId = genres_fav.GenreId
*  sqlite:///my.db
Done.
TrackId Name AlbumId MediaTypeId GenreId Composer Milliseconds Bytes UnitPrice album artist
1 For Those About To Rock (We Salute You) 1 1 1 Angus Young, Malcolm Young, Brian Johnson 343719 11170334 0.99 For Those About To Rock We Salute You AC/DC
2 Balls to the Wall 2 2 1 None 342562 5510424 0.99 Balls to the Wall Accept
3 Fast As a Shark 3 2 1 F. Baltes, S. Kaufman, U. Dirkscneider & W. Hoffman 230619 3990994 0.99 Restless and Wild Accept

We can now use track_fav (which contains Rock and Metal tracks). Let’s find which artists have produced the most tracks (and save the query):

%%sql --with track_fav --save top_artist
SELECT artist, COUNT(*) FROM track_fav
GROUP BY artist
ORDER BY COUNT(*) DESC
*  sqlite:///my.db
Done.
artist COUNT(*)
Iron Maiden 204
Led Zeppelin 114
U2 112

Let’s retrieve top_artist and plot the results:

top_artist = %sql --with top_artist SELECT * FROM top_artist
top_artist.bar()
*  sqlite:///my.db
Done.
<Axes: xlabel='artist', ylabel='COUNT(*)'>
_images/0fb66f615bc348beec966594cf10178d7d246b85fd46378161ef51160f455ee6.png

We can render the full query with the %sqlrender magic:

final = %sqlrender top_artist
print(final)
WITH genres_fav AS (SELECT * FROM Genre WHERE Name LIKE '%rock%' OR Name LIKE '%metal%'), tracks_with_info AS (SELECT t.*, a.title AS album, ar.Name AS artist FROM Track AS t JOIN Album AS a USING (AlbumId) JOIN Artist AS ar USING (ArtistId)), track_fav AS (SELECT t.* FROM tracks_with_info AS t JOIN genres_fav ON t.GenreId = genres_fav.GenreId) SELECT artist, COUNT(*) FROM track_fav GROUP BY artist ORDER BY COUNT(*) DESC

We can verify the retrieved query returns the same result:

%%sql
$final
*  sqlite:///my.db
(sqlite3.OperationalError) near "$final": syntax error
[SQL: $final]
(Background on this error at: https://sqlalche.me/e/14/e3q8)