QuestDB#

This tutorial will show you how to get a QuestDB instance up and running locally to test JupySQL. You can run this in a Jupyter notebook.

Pre-requisites#

To run this tutorial, you need to install following Python packages:

%pip install jupysql pandas pyarrow --quiet
Note: you may need to restart the kernel to use updated packages.

You also need a PostgreSQL connector. We recommend using psycopg2. The easiest way to install it is via:

%pip install psycopg2-binary --quiet
Note: you may need to restart the kernel to use updated packages.

You also need Docker installed and running to start the QuestDB instance.

Start QuestDB instance#

We fetch the official image, create a new database, and user (this will take 1-2 minutes):

%%bash
docker run --detach --name questdb_ \
    -p 9000:9000 -p 9009:9009 -p 8812:8812 -p 9003:9003 questdb/questdb:7.1
0322c413699420adb1ccb136bc602d0a6514276df34778c90e60cf423ab8aac6

Our database is running, let’s load some data!

Load sample data#

Now, let’s fetch some sample data. We’ll be using the Penguins dataset:

import urllib.request

urllib.request.urlretrieve(
    "https://raw.githubusercontent.com/mwaskom/seaborn-data/master/penguins.csv",
    "penguins.csv",
)
('penguins.csv', <http.client.HTTPMessage at 0x2dda0ff3810>)

Let’s create a new table penguins in our QuestDB instance and load this csv file into it (this will take about a minute)

import requests

with open("penguins.csv", "rb") as csv:
    file_data = csv.read()
    files = {"data": ("penguins", file_data)}
    response = requests.post("http://127.0.0.1:9000/imp", files=files)

Query#

Now, let’s start JupySQL, authenticate and start querying the data!

%load_ext sql
The sql extension is already loaded. To reload it, use:
  %reload_ext sql

Create a new connection using psycopg2

import psycopg2 as pg

engine = pg.connect(
    "dbname='qdb' user='admin' host='127.0.0.1' port='8812' password='quest'"
)

Initialize the connection

%sql engine

Important

If the cell above fails, you might have some missing packages. Message us on Slack and we’ll help you!

Note

QuestDB now supports a connection string via sqlalchemy:

%sql postgresql+psycopg2://admin:quest@localhost:8812/qdb

Let’s run our first queries to count and fetch some data

%%sql
SELECT COUNT(*) FROM penguins
*  "<connection object at 0x000002DD9FE81CF0; dsn: 'user=admin password=xxx dbname=qdb host=127.0.0.1 port=8812', closed: 0>"
1 rows affected.
count
344
%sql select * from penguins limit 5
*  "<connection object at 0x000002DD9FE81CF0; dsn: 'user=admin password=xxx dbname=qdb host=127.0.0.1 port=8812', closed: 0>"
5 rows affected.
species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex
Adelie Torgersen 39.1 18.7 181 3750 MALE
Adelie Torgersen 39.5 17.4 186 3800 FEMALE
Adelie Torgersen 40.3 18.0 195 3250 FEMALE
Adelie Torgersen None None None None None
Adelie Torgersen 36.7 19.3 193 3450 FEMALE

Plotting#

To utilize JupySQL ggplot API, it is crucial to have valid data, so let’s remove null values.

%%sql --save no_nulls --no-execute
SELECT *
FROM penguins
WHERE body_mass_g IS NOT NULL and
sex IS NOT NULL
*  "<connection object at 0x000002DD9FE81CF0; dsn: 'user=admin password=xxx dbname=qdb host=127.0.0.1 port=8812', closed: 0>"
Skipping execution...
from sql.ggplot import ggplot, aes, geom_histogram

(
    ggplot(
        table="no_nulls",
        with_="no_nulls",
        mapping=aes(x=["bill_length_mm", "bill_depth_mm"]),
    )
    + geom_histogram(bins=50)
)
<sql.ggplot.ggplot.ggplot at 0x2dda0364910>
../_images/97db6bd7996a155d57f3886c09bbbce751be1ff2d278742a7f03b731674f26a8.png

Clean up#

! docker container ls
CONTAINER ID   IMAGE                 COMMAND                  CREATED          STATUS         PORTS                                                                                            NAMES
0322c4136994   questdb/questdb:7.1   "/docker-entrypoint.…"   10 seconds ago   Up 9 seconds   0.0.0.0:8812->8812/tcp, 0.0.0.0:9000->9000/tcp, 0.0.0.0:9003->9003/tcp, 0.0.0.0:9009->9009/tcp   questdb_
%%capture out
! docker ps -a -q --filter="name=questdb" --quiet
container_id = out.stdout.strip()
print(f"Container id: {container_id}")
Container id: 0322c4136994
! docker container stop {container_id}
0322c4136994
! docker container rm {container_id}
0322c4136994
! docker container ls
CONTAINER ID   IMAGE     COMMAND   CREATED   STATUS    PORTS     NAMES