QuestDB#
This tutorial will show you how to get a QuestDB instance up and running locally to test JupySQL. You can run this in a Jupyter notebook.
Pre-requisites#
To run this tutorial, you need to install the following Python packages:
%pip install jupysql pandas pyarrow --quiet
Note: you may need to restart the kernel to use updated packages.
You also need a PostgreSQL connector. We recommend using psycopg2. The easiest way to install it is via:
%pip install psycopg2-binary --quiet
Note: you may need to restart the kernel to use updated packages.
You also need Docker installed and running to start the QuestDB instance.
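If you are unsure whether the Docker daemon is running, here is a quick check from the notebook (assumes the docker CLI is on your PATH):
! docker info --format "{{.ServerVersion}}"
If this prints a version number, the daemon is reachable; an error means Docker isn't running.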
Start QuestDB instance#
We fetch the official image and start a new QuestDB container (this will take 1-2 minutes):
%%bash
docker run --detach --name questdb_ \
    -p 9000:9000 -p 9009:9009 -p 8812:8812 -p 9003:9003 questdb/questdb:7.1
0322c413699420adb1ccb136bc602d0a6514276df34778c90e60cf423ab8aac6
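The container can take a few seconds before it accepts requests. If the next steps fail with a connection error, a small polling sketch like this helps (assumes the port mapping above; the 30-attempt limit is arbitrary):
import time

import requests

# Poll QuestDB's HTTP endpoint until the server accepts connections
for _ in range(30):
    try:
        requests.get("http://127.0.0.1:9000", timeout=1)
        print("QuestDB is up")
        break
    except requests.exceptions.ConnectionError:
        time.sleep(1)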
Our database is running. Let’s load some data!
Load sample data#
Now, let’s fetch some sample data. We’ll be using the Penguins dataset:
import urllib.request
urllib.request.urlretrieve(
    "https://raw.githubusercontent.com/mwaskom/seaborn-data/master/penguins.csv",
    "penguins.csv",
)
('penguins.csv', <http.client.HTTPMessage at 0x2dda0ff3810>)
Let’s create a new table penguins in our QuestDB instance and load this CSV file into it (this will take about a minute):
import requests

# Read the CSV and upload it through QuestDB's HTTP import endpoint
with open("penguins.csv", "rb") as csv:
    file_data = csv.read()

files = {"data": ("penguins", file_data)}
response = requests.post("http://127.0.0.1:9000/imp", files=files)
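Before creating a connection, it’s worth confirming the import succeeded. The /imp endpoint replies with a plain-text summary of what was loaded (a minimal check; the exact summary layout may vary between QuestDB versions):
# A 200 status plus the import summary means the upload worked
print(response.status_code)
print(response.text)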
Query#
Now, let’s load JupySQL, authenticate, and start querying the data!
%load_ext sql
Create a new connection using psycopg2
import psycopg2 as pg
engine = pg.connect(
    "dbname='qdb' user='admin' host='127.0.0.1' port='8812' password='quest'"
)
Initialize the connection
%sql engine
Important
If the cell above fails, you might have some missing packages. Message us on Slack and we’ll help you!
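If you want to verify the connection independently of JupySQL, you can run a trivial query straight through psycopg2 (a minimal sanity check; QuestDB speaks the PostgreSQL wire protocol on port 8812):
# Run a trivial query directly on the psycopg2 connection
cur = engine.cursor()
cur.execute("SELECT 1")
print(cur.fetchone())  # (1,) means the connection works
cur.close()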
Let’s run our first queries to count and fetch some data
%%sql
SELECT COUNT(*) FROM penguins
* "<connection object at 0x000002DD9FE81CF0; dsn: 'user=admin password=xxx dbname=qdb host=127.0.0.1 port=8812', closed: 0>"
1 rows affected.
| count |
|-------|
| 344   |
%sql SELECT * FROM penguins LIMIT 5
* "<connection object at 0x000002DD9FE81CF0; dsn: 'user=admin password=xxx dbname=qdb host=127.0.0.1 port=8812', closed: 0>"
5 rows affected.
| species | island | bill_length_mm | bill_depth_mm | flipper_length_mm | body_mass_g | sex |
|---------|--------|----------------|---------------|-------------------|-------------|-----|
| Adelie | Torgersen | 39.1 | 18.7 | 181 | 3750 | MALE |
| Adelie | Torgersen | 39.5 | 17.4 | 186 | 3800 | FEMALE |
| Adelie | Torgersen | 40.3 | 18.0 | 195 | 3250 | FEMALE |
| Adelie | Torgersen | None | None | None | None | None |
| Adelie | Torgersen | 36.7 | 19.3 | 193 | 3450 | FEMALE |
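Since we installed pandas in the prerequisites, you can also capture a query result and convert it to a DataFrame for further analysis (a quick sketch using JupySQL’s result object):
# Assign the result of a line magic, then convert it to pandas
result = %sql SELECT species, island, body_mass_g FROM penguins LIMIT 100
df = result.DataFrame()
df.head()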
Plotting#
To use the JupySQL ggplot API, we need valid data, so let’s filter out the null values.
%%sql --save no_nulls --no-execute
SELECT *
FROM penguins
WHERE body_mass_g IS NOT NULL
  AND sex IS NOT NULL
* "<connection object at 0x000002DD9FE81CF0; dsn: 'user=admin password=xxx dbname=qdb host=127.0.0.1 port=8812', closed: 0>"
Skipping execution...
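The --save flag stores the query under a name without executing it (hence “Skipping execution...”), and later cells can reference it via --with, which inlines the snippet as a CTE. A quick check that the filter worked (a small sketch; counts the remaining rows):
%%sql --with no_nulls
SELECT COUNT(*) FROM no_nulls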
from sql.ggplot import ggplot, aes, geom_histogram
(
    ggplot(
        table="no_nulls",
        with_="no_nulls",
        mapping=aes(x=["bill_length_mm", "bill_depth_mm"]),
    )
    + geom_histogram(bins=50)
)
<sql.ggplot.ggplot.ggplot at 0x2dda0364910>

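For single-column histograms, JupySQL also provides the %sqlplot magic as a shorter alternative to the ggplot API (a sketch against the same saved snippet):
%sqlplot histogram --table no_nulls --column bill_length_mm --bins 50 --with no_nulls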
Clean up#
! docker container ls
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
0322c4136994 questdb/questdb:7.1 "/docker-entrypoint.…" 10 seconds ago Up 9 seconds 0.0.0.0:8812->8812/tcp, 0.0.0.0:9000->9000/tcp, 0.0.0.0:9003->9003/tcp, 0.0.0.0:9009->9009/tcp questdb_
%%capture out
! docker ps -a -q --filter="name=questdb"
container_id = out.stdout.strip()
print(f"Container id: {container_id}")
Container id: 0322c4136994
! docker container stop {container_id}
0322c4136994
! docker container rm {container_id}
0322c4136994
! docker container ls
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
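Alternatively, the stop and remove steps can be collapsed into a single command (assumes the container name questdb_ used above; docker rm -f force-removes a running container):
! docker rm -f questdb_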