MariaDB#

In this tutorial, we’ll see how to query MariaDB from Jupyter. Optionally, you can spin up a testing server.

Tip

If you encounter issues, feel free to join our community and we’ll be happy to help!

Installing the MariaDB driver#

To run this tutorial, you need to install the mysqlclient package.

Note

We highly recommend you that you install it using conda, since it’ll also install mysql-connector-c; if you want to use pip, then you need to install mysql-connector-c and then mysqlclient.

%conda install mysqlclient -c conda-forge --quiet
Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... done

## Package Plan ##

  environment location: /Users/eduardo/miniconda3/envs/jupysql

  added / updated specs:
    - mysqlclient


The following NEW packages will be INSTALLED:

  mysql-connector-c  pkgs/main/osx-arm64::mysql-connector-c-6.1.11-h4a942e0_1
  mysqlclient        pkgs/main/osx-arm64::mysqlclient-2.0.3-py39hc377ac9_1

The following packages will be SUPERSEDED by a higher-priority channel:

  ca-certificates    pkgs/main::ca-certificates-2023.01.10~ --> conda-forge::ca-certificates-2022.12.7-h4653dfc_0
  certifi            pkgs/main/osx-arm64::certifi-2022.12.~ --> conda-forge/noarch::certifi-2022.12.7-pyhd8ed1ab_0
  openssl              pkgs/main::openssl-1.1.1t-h1a28f6b_0 --> conda-forge::openssl-1.1.1t-h03a7124_0


Preparing transaction: ...working... done
Verifying transaction: ...working... done
Executing transaction: ...working... done

Note: you may need to restart the kernel to use updated packages.

Starting a MariaDB server with Docker#

If you don’t have a MariaDB Server running or you want to spin up one for testing, you can do it with the official Docker image.

To start the server:

%%bash
docker run --detach --name mariadb \
    --env MARIADB_USER=user \
    --env MARIADB_PASSWORD=password \
    --env MARIADB_ROOT_PASSWORD=password \
    --env MARIADB_DATABASE=db \
    -p 3306:3306 mariadb:latest
e89cb1c6d615c6e36008f7b17e7d1a10aa8a7d4a534c57e24b5b114f391fd690

Ensure that the container is running:

%%bash
docker ps
CONTAINER ID   IMAGE            COMMAND                  CREATED         STATUS        PORTS                    NAMES
e89cb1c6d615   mariadb:latest   "docker-entrypoint.s…"   2 seconds ago   Up 1 second   0.0.0.0:3306->3306/tcp   mariadb

Load sample data#

Now, let’s fetch some sample data. We’ll be using the NYC taxi dataset:

%pip install pandas pyarrow --quiet
import pandas as pd

df = pd.read_parquet(
    "https://d37ci6vzurychx.cloudfront.net/trip-data/yellow_tripdata_2021-01.parquet"
)
df.shape
(1369769, 19)

As you can see, this chunk of data contains ~1.4M rows, loading the data will take about a minute:

from sqlalchemy import create_engine

engine = create_engine("mysql+mysqldb://user:password@127.0.0.1:3306/db")
df.to_sql(name="taxi", con=engine, chunksize=100_000)
engine.dispose()

Query#

Now, let’s install JupySQL, authenticate and start querying the data!

%pip install jupysql --quiet
%load_ext sql
%sql mysql+mysqldb://user:password@127.0.0.1:3306/db
Note: you may need to restart the kernel to use updated packages.
The sql extension is already loaded. To reload it, use:
  %reload_ext sql

Important

If the cell above fails, you might have some missing packages. Message us on Slack and we’ll help you!

%%sql
SELECT COUNT(*) FROM taxi
*  mysql+mysqldb://user:***@127.0.0.1:3306/db
1 rows affected.
COUNT(*)
1369769
%%sql
SELECT * FROM taxi
LIMIT 3
*  mysql+mysqldb://user:***@127.0.0.1:3306/db
3 rows affected.
index VendorID tpep_pickup_datetime tpep_dropoff_datetime passenger_count trip_distance RatecodeID store_and_fwd_flag PULocationID DOLocationID payment_type fare_amount extra mta_tax tip_amount tolls_amount improvement_surcharge total_amount congestion_surcharge airport_fee
0 1 2021-01-01 00:30:10 2021-01-01 00:36:12 1.0 2.1 1.0 N 142 43 2 8.0 3.0 0.5 0.0 0.0 0.3 11.8 2.5 None
1 1 2021-01-01 00:51:20 2021-01-01 00:52:19 1.0 0.2 1.0 N 238 151 2 3.0 0.5 0.5 0.0 0.0 0.3 4.3 0.0 None
2 1 2021-01-01 00:43:30 2021-01-01 01:11:06 1.0 14.7 1.0 N 132 165 1 42.0 0.5 0.5 8.65 0.0 0.3 51.95 0.0 None

Clean up#

To stop and remove the container:

%%bash
docker container ls
CONTAINER ID   IMAGE            COMMAND                  CREATED         STATUS         PORTS                    NAMES
e89cb1c6d615   mariadb:latest   "docker-entrypoint.s…"   8 minutes ago   Up 8 minutes   0.0.0.0:3306->3306/tcp   mariadb
%%bash
docker container stop mariadb
mariadb
%%bash
docker container rm mariadb
mariadb
%%bash
docker container ls
CONTAINER ID   IMAGE     COMMAND   CREATED   STATUS    PORTS     NAMES