MariaDB#
In this tutorial, we’ll see how to query MariaDB from Jupyter. Optionally, you can spin up a testing server.
Tip
If you encounter issues, feel free to join our community and we’ll be happy to help!
Installing the MariaDB driver#
To run this tutorial, you need to install the mysqlclient
package.
Note
We highly recommend you that you install it using conda
, since it’ll also install mysql-connector-c
; if you want to use pip
, then you need to install mysql-connector-c
and then mysqlclient
.
%conda install mysqlclient -c conda-forge --quiet
Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... done
## Package Plan ##
environment location: /Users/eduardo/miniconda3/envs/jupysql
added / updated specs:
- mysqlclient
The following NEW packages will be INSTALLED:
mysql-connector-c pkgs/main/osx-arm64::mysql-connector-c-6.1.11-h4a942e0_1
mysqlclient pkgs/main/osx-arm64::mysqlclient-2.0.3-py39hc377ac9_1
The following packages will be SUPERSEDED by a higher-priority channel:
ca-certificates pkgs/main::ca-certificates-2023.01.10~ --> conda-forge::ca-certificates-2022.12.7-h4653dfc_0
certifi pkgs/main/osx-arm64::certifi-2022.12.~ --> conda-forge/noarch::certifi-2022.12.7-pyhd8ed1ab_0
openssl pkgs/main::openssl-1.1.1t-h1a28f6b_0 --> conda-forge::openssl-1.1.1t-h03a7124_0
Preparing transaction: ...working... done
Verifying transaction: ...working... done
Executing transaction: ...working... done
Note: you may need to restart the kernel to use updated packages.
Starting a MariaDB server with Docker#
If you don’t have a MariaDB Server running or you want to spin up one for testing, you can do it with the official Docker image.
To start the server:
%%bash
docker run --detach --name mariadb \
--env MARIADB_USER=user \
--env MARIADB_PASSWORD=password \
--env MARIADB_ROOT_PASSWORD=password \
--env MARIADB_DATABASE=db \
-p 3306:3306 mariadb:latest
e89cb1c6d615c6e36008f7b17e7d1a10aa8a7d4a534c57e24b5b114f391fd690
Ensure that the container is running:
%%bash
docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
e89cb1c6d615 mariadb:latest "docker-entrypoint.s…" 2 seconds ago Up 1 second 0.0.0.0:3306->3306/tcp mariadb
Load sample data#
Now, let’s fetch some sample data. We’ll be using the NYC taxi dataset:
%pip install pandas pyarrow --quiet
import pandas as pd
df = pd.read_parquet(
"https://d37ci6vzurychx.cloudfront.net/trip-data/yellow_tripdata_2021-01.parquet"
)
df.shape
(1369769, 19)
As you can see, this chunk of data contains ~1.4M rows, loading the data will take about a minute:
from sqlalchemy import create_engine
engine = create_engine("mysql+mysqldb://user:password@127.0.0.1:3306/db")
df.to_sql(name="taxi", con=engine, chunksize=100_000)
engine.dispose()
Query#
Now, let’s install JupySQL, authenticate and start querying the data!
%pip install jupysql --quiet
%load_ext sql
%sql mysql+mysqldb://user:password@127.0.0.1:3306/db
Note: you may need to restart the kernel to use updated packages.
The sql extension is already loaded. To reload it, use:
%reload_ext sql
Important
If the cell above fails, you might have some missing packages. Message us on Slack and we’ll help you!
%%sql
SELECT COUNT(*) FROM taxi
* mysql+mysqldb://user:***@127.0.0.1:3306/db
1 rows affected.
COUNT(*) |
---|
1369769 |
%%sql
SELECT * FROM taxi
LIMIT 3
* mysql+mysqldb://user:***@127.0.0.1:3306/db
3 rows affected.
index | VendorID | tpep_pickup_datetime | tpep_dropoff_datetime | passenger_count | trip_distance | RatecodeID | store_and_fwd_flag | PULocationID | DOLocationID | payment_type | fare_amount | extra | mta_tax | tip_amount | tolls_amount | improvement_surcharge | total_amount | congestion_surcharge | airport_fee |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 2021-01-01 00:30:10 | 2021-01-01 00:36:12 | 1.0 | 2.1 | 1.0 | N | 142 | 43 | 2 | 8.0 | 3.0 | 0.5 | 0.0 | 0.0 | 0.3 | 11.8 | 2.5 | None |
1 | 1 | 2021-01-01 00:51:20 | 2021-01-01 00:52:19 | 1.0 | 0.2 | 1.0 | N | 238 | 151 | 2 | 3.0 | 0.5 | 0.5 | 0.0 | 0.0 | 0.3 | 4.3 | 0.0 | None |
2 | 1 | 2021-01-01 00:43:30 | 2021-01-01 01:11:06 | 1.0 | 14.7 | 1.0 | N | 132 | 165 | 1 | 42.0 | 0.5 | 0.5 | 8.65 | 0.0 | 0.3 | 51.95 | 0.0 | None |
Clean up#
To stop and remove the container:
%%bash
docker container ls
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
e89cb1c6d615 mariadb:latest "docker-entrypoint.s…" 8 minutes ago Up 8 minutes 0.0.0.0:3306->3306/tcp mariadb
%%bash
docker container stop mariadb
mariadb
%%bash
docker container rm mariadb
mariadb
%%bash
docker container ls
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES