MySQL#

In this tutorial, we’ll see how to query MySQL from Jupyter. Optionally, you can spin up a testing server.

Tip

If you encounter issues, feel free to join our community and we’ll be happy to help!

Installing the MySQL driver#

To run this tutorial, you need to install the mysqlclient package.

Note

We highly recommend you that you install it using conda, since it’ll also install mysql-connector-c; if you want to use pip, then you need to install mysql-connector-c and then mysqlclient.

%conda install mysqlclient -c conda-forge --quiet
Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... done

## Package Plan ##

  environment location: /Users/eduardo/miniconda3/envs/jupysql

  added / updated specs:
    - mysqlclient


The following NEW packages will be INSTALLED:

  mysql-connector-c  pkgs/main/osx-arm64::mysql-connector-c-6.1.11-h4a942e0_1
  mysqlclient        pkgs/main/osx-arm64::mysqlclient-2.0.3-py39hc377ac9_1

The following packages will be SUPERSEDED by a higher-priority channel:

  ca-certificates    pkgs/main::ca-certificates-2023.01.10~ --> conda-forge::ca-certificates-2022.12.7-h4653dfc_0
  certifi            pkgs/main/osx-arm64::certifi-2022.12.~ --> conda-forge/noarch::certifi-2022.12.7-pyhd8ed1ab_0
  openssl              pkgs/main::openssl-1.1.1t-h1a28f6b_0 --> conda-forge::openssl-1.1.1t-h03a7124_0


Preparing transaction: ...working... done
Verifying transaction: ...working... done
Executing transaction: ...working... done

Note: you may need to restart the kernel to use updated packages.

Starting a MySQL server with Docker#

If you don’t have a MySQL Server running or you want to spin up one for testing, you can do it with the official Docker image.

To start the server:

%%bash
docker run --name mysql -e MYSQL_DATABASE=db \
  -e MYSQL_USER=user \
  -e MYSQL_PASSWORD=password \
  -e MYSQL_ROOT_PASSWORD=password \
  -p 3306:3306 -d mysql
7c8acff152e0f2eac2a5c954ca01f1ddc99419d2e342560d94c6955c036d7d93

Ensure that the container is running:

%%bash
docker ps
CONTAINER ID   IMAGE     COMMAND                  CREATED         STATUS         PORTS                               NAMES
7c8acff152e0   mysql     "docker-entrypoint.s…"   4 seconds ago   Up 4 seconds   0.0.0.0:3306->3306/tcp, 33060/tcp   mysql

We need to make a small configuration change, so do the following:

Open a new terminal and execute: docker exec -it mysql bash

Then: mysql --user=root --password

When prompted for a password, type: password

Once the MySQL console appears, execute:

ALTER USER user
IDENTIFIED WITH mysql_native_password
BY 'password';

Exit the MySQL console with: exit Exit the container with: exit

The session should look like this:

docker exec -it mysql bash

bash-4.4# mysql --user=root --password
Enter password:

Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 9
Server version: 8.0.31 MySQL Community Server - GPL

Copyright (c) 2000, 2022, Oracle and/or its affiliates.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> ALTER USER user
    -> IDENTIFIED WITH mysql_native_password
    -> BY 'password';
Query OK, 0 rows affected (0.01 sec)

mysql> exit
Bye
bash-4.4# exit
exit

Load sample data#

Now, let’s fetch some sample data. We’ll be using the NYC taxi dataset:

%pip install pandas pyarrow --quiet
import pandas as pd

df = pd.read_parquet(
    "https://d37ci6vzurychx.cloudfront.net/trip-data/yellow_tripdata_2021-01.parquet"
)
df.shape
(1369769, 19)

As you can see, this chunk of data contains ~1.4M rows, loading the data will take about a minute:

from sqlalchemy import create_engine

engine = create_engine("mysql+mysqldb://user:password@127.0.0.1:3306/db")
df.to_sql(name="taxi", con=engine, chunksize=100_000)
engine.dispose()

Query#

Now, let’s install JupySQL, authenticate and start querying the data!

%pip install jupysql --quiet
%load_ext sql
%sql mysql+mysqldb://user:password@127.0.0.1:3306/db
Note: you may need to restart the kernel to use updated packages.
The sql extension is already loaded. To reload it, use:
  %reload_ext sql

Important

If the cell above fails, you might have some missing packages. Message us on Slack and we’ll help you!

%%sql
SELECT COUNT(*) FROM taxi
*  mysql+mysqldb://user:***@127.0.0.1:3306/db
1 rows affected.
COUNT(*)
1369769
%%sql
SELECT * FROM taxi
LIMIT 3
*  mysql+mysqldb://user:***@127.0.0.1:3306/db
3 rows affected.
index VendorID tpep_pickup_datetime tpep_dropoff_datetime passenger_count trip_distance RatecodeID store_and_fwd_flag PULocationID DOLocationID payment_type fare_amount extra mta_tax tip_amount tolls_amount improvement_surcharge total_amount congestion_surcharge airport_fee
0 1 2021-01-01 00:30:10 2021-01-01 00:36:12 1.0 2.1 1.0 N 142 43 2 8.0 3.0 0.5 0.0 0.0 0.3 11.8 2.5 None
1 1 2021-01-01 00:51:20 2021-01-01 00:52:19 1.0 0.2 1.0 N 238 151 2 3.0 0.5 0.5 0.0 0.0 0.3 4.3 0.0 None
2 1 2021-01-01 00:43:30 2021-01-01 01:11:06 1.0 14.7 1.0 N 132 165 1 42.0 0.5 0.5 8.65 0.0 0.3 51.95 0.0 None

Clean up#

To stop and remove the container:

%%bash
docker container ls
CONTAINER ID   IMAGE     COMMAND                  CREATED         STATUS         PORTS                               NAMES
7c8acff152e0   mysql     "docker-entrypoint.s…"   9 minutes ago   Up 9 minutes   0.0.0.0:3306->3306/tcp, 33060/tcp   mysql
%%bash
docker container stop mysql
mysql
%%bash
docker container rm mysql
mysql
%%bash
docker container ls
CONTAINER ID   IMAGE     COMMAND   CREATED   STATUS    PORTS     NAMES