Back to articles list
- 11 minutes read

Using Python and MySQL in the ETL Process: SQLAlchemy

SQLAlchemy helps you work with databases in Python. In this post, we tell you everything you need to know to get started with this module.

In the previous article, we talked about how to use Python in the ETL process. We focused on getting the job done by executing stored procedures and SQL queries. In this article and the next, we’ll use a different approach. Instead of writing SQL code, we’ll use the SQLAlchemy toolkit. You can also use this article separately, as a quick introduction on installing and using SQLAlchemy.

Ready? Let’s begin.

What Is SQLAlchemy?

Python is well known for its number and variety of modules. These modules reduce our coding time significantly because they implement routines needed to achieve a specific task. A number of modules that work with data are available, including SQLAlchemy.

To describe SQLAlchemy, I’ll use a quote from SQLAlchemy.org:

SQLAlchemy is the Python SQL toolkit and Object Relational Mapper that gives application developers the full power and flexibility of SQL.

It provides a full suite of well known enterprise-level persistence patterns, designed for efficient and high-performing database access, adapted into a simple and Pythonic domain language.

The most important part here is the bit about the ORM (object-relational mapper), which helps us treat database objects as Python objects rather than lists.

Before we go any further with SQLAlchemy, let’s pause and talk about ORMs.

The Pros and Cons of Using ORMs

Compared to raw SQL, ORMs have their pros and cons – and most of these apply to SQLAlchemy as well.

The Good Stuff:

  • Code portability. The ORM takes care of syntactic differences between databases.
  • Only one language is needed to handle your database. Although, to be honest, this shouldn’t be the main motivation to use an ORM.
  • ORMs simplify your code, e.g. they take care of relationships and treat them like objects, which is great if you’re used to OOP.
  • You can manipulate your data inside the program.

Unfortunately, everything comes with a price. The Not-So-Good Stuff about ORMs:

  • In some cases, an ORM could be slow.
  • Writing complex queries could become even more complicated, or could result in slow queries. But this is not the case when using SQLAlchemy.
  • If you know your DBMS well, then it’s a waste of time to learn how to write the same stuff in an ORM.

Now that we’ve handled that topic, let’s get back to SQLAlchemy.

Before we start...

… let’s remind ourselves of the goal of this article. If you’re just interested in installing SQLAlchemy and need a quick tutorial on how to perform simple commands, this article will do that. However, the commands presented in this article will be used in the next article to perform the ETL process and replace the SQL (stored procedures) and Python code we presented in previous articles.

Okay, now let’s begin right at the beginning: with installing SQLAlchemy.

Installing SQLAlchemy

1. Check If the Module Is Already Installed

To use a Python module, you have to install it (that is, if it wasn’t previously installed). One way to check which modules have been installed is using this command in Python Shell:

help('modules')

To check if a specific module is installed, simply try importing it. Use these commands:

import sqlalchemy
sqlalchemy.__version__

If SQLAlchemy is already installed, then the first line will execute successfully. import is a standard Python command used to import modules. If the module isn’t installed, Python will throw an error – actually a list of errors, in red text – that you can’t miss :)

The second command returns the current version of SQLAlchemy. The result returned is pictured below:

SQLAlchemy version

We’ll need another module, too, and that is PyMySQL. This is a pure-Python lightweight MySQL client library. This module supports everything we need to work with a MySQL database, from running simple queries to more complex database actions. We can check if it exists using help('modules'), as previously described, or using the following two statements:

import pymysql
pymysql.__version__

Of course, these are the same commands we used to test if SQLAlchemy was installed.

What If SQLAlchemy or PyMySQL Isn’t Already Installed?

Importing previously installed modules is not hard. But what if the modules you need are not already installed?

Some modules have an installation package, but mostly you’ll use the pip command to install them. PIP is a Python tool used to install and uninstall modules. The easiest way to install a module (in Windows OS) is:

  1. Use Command Prompt -> Run -> cmd.
  2. Position to the Python directory cd C:\...\Python\Python37\Scripts.
  3. Run the command pip install (in our case, we’ll run pip install pyMySQL and pip install sqlAlchemy.

PIP can also be used to uninstall the existing module. To do that, you should use pip uninstall .

2. Connecting to the Database

While installing everything necessary to use SQLAlchemy is essential, it’s not very interesting. Nor is it really part of what we’re interested in. We haven’t even connected to the databases we want to use. We’ll solve that now:

import sqlalchemy
from sqlalchemy.engine import create_engine
engine_live = sqlalchemy.create_engine('mysql+pymysql://:@localhost:3306/subscription_live')
connection_live = engine_live.connect()
print(engine_live.table_names())

Using the script above, we’ll establish a connection to the database located on our local server, the subscription_live database.

(Note: Replace : with your actual username and password.)

Let’s go through the script, command by command.

import sqlalchemy
from sqlalchemy.engine import create_engine

These two lines import our module and the create_engine function.

Next, we’ll establish a connection to the database located on our server.

engine_live = sqlalchemy.create_engine('mysql+pymysql:// :@localhost:3306/subscription_live')
connection_live = engine_live.connect()

The create_engine function creates the engine, and using .connect(), connects to the database. The create_engine function uses these parameters:

dialect+driver://username:password@host:port/database

In our case, the dialect is mysql, the driver is pymysql (previously installed) and the remaining variables are specific for the server and database(s) we want to connect to.

(Note: If you’re connecting locally, use localhost instead of your “local” IP address, 127.0.0.1 and the appropriate port :3306.)

The result of the command print(engine_live.table_names())is shown in the picture above. As expected, we got the list of all the tables from our operational/live database.

3. Running SQL Commands Using SQLAlchemy

In this section, we’ll analyze the most important SQL commands, examine table structure, and perform all four DML commands: SELECT, INSERT, UPDATE, and DELETE.

We’ll discuss the statements used in this script separately. Please note that we’ve already gone through the connection part of this script and we’ve already listed table names. There are minor changes in this line:

from sqlalchemy import create_engine, select, MetaData, Table, asc

We’ve just imported everything we’ll use from SQLAlchemy.

Tables and Structure

We’ll run the script by typing the following command in the Python Shell:

import os
file_path = 'D://python_scripts'
os.chdir(file_path)
exec(open("queries.py").read())

The result is the executed script. Now let’s analyze the rest of the script.

SQLAlchemy imports information related to tables, structure, and relations. To work with that info, it could be useful to check the list of tables (and their columns) in the database:

#print connected tables
print("\n -- Tables from _live database -- ")
print (engine_live.table_names())

This simply returns a list of all tables from the connected database.

Note: The table_names() method returns a list of table names for the given engine. You can print the whole list or iterate through it using a loop (as you could do with any other list).

Next, we’ll return a list of all attributes from the selected table. The relevant part of the script and the result are shown below:

#SELECT
metadata = MetaData(bind=None)
table_city = Table('city', metadata, autoload = True, autoload_with = engine_live)

# print table columns
print("\n -- Tables columns for table 'city' --")
for column in table_city.c:
  print(column.name)

table city

You can see that I’ve used for to loop through the result set. We could replace table_city.c with table_city.columns.

Note: The process of loading the database description and creating metadata in SQLAlchemy is called reflection.

Note: MetaData is the object that keeps information about objects in the database, so tables in the database are also linked to this object. In general, this object stores info about what the database schema looks like. You’ll use it as a single point of contact when you want to make changes to or get facts about the DB schema.

Note: The attributes autoload = True and autoload_with = engine_live should be used to ensure that table attributes will be uploaded (if they already haven’t been).

SELECT

I don’t think I need to explain how important the SELECT statement is :) So, let’s just say that you can use SQLAlchemy to write SELECT statements. If you’re used to MySQL syntax, it will take some time to adapt; still, everything is pretty logical. To put it as simply as possible, I would say that the SELECT statement is sliced up and some parts are omitted, but everything is still in the same order.

Let’s try a few SELECT statements now.

# simple select
print("\n -- SIMPLE SELECT -- ")
stmt = select([table_city])
print(stmt)
print(connection_live.execute(stmt).fetchall())
# loop through results
results = connection_live.execute(stmt).fetchall()
for result in results:
  print(result)

The first one is a simple SELECT statement returning all values from the given table. The syntax of this statement is very simple: I’ve placed the name of the table in the select(). Please notice that I’ve:

  • Prepared the statement - stmt = select([table_city].
  • Printed the statement using print(stmt), which gives us a good idea about the statement that’s just executed. This could also be used for debugging.
  • Printed the result with print(connection_live.execute(stmt).fetchall()).
  • Looped through the result and printed each single record.

Note: Because we also loaded primary and foreign key constraints into SQLAlchemy, the SELECT statement takes a list of table objects as arguments and automatically establishes relationships where needed.

The result is shown in the picture below:

Simple select

Python will fetch all attributes from the table and store them in the object. As shown, we can use this object to perform additional operations. The final result of our statement is a list of all cities from the city table.

Now, we’re ready for a more complex query. I’ve just added an ORDER BY clause.

# simple select
# simple select, using order by
print("\n -- SIMPLE SELECT, USING ORDER BY")
stmt = select([table_city]).order_by(asc(table_city.columns.id))
print(stmt)
print(connection_live.execute(stmt).fetchall())

Order by

Note: The asc() method performs ascending sorting against the parent object, using defined columns as parameters.

The returned list is the same, but now it is sorted by the id value, in ascending order. It’s important to note that we’ve simply added .order_by() to the previous SELECT query. The .order_by(...) method allows us to change the order of the result set returned, in the same manner as we would use in a SQL query. Therefore, parameters should follow SQL logic, using column names or column order and ASC or DESC.

Next, we’ll add WHERE to our SELECT statement.

# select with WHERE
print("\n -- SELECT WITH WHERE --")
stmt = select([table_city]).where(table_city.columns.city_name == 'London')
print(stmt)
print(connection_live.execute(stmt).fetchall())

WHERE

Note: The .where() method is used to test a condition we’ve used as an argument. We could also use the .filter() method, which is better at filtering more complex conditions.

Once more, the .where part is simply concatenated to our SELECT statement. Notice that we’ve put the condition inside the brackets. Whatever condition is in the brackets is tested in the same manner as it would be tested in the WHERE part of a SELECT statement. The equality condition is tested using == instead of =.

The last thing we’ll try with SELECT is joining two tables. Let’s take a look at the code and its result first.

# select with JOIN
print("\n -- SELECT WITH JOIN --")
table_country = Table('country', metadata, autoload = True, autoload_with = engine_live)
stmt = select([table_city.columns.city_name, table_country.columns.country_name]).select_from(table_city.join(table_country))
print(stmt)
print(connection_live.execute(stmt).fetchall())

select with JOIN

There are two important parts in the above statement:

  • select([table_city.columns.city_name, table_country.columns.country_name]) defines which columns will be returned in our result.
  • .select_from(table_city.join(table_country)) defines the join condition/table. Notice that we didn’t have to write down the full join condition, including the keys. This is because SQLAlchemy “knows” how these two tables are joined, as primary keys and foreign keys rules are imported in the background.
INSERT / UPDATE / DELETE

These are the three remaining DML commands we’ll cover in this article. While their structure can get very complex, these commands are usually much simpler. The code used is presented below.

# INSERT
print("\n -- INSERT --")
stmt = table_country.insert().values(country_name='USA')
print(stmt)
connection_live.execute(stmt)
# check & print changes
stmt = select([table_country]).order_by(asc(table_country.columns.id))
print(connection_live.execute(stmt).fetchall())


# UPDATE
print("\n -- UPDATE --")
stmt = table_country.update().where(table_country.columns.country_name == 'USA').values(country_name = 'United States of America')
print(stmt)
connection_live.execute(stmt)
# check & print changes
stmt = select([table_country]).order_by(asc(table_country.columns.id))
print(connection_live.execute(stmt).fetchall())

# DELETE
print("\n -- DELETE --")
stmt = table_country.delete().where(table_country.columns.country_name == 'United States of America')
print(stmt)
connection_live.execute(stmt)
# check & print changes
stmt = select([table_country]).order_by(asc(table_country.columns.id))
print(connection_live.execute(stmt).fetchall())

The same pattern is used for all three statements: preparing the statement, printing and executing it, and printing the result after each statement so we can see what actually happened in the database. Notice once more that parts of the statement were treated as objects (.values(), .where()).

INSERT UPDARE DELETE

We’ll use this knowledge in the upcoming article to build an entire ETL script using SQLAlchemy.

Next Up: SQLAlchemy in the ETL Process

Today we’ve analyzed how to set up SQLAlchemy and how to perform simple DML commands. In the next article, we’ll use this knowledge to write the complete ETL process using SQLAlchemy.

You can download the complete script, used in this article here.

go to top