Find out how to keep track of data model changes in Vertabelo using Git as a source control management tool.
In this article we will discuss a possible way to use source control management tools like GIT to track all changes in your Vertabelo data model. We are also going to detail the code-database workflow that we use daily to develop Vertabelo itself as well as our other applications.
Why Use a Source Control Tool?
The Vertabelo online data modeler contains two main model types: physical data models and logical data models. (To learn more, read What Are Conceptual, Logical, and Physical Data Models?.) These data model types are the roots of four output documents available in Vertabelo:
- Physical data models stored as XML files.
- Logical data models stored as XML files.
- SQL DDL files generated from a physical data model.
- SQL migration scripts between different versions of a data model.
Since these documents are also text files, they can be easily added to version control systems like Git. Storing documents in an external repository in addition to Vertabelo can improve the quality of your work in many ways.
The Benefits of Storing a Data Model Outside Vertabelo
Let’s explain why it is worthwhile to store data models outside Vertabelo. Vertabelo guarantees that:
- The most recent version of a document is stored.
- All tagged versions are stored. (How to set a tag is described here in How to name a version of the model.)
- At least one version each month is stored (if a change occurred).
This means that not every change is always tracked. Moreover, sometimes teams delete the model when the project ends and models are lost. Either way, Git is a good source from which data models can be restored and reused.
Source Code Repositories Provide Full Contextual Information
The data model is most likely part of a more complex software solution; thus, it gains full meaning when you can see it in context. A new table in the data model most likely will be related to new source code written in a programming language – especially when our XML is combined with source generators like ORM software. (An ORM is a library that makes it possible to work with a database in an object-oriented way. You can read more about ORMs in ORMs Under the Hood.)
Source Code Repositories Track History and Allow for Rollback
Git is a good tool to track software changes for at least two reasons:
- Changes can be reverted. Git allows you to make a precise cut without losing too much.
- We can more easily add the same tables/views in other projects based on the precise diff available in Git.
Source Code Repositories Make Migration and Reproduction Easier
Software development is a continuous process that requires migrating data between versions. Storing all changes allows us to easily generate migration scripts. How to do this using models stored in the version repository will be explained later on.
Source code based on files stored in Git is more likely to be reproducible, which is important when we operate on many development branches. Git makes it possible to:
- Go back to a specific version.
- Have many feature branches.
- Automate rollouts to production environments.
Vertabelo Model Management with Git
For the purpose of this article, let's assume that we have a Git repository. (Note: All examples in this article were developed in Linux.)
cd ~/work mkdir version-control git init echo “Version Control example” >> README.md git add README.md git commit -m "Initial commit"
In the Git tree, it looks like this:
Example Project Structure
How database-related code is stored varies between technologies and companies. A project structure may look like this:
➜ version-control git:(master) tree ├── build ├── database │ ├── create.sql │ ├── dictionary-data │ │ └── dictionary-init.sql │ ├── initial-data │ │ └── initial-data.sql │ ├── model.xml │ ├── test-data │ │ └── test-data.sql │ └── updates │ └── from-0.0.1-to-0.0.2.sql ├── README.md ├── src ├── update-db.sh └── update.sh 7 directories, 9 files
Our current model contains one table, client
:
The src
and build directories don’t contain files related to the data model and are not crucial for this article. (The src
directory contains application code, while the build
directory contains final artifacts, i.e. executable and resource files.)
The database
directory contains everything related to the database, including the:
- SQL definition file (
create.sql
) needed to create a database. - Model as XML file (
model.xml
) needed to generate source code for the ORM. An example of ORM generators based on Vertabelo XML is published on the Vertabelo SqlAlchemy GitHub. The structure of Vertabelo’s XML schema is described in Vertabelo XML Version 2.3. - All database migration scripts between production versions, which are stored in the updates catalog; when a product has a linear development process, it’s a good practice to store all migration scripts between versions.
- Initial and dictionary data required to run an application.
The update-db.sh
bash script automatically updates the DB files model.xml
and create.sql
from the current Vertabelo version. This will be discussed in the next section.
Recently, OS-level virtualization software like Docker has become the industry standard. Stored SQL files can be easily combined with Docker database images to quickly set up a clear, ready-to-use database.
Update Script
Export to XML is the simplest way to download the model
We can download the model manually using Vertabelo’s Export to XML and Generate SQL options. However, a more automated method is preferred for daily use. A good enhancement is to make use of our Vertabelo Public API and write a simple script that downloads everything.
Here is an example in Linux’s bash update-db.sh
:
#!/bin/bash curl -u "$VERTABELO_API_TOKEN:" -H "Accept: text/sql" https://my.vertabelo.com/model/$MODEL_ID > database/create.sql curl -u "$VERTABELO_API_TOKEN:" -H "Accept: text/xml" https://my.vertabelo.com/model/$MODEL_ID > database/model.xml
Commit and Tag Changes in Git
Let's suppose we have added the phone column to the client table:
The commit algorithm is very simple and has three steps:
1. Download the model as an XML file, a DDL file, or both. The script that updates database files looks like this:
➜ version-control git:(master) ./update-db.sh (curl output ...) database README.md src update-db.sh update.sh ➜ version-control git:(master) ✗ git status On branch master Changes not staged for commit: (use "git add..." to update what will be committed) (use "git restore ..." to discard changes in working directory) modified: database/create.sql modified: database/model.xml
2. Commit the changes. Add database files and use git commit
:
git add database/create.sql database/model.xml git commit -m “Added phone column to client table”
3. Tag the changes.
git tag -m “Version 0.0.1” v0.0.1
The Git tree visualization for this example looks like this:
Generate a Migration Script from Data Models Stored in Git
The recommended solution is to tag your models inside Vertabelo, as described in How to Generate a Migration Script in Vertabelo, and then generate migration scripts directly in Vertabelo. However, generating a migration script from models restored from Git is possible. Here’s how to do it in just a few steps:
1. First, you need to check out on the source version taken to migration:
git checkout v0.0.1
2. Upload the model from disc:
3. Checkout on the target version taken to migration:
git checkout v0.0.2
4. Override the existing model with this version:
a. Choose the Import from XML option.
b. Upload the XML from the disc. Be sure to select the Overwrite current model option.
5. Go to the document tree and right-click on the migration document. Next, click on the Model migration option.
6. Choose First and last version and click Generate.
Better Version Control with Git and Vertabelo
Version control systems like Git increase the quality of our work with Vertabelo models. Vertabelo’s Public API makes model updates and commits inside Git very smooth. When you also use our XML to generate source code, combining it with Git and our public API is a must-have.