Do you need to manage an existing database, but without a proper model of the entities and relationships? We're here to help. This article will explain the why, what, and how of modeling an existing database.
Typically, in IT, we think that "legacy" is bad. In actual fact, legacy is often a way of life. Most organizations are not working with entirely new "greenfield" development. (Except in the world of startups.) In most jobs and companies, you will find that someone has come before you and already done some work creating a solution. In the case of data, this means that there is an existing database, possibly one that is already in production and has been used for years, maybe even decades. Even if the database is recent, you might not have been involved in its creation. You might not even know about its history and lifecycle.
Why Working with Legacy Databases Is Tough
Working with any existing system has its own special challenges.
First, you may not have any documentation about the database. If a database design, model, or an entity-relationship diagram (ERD) was created before the database was created physically, that design is probably lost in the mists of time.
If you can get your hands on the design or ER diagram, I have more bad news. There are no guarantees that the ERD has been updated every time a change was made to the physical database. It’s more likely the ER diagram was not updated when changes were made, as physical database changes are sometimes required in emergency situations (e.g. to fix bugs) and when time is of the essence. The mindset is "We don't have time to go back and change the model – our customers are relying on us to fix this bug today." Unfortunately, this is the kind of statement that customer support managers say regularly.
Finally, if the database is already in usage, there may be lots of data stored in it as well as systems using that data. Making changes to a legacy database running with a legacy system is particularly challenging, as the connections may be quite prone to breakage. It's rarely easy to make changes to an existing database. And it can be impossible to migrate an application to a new database.
So, we are stuck with working with legacy databases. You can expect to see many during your IT career, regardless of where you work. Let's make the best of it and identify strategies that address the challenges of working with legacy databases.
How to Successfully Work with Legacy Databases
1. Get a Database Model
As mentioned previously, an existing database has a life of its own and is unlikely to match any database model from which it was originally created.
Data models make it easier to include business concepts and processes with the associated rules. Also, data models are easier to change than physical databases. With a physical database, it's tempting to push back business needs because it's hard to change the physical database and still ensure that everything is still working.
Data models are a good way to communicate between business and development. They are easier to understand and review. Data models provide an abstraction from the physical database that is simpler for business users to provide feedback on. Finally, data models provide you a tool to look at things from a business analyst’s perspective.
Certainly, it's tempting to think "I can easily just add a new column here and a new table there", but keep reading to understand why a data model will come in handy, especially when interacting with business users.
Having a diagram will show the entities and relationships you’re representing, it’s easier to visualize the impact of a change in one part of your model on other parts of the model. This would be more challenging if you were working directly with the physical database definitions. Both business and technical people can understand a data model. It provides a common language to communicate about database design.
An ER diagram is indispensable to understand the data being stored. Of course, it will not solve all our problems. Think of it as a guide to the information stored in a database.
Wait, You Told Me I Won't Find an Accurate Model!
True. In all likelihood, any model or ERD that you may find will not accurately represent the true physical database with complete accuracy.
Tip #1: Create your own 100% accurate model of the database. But don't do it manually; reverse engineer it. Extract the definition of the database and use a modelling tool to create a physical data model.
2. Extract Definitions from the Physical Database
Tip #2: Ensure that the definitions that you are using in your model are 100% accurate and up to date. I would recommend not trusting any Data Definition Language (DDL) scripts you are given. Go straight to the source: the physical database.
If you're not sure how to do this, I refer you to this fine article about reverse engineering with Vertabelo that will extract the definitions into XML representation. This representation can be imported directly into Vertabelo.
You can create a physical database model from DDL scripts for the physical database, but those require different tools to extract from a physical database. If you are going to use DDL scripts, then I recommend reading about how to import DDL scripts into Vertabelo (the article refers to SuiteCRM, but the techniques can be applied to any database for which you have the DDL scripts). You’ll also need to find the tools to extract DDL scripts from your particular physical database management system.
Whichever route you choose, the goal is to have an accurate model of the physical database in its current state.
Tip #3: Get organized. Working with legacy databases often means working with large databases that have grown over time. We can visually group tables with subject areas. You can see how useful that can be in the second part of this article on creating a physical model from DDL scripts, including information about settings that you should disable to work more effectively with the physical model. If you'd like to see how to do this, watch this video on managing large data models in Vertabelo.
3. Figure Out What You Need to Change
This may seem obvious, but you need to really dig into what changes need to be made. What do you already have in the database? What new and additional needs exist for the database? This means talking to business users and understanding what the requirements are. It doesn't mean just blindly adding new tables and new columns because someone told you to.
Tip #4: Understand the business needs. Before you start changing your model, understand the change. I cannot say this enough: talk to business people before you start changing your model. If you are not sure what you need to know or what to ask, here are some questions to ask business users.
4. Get Others’ Insights
Tip #5: Collaborate.
Any changes that you determine are needed should be carefully reviewed with business stakeholders:
- How will the change impact user experience?
- What effects will the change have on the data in the database?
- How will existing data be updated to comply with any new constraints or keys?
You’ll need a tool that supports collaborative database modeling, facilitates working with business users, and makes it easy to hand off the database to developers. Additionally, look for versioning support for the data model. In the future, hopefully you can begin to trust that you have a 100% accurate and current model of your legacy database.
5. Plan for the Future
Tip #6: Take advantage of version control and management tools.
Vertabelo includes a version control system so that you can track modifications and manage versions of the data model; it also can generate migration scripts between versions of your data model, which help you avoid having to manually write and correct migration scripts. The Vertabelo tool will manage your versions and figure out the impact of the changes between versions.
More Help with Data Modeling
First, I recommend getting started with ERDs, especially on Vertabelo. Then I'd recommend reading up on best practices in database design and applying these tips for better database design.