Having a guide of best practices for database modeling always at hand will help you improve the quality of your work. Read the tips and advice in this article to put together your own list of best practices. Then put it where you can see it every day.
When you find yourself needing to perform a task that you have never done before, it is always useful to read the advice of someone who has already done it hundreds or thousands of times. That way, you will gain valuable experience and save yourself many hours of learning. And you’ll avoid making mistakes that others have made before you. In the case of database design, a guide of best practices for database modeling condenses the experience of designers who have spent many years solving problems.
Without further ado, let's see what advice you can get from an experienced designer who has learned valuable lessons by stumbling over the same stones over and over again. The list of best practices for database modeling compiled in this article is divided into three parts:
- The first part includes kick-start tips to get you off on the right foot.
- In the second part, there are technical tips on dealing with indexes, normalization, and so on.
- In the last part, the tips focus on administration and maintenance for your database designs and models.
Part 1: Database Design Basics and Kick-Off
#1: See the Big Picture
When you know the domain of a problem and the objects that compose it, an image of the solution forms in your mind. If you usually work with databases, you will immediately start imagining tables, columns, primary keys, relationships, etc. You’ll feel the urge to dump those mental images into a database to start materializing the solution. But you must restrain these impulses.
First, you must outline a general diagram of the data model you are going to create. This general diagram must be understandable to non-technical users and stakeholders. This is the conceptual data model, which provides a common language that all interested parties will understand.
The conceptual model will help you communicate your ideas without confusing your audience with the technical details of the database. This communication will allow you to get feedback from users and stakeholders, which will enable you to correct misconceptions and add missing elements. Remember to get validation that your model is what they expect.
You can create a conceptual model by drawing only the main entities and the relationships between them, leaving aside details such as cardinality, primary keys, foreign keys, etc. For this purpose, it is advisable to use a tool that allows you to create a basic model and then evolve it into an operational database.
Conceptual models represent extremely simple schemas that help you communicate your ideas – without confusing your audience with the technical details of a database.
#2: Define a Naming Convention (and Follow It)
The names you give to the elements that make up your model are not easily changeable once they become database objects. That is why the choice of names is not trivial; indeed, it is vital that those names strictly conform to a database schema naming convention. Defining and following a naming convention is one of the most important best practices for database design.
Before you start modeling, you should lay the foundations of your database naming convention. Have the conventions clearly specified in writing and available in a visible place. For example, conventions may appear in text annotations in prominent areas of your entity-relationship diagrams (ERDs). If you use Vertabelo as a design tool, you can detail the naming conventions of your models using virtual sticky notes that will always accompany your ERDs.
#3: Partition Early
Partition your diagrams before they become a tangle of objects that makes it impossible to understand their structure at a glance. Start from the beginning by creating the partitions into which your ERD will be divided and then place each object in the corresponding partition. If you partition by functional areas, placing each object in the correct partition will give more clarity to your model and will facilitate the understanding of those who must validate or use it.
With Vertabelo, you can use subject areas to partition your models. In addition to partitioning by functional areas, subject areas allow you to partition models according to other criteria, such as table update frequency or table type (particularly useful for differentiating fact and dimension tables in data warehouses). Always include partitioning techniques within your list of fundamental database modeling tips.
#4: Neatness Counts
This is not a question of aesthetics or visual appeal. A diagram with a good layout improves the productivity of everyone who has to work with a data model. To make a good layout, you have to take into account the distribution of objects, intersection of lines, and use of colors and areas, among other things. You can get many tips on improving the layout of your diagrams in this article.
Part 2: Technical Matters
#5: Anticipate Index Creation
Indexes in a database serve to improve the overall performance of the applications that use the database. Basically, an index makes it easier for the database management system (DBMS) to make better use of server resources and respond more quickly to commands given to the database. Read this article for more information on database indexes.
While the task of creating indexes is usually performed after the database is operational and performance problems begin, you can anticipate these problems by creating indexes at the time of design.
Each DBMS has its own particularities when it comes to creating and managing indexes. This means you must define indexes on your physical model, which you will have created by choosing the target DBMS.
Good index-creation practices indicate that all primary and foreign keys should have indexes associated with them. Generally, database engines take care of this automatically. But if they don’t, you must create indexes for the primary and foreign keys of each table included in your schema.
With Vertabelo, you can include index definitions in your ERDs; when you generate the DDL scripts to create the database, the index creation is included in those scripts.
#6: Clearly Define Nullable and Non-Nullable Attributes
If we are in doubt about whether an attribute is mandatory or not (or in the physical schema, if the column supports null values or not), it’s common to make it non-mandatory – just in case, to avoid INSERT or UPDATE operations canceling by mistake. This is not advisable, since the mandatory nature of an attribute is important to maintaining data integrity.
If you allow an attribute to allow null values when they do not apply, you risk the database providing inconsistent information. For example, if a foreign key allows nulls when it shouldn’t, queries that combine related tables may return fewer rows than they should.
The same happens with numeric values involved in aggregate calculations. These calculations replace null values with zeros – often without the user’s knowledge. In that case, an aggregate calculation query (using
AVG(), for example) may return false information.
It is preferable for an UPDATE operation to be canceled by mistake than for a data analysis query to make incorrect calculations. Remember: at all costs, your database must maintain the integrity of the information it stores.
#7: Natural Keys vs. Surrogate Keys
Another widespread custom among designers and programmers is to assign primary surrogate keys to all tables in a schema – again, just in case. This is not a good idea, since surrogate keys, when they are not necessary, are more of a problem than a solution.
As you can read in the article What Is a Surrogate Key, surrogate keys increase the size of the table and add data that does not give any context to the entity they identify. In certain situations, a surrogate key does not serve any function, e.g. in junction tables used to implement many-to-many relat
It makes sense to add a surrogate key to an entity when:
- The entity does not have a natural key (a set of attributes that uniquely identifies it), or
- The natural key includes many attributes with different data types, or
- The natural key is too large an attribute.
#8: When and How to Normalize
A large part of your working hours as a data modeler will be dedicated to the creation of transactional databases. These databases support systems that handle online transactions, such as e-commerce or ticket booking applications. In transactional databases, it is vital to preserve the integrity of the information and to avoid anomalies in data insertions, updates or deletions. To this end, it is essential to apply normalization. Bringing a data model into the third normal form helps to avoid integrity problems, reduce redundancy and optimize storage space. Read about normal forms here.
However, not all databases are transactional; therefore, not all need to have normalization applied to them. For example, dimensional databases are used as a repository of information for Business Intelligence systems. When designing a dimensional database, you should focus on simplifying and speeding up access to summarized information. For this purpose, it is not always necessary to normalize. Instead – to achieve simplicity and speed in obtaining results – dimensional data modeling applies denormalization techniques. Denormalization adds redundancy to simplify queries and thus aims to reduce the number of tables that must be combined (joined) to solve complex queries.
Denormalization techniques add redundant columns to speed up analytical queries.
#9: SQL or NoSQL?
We are very used to building abstractions of data models by imagining them as sets of related tables composed of rows and columns. But many current scenarios (such as Cloud computing, IoT, or Big Data) benefit from data repositories that depart from that conventional structuring.
NoSQL databases offer many options that leave the relational model behind and solve different needs – e.g. document, key-value, graph, wide table, and columnar databases. You can see all these options in this article on SQL versus NoSQL databases. Note: You can use conventional tools based on ERD diagrams to design NoSQL databases.
Part 3: Managing and Organizing Your Work
#10: Always Save Versions
In your working hours as a database designer, there will be many times when you will need to retrace your steps. Perhaps you made a mistake in modifying a data model, someone else made a change that they shouldn’t have, or your clients/stakeholders backed out of a change you’d already started to implement. In such situations, having a version history of a data model can save you a lot of aggravation.
Versioning can be done manually simply by taking the trouble to save a copy of your model before making any changes. But to avoid the risk of forgetting to back up your model, it’s best to use an automatic versioning tool that generates versions of your models every time you make a modification.
The Vertabelo platform takes automatic model versioning one step further. Unlike conventional code control systems (which handle file versioning), Vertabelo handles logical and physical model versioning. It allows you to generate versions manually or automatically, check versions, generate branches, and review version history. In this review, you can even compare the differences between different versions.
#11: Don’t Rely on Visual Validation
Database diagrams tend to grow over time. The diagram that today fits on a single screen and allows you to read entity names and attributes without zooming in will turn into a tangle of crisscrossing lines and names that you can only read with a magnifying glass. While a good editing and visualization tool like Vertabelo facilitates the visual review of diagrams, there are always errors that escape even the most trained eye. That’s why automatic validation can save you a lot of trouble and many hours of work.
The live model validation offered by Vertabelo automatically detects when a diagram has different elements with the same name, when you leave an entity without attributes, when you forget to change the default name of an element, and many other mistakes and oversights.
#12: Keep Your Documentation Up to Date
Maybe you think that spending time documenting your data model is a luxury you can’t afford. You won’t feel the same way when you have to make changes to the diagram you are working on today and you don’t remember the reasons that guided the design decisions you made. Then you will wish you had documented earlier.
The task of documenting doesn’t have to consume the time you need to finish your designs on time. With a smart data modeling tool like Vertabelo, you can generate documentation automatically and have this documentation follow your models as they evolve. This lets you easily see what changed from one version to the next without having to visually walk through the ERD.
Another way to document is to use virtual sticky notes that remain attached to your ERDs throughout their existence. With these notes, you can add comments explaining why you made a certain design decision.
Beyond Database Design Best Practices: Value Experience
The lessons you can get from a degree, a course, or a book are necessary, but they are only a part of the knowledge you need to become a skilled database designer. The rest of the knowledge you gain from experience – either your own or that of designers who have been encountering problems and finding solutions to them for many years.
Leveraging the experience of others will save you a lot of time and effort. You can accumulate valuable experience quickly, without having to solve design problems on your own. That’s why we wrote this article on database design best practices, and we hope it’s helped you.