Many people draw boxes connected by lines and think they do data modeling. But we data modelers know our work goes far beyond that. Here are the desired skills in data modeling and important considerations for staying relevant as a data modeler.
Data modelers are essential in many situations. We are called to be part of a software development team in the early stages of a project for building the foundations of a software solution. We are called in to design a repository for a company data warehouse needed to provide accurate information for making critical business decisions. And we are called in to fix a flawed or outdated data model, possibly designed by someone who does not have the desired skills in data modeling, someone who knows nothing about normal forms, keys, relationships, entity-relationship diagrams, or even what data modeling is about.
Benefits and Importance of Data Modeling
There is no doubt what the data modeler does is essential for effective information repositories that provide a solid foundation for business processes. But there are many other important benefits of data modeling. To name a few:
- Database versioning: When database design is part of a software development process, working with a data model makes it possible to maintain database versions that accompany versions of the software under development. Using an appropriate design tool like Vertabelo, you can analyze the differences between different versions and even integrate the data model with a version control system such as Git.
- Database migration: Using data models makes it possible to build a model from an existing database and replicate it in the destination server, even if the source and destination servers run different RDBMS. To migrate the data in tables, using a model allows you to determine how and in what order to migrate the contents of each table.
- Document and share models: Any database must facilitate access to the information it contains. The way to do this is to provide visibility over its structure so that anyone with permission can easily obtain the information they need. To guarantee this visibility, use updated data models that act as maps or blueprints that make design decisions in database modeling
Staying Relevant
As with any profession, data modelers need to stay current to remain competitive. Be up to date in data modeling with the latest technologies, know what’s hot, know what’s trending, and understand what the market is asking for as desired skills in data modeling.
It is not enough to follow a database career path and learn how to become a database designer. Today, a data modeler must know how to use the latest database tools, how to design data warehouses, how to optimize a database, and what the new paradigms are (such as NoSQL), among other things.
It may seem overwhelming. But the following tips will help you get an idea of the most desired skills in data modeling.
1. Modern Design Tools
To be up to date in data modeling, you need to get the most out of the time you put in at work. This means maximizing your efficiency and productivity in creating and maintaining data models. It won’t do you any good to design your models with a conventional drawing tool or to limit yourself to working with the standard design tools that come with database engines.
If you use archaic tools, you quickly fall into obsolescence as a data modeler. You must be able to handle design tools that offer:
- Compatibility with as many RDBMS as possible.
- Capability for generating database creation or modification scripts that run error-free on the target RDBMS.
- Capability for generating models from existing databases.
- Intelligent validation of your designs to avoid generating faulty databases that cause data anomalies when they become operational.
Staying relevant as a data modeler means knowing the design tool options available. Among them, Vertabelo stands out as an online tool that allows you to reach your maximum productivity and efficiency.
2. Data Warehousing Modeling Techniques
Data modeling techniques are learned, in most cases, by applying them to database models for transactional processing that respond to the “classical” relational model. This is not necessarily a bad thing.
However, many designers get stuck with these techniques and try to apply them in every situation, including data warehousing. The result is inefficient data warehouses that cannot support the business intelligence (BI) and online analytical processing (OLAP) activities organizations need.
One of the most desired skills in data modeling is knowing the differences between a data warehousing schema and a traditional transaction processing schema. It will surely pop up among the interview questions for a data modeling job. Mainly, you must master some very specific concepts of data warehousing schema design, such as:
- Facts and dimensions: the measurable events and the qualifiers of those events.
- Topologies of data warehousing schemas: star, constellation, and snowflake.
- Types of dimensions: conformed, slowly changing, junk, etc.
- Types of attributes of fact tables: qualitative and quantitative.
- Different ways of recording measures in fact tables.
- Denormalization techniques.
- ETL processes to populate data warehouse tables.
- The granularity of facts.
3. Big Data
An important database modeling career advice we can give you is to know how to handle Big Data in addition to data warehousing. They are similar concepts since both refer to repositories of large volumes of data.
Data warehouses handle “clean” and structured data, using conventional database technologies. Big Data handles even larger volumes than data warehousing, often exceeding the capacity of traditional database engines.
It is common in Big Data for information to come from diverse and unstructured sources and to be generated at a high speed or in real-time. Examples include information collected by sensors and IoT (Internet of Things) devices in industrial control systems.
Big Data repositories often take the form of data lakes and use specific technologies such as Hadoop or Spark designed to take advantage of highly scalable and low-cost storage mechanisms. For this reason, models suitable for Big Data must give priority to storage space economy and data partitioning to enable parallelism and real-time data capture. This requires non-relational or NoSQL database systems, which leads us to the next topic.
4. NoSQL
It is good database modeling career advice to step out of your comfort zone from time to time. Let go of the classic relational model in which a relation is a set of tuples, all with the same attributes. Welcome to the world of NoSQL databases.
The enterprise transition toward the digital world has accelerated significantly in recent times, bringing to the forefront new applications that differ from traditional ERP, HR administration, and financial accounting applications. These new applications need to handle unstructured data, provide instant responses, and support a large number of users. For these, NoSQL databases are a preferred option.
NoSQL databases provide the flexibility needed for many modern, unstructured use cases. These use cases include those related to social media, Big Data, pattern recognition, and IoT applications. Instead of using tables, they use a variety of flexible data models such as documents, key-value stores, graphs, and "wide" columns.
These models are intended to support large volumes of information and new classes of applications. The document database model uses JSON documents to store information, while key-value stores resemble relational tables – or rather, the dictionary structures of programming languages such as Python or R – that optimize the retrieval of the data associated with a key. The wide-column model also adopts the table format of relational databases but adds a great deal of flexibility in the way data is named and formatted per row. Finally, the graph data model uses graph structures to define relationships between stored data points. This model is widely used in pattern identification applications on unstructured information.
5. Database Optimization
When a database you designed runs slowly, you are the one who is called in to fix it. A DBA may try to tackle the problem before passing it on to you, but if the circumstances are beyond his/her control, you have to prove you not only know how to build a data model but also how to make your model work optimally.
There are numerous tricks and techniques to improve the performance of a database without altering its design, such as creating indexes and statistics and applying best practices in query writing (avoiding cursors, avoiding temporary tables, etc.). However, there is a limit to what can be achieved with this toolbox of tricks and techniques. At some point, it becomes necessary to apply improvements and refactoring to the database design for substantial performance gains.
You may think the time and effort to create a good data model from the start is more expensive than deploying functions quickly once the database is operational. But in the long run, the price you pay for hasty or wrong design decisions is higher. One of the keys to staying relevant as a data modeler, then, is knowing how to justify the benefits of good design, be it in creating a database from scratch or in refactoring and optimizing an existing database.
6. Database Security
Database security encompasses tools, processes, and methodologies that establish a security perimeter around database servers to prevent cyber-attacks and illegitimate uses of information. You may think security management is outside your area of responsibility as a data modeler. But if you want to stay relevant, you want to keep security aspects in mind when designing your databases.
A common mistake data modelers make is assuming our databases are safe from attacks and typical cybersecurity threats. We think firewalls and other protection mechanisms would take care of repelling any danger.
But cybercriminals do occasionally discover breaches that give them access to databases or have henchmen within an organization who open the way for their malicious activities. They get to the heart of databases with nothing to stop them from doing their misdeeds.
For that reason, we must think about generating layers of protection from the very design of a database. Designing an efficient and secure scheme for authenticating the users of a system is one such example. Another is partitioning database schemas so that a DBA can leverage the authentication mechanisms of the database engine to assign permissions differentiated by access levels.
7. Keeping up to Date With the Latest Trends
Many of the fundamental database concepts were developed 50 years ago and are still valid today. The relational model is the best example of this.
But that doesn’t mean we can sit back, believing we already have all the knowledge we could ever need. New technologies, new paradigms, and new concepts appear frequently. They force us to keep learning and expanding our skills; If we don’t, we quickly cease to be relevant as data modelers.
Database conferences are often a good source of information on industry trends and developments, although the topics covered tend to be very advanced and cutting edge. Don’t expect to find, for example, a conference discussing the basics of database design. Let’s browse the topics of a few conferences planned for 2022:
- ACM Symposium on Principles of Database Systems (PODS): A state-of-the-art conference on the theoretical aspects of data management to advance the fundamentals of tools we use to store, integrate, query, and analyze data. The PODS community provides a solid scientific basis for methods, techniques, and solutions to common data management problems.
- The 6th International Conference on Information System and Data Mining (ICISDM): This conference presents the latest research and results of scientists working in fields related to information systems and data mining. For this 6th edition, topics such as coupling and integration of heterogeneous data sources, legacy systems, scalable data preprocessing, parallel and distributed data mining algorithms, among many others, are proposed.
- The 5th International Conference on Data Storage and Data Engineering (DSDE): The 5th edition of this conference focuses on topics such as Big Data (infrastructures, architectures, security and privacy issues, etc.), social data analytics, mobile data management, among others.
Books are also a valuable source of information to stay current in data modeling. The best books on the subject were generally written a few years ago, but their authors take care to revise them from time to time with new editions to keep them updated.
Finally, I suggest you frequent specific blogs on modeling and databases. Select the ones from authors who are most active in searching, updating, and publishing useful knowledge.
Bonus Tip: Align Your Objectives With Those of Your Clients, Sponsors, and Users
Even if your models are perfect and you meet all the conditions to maintain your relevance as a data modeler, your objectives must be aligned with those of the ones who ultimately use your designs. It is important you understand the business vision of your users and stakeholders and reflect that vision in the outcome of your work.
A data model can be impeccable from every technical point of view. But if it does not serve the users’ purposes, its usefulness does not go beyond decorating some wall of a room in the IT department.
Maintain Currently Desired Skills in Data Modeling at All Times
It’s important to be up to date in data modeling. As with any profession, staying relevant as a data modeler is key to remaining competitive. Know the latest technologies and what’s trending.
Don’t get too comfortable in your data modeling job. If you do, you can become professionally obsolete very quickly!