Database design is the process of producing a detailed model of a database. The start of data modelling is to grasp the business area and functionality being developed.
Before Modeling: Talk to the Business People
This is a key principle in information technology. We must remember that we provide a service and must deliver value to the business. In data modeling that means solving a business problem from the data-side such that the required data is available in a responsive and secure way.
Data modelers need to talk to the business people. “What is the business domain?” “What information needs to be held in the database?” “What are the problems to be solved?” “What are the issues in this domain?” From talking to the business people you have the basis to consider trade-offs between performance, flexibility, security, and other non-functional requirements which may affect the data model.
This work helps to establish the foundation for the modeling. Therefore, it needs to be done before the actual creation of the entity relationship model so that the model will address the correct areas of focus.
Take the example of a financial application for a company: you would need to model customers, suppliers, invoices, payments, accounts, balances, etc. You need to learn about these concepts; learn about accounting. You do this by talking to the business people.
Where Does Data Modeling Fit Into the System Development Lifecycle?
This has been hotly debated over the years. Some developers see the data model as an organic object that results from prototyping and design. Data professionals tend to take a more strategic approach of identifying the business needs up-front and modeling them to meet the strategic enterprise needs. In other words, some developers think that the data model should evolve around the actual code, while some data modelers think that the code should be created based on a relatively static data model. I would argue that data modelers today in just about any development environment need to embrace a highly collaborative approach to data modeling. The data model and code influence each other back and forth. Developers influence the work of the data modeler; the data modeler influences the work of developers.
With the data modeler’s understanding of the business domain, he or she must create the required data model and follow-up when changes are required. In an interactive way, modelers can work with the user experience and development teams. If, for example, the developers discover that the model is too normalized for high-performance, or that it requires additional normalization for some tables, the data modeler updates the model. The teams should consider what the exact field lengths should be and might even modify fields based on UX screen designs. The teams can work through examples of what data will be stored in the tables and how it will be used by the application.
So, basically, modelers and developers must “massage” the model until it really supports the business and non-functional requirements of the system. This is a very iterative process which might continue for many months while the application code, user interfaces and application interfaces are being developed.
Going back to the example, you might discover during development that it is not enough to model a single currency such as euros or dollars for invoices and payments, but that you need to store amounts with their associated currency and convert them to a “base” currency in which the company’s financial accounting (bookkeeping) is held. Therefore, you will also need the associated currency conversion information for the currencies and amounts and to store historic currency exchange rate information used to convert currencies in the past.
Efficient Modeling: Reuse, Reuse, Reuse
Being efficient in data modeling requires a data modeler to create reusable and repeatable designs. Modelers must be very wary not to reinvent the wheel. Typically, there are numerous opportunities to re-use portions of existing models across various systems and projects. In addition, this provides benefits in the development process, as a re-used data model might allow a similar reuse in code, which potentially eliminates significant development efforts. The re-use of existing industry data models and data design patterns can be the basis for an enterprise catalog of data components.
Follow-Up: Formal Design
So, actually building a physical data model and an ERD of the database are simply side effects of a larger process around the development projects and the strategic enterprise data approach. However, properly cataloging and controlling formal designs can facilitate and increate the potential for re-use. After creating a model for an individual project, data modelers can start thinking about creating extensions and extending to reusable data designs. What additional information and attributes might be required in the future beyond what was required by the current project? How might future iterations of this data model look?
After implementation and during operational usage, it is likely that new requirements will be uncovered which may have an impact on the data model. Again, the data modeler may attempt to foresee some of these future requirements during the design and implementation, but when data modelers keep up-to-date on the business they may uncover new requirements that will need to be met in the future. One recent example is the numerous changes that have occurred in the payments area with the Single European Payment Area (SEPA). The SEPA rules have forced new requirements on corporations to store information about payments, such as international bank account number format (IBAN) for customer and supplier accounts, and bank identified codes (BIC) being required for fast, efficient, inexpensive payments and debits in the Euro zone.
Data Modeling Is More Than an ERD
The entity relationship model (E-R model) identifies a business need and focuses on the data needed to solve issues related to the business. When businesses use the E-R model as a guide in the data design, it helps to address concerns related to organizing and relating data, particularly using the relationships of data to find the desired information. The ERD allows the business to understand what data and information is needed to operate and manage the business.
Some data modelers see the ERD as their only contribution to a development project. They spend many hours making the data model a work of art. Some data architects spend little or no effort beyond the boxes and lines. Although the ERD is an important piece of a data modeler’s job, other activities such as keeping up-to-date on the business, managing metadata and proper data governance have an impact on the enterprise’s strategic data approach.
The amount of data to be handled is constantly increasing. Enterprises have the need to split, combine and analyze the data within their disparate systems. Structured and unstructured data coming from various sources has replaced the traditional single master data source. If the data modeler is to become a data architect, then these types of issues must be managed and they go far beyond the simple lines and boxes of the ERD.
Metadata is important to understand how data should be stored in a corporation’s data warehouse. Data modelers should consider how data may be extracted from the application that they are working with so that it can be integrated into a data warehouse, and enable an enterprise wide perspective. Metadata should be structured within an ERD to assist in data warehousing. The metadata provide details on the data that is stored within the data warehouse.
Data governance looks at the management of availability, usability, integrity, and security of the data across the entire enterprise. Data governance should include well-defined procedures and execution planning. Data governance defines the owners of data assets across the enterprise. Data governance policies specify accountability for the data, including accuracy, accessibility, consistency, completeness, and updating.
Conclusion
Database design is the process of producing a detailed model of a database. To be efficient, a data modeler must focus on the “big picture” even while solving specific data modeling tasks within a project.