Back to articles list

September 15, 2015 - 14 minutes read

Example er diagrams

Tackling Your Troubles – Building a Bug and Problem Database

Andrew Wolfe

Educator, Writer and Practitioner in Database Software and Information Security

Tags:

Death and taxes – add “software problems” to that list of the inevitable. There is always a new issue, a new failure, a new key opportunity that an organization must address. And to avoid repeating the problems, or to revise your prior fixes, it is critical to capture the problems accurately and completely. You need a history of what happened and when. In this piece, we create the logical model for a problem or “bug” reporting system.

The Problem: Problems

Every organization has to deal with problems, exceptions, or other needs. Huge enterprises have problem-tracking databases, some small enterprises use them as well. Vertabelo is going to help us create a bug tracking database for ourselves. For the purposes of this article, we’ll consider a bug tracking database for an inventory system called “StickyStuff” which prints and tracks inventory-asset labels using various printer and scanner hardware. We’ll call our bug tracking system BugBlaster.

What are the characteristics of a software bug? Well, it is associated with a failure or exception in a specific piece of software. Perhaps an inventory item is listed in two places, or a label came out unusable. There are measurements or observations that are associated with the bug. These must be listed with the particular person who reported them, along with the circumstances in which it occurred. We need to understand its severity to distinguish a product-killing disaster from a misspelling on some ancillary UI page. We want to know if it has been fixed, or whether we can’t reproduce it at all.

Every piece of information that can assist in resolving the problem should be available in the database schema, it should be clear how to locate the information, and as simple as possible to program it. With these general goals in mind, we begin the process of modeling our bug database.

Getting Started with a Logical Model

Let’s get started on “BugBlaster v0.9.” We won’t call it version 1.0 yet. After all, the developers building their BugBlaster application code don’t call their piece “version 1” yet. I’ll target MySQL. Now, Vertabelo could be called a “physical modeling” tool but I use it for logical modeling too, using tables for my logical entities. After all, almost all of your final tables will start as a logical entity. So what’s our first entity?

Problem – Our First Entity

Naturally our first entity is Problem. I mean, this is a problem tracking system!

The first entity in our problem tracking database is “Problem”!

More Entities

I add more tables as entities. I am not adding any columns to them at all. Why? Because I haven’t decided which are dependent, and which are actually attributes and not tables at all. More entities: Person, Report, Condition…

As per usual logical modeling, I create my entities. In Vertabelo, I make them a bunch of column-less tables.

Our initial set of data entities, modeled as tables with no columns.

The entities we are using are:

Problem	Describes some observation, event, happening or condition we need to resolve
Report	Describes an event that occurs, or a measurement or observation that may be relevant to a problem. A report may consist of several measurements, comments, etc.
Asset	This is an app, software component, device, or anything else that may be involved when a problem is observed. It might be the inventory user-interface, a particular type of device, or perhaps one scanner with unexpected results.
Person	Someone who might encounter, report, analyze, fix, test a problem
Change	Change records an alteration of an asset (or its creation, destruction, retirement)
Circumstance	Describes the location and context for a report
Condition	An aspect of the circumstance of a report
ProblemAspect	Lists one aspect of a problem – it may have several. Perhaps StickyStuff printed a gibberish label, but then we discover it’s from the inventory being recorded incorrectly. Both are aspects of a problem.
Measurement	A measurement associated with a report. This may be yes/no, e.g., a label is readable or not. It may be a numeric value, such as numbers of things in the inventory.
Date	A record of the event
Cause	Identifies an aspect or defect and attributes that as a cause of the observed problem.
Severity	How seriously the problem affects some operation or another
Fix_Target	deadline for fixing a problem
Status	The status – open, resolved, etc. – of a problem
Tag	A means of non-exclusively grouping sets of information in the database

The point is not to get the final set of entities/tables from the first. Some of these entities are just going to disappear, and you may be able to tell which ones already. Others will be renamed or will evolve, or new ones arise. This is one of the freedoms of not using a compulsive tool that wants to trace each logical entity through to physical tables. I have found over many years that when you are logically modeling, you want to just capture all the “things” that participate in your final system. You may liken this to an older (but valuable) data modeling technique that suggests you list out all the nouns describing your system, and turn them into entities as you see fit.

Getting Physical – Adapting the Initial Logical Model

Identifying Main Entities

As I mentioned before, I don’t expect these all to become tables. But there are certain entities I know we will treat as independent entities. These exist on their own and do not depend on any other entity. I’ll change their fill color to yellow.

We distinguish our important entities – our main business objects – by marking them in yellow.

Isn’t Report a Problem?

It sure is – is Report dependent on Problem or not? Here we make a modeling decision, but we need to know the tradeoffs:

	Advantages	Disadvantages
Report is part of Problem	Every problem needs a report anyway Simplifies adding multiple reports to a problem Most typical use case for a report	Associating reports with additional problems is difficult
Report is Independent	Makes it easier to split out distinct problems from a single report or event	M:N relationship adds programming effort

On considering the bug tracking systems I have used, I realize that a key problem I’ve observed is that each problem reported gets connected way too tightly with a single sub-issue and focuses down on it. Often a problem has information that indicates other problems, or may relate to other known issues. I don’t want to do that with this system, so I am making Report an independent entity.

Here is the model with the ‘final’ set of entities.

Our bug database logical model with all entities identified.

Finalizing the Logical View

The last aspect I want to tie down are those entities which are in essence dimensions or domains. You could alternatively call them reference lists or lookups. These have their own identities, but only are descriptive of actual business objects. Severity is a good example. I’ll color these with a sort of lavender. I’m also adding Unit, which will allow us to capture information needed for aggregation and analysis, especially for non-radix-10 data like time durations.

Finally, we drop some entities. There’s at least one ‘entity’ that is simply unusable – Date. In addition, Fix_Target starts putting a scheduling aspect into this system, and Change and Remedy would lead us into asset versioning. I don’t want to clutter the initial implementation with these. And Cause is redundant with Problem_Aspect.

Final logical model with necessary additions and adjustments.

Building the Real Physical Model

Now we’re ready to get down to brass tacks – building the real model. I add primary keys to the independent entities and lookups.

Independent Entities as Tables

Table	Primary Key
Person	id
Report	id
problem	id
Asset	id
circumstance	id

Lookups and Reference Information

Table	Primary Key
data_types	id
problem_aspect_type*	handle
status	handle
severity	id
Units	id
tag*	id
enclosure_type	id

I have put asterisks after problem_aspect_type and tag because these reference tables will allow additional entries by the end user, not just by myself as the modeler or the programmer. Tag, of course, needs to have this – we are not going to give the user any tags to start, the users will leverage them for every and any purpose they have. Problem_aspect_type will start with values like cause, symptom, damage, side_effect but perhaps users will have other ways to associate a reported piece of data with a problem.

Primary Dependent Tables

The main dependent tables that are needed should now be apparent. We need to a place to store the problem information, and must show relationships among our software and hardware assets. What is also apparent is that these are gaps in the logical model, and we’ll fix them now.

Table	Primary Key	Description
report_finding (renamed from “Measurement”)	report_id finding_seq	This is the main table where we capture user measurements and observations. Each record is a distinct unit. The table is overloaded to carry different kinds of information, and we distinguish this by the data_type_id. A typical report will have three or more of these records, one with a substantial text comment, one with a screen capture, one with a timing, etc.
Package	enclosing_asset_id referenced_asset_id enclosure_type_id enclosure_seq	We need to give more detail on a problem than just “this happened with the StickyStuff application.” Was it in the back-end, label printing, user interface? This allows us also to record ‘softer’ asset relationships which may be relevant.

Lookups and Reference Information

Table	Primary Key
data_types	id
problem_aspect_type*	aspect_handle
Status	status_handle
Severity	id
Units	id
tag*	id
enclosure_type	id

You may have seen the new data_type table. This table allows us to keep a strict control over the information in each report_finding record. It includes data type records for integer, text, and decimal, but also HTML text, images, sounds, and such. We also use this table to specify the type of information in the report’s Circumstance/Condition tables.

Association Tables Finish Describing Each Report

Circumstance and Condition provide the context for any report that is made. It includes names of computers, memory or network resources, test data sets loaded and such. It is much like report_finding, but it allows looser information specifically to handle the imprecision when a customer reports a problem. Also, each ‘circumstance’ is reusable to show relationships between different reports and to allow correlation.

ProblemAspect is the magic glue that turns a finding/measurement into information. Why is this a problem? How does it contribute to the problem? “I measured the label printing taking 45 seconds, it is part of this problem because it fails to satisfy performance requirements.” A single problem may have many aspects. And we may find that findings from prior reports actually reflect on the same problem. Perhaps a unit test against a particular data set found that the web service generating labels sometimes takes 42 seconds. Take the same finding and attach it to the problem.

I have put asterisks after problem_aspect_type and tag because these reference tables will allow additional entries by the end user, not just by myself as the modeler or the programmer. Tag, of course, needs to have this – we are not going to give the user any tags to start, the users will leverage them for every and any purpose they have. Problem_aspect_type will start with values like cause, symptom, damage, side_effect but perhaps users will have other ways to associate a reported piece of data with a problem.

Integer Surrogate IDs as Primary Keys

You will notice that most of the primary keys are integer or bigint. This is a common practice, and when you use a completely artificial value to identity data in a table it is called a “surrogate key.” All of the independent entities have these primary keys. This insulates the records from changes to name or assignment.

“Handles” – Textual Primary Keys

Certain tables I have given “handles” instead of “ids.” Why? I want these values to describe themselves rather than having to dereference them through the lookup table. Status is one in which I really want someone querying the database to be able to read the field values in the referencing tables (problem and report_finding) rather than having to look up an ID.

Compound Primary Keys

Many of our tables implement relationships among primary entities and lookups, that is, using foreign keys to those tables. For these relationship tables I have not created independent primary key columns, but instead combined the foreign-key fields to create the primary key. I have found that creating another independent surrogate key for such tables does not add any value, but instead obscures the need for each combination of references to be controlled. It fosters data corruption. So these tables have primary keys built from foreign keys. In cases like Package, where the same (combination of) references may repeat, I add a sequence number or other attribute to the primary key to preserve uniqueness.

Adding Fields/Attributes

We have keys and relationships but that’s not giving us any real information. If the model is named effectively, and diagrammed neatly, you may be able to see how it will fit together. But remember, most of our primary keys are surrogates – that is, they are completely meaningless values. Most of these tables need us to put something in there that allows the programmer or user to query them meaningfully. So unit needs unit_name to tell people what it is.

In reference tables, and some others, we want to put description to explain what the record means.

Many of the tables need some history or tracking information. I use modification to tell when a record was last modified and creation for the time it was created. Note that I do not have a modification field for report_finding. This is because I really don’t ever want the finding records to be modified. Updates would only confuse the records that can be crucial to diagnosing – or re-diagnosing a problem.

Correcting Problems

When you start putting the keys and attributes into your model, you will start to find where your model has a problem so far. You will ask yourself how you will insert, or what order will you have the application modify data. You don’t have to be a user interface expert for this, but a little sympathy for the programmer is definitely in order! In this model, I found that problem_aspect had incorrect and incomplete relationships. It needs to implement a many-to-many between problem and report_finding. Then I saw I needed some way to characterize the relationship, so I added problem_aspect_type. You have no data yet, so fix it now!

The Final Model

Here we have our final physical model. You can see I have added in all of the keys, relationships, and attributes. You should also notice that I have tried to make the diagram as clear as possible by putting the primary entities towards the top. I made sure no lines showing table relationships have crossed. This makes it the best possible communication vehicle for programmers, UI designers, and product managers.

Final data model, ready to use!

Driving through to Implementation

We have a complete model now for our bug tracking system. To deliver it, we want to generate SQL for our target database and deploy it into a development database. When I’m using MySQL, I may use Sequel Pro or another tool to do this.

Add sample data for every table in the schema. This will allow developers to see how you expect it to work, and thus how they will program the data access and the user interface. Some of the reference-table data will be more or less permanent. Give your developers something to work on.

You may also want to write some queries or views to help the developers start.

Follow-Through

With an up-front design like this, we have already thought through most aspects of the data and anticipated much of the implementation. But don’t treat it like glass. Every application develops iteratively. As this goes through implementation, I expect to see changes in the database structure. Be sure to capture these precisely in your data model, whether by revising the Vertabelo model, or else by using its reverse-engineering function to pull changes from a live database. Remember that Vertabelo is keeping intermediate versions of your model at every phase. Just like our programmer colleagues do, we may want to pull out a prior version and try a different tack, make a different design or implementation decision and work from there.

As I mentioned, designing a data model is an iterative process. This model matches my mental idea of bug tracking in organizations where I have worked and in my own thinking. What sort of entities and tables would you want to put into a system like this? Does this model miss any specific aspects of the bug-tracking problem? Are they important for the initial release or can we put them into BugBlaster 2? I hope that you have found this exercise useful and I would love to hear your experiences and your feedback with systems like this!

Tags:

The Problem: Problems

Getting Started with a Logical Model

Problem – Our First Entity

More Entities

Getting Physical – Adapting the Initial Logical Model

Identifying Main Entities

Isn’t Report a Problem?

Finalizing the Logical View

Building the Real Physical Model

Independent Entities as Tables

Lookups and Reference Information

Primary Dependent Tables

Lookups and Reference Information

Association Tables Finish Describing Each Report

Integer Surrogate IDs as Primary Keys

“Handles” – Textual Primary Keys

Compound Primary Keys

Adding Fields/Attributes

Correcting Problems

The Final Model

Driving through to Implementation

Follow-Through

You may also like

Defining Identifying and Non-Identifying Relationships in Vertabelo

Managing Roles and Statuses in a System

How to Keep Track of What the Users Do

7 Common Database Design Errors

Designing a Data Model for a Hotel Room Booking System