Have you ever come across a situation where you need to manage the state of an entity that changes over time? There are many examples out there. Let’s start with an easy one: merging customer records.
Suppose we are merging lists of customers from two different sources. We could have any of the following states arise: Duplicates Identified – the system has found two potentially duplicate entities; Confirmed Duplicates – a user validates the two entities are indeed duplicates; or Confirmed Unique – the user decides the two entities are unique. In any of these situations, the user only has a yes-no decision to make.
But what about more complex situations? Is there a way to define the actual workflow between states? Read on…
How Things Can Easily Go Wrong
Many organizations need to manage job applications. In a simple model, you could have a table called JOB_APPLICATION
, and you could track the state of the application using a reference data table containing values like these:
Application Status |
---|
APPLICATION_RECEIVED |
APPLICATION_UNDER_REVIEW |
APPLICATION_REJECTED |
INVITED_TO_INTERVIEW |
INVITATION_DECLINED |
INVITATION_ACCEPTED |
INTERVIEW_PASSED |
INTERVIEW_FAILED |
REFERENCES_SOUGHT |
REFERENCES_ACCEPTABLE |
REFERENCES_UNACCEPTABLE |
JOB_OFFER_MADE |
JOB_OFFER_ACCEPTED |
JOB_OFFER_DECLINED |
APPLICATION_CLOSED |
These values can be selected in any order at any time. It relies on end-users to ensure that a logical and correct selection is made at each stage. Nothing prohibits an illogical sequence of states.
For example, let’s say that an application has been rejected. The current status would obviously be APPLICATION_REJECTED
. There’s nothing that can be done at the application level to prevent an inexperienced user from subsequently selecting INVITED_TO_INTERVIEW
or some other illogical state.
What’s needed is something to guide the user into selecting the next logical state, something that defines a logical workflow.
And what if you have different requirements for different types of job applications? For example, some jobs may require the applicant to take an aptitude test. Sure, you can add more values to the list to cover these, but there’s nothing in the current design that prevents the end-user from making an incorrect selection for the type of application in question. The reality is that there are different workflows for different contexts.
Another point to think about: are the listed options really all states? Or are some in fact outcomes? For example, the offer of a job can be accepted or rejected by the applicant. Therefore, JOB_OFFER_MADE
really has two outcomes: JOB_OFFER_ACCEPTED
and JOB_OFFER_DECLINED
.
Another outcome could be that a job offer is withdrawn. You may want to record the reason why it was withdrawn using a qualifier. If you just add these reasons to the above list, nothing guides the end-user into making logical selections.
So really, the more complex the states, outcomes, and qualifiers become, the more you need to define the workflow of a process.
Organizing Processes, States, and Outcomes
It’s important to understand what’s going on with your data before you attempt to model it. You may at first be inclined to think that there is a strict hierarchy of types here:
When we look more closely at the above example, we see that the INVITED_TO_INTERVIEW
and the JOB_OFFER_MADE
states share the same possible outcomes, namely ACCEPTED
and DECLINED
. This tells us there is a many-to-many relationship between states and outcomes. This is often true for other states, outcomes, and qualifiers.
At a conceptual level, then, this is what is actually going on with our metadata:
If you were to transform this model to the physical world using the standard approach, you would have tables called PROCESS
, STATE
, OUTCOME
, and QUALIFIER
; you would also need to have intermediate tables between them – PROCESS_STATE
, STATE_OUTCOME
, and OUTCOME_QUALIFIER
– to resolve the many-to-many relationships. This complicates the design.
While the logical hierarchy of levels (process → state → outcome → qualifier) must be maintained, there is a simpler way to physically organize our metadata.
The Workflow Pattern
The diagram below defines the main components of a workflow database model:
The yellow tables on the left contain workflow metadata, and the blue tables on the right contain business data.
The first thing to point out is that any entity can be managed without requiring major changes to this model. The YOUR_ENTITIY_TO_MANAGE
table is the one under workflow management. In terms of our example, this would be the JOB_APPLICATION
table.
Next, we simply need to add the wf_state_type_process_id
column to whatever table we want to manage. This column points to the actual workflow process being used to manage the entity. This is not strictly a foreign key column, but it allows us to quickly query WORKFLOW_STATE_TYPE
for the correct process. The table that will contain the state history is MANAGED_ENTITY_STATE
. Again, you would choose your own specific table name here and modify it for your own requirements.
The Metadata
The different levels of workflow are defined in WORKFLOW_LEVEL_TYPE
. This table contains the following:
Type Key | Description |
---|---|
PROCESS | High level workflow process. |
STATE | A state in the process. |
OUTCOME | How a state ends, its outcome. |
QUALIFIER | An optional, more detailed qualifier for an outcome. |
WORKFLOW_STATE_TYPE
and WORKFLOW_STATE_HIERARCHY
form a classic Bill of Materials (BOM) structure. This structure, which is very descriptive of an actual manufacturing bill of materials, is quite common in data modelling. It can define hierarchies or be applied to many recursive situations. We’re going to make use of it here to define our logical hierarchy of processes, states, outcomes, and optional qualifiers.
Before we can define a hierarchy, we need to define the individual components. These are our basic building blocks. I’m just going to reference these by TYPE_KEY
(which is unique) for the sake of brevity. For our example, we have:
Workflow Level Type | Workflow State Type.Type Key |
---|---|
OUTCOME | PASSED |
OUTCOME | FAILED |
OUTCOME | ACCEPTED |
OUTCOME | DECLINED |
OUTCOME | CANDIDATE_CANCELLED |
OUTCOME | EMPLOYER_CANCELLED |
OUTCOME | REJECTED |
OUTCOME | EMPLOYER_WITHDRAWN |
OUTCOME | NO_SHOW |
OUTCOME | HIRED |
OUTCOME | NOT_HIRED |
STATE | APPLICATION_RECEIVED |
STATE | APPLICATION_REVIEW |
STATE | INVITED_TO_INTERVIEW |
STATE | INTERVIEW |
STATE | TEST_APTITUDE |
STATE | SEEK_REFERENCES |
STATE | MAKE_OFFER |
STATE | APPLICATION_CLOSED |
PROCESS | STANDARD_JOB_APPLICATION |
PROCESS | TECHNICAL_JOB_APPLICATION |
Now we can start to define our hierarchy. This is where we take our building blocks and define our structure. For each state, we define the possible outcomes. In fact, it’s a rule of this workflow system that each state must end with an outcome:
Parent Type – STATES | Child Type – OUTCOMES |
---|---|
APPLICATION_RECEIVED | ACCEPTED |
APPLICATION_RECEIVED | REJECTED |
APPLICATION_REVIEW | PASSED |
APPLICATION_REVIEW | FAILED |
INVITED_TO_INTERVIEW | ACCEPTED |
INVITED_TO_INTERVIEW | DECLINED |
INTERVIEW | PASSED |
INTERVIEW | FAILED |
INTERVIEW | CANDIDATE_CANCELLED |
INTERVIEW | NO_SHOW |
MAKE_OFFER | ACCEPTED |
MAKE_OFFER | DECLINED |
SEEK_REFERENCES | PASSED |
SEEK_REFERENCES | FAILED |
APPLICATION_CLOSED | HIRED |
APPLICATION_CLOSED | NOT_HIRED |
TEST_APTITUDE | PASSED |
TEST_APTITUDE | FAILED |
Our processes are simply a set of states that each exist for a period of time. In the table below they are presented in a logical order, but this does not define the actual order of processing.
Parent Type – PROCESSES | Child Type – STATES |
---|---|
STANDARD_JOB_APPLICATION | APPLICATION_RECEIVED |
STANDARD_JOB_APPLICATION | APPLICATION_REVIEW |
STANDARD_JOB_APPLICATION | INVITED_TO_INTERVIEW |
STANDARD_JOB_APPLICATION | INTERVIEW |
STANDARD_JOB_APPLICATION | MAKE_OFFER |
STANDARD_JOB_APPLICATION | SEEK_REFERENCES |
STANDARD_JOB_APPLICATION | APPLICATION_CLOSED |
TECHNICAL_JOB_APPLICATION | APPLICATION_RECEIVED |
TECHNICAL_JOB_APPLICATION | APPLICATION_REVIEW |
TECHNICAL_JOB_APPLICATION | INVITED_TO_INTERVIEW |
TECHNICAL_JOB_APPLICATION | TEST_APTITUDE |
TECHNICAL_JOB_APPLICATION | INTERVIEW |
TECHNICAL_JOB_APPLICATION | MAKE_OFFER |
TECHNICAL_JOB_APPLICATION | SEEK_REFERENCES |
TECHNICAL_JOB_APPLICATION | APPLICATION_CLOSED |
There’s an important point to make regarding a BOM hierarchy. Just as a physical bill of materials defines assemblies and sub-assemblies down to the smallest components, we have a similar arrangement in our hierarchy. This means that we get to reuse ‘assemblies’ and ‘sub-assemblies’.
By way of example: Both the STANDARD_JOB_APPLICATION
and TECHNICAL_JOB_APPLICATION
processes have the INTERVIEW
state. In turn, the INTERVIEW
state has the PASSED
, FAILED
, CANDIDATE_CANCELLED
, and NO_SHOW
outcomes defined for it.
When you use a state in a process, you automatically get its child outcomes with it because it’s already an assembly. This means that the same outcomes exist for both types of job application at the INTERVIEW
stage. If you want different interview outcomes for different types of job applications, you need to define, say, TECHNICAL_INTERVIEW
and STANDARD_INTERVIEW
states that each have their own specific outcomes.
In this example, the only difference between the two types of job applications is that a technical job application includes an aptitude test.
Before You Go
Part 1 of this two-part article has introduced the workflow database pattern. It has shown how you can incorporate it to manage the lifecycle of any entity in your database.
Part 2 will show you how to define the actual workflow using additional configuration tables. This is where the user will be presented with allowable next steps. We’ll also demonstrate a technique for getting around the strict reuse of ‘assemblies’ and ‘sub-assemblies’ in BOMs.