May 12, 2015 - 6 minutes read

A Database Model for an Online Survey. Part 4

Over the past 25 years, Jeffrey has worked in various roles in financial services software, with a focus on transaction banking and automation with artificial intelligence methods. He has extensive knowledge and expertise in financial payments, combating financial crime, and virtual account management. His experience spans various disciplines ranging from architecture and design to product management and sales.

Tags:

In this final article in a four-part series, I complete the design for an online survey database to provide flexibility for multiple surveys, question re-use, multiple choice answers, ordering of questions, conditional jumps in the survey based on responses, and control over the users’ access to surveys via groups of survey owners.

Introduction

In the conclusion to Part 3 of this series of articles, I mentioned that I would be adding more advanced features in this article. Those advanced features are:

administration of the surveys
reports and analytics

As a reminder, here was the model after Part 3:

Administration

My goal in the administration of surveys is to allow a survey and its corresponding information to be managed by a group. So I will enable an administrative user to define groups of users that can jointly maintain an online survey and its questions. The owner of the group can define what functions the other users of the group can perform; for example, Jeff can change and delete surveys and questions, but Joe can only view surveys and questions, but not change or delete them.

One thing that you might notice is that users are separate from respondents to the survey. Of course, a user might also respond to a survey, but I would like to keep them separate so that I can require less information from a respondent than from a user (for example, I have removed the password field from a respondent so that it is easier for people to respond to the survey without creating a login/account).

Basically, for this administration, I will create tables for groups and users, and the roles and corresponding permissions or actions that are allowed. This allows for flexibility rather than a hard-coded link between the roles and the actions that are allowed by each role. Of course, the corresponding application must be built to understand what functionality is permitted by each permission and must be adapted when new functionality is added, but the database design will not need to be changed when functionality is added – new rows will be added to the table linking roles to permissions.

You might also notice that I have used an odd length for the email column on the user and respondent tables and an odd value for the the ip_address column for the respondent; 254 is the maximum length that an email address can be according to the RFC definitions, while 45 is the maximum length that an IPv6 address (with IPv4 tunneling) can be.

In addition, I will add a link from the group table to the survey table from which links go to all associated tables (question_order, survey_response, conditional_order, question_type, response_choice). In that way, when the group is being deleted, I can warn the owner of the group that all corresponding information will be deleted.

I prefer this approach of linking the table data to something other than the specific user versus not linking the data to anything. If we did not link the data to anything (neither group nor user) as I seemed to do in previous parts of this series of articles, then we will have a challenge to “cleanup” stale data when a user is deleted from the online survey application. By linking it to the more abstract concept “group,” then it becomes possible for the owner to re-assign ownership of the group and all corresponding data (surveys, questions, responses, etc.) to another member of the group if necessary.

Formal Design

Then we extend the ERD that was created in the other parts of this series of articles.

I have colored the tables which were created in the Part 1 article in yellow, colored the tables added in Part 2 in orange, colored the tables added in Part 3 in green, and the newly added tables in light blue so that it is easier to see the additions. Color was not added to the columns and foreign keys which were added in this final article, so you would have to compare the current model to the previous one from Part 3 to see the differences.

Reports and Analytics

We have enough information that can be extracted from the tables to produce several reports.

For example, which questions were answered in a particular way (“on survey 7, how many times did respondents answer ‘Yes’ to question 10?”). This level of information is probably fine for basic reports about survey responses.

We can also extract how long it took respondents to respond to a particular survey (“on survey 5, the average time spent on the survey was 13 minutes”); again, this might be useful information so that owners of a survey can adjust survey questions so that they do not require more time than what a typical respondent is willing to spend or what the surveyor has “promised” to respondents (e.g., “this survey should take between 5 and 10 minutes”). I know that when someone tells me I should be done in less than 10 minutes and I am still plowing through questions 15 minutes later, then I get angry and I am generally unwilling to respond to another survey from them.

Based on the IP addresses of respondents, we could perform an IP location lookup to get an approximate idea of where respondents are from or at least where their IP address appears to be from when they responded. Be aware that this information is not entirely reliable as people may connect in via VPNs or other mechanisms which dissociate their IP address from their physical location.

We can even extract how questions are answered by the first respondents versus how it was answered by the latter respondents. This could present and interesting angle on your survey – for example, did the eager people who responded to the survey first respond differently than people who were not as eager and responded to the survey later?

At this stage, I think that these reports will be enough and that more advanced analytics are not necessary, as the most important information is obviously the basic report of what answers were given to each question in a survey. If you require more advanced analytics, then consider what your requirements are and how existing data or new structures can support those analytics.

Conclusion

And there you have it. I will not claim that this is the design for the ideal online survey database, but this will meet my needs in terms of flexibility: multiple surveys, question re-use, multiple choice answers, ordering of questions, conditional jumps in the survey based on responses, and control over the users’ access to surveys via groups of survey owners.

As I have done in each previous part of this series of articles, I will point out that you might have other requirements. Identify your requirements and implement or adapt what you need. I believe strongly in re-use and not re-inventing the wheel.

If you would like us to redesign or expand this model according to your application’s needs, let us know. We can help you.

A database model for an online survey – the whole series