“Is It Secret? Is It Safe?” Handling Sensitive Data in Your Data Modeling

Early in the movie “The Fellowship of the Ring”, the wizard Gandalf asks the hero Frodo this question: “Is it secret? Is it safe?” We may not have a magic ring to protect, but we’re asking the same question about our information.

This is the second in a multi-part series on how to apply information security principles and techniques as part of data modeling. This series uses a simple data model designed to manage non-commercial clubs as an example of security approaches. In later articles, we will address modeling for fine-grained access controls, auditing, authentication, and other key aspects of secure database implementation.

In the first article of this series, we applied some simple access controls to our club-managing database. Obviously, there’s more to it than just providing access controls when adding photos. Let’s take a deep dive into our data model and find what needs to be secured. Along the way, we’ll discover that our data has more information than what we have so far included in our model.

Identifying the Club’s Secure Content

In our first installment, we started with an existing database that provided a bulletin-board service to private clubs. We examined the effect of adding photographs or other images to the data model, and we developed a simple model of access control to give some security for the new data. Now, we’ll look at the other tables in the database and determine the “what” of information security for this application: which data actually needs protecting.

Know Your Data

We have a data model that has nothing to support any kind of information security. We know the database carries data, but we need to examine our understanding of the database and its structures to secure it.

Key Learning No. 1:

Scrutinize your existing database before applying security controls.

I’ve built out person here with the typical information you’d enter on any website or give to a club. You’ll certainly notice that some of the information here could be considered sensitive. In fact, apart from the id surrogate primary key, it’s all sensitive. All of these fields are categorized as personally identifiable information (PII) according to the definitive PII guide from the US National Institute of Standards and Technology (NIST). Taken together, the information in this table is nearly sufficient to commit identity theft. Handle it wrong and you can get sued by individuals, sued by corporations, fined by regulators, or even prosecuted. It’s even worse if children’s data is compromised. And this isn’t just in the USA; the EU and many other countries have strict privacy laws.

Key Learning No. 2:

Always start a security review with the tables describing people.
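One way to make this review concrete is to keep a column-by-column inventory of the person table. The following is a minimal sketch in Python, not the article's actual schema: the column names are assumptions chosen for illustration, and the two-level Sensitivity scale is a hypothetical simplification.

```python
from enum import Enum


class Sensitivity(Enum):
    """Hypothetical two-level scale; a real review would likely use more levels."""
    NONE = 0
    PII = 1


# Assumed column names for the person table (illustrative only).
PERSON_COLUMNS = {
    "id": Sensitivity.NONE,            # surrogate primary key, carries no meaning
    "first_name": Sensitivity.PII,
    "last_name": Sensitivity.PII,
    "birth_date": Sensitivity.PII,
    "email": Sensitivity.PII,
    "phone": Sensitivity.PII,
    "street_address": Sensitivity.PII,
}


def sensitive_columns(inventory):
    """Return the names of all columns tagged as sensitive, in inventory order."""
    return [name for name, level in inventory.items()
            if level is not Sensitivity.NONE]


if __name__ == "__main__":
    for column in sensitive_columns(PERSON_COLUMNS):
        print(f"person.{column} needs protection")
```

Keeping the inventory as data rather than as prose means it can later be checked against the real schema, so a newly added column cannot slip in without a sensitivity decision.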

Of course, there is more data in this schema than just person. Let’s consider these table by table.

  • graphic_format – This table has little other than a snapshot of commonly-known items like JPEG, BMP, and such. Nothing sensitive here.
  • photo_action – This table is itself very minimal. It only has a handful of rows, each describing the sensitivity, not of the data, but of an action on a photo. This isn’t very interesting on its own.
  • photo – Ah, photo. The saying goes that “a picture is worth a thousand words”. Is that true from the standpoint of security and sensitivity? If you thought person was risky, brace yourself for photo. Take a look at what those “thousand words” may contain:

    • details and conditions of important public infrastructure, buildings, etc.
    • a copyrighted image
    • metadata indicating the exact time and GPS location of the photo
    • metadata identifying camera model, serial number, and owner
    • incidental information like expensive jewelry, construction, art, vehicles, or businesses
    • a record or depiction of actions that are of dubious legal, moral, or ethical status
    • textual messages: banal, provocative, hateful, benign
    • an association with a club
    • an association with the person who uploaded the image
    • pornography
    • non-sexual bodily details, such as facial features, injuries, disabilities, height, weight
    • faces of non-consenting people, included intentionally or incidentally (more PII)
    • the implicit association of the people in the photo with each other and with any of the information previously mentioned. This may suggest employment, military service, cars owned, size or value of houses or real estate…

Key Learning No. 3:

Captures of physical data, as in photos, must be scrutinized for the many sorts of information and relationships they might carry.
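The metadata items in the list above (time, GPS location, camera model, serial number, owner) are the easiest part of a photo to screen automatically. The sketch below assumes the metadata has already been extracted into a plain dictionary by some step not shown here. The tag names follow common EXIF naming, but the exact keys depend on the extraction library, so treat the SENSITIVE_TAGS set as an assumption to adapt.

```python
# Assumed tag names, modeled on common EXIF naming; adjust to your extractor.
SENSITIVE_TAGS = frozenset({
    "GPSInfo",            # location where the photo was taken
    "DateTimeOriginal",   # exact capture time
    "Make",               # camera manufacturer
    "Model",              # camera model
    "BodySerialNumber",   # camera serial number
    "CameraOwnerName",    # owner recorded by the camera
})


def split_metadata(tags):
    """Split a metadata dict into (kept, flagged) by the sensitive-tag list."""
    kept = {k: v for k, v in tags.items() if k not in SENSITIVE_TAGS}
    flagged = {k: v for k, v in tags.items() if k in SENSITIVE_TAGS}
    return kept, flagged


if __name__ == "__main__":
    sample = {
        "ImageWidth": 4000,
        "ImageLength": 3000,
        "Model": "ExampleCam 100",           # made-up value
        "DateTimeOriginal": "2016:05:01 14:03:22",
    }
    kept, flagged = split_metadata(sample)
    print("store with photo:", kept)
    print("review or strip:", flagged)
```

This only covers the metadata. The other items in the list, such as faces, text, and depicted property, live in the pixels themselves and need human or far more sophisticated review.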

  • club – Some clubs’ names and descriptions may convey more information than you’d expect. Did you really want to advertise that your club meets at Martha’s house on Elm Street? Does it indicate political activity that others may target?
  • club_office – Identifies the meaning and privileges of a club leadership position. The use or description of titles may convey a lot of information about the club. Some of that could be deduced from a club’s public description; the rest might reveal private aspects of the club’s operation.
  • member – Records a person’s history with a club.
  • officer – Records a member’s leadership history with a club.

Clearly, there are items here that should be protected. But whose responsibility is that?

Who Owns That Data?

You’re storing it, you own it! Right? Wrong. Way, way wrong. Let me illustrate just how wrong with a common example: health care information. Here’s a U.S. scenario – hope it’s not as bad elsewhere! Aldo’s physician Dr. B. found underarm nodules and ordered a blood test. Aldo went to Lab C where Nurse D. drew blood. Results went to endocrinologist Dr. E. via Hospital F, using YOUR system operated by IT contractor YOU. Insurer G got the bills.

So do you own the lab information? Aldo, his doctors, and his insurer all have an interest in it, and you or any one of these parties could get sued for doing something that compromises this confidential information. In this way, everyone in the chain is responsible, so everyone “owns” it. (Aren’t you glad I’m using a simple example?)

Key Learning No. 4:

Even simple data may connect to a web of people and organizations you must handle.

Let’s look at our club again. Whew! What do we know about the parties interested in each main data entity?

  • person

    • that person
    • parent or other guardian, if any – parents or guardians are responsible for the person if the person is a minor or is incapacitated
    • court officers – a person under certain legal restrictions may be subject to scrutiny by an officer or designee of a court

  • club

    • the club itself
    • officers of the club – officers are responsible for maintaining the club, its description, and its outward appearances
    • members of the club

  • club_office

    • the club itself – offices and titles form part of the internal structure of the club
    • club officers – depending on the type of powers and responsibilities associated with an office, the officers will be affected in what they do and how they do it
    • club members – members may want to seek a club office or understand it, sometimes to hold an officer accountable

  • photo

    • uploading person
    • owner (copyright holder) of the photo – the photo may not be owned by the person who uploaded it!
    • licensees of the photo – the photo may be included under a licensing agreement
    • people in the photo – if your image is included in published material, it could affect your interests in some way, maybe in lots of different ways
    • owners of land or other objects in the photo – such people may have their interests affected by the depictions in a photo
    • owners of textual messages in the photo – messages and symbols may be subject to intellectual property restrictions

Know the Relationships Among Your Data

No, we’re not done yet. Take a look at the data model. We have not examined member or officer. Note that member and officer don’t have a single field that is real data. Everything is a foreign or surrogate key, except the dates which only time-box each record. These are purely relationship tables. What can you derive from this?

  • member will suggest a person’s interests, based on what the club is about.
  • member will suggest which persons know each other.
  • member says how large the club is.
  • member will suggest similar or related clubs when a person has multiple memberships.
  • officer will strongly tie a person to the interests of the club.
  • officer may suggest access to club money, facilities, or equipment by a person.
  • officer will indicate the abilities of a person: leadership in particular, plus other skills when the office definition suggests them. Treasurer would suggest accounting and budgeting skills, for example.
  • officer may indicate relatively tight control of a club by a small group when durations are long or when the number of distinct members is small.

Key Learning No. 5:

Data relationships may leak a lot of information about primary data entities.
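To see how much a pure relationship table gives away, here is a minimal sketch using Python's built-in sqlite3 module. The member table is reduced to an assumed pair of foreign-key columns, person_id and club_id; the real table's column names and its time-boxing dates are not reproduced here. Even with no descriptive data at all, two short queries reveal who probably knows whom and how large each club is.

```python
import sqlite3


def build_demo():
    """Create an in-memory member table holding only keys (assumed column names)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE member ("
        " id INTEGER PRIMARY KEY,"
        " person_id INTEGER NOT NULL,"
        " club_id INTEGER NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO member (person_id, club_id) VALUES (?, ?)",
        [(1, 10), (2, 10), (3, 10), (1, 20), (3, 20)],
    )
    return conn


def co_members(conn):
    """Pairs of persons who share a club: a hint at who knows whom."""
    return conn.execute(
        "SELECT a.person_id, b.person_id, a.club_id"
        " FROM member AS a"
        " JOIN member AS b"
        "   ON a.club_id = b.club_id AND a.person_id < b.person_id"
        " ORDER BY a.person_id, b.person_id, a.club_id"
    ).fetchall()


def club_sizes(conn):
    """Number of distinct persons per club."""
    return conn.execute(
        "SELECT club_id, COUNT(DISTINCT person_id)"
        " FROM member GROUP BY club_id ORDER BY club_id"
    ).fetchall()


if __name__ == "__main__":
    conn = build_demo()
    print(co_members(conn))
    print(club_sizes(conn))
```

Persons 1 and 3 appear together in both demo clubs, which is exactly the kind of inferred association, and inferred link between clubs, that the list above describes. Anyone who can read member can draw these conclusions without ever touching person or club.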

But let’s not forget our old favorite, photo...

  • A person in a photo other than the uploader may suggest a club association akin to member.
  • Multiple persons in a photo suggest relationships among them.
  • Activity depicted in a photo may suggest club activities or the interests or abilities of persons in the photo.
  • photo GPS information will document the presence of depicted persons in a particular location, as will the background of the photo.
  • A photo will typically participate in zero or more photo albums for presentations, etc.

Getting the Full View

With this analysis of the data, we start to see where we have to focus our efforts. We can view the model with some visual assistance:

In other words … nearly our whole data model has some security content.

Key Learning No. 6:

Expect the majority of your schema to have security content.

That’s right. Practically the whole thing. This will happen to you all the time. Any table less trivial than a simple look-up may be involved in your overall database security approach. This makes it important for you to practice economy and care in modeling to minimize the number of tables you’re wrestling with.

In Conclusion: Know Your Data

Knowing your data is essential to securing it. Knowing the value of your data and its sensitivity will give you crucial guidance in how to implement a comprehensive security architecture within your database.

Information security is an extensive task, and in this series I am bringing issues and techniques for you to use incrementally in improving database security. In the next installment, I will show how to use this information in the Club’s database to help you identify the sensitivity and value of your data. As we continue in the series, we will improve the access control approach from the last article with more comprehensive and flexible controls. We’ll also see how data modeling can be used to support authentication and auditing, as well as database multi-tenancy and recovery.

I hope that this article has given you tools and – just as importantly – insights on how to go about this crucial step in database security. I eagerly welcome feedback on this article. Please use the box for any comments or critiques.