Recently a governmental body in the Netherlands leaked personal data through an application which was able to export data by design. This design allowed records to be either printed or exported to file for future system integrations.
Nine months after the application was rolled out, this leaky feature came to light because personal data was being offered for sale by dubious employees. It made me wonder how many mistakes were made during the hasty development of this application.
The potential data leak was obviously built in by design. The real problem seems to be that abuse of this feature went undetected while a handful of corrupt employees were actively using it.
Hasty Design
The feature seems to fulfill the need to export data using the software. Traditionally, data export and migration are done directly on databases with specific tools, in what are typically called data migration efforts. Never have I seen an application provide these features itself, especially not enabled for regular users, employees in this case.
It made me think about how this feature nevertheless got implemented. Typically such features are requested by end users who want flexibility and lack the expertise to realize what risk those features pose in the hands of abusive users. I've seen it happen again and again.
There are probably many more human factors in play, but I will not go through the hypothetical issues at hand. The point of this short article is to provide you, the reader, with a very simplified design which takes the potential abuse of data into account.
Usage Design
This design is not a software design. It is a simple Fact Based Model that illustrates how simple security by design can be. The information model does not focus on the data itself, which is trimmed down to a simple "Record", nor is it built to cover user accounts extensively. Its purpose is to show that communicating how data is used may help you build a robust software application which also monitors for abuse by employees.
Stating Facts
As in any fact based information model, we state the facts in natural language, using concrete examples. In this case we do not know the actual domain, so we make up the examples to be able to communicate.
Permission
"Jack Smith can open records for exporting."
"Jack Smith can open records for printing."
"John Doe can open records for viewing."
"Mary Blake can open records for editing."
Permission Maximum
There may be multiple maximums, for example per hour, per day or per week. This makes it possible to set a relatively high maximum per day to allow for busy times, while a maximum per week still prevents sustained abuse of data.
"Usage for Jack Smith for exporting is set to a maximum of 100 per day."
Record Use
Define which records are actually used by which user for which purpose.
"Record 958576 is opened by Jack Smith for exporting."
"Record 958576 is opened by Jack Smith for printing."
"Record 958576 is opened by John Doe for viewing."
"Record 958576 is opened by Mary Blake for editing."
....
To illustrate the level of detailed knowledge in apparently simple facts we show a single parsed fact expression with this information grammar:
Usage Count
The number of records opened by a user for a specific use. This is calculated from the Record Use.
"On Day 1 there are 4 records opened by John Doe for viewing."
Usage Alert
Perhaps the most important fact is the one that makes sure we are able to alert a manager who oversees potential abuse. Otherwise we'd be limited to audits or random checks. Actively modeling the alert process allows proactive features when abuse is detected.
"On Day 1 there are 999 records opened by Jack Smith for exporting which alerts the manager May Black."
Fact Based Diagram
Once the facts are entered in a Fact Based Modeling tool like CaseTalk, the modeled facts and their information can be presented in a diagram.
The logging of record use, the counters of that use, and the threshold alert are shown inside the red dashed line. Using CaseTalk to perform a model-to-model transformation, the Data Model can be generated with ease. Using the CaseTalk export to generate a relational model in ER/Studio leads to the following generated diagram:
Conclusion
Design nowadays can no longer be covered by optimistic software development. Data, especially privacy-sensitive data, must be designed fully. Designing fully includes all required security aspects, which takes more than just performing an occasional audit.
This article shows that if one dares to communicate facts, the design will be incredibly solid, and software can leverage that fully. Fact Based Modeling can be an excellent method to start any (software) project, both for data that already exists and for data that will exist at some future time.
Building applications which leak by design is no longer an option.