Abstract models are a technical choice meant to save effort in modeling and in managing code. The advantage is the reuse of elements; the disadvantages are the need for more rules, potentially more code, and a model that is harder to understand. The latter seems counterintuitive, so this article explains it using a simplified example of names.

abstract seems easy

The simplified example shows how to build an abstract structure for person names. A person may have a first name, a family name, and aliases. These are stored in a separate table and identified by the person id. This allows a single table to store all current name types, and any future ones. It is a generic solution, and therefore abstract: it does not provide structure for specific use cases.

Person Name Abstract

The facts for the records are verbalized as:

Person 5000 has alias DJ.

Person 5000 has family name Doe.

Person 5000 has first name John.

At the type level, the structure of these facts is verbalized as:

Person Name Abstract Expression
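
As a minimal sketch of the abstract structure described above, the three facts could be stored in one generic table and verbalized with a single expression. The table and column names (`person_name`, `name_type`, `name_value`) are assumptions made for illustration; they are not CaseTalk output.

```python
import sqlite3

# Hypothetical generic table: one row per name, whatever its type.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE person_name (
        person_id  INTEGER NOT NULL,
        name_type  TEXT    NOT NULL,  -- 'alias', 'family name', 'first name', ...
        name_value TEXT    NOT NULL
    )
    """
)
conn.executemany(
    "INSERT INTO person_name VALUES (?, ?, ?)",
    [
        (5000, "alias", "DJ"),
        (5000, "family name", "Doe"),
        (5000, "first name", "John"),
    ],
)

# A single expression covers every row; the only verb available is "has".
facts = [
    f"Person {person_id} has {name_type} {name_value}."
    for person_id, name_type, name_value in conn.execute(
        "SELECT person_id, name_type, name_value FROM person_name ORDER BY name_type"
    )
]
print("\n".join(facts))
```

Adding a new name type needs no schema change here, only a new value in `name_type`, which is exactly what makes the structure generic.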

abstract is less, and less is harder

At first glance this seems like an elegant solution, but on reflection it creates real dilemmas. We can easily see that a person can have multiple aliases, while that same person can only have a single first name or family name. The different name types therefore place restrictions on the possible populations.

So, even though the structure is generic, it adds complexity in rule management. These rules are most likely handled in software code, because the database has very limited options for dynamic rules based on the actual values.
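
To illustrate where such rules tend to end up, here is a hedged sketch of application-side validation for a generic name structure. The set `SINGLE_VALUED_TYPES` and the function `can_add_name` are hypothetical names introduced for this example only.

```python
# Hypothetical rule set: name types that may occur at most once per person.
SINGLE_VALUED_TYPES = {"first name", "family name"}


def can_add_name(existing, person_id, name_type):
    """Return True if adding a name of this type keeps the population valid.

    `existing` is a list of (person_id, name_type, name_value) tuples,
    mirroring the rows of a generic name table.
    """
    if name_type not in SINGLE_VALUED_TYPES:
        return True  # e.g. aliases: any number is allowed
    return not any(
        pid == person_id and ntype == name_type for pid, ntype, _ in existing
    )


rows = [
    (5000, "alias", "DJ"),
    (5000, "family name", "Doe"),
    (5000, "first name", "John"),
]

print(can_add_name(rows, 5000, "alias"))       # True: a second alias is fine
print(can_add_name(rows, 5000, "first name"))  # False: John is already recorded
print(can_add_name(rows, 5001, "first name"))  # True: a different person
```

Every new name type requires a decision about this rule set, and nothing in the table itself records that decision.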

Additionally, the expression grammar shows there are very few semantics to explain how the data is used by the business. The only relevant bit is "has", which is clearly not enough in any case.

concrete is more, and more is easier

As shown, even the facts themselves contain very little semantics. Let's try a more concrete manner of modeling, and stay away from the abstract structure seen above.

In the diagram below, we can find similar verbalizations of facts.

Person 5000 is also known by his nickname DJ.

Person 5000 is called by his first name John.

Person 5000 has family name Doe.

These fact expressions have far more diverse semantics. When looking at only the first expression, you can see the type level expression, and in black font the more descriptive semantics:

Person Name Concrete Expression
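
The same idea as a small sketch: each concrete fact type carries its own expression instead of sharing one generic "has". The dictionary `FACT_EXPRESSIONS`, its keys, and the `verbalize` helper are hypothetical and only mirror the three sentences above.

```python
# Hypothetical fact type expressions, one per concrete fact type.
FACT_EXPRESSIONS = {
    "nickname": "Person {person} is also known by his nickname {value}.",
    "first_name": "Person {person} is called by his first name {value}.",
    "family_name": "Person {person} has family name {value}.",
}


def verbalize(fact_type, person, value):
    """Fill the expression of one fact type with a concrete population."""
    return FACT_EXPRESSIONS[fact_type].format(person=person, value=value)


print(verbalize("nickname", 5000, "DJ"))
print(verbalize("first_name", 5000, "John"))
print(verbalize("family_name", 5000, "Doe"))
```

Because every fact type has its own expression, the wording can say how the business actually uses each kind of name.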

concrete is easier to verify and maintain

By modeling the concrete rather than the abstract, we can also apply the constraints more easily, per fact type. Using these constraints, CaseTalk generates the appropriate structure. This leads to columns in the Person table for values that are singular for a specific person. Additionally, it generates tables with one-to-many relations for the others, such as Nickname, Title and Middle name.

Person One To Many

As seen in the above diagram, the model becomes more intuitive: the relationships are easier to read and the constraints are easier to understand.
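
A rough sketch of what such a concrete structure could look like in a relational database follows. It is hand-written for illustration and is not the DDL CaseTalk generates; the table and column names, the second nickname "Johnny", and the alternative first name "Jack" are assumptions. Only the nickname table is shown for the one-to-many side.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    -- Single-valued facts become columns: one first name, one family name.
    CREATE TABLE person (
        person_id   INTEGER PRIMARY KEY,
        first_name  TEXT,
        family_name TEXT
    );
    -- Multi-valued facts become their own table (one-to-many).
    CREATE TABLE nickname (
        person_id INTEGER NOT NULL REFERENCES person(person_id),
        nickname  TEXT    NOT NULL,
        PRIMARY KEY (person_id, nickname)
    );
    """
)

conn.execute("INSERT INTO person VALUES (5000, 'John', 'Doe')")
conn.executemany(
    "INSERT INTO nickname VALUES (?, ?)", [(5000, "DJ"), (5000, "Johnny")]
)

# A second row for person 5000 is rejected by the primary key,
# so a person cannot end up with two first names.
try:
    conn.execute("INSERT INTO person VALUES (5000, 'Jack', 'Doe')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

nicknames = [
    row[0]
    for row in conn.execute(
        "SELECT nickname FROM nickname WHERE person_id = 5000 ORDER BY nickname"
    )
]
print(duplicate_rejected, nicknames)
```

In this sketch the single-versus-many rule lives in the structure itself, so it survives even if the application code around it is replaced.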

the single diagram

For completeness, the full information model diagram is depicted below, in which both structures are presented. The abstract structure is to the left of Person, whereas the concrete model is positioned on the right. Indeed, the concrete model may be more work initially, but from the perspective of meaning and data quality it makes more sense and is more maintainable in the long run. Especially when code (read: the application) gets replaced, having constraints and knowledge in the structure instead of in code is better for all stakeholders.

Person Name Full

The single diagram shows only a few elements for the abstract structure, in contrast to the elaborate list of types for the different names: nickname, first name, family name, middle name. The branch to the top right contains all one-to-many relations, whereas the bottom right contains the singular facts. Whether a fact is single or many is specified by the uniqueness constraints (the horizontal arrows).

Summary

This small example might not be a real-world issue, but it illustrates the difference between the abstract and the concrete, and how deceptive the simplicity of the abstract is at first glance. That simplicity also exposes the lack of support for business constraints, and the complexity of the code required to manage data entry by users.

Concrete verbiage allows for more specific meaning, which makes the models, both the information model and the data model, easier to understand and to verify for correctness. It is richer in semantics than the very typical and nondescript "has". Always try to be more expressive and specific, for your own sake and for the future maintenance of your requirements.
