Formulating knowledge

Terminology, ontology, semantic modeling

Whatever we do, be it transformation, innovation, automation or something else, the starting point is always the same: business knowledge. Not only is knowledge a prerequisite, it also has to be expressed in a way that meets several criteria. It has to be:

  • complete, so as to avoid any surprises along the way,
  • unambiguous, in order to prevent conflicting interpretations,
  • accurate, reflecting reality, to support a relevant design,
  • lean and efficient, without redundancy (expressing as many things as possible using a minimum number of terms),
  • open and free from the presuppositions that could limit its reach.

This quality of expression is not guaranteed from the outset. It is reached only through sustained effort. It is found neither on the business side nor on the IT side; indeed, we are forced to admit that it is not readily available in the enterprise. We only have to bring two business experts together and have them work on a few definitions to realize that practices rest largely on what is left unsaid, and that putting ideas into words shatters the apparent consensus.

To express knowledge appropriately, several techniques are available to us: terminology, ontology and semantic modeling. Each has its own benefits and limitations. How can we make use of these techniques? Can they be linked together? What level of rigor should we aim for, given the objective we have set ourselves?

These are some of the questions addressed by the article “Formulating knowledge”.

French version: “Formuler la connaissance”.

A data model is not a semantic model

We change names and titles more easily than practices. Today, it is fashionable to pay particular attention to business objects. This is a good thing… provided that the models built from them are not just “old-fashioned” data models. Below, I answer a question I was asked about what using UML changes compared to Merise.

What is it, fundamentally, that makes the Merise entities distinct from what we call business objects? Aside from the notion of inheritance (and therefore of specialization in UML), what is it that makes UML, in the formalism and principles, more appropriate than Merise to represent business objects? Is it, in practice, the notions of aggregation and composition that change everything?

Firstly, Merise – an IT design method – does not teach us to represent business knowledge but rather imposes, from the outset, the separation between data and processing. The conceptual data model (CDM) is a data model: this means that it is not able to represent all the knowledge of the business fundamentals. Practical consequence: some information cannot be included in the model, on the pretext that it is calculated (e.g., the age of the person when we know the date of birth). This is quite justifiable for a data model, but not so for a semantic model. From the point of view of the business actor, these distinctions between raw data and calculated data are obsolete: he/she sees information that is part of the concept.

More important still: the semantics of a concept or a business object do not stop at its informative properties. They also include active properties (what the object can “do”) and transformative properties (how it behaves). If we want to express the full semantics, we must associate the business object with its operations and conditions of use, its constraints (business rules), its states (lifecycles)… in short, everything that dangerously complicates our information systems for want of an appropriate approach.

If the UML notation is used well, it can serve as the tool for semantic modeling. Classes carry three types of properties: informative (attributes, including calculated attributes and class-scope attributes); active (operations); and transformative (constraints and state machines).
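The three types of properties can be sketched in code. What follows is only an illustration, assuming Python in place of UML; the names Person, Status, TRANSITIONS, AGE_OF_MAJORITY and change_status, as well as the lifecycle itself, are hypothetical and do not come from Praxeme.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import ClassVar


class Status(Enum):
    # Hypothetical lifecycle states, chosen only for illustration.
    PROSPECT = "prospect"
    ACTIVE = "active"
    CLOSED = "closed"


# Transformative property: allowed transitions of the (assumed) state machine.
TRANSITIONS = {
    Status.PROSPECT: {Status.ACTIVE, Status.CLOSED},
    Status.ACTIVE: {Status.CLOSED},
    Status.CLOSED: set(),
}


@dataclass
class Person:
    # Informative property, class scope: shared by all instances (assumed value).
    AGE_OF_MAJORITY: ClassVar[int] = 18

    # Informative properties, instance scope.
    name: str
    birth_date: date
    status: Status = Status.PROSPECT

    def __post_init__(self) -> None:
        # Transformative property: a constraint (business rule).
        if self.birth_date > date.today():
            raise ValueError("birth date cannot be in the future")

    def age_on(self, day: date) -> int:
        """Calculated attribute: age in whole years on a given day."""
        had_birthday = (day.month, day.day) >= (
            self.birth_date.month,
            self.birth_date.day,
        )
        return day.year - self.birth_date.year - (0 if had_birthday else 1)

    def is_adult_on(self, day: date) -> bool:
        """Calculated attribute that relies on the class-scope attribute."""
        return self.age_on(day) >= self.AGE_OF_MAJORITY

    def change_status(self, target: Status) -> None:
        """Active property: an operation, guarded by the state machine."""
        if target not in TRANSITIONS[self.status]:
            raise ValueError(f"cannot go from {self.status.name} to {target.name}")
        self.status = target
```

In this sketch the age belongs to the concept even though it is never stored, and an illegal change of state is rejected by the object itself rather than by the code around it.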

Obviously, UML connotes “software” (and is, moreover, largely underexploited by IT specialists, among whom modeling practices have regressed considerably over the last few decades). We must not forget, though, that the notions and axioms that make up the object logic come from the philosophy of knowledge (that is where the term “class” comes from): it is therefore not by chance that this logic, and the notations used to express it, lend themselves so easily to the representation of knowledge.

It should also be noted that broadening the modeling possibilities (in a semantic model compared to a conceptual data model) does not simply add properties to the model: it changes the model’s structure. By that I mean that the change goes beyond the visible elements UML brings, such as inheritance or aggregation (these could, in part, be reproduced in a classic approach). By taking into account all the properties of the business object and by following the requirements for genericity and factorization, the modeler is led to arrange all the properties in a different manner. The consequences of this change in structure are considerable: if we follow them through the whole transformation chain (process design, automation…), the stakes can run into millions of euros. The textbook example is the notion of Person (or client, employee, etc.). By rigorously applying this approach, we can easily halve the number of information systems! At the same time, we can enhance ease of use, because we will have built solutions that are closer to the natural representations.

In conclusion, a semantic model is far more than a conceptual data model: it contains the CDM, and we can, at any moment, extract the data model from a semantic model. Praxeme provides several derivation channels from the semantic model, including the one that produces the logical data model.
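To make the idea of extraction concrete, here is a deliberately naive sketch. It is not Praxeme's derivation channel, whose rules are not described here; it only assumes that each property is tagged with its type and that a data model keeps the stored informative properties. The names Property and derive_data_model are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Property:
    name: str
    kind: str  # "informative", "active" or "transformative"
    calculated: bool = False


def derive_data_model(properties: List[Property]) -> List[str]:
    """Keep only the stored informative properties (assumed derivation rule)."""
    return [
        p.name
        for p in properties
        if p.kind == "informative" and not p.calculated
    ]


# A hypothetical semantic description of the Person concept.
person = [
    Property("name", "informative"),
    Property("birth_date", "informative"),
    Property("age", "informative", calculated=True),
    Property("change_status", "active"),
    Property("lifecycle", "transformative"),
]

print(derive_data_model(person))  # ['name', 'birth_date']
```

The derivation only goes one way: the data model can be obtained by filtering the semantic model, whereas the operations, constraints and lifecycle cannot be recovered from the data model.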

The name “Business Object Model” has been overused to refer to conceptual data models (and sometimes incorrect ones: there are many dreadful examples on the market). What is called a BOM (for example, that of IBM) and the Information Model from ACORD are mere data models (not even normalized). The consequences are dramatic. But that’s another story…

This commentary does not call the legacy of Merise into question. UML is a notation, not a method. Merise is a method, and many of its recommendations are not only still relevant today but also clearly superior to what can be found in object-oriented methods and current practices. It is therefore in our best interests to make this legacy bear fruit. Praxeme readily acknowledges its descent from Merise.