A data model is not a semantic model

We change names and titles more easily than practices. Today, it is fashionable to pay particular attention to business objects. It is a good thing… provided that the models made from them are not just “old-fashioned” data models. Below, I answer a question that I was asked about what using UML changes compared to Merise.

What is it, fundamentally, that makes the Merise entities distinct from what we call business objects? Aside from the notion of inheritance (and therefore of specialization in UML), what is it that makes UML, in the formalism and principles, more appropriate than Merise to represent business objects? Is it, in practice, the notions of aggregation and composition that change everything?

To begin with, Merise, an IT design method, does not teach us to represent business knowledge but rather imposes, from the outset, the separation between data and processing. The conceptual data model (CDM) is a data model: this means that it cannot represent all of the fundamental business knowledge. Practical consequence: some information cannot be included in the model, on the pretext that it is calculated (e.g., the age of a person when we know the date of birth). This is quite justifiable for a data model, but not so for a semantic model. From the point of view of the business actor, the distinction between raw data and calculated data is irrelevant: he/she sees information that is part of the concept.
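The age example can be made concrete with a minimal sketch. The class and member names below (Person, age_on, age) are hypothetical and chosen only for illustration; the point is that the calculated information sits in the concept next to the stored one, instead of being left out because it is derived.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Person:
    """Hypothetical semantic class; the names are illustrative only."""

    name: str
    date_of_birth: date  # stored (raw) information

    def age_on(self, day: date) -> int:
        """Age in completed years on the given day (calculated information)."""
        birthday_reached = (day.month, day.day) >= (
            self.date_of_birth.month,
            self.date_of_birth.day,
        )
        return day.year - self.date_of_birth.year - (0 if birthday_reached else 1)

    @property
    def age(self) -> int:
        """Age today, read like any other attribute of the concept."""
        return self.age_on(date.today())
```

From the business actor's point of view, person.age is read exactly like person.name; whether it is stored or computed is a design decision taken further down the chain.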

More important still: the semantics of a concept or a business object do not stop at the informative properties. They also contain active properties (what the object can “do”) and transformative properties (how it behaves). If we want to express the full semantics, we must associate the business object with the operations and the conditions of use, the constraints (business rules), the states (lifecycles)… in short everything that dangerously complicates our information systems for want of an appropriate approach.

If the UML notation is used well, it can serve as a tool for semantic modeling. The classes carry three types of properties: informative (as attributes, including calculated attributes and class-scope attributes); active (operations); and transformative (constraints and state machines).
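As a rough illustration of these three types of properties, here is a sketch in Python rather than UML. The Contract class, its states and its duration rule are assumptions invented for the example; they do not come from any Praxeme model.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import ClassVar


class State(Enum):
    DRAFT = "draft"
    ACTIVE = "active"
    CLOSED = "closed"


# Transformative facet: the allowed lifecycle transitions (a tiny state machine).
TRANSITIONS = {
    State.DRAFT: {State.ACTIVE},
    State.ACTIVE: {State.CLOSED},
    State.CLOSED: set(),
}


@dataclass
class Contract:
    """Hypothetical business object carrying the three facets."""

    # Informative facet: attributes
    start: date
    end: date
    state: State = State.DRAFT
    max_duration_days: ClassVar[int] = 3650  # class-scope attribute

    def __post_init__(self) -> None:
        self.check_constraints()

    @property
    def duration_days(self) -> int:
        """Calculated attribute."""
        return (self.end - self.start).days

    def check_constraints(self) -> None:
        """Transformative facet: an assumed business rule on the duration."""
        if not 0 < self.duration_days <= self.max_duration_days:
            raise ValueError("contract duration out of bounds")

    def _move_to(self, target: State) -> None:
        if target not in TRANSITIONS[self.state]:
            raise ValueError(f"cannot go from {self.state.name} to {target.name}")
        self.state = target

    # Active facet: operations
    def activate(self) -> None:
        self._move_to(State.ACTIVE)

    def close(self) -> None:
        self._move_to(State.CLOSED)
```

A data model would keep only start, end and state; the operations, the rule and the lifecycle would have to be described elsewhere, away from the concept they belong to.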

Obviously, UML connotes “software” (and is, moreover, largely underexploited by IT specialists, among whom modeling practices have considerably regressed over the last few decades). We must not forget, though, that the notions and axioms that make up the object logic come from the philosophy of knowledge (this is where the term “class” comes from): it is therefore not by chance that this logic and the notations used to express it lend themselves so easily to the representation of knowledge.

It should also be noted that the broadening of the modeling possibilities (in a semantic model compared to a conceptual data model) does not result in properties simply being added to the model: it changes the model’s structure. In other words, what UML brings is not limited to its visible elements, such as inheritance or aggregation (these elements could, in part, be reproduced in a classic approach). By taking into account all the properties of the business object and by following the requirements for genericity and factorization, the modeler is led to arrange all the properties in a different manner.

The consequences of this change in structure are considerable: if we follow them through the whole transformation chain (process design, automation…), the stakes can run into millions of euros. The textbook example is the notion of Person (or client, employee, etc.). By rigorously applying this approach, we can easily halve the number of information systems! At the same time, we can enhance ease of use because we will have built solutions that are closer to the natural representations.

In conclusion, a semantic model is far more than a conceptual data model: it contains the CDM, and the data model can be extracted from it at any moment. Praxeme provides several derivation channels from the semantic model, including the one that produces the logical data model.
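The idea that the data model can be extracted from the semantic model can be sketched as a simple filter that keeps only the stored attributes. This is not Praxeme's derivation channel, whose rules are not described here; the Client class and the derive_data_model function are hypothetical names used for illustration.

```python
from dataclasses import dataclass, fields


@dataclass
class Client:
    """Hypothetical semantic class with informative and active properties."""

    first_name: str
    last_name: str

    @property
    def full_name(self) -> str:
        """Calculated attribute: part of the concept, not of the stored data."""
        return f"{self.first_name} {self.last_name}"

    def rename(self, new_last_name: str) -> None:
        """Operation (active property)."""
        self.last_name = new_last_name


def derive_data_model(cls) -> dict:
    """Keep only the stored attributes, as a mapping from name to type name."""
    return {
        f.name: f.type if isinstance(f.type, str) else f.type.__name__
        for f in fields(cls)
    }
```

Calling derive_data_model(Client) keeps first_name and last_name and drops full_name and rename. The extraction goes one way only: the semantic model contains what is needed to produce the data model, whereas the reverse derivation cannot recover the operations or the calculated attributes.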

The name “Business Object Model” has been overused to refer to conceptual data models (and sometimes to incorrect ones: there are many dreadful examples on the market). What we call a BOM (for example that of IBM) or the Information Model from ACORD are mere data models (not even normalized). The consequences are serious. But that’s another story…

This commentary does not call the legacy of Merise into question. UML is a notation, not a method. Merise is a method, and many of its recommendations not only remain relevant today but are also clearly superior to what can be found in the object-oriented methods and current practices. Thus, it is in our best interests to make this legacy bear fruit. Praxeme readily acknowledges its descent from Merise.

Formalizing knowledge

In response to a customer request, Dominique Vauquier clarifies the different ways of expressing knowledge. This is an opportunity to formulate some of the thinking that has been taking place within the Praxeme Institute, especially since the workshops held by professors Loïc Depecker on terminology and Christophe Roche on ontologies.

The article is based on the idea that the techniques we have at our disposal for expressing knowledge are spread out according to the level of formalization. The greater the effort made, the greater the possibilities of automation.

In practice, enterprises preoccupied by the topic will be able to decide on a gradual approach, taking advantage of different techniques: classifying documents using taxonomies, drawing up the enterprise terminology, developing ontologies, modeling – in particular semantic modeling.

Download the article (in French)

Big data: the necessary data semantization

In exploiting new data sources, the question as to the importance of this data will arise at one moment or another. Equally, in order to derive maximum benefit from the acquired knowledge, the data item must be linked to a concept, that is to say the object (physical or abstract) that carries it. It must be articulated with the other facets of this object.

The role of semantic modeling is to extract the meaning from the data items, to formalize them as properties of the concept and to place them in a manipulable structure.

Thus, semantic modeling is an indispensable tool for anyone who wants to profit from X-data techniques. In return, X-data influences semantic modeling: if not its procedures, at least its content. Notably:

  • assimilating new properties in the semantic model (completing the semantic classes, already identified, with “details” coming from big data), particularly class-level properties (aggregate value, indicators coming from the open data),
  • extending the model to new notions (for example, better description of the objects belonging to the enterprise environment, the people, their relationships, their behavior, external events…),
  • evolving toward a “style” of model that makes room for behaviors, correlations and anticipations (categories of properties that we can consider to be new or, in any event, rarely used in classical modeling), with considerable enrichment of the state machines and state change propagation.
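The last point, state change propagation, can be illustrated with a minimal sketch in which a change of state on one object triggers a transition on the objects linked to it. The classes, the states and the rule (an insolvent customer suspends its running contracts) are assumptions made up for the example.

```python
from enum import Enum


class CustomerState(Enum):
    ACTIVE = "active"
    INSOLVENT = "insolvent"


class ContractState(Enum):
    RUNNING = "running"
    SUSPENDED = "suspended"


class Contract:
    """Hypothetical contract that reacts to its customer's state changes."""

    def __init__(self, reference: str) -> None:
        self.reference = reference
        self.state = ContractState.RUNNING

    def on_customer_state_change(self, new_state: CustomerState) -> None:
        # Assumed business rule: insolvency suspends a running contract.
        if new_state is CustomerState.INSOLVENT and self.state is ContractState.RUNNING:
            self.state = ContractState.SUSPENDED


class Customer:
    """Hypothetical customer whose state changes propagate to its contracts."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.state = CustomerState.ACTIVE
        self.contracts: list[Contract] = []

    def subscribe(self, contract: Contract) -> None:
        self.contracts.append(contract)

    def change_state(self, new_state: CustomerState) -> None:
        self.state = new_state
        # Propagation: each linked object decides how to react.
        for contract in self.contracts:
            contract.on_customer_state_change(new_state)
```

The reaction rule is placed on the object that undergoes the consequence (the contract), which keeps each lifecycle readable on its own; other distributions of responsibility are possible.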

Praxeme urges the modeler not to reduce business knowledge to data. Hence the choice of the term “semantic” rather than “conceptual”: the latter strongly evokes the conceptual data model, whereas the semantic model takes care of the concept as a whole, in its three facets (information, action and transformation) reconciled in the unity of the class.