Ontology vs Taxonomy vs Data Model
Dan Selman unpacks the distinctions among these three concepts critical to meta modeling and offers advice on selecting which to use as a paradigm for your project.
Confused by people throwing words like ontology, taxonomy and data model around? In this post I explain the differences and give some practical advice for which to adopt.
Let’s get the definitions out of the way:
Taxonomy: conceptually the simplest, this is a basic classification tree, used to organise concepts into hierarchical groups. The best known taxonomy is biologists’ attempt to classify the organisms of the world into neat categories. Taxonomies are used to answer categorisation questions, such as “Is a Fungus a Plant?” See the history on JSTOR!🙂
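To make the "classification tree" idea concrete, here is a minimal sketch of a taxonomy as a child-to-parent mapping. The concept names and the helper functions `ancestors` and `is_a` are my own illustration, not part of any standard, and the tree is a toy rather than a real biological classification.

```python
# Hypothetical toy taxonomy: each concept maps to its single parent.
# None marks the root of the tree.
PARENT = {
    "Organism": None,
    "Plant": "Organism",
    "Fungus": "Organism",
    "Animal": "Organism",
    "Mushroom": "Fungus",
    "Fern": "Plant",
}


def ancestors(concept):
    """Yield every ancestor of `concept`, nearest first."""
    parent = PARENT[concept]
    while parent is not None:
        yield parent
        parent = PARENT[parent]


def is_a(concept, category):
    """True if `concept` is `category` or sits somewhere beneath it."""
    return concept == category or category in ancestors(concept)


# The categorisation question from the text: "Is a Fungus a Plant?"
print(is_a("Fungus", "Plant"))
print(is_a("Mushroom", "Organism"))
```

Walking up the tree is the only question this structure can answer, which is exactly why a taxonomy is the simplest of the three.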
Ontology: much richer than a simple taxonomy, an ontology allows for rich semantic relationships between concepts. For example, an ontology might allow folks to say “a vegetarian is a person who does not eat meat” or “a divorcee is a person who has been married and whose marriage has been legally terminated” or “a widow is a woman who has been married but whose spouse is dead”, with the required definitions of “person” / “meat” / “marriage” / “death” / “spouse” / “woman” / “legally terminated” / “has been” etc. Incredibly powerful and expressive!
Ontologies strive not only to classify the concepts in a domain, but also to show how they interrelate, and to express logical relationships between concepts such that new knowledge can be inferred or deduced from existing knowledge and facts. OWL (the W3C Web Ontology Language) and its family of standards (a taxonomy of ontologies!) are the gorilla in this space.
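As a rough illustration of inferring new knowledge from existing facts, here is a minimal sketch in plain Python. It is not OWL and not how a real reasoner works: the facts, the predicate names (`type`, `married_to`, `status`) and the two rules are all assumptions made up for this example, and the widow rule is a simplified version of the definition above.

```python
# Hypothetical facts as (subject, predicate, object) triples.
FACTS = {
    ("alice", "type", "Woman"),
    ("alice", "married_to", "bob"),
    ("bob", "type", "Man"),
    ("bob", "status", "Dead"),
    ("carol", "type", "Woman"),
    ("carol", "married_to", "dave"),
    ("dave", "type", "Man"),
}

# Hypothetical class hierarchy used by the second rule.
SUBCLASS_OF = {"Widow": "Woman", "Woman": "Person", "Man": "Person"}


def infer_widows(facts):
    """Rule: a woman who is married to someone dead is a Widow."""
    women = {s for (s, p, o) in facts if p == "type" and o == "Woman"}
    dead = {s for (s, p, o) in facts if p == "status" and o == "Dead"}
    return {
        (s, "type", "Widow")
        for (s, p, o) in facts
        if p == "married_to" and s in women and o in dead
    }


def infer_supertypes(facts):
    """Rule: an instance of a class is also an instance of its superclass."""
    return {
        (s, "type", SUBCLASS_OF[o])
        for (s, p, o) in facts
        if p == "type" and o in SUBCLASS_OF
    }


def closure(facts, rules):
    """Apply the rules repeatedly until no new facts appear."""
    known = set(facts)
    while True:
        new = set().union(*(rule(known) for rule in rules)) - known
        if not new:
            return known
        known |= new


inferred = closure(FACTS, [infer_widows, infer_supertypes])
print(("alice", "type", "Widow") in inferred)
```

Nobody ever stated that alice is a widow or a person; both facts fall out of the rules. That derived knowledge is what separates an ontology from a classification tree.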
Data model: a data model is first-and-foremost an attempt to structure the concepts in a domain, their properties, and their basic relationships. I say “basic relationships” because the semantics of these relationships is simple, for example, “person is a concept, person has a given name property of type string, a person is related to a set of people called friends”. Data models are used to impose order and structure on data, ensuring that instances of modelled concepts adhere to the constraints specified in the data model. Sometimes data models are called data or document schemas, and they allow basic questions about a collection of instances to be answered, such as “find all persons with a given name property equal to Dan”. They are not typically capable of answering complex logical questions, such as “find all widows with a spouse that was divorced twice within the last two years”, unless the structure of all such questions is known a priori and is baked into the data model.
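The person example above can be sketched as a small Python dataclass. This is only one possible rendering, under my own assumptions: the `Person` class, its field names and the `find_by_given_name` helper are hypothetical, and a real project would more likely use a schema or modelling language than hand-written checks.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Person:
    """A concept with a string property and a basic relationship."""

    given_name: str
    friends: List["Person"] = field(default_factory=list)

    def __post_init__(self):
        # Enforce the constraints the model declares on every instance.
        if not isinstance(self.given_name, str):
            raise TypeError("given_name must be a string")
        if not all(isinstance(f, Person) for f in self.friends):
            raise TypeError("friends must all be Person instances")


def find_by_given_name(people, given_name):
    """Answer a basic question about a collection of instances."""
    return [p for p in people if p.given_name == given_name]


alice = Person("Alice")
dan = Person("Dan", friends=[alice])
matches = find_by_given_name([alice, dan], "Dan")
print([p.given_name for p in matches])
```

The model validates structure and supports simple lookups, but it has no notion of what a friend or a widow means, so the complex logical questions stay out of reach.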
Which should you use?
In terms of expressiveness, there is a ladder: Ontology > Data Model > Taxonomy. I.e. an ontology can be used to describe a data model, and a data model can be used to describe a taxonomy.
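One rung of that ladder is easy to show: a data model describing a taxonomy. In the sketch below a single concept, `Category`, with a `name` property and an optional `parent` relationship, is enough to capture any classification tree. The class and its `is_a` method are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Category:
    """A taxonomy node expressed as an ordinary data model concept."""

    name: str
    parent: Optional["Category"] = None

    def is_a(self, other: "Category") -> bool:
        # Walk the parent relationship up to the root.
        node: Optional["Category"] = self
        while node is not None:
            if node == other:
                return True
            node = node.parent
        return False


organism = Category("Organism")
plant = Category("Plant", organism)
fungus = Category("Fungus", organism)
print(fungus.is_a(organism), fungus.is_a(plant))
```

The reverse does not hold: a taxonomy alone has nowhere to put properties such as a given name, which is why it sits lower on the ladder.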
In my experience taxonomies are rarely expressive enough to model complete business domains, though they are useful for simple classification problems within a domain.
I’m a big fan of ontologies, but there are downsides to adding more semantic richness to your models of the world. You should walk into this with your eyes open. More is not always better.
Data models fall between taxonomies and ontologies and often inhabit the “sweet spot”: expressive enough to usefully describe a business domain and to answer questions and run queries about it efficiently, without requiring the logical rigour of developing a full ontology.
The trick is to find a meta-model (taxonomy, ontology or data modelling language) that allows you to express what you need about the world, and in a form that your users (subject matter experts) can understand and take ownership of, and that allows you to answer the questions about the world that you consider important.
You should also consider operational concerns like:
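Query efficiency. Generally, the more expressive your meta model, the slower your queries become. In some cases queries can become undecidably slow: for sufficiently expressive logics (OWL Full, for example), reasoning is undecidable, so a query is not guaranteed to terminate.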
Binding to user interfaces, chat, databases and computation. You will need to interact with your model of the world using a variety of tools. Most of these tools are either functional, relational or object-oriented. This will create an impedance mismatch that can become troublesome, depending on the skills of your technical team. Perhaps chat interfaces are a way to overcome this?
The skills of your subject matter experts, particularly whether they have a background in semantics, logic, rules and modelling.
The sophistication of your end users to pose the right questions and to interpret the answers.
Choosing a meta model involves technical and operational tradeoffs, as well as trying to predict the types of questions your meta model should help you answer today and in the future.
As ever, one size does not fit all!
Dan Selman has over 25 years of experience in IT, creating software products for BEA Systems, ILOG, IBM, Docusign and more. He was a Co-founder and CTO of Clause, Inc., acquired by Docusign in June 2021.