What’s the difference between Taxonomies and Ontologies? - Ask Dr. Search
Last Updated Feb 2018
A Reader Asks:
What’s the difference between Taxonomies and Ontologies? And do I even need to care !?
Editor’s Note: For basic definitions of the terms in this article please see our online glossary of terms.
Dr. Search Responds,
Wow, that’s a great question! I know how *I* would answer it, but believe it or not, these subjects can be very controversial in some academic circles, so I decided to consult with a few other search experts. Like any good doctor I don’t hesitate to consult with others, so I chatted with Avi Rappaport (SearchTools), Walter Underwood (Netflix and Ultraseek fame), Ross Leher at WAND, and Rennie Walker (of Wells Fargo and SageWare). What I present here are the aggregated opinions. I’ll also throw in a few other terms at the end that either overlap or are otherwise related.
I’d summarize the similarities and differences this way:
- For casual users, these are very similar concepts. Purists have many reasons to treat them as separate disciplines, and we’re not disagreeing with that, but I’m just saying that when a mere mortal is shopping for tools or consultants for either, I’d search for both terms on the web, and treat them initially as synonyms for each other. To be safe, you might want to include the singular and plural form of both; depending on your search engine you might actually do taxonomy OR taxonomies OR ontology OR ontologies – not necessary on more modern search engines.
- The preference for one term over another (ontology vs. taxonomy) seems to also depend on the group that’s discussing it. “Ontologies” seems to be preferred by academics and deep researchers, while “taxonomies” are almost universally used in the commercial space. So as we said above, a thorough search will look for both terms.
On the technical side, ontologies imply a broader scope of information. People often refer to a taxonomy as a “tree”, and extending that analogy I’d say that an Ontology is often more of a “forest”. An ontology might encompass a number of taxonomies, with each taxonomy organizing a subject in a particular way.
Extending Ross’s excellent example, the term “golf” could appear in several taxonomies. It could be under a “Human Activities” tree (Human activities -> leisure activities -> sports -> golf). It could also be under a taxonomy concerned with Apparel (Apparel -> Casual/Active Apparel -> Sporting Apparel -> Golf Clothing and Accessories). And let’s say it also appears under a third Consumer Electronics Taxonomy (Electronics -> Handheld Gadgets -> Outdoor & Navigational -> Gadgets for Golfers). Each would be considered a taxonomy tree, and you could think of the “branches” as touching each other around the “golf” related nodes. Other intersecting trees might include “Travel Destinations”, “Famous People & Celebrities” and “TV Broadcasts” – these could all also intersect with the concept of “golf”. Why you’d want to have such a system is very interesting indeed – but well beyond the scope of this particular column.
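For readers who think better in code, here is a minimal sketch of the “forest” idea. It assumes each taxonomy is stored as a simple list of root-to-leaf paths and that a plain substring match is good enough to find the “golf” nodes; the `TAXONOMIES` layout, the `find_concept` helper and the extra “Tennis” path are all made up for illustration, not taken from any product.

```python
# Hypothetical data: each taxonomy is a list of root-to-leaf paths.
TAXONOMIES = {
    "Human Activities": [
        ["Human Activities", "Leisure Activities", "Sports", "Golf"],
        ["Human Activities", "Leisure Activities", "Sports", "Tennis"],  # invented
    ],
    "Apparel": [
        ["Apparel", "Casual/Active Apparel", "Sporting Apparel",
         "Golf Clothing and Accessories"],
    ],
    "Electronics": [
        ["Electronics", "Handheld Gadgets", "Outdoor & Navigational",
         "Gadgets for Golfers"],
    ],
}


def find_concept(term, taxonomies):
    """Return {taxonomy name: [paths]} for paths whose leaf node mentions term."""
    term = term.lower()
    hits = {}
    for name, paths in taxonomies.items():
        matching = [path for path in paths if term in path[-1].lower()]
        if matching:
            hits[name] = matching
    return hits


golf_hits = find_concept("golf", TAXONOMIES)
for name, paths in golf_hits.items():
    for path in paths:
        print(f"{name}: {' -> '.join(path)}")
```

Each tree stays a perfectly ordinary taxonomy on its own; the “touching branches” only show up when you ask a question that cuts across all of them.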
Another technical difference between taxonomies and ontologies deals with structure and overall level of detail.
Taxonomies tend to be reasonably easy-to-understand trees. If you’ve used Yahoo, you’ve seen a taxonomy. As you proceed from the root through the branches, each level of the tree focuses in on a more specific scope. For example a “Places” taxonomy might have “Places -> Milky Way Galaxy -> Solar Systems -> Sol -> Inner Planets -> Earth -> North America -> United States -> California -> Cupertino”. Each level of this taxonomy zooms in on a smaller and smaller physical area.
Back in high school Biology we all learned about the Kingdom – Phylum – Class – Order – Family – Genus – Species taxonomy of life (Linnaean taxonomy, or Linnean (sic)). Vertebrates are a subset of all animals; dogs are a subset of canines, etc. You DO remember your high school biology class, don’t you!?
There’s an important point in those two examples. Taxonomies tend to be a bit casual about what relationship exists between parents and children in the tree. You’ll notice in the first example, Places, each level is a spatial subset of the parent, whereas in the “Life” taxonomy, dogs are a logical subset of canines, regardless of where they live. In some cases it’s left to the reader to infer the relationship when using each taxonomy, and in fact the relationships could vary slightly even within one taxonomy.
A rigorously trained Ontologist would have none of this. Relationships of nodes left to subjective interpretation of readers? – scandalous! Two related nodes in a tree might have an “is-a” relationship, “California” “is-a” “state”, but which of the 18 flavors of “is-a” is really the best fit?
Both ontologies and taxonomies can track key words, but an ontology is likely to classify these words more carefully, perhaps by part of speech, by which human language they belong to, by how precisely one word is a synonym for another, etc.
- With or Without Keywords
A taxonomy might only represent a logical structure, like all the life on earth. Other taxonomies may include keywords about each subject, terms that could be used to match against documents or queries. Using our life example, an extended version of the taxonomy might include keywords, for example the “canine” topic might include the terms ‘fur’, ‘fangs’, ‘growl’ and ‘tail’. Other mammals might have some of those same keywords assigned as well, so a document could match multiple nodes with different weights. The point is that not all taxonomies include descriptive keywords for each item. However, Ontologies usually do include vocabulary terms, in some form or another.
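To make the keyword idea concrete, here is a small sketch of matching a document against keyword-bearing nodes. The scoring rule (the fraction of a node’s keywords found in the text) is only an assumption chosen for simplicity, and the “feline” keyword list and the `score_document` helper are invented for this example; real classifiers weight terms in far more elaborate ways.

```python
import re
from collections import Counter

# Hypothetical keyword lists. The canine terms come from the example above;
# the feline terms are invented for illustration.
NODE_KEYWORDS = {
    "canine": {"fur", "fangs", "growl", "tail"},
    "feline": {"fur", "claws", "purr", "tail"},
}


def score_document(text, node_keywords):
    """Score each node by the fraction of its keywords present in the text."""
    words = Counter(re.findall(r"[a-z]+", text.lower()))
    scores = {}
    for node, keywords in node_keywords.items():
        matched = sum(1 for keyword in keywords if words[keyword] > 0)
        if matched:
            scores[node] = matched / len(keywords)
    return scores


scores = score_document(
    "Its fur bristled and it began to growl, tail low.", NODE_KEYWORDS
)
print(scores)
```

The sample sentence matches both nodes, but with different weights, because “fur” and “tail” are shared while “growl” belongs only to the canine node.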
- Differences in Computer Science
Beyond academic precision, ontologies try to represent knowledge so carefully that even computers can derive meaning by traversing the various relationships. If a computer were actually relying on this data, you can see why it matters that the “is-a” relationships in “Obama is-a president” and “my boss is-a huge pain” have slightly different meanings, the former conferring a job function, the latter a behavioral attribute. Unless you are a researcher or a vendor of this technology, you probably don’t need to worry about this.
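Here is a toy sketch of why typed relationships matter to software. The relation names (`subclass_of`, `holds_role`, `has_trait`) and the `ancestors` helper are hypothetical labels picked for this example, not terms from any ontology standard; the point is only that a program can safely chain one kind of link while leaving the others alone.

```python
# Hypothetical typed edges: (subject, relation, object).
TRIPLES = [
    ("dog", "subclass_of", "canine"),
    ("canine", "subclass_of", "mammal"),
    ("mammal", "subclass_of", "animal"),
    ("Obama", "holds_role", "president"),   # a job function
    ("my boss", "has_trait", "huge pain"),  # a behavioral attribute
]


def ancestors(term, triples, relation="subclass_of"):
    """Follow a single relation type transitively from term."""
    found = []
    frontier = [term]
    while frontier:
        current = frontier.pop()
        for subject, rel, obj in triples:
            if subject == current and rel == relation and obj not in found:
                found.append(obj)
                frontier.append(obj)
    return found


print(ancestors("dog", TRIPLES))
print(ancestors("Obama", TRIPLES))
print(ancestors("Obama", TRIPLES, relation="holds_role"))
```

With a single untyped “is-a” the program could not tell which links are safe to chain; once each edge carries a type, the dog-to-animal chain can be followed while the role and trait edges are kept out of it.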
Taxonomies can also be read and used in computer software, for example Verity’s Topic Sets were a form of taxonomy, and could be loaded into a profiler to classify incoming documents; many other companies have had this idea as well. But the linkages between parent and child branches were much simpler in nature, and were designed to simply combine fulltext search terms in various ways. There was no hint of “understanding” in the relationship between a parent and child, beyond simple fulltext matching. This was still very advanced for its time (the late 1980s), but it didn’t attempt to encode meaning.
Why this Matters?
And as to your question “… and do I even need to care?” I hate to fall back on this old standard, but “that depends…”
Are you considering Taxonomies for an upcoming project? If not, and it’s just a matter of curiosity, that’s fine; I hope I’ve scratched your itch sufficiently. If I haven’t, I’d hit Wikipedia and remember to use both terms. It’s a great question either way.
If you’re thinking about Taxonomies, have you selected them over other technologies such as Faceted Navigation or Clustering? And do you know exactly why? My point is that some people really need taxonomies, and that is the correct answer. Other folks consider taxonomies in the vague hope that they will somehow “fix” search, without being exactly sure how or why. And in those cases, they may not even be aware of the other techniques out there. We call this “taxonomies as a symptom” – somebody who is frustrated with search and wants to fix it, but isn’t quite sure how, and has heard the term “taxonomy” and thought it sounded promising. There is no shame in this – it may well represent more than half of all the companies looking at taxonomies at any point in time! I repeat – taxonomies ARE a good fit for some projects, but this is on a case by case basis.
So if you’re really shopping for this stuff, then yes, understanding the differences and similarities certainly matters; you need to know what you’re shopping for! If you’re really interested in the subject please see our articles on the three types of taxonomies in search and matching your data to navigators (and Taxonomy Usage Models – when published).
Other terms associated with Taxonomies
“Topics” and “Topic Trees” have been used by at least 2 companies (Verity and Inrad) to refer to taxonomies, and I’m pretty sure there are others. I believe Verity had the trademark on the term but did not enforce it.
A “Knowledge Base” or “KBase” may also refer to a taxonomy or ontology. An implication is that, in addition to the taxonomy, there is also a set of documents that has been matched up, and each document has been assigned to one or more taxonomy nodes. In this usage a “taxonomy” = “just structure”, whereas “knowledge base” = “structure” + “data”. An analogy would be subdirectories on a hard drive; a taxonomy is like having a directory structure with no files, whereas the KBase includes all of the files, properly stored in each directory. Of course “knowledge base” has many other meanings. A similar connotation can be made for a Category Index – “category” referring to the taxonomy, and “index” referring to the matching documents.
Another set of terms that are sometimes associated with taxonomies is Agents, Profiles, Saved Searches and Rule Sets. In this context, and depending on the vendor, these terms often imply some type of “action” associated with the taxonomy. In many systems you can flag a particular node or branch in a taxonomy and tell the software to perform some action whenever a new matching document is found. For example, whenever an intelligence communiqué matches the “terrorism” topic, notify General Frye immediately! In this case we would say “taxonomy” = “just structure”, whereas “profiles” = “structure” + “action”. Saved searches and rule sets can be much simpler than a typical taxonomy. “Agent”, on the other hand, was a heavily used and hyped term in the software industry and in computer science. Its broader scope included more interesting definitions that extended to autonomous goal-seeking software that could traverse the globe, jumping from computer to computer, in service of its master. But in reality, in the search engine industry at least, “agent” usually just meant a saved search with some action attached; this is usually what customers wanted anyway.
A Folksonomy is a newer type of taxonomy where users tag content. The photo and video tagging on sites like Flickr and YouTube are a type of Folksonomy. In our classification of taxonomies we’d refer to that as a Behavior Based Taxonomy, since it is driven by users’ actions. In addition to being community driven, folksonomies are often not hierarchical – there is typically no tree structure to tags.
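The “structure + action” idea can be sketched in a few lines. Everything here is hypothetical: the `make_profile` and `process_document` helpers, the trigger terms, and the naive whitespace tokenizing are assumptions for illustration, not how any particular vendor’s profiler worked.

```python
# Hypothetical profile: a taxonomy node, its trigger terms, and an action
# to run whenever a new document matches.
def make_profile(node, terms, action):
    return {"node": node, "terms": {t.lower() for t in terms}, "action": action}


def process_document(text, profiles):
    """Run each profile's action if the document mentions any of its terms."""
    words = set(text.lower().split())
    fired = []
    for profile in profiles:
        if profile["terms"] & words:
            profile["action"](profile["node"], text)
            fired.append(profile["node"])
    return fired


notifications = []  # stands in for paging the general
profiles = [
    make_profile(
        "terrorism",
        ["terrorism", "bombing"],
        lambda node, doc: notifications.append(f"ALERT [{node}]: {doc}"),
    ),
]

fired = process_document("Intercepted message mentions a planned bombing", profiles)
print(fired, notifications)
```

Strip out the `action` field and you are back to a plain keyword taxonomy node; the callback is the only thing that turns it into a “profile” or “agent” in the search-industry sense.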
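By way of contrast with the tree examples earlier, a folksonomy can be sketched as a flat index from tags to items. The tagging events and the `build_tag_index` helper below are invented for illustration, and lowercasing the tags is just one assumed way of merging near-duplicates.

```python
from collections import defaultdict

# Hypothetical tagging events: (user, item, tag).
TAGGINGS = [
    ("ann", "photo1", "golf"),
    ("bob", "photo1", "sunset"),
    ("ann", "photo2", "golf"),
    ("cal", "photo2", "Golf"),
]


def build_tag_index(taggings):
    """Map each lowercased tag to the set of items carrying it; no hierarchy."""
    index = defaultdict(set)
    for _user, item, tag in taggings:
        index[tag.lower()].add(item)
    return index


tag_index = build_tag_index(TAGGINGS)
print(sorted(tag_index["golf"]))
```

There are no parents or children anywhere in that structure; any grouping emerges purely from which tags users happen to apply.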
Concepts, NLP (Natural Language Processing), Semantic Analysis, LSA (Latent Semantic Analysis) and the Semantic Web
These terms are used by many of the same companies that talk about taxonomies and ontologies. They usually don’t mean the same thing, but they are other types of advanced search engine features.
“Concepts” is an interesting term in particular. For some vendors a “concept” is a node in a traditional taxonomy tree, similar to a Topic. If a document were found to match that taxonomy branch, the vendor would say that the document is related to that concept. But other vendors employ statistical algorithms to cluster similar documents together based on word sets, and they would say that a cluster of such documents represents a concept. In that case there was no predefined structure, so there was no associated taxonomy. I’m not going to take sides here; I’d say both uses are reasonable within the context of a particular vendor’s product line. The only advice to readers is that, if a vendor mentions “concepts”, you should make sure you understand how they define it.
Taxonomies and ontologies could be viewed as a subset of NLP. Natural Language Processing originally meant machines that would have full comprehension of human speech, so you could chat away with your computer, asking questions and getting intelligent answers, AKA “The HAL9000 model”. Although that level of computation is generally not available, much more modest subsets of language based techniques have been brought to market. While not as “exciting”, these contemporary features have the advantage of actually existing, often at a reasonable price and with passable accuracy. We consider pedestrian features like automatic language detection, stemming, thesaurus support, entity extraction and phonetic matching all to be subsets of NLP that actually work. Using a keyword laden taxonomy to automatically match new documents to nodes in a taxonomy tree also counts as a form of contemporary NLP. As we mentioned earlier, ontologists are attempting to go much further and approach the originally envisioned chatty computer comprehension level.
The various “semantic” variants have different and sometimes multiple meanings, many of which are related to advanced fulltext search features. Most have to do with tabulating the occurrence of words or sets of words, or parsing the structure of sentences. The Semantic Web is a bit different, and is related to the next section.
The next set of words has to do with the overlap of Ontologies, Meta Data and machine learning. Normally we speak of Meta Data as being simple attributes of a document, such as title, author, creation date, assigned category tags, etc. However, taken to the extreme, Meta Data can also encompass complex attributes and facts about documents or particular subjects, and the formats used to record all these facts. When you start organizing many bits of information for millions of documents into rigorous data structures, with the intent to have machines communicate and make sense of all these facts, you enter the alphabet soup of RDF, OWL, OIL, DAML, SPARQL, the Semantic Web, Dublin Core, and many more. The line between classic Ontology and Meta Data blurs.
The average enterprise search customer doesn’t usually need to worry about these standards. If you see them, they probably imply a bit of expense or complexity. However, if the application being designed really does require lots of advanced features, then vendors and open source tools will likely use some of these standards, and you’ll need to understand them.
Faceted Navigation and Taxonomies:
These are generally not the same thing. Faceted navigation tends to deal more with structured data. However, both facets and taxonomies can be used as part of a search results list. They can both give the user clickable links to help drill down into the results set and narrow down their search. And facets can be nested. From a broad definition standpoint, a results list taxonomy could be considered a particular type of faceted navigation. And on the backend, a taxonomy could be used to help populate data that would be used to drive facets, so there can also be implementation overlap.
So while the two terms are commonly used to mean different things, there is certainly overlap.
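A rough sketch of that overlap: both navigators boil down to counting results, one over a flat structured field and one over every level of a category path. The sample records, the brand names, the `" > "` path separator and both helper functions are assumptions made up for this example.

```python
from collections import Counter

# Hypothetical search results with structured fields.
RESULTS = [
    {"title": "Golf GPS watch", "brand": "Acme",
     "category": "Electronics > Handheld Gadgets"},
    {"title": "Golf polo shirt", "brand": "Birdie",
     "category": "Apparel > Sporting Apparel"},
    {"title": "Golf rangefinder", "brand": "Acme",
     "category": "Electronics > Handheld Gadgets"},
]


def facet_counts(results, field):
    """Flat facet: count results per distinct value of a structured field."""
    return Counter(result[field] for result in results)


def taxonomy_counts(results, field="category", sep=" > "):
    """Tree-style navigator: count results at every level of the path."""
    counts = Counter()
    for result in results:
        parts = result[field].split(sep)
        for depth in range(1, len(parts) + 1):
            counts[sep.join(parts[:depth])] += 1
    return counts


print(facet_counts(RESULTS, "brand"))
print(taxonomy_counts(RESULTS))
```

Either set of counts can be rendered as clickable drill-down links, which is why the two ideas blur together on a results page even though the underlying data is organized differently.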
In Closing:
There’s quite a bit of overlap between Taxonomies and Ontologies, but I’d summarize the differences as:
- Ontologies are often broader in scope.
In some cases a taxonomy could be considered a subset of a larger ontology, a taxonomy “tree” in an ontological “forest”.
- Ontologies tend to hit upon a topic from multiple perspectives; “golf” being viewed as a human activity and also a type of apparel.
- Ontologies often include quite a bit of controlled vocabulary and rigorous definitions of relationships. They may also concern themselves much more with process and methodology, regardless of the digital representation.
- Academics tend to talk about Ontologies, whereas the search industry tends to use Taxonomies. When you’re researching or shopping for technology, I’d suggest using both.
- Taxonomies vary in how much additional information they contain. A taxonomy can be just a logical grouping of a subject, with or without supporting terms, and data and actions may or may not have been associated with each node.
And these concepts both overlap with a host of other information retrieval, knowledge management and advanced metadata systems.