
A Review of Taxonomies in Enterprise Search

Last Updated Nov 2010

By: Mark Bennett & Miles Kehoe, New Idea Engineering, Inc. - Volume 3 Number 2 - February - March 2006

When we started Enterprise Search in 2003, we wanted to provide technical content to help the IT staff, content managers, and developers who were facing the task of making enterprise search work. Along the way we have covered a number of subjects related to finding better results, and taxonomies are perhaps the area where most companies have been investing time and money. We've covered different aspects of the subject, and this seemed like a good time to look back at where we have been before we turn, starting next month, to where we think enterprise search is going.

There are four primary areas where we've seen taxonomies and related technologies play a role:

  • Content based taxonomies
  • Behavior based taxonomies
  • Parametric search
  • Faceted search

We also have a white paper that discusses taxonomies in general, An Introduction to Taxonomies and Categorization, listed with the related articles below.

Content Based Taxonomies

In our first issues, we were fortunate enough to feature articles by John Lehman, one of the original founders of Verity and the person most often credited with the concept of Topics. John, now founder of HighClassify, continues to be active in creating vertical market taxonomies and the tools to make them work. John contributed three articles in our first few months:

In Issue 1, John discussed what a taxonomy is when used to organize corporate or organizational content, and he provided eight families of taxonomic elements. He also presented the four characteristics of useful taxonomy elements.

In Issue 2 and Issue 3, he went on to provide a framework that takes you from not knowing what a taxonomy is for in your corporation to being able to locate resources to help you create your company's own taxonomy, and finally to implementing an initial taxonomy for your organization.

Behavior Based Taxonomies

At New Idea Engineering we've always believed in the importance of search analytics in understanding your customer and learning what content you need to provide. By the winter of 2004, we had seen many corporations and government organizations working on huge taxonomy projects. People told us, "We want to start our enterprise search implementation, but first we need to create a taxonomy." That was when we realized that enterprise-wide taxonomy projects are nice, and can eventually be helpful, but the taxonomies that corporations created were serious overkill. Worse still, they generally didn't address the most important taxonomy terms to use: the top queries that your users and customers enter in your search engine. If you can ensure that the top 100 queries you get on your web site return great results, you will have happy users!

We called this list of terms a Behavior Based Taxonomy because it is based not on the universe of all possible terms that might apply to your content and organization, but on what people really want to find on your site. This kind of taxonomy also gives you an indication of what primary areas your visitors are interested in. Behavior Based Taxonomies are so critical that we make them part of our Search Best Practices Audit.
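As a rough illustration of where such a list of terms can come from, the sketch below counts the most frequent queries in a search log. It assumes the log has already been reduced to a plain list of query strings; the function name top_queries and the sample entries are hypothetical, and a real search engine's log format will need its own parsing.

```python
from collections import Counter

def top_queries(query_log, n=100):
    """Return the n most frequent queries as (query, count) pairs."""
    # Normalize case and surrounding whitespace so that "Printer Drivers"
    # and "printer drivers " are counted as the same query.
    normalized = (q.strip().lower() for q in query_log)
    # Skip empty queries; they say nothing about what users want.
    counts = Counter(q for q in normalized if q)
    return counts.most_common(n)

# Hypothetical log entries, invented for illustration only.
sample_log = [
    "printer drivers",
    "Printer Drivers ",
    "vacation policy",
    "expense report",
    "vacation policy",
    "printer drivers",
    "",
]

for query, count in top_queries(sample_log, n=2):
    print(f"{count:3d}  {query}")
```

The resulting list is the starting point: each of the top queries can then be run against the site to check whether it returns great results.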

Parametric Search

Most users seem to avoid 'Advanced Search', so to help them understand more about the nature of the content on a given site, vendors began to support what was called 'parametric search'. Essentially, the software displayed which metadata fields had been extracted and allowed users to 'drill down' within those fields.

For example, somebody shopping for cars might have included the term "mileage" in their query; a parametric search engine would return the results, but also show a side bar that offered "mileage" information for SUVs, sedans, light trucks, etc. Next to each of these additional offerings, the result list would display the number of matches contained within that subcategory. A subsequent click on any of those side bar choices would rerun the search, limiting its scope to that area.
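The car example can be sketched in a few lines. This is only a toy model, not any vendor's implementation: the catalog, the body_type field, and the search and sidebar_counts functions are all assumptions made for illustration, and matching is a simple substring test on the title.

```python
from collections import Counter

# Hypothetical catalog; field names and values are invented for illustration.
DOCS = [
    {"title": "Compact sedan mileage review", "body_type": "sedan"},
    {"title": "SUV mileage comparison", "body_type": "SUV"},
    {"title": "Light truck towing guide", "body_type": "light truck"},
    {"title": "Hybrid sedan mileage test", "body_type": "sedan"},
    {"title": "Light truck mileage figures", "body_type": "light truck"},
]

def search(docs, term, filters=None):
    """Return docs whose title contains term and whose fields match filters."""
    filters = filters or {}
    term = term.lower()
    return [
        d for d in docs
        if term in d["title"].lower()
        and all(d.get(field) == value for field, value in filters.items())
    ]

def sidebar_counts(results, field):
    """Count matches per value of one metadata field, for the side bar."""
    return Counter(d[field] for d in results if field in d)

# Initial query: the result list plus per-category counts for the side bar.
results = search(DOCS, "mileage")
sidebar = sidebar_counts(results, "body_type")

# Clicking "sedan" reruns the same query, limited to that category.
drilled = search(DOCS, "mileage", {"body_type": "sedan"})
```

Here the drill-down is simply the original query rerun with one extra field constraint, which is the behavior described above.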

Faceted Search

Faceted search is an extension of parametric search in which the additional suggested searches are not limited to well-defined document metadata groups and may instead be derived automatically using statistical methods. More importantly, faceted search engines do not blindly suggest choices that won't match any documents. Faceted search engines are also more dynamic in how they break up the range of data in a particular field; for example, if all matches were in the same city, the engine would not bother to offer city as a choice. Conversely, if matches were scattered among thousands of cities, the engine might choose to suggest searches by state instead.

Related Articles in Previous Issues of Enterprise Search

New Idea Engineering has written about many of these subjects in previous newsletter articles and white papers. The relevant articles include:
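The city and state behavior can be sketched as follows. This is a minimal illustration under stated assumptions, not how any particular engine works: the suggest_facets function, the rollups mapping from a fine-grained field to a coarser one, and the max_values threshold are all hypothetical, and the statistical derivation of facets mentioned above is not covered.

```python
from collections import Counter

def suggest_facets(results, fields, rollups=None, max_values=10):
    """Pick facet fields, with value counts, worth showing for a result set.

    rollups maps a fine-grained field to a coarser one (e.g. city -> state).
    """
    rollups = rollups or {}
    facets = {}
    for field in fields:
        # Counts come from the current results, so no suggested value
        # can ever lead to zero matches.
        counts = Counter(d[field] for d in results if field in d)
        # Too many distinct values: fall back to the coarser field, if any.
        if len(counts) > max_values and field in rollups:
            field = rollups[field]
            counts = Counter(d[field] for d in results if field in d)
        # A facet with fewer than two values cannot narrow the results.
        if len(counts) >= 2:
            facets[field] = dict(counts)
    return facets

# All matches in one city: city is not offered as a choice.
same_city = [{"city": "Austin", "state": "TX"}] * 3
no_facets = suggest_facets(same_city, ["city"], {"city": "state"})

# Matches scattered over many cities: the suggestions roll up to state.
scattered = [
    {"city": f"City{i}", "state": "TX" if i % 2 else "CA"} for i in range(50)
]
by_state = suggest_facets(scattered, ["city"], {"city": "state"})
```

Because the counts are always computed from the current result set, the same function covers both points made above: it never offers an empty choice, and it adapts the granularity of a field to how the matches are spread.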


April 2003    Taxonomies for Practical Information Management
June 2003    Building A Taxonomy
July 2003    Implementing The Taxonomy
January 2004    Behavior Based Taxonomies
July 2005    Parametric and Faceted Search
White Paper    An Introduction to Taxonomies and Categorization

Resources