A New Kind of Taxonomy

Last Updated Jan 2009

By: Mark Bennett & Miles Kehoe - New Idea Engineering - Issue 6 - January / February 2004

Behavior Based Taxonomies: Far Better

Which is more useful to you? A categorized list of all subject matter in your industry or trade, or a list of the subjects your users are asking about? Consider a medical site: is it really necessary to know six different ways of saying "heart attack", or would you rather know the one or two ways your visitors actually say it?

If you're like most people, you'd prefer knowing what your users are asking about. After all, your site is unique, and while having a general taxonomy might be helpful to create a Yahoo-style portal, knowing what your actual users want to find is your key to connecting with your customers.

So why do so many companies spend huge amounts of money to create general purpose taxonomies, when they are not even looking at the data they really want?

Usually it is because companies want to improve how their site users can find information. For any of a number of reasons, their site search technology is not working as well as they would like. Search relevance is generally statistical, and content is often unstructured, so it is difficult to really find the most relevant documents for a particular single term query. Taxonomies let you tag data to add structure, in hopes that having better structure will help the search engine rank documents better.

Nonetheless, a general taxonomy and brilliantly tagged content can't make search perfect, so companies continue to struggle to improve search.

We think the best approach to improving your search results starts with a different kind of taxonomy: one that is based on the questions your users are actually asking on your site. We call this a Behavior Based Taxonomy.

What is a Behavior Based Taxonomy?

Your visitors are telling you every day what terms are the most important to them. You may not know it, and worse yet, you may not be capturing the data, but every time someone does a search on your site, they are telling you what is important to them. They tell you what answers they need so they don't call your hotline; they are telling you what products they want to buy; they are telling you what they want from you. Are you listening?

We believe that knowing what your visitors are searching for is the first step in improving your search results. This set of the top queries from your search engine is, in fact, the most important taxonomy you can address, because when your search engine produces high quality results for your top queries, you have happy visitors and your web site is successfully enabling self-service. We call this set of top queries from your search engine a "behavior based taxonomy".

Why is it better than conventional taxonomies?

Taxonomies in general are typically large sets of industry-specific categories and vocabularies. In a sense, a taxonomy is a map of related terms that you can use to categorize content. And by categorizing content, you can create browsable category trees like Yahoo, and you can expand your search technology to find category-related content.

When you implement an industry taxonomy in conjunction with your enterprise search technology, you're hoping to get pretty good answers to almost all of the queries your visitors might ever use.

Behavior based taxonomies, on the other hand, are not industry-wide: they are specific to the visitors to your site. And because a behavior based taxonomy is generally made up of a relatively small number of terms, you can focus your time, effort, and budget on getting the right answers for the queries that people are actually using.

Creating your own BBT

There are a number of ways to create your own behavior based taxonomy. An increasing number of search engines provide some level of search activity logging. Or, depending on how you process your search, you may be able to extract the data from your web server logs.

Capture the Data

The first step to creating a behavior based taxonomy is to capture the actual search terms your visitors are using. There are a number of different ways to accomplish this.

Search engine Logging

Some search engines, such as Ultraseek and newer versions of K2, can log all search activity. Unfortunately, logging can slow overall search performance, so you should evaluate how it affects your visitors. Be sure to capture as much data as your engine can provide, not just the search term. Log the time, number of documents returned, IP address of the searcher, and any other relevant data.
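As a rough illustration of what one captured entry might look like, here is a minimal Python sketch. The field names and the JSON-lines format are our own assumptions, not the log format of Ultraseek, K2, or any other engine.

```python
import json
from datetime import datetime, timezone


def make_search_log_record(query, result_count, client_ip, when=None):
    """Build one log entry with the fields worth capturing per search.

    The field names are illustrative assumptions, not an engine's schema.
    """
    when = when or datetime.now(timezone.utc)
    return {
        "time": when.isoformat(),
        "query": query,
        "result_count": result_count,
        "client_ip": client_ip,
    }


def format_log_line(record):
    """Serialize a record as one JSON line so it is easy to parse later."""
    return json.dumps(record, sort_keys=True)


if __name__ == "__main__":
    print(format_log_line(make_search_log_record("heart attack", 12, "192.0.2.1")))
```

Keeping the result count next to the query is what later lets you spot terms that return nothing or far too much.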

Search Proxy Logging

If your search engine does not provide any means of logging search activity, or if it does not capture all of the information you want, you can use a light-weight search proxy to capture and log search activity between the query form and the search engine. Make sure your proxy puts performance first: it should reply to the user right away and do the data capture after the fact.
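One way to keep logging off the visitor's critical path is to hand each record to a background worker. The sketch below assumes a hypothetical `search_fn` callable that stands in for the real engine and a `log_sink` callable that writes the record somewhere; neither is part of any particular product.

```python
import queue
import threading


class LoggingSearchProxy:
    """Sits between the query form and the engine; logs in the background."""

    def __init__(self, search_fn, log_sink):
        self._search_fn = search_fn      # hypothetical: query -> list of results
        self._log_sink = log_sink        # hypothetical: record dict -> None
        self._pending = queue.Queue()
        threading.Thread(target=self._drain, daemon=True).start()

    def search(self, query, client_ip):
        results = self._search_fn(query)
        # Only enqueue here; the slow log write happens on the worker thread,
        # so the caller gets the results back without waiting for it.
        self._pending.put({
            "query": query,
            "client_ip": client_ip,
            "result_count": len(results),
        })
        return results

    def _drain(self):
        while True:
            record = self._pending.get()
            try:
                self._log_sink(record)
            except Exception:
                pass  # a logging failure should never break search
            finally:
                self._pending.task_done()

    def flush(self):
        """Block until every queued record has been handed to the sink."""
        self._pending.join()
```

A production proxy would also need to forward the HTTP request and response; this sketch shows only the ordering of reply and capture.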

Search log analysis

When you process your search request via a web server, you have the choice of using a POST or a GET. If you use a GET action, the search term is part of the URL, so it is logged by the web server and you can analyze what your visitors are searching for. (With a POST, the term travels in the request body, which web servers do not normally log.) Unfortunately, all the web server captures is the query and IP address: you don't have information on which terms returned too many documents - or which returned no documents. You'll have to do some manual work performing queries to get result counts.
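Here is a minimal sketch of pulling search terms out of web server logs. It assumes the search form submits with GET so the term appears in the logged URL, that the log uses the common access log layout, and that the search page lives at `/search` with the term in a `q` parameter; the path and parameter name are placeholders for whatever your site uses.

```python
import re
from urllib.parse import parse_qs, urlsplit

# Matches the client address and the requested URL of a GET request line.
REQUEST_RE = re.compile(r'^(?P<ip>\S+) .*?"GET (?P<url>\S+) [^"]*"')


def extract_query(log_line, search_path="/search", param="q"):
    """Return (ip, query) for a search request, or None for any other line."""
    match = REQUEST_RE.match(log_line)
    if not match:
        return None
    parts = urlsplit(match.group("url"))
    if parts.path != search_path:
        return None
    values = parse_qs(parts.query).get(param)
    if not values:
        return None
    # parse_qs decodes '+' and %XX escapes; lowercase so variants group together.
    return match.group("ip"), values[0].strip().lower()


if __name__ == "__main__":
    line = ('192.0.2.1 - - [15/Jan/2004:10:00:00 -0800] '
            '"GET /search?q=Heart+Attack HTTP/1.0" 200 5120')
    print(extract_query(line))
```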

Process the list of top queries

Once you have the list of top queries, you have your site's behavior based taxonomy. Just knowing the raw counts can be helpful not only in tuning your search engine, but also in providing better direct links. Search analytics doesn't just help improve your search: it helps you improve your site over time.
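Turning captured queries into a ranked list can be as simple as a frequency count. The sketch below assumes the queries are already available as a list of strings; folding case and collapsing whitespace is our own normalization choice, and you may want more (or less) for your site.

```python
from collections import Counter


def top_queries(queries, limit=10):
    """Return the most frequent queries as (query, count) pairs."""
    # Fold case and collapse runs of whitespace so trivial variants
    # of the same search are counted together.
    normalized = (" ".join(q.lower().split()) for q in queries)
    counts = Counter(q for q in normalized if q)
    return counts.most_common(limit)


if __name__ == "__main__":
    sample = ["Heart Attack", "heart  attack", "warranty", "returns", "warranty"]
    for query, count in top_queries(sample, limit=3):
        print(count, query)
```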

Determine best results for each term

Once you have your behavior based taxonomy, you need to work with your content people to identify the best document for each query. Then you need to find a way to 'promote' that document in your search engine.

If your search engine supports "best bets" - the ability to easily promote documents - go back and promote the newly identified documents for the top queries.

If your search engine does not support easily promoting documents outside of their relevance methods, you can use a product like New Idea Engineering's SearchTrack to recommend the right document for each term in your behavior based taxonomy.

Otherwise, you may need to change your content to trick the search engine into producing the desired results. (Be sure to re-index your data!)

In any case, once you have identified the top terms - the Behavior Based Taxonomy - and found a way to tune your search results so that the best document shows up at the top of the result list, you have started on the road to happier site visitors, lower costs, and a better bottom line!
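To make the idea of promotion concrete, here is a minimal sketch of applying recommended documents on top of whatever the engine returns. It is not how SearchTrack or any engine's best-bets feature works internally; the `best_bets` mapping and the document URLs are hypothetical.

```python
def apply_best_bets(query, results, best_bets):
    """Put hand-picked documents first, followed by the engine's own ranking.

    best_bets maps a normalized query to an ordered list of documents.
    """
    key = " ".join(query.lower().split())
    promoted = best_bets.get(key, [])
    # Drop promoted documents from the engine list so nothing appears twice.
    remainder = [doc for doc in results if doc not in promoted]
    return list(promoted) + remainder


if __name__ == "__main__":
    bets = {"heart attack": ["/conditions/heart-attack.html"]}
    engine_results = ["/news/1.html", "/conditions/heart-attack.html", "/faq.html"]
    print(apply_best_bets("Heart Attack", engine_results, bets))
```

Queries with no entry in the mapping pass through unchanged, so the engine's relevance ranking still handles everything outside your top terms.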

Refine, Refine, Refine

Like conventional taxonomies, a Behavior Based Taxonomy needs ongoing care and maintenance. New terms show up on your web site all the time as the market changes and as products and people come and go. Check your search activity logs often - weekly, if not daily. Make sure the right documents are making it to the top of the results lists. Every few months, review the top terms and consider updating your web content.
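A periodic review can start from a simple comparison of two top-term lists. This sketch assumes you already have last period's and this period's top terms as ordered lists of strings.

```python
def compare_top_terms(previous_top, current_top):
    """Report which terms entered and which left the top-terms list."""
    previous, current = set(previous_top), set(current_top)
    return {
        # New arrivals may need a best document chosen for them.
        "new": [term for term in current_top if term not in previous],
        # Departures may no longer need hand-tuned results.
        "dropped": [term for term in previous_top if term not in current],
    }


if __name__ == "__main__":
    last_week = ["warranty", "returns", "manual"]
    this_week = ["warranty", "recall", "returns"]
    print(compare_top_terms(last_week, this_week))
```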

Summary

To minimize your site costs, maximize visitor satisfaction, and make your web site truly self-service, you need to:

  • Capture search terms
  • Identify the best documents or landing pages for the top terms
  • Alter your search result list behavior to recommend the best documents
  • Review your terms frequently to understand trends on your site

Every time you can answer a user question or make a sale on your web site, you're preventing a hotline or telesales call and reducing your costs. And saving money is very much in vogue.