How do I expand user queries in K2 to always include my custom thesaurus? - Ask Dr. Search

Last Updated Mar 2009

By: Mark Bennett, Volume 2 - Number 4 - January 2005

Using a custom thesaurus in Verity K2 is a powerful way to provide relevant results for your users, especially when your organization has a specialized vocabulary that your end users may not know well. The problem is that you must either count on your users knowing enough to use the <THESAURUS> operator in their queries, or rewrite each user query yourself to expand it. The former is unlikely, and the latter adds work to your query processing.

Dr. Search points out that there are other options, including topic sets, but even these need an operator at query time. Using a little-known trick from the old days, Dr. Search will show you how to apply a thesaurus to every query in a way that is compatible with term highlighting and never requires you to post-process user queries. The trick? A knowledge base can reference a thesaurus.

Create the Thesaurus

Start by creating a thesaurus control file as described in Ask Dr. Search in March 2004. However, there is no need to replace the standard vdk30.syd file. In fact, Dr. Search suggests you compile your thesaurus file near your document directory just for easier maintenance.

Let's say that your control file, listed in Figure 1, is in D:\Data\kb:

$control:1
synonyms:
{
list: "founders,abe,phil,john,michael,dave,cliff"
list: "ceo,anthony,philippe,mike"
}

Figure 1: people.ctl
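
If your synonym lists live in a spreadsheet or database, you may prefer to generate the control file rather than edit it by hand. The following is a minimal sketch, not part of K2: the function name render_thesaurus_control is hypothetical, and the rule that a term may not contain a comma or a double quote is an assumption based only on how Figure 1 delimits its terms.

```python
def render_thesaurus_control(synonym_lists):
    """Return control-file text laid out like Figure 1.

    synonym_lists is a list of lists of terms, one inner list per
    "list:" line. This helper is hypothetical, not part of K2.
    """
    lines = ["$control:1", "synonyms:", "{"]
    for terms in synonym_lists:
        # Drop empty entries and surrounding whitespace.
        cleaned = [term.strip() for term in terms if term.strip()]
        for term in cleaned:
            # Assumption: commas and quotes delimit the list, so a term
            # containing either one would corrupt the line.
            if "," in term or '"' in term:
                raise ValueError(f"term cannot contain a comma or quote: {term!r}")
        lines.append('list: "{}"'.format(",".join(cleaned)))
    lines.append("}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_thesaurus_control([
        ["founders", "abe", "phil", "john", "michael", "dave", "cliff"],
        ["ceo", "anthony", "philippe", "mike"],
    ]), end="")
```

Run as a script, this prints the same text as Figure 1, which you can then save as people.ctl.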

Compile the file by running mksyd from the same directory:

D:\Data\kb> mksyd -f people.ctl -syd people.syd

You'll find you now have a compiled thesaurus file D:\Data\kb\people.syd. Normally, you would replace the vdk30.syd with this file; but not if you want to use the thesaurus terms in every query!
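
If you rebuild the thesaurus regularly, the compile step can be scripted. This is a rough sketch that uses only the mksyd flags shown above (-f and -syd). The helper names are hypothetical, and it assumes that mksyd is on your PATH and that it exits with a non-zero status when compilation fails, which you should confirm on your own installation.

```python
import subprocess
from pathlib import PurePath


def mksyd_command(ctl_name):
    """Build the mksyd argument list for a control file such as people.ctl."""
    # Derive the output name by swapping the extension: people.ctl -> people.syd
    syd_name = PurePath(ctl_name).with_suffix(".syd").name
    return ["mksyd", "-f", ctl_name, "-syd", syd_name]


def compile_thesaurus(directory, ctl_name):
    """Run mksyd inside `directory`, as the article recommends.

    Assumption: mksyd signals failure with a non-zero exit status, in
    which case subprocess raises CalledProcessError.
    """
    subprocess.run(mksyd_command(ctl_name), cwd=directory, check=True)
```

For the example in this article you would call compile_thesaurus(r"D:\Data\kb", "people.ctl").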

Create the Knowledge Base File

Now you are ready to create the knowledge base control file one level up in the directory structure, as D:\Data\kb1.kbm, as shown in Figure 2.

$control:1
kbases:
{
kb: "kb1"
/kb-path = "D:\\Data\\kb\\people.syd"
}

Figure 2: kb1.kbm

It is critical that you use your path separators very carefully. On both Windows and Unix platforms, you can use a single forward slash "/" character. However, if you use the backslash separator "\" standard on Windows, be sure to use two of them as illustrated in Figure 2! Using a single "\" will mean your knowledge base is not used.
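
Because the doubled backslash is easy to get wrong by hand, one option is to generate the KBM file from the plain Windows path. The sketch below is illustrative only: escape_kb_path and render_kbm are hypothetical helpers, and the output simply mirrors the layout of Figure 2.

```python
def escape_kb_path(path):
    """Double every backslash in a plain (not yet escaped) path.

    Forward slashes are left alone, so Unix-style paths pass through
    unchanged.
    """
    return path.replace("\\", "\\\\")


def render_kbm(kb_name, syd_path):
    """Return knowledge base control-file text laid out like Figure 2."""
    return (
        "$control:1\n"
        "kbases:\n"
        "{\n"
        f'kb: "{kb_name}"\n'
        f'/kb-path = "{escape_kb_path(syd_path)}"\n'
        "}\n"
    )


if __name__ == "__main__":
    # A raw string keeps the single backslashes exactly as typed.
    print(render_kbm("kb1", r"D:\Data\kb\people.syd"), end="")
```

Pass the path as you would type it in Windows Explorer. Do not pass a path that is already doubled, or the backslashes will be doubled a second time.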

Register the KBM with K2

Now we have a compiled thesaurus and a knowledge base control file. All we need to do is tell K2 about the setup.

Start the K2 dashboard and sign in as an administrator. If you want to apply the thesaurus to all searches on a given K2 server instance, select that server name at the main dashboard menu. From the Action pull-down, select "Expert Settings" and locate the Knowledge Base Path entry. Enter the fully qualified name of the knowledge base control file we created earlier: D:\Data\kb1.kbm. Click on Modify, then return to the server instance and do a full restart.

Now you're ready to test! In our example, any document that contains the words Abe, Phil, John, Michael, Dave or Cliff will be returned when you use the query "founders". If you are displaying documents with highlighting, the matching terms will be highlighted just as if you had searched for the synonym itself, and you didn't need to use the awkward <THESAURUS> operator.

Problems?

If you have problems with the operations described here, verify:

  • The source thesaurus file compiled with no errors
  • The knowledge base control file is of type/extension ".kbm"
  • You have specified the correct directories
    • On Unix, be sure to use forward slash "/" in path names
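    • On Windows you can use either "/" or "\", but each backslash must be doubled ("\\") inside the KBM file
  • The name of the knowledge base control file matches the "kb:" field in the KBM file (kb1.kbm and kb: "kb1" in our example)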

If you work through this checklist and still cannot find the problem, feel free to contact us and we'll help you work it out.
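
Some of the checklist can be automated. The sketch below inspects only the text of a KBM file. It does not talk to K2, and find_kbm_problems is a hypothetical helper. It checks the ".kbm" extension, flags single backslashes in /kb-path, and, as one reading of the last checklist item, compares the "kb:" value with the file name.

```python
import re
from pathlib import PureWindowsPath


def find_kbm_problems(kbm_filename, kbm_text):
    """Return a list of human-readable problems found in a KBM file."""
    problems = []
    # PureWindowsPath accepts both "/" and "\" as separators on any OS.
    name = PureWindowsPath(kbm_filename)
    if name.suffix.lower() != ".kbm":
        problems.append("control file does not have the .kbm extension")

    kb = re.search(r'^\s*kb:\s*"([^"]*)"', kbm_text, re.MULTILINE)
    if kb is None:
        problems.append('no kb: "..." entry found')
    elif kb.group(1) != name.stem:
        # Assumption: the kb: value should equal the file name minus .kbm.
        problems.append(f'kb: "{kb.group(1)}" does not match file name "{name.stem}"')

    kb_path = re.search(r'^\s*/kb-path\s*=\s*"([^"]*)"', kbm_text, re.MULTILINE)
    if kb_path is None:
        problems.append("no /kb-path entry found")
    else:
        # Remove properly doubled backslashes; any that remain are single.
        leftover = re.sub(r"\\\\", "", kb_path.group(1))
        if "\\" in leftover:
            problems.append("/kb-path contains a single backslash; use \\\\ or /")
    return problems
```

An empty list means none of these particular problems were found. It does not prove that K2 has loaded the knowledge base, so still test with a real query.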

Ask Dr. Search

Remember, Dr. Search is here to solve your technical problems with your search engine. Don't hesitate to email us any time, or contact us. We're here for you!