Locator: NIE Home / Publications / Enterprise Search Newsletter / Issue 2 / Article 1
Adjusting Search Engine Relevancy

Improving results list relevancy can greatly reduce visitor frustration and improve overall visitor retention. A well-tuned search engine can also reduce calls to customer service and support, as visitors are able to find answers to questions on their own.

The average Web search entered by surfers is only one to two words long (1.4 words), so search engines often have trouble ranking the large number of matching documents. But many enterprise-class search engines allow site developers to modify visitor searches before they are handed to the underlying search engine. This important but often underutilized feature can be used to improve results list relevancy, returning better results to your customers.

The basic idea is to augment the usual one- or two-word visitor searches with additional weighted terms so that important documents are given a higher "score". We will refer to these adjustments as "search tweaking".

This is not an exact science. For any given set of search-engine tweaks, a sharp QA person can probably find edge cases that are not helped by your adjustments, where the "best" web page may actually be pushed down in the results list. If you embark on such a project, be sure to set expectations appropriately.

Also, you should have a way to easily enable and disable search tweaking, perhaps even making it a check box on your advanced search page. You might also consider bypassing your search tweaking logic if you detect advanced query syntax; a visitor who uses advanced query syntax presumably knows what they are doing, and modifying their search may only serve to frustrate them.

General areas of Search Tweaking:
By default, some search engines already employ some combination of the first three techniques. So before tweaking, you should check your documentation or do some testing. If your engine doesn't implement one of them, it's likely a good place to start! As examples, by default Verity K2 does consider word density in its scoring; Ultraseek favors matches in titles and heading zones; and Google is well known for its popularity-based ranking techniques.

Examples:

The following examples use Verity's query syntax. These are "sub-queries", which would later need to be combined with other sub-queries and the main search terms to form the complete, weighted search. These examples assume a one-word search of "upgrades".

Matching the word in the title:
<word>('upgrades')<in>title
Matching the actual word in the URL:
URL<CONTAINS>'upgrades'
Matching recent documents:
date > today-30
When matching "recent" documents, it's important that the dates of your content be accurate. In last month's issue we discussed how many web servers are misconfigured in this regard and so may give false dates; in such cases, the above query would likely match ALL pages on the site, which would not be very helpful.

An alternate workaround for matching recent documents:
URL<CONTAINS>2003
Clearly the above workaround is not going to work as well as actually fixing the dates on your web site; many URLs don't even have the year in them. You could even look for the year by itself, as a word in the document:
<word>('2003')
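The sub-queries above all follow simple patterns, so they can be generated from the visitor's search term. Here is a minimal sketch in Python, assuming a plain one-word search; the function names are hypothetical and not part of any Verity API, and the sketch simply refuses terms containing anything other than letters and digits rather than guessing at Verity's quoting rules.

```python
def check_plain_word(term: str) -> str:
    """Accept only a single alphanumeric word; reject anything else.

    This is a simplifying assumption of the sketch, not a Verity rule.
    """
    if not term.isalnum():
        raise ValueError(f"not a plain one-word search: {term!r}")
    return term


def title_match(term: str) -> str:
    """Sub-query matching the word in the title zone."""
    return f"<word>('{check_plain_word(term)}')<in>title"


def url_match(term: str) -> str:
    """Sub-query matching the word in the URL."""
    return f"URL<CONTAINS>'{check_plain_word(term)}'"


def recent(days: int) -> str:
    """Sub-query matching documents dated within the last `days` days."""
    return f"date > today-{days}"


def year_word(year: int) -> str:
    """Sub-query matching the year as a word in the document."""
    return f"<word>('{year}')"


if __name__ == "__main__":
    for sub_query in (title_match("upgrades"), url_match("upgrades"),
                      recent(30), year_word(2003)):
        print(sub_query)
```

Each function returns only a string; nothing here talks to a search engine, so the same approach should carry over to another engine by changing the templates.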
Of course the syntax for performing any of these sub-queries will be different for each search engine, so please consult your documentation.

Once you have created sub-searches that match on these ancillary qualifiers, it's time to combine them with the main search. Most search engines have syntax for combining sub-searches, and even for giving each sub-search its own weight. For example, the open source Lucene search engine provides a way to "boost" certain sub-searches. In this Verity example, we will use Verity's <accrue> operator, which acts like a traditional "OR" operator, but with additional weight given when multiple sub-queries match. The Verity syntax for weighting uses square brackets.

Please note that subtlety in weighting will often provide the best results. The idea is to slightly boost certain documents; if you give the sub-queries too much weight you can actually overwhelm any internal ranking that the engine is trying to provide. Also, remember, the main query will almost always match by itself, so it will already have a rather large score. The idea is to just "add" to that score, not to "replace" it. Our advice: start with small weights first!

Here is a partially expanded search for "upgrades", showing the Verity syntax for combining the sub-searches. It is displayed on multiple lines for readability, but would normally all be on one line.

<ACCRUE>(
  [0.8]<ACCRUE>(
    [0.90](<word>('upgrades')<in>title),
    [0.60]'upgrades',
    [0.80](URL<CONTAINS>'upgrades')
  ),
  [0.5]<ACCRUE>(
    [.10]size<75000,
    [.10](date > today-7),
    [.05](date > today-30),
    [.20](URL<CONTAINS>2003),
    [.15](URL<CONTAINS>2002),
    [.10](URL<CONTAINS>2001)
  )
)
)
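The expanded search above can be assembled mechanically from the visitor's term. The following Python sketch shows one way to do that, together with the earlier advice about an on/off switch and about leaving advanced queries alone. The names `accrue` and `tweak_query` are hypothetical. As a simplification, the sketch treats anything other than a single alphanumeric word as a query to pass through untouched, and it writes every weight with two decimals (for example [0.80] rather than [0.8]), which is assumed to be equivalent.

```python
def accrue(weighted_parts):
    """Join (weight, sub_query) pairs under a single <ACCRUE> operator."""
    body = ",".join(f"[{weight:.2f}]{sub}" for weight, sub in weighted_parts)
    return f"<ACCRUE>({body})"


def tweak_query(user_query: str, enabled: bool = True) -> str:
    """Return the weighted query, or the original query if tweaking is
    disabled or the query is not a plain one-word search."""
    term = user_query.strip()
    if not enabled or not term.isalnum():
        # Switch is off, or the visitor typed operators, quotes or
        # several words: hand the search through unchanged.
        return user_query

    # "Direct evidence" branch: the term in the title, body or URL.
    direct = accrue([
        (0.90, f"(<word>('{term}')<in>title)"),
        (0.60, f"'{term}'"),
        (0.80, f"(URL<CONTAINS>'{term}')"),
    ])
    # "Extra credit" branch: qualifiers unrelated to the term itself.
    extra = accrue([
        (0.10, "size<75000"),
        (0.10, "(date > today-7)"),
        (0.05, "(date > today-30)"),
        (0.20, "(URL<CONTAINS>2003)"),
        (0.15, "(URL<CONTAINS>2002)"),
        (0.10, "(URL<CONTAINS>2001)"),
    ])
    return accrue([(0.80, direct), (0.50, extra)])


if __name__ == "__main__":
    print(tweak_query("upgrades"))
    print(tweak_query("upgrades", enabled=False))
```

Keeping the weights in one place like this makes it easier to start small and adjust them after testing.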
This search is nested into two main branches. The upper branch is concerned more with "direct evidence" and is given an overall weight of 80% under the main accrue operator. The secondary branch is more "extra credit" for items not directly related to the search term. Notice that the product of its efforts is only weighted at 50%.

Verity offers many other advanced query language features and operators, as do other enterprise-class search engines.

Finally, there are several common mistakes sites can make that will actually break or seriously impair the built-in search ranking algorithms of many engines.

Common items that BREAK search engine relevancy: