How do I change relevance in Ultraseek using predefined values? - Ask Dr. Search
Last Updated Mar 2009
By: Mark Bennett, Volume 2 - Number 6 - June 2005
Ask Dr. Search
This month's question comes from a customer who is using Ultraseek in an enterprise search application.
Question:
Our web content resides within a relational database management system, and we use Ultraseek 5.3 to index the content. Our content creators go to great lengths to assign 'quality' scores to many of the documents to highlight the documents our web marketing staff feels will be more useful to our customers.
Is there a way to use the pre-defined quality value to influence the relevance Ultraseek will assign a document as a result of a query?
Dr. Search answers:
Yes. With Ultraseek, you can modify patches.py to influence the score Ultraseek assigns to a document. Any time you modify an Ultraseek system file, be sure to make a backup copy of the file first so you can recover from errors without having to reinstall Ultraseek!
For the purposes of this example, let's assume that the database table you want to index is called 'WEB_CONTENT', and that the field in the table that contains the document quality score is called 'DOC_WEIGHT'.
In this case, our DOC_WEIGHT scores range between 0 and 100, whereas Ultraseek supports a quality value between -15 and +16. For simplicity, we'll just map our positive scores into the Ultraseek positive range so we do not need to consider negative quality.
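As a quick check of the arithmetic, the scale factor is 16 / 100 = 0.16. The sketch below is plain Python with no Ultraseek dependency; the function name scale_doc_weight is made up for illustration and does not appear in patches.py. It shows only the linear scaling step, not the range checks the patch code performs.

```python
def scale_doc_weight(doc_weight):
    """Linearly map a 0-100 DOC_WEIGHT onto the 0-16 positive quality range."""
    # 16 / 100 = 0.16 quality points per DOC_WEIGHT point
    return int(round(doc_weight * 0.16))


# print a few sample mappings
for sample in (0, 25, 50, 75, 100):
    print("%3d -> %2d" % (sample, scale_doc_weight(sample)))
```

A DOC_WEIGHT of 50, for example, lands on a quality of 8, the midpoint of the positive range.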
Because we will be using a function to map our weights, we need to import the Python math library near the beginning of patches.py. Insert this line after the existing imports:
import math # add for doc_quality in insert doc
Next, locate the portion of patches.py that defines new_insert and replaces indexer.insert with it:
def new_insert(col,statusproc,title,description,publisher,url,size,modtime,
               idxtime,flags,nlinks,extra,doctype,checksum,fields,terms,
               quality,stemlang,filtlang,dict):
indexer.insert = new_insert
Our new code goes right between these two lines.
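If the wrapping pattern is unfamiliar, here is a minimal, self-contained sketch of it. FakeIndexer is a hypothetical stand-in for Ultraseek's indexer module, and the two-argument insert is a simplification of the real twenty-argument signature; only the names old_insert, new_insert and indexer.insert come from this article.

```python
class FakeIndexer(object):
    """Hypothetical stand-in for Ultraseek's indexer module."""

    def __init__(self):
        self.inserted = []

    def insert(self, url, quality):
        # the real insert takes many more arguments; two are enough here
        self.inserted.append((url, quality))


indexer = FakeIndexer()

# keep a reference to the original function so the wrapper can call it
old_insert = indexer.insert


def new_insert(url, quality):
    # custom logic lives here, between the def line and the reassignment
    goodquality = quality + 1  # placeholder adjustment for the demo only
    old_insert(url, goodquality)


# from now on, every call to indexer.insert goes through new_insert
indexer.insert = new_insert

indexer.insert("http://example.com/doc", 3)
print(indexer.inserted)
```

The point of the pattern is that callers keep calling indexer.insert as before, while the wrapper gets a chance to adjust the arguments before handing them to the original function.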
The New Code
The new code we are going to add will perform the following steps:
- Save the variable 'quality' which Ultraseek passes in
- Log our processing
- Clean up Ultraseek field names that may end with a ':'
- Log that we have a field called DOC_WEIGHT and convert it to the integer data type
- If the rawWeight is negative, set it to zero; if it is above 100, set it to 100
- Multiply the rawWeight from the database by 0.16 to map it into the Ultraseek 0-16 range (the code caps the result at 15)
- If the resulting mapped weight is non-zero, update the goodquality variable
- Log our posting activity
- Post the updated values to Ultraseek
    # lines added to file for doc_weight mapping
    #
    # first, save the current value of the Verity 'quality' field
    # in case we need to use it after we attempt to re-weight
    #
    goodquality = quality
    # log our processing
    log.log(log.info,"Processing record %s" % (url,) )
    for field in fields:
        fieldvalue = field[0]
        fieldname = field[1]
        #
        # trim the ':' in some Ultraseek fields
        #
        if len(fieldname)>0:
            fieldname = fieldname[:-1]
        #
        # if the source is from WEB_CONTENT table, the column DOC_WEIGHT
        # will be present; if so, process it as long as it is numeric;
        # otherwise set it to zero
        #
        if fieldname == "DOC_WEIGHT":
            log.log(log.info,"Input DOC_WEIGHT is %s" % (fieldvalue,) )
            try:
                # string.atoi requires that patches.py imports the string module
                rawWeight = string.atoi(fieldvalue)
            except (ValueError, TypeError):
                rawWeight = 0
            #
            # all DOC_WEIGHTs are positive but Ultraseek
            # supports weights from -15 to +16. For simplicity, we'll
            # just map our positive scores from 0 to 100 into positive
            # Ultraseek scores from 0 to 16.
            #
            # verify that the DOC_WEIGHT now in rawWeight is in the range 0 - 100
            if rawWeight < 0.0:
                rawWeight = 0.0
            if rawWeight > 100.00:
                rawWeight = 100.00
            # now map the rawWeight in the 0 - 100 range into the Ultraseek range 0 - 16
            if rawWeight <= 1.0:
                outWeight = 0
            else:
                outWeight = rawWeight
                outWeight *= .16
                # round() returns a float, so convert back to an integer
                outWeight = int(round( outWeight ))
                if outWeight > 15:
                    outWeight = 15
            log.log(log.info,"outWeight is %d" % (outWeight,) )
            #
            # finally, if some unexpected error has caused a problem
            # and outWeight is zero, leave the output quality at the
            # original value saved in goodquality
            #
            weight = outWeight
            if weight != 0:
                goodquality = weight
            log.log(log.info,"posting goodquality = %d " % (goodquality,) )
    #
    # now let ultraseek post the updated values: note that standard
    # patches.py would post the variable 'quality' in this call rather than the
    # variable 'goodquality' we have updated in this code
    #
    old_insert(
        col,statusproc,title,description,publisher,url,size,modtime,idxtime,
        flags,nlinks,extra,doctype,checksum,fields,terms,goodquality,stemlang,
        filtlang,dict)
    # end of quality/weight changes
That should just about do it. Be sure to check the log file entries to make sure that everything is working properly, then see how your results change. To be safe, test with some DOC_WEIGHT values of zero and some of 100; and to really push the code, try unexpected values (-20, 5400).
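One way to try those edge values before touching a live index is to copy the mapping logic into a small standalone function. The sketch below is an approximation for experimenting, not part of patches.py: the name map_quality and the starting quality of 3 are made up for the demo, and it uses int() in place of string.atoi so it also runs on current Python versions.

```python
def map_quality(fieldvalue, quality):
    """Mirror of the DOC_WEIGHT logic: return the quality value to post."""
    try:
        raw = int(fieldvalue)
    except (ValueError, TypeError):
        raw = 0  # non-numeric input counts as zero
    # clamp to the expected 0-100 range
    raw = max(0, min(raw, 100))
    if raw <= 1:
        out = 0
    else:
        # scale by 0.16, then apply the same cap of 15 the patch uses
        out = min(int(round(raw * 0.16)), 15)
    # a zero result leaves the original quality untouched
    if out != 0:
        return out
    return quality


# edge cases suggested above, plus one non-numeric and one mid-range value
for value in ("0", "100", "-20", "5400", "abc", "50"):
    print("%-5s -> %d" % (value, map_quality(value, 3)))
```

Inputs that map to zero ("0", "-20", "abc") should come back with the original quality, while "100" and "5400" should both hit the cap.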
Dr. Search will be back next month. Let him answer your questions for you.