How do I rebuild a collection using the K2 spider? - Ask Dr. Search
Last Updated Mar 2009
By: Mark Bennett, Volume 2 - Number 5 - April 2005
This month's question comes from a customer who is using K2 in an enterprise search application.
Question:
In previous versions of K2, we could rebuild a collection from scratch simply by taking the collection off-line, running mkvdk with the "-purge" option, and bringing the collection back online again.
We're now using the K2 Spider. When we run our old scripts, the collection does come up empty, but the K2 spider seems to have its own database of indexed documents, and when we restart the spidering job, no new documents are indexed.
How can we reset the spider to tell it to spider everything again?
Dr. Search answers:
Your assumption that the K2 spider maintains an independent list of indexed documents is correct. VSpider does something similar, and you need to use vsdb to maintain the VSpider database. The K2 spider database is easier to manage: as long as you are using K2 Spider jobs to manage your collection, you can do it with rcadmin.
As the script in Figure 1 shows, all you need are the indexstateset and jobpurge commands. The script sets the collection's index state to 0, purges the spider job, and then sets the index state to 2. In the example, the server hosting the collection is named bean_server, the collection is test_coll, and the job that builds the collection is build_test_coll_job.
# rcadmin purge script created $fileDate
# login
# username
# password
#
indexstateset
bean_server
c
test_coll
0
y
jobpurge
build_test_coll_job
bean_server
y
indexstateset
bean_server
c
test_coll
2
y
quit
Figure 1: rcadmin script to clear a collection and the K2 Spider cache
Note that the login is commented out in this script because we use automated login on our servers. If you do not, you will need to either set up automatic login, or include the login, user name, and domain in the script.
You can start this script in a K2 job, and even chain it to the indexing job to ensure that your collection is clean before you begin indexing new documents.
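If you maintain several collections, you may prefer to generate the Figure 1 script rather than copy and edit it by hand. The following is a minimal Python sketch under that assumption. It only reproduces the command sequence and answer values shown in Figure 1 with your own server, collection, and job names substituted. The helper names (index_state_block, render_purge_script) are hypothetical and are not part of K2, and the sketch does not run rcadmin or add the login lines for you.

```python
def index_state_block(server, collection, state):
    """One indexstateset block, with the same answers used in Figure 1."""
    return ["indexstateset", server, "c", collection, str(state), "y"]


def render_purge_script(server, collection, job):
    """Return rcadmin input text that mirrors Figure 1 for the given names."""
    lines = ["# rcadmin purge script (generated)"]
    # First indexstateset block from Figure 1 (state value 0).
    lines += index_state_block(server, collection, 0)
    # jobpurge block from Figure 1: job name, then server, then confirmation.
    lines += ["jobpurge", job, server, "y"]
    # Second indexstateset block from Figure 1 (state value 2).
    lines += index_state_block(server, collection, 2)
    lines.append("quit")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Names taken from the example in this article.
    print(render_purge_script("bean_server", "test_coll",
                              "build_test_coll_job"), end="")
```

Run with the names from this article, the sketch prints the same commands as Figure 1, minus the commented-out login lines. Save the output to a file and use it wherever you would have used the hand-written script.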
Ask Dr. Search
Remember, Dr. Search is here to solve your technical problems with your search engine. Don't hesitate to email us any time, or contact us. We're here for you!