Any published benchmarks between Google, FAST, Verity, Autonomy, or other enterprise search engines? - Ask Dr. Search
Last Updated Apr 2009
By: Mark Bennett, Volume 3 Number 2 - February - March 2006
This month's question comes to Dr. Search from a subscriber who asks via email "Is there an article which benchmarks Google against Verity, FAST, and other enterprise search applications?"
Dr. Search answers:
We've seen several companies where a vice-president reviewing an enterprise search project plan has asked, "Why don't we just use Google?"
It's a great question, and certainly one worth asking if it makes sense for your corporate environment. With respect to Google versus the old guard such as FAST, Autonomy IDOL, Autonomy/Verity K2 and OmniFind, it's difficult to provide a single, simple answer. We have clients who switched from Verity to Google and are very happy, and others who were using Verity K2, evaluated Google, and decided to stay with Verity. What we can do is describe some of the things that seem to differentiate the two groups.
- Turnkey system versus toolkit
- Inter-linked content versus stand-alone data
- Index and Search Performance
The Google Search Appliance (GSA) is a black-box system: you install it, set up your options, and it runs. It is certainly standards-based: it indexes HTML and other popular formats, and results are returned as XML and typically rendered with XSLT style sheets. But the options you can customize with regard to data sources, relevance ranking, and extended search (thesaurus, taxonomies, and parametric or faceted search) are somewhat limited.
FAST, Autonomy/Verity K2, OmniFind and other traditional enterprise search engines have always been toolkits. You install the software and begin the process of customizing it for your environment. Data in databases or content repositories? No problem. Custom security implementation? Modify the indexing and search methods. Have custom thesauri or existing taxonomies? Plug them in. Need parametric or faceted search results? Small matter of programming - although not much. Want to change the way results are ranked or sorted? Use the native query syntax - for example, FAST Query Language (FQL) or the Verity Query Language (VQL).
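Parametric or faceted search usually comes down to counting the values of structured fields across the current result set and letting the user drill down on one of them. Here is a minimal Python sketch of that idea, under stated assumptions: the documents, the field names `doctype` and `dept`, and the helpers `search`, `facet_counts` and `refine` are all hypothetical, and the naive title match stands in for a real engine's query language such as FQL or VQL. It is not how any of the products named here implement facets.

```python
from collections import Counter

# Hypothetical documents; in a real engine these parametric fields
# would be extracted at indexing time and stored in the index.
DOCS = [
    {"id": 1, "title": "Q3 sales report", "doctype": "pdf", "dept": "Sales"},
    {"id": 2, "title": "Sales onboarding guide", "doctype": "docx", "dept": "HR"},
    {"id": 3, "title": "Regional sales forecast", "doctype": "xlsx", "dept": "Sales"},
    {"id": 4, "title": "Travel policy", "doctype": "pdf", "dept": "HR"},
]

def search(docs, term):
    """Naive case-insensitive match on the title (stand-in for a real query engine)."""
    term = term.lower()
    return [d for d in docs if term in d["title"].lower()]

def facet_counts(results, field):
    """Count how many matching documents fall under each value of `field`."""
    return Counter(d[field] for d in results)

def refine(results, field, value):
    """Drill down: keep only the results whose `field` equals `value`."""
    return [d for d in results if d[field] == value]

if __name__ == "__main__":
    hits = search(DOCS, "sales")
    print(facet_counts(hits, "dept"))                        # Counter({'Sales': 2, 'HR': 1})
    print([d["id"] for d in refine(hits, "dept", "Sales")])  # [1, 3]
```

The facet counts are computed only over the current hits, which is what lets the user see "Sales (2), HR (1)" next to the result list and narrow the search with one click.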
Ultraseek, which Autonomy recently acquired along with the rest of Verity, is the exception to the toolkit approach. Like the GSA, it is pretty much ready to run as soon as you install it. They've done an excellent job of making it easy to customize without a great deal of effort. Unfortunately, its capacity limitations mean that it can't compete with the GSA for really large enterprise applications.
Another capability common to traditional enterprise search applications is the ability to tune the relevance so you can emphasize the parts of your documents you know are important. Companies with well structured abstracts can put additional weight on the abstract; companies that want to stress the author can do so. With GSA you get the fields Google wants, and the relevance engine is fixed (see below).
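Field-level relevance tuning can be illustrated with a simple weighted score: count query-term matches in each field and multiply by that field's weight. This is a minimal sketch assuming a plain term-frequency model; the field names, the weight values, the sample documents and the functions `score` and `rank` are hypothetical and do not reflect the actual relevance formula of FAST, Autonomy/Verity K2, OmniFind, or any other engine.

```python
import re

# Hypothetical field weights; names and values are illustrative only.
EQUAL_WEIGHTS = {"title": 1.0, "abstract": 1.0, "body": 1.0}
ABSTRACT_BOOST = {"title": 1.0, "abstract": 3.0, "body": 1.0}

DOCS = [
    {"id": "A", "title": "Annual review", "abstract": "",
     "body": "retention retention retention"},
    {"id": "B", "title": "Staff retention study",
     "abstract": "Employee retention findings", "body": ""},
]

def tokenize(text):
    """Lower-case the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def score(doc, query, weights):
    """Sum over fields of (field weight * number of query-term occurrences in that field)."""
    terms = set(tokenize(query))
    total = 0.0
    for field, weight in weights.items():
        total += weight * sum(1 for tok in tokenize(doc.get(field, "")) if tok in terms)
    return total

def rank(docs, query, weights):
    """Return document ids ordered from highest to lowest weighted score."""
    ordered = sorted(docs, key=lambda d: score(d, query, weights), reverse=True)
    return [d["id"] for d in ordered]

if __name__ == "__main__":
    print(rank(DOCS, "retention", EQUAL_WEIGHTS))   # ['A', 'B']
    print(rank(DOCS, "retention", ABSTRACT_BOOST))  # ['B', 'A']
```

With equal weights, document A (three mentions buried in the body) outranks B. Tripling the abstract weight puts B, whose title and abstract are about retention, first: exactly the kind of adjustment a company with well-structured abstracts might want.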
There are other indexing-time and search-time considerations that have always given the edge to the old guard of enterprise search. Things like flexible datastores, specialized security and filtering, and granular control over indexing processes are part of just about every installation we've seen in our years in the market. Whether BearingPoint can make enough of a difference with the GSA remains to be seen.
The 'secret sauce' that differentiates Google's public search site from its competitors is the rank popularity model: the more sites and pages that link to a particular site, the more relevant that site's content is considered to be. That remains one of many elements that seem to be used in the GSA.
The bad news is that not all corporate data is interlinked. If you have a web site made up mostly of interlinked HTML documents, the GSA page popularity algorithms should do a pretty good job of returning results your people will be happy with.
On the other hand, if your intranet content is mostly Office documents, PDFs, and data maintained in content management systems, chances are your content is not very well interlinked. If this is the case, the ranking algorithms that make Google shine on the web may not help you much within your intranet. We've seen some reviews lately claiming that when companies use a GSA for their web site search, many employees find that the public Google search, restricted to their own site, returns far better results than the GSA running on that site. This happens because of all the external sites that "vote" on page relevance by linking to the web site content in question. Internally, the GSA just cannot add that boost to the relevant page. And because the GSA is a closed system, you cannot tweak the relevance algorithms to modify the ranking.
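The effect of missing links can be seen with the classic published PageRank formulation, computed here by power iteration. This is a sketch of the textbook algorithm, not Google's production code; what the GSA actually uses is not public, and the damping factor of 0.85, the iteration count and the page names are assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration.

    `links` maps each page to the list of pages it links to (no duplicates).
    Pages with no outgoing links spread their rank evenly over all pages.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Total rank currently held by pages with no outgoing links.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        # Every page gets the random-jump share plus its part of the dangling rank.
        new = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        # Each linking page "votes" by splitting its rank among its targets.
        for src, targets in links.items():
            for t in targets:
                new[t] += damping * rank[src] / len(targets)
        rank = new
    return rank

if __name__ == "__main__":
    site = {"home": ["policy", "news"], "news": ["policy"], "policy": ["home"]}
    files = {"a.pdf": [], "b.docx": [], "c.xls": []}
    for name, graph in (("interlinked site", site), ("stand-alone files", files)):
        scores = pagerank(graph)
        print(name, {p: round(s, 3) for p, s in sorted(scores.items())})
```

On the small interlinked site, `policy` receives the highest score because two pages point to it. In the collection of stand-alone files nothing links to anything, so every file ends up with the same score and link popularity contributes nothing to ranking them; the engine must fall back on other signals.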
Both of these categories of differences may become less relevant after BearingPoint begins to market the GSA - rumor has it that they are working with Google to enhance the product, so we'd expect they will address some of these details that have kept the GSA from enjoying widespread acceptance in the enterprise environment.
With respect to pure benchmarking of indexing and search performance, we do not have a great deal of information to offer. Most search companies specify in their partnership or software license agreements that the receiving party cannot disclose the results of benchmarks or side-by-side comparisons. And to be honest, indexing time is not usually critical in most search applications, and we think it's fair to say that any well-tuned commercial system can generally return results to users pretty quickly. So on this issue, we think it's a wash.
We hope this has been of some use to you; feel free to contact me directly if you have any follow-up or additional questions. Remember to send your enterprise search questions to Dr. Search. Every entry (with name and address) gets a free cup and a pen, and the eternal thanks of Dr. Search and his readers.