2009 Overview of the Enterprise Search Market
Last Updated Apr 2009
Miles Kehoe
For the last several years we’ve presented our analysis of the enterprise search vendors, and we’re excited to present our 2009 report covering the tier 1 and tier 2 commercial vendors, open source solutions, and search-related tools and utilities.
In this year’s overview, we’ll cover the following topics:
- The State of Enterprise Search
- The DNA of Search Technology Companies
- Leading Vendor Overview
- Vendors by Market
- Search Resources
- Looking forward in 2009... and beyond
We hope you will find our analysis helpful, and will let us know how we’re doing.
Our Criteria
Our findings are based on the information we gather from customers, press, and our own insight into the marketplace. However, unlike many other analysts, our observations are also based on first-hand experience with products “in the wild” – our real-world experience and observations of the products.
Regardless of the vendor, there really is no such thing as “bad search”. Rather, we see customers having problems with search technology that doesn’t meet their needs – simply the wrong solution to the problem; or a problem with the customer’s methodology.
If you look at the Forrester Wave for Enterprise Search, or at the Gartner Quadrant over the last few years, you see the usual suspects: Autonomy, FAST, IBM, Endeca, Google and others. When we speak at shows like the Enterprise Search Summit about search vendor selection, we point out that there is an excellent chance attendees are already using one of these upper-right quadrant technologies. What makes people think that switching from one highly ranked technology to another highly ranked technology will really solve the problem? Our experience is that methodology is the problem; and the good news is you can often change your methodology and not have to spend money on a new search technology!
The State of Enterprise Search
Enterprise search – the technology that drives intranet and customer facing content retrieval – is a dynamic and growing market. Not everyone is happy with the implementation, but as we’ve said over and over in our newsletter and at conferences, that’s more a problem with methodology than technology.
That said, we see two recent trends in enterprise search beginning to escalate: High level consolidation and low end commoditization.
Consolidation
If consolidation is a sign of a maturing industry, then we can say confidently that the enterprise search space is maturing. We’ve had small acquisitions going on for years, of course – check out the M & A page on our web site. But in the last few years we’ve started noticing an interesting trend.
For quite a while, we had search companies like Autonomy, Verity, FAST, and Endeca; they sold search. Of course, search is only useful in the context of content, and most of the search technologies could hook up to just about any kind of data source including web, file systems and content management systems. Then we had content management companies like Documentum, Stellent, Vignette, and FileNet; typically they made deals to bundle some ‘bare bones’ search technology from one of the search leaders. (We have to include SharePoint in this list, even though Microsoft is not a ‘content management company’).
Do you notice an interesting thing about the companies just mentioned? The search companies and the content management companies are hooking up. Autonomy, after the Verity/Ultraseek acquisition, picked up content conglomerate Interwoven. Microsoft has acquired FAST for both its SharePoint market and beyond for corporate intranet search. IBM acquired FileNet. The exceptions here are Endeca, Vignette, and Documentum, which itself was acquired by storage array giant EMC.
This trend offers great promise for companies looking to make their content easier to find – rather than minimal search capability, we’re seeing the stage set for really great Enterprise Search 2.0 capabilities inside of content management systems. In some ways this trend increases the importance of really great federated search, because few companies have single repositories for all content in all divisions; so being able to easily access multiple of these ‘islands of content’ will be critical across the corporation.
Finally, we’re seeing fewer and fewer ‘high end’ search companies for those areas that have demanding search requirements including eDiscovery, pharma and biotech research, and even intelligence. Microsoft has announced its plans to maintain FAST ESP independent of SharePoint; and Autonomy, while building a risk management suite, still has a powerful search technology at its core. But will some newcomer focus on these demanding search needs with laser focus and surprise us all? Stay tuned…
Commoditization
At the high end of the market in terms of functionality and performance, vendors continue to command prices in line with historic averages. However, a new trend has started: free and low-cost software that, while not as robust or capable, is pretty darned good.
The open source movement certainly has contributed to this trend in many areas of the software industry. The Apache projects Lucene and Solr are the best known, but there are several other excellent open source search toolkits. These open source programs do not impose limitations; instead, they arrive with ‘Developer Packaging’ rather than the ‘Enterprise Packaging’ companies expect with most enterprise applications. And, with companies like Lucid Imagination recently receiving funding, we’d expect to see a move towards more traditional packaging and support.
Ironically, the open source programs are not the prime movers towards search becoming a commodity. The ultimate in low-cost commercial software comes from IBM and Microsoft. Both have introduced free search products that come with ‘Enterprise Packaging’ like the high-end commercial software, albeit with license limitations on the number of documents allowed. But the price of this enterprise software is Zero. Zip. Free.
One sign of this commoditization of enterprise search software is the recently announced ESP for SharePoint, an interim solution on the way to fully integrated FAST ESP and SharePoint. ESP for SharePoint is priced at $25K per server, plus maintenance. Granted, there is a three-server minimum, but in the real world you’d want at least three servers anyway – perhaps a development server and two load-balanced production servers. There are limits on the connectors that are available, and it requires Microsoft eCALs; but this packaging is only the first step towards feature-rich, high-end search moving into the low-cost range.
The DNA of Search Technology Companies
In the last few years, most major search vendors have technology that is applicable to a broad range of applications, although it seems that most tend to focus on only a few specific industries or markets. Sometimes this happens because of market opportunity – other vendors have not sold aggressively into a given vertical industry, for example. Other times it’s a function of the technology’s initial design. The researchers behind nearly all of these technologies started with a world view which heavily influenced the design of the product, and often this basic “DNA” remains with the product forever.
Consider Verity, acquired in 2005 by Autonomy. Verity started as a research project to bid on a large government intelligence project, the Automated Message Handling System (AMHS). The intelligence community needed a product that could route incoming message traffic to the appropriate General or Admiral or Colonel based on the content of the message. Humans – often Tech Sergeants – had been doing this for years, and Verity in its Topic product provided a way to represent the acquired knowledge of these intelligence folks in software. The requirements: high speed content classification; a rich and hierarchical security model; last in, first out results; and short index latency. All of these requirements are deep in the kernel of the original Verity product Topic, and they remained as core design principles until the acquisition by Autonomy and subsequent replacement of the K2 kernel.
What the DNA of each product gives us is a way to classify the suitability of each product for a given application, if only as a first-pass in our minds when we first talk to prospective customers. Again, this is only an initial impression, but it helps us figure out what technologies might work best in a given environment and application.
For example, suppose your intranet looks a lot like a small version of the internet: lots of HTML and PDF documents, maybe mixed in with some office documents; lots of interlinks between your content; and all of your content, even from a database, is available from a specific URL. When we talk to a prospect with this kind of environment, we start thinking that the Google Search Appliance might be a likely candidate.
Let’s say you need a search engine for selling products on-line, and relevance to you means something like ‘next Monday at 8AM, a query for an 8-megapixel camera that meets the user’s specifications should return a Canon, because Canon is going to give us an extra 2 points on each product we sell until Friday night at 6PM’. While many vendors can and do work fine in applications like this, we think of Endeca pretty early on in the discussion.
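To make the idea of time-boxed, rule-driven relevance concrete, here is a minimal sketch in Python. It is not Endeca's API or any vendor's implementation; the `BoostRule` class, the `rerank` function, the sample scores, the dates, and the 1.2 multiplier are all hypothetical, chosen only to illustrate the camera scenario above.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class BoostRule:
    """Hypothetical merchandising rule: boost one brand during a time window."""
    brand: str
    start: datetime
    end: datetime
    boost: float


def rerank(results, rules, now):
    """Re-sort results by relevance score after applying any active rules."""
    def adjusted_score(item):
        score = item["score"]
        for rule in rules:
            # A rule applies only to its brand and only inside its window.
            if rule.brand == item["brand"] and rule.start <= now < rule.end:
                score *= rule.boost
        return score

    return sorted(results, key=adjusted_score, reverse=True)


# Illustrative engine output: the Nikon scores slightly higher on its own.
results = [
    {"name": "Nikon 8MP", "brand": "Nikon", "score": 0.90},
    {"name": "Canon 8MP", "brand": "Canon", "score": 0.85},
]

# Boost Canon from Monday 8AM until Friday 6PM (dates are illustrative).
rules = [
    BoostRule("Canon", datetime(2009, 4, 6, 8, 0), datetime(2009, 4, 10, 18, 0), 1.2)
]

during = rerank(results, rules, datetime(2009, 4, 7, 12, 0))
after = rerank(results, rules, datetime(2009, 4, 11, 9, 0))
print(during[0]["name"], "/", after[0]["name"])
```

In this sketch the rule sits outside the core relevance calculation, so the business side can change the window or the multiplier without touching the index.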
One more example: You have terabytes of content of every type – office, database, HTML and more – and you want faceted navigation, entity extraction, and highly predictable scalability. In that case we start thinking that FAST ESP may be a good fit.
Leading Vendor Overview
We tend to think of enterprise search vendors in tiers rather than quadrants or waves. You’ll find the usual suspects in Tier 1; these are the big vendors everyone knows, and the ones that have the strongest market presence across many industries and regions. Tier 2 vendors are those who are less well known, or whose strengths have thus far been limited to specific verticals or geographic regions. Note there is nothing inherently wrong with the technology in Tier 2 companies, and no reason to believe that the technology cannot handle even the most stringent of environments. Please remember that John Ruskin’s famous quote on using price alone as a buying criterion applies most directly to physical goods, but we encourage you to not make enterprise decisions based solely on price!
We’ll cover the overview in five areas:
- Tier 1 Vendors
- Tier 2 Vendors
- Other Commercial Offerings
- Low-Cost and Free Solutions
- Tools for Search
Let's have a look.
This year’s Tier 1 matches last year’s list exactly, with two exceptions: First, FAST Search is now a wholly owned subsidiary of Microsoft, which includes the bonus that the latter is now a Tier 1 player; and Exalead, a niche player in the US, has entered the US market with a new US Headquarters in California and with a rich, scalable technology – desktop to internet.
Autonomy
While Autonomy IDOL clearly remains one of the top search kernels, the company seems to be continuing its move ‘beyond search’ to include many of the applications associated with risk management and compliance. Their suite of applications and modules that work together with the IDOL core technology at the center include Aungate, Etalk, Echo, Meridio, Virage and recently ZANTAZ; and together, these place Autonomy beyond ‘simply search’ into a much broader market. Read more about our Autonomy Practice on our web site.
Endeca
While Endeca has a number of enterprise, government, and library customers, we also think of them as the leader in eCommerce search. The only privately held company among the Tier 1 players, Endeca has built a name for itself with their Information Access platform by innovating search to support relevance based on business rules with Guided Navigation® and Content Spotlighting™. Read more about Endeca on our site.
Exalead
The new entry into our Tier 1 list, this French company has opened a US operation to make a big push into the North American market. While we list Exalead as a challenger, we feel they belong in the top tier: Their technology drives a web portal, which includes the richest of enterprise capabilities on the scale of the web.
FAST/Microsoft
The newly combined team of FAST ESP and Microsoft puts both companies in an even stronger position than last year; and they are a formidable player in search and content management. Some question whether the combination will be successful over time; but FAST’s ESP with Microsoft’s penchant to ‘document, document, and document’ will pay off for new and existing FAST customers. Read about our FAST Practice on our web site.
Google
Google is the best known of the Tier 1 players, and the Google Mini and the Google Search Appliance certainly have made strides in the enterprise. Google has put together a strong solution for the enterprise search market by providing features like database indexing and security, along with innovative features like the ‘wiki results’. As with the other Tier 1 companies, we partner with Google and, in many environments, find it a pretty darned good enterprise solution. But while users may ask “Why not just use Google?”, it’s not that simple and it’s not always the best answer for every search requirement. Read about our Google Practice on our web site.
We’ve trimmed the Tier 2 list for this year, removing companies like Intellisearch, which seems to have exited the US market, and adding newcomer Attivio. The rest of the list is the same as last year, but that doesn’t mean these companies have not been making strides.
Attivio
A newcomer in search, Attivio combines enterprise search with traditional relational structure to facilitate queries with "SQL-like" results. Attivio uses some Lucene-based technology and has a number of former FAST employees on staff.
Dieselpoint
A pure-Java search technology, Dieselpoint Search is solid, full-featured, and a strong contender for eCommerce and enterprise search, especially for companies that need or want the flexibility of a product that is hardware neutral. Dieselpoint has extensive reporting and a strong promotions module. They are also creators of the proposed Open Pipeline standard we hope other vendors will adopt.
IBM
This well-known giant has a number of search technologies, from the free IBM/Yahoo! search discussed below to mainframe engines. We think the technology they acquired from iPhrase in 2007 and marketed as IBM OmniFind Discovery Edition is among the most interesting technologies out there. One of the features we like best is the ability to integrate business rules into the query processing, and as the scalability and feature set grow it could move IBM into our Established Player category.
ISYS
ISYS is a successful Australian company with a new executive team in the United States and the ISYS:web product line. The technology is straightforward and capable, and ISYS has been quite successful marketing to companies, government agencies, and law enforcement. We recently published an interview we did with a new ISYS customer who is quite happy with his selection.
Recommind
A solid search technology for enterprise search, Recommind's MindServer is based on technology that recognizes context and concept search to deliver meaningful, high quality results. While the technology is applicable to enterprise search requirements, Recommind has enjoyed the most success among legal and eDiscovery applications.
Thunderstone
A long-established name in search, Thunderstone continues to enjoy limited success in the enterprise market. They offer a quite innovative line of products, including a search appliance which includes their TEXIS search product. Thunderstone features a relational model search technology with commit points and roll-back indexing with a rich scripting language, and seems to be a promising technology.
Vivisimo
Originally a clustering technology for federated search still available at http://www.clusty.com, Vivisimo has grown into a solid full-capability search platform with a number of key public customer sites including http://usasearch.gov and the National Library of Medicine http://www.nlm.nih.gov/. Vivisimo is moving into social search with Enterprise Search 2.0 methodology.
ZyLAB
ZyLAB is an interesting vendor which shows well in other analysts’ findings, and they seem to have a strong presence in government and non-profit markets. Nonetheless, they do not seem to be a major enterprise player, at least in the markets we work with.
We see vendors including dtSearch, Siderean Systems, and X1 positioned as enterprise solutions, but we still see these as niche players that are not yet enjoying the enterprise audience. These are generally solid, well-established companies, so don’t disqualify them from your search until you’re sure they don’t fit your requirements.
This year we also want to mention a newcomer in search, MyRoar. Formed by technical folks who understand the financial services market, MyRoar is a natural language question machine. They claim that their technology does not need to be trained like vendors InQuira and Q-Go; we’re hopeful to do more with them this year so we can report back on our findings.
SLI Systems is a hosted search technology that provides enterprise search for public-facing content. Their real strength is their ability to link public search engine rankings with your own public-facing search: think of it as enterprise search meets public SEO.
Low-cost and Free Solutions
Lucene
One of two related Apache Software Foundation search projects, Lucene is perhaps the best known of open source search toolkits. While the original Lucene was written in Java, it has been ported to virtually all major languages and platforms, and is probably on more web sites than any other technology. Large companies like IBM have invested heavily in Lucene, and I expect we'll see increasingly capable versions over the coming years.
Solr
Another Apache project, Solr, based on Lucene, provides a higher level API and more functionality than its parent including term highlighting, faceted search, and a simple administration console. The Solr toolkit, which runs in a Java servlet container such as Tomcat, lets you submit XML-structured records, a vast improvement over the low level calls in Lucene.
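As a rough illustration of what an XML-structured record looks like, the sketch below builds a Solr `<add>` update message using only the Python standard library. The field names (`id`, `title`, `category`) and the `build_solr_add` helper are hypothetical; real field names depend on the schema you define, and the message would normally be posted over HTTP to your Solr instance's update handler, which is not shown here.

```python
import xml.etree.ElementTree as ET


def build_solr_add(records):
    """Build a Solr XML <add> message from a list of dictionaries.

    Each dictionary becomes one <doc>; each key/value becomes a <field>.
    List or tuple values are written as repeated fields (multi-valued).
    """
    add = ET.Element("add")
    for record in records:
        doc = ET.SubElement(add, "doc")
        for name, value in record.items():
            values = value if isinstance(value, (list, tuple)) else [value]
            for single in values:
                field = ET.SubElement(doc, "field", name=name)
                field.text = str(single)
    return ET.tostring(add, encoding="unicode")


# Hypothetical record; field names must match your own Solr schema.
records = [
    {
        "id": "doc-1",
        "title": "2009 Overview of the Enterprise Search Market",
        "category": ["search", "enterprise"],
    }
]

message = build_solr_add(records)
print(message)
```

The point of the sketch is the shape of the record: a flat list of named fields per document, rather than the per-field indexing calls you would write against Lucene directly.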
IBM/Yahoo
These two giants have teamed up to deliver an excellent free search platform for organizations with no more than 500,000 documents to be indexed. Packaged as the IBM OmniFind Yahoo! Edition, it features a rich browser-based administrative console comparable to those in the best commercial products to control indexing, thesaurus/spelling suggestions and rudimentary search analytics. http://omnifind.ibm.yahoo.net/
Google Site Search
Google has introduced a low-cost hosted solution for public web sites called Google Site Search. Like hosted search companies such as SearchBotton.com, Google Site Search lets you add search to your public web content and display results with no ads and with full control over what users see. Pricing starts at $100 per year for up to 5000 web pages, and is, of course, powered by Google. http://www.google.com/sitesearch/
Search Server Express
A limited-capacity version of Microsoft's Search Server, Express works well as a stand-alone search technology. The engine installs with SQL Server Express, which limits search to 2.5M documents, probably quite sufficient for many web and departmental applications.
There are so many categories of tools and utilities for enterprise search that sometimes it’s hard to keep them straight. Here’s a high-level view.
Federators
MuseGlobal
If you use a vendor-supplied federator, chances are good you're using MuseGlobal technology. MuseGlobal markets their products and framework - which includes over 6000 connectors - to most of the large enterprise search vendors and to companies that need syntactic and semantic processing of multiple content feeds. Find them at www.museglobal.com/
Grokker
A San Francisco-based company that provides a combination of search result federation and results visualization, Grokker has a very cool product that lets you see visual clusters of terms from a number of sites. Grokker works with just about any search technology, and has a very cool demo on the public web site. Find Grokker on the web at http://www.grokker.com/.
Deep Web Technologies
Created by one of the earliest Verity employees, Deep Web has an impressive federation product that can create really great facet-like topics to effectively drill down into results from multiple web sites. Deep Web supports public and secure data, and handles all the details enterprises need to have. They have a number of excellent demos linked to from their home page at http://www.deepwebtech.com/.
Raritan Technologies
Based in New Jersey, Raritan has an excellent federation product which includes the ability to search directly from the research pane in the Microsoft Office suite of applications. Find them at http://www.raritantechnologies.com/.
Search Analytics and Promotion
We’d be remiss if we didn’t list our own SearchTrack product, the first to directly link great search results analytics to best bets and result boosting. Like most search-related utilities, SearchTrack logs activity to relational database systems for performance, and can work with any search vendor. You can add reporting and boosting to even the most basic search – call us today at +1-408-446-3460.
Vendors by Market
Based on our methodology, our experience with the vendors and their DNA, we’ve identified a handful of key markets for search technology. These, and the companies we see as leaders in each, are included in the figure below as the NIE Industry Grid for 2009.
NIE Industry Grid for 2009
- Compliance/eDiscovery
- eCommerce
- Vertical Search/Public Portals
- Government/Law Enforcement
- Enterprise/Intranet
- Customer Support
- CMS Search
- OEM/Bundled
- Hosted
The order of vendors in each market reflects their position according to our methodology and analysis. However, remember that just about every technology can fit – sometimes pretty darned well – in just about any industry. Sometimes this will require extra consulting or product add-ons to customize a given technology to a specific market/application: as they say in the software industry, it’s SMOP – a small matter of programming. Or even better, SMOM: a small matter of methodology.
Search Resources
There are a number of resources on the web and beyond for business and technology assistance with search. Here are a few of our favorites, along with some we run or moderate.
User Forums
SearchDev.org: The independent search developer's forum. A forum on the business and technology of search.
SearchDev also has two technical forums for detailed vendor-specific questions dealing with everything from coding and scripting to problem resolution, with more in the works:
LinkedIn Groups
Enterprise Search Engine Professionals Group: A fast-growing LinkedIn group for people working in or involved with enterprise search in corporate environments worldwide. Search for it under the Groups menu.
Enterprise Search Summit Group: A new group run by Michelle Manafy at Information Today which will provide industry news and information as well as details and podcasts about upcoming EDD events.
Newsletters
Enterprise Search Newsletter: Produced by New Idea Engineering, this newsletter covers both business and technical issues of search, generally at a more detailed technical level. It covers all vendors, provides advice for improving your search, and includes Ask Dr Search who answers technical questions from subscribers.
Blogs
Enterprise Search Blog: A blog produced by New Idea Engineering that covers all topics around the business and technology of enterprise search including opinion, news, events and more.
The Noisy Channel: This insightful blog, run by Daniel Tunkelang, CTO of Endeca, has a perspective on technology of enterprise search from someone who knows search from the ground up.
Beyond Search: Run by search guru Steve Arnold, Beyond Search contains news, interviews, and opinion on the search market delivered
SearchTools: Avi Rappoport runs this blog which summarizes new content from her website http://searchtools.com/ which covers almost every search technology known to mankind!
SLI Systems Blog: Hosted search service SLI Systems provides a newsletter that talks about the kinds of problems they see in working with their customers. http://www.sli-systems.com/newsletter.php
FAST Forward Blog: A blog run by FAST Search staffed by FAST, Microsoft, and independent bloggers who write about search and IT issues at http://www.fastforwardblog.com/.
Attivio: The search vendor has a useful blog that has good general information as well as Attivio-specific material.
Mark Logic Blog: Written by CEO Dave Kellogg, who shares interesting information about technology. A fun read, and always informative.
Vivisimo Blog: Vivisimo runs the 'Search Done Right' blog that provides great background information on enterprise search. Like Attivio's blog, it has good information that anyone can benefit from.
Two other blogs I find most interesting are not directly related to enterprise search, but I find good value when I follow them:
Andrew McAfee, a Professor at Harvard Business School, writes about IT issues, and he always has interesting material.
John Battelle, author of 'The Search...', has an interesting blog as well, and it's always fun to follow what he's doing.
Trade Shows
Enterprise Search Summit New York: Every May, Information Today sponsors the premier show for enterprise search in New York City. If you only go to one show a year, this is the one to go to. That's also the advice we give to new vendors entering the marketplace. We'll be back again this year, speaking about how you can save money by making your existing search engine work rather than replace it. By the way, you can listen to a preview of our talk, as well as talks by other speakers including Matt Brown of Forrester and Sid Probstein of Attivio.
Search Engine Meeting: Search Engine Meeting is an interesting show run by Infonortics from the UK. In its 14th year, this year's show returns to Boston on April 27-28; see you there!
Looking forward in 2009 - and beyond
Enterprise search is a hot field now, and the companies in the space are competing hard to provide capabilities that will win over users. As companies integrate more features, look for the leaders to begin looking at the context of the user and the data to improve search results: this is what Google does that makes it such a winner in public search; and it’s what companies need to make their users confident in search again.