
2008 Enterprise Search Vendors: The new "Fab 4... and 1/2"

Last Updated Feb 2009

By Mark Bennett, New Idea Engineering, Inc. - Volume 5 Number 1 - January 2008

Newsflash: Microsoft is acquiring FAST Search

This issue went to press prior to the announcement that Microsoft has agreed to buy Norway's FAST Search & Transfer for $1.2 billion US. Obviously if this goes through, and we expect it will, it moves Microsoft into the Tier 1 position and creates an interesting situation for Autonomy. Coverage is available on the Enterprise Search Blog and in the New York Times, with much more online.

Search was red hot in 2007 and continues to be in 2008, but since our last roundup there's been a bit of a sea change.

Autonomy and FAST Search are still holding onto the high end -- no change there -- Autonomy IDOL and FAST ESP continue to do quite well. But the other two classics from early in this decade, K2 and Ultraseek, are finally fading. As you may recall, Verity had previously acquired Ultraseek, and then Verity had itself been acquired by Autonomy. Autonomy then offered 3 engines: their own IDOL, Verity's K2, and the very popular Ultraseek. But over the past two years K2 and Ultraseek have not been a priority for new features, and many customers can see the writing on the wall. They have either migrated to other engines, or are making plans to change when their current license term is up. To be fair, Ultraseek was a stalwart of the early Internet, as K2 was for Enterprise search, and Autonomy has maintained and supported the customer base for both. They have also borrowed code from both of those engines to enhance their IDOL product line. To ease the transition to IDOL, Autonomy released "K2 v7", which featured their IDOL engine at the core but kept the more familiar Verity K2 wrappings, and some customers did take that route. But as much as we love them, K2 and Ultraseek can no longer be considered "Tier 1" players.

 
Vendors On the Move!
Animated graphic showing vendor movement over the past 5 years.

Sidebar: Our Criteria

Our opinions are based first and foremost on what our clients talk to us about or hire us to work on. When large companies spend money, it's a great indicator. This answers the "what are they shopping for" and "what are they buying" piece. The questions you send us and post to the searchdev.org group are also a great "pulse" of what's on people's minds, and tell us more about what people are finding confusing or frustrating, or things they'd like to buy. We also track and talk with other analysts and consultants to see who's talking about what at industry trade shows. We take some care, however, as this often has more to do with what vendors and analysts are talking about, so we do our best to talk with actual corporate users there, too. And yes, blogs, wikis, and press releases help round out the "buzz factor" for search engine rankings: who's getting "air time" and "mindshare", etc. From a technical side, we've been down in the trenches and have some strong opinions about usability and applicability. Let's face it, we're "search groupies" -- we love this stuff! Our experience provides a "reality check" and translation matrix for vendor hype.

So, not every engine is mentioned here, and we don't always agree with Gartner and Forrester, although we doubt they're losing any sleep over that. Don't agree with us? Have we short-changed your favorite vendor?? Drop us an email!!!

Of the Top 4 from our previous roundups (IDOL, K2, FAST and Ultraseek), only 2 of the old guard remain. So who's been added?

We had expected IBM to be the next entrant into the "Tier One" lineup, with the OmniFind Search Discovery Edition, based on their iPhrase acquisition, and we had labeled them the best surprise of 2006. But most of our clients don't seem to have noticed. To our great surprise, when we saw IBM at one of the big industry shows, they were featuring one of their other engines, the OmniFind Enterprise Edition. Although similarly named, this product is based on IBM's older code base, and not iPhrase. It is certainly a scalable performer, but not as cutting edge as the iPhrase code. IBM is also involved with Lucene and Yahoo offerings, discussed below. But no, IBM OmniFind is still not one of our new Fab 4 and a half.

Also not yet joining Tier One again this year are Microsoft's Enterprise Search and Oracle's Secure Search products, though both companies are also tunneling into the search space via their respective CMS offerings, which include built-in search. Microsoft's SharePoint is becoming wildly popular, and Oracle acquired Stellent. If your content resides primarily in one of those two systems, you may very well already have all the search you need. When asked why Microsoft is pushing so hard to enter the search space, one of their trade show reps quipped "... because that's where the money is" -- it's hard to argue with that logic.

It's perhaps no surprise that Google, with its new version 5 appliance and marketing might, has finally arrived in the enterprise search mainstream. The v5 box addresses many of the integration issues that have been holding Google back, and its phenomenal brand name does the rest. The box won't fill every requirements sheet -- some organizations are still going to need the raw horsepower and integration of a FAST or IDOL engine -- but for many mid-level applications, you can expect to see the Google logo on a lot more enterprise portals.

Another inductee into the 2008 Enterprise Tier 1 players is Endeca. Of course Endeca has set the bar for retail public-facing search for some time, but we had been skeptical about their more recent "me too" entry into the Enterprise search space. But lately our customers have been much more aware of the Endeca brand, and have either considered it or know somebody who has. Endeca has created some very slick administration and development tools, and recently did very well in a head-to-head comparison. Although Autonomy and FAST continue to make progress in this area, Endeca has passed them in some admin features. Given the importance of administration, we're getting more enthusiastic about them in the Enterprise space.

Our final inductee into the new Enterprise Search "Tier 1" (the "half" in "4 and ½") is a set of open source software based on Lucene, including Nutch and Solr. Technically it doesn't qualify for the classical "enterprise" status, as we define it. By rights, it should stay on our "Tier 1.5" list. But, and this is a big one, these tools are being considered, or at least discussed, again and again by clients. It happened often enough in 2007 that we have to include Lucene/Nutch/Solr (which we'll refer to as "LNS"). Let's be clear -- LNS is not a drop-in replacement for any of the generic Enterprise Search engines out there. It's not that there's anything wrong with the core of LNS as a toolkit; it's the current packaging as a full-on enterprise application that we take issue with. Don't be fooled by any of the hype you read about LNS being conveniently packaged and ready for general enterprise duty. We love the core engine and have even used it on several projects. And some progress has been made in packaging. But unless it is staffed by programmers and/or Linux nerds with time on their hands, the typical IT group is not ready for the hassle of configuring and deploying LNS.

Sidebar: "Enterprise" Software Packaging vs. Lucene / Nutch / Solr

We don't want to offend any open source fans out there; we like open source software too, and have done some projects with the Lucene based tools ourselves. But the level of packaging and support for these open source tools is not aligned with what commercial search vendors offer and what most companies expect. Though the adjectives "commercial" and "enterprise" mean different things to different people, here are some bare minimums for enterprise search:

Some Enterprise Search Software Technical essentials:

  • A GUI installer
  • A fully functional spider that can be controlled and monitored from a web interface
  • A full set of robust document filters for Microsoft Word, Excel, PowerPoint, Adobe PDF, etc., compiled and installed by default.
  • Ready-to-install Gateways to popular content management systems
  • Integrated navigators

Business:

  • Complete professionally edited Documentation
  • An 800 number for tech support
  • A staff of Professional Services integrators
  • Vendor certified training classes
  • Help with detailed requirements analysis

Contrast this with one of the Lucene / Nutch / Solr (LNS) offerings, which tend to come with a Unix tar file, some documentation, a few shell scripts and sample code, and a community-driven mailing list. In all fairness, some offerings do also have a partial Web UI, and we've been very impressed with the LNS communities, which are very lively and do tend to answer questions quickly. The roughest gaps with the LNS offerings are in the spidering process and document filters. Yes, there is code there, but it's not functionally complete for the tasks an enterprise engine needs to perform. Navigators are also missing in the base package, though another open source project, Carrot^2, is attempting to address that.

You could argue that the contents of an LNS package are on a par with what you get when you download Apache, and look how popular Apache is! While this is true on the surface, Apache has been around for a lot longer, has been widely written about, and has many more IT professionals familiar with it. More importantly, the integration complexity of a basic enterprise search solution is much higher than that of a bare-bones web server. Yes, Apache is a complex package as well, with many optional modules, but you can afford to ignore most of them on Day 1. With search, however, many modules must be turned on and configured before search will be useful enough to serve even a midsized company, so this comparison is not really applicable.

So why are clients still interested? LNS fits into some very interesting niches that the classic vendors don't always fill, and the bean counters are initially enticed by the $0 licensing fee. Of course, some companies DO have lots of programmers. Other companies want to micromanage document relevancy and ranking. Small startups are often willing to trade coders' time for saving some up-front costs. Other companies who want to eventually be acquired don't want any strings attached to their intellectual property, at least in terms of future licensing or maintenance fees.

We hope we haven't offended any open source fans out there. Please see the sidebar about "packaging". We've used Lucene, we like it, we respect it, but some of the recent hype about its level of packaging is simply not true.

It's been widely reported that IBM is taking Lucene under its wing and helping to evolve and package it, while hopefully still allowing the open source community to use those enhancements. In the past IBM's open source efforts have greatly benefitted Linux and Eclipse, for example.

These are not the only companies out there; there are many other vendors in the search space. We hope to feature some of them in upcoming issues.

We should certainly mention ZyLAB, who have recently entered Gartner's coveted upper-right "magic quadrant". They list many government and nonprofit agencies as references, though we haven't seen much interest from our corporate clients yet. Also of interest, they have a downloadable ROI worksheet on their demos page.

Vivisimo, which started in the clustering space, continues to do well in the government and commercial sectors, and has a presence at many of the search trade shows. You can try them out on Clusty.com.

Recommind, also at many of the shows, has a strong presence in the legal technology space.

Dieselpoint has opted to compete on performance vs. a lowball price. This is rather unusual for a non Tier-1 player, but they feel strongly that their technology is worth it. Our own Dr. Search calls them "the best search engine you've never heard of." They also launched a new release in '07 which they believe is extremely scalable.

Intellisearch is the "other search engine from Norway", not to be confused with FAST Search. They are now expanding their marketing efforts here in the US and have landed some key accounts.

Siderean Software and Exalead have been talking about advanced Entity Extraction at various industry events.

Sadly for InQuira, we're seeing it phased out at some locations, or having the built in search be replaced by another engine. Users have complained that, though it handles natural language queries well, it has trouble with the more common 1 and 2 word queries modern users type in. We have not verified this first hand, and to be fair these types of anecdotal stories might just indicate some configuration problems.

We're seeing X1 and dtSearch embedded in some vertical applications. We're not sure how they will compete with open source engines; presumably on features and support, or on well-packaged scalability.

There are many more engines in the market, and the number seems to be growing every day.

Recapping the 2008 Tier 1 Roster

As we move into 2008, FAST and Autonomy will continue to hold the high end, if they aren't acquired by somebody else first. (Update: Tue 1/8/08: This will include Microsoft once their acquisition of FAST is complete.) They are evolving into a search "platform", vs. a traditional drop-in application / solution. Endeca will make some gains in the enterprise and continue to do great in the ecommerce space. The mid and low end markets will beat a path to Google's door, and armed with their v5 offering, Google will take some higher end business as well, even though it may not always be the best fit. Lucene and its derivatives will expand by being embedded in other software packages and services, though many users won't even realize they're using it. Lucene, Google and the free IBM/Yahoo! engine will continue to drive down prices, but not as dramatically as a casual observer might think. Google certainly isn't free, the IBM/Yahoo! product lacks some features, and packaging issues will continue to block Lucene & Co. from following in Apache and Tomcat's footsteps (though we hope this gets addressed).

Outside of "Tier 1", other vendors will win some good deals because they can respond to very specific customer requirements, or provide substantially better performance. Also look for a parade of small companies touting algorithms based on advanced mathematics and clustering. We're huge fans of this stuff, but corporate clients seem apathetic, and the business cases for why companies would need this level of effort have not fully matured. Beyond that, the demos some companies are showing are simply horrible; this is not an indictment of the core PhD-worthy technology, but more of the storytelling and UI design. There are also several technical aspects of these futuristic products that leave us mildly suspicious.

Still in and sailing along:
FAST Search and Autonomy IDOL
(Update: Tue 1/8/08: This will include Microsoft once their acquisition of FAST is complete.)
Retiring from Tier 1:
K2 and Ultraseek
Added:
Google, Endeca and Lucene (with an asterisk)
Still not there:
Microsoft and Oracle (though big players in database and CMS search), and IBM's OmniFind
(Update: Tue 1/8/08: This will change once Microsoft completes their acquisition of FAST.)

Leading Search Engine Selection Factors

Viewed from the other side, that of the companies and agencies buying this stuff, vendors can be sorted by the features that matter most to the buyer. Here are some technical and business factors that drive purchasing decisions, and some starting solutions to consider. This is a very incomplete list, both in terms of vendors and features, so please don't exclude anybody just because they're not shown here.

Extreme Scalability:
FAST or Autonomy
Generic enterprise search:
Google (an easy-to-sell choice)
Extreme metadata:
Endeca, also FAST or Autonomy
Simplicity:
Google, or hosted or free IBM/Yahoo! offering
eCommerce tie-in:
FAST or Endeca
Elaborate business rules:
FAST, Endeca, and IBM OmniFind Discovery
Ultraseek or K2:
replacement cycle planning
Deep integration:
Lucene/Nutch/Solr/Carrot^2, Tier 3 engines
Broad integration:
FAST, Autonomy, Endeca, Google (v5)
Audio / Video mining:
Autonomy and FAST (via partner), other options
Natural Language Processing (NLP):
FAST, specialized vendors
Federated search:
Many options. Can be provided by some search providers, and also some dedicated third party offerings. Be sure you define whether or not you need to maintain document level security in distributed search. Some starting points include: FAST, Deep Web and Grokker.
Entity extraction:
Many options. Can also be provided by a search vendor, or from a dedicated third party offering.
Content in CMS or RDBMS only, simple search:
use built-in search (for example: SharePoint, Stellent, Oracle)
Very low up-front cost:
Google mini, hosted, IBM/Yahoo!, low cost vendors, Lucene/Nutch/Solr (with sweat equity)

Features Becoming Standard in Commercial Offerings

  • Document level security
  • Navigators and basic Taxonomy framework
  • Basic search analytics
  • Custom Thesaurus
  • Minimal support for search-term specific suggested results
  • Support for many languages, document formats, and database connectors

Broad Shortcomings still present in Most Commercial Offerings

  • Proper distributed administration
  • Thorough spider monitoring and debugging tools
  • Federated search of secure content
  • Frequent update transactions and low latency requirements
  • Full SQL syntax, set operators, and transaction support (with the exception of embedded RDBMS fulltext engines)
  • Easy page recognition and custom entity extraction
  • Robust and easy-to-configure content suggestions
  • Easy integration of client side scripting and presentation
  • WYSIWYG results list formatting
  • Ease of tuning automatic clustering