Enterprise Information Retrieval Needs
We have been searching computer-based collections of content in enterprises for fifty years. The price of storing and hosting content has plummeted, while the availability of content, particularly textual content, has skyrocketed, with the boom in audio, image and video content making its mark as well. But the amount of content is not the problem; in fact, it is the greatest knowledge exploitation opportunity in history. The unmet challenge of this age is filling the user's display device, eyes and intellect with every appropriate detail about his/her information need, in a form that is processible, digestible and succinct, and without his/her repeated asking. The major elements of this challenge are:
Searching without borders. Users with search needs must have a single access mechanism for all "enterprise" content: owned, licensed or "of interest". If every collection of content requires a different sign-on, user interface, search request language and rules, and result evaluation, it is guaranteed that most collections will be ignored in the normal (re)search process.
Cross-index or Multi-Index searching. Even inside the firewall, users with search needs must have a single access mechanism for all "enterprise" collections. If proprietary engines cannot access their competitors' proprietary indices and results, the complexity of enterprise search becomes hopeless. Isn't it amazing that the audio/video media figured this issue out almost instantaneously, and that the DBMS companies standardized ages ago? ES has not, probably because its technical and business models are archaic, stemming from the 1960s.
Search "Verticals". Search needs to be an application that is dictated by the needs of each related group of users (the community of interest): by its taxonomy, its search features, its sources and its results analytics.
In-depth interpretation/analytics of content. Content needs to be treated according to its type, as its type and structure will greatly influence how it should be searched. What's more, the "evidence" that makes a document interesting is typically very concentrated, and users shouldn't have to find the evidence; it should "find" them. Currently, at the document level, every word in the document, whether in structured fields or in the body, is indexed for speed and evidence highlighting. But in virtually every case, the documents are treated as an indistinguishable mass: an email message is treated the same as a 500-page report, which is treated the same as a web page.
Meaning uniformity. (Search) term meaning is still left entirely to the user, to interpret AFTER the search by painstakingly examining the results.
Ease of expressing search needs. After 50 years, the average size of a user search expression has increased from one word (NOT one meaning!) to 1.3 words, hardly a pat on the back for the ES software vendor community.
II. The Nature of Enterprise Search and Federated Search
Enterprise Search (ES) is typically understood to mean "full-text search used by and within the enterprise". It is the historical "information retrieval" applied to something beyond the personal filesystem, and it acquired its name to distinguish it from Internet (Web page) search engines such as Yahoo, AltaVista and Google. It is characterized by:
Federated search (FS) is simultaneous, multi-collection, multi-engine, non-indexed search, within and without the corporate firewall. FS has gained most of its popularity in the academic/public library space. While current industry analysts consider FS a part of ES, its purposes, approach and power in fact make it a viable separate entity: either driving the acquisition of content for ES purposes, or being the true ES, with traditional ES becoming a post-search text analytic technique. FS is characterized by:
Many industry pundits have labeled FS as "slow" by applying the Google "Time-To-Result-Set" metric. This comparison is a mistake, as the two are very different applications with very different audiences. First, FS is a method for accessing ALL content, and even if there is a price of a few extra seconds, or even minutes, the benefit of a truly world-wide search scope far outweighs the clock time of interactive result list generation. Second, the far more effective way to use FS is as a smart alert function, with each search request constantly monitoring the world of content sources for new and interesting material and delivering that content once it is discovered. Third, the ability of FS to select sources and to normalize all-source results with practical, meaningful evaluation and ranking overcomes the "I'll see what everyone else sees" mentality of the "citation-popularity" method of consumer web engines. Fourth, the utility of the Google "popularity contest" on enterprise material, which has no links, is non-existent.
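The source fan-out and all-source normalization described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a description of how Explorit or any other FS product works: the names `Hit`, `Connector`, `normalize` and `federated_search` are hypothetical, the two connectors return canned data in place of real remote sources, and min-max scaling is only one simple choice for making scores from different engines comparable.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Hit:
    source: str
    title: str
    score: float  # comparable across sources after normalization, in [0, 1]


# Hypothetical connector type: takes a query and returns (title, raw_score)
# pairs, where raw_score is on whatever scale the source's own engine uses.
Connector = Callable[[str], List[Tuple[str, float]]]


def normalize(source: str, raw: List[Tuple[str, float]]) -> List[Hit]:
    """Min-max scale one source's raw scores to [0, 1] (an assumed scheme)."""
    if not raw:
        return []
    scores = [s for _, s in raw]
    lo, hi = min(scores), max(scores)
    span = hi - lo
    # If every score is equal (or there is one hit), treat all hits as top-ranked.
    return [Hit(source, title, (s - lo) / span if span else 1.0)
            for title, s in raw]


def federated_search(query: str,
                     connectors: Dict[str, Connector],
                     limit: int = 10) -> List[Hit]:
    """Query every source concurrently, normalize per source, merge and rank."""
    merged: List[Hit] = []
    with ThreadPoolExecutor(max_workers=max(1, len(connectors))) as pool:
        futures = {name: pool.submit(fn, query)
                   for name, fn in connectors.items()}
        for name, future in futures.items():
            merged.extend(normalize(name, future.result()))
    # Highest score first; source and title break ties deterministically.
    merged.sort(key=lambda h: (-h.score, h.source, h.title))
    return merged[:limit]


# Toy stand-ins for real sources; their raw scores use different scales.
def intranet_wiki(query: str) -> List[Tuple[str, float]]:
    return [("Search policy", 12.0), ("Onboarding guide", 3.0)]


def licensed_journal(query: str) -> List[Tuple[str, float]]:
    return [("Federated retrieval survey", 0.91),
            ("Indexing basics", 0.40),
            ("Ranking notes", 0.65)]


CONNECTORS = {"wiki": intranet_wiki, "journal": licensed_journal}
results = federated_search("federated search", CONNECTORS)
for hit in results:
    print(f"{hit.score:.2f}  {hit.source:8s} {hit.title}")
```

Because the sources are queried in parallel, the wall-clock cost is roughly that of the slowest source rather than the sum of all of them. The same function could be run on a schedule, with new hits compared against those already delivered, to approximate the alert-style usage mentioned above.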
III. What Traditional Enterprise Search Lacks
Traditional ES makes its customers with true enterprise needs suffer. Incompatible, widely varying approaches ensure that the search user gets limited benefit from the application and is discouraged from investing in it further. While each of the areas described in Paragraph II above indicates something lacking in ES, this paper concentrates on search scoping only. Every investment in traditional ES by the enterprise guarantees that the user experience will either be unaffected or become more complex. ES by its very implication must be an ENTERPRISE APPLICATION and have:
IV. Deep Web Technologies/EXPLORIT has BECOME the Enterprise Search
Explorit, as the FS application with the broadest and deepest capability set, is the true enterprise search application. Its input is the entire world of content (by source), and its output is (1) direction to full-text indexers inside the firewall about which content to keep and analyze further, and (2) search results enabling the review and/or retrieval of relevant material. Uniquely among FS applications, Explorit indexes content for full-text searching when necessary.
The characteristics below not only make FS/Explorit an important enterprise application in its own right, but also provide the foundation for FS/Explorit to be the true enterprise search application leader going forward:
V. Risks/Flaws with FS Becoming the "TRUE" ES
The risks are mainly political. Traditional ES engines are going to fight it tooth and nail. They're going to holler "too costly", "too complicated", too "not us!". They may even have legitimate concerns, such as:
1. What if there aren't any fields? Explorit could full-text index too.
2. FS is too slow. If you can't wait 10 seconds or 10 minutes for the answer, the purpose of your search can't have many long-term implications (of course, the boss could have demanded an immediate "answer"). See II above.
3. What if the interesting content isn't "available"? Well, GET IT!
The main risk is an attitude adjustment. The only way to achieve TRUE Enterprise Search is to insert and use a seamless, interoperable, WWW 2-compatible, mature universal search application that preserves every dime of previous ES investment.
VI. Still on the Horizon
The FES application is ready; it's time to adopt it. If FS were perfect, or even if FES were perfect, this paper would be the first to state it. In adoption and integration, several factors need to be considered.