20+ Differences Between Internet vs. Enterprise Search - And Why You Should Care (Part 3)
Last Updated Feb 2009
By Mark Bennett, New Idea Engineering, Inc. - Volume 5 Number 4 - Summer 2008
Read Part 1 and Part 2 of this series.
Part 3: Strategic and Business Considerations

The previous two installments of this series focused on the "bits and bytes" of search. In this final section we're going to step back and look at features in a broader context. In a few places we may cover a couple of topics again, but this time in the context of "why" vs. "how".

An Analogy: Food vs. Cuisine

You can run a fine French restaurant or you can feed an army of a million soldiers. Both are impressive feats, but the level of care and detail afforded to each individual diner is quite different. The drastic differences between these services are a lot like the situation facing IT departments managing corporate and customer-facing search engines that were originally designed to search the Internet.

In a French restaurant, you can fuss over every diner at every table. Are the potatoes warm enough? Did the sauce break? Does table 3 need more butter? This is Enterprise Search. Every document and product listing matters, and every meta data item should be as close to perfection as possible.
Editor's Note: This series of newsletter articles is a summary of a new White Paper that we're working on. If you'd like a copy of the final version, please email us at info@ideaeng.com
In contrast, search engines designed to index and search the entire Internet are like feeding a million-man army. Some soldiers get potatoes, some get rice. Soldiers on patrol will get prepackaged ready-to-eat meals (MREs). And in the heat of battle vegetarian soldiers might have to compromise at times. Similarly, if an Internet spider misses a hundred pages on a particular subject, it will likely pick up millions more from other web sites, and normalizing custom meta data is simply out of the question.

ROI: Myth or Fact?

A detailed explanation of the Return on Investment of Search is beyond the scope of this article, but we would like to make a few points.

First off, it's true that if you improve search, you may earn more money from increased sales, or save money by improving employee efficiency and customer retention, depending on the type of improvements you make. Improvements that can be measured are considered Hard ROI, whereas the more intangible ones would be Soft ROI.

However, predicting and accurately measuring how much money you'll earn or save can be difficult. Some of the popular ROI studies were done in the late 1990s, and even studies that appear to be newer often cite the earlier work, so much of the ROI data is almost 10 years old!

And beware of vendor ROI figures for improved employee productivity. When vendors present their ROI calculations, they usually multiply the hours per day spent searching by the number of employees, etc., to arrive at some astronomical annual amount of wasted time. The implied and flawed assumption is that if you upgraded to their solution you would magically recapture all of that lost productivity, which is simply not true. A good search engine might save 5 to 10% of that wasted time, maybe 30% in the extreme case, but the point is that it's not 100%!

Of recent note, Q-Go stands apart among modern search vendors in offering an ROI money-back guarantee to qualifying customers.
We'd like to see other vendors be so confident in their ROI numbers. Summarizing the commonly cited ROI benefits of improving search:
The Business Intelligence Perspective of Search

Most people just think of search in terms of helping users find things. A business person may then consider the ROI impacts of search, using it to sell more inventory or save employees' time. But search also has benefits at a more strategic level that appeal to Marketing and corporate management.

We say that search has three levels of benefits:

1: The direct benefit to users: With a good search engine, employees or customers can find what they're looking for quickly. This is the aspect of search that everybody is aware of.

2: Financial benefits / ROI: This can come from directly generating additional revenue or from cost savings and improved efficiency, i.e. "hard ROI" vs. "soft ROI". We talked about this in the previous section. Although the ROI of search gets mentioned quite a bit in the press, we don't think it justifies new search projects to management very often, except in the case of a customer-facing B2C or B2B commerce site.

3: Strategic / BI (Business Intelligence): Spotting search and content trends, and being able to respond more quickly.

Here are some examples of the potential BI benefits of search:
Old school "click-tracking" of web site analytics shows you which links a user follows and the number of seconds spent on each page, leaving you to guess why a user clicked on certain links and whether it answered their question or not. The more modern approach uses Search Analytics to give a much clearer view. Search Analytics shows you exactly what the user wanted, because you know what they typed in! And you can certainly see which searches produced zero results, which is a very good indicator that they were not satisfied. These analytics can also spot trends and changes in behavior, and spot vocabulary mismatch between the search terms typed in and the language used on your web pages.

Modern search engines can look at search terms, phrases and sentences at a statistical level. This can be applied to both submitted searches and to recently authored content, possibly including tech support incident descriptions and bug reports, mailing list and blog postings, and other highly dynamic internal content. Modern software can detect statistically significant changes, but assigning meaning and action to these changes is still best left to human experts within the company. We have ideas about how this can all be coordinated and turned into concrete actions, but most organizations are still busy working on more basic search upgrades.

When justifying search projects, we encourage clients to think in terms of all three levels of benefits. When thinking about the BI benefits of search, we suggest including additional stakeholders even in the earliest parts of planning. Most companies already involve IT and site designers in their planning process. But these BI benefits will also be of interest to upper management, content creators, customer service / tech support and Marketing.
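As a rough illustration of the zero-results report mentioned above, here is a minimal Python sketch. The log format (a list of records with "query" and "hits" keys), the function name and the sample data are all assumptions made for this example; real search engines and analytics packages each have their own log formats.

```python
from collections import Counter

def zero_result_report(log, top_n=10):
    """Return the most frequent queries that produced no results.

    `log` is a hypothetical list of dicts with "query" and "hits" keys.
    Queries are lowercased and trimmed so "PTO" and "pto" count together.
    """
    counts = Counter(
        entry["query"].strip().lower()
        for entry in log
        if entry["hits"] == 0
    )
    return counts.most_common(top_n)

# Made-up sample data, for illustration only.
sample_log = [
    {"query": "vacation policy", "hits": 12},
    {"query": "PTO", "hits": 0},
    {"query": "pto", "hits": 0},
    {"query": "expense report", "hits": 40},
    {"query": "reimbursment", "hits": 0},
]

for query, count in zero_result_report(sample_log):
    print(f"{count:3d}  {query}")
```

In this made-up log, "pto" hints at a vocabulary mismatch (the content may say "vacation") and "reimbursment" at a spelling problem. Deciding what to do about each one is still a job for a person.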
Planning of Enterprise Search projects (behind the firewall) should also include Human Resources, helpdesk staff, corporate librarians, sales engineers and professional services, security and compliance officers, CFO and legal staff, and any knowledge workers central to the company's core competence. We're not suggesting you design by committee; this isn't about governance, it's about gathering input. Some companies formalize search teams into an SCOE (Search Center of Excellence), and maintain a list of other stakeholders in the company to routinely communicate with.

And finally, an area where search vendors are still mostly quiet is what to do when you make these discoveries: how do you turn them into actions that will improve the situation? For example, very few search analytics tools directly tie search reports into the content promotion engine so that an identified problem can be immediately fixed. There are manual, procedure-based Best Practices for further acting on these discoveries, and products like our own SearchTrack integrate analytics and defining suggestions into a single unified interface.

Compliance and eDiscovery: Do You Need Deep Iteration and Deep Facets?

A subject that isn't talked about much, because it doesn't affect casual searchers, is the question of whether or not a search engine can return every single document that matches a query, even if there are a million matches. A corollary is whether or not the terms and counts presented in document navigators reflect every single matching document, or are just an estimate based on looking at the first few hundred or thousand docs.

Many engines do not allow you to see every matching document. This is normally fine, since even a determined human will typically give up after looking at 20 or so pages of results.
But if your company is responding to a subpoena to produce all documents related to a particular set of terms, the judge is likely to mean ALL matching documents, no matter how many there are.

Similarly, a particular term may appear in thousands of articles written by a hundred different authors. It might happen that none of a particular author's articles show up in the first thousand matches, even though he has written many articles on the subject. A search engine that shows author as a facet, based only on the first thousand matches, would not even display his name in the list. On the Internet we could argue that this author's articles must not have been that relevant so it doesn't matter, but a researcher in a specialized field might have recognized that author's name, had it been listed, and might have been very interested in what he had written.

Such stringent requirements are rare in general usage, but if they do apply they are likely to be very important, and the relevant capabilities may not be easy to find in vendor literature. Both Endeca and Dieselpoint claim to be capable of handling this.

Pairing Data with Navigators

OK… this is a little techie… but we need to revisit the subject of your data one more time, because it relates to which search engine features are likely to work well.

Search engine vendors offer a confusing selection of clickable results list gadgets for users to drill down and refine their results, technologies with names like "parametric search", "faceted navigation", "tag clouds", "taxonomies", "automatic clustering"… the list goes on and on. Since all these clickable links look about the same, casual users assume they are all very similar. But it's important to understand that the implementations of these techniques are radically different, and each is best paired with different types of data.
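Returning for a moment to the deep facet question from the Compliance and eDiscovery discussion, the effect of counting facets over only the top-ranked matches can be sketched in a few lines of Python. Everything here is hypothetical: the facet_counts function, its depth parameter and the author names are invented for illustration and do not describe any particular vendor's implementation.

```python
from collections import Counter

def facet_counts(matches, field, depth=None):
    """Count facet values over the first `depth` ranked matches.

    With depth=None every match is counted (a "deep" facet).
    """
    considered = matches if depth is None else matches[:depth]
    return Counter(doc[field] for doc in considered)

# Hypothetical ranked result list: 1,000 articles by "smith" outrank
# 500 articles by "jones" on the same subject.
ranked = [{"author": "smith"}] * 1000 + [{"author": "jones"}] * 500

shallow = facet_counts(ranked, "author", depth=1000)  # estimate from top 1,000
deep = facet_counts(ranked, "author")                 # every matching document

print("jones" in shallow)  # False: the author never appears in the navigator
print(deep["jones"])       # 500
```

The shallow version is cheaper to compute, which is why some engines estimate. The cost is that an author with hundreds of relevant articles can vanish from the navigator entirely.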
A vendor who is proud of one particular method will tend to see all search problems in terms of their patents and PhDs, whereas companies are better served by looking at their data first, and then pairing it up with the ideal navigator technology.

Public Internet search portals tend not to use these advanced techniques for various technical reasons, including the wider variety and amount of data they must contend with, and the less sophisticated nature of their users. Corporations can actually surpass the level of search functionality offered by Internet search, because companies have more control over these technical issues. Yahoo does use taxonomies, some portals offer clustering, and some social sites run on user-submitted tags, but public Internet search actually lags behind in these areas. Although IT departments are used to the question "Why can't our Intranet search be just like Google?", we think management should be asking "How can our search be better than Yahoo, MSN and Google?"

There are five general methods of results list navigators, which we organize into three levels of effectiveness:

Level 1: Likely to provide optimal results
Level 2: Not quite as accurate
Level 3: Effectiveness varies widely
For best results, follow one of these general rules:
Comparing Faceted Search / Parametric Search and Taxonomies:

These two are not the same, though there is some overlap. You've probably used Facets on one of the large consumer electronics sites, and Taxonomies on the Yahoo or DMOZ.org search portals.

Some data is better suited for Facets; other systems may work better with Taxonomies. Both assume that the raw content or data has some structure or organization. If you have documents with a lot of high quality meta data, or database records which have well defined fields, then the data is structured at the document level and would typically be paired with Faceted Navigation / Parametric Search. Content that lacks that individual document structure but still has an overall organization (the "corpus" level) would typically be paired with a content-based taxonomy, or perhaps a very simple facet.

Parametric Search / Faceted Navigation

Overall, this provides some of the best results for users, but requires the documents (or database records) to have quality meta data (or database fields). Vendors differ on their meanings of the terms Faceted Navigation and Parametric Search. Generally these are very similar techniques, although some experts define additional functionality for Facets. We will cover this in future articles or blog postings.

If content does not have this quality meta data, these techniques won't work very well. Another option for content that lacks meta data is to "upgrade the data" by parsing it from the text or otherwise deriving meta data. This topic is discussed under Data Cleanup below.

Data Cleanup

Some businesses find themselves in the awkward spot of having some meta data, but perhaps not enough to drive faceted search. Or their database fields are not populated consistently enough, or with high enough quality, to power search facets. We certainly agree that data quality is a big concern; the "garbage in, garbage out" computer saying still holds.
However, we counsel such clients not to give up so easily. Document meta data can be normalized and improved, or source database fields cleaned up. Some of the search vendors now offer tools to do this, as do various third parties. Content with marginal meta data, but that exists in an overall structure, can have additional meta data derived from that structure. And finally, Entity Extraction can be used to generate additional meta data. So in this case, if at all possible, we would advise upgrading the data to fit facets, vs. going with one of the lesser methods.

Entity Extraction and Fact Extraction:

These methods are somewhat less predictable, but do allow you to find people, places, companies and other well-understood objects in your content, then present a very simple set of navigators based on those items. Even simpler navigators can be had by just showing the number of matches from each data source and letting the user click on a source to further narrow the search.

Taxonomies / Ontologies / Topic Sets / "Road Maps"

Some products and demos assume taxonomies will be used to leisurely browse a set of content, without doing a specific search. This was the earlier usage model for them. When we talk about Taxonomies in relation to search, we mean the clickable trees that show up next to a result list, which allow you to narrow your search results by clicking on a node and finding the matches just within that category. By some definitions this could even be thought of as a specialized type of faceted navigation, though that distinction is not particularly important, as the tools for working with taxonomies tend to be different from those used for parametric and faceted data.

Taxonomies can be great if you have a taxonomy and your data has been organized into it. If it hasn't, it may be possible to upgrade the data with some automated tool, placing all of your documents into a taxonomy.
And if you don't have a taxonomy at all, other tools exist to help create it, usually available from the same vendor. This subject is so broad that there are even Taxonomy Bootcamps offered by some companies. Taxonomies have three basic flavors:
The first two are more traditional, more predictable. The third, organizing the site or content by what users are actually searching for, can provide a quick fix while longer term improvements are being worked on.

How data gets into Taxonomies is an interesting subject. There are newer tools on the market that take completely unstructured data and try to coax it into a taxonomy, effectively "upgrading" the data by adding a logical structure; older tools had humans manually create the rules. Taken together these tools are sometimes referred to as automatic classification systems, taggers or profiling systems. Even now a few high value industries use human experts to manually assign documents to categories in a taxonomy, or carefully supervise automated tools.

A simpler cousin of this manual input is the tagging that is popular on many web sites, typically adding descriptive tags to photos or videos. However, in most systems these tags are not presented in any sort of nested way, so we would not tend to call them a proper taxonomy. Some have used the term "Folksonomy" to describe this type of site. But again, users are adding descriptive data to documents, thus upgrading the meta data of the content.

User Driven Navigators: Tag Clouds / Folksonomy / Community Driven Content

Editor's Note: This is an extension of the third type of taxonomy, "behavior based", but is a large enough topic that we break it out separately.

Tag clouds can be great if you have enough active users. The problem is critical mass: on a big public site, even a participation rate of about 1% still yields plenty of contributors, but in organizations with only hundreds or thousands of users, there may not be enough contributors to get adequate tags in place. There are techniques that retool socially driven content for smaller groups by using inferred tags, but their effectiveness has detractors and this is generally not a "canned" feature in mainstream search software.
A further driver is content that has no text, such as photos or videos. In such a case user-driven navigators may be the only reasonable option, though some vendors now offer audio and video mining.

Tags and behavior can be used to drive both relevancy and results list navigators. Further, like taxonomies, Tag Clouds are sometimes used only for browsing or an initial search. In other words, all visitors to the site see the same tag cloud prior to doing a search, and can then click into a tag to see matching documents, then click on other tags. To be considered a navigator, tags would need to appear next to search results and be specific to the set of matching documents so that you could "drill down" into them to further narrow results. Users would see different tag clouds for each search; this has been a point of confusion for some companies when comparing different types of navigators.

Automatic Clustering/Unsupervised Clustering:

This is a more recent development which leverages advanced statistics and may or may not provide acceptable results. It goes by other more obscure names as well. These mathematically rooted techniques are still relatively new and rather unpredictable, though vendors are quite proud of the PhDs and Patents that power these new engines.

Vendors who offer this technology tend to think it's applicable to any problem you might have; it even removes tough coffee and tea stains! As you can probably tell, we're a tad more skeptical. There are some good implementations and bad. Usually we suggest these techniques only for data that has no structure or meta data, that cannot be upgraded, and that is therefore not suitable for facets or taxonomies.

Google is betting heavily on this type of navigator, over the more traditional techniques. If it can be done well, it is certainly more consistent with the "appliance" model of radically simplified administration.
We promise to keep an open mind and report back on this; Google's technical prowess has certainly surprised us all in the past.

Summary of Strategic and Technical Search Points

Here we present a quick summary of this three-part series, and how technical design decisions can affect your search engine. Sort of a mini Enterprise Search Manifesto.

High Level / Strategic:
Technical:
In Closing

The subject of Enterprise and Customer Facing Search, and how they differ from public Internet search, could fill volumes. But there is one overriding point to take away: Feeding an army of a million soldiers is different from running a fine French bistro, because details matter! Most engines can be adjusted to meet your critical business needs, if you know where to look.

New Idea Engineering always welcomes your questions and input. Feel free to contact us at info@ideaeng.com.