Enterprise Search Blog

Glossary

Definitions:

16 bit characters
The text of many Asian languages, such as Chinese, Japanese and Korean, is written with a very large set of characters, so each character must be stored in a computer as at least two bytes (16 bits). Western languages use a much smaller alphabet, so each letter can be stored in just 1 byte (8 bits). Newer systems typically store all text in Unicode.
[back to top]
 
24/7
See Main Definition:  Twenty Four by Seven
[back to top]
 
32 bit
Technically refers to 32 bits (4 bytes) of computer data. However, it also refers to a group of operating systems that process data 32 bits at a time. These OSs were the dominant variety for business computers in the 1990s and early 2000s, and included Windows and Linux. Eventually, however, the 32 bit architecture limited how much memory could be used (a 32 bit address can reference at most 2^32 bytes, or 4 GB), and 64 bit operating systems are starting to take their place. There are also 32 and 64 bit versions of programming languages such as Java. High end search applications are now being deployed on 64 bit versions of Windows and Linux.
[back to top]
 
5 by 8
See Main Definition:  Normal Business Hours Synonyms:  8 / 5
[back to top]
 
5 by 9 (reliability)
See Main Definition:  Five Nines
[back to top]
 
5 by 9 (time)
See Main Definition:  Normal Business Hours Synonyms:  9/5, 9 to 5
[back to top]
 
64 bit
Technically refers to 64 bits (8 bytes) of computer data. However, it also refers to a group of more powerful operating systems that process data 64 bits at a time. These OSs are replacing the older 32 bit OSs that had been dominant in the 1990s and early 2000s. The newer 64 bit versions of Windows and Linux can access much more memory and can use 64 bit versions of programming languages such as Java. High end search applications are now being deployed on 64 bit versions of Windows and Linux.
[back to top]
 
7 by 24
See Main Definition:  Twenty Four by Seven
[back to top]
 
8 bit characters
The text of most Western languages can be written with a relatively small alphabet, so each letter can be stored in a computer as a single byte. Many Asian languages use a much larger set of characters, and therefore can only be stored as 16 bit characters or as Unicode.
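To make the byte counts concrete, here is a small Python illustration comparing how many bytes the same text needs in a single-byte encoding (ISO-8859-1), UTF-16 and UTF-8. The sample strings are arbitrary; note that Unicode text is not always exactly 2 bytes per character (UTF-8, for example, uses a variable number of bytes).

```python
def byte_lengths(text):
    """Return the encoded size of `text` in a few common encodings (None if unencodable)."""
    sizes = {}
    for encoding in ("iso-8859-1", "utf-16-le", "utf-8"):
        try:
            sizes[encoding] = len(text.encode(encoding))
        except UnicodeEncodeError:
            # ISO-8859-1 has no code for most Asian characters.
            sizes[encoding] = None
    return sizes

samples = {
    "western": "café",  # fits in one byte per character in ISO-8859-1
    "japanese": "日本",  # cannot be represented in ISO-8859-1 at all
}

for name, text in samples.items():
    print(name, byte_lengths(text))
```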
[back to top]
 
8859
See Main Definition:  ISO-8859-1
[back to top]
 
8859-1
See Main Definition:  ISO-8859-1
[back to top]
 
99.999
See Main Definition:  Five Nines
[back to top]
 
  A   [back to top]
 
Access Control List
Synonyms:  ACL Related Terms:  SSO, ACL, document level security A set of permissions attached to a specific file or piece of data. ACLs typically list individuals and groups of people who can have access to the data, and who should specifically be denied access. An ACL can also specify the level of access, such as "read only" or "modify". ACLs can be useful when implementing document level security.
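A minimal Python sketch of an ACL check, assuming a hypothetical structure with separate read, modify and deny sets, and assuming that an explicit deny always wins over any allow entry. Real systems (for example Active Directory) define their own formats and precedence rules.

```python
from dataclasses import dataclass, field

@dataclass
class AccessControlList:
    # Hypothetical layout: principals (users or "group:..." names) per access level.
    read: set = field(default_factory=set)
    modify: set = field(default_factory=set)
    deny: set = field(default_factory=set)

def can_access(acl, user, groups, level="read"):
    """Return True if `user` (directly or via one of `groups`) has `level` access."""
    principals = {user} | set(groups)
    if principals & acl.deny:
        return False  # assumption: an explicit deny overrides everything else
    if level == "modify":
        return bool(principals & acl.modify)
    # Anyone allowed to modify is also allowed to read.
    return bool(principals & (acl.read | acl.modify))

acl = AccessControlList(read={"group:employees"}, modify={"alice"}, deny={"group:contractors"})
print(can_access(acl, "bob", ["group:employees"]))                        # True
print(can_access(acl, "bob", ["group:employees"], level="modify"))        # False
print(can_access(acl, "carol", ["group:employees", "group:contractors"]))  # False
```

For document level security, a search engine could apply a check like this to each hit and drop the documents the searching user is not allowed to read.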
[back to top]
 
accessibility
See Main Definition:  Americans with Disabilities Act
[back to top]
 
ACL
See Main Definition:  Access Control List
[back to top]
 
Active Directory
Synonyms:  Microsoft Active Directory, AD Related Terms:  SSO, ACL, document level security Software sold by Microsoft to store information about company resources such as people, machines and data. AD can be used as part of a system for doing document level security.
[back to top]
 
Active Server Pages
Synonyms:  ASP Related Terms:  web server, JSP ("Java Server Pages") A Microsoft server-side programming environment for building interactive web sites. ASP stands for "Active Server Pages". It allows a programmer to easily embed computer programs inside of web pages. Though they have similar names, Microsoft ASP and Sun JSP are generally not compatible with each other.
[back to top]
 
AD
See Main Definition:  Active Directory
[back to top]
 
ADA
See Main Definition:  Americans with Disabilities Act
[back to top]
 
Adobe Acrobat
See Main Definition:  Portable Document Format
[back to top]
 
Adobe PDF
See Main Definition:  Portable Document Format
[back to top]
 
Advanced Query Language (FAST)
See Main Definition:  FAST Advanced Query Language
[back to top]
 
advanced search
Allowing a power user to further refine their search by including additional search options. These options often cause additional relational field operators to be added to the search, creating a hybrid filtered search. Though powerful, few users were actually inclined to visit a page labeled "Advanced Search", so this has been generally replaced by Parametric Search, which allows for similar functionality, but does so interactively after the first search results have been displayed.
[back to top]
 
agents
Related Terms:  automated document classification With regard to the search engine industry, the term "Agents" usually means a saved search that keeps running; it checks each new document that is added to the system, and takes some predefined action when it gets a match. Usage: "agents" was a very hot marketing term in the 1990s; when search engine vendors talked about agents, they were typically referring to their automated document classification products ("agents" sounded cooler).
[back to top]
 
AJAX
Related Terms:  Java Script, XML A form of client side scripting, combining JavaScript and XML, that makes web sites more interactive. AJAX stands for "Asynchronous JavaScript and XML". However, search engine spiders sometimes have trouble navigating such sites, so parts of the site might not be indexed. There are ways to build a web site that uses AJAX and still remains navigable by spiders.
[back to top]
 
Algorithm
Related Terms:  statistical methods An algorithm is a sequence of instructions that describe the steps required to perform a task. With respect to search technology, it is typically used to describe how the search technology determines the most relevant document for a given search, as well as how to order the result list from most relevant to least relevant.
[back to top]
 
alternate term suggestions
Synonyms:  alternate terms, alternative terms, related terms Related Terms:  content promotion, "Did you mean?" A search engine suggests other words a user might be interested in, based on the search they just issued. Sometimes the alternate term might be a different or corrected spelling of a word; for example, a user typing "Arkansaw" might get the suggestion "Arkansas". Google suggests such corrections with the "Did you mean:" links in its results list. Other alternate terms might be directly related subjects; for example, somebody looking to buy a flashlight might also get a suggestion for the term "batteries".
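One simple way to produce spelling suggestions is to compare the user's term against a list of known terms by string similarity. The sketch below uses Python's standard difflib module for this; the vocabulary is hypothetical, and this is only an illustration of the idea, not how Google or any particular vendor implements "Did you mean".

```python
import difflib

# Hypothetical vocabulary; a real engine would more likely draw on its index or query logs.
vocabulary = ["Alabama", "Alaska", "Arizona", "Arkansas", "California"]

def did_you_mean(query, vocabulary, cutoff=0.8):
    """Suggest the single closest known term, or None if nothing is similar enough."""
    matches = difflib.get_close_matches(query, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else None

print(did_you_mean("Arkansaw", vocabulary))  # Arkansas
```

The `cutoff` value trades off how aggressive the suggestions are; a lower value suggests more, but less reliable, corrections.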
[back to top]
 
Americans with Disabilities Act
Synonyms:  ADA, site accessibility Laws that mandate web sites be usable by persons with disabilities who may not be able to use technologies like Flash or JavaScript. The changes have also helped web sites be more accessible to search engine spiders.
[back to top]
 
AMHS
See Main Definition:  Automated Message Handling System
[back to top]
 
AML
See Main Definition:  Anti Money Laundering
[back to top]
 
analog operators
See Main Definition:  weighted operators
[back to top]
 
analytics
See Main Definition:  Search Analytics
[back to top]
 
AND Operator
Related Terms:  Boolean operators The specific Boolean operation that requires two terms to be present in order to include a document in the result list. For example, "federal AND state" will return only those documents which include both terms.
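Search engines commonly evaluate AND by intersecting the lists of documents that contain each term (often called posting lists in an inverted index). The Python sketch below assumes a toy index built by splitting text on whitespace, and walks two sorted lists in step to find the documents containing both terms.

```python
from collections import defaultdict

def build_index(docs):
    """Map each lowercase word to the sorted list of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return {word: sorted(ids) for word, ids in index.items()}

def and_query(index, term_a, term_b):
    """Intersect two sorted posting lists by advancing through both at once."""
    a = index.get(term_a.lower(), [])
    b = index.get(term_b.lower(), [])
    i = j = 0
    result = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            result.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return result

docs = {
    1: "federal tax rules",
    2: "state and federal courts",
    3: "state parks",
    4: "federal state partnership",
}
index = build_index(docs)
print(and_query(index, "federal", "state"))  # [2, 4]
```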
[back to top]
 
annual maintenance
Related Terms:  perpetual license A fee paid yearly to the creator of a piece of software that entitles the customer to software updates, bug fixes and technical support. This type of fee is more common with perpetual software licenses than term licenses.
[back to top]
 
Anti Money Laundering
Synonyms:  AML Related Terms:  BSA, Bank Secrecy Act, KYC, Know Your Customer, SAR, Suspicious Activity Report Automated and semi-automated software tools for spotting unusual financial activity. Banks and other financial institutions are required to help the government track unusual financial transactions that might be related to illegal activity or the funding of terrorism. Search engine technology is sometimes used to assist in these efforts.
[back to top]
 
Anti Terrorism Financing
Synonyms:  ATF Related Terms:  AML, Anti Money Laundering Using search technology to help spot financial transactions that might be used to fund terrorist activities.
[back to top]
 
API
See Main Definition:  Application Programming Interface
[back to top]
 
Application Programming Interface
Synonyms:  API Related Terms:  embedded search engine A software library that allows computer programmers to add features to existing software products. API stands for "Application Programming Interface", and is usually a set of documentation and sample computer code showing how to extend the product. Search engine APIs allow developers to embed a search engine into other applications, for example to add search capabilities to an email program.
[back to top]
 
Application Service Provider
Synonyms:  ASP Related Terms:  hosted search, FreeFind A company that provides software as a hosted service, so the software does not need to be installed on the web server that is using it. Transparently to the site visitor, the visitor's browser is directed to talk to the ASP's server for specific tasks. An example of an ASP is FreeFind; they provide search functionality for web sites without the need for each web site to install software.
[back to top]
 
AQL
See Main Definition:  FAST Advanced Query Language
[back to top]
 
Ariba ready
Synonyms:  punch-out Related Terms:  punch-out, eProcurement, B2B, eCommerce, CXML Ariba was one of the earlier companies to offer punch-out capabilities in eProcurement systems. Their idea was so popular that many other companies emulated their protocol; systems that are compatible with it are described as "Ariba-Ready".
[back to top]
 
ASP ("Active Server Pages")
See Main Definition:  Active Server Pages
[back to top]
 
ASP ("Application Service Provider")
See Main Definition:  Application Service Provider
[back to top]
 
ATF
See Main Definition:  Anti Terrorism Financing
[back to top]
 
attributes
See Main Definition:  Meta Data
[back to top]
 
audio mining
Related Terms:  Autonomy Automatically extracting the words and phrases from a recorded conversation and converting them to text; the system will also typically record a time code representing when each word was spoken. Later, a search engine can take user search terms and find the appropriate audio or video clip containing those words. This technique is useful to some businesses, but technical barriers have kept it from widespread corporate usage.
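Once speech has been converted to text with time codes, searching it is straightforward. The Python sketch below assumes a hypothetical transcript already produced by a speech-to-text step, stored as (seconds, word) pairs, and returns the offsets where a search word was spoken so a player could jump to that point in the clip.

```python
# Hypothetical transcript from a speech-to-text step: (seconds into recording, word).
transcript = [
    (0.0, "welcome"), (0.6, "to"), (0.8, "the"), (1.1, "quarterly"),
    (1.7, "budget"), (2.3, "review"), (9.4, "budget"), (9.9, "cuts"),
]

def find_word(transcript, word):
    """Return every time offset (in seconds) at which `word` was spoken."""
    return [seconds for seconds, spoken in transcript if spoken == word.lower()]

print(find_word(transcript, "Budget"))  # [1.7, 9.4]
```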
[back to top]
 
automated document classification
Synonyms:  agents, AMHS ("Automated Message Handling System"), profiling Related Terms:  Verity Real-Time, automated document profiling A system that is preloaded with a set of searches and then watches for new documents that match each search; when a match is found, an action is taken. Common actions taken when a match is found include adding a meta tag to the document to flag it as part of a particular category of documents, or automatically forwarding the document to the person interested in that search.
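A minimal Python sketch of the idea: a set of saved searches is checked against each new document, and every match triggers an action. Here the saved searches are hypothetical, the matching rule is simply "all listed words must appear" (an assumption for illustration; real products support full query syntax), and the action is adding a category tag.

```python
# Hypothetical saved searches: category name -> words that must all appear.
SAVED_SEARCHES = {
    "mergers": {"acquisition", "merger"},
    "security": {"breach"},
}

def classify(document_text, saved_searches=SAVED_SEARCHES):
    """Return the names of the saved searches that match the new document."""
    words = set(document_text.lower().split())
    return sorted(name for name, required in saved_searches.items() if required <= words)

def on_new_document(doc_id, text, tags):
    """Example action: record each matching category as a meta tag on the document."""
    for category in classify(text):
        tags.setdefault(doc_id, []).append(category)

tags = {}
on_new_document("doc-17", "Merger talks follow the acquisition of a rival", tags)
on_new_document("doc-18", "Quarterly earnings were flat", tags)
print(tags)  # {'doc-17': ['mergers']}
```

Forwarding the document to an interested person would just be a different action inside `on_new_document`.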
[back to top]
 
automated document profiling
See Main Definition:  automated document classification
[back to top]
 
Automated Message Handling System
See Main Definition:  automated document classification Related Terms:  Verity Real-Time, agents, profiling Usage: AMHS was the preferred term used by the government in the 1990s for automated document profiling and distribution. Their "documents" tended to be government intelligence reports, aka "messages".
[back to top]
 
automatic failover
See Main Definition:  failover
[back to top]
 
Autonomy
Related Terms:  Autonomy IDOL, Autonomy K2, Autonomy Ultraseek A publicly held search engine company headquartered in England. In 2005 Autonomy acquired Verity, Inc. Autonomy now owns three of the well established enterprise search engine brands: IDOL, K2 (formerly Verity K2), and Ultraseek.
[back to top]
 
Autonomy IDOL
Related Terms:  Autonomy The core search engine technology created by Autonomy.
[back to top]
 
Autonomy K2
Synonyms:  K2, Verity K2 K2 was the core technology of many search engine products developed and sold by Verity, Inc. starting in the mid 1990s. Verity was acquired by Autonomy in 2005, and the K2 brand name was extended to also include some of Autonomy's products. K2 v6.x was the last version based on the Verity core technology. As of 2006, Autonomy has combined its IDOL core engine with the K2 interface and released the result as K2 v7.
[back to top]
 
Autonomy Ultraseek
Synonyms:  Ultraseek, Verity Ultraseek, Inktomi Ultraseek, Internet search syntax, Infoseek A very popular commercial search engine currently sold by Autonomy; Autonomy is the fourth owner of this product line. It was originally developed at Infoseek in the 1990s, was then briefly owned by Inktomi, and was later acquired by Verity. Verity worked to integrate its K2 product line with Ultraseek, though the two search engines were originally developed independently. Autonomy has also started integrating some of Ultraseek's functionality into its own IDOL product line.
[back to top]
 
average QPS
Related Terms:  qps, peak qps The average number of searches performed each second. This is calculated by taking the total number of searches for some time period and dividing it by the number of seconds in that period. For example, a site may know how many searches were done in a day; dividing that number by the number of seconds in a day (86,400) gives the average QPS. However, search activity rises and falls during the day, and some engines limit how many searches can be done in any one second, regardless of the average; see Peak QPS.
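The arithmetic is simple enough to show directly. In this Python sketch the daily search count of 432,000 is a made-up figure used only for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

def average_qps(total_searches, period_seconds=SECONDS_PER_DAY):
    """Average queries per second: total searches divided by seconds in the period."""
    return total_searches / period_seconds

# Hypothetical figure: 432,000 searches logged in one day.
print(average_qps(432_000))  # 5.0
```

An average of 5 QPS says nothing about the busiest second of the day, which may be many times higher; that is why Peak QPS is tracked separately.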
[back to top]
 
  B   [back to top]
 
B2B
See Main Definition:  Business to Business
[back to top]
 
B2C
See Main Definition:  Business to Consumer
[back to top]
 
backwards compatible
Synonyms:  backwards compatibility As software evolves, the specific format of data it expects to read or output changes, or the protocols it uses to connect to other systems change. If the new software can still understand or interact using the old format or protocol as well, then it is said to be "backward compatible".
[back to top]
 
Bank Secrecy Act
Synonyms:  BSA Related Terms:  AML, Anti Money Laundering Government regulations requiring banks and other financial institutions to help the government track unusual financial transactions that might be related to illegal activity or the funding of terrorism. Search engine technology is sometimes used to assist in these efforts.
[back to top]
 
bare metal
Related Terms:  virtualization A physical machine running a very efficient and lightweight environment for virtual machines. Generally such a server can only run virtual machines, and cannot be used for other computing tasks.
[back to top]
 
batch file
Related Terms:  script file, script, shell script A series of Microsoft Windows commands stored in a text file usually having a .bat extension.
[back to top]
 
batch-mode spidering
Related Terms:  spider A spider that completely revisits every page on a web site when it wants to respider the site. This is the older, simpler design of a web spider, but it is not practical for sites with large amounts of content.
[back to top]
 
bcp
See Main Definition:  Business Continuity Planning Synonyms:  failover, hot spare
[back to top]
 
Behavior-Based Taxonomy
Related Terms:  taxonomy Unlike a generalized taxonomy, a Behavior-Based Taxonomy is a list of the search terms your site visitors use when they perform searches on your own site search engine. A Behavior-Based Taxonomy is a great start for a relevant generalized taxonomy, as well as a great source of information about how your visitors ask for content on your site. See http://ideaeng.com/pub/entsrch/issue06/article01.html
[back to top]
 
Best Bets
See Main Definition:  content promotion Usage: Ultraseek name for rule based content promotion.
[back to top]
 
BI
See Main Definition:  Business Intelligence
[back to top]
 
binary files
Files stored on a computer hard drive that contain seemingly random bytes of data, not easily intelligible by a human reader; the file looks like "gibberish". The contents of binary files can usually only be understood by the program that created them, or by other compatible software packages. The advantage of binary files is that, for programs that can understand their contents, they are more efficient in terms of space and/or access speed.
[back to top]
 
Binary Formats
Related Terms:  binary files Most computer programs store content in a format optimized for the application rather than for human convenience. Thus, even when the content of a document is mostly text, the actual file will contain information that is difficult to view without using an application specifically designed to read that optimized format. The Windows 'Notepad' application stores only the characters and words entered by the user, so advanced features like 'revision history' are generally not available. In Microsoft Word, however, the actual file will contain a great deal of information for tracking changes, for storing information about the file (known as metadata) such as Author Name, and other information meaningful only to the Word application itself. To search these binary formats, a specialized filter is required to remove all but the useful textual content of the document.
[back to top]
 
binary indices
See Main Definition:  index (noun)
[back to top]
 
blob
See Main Definition:  zone Usage note: The term "blob" is used more frequently by people with a relational database background; search engines typically refer to zones. "zone" tends to be associated with Verity's vocabulary.
[back to top]
 
block mode terminals
Related Terms:  web browser A method of communication between a remote client computer and a central server where data is transmitted in chunks or discrete transactions, instead of sending the data one character at a time. The modern web browser and the HTTP protocol can be viewed as a similar system, but implemented with more modern technology and graphics.
[back to top]
 
Boolean operators
In relation to search engines, a search syntax that only supports "yes" or "no" logic, and allows parts of a query to be joined together with the logical AND, OR and NOT operators.
[back to top]
 
Boolean Search
Related Terms:  Boolean operators A search syntax that only supports "yes" or "no" logic, and allows parts of a query to be joined together with the logical AND, OR, XOR and NOT operators.
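Because Boolean search is yes/no logic, it maps directly onto set operations over the documents matching each term. The Python sketch below uses hypothetical document ids; real engines apply the same logic to their index, usually with more efficient data structures.

```python
# Hypothetical data: ids of the documents that contain each term.
matches = {
    "federal": {1, 2, 4},
    "state": {2, 3, 4},
}
all_docs = {1, 2, 3, 4, 5}

federal, state = matches["federal"], matches["state"]
print(sorted(federal & state))     # federal AND state      -> [2, 4]
print(sorted(federal | state))     # federal OR state       -> [1, 2, 3, 4]
print(sorted(federal - state))     # federal AND NOT state  -> [1]
print(sorted(federal ^ state))     # federal XOR state      -> [1, 3]
print(sorted(all_docs - federal))  # NOT federal            -> [3, 5]
```

Every document is either in or out of each result; there is no ranking, which is the main limitation of pure Boolean search.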
[back to top]
 
boost
A search syntax that allows for certain search terms to be given more weight in relevancy calculations.
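To show the effect of a boost, the Python sketch below uses a deliberately simple relevance score (the count of each query term in the document, multiplied by that term's boost). The scoring formula and data are assumptions for illustration only; real engines use far more elaborate relevance calculations, and each has its own boost syntax (Lucene-style query syntax, for example, writes a boost as `java^3`).

```python
def score(document_text, query_terms, boosts=None):
    """Toy relevance score: count of each query term, multiplied by its boost (default 1.0)."""
    boosts = boosts or {}
    words = document_text.lower().split()
    return sum(words.count(term) * boosts.get(term, 1.0) for term in query_terms)

docs = {"a": "java coffee beans", "b": "coffee shop coffee menu"}
query = ["java", "coffee"]

# Without a boost the two documents tie (2.0 each).
print({name: score(text, query) for name, text in docs.items()})
# Boosting "java" three-fold moves document "a" ahead (4.0 vs 2.0).
print({name: score(text, query, {"java": 3.0}) for name, text in docs.items()})
```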
[back to top]
 
bot
See Main Definition:  spider Synonyms:  'bot, robot
[back to top]
 
brokered indexing
Related Terms:  Autonomy K2 spider, FAST Search and Transfer A series of cooperating software modules that can index vast amounts of data into a search engine, by efficiently dividing up the many different indexing tasks.
[back to top]
 
brokered search
A user's search is received by one search engine, which then forwards the request to other engines and combines the results. This is similar to federated search, except that all the remote search engines are typically from the same vendor, so query syntax and relevancy are the same.
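Because the remote engines share a vendor and relevancy model, the broker can merge their hits simply by score. The Python sketch below assumes hypothetical back-end engines, modeled as functions returning (score, document_id) pairs; in practice these would be network calls.

```python
import heapq

def brokered_search(engines, query, limit=5):
    """Send the query to every engine and keep the highest-scoring hits overall.

    Comparing raw scores across engines is only meaningful because they are
    assumed to be from the same vendor and to compute relevance the same way.
    """
    all_hits = []
    for search in engines:
        all_hits.extend(search(query))  # each hit: (score, document_id)
    return heapq.nlargest(limit, all_hits)

# Two hypothetical back-end engines.
def engine_east(query):
    return [(0.92, "east/report.pdf"), (0.40, "east/memo.doc")]

def engine_west(query):
    return [(0.75, "west/plan.html"), (0.10, "west/notes.txt")]

for score, doc in brokered_search([engine_east, engine_west], "budget", limit=3):
    print(f"{score:.2f} {doc}")
```

With federated search across different vendors, this naive merge would not work, since the engines' scores are not on a common scale.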
[back to top]
 
BSA
See Main Definition:  Bank Secrecy Act
[back to top]
 
Business Continuity Planning
Synonyms:  bcp, backup systems, hot spare Related Terms:  failover systems, backup systems A set of computers that will take over operations if the production computers fail or go offline.
[back to top]
 
Business Intelligence
Synonyms:  BI Looking at data and statistics to gain insight into business trends and to spot problems and opportunities. BI usually refers to looking at internal company data such as revenue, server logs, communications and search activity / search analytics. When BI is focused on customer data, it is often referred to as Data Mining. When focused on other companies and competitors, it can be called Competitive Intelligence.
[back to top]
 
Business to Business
Synonyms:  B2B Related Terms:  b2c, c2c, eCommerce, eProcurement A segment of eCommerce focused on businesses buying and selling goods and services to other businesses or the government. Small amounts of products can be bought in the same way that a consumer would buy them; for example, a business could buy some paper at a local store and pay for it with a credit card. But larger businesses typically pay each other with a system of purchase orders and invoices, and often negotiate volume pricing. Companies may not need to charge each other sales taxes, depending on the transaction. And two large companies that do lots of business with each other may link their systems together to facilitate partially automated or totally automated purchasing, often called eProcurement. As an example, a car manufacturer may buy millions of tires from a tire manufacturer, and can have tires automatically reordered as needed.
[back to top]
 
Business to Consumer
Synonyms:  B2C A segment of eCommerce focused on companies selling items to consumers and individuals. This is the best-known type of eCommerce, given the popularity of sites like Amazon and iTunes. Having good search on these sites is particularly important because consumers have many sites to choose from and will generally not tolerate confusing or frustrating search engines.
[back to top]
 
  C   [back to top]
 
C2C
See Main Definition:  Consumer to Consumer
[back to top]
 
Call out
A Call Out in the search environment is a methodology by which the search application retrieves results from one or more external content sources to provide better results for the search user. Federated Search is typically done using code to execute a query on additional content sources, so it is essentially a Call Out. However, the term is usually applied more narrowly to refer to high confidence data sources that return one or two specific results; the objective is that the result list is itself the answer to the query. For example, if the search application recognizes that a query looks like a person's name, it may perform a 'Call Out' to the employee directory to return the person's name, email address, and phone number. Google does this with FedEx shipping numbers, area codes and even airplane flight numbers.
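A minimal Python sketch of the employee-directory example: the application first checks whether the query looks like a name and, if so, calls out to a directory. The directory contents and the "two words of letters" name test are hypothetical simplifications; a real system would call an actual directory service and use better recognition rules.

```python
import re

# Hypothetical employee directory used as a high-confidence call-out source.
DIRECTORY = {
    "jane smith": {"email": "jane.smith@example.com", "phone": "555-0100"},
}

# Crude "looks like a name" test: exactly two words made of letters.
NAME_PATTERN = re.compile(r"^[A-Za-z]+ [A-Za-z]+$")

def call_out(query):
    """Return a directory entry if the query looks like a known name, else None."""
    q = query.strip()
    if NAME_PATTERN.match(q):
        return DIRECTORY.get(q.lower())
    return None

print(call_out("Jane Smith"))      # {'email': 'jane.smith@example.com', 'phone': '555-0100'}
print(call_out("expense policy"))  # None
```

When the call out returns nothing, the application would simply fall back to its normal search results.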
[back to top]
 
case insensitive
Search terms will match words in a document without regard to upper and lowercase letter differences. This is the default behavior of most search engines.
[back to top]
 
case sensitive
Search terms must match words in a document exactly in regards to upper and lowercase letters. This type of matching can be helpful when looking for proper names or abbreviations.
[back to top]
 
catalog
See Main Definition:  search indices Synonyms:  document catalog
[back to top]
 
Catalog Markup Language
Synonyms:  CXML Related Terms:  eProcurement, punch-out, eCommerce, B2B An electronic format for product information; an entire catalog of products can be transmitted in this XML based format. It is useful when companies buy products from each other; it allows for more automation and tighter integration, and is part of the overall eProcurement movement.
[back to top]
 
CGI
See Main Definition:  Common Gateway Interface
[back to top]
 
CGI field
Within the scope of search engines, a CGI field is a piece of data submitted to the search engine from a web page search form. It may contain the text that the user typed in, or it may represent various check boxes or items selected from a dropdown list. CGI fields are the key interface between a search form and the underlying search engine. These are not the same as regular "fields" in a search engine or database.
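To illustrate what CGI fields look like in practice, the Python sketch below builds the query string a browser might send when a search form is submitted with GET, then decodes it the way a search application on the server would. The field names (`q`, `collection`, `page`) are hypothetical; each search engine defines its own.

```python
from urllib.parse import parse_qs, urlencode

# What the browser sends: form fields encoded into the URL's query string.
query_string = urlencode({"q": "flashlight batteries", "collection": "products", "page": "2"})
print(query_string)  # q=flashlight+batteries&collection=products&page=2

# On the server: decode the CGI fields (each value is a list, since a field may repeat).
fields = parse_qs(query_string)
print(fields["q"][0])  # flashlight batteries
```

Only the `q` field here holds the text the user typed; the others stand in for a dropdown and a paging control, which the engine would translate into its own search parameters.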
[back to top]
 
Click Tracking
A report showing what parts of a web site a visitor looked at. This information is gathered by keeping track of which links the visitor clicked on, or by looking in the web site's log files. Click tracking does not provide as much insight into a visitor's intent as the newer "Search Analytics" products do (defined below).
[back to top]
 
Click-through
Related Terms:  search analytics A report showing which specific link a user clicked on when looking at the results of a search. Since searches often bring back many pages, it can be useful to see which page the users think is relevant.
[back to top]
 
closed network
See Main Definition:  Intranet
[back to top]
 
cloud computing
Related Terms:  virtualization, ec2 An extension to virtualization where many thousands or millions of virtual machines are automatically deployed over a vast network of physical machines. Most virtual machine users do not own the physical machines they are using; they may pay a small fee for each hour of usage. In the extreme form, Cloud Computing does away with even the virtual machine, and instead just automatically distributes and runs software.
[back to top]
 
clustering
Related Terms:  Search 2.0, results list visualization, noun phrase extraction, n-gram Grouping together similar documents in a search results list. There are many different techniques for doing this, most using statistics to analyze the words in the document.
[back to top]
 
CMS
See Main Definition:  content management system
[back to top]
 
codepage
Related Terms:  Unicode An older numerical system for representing written characters for specific languages. The problem with this old system was that different languages used the exact same numbers to represent entirely different symbols, so the languages could not be mixed together. Also, if the specific codepage used to write text was not known or was set incorrectly, the data would be misinterpreted. Unicode is a more modern method which accommodates all characters in a single unified numbering system. Newer search engines are based on Unicode, and convert text created with other codepages into Unicode. However, codepages are still used in several operating systems and in older programs for routine work in the user's local language, so search engines still need to understand them.
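The ambiguity is easy to demonstrate. In the Python sketch below, a single stored byte (0xE9) decodes to a different character depending on which Windows codepage is assumed, and converting it to Unicode once the codepage is known gives an unambiguous result.

```python
raw = bytes([0xE9])  # one byte of stored text; the codepage is not recorded

# The same number means different characters under different codepages.
print(raw.decode("cp1252"))  # é  (Western European)
print(raw.decode("cp1251"))  # й  (Cyrillic)

# Once the correct codepage is known, converting to Unicode removes the ambiguity.
text = raw.decode("cp1252")
print(text.encode("utf-8"))  # b'\xc3\xa9'
```

A search engine that guessed the wrong codepage here would index a Cyrillic letter instead of an accented Latin one, and searches for the original word would fail.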
[back to top]
 
collection
See Main Definition:  search indices Usage: Usually associated with Verity K2 terminology
[back to top]
 
collection level security
Related Terms:  collection Controlling access to sensitive documents at a high level by grouping similar documents together into specific collections, and then allowing users to have access to only certain collections. For example, all users can search the public content, but only employees can search both public and "Intranet" collections. This is the most common form of search engine security because it is relatively easy to implement; however it is not flexible enough for more complex security requirements.
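A minimal Python sketch of the public/Intranet example: each collection is mapped to the groups allowed to search it, and the engine only searches the collections the user qualifies for. The collection names and group names are hypothetical.

```python
# Hypothetical mapping of collections to the groups allowed to search them.
COLLECTION_ACCESS = {
    "public": {"everyone"},
    "intranet": {"employees"},
}

def searchable_collections(user_groups):
    """Return the collections a user may search, given their group memberships."""
    groups = set(user_groups) | {"everyone"}  # every user belongs to "everyone"
    return sorted(name for name, allowed in COLLECTION_ACCESS.items() if allowed & groups)

print(searchable_collections([]))             # ['public']
print(searchable_collections(["employees"]))  # ['intranet', 'public']
```

The check happens once per search rather than once per document, which is why this approach is simple to implement but too coarse for more complex security requirements.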
[back to top]
 
command line
Synonyms:  command line arguments, command line options Traditionally, software was started by typing in commands to the computer. These commands had options that could control the details of what the software would do. On Microsoft Windows, these command line options usually start with a forward slash (/). On Unix, these options usually start with a single or double hyphen (- or --). Many search engines include tools that are run from the command line. This allows them to be run from a script file or "cron job".
[back to top]
 
Common Gateway Interface
Synonyms:  CGI Related Terms:  web page form, dynamic content, URL The means by which software running on a web server interacts with visitors. For example, when you submit a search form on a web site, the query is sent via CGI. A link to a CGI will sometimes have a question mark in the URL.
[back to top]
 
company portal
See Main Definition:  enterprise portal Usage: the specific phrase "company portal" may imply that the enterprise portal is available to the general public, or at least to customers and partners.
[back to top]
 
Competitive Intelligence
Related Terms:  BI, Business Intelligence Looking at other businesses in the same industry to spot trends, problems and opportunities. These systems often use search technology and spiders to routinely comb through online web sites. The more sophisticated systems run continuously and alert business managers of interesting changes.
[back to top]
 
compliance
Related Terms:  Sarbanes-Oxley Act, "SOX" In this context, ensuring that 100% of data is represented and searchable in a vertical application. For example, making sure that a search for a particular client's name will always reliably bring back all pertinent records. Vocabulary: Sarbanes-Oxley Act: AKA: "SOX": Compliance regulations relating to what information companies must maintain and provide. SOX compliance is often related to Knowledge Management Systems and related search technology. See http://www.sarbanes-oxley-forum.com
[back to top]
 
compound document
See Main Definition:  sub-document indexing
[back to top]
 
Consumer to Consumer
Synonyms:  C2C This is a newer segment of eCommerce, focusing on individuals buying, selling and trading with other consumers, though often still facilitated by a company. Auction sites like eBay are a good example. As with B2C, having a good search engine is important. An additional challenge for search is that individual sellers' vocabularies vary widely, so the same product might be described many different ways.
[back to top]
 
content
Related Terms:  document In the context of search engines, "content" is a general term referring to the data that is to be indexed and searched by the search engine. It might include web pages, files, database records, or other textual data that needs to be searched.
[back to top]
 
Content Based Taxonomy
Related Terms:  Taxonomy A taxonomy based primarily on the content you already have, as opposed to the more traditional version based on subjects. For example, some automated tools can take all of the pages on a web site and automatically sort them into categories. Or the URL structure of web pages could be mapped into a Taxonomy.
[back to top]
 
content management system
Synonyms:  CMS, document management system Related Terms:  embedded search engine Software that manages corporate documents and other important data. A CMS often includes document version tracking, document security enforcement and workflow automation, and often has an embedded search engine to allow users to search through all the documents quickly.
[back to top]
 
content mining
Related Terms:  ETL, legacy data The process of extracting valuable data that is stored in a normally inaccessible format. For example, many companies have textual data in Word documents, PDFs or PowerPoint presentations, but they might like to load that into a database. Content mining software can go in and parse out the bits of data that are needed. See also "Legacy Data".
[back to top]
 
content promotion
Synonyms:  directed results, best bets, quick links Related Terms:  alternate term suggestions, "Did you mean?", SearchTrack A system to allow more precise control over which documents are returned as the result of a search; some systems allow an informed employee to suggest specific web pages that will best answer specific questions. For example, many pages on a web site might contain the term "support", but content promotion allows the main Tech Support home page to be suggested, above all other matches, when somebody searches for support.
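A minimal Python sketch of rule based content promotion for the "support" example: an editor-maintained table maps a query to the page that should appear first, and the engine's normal results follow with that page removed so it is not listed twice. The rule table and URLs are hypothetical.

```python
# Hypothetical editor-maintained rules: normalized query -> page to show above all matches.
PROMOTIONS = {
    "support": "https://www.example.com/support/",
}

def search_with_promotion(query, organic_results):
    """Put any promoted page first, followed by the engine's own results without duplicates."""
    promoted = PROMOTIONS.get(query.strip().lower())
    if promoted is None:
        return list(organic_results)
    return [promoted] + [url for url in organic_results if url != promoted]

organic = [
    "https://www.example.com/blog/support-hours.html",
    "https://www.example.com/support/",
    "https://www.example.com/jobs/support-engineer.html",
]
print(search_with_promotion("Support", organic)[0])  # https://www.example.com/support/
```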
[back to top]
 
content scraper
See Main Definition:  scraper
[back to top]
 
context
Synonyms:  adding context Related Terms:  social network, subject domain disambiguation In regards to search engines, context is a way of improving search relevancy by considering factors beyond what the user actually typed in; in other words, the engine adds in additional data or assumptions to the search to get better results. There are many forms of "additional data" that search engines might consider. For example, the system may consider social networking data to boost relevancy of popular documents. Or the system may limit the scope of search to a particular subject domain; for example, if a computer technician searches for "sun", the system might assume they are referring to the computer company Sun Microsystems, whereas an elementary student may have been referring to the Sun at the center of our solar system.
[back to top]
 
Contextual Summaries
Related Terms:  dynamic summaries Search applications often display titles, dates, and summaries of relevant documents to help the user decide which document(s) to view. A summary is often provided by the content owner and included as metadata within the document, but this summary is static: it is the same regardless of what query term is used. Some technologies create Dynamic Summaries as the search is performed, but independently of the query. Newer search technology can create Contextual Summaries, which are generated dynamically from the specific search terms used. This ensures that the summary the user sees is the portion of the document most relevant to the query terms.
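A minimal sketch of the query-dependent selection described above, assuming a naive sentence splitter and plain word matching (no stemming or phrase handling). It picks the sentences that contain the most distinct query terms and returns them in document order; commercial engines use more elaborate passage scoring.

```python
import re

def contextual_summary(text: str, query_terms: list[str], max_sentences: int = 2) -> str:
    """Return the sentences mentioning the most query terms, in document order."""
    # Naive split: a sentence ends at ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    terms = {t.lower() for t in query_terms}

    def hits(sentence: str) -> int:
        # Number of distinct query terms that appear in this sentence.
        words = set(re.findall(r"\w+", sentence.lower()))
        return len(terms & words)

    # Rank sentence positions by hit count (stable, so ties keep document order).
    ranked = sorted(range(len(sentences)), key=lambda i: hits(sentences[i]), reverse=True)
    chosen = sorted(i for i in ranked[:max_sentences] if hits(sentences[i]) > 0)
    return " ".join(sentences[i] for i in chosen)
```

Because the summary is rebuilt from the query terms on every search, two users searching the same document with different words will see different summaries.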
[back to top]
 
corporate network
Related Terms:  private network, firewall, Intranet The secure network that links together computers at a company. Traffic and visitors from the outside global Internet are kept out by a network isolation filter called a firewall.
[back to top]
 
corporate portal
See Main Definition:  enterprise portal
[back to top]
 
crawler
See Main Definition:  spider Usage: Some vendors do make a distinction between a "crawler" and a "spider". The distinction sometimes reflects a decoupling of downloading web pages from building the actual search indices.
[back to top]
 
cron job
Related Terms:  script file, script A way of scheduling and automatically starting Unix shell scripts at regular time intervals. For example, many sites use a cron job to run their search engine spider at night when the network is not being heavily used.
[back to top]
 
cross vendor
Related Terms:  third party vendor, NIE, SearchTrack A product, service or tool that works with multiple search engines. Most larger companies actually use more than one search engine, but the tools each search vendor provides tend to work only with that vendor's engine. Third party vendors can offer tools that work with multiple search engines. For example, when a Marketing department is looking at search activity for different parts of the site, they probably don't care what specific search vendor was used; they just want to know what visitors searched for. A third party tool could offer search analytics across all the search engines in use on the site and thus provide this type of global view.
[back to top]
 
CXML
See Main Definition:  Catalog Markup Language
[back to top]
 
  D   [back to top]
 
Data Mining
Related Terms:  ETL Analysis of large volumes of relatively simple data to extract important trends and new, higher level information. For example, a data mining program might analyze millions of product orders to determine trends among top-spending customers, such as their likelihood to purchase again, or their likelihood to switch to a different vendor.
[back to top]
 
Data Mining
Related Terms:  BI, Business Intelligence Looking at customer and sales data to spot trends, problems and opportunities. For example, a phone company could try predicting which customers are most likely to leave, and what offers might entice them to stay. Although Data Mining traditionally looked at numerical data, we believe the customers' searches should also be used.
[back to top]
 
data quality
See Main Definition:  search engine data quality
[back to top]
 
data silo
A system containing a set of documents or data. Silo sometimes also implies a rather standalone, self-contained system, which includes its own data storage and an embedded search engine. A silo that includes an advanced fulltext search engine and is used primarily for that purpose may be referred to as a search appliance. In many cases a silo may have its own embedded search engine and also have its content indexed and searched by an external search engine.
[back to top]
 
database
See Main Definition:  relational database Related Terms:  collection, document index, catalog Usage: database is a very broad term, usually requiring more context to define precisely.
[back to top]
 
database gateway
Related Terms:  relational database, index (verb) A means of hooking up a search engine to a relational database, so that the database's records can be searched by the search engine.
[back to top]
 
database index
Synonyms:  index (noun) Related Terms:  relational database The binary files associated with a traditional database that hold the actual data; typically stored on a hard drive.
[back to top]
 
database offloading
Related Terms:  database, zero term search, fulltext engine Using search technology to perform queries that a traditional database would normally have been used for. Some search engines can produce reports and do fielded searches. The benefit claimed by vendors is that a site can have just one engine (the fulltext search engine) instead of two (a fulltext engine and a traditional database engine), which is easier to maintain. Vendors also claim that performing searches with the fulltext search engine lets the database engine focus on read/write transactions.
[back to top]
 
DB
See Main Definition:  Database Synonyms:  RDBMS
[back to top]
 
deep web
Related Terms:  scraper, spider, dynamic content, web page form A way to spider a web site more deeply, beyond simply following links. For example, a deep web spider often fills in web page forms with a range of terms, submits the form repeatedly, and then captures the various results. Many simple web spiders will miss content that is only accessible by searching with forms.
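A minimal sketch of the form-filling step, assuming a hypothetical search form at `http://example.com/search` that is submitted with GET and names its search box `q`. It only builds one submission URL per term; fetching each URL and harvesting links from the result pages is left out.

```python
from urllib.parse import urlencode

def form_submission_urls(action_url: str, field: str, terms: list[str], fixed=None) -> list[str]:
    """Build one GET URL per term, as if the form were submitted once for each term."""
    fixed = fixed or {}  # hidden or constant form fields, e.g. a language selector
    urls = []
    for term in terms:
        params = dict(fixed)
        params[field] = term  # fill the search box with this term
        urls.append(f"{action_url}?{urlencode(params)}")
    return urls

# Hypothetical form and field names, for illustration only.
urls = form_submission_urls("http://example.com/search", "q", ["red shoes", "blue"], {"lang": "en"})
```

Forms that use POST, CAPTCHAs or session cookies need more work than this, which is one reason simple spiders skip them.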
[back to top]
 
deferred search
Related Terms:  repository database, federated search This can be thought of as an extended form of federated search. Some remote systems may not accept distributed searches from a federated search engine. As a workaround, these remote systems are described in a repository database, and references to that remote system are returned. For example, a company may choose to not include HR payroll information in the Intranet federated search system. An employee searching for "salaries" could instead be given a notice telling them to visit the HR Payroll system to find salaries. If the employee has a login to that separate system, they can go there and do the search. In this way, highly sensitive data can still be located by those who need it, but not accidentally included in casual federated search results.
[back to top]
 
dev
See Main Definition:  Development systems
[back to top]
 
development systems
Related Terms:  qa, staging, prod, bcp A set of computers used to write new search applications and other software. These machines may be smaller or fewer in number than the production machines the software will eventually run on. Software is typically migrated from "dev" to qa or staging machines for additional testing.
[back to top]
 
Did you mean?, "Did you mean?"
See Main Definition:  alternate term suggestions Usage note: this particular phrasing was popularized by Google.
[back to top]
 
directed results
See Main Definition:  content promotion
[back to top]
 
disk storage
Related Terms:  SAN, NAS, NFS, RAID, IDE, SCSI, SATA, binary indexes, collections Simple computers store their data on a device called a hard drive, which is located inside the computer's case. Fast and reliable disk storage is critical for search engines, since they use the disks much more intensely than most other software applications to store their binary indexes. There are various standards for the internal electrical connections, usually IDE, SATA, SCSI or fiber channel. More advanced computers have other options. Some have multiple hard drives configured to act as a group, in a RAID configuration. Others store their data on external devices, often shared with other nearby computers, using network storage such as NAS or a SAN, or network file protocols such as NFS. The concept of disk storage applies to any of these methods, whereas "hard drive" usually refers to only one physical component.
[back to top]
 
DMOZ
Synonyms:  dmoz.org An open source Internet taxonomy which attempts to catalog and organize all the web sites on the Internet. It is maintained by volunteers and used as a data source for many popular web portals including Google. It is sometimes said to be the open source competitor to Yahoo's ontology of web sites.
[back to top]
 
DNS
See Main Definition:  Domain Name Services
[back to top]
 
document
Related Terms:  record, hit, page, web page, URL, result, content A unit of data indexed and searched by a search engine; typically each document is equivalent to a web page on a web site, or perhaps a Microsoft Word or Adobe PDF file, or a record in a database.
[back to top]
 
document attributes
See Main Definition:  Meta Data
[back to top]
 
document catalog
See Main Definition:  search indices
[back to top]
 
document count
Related Terms:  license restriction Usage of this term varies. Sometimes it refers to the number of documents that matched a particular search. Other times it refers to the total number of documents that a search engine has indexed and can search against. This overall count may be restricted by the software license and license key. Other vendors measure document count indirectly by instead measuring the total size of all the documents, often in Gigabytes.
[back to top]
 
document fields
See Main Definition:  Meta Data
[back to top]
 
document filter
Synonyms:  filter The part of search engine technology that converts binary formatted documents, such as Microsoft Word, into a stream of text that is then processed by the search technology during indexing.
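A minimal sketch of a document filter for one format, HTML, using Python's standard `html.parser`. Word and PDF filters need format-specific libraries, so HTML is chosen here only because the standard library can parse it; the idea of turning markup into a plain stream of text is the same.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collects visible text from HTML, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())

def html_to_text(html: str) -> str:
    """Convert an HTML document into the text stream an indexer would consume."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)
```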
[back to top]
 
document frequency
Related Terms:  inverse document frequency The number of documents in a system that contain a particular word. The assumption is that a word appearing in many documents is LESS LIKELY to help in relevancy calculations. This count (or its ratio to the total number of documents) is therefore often inverted so that larger numbers indicate more relevancy (Inverse Document Frequency).
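A small worked sketch, using the common textbook form idf(w) = log(N / df(w)), where N is the number of documents. Individual engines use their own variants (smoothing, different log bases), so treat the formula as an assumption rather than any vendor's method.

```python
import math

def document_frequency(docs: list[str]) -> dict[str, int]:
    """Count how many documents contain each word at least once."""
    df: dict[str, int] = {}
    for doc in docs:
        for word in set(doc.lower().split()):  # set(): count each word once per document
            df[word] = df.get(word, 0) + 1
    return df

def inverse_document_frequency(docs: list[str]) -> dict[str, float]:
    """idf(w) = log(N / df(w)): rare words get larger values than common ones."""
    n = len(docs)
    return {word: math.log(n / count) for word, count in document_frequency(docs).items()}

docs = ["the sun rises", "the sun sets", "the moon"]
idf = inverse_document_frequency(docs)
```

A word such as "the" that appears in every document gets log(3/3) = 0, so it contributes nothing to relevancy.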
[back to top]
 
document highlighting
The practice of visually highlighting the users' search terms in a matching document when the user opens it up to read it. This is sometimes confused with the highlighting of document summaries in the results list.
[back to top]
 
document index
See Main Definition:  search indices
[back to top]
 
document indexer
See Main Definition:  indexer
[back to top]
 
document level security
Related Terms:  collection level security, sub-document level security Controlling access to sensitive content on a document by document basis.
[back to top]
 
document management system
See Main Definition:  content management system
[back to top]
 
document meta data
See Main Definition:  meta data
[back to top]
 
document pipeline
Related Terms:  document indexing pipeline A set of processes that a document passes through while being indexed. Each process is designed to modify the document in a certain way. For example, a process may look for dates within the text of the document, and add any such dates to the document's meta tags.
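A minimal sketch of a document pipeline as an ordered list of stage functions. The document shape (`{"text": ..., "meta": ...}`) and the stage names are hypothetical; real engines define their own pipeline stage APIs. The second stage mirrors the example above by copying dates into the metadata.

```python
import re
from typing import Callable

# Hypothetical document shape for this sketch: {"text": str, "meta": dict}.
Document = dict
Stage = Callable[[Document], Document]

def normalize_whitespace(doc: Document) -> Document:
    """Collapse runs of spaces and newlines into single spaces."""
    doc["text"] = " ".join(doc["text"].split())
    return doc

def extract_dates(doc: Document) -> Document:
    """Copy any YYYY-MM-DD dates found in the text into the document's metadata."""
    dates = re.findall(r"\b\d{4}-\d{2}-\d{2}\b", doc["text"])
    if dates:
        doc["meta"]["dates"] = dates
    return doc

def run_pipeline(doc: Document, stages: list[Stage]) -> Document:
    """Pass the document through each stage in order before it is indexed."""
    for stage in stages:
        doc = stage(doc)
    return doc
```

Keeping each stage a small, separate function makes it easy to reorder stages or insert new ones without touching the others.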
[back to top]
 
document profiling
See Main Definition:  automated document classification
[back to top]
 
document tagging
Synonyms:  meta tagging, tagging Related Terms:  Meta Data, automated document profiling, scope of search, taxonomy When documents are fed through an automated document profiler, meta tags can be added to the document to reflect which profiles matched. Later, that meta data can be used to limit the scope of the search.
[back to top]
 
Documentum
Related Terms:  Content Management System A popular content management system.
[back to top]
 
Domain Name Services
Synonyms:  DNS Related Terms:  reverse DNS, symmetric DNS name resolution In computer networking, machines have a numerical TCP/IP address which is hard to remember, and a textual name, which is easier for humans to remember. DNS takes the name of the machine and looks up the numerical IP address. Some search engines require specific DNS configurations in order to run correctly.
[back to top]
 
DPump
Related Terms:  XPump, XML, API An enhancement to NIE's XPump language that allows Java programmers to add new features into XPump. DPump is the "API" for XPump.
[back to top]
 
drill-down
Related Terms:  Search 2.0, results list navigation Providing clickable choices in a results list so that the user can further refine the search results. For example, on a shopping site, a search for "plasma tv" might provide drill down links for various price ranges, various manufacturers, and links for particular stores where the product is sold. Clicking on any of these links will narrow the search to just those matches.
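A minimal sketch of drill-down over an in-memory results list, assuming each result is a dictionary of fields (the `maker` field and the product data are hypothetical). One function counts the matches per value to build the links; the other shows what clicking a link does.

```python
from collections import Counter

def drill_down_links(results: list[dict], field: str) -> list[tuple[str, int]]:
    """Count matches per value of `field`, most common first, to label refinement links."""
    counts = Counter(r[field] for r in results if field in r)
    return counts.most_common()

def narrow(results: list[dict], field: str, value: str) -> list[dict]:
    """Clicking a drill-down link keeps only the results with that field value."""
    return [r for r in results if r.get(field) == value]

# Hypothetical results for a search on "plasma tv".
results = [
    {"title": "TV A", "maker": "Acme"},
    {"title": "TV B", "maker": "Zenith"},
    {"title": "TV C", "maker": "Acme"},
]
```

Each link can be shown with its count, for example "Acme (2)", so the user knows how many matches remain before clicking.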
[back to top]
 
dynamic content
Synonyms:  dynamically generated content, dynamic web pages Related Terms:  URL, CGI, spider, relational database, CMS, static content Web pages on a web site that are generated dynamically whenever a visitor requests them. A simple example is a web page that includes an advertisement that changes each time a different visitor views the page. A more elaborate example would be a web based content management system (CMS) where each document is actually stored in a relational database and is looked up and shown whenever needed. Some spiders have trouble indexing dynamic content.
[back to top]
 
Dynamic Navigators
Links on a web page or search result page that change depending on the path a user has followed through a site, or based on the user query. See Guided Navigation.
[back to top]
 
dynamic summaries
A textual summary of a document is often displayed in the results list under the title of a document. Dynamic summaries show portions of the document that contain the specific search terms entered by the user; the exact terms are often highlighted or bolded in the summary. Dynamic summaries are very popular.
[back to top]
 
Dynamic Summaries
Related Terms:  contextual summaries Search applications display titles, dates, summaries of relevant documents and other metadata to help the user decide which document(s) to view. A summary is often provided by the content owner and included within the document, but this summary is static: it is the same regardless of what query term is used. A Dynamic Summary extracts the sentences that best summarize the document as a whole, and can be quite useful when the content owner has not provided unique document summaries for every document. Unlike a Contextual Summary, a Dynamic Summary does not change based on the query terms.
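A minimal sketch of a query-independent summary, using one simple technique: score each sentence by the average document-wide frequency of its non-stopword terms and keep the top-scoring sentences. The stopword list is a small hypothetical sample, and commercial summarizers use more sophisticated methods. Note that the function takes no query at all, which is what distinguishes it from a contextual summary.

```python
import re
from collections import Counter

# Small hypothetical stopword list for this sketch.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "are", "for", "on", "it"}

def dynamic_summary(text: str, max_sentences: int = 1) -> str:
    """Pick the sentences whose words are most frequent across the whole document."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    freq = Counter(w for w in re.findall(r"\w+", text.lower()) if w not in STOPWORDS)

    def score(sentence: str) -> float:
        tokens = [w for w in re.findall(r"\w+", sentence.lower()) if w not in STOPWORDS]
        # Average frequency, so long sentences are not favored just for their length.
        return sum(freq[t] for t in tokens) / len(tokens) if tokens else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:max_sentences])  # restore document order
    return " ".join(sentences[i] for i in chosen)

text = "Search engines index documents. Search engines rank documents by relevancy. Lunch is at noon."
```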
[back to top]
 
  E   [back to top]
 
early binding security
An efficient method of providing document level security. A user's search terms are augmented by field level operators that set up a filtered search based on which documents that user can see. This change to the query happens before it is submitted to the search engine, so that the search engine only returns documents that the user can see.
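A minimal sketch of the query rewrite, assuming each indexed document carries a hypothetical `acl` field listing the groups allowed to see it, and assuming a Lucene-like `field:"value"` query syntax. Real engines each have their own syntax, and group names containing quotes would need escaping, which this sketch does not do.

```python
def secure_query(user_query: str, user_groups: list[str], acl_field: str = "acl") -> str:
    """Add an ACL filter to the query before it is sent to the search engine."""
    if not user_groups:
        raise ValueError("user must belong to at least one group")
    # Match documents whose ACL field names any group the user belongs to.
    group_clause = " OR ".join(f'{acl_field}:"{g}"' for g in sorted(user_groups))
    return f"({user_query}) AND ({group_clause})"
```

For a user in the groups `hr` and `all-staff`, a search for `salaries` becomes `(salaries) AND (acl:"all-staff" OR acl:"hr")`, so documents outside those groups never reach the results list.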
[back to top]
 
eCommerce
Related Terms:  eProcurement, b2b, b2c, c2c The buying and selling of products, with some part of the transaction taking place in electronic form, and thus the "e" prefix. Search engines are often integrated with these systems to help locate the products to be purchased. Examples include shopping and buying products from a web site, auction sites, and eProcurement. eCommerce is typically divided into at least three segments: B2C (Business to Consumer), B2B (Business to Business), and C2C (Consumer to Consumer). Different types of software and methods work best in particular submarkets.
[back to top]
 
eDiscovery
Searching for documents related to a particular legal event, often as the result of a subpoena, used to be called just "discovery". However, since most corporate documents are now stored in electronic form inside a computer, most searching is done electronically (vs. manually), often with the help of a search engine. The "e" prefix refers to the more modern electronic method. eDiscovery also refers to the industry that supplies the computer hardware and software that do the actual searching and handle the storage and management of the documents.
[back to top]
 
embedded search engine
Related Terms:  search engine, API When a search engine is included as part of a larger software application. For example, many content management systems allow users to search through all the documents in the system; the search engine has been embedded in the CMS via the search engine's API. Many email programs also have embedded search engines, to help users find old emails by keyword.
[back to top]
 
End User License Agreement
Synonyms:  EULA, license Related Terms:  software license A form of legal contract defining the rights a user has to a piece of software or service. This is a variation of a standard software license, and the abbreviation is actually more common than the full phrase. EULAs are sometimes presented on screen during the installation of software, and the person running the computer must specifically acknowledge agreement to the terms before continuing. In larger companies software licenses are handled by a dedicated legal staff, instead of each employee agreeing to individual EULAs.
[back to top]
 
Endeca
Related Terms:  taxonomy, parametric search A search engine vendor with excellent parametric search technology. See http://endeca.com
[back to top]
 
enterprise portal
Synonyms:  company portal, corporate portal Related Terms:  portal site A portal site that is specific to one company. Usually the portal will be inside the company's secure Intranet and only be accessible to employees. It will usually include an enterprise search engine as an important component.
[back to top]
 
enterprise search engine
Related Terms:  search engine, Intranet, search engine vendors A search engine that indexes and searches content within a company's Intranet. Unlike a local site search engine, enterprise engines typically index the content of multiple web servers on their local Intranet. Usage: The adjective "Enterprise" also sometimes implies handling a very large amount of data.
[back to top]
 
Enterprise Search Newsletter
See Main Definition:  NIE Enterprise Search Newsletter
[back to top]
 
entity
Related Terms:  entity extraction A piece of data of a known type, such as a date or amount of money or a reference to a particular city. Entities are often normalized to a common format, such as representing a date in YYYY-MM-DD hh:mm:ss format, regardless of how it was originally written in the source document.
[back to top]
 
entity extraction
Synonyms:  entity extractor Related Terms:  entity, ETL Automatically identifying and extracting specific patterns of text and treating them as a specific data type. For example, the phrases "Jan-01-2006", "January 1st, 2006" and "New Year's Day '06" all refer to the same date; an entity extraction system would understand this, and store all three as 2006-01-01. Entity extraction is useful for capturing dates, times, geographic locations, amounts of money, the names of people and companies, addresses, phone numbers, etc. By recognizing and properly storing these entities, a system can match user searches to the source documents even when no actual words match. For example, a user searching for "fifty dollars" could match a document with "$50.00".
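A minimal regex-based sketch that handles two of the date spellings above plus dollar amounts, normalizing dates to YYYY-MM-DD. Holiday names such as "New Year's Day '06" and spelled-out numbers such as "fifty dollars" need dictionaries and number parsing that this sketch deliberately leaves out; production extractors are far more thorough.

```python
import re
from datetime import date

MONTHS = {name: number for number, name in enumerate(
    ["jan", "feb", "mar", "apr", "may", "jun",
     "jul", "aug", "sep", "oct", "nov", "dec"], start=1)}

DATE_PATTERNS = [
    re.compile(r"\b([A-Za-z]{3})-(\d{1,2})-(\d{4})\b"),                     # Jan-01-2006
    re.compile(r"\b([A-Za-z]+)\s+(\d{1,2})(?:st|nd|rd|th)?,\s*(\d{4})\b"),  # January 1st, 2006
]
MONEY_PATTERN = re.compile(r"\$(\d+(?:\.\d{2})?)")                         # $50 or $50.00

def extract_entities(text: str) -> dict:
    """Return dates normalized to YYYY-MM-DD and dollar amounts as floats."""
    dates = []
    for pattern in DATE_PATTERNS:
        for month_name, day, year in pattern.findall(text):
            month = MONTHS.get(month_name[:3].lower())
            if month is None:
                continue  # the word before the number was not a month name
            try:
                dates.append(date(int(year), month, int(day)).isoformat())
            except ValueError:
                continue  # e.g. "Feb 30, 2006" is not a real date
    money = [float(amount) for amount in MONEY_PATTERN.findall(text)]
    return {"dates": dates, "money": money}
```

Both spellings of January 1, 2006 end up stored as the same value, 2006-01-01, which is what lets a date search match either form.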
[back to top]
 
eProcurement
Related Terms:  b2b, eCommerce, punch-out, cxml, Ariba-ready The buying and selling of products between businesses, with orders and payments conducted mostly in electronic form, and thus the "e" prefix. Search engines are often integrated with these systems to help locate the products to be purchased. eProcurement also refers to the industry that supplies the software and services to integrate various servers and databases, and the protocols used.
[back to top]
 
ESP
See Main Definition:  FAST Enterprise Search Platform Related Terms:  FAST ESP
[back to top]
 
ETL
See Main Definition:  Extract, Transform and Load
[back to top]
 
EULA
See Main Definition:  End User License Agreement
[back to top]
 
Explicit Ranking
An enterprise search implementation includes developing and maintaining an overall relevancy ranking (see Query Tuning). Content owners can associate specific terms that may not be part of the original document to assist with retrieval. These additional terms provide metadata, or information about the document. However, newer innovations provide search users with the ability to associate additional metadata with a document after a search, again providing additional context to assist subsequent users. This ad-hoc process is called 'tagging'. Whether provided by the original content owner or by a user, this metadata is explicitly associated with the document and often provides high value in determining the relevance of a document for a particular query term.
[back to top]
 
explicit summaries
Related Terms:  static summaries A textual summary of a document is often displayed in the results list under the title of a document. Many document formats allow the author to specifically create a summary. This is a very common practice in HTML documents. The summary may not contain the specific key words the user typed in.
[back to top]
 
export
Related Terms:  relational database, indexer, spider, import In traditional databases, data needed to be imported into the database system; when moving data from one system to another, data would be exported from the source system, and then imported into the destination system. Most full-text search engines do not offer robust import and export capabilities; some do offer import-only tools. Instead, search engines use the process of "indexing" or "spidering" to index the documents, and the original source documents are left where they were. Some third party vendors do offer limited import and export tools to move data between search engines.
[back to top]
 
Extensible Markup Language
Synonyms:  XML Related Terms:  XPump, DPump A very useful standard format for computer data, which makes it easy to move data between different computer programs and systems. XML has become widely accepted in the past few years. Officially XML stands for "Extensible Markup Language".
[back to top]
 
external Meta Data
Synonyms:  overlay Meta Data, overlaid Meta Data Related Terms:  Meta Data, CMS Meta data for a document is usually stored inside the document file. However, in some cases, meta data can be assigned to a document after it was created and not be stored directly inside the document. An example of this is when a document is uploaded into a Content Management System; the user can assign additional document properties in the CMS. Special indexing procedures may be required to ensure that the external Meta Data is properly associated with the contents of the actual document inside the search engine index.
[back to top]
 
Extract, Transform and Load
Synonyms:  ETL Related Terms:  content mining, meta tagging, scraping A process of gathering, converting and storing data, often from many locations. The data is often converted from one format to another in the process. Officially, ETL is an abbreviation for "Extract, Transform and Load".
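A minimal end-to-end sketch using only the Python standard library: extract rows from CSV text, transform them (trim titles, convert US-style dates to ISO format), and load them into an in-memory SQLite table. The CSV contents and the `docs` table are hypothetical stand-ins for real source and destination systems.

```python
import csv
import io
import sqlite3
from datetime import datetime

# Hypothetical source data; a real job would read from files, feeds or other databases.
RAW_CSV = """id,title,published
1, Annual Report ,03/15/2006
2,Product Catalog,11/02/2005
"""

def extract(raw: str) -> list[dict]:
    """Extract: parse the CSV text into one dictionary per row."""
    return list(csv.DictReader(io.StringIO(raw)))

def transform(rows: list[dict]) -> list[tuple]:
    """Transform: clean up whitespace and convert MM/DD/YYYY dates to YYYY-MM-DD."""
    return [
        (
            int(row["id"]),
            row["title"].strip(),
            datetime.strptime(row["published"].strip(), "%m/%d/%Y").date().isoformat(),
        )
        for row in rows
    ]

def load(records: list[tuple], conn: sqlite3.Connection) -> None:
    """Load: store the converted records in the destination table."""
    conn.execute("CREATE TABLE IF NOT EXISTS docs (id INTEGER PRIMARY KEY, title TEXT, published TEXT)")
    conn.executemany("INSERT INTO docs VALUES (?, ?, ?)", records)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
rows = conn.execute("SELECT id, title, published FROM docs ORDER BY id").fetchall()
```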
[back to top]
 
Extranet
Related Terms:  Intranet A semi-private controlled network run by a company for the benefit of its customers and partners. Enterprise search engines are often used to index content on the company's Extranet.
[back to top]
 
  F   [back to top]
 
Faceted Navigation
See Main Definition:  Faceted Search
[back to top]
 
faceted search
Related Terms:  Facets, parametric search, hybrid search, scope of search, taxonomy Faceted search is an extension to parametric search where the additional suggested searches are not limited to just well defined document meta data groups, and instead may be automatically derived using statistical methods. More importantly, faceted search engines do not blindly suggest choices that won't match any documents (later parametric engines fixed this as well). Also, faceted search engines are a bit more dynamic in how they break up the range of data in a particular field; for example, if all matches were in the same city, then it would not bother to offer city as a choice. Conversely, if matches were scattered among thousands of cities, the faceted engine might choose to suggest searches by state. See http://ideaeng.com/pub/entsrch/v2n6/article03.html
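A minimal sketch of the city/state behavior described above, assuming results are dictionaries with hypothetical `city` and `state` fields and an arbitrary limit of 10 choices. Only values that actually occur in the results are counted, so no suggested choice can lead to zero matches.

```python
from collections import Counter

def suggest_facet(results, fine_field="city", coarse_field="state", max_choices=10):
    """Decide which field, if any, to offer as facet choices for these results.

    Returns (field, {value: count}) or None when no useful choice exists.
    """
    fine = Counter(r[fine_field] for r in results if fine_field in r)
    if len(fine) <= 1:
        return None  # every match shares one city: offering it adds nothing
    if len(fine) > max_choices:
        coarse = Counter(r[coarse_field] for r in results if coarse_field in r)
        return coarse_field, dict(coarse)  # too many cities: group by state instead
    return fine_field, dict(fine)

same_city = [{"city": "Boston", "state": "MA"}] * 3
few_cities = [{"city": "Boston", "state": "MA"}, {"city": "Austin", "state": "TX"}]
many_cities = [{"city": f"Town{i}", "state": "CA" if i % 2 else "NV"} for i in range(20)]
```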
[back to top]
 
Facets
See Main Definition:  faceted search
[back to top]
 
fact extraction
Related Terms:  entity extraction An automated process of extracting specific facts from the text of many different documents. These systems usually do not use true artificial intelligence; they usually rely on simpler statistical analysis of words, phrases and entities. Sentences using ambiguous language or pronouns will usually not result in an extracted fact. If a fact appears consistently in many documents, it may be displayed in the results list.
[back to top]
 
failover
Synonyms:  hot spare Related Terms:  failover systems Being able to immediately switch to another system if the primary system goes down. In most systems this is set up to happen automatically. In other systems, load balancing is used to divide user searches between two equal, fully running systems, so if one system fails the other system is already running. Servers might be run in failover mode instead of load balanced mode due to licensing restrictions or cost.
[back to top]
 
failover systems
Related Terms:  failover The set of computers used to run the search software and other applications if the main servers go down or are otherwise unavailable.
[back to top]
 
FAQ
See Main Definition:  Frequently Asked Question
[back to top]
 
FAST Advanced Query Language
Synonyms:  AQL, Advanced Query Language Related Terms:  FQL An older syntax in FAST Search's products for expressing simple and complex searches. It is similar in purpose to FAST ESP's modern FQL (FAST Query Language), though it uses a different syntax. AQL and FQL are not directly compatible, though some modern FAST products have limited backward compatibility support for AQL.
[back to top]
 
FAST Enterprise Search Platform
Synonyms:  ESP, FAST, FAST ESP The primary search product from the company FAST Search (now part of Microsoft).
[back to top]
 
FAST ESP
See Main Definition:  FAST Enterprise Search Platform
[back to top]
 
FAST Query Language
Synonyms:  FQL A syntax in FAST ESP for expressing simple and complex searches. It supports advanced Boolean, nested and weighted queries. Its syntax is a bit reminiscent of SQL. Users rarely use this syntax directly; instead they type in a few search terms, and those words are transformed into a complete FQL query by a query transform in the query pipeline.
[back to top]
 
FAST Search and Transfer
Synonyms:  FAST One of the high end vendors of enterprise search software. FAST stakes its reputation on searching incredibly large amounts of content very quickly. See http://fastsearch.com
[back to top]
 
feature vector
A set of interesting words, phrases or entities that are of statistical significance within a document. These specific items may be useful in finding other related documents, which should have a similar set of features.
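A minimal sketch of building feature vectors and comparing them with cosine similarity. It uses raw word counts (ignoring short words and a small hypothetical stopword list) as a stand-in for statistical significance; real systems typically weight features more carefully, for example with inverse document frequency.

```python
import math
import re
from collections import Counter

# Small hypothetical stopword list for this sketch.
STOPWORDS = {"the", "and", "for", "from", "with", "that", "this", "are", "was"}

def feature_vector(text: str, top_n: int = 5) -> dict[str, int]:
    """Keep the top_n most frequent non-trivial words as the document's features."""
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS and len(w) > 2]
    return dict(Counter(words).most_common(top_n))

def cosine_similarity(a: dict[str, int], b: dict[str, int]) -> float:
    """1.0 for identical feature proportions, 0.0 when no features are shared."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

a = feature_vector("Solar panels convert sunlight; solar panels need sunlight.")
b = feature_vector("Solar power from panels")
c = feature_vector("Quarterly sales report for the sales team")
```

Documents `a` and `b` share the features "solar" and "panels", so they score as related; `c` shares nothing with `a` and scores zero.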
[back to top]
 
federated indexing
Related Terms:  federated search In contrast to federated search, federated indexing allows a single search engine to index content among many distributed systems, often crossing organizational boundaries. This allows a single search engine to search all of the distributed content.
[back to top]
 
federated search
Synonyms:  heterogeneous search Related Terms:  federated indexing Taking a users' search and sending it to multiple search engines, then combining the results back together. This approach is sometimes preferred as it doesn't require any single engine to index all of the content. Disadvantages can include different query languages for each engine, combining relevancy scores that use different scales, duplicate content, and timeout issues.
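A minimal sketch of the merging step, assuming each engine returns `(url, score)` pairs on its own scale. It uses min-max normalization, one simple choice among many, to put the scores on a common 0 to 1 scale, and removes duplicate URLs by keeping the best normalized score. The URLs are hypothetical.

```python
def normalize(results: list[tuple[str, float]]) -> list[tuple[str, float]]:
    """Min-max scale one engine's scores to 0..1 so different engines become comparable."""
    if not results:
        return []
    scores = [score for _, score in results]
    lo, hi = min(scores), max(scores)
    span = hi - lo
    return [(url, (score - lo) / span if span else 1.0) for url, score in results]

def federate(*engine_results: list[tuple[str, float]]) -> list[tuple[str, float]]:
    """Combine several engines' results into one list, best first."""
    best: dict[str, float] = {}
    for results in engine_results:
        for url, score in normalize(results):
            # Duplicate content: keep only the highest normalized score for each URL.
            if score > best.get(url, -1.0):
                best[url] = score
    return sorted(best.items(), key=lambda item: item[1], reverse=True)

engine_a = [("http://intranet/a", 12.0), ("http://intranet/b", 4.0)]
engine_b = [("http://wiki/x", 0.9), ("http://intranet/b", 0.7), ("http://wiki/y", 0.1)]
merged = federate(engine_a, engine_b)
```

Min-max scaling assumes each engine's top hit is about equally good, which may not hold; timeouts and differing query languages also have to be handled before this step.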
[back to top]
 
fiber channel
Related Terms:  disk storage, SAN A high speed connection between computers and hard disk drives, often based on fiber optics. It may connect to internal hard drives, or to an external SAN. Search engines make heavy use of disk storage, and therefore are very sensitive to disk connections.
[back to top]
 
field
Related Terms:  Meta Data, zone, hybrid search In traditional relational databases, fields were the pieces of data stored for each record in the database. Search engines have a similar concept, but tend to refer to these well defined pieces of data as Meta Data or document fields. Search engines also allow for large amounts of unstructured data, sometimes referred to as zones, which act more like database blobs. Having both types of data allows for hybrid searches.
[back to top]
 
fielded search
See Main Definition:  hybrid search
[back to top]
 
fielded search operator
Synonyms:  fielded search Related Terms:  fulltext search operator Search engines can perform searches that are very similar to traditional database searches, for example using the equals operator, or <=, >=, etc. When fielded search is combined with fulltext search operators, this is sometimes referred to as a filtered search or hybrid search.
[back to top]
 
field-level security
Synonyms:  sub-document security Controlling access to specific parts of a document, such that different users can see different parts of the document. For example, a technical support person might be able to see most of the data for a customer, but not specific financial information.
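A minimal sketch of field-level filtering applied to a record before it is displayed, assuming a hypothetical role-to-fields policy. A real deployment would also need to keep hidden fields out of the searchable index for that user; otherwise a search on a hidden value could still reveal that it exists.

```python
# Hypothetical policy for this sketch: the fields each role is allowed to see.
FIELD_POLICY = {
    "support": {"name", "email", "open_tickets"},
    "finance": {"name", "email", "open_tickets", "credit_limit", "balance"},
}

def visible_fields(record: dict, role: str) -> dict:
    """Return a copy of the record containing only the fields this role may see."""
    allowed = FIELD_POLICY.get(role, set())  # unknown roles see nothing
    return {field: value for field, value in record.items() if field in allowed}

customer = {"name": "Acme", "email": "it@acme.example", "open_tickets": 2,
            "credit_limit": 50000, "balance": 1200}
```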
[back to top]
 
file transfer protocol
Synonyms:  FTP A network protocol for transferring files over the Internet. Some search engine spiders are able to retrieve and index documents stored on FTP servers.
[back to top]
 
filelist.txt
Synonyms:  sitelist.txt Related Terms:  Autonomy Ultraseek A file stored on a web server that summarizes the recent changes to the site's pages. The Ultraseek spider can read this file and efficiently re-index only the pages that have been added or changed, without the need to respider the entire site. This file format is an open standard and is very easy to parse, although at this time only Ultraseek and Ultraspider support it.
[back to top]
 
filter (index)
See Main Definition:  document filter
[back to top]
 
filter (search)
See Main Definition:  filtered search
[back to top]
 
filtered search
Synonyms:  source query text, hybrid search Related Terms:  fielded search operator, scope of search, hybrid search A portion of the query sent to the search engine that limits the scope of the search, but is not used for highlighting or relevancy calculations.
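A minimal sketch of a hybrid search over in-memory documents with hypothetical `id`, `dept` and `body` fields. The fielded filter only decides which documents are eligible; relevancy is computed from the fulltext terms alone, matching the definition above.

```python
def hybrid_search(docs: list[dict], text_terms: list[str], filters: dict) -> list[tuple[int, int]]:
    """Apply exact-match field filters, then rank the survivors by text terms only."""
    terms = [t.lower() for t in text_terms]
    results = []
    for doc in docs:
        # Filter part: limits the scope of the search but adds nothing to the score.
        if any(doc.get(field) != value for field, value in filters.items()):
            continue
        # Fulltext part: a simple term-count score stands in for real relevancy ranking.
        words = doc["body"].lower().split()
        score = sum(words.count(t) for t in terms)
        if score > 0:
            results.append((doc["id"], score))
    return sorted(results, key=lambda r: r[1], reverse=True)

docs = [
    {"id": 1, "dept": "support", "body": "reset your password here"},
    {"id": 2, "dept": "sales", "body": "password password pricing"},
    {"id": 3, "dept": "support", "body": "password policy and password reset"},
]
```

Document 2 mentions "password" most often but is excluded by the `dept` filter, and the value "support" never affects the scores of the documents that remain.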
[back to top]
 
Firefox
Related Terms:  Mozilla An open source web browser based on the Mozilla code base.
[back to top]
 
firewall
Related Terms:  Intranet, corporate network, private network A device that separates a company's or institution's private Intranet from the public Internet and only lets carefully selected data cross between the two.
[back to top]
 
five nines
Synonyms:  99.999, 99999 Related Terms:  qos Achieving a QOS of 99.999% uptime. Note that this does not mean a system will never go down; it can still be down for about 5 minutes per year. This is because a year has about 8,760 hours, and 1/100,000 of that is about 0.0876 hours, or almost 9% of one hour (roughly 5.3 minutes). Generally five nines is used to mean very reliable, or high availability.
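The downtime arithmetic behind "five nines" can be checked with a few lines of Python. This is a minimal sketch that assumes a 365-day (non-leap) year; the function name is illustrative only.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours in a non-leap year


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Return the maximum minutes of downtime per year for a given uptime percentage."""
    unavailable_fraction = 1 - availability_percent / 100
    return HOURS_PER_YEAR * 60 * unavailable_fraction


# Each extra "nine" cuts the permitted downtime by a factor of ten.
for nines in (99.0, 99.9, 99.99, 99.999):
    print(f"{nines}% uptime -> {allowed_downtime_minutes(nines):.1f} minutes/year")
```

At 99.999% the budget works out to roughly 5.3 minutes per year, which is why five nines still allows for brief outages.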
[back to top]
 
fixed price project
Paying a fixed price for custom programming or consulting based on a mutually agreed to Statement of Work (SOW). If the work to be performed is not well defined, billing by T&M (Time and Materials) might be more appropriate.
[back to top]
 
fixed summaries
See Main Definition:  static summaries
[back to top]
 
FK
See Main Definition:  Foreign Key
[back to top]
 
Flash
Related Terms:  swf A software add-on for web browsers that allows for rich animation and highly interactive web pages. Flash is generally not supported by most search engine spiders, and sites that require Flash for navigation will typically have trouble being indexed.
[back to top]
 
Flex
Related Terms:  Flash An advanced form of client side scripting based on Adobe Flash that makes web sites more interactive. However, search engine spiders sometimes have trouble navigating such sites, and therefore parts of the site might not be indexed. There are ways to build a web site so that it works both for Flex users and for search engine spiders.
[back to top]
 
Folksonomy
Related Terms:  Taxonomy, Behavior Based Taxonomy A taxonomy or other organization of data suggested by users. For example, on popular photo sites, users can tag photos with descriptive words. These words can then be searched for. In the enterprise, some search systems allow employees to tag certain documents with key words. These terms are then found when other employees search for those terms.
[back to top]
 
Foreign Key
Related Terms:  database, table, field, join A field in one database table that refers to a row in another table is a Foreign Key. In a traditional database, each table has a special field with a unique identifier for each row in that table, called a primary key. For example, each department in a company might have a unique department ID, so the dept_id field would be the primary key for the dept table. Other tables may want to link to the records in this table, and they do so by referring to the table name and the primary key value. These references to a primary key, stored in other tables, are referred to as Foreign Keys. Continuing with the above example, each employee might be listed in a table called emp. Each employee also belongs to one of the departments listed in the dept table, so each employee record refers to the department that employee works for by storing its dept_id. If a search engine needed to index and search the table of employees, and wanted to include information about the department each employee works in, it would need to index the combined records from both tables, emp and dept. It would accomplish this by doing a join on the dept_id field, which is in both tables. It would then use the emp_id (the primary key of the emp table) to track which records it had indexed, and to call up matching records from a results list. A foreign key is the start of a link to other tables.
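The emp/dept example can be sketched with Python's built-in sqlite3 module. The table and column names follow the definition above; the sample rows and the idea of building one flat text "document" per employee are illustrative assumptions, not any particular search engine's indexing API.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dept (dept_id INTEGER PRIMARY KEY, dept_name TEXT);
    CREATE TABLE emp (
        emp_id   INTEGER PRIMARY KEY,
        emp_name TEXT,
        dept_id  INTEGER REFERENCES dept(dept_id)  -- foreign key into dept
    );
    INSERT INTO dept VALUES (10, 'Engineering'), (20, 'Finance');
    INSERT INTO emp VALUES (1, 'Alice', 10), (2, 'Bob', 20), (3, 'Carol', 10);
""")

# Join emp to dept on the shared dept_id field, producing one flat
# record per employee that an indexer could ingest.
rows = conn.execute("""
    SELECT emp.emp_id, emp.emp_name, dept.dept_name
    FROM emp JOIN dept ON emp.dept_id = dept.dept_id
    ORDER BY emp.emp_id
""").fetchall()

# Key each indexed document by emp_id so hits can be traced back to the row.
documents = {emp_id: f"{name} {dept_name}" for emp_id, name, dept_name in rows}
print(documents)
```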
[back to top]
 
form
See Main Definition:  web page form
[back to top]
 
FQL
See Main Definition:  FAST Query Language
[back to top]
 
FreeFind
Related Terms:  search engine, site search engine, ASP ("Application Service Provider") An excellent low cost search engine for small to mid-sized public web sites. FreeFind is an ASP, and therefore web sites using their service do not have to install any software on their local web server. See http://freefind.com
[back to top]
 
Frequently Asked Question
Synonyms:  FAQ, FAQs (plural) Related Terms:  compound documents A section of a web site that lists questions that are frequently asked, and the answers to those questions. Many sites also allow visitors to search over all these questions and answers with their search engine.
[back to top]
 
FT (business)
See Main Definition:  Full Time (employee) Usage: Included in this glossary to disambiguate from the other "FT" which refers to "fulltext".
[back to top]
 
FT (technical)
See Main Definition:  Fulltext
[back to top]
 
FTP
See Main Definition:  file transfer protocol
[back to top]
 
Full Time (Employee)
Synonyms:  FT (business) An employee who works at least 35 hours a week for a company. Usage: Included in this glossary to disambiguate from the other "FT" which refers to "fulltext".
[back to top]
 
fulltext operator
Search engine query syntax that is specific to word and phrase matching, vs. more traditional field operators like =, <=, etc.
[back to top]
 
Full-Text search engine
See Main Definition:  search engine Synonyms:  fulltext search engine Usage: in this form the "Full-Text" prefix is used to emphasize the fact that these searches work on unstructured textual data, versus traditional databases' emphasis on structured data.
[back to top]
 
Full-Text search index
See Main Definition:  search indices Synonyms:  fulltext index Usage: in this form the "Full-Text" prefix is used to emphasize the fact that these searches work on unstructured textual data, versus traditional databases' emphasis on structured data.
[back to top]
 
fuzzy matching
Synonyms:  fuzzy search Related Terms:  wildcard, typo, stemming, soundex Allowing search terms to match a wide variety of words found in a document. There are many types of fuzzy matching, such as stemming, wildcard matching, thesaurus, and common misspellings or typos. Users may become frustrated if fuzzy matching brings back too many seemingly irrelevant matches, or if fuzzy matches are allowed to swamp exact matches.
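One common way to catch typos is edit distance. The sketch below uses Levenshtein distance over a small, made-up vocabulary of indexed words; the one-edit threshold is an assumption, and real engines combine this with stemming, wildcards and thesauri. Sorting by distance keeps exact matches ahead of fuzzy ones.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum single-character insertions, deletions, substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free if letters match)
            ))
        previous = current
    return previous[-1]


def fuzzy_matches(term, vocabulary, max_distance=1):
    """Return indexed words within max_distance edits, closest (exact) matches first."""
    scored = sorted((edit_distance(term, word), word) for word in vocabulary)
    return [word for distance, word in scored if distance <= max_distance]


vocab = ["search", "searh", "starch", "research", "church"]
print(fuzzy_matches("search", vocab))
```

Note that "starch" also matches at one edit, a small example of how fuzzy matching can bring back seemingly irrelevant results.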
[back to top]
 
  G   [back to top]
 
Gartner Magic Quadrants
Synonyms:  Gartner Quadrants, Magic Quadrants A yearly ranking of search engine vendors by Gartner, Inc. The report includes a graph based on two primary factors: vision and ability to execute; the upper right hand quarter of the graph indicates strength in both areas, and is the preferred ("magic") quadrant to be in. Many large companies use this report to help select which vendors to seriously evaluate. Gartner publishes similar reports for other industries as well.
[back to top]
 
GB
See Main Definition:  Gigabyte
[back to top]
 
generalized taxonomy
See Main Definition:  taxonomy Usage: Just "taxonomy" is normally sufficient. The prefix "generalized" is used to distinguish a regular taxonomy from the new Behavior Based Taxonomies.
[back to top]
 
geocoded search
Synonyms:  location aware search, location sensitive search Searchable data that includes longitude and latitude in its meta tags. Users can then restrict the scope of their search to content related to their local geographic area.
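Restricting results to a user's local area usually means computing the distance between the user and each document's stored coordinates. This sketch assumes each document carries "lat" and "lon" fields (hypothetical names) and uses the haversine great-circle formula with a mean Earth radius; a production engine would use a spatial index rather than scanning every document.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (latitude, longitude) points in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def within_radius(docs, user_lat, user_lon, radius_km):
    """Keep only documents whose geotag lies within radius_km of the user."""
    return [d for d in docs
            if haversine_km(user_lat, user_lon, d["lat"], d["lon"]) <= radius_km]


docs = [
    {"title": "Boston office hours", "lat": 42.36, "lon": -71.06},
    {"title": "Cambridge store opening", "lat": 42.37, "lon": -71.11},
    {"title": "Seattle meetup", "lat": 47.61, "lon": -122.33},
]
print([d["title"] for d in within_radius(docs, 42.36, -71.06, 25)])
```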
[back to top]
 
gig (business)
A temporary job as a software contractor or project manager. As in "I have a gig in New York next month".
[back to top]
 
gig (technical)
See Main Definition:  Gigabyte
[back to top]
 
Gigabyte
A unit of measure indicating one billion (US) bytes of computer data (AKA one thousand million). Note that the term "billion" has different meanings in different parts of the world. In the binary convention traditionally used for computer memory, a Gig is 1024 * 1024 * 1024 (1024^3) = 1,073,741,824 bytes, about 7% more than one billion. This can be used to represent the amount of memory in a computer, or the amount of storage on a disk drive, or the size of a document or file. Years ago (1990s, early 2000s) this was considered a large amount of computer data, but due to the increases in computer storage and power, it is now quite common. Some search engine software has license restrictions on the amount of data it will index, measured in Gigabytes.
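The gap between the decimal and binary meanings of "Gigabyte" is easy to compute, and it matters when a license limit and a disk size are quoted in different conventions. The constant names below are illustrative.

```python
DECIMAL_GB = 10 ** 9     # "one billion bytes" (SI gigabyte)
BINARY_GB = 1024 ** 3    # binary convention, also called a gibibyte

print(f"{BINARY_GB:,} bytes")                            # 1,073,741,824 bytes
print(f"difference: {BINARY_GB / DECIMAL_GB - 1:.1%}")   # difference: 7.4%
```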
[back to top]
 
Google
Related Terms:  web search engine, Internet search syntax The world's best known web search engine. http://google.com
[back to top]
 
Google Appliance
Related Terms:  enterprise search engine, Google Google has packaged their web search engine into an actual computer case that can be installed at companies to provide enterprise search for their private network.
[back to top]
 
Graphical User Interface
Synonyms:  GUI Related Terms:  UI, Web UI A more modern User Interface that allows the user to control software with a mouse and keyboard, and is graphically displayed as a set of windows, menus and icons. When the human user clicks or types, the UI then takes the appropriate action. Examples include Microsoft Windows, Mac OS, GNOME and KDE. Enterprise search vendors have added GUIs to their products over the years to make them easier to use and administer.
[back to top]
 
grep
Related Terms:  Unix, Linux A program on Unix and Linux systems used to find text in a file. Many extremely simple search engines approximate the simplistic behavior of grep. For example, a "grep like" search engine, given the search term "red", would match on the word "shredded", since the 3 letters 'r', 'e' and 'd' do appear in the center of the word. But this is considered a bad match, since somebody searching for the color red is clearly not looking for information about shredding. Our motto is "grep is not a search engine!"
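The "red" versus "shredded" problem can be shown by contrasting a raw substring test with a whole-word test. This sketch uses Python's `re` word boundaries as a stand-in for the tokenization a real search engine performs (grep itself offers a `-w` option for whole-word matching); the function names are hypothetical.

```python
import re

sentence = "The shredded documents were stored in a red folder."


def substring_match(term, text):
    """grep-like behaviour: true if the characters appear anywhere, even mid-word."""
    return term in text


def word_match(term, text):
    """Search-engine-like behaviour: the term must appear as a whole word."""
    return re.search(rf"\b{re.escape(term)}\b", text, flags=re.IGNORECASE) is not None


print(substring_match("red", "shredded"))  # True  (the bad match)
print(word_match("red", "shredded"))       # False
print(word_match("red", sentence))         # True, matches the word "red"
```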
[back to top]
 
GUI
See Main Definition:  Graphical User Interface
[back to top]
 
Guided Navigation
Synonyms:  results list navigators Related Terms:  faceted search, taxonomy Guided Navigation or Faceted Navigation refers to the capability of fine-tuning search results by clicking on dynamically generated category links. While the eventual results are similar to those that result from asking the user to complete an advanced search form, Guided or Faceted Navigation engages the user in a conversation and encourages exploration of the result set.
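The category links in Guided Navigation are typically generated by counting how many results fall under each value of a metadata field, then filtering when the user clicks one. This is a minimal in-memory sketch; the field names and sample results are made up for illustration.

```python
from collections import Counter

results = [
    {"title": "Install guide", "product": "Accounting", "type": "PDF"},
    {"title": "Release notes", "product": "Accounting", "type": "HTML"},
    {"title": "Payroll FAQ", "product": "Payroll", "type": "HTML"},
]


def facet_counts(hits, field):
    """Count how many hits fall into each value of a metadata field."""
    return Counter(hit[field] for hit in hits)


def refine(hits, field, value):
    """Apply the filter a user selects by clicking a category link."""
    return [hit for hit in hits if hit[field] == value]


print(facet_counts(results, "product"))   # links to show: Accounting (2), Payroll (1)
narrowed = refine(results, "product", "Accounting")
print(facet_counts(narrowed, "type"))     # next round of links after the click
```

Recomputing the counts after each click is what lets the user keep exploring the narrowed result set.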
[back to top]
 
  H   [back to top]
 
Hard ROI
Related Terms:  ROI, Soft ROI If a company spends money on new technology, such as a new search engine, how much additional revenue will it generate? For example, upgrading the search engine on a shopping web site might help customers find items more quickly, and therefore they might buy more things or be more likely to shop from the site again in the future. This type of ROI is referred to as "Hard" because it can be predicted and measured with specific numbers, i.e. "hard facts", such as sales figures. Ironically "hard" ROI is actually easier to measure than "soft" ROI.
[back to top]
 
heterogeneous search
See Main Definition:  federated search
[back to top]
 
hierarchical data
Related Terms:  taxonomy A way of organizing and storing information, where specific details are nested inside broader and broader categories of data. The broadest level of data can be thought of as the "root" of a "tree", while smaller and smaller levels of detail can be thought of as "branches". For example, the World has Countries. Countries have States. States have Cities. Cities have Streets. Etc. This data could be nested, such that the "World" would be the broadest item of information; the "root" of a hierarchical database storing geographical data. XML is a particular format of hierarchical data.
[back to top]
 
high availability
Synonyms:  five nines Related Terms:  qos, five nines, load balancing, failover A system that is very reliable and has a very high QOS. To achieve this it may use load balancing or failover.
[back to top]
 
highlighting
See Main Definition:  search highlighting
[back to top]
 
histogram
Related Terms:  relevance histogram, search activity histogram A graph of tabulated data items, where each bar represents the number of times that particular item appeared. When the bars are sorted from most to least frequent, the graph slopes from the upper left to the lower right. The right side of the graph, where the counts trail off, is often called the long tail. Histograms are used in search engines for many things, including relevancy histograms for query tuning and displays of the most popular searches.
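A search activity histogram can be built from a query log by normalizing and counting each query, then sorting from most to least frequent. The log below is invented sample data, and the text-bar output is only a stand-in for a real chart.

```python
from collections import Counter

query_log = ["vacation policy", "expense report", "Vacation Policy",
             "401k", "expense report", "vacation policy", "badge office"]

# Normalize, tabulate, and sort from most to least frequent.
histogram = Counter(q.strip().lower() for q in query_log).most_common()

for query, count in histogram:
    print(f"{query:<16} {'#' * count} ({count})")
```

The entries with a count of 1 at the bottom of the list form the start of the long tail.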
[back to top]
 
hit
See Main Definition:  result list entry
[back to top]
 
hosted search
Synonyms:  hosted search engine Related Terms:  ASP ("Application Service Provider"), local site search engine, search engine, FreeFind A search engine that is packaged as an ASP; the advantage is that search can be easily added to a web site, without the need to install software locally. An example of hosted search is FreeFind (http://freefind.com)
[back to top]
 
Hosted Software
Software that can be used on a computer without installing it on that computer. For example, it is possible for a web site to include a search box without needing to install software on their server.
[back to top]
 
Htdig
An early type of Internet search engine.
[back to top]
 
HTML
See Main Definition:  Hypertext Markup Language
[back to top]
 
HTML form
See Main Definition:  web page form
[back to top]
 
html scraper
See Main Definition:  web page scraper
[back to top]
 
HTTP
See Main Definition:  HyperText Transport Protocol
[back to top]
 
hybrid search
Synonyms:  fielded search, filtered search Related Terms:  taxonomy, parametric search, faceted search, scope of search A search that includes both fulltext and traditional database search criteria. For example, a tech support person could look for "installation errors" (full-text) within a particular product line (more like a traditional database field search). By adding the criterion "product='accounting software'", the tech support person gets a more targeted scope of search, and is more likely to find the installation error they were looking for. As another example, an analyst might search for "depreciation allowance" (the full-text) within a particular jurisdiction (a traditional database-like field). By adding the criterion "state='FL'", the analyst gets a more targeted scope of search, and is more likely to find relevant documents.
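The tech support example can be sketched as two tests applied to each document: an exact, database-style field comparison and a full-text word match. This in-memory version, with made-up documents and a hypothetical `hybrid_search` function, only illustrates the idea; a real engine evaluates both parts against its indices.

```python
docs = [
    {"id": 1, "product": "accounting software", "body": "Installation errors after upgrade"},
    {"id": 2, "product": "payroll software", "body": "Installation errors on Windows"},
    {"id": 3, "product": "accounting software", "body": "Report formatting tips"},
]


def tokens(text):
    """Very simple tokenizer: lower-case and split on whitespace."""
    return set(text.lower().split())


def hybrid_search(docs, fulltext_query, **field_filters):
    """Full-text match on body words, restricted by exact field (database-style) filters."""
    wanted = tokens(fulltext_query)
    return [
        d["id"] for d in docs
        if all(d.get(field) == value for field, value in field_filters.items())  # fielded part
        and wanted <= tokens(d["body"])                                       # full-text part
    ]


print(hybrid_search(docs, "installation errors", product="accounting software"))  # [1]
```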
[back to top]
 
Hypertext Markup Language
Synonyms:  HTML Related Terms:  web page, World Wide Web, XML The most common format used to create web pages on the World Wide Web. HTML looks somewhat similar to XML. Document files in this format often have an extension of .html or .htm.
[back to top]
 
HyperText Transport Protocol
Synonyms:  HTTP A network protocol used by web browsers to talk to web servers. The official name of the protocol is Hypertext Transfer Protocol.
[back to top]
 
HyperV
Related Terms:  virtualization Microsoft's hypervisor-based virtualization platform, officially spelled Hyper-V.
[back to top]
 
  I   [back to top]
 
IBM OmniFind
Related Terms:  search engine vendors A search engine offered by IBM
[back to top]
 
IBM OmniFind
A fulltext search engine sold by IBM under various specific product names. OmniFind is often now bundled with iPhrase, a technology that IBM acquired.
[back to top]
 
IDE
Related Terms:  disk storage An older, lower end standard for the internal electrical connection of hard drives inside a computer. IDE is being replaced by SATA. Higher end machines use SCSI, Fibre Channel, RAID or SAN. Since search makes heavy use of disk storage, the slower IDE and SATA connections may not perform as well as a higher end standard, although the performance penalty may be acceptable for development systems.
[back to top]
 
IDF
See Main Definition:  inverse document frequency
[back to top]
 
IDOL
See Main Definition:  Autonomy IDOL
[back to top]
 
Implicit Ranking
Related Terms:  explicit ranking In contrast to explicit ranking, which requires a user to deliberately provide feedback, such as search terms, that can assist with subsequent searches, implicit ranking passively observes which documents a user finds helpful and uses that behavior to rank results. For example, when a user performs a search and clicks to view a document in a result list, the user is indicating that the selected document is relevant for the query terms used. This information can be fed back into the search application without any further action by the user, and can be as useful as explicitly provided information. An undesired side effect can be the favoring of a poor document that looks promising based on its title.
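One simple way to feed clicks back into ranking is to blend the engine's text score with a click count recorded per query and document. The sketch below is an assumption-laden illustration: the `weight` parameter and the logarithmic damping (which limits how far a popular but poor document can be pushed up) are hypothetical choices, not a specific product's formula.

```python
import math
from collections import defaultdict

# (query, doc_id) -> number of times users clicked that doc for that query
click_counts = defaultdict(int)


def record_click(query, doc_id):
    click_counts[(query.lower(), doc_id)] += 1


def rerank(query, scored_hits, weight=0.1):
    """Blend the engine's text score with a damped click signal."""
    def blended(hit):
        doc_id, text_score = hit
        clicks = click_counts.get((query.lower(), doc_id), 0)
        return text_score + weight * math.log1p(clicks)
    return sorted(scored_hits, key=blended, reverse=True)


hits = [("doc-a", 0.80), ("doc-b", 0.75)]
for _ in range(5):
    record_click("vpn setup", "doc-b")
print(rerank("vpn setup", hits))  # doc-b now outranks doc-a for this query
```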
[back to top]
 
import
Related Terms:  relational database, indexer, spider, export In traditional databases, data needed to be imported into the database system; when moving data from one system to another, data would be exported from the source system, and then imported into the destination system. Most full-text search engines do not offer robust import and export capabilities; some do offer import-only tools. Instead, search engines use the process of "indexing" or "spidering" to index the documents, and the original source documents are left where they were. Some third party vendors do offer limited import and export tools to move data between search engines.
[back to top]
 
incremental spidering
Related Terms:  spider A method of spidering a web site that attempts to only download pages that are new or have changed. Over time, incremental spiders create a database of individual page URLs and track how often they change; they use this data to guess which pages need to be refetched and when. This form of spidering may delay the reindexing of pages that have recently changed but have historically been static. This method of spidering may also allow for "orphaned pages".
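A common scheduling heuristic is to shorten a page's revisit interval when it changes and lengthen it when it does not. This sketch detects change with a SHA-256 content hash; the halving/doubling rule, the interval bounds and the `PageRecord` structure are all hypothetical choices for illustration.

```python
import hashlib
from dataclasses import dataclass

MIN_HOURS, MAX_HOURS = 1.0, 24.0 * 30  # hypothetical bounds on the revisit interval


@dataclass
class PageRecord:
    url: str
    content_hash: str = ""
    revisit_hours: float = 24.0  # hypothetical starting interval


def update_schedule(record: PageRecord, fetched_body: bytes) -> bool:
    """Compare a fresh fetch with the last one and adjust how soon to revisit.

    Returns True if the page changed and should be reindexed."""
    new_hash = hashlib.sha256(fetched_body).hexdigest()
    changed = new_hash != record.content_hash
    if changed:
        # Active page: check back sooner.
        record.revisit_hours = max(MIN_HOURS, record.revisit_hours / 2)
    else:
        # Static-looking page: back off. This is why a long-static page
        # that suddenly changes may be reindexed late.
        record.revisit_hours = min(MAX_HOURS, record.revisit_hours * 2)
    record.content_hash = new_hash
    return changed


page = PageRecord("https://example.com/news")
print(update_schedule(page, b"v1"), page.revisit_hours)  # first fetch counts as a change
print(update_schedule(page, b"v1"), page.revisit_hours)  # unchanged, interval grows
```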
[back to top]
 
index (noun)
Synonyms:  collection, database index, word index, search indices, word inversion, binary indices Related Terms:  indexer Typically refers to a set of large binary data files stored on a disk.
[back to top]
 
index (verb)
Related Terms:  index (noun), indexer, spider, search indices, word index, database gateway The tabulating and storing of data into the binary indices. The term has substantial technical differences when applied to search engines vs. traditional relational databases.
[back to top]
 
indexer
Synonyms:  document indexer Related Terms:  search indices, spider, import/export, index (noun), index (verb) Before a search engine can quickly search through documents, it must first create search indices that list every word in every document, along with information about each document's Meta Data. The program that performs this task is often referred to as an indexer, and the task it performs is the indexing. Usage: "indexer" is an older term, and is typically used when the process of indexing will be fairly simple and can be run from the command line; for more complicated web crawling the term "spider" is preferred.
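At its core, an indexer builds an inverted index: a map from each word to the documents that contain it, so a query can be answered without rescanning the text. This in-memory sketch omits the metadata, positions and binary on-disk format a real indexer would store; the function names are hypothetical.

```python
import re
from collections import defaultdict


def build_index(documents):
    """Map each lower-cased word to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return IDs of documents containing every query word (AND semantics)."""
    words = re.findall(r"\w+", query.lower())
    if not words:
        return set()
    return set.intersection(*(index.get(w, set()) for w in words))


docs = {1: "Quarterly sales report", 2: "Sales team contacts", 3: "Travel report"}
idx = build_index(docs)
print(sorted(search(idx, "sales report")))  # [1]
```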
[back to top]
 
Indexing
See Main Definition:  indexer
[back to top]
 
indexing pipeline
See Main Definition:  document pipeline
[back to top]
 
indigenous search engine
Synonyms:  native search engine Related Terms:  embedded search engine, data silo The built in search engine inside of a system, such as the built in search capability in a document management system, data silo or search appliance.
[back to top]
 
inference
A statistical method used by search engines to find relevant documents even if they do not contain the words in the user's query. This technology is not always accurate.
[back to top]
 
Infoseek
See Main Definition:  Autonomy Ultraseek Infoseek was the original creator of Ultraseek. They were also an early web search engine.
[back to top]
 
Inktomi
See Main Definition:  Autonomy Ultraseek Usage: Inktomi briefly owned Ultraseek. They bought it from Infoseek and then eventually sold it to Verity.
[back to top]
 
Intellectual Property
Synonyms:  IP (business) Technology, patents, trade secrets and other materials that a company owns. This knowledge can be quite valuable and is not shared with competitors. When a company is sold or liquidated, this intellectual property is treated as an (intangible) asset, and affects the price and/or value of the company. Search engine companies are often very proud of their IP and believe that it allows them to provide much better search results than their competitors.
[back to top]
 
interactive results list
See Main Definition:  results list navigation
[back to top]
 
interactive search
See Main Definition:  results list navigation
[back to top]
 
Internet
Synonyms:  "The Net" Related Terms:  World Wide Web (vs. Intranet and Extranet) The global public computer network. Usage: Sometimes people are actually referring to the World Wide Web, which is a subset of the entire Internet. Examples of Internet services that are outside the scope of the World Wide Web include email, ftp, instant messaging, file sharing, etc. The Internet predates the World Wide Web by many years. Indexing and searching the entire public Internet is very different from handling Enterprise data on a private Intranet or Extranet.
[back to top]
 
Internet Explorer
Related Terms:  Firefox, Mozilla The very popular web browser that ships on all Microsoft Windows machines.
[back to top]
 
Internet Protocol
Synonyms:  IP (technical) IP stands for Internet Protocol, and is generally used in conjunction with other terms or abbreviations, such as in TCP/IP or IP address. Search engines make heavy use of TCP/IP and related protocols.
[back to top]
 
Internet query syntax
See Main Definition:  Internet search syntax
[back to top]
 
internet search engine
See Main Definition:  web search engine
[back to top]
 
Internet search syntax
Synonyms:  Internet query syntax Related Terms:  Google, Verity Ultraseek, web search engine, SQL An informal set of syntax rules for expressing advanced searches in modern search engines. The most common attribute is the use of a plus sign ("+") to mean that a term is required, and a hyphen or minus sign ("-") to exclude a word from the search. Unlike the SQL used in relational databases, search engines do not have a universally accepted cross vendor syntax. Internet search syntax also often recognizes quotation marks to mark exact phrases, and parentheses to convey precedence.
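Because there is no universal standard, each engine parses this syntax its own way. The sketch below handles only the three most common conventions (leading "+", leading "-", and quoted phrases) with one regular expression; parentheses and precedence are deliberately not handled, and the output structure is a hypothetical choice.

```python
import re

# Optional +/- prefix, then either a "quoted phrase" or a run of non-space characters.
TOKEN = re.compile(r'([+-]?)(?:"([^"]+)"|(\S+))')


def parse_query(query):
    """Split an Internet-style query into required, excluded and optional terms."""
    required, excluded, optional = [], [], []
    for prefix, phrase, word in TOKEN.findall(query):
        term = phrase or word
        if prefix == "+":
            required.append(term)
        elif prefix == "-":
            excluded.append(term)
        else:
            optional.append(term)
    return {"required": required, "excluded": excluded, "optional": optional}


print(parse_query('+"annual report" -draft budget'))
```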
[back to top]
 
Intranet
Synonyms:  corporate network, enterprise network, secure network, private network, closed network Related Terms:  firewall, enterprise search engine The secured network connecting all the computers of a particular company or institution. Intranets are usually shielded from the public Internet via a device called a firewall.
[back to top]
 
inverse document frequency
Synonyms:  IDF Related Terms:  document frequency A popular mathematical technique used to calculate a document's relevancy to a particular search term. A term that appears in FEWER documents is assumed to be more important than a common word appearing in many documents.
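The classic unsmoothed form of the calculation is idf(t) = log(N / df(t)), where N is the number of documents and df(t) is how many of them contain the term. This sketch returns 0 for terms that appear nowhere (an assumption); real engines commonly add smoothing and combine IDF with term frequency.

```python
import math


def inverse_document_frequency(term, documents):
    """idf(t) = log(N / df(t)); rarer terms get larger weights."""
    n_docs = len(documents)
    df = sum(1 for doc in documents if term in doc.lower().split())
    return math.log(n_docs / df) if df else 0.0


docs = ["the merger agreement", "the quarterly report",
        "the merger press release", "the holiday schedule"]
for term in ("the", "merger", "holiday"):
    print(term, round(inverse_document_frequency(term, docs), 3))
```

"the" appears in every document and so gets a weight of 0, while "holiday" appears in only one and gets the largest weight.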
[back to top]
 
IP (business)
See Main Definition:  Intellectual Property
[back to top]
 
IP (technical)
See Main Definition:  Internet Protocol Related Terms:  TCP/IP
[back to top]
 
IP address
See Main Definition:  Network Address Related Terms:  TCP/IP Computers connected to a network each have a unique address, so that data can be sent to them. A common type of address for computers on a TCP/IP network is their IP address.
[back to top]
 
iPhrase
Related Terms:  OmniFind, interactive search Originally a privately held search technology company, which has since been acquired by IBM and is now part of the OmniFind product line. iPhrase uses advanced techniques for recognizing and analyzing common phrases and abbreviations, and presenting them to users in an innovative way, which facilitates interactive search.
[back to top]
 
ISO-8859-1
Synonyms:  Latin 1, 8859, 8859-1, ISO Latin 1 Related Terms:  codepage, Unicode A semi-modern numerical system for representing written characters for many of the languages spoken in Europe and North America. Although Unicode and UTF-8 are becoming much more common, the 8859 codepage is still used in many countries and is very common on web pages. Search engines routinely handle this type of text.
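A search engine has to know which codepage a document uses before it can index the words correctly. This short sketch uses Python's built-in codecs to show that ISO-8859-1 stores each of these characters in one byte, UTF-8 needs two bytes for the accented letters, and decoding with the wrong codepage garbles the text.

```python
text = "Café Müller"

latin1 = text.encode("iso-8859-1")   # one byte per character
utf8 = text.encode("utf-8")          # accented letters take two bytes each

print(len(text), len(latin1), len(utf8))  # 11 11 13

# Decoding UTF-8 bytes as if they were Latin 1 produces garbled text.
print(utf8.decode("iso-8859-1"))          # CafÃ© MÃ¼ller
```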
[back to top]
 
IT
See Main Definition:  IT Department
[back to top]
 
IT Department
Related Terms:  RFI, RFP The group inside a company that is responsible for maintaining the computer hardware and software, typically staffed with highly technical people. IT may not be the initial part of the company to inquire about search technology, but they will likely be involved at some point. They will ask technical questions of potential vendors, and can express concerns to other managers. If the IT department's technical questions are not answered to their satisfaction, in many companies they can veto a software purchase.
[back to top]
 
Iterative
See Main Definition:  iterative search
[back to top]
 
iterative search
See Main Definition:  results list navigation Generally allowing the user to repeatedly adjust their search to find what they are looking for. Normally this is done with results list navigators. When applied to search, the process of providing additional search terms to refine and improve the search results. This can also apply to Guided Navigation, where the search encourages the user to explore a result set based on relevant aspects of the documents.
[back to top]
 
  J   [back to top]
 
Java Database Connectivity
Synonyms:  JDBC An application programming interface (API) for accessing data stored in a database, part of the Java platform. Many search engines use JDBC to fetch text from a database that is to be indexed and searched. JDBC and ODBC have similar names and both are used to access database records, but they are different interfaces, with ODBC being the older of the two.
[back to top]
 
Java Server Pages
Synonyms:  JSP Related Terms:  web server, ASP ("Active Server Pages") A Sun technology for building interactive web sites. JSP allows a programmer to easily embed computer programs inside of web pages; the computer programs are written in Sun's Java programming language. Though they have similar names, Microsoft ASP and Sun JSP are generally not compatible with each other.
[back to top]
 
JavaScript
A programming language for adding interactivity to web sites. Many search engine spiders do not understand JavaScript; if a site requires JavaScript for navigation, many spiders will not be able to index the site.
[back to top]
 
JDBC
See Main Definition:  Java Database Connectivity
[back to top]
 
join
Related Terms:  relational database, table, SQL In traditional databases, a "join" is a SQL query that pulls records from multiple tables and connects the records via common fields. For example, a join between the employee table and the department table could show the names of each department, and the names of each employee in that department. Full-text engines do not usually do "joins" at search time; if database data of that sort is to be searched, it would be joined at index time, not search time.
[back to top]
 
JSP
See Main Definition:  Java Server Pages
[back to top]
 
  K   [back to top]
 
K2
See Main Definition:  Autonomy K2
[back to top]
 
k2 spider
Related Terms:  Autonomy K2 A brokered spider used to build K2 collections.
[back to top]
 
KeyView
Synonyms:  KeyView filters, Key View Related Terms:  Autonomy, Verity A set of filters used to interpret various document types including Microsoft Word, Excel and PowerPoint. KeyView was bought by Verity, and is now owned by Autonomy.
[back to top]
 
Know Your Customer
Related Terms:  BSA, Bank Secrecy Act, AML, Anti Money Laundering, KYC, Know Your Customer In order to spot suspicious financial activity, financial institutions try to understand the normal patterns and habits of their customers. In a broader sense, Know Your Customer could also include data mining to enhance customer loyalty. Search engines and search activity can help businesses and agencies understand their customers' normal behavior.
[back to top]
 
knowledge worker
A white collar employee who works primarily with information, such as legal or medical documents, technical data, financial data, etc. These users often make heavy use of search engines to do their jobs, and benefit greatly from improvements to their search engine. They are often search power users.
[back to top]
 
KYC
See Main Definition:  Know Your Customer
[back to top]
 
  L   [back to top]
 
late binding security
A less efficient method of providing document level security. A user's search is submitted directly to the search engine, so that the search engine returns all matching documents regardless of whether the user can see them or not. Then every single document is checked against the security system that controls access to documents, to verify whether the user can see it or not. Only allowed documents are then displayed to the user. This method is easier to implement than early-binding security, but puts a heavy load on the corporate security system since every single document in every results list needs to be checked. Further, if the user only has access to a small percentage of total documents, the system may need to screen hundreds or thousands of documents just to find 10 documents that the user is allowed to see on one page of results.
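The cost described above is easy to see in a post-filtering loop. In this sketch `can_view(user, doc_id)` is a hypothetical stand-in for a round trip to the corporate security system; the loop keeps checking hits until one page is full.

```python
def late_binding_page(query_hits, user, can_view, page_size=10):
    """Post-filter raw search hits one at a time until a full page is collected.

    Returns the visible documents and how many security checks were needed."""
    page, checks = [], 0
    for doc_id in query_hits:
        checks += 1
        if can_view(user, doc_id):
            page.append(doc_id)
            if len(page) == page_size:
                break
    return page, checks


# Hypothetical user who may see only 1 document in 50.
page, checks = late_binding_page(range(1, 1001), "alice", lambda user, d: d % 50 == 0)
print(len(page), "visible documents after", checks, "security checks")
```

Here ten visible results cost 500 checks, which is the load problem that early-binding security avoids by filtering inside the search engine.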
[back to top]
 
Latin 1
See Main Definition:  ISO-8859-1
[back to top]
 
LDAP
See Main Definition:  Lightweight Directory Access Protocol
[back to top]
 
legacy data
Related Terms:  content mining Information that is stored in a format that is not easy to work with using modern computer software. Previously "Legacy Data" often referred to reports and text that only existed in paper format, and so could not be accessed by a computer. More recently, enterprise data published on web pages, or in PDF and Word documents, has become difficult for modern software to access and process in an automated way.
[back to top]
 
legacy data
Related Terms:  XPump, PDF Important data that is stored in a format that cannot be easily indexed by search engines, or that presents other technical challenges. Paper documents are often cited as an example of legacy data. However, these days even content that is stored in some electronic formats such as HTML, Microsoft Word and PDF is difficult for some systems to access. Most search engines can index the words in these document formats, but may not understand the structure of the document. For example, a PDF file may contain a table of technical data for various products. But a simple search engine indexer will not be able to associate the specific numbers and terms with the correct product; such data takes manual intervention by a human to understand.
[back to top]
 
lemmatization
Related Terms:  stemming The process of dynamically expanding search terms to include other variants of each word, including synonyms. For example, a search for the word "car" might be expanded to include the additional words "cars" and "automobile". The advantage of this technique is that the list of expanded words can be changed at any time; a potential disadvantage is slower performance caused by the additional search terms. Lemmatization can be considered as a dynamic form of stemming.
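Query-time expansion of this kind can be sketched as rewriting each term into an OR-group of its variants. The expansion table and the AND/OR output syntax below are hypothetical; the point is that the table lives outside the index, so it can be edited without reindexing, while each added variant makes the query larger and slower.

```python
# Hypothetical expansion table; a real system would load this from a
# linguistic resource and could change it without reindexing.
EXPANSIONS = {
    "car": ["cars", "automobile", "automobiles"],
    "run": ["runs", "running", "ran"],
}


def expand_query(query):
    """Rewrite each search term as an OR-group of the term and its variants."""
    groups = []
    for term in query.lower().split():
        variants = [term] + EXPANSIONS.get(term, [])
        groups.append("(" + " OR ".join(variants) + ")" if len(variants) > 1 else term)
    return " AND ".join(groups)


print(expand_query("car rental"))
# (car OR cars OR automobile OR automobiles) AND rental
```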
[back to top]
 
Library Sciences
Related Terms:  taxonomy A field of study dealing with the organization of vast amounts of data. Prior to computers, this dealt with techniques for organizing books, papers, film and other objects. This field is now widely expanded and stresses the organization and searchability of electronic data, which is often accomplished with a search engine. Library sciences professionals bring a much more in-depth perspective to search engine technology than casual or business users do.
[back to top]
 
license
See Main Definition:  software license
[back to top]
 
license key
Related Terms:  software license, license restriction A special set of letters and numbers that unlock software and enable it to run, or enable it to perform additional functions. When a customer wants an additional feature, or more processing capacity, they pay the software vendor more money, and then they are given a new license key. License keys are simply an enforcement of a license agreement between the two parties, and some vendors do not bother with license keys.
[back to top]
 
license restriction
Related Terms:  software license, load balancing, failover, qos, qps Limits may be placed on how software is used, for example how many machines it can be run on, or how many queries per second it can handle, or how many documents it can index. These restrictions may be enforced with a license key.
[back to top]
 
Lightweight Directory Access Protocol
Synonyms:  LDAP An open standard for storing information about company resources such as people, machines and data. LDAP can be used as part of a system for doing document level security. LDAP is sometimes referred to as a competing standard to Microsoft's Active Directory. There are adapters which allow LDAP systems to interoperate with Active Directory systems.
[back to top]
 
Lightweight Publishing
Synonyms:  LWP Related Terms:  BI, Business Intelligence Systems that allow people to write things down in a quick or casual way, with the intention of other people seeing them. On the public Internet, well-known examples of LWP include blogs and wikis. In the enterprise, other systems could be considered LWP, including email, bug reports, search logs and Tech Support incident descriptions. Of course these systems are already useful for their primary intended tasks, but they can also provide BI / Business Intelligence to a company about what is happening. Search technology can be used to spot new terms and trends by looking at all this text in a coordinated manner.
[back to top]
 
like operator
Related Terms:  grep, database, SQL A part of the SQL syntax that is used to locate text within a database field. Many extremely simple search engines approximate this simplistic behavior, or actually use the LIKE operator. For example, the user search term "red" would be expanded to DESC LIKE '%red%', and would match the word "shredded", since the 3 letters 'r', 'e' and 'd' do appear in the middle of that word. But this is considered a bad match, since somebody searching for the color red is clearly not looking for information about shredding. Due to its very simplistic matching rules, the SQL LIKE operator is similar to the Unix grep utility, and is generally considered inferior to using a true fulltext search engine.
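The "shredded" problem can be reproduced in a few lines of Python. This is a sketch, not any database's implementation: `like_match` mimics `LIKE '%term%'` as a plain substring test (case-insensitive here for simplicity; real LIKE case rules depend on the database), while `word_match` tokenizes first, roughly as a fulltext engine would.

```python
import re

def like_match(text, term):
    """Mimic SQL  column LIKE '%term%': a plain substring test."""
    return term.lower() in text.lower()

def word_match(text, term):
    """Match the term only as a whole word, after splitting text into tokens."""
    words = re.findall(r"\w+", text.lower())
    return term.lower() in words

doc = "Add the shredded cheese"
print(like_match(doc, "red"))  # True  (false hit inside "shredded")
print(word_match(doc, "red"))  # False
```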
[back to top]
 
Linux
Related Terms:  Unix, OS, Operating System, RedHat An operating system based loosely on Unix. Many versions are now available, including RedHat. Many search engines will run on Linux and Windows.
[back to top]
 
load balancing
Related Terms:  failover A configuration of multiple computers, or entire sets of computers, to automatically share the work. With search engines, two different search servers could each handle half of the traffic. If one system fails, the other system could then handle 100% of the traffic, in a form of automatic failover. Load balancing can also be done between 3 or more systems, perhaps in separate locations. The search engine vendor may charge more money to have both systems running in a load balancing mode than if one of them is only used in a failover mode.
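A minimal Python sketch of round-robin load balancing with automatic failover, assuming hypothetical server names. Requests alternate between healthy servers; once a server is marked down, the remaining ones absorb all of the traffic. Real deployments usually do this in a hardware or software load balancer with health checks rather than in application code.

```python
import itertools

class LoadBalancer:
    """Round-robin over search servers, skipping any that are marked down."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.down = set()
        self._cycle = itertools.cycle(self.servers)

    def mark_down(self, server):
        self.down.add(server)

    def mark_up(self, server):
        self.down.discard(server)

    def next_server(self):
        # Try each server at most once per call so this never loops forever.
        for _ in range(len(self.servers)):
            server = next(self._cycle)
            if server not in self.down:
                return server
        raise RuntimeError("no search servers available")

lb = LoadBalancer(["search-a.example", "search-b.example"])  # hypothetical hosts
print([lb.next_server() for _ in range(4)])  # alternates a, b, a, b
lb.mark_down("search-b.example")
print([lb.next_server() for _ in range(2)])  # all traffic now goes to a
```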
[back to top]
 
local search engine
See Main Definition:  local site search engine
[back to top]
 
local site search engine
Synonyms:  local search engine Related Terms:  search engine, web site, enterprise search engine, hosted search engine A search engine that indexes and searches the content for a particular web site.
[back to top]
 
location aware search
See Main Definition:  geocoded search
[back to top]
 
location sensitive search
See Main Definition:  geocoded search
[back to top]
 
long tail
The large set of unusual searches that users submit to a search engine.
[back to top]
 
Lucene
Related Terms:  search engine, enterprise search engine, open source An excellent open source search engine written primarily in Java. Even for users of commercial search engine software, studying Lucene can provide a much more in-depth understanding of how modern search engines work. http://lucene.apache.org
[back to top]
 
LWP
See Main Definition:  Lightweight Publishing
[back to top]
 
Lynx
A character based (non graphical) web browser that was popular in the late 1990s. It was designed for people using Unix systems on character mode terminals that could not display graphics or use a regular web browser. It is also a good tool for testing that a site contains enough plain text that it can be used by a disabled person using a screen reader and also easily indexed by a spider.
[back to top]
 
  M   [back to top]
 
Magic Quadrants
See Main Definition:  Gartner Magic Quadrants
[back to top]
 
MB
See Main Definition:  Megabyte
[back to top]
 
Megabyte
Synonyms:  meg, megs, MB Related Terms:  GB, Gigabyte, TB, Terabyte A unit of measure indicating one million bytes of computer data. Technically a Meg is 1024 * 1024 (1,048,576) bytes. This can be used to represent the amount of memory in a computer, or the amount of storage on a disk drive, or the size of a document or file. Years ago (1980s, early 1990s) this was considered a large amount of computer data, but due to the increases in computer storage and power, it is now considered rather small.
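The gap between the decimal meaning (powers of 1000) and the binary meaning (powers of 1024) grows with each unit: about 5% for a megabyte, but about 12.6% for a petabyte. A short Python sketch to compute both; the IEC names for the binary units (mebibyte, MiB, and so on) are noted only for reference.

```python
def unit_sizes(power):
    """Return (decimal, binary) byte counts for a unit: 1000**power and 1024**power."""
    return 1000 ** power, 1024 ** power

# power 1 = kilobyte, 2 = megabyte (binary: mebibyte, MiB), ..., 5 = petabyte
for power, name in enumerate(["KB", "MB", "GB", "TB", "PB"], start=1):
    decimal, binary = unit_sizes(power)
    print(f"{name}: {decimal:,} bytes (decimal) vs {binary:,} bytes (binary)")
```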
[back to top]
 
megs
See Main Definition:  Megabyte
[back to top]
 
Meta Data
Synonyms:  fields, document fields, meta fields, attributes, document attributes, document meta data Extra attributes of documents, beyond just raw text. Examples include Title, Last Modified Date, Author, etc. In HTML content, often defined with the <meta> tag in the header of web pages.
[back to top]
 
Meta Language
Related Terms:  XPump A type of computer programming language that uses higher level commands than more traditional computer languages. Programmers are more productive because they can focus on the overall goals of a program, versus having to specify every little detail; the computer takes care of the smaller and more tedious tasks.
[back to top]
 
meta tagging
See Main Definition:  document tagging
[back to top]
 
Metadata
See Main Definition:  Meta Data
[back to top]
 
Microsoft Office SharePoint Server
Synonyms:  MOSS Microsoft SharePoint is a content management system and group productivity tool. It runs on a centralized server and includes basic fulltext search capabilities. Customers needing more advanced search functionality can purchase more powerful search engines that work with SharePoint.
[back to top]
 
Microsoft SharePoint
Synonyms:  SharePoint, Share Point Related Terms:  Content Management System A popular content management system sold by Microsoft.
[back to top]
 
mkvdk
Related Terms:  Autonomy K2 A command line indexing tool for the Autonomy K2 search engine (originally the Verity K2 engine).
[back to top]
 
Money Services Business
Synonyms:  MSB Related Terms:  AML, Anti Money Laundering A place where cash can be sent to another party. Example: Western Union.
[back to top]
 
MOSS
See Main Definition:  Microsoft Office SharePoint Server
[back to top]
 
Most Significant Bit or Byte
Synonyms:  MSB Computer data is stored as zeros and ones (bits), which are often grouped in sets of eight (a byte). The order of the bits (or bytes) is important, since the position of each one conveys its overall weight. Even in normal numbering systems, the position of a digit matters. When comparing the digit "2" in the numbers "21" and "12,345", it's clear the 2 stands for "20" in the former, and for "2,000" in the latter. Various algorithms sometimes give special meaning to the MSB.
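A small Python sketch of both meanings of MSB. `msb_index` finds the position of the most significant set bit of an integer, and the two `to_bytes` calls show the same 16-bit value stored most significant byte first ("big endian") and least significant byte first ("little endian").

```python
def msb_index(n):
    """Position (0-based, counting from the right) of the most significant set bit of n > 0."""
    if n <= 0:
        raise ValueError("n must be positive")
    return n.bit_length() - 1

value = 0x1234
print(msb_index(value))                  # 12
print(value.to_bytes(2, "big").hex())    # 1234  (most significant byte first)
print(value.to_bytes(2, "little").hex()) # 3412  (least significant byte first)
```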
[back to top]
 
Mozilla
An open source web browser that was derived from the Netscape code base.
[back to top]
 
MSB (business)
See Main Definition:  Money Services Business
[back to top]
 
MSB (technical)
See Main Definition:  Most Significant Bit or Byte
[back to top]
 
  N   [back to top]
 
NAS
See Main Definition:  Network Attached Storage Related Terms:  disk storage
[back to top]
 
native search engine
See Main Definition:  indigenous search engine
[back to top]
 
Natural Language Processing
Synonyms:  NLP Instead of treating text as simple numbers stored in a computer, NLP attempts to view the text at a higher level of understanding, nominally trying to mimic some aspect of human thought. Supporters believe that NLP can provide much better search results because the computer will "understand" the question and also understand the meaning of all the documents being searched. The term is used broadly to mean many different things. Some simpler aspects of NLP have become commonplace, such as understanding synonyms, recognizing plural and past tense forms of words, and understanding that "12/25/09" and "December 25th, 2009" both refer to the same date. Intermediate examples of NLP include being able to recognize nouns and the adjectives that relate to them, such as understanding the noun phrase "white house". Very elaborate forms of NLP, as envisioned by science fiction authors 40 years ago, are still rare in the search engine market, though almost all vendors support some forms of NLP.
[back to top]
 
navigators
Related Terms:  Search 2.0, drill-down, results list navigation, parametric search, faceted search, taxonomies A set of clickable links in a search result list that allow a user to drill down or otherwise modify their search without the need to type anything. There are many different technologies for generating navigators; to the end user they all look similar, but the methods used to generate them vary widely. The different types of navigators are confusing even to many inside the search industry.
[back to top]
 
NDA
See Main Definition:  Non-Disclosure Agreement
[back to top]
 
Netegrity
See Main Definition:  SiteMinder The company that created SiteMinder.
[back to top]
 
Netscape
Related Terms:  Mozilla, Firefox Originally a commercial web browser popular in the late 1990s. It was later reborn as the Mozilla web browser.
[back to top]
 
Network Attached Storage
Synonyms:  NAS Related Terms:  disk storage A type of disk storage that is accessed over a network and shared by multiple computers. Access to files is handled at a relatively high level, and NAS is generally considered less efficient than SAN, especially for high end search applications. Most vendors do not recommend using NAS.
[back to top]
 
New Idea Engineering, Inc.
Synonyms:  NIE Related Terms:  SearchTrack, XPump, DPump, search engine data quality Usage: Usually as the prefix "NIE" in front of a product or publication. This is our company, the company that maintains this glossary. NIE was founded in 1996 to focus on search engine related products and services that would enhance our clients' investments in the search engine software they had already purchased. Our flagship product is SearchTrack, a cross vendor search analytics and content promotion tool. We also do strategic consulting, Search Best Practices, search engine customization and integration, search engine checkups, and have tools to create highly customized spiders and content extraction systems.
[back to top]
 
NFS
See Main Definition:  Network File System Related Terms:  disk storage
[back to top]
 
n-gram
A type of search index that enables additional search engine functionality and/or faster search results. Varying length sequences of characters or words are parsed and recorded into the search engine index.
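A minimal Python sketch of character and word n-grams, plus a toy index that records which documents contain each character trigram. The document IDs and the choice of n = 3 are illustrative assumptions; real engines choose n and storage formats to suit their own features (for example, wildcard or substring matching).

```python
from collections import defaultdict

def char_ngrams(text, n):
    """All overlapping character sequences of length n in text."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

def word_ngrams(words, n):
    """All overlapping sequences of n consecutive words."""
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]

def build_ngram_index(docs, n=3):
    """Map each character n-gram to the set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for gram in char_ngrams(text.lower(), n):
            index[gram].add(doc_id)
    return index

print(char_ngrams("search", 3))  # ['sea', 'ear', 'arc', 'rch']
print(word_ngrams("new idea engineering".split(), 2))
index = build_ngram_index({1: "search", 2: "research"})
print(sorted(index["sea"]))      # [1, 2]
```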
[back to top]
 
NIE
See Main Definition:  New Idea Engineering, Inc.
[back to top]
 
NIE Enterprise Search Newsletter
Related Terms:  NIE, enterprise search A free newsletter published by New Idea Engineering that covers the field of enterprise search engine technology. Back issues at http://ideaeng.com/pub/entsrch/index.html, subscribe at http://www.ideaeng.com/subscribe/
[back to top]
 
NLP
See Main Definition:  Natural Language Processing
[back to top]
 
no-index tag
Synonyms:  no-index meta tag An HTML header tag that tells spiders to not index the content of that web page.
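The widely used form of this tag is `<meta name="robots" content="noindex">`. Below is a minimal Python sketch, using the standard library's `html.parser`, of how a spider might check for it before indexing a page; it ignores other signals such as robots.txt and HTTP headers.

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = {name: (value or "") for name, value in attrs}
        if attrs.get("name", "").lower() == "robots":
            for directive in attrs.get("content", "").split(","):
                self.directives.add(directive.strip().lower())

def should_index(html):
    """True unless the page carries a robots 'noindex' directive."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" not in parser.directives

print(should_index('<head><meta name="robots" content="noindex, follow"></head>'))  # False
```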
[back to top]
 
Non-Boolean operators
See Main Definition:  weighted operators
[back to top]
 
Non-Disclosure Agreement
Synonyms:  NDA Related Terms:  Intellectual Property A legal agreement between two parties to keep information confidential. For example, if two companies are discussing a future collaboration, they may want to tell each other about their upcoming product releases. However, they would not want that information leaked to the general public or their competitors.
[back to top]
 
Normal Business Hours
Synonyms:  9 to 5, 8 by 5, 5 by 8 Related Terms:  Seven by Twenty Four Access to Tech Support or other services is available only during the normal work week, understood to be Monday through Friday excluding local holidays, between the hours of 8 or 9am and 5, 6 or 7pm local time. The "9 to 5" version refers to 9am to 5pm, which constitutes 8 hours. The somewhat odd "5 by 8" version refers to 5 days a week for 8 hours each day, and is used as the complement of the much more popular "7 by 24". The numbers 5 and 8 coincidentally make sense both as clock times and as counts, but since the overall meaning (normal business hours) is the same either way, the distinction is unimportant.
[back to top]
 
normalize
Related Terms:  stemming Transform a piece of data, such as a word or date, into a common basic representation. For example, accept dates and times written in many different formats, and transform them into a format of YYYY-MM-DD hh:mm:ss. For words, reduce them to their common root, removing suffixes that indicate plural forms or past tense; this particular process is called "stemming".
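A minimal Python sketch of date normalization, assuming a hypothetical list of accepted input formats. Each format is tried in turn with `datetime.strptime`, and the first one that parses is rewritten as YYYY-MM-DD hh:mm:ss. Note that month and day names are matched in the process's locale (English by default).

```python
from datetime import datetime

# Hypothetical list of input formats this indexer accepts.
INPUT_FORMATS = [
    "%Y-%m-%d %H:%M:%S",   # 2009-12-25 15:30:00
    "%m/%d/%y",            # 12/25/09
    "%m/%d/%Y %I:%M %p",   # 12/25/2009 3:30 PM
    "%B %d, %Y",           # December 25, 2009
    "%d %b %Y",            # 25 Dec 2009
]

def normalize_date(text):
    """Convert a date string in any known format to YYYY-MM-DD hh:mm:ss."""
    for fmt in INPUT_FORMATS:
        try:
            parsed = datetime.strptime(text.strip(), fmt)
        except ValueError:
            continue  # not this format; try the next one
        return parsed.strftime("%Y-%m-%d %H:%M:%S")
    raise ValueError(f"unrecognized date: {text!r}")

print(normalize_date("12/25/09"))           # 2009-12-25 00:00:00
print(normalize_date("December 25, 2009"))  # 2009-12-25 00:00:00
```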
[back to top]
 
NOT Operator
Related Terms:  Boolean operators NOT is the Boolean operation that requires that a specified term not be present in order to retrieve a document. It can be used as a single operation: "NOT tax" returns all documents that do not contain the word "tax". It is more often used with another term, specifying that one term must be present and the other must not: "federal NOT state" returns all documents containing the word "federal" that do not include the word "state".
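With an inverted index (term to set of document IDs), both forms of NOT reduce to set difference. A toy Python sketch with made-up data; note that the stand-alone form needs the set of all documents, which is one plausible reason some engines restrict it.

```python
# Toy inverted index: term -> set of document IDs (hypothetical data).
INDEX = {
    "federal": {1, 2, 3},
    "state": {2, 4},
    "tax": {1, 4},
}
ALL_DOCS = {1, 2, 3, 4, 5}

def docs_with(term):
    return INDEX.get(term, set())

def not_(term):
    """NOT term: every document that lacks the term."""
    return ALL_DOCS - docs_with(term)

def and_not(required, excluded):
    """required NOT excluded: documents with the first term but not the second."""
    return docs_with(required) - docs_with(excluded)

print(sorted(not_("tax")))                  # [2, 3, 5]
print(sorted(and_not("federal", "state")))  # [1, 3]
```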
[back to top]
 
noun phrase extraction
It is generally believed that nouns convey a majority of the relevance for a document, and therefore some search engines have special logic to recognize and handle nouns. A more recent trend is to capture not only the noun in a sentence, but also any leading adjectives as well.
[back to top]
 
NTFS
Related Terms:  ACL, security The filesystem used on Microsoft servers and some workstations. It supports full ACLs.
[back to top]
 
NTLM
Related Terms:  security A system of user login verification on Microsoft Windows driven networks.
[back to top]
 
  O   [back to top]
 
OCR
See Main Definition:  optical character recognition
[back to top]
 
ODBC
Related Terms:  HTTP, CGI A standard for connecting to relational databases from various vendors; there is no similar widely accepted standard for search engines. Most modern search engines do offer access via the HTTP protocol's CGI mechanism, though the specific field names to use vary widely from vendor to vendor.
[back to top]
 
ODBC
See Main Definition:  Open Database Connectivity
[back to top]
 
OmniFind
See Main Definition:  IBM OmniFind
[back to top]
 
ontology
Related Terms:  content based taxonomy Organizing data in a logical way, into different categories and subcategories. The emphasis is usually on the data that is currently present in the system (vs. trying to anticipate all possible future documents and subjects).
[back to top]
 
Open Database Connectivity
Synonyms:  ODBC A protocol used to access data stored in a database. Some search engines still use ODBC to fetch text from a database that is to be indexed and searched. ODBC and JDBC have similar names and both are used to access database records, but they are very different protocols, with ODBC being the older of the two.
[back to top]
 
Open Office
An open source set of applications used for creating word processing documents, spreadsheets and slide presentations. The file formats are based on a compressed version of XML and are easy for programmers to interpret.
[back to top]
 
open source
Synonyms:  open source software Related Terms:  Lucene Software whose source code is freely available, usually at no cost, though typically subject to the copyright conditions of its license. Good resources include http://sourceforge.net and http://apache.org
[back to top]
 
Operating System
Synonyms:  OS Related Terms:  Linux, Unix, Windows The base software on a computer that coordinates other tasks on the computer. Examples include Windows and Unix.
[back to top]
 
optical character recognition
Synonyms:  OCR Related Terms:  PDF, raster image Software that converts an image into text, by recognizing the specific words and characters in the image. Search engines usually only handle text, so legacy data such as paper documents need to be scanned and then OCR'd before they can be indexed and searched. PDF documents can contain both a raster image of a scanned page and the text that was recognized.
[back to top]
 
OR Operator
Related Terms:  Boolean operators OR is the Boolean operator that requires that at least one of the two specified words be present: "california OR nevada" will return documents with either the word California or the word Nevada present. Some search technologies will fail to return a document if both words are present, but most interpret the OR operator as 'either or both terms meet the requirement'.
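The two interpretations above correspond to set union (inclusive OR, the usual behavior) and symmetric difference (exclusive OR). A toy Python sketch with a hypothetical index:

```python
# Toy inverted index: term -> set of document IDs (hypothetical data).
INDEX = {"california": {1, 2}, "nevada": {2, 3}}

def or_(a, b):
    """Inclusive OR: documents containing either term or both."""
    return INDEX.get(a, set()) | INDEX.get(b, set())

def xor(a, b):
    """Exclusive OR: documents containing exactly one of the terms."""
    return INDEX.get(a, set()) ^ INDEX.get(b, set())

print(sorted(or_("california", "nevada")))  # [1, 2, 3]
print(sorted(xor("california", "nevada")))  # [1, 3]  (doc 2 has both, so it is dropped)
```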
[back to top]
 
ordered set
Related Terms:  set theory When measuring the performance of a search engine, the actual order of documents that are returned should be taken into account, not just the absolute number of items returned. This is a more useful definition than the simple "precision" measurement some academics use.
[back to top]
 
organic results
The default results and relevance returned by a search engine. On newer systems, rules can be defined to override the default behavior for specific searches.
[back to top]
 
Organic Search Results
The default results order and relevance returned by a search engine. On newer systems, rules can be defined to override the default behavior for specific searches.
[back to top]
 
orphaned links
See Main Definition:  orphaned pages
[back to top]
 
orphaned pages
Synonyms:  orphaned links Related Terms:  incremental spider, spider Pages on a web site that are no longer linked to by any other pages but that still exist on the web server; users (or spiders) who previously saved the URL can still revisit the page. However, a new visitor to the site, or a new web spider, would not be able to find these pages, and the spider would not index them. The safest assumption for a spider to make is that, if a page is no longer linked to, the webmaster had some reason for "removing" the content from the site; a polite spider will therefore no longer index those pages, and will delete any previous versions from its index. Pages can be orphaned by accident, for example if the site's navigation menus have recently changed, but this is generally not a safe assumption for a spider to make. Incremental mode spiders may incorrectly leave orphaned pages in their search indices, whereas a batch mode spider that is rerun would typically not include those pages.
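A minimal Python sketch of how orphaned pages could be detected: crawl the site's link graph breadth-first from the home page, then compare against the URLs already in the index. The link graph and URLs are hypothetical; a real spider would fetch and parse each page to discover its links.

```python
from collections import deque

def reachable(links, start):
    """Breadth-first crawl from start; links maps each URL to the URLs it links to."""
    seen = {start}
    queue = deque([start])
    while queue:
        url = queue.popleft()
        for target in links.get(url, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen

def orphaned(indexed_urls, links, start):
    """URLs still in the index that a fresh crawl from start can no longer reach."""
    return set(indexed_urls) - reachable(links, start)

links = {"/": ["/products", "/about"], "/products": ["/products/a"]}
indexed = {"/", "/products", "/products/a", "/about", "/old-promo"}
print(orphaned(indexed, links, "/"))  # {'/old-promo'}
```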
[back to top]
 
OS
See Main Definition:  Operating System
[back to top]
 
overlay Meta Data
See Main Definition:  external Meta Data Related Terms:  Meta Data Usage: Usually indicates Meta Data that is brought in from another data source, such as a database.
[back to top]
 
  P   [back to top]
 
page
See Main Definition:  web page Related Terms:  url, document
[back to top]
 
paid listing
See Main Definition:  sponsored result
[back to top]
 
parametric search
Related Terms:  faceted search, hybrid search, scope of search, taxonomy, Verity, Endeca, FAST Search, Stratify An extension to hybrid searching, allowing for fulltext and fielded searches to be combined. Parametric search is more interactive: it proactively suggests fielded searches that could be combined with the fulltext search to further refine or expand the scope of search. For example, somebody shopping for cars might have included the term "mileage"; the search engine would return the results, but also show a side bar that offered "mileage" information for SUVs, sedans, light trucks, etc. Next to each of these additional offerings, the number of matches within that subcategory would be displayed. Clicking on any of those choices would rerun the search, but limited in scope to that area. See http://ideaeng.com/pub/entsrch/v2n6/article03.html Parametric Search is similar to Guided or Faceted Navigation. It is more interactive than traditional search, in that it provides hints to the user as to what other elements of a search result may be helpful. It was one of the first solutions to the problem caused by users completing advanced search forms in such a way that no results met the requirements. Consider a property rental form: a user specifies a number of bedrooms, baths, amenities and a price, and finds that no properties are available, with no indication of which parameter would need to change in order to see listings. Because parametric search leads the user through the options, it never offers an option that results in no hits.
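The "never offers an option with no hits" property falls out naturally if the suggested options are computed from the current result set. A minimal Python sketch, assuming results are dictionaries with a hypothetical `body_type` field; commercial engines compute these counts inside the index for speed.

```python
from collections import Counter

def facet_counts(results, field):
    """Count matching documents per value of a field; values with zero hits never appear."""
    return Counter(doc[field] for doc in results if field in doc)

def refine(results, field, value):
    """Rerun the search limited to one facet value."""
    return [doc for doc in results if doc.get(field) == value]

# Hypothetical fulltext results for the query "mileage".
results = [
    {"title": "Sedan A mileage report", "body_type": "sedan"},
    {"title": "SUV B mileage test", "body_type": "SUV"},
    {"title": "Sedan C mileage ratings", "body_type": "sedan"},
]
for value, count in facet_counts(results, "body_type").most_common():
    print(f"{value} ({count})")  # sedan (2), then SUV (1)
```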
[back to top]
 
partition
See Main Definition:  segment Usage: Usually associated with Verity's vocabulary.
[back to top]
 
PDF
See Main Definition:  Portable Document Format
[back to top]
 
PDF filter
Related Terms:  Portable Document Format A filter that can read Adobe PDF files and extract the textual content and then feed it into a search engine indexer.
[back to top]
 
peak QPS
Related Terms:  qps, average qps The maximum number of queries done in any one second, regardless of the overall average usage. For many businesses peak search activity happens during the workday, usually in mid-morning and early afternoon, and may be 20 times higher than at 3am on a Sunday morning, for example.
[back to top]
 
Perpetual License
Related Terms:  term license, software license, annual maintenance A software license that does not expire. In other words, a company buys a piece of software and has a legal right to use it forever. This is the most common type of software license and is what most people are used to. There may be an annual maintenance charge that must be paid to get software updates, bug fixes and technical support.
[back to top]
 
Petabyte
A unit of measure indicating one quadrillion (US) bytes of computer data (AKA one thousand million million). Note that the term "quadrillion" has different meanings in different parts of the world. Technically a Petabyte is 1024 * 1024 * 1024 * 1024 * 1024, or 1024^5, bytes. This can be used to represent the amount of memory in a computer, or the amount of storage on a disk drive, or the size of a document or file. In the late 2000s this is still considered to be a very large amount of computer data, so much so that it is more of a theoretical milestone. However, due to the increases in computer storage and power, it is now likely achieved in extremely high end computer facilities, though this may not be publicly acknowledged. It would be expensive to fully index and search this amount of data with a search engine. Such a large amount of data would likely include a mix of non-textual content such as sound, photos and video, or archived database transactions. However, having this much data and no search capability would make finding anything an almost impossible challenge. Petabyte search engines are very likely to be running in labs and agencies. It is unclear where you would encounter Petabytes of strictly textual information.
[back to top]
 
pipeline
Related Terms:  document pipeline Generally a set of cooperating processes organized in a linear fashion, where each phase of the pipeline takes some action on the data passing through it. Many search engines use a document indexing pipeline.
[back to top]
 
PK
See Main Definition:  Primary Key
[back to top]
 
port
Related Terms:  TCP/IP, socket, IP address A numerical channel number used to organize multiple TCP/IP sockets connected to the same machine. Computers have many types of simultaneous connections to other systems. Each type of connection has a customary port number assigned to it. For example, web pages are usually sent over port 80. Search engines make heavy use of sockets and ports. Some of the ports an engine uses might be standard, while other port numbers are non-standard and used only by that particular brand of search engine.
[back to top]
 
Portable Document Format
Synonyms:  PDF An electronic representation of a document that preserves its precise visual layout. The PDF format was created by the software company Adobe and has been an open ISO standard (ISO 32000) since 2008. Many important corporate documents are now stored in this format, so it is important that search engines be able to index PDF files.
[back to top]
 
portal
See Main Definition:  portal site
[back to top]
 
portal site
Synonyms:  portal, enterprise portal, web portal Related Terms:  web search engine, enterprise portal, portlet Portals are web sites with organized links to other web sites; portals usually also have a search engine that can search through all of the sites they link to. Portals are used as starting places for reaching other sites' content. Portals on the public Internet are referred to as web portals, such as Yahoo and MSN. However, large companies also often have private portals inside their corporate networks that only their employees can access. These corporate portals have useful links to important resources within the company, such as Human Resources and IT department content, as well as a search box for searching the company's internal data. The main difference between a portal and a plain search engine is that the portal also presents the links in an organized hierarchical manner that can be conveniently browsed, and may also include other editorial content.
[back to top]
 
portlet
Related Terms:  enterprise portal, CGI A way of packaging small software components so they can easily be used and reused as part of an enterprise portal. Some companies package their search engine as a portlet, so that owners of different parts of the enterprise portal can easily add search to their section. In some ways a portlet acts as a CGI.
[back to top]
 
Power Point
Synonyms:  PPT Related Terms:  KeyView A popular slide presentation format created by Microsoft. PPT slides are often indexed by enterprise search engine spiders.
[back to top]
 
power users
Related Terms:  advanced search A person who uses search engines frequently and is very proficient at it.
[back to top]
 
precision
Related Terms:  recall, ranking, set theory The percentage of matching documents that a user would believe to be relevant compared to the total number of documents retrieved. A precision of 20% means that only 1/5th of the documents returned were really of interest to the user, whereas the other 80% had only spurious or irrelevant words that happened to match one of the search terms. This term is often used by academics when discussing search engine performance. More modern benchmarks consider the ranking or ordering of the documents when evaluating relevancy; even a low precision (lots of extraneous documents) may be acceptable to the user, if the few relevant documents are shown at the top of the results list.
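A short Python sketch of the 20% example above, and of why ranking matters: the same ten results have a precision of only 0.2 overall, but if the two relevant documents are ranked first, precision over the top 2 results is 1.0. Document IDs are placeholders.

```python
def precision(retrieved, relevant):
    """Fraction of retrieved documents that are relevant."""
    if not retrieved:
        return 0.0
    return sum(1 for doc in retrieved if doc in relevant) / len(retrieved)

def precision_at_k(ranked, relevant, k):
    """Precision computed over only the top k results of a ranked list."""
    return precision(ranked[:k], relevant)

ranked = ["d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8", "d9", "d10"]
relevant = {"d1", "d2"}
print(precision(ranked, relevant))          # 0.2
print(precision_at_k(ranked, relevant, 2))  # 1.0
```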
[back to top]
 
precision vs. recall
Related Terms:  precision, recall, ranking, ordered set, set theory, single shot relevancy These two factors are used by academics who research and evaluate search engines. It was generally believed that these two factors were a tradeoff against each other; if a search engine's "recall" were improved, then its "precision" would degrade, and vice versa. More modern search engine evaluation criteria also take into account the order of the returned results, and the overall ability of a user to find an answer in a search session, rather than relying on a search being submitted only once.
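The tradeoff can be seen by cutting one ranked result list off at different depths. In this Python sketch (placeholder document IDs), returning more results raises recall, the share of all relevant documents found, while precision falls.

```python
def precision_recall_at(ranked, relevant, k):
    """(precision, recall) when only the top k ranked results are returned."""
    if k < 1 or not relevant:
        raise ValueError("k must be >= 1 and relevant must be non-empty")
    hits = sum(1 for doc in ranked[:k] if doc in relevant)
    return hits / k, hits / len(relevant)

ranked = ["a", "x", "b", "y", "z", "c"]
relevant = {"a", "b", "c"}
for k in (1, 3, 6):
    p, r = precision_recall_at(ranked, relevant, k)
    print(f"k={k}: precision={p:.2f} recall={r:.2f}")
# k=1: precision=1.00 recall=0.33
# k=3: precision=0.67 recall=0.67
# k=6: precision=0.50 recall=1.00
```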
[back to top]
 
Primary Key
Related Terms:  database, table, field, record, row, column, join, FK, foreign key In a traditional database each table has a special field with a unique identifier for each row in that table. For example, each department in a company might have a unique department ID, so the dept_id field would be the primary key for the dept table. If a search engine needed to index and search that table of department information, it would use the dept_id field to track which records it had indexed, and to call up matching records from a results list. A primary key is also the destination link for other tables that refer to this table's records.
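A minimal sketch using Python's built-in `sqlite3` module and the dept/dept_id example above (the rows are made up). The toy "indexer" stores words under each row's primary key, and a search result carries only that key, which is then used to fetch the full record. The final insert shows the database rejecting a duplicate key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dept (dept_id INTEGER PRIMARY KEY, name TEXT, description TEXT)")
conn.executemany(
    "INSERT INTO dept (dept_id, name, description) VALUES (?, ?, ?)",
    [(10, "Human Resources", "benefits and hiring"),
     (20, "IT", "networks, laptops and search servers")],
)

# Toy indexer: remember the words of each record under its primary key.
indexed = {}
for dept_id, name, description in conn.execute("SELECT dept_id, name, description FROM dept"):
    indexed[dept_id] = f"{name} {description}".lower().split()

# A result list holds only keys; each key calls up the matching record.
hits = [dept_id for dept_id, words in indexed.items() if "search" in words]
for dept_id in hits:
    print(conn.execute("SELECT name FROM dept WHERE dept_id = ?", (dept_id,)).fetchone())

try:
    conn.execute("INSERT INTO dept (dept_id, name) VALUES (10, 'Duplicate')")
except sqlite3.IntegrityError as err:
    print("rejected:", err)  # primary keys must be unique
```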
[back to top]
 
private network
Related Terms:  corporate network, firewall, Intranet, enterprise search engine Conceptually similar to a corporate network, but also applies to other secured networks in places like government agencies, institutions and even the secured network in peoples' homes.
[back to top]
 
Prod
See Main Definition:  Production systems
[back to top]
 
production systems
Synonyms:  prod Related Terms:  failover, load balancing, bcp, qos, qa, staging, uat The main set of computers used to run the search software and other applications. These are the servers the customers and employees normally interact with, and therefore prod systems are very important and must be reliable. Before software is loaded onto these servers it has likely been tested in qa, staging or uat systems. Production systems may be distributed in a load balanced or failover manner, so that if one part goes down, a hot backup or BCP server will take over.
[back to top]
 
profiling
See Main Definition:  automated document classification
[back to top]
 
protocol
Related Terms:  HTTP An agreed-upon standard for computers to exchange specific information. For example, HTTP is the protocol used by spiders to index web sites.
[back to top]
 
proximity matching
Related Terms:  sub-document context A multiword search where the terms must appear near each other in order for the document to be considered a match. Some engines refer to this as the "near" operator, or "within N words", or within the same sentence or paragraph. For sentence and paragraph proximity operators, some engines may not accurately detect sentence and paragraph boundaries; instead they may approximate the behavior by using a "within N words" test with a preset value. Proximity matching is typically only applied at search time, whereas sub-document context is often also used to affect advanced statistical calculations for the entire set of documents.
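A minimal Python sketch of a "within N words" test. It assumes distance means the difference between word positions; engines define N differently (for example, as the number of words allowed in between), and real engines read positions from a positional index rather than re-tokenizing the document.

```python
import re

def within_n_words(text, term_a, term_b, n):
    """True if term_a and term_b occur within n word positions of each other."""
    words = re.findall(r"\w+", text.lower())
    pos_a = [i for i, w in enumerate(words) if w == term_a.lower()]
    pos_b = [i for i, w in enumerate(words) if w == term_b.lower()]
    return any(abs(i - j) <= n for i in pos_a for j in pos_b)

text = "The search engine returned fast results"
print(within_n_words(text, "search", "results", 3))  # False (4 positions apart)
print(within_n_words(text, "search", "results", 4))  # True
```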
[back to top]
 
PB
See Main Definition:  Petabyte
[back to top]
 
Punchout
Synonyms:  punch-out, Ariba-ready Related Terms:  eProcurement, b2b, eCommerce A specific part of eProcurement B2B systems where the two systems of the buying and selling companies are deeply connected to make purchasing even more efficient. In this setup, an employee from company A can search for and purchase items from company B, while staying inside company A's purchasing system. The employee at company A is used to their own system and procedures, and would like to stay inside their own system, so only a part of their computer screen is taken over by the search engine and inventory of company B. Once items are selected from company B, the rest of the transaction is handled automatically between the A and B systems, and the employee is back in their own system. There are other advantages, in terms of reduced training and comparison pricing, for punchouts. However, it is a somewhat newer concept, and there are various protocols. The search engine at company B needs to know how to send search results back through the punchout protocol to system A. A company called Ariba was one of the earlier companies to offer this type of deep interconnection, and some other vendors have mimicked their framework.
[back to top]
 
  Q   [back to top]
 
q
See Main Definition:  query text
[back to top]
 
qos
See Main Definition:  quality of service Usage: In its abbreviated form, this often refers to automated settings in network equipment. In the spelled out form it usually means the promise or contractual obligation to maintain a defined level of reliability.
[back to top]
 
QPS
See Main Definition:  Queries Per Second
[back to top]
 
qt
See Main Definition:  query text
[back to top]
 
quality of service
How reliable a system will be. May refer to contractual obligations, or to specific technical configurations that achieve reliability. For example, when a business buys a search engine, it wants to know how reliable the engine is. A search engine used inside a small company may have a lower QOS requirement (or higher acceptable downtime) than one used by a large online retailer. Higher QOS requires both redundant hardware and more reliable software, and often costs more. There may be financial penalties in the contract if the agreed-upon QOS is not met. Usage: In its abbreviated form, QOS often refers to automated settings in network equipment. In the spelled-out form it usually means the promise or contractual obligation to maintain a defined level of reliability.
[back to top]
 
Queries Per Second
Synonyms:  QPS Related Terms:  license restrictions The number of queries a search engine can run in one second. Every system has some practical limit on how much work it can do in one second; in addition, some search engines are artificially limited by license restrictions. Most sites know how many searches were done in a day. Dividing that number by the number of seconds in a day (86,400) gives the average QPS. However, search activity rises and falls during the day, so the peak QPS can be much higher than the average, and some engines limit how many searches can be done in any one second, regardless of the average.
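The average is a single division. The peak needs per-second counts from the search log. A small sketch, assuming the log can be reduced to one timestamp (in seconds) per search. The function names are hypothetical.

```python
from collections import Counter
from typing import Iterable

SECONDS_PER_DAY = 86_400


def average_qps(searches_per_day: int) -> float:
    """Average queries per second over a whole day."""
    return searches_per_day / SECONDS_PER_DAY


def peak_qps(timestamps: Iterable[float]) -> int:
    """Largest number of searches that started within the same whole second."""
    per_second = Counter(int(t) for t in timestamps)
    return max(per_second.values(), default=0)


print(average_qps(864_000))                 # 10.0
print(peak_qps([0.1, 0.5, 0.9, 1.2, 5.0]))  # 3 (three searches during second 0)
```

A site averaging 10 QPS can easily see short bursts well above that. A per-second license limit is checked against the peak, not against the average.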
[back to top]
 
query
See Main Definition:  search Usage: Relational databases tend to use the term "query", whereas search engines tend to use the term "search".
[back to top]
 
query cooking
See Main Definition:  query tuning
[back to top]
 
query pipeline
Related Terms:  query cooking, query transform, stemming, lemmatization A chain of cooperating processes or steps that together perform a complete query transform. Each step performs one part of the transformation. This modular approach allows for more flexibility and easier debugging.
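A minimal sketch of a query pipeline. It assumes each step is a plain function that takes the query string and returns a transformed one. The three steps shown are hypothetical examples; a real pipeline might add stemming, synonym expansion, or relevancy boosts. The optional `trace` list records each intermediate result, which shows the debugging benefit of the modular approach.

```python
from typing import Callable, List, Optional, Tuple

Step = Callable[[str], str]


def lowercase(query: str) -> str:
    return query.lower()


def strip_punctuation(query: str) -> str:
    return "".join(ch for ch in query if ch.isalnum() or ch.isspace())


def collapse_whitespace(query: str) -> str:
    return " ".join(query.split())


def run_pipeline(query: str, steps: List[Step],
                 trace: Optional[List[Tuple[str, str]]] = None) -> str:
    """Apply each step in order, feeding one step's output into the next."""
    for step in steps:
        query = step(query)
        if trace is not None:
            # Recording every intermediate result makes a misbehaving step easy to spot.
            trace.append((step.__name__, query))
    return query


trace = []
final = run_pipeline("  Annual   BUDGET, 2007! ",
                     [lowercase, strip_punctuation, collapse_whitespace], trace)
print(final)  # annual budget 2007
```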
[back to top]
 
query text
The search terms or query submitted to the search engine.
[back to top]
 
query transform
Related Terms:  query cooking, query pipeline, stemming, lemmatization A means of taking the search terms typed in by a user and transforming them into a richer query that is then sent to the search engine. The expanded query will typically use an advanced syntax supported by the search engine. It can also specify the handling of word variations such as stemming or lemmatization, apply a search filter, or specify which parts of the document should be searched. Query transform is slightly broader in scope than query cooking, which is usually concerned only with adjusting relevancy.
[back to top]
 
query tuning
Synonyms:  query cooking The process of automatically adding parameters to a user's search in order to improve relevancy. For example, if a user types in the search "budget", the query cooker might include extra parameters to boost documents that are more recent. Alt definition: The process of enhancing the default relevancy algorithms provided by a search application. This process allows a company to emphasize parts of its documents that are known to be relevant, based on knowledge of its own document formats that may not apply globally. For example, a firm may include an Abstract in each research report, so search terms found in the Abstract may indicate higher relevancy.
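A sketch of both kinds of tuning as score adjustments. It assumes the engine exposes a base relevancy score that the application can modify. The exponential recency decay, the one-year half-life, and the 1.5 Abstract boost are hypothetical choices for illustration. Real engines usually express boosts in their own query syntax.

```python
from datetime import date
from typing import List


def tuned_score(base_score: float, doc_date: date, abstract: str, terms: List[str],
                today: date, half_life_days: float = 365.0,
                abstract_boost: float = 1.5) -> float:
    """Adjust an engine's base relevancy score (hypothetical boost model)."""
    # Recency: the score halves for every half_life_days of document age.
    age_days = max((today - doc_date).days, 0)
    score = base_score * 0.5 ** (age_days / half_life_days)
    # Field boost: a search term that appears in the Abstract raises the score.
    abstract_words = set(abstract.lower().split())
    if any(term.lower() in abstract_words for term in terms):
        score *= abstract_boost
    return score


today = date(2008, 1, 1)
print(tuned_score(1.0, date(2007, 1, 1), "Budget overview", ["budget"], today))  # 0.75
print(tuned_score(1.0, date(2008, 1, 1), "Travel policy", ["budget"], today))    # 1.0
```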
[back to top]
 
Quick Links
See Main Definition:  content promotion Usage: This is another name for a content promotion based on key words, just like "Best Bets".
[back to top]
 
  R   [back to top]
 
RAID
Related Terms:  disk storage Refers to configuring sets of individual disk drives to act as a team, typically to improve performance, reliability, or both. Just saying "RAID" is insufficient, as the various configurations are quite different from each other. Since search engines make such heavy use of disk storage, RAID configurations geared towards increasing performance can benefit search. RAID is often used in conjunction with high speed SCSI disks.
[back to top]
 
ranking
Related Terms:  recall, precision The order of documents returned in a results list, usually based on relevancy calculations; more relevant documents will be shown first.
[back to top]
 
raster image
A picture of a document that is composed of simple dots, versus the actual text of the document. Raster images must be converted to text by an OCR process before they can be indexed and searched.
[back to top]
 
rcvdk
Related Terms:  Autonomy K2 A command line utility for performing test searches against a K2 collection.
[back to top]
 
RDBMS
See Main Definition:  relational database
[back to top]
 
read/write transactions
Related Terms:  database offloading Traditional databases perform queries against their data, but they also update the data. This is in contrast to most search engines, which typically only read data.
[back to top]
 
recall
Related Terms:  precision, ranking, set theory The percentage of the "correct" documents (the ones the user would consider relevant) that were actually retrieved by a search. A recall of 50% means the engine found only half of the documents it should have. This term is often used by academics when discussing search engine performance.
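Recall and its companion measure, precision, can be computed from two sets of document IDs for a test search. A minimal sketch, assuming people have already judged which documents are truly relevant:

```python
from typing import Set, Tuple


def precision_recall(retrieved: Set[str], relevant: Set[str]) -> Tuple[float, float]:
    """Set-based precision and recall for one test search."""
    hits = retrieved & relevant  # relevant documents the engine actually found
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


retrieved = {"d1", "d2"}
relevant = {"d1", "d2", "d3", "d4"}
print(precision_recall(retrieved, relevant))  # (1.0, 0.5)
```

Here every retrieved document is relevant, so precision is perfect. The engine found only two of the four relevant documents, so recall is 50%.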
[back to top]
 
record
Related Terms:  document, page, web page, result A unit of retrieval data, usually in a relational database. For example, each employee would have a record in the employee table. Records are composed of one or more fields. With search engines, the closest equivalent term is "document". A "record" may also refer to a single results list entry.
[back to top]
 
redacting
Synonyms:  redacting documents Related Terms:  sub-field security The process of removing specific words and phrases from a document, while still allowing users to view the rest of the content; the removed terms are often visually represented by black rectangles. In search applications, this can be considered a form of sub-field document security.
[back to top]
 
RedHat
Related Terms:  Linux A company that sells a particular version of Linux that they compile and maintain. Some companies prefer to pay money to RedHat, instead of using a free version of Linux, so that they can get reliable patches and support should a problem occur.
[back to top]
 
relational database
Synonyms:  database, RDBMS, traditional database Related Terms:  SQL, join, table, record, database gateway The traditional software used to store large amounts of structured electronic data on a computer. Oracle is an example of a relational database.
[back to top]
 
Relational Database Management System
See Main Definition:  relational database Usage: This long form is rarely used; the short forms "database" or "relational database" (the latter to distinguish it from other types of databases) are more common.
[back to top]
 
relevance
Synonyms:  relevancy Related Terms:  single shot relevancy, organic results, query tuning In search engines, a mathematical calculation that attempts to estimate how well a particular document matches a user's search, how "relevant" each document is to the search. Documents are typically returned with the highest relevancy estimate first.
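Engines use many different relevancy formulas, and vendors' formulas are often proprietary. As one common textbook example, here is a minimal TF-IDF sketch. It is a swapped-in technique, not any particular vendor's calculation, and it assumes whitespace tokenization. A query word scores higher when it is frequent in a document and rare across the collection.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf_scores(query_terms: List[str], docs: Dict[str, str]) -> Dict[str, float]:
    """Score every document against the query with a basic TF-IDF sum."""
    tokenized = {doc_id: text.lower().split() for doc_id, text in docs.items()}
    n_docs = len(tokenized)
    # Document frequency: the number of documents containing each word.
    df = Counter()
    for words in tokenized.values():
        df.update(set(words))
    scores = {}
    for doc_id, words in tokenized.items():
        tf = Counter(words)
        score = 0.0
        for term in (t.lower() for t in query_terms):
            if tf[term]:
                # Rare words (low df) contribute more than common ones.
                idf = math.log(n_docs / df[term])
                score += (tf[term] / len(words)) * idf
        scores[doc_id] = score
    return scores


docs = {"a": "budget report budget", "b": "travel report", "c": "holiday schedule"}
scores = tfidf_scores(["budget", "report"], docs)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)  # ['a', 'b', 'c']
```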
[back to top]
 
relevance histogram
Synonyms:  search histogram A histogram that shows the weights of all matching documents returned by a search, with the highest relevancy shown first. This is useful when designing relevancy tuning techniques. Ideally the histogram will show a well defined peak and a relatively smooth curve downwards.
[back to top]
 
relevancy
See Main Definition:  relevance
[back to top]
 
repository
A source of documents or data, such as a web server, file server, database, content management system or email archive.
[back to top]
 
repository database
A database describing all the sources of content within an organization. Each entry should include a starting URL or database access information, and a thorough description of the type of data that can be found. Large companies may have thousands of individual web sites and data silos.
[back to top]
 
Request for Information
Synonyms:  RFI Related Terms:  RFP A written request for product information from one company to another; in this case the company who wrote the RFI might be a potential customer of the company who responds to the RFI. An RFI can be considered more preliminary, when compared to an RFP, and may not include as many detailed questions.
[back to top]
 
Request for Proposal
Synonyms:  RFP Related Terms:  RFI A written request for product and pricing information from one company to another; in this case the company who wrote the RFP might be a potential customer of the company who responds to the RFP. An RFP is often more detailed than an RFI, and includes pricing questions.
[back to top]
 
result list entry
Synonyms:  hit, result Related Terms:  document When a search is run against a search engine, a set of matching documents is returned. Each matching document is represented as a result list entry, and includes the various Meta Data fields defined for that document. To display the full results, this list is iterated over, and information about each document is displayed in a tabular form. A result list entry will also usually include a link to view the fulltext of the original document.
[back to top]
 
results list highlighting
Synonyms:  summary highlighting The practice of visually highlighting the user's search terms in the summary of the document that is displayed in the results list. This is sometimes confused with the highlighting of the search terms when a document is actually opened up for viewing.
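A minimal sketch of results list highlighting for an HTML results page. It assumes `<b>` tags as the highlight markup and whole-word, case-insensitive matching. `highlight` is a hypothetical helper. The summary is HTML-escaped first so document text cannot inject markup into the page.

```python
import html
import re
from typing import List


def highlight(summary: str, terms: List[str]) -> str:
    """Wrap each search term found in the summary in <b> tags (case-insensitive)."""
    escaped = html.escape(summary)  # escape first so the summary cannot inject markup
    if not terms:
        return escaped
    alternatives = "|".join(re.escape(html.escape(t)) for t in terms)
    pattern = re.compile(r"\b(" + alternatives + r")\b", re.IGNORECASE)
    return pattern.sub(r"<b>\1</b>", escaped)


print(highlight("Budget review & planning for the 2008 budget", ["budget"]))
# <b>Budget</b> review &amp; planning for the 2008 <b>budget</b>
```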
[back to top]
 
results list navigation
Synonyms:  search results navigation Related Terms:  navigators Arranging a search results list so that a user can easily understand the types of documents that are available, and click on links to drill down without typing anything else. There are many different technologies for generating navigators; to the end user they all look similar, but the methods used to generate them vary widely. The different types of navigators are confusing even to many people inside the search industry.
[back to top]
 
results list visualization
Synonyms:  results visualization Related Terms:  Search 2.0, results list navigation Using graphical elements or high level summaries in the results list to convey the types of documents that matched the search. This technique is useful when searches routinely match thousands of documents. The user can usually click on a particular element to drill down. While it is related to results list navigation, it is more likely to use graphics or other novel ways of displaying information.
[back to top]
 
results visualization
See Main Definition:  results list visualization
[back to top]
 
Return on Investment
Synonyms:  ROI Related Terms:  Hard ROI, Soft ROI, TCO If a company spends money on new technology, such as a new search engine, what will the financial payback be? The payback typically comes from future cost savings (soft ROI) or from directly generating more business or higher profits (hard ROI).
[back to top]
 
reverse DNS
Related Terms:  DNS In computer networking, machines have a numerical TCP/IP address and a textual name, which is easier for humans to remember. Sometimes a computer has the numerical TCP/IP address and wants to know the name of that machine; it can find this out by doing a reverse DNS lookup. It's called "reverse" because DNS is usually used to translate a name into an IP address. Some search engines require proper reverse DNS configuration in order to run correctly.
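In Python's standard library, `socket.gethostbyaddr` performs a reverse lookup through the operating system's resolver. The `ipaddress` module can show the special in-addr.arpa name that a reverse lookup queries. A small sketch; the helper names are hypothetical.

```python
import ipaddress
import socket
from typing import Optional


def ptr_query_name(ip: str) -> str:
    """The special DNS name that a reverse (PTR) lookup asks about."""
    return ipaddress.ip_address(ip).reverse_pointer


def reverse_dns(ip: str) -> Optional[str]:
    """Ask the operating system's resolver for the host name of an IP address."""
    try:
        hostname, _aliases, _addresses = socket.gethostbyaddr(ip)
        return hostname
    except OSError:  # socket.herror and socket.gaierror are both OSError subclasses
        return None


print(ptr_query_name("192.0.2.10"))  # 10.2.0.192.in-addr.arpa
```

Calling `reverse_dns("127.0.0.1")` often returns `"localhost"`. The answer depends on the machine's hosts file and DNS setup, which is the kind of configuration some search engines rely on.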
[back to top]
 
RFI
See Main Definition:  Request for Information
[back to top]
 
RFP
See Main Definition:  Request for Proposal
[back to top]
 
RHEL
See Main Definition:  RedHat Enterprise Linux Related Terms:  Linux
[back to top]
 
RHL
See Main Definition:  RedHat Linux
[back to top]
 
rich media
Related Terms:  audio mining Data that contains non-textual data such as images, video or audio. Searching this type of data requires very sophisticated algorithms and is not yet widely in use.
[back to top]
 
robot
See Main Definition:  spider Synonyms:  bot, 'bot
[back to top]
 
robots.txt
Related Terms:  spider A policy file on a web site that informs spiders where it is and is not safe to spider.
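Python's standard `urllib.robotparser` module can evaluate these rules. A sketch with a hypothetical robots.txt and spider name, parsed from a string instead of being downloaded:

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt that keeps every spider out of /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

print(parser.can_fetch("ExampleSpider", "https://www.example.com/products/"))         # True
print(parser.can_fetch("ExampleSpider", "https://www.example.com/private/hr.html"))   # False
```

In a live spider, `set_url()` followed by `read()` would download the file from the site's root instead.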
[back to top]
 
ROI
See Main Definition:  Return on Investment
[back to top]
 
rows and columns (databases)
Related Terms:  record, field, document, row, meta data Traditional databases arrange data in tables, organized into rows and columns. A row can also be called a record, and a column can also be called a field. Full-text search engines have similar concepts, but refer to their entries as "documents" and "meta fields", respectively.
[back to top]
 
rows and columns (scalability)
Related Terms:  scalability, sizing A grid-like configuration of multiple computers that, together, can index many documents and handle many simultaneous user searches. Typically the document indices are divided up evenly among the various "columns", and the user searches are divided up evenly among the rows; rows also provide fail-over for each other, as each row can search all columns.
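A minimal sketch of the routing logic. It assumes documents are assigned to columns by hashing their IDs and searches are assigned to rows round-robin. `SearchGrid` is hypothetical. Fail-over between rows and merging of per-column results are not modeled.

```python
import itertools
import zlib
from typing import List, Tuple


class SearchGrid:
    """Hypothetical rows x columns cluster: columns split the index, rows split the searches."""

    def __init__(self, rows: int, columns: int):
        self.rows = rows
        self.columns = columns
        self._next_row = itertools.cycle(range(rows))

    def column_for(self, doc_id: str) -> int:
        # A stable hash spreads documents roughly evenly and always sends a
        # given document to the same column (index partition).
        return zlib.crc32(doc_id.encode("utf-8")) % self.columns

    def route_search(self) -> List[Tuple[int, int]]:
        # Round-robin across rows; the chosen row queries all of its columns,
        # and the partial results would then be merged.
        row = next(self._next_row)
        return [(row, col) for col in range(self.columns)]


grid = SearchGrid(rows=2, columns=3)
print(grid.route_search())  # [(0, 0), (0, 1), (0, 2)]
print(grid.route_search())  # [(1, 0), (1, 1), (1, 2)]
```

`zlib.crc32` is used instead of Python's built-in `hash()` because string hashing is randomized per process. With `hash()`, the same document could land in different columns on different runs.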
[back to top]
 
  S   [back to top]
 
SAAS
See Main Definition:  Software as a Service
[back to top]
 
SAN
See Main Definition:  Storage Array Network Related Terms:  disk storage
[back to top]
 
SAR
See Main Definition:  Suspicious Activity Report
[back to top]
 
Sarbanes Oxley
See Main Definition:  Sarbanes-Oxley Act
[back to top]
 
Sarbanes-Oxley Act
Synonyms:  SOX Related Terms:  compliance, eDiscovery, search engine data quality A US law that imposed stricter regulations on how companies must handle their data, and on other procedural matters. Search engines are sometimes used to help with compliance, and there may be legal requirements to ensure that all documents are properly indexed and searchable. Complying with these rules may involve purchasing new software. See http://en.wikipedia.org/wiki/Sarbanes_oxley
[back to top]
 
SATA
Related Terms:  disk storage A newer but still lower-end standard for the internal electrical connection of hard drives inside a computer. SATA is replacing IDE. Higher end machines use SCSI, fiber channel, RAID or SAN. Since search makes heavy use of disk storage, the slower IDE and SATA connections may not perform as well as a higher end standard, although the performance penalty may be acceptable for development systems.
[back to top]
 
SBP
See Main Definition:  Search Best Practices
[back to top]
 
scalability
Related Terms:  sizing How easily software can be scaled up to handle more documents and/or searches. Scaling up generally means deploying the same software on multiple machines and coordinating those machines to work together efficiently. Older software tended not to scale easily to multiple machines.
[back to top]
 
Schema (database)
In a database, the schema describes the columns (AKA fields) in each table, including what type of data to expect (such as text, numbers, dates, etc.).
[back to top]
 
Schema (Fulltext search)
Related Terms:  style.ddd, style.ufl, FAST Index Profile Many older fulltext search engines required Meta Data and other special document attributes to be defined ahead of time, similar to the schema for a traditional database. By declaring special fields, search engines can often do sorting and hybrid filtered searches on those fields.
[back to top]
 
Schema (URL/URI)
At the top of an XML file there will typically be a reference to a schema that explains the structure of the file; the reference will look like a URL.
[back to top]
 
Schema (XML)
Related Terms:  Schema (URL/URI) A specially formatted file that describes the structure of an XML file. At the top of an XML file there will typically be a reference to a schema that explains the structure of the file; the reference will look like a URL.
[back to top]
 
SCOE
See Main Definition:  Search Center of Excellence
[back to top]
 
scope of search
Synonyms:  scoped search, filtered search Related Terms:  hybrid search, faceted search, taxonomy, FAQ Controlling which sets of data a full-text search will be run against. For example, instead of searching the entire Internet for a coach, a user might search only on a furniture site. More commonly, within a web site, a user can choose to "search entire site", or search within a particular section of the site (for example, within FAQs). Behind the scenes, these scoped searches are often implemented by issuing a hybrid search to the underlying search engine. Another way to control the scope of search is to combine the full-text search with sections of a taxonomy, so that a search is limited to content within those sections.
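A minimal sketch of a scoped (hybrid) search, assuming each page carries a `section` field. The linear scan is for illustration only; a real engine applies the field filter inside its index. `Page` and `scoped_search` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Page:
    url: str
    section: str  # structured field used for scoping
    text: str


def scoped_search(pages: List[Page], query: str, section: Optional[str] = None) -> List[str]:
    """Full-text match on every query word, optionally restricted to one site section."""
    terms = query.lower().split()
    results = []
    for page in pages:
        # Structured filter: the "scope" of the search.
        if section is not None and page.section != section:
            continue
        # Full-text part: all query words must appear in the page.
        words = set(page.text.lower().split())
        if all(term in words for term in terms):
            results.append(page.url)
    return results


pages = [
    Page("/faq/returns", "FAQ", "how do I return a coach"),
    Page("/blog/coach", "Blog", "choosing a coach for your living room"),
    Page("/products/sofa", "Products", "leather sofa"),
]
print(scoped_search(pages, "coach"))                 # ['/faq/returns', '/blog/coach']
print(scoped_search(pages, "coach", section="FAQ"))  # ['/faq/returns']
```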
[back to top]
 
scope of search
Related Terms:  drill-down, advanced search, parametric search Limiting the set of documents a search engine will look at for a user's search. For example, instead of searching an entire web site for information, a visitor might choose to only search the Technical Support section.
[back to top]
 
scraper
Synonyms:  scraping Related Terms:  ETL, spider, entity extractor Using software to automatically gather data from sources that are normally intended for humans to read. The software simulates mouse clicks and can even type data into a form and submit it. A set of carefully constructed patterns is used to sift through the results that the server sends back. For example, a web page scraper might look for the text inside an H1 tag in HTML content to find the title of a web page. Classical scrapers had very sensitive rules and would often break when even simple cosmetic changes were made to the output; more modern scrapers can have more flexible rules, or employ multiple rules for the same item. This technology can be used for ETL or entity extraction. Most spiders use some form of scraping to make sense of the pages they download, and could be considered highly specialized web page scrapers.
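A small sketch of the H1 example using Python's standard `html.parser`. Because it works on parsed tags rather than exact text patterns, it tolerates upper-case tags, extra attributes, and nested markup. This is one kind of more flexible rule. The class and function names are hypothetical.

```python
from html.parser import HTMLParser
from typing import List, Optional


class H1Scraper(HTMLParser):
    """Collects the text inside every <h1> element."""

    def __init__(self) -> None:
        super().__init__()
        self._in_h1 = False
        self.headings: List[str] = []

    def handle_starttag(self, tag, attrs):
        # html.parser reports tag names in lower case, so <H1> also matches.
        if tag == "h1":
            self._in_h1 = True
            self.headings.append("")

    def handle_endtag(self, tag):
        if tag == "h1":
            self._in_h1 = False

    def handle_data(self, data):
        if self._in_h1:
            self.headings[-1] += data


def scrape_title(page: str) -> Optional[str]:
    """Return the first non-empty H1 text, with whitespace normalized."""
    scraper = H1Scraper()
    scraper.feed(page)
    scraper.close()
    headings = [" ".join(h.split()) for h in scraper.headings if h.strip()]
    return headings[0] if headings else None


print(scrape_title('<body><H1 class="title">Quarterly <em>Budget</em></H1><p>Text</p></body>'))
# Quarterly Budget
```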
[back to top]
 
screen scraper
Related Terms:  scraper An early form of scraper used to gather data from mainframe terminal based applications. The screen scraper would simulate a user on a text mode or block mode terminal. Usage: Some people still use this term when they are actually referring to web page scrapers.
[back to top]
 
script
Related Terms:  shell script, batch file A set of high level commands stored in a text file that can be run when needed, or on a set schedule.
[back to top]
 
script file
See Main Definition:  shell script Synonyms:  script Related Terms:  batch file Usage: usually refers to a Unix shell script. May also refer to a Windows batch file.
[back to top]
 
SCSI
Related Terms:  disk storage An older high end standard for the internal electrical connection of hard drives inside a computer; there are many versions of SCSI as the standard has evolved. SCSI equipment is typically more expensive than IDE or SATA. Modern SCSI may use electrical or optical connections. For maximum performance SCSI is often combined with RAID. Search applications make very heavy use of disk storage, so they will usually benefit from this higher end equipment.
[back to top]
 
search
Synonyms:  query Related Terms:  Internet search syntax, VQL, SQL A set of terms or other criteria given to a search engine or relational database; matching documents are returned.
[back to top]
 
search 2.0
The next generation of search engine technology. Search 2.0 attempts to address the shortcomings of earlier search engines by providing more information in the results list, and making the results list more interactive so that users can drill down to find the specific result they are looking for.
[back to top]
 
search activity histogram
A histogram that shows the most popular searches and how many times each occurred. The left side of the graph represents the popular queries, whereas the right side is referred to as the "long tail". Different techniques are typically employed to improve relevancy for the popular queries than for the long tail.
[back to top]
 
Search Analytics
Related Terms:  SearchTrack A series of reports showing what visitors are searching for on a web site or database. The reports will include most frequently searched for terms, search terms that produce no results, search terms that produce too many results, and the general trends of search, such as which search terms are gaining in popularity. This level of information is often more informative than simple click tracking.
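A sketch of two of these reports, the most frequent searches and the searches that produced no results. It assumes a hypothetical log of (search terms, result count) pairs.

```python
from collections import Counter
from typing import List, Tuple


def analytics_report(log: List[Tuple[str, int]], top_n: int = 3):
    """log holds one (search terms, number of results) pair per search."""
    # Normalize case and spacing so "Budget" and "budget " count as the same search.
    normalized = [(" ".join(q.lower().split()), hits) for q, hits in log]
    top_searches = Counter(q for q, _ in normalized).most_common(top_n)
    zero_result_searches = sorted({q for q, hits in normalized if hits == 0})
    return top_searches, zero_result_searches


log = [("budget", 12), ("Budget ", 12), ("vacation policy", 3),
       ("budjet", 0), ("budget", 12), ("travel  form", 0)]
top, zero = analytics_report(log)
print(top)   # [('budget', 3), ('vacation policy', 1), ('budjet', 1)]
print(zero)  # ['budjet', 'travel form']
```

The zero-result list is often the most actionable output. A misspelling like "budjet" suggests adding a synonym or spelling suggestion.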
[back to top]
 
search appliance
Related Terms:  data silo, embedded search engine A computer that has been preconfigured with search engine software and is sold or leased as a complete search solution, with an easy to use administration interface.
[back to top]
 
Search Best Practices
Synonyms:  SBP A set of procedures followed by many companies in order to provide excellent search results for customers and employees. A company wishing to add, improve or upgrade search functionality can take advantage of what other companies have learned works and does not work. SBP requires experience with many other companies who were selecting and deploying search, and typically this knowledge is only available from consultants who have worked with search for a long time. New Idea Engineering offers SBP services.
[back to top]
 
Search Center of Excellence
Synonyms:  SCOE A group or department within a company that is devoted to improving the company's search technology. In some companies this is an actual department, whereas in other companies it is a team of representatives from various departments.
[back to top]
 
search dialtone
Related Terms:  search engine, hosted search Having a search engine up and running with basic functionality all the time. Does not offer any advanced results list navigation. Can also refer to a hosted search engine service.
[back to top]
 
Search DQ
See Main Definition:  search engine data quality
[back to top]
 
search engine
Synonyms:  Full-text search engine Related Terms:  enterprise search engine, web search engine, search engine vendors, embedded search engine, API, hosted search Software that indexes and searches vast amounts of content very quickly, based on the words in each document. Search engines borrow and expand upon many of the techniques originally used in traditional databases, extending search into the realm of unstructured textual data.
[back to top]
 
search engine data quality
Synonyms:  Search DQ Related Terms:  search engine indices, meta data, third party vendor, SOX A series of tools and procedures for ensuring that the search indices and search results accurately reflect the source documents, and that search results are generally acceptable to the users. See http://ideaeng.com/pub/entsrch/v2n3/article01.html and http://ideaeng.com/pub/entsrch/v2n4/article01.html
[back to top]
 
Search Engine Optimization
Synonyms:  SEO Optimizing a web site so that search engines will index its content and rank it highly in their results lists. Usually this is focused on changing the content and organization of a company's public web site, so that web portals like Yahoo, Google and MSN will rank the company above its competitors. However, the techniques used for public web site SEO are often also helpful with Enterprise Search.
[back to top]
 
search engine vendors
Related Terms:  Verity, FAST Search, Google, Autonomy, FreeFind, Lucene Vendors who sell search engine software. Lucene is a free open source search engine.
[back to top]
 
search form
See Main Definition:  web page form Usage: A web page form specifically used to submit a search to a web site's search engine.
[back to top]
 
Search Front End
Synonyms:  SFE Related Terms:  FAST ESP, java, application stack The sample Java code provided with FAST ESP. Many customers start with this sample code to write their own custom search applications.
[back to top]
 
search highlighting
Synonyms:  highlighting The practice of visually highlighting the user's search terms in the text of matching documents. This can happen in the summary that is displayed in the results list, or when the entire document is opened up.
[back to top]
 
search histogram
See Main Definition:  relevance histogram
[back to top]
 
search indices
Synonyms:  search indices (plural), search index (singular), search indexes, Full-Text search index, search engine index, collection Related Terms:  relational database, index (noun), document index, document catalog The set of binary files that the search engine creates and uses to do the actual searches. An analogy in a library would be the card catalog at the front, although search engine indices contain much more detail, including every instance of every word in every document. Unlike in a traditional database, the original document that is indexed and searched is not stored in the search index; it usually remains in its original location.
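A minimal in-memory sketch of the core structure, a positional inverted index, assuming whitespace tokenization. Each word maps to the documents containing it and the positions where it occurs. Real indices are binary files on disk with far more detail. `build_index` and `lookup` are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, List

Index = Dict[str, Dict[str, List[int]]]


def build_index(docs: Dict[str, str]) -> Index:
    """Map every word to {doc_id: [positions]}; the document text itself is not kept."""
    index: Index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for position, word in enumerate(text.lower().split()):
            index[word][doc_id].append(position)
    return index


def lookup(index: Index, word: str) -> List[str]:
    """IDs of documents containing word, found without reading the original documents."""
    return sorted(index.get(word.lower(), {}))


index = build_index({"doc1": "the budget report", "doc2": "budget travel budget"})
print(lookup(index, "budget"))  # ['doc1', 'doc2']
print(dict(index["budget"]))    # {'doc1': [1], 'doc2': [0, 2]}
```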
[back to top]
 
search precision
See Main Definition:  precision
[back to top]
 
search recall
See Main Definition:  recall
[back to top]
 
search results navigation
See Main Definition:  results list navigation
[back to top]
 
search results visualization
See Main Definition:  results list visualization
[back to top]
 
search wizard
A form of interactive search that guides the user through creating or adjusting their search to find more relevant documents. Search Wizards can be combined with other results list navigators.
[back to top]
 
SearchTrack
Related Terms:  Search Analytics, content promotion, NIE A software product offered by New Idea Engineering that provides integrated cross vendor Search Analytics, search logging and content promotion. See http://www.ideaeng.com/optimize/
[back to top]
 
secure network
See Main Definition:  Intranet
[back to top]
 
security
Related Terms:  collection level security, document level security, sub-document security, sub-field security, early-binding security, late-binding security In relation to search engines, security generally means carefully controlling which users can see which documents in the search results. This is becoming a very important topic in the enterprise search engine market.
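A minimal sketch of filtering a results list at search time. It assumes each document has a hypothetical access list of group names, and that the user's groups are already known, for example from an SSO system. Documents with no access list are hidden, a deny-by-default choice.

```python
from typing import Dict, List, Set


def filter_results(results: List[str], acl: Dict[str, Set[str]],
                   user_groups: Set[str]) -> List[str]:
    """Keep results whose access list shares at least one group with the user.

    Documents with no access list entry are hidden (deny by default).
    """
    return [doc for doc in results if acl.get(doc, set()) & user_groups]


acl = {"d1": {"everyone"}, "d2": {"hr"}, "d3": {"finance", "hr"}}
print(filter_results(["d1", "d2", "d3", "d4"], acl, {"everyone", "finance"}))  # ['d1', 'd3']
```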
[back to top]
 
segment
Synonyms:  partition Related Terms:  search indices A subset of the binary files in a search index that contain the index data for a portion of the documents. Many search engines divide up their search indices in this manner to make managing the indexes easier.
[back to top]
 
sentiment analysis
Related Terms:  results list visualization, FAST Search and Transfer An automated technique to determine the opinion or frame of mind of the author of a document. For example, a company selling a product might look at blog entries on the Internet to see how many people like the product and how many do not. These methods are not 100% reliable, but may be useful in market research.
[back to top]
 
SEO
See Main Definition:  Search Engine Optimization
[back to top]
 
set theory
Related Terms:  precision, recall, ranking A mathematical technique used by academics to measure the performance of a search engine. Logical sets of documents are defined and then tabulated for each test search. The numbers from these various sets are then compared to each other to form specific ratios that can be compared.
[back to top]
 
Seven by Twenty Four
See Main Definition:  Twenty Four by Seven
[back to top]
 
SFE
See Main Definition:  Search Front End
[back to top]
 
SGML
Related Terms:  HTML, XML An older document format for storing structured document data. Though popular with some groups, it was quite complex. It is the direct predecessor of XML and HTML.
[back to top]
 
SharePoint
See Main Definition:  Microsoft SharePoint
[back to top]
 
shell script
Related Terms:  script file, script, batch file A series of commands stored in a text file. "shell script" usually refers to Unix shell scripts, which usually have extensions of .sh, .csh, .tcsh. Sometimes people also refer to Microsoft Windows batch files as shell scripts, in which case they will usually have a .bat extension. Many companies use shell scripts to automate the process of indexing their content.
[back to top]
 
SI
See Main Definition:  Systems Integrator
[back to top]
 
silo
See Main Definition:  data silo
[back to top]
 
silo database
See Main Definition:  repository database
[back to top]
 
siloed approach
A somewhat ambiguous phrase, usually indicating that content is stored in a data silo. It may also imply that federated search is being used to search that content.
[back to top]
 
Single Shot Relevancy
Related Terms:  Search 2.0, drill-down, guided navigation, interactive search The belief that a search engine can reliably accept searches from users and bring the most relevant match to the top of the results list. Whether this is a reasonable or even achievable goal for a computer has been called into question as document counts get larger and larger; sometimes even humans can't agree on the "right" answer to a question. Though many techniques have been designed to improve the likelihood of this happening, there are now many searches where a computer simply cannot determine the most relevant document. This problem is magnified by the increasing number of documents now in search engine indices, and by users' growing habit of typing in only one or two search terms. The fix for this in Search 2.0 is to provide an interactive results list where the user can drill down through the thousands of matching documents to find the answer they need. This term was coined by NIE.
[back to top]
 
Single Sign On
Synonyms:  SSO A means of allowing a user to log in once to a corporate network and then access all business applications without the need to log in again for each one; behind the scenes each application talks to the SSO system to verify the user's access. Applying SSO security to search engines, to ensure that users can only search and find documents they are entitled to see, is a very hot topic these days.
[back to top]
 
site search engine
Synonyms:  local search engine, local site search engine Related Terms:  enterprise search engine, search engine vendors A search engine that indexes and searches only the content of its own web site. This might be a public web site for finding public content, or an enterprise web site; when a local site search engine is used behind a firewall to search a company's secure content, it is usually referred to as an Enterprise Search Engine.
[back to top]
 
SiteMinder
Synonyms:  Site Minder Related Terms:  Netegrity A product used for Single Sign On (SSO), allowing employees to access multiple separate systems inside a company with a single unified login.
[back to top]
 
sizing
Related Terms:  scalability, rows and columns (scalability) The calculation of how many computers will be needed to accommodate a large amount of searchable documents and user searches, and how those machines should be configured.
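With a rows and columns layout, a first-pass sizing estimate is simple arithmetic. A sketch, assuming per-machine capacities come from benchmarking. Every number below is hypothetical.

```python
import math
from typing import Tuple


def size_cluster(total_docs: int, docs_per_column: int, peak_qps: float,
                 qps_per_row: float, spare_rows: int = 1) -> Tuple[int, int, int]:
    """Return (rows, columns, machines) for a rows x columns layout."""
    columns = math.ceil(total_docs / docs_per_column)      # index capacity
    rows = math.ceil(peak_qps / qps_per_row) + spare_rows  # search capacity plus fail-over
    return rows, columns, rows * columns


print(size_cluster(50_000_000, 10_000_000, peak_qps=120, qps_per_row=50))  # (4, 5, 20)
```

The estimate is sized for peak QPS rather than the daily average, and it adds a spare row so that one row can fail without dropping below the required capacity.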
[back to top]
 
social networking
In the context of search engines, social networking attempts to improve the relevancy of matching documents by considering the previous actions of other users. For example, a system may rank a certain document higher based on the number of times other users previously clicked on that document vs. other documents in the results list. Other forms of social input are links to a particular document, or preferring certain sets of documents for certain classes of users. Some vendors refer to using this type of social network data to "add context" to a new user's search.
[back to top]
 
socket
Related Terms:  TCP/IP A communications channel between two computers or two programs, that allows them to send data back and forth. Most search engines communicate heavily over TCP/IP sockets.
[back to top]
 
Soft ROI
Related Terms:  ROI, Hard ROI If a company spends money on new technology, how much money will it save? For example, if a company upgrades the search engine on its Tech Support web site, customers might be able to find answers more easily there instead of calling Tech Support on the phone. Reducing calls to Tech Support, or at least avoiding a rise in calls as the number of customers increases, can reduce the number of people the company has to hire. Improving the efficiency of a corporate portal with improved search, which would presumably make employees more efficient, is another frequent Soft ROI claim. These ROI claims are "soft" because they are difficult to precisely predict and track.
[back to top]
 
Software as a Service
Synonyms:  SAAS Software that is used remotely, typically over the Internet, without the need to install it locally. Search functionality can be delivered in this way.
[back to top]
 
software key
See Main Definition:  license key
[back to top]
 
software license
Related Terms:  license restriction, qps, document count A legal and financial agreement between the software vendor and the customer about how the software will be used. This agreement may be enforced with a license key. Limits may be placed on the software, for example how many machines it can be run on, how many queries per second it can handle, or how many documents it can index.
[back to top]
 
source query text
See Main Definition:  filtered search
[back to top]
 
SOW
See Main Definition:  Statement of Work
[back to top]
 
SOX
See Main Definition:  Sarbanes-Oxley Act
[back to top]
 
SOX
See Main Definition:  Sarbanes Oxley
[back to top]
 
spider
Synonyms:  crawler, web crawler, robot, bot, 'bot Related Terms:  indexer, import/export, robots.txt A special type of document indexer that follows the links on a web site in order to eventually index the entire site. It goes from web page to web page, via the HTML hyperlinks, until the entire site has been indexed.
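The link-following loop can be sketched as a breadth-first traversal. To keep it self-contained, this sketch assumes a caller-supplied `fetch` function (here just a dictionary of hypothetical pages) instead of real HTTP; a production spider would also honor robots.txt, throttle its requests, and hand each page to the indexer.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

class LinkExtractor(HTMLParser):
    """Collect href values from <a> tags."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(start_url, fetch):
    """Breadth-first spider limited to the start URL's host."""
    host = urlparse(start_url).netloc
    seen, queue, indexed = {start_url}, deque([start_url]), []
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        indexed.append(url)  # a real spider would send the page to the indexer here
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            link = urljoin(url, href).split("#")[0]   # resolve relative links, drop anchors
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return indexed

site = {  # hypothetical pages standing in for a web server
    "http://example.com/": '<a href="/a.html">A</a> <a href="http://other.org/">x</a>',
    "http://example.com/a.html": '<a href="b.html">B</a> <a href="/">home</a>',
    "http://example.com/b.html": "no links here",
}
print(crawl("http://example.com/", site.get))
```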
[back to top]
 
sponsored link
See Main Definition:  sponsored result
[back to top]
 
sponsored result
Synonyms:  sponsored link, paid listing A search hit presented in the results list whose owner has paid to have the listing presented at the top of the list or in some other prominent manner. In effect, business rules have been used to override the organic relevancy.
[back to top]
 
SQL
See Main Definition:  Structured Query Language
[back to top]
 
SSO
See Main Definition:  Single Sign On
[back to top]
 
staging systems
Synonyms:  qa, qa systems, uat, user acceptance testing A set of computers used to test search applications and other software before it is deployed to the production servers. In some companies, staging and qa systems are the same machines; in a larger company the staging machines are typically the last testing platform before production, whereas qa systems might be closer to development.
[back to top]
 
static content
Synonyms:  static web pages Related Terms:  dynamic content, HTML, spider Web pages that rarely change, and are often stored as a physical HTML file. Much of the early World Wide Web was composed of this static content. Later and more elaborate web sites now often use dynamic content. Most spiders have an easy time indexing static content.
[back to top]
 
static summaries
Synonyms:  fixed summaries A textual summary of a document is often displayed in the results list under the title of the document. Although dynamic summaries are popular, they are not always technically feasible, depending on the search engine and repository being used. In that case, a summary is calculated at index time and stored in the search engine index; the summary may be automatically generated by the search engine indexer, or may be taken from an explicit summary declared in the document meta data, such as an HTML summary meta tag. This summary will be the same for all users who see the document in the results list, regardless of the search terms they entered. These types of summaries are not usually highlighted.
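A minimal sketch of the index-time choice described above, assuming each document arrives as a dictionary with an optional `description` field (standing in for the meta data summary) and a `body` field; both field names are hypothetical.

```python
def static_summary(doc, max_words=25):
    """Compute a summary once, at index time.

    Prefer an explicit summary from the document's meta data (the assumed
    'description' field); otherwise fall back to the first words of the body.
    """
    explicit = (doc.get("description") or "").strip()
    if explicit:
        return explicit
    words = doc.get("body", "").split()
    summary = " ".join(words[:max_words])
    return summary + (" ..." if len(words) > max_words else "")

docs = {"d1": {"description": "Golf course guide."},
        "d2": {"body": "one two three four five six"}}
index = {doc_id: static_summary(doc, max_words=4) for doc_id, doc in docs.items()}
print(index)  # {'d1': 'Golf course guide.', 'd2': 'one two three four ...'}
```

Because the summary is stored with the index, every user sees the same text regardless of the search terms they entered.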
[back to top]
 
stemming
Related Terms:  lemmatization, normalize The process of normalizing words to their base form, usually while a document is being indexed. Later, when a search is run, those terms can also be normalized, such that the search terms will match other variants of each word. For example, a document containing the word "golfing" would be stemmed down to the base term "golf" and stored in the document index. Later, a search for the word "golfed" would also be reduced to "golf", so that the search "golfed" would match the document with "golfing". The advantage of this technique is that the searches are very efficient even for words with many variant forms; the disadvantage of this technique is that if the list of acceptable variants is changed, the documents will typically need to be reindexed. Stemming offers similar functionality to lemmatization, but by using very different means; stemming is a "reduction" technique, whereas lemmatization is an "expansive" technique.
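The "golfing"/"golfed" example can be reproduced with a deliberately tiny suffix-stripping stemmer. This is a toy for illustration only; real engines use much more careful algorithms such as Porter or Snowball, and the suffix list here is an assumption.

```python
SUFFIXES = ("ing", "ed", "s")  # tiny, illustrative suffix list

def stem(word):
    """Toy suffix-stripping stemmer (far simpler than Porter or Snowball)."""
    word = word.lower()
    for suffix in SUFFIXES:
        # keep at least 3 characters of stem so that "red" is not cut down to "r"
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def build_index(docs):
    """Index time: store the stemmed form of every word."""
    index = {}
    for doc_id, text in docs.items():
        for token in text.split():
            index.setdefault(stem(token), set()).add(doc_id)
    return index

index = build_index({"doc1": "golfing in Scotland", "doc2": "tennis lessons"})
# Search time: the query word is reduced with the same rules before lookup.
print(index.get(stem("golfed"), set()))  # {'doc1'}
```

Changing `SUFFIXES` changes what is stored in the index, which is why a change to the stemming rules typically forces a reindex.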
[back to top]
 
stop words
Common words that are deemed to be unimportant. Early search engines discarded such words entirely, with the negative side effect of breaking phrase searches; for example, the phrase "We The People" would not match a historical document because the word "the" had been dropped from the search engine's index. In more modern systems, common stop words may be discarded from lists of search terms, or ignored in statistical calculations.
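A minimal sketch of the phrase problem, assuming the quoted phrase is matched word for word against the indexed tokens. The stop word list is illustrative.

```python
STOP_WORDS = {"a", "an", "and", "of", "the"}  # illustrative list

def tokenize(text, drop_stop_words):
    tokens = text.lower().split()
    return [t for t in tokens if not (drop_stop_words and t in STOP_WORDS)]

def phrase_match(doc_tokens, phrase_tokens):
    """True if phrase_tokens appear consecutively in doc_tokens."""
    n = len(phrase_tokens)
    return any(doc_tokens[i:i + n] == phrase_tokens
               for i in range(len(doc_tokens) - n + 1))

text = "We the people of the United States"
phrase = tokenize("we the people", drop_stop_words=False)

# Early approach: stop words were never stored, so the phrase cannot be found.
old_index = tokenize(text, drop_stop_words=True)
print(phrase_match(old_index, phrase))  # False

# Modern approach: store every word; ignore stop words only in loose searches or scoring.
new_index = tokenize(text, drop_stop_words=False)
print(phrase_match(new_index, phrase))  # True
```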
[back to top]
 
Storage Array Network
Synonyms:  SAN, Storage Area Network Related Terms:  disk storage A type of disk storage that is connected via a dedicated high speed connection, or occasionally over a standard network, and is shared by multiple computers. Access to files is handled at a relatively low level, AKA "block level", and SAN is generally considered to be more efficient than NAS, though it still might not be fast enough for some high end search applications. Most vendors do not officially recommend using SAN, but many customers do it anyway.
[back to top]
 
Stratify
Related Terms:  taxonomy, parametric search A software vendor that offers tools to help analyze full-text content and help build appropriate taxonomies. See http://stratify.com
[back to top]
 
structured content
Related Terms:  structured data Data that contains well defined fields or meta data within the documents, or documents that are at least well organized in a consistent and useful manner.
[back to top]
 
Structured Query Language
Synonyms:  SQL Related Terms:  search, relational database, join A standardized syntax for specifying queries that are sent to a relational database. There is no similar universal standard for search engines, though the informal Internet search syntax is gaining wide acceptance. The abbreviation SQL is much more common than the fully spelled out phrase.
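For comparison with search engine syntax, here is a small SQL query run against an in-memory SQLite database through Python's standard `sqlite3` module. The table and rows are hypothetical. Note that a plain SQL condition either matches a row or it does not; there is no relevancy ranking unless the database adds a full-text extension.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, title TEXT, year INTEGER)")
conn.executemany("INSERT INTO docs (title, year) VALUES (?, ?)",
                 [("Golf course guide", 2004), ("Tennis basics", 2006),
                  ("Golf swing tips", 2007)])

# Exact conditions on fields; SQLite's LIKE is case-insensitive for ASCII by default.
rows = conn.execute(
    "SELECT title FROM docs WHERE title LIKE ? AND year > ? ORDER BY year",
    ("%golf%", 2005),
).fetchall()
print(rows)  # [('Golf swing tips',)]
conn.close()
```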
[back to top]
 
sub-document context
Traditional search engines treated all the words in a document as a single large group; longer documents tended to have many words that were not tightly related to each other. This would skew statistics that tracked the occurrence of these distant pairs of words. A newer approach is to break the document up into smaller sections, such as at the paragraph or sentence level. Two distant words in a document may still match the terms of a multiword search, but the words will not be considered to have occurred together for the more advanced calculations, and the document will not be given as high a relevance score.
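A minimal sketch of sentence-level context: both words appear somewhere in the document, but they only count as occurring together when they share a sentence. Splitting sentences on `.`, `!` and `?` is a simplifying assumption.

```python
import re

def cooccurrence_units(text, term_a, term_b):
    """Count sentences (the sub-document unit here) containing both terms."""
    sentences = re.split(r"(?<=[.!?])\s+", text.lower())
    hits = 0
    for sentence in sentences:
        words = set(re.findall(r"\w+", sentence))
        if term_a in words and term_b in words:
            hits += 1
    return hits

doc = ("The bank raised interest rates. "
       "Later, we walked along the river. The river bank was muddy.")
print(cooccurrence_units(doc, "river", "bank"))      # 1
print(cooccurrence_units(doc, "interest", "river"))  # 0
```

At the whole-document level both pairs would look equally related; at the sentence level only "river bank" does.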
[back to top]
 
sub-document indexing
Synonyms:  compound document Related Terms:  sub-document context, FAQ Some long documents actually contain many logically separate pieces of information that have simply been lumped together into a single web page or document. For example, the homepage of a newspaper site will have many different stories on it. This causes several problems for search engines; for example, a user's search that contains two terms might match one term in one news story and the second in a completely separate story, giving a false match. Also, titles shown in the results list will not be specific to the individual stories. Sub-document indexing is an attempt to break these compound documents up into smaller parts, such that each section contains only one logical story. This technique can also be useful for indexing FAQ style web pages.
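For the FAQ case, a minimal sketch might split one page into a sub-document per question, each with its own title and ID. It assumes every question starts on a line beginning with `Q:`; real pages would need site-specific rules based on headings, anchors, or HTML structure, and the `page#n` ID scheme is hypothetical.

```python
import re

def split_faq(page_id, text):
    """Split a compound FAQ page into one sub-document per question."""
    parts = re.split(r"^Q:\s*", text, flags=re.MULTILINE)
    subdocs = []
    for i, part in enumerate(p for p in parts if p.strip()):
        question, _, answer = part.partition("\n")
        subdocs.append({
            "id": f"{page_id}#{i + 1}",
            "title": question.strip(),   # story-specific title for the results list
            "body": answer.strip(),
        })
    return subdocs

faq = ("Q: How do I reset my password?\nUse the Forgot link.\n"
       "Q: Where is my invoice?\nSee Billing.\n")
for sub in split_faq("faq.html", faq):
    print(sub["id"], "|", sub["title"])
```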
[back to top]
 
sub-document security
See Main Definition:  field-level security Related Terms:  sub-field security
[back to top]
 
sub-field security
Related Terms:  redacting Controlling access to specific parts of a document, even below the level of an individual field. For example, a customer might be able to read most of a bug report, but specific paragraphs or sentences containing proprietary information are not visible. There may or may not be any visual clue that information has been removed. This process is sometimes called "redacting". It can sometimes be implemented using XSLT.
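A minimal sketch of paragraph-level redaction, written in Python rather than XSLT. The data model, where each paragraph carries an optional required group, is a hypothetical simplification; the `marker` parameter covers both cases mentioned above, a visible clue or silent removal.

```python
def redact(paragraphs, user_groups, marker="[removed]"):
    """Return the paragraphs a user may see.

    Each paragraph is (text, required_group or None). Restricted paragraphs
    are replaced by a visible marker, or dropped silently if marker is None.
    """
    visible = []
    for text, group in paragraphs:
        if group is None or group in user_groups:
            visible.append(text)
        elif marker is not None:
            visible.append(marker)
    return visible

bug = [("Crash when indexing PDFs.", None),
       ("Root cause is in vendor module X.", "engineering"),
       ("Fixed in release 4.2.", None)]
print(redact(bug, {"customer"}))
print(redact(bug, {"customer"}, marker=None))
```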
[back to top]
 
Subject Based Taxonomy
Related Terms:  Taxonomy A taxonomy based on a set of subjects, organized in a logical order. The popular web portals Yahoo and DMOZ.org are modern examples. The Dewey Decimal System, used to organize books in traditional libraries, is an older example.
[back to top]
 
subject context
Narrowing the scope of search to a particular subject area or field of interest. Many words are reused to mean very different things, and a simple word matching algorithm will therefore bring back irrelevant results. However, if the data has been categorized appropriately, the system may know which documents are within specific domains, and could therefore limit the search to that area; the user also needs to indicate which of those areas they are interested in. As an example, a search for the term "bush" might be referring to the President of the United States, in which case political documents would be appropriate; or the term "bush" might refer to the plants used in landscaping (shrubbery, etc.), in which case documents related to Home Improvement or Botany might be more appropriate. For this system to work, both the documents and user searches need to be categorized correctly.
[back to top]
 
subject domain disambiguation
Related Terms:  subject context Providing an interactive way for a user to indicate which subject domain they are interested in. This is particularly useful when a common term or abbreviation has very different meanings. For example, a user search for "cd" would be shown some results from both Financial Documents (where CD means Certificate of Deposit) and also from Popular Music (where CD means Compact Disc). The user could then click on one of those headings to further drill down into the search results. The search engine now knows which meaning of "cd" the user had in mind, and can give much better results.
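The "cd" example can be sketched as two small steps: group the hits under subject domain headings, then keep only the domain the user clicks. This assumes the documents have already been categorized; the titles and domain labels below are hypothetical.

```python
from collections import defaultdict

def group_by_domain(results):
    """Group (title, domain) hits under domain headings for a drill-down UI."""
    groups = defaultdict(list)
    for title, domain in results:
        groups[domain].append(title)
    return dict(groups)

def drill_down(results, chosen_domain):
    """Keep only hits from the subject domain the user clicked."""
    return [title for title, domain in results if domain == chosen_domain]

hits = [("6-month CD rates", "Financial"),
        ("Top selling CDs of 2005", "Popular Music"),
        ("Jumbo CD terms", "Financial")]
print(group_by_domain(hits))
print(drill_down(hits, "Financial"))
```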
[back to top]
 
summary highlighting
See Main Definition:  results list highlighting
[back to top]
 
super target
Related Terms:  search tuning, relevancy A form of query tuning that elevates the importance of a particular search term.
[back to top]
 
Suspicious Activity Report
Synonyms:  SAR Related Terms:  AML, Anti Money Laundering, BSA, Bank Secrecy Act A report filed by a bank or financial institution about a financial transaction that is considered unusual for some reason. An SAR by itself does not necessarily mean that anything is actually wrong; it just means something is unusual and should be looked at further. SARs form one aspect of AML (Anti Money Laundering), which search engines are sometimes used to implement.
[back to top]
 
symmetric DNS name resolution
Related Terms:  DNS, reverse DNS In short, the DNS and reverse DNS lookups for a server name and IP address pair must give back consistent results. Longer explanation: In computer networking, machines have a numerical TCP/IP address and a textual name, which is easier for humans to remember. Computers can use DNS to look up a name and get the numerical address, or use reverse DNS to take a numerical TCP/IP address and find the name of that machine. Some search engines require that the name to IP address (DNS) and IP address to name (reverse DNS) lookups match in both directions. So if you start with a name, use DNS to get a number, and then use reverse DNS on that number to get back a name, the starting and ending names must match. Likewise, if you go from IP address to name and back to IP address, the two addresses must match.
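Both round trips can be checked with Python's standard `socket.gethostbyname` (IPv4 only) and `socket.gethostbyaddr`. In this sketch the resolver functions are parameters so that hypothetical lookup tables can stand in for a real DNS server. Whether short and fully qualified names count as "matching" depends on local resolver configuration, which this sketch does not attempt to handle.

```python
import socket

def _normalize(name):
    # DNS names are case-insensitive; a trailing dot marks a fully qualified name
    return name.lower().rstrip(".")

def name_round_trip_ok(hostname, forward=socket.gethostbyname,
                       reverse=lambda ip: socket.gethostbyaddr(ip)[0]):
    """name -> IP (DNS) -> name (reverse DNS) must give back the starting name."""
    return _normalize(reverse(forward(hostname))) == _normalize(hostname)

def ip_round_trip_ok(ip, forward=socket.gethostbyname,
                     reverse=lambda ip: socket.gethostbyaddr(ip)[0]):
    """IP -> name (reverse DNS) -> IP (DNS) must give back the starting address."""
    return forward(reverse(ip)) == ip

# Hypothetical lookup tables standing in for a real DNS server.
dns = {"search01.example.com": "10.0.0.5", "web-old.example.com": "10.0.0.9"}
rdns = {"10.0.0.5": "search01.example.com", "10.0.0.9": "search01.example.com"}

print(name_round_trip_ok("search01.example.com", dns.get, rdns.get))  # True
print(ip_round_trip_ok("10.0.0.9", dns.get, rdns.get))               # False
```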
[back to top]
 
Systems Integrator
Synonyms:  SI A company that can link together various pieces of hardware and software from other vendors, to accomplish a larger task.
[back to top]
 
  T   [back to top]
 
T&M
See Main Definition:  Time and Materials Synonyms:  by the hour
[back to top]
 
table
Related Terms:  segment, partition, collection, search indices In relational databases, tables are a way to logically organize a set of data. Search engines do not normally talk of "tables"; in search engines, a similar concept is that of partitions or segments. These are typically transparent to the casual search engine administrator; you typically do not address individual partitions or segments with a search engine.
[back to top]
 
tagging (documents)
See Main Definition:  document tagging
[back to top]
 
tagging (social networking)
Related Terms:  social networking, context Allowing individual users to add descriptive words to a particular piece of data. These words can later be used to improve search results.
[back to top]
 
tar
Related Terms:  tar file TAR was originally an abbreviation for "Tape Archive", but in more modern times it is used to bundle sets of files for storage on disk or transmission over the Internet, and magnetic tape is now rarely involved. Use of tar is very common on Unix and Linux based operating systems. There are several dialects of tar files, and tar files are often compressed. Most search engine software for Unix and Linux is distributed as tar files or some similar variant.
[back to top]
 
tar file
Synonyms:  tar ball Related Terms:  tar A set of files and directories bundled together into a single file, which is more convenient for transmitting over a network. Tar files are created and read by the Unix tar command. An uncompressed tar file will typically have a ".tar" file extension. Other extensions typically indicate some type of compression has been used to make the file a bit smaller.
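The same bundling can be done from Python with the standard `tarfile` module, roughly what `tar -czf` and `tar -tzf` do on the command line. This sketch builds a gzip-compressed tar file in memory from hypothetical file contents and then lists its members.

```python
import io
import tarfile

def make_tarball(files):
    """Bundle {name: bytes} into an in-memory gzip-compressed tar file (.tar.gz)."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buffer.getvalue()

def list_tarball(blob):
    """Return the member names stored in a .tar.gz blob."""
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r:gz") as tar:
        return tar.getnames()

blob = make_tarball({"engine/bin/start.sh": b"#!/bin/sh\n", "engine/README": b"docs\n"})
print(list_tarball(blob))  # ['engine/bin/start.sh', 'engine/README']
```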
[back to top]
 
taxonomy
Related Terms:  hierarchical data, scope of search, faceted search, hybrid search, Behavior Based Taxonomy, Topic, Topic set, Endeca, Stratify, Verity An organized set of concepts or definitions, usually labeled by keywords; for search engines, a taxonomy can also be a set of organized searches. Taxonomies are typically nested in a hierarchical manner, often called a "tree", going from broader to more specific concepts as you navigate further into the taxonomy "branches" and "leaves" (or "end points").
[back to top]
 
Taxonomy Browse Tree
Related Terms:  taxonomy A Taxonomy Browse Tree is a visual representation of a taxonomy that allows users to click on elements of the taxonomy to view child branches and/or leaves. When connected to search, a Taxonomy Browse Tree provides a kind of Guided Navigation, displaying appropriate results. Typically, a Taxonomy Browse Tree will display all branches and leaves, whereas Parametric and Faceted Search display only those that are relevant and will return results.
[back to top]
 
TB
See Main Definition:  Terabyte
[back to top]
 
TCO
See Main Definition:  Total Cost of Ownership
[back to top]
 
Terabyte
A unit of measure indicating one trillion (US) bytes of computer data (AKA one million million, or 10^12). Note that the term "trillion" has different meanings in different parts of the world. The binary value 1024 * 1024 * 1024 * 1024, or 1024^4 (1,099,511,627,776 bytes), is also commonly called a Terabyte, especially for computer memory, although strictly speaking that amount is a tebibyte (TiB). This can be used to represent the amount of memory in a computer, the amount of storage on a disk drive, or the size of a document or file. Years ago (late 1990s, early 2000s) this was considered to be such a large amount of computer data that it was more of a theoretical milestone, but due to the increases in computer storage and power, it is now frequently achieved in high end computers and business class servers. It is now possible for search engines to index and search Terabytes of data, although it could prove expensive in terms of hardware and software licensing.
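The gap between the decimal and binary meanings is almost 10%, which matters when sizing disks for an index. A quick calculation:

```python
TB = 10 ** 12    # decimal terabyte: one trillion (US) bytes
TIB = 1024 ** 4  # binary tebibyte, often loosely called a terabyte

def describe(byte_count):
    return f"{byte_count:,} bytes = {byte_count / TB:.3f} TB = {byte_count / TIB:.3f} TiB"

print(describe(TB))   # 1,000,000,000,000 bytes = 1.000 TB = 0.909 TiB
print(describe(TIB))  # 1,099,511,627,776 bytes = 1.100 TB = 1.000 TiB
```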
[back to top]
 
term density
Synonyms:  word density, TF, term frequency Related Terms:  term frequency A calculated percentage of how frequently a term appears in a document, relative to the overall size of the document. This fixes a problem with simple term frequency calculations. For example, if a word appears 5 times in a 2 page document and 10 times in a 100 page document, the first document is probably still more relevant, even though it has 5 fewer occurrences of the term.
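The 2 page vs. 100 page example can be worked through with raw counts and densities side by side. The sketch assumes roughly 500 words per page and builds synthetic documents of that size; the tokenizing rule (lowercase, whole words) is a simplification.

```python
import re

def term_frequency(term, text):
    """Raw count of a term in a document (case-insensitive, whole words)."""
    return re.findall(r"\w+", text.lower()).count(term.lower())

def term_density(term, text):
    """Term frequency divided by the total number of words in the document."""
    words = re.findall(r"\w+", text.lower())
    return words.count(term.lower()) / len(words) if words else 0.0

# Assumed 500 words per page: 2 pages = 1,000 words, 100 pages = 50,000 words.
short_doc = "golf " * 5 + "filler " * 995
long_doc = "golf " * 10 + "filler " * 49_990

print(term_frequency("golf", short_doc), term_frequency("golf", long_doc))  # 5 10
print(term_density("golf", short_doc), term_density("golf", long_doc))      # 0.005 0.0002
```

The raw count favors the long document, while the density favors the short one.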
[back to top]
 
term frequency
Synonyms:  TF, term density, word density Related Terms:  term density The number of times a particular word (or term) appears in a particular document. The assumption is that the more often a word occurs in a document, the more relevant that document is to the word. Some algorithms scale this number based on the overall size of the document, employing what is sometimes called term density. When the raw count is turned into a percentage, via some formula, it is technically a term density, though it is often still referred to as term frequency.
[back to top]
 
Term License
Related Terms:  perpetual license A software license that is only granted for a limited time and then will expire. If the company who bought it wishes to keep using it, they need to pay more money to renew the license for an additional period of time. This type of license is sometimes chosen because the initial price is lower than that of a perpetual license for the same software. However, after the initial contract and one or two renewals, the total amount of money paid might wind up being more. A term license may also include software updates, bug fixes and technical support at no additional charge.
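A rough way to see when a term license overtakes a perpetual one is to compare cumulative spend year by year. All prices here are hypothetical. The sketch assumes the perpetual license carries an annual maintenance fee (a percentage of the purchase price) after the first year, while the yearly term fee already includes updates and support.

```python
def cumulative_costs(years, perpetual_price, maintenance_rate, term_price):
    """Cumulative spend per year: perpetual license plus annual maintenance
    versus a yearly term license (assumed to include updates and support)."""
    perpetual = term = 0.0
    history = []
    for year in range(1, years + 1):
        perpetual += perpetual_price if year == 1 else perpetual_price * maintenance_rate
        term += term_price
        history.append((year, perpetual, term))
    return history

def first_year_term_costs_more(history):
    """Return the first year in which the term license has cost more in total."""
    for year, perpetual, term in history:
        if term > perpetual:
            return year
    return None

# Hypothetical prices: 100,000 perpetual + 20% yearly maintenance vs 50,000 per year.
history = cumulative_costs(5, 100_000, 0.20, 50_000)
for row in history:
    print(row)
print(first_year_term_costs_more(history))  # 3
```

With these numbers the term license costs more in total after the initial year and two renewals.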
[back to top]
 
Test Harness
Borrowed from electronics hardware design, a Test Harness for search provides a way for business and technical owners to experiment with various relevancy models, search result layouts, and other aspects of search. Ideally, many of these changes can be done with no application changes by use of a Tuning Console.
[back to top]
 
TF
See Main Definition:  term frequency
[back to top]
 
third party vendor
Synonyms:  3rd party vendor, 3rd party tools Related Terms:  cross vendor, NIE Third party vendors offer tools and services for enterprise search engines, but are not part of the company that makes the search engine. Search engine vendors tend to focus on their core search capability, and may not offer a full set of supporting tools to manage a search engine, or the vendor may not offer tools to integrate data into another search engine; third party vendors can fill this gap. Also, since many companies use more than one search engine, it's much more likely that a third party vendor will offer cross vendor solutions that work with all of the search engines being used.
[back to top]
 
Threaded Kernel, Multithreaded Application
A computer program that can do more than one thing at once; each separate task the program is doing is called a "thread". Threaded programs can often do things faster because they make more efficient use of the computer's resources. A "kernel" is the central core of the program that does most of the actual work.
[back to top]
 
Time and Materials
Synonyms:  T&M Related Terms:  fixed price project Paying for custom programming or consulting by the hour (or by the day), instead of agreeing to a fixed price for all work performed. "Materials" would also cover agreed-upon expenses such as travel and supplies. If a project is not very well defined, it is difficult for the contractor to offer a fixed price.
[back to top]
 
Topic
See Main Definition:  Verity Topic
[back to top]
 
Topic set
Synonyms:  topicset Related Terms:  taxonomy A set of Verity Topics organized together to form a library of predefined searches. Good topic sets have descriptive names for each main topic.
[back to top]
 
Total Cost of Ownership
Synonyms:  TCO Related Terms:  ROI Calculating the total amount of money that acquiring, installing and operating a new piece of technology will cost. The costs typically include the initial software license, the computers to run the software on, the cost to install and integrate the software, training the users, and the cost to maintain it. As an example, some open source search engines look attractive because there is no up front license fee. However, in some cases the software might require more hours of time to install, configure and customize. If that additional time were much longer, the extra cost of paying those employees or consultants might more than offset any savings on the license.
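The open source example can be made concrete with a simplified TCO sum. Every figure below is hypothetical, and a real TCO model would include more cost categories; the point is only to show how extra setup hours can outweigh a missing license fee.

```python
def tco(license_fee, hardware, setup_hours, hourly_rate, training, yearly_upkeep, years):
    """Total cost of ownership over a number of years (simplified, hypothetical model)."""
    return (license_fee + hardware + setup_hours * hourly_rate
            + training + yearly_upkeep * years)

commercial = tco(license_fee=50_000, hardware=20_000, setup_hours=200,
                 hourly_rate=150, training=10_000, yearly_upkeep=10_000, years=3)
open_source = tco(license_fee=0, hardware=20_000, setup_hours=800,
                  hourly_rate=150, training=10_000, yearly_upkeep=5_000, years=3)
print(commercial, open_source)  # 140000 165000
```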
[back to top]
 
traditional database
See Main Definition:  relational database
[back to top]
 
Tuning Console
An interface for performing query tuning or relevancy adjustments. Creating an overall relevancy algorithm to enhance the out-of-the-box search application (Query Tuning) is part of any search implementation. Because search activity and documents change over time, companies need to monitor and review search activity to ensure that the algorithm produces the best possible results. Unfortunately, most relevancy algorithms are implemented within application code, so even minor changes require code updates, resulting in long lead times to improve search results. Search technologies are beginning to include consoles that allow dynamic changes to the overall relevancy algorithm, and the application that provides this capability is called a Tuning Console or a Relevancy Tuning Console.
[back to top]
 
Twenty Four by Seven
Synonyms:  24/7, 24x7, 7/24, 7x24 24 hours a day and 7 days a week, basically all the time. Can indicate very high availability for a system, or the ability to contact a company at all times of the day and night. If customers pay extra for software maintenance, they may be allowed to contact Tech Support at any hour of the day or night.
[back to top]
 
typo
Related Terms:  fuzzy matching A shorthand query operator for matching words along with their common misspellings. Such operators can also be designed to catch mistakes made by OCR systems.
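One way to approximate a typo operator is to expand the query word to indexed terms that look similar. This sketch swaps in Python's standard `difflib.get_close_matches`, which uses a sequence similarity ratio; search engines may instead use edit distance or OCR-specific confusion rules. The vocabulary and cutoff are assumptions.

```python
import difflib

vocabulary = ["search", "engine", "relevancy", "taxonomy", "index"]  # hypothetical indexed terms

def typo_expand(word, cutoff=0.75):
    """Return indexed terms that closely resemble a possibly misspelled word."""
    return difflib.get_close_matches(word.lower(), vocabulary, n=3, cutoff=cutoff)

print(typo_expand("serach"))
print(typo_expand("relevency"))
```

The expanded terms would then be searched as alternatives (an OR) in place of the original word.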
[back to top]
 
  U   [back to top]
 
uat
See Main Definition:  User Acceptance Testing systems Synonyms:  qa, staging
[back to top]
 
UI
See Main Definition:  User Interface
[back to top]
 
Ultraseek
See Main Definition:  Autonomy Ultraseek
[back to top]
 
UltraSpider
Related Terms:  Autonomy Ultraseek, Autonomy K2 A modified version of the Ultraseek spider that can submit content to the K2 search engine.
[back to top]
 
Unicode
Related Terms:  UTF-8, UTF-16, codepage Unicode is a numbering system for representing the written characters of all human languages. Though very early versions of Unicode only had 16 bit characters, limiting them to approximately 65,000 symbols, modern versions of Unicode have been extended well past that limit. However, the first 65,000 symbols are still the most commonly used ones. Most search engines now understand Unicode. If they encounter text that uses some other numerical representation, such as a legacy codepage, they will usually convert that text into the equivalent Unicode characters. When Unicode text is stored on a hard disk or sent across the network, it is typically encoded in UTF-8.
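The conversion step can be sketched with Python's built-in codecs: bytes in a legacy codepage are decoded into Unicode characters, which can then be encoded as UTF-8 for storage. The sample inputs are hypothetical.

```python
# Hypothetical raw bytes as they might arrive from two legacy sources.
incoming = [
    (b"caf\xe9", "cp1252"),                     # Western European Windows codepage
    ("日本".encode("shift_jis"), "shift_jis"),  # Japanese codepage
]

def normalize_to_unicode(raw, codepage):
    """Decode codepage bytes into a Unicode string (Python str)."""
    return raw.decode(codepage)

for raw, codepage in incoming:
    text = normalize_to_unicode(raw, codepage)
    # Show the Unicode code points, then the UTF-8 bytes used for storage.
    print(codepage, [f"U+{ord(ch):04X}" for ch in text], text.encode("utf-8"))
```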
[back to top]
 
Unix
Related Terms:  Linux, OS, Operating System An operating system first developed in the late 1960s and early 1970s, which formed the basis for, or inspired, many operating systems currently in use, including Linux.
[back to top]
 
unstructured content
Synonyms:  unstructured data Documents that contain mostly text, with little or no meta data. If the documents are grouped and stored in some logical way, then some structure can be inferred by the URLs or filenames.
[back to top]
 
URL
Synonyms:  Uniform Resource Locator, web page address Related Terms:  web page, CGI A sequence of characters that uniquely identifies a web page or other resource on the Internet or an Intranet. URLs often start with http:// or https://. A URL containing a question mark usually points to some type of CGI or other dynamically generated content, with parameters passed after the question mark.
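Python's standard `urllib.parse` module can split a URL into its parts, including the parameters after the question mark. The URL below is a made-up example of a CGI-style search link.

```python
from urllib.parse import urlparse, parse_qs

url = "https://intranet.example.com/search/results.cgi?q=golf+courses&page=2"
parts = urlparse(url)
print(parts.scheme, parts.netloc, parts.path)
print(parse_qs(parts.query))  # {'q': ['golf courses'], 'page': ['2']}
```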
[back to top]
 
User Acceptance Testing systems
Synonyms:  uat, qa, staging A set of computers used to test search applications and other software before it is deployed to the production servers. In some companies these machines are synonymous with qa or staging servers. In other companies the tests performed on these machines are more focused on getting user input on the new software.
[back to top]
 
User Interface
Synonyms:  UI Related Terms:  GUI, Web UI The means by which a software program interacts with a human user, often using keyboard and/or mouse input.
[back to top]
 
UTF-16
A less common means of storing and transmitting Unicode characters, in which most commonly used Unicode characters take up two bytes of storage. UTF-16 can represent all Unicode characters, including those above 65,000, by using four byte sequences for them. Many databases and programming languages use UTF-16 or a variant for highly optimized internal storage of textual data, but they often use UTF-8 when communicating with other programs.
[back to top]
 
UTF8
See Main Definition:  UTF-8
[back to top]
 
UTF-8
A very common means of storing and transmitting Unicode characters. Although people will sometimes use the terms UTF-8 and Unicode interchangeably, they are not the same. UTF-8 is just one specific way of storing Unicode data. UTF-8 can be used to represent all Unicode characters, including the 16 bit characters, and even the characters above 65,000, by sometimes using multiple byte sequences. Many search engines use UTF-8 as their default encoding when exchanging textual data with other processes or storing data on disk. UTF-8 is also the default encoding for XML data.
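To see the difference between the two encodings, this sketch counts how many bytes a single character needs in UTF-8 and in UTF-16 (little-endian, so no byte order mark is added). It covers one plain ASCII letter, one accented Latin letter, one CJK character, and one character above 65,535.

```python
def encoded_sizes(ch):
    """Bytes needed for one character in UTF-8 and UTF-16 (no byte order mark)."""
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-le"))

for ch in ["A", "\u00e9", "\u65e5", "\U0001F600"]:
    print(f"U+{ord(ch):04X}", encoded_sizes(ch))
```

ASCII text is half the size in UTF-8, CJK text is smaller in UTF-16, and characters above 65,535 take four bytes in both.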
[back to top]
 
  V   [back to top]
 
Value Added Reseller
Related Terms:  Systems Integrator A company that typically buys, improves, and then resells goods and services at a profit; they have "added value" by improving it or combining it with other technologies in some way.
[back to top]
 
VAR
See Main Definition:  Value Added Reseller
[back to top]
 
Verity K2
See Main Definition:  Autonomy K2 Synonyms:  K2
[back to top]
 
Verity Query Language
Related Terms:  SQL, Internet search syntax A syntax in Verity K2 for expressing simple and complex searches. Supports advanced Boolean, nested and weighted queries. Can also be used to create Verity Topics. Since VQL predates the World Wide Web, its syntax is more reminiscent of SQL than modern Internet search syntaxes, but it is also more powerful.
[back to top]
 
Verity Real-Time
Related Terms:  automated document categorization, AMHS Technology from Verity that watches a stream of documents and compares them to predefined searches; upon finding a match, various actions can be taken. Verity has had various products over the years with this capability; one of the earlier offerings in the 1990s was actually called Verity "Real-Time".
[back to top]
 
Verity Search97
Verity's earlier web enabled search engine product. It was sold during the late 1990s, and discontinued in the early 2000s in favor of more modern K2 technology based products. Verity K2 has similar tools and data structures.
[back to top]
 
Verity Topic
Synonyms:  Topic Related Terms:  VQL, taxonomy A stored or dynamically generated Verity search expressed in VQL; stored topics are usually given a meaningful name. A group of named and logically organized Topics is called a Topic Set, and in some ways can be used as a taxonomy. "Verity Topic" was also the name of a product line offered in the 1990s by Verity, but that product line was discontinued in favor of the more modern Verity K2 product line.
[back to top]
 
Verity Ultraseek
See Main Definition:  Autonomy Ultraseek
[back to top]
 
Verity, Inc.
Synonyms:  Verity Related Terms:  Autonomy Verity was a publicly traded software company (VRTY) headquartered in Sunnyvale, California. It was one of the leading enterprise search engine vendors, and also sold other software products. Products included Verity K2, Verity Ultraseek, KeyView document filtering and the Liquid Office product line. Older product lines included "Verity Topic", "Verity Topic Real-Time", Information Server, Agent Server, Verity CD Publisher Toolkit, VDK and Search 97. Verity was bought by Autonomy in 2005.
[back to top]
 
Vertical Application
In this context, a highly specialized search application, which may be more complex than a "generic" web search application. Examples would include a pharmaceutical research database, legal evidence management and discovery, a corporate or technical documentation library, or managing regulatory and compliance documents.
[back to top]
 
Vignette
Related Terms:  Content Management System A popular content management system.
[back to top]
 
virtualization
A simulated computer running inside a real computer; more generally, running one or more virtual computers inside one or more physical computers. Although this seems like a strange idea at first, it has advantages for more efficient usage of machines and for much better software testing. For example, multiple virtual computers can share one physical computer. Also, a virtual computer can be reset back to a known state, which is useful for testing. Similarly, a single physical computer running a specific operating system version can still be used to test software on many different operating systems and versions, each running inside its own virtual machine.
[back to top]
 
VMWare
Related Terms:  virtualization A company that offers many types of popular virtualization software for both Windows and Linux based servers.
[back to top]
 
VQL
See Main Definition:  Verity Query Language
[back to top]
 
vspider
Related Terms:  Autonomy K2, batch-mode A command line batch mode spider for creating K2 collections.
[back to top]
 
  W   [back to top]
 
WAIS
Related Terms:  Z39.50 Wide Area Information Servers, an early search protocol based on the Z39.50 standard and used by early search engines.
[back to top]
 
web crawler
See Main Definition:  spider
[back to top]
 
web page
Synonyms:  page Related Terms:  URL, document A unit of data residing on a web server, either on the public Internet or on a private network / Intranet. Web pages are usually accessed through a URL.
[back to top]
 
web page form
Synonyms:  web form Related Terms:  CGI, search engine, deep web A section of a web page that lets the visitor type in data or searches, or lets them make various selections. Search engines let users type in what they are looking for and then click the "search" or "go" button. A form is declared using the HTML form tag, and its contents are submitted to the web server over HTTP, traditionally to be handled by a CGI program.
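When a search form such as `<form action="https://www.example.com/search" method="get">` with a text field named `q` is submitted, the browser encodes the field values into the URL's query string. A minimal sketch of that encoding using Python's standard `urlencode`, with a hypothetical form action and field names:

```python
from urllib.parse import urlencode

def form_get_url(action, fields):
    """Build the URL a browser requests when a GET form is submitted."""
    return f"{action}?{urlencode(fields)}"

print(form_get_url("https://www.example.com/search", {"q": "golf courses", "lang": "en"}))
# https://www.example.com/search?q=golf+courses&lang=en
```

Forms submitted with `method="post"` send the same encoded fields in the request body instead of the URL.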
[back to top]
 
web page scraper
Synonyms:  html scraper, content scraper Related Terms:  scraper, spider A more modern form of scraper that can process web pages and HTML.
[back to top]
 
web portal
See Main Definition:  portal site Related Terms:  web search engine Usage: The "portal" aspect of this term emphasizes the information offered that is above and beyond simply searching for web pages. Most portals pride themselves on their taxonomy of web sites, and other services such as free email accounts.
[back to top]
 
web search engine
Related Terms:  portal site, Internet search syntax, Google, Yahoo, others A search engine that indexes most of the public Internet / World Wide Web. Most web search engine sites also allow for the browsing of web pages in an organized taxonomy.
[back to top]
 
web server
Related Terms:  web site, JSP, ASP ("Active Server Pages") A computer connected to a network that serves up web pages for one or more web sites. Larger web sites often use more than one web server behind the scenes, but this is usually transparent to a casual visitor to the web site. Web servers are often where traditional search engine software is installed.
[back to top]
 
web site
Related Terms:  web page, URL, web server A set of web pages belonging to a company, group, institution or individual. Pages on the same web site usually have similar URLs. Larger web sites usually offer a search engine to visitors that only searches web pages on that web site.
[back to top]
 
web spider
See Main Definition:  spider Usage: A spider that is specifically designed to traverse the World Wide Web, vs. a spider that is used within an Intranet.
[back to top]
 
Web UI
Related Terms:  UI, GUI A user interface implemented as a series of interactive web pages. Many modern search engines allow the administrator to control settings and spidering behavior via a Web UI. The advantage of a Web UI is that it offers most of the convenience of a modern GUI without requiring any client software to be installed. Because it runs in a web browser such as Internet Explorer, the Web UI is available on any computer that can reach the web server hosting it.
[back to top]
 
weighted operators
Related Terms:  query tuning, Boolean operators A search syntax that goes beyond Boolean logic by allowing relevancy to be expressed on a decimal scale, rather than simply as match or no match. This allows for more complex and subtle adjustments to searches.
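The exact syntax differs between products. Lucene, for instance, uses `term^2` to boost a term. The following Python is therefore only a sketch of the idea, assuming a hypothetical `term^weight` notation. Each matching term adds its weight to a document's score, and documents are ranked by that score instead of simply being matched or rejected.

```python
def parse_weighted_query(query: str) -> dict:
    """Parse a hypothetical 'term^weight' syntax; unweighted terms get 1.0."""
    weights = {}
    for token in query.split():
        term, _, weight = token.partition("^")
        weights[term.lower()] = float(weight) if weight else 1.0
    return weights

def score(doc: str, weights: dict) -> float:
    """Sum the weights of the query terms that occur in the document."""
    words = set(doc.lower().split())
    return sum(w for term, w in weights.items() if term in words)

def rank(docs: dict, query: str) -> list:
    """Return matching document ids, highest weighted score first."""
    weights = parse_weighted_query(query)
    scored = [(score(text, weights), doc_id) for doc_id, text in docs.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc_id for s, doc_id in scored if s > 0]

docs = {
    "d1": "cheap flights to paris",
    "d2": "paris hotels and flights",
    "d3": "cheap hotels",
}
print(rank(docs, "paris^2.5 cheap^0.5 hotels"))  # ['d2', 'd1', 'd3']
```

A plain Boolean OR of the three terms would simply match all three documents. The weights instead rank the document that mentions both *paris* and *hotels* first.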
[back to top]
 
wildcard
Related Terms:  fuzzy matching Query syntax that allows for many variations of a word to match. Usually the asterisk character (*) is used to represent a part of the word that can take on any value. For example, a wildcard search for "re*" would match "red", "read", "repeat", "react", etc.
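A common way to evaluate a wildcard is to expand it against the index's vocabulary (its list of known words) and then search for all of the matching words. The minimal Python sketch below assumes that only `*` is supported and matches every other character literally. The sample vocabulary is made up for illustration.

```python
import re

def wildcard_to_regex(pattern: str) -> re.Pattern:
    """Translate a query with '*' wildcards into a regex; every other
    character is escaped so it is matched literally."""
    parts = (re.escape(p) for p in pattern.split("*"))
    return re.compile(".*".join(parts))

def expand_wildcard(pattern: str, vocabulary) -> list:
    """Return the vocabulary words the whole pattern matches, sorted."""
    regex = wildcard_to_regex(pattern)
    return sorted(word for word in vocabulary if regex.fullmatch(word))

vocabulary = ["red", "read", "repeat", "react", "bread", "ore"]
print(expand_wildcard("re*", vocabulary))   # ['react', 'read', 'red', 'repeat']
print(expand_wildcard("*ead", vocabulary))  # ['bread', 'read']
```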
[back to top]
 
word density
See Main Definition:  term density
[back to top]
 
Word filter
A filter that reads Microsoft Word documents, extracts their textual content, and feeds it to a search engine indexer.
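Here is a minimal sketch of a Word filter. It assumes the newer `.docx` format (Office Open XML), which is a ZIP archive whose main body text lives in `word/document.xml`. Older binary `.doc` files need a different parser. The function name is hypothetical. The sketch reads only the main document body, so it skips headers and footnotes, which are stored in other parts of the archive.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# XML namespace used by WordprocessingML elements such as <w:p> and <w:t>.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"

def extract_docx_text(data: bytes) -> str:
    """Return the body text of a .docx file, one line per paragraph."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        xml_bytes = archive.read("word/document.xml")
    root = ET.fromstring(xml_bytes)
    paragraphs = []
    for paragraph in root.iter(W_NS + "p"):
        # A paragraph's text is split across one or more <w:t> runs.
        runs = [t.text or "" for t in paragraph.iter(W_NS + "t")]
        paragraphs.append("".join(runs))
    return "\n".join(paragraphs)
```

In practice, the bytes would come from reading a `.docx` file, for example `open(path, "rb").read()`. The returned text can then be passed to the indexer.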
[back to top]
 
word index
Synonyms:  word inversion Related Terms:  search indices The portion of the overall search indices that actually stores the specific locations of every single word in every single document. This is often the largest overall part of the search index. Some search engines may store different versions of these indices internally, to facilitate different operations such as upper vs. lower case, spelling variants of words, etc.
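Here is a minimal Python sketch of a positional word index, assuming simple lowercase tokenization on word characters. Each word maps to the documents it occurs in and to its positions within each one. Storing positions is what makes phrase and proximity searches possible, as the hypothetical `phrase_search` helper illustrates.

```python
import re
from collections import defaultdict

def build_word_index(docs: dict) -> dict:
    """Map each lowercase word to {doc_id: [word positions]}."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, word in enumerate(re.findall(r"\w+", text.lower())):
            index[word][doc_id].append(pos)
    # Convert to plain dicts so lookups of missing words do not add entries.
    return {word: dict(postings) for word, postings in index.items()}

def phrase_search(index: dict, phrase: str) -> list:
    """Return ids of documents where the phrase's words appear consecutively."""
    words = re.findall(r"\w+", phrase.lower())
    if not words:
        return []
    hits = []
    for doc_id, starts in index.get(words[0], {}).items():
        for start in starts:
            # Word i of the phrase must sit at position start + i in this doc.
            if all(start + i in index.get(w, {}).get(doc_id, [])
                   for i, w in enumerate(words)):
                hits.append(doc_id)
                break
    return sorted(hits)

docs = {"d1": "The cat sat on the mat", "d2": "A cat ran past the mat"}
index = build_word_index(docs)
print(index["the"])                     # {'d1': [0, 4], 'd2': [4]}
print(phrase_search(index, "the mat"))  # ['d1', 'd2']
print(phrase_search(index, "cat sat"))  # ['d1']
```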
[back to top]
 
word inversion
See Main Definition:  word index Synonyms:  inverted index Usage: "word inversion" tends to be used by computer scientists and programmers because it hints at the underlying data structures used to store a word index; "word inversion" is rarely used outside of these groups.
[back to top]
 
World Wide Web
Synonyms:  "The Web", www Related Terms:  HTML, Internet, web site A subset of the Internet dedicated to web sites, web pages, and other web content. Much of the web is composed of text and graphics organized into HTML pages. Usage: Some people use the terms "World Wide Web" and "Internet" interchangeably; technically "the web" is a subset of the complete Internet.
[back to top]
 
WWW
See Main Definition:  World Wide Web Usage: "www" is also the prefix of many URLs, typically denoting a web page on a public web site.
[back to top]
 
  X   [back to top]
 
Xen
Related Terms:  virtualization An efficient type of virtualization offered on Linux based servers.
[back to top]
 
Xen kernel
Related Terms:  virtualization, xen, Linux The core part of the Linux operating system that has been specifically configured to run Xen virtualization.
[back to top]
 
XML (Adobe)
Related Terms:  PDF An unfortunately named file format related to Adobe PDF files, which looks similar to the more widely known modern XML syntax. It contains highlighting information specific to search terms, so that the PDF viewer can highlight certain words when it displays the document. Although it shares its name and some of its syntax with the more common XML standard, it is not compatible with that standard and is specific to Adobe's PDF viewer.
[back to top]
 
XML (common usage, "Extensible Markup Language")
See Main Definition:  Extensible Markup Language Usage: This entry is for the common usage of the abbreviation "XML"; the abbreviation also has a much less common and different meaning in relation to Adobe's PDF viewer.
[back to top]
 
XOR Operator
Related Terms:  Boolean operators XOR is the Boolean operator that is similar to the OR operator, except that it is an 'exclusive OR': either word must be present, but not both. For example, "california XOR nevada" will return documents containing either term on its own, but not documents containing both words. XOR is rarely used in search technology today.
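In an index, XOR corresponds to the symmetric difference of the two terms' document sets. The minimal Python sketch below assumes a simple in-memory mapping from words to sets of document ids. The helper names and sample documents are hypothetical.

```python
def build_postings(docs: dict) -> dict:
    """Map each lowercase word to the set of document ids containing it."""
    postings = {}
    for doc_id, text in docs.items():
        for word in set(text.lower().split()):
            postings.setdefault(word, set()).add(doc_id)
    return postings

def xor_search(postings: dict, a: str, b: str) -> set:
    """Documents containing exactly one of the two terms."""
    docs_a = postings.get(a.lower(), set())
    docs_b = postings.get(b.lower(), set())
    return docs_a ^ docs_b  # symmetric difference: in one set but not both

docs = {
    1: "California wine country",
    2: "Nevada casinos",
    3: "Driving from California to Nevada",
    4: "Oregon coast",
}
print(sorted(xor_search(build_postings(docs), "california", "nevada")))  # [1, 2]
```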
[back to top]
 
XPump
Related Terms:  DPump, NIE, meta language, XML A computer language from New Idea Engineering that makes it easy to process and transform computer data. For example, XPump can take data from a web site and put it into a database.
[back to top]
 
  Z   [back to top]
 
Z39.50
Related Terms:  WAIS An obsolete standard used by early search engines. The syntax was rather complicated compared to what modern search engines expect.
[back to top]
 
zero term search
Related Terms:  Search 2.0 Allowing a user to search simply by clicking predefined choices, without requiring them to type in any search terms. This is another form of interactive search results list, where the initial search is provided as a set of clickable links. The initial clickable links may be based on popular terms searched for by other users, on popular tags, or on faceted navigation.
[back to top]
 
zone
Synonyms:  blob Related Terms:  sub-document context, proximity search A named section of a document; the scope of searches can be restricted to look only in specific named zones. Zones are sometimes also called fields, though they are often larger than typical fields. In some search engines zones and fields are actually different and have different functionality; in such products it may be possible for text to be both a field and a zone at the same time. In relational database terms, a zone is somewhat analogous to a blob.
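Here is a minimal Python sketch of zone-restricted search. It assumes each document is stored as a mapping from zone names to text, and the zone names `title` and `body` are hypothetical. Passing a zone limits matching to that section of each document, while omitting it searches the whole document.

```python
from typing import Optional

def zone_search(docs: dict, term: str, zone: Optional[str] = None) -> list:
    """Return ids of documents whose chosen zone (or any zone) contains term."""
    term = term.lower()
    hits = []
    for doc_id, zones in docs.items():
        # Restrict the text examined to one named zone if requested.
        texts = [zones.get(zone, "")] if zone is not None else list(zones.values())
        if any(term in text.lower().split() for text in texts):
            hits.append(doc_id)
    return hits

docs = {
    "memo1": {"title": "Budget review", "body": "The merger was discussed"},
    "memo2": {"title": "Merger plans", "body": "Details to follow"},
}
print(zone_search(docs, "merger", zone="title"))  # ['memo2']
print(zone_search(docs, "merger"))                # ['memo1', 'memo2']
```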
[back to top]