Types of Information Collections
The behavior of a search system depends on the kind of information the collection contains. Looking for a web site about Star Wars is not the same as looking for a book on Star Wars; it's not just that you are looking in different places for this information, but that the types of information themselves differ. Most people on the web are probably familiar with looking up web pages: you go to a search engine page like Google or AltaVista, type in some words that you expect to appear on the page, and click the submit button, and the engine goes out and does its magic and returns you several billion lines of results.
Document Collections
When you search against web pages, you are performing a search based on text. The search engine may take the string you typed in and use text retrieval logic to look for that exact sequence of characters in the web pages it has indexed, or it may process your string to derive words or word stems and then look for those values. Web searches usually let you specify the relationship between the words, such as their proximity to one another, as well as their required location on a page, such as whether they must be in a page title. Some web page search engines even look for the meaning of your search parameters.
Library science deals extensively with strategies and methodologies for information retrieval within document collections, and a collection of web pages is just another type of document collection.
Many users employ search as a mode of navigation, rather than purely as a means of information retrieval. According to Jakob Nielsen,
This use of search for navigating a site's information space is a source of many criticisms of a site's usability, as shown by Jared Spool's findings:
Product Catalogues
Product catalogues, on the other hand, are not document collections, and searching them requires a different understanding. Looking up book titles about Star Wars at an online book store is different from looking for Star Wars web sites because the collection of information about books, the product catalogue, is very different from a collection of web pages. Some of these differences include:
Users are less likely to use a search against a product catalogue as a means to navigate the site; for example, my research into Borders.com's logs shows that users don't typically use a book keyword search form as a way to locate the site's help files or information on shipping. Users do use searches to locate categories of information, such as topic or subject sections.
The Convergence of Product and Document Collections
Digital products are blurring some of the distinctions between document collections and product catalogues. The observation that product characteristics -- and not the products themselves -- are searched against is less valid when the product itself is textual. For example, if a company sells text reports and provides a mechanism to perform textual searches against the reports themselves, then the rules for textual information retrieval would seem to apply.
Product-Related Content
Many commerce sites have information about their products that goes beyond the basic product catalogue. For example, commerce sites may have reviews of products and articles on using products. A music seller may have articles about artists, articles about genres of music, comparisons of musical works, etc. This information covers a middle ground between products and text documents, whether it is stored as documents or in a database.
Even if the site decides to link this product-related content closely with products, in effect making content such as reviews a characteristic or attribute of the product, it makes sense to make this content text-searchable. Providing a mechanism for searching against content about products will aid users in tasks ranging beyond the simple "I'm looking for this product". The chances of successfully completing a constellation of tasks increase, as do the possible routes to specific product information.
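To make the last point concrete, here is a minimal sketch in Python of a catalogue search that can optionally include product-related content. Everything in it is an assumption made for illustration: the `Product` record, the `tokens` and `search` helpers, and the sample review text are hypothetical and are not taken from any real store's system. A production search would use a proper index rather than scanning every record.

```python
from dataclasses import dataclass, field


@dataclass
class Product:
    """Hypothetical catalogue entry: structured attributes plus related free text."""
    title: str
    author: str
    reviews: list[str] = field(default_factory=list)  # product-related content


def tokens(text: str) -> set[str]:
    """Lowercase the text and split it into a set of alphanumeric words."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in text.lower())
    return set(cleaned.split())


def search(catalogue: list[Product], query: str,
           include_content: bool = False) -> list[str]:
    """Return titles of products whose searchable words include every query word.

    By default only product attributes (title, author) are searched.
    With include_content=True, review text is treated as one more
    searchable attribute of the product.
    """
    wanted = tokens(query)
    hits = []
    for product in catalogue:
        searchable = tokens(product.title) | tokens(product.author)
        if include_content:
            for review in product.reviews:
                searchable |= tokens(review)
        if wanted <= searchable:  # every query word must be present
            hits.append(product.title)
    return hits


# Sample data; the review sentences are invented for this example.
CATALOGUE = [
    Product("Star Wars: Heir to the Empire", "Timothy Zahn",
            ["A fast-paced sequel to the films."]),
    Product("Splinter of the Mind's Eye", "Alan Dean Foster",
            ["An early Star Wars spin-off novel."]),
]

attribute_only = search(CATALOGUE, "star wars")
with_content = search(CATALOGUE, "star wars", include_content=True)
print(attribute_only)
print(with_content)
```

In this sketch the attribute-only search finds just the book with "Star Wars" in its title, while the search that includes reviews also finds the second book, which a user could not have reached through title or author alone. That is the sense in which searchable content adds routes to a product.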