Testing Search

How do you go about testing your site's search functionality? Start by identifying what can be tested. Some characteristics are common to most search engines and systems, and I've identified several of them below. If your search has modifications or strategies unique to your site or company, I recommend getting a handle on these common points first, before exploring how your search extends or pushes any information retrieval boundaries.

Please note that all of the information on this page is aimed specifically at search against product catalogues.

Accuracy

The accuracy of a search system is its ability to find all the matching items in the information collection, usually a product database; in other words, if you search on the word "metonymy", the query should correctly find every instance of this word.

Test: Comparison of query results for back-end and front-end

Since the user interacts with an interface that mediates for a back-end query against the database, the simple way to measure accuracy is to compare the results of a query made with the web interface with the results of a query made directly against the database. If the queries are identical, the returned results should also be identical. Any difference in returned hits indicates a problem.

If there is a systemic reason why the web returns fewer results than a direct query, that reason should be attacked as a significant quality and usability problem. If the web interface is returning more results than the direct query against the database, that indicates a scope problem, with the query generated via the interface running against too many table columns.
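The comparison described above can be sketched in a few lines of Python. This is only an illustration: it assumes you have already collected the product IDs returned by the web interface and by the direct database query, and the function name and sample IDs are hypothetical.

```python
def compare_results(frontend_ids, backend_ids):
    """Compare product IDs from the web interface with those from a direct database query."""
    frontend = set(frontend_ids)
    backend = set(backend_ids)
    return {
        # Items the database found but the interface did not show.
        "missing_from_frontend": sorted(backend - frontend),
        # Items the interface showed that the direct query did not find.
        "extra_in_frontend": sorted(frontend - backend),
        "identical": frontend == backend,
    }


# Hypothetical sample data for one query.
report = compare_results(["p1", "p2", "p4"], ["p1", "p2", "p3"])
print(report)
```

A non-empty "missing_from_frontend" list corresponds to the web returning fewer results than the direct query, and a non-empty "extra_in_frontend" list corresponds to the scope problem.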

Test: Consistency of results over time

Build a list of search terms and queries, and run them at periodic intervals; if the product catalogue is updated regularly, performing some consistent searches will allow you to track trends in the data. For example, if your data collection is increasing, you should reasonably expect your results to increase.

Having a set of "benchmark" queries is also useful when evaluating changes and enhancements to your site's search functionality.
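One way to use such benchmark queries is to record the result count for each query on every run and flag the queries whose counts fell. The sketch below assumes the counts are kept as simple dictionaries keyed by query; the function name and the sample figures are made up for illustration.

```python
def flag_drops(previous_counts, current_counts):
    """Return benchmark queries whose result count fell between two runs."""
    drops = {}
    for query, before in previous_counts.items():
        # A query missing from the current run is treated as returning nothing.
        after = current_counts.get(query, 0)
        if after < before:
            drops[query] = (before, after)
    return drops


# Hypothetical counts from two runs of the same benchmark queries.
last_month = {"spanish": 120, "guitar": 45, "metonymy": 3}
this_month = {"spanish": 131, "guitar": 40, "metonymy": 3}
print(flag_drops(last_month, this_month))
```

If the catalogue is growing, any query listed here deserves a closer look.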

Search Performance

Performance in the context of search refers to the speed of returning results: the time between the user clicking the submit button and the page of results being fully displayed on the client's screen. Most search systems query a database or index for the matching data, which means you actually have two lengths of time to measure:

the time taken by the back-end query
the time taken for the results page to display on the client's browser

Test: Back-end Time

In practice, timing the query can be a chore if the measurement must be done manually from a command line. If possible, have your programmers code the search program to output the query timings to the interface -- just for testing, not for use by your customers.
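If the search program happens to be written in Python, the timing could be captured with a small wrapper like the one below. This is a sketch only: run_query stands in for whatever function actually executes the back-end query, and it is an assumption rather than part of any particular search system.

```python
import time


def timed(run_query, query):
    """Run a query function and return its results with the elapsed time in seconds."""
    start = time.perf_counter()
    results = run_query(query)
    elapsed = time.perf_counter() - start
    return results, elapsed


# Hypothetical stand-in for the real back-end query.
def fake_query(query):
    return [query.upper()]


results, seconds = timed(fake_query, "spanish")
print(f"{len(results)} result(s) in {seconds:.6f} s")
```

The elapsed time can then be written to a test-only section of the results page or to a log.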

Test: Front-end Time

Testing general performance for a particular search should be a simple matter of recording the length of time from the submission of the form to the complete display and rendering of the search results page on the tester's client.

Precision

The precision of a system is its ability to select only relevant products and reject the irrelevant ones. Relevancy is more difficult to pin down, because of the following issues:

there is a disjunction between what a user wants and what a user enters as their search; if a user enters low value search parameters, they are unlikely to receive results that are valuable to them;
even if a user enters appropriate search parameters, the results may be valid hits but may not be relevant to what the user wants;
the information displayed on the results page may not indicate why an item was returned, so there is no obvious relevancy;
the search engine may apply logic to the search parameters in such a way that the results aren't obviously relevant to the literal search parameters.

Test: Apparent or Visible Relevancy

This test measures how obvious the relevance of the search results is by verifying whether the search terms are visible in the surfaced product information. So for example, if you search for a video using the term "spanish", any products that include the word "spanish" in the product information displayed on the results page would be visibly relevant, while a product that had no mention of the word "spanish", even if it was about Spanish cooking, would not be visibly relevant.

This distinction is important because users should have an indication of why a particular result was returned; users shouldn't have to ponder the search engine's reasoning.

The measurement is in the form of the ratio of obvious inclusion to non-inclusion, which will trap those terms that were matched against non-displayed fields. The ratio should obviously be higher for searches that don't query against non-displayed fields; the operative usability principle here is that clear relevance provides better information to the user.

To generate the VR (visible relevancy),

create a set of search terms;
perform the searches;
count the total results, which will be assigned the value T;
count the instances of the search terms in the surfaced results citations, with the citation count being equal to C;
generate the ratio for each query: VR = (100 * C) / T
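The steps above can be sketched as follows. The sketch makes one assumption that the steps leave open: C is taken to be the number of results whose displayed text contains the search term (case-insensitively), so that VR never exceeds 100. The function name and the sample titles are hypothetical.

```python
def visible_relevancy(term, displayed_texts):
    """Return VR = (100 * C) / T for one query, or None when there are no results."""
    total = len(displayed_texts)  # T: total results returned
    if total == 0:
        return None
    needle = term.lower()
    # C: results whose surfaced text visibly contains the search term.
    visible = sum(1 for text in displayed_texts if needle in text.lower())
    return 100 * visible / total


# Hypothetical surfaced product information for the query "spanish".
shown = [
    "Spanish for Beginners",
    "Cooking in Madrid",
    "Learn Spanish Fast",
    "Tapas at Home",
]
print(visible_relevancy("spanish", shown))  # 50.0
```

Here two of the four results show the term, so half of the results are visibly relevant.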

Test: Literal Accuracy

This test deliberately introduces specific errors into the search terms and measures how often results for the corrected query are returned. This is especially useful for evaluating fuzzy logic.

To generate the literal accuracy,

create a set of search terms that have been purposely misspelled;
perform the searches;
count the total results, which will be assigned the value T;
count the instances of the correctly spelled search terms in the surfaced results citations, with the citation count being equal to C;
generate the ratio for each query: LA = (100 * C) / T
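A batch version of this calculation might look like the sketch below. It assumes each misspelled term is paired with the spelling the tester intended, and that C counts the results whose displayed text contains that intended spelling. The search argument is a hypothetical function that returns the displayed text of each result.

```python
def literal_accuracy(cases, search):
    """Return LA = (100 * C) / T for each misspelled term, or None when nothing is returned."""
    scores = {}
    for misspelled, intended in cases.items():
        results = search(misspelled)  # displayed text of each result
        if not results:
            scores[misspelled] = None
            continue
        needle = intended.lower()
        # C: results that visibly contain the intended (corrected) term.
        hits = sum(1 for text in results if needle in text.lower())
        scores[misspelled] = 100 * hits / len(results)
    return scores


# Hypothetical stand-in for the site search, with canned results.
canned = {
    "spansh": ["Spanish Cooking", "Spanish Grammar", "Span Bridges"],
    "guitr": [],
}


def fake_search(term):
    return canned.get(term, [])


scores = literal_accuracy({"spansh": "spanish", "guitr": "guitar"}, fake_search)
print(scores)
```

A score of None marks a misspelling for which the search returned nothing at all, which is worth tracking separately from a low ratio.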

Test: Objective Accuracy

This test measures the efficiency of the search functionality against the results of a separate database query.

To generate the objective accuracy,

create a set of search terms;
have the equivalent queries run directly against the database tables that the particular search forms query;
the total results from the database query will be assigned the value DB;
perform the searches via the search interface;
count the total results, which will be assigned the value T;
generate the ratio for each query: OA = (100 * DB) / T
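As a worked example of the ratio, the sketch below computes OA from the two counts. The function name and the sample counts are hypothetical, and the guard against an empty interface result is an added assumption.

```python
def objective_accuracy(db_total, interface_total):
    """Return OA = (100 * DB) / T, or None when the interface returned no results."""
    if interface_total == 0:
        return None
    return 100 * db_total / interface_total


# Hypothetical counts: 40 rows from the direct query, 50 from the search interface.
print(objective_accuracy(40, 50))  # 80.0
```

With this formula, a value of 100 means the two counts match, a value below 100 means the interface returned more results than the direct query, and a value above 100 means it returned fewer.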