Systematic Relevance: Evaluation

Perfect relevance is the holy grail of search. Ideally, we would give every user exactly the document or piece of information they are looking for. Unfortunately, our chances of doing so are slim. Not even Google, the great librarian of our age, manages to do so. Google is good but not perfect.

Nevertheless, as IT professionals, search experts and information architects we try. We construct complicated document processing pipelines in order to tidy up our data and to extract new metadata. We experiment endlessly with stop words, synonym expansion, best bets and different ways to weigh sources and fields. Are we getting any closer? Well, probably. But how can we know?

There are myriad knobs and dials for tuning in an enterprise search engine. This fact alone should convince us that we need a systematic approach to dealing with relevance; with so many parameters to work with, the risk of breaking relevance seems at least as great as the chance of improving it. Another reason is that relevance doesn't age gracefully: even if we do manage to find a configuration that feels decent, it will probably need to be reworked in a few months' time. As Grant Ingersoll said at Lucene Eurocon:

“I urge you to be empirical when working with relevance”

I favor the trial-and-error approach to most things in life, relevance tuning included. Borrowing concepts from information retrieval, one usually starts off by creating a gold standard. A gold standard is a depiction of the world as it should be: a list of queries, preferably popular or otherwise important, and the documents that should be present in the result list for each of those queries. If the search engine were capable of perfect relevance, then its results would match the gold standard with 100% accuracy.

The process of creating such a gold standard is an art in itself. I suggest choosing 50 or so queries. You may already have an idea of which ones are interesting to your system; otherwise search analytics can provide this information for you. Furthermore, you need to decide which documents should be shown for each of the queries. Since users are usually only content if their document is among the top 3 or 5 hits in the result list, you should have up to that many documents for each query in your gold standard. You can select these documents yourself if you like. However, arguably the best way is to sit down with a focus group selected from among your target audience and have them decide which documents to include. Ideally you want a gold standard that is representative of the queries your users are issuing. Any improvements achieved through tuning should boost the overall relevance of the search engine, not just the queries we picked out.
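In code, a gold standard can be as simple as a mapping from each chosen query to the set of documents judged relevant for it. The sketch below is a minimal illustration in Python; the query strings and document IDs are invented for the example.

```python
# A gold standard: each important query mapped to the documents
# (hypothetical IDs here) that should appear among its top hits.
# Judgments typically cover the top 3-5 positions per query.
gold_standard = {
    "parental leave policy": {"hr-0012", "hr-0034"},
    "expense report template": {"fin-0203"},
    "vpn setup": {"it-0099", "it-0101", "it-0142"},
}

for query, relevant_docs in gold_standard.items():
    print(f"{query!r}: {len(relevant_docs)} relevant document(s)")
```

In practice you would keep such judgments in a file or database maintained by the focus group, but the shape of the data stays the same.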

The next step is to determine a baseline. The baseline is our starting point, that is, how well the search engine compares out of the box to the gold standard. In most cases this will be significantly below 100%. As we proceed to tune the search engine its accuracy, as compared to the gold standard, should move from the baseline toward 100%. Should we end up with accuracy below that of the baseline then our work has probably had little effect. Either relevance was as good as it gets using the default settings of the search engine, or, more likely, we haven’t been turning the right knobs.
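One common way to score a configuration against a gold standard is precision at k: for each query, the fraction of the top k positions occupied by judged-relevant documents, averaged over all queries. A minimal sketch, with invented query and document IDs, assuming the engine's results arrive as ordered lists of document IDs:

```python
def precision_at_k(results, relevant, k=5):
    """Fraction of the top k positions occupied by judged-relevant documents."""
    return sum(1 for doc in results[:k] if doc in relevant) / k

def average_score(engine_results, gold_standard, k=5):
    """Mean precision@k over all queries in the gold standard."""
    scores = [
        precision_at_k(engine_results.get(query, []), relevant, k)
        for query, relevant in gold_standard.items()
    ]
    return sum(scores) / len(scores)

# Hypothetical data: out-of-the-box results vs. judged documents.
gold = {"vpn setup": {"it-0099", "it-0101"}}
results = {"vpn setup": ["it-0099", "news-17", "it-0101", "blog-3", "hr-2"]}
print(average_score(results, gold, k=5))  # 0.4: 2 of the top 5 are relevant
```

Run once against the default configuration to record the baseline, then re-run after each tuning change to see whether the score moves toward 1.0 or away from it.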

Using a systematic approach like the one above greatly simplifies the process of working with relevance. It allows us to determine which tweaks are helpful and keeps us on track toward our ultimate goal: perfect relevance. A goal that, although unattainable, is well worth striving toward.

The ROI of Enterprise Search—Where’s the Beef?

When faced with the large up-front investment of an Enterprise Search installation, executives are asking for proof that the investment will pay off. Whereas it is easy to quantify the value of search on an e-commerce site or as part of the company helpdesk—increased sales, shorter response times—how do you go about verifying that your million-dollar Enterprise Search application has the desired effect on your revenue stream?

Search engines on the Web have changed the landscape of information access. Today, employees expect the same search capabilities within the firewall that they are used to on the Web. Search has become the preferred way of finding information quickly and accurately.

Top executives at large corporations have heard the plea and nowadays see the benefits of efficient Findability. However, turning the company's information overload from a storage problem for the IT department into a valuable asset and business enabler for everybody comes at a cost. So how do you prove the investment worthwhile?

The Effects of Enterprise Search

Before you can prove anything, you need to establish the effects you would like your Enterprise Search solution to have on your organization. Normally, you would want an Enterprise Search solution to:

  •  Enable people to work faster
  •  Enable people to produce better quality
  •  Provide the means for information reuse
  •  Inspire your employees to innovate and invent

These are all effects that a well-designed and maintained Enterprise Search application will help you achieve. However, the challenge when calculating the return on investment is that you are attempting to influence workflows whose impact is not clearly visible in your revenue stream. There is no easy way to link dollars saved or earned to employees being more innovative.

So how do you prove that you are not wasting money?

There are two straightforward ways to address the problem: studying how users actually interact with the Enterprise Search application, and asking them how much they value it.

User Behavior through Search Logs

By extracting statistics from the logs of your Enterprise Search application, you can monitor how users interact with the tool. There are several statistical measures worth examining in order to establish a positive influence on one or more of the targeted effects.

A key performance indicator for determining whether the Enterprise Search application enables people to work faster is the average rank of clicked hits in the result list. If people tend to scroll down the result set before clicking a hit and opening a document, this implies the application does not rank the results properly. In other words, users are forced to review the result set, which obviously slows them down.
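The average clicked rank is straightforward to compute once click positions have been extracted from the logs. A minimal sketch, assuming each log entry records the 1-based result-list position of the clicked hit; the log format and values are invented for the example:

```python
# Hypothetical click-log entries: (query, position of the clicked hit).
click_log = [
    ("travel policy", 1),
    ("travel policy", 2),
    ("org chart", 7),      # user had to scroll well past the top hits
    ("cafeteria menu", 1),
    ("expense report", 4),
]

positions = [pos for _, pos in click_log]
avg_clicked_rank = sum(positions) / len(positions)
print(f"Average clicked rank: {avg_clicked_rank:.1f}")  # 3.0
```

Tracked over time, a rising average suggests ranking is degrading, while values close to the top positions suggest users are finding what they need quickly.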

By monitoring the number of users of the system, following the number of different documents they open through search, and observing the complexity of the queries they perform, you can estimate the level of information your users expect to find through searching.
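All three of these signals can be pulled from the same search logs. A sketch, assuming each log entry records a user ID, the query string, and the document opened from the results; all fields are invented for the example, and query term count is used as a crude proxy for query complexity:

```python
# Hypothetical search-log entries: (user, query, opened document).
search_log = [
    ("alice", "vpn setup windows 11", "it-0099"),
    ("bob",   "news", "news-17"),
    ("alice", "parental leave policy sweden", "hr-0012"),
    ("carol", "expense report template 2024", "fin-0203"),
    ("bob",   "today's menu", "menu-1"),
]

unique_users = {user for user, _, _ in search_log}
unique_docs = {doc for _, _, doc in search_log}
# Crude complexity proxy: average number of terms per query.
avg_terms = sum(len(query.split()) for _, query, _ in search_log) / len(search_log)

print(len(unique_users), len(unique_docs), round(avg_terms, 1))  # 3 5 3.0
```

Broad document coverage, many distinct users and a healthy share of multi-term queries point toward the trusted-tool scenario described below; a log dominated by a handful of one-word queries points the other way.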

If the application is trusted to render relevant, up-to-date results, more users will use it, they will carry out more complex queries and they will open up a wider range of different documents. If your users do not trust the system, however, they will not use it or they will only search for a limited set of simple things such as “news”, “today’s menu” or “accounting office”. If this is the case, you can hardly say your Enterprise Search application has met the requirements posed on it.

Conversely, if the users access a wide set of documents through search and you have a large number of unique users and queries, then this implies your Enterprise Search application is a valued information access tool that promotes information reuse and innovation based on existing corporate knowledge.

User Expectations through Surveys

Another way to collect information for assessing the return on investment of your Enterprise Search initiative is to ask the users what they think. If you ask a representative subset of your intended users how well the Enterprise Search application fits their specific purposes, you will have an estimate of the quality of the application.

There are many other questions you can ask: Does the application help the user find relevant corporate information? Are the results ranked properly? Does the application help the user get an overall picture of a topic? Does it enable the user to get new ideas or find new opportunities? Does it help them avoid duplicating work already done elsewhere within the organization?
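If the survey questions are answered on a fixed scale, the responses are easy to aggregate per question. A minimal sketch with invented questions and scores, assuming a 1-5 agreement scale:

```python
# Hypothetical survey responses on a 1-5 scale, one list per question.
responses = {
    "Finds relevant corporate information": [4, 5, 3, 4, 4],
    "Results are ranked properly":          [3, 2, 4, 3, 3],
    "Helps avoid duplicating work":         [5, 4, 4, 5, 3],
}

for question, scores in responses.items():
    average = sum(scores) / len(scores)
    print(f"{question}: {average:.1f}/5")
```

Repeating the same survey after each major change to the application turns these averages into a trend you can report alongside the log statistics.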

A Combination of Increased Usage and Perceived Value

As we have seen, the return on investment of an Enterprise Search initiative is often hard to quantify, but the impact such an application has on a set of targeted effects can be measured using search logs and user surveys. The data collected this way provides an estimate of the value of Findability within the firewall of an organization.

Nowadays, hardly anybody questions the marketing value of a good corporate web site or the impact email has on the way we communicate. Such channels and services are self-evident business enablers today. In this respect, the benefits of precise and quick information access within the corporation should be self-evident. The trick is to get the tool just right.