Systematic Relevance: Evaluation

Perfect relevance is the holy grail of search. If possible, we would like to give every user exactly the document or piece of information they are looking for. Unfortunately, our chances of doing so are slim. Not even Google, the great librarian of our age, manages it. Google is good, but not perfect.

Nevertheless, as IT professionals, search experts and information architects, we try. We build complicated document processing pipelines to tidy up our data and extract new metadata. We experiment endlessly with stop words, synonym expansion, best bets and different ways to weight sources and fields. Are we getting any closer? Well, probably. But how can we know?

There are myriad knobs and dials for tuning an enterprise search engine. This fact alone should convince us that we need a systematic approach to relevance: with so many parameters, the risk of breaking relevance seems at least as great as the chance of improving it. Another reason is that relevance doesn’t age gracefully. Even if we do find a configuration we feel is decent, it will probably need to be reworked in a few months’ time. As Grant Ingersoll put it at Lucene Eurocon:

“I urge you to be empirical when working with relevance”

I favor the trial-and-error approach to most things in life, relevance tuning included. Borrowing concepts from information retrieval, one usually starts by creating a gold standard. A gold standard is a depiction of the world as it should be: a list of queries, preferably popular or otherwise important ones, and the documents that should be present in the result list for each of those queries. If the search engine were capable of perfect relevance, its results would match the gold standard with 100% accuracy.
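The post does not define exactly how "accuracy" against the gold standard is computed. One simple reading, assumed in the sketch below, is recall at k: for each query, the share of its gold-standard documents that appear among the top k hits, averaged over all queries. The names `GoldStandard` and `accuracy`, the document IDs, and the idea that the engine's results arrive as ranked lists of IDs are all hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical representation: query -> IDs of documents that should be shown.
GoldStandard = Dict[str, Set[str]]


def accuracy(gold: GoldStandard, results: Dict[str, List[str]], k: int = 5) -> float:
    """Average recall at k over all gold-standard queries (an assumed metric).

    For each query, count how many of its expected documents appear in the
    top k hits returned by the engine; 1.0 means perfect agreement with the
    gold standard.
    """
    if not gold:
        raise ValueError("the gold standard contains no queries")
    total = 0.0
    for query, expected in gold.items():
        if not expected:
            raise ValueError(f"query {query!r} has no expected documents")
        top_k = set(results.get(query, [])[:k])  # a query with no results scores 0
        total += len(expected & top_k) / len(expected)
    return total / len(gold)


if __name__ == "__main__":
    gold = {"vacation policy": {"doc12", "doc7"}, "expense report": {"doc3"}}
    results = {
        "vacation policy": ["doc7", "doc40", "doc12"],
        "expense report": ["doc9", "doc8", "doc1", "doc2", "doc5", "doc3"],
    }
    print(accuracy(gold, results, k=5))  # 0.5: doc3 is only the sixth hit
```

Recall at k is only one option; precision at k or a rank-sensitive measure would reward different things, so it is worth settling on one metric before tuning starts.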

Creating such a gold standard is an art in itself. I suggest choosing 50 or so queries. You may already have an idea of which ones matter to your system; otherwise, search analytics can provide this information. Next, you need to decide which documents should be shown for each query. Since users are usually only satisfied if their document is among the top 3 or 5 hits, your gold standard should list up to that many documents per query. You can select these documents yourself if you like. Arguably, however, the best way is to sit down with a focus group drawn from your target audience and have them decide which documents to include. Ideally, the gold standard should be representative of the queries your users actually issue, so that any improvement achieved through tuning raises the overall relevance of the search engine, not just relevance for the queries we picked.
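When search analytics are available, a first shortlist of candidate queries can be pulled straight from the query log. The sketch below assumes the log is simply an iterable of raw query strings (a hypothetical format; real analytics exports vary) and counts queries after trimming and lower-casing them. `pick_gold_queries` is an illustrative helper, and its output is a starting point for the focus group rather than a finished gold standard.

```python
from collections import Counter
from typing import Iterable, List


def pick_gold_queries(query_log: Iterable[str], n: int = 50) -> List[str]:
    """Return the n most frequent queries in a search log.

    Queries are trimmed and lower-cased so that variants such as
    'Vacation Policy ' and 'vacation policy' are counted together;
    blank queries are ignored.
    """
    counts = Counter(q.strip().lower() for q in query_log if q.strip())
    return [query for query, _ in counts.most_common(n)]


if __name__ == "__main__":
    log = ["Vacation Policy", "vacation policy ", "expense report", "", "parking"]
    print(pick_gold_queries(log, n=2))
```

Frequency alone favors head queries; adding a handful of rarer but business-critical queries by hand helps keep the set representative.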

The next step is to determine a baseline. The baseline is our starting point: how well the search engine, out of the box, matches the gold standard. In most cases this will be significantly below 100%. As we tune the search engine, its accuracy against the gold standard should move from the baseline toward 100%. Should we end up no better than the baseline, our work has had little or even a negative effect. Either relevance was as good as it gets with the search engine’s default settings or, more likely, we haven’t been turning the right knobs.
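Comparing each tuning run with the baseline is more informative if scores are also kept per query, so that a change which helps on average but hurts particular queries does not slip through. A minimal sketch, again assuming recall at k as the accuracy measure and ranked lists of document IDs as results; `query_score` and `compare_to_baseline` are hypothetical helpers, not part of any search engine’s API.

```python
from typing import Dict, List, Set, Tuple


def query_score(expected: Set[str], hits: List[str], k: int) -> float:
    """Share of the expected documents found in the top k hits (assumes expected is non-empty)."""
    return len(expected & set(hits[:k])) / len(expected)


def compare_to_baseline(
    gold: Dict[str, Set[str]],
    baseline: Dict[str, List[str]],
    tuned: Dict[str, List[str]],
    k: int = 5,
) -> Tuple[float, float, List[str]]:
    """Return (baseline score, tuned score, queries that got worse)."""
    if not gold:
        raise ValueError("the gold standard contains no queries")
    base = {q: query_score(exp, baseline.get(q, []), k) for q, exp in gold.items()}
    new = {q: query_score(exp, tuned.get(q, []), k) for q, exp in gold.items()}
    regressions = [q for q in gold if new[q] < base[q]]
    n = len(gold)
    return sum(base.values()) / n, sum(new.values()) / n, regressions


if __name__ == "__main__":
    gold = {"vacation policy": {"doc7"}, "expense report": {"doc3", "doc4"}}
    baseline = {"vacation policy": ["doc7"], "expense report": ["doc9"]}
    tuned = {"vacation policy": ["doc1"], "expense report": ["doc3", "doc4"]}
    print(compare_to_baseline(gold, baseline, tuned))
```

Re-running the same gold standard after every change keeps the comparison fair; an average that rises while the list of regressions grows is a sign that the tweak deserves a closer look.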

Using a systematic approach like the one above greatly simplifies working with relevance. It shows us which tweaks actually help and keeps us on track toward our ultimate goal: perfect relevance, a goal that, although unattainable, is well worth striving toward.

Importance of Interaction Design

Lately I’ve been working on a couple of projects for large companies, which has given me a lot of new experience and knowledge. One thing I’ve realized is how important good interaction design is, and how often it is missing.

What these projects have had in common is that the customer had already started a new IT project. When the time came to implement the search functionality, they contacted us. As a result, we became involved only after the interaction design had been done.

Since the customers are large companies, the interaction design has been done by external consultants who usually have a long-standing relationship with the customer but little knowledge of search. Once implementation starts, we have found that the interaction design falls short of giving end users a great search experience, because of a lack of knowledge about search technology and what it can do. Drawing on my knowledge of search, I can propose changes in functionality that will give a better user experience. These changes of course require new interaction design, but since the interaction design consultants have finished their assignment, our company has to work out the interaction design decisions.

In the worst-case scenario, this means that the complete interaction design has to be redone from scratch. This is not popular with the customer, who has to pay for the same thing twice. However, since we at Findwise are search experts with lots of experience from past projects, and have dedicated people working with interaction design, we know how to create good interaction design for search.

In the end, the customer is happy with the result, but hiring us to do the interaction design as well would have cost them less!