Improving User Intelligence with the ELK Stack at SCA

SCA is a leading global hygiene and forest products company, employing around 44,000 people worldwide. The Group (all companies within SCA) develops and produces sustainable personal care, tissue and forest products. Sales are conducted in about 100 countries under many strong brands. Each brand has its own website and its own search.

At SCA we use Elasticsearch, Logstash, and Kibana to record searches, clicks on result documents and user feedback, on both the intranet and external sites. We also collect qualitative metrics by asking our public users a question after showing search results: “Did you find what you were looking for?” The user has the option to give a thumbs up or down and also write a comment.

What is logged?

All search parameters and result information are recorded for each search event: the query string, paging, sorting, facets, the number of hits, search response time, the date and time of the search, etc. Clicking a result document also records a multitude of information: the position of the document in the result list, the time it took from search to click, and various document metadata (such as URL, source, format, last modified, author, and more). A click event also gets connected to the search event that generated it; the same goes for feedback events.

Each event is written to a log file that is monitored by Logstash, which creates a document from each event and pushes it to Elasticsearch, where the data is visualized in Kibana.
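To give a feel for the shape of these events, here is a rough sketch of what a search event and an associated click event could look like, written as JSON lines that Logstash can pick up from a log file. The field names and values are illustrative assumptions, not SCA's actual schema.

```python
# Illustrative only -- field names and structure are assumptions, not SCA's actual schema.
import json
import uuid
import datetime

search_event = {
    "type": "search",
    "search_id": str(uuid.uuid4()),          # lets click/feedback events reference this search
    "timestamp": datetime.datetime.utcnow().isoformat(),
    "query": "maternity leave policy",
    "page": 1,
    "sort": "relevance",
    "facets": {"source": ["intranet"], "format": ["pdf"]},
    "hits": 128,
    "response_time_ms": 42,
}

click_event = {
    "type": "click",
    "search_id": search_event["search_id"],  # ties the click back to the search that produced it
    "timestamp": datetime.datetime.utcnow().isoformat(),
    "position": 3,                           # rank of the clicked document in the result list
    "time_to_click_ms": 5300,
    "document": {
        "url": "https://example.com/doc.pdf",
        "source": "intranet",
        "format": "pdf",
        "last_modified": "2015-03-02",
        "author": "Jane Doe",
    },
}

# Each event is appended to a log file as one JSON line, which Logstash can monitor.
with open("search-events.log", "a") as f:
    f.write(json.dumps(search_event) + "\n")
    f.write(json.dumps(click_event) + "\n")
```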

Why?

Due to the extent of information that is indexed, we can answer questions ranging from the very simple, such as “What are the ten most frequent queries during the past week?” and “Users who click on document X, what do they search for?”, to the more complex, like “What is the distribution of clicked documents’ last modified dates, coming from source S, on Wednesdays?” The possibilities are almost endless!
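As an illustration, the first of those questions boils down to a simple terms aggregation over the search events. A minimal sketch using the Python Elasticsearch client might look like the following; the index name and field names are assumptions, not our actual mapping.

```python
# Sketch only: index name, field names and mapping details are assumptions.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

response = es.search(
    index="search-events",
    body={
        "size": 0,
        "query": {
            "bool": {
                "filter": [
                    {"term": {"type": "search"}},
                    {"range": {"timestamp": {"gte": "now-7d/d"}}},
                ]
            }
        },
        "aggs": {
            "top_queries": {"terms": {"field": "query.keyword", "size": 10}}
        },
    },
)

# Print the ten most frequent queries from the past week with their counts.
for bucket in response["aggregations"]["top_queries"]["buckets"]:
    print(bucket["key"], bucket["doc_count"])
```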

The answers to these questions allow us to tune the search to better meet the needs of the users and deliver greater value. Today, we use this analysis for everything from adjusting the relevance model, to adding new facets or removing old ones, to changing the layout of the search and result pages.

Experienced value – more than “just” logs

Recording search and click events is common practice, but at SCA we have extended this to include user feedback, as mentioned above. This increases the value of the statistics even more. It allows an administrator to follow up on negative feedback in detail, e.g. by recreating the scenario. It also enables implicitly evaluated trial periods for change requests: if a statistically significant increase in the share of positive feedback is observed, then that change made it easier for users to find what they were looking for. We can also answer new questions, such as “What’s the feedback from the users who experience zero hits?” and “Are users more likely to find what they are looking for if they use facets?”
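One standard way to check whether such an increase is statistically significant is a two-proportion z-test on the share of positive feedback before and after the change. The sketch below uses made-up counts purely for illustration.

```python
# Minimal sketch of a two-proportion z-test on the share of positive feedback,
# comparing a period before a change with the trial period after it.
# The counts below are invented for illustration.
from statsmodels.stats.proportion import proportions_ztest

positive = [130, 171]   # thumbs-up before and after the change
total = [400, 420]      # total feedback events before and after

# alternative="smaller" tests whether the "before" share is smaller than the "after" share,
# i.e. whether positive feedback increased.
stat, p_value = proportions_ztest(count=positive, nobs=total, alternative="smaller")
print(f"z = {stat:.2f}, p = {p_value:.4f}")
if p_value < 0.05:
    print("The increase in positive feedback is statistically significant.")
```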

And server monitoring as well!

This setup is not only used to record information about user behavior; we also use it to monitor the health of our servers. Every few seconds we index information about each server’s CPU, memory and disk usage. The most obvious gain is the historical aspect: not only can we see the resource usage at a specific point in time, we can also see trends that would not be noticeable if we only had access to data from right now. This can of course be correlated with the user statistics, e.g. to check whether a rise in CPU usage coincides with an increase in query volume.
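A collector for such metrics can be very small. The sketch below, using psutil and the Python Elasticsearch client, is purely illustrative and not our actual implementation; the index name and interval are assumptions.

```python
# Purely illustrative metrics collector -- not SCA's actual implementation.
import datetime
import socket
import time

import psutil
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")
host = socket.gethostname()

while True:
    doc = {
        "timestamp": datetime.datetime.utcnow().isoformat(),
        "host": host,
        "cpu_percent": psutil.cpu_percent(interval=None),
        "memory_percent": psutil.virtual_memory().percent,
        "disk_percent": psutil.disk_usage("/").percent,
    }
    # Index one document per measurement so trends can be charted in Kibana.
    es.index(index="server-metrics", body=doc)
    time.sleep(10)  # "every few seconds"
```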

Benefits of the ELK Stack

What this means for SCA is a search that is ever improving. We, the developers and administrators of the search system, are no longer in the dark about which changes actually make things better. The direct feedback loop between the users and administrators of the system creates a sense of community, especially when users see that their grievances are being tended to. Users find what they are looking for to a greater and greater extent, saving them time and frustration.

Conclusion

We rely on Elasticsearch, Logstash and Kibana as the core of our search capability, and for the insight to continually improve. We’re excited to see what the 2.0 versions bring. The challenge is to know what information you are after and to create a model that will meet those needs. Once the logs started streaming out of our systems, getting the ELK platform up and running at SCA was the part of the project that took the least amount of our time.

How relevance models work

A relevance model is what a search engine uses to rank documents in a search result, i.e. how it finds the document you are looking for. An axiomatic analysis of relevance models asks two questions: how and why does a relevance model work? Findwise attended the ICTIR 2013 conference in Copenhagen, where one of the recurring topics was the axiomatic analysis of relevance models.

A relevance model is represented as a mathematical function of a set of input variables, so answering those two questions just by looking at its formula is likely to be very difficult. What axiomatic analysis aims to do is to break down the formula and to isolate and analyze each of its individual components, with the goal of improving performance.

The idea is to formulate a set of axioms, meaning laws that a relevance model should abide by. One of the more obvious axioms, from a purely statistical point of view, relates to term frequency (TF): a document d1, in which the terms of the query occur more times than in some other document d2, should be assigned a higher relevance than d2. These are called axioms because they should be relevance truths – statements that are obvious and that everyone can agree on. Other examples of axioms could be that very long documents should be penalized, simply because they have a higher probability of containing any given word, and that terms occurring in many documents should contribute less to the relevance than terms that are more unique.
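To make the TF axiom concrete, here is a small sketch that checks whether a toy TF-IDF-style scoring function respects it for two equally long documents. The scoring function is a stand-in for illustration, not any particular engine's formula.

```python
# Toy check of the term-frequency axiom: with everything else equal, a document
# containing the query terms more often should score higher. The scoring function
# below is a simple TF-IDF stand-in, not any particular engine's formula.
import math

def score(query_terms, doc_terms, num_docs, doc_freq):
    s = 0.0
    for term in query_terms:
        tf = doc_terms.count(term)                                # term frequency in the document
        idf = math.log((num_docs + 1) / (doc_freq.get(term, 0) + 1))  # rarer terms weigh more
        s += tf * idf
    return s

query = ["forest", "products"]
doc_freq = {"forest": 20, "products": 50}   # how many documents contain each term
num_docs = 1000

d1 = ["forest", "products", "forest", "sustainability"]   # query terms occur three times
d2 = ["forest", "paper", "mill", "sustainability"]        # query terms occur once, same length

assert score(query, d1, num_docs, doc_freq) > score(query, d2, num_docs, doc_freq), \
    "TF axiom violated: d1 should outrank d2"
```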

From an Enterprise Search perspective, these axioms do not have to be general relevance truths, but can be adapted to your organization and your users. Here we see a shift in the type of axioms from purely statistics-based towards more metadata-based ones, e.g. which fields are more relevant than others and which sources are more relevant. A very simple example of this is that a hit in the title is more relevant than a hit in the body. These are usually conveniently configurable in most search engines, e.g. Apache Solr.
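In Solr, for instance, this kind of field weighting is typically expressed through the eDisMax query parser's qf parameter. The sketch below queries Solr over HTTP with a title boost; the URL, core name and field names are assumptions for illustration.

```python
# Sketch of querying Solr with field boosts via the eDisMax parser.
# The Solr URL, core name and field names are assumptions for illustration.
import requests

params = {
    "q": "forest products",
    "defType": "edismax",
    "qf": "title^5 body^1",   # a hit in the title counts five times as much as one in the body
    "wt": "json",
}
response = requests.get("http://localhost:8983/solr/mycore/select", params=params)
for doc in response.json()["response"]["docs"]:
    print(doc.get("title"))
```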

This concept is useful and interesting for many reasons, since it not only allows you to modify and improve existing relevance models, but also to create new ones from scratch. This process can also be automated using Machine Learning algorithms, which leaves us with the task of finding the optimal set of axioms. Can you think of axioms that apply to your organization, your users’ information needs and the content that is made searchable?

Solving Diversity in Information Retrieval

How to solve diversity in information retrieval, and techniques for handling ambiguous queries, were topics of interest at the SIGIR 2013 conference in Dublin, Ireland, which I attended recently.

The issue of Diversity in Information Retrieval was covered in a number of presentations at the conference. Applied to the world of search, it basically means aiming to produce a search result that covers as many of the relevant topics as possible. The technique is search-engine independent, since it uses only the set of result documents as input.

This is done by retrieving, say, 100-500 documents instead of the normal 10. These documents are then clustered based on their contents to create a number of topic clusters. The search result is then constructed by selecting (the normal 10) documents from the clusters in a round-robin fashion. This will hopefully create a diverse search result, with as broad coverage as possible.
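A rough sketch of this retrieve-cluster-interleave idea is shown below, using TF-IDF vectors and k-means from scikit-learn. The choice of clustering algorithm and parameters is mine for illustration, not necessarily what the presenters used.

```python
# Sketch of diversification by clustering and round-robin selection.
# K-means on TF-IDF vectors is one possible choice of clustering method.
from itertools import cycle

from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

def diversify(documents, n_clusters=5, n_results=10):
    """documents: list of result texts, already ranked by the engine.
    Returns the indices of a diversified top-n result list."""
    vectors = TfidfVectorizer(stop_words="english").fit_transform(documents)
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(vectors)

    # Group document indices per cluster, keeping the original ranking inside each cluster.
    clusters = [[i for i, label in enumerate(labels) if label == c] for c in range(n_clusters)]

    # Round-robin: take the best remaining document from each cluster in turn.
    result = []
    for cluster in cycle(clusters):
        if len(result) >= n_results or not any(clusters):
            break
        if cluster:
            result.append(cluster.pop(0))
    return result[:n_results]
```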

The technique can be used not only to handle ambiguous queries, but also queries with several sub-topics associated with them. By iteratively running a clustering algorithm on the result documents with 2 to 5 (or so) clusters, measuring the separation between them, and choosing the outcome with the greatest separation, a diverse result set can be created. The clusters can also be used to ask follow-up questions to the user, who is allowed to click on one of several tag clouds containing the most central terms of each cluster.
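Measuring the separation between clusterings can be done in several ways; the sketch below uses the silhouette score to pick the best number of clusters between 2 and 5, and then extracts each cluster's most central terms as a candidate tag cloud. Again, this is one possible realization rather than the exact method from the talks, and it assumes a recent scikit-learn (for get_feature_names_out).

```python
# Sketch: pick the number of clusters (2-5) with the best separation, measured
# here by silhouette score, then extract each cluster's most central terms.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import silhouette_score

def cluster_with_best_separation(documents, k_range=range(2, 6)):
    vectorizer = TfidfVectorizer(stop_words="english")
    vectors = vectorizer.fit_transform(documents)

    best = None
    for k in k_range:
        model = KMeans(n_clusters=k, n_init=10).fit(vectors)
        separation = silhouette_score(vectors, model.labels_)
        if best is None or separation > best[0]:
            best = (separation, model)

    separation, model = best
    terms = np.array(vectorizer.get_feature_names_out())
    # The terms closest to each centroid serve as that cluster's "tag cloud".
    tag_clouds = [terms[centroid.argsort()[::-1][:10]].tolist()
                  for centroid in model.cluster_centers_]
    return separation, model.labels_, tag_clouds
```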

A cluster set of size 2 with a good separation would indicate that the query may be ambiguous, with two different semantic meanings, while a size of 3-5 likely means that there are a number of sub-topics identified in the results. In a way these clusters can be seen as a dynamic facet, but it is still shallow since it only operates on the returned documents. On the other hand, it does not require any additional knowledge about the documents beyond the information that is returned. This could also be extended by using topic labelling to present the user with a single term or phrase instead of a tag cloud.

Regarding the conference itself, I found it to be a well-organized and professional event with lots of in-depth topics and pleasant evening activities, including a historical tour of Dublin.