Solving Diversity in Information Retrieval

Handling diversity in information retrieval, including techniques for dealing with ambiguous queries, was a topic of interest at the SIGIR 2013 conference in Dublin, Ireland, which I attended recently.

Diversity in information retrieval was covered in a number of presentations at the conference. Applied to search, it basically means aiming to produce a search result that covers as many of the relevant topics as possible. The approach described here is search engine independent, since it uses only the set of result documents as input.

This is done by retrieving, say, 100-500 documents instead of the normal 10.
These documents are then clustered based on their contents to create a number
of topic clusters. The search result is then constructed by selecting
documents (the normal 10) from the clusters in a round-robin fashion. This will
hopefully create a diverse search result with as broad a coverage as possible.
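
The round-robin step can be sketched roughly as follows. This is only an
illustration under my own assumptions: the function name round_robin_select,
the document ids and the cluster labels are all hypothetical, and the clusters
are visited in the order of their best-ranked document, which the talks did not
specify.

```python
from collections import defaultdict, deque

def round_robin_select(ranked_doc_ids, cluster_of, k=10):
    """Pick up to k documents by cycling over the clusters.

    ranked_doc_ids: document ids in the search engine's original rank order.
    cluster_of: mapping from document id to a cluster label (assumed given).
    """
    # One queue per cluster, each keeping the original rank order.
    queues = defaultdict(deque)
    for doc_id in ranked_doc_ids:
        queues[cluster_of[doc_id]].append(doc_id)

    # Dicts keep insertion order, so clusters are visited in the order
    # in which their best-ranked document appeared.
    order = list(queues)

    selected = []
    while len(selected) < k and any(queues[c] for c in order):
        for c in order:
            if queues[c] and len(selected) < k:
                selected.append(queues[c].popleft())
    return selected

# Hypothetical example: six retrieved documents spread over three clusters.
ranked = ["d1", "d2", "d3", "d4", "d5", "d6"]
clusters = {"d1": "A", "d2": "A", "d3": "A", "d4": "B", "d5": "C", "d6": "B"}
top4 = round_robin_select(ranked, clusters, k=4)
print(top4)
```

With the plain ranking the first four results would all but one come from
cluster A, while the round-robin pass takes one document from each cluster
before returning to A.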

The technique can be used not only to solve the problem of ambiguous queries,
but also to handle queries with several associated sub-topics. By iteratively
running a clustering algorithm on the result documents with 2 to 5 (or so)
clusters, measuring the separation between the clusters, and choosing the
outcome with the greatest separation, a diverse result set of documents can be
created. The clusters can also be used to ask the user follow-up questions,
where he or she clicks on one of several tag clouds, each containing the most
central terms of one cluster.
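
A minimal sketch of the "try 2 to 5 clusters and keep the best separated one"
loop could look like this. Note the assumptions: the talks did not prescribe a
particular algorithm or separation measure, so k-means and the mean silhouette
score are my own stand-ins, the documents are assumed to be already converted
to numeric vectors (for example term weights), and all function names are
hypothetical.

```python
import math

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(points, k, iters=20):
    # Deterministic farthest-point initialisation (an assumption, chosen
    # so that the example is reproducible).
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign every point to its nearest center, then recompute centers.
        labels = [min(range(k), key=lambda j: dist(p, centers[j])) for p in points]
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:
                centers[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels

def silhouette(points, labels):
    """Mean silhouette score: close to 1 means well separated clusters."""
    scores = []
    for i, p in enumerate(points):
        by_cluster = {}
        for j, q in enumerate(points):
            if i != j:
                by_cluster.setdefault(labels[j], []).append(dist(p, q))
        own = by_cluster.pop(labels[i], None)
        if not own or not by_cluster:
            scores.append(0.0)  # singleton cluster, or only one cluster
            continue
        a = sum(own) / len(own)                                 # cohesion
        b = min(sum(d) / len(d) for d in by_cluster.values())   # separation
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)

def best_clustering(points, k_values=range(2, 6)):
    # Cluster once per candidate k and keep the best separated outcome.
    results = {k: kmeans(points, k) for k in k_values if k < len(points)}
    scored = {k: silhouette(points, labels) for k, labels in results.items()}
    best_k = max(scored, key=scored.get)
    return best_k, results[best_k], scored[best_k]

# Hypothetical toy vectors: two tight groups far apart from each other.
vectors = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
k, labels, score = best_clustering(vectors)
print(k, labels, round(score, 2))
```

For this toy input the two-cluster outcome has the highest score, which is the
situation I would read as an ambiguous query.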

A cluster set of size 2 with a good separation would indicate that the query
may be ambiguous, with two different semantic meanings, while a size of 3-5
likely means that a number of sub-topics have been identified in the
results. In a way these clusters can be seen as a dynamic facet, but it is
still shallow since it only operates on the returned documents. On the other
hand, it does not require any additional knowledge about the documents other
than the information that is returned. The approach could also be extended by
using topic labelling to present the user with a single term or phrase per
cluster, instead of a tag cloud.
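
To give an idea of how the central terms of each cluster might be picked, here
is a small sketch. It is my own simplification, not something presented at the
conference: a term counts as "central" if it occurs in a larger share of the
documents inside the cluster than outside it, the function name central_terms
is hypothetical, and the texts are split on whitespace without any stop-word
handling.

```python
from collections import Counter

def central_terms(docs, labels, top_n=2):
    """Return the top_n most distinctive terms for each cluster label."""
    clusters = sorted(set(labels))
    size = Counter(labels)

    # Document frequency of each term inside each cluster.
    df = {c: Counter() for c in clusters}
    for text, c in zip(docs, labels):
        df[c].update(set(text.lower().split()))

    # Document frequency of each term over all clusters.
    total = Counter()
    for c in clusters:
        total.update(df[c])

    result = {}
    for c in clusters:
        outside = len(docs) - size[c]

        def score(term):
            inside_rate = df[c][term] / size[c]
            outside_rate = (total[term] - df[c][term]) / outside if outside else 0.0
            return inside_rate - outside_rate

        # Highest score first, ties broken alphabetically.
        ranked = sorted(df[c], key=lambda t: (-score(t), t))
        result[c] = ranked[:top_n]
    return result

# Hypothetical result snippets for the ambiguous query "jaguar".
docs = [
    "jaguar car engine speed",
    "jaguar car dealer price",
    "jaguar cat jungle habitat",
    "jaguar cat prey jungle",
]
labels = [0, 0, 1, 1]
terms = central_terms(docs, labels)
print(terms)
```

The full list per cluster could feed a tag cloud, while the first term alone
("car" and "cat" here) is a crude form of the single-term label mentioned
above. The query term itself drops out because it is equally common in both
clusters.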

As for the conference itself, I found it to be a pleasant and professionally organised event with lots of in-depth topics and nice evening activities, including a historical tour of Dublin.
