Solving Diversity in Information Retrieval

How to achieve diversity in information retrieval, along with techniques for handling ambiguous queries, was a topic of interest at the SIGIR 2013 conference in Dublin, Ireland, which I attended recently.

Diversity in information retrieval was covered in a number of presentations at the conference. The approach described here is search engine independent, since it uses only the set of result documents as input. Applied to search, it aims to produce a search result that covers as many of the relevant topics as possible.

This is done by retrieving, say, 100-500 documents instead of the normal 10.
These documents are then clustered based on their contents to create a number
of topic clusters. The search result is then constructed by selecting
(the normal 10) documents from the clusters in a round-robin fashion. This will
hopefully create a diverse search result, with as broad coverage as possible.
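
As a minimal sketch of the round-robin step, assume the clustering has already
been done and that each cluster is a list of document ids ordered best-first by
the search engine's own ranking. The function name and the sample ids are
hypothetical, chosen only for illustration.

```python
from itertools import zip_longest

_MISSING = object()  # sentinel for clusters that have run out of documents


def round_robin_select(clusters, k=10):
    """Pick up to k documents by cycling through the clusters in order.

    clusters: list of lists, each inner list ranked best-first (assumption).
    """
    selected = []
    # Each "tier" holds the next-best document from every cluster.
    for tier in zip_longest(*clusters, fillvalue=_MISSING):
        for doc in tier:
            if doc is _MISSING:
                continue  # this cluster is exhausted, skip it
            selected.append(doc)
            if len(selected) == k:
                return selected
    return selected


# Hypothetical clusters for an ambiguous query
clusters = [["a1", "a2", "a3"], ["b1"], ["c1", "c2"]]
top = round_robin_select(clusters, k=4)
```

Every cluster contributes its best document before any cluster contributes a
second one, which is what gives the result list its breadth.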

The technique can be used not only to handle ambiguous queries, but also
queries that have several sub-topics associated with them. By iteratively
running a clustering algorithm on the result documents with 2 to 5 (or so)
clusters, measuring the separation between the clusters, and choosing the
outcome with the greatest separation, a diverse result set of documents can be
created. The clusters can also be used to ask follow-up questions to the user,
who can click on one of several tag clouds, each containing the most central
terms of one cluster.
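
The selection of the number of clusters could be sketched as below. The talks
did not name a specific separation measure, so the mean silhouette coefficient
is used here as an assumption. The sketch also assumes the documents are already
represented as numeric vectors and that some clustering algorithm has produced
one label list per candidate cluster count. All names are hypothetical.

```python
import math


def silhouette(points, labels):
    """Mean silhouette coefficient; higher means better-separated clusters."""
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")

    total = 0.0
    for i, label in enumerate(labels):
        own = clusters[label]
        if len(own) == 1:
            continue  # singleton clusters contribute 0 by convention
        # a: mean distance to the other members of the same cluster
        a = sum(math.dist(points[i], points[j]) for j in own if j != i)
        a /= len(own) - 1
        # b: mean distance to the members of the nearest other cluster
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def best_cluster_count(points, candidates):
    """candidates maps a cluster count to the labels produced for that count."""
    return max(candidates, key=lambda k: silhouette(points, candidates[k]))


# Toy data: two tight groups, clustered once with 2 and once with 3 clusters
points = [(0, 0), (0, 1), (10, 0), (10, 1)]
candidates = {2: [0, 0, 1, 1], 3: [0, 0, 1, 2]}
chosen = best_cluster_count(points, candidates)
```

On this toy data the two-cluster outcome wins, because splitting one of the
tight groups into singletons lowers the average separation.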

A cluster set of size 2 with good separation would indicate that the query
may be ambiguous, with two different semantic meanings, while a size of 3-5
likely means that a number of sub-topics have been identified in the
results. In a way these clusters can be seen as a dynamic facet, but a
shallow one, since it only operates on the returned documents. On the other
hand, it does not require any knowledge about the documents beyond the
information that is returned. The approach could also be extended by using
topic labelling to present the user with a single term or phrase instead of a
tag cloud.

As for the conference itself, I found it well organized and professional, with plenty of in-depth topics and enjoyable evening activities, including a historical tour of Dublin.

Search Conferences 2011

During 2011 a large number of search conferences will take place all over the world. Some of them are dedicated to search, whereas others cover search in relation to specific products, information management, usability, etc.

Here are a few that might be of interest for those of you looking to be inspired and broaden your knowledge. Within a few weeks we will compile all the research-related conferences; there are quite a few of them out there!
If you think anything is missing, please post a comment.

March
IntraTeam Event Copenhagen 2011
Main focus: Social intranets, SharePoint and Enterprise Search
March 1, 2 and 3, 2011, Copenhagen, Denmark

Webcoast
Main focus: A web event that is an unconference, meaning that the attendees themselves create the program by presenting on topics of their own expertise and interest.
March 18-20, Gothenburg, Sweden

Info360
Main focus: Business productivity, Enterprise Content Management, SharePoint 2010
March 21-24, Walter E. Washington Convention Center, Washington, USA

April
International Search Summit Munich
Main focus: International search and social media.
4th April 2011, Hilton Munich Park Hotel, Germany

ECIR 2011: European Conference on Information Retrieval
Main focus: Presentation of new research results in the field of Information Retrieval
April 18-21, Dublin, Ireland

May
Enterprise Search Summit Spring 2011
Main focus: Develop, implement and enhance cutting-edge internal search capabilities
May 10-11, New York, USA

International Search Summit: London
Main focus: International search and social media
May 18th, Millennium Gloucester Hotel, London, England

Lucene Revolution
Main focus: The world’s largest conference dedicated to open source search.
May 25-26, San Francisco Airport Hyatt Regency, USA

SharePoint Fest – Denver 2011
Main focus: In search track: Enterprise Search, Search & Records Management, & FAST for SharePoint
May 19-20, Colorado Convention Center, USA

June
International Search Summit Seattle
Main focus: International search and social media
June 9th, Bell Harbor Conference Center, Seattle, USA

2011 Semantic Technology Conference
Main focus: Semantic technologies – including Search, Content Management, Business Intelligence
June 5-9, Hilton Union Square, San Francisco, USA

October
SharePoint Conference 2011
Main focus: SharePoint and related technologies
October 3-6, Anaheim, California, USA

November
Enterprise Search Summit Fall
Main focus: How to implement, manage, and enhance search in your organization
November 1-3, integrated with the KMWorld Conference, SharePoint Symposium and Taxonomy Boot Camp

KMWorld
(Co-locating with Enterprise Search Summit Fall, Taxonomy Boot Camp and SharePoint Symposium)
Main focus: Knowledge creation, publishing, sharing, finding, mining, reuse etc
November 1 – 3, Washington Marriott Wardman Park, Washington DC, USA

Gilbane group Boston
Main focus: Within search: semantic, mobile, SharePoint, social search
November 29 – December 1, Boston, USA