Overview of the 9th European Summer School in Information Retrieval

This was an excellent week of knowledge sharing and exchange of ideas, seen mostly (but not exclusively) from an Information Retrieval (IR) research perspective, given the high proportion of PhD students in the 100+ audience. Here are brief summaries of a few of my favourite talks from the great crop delivered throughout the week.

Highlights for me started with the IR ‘guru’ Bruce Croft (University of Massachusetts; twitter: @wbc11), who stormed through more than 150 slides in the morning session, giving a speedy and concise history of IR, its foundations and formal models, and the current focus of IR research. Three main issues were seen to dominate: relevance, evaluation, and users and their information needs.

He emphasised that IR research is still very much focused on document and query representations and on the various (and mixed) retrieval models that attempt to marry them in order to produce the most relevant results, with relevance here covering topical and person relevance, task context and novelty. He noted a key shift in retrieval models, however: away from processing text into units of language and towards using distributions of word counts, whose statistical and predictive properties feed directly into retrieval models and algorithms. The current research ‘gap’ was seen as matching long(er) queries to specific, passage-level answers.
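To make the idea of "distributions of word counts" a little more concrete, here is a minimal sketch of query-likelihood scoring with a unigram language model, one well-known way of building a retrieval model on word counts. This was not code from the talk: the toy corpus, the Dirichlet smoothing parameter `mu` and the function names are all illustrative assumptions.

```python
import math
from collections import Counter

# Hypothetical toy corpus: each document is a list of tokens.
DOCS = {
    "d1": "information retrieval models rank documents".split(),
    "d2": "users search for information".split(),
    "d3": "cooking recipes for pasta".split(),
}

def query_likelihood(query, doc, collection_counts, collection_len, mu=10.0):
    """Log P(query | doc) under a unigram model with Dirichlet smoothing.

    mu is an assumed smoothing parameter; real systems tune it.
    """
    counts = Counter(doc)
    score = 0.0
    for term in query:
        # Background probability of the term across the whole collection.
        p_coll = collection_counts[term] / collection_len
        if p_coll == 0:
            continue  # simplifying assumption: ignore terms unseen in the collection
        # Smoothed document model: mixes document counts with collection statistics.
        p = (counts[term] + mu * p_coll) / (len(doc) + mu)
        score += math.log(p)
    return score

def rank(query, docs):
    """Return document IDs sorted by decreasing query likelihood."""
    collection_counts = Counter(t for d in docs.values() for t in d)
    collection_len = sum(collection_counts.values())
    scores = {name: query_likelihood(query, d, collection_counts, collection_len)
              for name, d in docs.items()}
    return sorted(scores, key=scores.get, reverse=True)

print(rank("information retrieval".split(), DOCS))  # ['d1', 'd2', 'd3']
```

The smoothing step is the part that relies on the word-count distribution of the whole collection: a document that lacks a query term still gets a small, non-zero probability for it, so scores degrade gracefully rather than dropping to zero.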

Michail Salampasis (Institute of Software Technology and Interactive Systems, Vienna University of Technology) gave a challenging talk on Integrating IR Technologies for Professional Search, in which he highlighted the added difficulties of dealing with a specialised domain and a smaller user base. Enterprise Search (ES) was seen as very much comparable to general IR in terms of relevance, evaluation, users' information needs and user interaction. It was rightly noted, however, that in ES other factors play a larger part, such as system performance, the incorporation of new data, scalability, information freshness and the presence of multiple information sources, along with the need to tune the system for different applications.

Arjen de Vries (leader of the Interactive Information Retrieval research group at Centrum Wiskunde & Informatica, Amsterdam, Netherlands; twitter: @arjenpdevries) spoke on IR and Social Media. This was an inspirational talk, telling researchers about the vast and varied social media data, generated directly by users' interactions, that is out there ready to be harvested. He described an ideal information triangle of linked data connecting people (their connections and profiles), items, and tags/ratings (endorsement and sharing). The take-home messages were that social media can give IR research a rich source of context as an alternative to click data, but that finding a single theory to address the various recommendation and retrieval tasks will be problematic. In one example, a band's popularity appeared static on both Spotify and EchoNest after it had won a Grammy, while analysis using bitly clearly showed a sharp increase in interest.

Tony Russell-Rose (UX Labs, London; twitter: @tonygrr) gave an entertaining talk on Designing the User Experience, very much from the user's perspective. He noted how the earliest classic IR models either lacked a user perspective completely or were too linear. He proposed four Dimensions of Search User Experience: the user (their level of expertise), their goal (its scope and complexity), the context (again, its complexity) and the search mode to be employed (depending on whether the user is looking something up, learning or investigating). He went on to urge the adoption of a principled approach to design based on these dimensions, with particular reference to how different contexts require different designs. Finally, there was a call to apply proven design patterns and principles, but to look at search holistically, supporting both the analysis and the sense-making of information.

Norbert Fuhr (Faculty of Engineering Sciences, University of Duisburg-Essen, Germany) gave a measured talk on Interactive IR. He first looked at quantitative modelling and the Probability Ranking Principle (PRP). The PRP states that ranking documents in decreasing order of their probability of relevance, with relevance treated as a binary judgement, yields optimal retrieval quality. Some obvious shortcomings of this principle were shown: its focus on the user's assessment of individual documents; its assumption that the relevance of each document is judged independently of the others; the fact that users' search paths are often non-linear; and the fact that their information need may change during a search session.
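A minimal sketch of the PRP in Python may help show what the principle claims and where the independence assumption comes in. The document IDs, the probability values and the helper names below are hypothetical; a real system would estimate the probabilities with a retrieval model. Assuming each document's relevance is an independent binary event, sorting by probability maximises the expected number of relevant documents at every cut-off k, and the brute-force check confirms this for the toy example.

```python
import math
from itertools import permutations
from typing import Dict, List

def prp_rank(prob_relevance: Dict[str, float]) -> List[str]:
    """Order documents by decreasing estimated probability of relevance."""
    return sorted(prob_relevance, key=prob_relevance.get, reverse=True)

def expected_relevant_at_k(ranking: List[str], prob_relevance: Dict[str, float], k: int) -> float:
    """Expected number of relevant documents in the top k, assuming each
    document's relevance is an independent binary event."""
    return sum(prob_relevance[d] for d in ranking[:k])

def is_prp_optimal(prob_relevance: Dict[str, float], k: int) -> bool:
    """Brute-force check: no ordering beats the PRP ranking at cut-off k."""
    best = max(expected_relevant_at_k(list(p), prob_relevance, k)
               for p in permutations(prob_relevance))
    prp = expected_relevant_at_k(prp_rank(prob_relevance), prob_relevance, k)
    return math.isclose(prp, best)

# Hypothetical probabilities of relevance for four documents.
probs = {"d1": 0.125, "d2": 0.75, "d3": 0.5, "d4": 0.625}
ranking = prp_rank(probs)
print(ranking)                                    # ['d2', 'd4', 'd3', 'd1']
print(expected_relevant_at_k(ranking, probs, 2))  # 1.375
print(all(is_prp_optimal(probs, k) for k in range(1, len(probs) + 1)))  # True
```

The optimality rests entirely on the independence assumption listed above as a shortcoming: if two near-duplicate documents both have a high probability of relevance, the PRP places both near the top even though the second adds little for the user.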

The second, more impactful, part of the talk dealt with various cognitive models of information seeking and searching, and showed how a better understanding of the user has led to interface designs that go beyond the traditional query/result-list paradigm.

Paul Clough (University of Sheffield) gave an authoritative talk on Multilingual Retrieval, drawing much of its material from his new book on the subject. He outlined the many reasons why multilingual search, in its various forms, is becoming more important, and highlighted the more common stumbling blocks. He also covered the many models employed today, including the more sophisticated language models. He stated that in lab tests multilingual IR can often reach up to 99% of the effectiveness of monolingual IR, but that cost, and the uncertainty of results caused by nuances of language, have so far held back the deployment of more multilingual IR systems. Here too, users and their needs are increasingly being taken into account when deciding on system functionality.

There were, of course, other talks of note, and some of the topics above will be covered in more depth in upcoming blog posts.

/Peter Voisey


