Event-related data – the buzzword at ECIR 2013

One of the major trends at the 35th annual European Conference on Information Retrieval was event-related data. The conference took place between the 24th and 27th of March this year in a snowy Moscow, Russia. It attracted around 300 participants from all over the globe, three of them findwizards. While ECIR 2013 provided talks on a large variety of topics from across the field, event-related data was definitely a buzzword.

The keynote speaker opening the second day of the conference was Rutgers University assistant professor and Mahaya Inc. CTO Mor Naaman. In his talk, Mr Naaman let the following image explain why Mahaya Inc. is in business.

[Image: rome-then-and-now]

The past two papal elections.

The image above clearly shows that the way people act at events has changed considerably in the past few years: nowadays everyone is a reporter, and their stories can be found on social media. Using platforms such as Twitter, Facebook and YouTube as data sources, Naaman’s company creates products which not only extract but also synchronize event coverage. One interesting feature in their latest product is the synchronization of video clips, which makes it easy for a user to switch views when watching video footage of, for example, a concert. An arguably even stronger feature of this use of social media is that news and event footage can reach the world even if no press is present at the scene. Slides from this inspiring talk can be found here.

Another presentation the same day displayed promising results in the task of automatic event detection. Using machine learning algorithms, a team of researchers from Hanover, Germany has designed a system for detecting and summarizing entity-related events from Wikipedia edit history data. Basically, the idea is that when a Wikipedia article is edited by a large number of users in a short period of time, this can mark an important event concerning the subject of the article. More information about this research can be found here.
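To make the burst idea concrete, here is a minimal sketch in Python. It is not the researchers' system (which relies on machine learning); it only counts distinct editors inside a sliding time window. The function name, the one-hour window, the threshold of three editors and the sample edit history are all assumptions made for illustration.

```python
from collections import deque
from datetime import datetime, timedelta


def find_edit_bursts(edits, window=timedelta(hours=1), min_editors=3):
    """Return timestamps at which at least `min_editors` distinct users
    edited the article within the preceding `window`.

    `edits` is an iterable of (timestamp, username) pairs sorted by time.
    """
    recent = deque()  # edits that are still inside the sliding window
    bursts = []
    for ts, user in edits:
        recent.append((ts, user))
        # Drop edits that have fallen out of the window.
        while recent and ts - recent[0][0] > window:
            recent.popleft()
        editors = {name for _, name in recent}
        if len(editors) >= min_editors:
            bursts.append(ts)
    return bursts


# Hypothetical edit history for a single article (made-up data).
t0 = datetime(2013, 3, 13, 18, 0)
history = [
    (t0 - timedelta(days=2), "alice"),
    (t0, "bob"),
    (t0 + timedelta(minutes=5), "carol"),
    (t0 + timedelta(minutes=9), "dave"),
    (t0 + timedelta(days=3), "alice"),
]
print(find_edit_bursts(history))
```

With this sample data, only the edit that brings the third distinct editor into the same hour is flagged. A real system would also have to filter out bots and edit wars, and then summarize what actually changed.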

The last day of the conference opened with a presentation from Jimmy Lin of Twitter. His talk centered on the importance of fast real-time indexing in social media platform architecture. One of the strengths of Twitter is presenting users with information about events as they happen. As an example of this, he used the earthquake that hit the eastern USA in 2011. Tweets from locations closer to the epicenter of the earthquake reached Twitter users in New York City before the actual quake did. I have to admit, “Twitter, faster than earthquakes” is a pretty good slogan.

So whether it’s using social media data to let people (re)visit events, automatic event detection in open encyclopedias such as Wikipedia, making sure your indexing is fast enough to let your users cover events as they happen, or something else, event-based data seems to be one of the driving forces in the field of IR at the moment.

A look at European Conference on Information Retrieval (ECIR) 2012

The 34th European Conference on Information Retrieval was held 1-5 April 2012, in the lovely but crowded city of Barcelona, Spain. The core conference attracted over 100 attendees, with a total of 35 accepted full papers, 28 posters, and 7 demos being presented. As opposed to the previous year, which had 2 parallel sessions, this year’s conference ran as a single session. The accepted papers covered a diverse range of topics, and were divided into query representation, blog and online-community search, semi-structured retrieval, applications, evaluation, retrieval models, classification, categorisation and clustering, image and video retrieval, and systems efficiency.

The best paper award went to Guido Zuccon, Leif Azzopardi, Dell Zhang and Jun Wang for their work entitled “Top-k Retrieval using Facility Location Analysis”, presented by Leif Azzopardi during the retrieval models session. The authors propose using facility location analysis, taken from the discipline of operations research, to address the top-k retrieval problem of finding “the optimal set of k documents from a number of relevant documents given the user’s query”.

Meanwhile, “Predicting IMDB Movie Ratings using Social Media” by Andrei Oghina, Mathias Breuss, Manos Tsagkias and Maarten de Rijke won the best poster award. With a different goal from the best paper, the authors of the poster experiment with a prediction model for rating movies using a set of qualitative and quantitative features extracted from the stream of two social media channels, YouTube and Twitter. Their findings show that the highest predictive performance is obtained by combining features from both channels, and they propose including other social media channels as future work.

Workshop Days

The conference was preceded by a full day of workshops and tutorials running in parallel. I attended two workshops: Information Retrieval Over Query Sessions (SIR) during the morning and Task-Based and Aggregated Search (TBAS) in the afternoon. The second workshop ended with an interactive discussion. A third, full-day workshop was Searching 4 Fun!.

Industry Day

The last day was the Industry Day. It featured only 2 papers, plus 5 oral contributions, and drew around 50 attendees. A strong focus of the talks given at the Industry Day was opinion mining: four of the six participating companies/institutions presented work on sentiment analysis and opinion mining from social media streams. Jussi Karlgren, from Gavagai, argued that companies can use sentiment analysis from social media for, for example, finding reviews or comments made about their product or service, analysing their market position, and predicting price movements. Rianne Kaptein, from Oxyme, backed this up by adding that businesses are interested in what consumers say about their brand, products or campaigns on social media streams. Furthermore, Hugo Zaragoza from Websays identified two basic needs inside a company: a need for help in reading so that someone can act, and a need for help in explaining so that it can convince. A very interesting topic indeed, and research in this direction will advance as companies become more aware of the business gains from opinion mining of social media.

Overall, ECIR 2012 was a very inspiring conference. It also seemed a very friendly conference, offering many opportunities to network with fellow attendees. Despite that, several participants said that the number of attendees at this year’s conference had decreased in comparison with previous years. The workshops and the core conference gave me the impression that ECIR has a strong focus on young researchers, as many of the accepted contributions had a student as first author and presenter at the conference. The fact that there was only one session running at a time was a good decision in my opinion, as the attendees were not forced to miss presentations. Nevertheless, the workshops and tutorials were running in parallel, and although the proceedings of the workshops will be made freely available, I still feel that I missed something that day. The Industry Day was very exciting, offering the opportunity to share ideas between academia and industry. However, there were not many presentations, and the topics were not very diverse. I propose that next year Findwise will be among the speakers at the Industry track!

ECIR 2013 will be held in Moscow, Russia, between 24-28 March. See you there!

Tagging, Social Networks, Interaction and Findability

Events of the past few days have got me thinking about the power of social tagging and its connection to findability. These thoughts compel me to write my most personal (and perhaps off-topic) post yet on this blog. (All thoughts expressed in this post are my own and do not necessarily reflect the opinions of my employer.)

Rumors about the shutdown of Delicious have been circling the web. Even though it is still unconfirmed by Yahoo, my Twitter feed has been filled with comments about how to save your bookmarks or export them to other services, and with petitions to Yahoo about saving Delicious or making it open source.

Traditionally, when talking about user tagging of content, the topic is re-finding things. Users tag information on the web or an intranet in order to be able to find their way back to it. However, most of the comments that I’ve seen about Delicious being shut down have nothing to do with this. As I see it, users don’t claim to be missing the bookmarks themselves, but the social network, research, collaboration and search capabilities that came with the bookmarking service. Delicious seems to have evolved from a service that helps you bookmark your things for re-finding them into a service that helps you find new things based on the tagging of others. Tagging, or social bookmarking, may very well have started as a way of re-finding your information but has grown into a new way of discovering information, in parallel to search. (Maybe that is an explanation for the tweets wishing for Google to buy Delicious from Yahoo?)

So, tagging can not only help you re-find your own stuff but also explore new things and spread information. One good example of this is what is currently going on in the Swedish Twitterverse. It all started with one journalist’s discussion with her friends about the disbelief towards the women accusing Julian Assange of sexual assault. It quickly turned into so much more: a profound discussion about the fine lines of sexuality, what is OK, what we want and like, and how to say no. Using the hashtag #prataomdet, Swedish Twitter users are writing about and discussing their experiences in an effort to change the cultural climate so that people talk about it and start communicating with each other about sexuality. You can easily follow all the tweets in real time and read blog posts on the topic at prataomdet.se. Many of the major news sites have now started reporting on this as well after the massive activity on Twitter. (For non-Swedish-speaking readers, an effort has also been made to start discussions in English at #talkaboutit on Twitter.)

The feed in itself is thought-provoking and can easily keep you busy for hours. Besides the content and openness of the discussions, I find something else amazing. In a matter of hours this one tag joined together users, many of whom had never interacted with each other before, helping them share and find new information about something that previously went unspoken. Combining the power of social networks and tagging made this possible.

I usually write very different sorts of blog posts on this blog. This one time I just wanted to revel in the amazing possibilities for interaction that technology offers us today. Then maybe the next step is to think about how to tap into this power of interaction and how findability within the enterprise can benefit from this as well. In the meantime I recommend reading about What social networks reveal about interaction or how Västra Götalands Region is currently working on incorporating user tagging into their metadata.

Real Time Search in the Enterprise

Real-time search is creating a big buzz on the Internet. Major search engines like Google and Bing are now providing users with real-time search results from Facebook, Twitter, blogs and other social media sites. Real-time search means that as soon as content is created or updated, it is immediately searchable. This might seem obvious and like a basic requirement, but if you work with search you know that this is not the case most of the time. Looking inside the firewall, in the enterprise, I dare to say that real-time search is far from common. Sometimes content does not change very frequently, so it is not necessary to make it instantly searchable. In many cases, though, it’s the technical architecture that limits a real-time search implementation.

The most common way of indexing content is by using a web crawler or a connector. Either way, you schedule them to go out and fetch new/updated/deleted content at specific intervals during the day. This is the basic architecture for search platforms these days. The advantage of this approach is that the content systems do not need to adapt to the search platform; they just deliver content through their ordinary APIs during indexing. The drawback is that new or updated content is not available until the next scheduled indexing. Depending on the system, this might take several hours. For several reasons, mostly performance, you do not want to schedule connectors or web crawlers to fetch content too often. Instead, to provide real-time search you have to do it the other way around: let the content system push content to the search platform.

Most systems have some sort of event system that triggers an event when content is created/updated/deleted. Listening for these events, the system can send the content to the search platform at the same time it’s stored in the content system. The search platform can immediately index the pushed content and make it searchable. This requires adaptation of the content system towards the search platform. In this case, though, I think the advantages outweigh the disadvantages. Modern content systems are (or should be) providing a plug-in architecture, so you should fairly easily be able to plug in this kind of code. These plug-ins could also be provided by the search platform vendors, just as ordinary connectors are provided today.
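To illustrate the push approach, here is a minimal sketch in Python. Everything in it is hypothetical: `SearchIndex` is a toy in-memory inverted index standing in for a search platform, `ContentSystem` is a toy content system with an event hook, and `make_index_plugin` plays the role of the plug-in. No real product's API is implied.

```python
from collections import defaultdict


class SearchIndex:
    """Toy in-memory inverted index standing in for a search platform."""

    def __init__(self):
        self._postings = defaultdict(set)  # term -> ids of docs containing it
        self._docs = {}                    # doc id -> indexed text

    def index(self, doc_id, text):
        self.delete(doc_id)  # drop stale terms if the document already exists
        self._docs[doc_id] = text
        for term in text.lower().split():
            self._postings[term].add(doc_id)

    def delete(self, doc_id):
        text = self._docs.pop(doc_id, None)
        if text is None:
            return
        for term in text.lower().split():
            self._postings[term].discard(doc_id)

    def search(self, term):
        return sorted(self._postings.get(term.lower(), set()))


class ContentSystem:
    """Toy content system that fires an event on every save and delete."""

    def __init__(self):
        self._store = {}
        self._listeners = []

    def subscribe(self, listener):
        self._listeners.append(listener)

    def save(self, doc_id, text):
        self._store[doc_id] = text
        self._emit("saved", doc_id, text)

    def remove(self, doc_id):
        self._store.pop(doc_id, None)
        self._emit("deleted", doc_id, None)

    def _emit(self, event, doc_id, text):
        for listener in self._listeners:
            listener(event, doc_id, text)


def make_index_plugin(index):
    """Plug-in that pushes every content event straight to the index."""
    def on_event(event, doc_id, text):
        if event == "saved":
            index.index(doc_id, text)
        elif event == "deleted":
            index.delete(doc_id)
    return on_event


index = SearchIndex()
cms = ContentSystem()
cms.subscribe(make_index_plugin(index))

cms.save("doc-1", "Quarterly report draft")
print(index.search("report"))   # the document is searchable right after saving
cms.save("doc-1", "Quarterly summary draft")
print(index.search("report"))   # the old term no longer matches after the update
cms.remove("doc-1")
print(index.search("summary"))  # deleted content disappears from search
```

The content system never waits for a crawl schedule; the index changes in the same call that stores the content. In a real deployment the push would normally go through a queue, so that a slow or unavailable search platform does not block editors from saving.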

Do you agree, or have I been living in a cave for the past few years? I’d love to hear your comments on this subject!

IASummit – Information Architecture and Search

This upcoming week my colleague Lina and I will participate in the IASummit in Phoenix, Arizona. Search, information architecture and user experience, and the relationships between them, are the focus for us this upcoming week. We look forward to hearing a lot of great talks, meeting interesting people and enjoying the sunny weather in Arizona.

We will be blogging from the conference, but if you don’t want to wait for that you can follow me, Maria, on Twitter or follow the IASummit hashtag #ias10 to see what everyone is tweeting about.

Welcome to the Enterprise Search and Findability Blog!

The Enterprise Search and Findability Blog is here. As some of you already know, Findwise has been blogging at findwise.se for several years now. However, we thought it was time to separate the blog from our web site and create a forum especially dedicated to the exciting area of findability: the Enterprise Search and Findability Blog. From a Findwise perspective, findability is the art of making information easy to find by using (enterprise) search technology, regardless of when the information is needed or where it may be stored.

Here we invite you to learn more about findability and we welcome you to give us feedback and keep a dialogue with us. We will, among other things, keep you updated on relevant research within the findability area, exciting search functionality and news about enterprise search vendors.

New Features at the Search and Findability Blog

Our new blog includes features that were not available in our previous blog: RSS subscription, the Findwise Twitter feed and the possibility to share information via other social media. We hope and believe our readers will appreciate these features, and we are looking forward to discussing with you here at the Enterprise Search and Findability Blog.

Enterprise Search 2.0?

While visiting the Enterprise Search Summit in San Jose I realized that enabling Enterprise 2.0 within enterprise search is the hottest trend at the moment. Is it Enterprise Search 2.0?

Andrew McAfee, who coined the term Enterprise 2.0 and has released a book on the subject, spoke about how to use altruism to develop the enterprise. People are wired to help, and if we stop obsessing about the risks and lower the bar for how people can help each other, it is possible to make this work within a corporate environment.

He also spoke about process control and workflow control: how much do we really need? Make it easy to correct mistakes instead of making it hard to make them. With regard to innovation, he pointed out that we need to question credentialism and build communities that people want to join. To leverage the intelligence within the enterprise, we should explore and experiment with collective intelligence such as prediction markets and open peer review processes. All in all, make it easy for people to interconnect.

The potential gains: much better access to knowledge and internal experts, higher satisfaction, and increased innovation and customer satisfaction.

I also recommend reading the Price Waterhouse Coopers Technology Forecast Summer 2008 to get a good overview of the available tools and technologies.

So how does this impact enterprise search? Search can be made the facilitator for Enterprise 2.0. Of course it is possible to index all blogs, wikis, tweets (Yammer), online communities and social networks and make them searchable, but that is only one way to make this new environment more findable. If someone tweets or blogs about information, we should use that signal to influence the search results and ranking. We could also track user behavior on a site to make certain information more visible based on implicitly expressed interests.
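As a minimal sketch of how such a social signal could feed into ranking, the Python below multiplies each document's base relevance score by a boost that grows with the number of times it has been mentioned. The function name, the logarithmic boost formula, the default weight and the sample data are all assumptions made for illustration, not a description of any particular search platform.

```python
import math


def rerank(results, mentions, weight=0.5):
    """Re-rank (doc_id, base_score) pairs using social mention counts.

    boosted = base_score * (1 + weight * log(1 + mentions))
    The logarithm dampens the effect of very heavily mentioned documents.
    """
    boosted = [
        (doc_id, score * (1 + weight * math.log1p(mentions.get(doc_id, 0))))
        for doc_id, score in results
    ]
    return sorted(boosted, key=lambda pair: pair[1], reverse=True)


# Hypothetical base scores from the search engine, and mention counts
# gathered from internal blogs and microblogs (made-up data).
results = [("policy.pdf", 2.0), ("howto.html", 1.8), ("old-memo.doc", 1.5)]
mentions = {"howto.html": 12, "old-memo.doc": 1}

for doc_id, score in rerank(results, mentions):
    print(f"{doc_id}: {score:.2f}")
```

Documents nobody has mentioned keep their original score, so the boost only ever promotes content. The same pattern would work for implicit signals such as click counts; the weight then controls how much the crowd is allowed to override textual relevance.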