Event-related data – the buzzword at ECIR 2013

One of the major trends at the 35th annual European Conference on Information Retrieval was event-related data. The conference took place from the 24th to the 27th of March this year in a snowy Moscow, Russia. It attracted around 300 participants from all over the globe, three of them findwizards. While ECIR 2013 provided talks on a large variety of topics from across the field, event-related data was definitely a buzzword.

The keynote speaker opening the second day of the conference was Mor Naaman, assistant professor at Rutgers University and CTO of Mahaya Inc. In his talk, Mr Naaman let the following image explain why Mahaya Inc. is in business.

[Image: rome-then-and-now]

The past two papal elections.

The image above clearly shows that the way people act at events has changed considerably in the past few years. Nowadays everyone is a reporter, and their stories can be found on social media. Using platforms such as Twitter, Facebook and YouTube as data sources, Naaman’s company creates products which not only extract but also synchronize event coverage. One interesting feature in their latest product is the synchronization of video clips, which makes it easy for a user to switch views when watching video footage of, for example, a concert. An arguably even stronger feature of this use of social media is that news and event footage can reach the world even if no press is present at the scene. Slides from this inspiring talk can be found here.

Another presentation the same day showed promising results in the task of automatic event detection. Using machine learning algorithms, a team of researchers from Hanover, Germany, has designed a system for detecting and summarizing entity-related events from Wikipedia edit history data. The basic idea is that when a Wikipedia article is edited by a large number of users in a short period of time, this can mark an important event concerning the subject of the article. More information about this research can be found here.
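To make the basic idea concrete, here is a minimal sketch of burst detection over an edit history. It is not the researchers' system, which uses machine learning; it is just a sliding-window count of distinct editors. The function name, the one-hour window, the threshold of three editors and the sample history are all assumptions made up for illustration.

```python
from collections import deque
from datetime import datetime, timedelta


def find_edit_bursts(edits, window=timedelta(hours=1), min_editors=3):
    """Return the timestamps at which at least `min_editors` distinct users
    have edited the article within the preceding `window`.

    `edits` is an iterable of (timestamp, username) pairs sorted by time.
    The window and threshold defaults are arbitrary, illustrative values.
    """
    recent = deque()  # edits that fall inside the current window
    bursts = []
    for timestamp, user in edits:
        recent.append((timestamp, user))
        # Drop edits that have fallen out of the window.
        while timestamp - recent[0][0] > window:
            recent.popleft()
        distinct_editors = {name for _, name in recent}
        if len(distinct_editors) >= min_editors:
            bursts.append(timestamp)
    return bursts


# Hypothetical edit history for a single article.
history = [
    (datetime(2013, 3, 1, 9, 0), "alice"),
    (datetime(2013, 3, 5, 14, 0), "bob"),
    (datetime(2013, 3, 13, 19, 5), "carol"),
    (datetime(2013, 3, 13, 19, 10), "dave"),
    (datetime(2013, 3, 13, 19, 20), "erin"),
    (datetime(2013, 3, 13, 19, 25), "carol"),
]
bursts = find_edit_bursts(history)
```

In this made-up history the two isolated edits early in March are ignored, while the cluster of edits on the evening of 13 March is flagged. A real system would also need to summarize what the edits were about, which this sketch does not attempt.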

The last day of the conference opened with a presentation from Jimmy Lin of Twitter. His talk centered on the importance of fast real-time indexing in social media platform architecture. One of the strengths of Twitter is presenting users with information about events as they happen. As an example of this he used the earthquake that hit the eastern USA in 2011. Tweets from locations closer to the epicenter of the earthquake reached Twitter users in New York City before the actual quake did. I have to admit that “Twitter, faster than earthquakes” is a pretty good slogan.

So whether it’s using social media data to let people (re)visit events, automatic event detection in open encyclopedias such as Wikipedia, making sure your indexing is fast enough to let your users cover events as they happen, or something else, event-related data seems to be one of the driving forces in the field of IR at the moment.

A look at European Conference on Information Retrieval (ECIR) 2012


The 34th European Conference on Information Retrieval was held 1-5 April 2012, in the lovely but crowded city of Barcelona, Spain. The core conference attracted over 100 attendees, with a total of 35 accepted full papers, 28 posters, and 7 demos being presented. As opposed to the previous year, which had 2 parallel sessions, this year’s conference ran a single session at a time. The accepted papers covered a diverse range of topics, and were divided into query representation, blog and online-community search, semi-structured retrieval, applications, evaluation, retrieval models, classification, categorisation and clustering, image and video retrieval, and systems efficiency.

The best paper award went to Guido Zuccon, Leif Azzopardi, Dell Zhang and Jun Wang for their work entitled “Top-k Retrieval using Facility Location Analysis”, which was presented by Leif Azzopardi during the retrieval models session. The authors propose using facility location analysis, taken from the discipline of operations research, to address the top-k retrieval problem of finding “the optimal set of k documents from a number of relevant documents given the user’s query”.
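For readers unfamiliar with facility location, the flavour of the problem can be sketched in a few lines. This is not the authors' formulation, only a generic greedy heuristic for picking k "facilities" so that every point has a chosen point nearby. The documents are represented here as hypothetical 2-D points and the distance is plain Euclidean distance; both are assumptions made for illustration.

```python
import math


def greedy_facility_selection(points, k):
    """Greedily pick k indices so that the total distance from every point
    to its nearest picked point is small (a k-median style heuristic)."""

    def cost(selected):
        # Sum, over all points, of the distance to the closest selected point.
        return sum(min(math.dist(p, points[s]) for s in selected) for p in points)

    selected = []
    for _ in range(min(k, len(points))):
        candidates = [i for i in range(len(points)) if i not in selected]
        # Add the candidate that lowers the total cost the most.
        best = min(candidates, key=lambda i: cost(selected + [i]))
        selected.append(best)
    return selected


# Hypothetical documents: three about one subtopic, two about another.
docs = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0)]
chosen = greedy_facility_selection(docs, k=2)
```

With two picks, the heuristic ends up with one document from each group rather than two near-duplicates, which hints at why this family of methods is interesting for choosing a small result set.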

Meanwhile, “Predicting IMDB Movie Ratings using Social Media” by Andrei Oghina, Mathias Breuss, Manos Tsagkias and Maarten de Rijke won the best poster award. With a different goal from the best paper, the authors of the poster experiment with a prediction model for rating movies using a set of qualitative and quantitative features extracted from the streams of two social media channels, YouTube and Twitter. Their findings show that the highest predictive performance is obtained by combining features from both channels, and the authors propose including other social media channels as future work.

Workshop Days

The conference was preceded by a full day of workshops and tutorials running in parallel. I attended two workshops: Information Retrieval Over Query Sessions (SIR) during the morning and Task-Based and Aggregated Search (TBAS) in the afternoon. The second workshop ended with an interactive discussion. A third, full-day workshop was Searching 4 Fun!.

Industry Day

The last day was the Industry Day, with only 2 papers, 5 oral contributions and around 50 attendees. A strong focus of the talks was opinion mining: four of the six participating companies and institutions presented work on sentiment analysis and opinion mining from social media streams. Jussi Karlgren, from Gavagai, argued that companies can use sentiment analysis of social media to, for example, find reviews or comments about their product or service, analyse their market position, and predict price movements. Rianne Kaptein, from Oxyme, backed this up by adding that businesses are interested in what consumers say about their brands, products or campaigns on social media. Furthermore, Hugo Zaragoza from Websays identified two basic needs inside a company: a need for help in reading so that someone can act, and a need for help in explaining so that someone can convince. This is a very interesting topic indeed, and research in this direction will advance as companies become more aware of the business gains from opinion mining of social media.

Overall, ECIR 2012 was a very inspiring conference. It also seemed a very friendly conference, offering many opportunities to network with fellow attendees. Even so, several participants said that the number of attendees had decreased compared with previous years. The workshops and the core conference gave me the impression of a strong focus on young researchers, as many of the accepted contributions had a student as first author and presenter. Having only one session running at a time was a good decision in my opinion, as attendees were not forced to miss presentations. The workshops and tutorials, however, ran in parallel, and although the workshop proceedings will be made freely available, I still feel that I missed something that day. The industry day was very exciting, offering the opportunity to share ideas between academia and industry. However, there were not many presentations, and the topics were not very diverse. I propose that next year Findwise be among the speakers at the Industry track!

ECIR 2013 will be held in Moscow, Russia, on 24-28 March. See you there!

Search Conferences 2011

During 2011 a large number of search conferences will take place all over the world. Some of them are dedicated to search, whereas others discuss the topic in relation to specific products, information management, usability etc.

Here are a few that might be of interest for those of you looking to be inspired and broaden your knowledge. Within a few weeks we will compile all the research related conferences – there are quite a few of them out there!
If you think anything is missing, please post a comment.

March
IntraTeam Event Copenhagen 2011
Main focus: Social intranets, SharePoint and Enterprise Search
March 1, 2 and 3, 2011, Copenhagen, Denmark

Webcoast
Main focus: A web event that is an unconference, meaning that the attendees themselves create the program by presenting on topics of their own expertise and interest.
March 18-20, Gothenburg, Sweden

Info360
Main focus: Business productivity, Enterprise Content Management, SharePoint 2010
March 21-24, Walter E. Washington Convention Center, Washington DC, USA

April
International Search Summit Munich
Main focus: International search and social media.
4th April 2011, Hilton Munich Park Hotel, Germany

ECIR 2011: European Conference on Information Retrieval
Main focus: Presentation of new research results in the field of Information Retrieval
April 18-21, Dublin, Ireland

May
Enterprise Search Summit Spring 2011
Main focus: Develop, implement and enhance cutting-edge internal search capabilities
May 10-11, New York, USA

International Search Summit: London
Main focus: International search and social media
May 18th, Millennium Gloucester Hotel, London, England

Lucene Revolution
Main focus: The world’s largest conference dedicated to open source search.
May 25-26, San Francisco Airport Hyatt Regency, USA

SharePoint Fest – Denver 2011
Main focus: In search track: Enterprise Search, Search & Records Management, & FAST for SharePoint
May 19-20, Colorado Convention Center, USA

June
International Search Summit Seattle
Main focus: International search and social media
June 9th, Bell Harbor Conference Center, Seattle, USA

2011 Semantic Technology Conference
Main focus: Semantic technologies – including Search, Content Management, Business Intelligence
June 5-9, Hilton Union Square, San Francisco, USA

October
SharePoint Conference 2011
Main focus: SharePoint and related technologies
October 3-6, Anaheim, California, USA

November
Enterprise Search Summit Fall
Main focus: How to implement, manage, and enhance search in your organization
November 1-3, integrated with the KMWorld Conference, SharePoint Symposium and Taxonomy Boot Camp

KMWorld
Main focus: Knowledge creation, publishing, sharing, finding, mining, reuse etc
November 1 – 3, Washington Marriott Wardman Park, Washington DC, USA
(Co-located with Enterprise Search Summit Fall, Taxonomy Boot Camp and SharePoint Symposium)

Gilbane Group Boston
Main focus: Within search: semantic, mobile, SharePoint, social search
November 29 – December 1, Boston, USA

Faceted Search by LinkedIn

My RSS feeds have been buzzing about the LinkedIn faceted search since it was first released from beta in December. So why is the new search at LinkedIn so interesting that people are almost constantly discussing it? I think it’s partly because LinkedIn is a site that is used by most professionals, and searching for people is a core functionality on LinkedIn. But the search interface on LinkedIn is also a very good example of faceted search.

I decided to have a closer look at their search. The first thing I realized was just how many different kinds of searches there are on LinkedIn: not only the obvious people search but also job, news, forum, group, company, address book, answers and reference search. LinkedIn has managed to integrate search so that it’s the natural way of finding information on the site. People search is the most prominent search functionality but not the only one.

I’ve seen several different people search implementations, and they often have a tendency to work more or less like phone books. If you know the name, you type it and get the number. And if you’re lucky you can also get the name if you only have the number. There is seldom any way to search for people with a certain competence or from a geographic area. LinkedIn sets a good example of how searching for people could and should work.

LinkedIn has taken careful consideration of their users: what information they are looking for, how they want it presented and how they need to filter searches in order to find the right people. The details that I personally like are the possibility to search within filters for matching options (I worked on a similar solution last year) and how different filters are displayed (or at least in a different order) depending on what query the user types. If you want to know more about how the faceted search at LinkedIn was designed, check out the blog post by Sara Alpern.
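For those who have not worked with facets, here is a minimal sketch of the two details mentioned above: counting the filter options for a result set, and searching within a filter for matching options. It says nothing about how LinkedIn actually implements this; the function names, the field names and the sample people are all made up for illustration.

```python
from collections import Counter


def facet_counts(results, facet):
    """Count how many results carry each value of the given facet,
    most common first, so the most useful filter options lead the list."""
    return Counter(r[facet] for r in results if facet in r).most_common()


def search_within_facet(counts, text):
    """Keep only the facet options whose label contains `text` (case-insensitive)."""
    needle = text.lower()
    return [(value, n) for value, n in counts if needle in value.lower()]


# Hypothetical people-search results.
people = [
    {"name": "Anna", "location": "Gothenburg", "industry": "Software"},
    {"name": "Bertil", "location": "Stockholm", "industry": "Software"},
    {"name": "Cecilia", "location": "Gothenburg", "industry": "Design"},
    {"name": "David", "location": "Copenhagen", "industry": "Software"},
]
locations = facet_counts(people, "location")
matching = search_within_facet(locations, "goth")
```

Because the counts are computed from the current result set, the options and their order change with the query, which is roughly the behaviour described above.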

But LinkedIn is not only interesting because of the good search experience. It’s also interesting from a technical perspective. The LinkedIn search is built on open source, and they have developed everything themselves. For those of you interested in the technology behind the new LinkedIn search I recommend “LinkedIn search: a look beneath the hood” by Daniel Tunkelang, where he links to a presentation by John Wang, search architect at LinkedIn.

The Right Information at the Right Time; or Control vs Openness

There is obviously a difference between what people want and do and what the organisations think and want to do.

I saw a good definition of what Enterprise 2.0 is the other day. Meet Charlie is a good example of how web 2.0 tools can be used in the enterprise area. Because people do use them; these new tools have changed the way we communicate and collaborate. Unless you’re an organization, that is.

I think social media is here to stay. Services like Flickr and YouTube ultimately changed the way we deal with our photos and videos. Look at the competitive analysis value curve for Flickr to see how it changed the business behind photo services. (Flickr is now also the second most popular photo site.) And social media isn’t just for kids. You can find book tips from the library in Norrköping on YouTube, many professionals have profiles on LinkedIn, and we subscribe to dozens of blogs and blog ourselves.

There is a lot of professional networking going on across the web. People of today have a need to share their thoughts and ideas. So there are a lot of Charlies out there. How come there are so few employers like his?

According to Gartner, 80% of business today is conducted on unstructured information, which makes up about 85% of all data. And yet most IT development is done for the rest of the information, the 15% that is structured and semi-structured. People go for openness and collaboration, but organizations go for structure and control…