Presentation: Enterprise Search in SharePoint 2013

Presented by Paula Petcu and Ludvig Aldrin at Microsoft Campus Days, #cddk12, 31 October 2012, in Copenhagen Denmark.

Learn how easy it is to build powerful search experiences using SharePoint 2013.
The presentation will showcase Search in SharePoint 2013 and provide a technical and functional walkthrough of what is new. The presentation will take you through the out-of-the-box search experience, and you will get tips and tricks on how to extend the search platform to create a great custom experience for your users. Also discussed are the new search architecture and the central role search plays in the new SharePoint 2013.

The presentation is divided into three parts. The first part will include an overview of search and will walk you through the out-of-the-box search experience, showcasing the new or improved functionalities and discussing how this affects the search experience. This part is all about finding what the users are looking for and getting answers to their questions. The new product revolves around the user more than ever, and you will be able to see this in the new search experience.

Then comes information about the new search architecture, making the transition to the second part of the presentation, which is all about extending. This part also touches on how queries are executed under the new architecture and, more specifically, on how to extend the way they are executed.

Prior to SharePoint 2013, the only way to inspect and manipulate managed property values for items before they were added to the search index was by extending the item processing pipeline in FAST Search for SharePoint. Customers using the built-in SharePoint search were out of luck, as the functionality was not available to them. Now, Microsoft has introduced three new extension points for content processing and enrichment: parsers, custom entity extractors, and web service callouts. These new features will be presented, and one of them will be demoed.
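To make the web service callout idea a bit more concrete, here is a small, purely illustrative sketch of what such an enrichment step does conceptually: it receives selected managed properties for an item during content processing and hands back new or modified values before the item reaches the index. The actual SharePoint 2013 extension point is a WCF web service implementing Microsoft's content enrichment contract, so the Python function and property names below are hypothetical stand-ins, not the real interface.

```python
# Conceptual sketch only: the real callout is a WCF web service, not Python.
# Input: a dict of managed properties for one item being processed.
# Output: new or updated managed properties to merge in before indexing.

def enrich_item(managed_properties):
    """Return additional or updated managed properties for one item."""
    enriched = {}

    # Normalise an existing property (hypothetical example).
    title = managed_properties.get("Title", "")
    enriched["Title"] = title.strip()

    # Derive a new property from the item body, a toy stand-in
    # for what a custom entity extractor might contribute.
    body = managed_properties.get("Body", "").lower()
    if "invoice" in body:
        enriched["ContentCategory"] = "Finance"

    return enriched


if __name__ == "__main__":
    item = {"Title": "  Quarterly invoice summary ",
            "Body": "Invoice totals for Q3..."}
    print(enrich_item(item))
```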

But what happens once your search solution is up and running? The third part of the presentation will be about the governance of your search solution. More specifically, it will focus on search analytics.

Search in SharePoint 2013

There has been a lot of buzz about the upcoming release of Microsoft’s SharePoint 2013, but what about search in SharePoint 2013? The SharePoint Server 2013 Preview has been available for download since July this year, and a few days ago the new SharePoint reached Release to Manufacturing (RTM), with general availability expected in the first quarter of 2013.

If you currently have an implementation of SharePoint in your company, you are probably wondering what the new SharePoint can add to your business. Microsoft’s catchphrase for the new SharePoint is that “SharePoint 2013 is the new way to work together”. If you look at it from a tech perspective, amongst other features, SharePoint 2013 introduces a cloud app model and marketplace, a redesign of the user experience, an expansion of collaboration tools with social features (such as microblogging and activity feeds), and enhanced search functionality. There are also some features that have been deprecated or removed in the new product, and you can check these on TechNet.

Let’s move on now to the new search experience provided out-of-the-box by SharePoint 2013. The new product revolves around the user more than ever, and that can be seen in search as well. Here are just a few of the new or improved functionalities. A hover panel to the right of a search result allows users to quickly inspect content; for example, it allows users to preview a document and take actions based on document type. Users can find and navigate to past search results from the query suggestions box, and previously clicked results are promoted in the results ranking. The refiners panel now reflects more accurately the entities in your content (deep refiners), and visual refiners are available out-of-the-box. Social recommendations are powered by users’ search patterns, and video and audio have been introduced as new content types. Some of the developers reading this post will also be happy to hear that SharePoint 2013 natively supports PDF files, meaning that you are no longer required to install a third-party iFilter to be able to index PDF files!

Search Overview in SharePoint 2013

Search results page in SharePoint 2013 – from the Microsoft Office blog

While the out-of-the-box SharePoint 2013 search experience sounds exciting, you may also be wondering how many customization and extensibility opportunities you have. You can of course search content outside SharePoint: several connectors are included that let you pull in content from repositories such as file shares, the web, Documentum, Lotus Notes and public Exchange folders. Without any code, you can use query rules to combine user searches with business rules. Also, you can associate result types with custom templates to enrich the user experience. Developers can now extend content processing and enrichment, which previously could only be achieved using FAST Search for SharePoint. More than that, organizations have the ability to extend the search experience through a RESTful API.
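To give a flavour of that API, here is a minimal sketch of issuing a query against the SharePoint 2013 Search REST endpoint /_api/search/query, written in Python with the requests library. The endpoint and the querytext parameter are part of the documented API, but the site URL, credentials and authentication scheme below are placeholders that depend on your environment, and the response parsing assumes the verbose OData JSON format.

```python
# Minimal sketch: query the SharePoint 2013 Search REST API.
# Replace site_url and the auth mechanism with values for your own farm
# (e.g. requests_ntlm.HttpNtlmAuth for an on-premises NTLM setup).
import requests

site_url = "https://intranet.example.com"   # placeholder
endpoint = site_url + "/_api/search/query"

params = {"querytext": "'sharepoint 2013 search'"}   # single quotes are part of the value
headers = {"Accept": "application/json;odata=verbose"}

response = requests.get(endpoint, params=params, headers=headers, auth=None)
response.raise_for_status()

# With the verbose OData format, the result rows sit under
# d.query.PrimaryQueryResult.RelevantResults.Table.Rows.results.
rows = (response.json()["d"]["query"]["PrimaryQueryResult"]
        ["RelevantResults"]["Table"]["Rows"]["results"])
for row in rows:
    cells = {cell["Key"]: cell["Value"] for cell in row["Cells"]["results"]}
    print(cells.get("Title"), "-", cells.get("Path"))
```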

This post does not cover all the new functionality, and if you would like to read more about the changes the new SharePoint release brings, you can start by checking the TechNet material and following the SharePoint Team Blog and the Findwise Findability Blog, and then get in touch with us if you are considering implementing SharePoint 2013 in your organization or company.

Findwise will attend the SharePoint Conference 2012 in Las Vegas, USA, on 12-15 November, which will be a great opportunity to learn more about the upcoming SharePoint. We will report from the conference from a findability and enterprise search perspective. Findwise has years of experience in working with FAST ESP and SharePoint, and is looking forward to discussing how SharePoint 2013 can help you in your future enterprise search implementation.

A look at European Conference on Information Retrieval (ECIR) 2012

European Conference on Information Retrieval

The 34th European Conference on Information Retrieval was held 1-5 April 2012, in the lovely but crowded city of Barcelona, Spain. The core conference attracted over 100 attendees, with a total of 35 accepted full papers, 28 posters, and 7 demos being presented. As opposed to the previous year, which had 2 parallel sessions, this year’s conference ran a single session at a time. The accepted papers covered a diverse range of topics, and were divided into query representation, blog and online-community search, semi-structured retrieval, applications, evaluation, retrieval models, classification, categorisation and clustering, image and video retrieval, and systems efficiency.

The best paper award went to Guido Zuccon, Leif Azzopardi, Dell Zhang and Jun Wang for their work entitled “Top-k Retrieval using Facility Location Analysis” and presented by Leif Azzopardi during the retrieval models session. The authors propose using facility location analysis taken from the discipline of operations research to address the top-k retrieval problem of finding “the optimal set of k documents from a number of relevant documents given the user’s query”.

Meanwhile, “Predicting IMDB Movie Ratings using Social Media” by Andrei Oghina, Mathias Breuss, Manos Tsagkias and Maarten de Rijke won the best poster award. With a different goal from the best paper, the authors of the poster experiment with a prediction model for rating movies using a set of qualitative and quantitative features extracted from the streams of two social media channels, YouTube and Twitter. Their findings show that the highest predictive performance is obtained by combining features from both channels, and they propose including other social media channels as future work.

Workshop Days

The conference was preceded by a full day of workshops and tutorials running in parallel. I attended two workshops: Information Retrieval Over Query Sessions (SIR) during the morning and Task-Based and Aggregated Search (TBAS) in the afternoon. The second workshop ended with an interactive discussion. A third, full-day workshop was Searching 4 Fun!.

Industry Day

The last day was the Industry Day. Only 2 papers here, plus 5 oral contributions, and around 50 attendees. A strong focus of the talks given at the industry day was on opinion mining: four of the six participating companies/institutions presented work on sentiment analysis and opinion mining from social media streams. Jussi Karlgren, from Gavagai, argued that sentiment analysis from social media can be used by companies, for example, to find reviews or comments made about their products or services, analyse their market position, and predict price movements. Rianne Kaptein, from Oxyme, backed this up by adding that businesses are interested in what consumers say about their brand, products or campaigns on social media streams. Furthermore, Hugo Zaragoza, from Websays, identified two basic needs inside a company: a need for help in reading, so that someone can act, and a need for help in explaining, so that someone can convince. A very interesting topic indeed, and research in this direction will advance as companies become more aware of the business gains from opinion mining of social media.

Overall, ECIR 2012 was a very inspiring conference. It also seemed a very friendly conference, offering many opportunities to network with fellow attendees. Despite that, several participants said that the number of attendees at this year’s conference had decreased in comparison with previous years. The workshops and the core conference gave me the impression that the conference has a strong focus on young researchers, as many of the accepted contributions had a student as first author and presenter at the conference. The fact that there was only one session running at a time was a good decision in my opinion, as the attendees were not forced to miss presentations. Nevertheless, the workshops and tutorials were running in parallel, and although the proceedings of the workshops will be made freely available, I still feel that I missed something that day. The industry day was very exciting, offering the opportunity to share ideas between academia and industry. However, there were not so many presentations, and the topics were not as diverse. I propose that next year Findwise will be among the speakers at the Industry track!

ECIR 2013 will be held in Moscow, Russia, on 24-28 March. See you there!

Searching for Zebras: Doing More with Less

There is a very controversial and highly cited 2006 British Medical Journal (BMJ) article called “Googling for a diagnosis – use of Google as a diagnostic aid: internet based study” which concludes that, for difficult medical diagnostic cases, it is often useful to use Google Search as a tool for finding a diagnosis. Difficult medical cases are often represented by rare diseases, which are diseases with a very low prevalence.

The authors use 26 diagnostic cases published in the New England Journal of Medicine (NEJM) to compile a short list of symptoms describing each patient case, and use those keywords as queries for Google. The authors, blinded to the correct disease (a rare disease in 85% of the cases), select the most ‘prominent’ diagnosis that fits each case. In 58% of the cases they succeed in finding the correct diagnosis.

Several other articles also point to Google as a tool often used by clinicians when searching for medical diagnoses.

But is that really convenient, is it enough, or can this process easily be improved? Google certainly has two major advantages: clinicians’ familiarity with it, and its fresh and extensive index. But how would a vertical search engine with focused and curated content compare to Google when given the task of finding the correct diagnosis for a difficult case?

Well, take an open-source search engine such as Indri, index around 30,000 freely available medical articles describing rare or genetic diseases, use an off-the-shelf retrieval model, and there you have Zebra. In medicine, the term “zebra” is slang for a surprising diagnosis. Unlike a search on Google, which often returns results pointing to unverified content from blogs or content aggregators, this vertical search engine crawls its documents from 10 web resources that contain only rare and genetic disease articles and are mostly maintained by medical professionals or patient organizations.
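As a rough illustration of how such a setup can be queried, the snippet below runs a symptom query against a local Indri index using the standard IndriRunQuery command-line tool. This is a sketch under the assumption of a plain Indri installation; the index path, query terms and output handling are placeholders rather than Zebra's actual configuration.

```python
# Sketch: query a local Indri index of rare/genetic disease articles
# with IndriRunQuery. Index path and symptoms are placeholders.
import subprocess
import tempfile

INDEX_PATH = "/data/zebra-index"                 # placeholder index location
symptoms = "fever rash joint pain anemia"        # keywords compiled from a patient case

param_xml = f"""<parameters>
  <index>{INDEX_PATH}</index>
  <count>20</count>
  <trecFormat>true</trecFormat>
  <query>
    <number>1</number>
    <text>#combine({symptoms})</text>
  </query>
</parameters>"""

with tempfile.NamedTemporaryFile("w", suffix=".xml", delete=False) as f:
    f.write(param_xml)
    param_file = f.name

# With trecFormat enabled, each output line reads: qid Q0 docno rank score runid
result = subprocess.run(["IndriRunQuery", param_file],
                        capture_output=True, text=True, check=True)
print(result.stdout)
```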

Evaluating on a set of 56 queries extracted in a similar manner to the one described above, Zebra easily beats Google. Zebra finds the correct diagnosis in the top 20 results in 68% of the cases, while Google succeeds in 32% of them. And this is only the performance of Zebra with the baseline relevance model; imagine how much more could be done (for example, displaying results as a network of diseases, clustering or even ranking by diseases, or automatic extraction and translation of electronic health record data).
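For clarity, the comparison boils down to a "success at 20" measure: the fraction of queries for which the correct diagnosis appears among the top 20 results. Here is a tiny illustrative sketch of that computation, with made-up result lists standing in for the study's data.

```python
# Illustrative only: computing "success at 20" over a set of diagnostic queries.
# The result lists below are invented examples, not the evaluation data.

def success_at_k(ranked_results, correct_diagnosis, k=20):
    """True if the correct diagnosis appears among the top-k results."""
    return correct_diagnosis in ranked_results[:k]

def success_rate(runs, k=20):
    """Fraction of queries answered correctly within the top k results."""
    hits = sum(success_at_k(results, answer, k) for results, answer in runs)
    return hits / len(runs)

# Toy example: (ranked diagnoses returned, correct diagnosis) per query.
runs = [
    (["Fabry disease", "Lupus", "Gaucher disease"], "Fabry disease"),
    (["Migraine", "Tension headache"], "CADASIL"),
]
print(f"success@20 = {success_rate(runs):.0%}")   # prints 'success@20 = 50%'
```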