Continuous crawl in SharePoint 2013

Continuous crawl is one of the new features in SharePoint 2013. As an alternative to incremental crawl, it promises to improve the freshness of the search results, that is, to shorten the time between when an item is updated in SharePoint by a user and when it becomes available in search.

Understanding how this new functionality works is especially important for SharePoint implementations where content changes often and/or where content needs to be instantly searchable. Moreover, since many of the new SharePoint 2013 functionalities depend on search (see the social features, the popular items, or the content by search web parts), understanding continuous crawl and planning accordingly can help align user expectations with the technical capabilities of the search engine.

Both the incremental crawl and the continuous crawl look for items that were added, changed or deleted since the last successful crawl, and update the index accordingly. However, the continuous crawl overcomes a limitation of the incremental crawl: multiple continuous crawls can run at the same time, whereas a new incremental crawl will start only after the previous incremental crawl has finished.

Limitation to content sources

Content not stored in SharePoint will not benefit from this new feature. Continuous crawls apply only to SharePoint sites, which means that if you are planning to index other content sources (such as File Shares or Exchange folders) your options are restricted to incremental and full crawls.

Example scenario

The image below shows two situations. On the left (Scenario 1), incremental crawls are scheduled to start every 15 minutes. On the right (Scenario 2), continuous crawls are scheduled every 15 minutes. Around 7 minutes after the first crawl starts, a user updates a document. Let's also assume that passing through all the items to check for updates takes 44 minutes.

Incremental vs continuous crawl in SharePoint 2013

In Scenario 1, although incremental crawls are scheduled every 15 minutes, a new incremental crawl cannot start while another incremental crawl is running. The next incremental crawl will only start after the current one has finished. This means 44 minutes for the first incremental crawl to finish in this scenario, after which the next incremental crawl kicks in, finds the updated document and sends it to the search index. In other words, it could take around 45 minutes from the time the document was updated until it is available in search.

In Scenario 2, a new continuous crawl starts every 15 minutes, as multiple continuous crawls can run in parallel. The second continuous crawl will see the updated document and send it to the search index. By using continuous crawl in this case, we have reduced the time it takes for a document to become available in search from around 45 minutes to 15 minutes.

Not enabled by default

Continuous crawls are not enabled by default. They are enabled from the same place as incremental crawls: in Central Administration, under the Search Service Application, per content source. The interval at which a new continuous crawl starts defaults to 15 minutes, but it can be lowered through PowerShell to a minimum of 1 minute if required. Lowering the interval will, however, increase the load on the server. Another number to take into consideration is the maximum number of simultaneous requests, which is also configured from Central Administration.
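A minimal PowerShell sketch of both settings, assuming the out-of-the-box "Local SharePoint sites" content source (the name may differ in your farm):

```powershell
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

# Enable continuous crawls for a SharePoint content source
$ssa = Get-SPEnterpriseSearchServiceApplication
$cs  = Get-SPEnterpriseSearchCrawlContentSource -SearchApplication $ssa `
          -Identity "Local SharePoint sites"
$cs.EnableContinuousCrawls = $true
$cs.Update()

# Lower the interval from the default 15 minutes to 5 (minimum is 1).
# Note: this is a farm-wide property on the Search Service Application.
$ssa.SetProperty("ContinuousCrawlInterval", 5)
```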

Continuous crawl in Office 365

Unlike SharePoint Server 2013, SharePoint Online has continuous crawls enabled by default, but they are managed by Microsoft. For those used to Central Administration in the on-premises SharePoint server, it might be surprising that it is not available in SharePoint Online. Instead, there is a limited set of administrative features. Most of the search features can be managed from this administrative interface, though the ability to manage crawling of content sources is missing.

Continuous crawl in Office 365 is limited in terms of control and configuration. The crawl frequency cannot be modified, but Microsoft targets between 15 minutes and one hour between a change and its availability in the search results, though in some cases it can take hours.

Closer to real-time indexing

The continuous crawl in SharePoint 2013 overcomes previous limitations of the incremental crawl by closing the gap between the time a document is updated and the time it becomes visible in the search index.

A different concept in this area is event-driven indexing, which we will explain in our next blog post. Stay tuned!

Why search and Findability are critical for the customer experience and NPS on websites

To achieve a high Net Promoter Score (NPS), the customer experience (CX) is crucial, and a critical factor behind a positive customer experience is the ease of doing business. For companies that interact with their customers through the web (which ought to be almost every company these days), this implies a need for good Findability and search on the website, so that visitors can find what they are looking for without effort.

The concept of NPS was created by Fred Reichheld and his colleagues at Bain & Company, who increasingly recognized that measuring customer satisfaction on its own wasn't enough to draw conclusions about customer loyalty. After some research together with Satmetrix, they came up with a single question that they deemed to be the only relevant one for predicting business success: "How likely are you to recommend company X to a friend or colleague?" Depending on the answer to that single question, on a scale of 0 to 10, the respondent is considered one of the following:

The Net Promoter Score model

Respondents who answer 9 or 10 are Promoters, 7 or 8 Passives, and 0 to 6 Detractors. The idea is that Promoters, the loyal, enthusiastic customers who love doing business with you, are worth far more to your company than Passives or Detractors. To obtain the actual NPS, the percentage of Detractors is deducted from the percentage of Promoters.
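The arithmetic is simple enough to show in a few lines (a minimal sketch; the sample responses are made up for illustration):

```powershell
# NPS = %Promoters - %Detractors, where a 9-10 answer is a Promoter,
# 7-8 a Passive and 0-6 a Detractor.
$responses = 9, 10, 8, 6, 10, 7, 3, 9, 10, 5

$promoters  = @($responses | Where-Object { $_ -ge 9 }).Count   # 5
$detractors = @($responses | Where-Object { $_ -le 6 }).Count   # 3
$total      = $responses.Count                                  # 10

$nps = [math]::Round(($promoters - $detractors) / $total * 100)
Write-Output "NPS: $nps"   # (5 - 3) / 10 * 100 = 20
```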

How the customer experience drives NPS

Several studies indicate four main drivers behind NPS:

  • Brand relationship
  • Experience of / satisfaction with product offerings (features; relevance; pricing)
  • Ease of doing business (simplicity; efficiency; reliability)
  • Touch point experience (the degree of warmth and understanding conveyed by front-line employees)

According to 'voice of the customer' research conducted by the British customer experience consultancy Cape Consulting, ease of doing business and the touch point experience account for 60% of the Net Promoter Score, with some variation between industry sectors. Both factors are directly correlated with how easily customers can find what they are looking for on the web and how easily front-line employees can find the right information to help and guide the customer.

Successful companies devote much attention to the user experience on their website, but when trying to figure out how most visitors will behave, website owners tend to overlook the search function. Visitors who are unfamiliar with the design then struggle to find the product or information they are looking for, causing unnecessary frustration, and quite possibly the customer or potential customer runs out of patience with the company.

Ideally, Findability on a company website or e-commerce site is a state where desired content is displayed immediately, without any effort at all. Product recommendations based on the behavior of previous visitors are one example, but they have limitations and require a large data set to be accurate. When a visitor has a very specific query, a long-tail search, accuracy becomes even more important, because there is no such thing as a close-enough answer. Imagine a visitor to a logistics company website looking for information about delivery times from one city to another, an e-commerce site where the visitor has found the right product but wants to know the company's return policy before making a purchase, or a visitor to a hospital's website looking for the contact details of a specific department. In situations like these there is only one correct answer, and failure to deliver that answer in a simple and reliable manner will hurt the customer experience and probably create a frustrated visitor who leaves the site and looks at the competition instead.

Investing in search has a positive impact on NPS and the bottom line

Google has taught people how to search and what to expect from a search function. Step one is to create a user-friendly search function on your website, but you must then actively maintain the master data, business rules, relevance models and zero-result hits to keep the customer experience aligned. Also, take a look at the keywords and phrases your visitors use when searching. This is useful business intelligence about your customers, and it can also indicate what type of information you should highlight on your website. Achieving good Findability on your website requires more than just the right technology and modern website design. It is an ongoing process that, successfully managed, can have a huge impact on the customer experience and your NPS, which means your investment in search will generate positive results on your bottom line.

More posts on this topic will follow.

/Olof Belfrage

SLTC 2012 in retrospect – two cutting-edge components

The 4th Swedish Language Technology Conference (SLTC) was held in Lund on 24-26 October 2012.
It is a biennial event organized by prominent research centres in Sweden.
The conference is therefore an excellent venue to exchange ideas with Swedish researchers in the field of Natural Language Processing (NLP), present our own research and stay up to date with the state of the art in most areas of Text Analytics (TA).

This year Findwise participated in two tracks – in a workshop and in the main conference.
As the area of Search Analytics (SA) is very important to us, we decided to be proactive and sent an application to organize a workshop on the topic of "Exploratory Query Log Analysis" in connection with the main conference. The application was granted and the workshop was very successful. It gathered researchers who work in the area of SA from very different perspectives, from utilizing deep Machine Learning to discover users' intent, to looking at query logs as a totally new genre. I will do a follow-up on that in another post. All the contributions to the workshop will also be uploaded to our research page.

As for the main conference, we had two papers accepted for presentation. The first one dealt with the topic of document summarization, both single- and multi-document summarization
(http://www.slideshare.net/findwise/extractive-document-summarization-an-unsupervised-approach).
The second paper was about detecting Named Entities in Swedish
(http://www.slideshare.net/findwise/identification-of-entities-in-swedish).

These two papers presented de facto state-of-the-art results for Swedish, both in document summarization and in Named Entity Recognition (NER). For the former task there is neither a standard corpus for evaluating summarization systems nor many previous results, and only a few other systems, which made it unfeasible to compare our own system against others. Thus, we have contributed two things to research in document summarization: a Swedish corpus based on featured Wikipedia articles to be used for evaluation, and a system based on unsupervised Machine Learning which, by relying on domain boosting, achieves state-of-the-art results for English and Swedish. Our system can be further improved by relying on our enhanced NER and Coreference resolution modules.

As for the NER paper, our entity recognition system for Swedish achieves 74.0% F-score, which is 4% higher than another study presented simultaneously at SLTC (http://www.ling.su.se/english/nlp/tools/stagger). Both systems were evaluated on the same corpus, which is considered a de facto standard for evaluating NLP resources for Swedish. The unlabelled score (i.e. no fine-grained division of classes, just entity vs non-entity) of our system is 91.3% F-score (93.1% Precision and 89.6% Recall). When identifying people, the Findwise NER system achieves 78.1% Precision and 90.5% Recall (83.9% F-score).
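For readers unfamiliar with the metric, the F-score quoted above is the harmonic mean of precision and recall, which is easy to verify (a minimal sketch):

```powershell
# F1 = harmonic mean of precision and recall
function Get-F1Score([double]$Precision, [double]$Recall) {
    [math]::Round(2 * $Precision * $Recall / ($Precision + $Recall), 1)
}

Get-F1Score -Precision 93.1 -Recall 89.6   # 91.3 (unlabelled score)
Get-F1Score -Precision 78.1 -Recall 90.5   # 83.8 (~83.9 from unrounded figures)
```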

So, what did we take home from the conference? We were really happy to see that the tools we develop for our customers are not mediocre but of very high quality, representing the state of the art in Swedish NLP. We actively share our results and our corpora for research purposes. Findwise showed keen interest in cooperating with other researchers to develop better tools and systems in the area of NLP and Text Analytics. And this, I think, is a huge bonus for all our current and prospective customers: we actively follow current trends in the research community and cooperate with researchers, and our products incorporate the latest findings in the field, giving us both high quality and cutting-edge technology.

As we continuously improve our products, we have also released a Polish NER, and work has been initiated on Danish and Norwegian ones. More NLP components will soon be available for demo and testing on our research page.

Impressions of GSA 7.0

Google released Google Search Appliance (GSA) 7.0 in early October. Magnus Ebbesson and I joined the Google-hosted pre-sales conference in Zürich, where some of the new functionality was presented along with what the future will bring to the platform. Google is really putting an effort into their platform, and it gets stronger with each release. Personally I tend to like the hardware and security updates the most, but I have to say that some of the new features are impressive and have great potential. I have had the opportunity to try them out for a while now.

In late November we held a breakfast seminar at the office in Gothenburg where we talked about GSA in general, with a focus on GSA 7.0 and the new features. My impression is that the translate functionality is very attractive for larger enterprises, while the previews bring a big wow factor in general. The possibility of configuring ACLs for several domains is great too; many larger enterprises tend to have several domains. The entity extraction is of course interesting and can be very useful; a processing framework would, however, enhance this even further.

It is also nice to see that Google is improving the hardware. The robustness is a really strong argument for selecting GSA.

It's impressive to see how many languages the GSA can handle and how quickly it performs the translation. The user is required to have basic knowledge of the foreign language, since the query is not translated. However, it is reasonably common to have a corporate language which most of the employees know.

The preview functionality is a very welcome feature. The fact that it can highlight pages within a document is really nice. I have played around with using it through our Jellyfish API, with some success. Below are two examples of the preview functionality in use.

GSA 7.0 Preview

GSA 7 Preview - Details

A few thoughts

At the conference we attended in Zürich, Google mentioned that they are aiming to improve the built-in template in the GSA. The standard template is nice, and makes it possible to set up a decent graphical interface at almost no cost.

My experience, however, is that companies want the front end integrated with their own systems. Also, we tend to use search for more purposes than the standard usage. Search-driven intranets, where you build intranet sites based on search results, are an example where search is used in a different manner.

A concept that we have introduced at Findwise is search as a service. It means that the search engine is a stand-alone product with APIs that make it easy to send data to it and extract data from it. We have created our own APIs around the GSA to make this possible. An easy way to extract data based on filtering is essential.

What I would like to see in the GSA is easier integration for performing search, such as a REST or SOAP service that makes it simple to create search clients. This would make it easier to integrate functionality, such as security, externally: basically you tell the client who the current user is and the client handles the rest. It would also improve maintainability, in the sense that new and changed functionality would not require a new implementation of how to parse the XML response.
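To illustrate the kind of client code such a service could enable, here is a sketch; note that this endpoint does not exist in the GSA, and the URL, parameters and response shape are all hypothetical:

```powershell
# Hypothetical REST search client -- endpoint and parameters are assumptions,
# not an actual GSA API.
$params = @{
    q    = "annual report"
    user = "jdoe"     # the service would handle security trimming for this user
    max  = 10
}
$response = Invoke-RestMethod -Uri "https://search.example.com/api/search" `
                              -Method Get -Body $params

# With a structured response, clients never parse raw XML themselves
$response.results | ForEach-Object { Write-Output $_.title }
```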

I would also like to see a bigger focus on documentation of how to use functionality such as previews and translation externally.

Final words

My feeling is that the GSA is getting stronger, and I like the new features in GSA 7.0. Google has made clear that they are continuously working to improve their product, and I am looking forward to future releases. I hope the GSA will take a step closer to the search-as-a-service concept, and the addition of a processing framework would enhance it even further. The future will tell.

Enterprise Search in Practice: A Presentation of Survey Results and Areas for Expert Guidance

The Enterprise Search in Practice presentation has two main focuses. First, to present some interesting and sometimes rather contradictory findings from the Enterprise Search and Findability survey 2012. Second, to introduce a holistic approach to implementing search technology involving five different aspects that are all important to succeed and to reach findability rather than just the ability to search.

Presented at Gilbane Conference 2012 in Boston USA on the 28th of November by Mattias Ellison.

Search in SharePoint 2013

There has been a lot of buzz about the upcoming release of Microsoft's SharePoint 2013, but what about search in SharePoint 2013? The SharePoint Server 2013 Preview has been available for download since July this year, and a few days ago the new SharePoint reached Release to Manufacturing (RTM), with general availability expected in the first quarter of 2013.

If you currently have an implementation of SharePoint in your company, you are probably wondering what the new SharePoint can add to your business. Microsoft’s catchphrase for the new SharePoint is that “SharePoint 2013 is the new way to work together”. If you look at it from a tech perspective, amongst other features, SharePoint 2013 introduces a cloud app model and marketplace, a redesign of the user experience, an expansion of collaboration tools with social features (such as microblogging and activity feeds), and enhanced search functionality. There are also some features that have been deprecated or removed in the new product, and you can check these on TechNet.

Let's skip now to the new search experience provided out of the box by SharePoint 2013. The new product revolves around the user more than ever, and that can be seen in search as well. Here are just a few of the new or improved functionalities. A hover panel to the right of a search result allows users to quickly inspect content; for example, it allows users to preview a document and take actions based on document type. Users can find and navigate to past search results from the query suggestions box, and previously clicked results are promoted in the results ranking. The refiners panel now reflects the entities in your content more accurately (deep refiners), and visual refiners are available out of the box. Social recommendations are powered by users' search patterns, and video and audio have been introduced as new content types. Some of the developers reading this post will also be happy to hear that SharePoint 2013 natively supports PDF files, meaning you are no longer required to install a third-party iFilter to index PDF files!

Search results page in SharePoint 2013 – from the Microsoft Office blog

While the out-of-the-box SharePoint 2013 search experience sounds exciting, you may also be wondering what customization and extensibility opportunities you have. You can of course search content outside SharePoint, and several connectors that let you get content from repositories such as file shares, the web, Documentum, Lotus Notes and public Exchange folders are included. Without any code, you can use query rules to combine user searches with business rules, and you can associate result types with custom templates to enrich the user experience. Developers can now extend content processing and enrichment, which previously could only be achieved with FAST Search for SharePoint. More than that, organizations can extend the search experience through a RESTful API.
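As a small taste of that REST API, a search query is a simple GET request against the /_api/search/query endpoint (a minimal sketch; the site URL is a placeholder):

```powershell
# Query the SharePoint 2013 Search REST API and list result titles.
$site  = "https://sharepoint.example.com"
$query = "findability"

$result = Invoke-RestMethod -UseDefaultCredentials `
    -Uri "$site/_api/search/query?querytext='$query'" `
    -Headers @{ Accept = "application/json;odata=verbose" }

$result.d.query.PrimaryQueryResult.RelevantResults.Table.Rows.results |
    ForEach-Object {
        ($_.Cells.results | Where-Object { $_.Key -eq "Title" }).Value
    }
```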

This post does not cover all the functionality. If you would like to read more about the changes the new SharePoint release brings, start by checking the TechNet material and following the SharePoint Team Blog and the Findwise Findability Blog, and then get in touch with us if you are considering implementing SharePoint 2013 in your organization or company.

Findwise will attend the SharePoint Conference 2012 in Las Vegas, USA, on 12-15 November, which will be a great opportunity to learn more about the upcoming SharePoint. We will report from the conference from a findability and enterprise search perspective. Findwise has years of experience working with FAST ESP and SharePoint, and is looking forward to discussing how SharePoint 2013 can help you in your future enterprise search implementation.

Presentation: The Why and How of Findability

"The Why and How of Findability" was presented by Kristian Norling at the ScanJour Kundeseminar in Copenhagen, 6 September 2012. We can make information findable with good metadata. The metadata makes it possible to create browsable, structured and highly findable information. We can make findability (and enterprise search) better by looking at findability in five different dimensions.

Five dimensions of Findability

1. BUSINESS – Build solutions to support your business processes and goals

2. INFORMATION – Prepare information to make it findable

3. USERS – Build usable solutions based on user needs

4. ORGANISATION – Govern and improve your solution over time

5. SEARCH TECHNOLOGY – Build solutions based on state-of-the-art search technology

Enterprise Search and Findability discussions at World Cafe in Oslo

Yesterday we (Kristian Hjelseth and Kristian Norling) participated in a great World Cafe event arranged by Steria in Norway. We gave a Pecha Kucha-inspired presentation (scroll down to the bottom of this blog post for the slides) to introduce the subject of Enterprise Search and Findability and how to work more efficiently with the help of enterprise search. Afterwards there was a set of three round-table workshops with practitioners, where search-related issues were discussed. We found the discussions very interesting, so we thought we should share some of the topics with a broader audience.

The attendees had answered a survey before coming to the World Cafe, in which 83.3% stated that finding the right information was critical to their business goals. But only 20.3% were satisfied with their current search solution, and 75% said it was hard or very hard to find the right information. More stats are available from a global survey on enterprise search that asked the same questions.

Unified Search

Having all the information you would like to find available in one search was deemed very important for findability by the participants. The experience is that users don't know what to search for and, to make it even worse, they do not know where to look for the information! This is also confirmed by the Enterprise Search and Findability Survey conducted earlier this year. The report is available for download.

Trust

Google web search always comes up as an example of something that "just works". And it does work, because Google found a clever algorithm, PageRank, that essentially measures the trustworthiness of information. Since PageRank depends heavily on inbound links, this way of measuring trust is, in our experience, probably not going to work on an intranet, where cross-referencing is not as common. Most of the time it is not even possible to link to content on the intranet, since the information is not accessible over HTTP. Read more about it in this great in-depth article series on the difference between web search and enterprise search by Mark Bennet.

So how can we make search inside the firewall as good as web search? I think by connecting the information to its author. Trust builds between people based on their views of others. Simply put, someone has authority over her peers either through rank (the organisation chart) or through trust. Trustworthiness can be based on a person's ability to connect to other people (we all probably know someone who knows "everyone"), or we trust someone based on that person's knowledge. More reading on the importance of trust in organisations. How to do this in practice? Some ideas in this post by Bill Ives. Also a good read: "How social is Enterprise Search?" by Jed Cawthorne. And finally another good post to read.

Metadata

By adding relevant metadata to information, we can make it more findable. There were discussions on the importance of strict and controlled metadata and on how to handle user tagging. For an idea of how to think about metadata, read Kristian Norling's blog post on how VGR used metadata.

Search Analytics

Before you start any major work on your current enterprise search solution, look at the search log files and analyze the data. You might be surprised by what you find. Search analytics is great if you want insight into what users expect to find when they search. Watch this video for an introduction to Search Analytics in Practice.
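Even a few lines of scripting go a long way as a starting point; a minimal sketch, assuming the logs have been exported to a CSV file with query and hits columns (the file name and format are assumptions):

```powershell
$log = Import-Csv "searchlog.csv"

# Top 10 most frequent queries -- what users expect to find
$log | Group-Object query | Sort-Object Count -Descending |
    Select-Object Count, Name -First 10

# Queries returning zero results -- candidates for synonyms, best bets
# or missing content
$log | Where-Object { [int]$_.hits -eq 0 } |
    Group-Object query | Sort-Object Count -Descending |
    Select-Object Count, Name -First 10
```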

Other subjects

  • Access control and transparency
  • Who owns search?
  • Who owns the information?
  • Personalization of search results
All these subjects and many more were discussed at the workshops, but that will have to wait for another blog post!
As always, your thoughts and comments are most welcome!

Findability day in Stockholm – search trends and customer insights

Last Thursday about 50 Findwise customers, friends and people from the industry gathered in Stockholm for a Findability day (#findday12). The purpose was simply to share experiences from choosing, implementing and developing search and findability solutions for all types of businesses and use cases.

Martin White, who has been in the intranet business since 1996, held the keynote speech about “Why business success depends on search”.
Among other things he spoke about why the work starts once search is implemented, how a search team should be staffed and what the top priority areas are for larger companies.
Martin has also published an article about Enterprise Search Team Management that gives valuable insight into how to staff a search initiative. His latest research note covers enterprise search trends and developments.

Henrik Sunnefeldt, SKF, and Joakim Hallin, SEB, were next on stage and shared their experiences from working with larger search implementations.
Henrik, who is program manager for search at SKF, showed several examples of how search can be applied within an enterprise (intranet, internet, apps, Search-as-a-Service etc) to deliver value to both employees and customers.
As for SEB, Joakim described how SEB has worked actively with search for the past two years. The most popular and successful implementation is a Global People Search. The presentation showed how SEB has changed its way of working: from country-specific phone books to a single interface that also contains skills, biographies, tags and more.

During the day we also had the opportunity to listen to three expert presentations: on Big Data (by Daniel Ling and Magnus Ebbesson), on Hydra, a content processing framework, with video and presentation (by Joel Westberg), and on Better Business, Protection & Revenue (by David Kemp from Autonomy).
As for Big data, there is also a good introduction here on the Findability blog.

Niklas Olsson and Patric Jansson from KTH came on stage at 15:30 and described how they have been running their swift-footed search project during the last year. There were some great learnings from working with requirements early on and putting effort into data quality.

Last, but not least, the day ended with Kristian Norling from Findwise, who gave a presentation on the results from the Enterprise Search and Findability Survey. 170 respondents from all over the world filled out the survey during the spring of 2012, and it showed quite a few interesting patterns.
Did you, for example, know that in many organisations search is owned either by IT (58%) or Communication (29%), that 45% have no specified budget for search, and that 48% of the participants have less than one dedicated person working with search? Furthermore, 44.4% have a search strategy in place or are planning to have one in 2012/13.
The survey results are also discussed in one of the latest UX podcasts by James Royal-Lawson and Per Axbom.

Thank you to all presenters and participants who contributed to making Findability day 2012 inspiring!

We are currently looking into arranging Findability days in Copenhagen in September, Oslo in October and Stockholm early next spring. If you have ideas (speakers you would like to hear, case studies you would like insight into, etc.), please let us know.