SharePoint 2013 entity extraction – part 2

In previous blog post we have described the built-in way for entity extraction in Sharepoint. It was pointed out that it’s good as long as you are able to create a full dictionary of all entities you want to extract. It’s not always possible. An alternative to the dictionary-based entity extraction is the statistical approach where we train a model for the purpose of recognizing entity names.

Sharepoint Content Enrichment

The content enrichment web service callout offers the possibility of processing the crawled content before it is indexed. This processing, which can consist of for example cleaning data, computing new values based on existing ones, or enriching the content with metadata, is done in addition to the processing done by SharePoint. Note however that this solution is limited to SharePoint Server 2013 with Enterprise CALs.

The processing can be applied not only to the SharePoint content, but it can benefit any content indexed by SharePoint, such as external websites.

Findwise Entity Extraction

Findwise has implemented a basic content processing enrichment service for customized processing of content indexed by SharePoint. This service can thus be used for processing and enriching documents using other already tested services developed by Findwise, such as the text analytic components. Moreover, it contains the basis on which other custom services can be built upon.

One of these text analytic services is Entity Extraction. It is based on statistical approach so before we can run it we need to train our model. The documents used for training must of course be representative of the domain in terms of form, terminology and writing style. However, this statistical approach has the potential of improving over time through training, as more examples are provided.

Make it run

To make it works we need to setup Findwise Entity Extraction service and provide documents for training. The more we get the better.

Findwise Entity Extraction Service is just Web Service that get text and return an array of found entities. Example:

For given text:

"Bill Wiggins has joined the findwise company in 2016. 
Since then much has happened, in example George Lucas joined and left, 
Tom Rubik has decided to move forward."

We got in response:

"Bill Wiggins",
"George Lucas",
"Tom Rubik "

Next we create Web Service Callout project. The only thing it does is getting document body and put the extracted entities in new “Entity” managed property. No mapping required this time – just make sure to create new managed property. Finally add the Web Service Callout to IIS and register it in SharePoint using the PowerShell scripts.

Information on how to create and register Content Enrichment Web Service Callout you can find at https://msdn.microsoft.com/en-us/library/office/jj163982.aspx

After than just run the content source and add new Refiner to your search page. Below you can find a result of run on Wikipedia texts. Note that extracted names doesn’t come from any dictionary but are returned by Findwise Entity Extraction text analytic component:sharepoint-results

SharePoint 2013 entity extraction – part 1

What is your search missing?

The built-in search experience in SharePoint 2013 has greatly improved from previous versions, and companies adopting it enjoy a bag of new features (such as the visual refiners, the social search, the hover-panel with previews, to name a few). However, is your implementation of the search in SharePoint 2013 matching all your business and information needs?! Is your search solution reaching the target search KPIs? Are you wondering how you can cut down on the task of the editor, improve the search experience for your users, or reduce the time spent by your information workers finding the relevant content?

Entity Extraction in SharePoint 2013 Search  

To make your search good you need good metadata. They can be then used as a filters, boosted fields etc. Usually that means that documents need to be tagged, which may take a long time if done manually by content owners.

However, it is possible to extract some metadata from document content during index time. In SharePoint 2013 there are two ways of doing it: “Custom entity extraction” or with use of “Custom Content processing”. In this post you can learn the first way.

Custom entity extraction

SharePoint 2013 introduced a new way for entity extraction. It allows to extract entities from document based on dictionary.

The first step is, of course, preparing the dictionary. It needs to be in following format:

Key,Display form
Findwise, Findwise
FW, Findwise
Sharepoint, Sharepoint
Microsoft,Microsoft

Then you need to register that dictionary file in SharePoint – using Powershell scripts: https://technet.microsoft.com/library/jj219614.aspx

Last thing left to do is enabling entity extraction on the Managed Property it should be applied to. To get them from content of a document just edit “body” Managed Property

scr1

and select “Word Extraction – Custom 1” checkbox:

scr2

We choose that one because our dictionary was registered with – DictionaryName “Microsoft.UserDictionaries.EntityExtraction.Custom.Word.1” parameter, if you need more dictionaries then you register them using different dictionary name values and selecting the right option in Managed Property settings.

After that run “Full Crawl” for your sources.

Finally, just add the “WordCustomRefiner1” to your refiners on search result list and start using new filter:scr3

This way is really good if you are able to generate a static dictionary. Eg. you can use a list of all countries or cities you can find on the internet for location extraction. You can also extract at dictionary from your customers database or your employees list and then update it on regular basis.

However usually it’s not possible to get a full list of all entities, and they must be extracted using one of NLP algorithms, that will be described in next part.

The search experience in SharePoint 2013: customised or targeted?

This post is the fourth in a series of four articles providing several best practices on how to implement and customise the search experience in SharePoint 2013. The previous posts listed the differences between the cloud and on-premise SharePoint, provided considerations when upgrading to SharePoint 2013, and dealt with the practicalities of configuring search in SharePoint Online. This fourth post handles the more advanced topic of ranking results and the future of search in SharePoint.

Managing ranking

We’ve previously mentioned the query rules as a way to change the ranking of the search results based on your requirements. These allow the promotion of certain search results or search result blocks on top of the ranked searched results, and more advanced query rules allow even changing the ranking of the search results based on what the query terms are.

By using query rules, customising the search results web part, and a few content by search web parts, you can change the behaviour of the search depending on what user is accessing it. That is, you would also need good metadata to make this work, but having a complete user profile (including the job title, department, and interests) is a good start. Based on such user information, you can define how the search experience for that user will be.

Changing ranking using query rules, however, requires a query rule condition, which describes the prerequisites that the query must fulfil in order for the query rule to fire. For changing the results for all queries, you can use the next approach.

If the default ranking does not satisfy your search requirements and you want to change the order of the ranked search results, SharePoint provides the possibility of changing the ranking models. It is a feature available in SharePoint Online as well, as described in the TechNet documentation: “SharePoint Online customers need to download and install the free Rank Model Tuning App in order to create and customize ranking models.”

A ranking model contains the features and corresponding weights that are used in calculating a score for each search result. Changing the ranking models might require a deeper and theoretical knowledge of how search works, and those that take the challenge of changing the ranking model are often dedicated search administrators or external specialised consultants.

The Ranking Model Tuning app is free on the App Store - http://office.microsoft.com/en-001/store/ranking-model-tuning-WA104192565.aspx

The Ranking Model Tuning app is free on the App Store

The Rank Model Tuning App provides a user interface for creating custom ranking models, and can be used for both SharePoint Online and SharePoint Server, though in SharePoint 2013 Server there is also the possibility to use PowerShell to customise ranking models. New models are based on existing ranking models for which you can add or remove new rank features and tune the weight of a rank feature. It also allows for evaluating the new ranking model using a test set of queries. The set of test queries can be constructed from real queries made by users that can be gathered from previous search logs, for example. How to use the tuning app is explained step-by-step in the documentation on the Office site.

Changing the weight of certain file types (say for example for PowerPoint documents compared to Excel documents) might be enough for many search implementations, but depending on the content, the features that influence the ranking of the search results can become more elaborate. For example, a property defining whether documents are either official or work-in-progress might become an important factor in determining the ranking of search results. SharePoint provides the liberty to create new properties, so it makes sense that these can be used in search to improve the relevance.

It should be pointed out, however, that changing the ranking model influences all searches that are run using that ranking model. Though the main idea of changing the ranking model is to improve the ranking, it can become much too easy to make changes that can have an undesirable effect on the ranking. This is why a proper evaluation of ranking changes needs to be part of your plan for improving search relevance.

The office graph and the future of social

The social features introduced in SharePoint 2013 provide a rich social experience, which is interconnected with the search experience. Many social features are driven by search (such as the recommendations for which people or documents to follow), and social factors also affect the search (such as finding the right expertise from conversations in your network).

In the month of June 2012 Microsoft acquired the social enterprise platform Yammer. The SharePoint Server 2013 Preview has been made available for download since July 2012, and it reached Release to Manufacturing (RTM) in October the same year. The new SharePoint 2013 implements new social features (see for example the newsfeed, the new mysites and the tagging system), many of which are overlapping with those available in Yammer! This brings us to the question on everyone’s mind since the acquisition of Yammer: what is the future of social in SharePoint? Should you use SharePoint’s social features or use Yammer?

In March 2014, Microsoft announced that they will not include new features in the SharePoint Social but rather invest in the integration between Yammer and Office 365. The guidance is thus to go for Yammer.

“Go Yammer! While we’re committed to another on-premises release of SharePoint Server—and we’ll maintain its social capabilities—we don’t plan on adding new social features. Our investments in social will be focused on Yammer and Office 365” – Jared Spataro, Microsoft Office blog

Also at the SharePoint conference this March 2014, Microsoft introduced the Office Graph, and with it Oslo as the first app demo using it. During the keynote, Microsoft mentions that the Office Graph is “perhaps the biggest idea we’ve had since the beginning of SharePoint”. The office graph maps relationships between people, the documents they authored, the likes and posts they made, and the emails they received; it’s actually an extension of Yammer’s enterprise graph. The Oslo application is leveraging the graph, in a way that looks familiar from Facebook’s graph search.

The Office Graph, connecting people and information - Microsoft Office Blog http://blogs.office.com/2014/03/03/work-like-a-network-enterprise-social-and-the-future-of-work/

The Office Graph, connecting people and information – Microsoft Office Blog

The new Office Graph provides exciting opportunities, and has consequences for how the search will be used. Findwise started exploring the area of enterprise graph search before Microsoft announced the Office Graph – see our post about the Enterprise Graph Search from January 2013.

Reluctant to go for the cloud?

Microsoft has hinted during the SharePoint conference keynote in March that they will be adding new functionalities to the cloud version first. Although they are still committed to another version of SharePoint server, new updates might come at a slower pace for the on-premise version. However, Microsoft also announced that with the SharePoint SP1 there is a new functionality in the administrative interface: a hybrid setting which allows you to specify whether you want the social component in the cloud/Yammer, or your documents on OneDrive, so that you don’t need to move everything to the cloud overnight.

Let us know how far you’ve come with your SharePoint implementation! Contact us if you need help in deciding which version of SharePoint to choose, need help with tuning search relevance, have questions about improving search, or would like to work with us to reach the next level of findability.

Customizing search in SharePoint Online

Search in SharePoint 2013 – Part 3: Customizing search in SharePoint Online

This post is the third in a series of four articles providing several best practices on how to implement and customise search in SharePoint. In the first post, we provided a brief overview of the differences in terms of search between the on-premise and cloud versions, and in the second blog post we discussed several things you should consider when migrating to the new SharePoint. In this post, we will mention several search features that can be configured in SharePoint Online, and we will be specifically referring to those available in the Enterprise Plan. If you need more information than is provided in this blogpost, feel free to visit our website or contact us!

Here is a summary of what customisations for search in SharePoint Online will be discussed:

  • Defining your own custom result sources, and hiding any that you are not using
  • Setting up hybrid search if you chose a hybrid solution
  • Defining which refiners to show and how to display them
  • Adding query suggestions that are related to your organisation
  • Adding query spelling corrections
  • Changing how the search results are displayed to show previews and additional metadata

Get ready to search ‘everything’

This is the uncustomized search box that you will see on your search center page. Please note that in some SharePoint Online plans the ‘Videos’ vertical is not available.

This is the uncustomized search box that you will see on your search center page.
Please note that in some SharePoint Online plans the ‘Videos’ vertical is not available.

Everything is the default scope when performing a search in the SharePoint search center and is returning every type of result from all of your site collections. There are a few other scopes (search verticals, or so-called Result Sources) that are included by default, People, Conversations, and Videos, and these are preconfigured to search on what you would expect.

  • You can add new result sources, say for example Reports, that shows only search results that are tagged with the keyword ‘Final Report’. You define yourself what the criteria for a result source should be.
  • If there is a result source that you are not using, say for example if you have no video content and don’t plan to have in the near future, it’s less confusing for the users if you simply not show it for now. It’s easy to add it back if you will need it in the not so foreseeable future.

If you choose a hybrid solution, your content is split between the online SharePoint and the on-premise SharePoint Server.

  • It’s possible to have one search that displays results from both locations. For example, to show results from the on-premise installation in SharePoint Online, you have to define a new result source that is able to retrieve the results from the on-premise. Then you can configure the search results page to show results from both result sources (everything from SharePoint Online plus everything from SharePoint on-premise that matches the search query).

Screenshot from the post Hybrid search by the Microsoft SharePoint Team Blog showing how results from the cloud are integrated in the search results page when the user searches from an on-premises SharePoint 2013 site. Notice also the new visual refiner for date interval in the refinement panel on the left.

Screenshot from the post Hybrid Search by the Microsoft SharePoint Team Blog showing how results from the cloud are integrated in the search results page when the user searches from an on-premises SharePoint 2013 site.
Notice also the new visual refiner for date interval in the refinement panel on the left.

Drill down into the search results

The search Refiners allow the users to drill down into the search results. There is a new type of refiner in SharePoint 2013, a visual refiner, by default used for the ‘Modified Date’.

  • The way in which the visualisation of the refiners is made has drastically changed, and you can define your own visualisation of the data if you want to. For example, what about a map as a refiner, instead of a list of city names?

By default, the refiners you will see would be the Result type (example values: Excel, Web page), Author (example values: John Doe, Jane Doe), and Modified Date (shown as a distribution of values).

  • If you edit the web part responsible for the refiners, you will be able to add other refiners as well. For example, company names are automatically extracted from your content, so it is easy to simply add that to your refiners.
  • Also, another useful refiner to show to your users is the Content Type, offering one level of detail more from the Result Type refiner.

Search guidance

Query suggestions are displayed as the user types.

Query suggestions are displayed as the user types.

As the user types a query in the search box, SharePoint is able to show Query Suggestions that help complete the query. SharePoint automatically creates a list of suggestions based on previous searches. When at least 6 search results are clicked for a specific query, that query will be added to the list of suggestions.

  • Besides the list that SharePoint creates automatically, you are able to add your own list of suggestions. This is especially useful when starting fresh with your installation, since a fresh installation will come with no query suggestions. You could help the users by adding your company name, product names or similar to the initial list of suggestions. You will also find manual adding of suggestions useful when reviewing the search logs, since these can give you a new perspective on what the users are looking for, and based on that input help guide your user to the relevant results using query suggestions.
  • You are also able to import a list of suggestions that are not intended to be shown in suggestions. Say for example that your testing team uses a specific keyword for testing content. In this case, it is very probable that the test keyword will soon appear as a suggestion for all users. To avoid this, simply add the keyword to the query suggestion exclusion list.

Similar to the query suggestions, another functionality whose purpose is to help the user in formulating the query is the Query Spelling Correction. An inclusion and exclusion list is used in this case as well, the only difference is that these are managed in the Term Store, while managing query suggestions is made by importing a plain text file.

  • You can add your own terms in the query spelling correction inclusion and exclusion lists. Probably one of the most often misspelled words is the word ‘business’. Or was it ‘bussiness’? After adding this term to the list of words to be included in the spelling suggestions, the correct form of the word would be shown under the ‘Did you mean’ functionality if the user misspells it.

Change how the search results are displayed

Screenshot from an Office Blogs post showing the hover panel for a PowerPoint document.

Screenshot from an Office Blogs post showing the hover panel for a PowerPoint document.

A final item on our list of proposed customisations for your search results is to change how the search results are displayed. In SharePoint 2013, it is the Display Templates that define how each element in the search results page is displayed. For example, there is a template for the refiner, another one for the hover panel of a PDF item, another one for the hover panel of a Word item, and so on.

  • A simple fix would be make sure that you have previews for PDF files in the hover panel. It is the Office Web Apps that power the previews for Office documents (such as Word, PowerPoint, Excel), but the preview for PDF files might not be visible for you. If so, what you can do is change the display template that is associated to the PDF result type.
  • You can also define what metadata to show for each result type. For example, for a Word document you would by default be able to see the Title, a text snippet and a URL, and in the hover panel the document preview, Last Modified date and author, as well as probably a list of the main headings from the document. However, if you have added additional metadata to your document, such as Location or Keywords, you can display these in the search results as well by modifying the right display template.

You can find more information about how to administer many of these search functionalities from this Microsoft Office page and from our search experts. Let us know how far you are in implementing SharePoint online for your organisation – we sure have a few more tips to how to configure and customize the search in SharePoint!

Cloud vs. on-premise SharePoint 2013 search

Search in SharePoint 2013 – Part 1: The difference between search within on-premise SharePoint 2013 and SharePoint Online

Cloud or on-premise? Findwise offers implementation and consulting services for both scenarios. This post is the first in a series of four articles providing several best practices on how to implement and customise search in SharePoint. The focus of this first post is introducing the difference between the cloud and on-premise SharePoint 2013 in terms of search features. If you need more information than you find in this blogpost, just stop by our website or contact us

“The cloud is on fire”

That is a quote from the Microsoft Office General Manager Jared Spataro during his keynote at the SharePoint conference in Las Vegas last month. At this conference, Microsoft revealed that 60% of the Fortune top 500 adopted Office 365 in the previous 12 months. While new versions of on-premise SharePoint and Exchange Server are promised to still come next year, Microsoft is adding more and more capabilities to the cloud version.

SPC14 Keynote summary

Fun random facts about SharePoint Online presented during the keynote at the SharePoint conference in Las Vegas this year (March 3rd 2014)

In addition to the numbers above, a market analysis report done by The Radicati Group on the adoption of Microsoft SharePoint reveals that almost a quarter of the worldwide users accessing deployments of SharePoint made during the year 2013 are using the cloud based SharePoint.

When deciding whether to go for the on-premise or cloud solution, a go-to resource for your IT team is the TechNet article describing the availability of features across the solutions. That article not only divides the features between on-premise and cloud, but also between the different Office 365 and SharePoint Online plans. What is the difference? SharePoint Online is the cloud version of the SharePoint Server, but it can be deployed as a standalone service or as part of the Office 365 suite, so different plans are usually listed for these different scenarios. There are also the Office 365 Dedicated plans, but these are out of the scope for this article. The Microsoft Office site has a more business oriented comparison of the different plans, including pricing. If not decided for one or the other, there is also the possibility of a hybrid solution!

 Availability Search feature Office 365 Small BusinessOffice 365 Small Business Premium Office 365 Midsize BusinessOffice 365 Enterprise E1 or K1Office 365 Education A2Office 365 Government G1 or K1 Office 365 Enterprise E3 or E4Office 365 Education A3 or A4Office 365 Government G3 or G4 SharePoint Online Plan 1 SharePoint Online Plan 2 SharePoint Foundation 2013 SharePoint Server 2013 Standard CAL SharePoint Server 2013 Enterprise CAL
Available within all plans
Phonetic name matching Yes Yes Yes Yes Yes Yes Yes Yes
Expertise Search Yes Yes Yes Yes Yes Yes Yes Yes
Quick preview Yes Yes Yes Yes Yes Yes Yes Yes
RESTful Query API/Query OM Yes Yes Yes Yes Yes Yes Yes Yes
Result sources Yes Yes Yes Yes Yes Yes Yes Yes
Search results sorting Yes Yes Yes Yes Yes Yes Yes Yes
Ranking models Yes Yes Yes Yes Yes Yes Yes Yes
Query spelling correction Yes Yes Yes Yes Yes Yes Yes Yes
Refiners Yes Yes Yes Yes Yes Yes Yes Yes
Manage search schema Yes Yes Yes Yes Yes Yes Yes Yes
Available in all Office365 and SharePoint Online plans
Deep links Yes Yes Yes Yes Yes No Yes Yes
Event-based relevancy Yes Yes Yes Yes Yes No Yes Yes
Graphical refiners Yes Yes Yes Yes Yes No Yes Yes
Recommendations Yes Yes Yes Yes Yes No Yes Yes
Search vertical: “Conversations” Yes Yes Yes Yes Yes No Yes Yes
Search vertical: “People” Yes Yes Yes Yes Yes No Yes Yes
Query suggestions Yes Yes Yes Yes Yes No Yes Yes
Query throttling Yes Yes Yes Yes Yes No Yes Yes
“This List” searches Yes Yes Yes Yes Yes No Yes Yes
Query rules—Add promoted results Yes Yes Yes Yes Yes No Yes Yes
Avail. in Office365 Advanced Content Processing Yes Yes Yes No No Yes Yes Yes
Hybrid search No Yes Yes Yes Yes Yes Yes Yes
Query rules—advanced actions No No Yes No No No No Yes
Search vertical: “Video” No No Yes No Yes No No Yes
Not available in any of the Office 365, SharePoint Online plans
Search connector framework No No No No No No Yes Yes
Custom entity extraction No No No No No No No Yes
Extensible content processing No No No No No No No Yes

— Simplified view of the TechNet article, focusing on the search features availability across SharePoint solutions

Limitations in Office 365 and SharePoint Online plans

Is the cloud version good enough for your organisation when it comes to search features? The table above illustrates some of the things that you might be missing in terms of search, and in what follows we will discuss those whose availability varies amongst the Office 365 or SharePoint Online plans.

Query rules – advanced actions

In order to adapt the relevance of the search results to the user intent, SharePoint 2013 adds a new feature called query rules. A query rule is defined by a condition and a corresponding action to be taken when the condition is met. Within some SharePoint Online licenses, this functionality is limited to the possibility of adding promoted results, while more advanced actions are left out. The promoted results are similar to what was in previous SharePoint versions known as search keywords, or best bets, letting you promote specific results on top of the ranked search results. The more advanced actions could consist of for example changing the query or changing the ranking of the search results by promoting a certain group of results. You can read more about various usages of query rules in one of our previous blog post.

Search Connector Framework and Hybrid Search

Administrators of SharePoint Online will miss the feature of managing the different search connectors to content sources, since the search connector framework is not available. Only SharePoint content that is stored online is going to be indexed. Search results can only be retrieved from that content, or can be set up to retrieve from an Exchange Server, from a remote SharePoint, or from a search engine that uses the OpenSearch protocol. As an alternative approach to making content from other sources searchable, you can set up hybrid search. This feature is available in almost all Office 365 and SharePoint Online scenarios. It allows users to show search results from content available in the cloud and on-premise. So if you would like to index a content source that is not supported in SharePoint Online, you should be able to index it on the on-premise.

Custom Entity Extraction

The TechNet article describing features across solutions actually shows that this feature is only available with the enterprise licensing of SharePoint Server. This feature allows the extraction of custom-defined terms from your content and making them usable as search refiners. Say for example that you would like to extract all of your current product names from the content of your documents and then be able to refine your search results on the product name.

Content Processing Extensibility

The other search feature that is only available with the enterprise licensing of SharePoint Server is the content processing extensibility. In practice, this means there is an API that can be used to transform the data before it is stored in the index. For example, more advanced entity extraction can be made at this step. While the custom entity extraction discussed previously is able to identify names in the content based on a pre-defined list of names, through this API you can use a trained model to do entity extraction for example. Additional use cases could be cleaning or normalising the data according to predefined rules (for example, lowercasing all values in a property), or automatically tagging items based on the content.

It should be noted that the TechNet article is not a comprehensive list, and rather gives an overview of the major differences between solutions. Here is for example one more feature whose availability is limited.

Synonyms

One of the missing features in SharePoint Online that is available in the on-premise solution is the possibility of defining synonyms. Since it’s too easy to communicate the same thing with different words, defining synonyms or abbreviations for search phrases can help aggregate the results for the multiple ways of expressing the same information need. We hope that Microsoft will integrate this feature in the future versions of SharePoint Online as well.

Find the right documentation

When searching for which functionality is available across solutions on the Microsoft Office.com website or TechNet, make sure to check that the discussed functionality applies to your version of SharePoint. Articles usually indicate for which versions the functionality applies to.

Feature availability in MS articles

Articles on Office.com (left) and TechNet (right) indicate for which version
of SharePoint the discussed topic applies to.

Please note that things might change, new updates in SharePoint online can add functionality that was missing before.

To stay up-to-date, check the TechNet page once in a while, visit our website or contact us to help you map your requirements to the available search features across solutions.

Event driven indexing for SharePoint 2013

In a previous post, we have explained the continuous crawl, a new feature in SharePoint 2013 that overcomes previous limitations of the incremental crawl by closing the gap between the time when a document is updated and when the change is visible in search. A different concept in this area is event driven indexing.

Content pull vs. content push

In the case of event driven indexing, the index is updated real-time as an item is added or changed. The event of updating the item triggers the actual indexing of that item, i.e. pushes the content to the index. Similarly, deleting an item results in deleting the item from the index immediately, making it unavailable from the search results.

The three types of crawl available in SharePoint 2013, the full, incremental and continuous crawl are all using the opposing method, of pulling content. This action would be initiated by the user or automated to start at a specified time or time intervals.

The following image outlines the two scenarios: the first one illustrates crawling content on demand (as it is done for the full, incremental and continuous crawls) and the second one illustrates event-driven indexing (immediately pushing content to the index on an update).

Pulling vs pushing content, showing the advantage of event driven indexing

Pulling vs pushing content

Example use cases

The following examples are only some of the use cases where an event-driven push connector can make a big difference in terms of the time until the users can access new content or newest versions of existing content:

  • Be alerted instantly when an item of interest is added in SharePoint by another user.
  • Want deleted content to immediately be removed from search.
  • Avoid annoying situations when adding or updating a document to SharePoint and not being able to find it in search.
  • View real-time calculations and dashboards based on your content.

Findwise SharePoint Push connector

Findwise has developed for its SharePoint customers a connector that is able to do event driven indexing of SharePoint content. After installing the connector, a full crawl of the content is required after which all the updates will be instantly available in search. The only delay between the time a document is updated and when it becomes available in search is reduced to the time it takes for a document to be processed (that is, to be converted from what you see to a corresponding representation in the search index).

Both FAST ESP and Fast Search for SharePoint 2010 (FS4SP) allow for pushing content to the index, however this capability was removed from SharePoint 2013. This means that even though we can capture changes to content in real time, we are missing the interface for sending the update to the search index. This might be a game changer for you if you want to use SharePoint 2013 and take advantage of the event driven indexing, since it actually means you would have to use another search engine, that has an interface for pushing content to the index. We have ourselves used a free open source search engine for this purpose. By sending the search index outside the SharePoint environment, the search can be integrated with other enterprise platforms, opening up possibilities for connecting different systems together by search. Findwise would assist you with choosing the right tools to get the desired search solution.

Another aspect of event driven indexing is that it limits the resources required to traverse a SharePoint instance. Instead of continuously having an ongoing process that looks for changes, those changes come automatically when they occur, limiting the work required to get that change. This is an important aspect, since the resources demand for an updated index can be at times very high in SharePoint installations.

There is also a downside to consider when working with push driven indexing. It is more difficult to keep a state of the index in case problems occur. For example, if one of the components of the connector goes down and no pushed data is received during a time interval, it becomes more difficult to follow up on what went missing. To catch the data that was added or updated during the down period, a full crawl needs to be run. Catching deletes is solved by either keeping a state of the current indexed data, or comparing it with the actual search engine index during the full crawl. Findwise has worked extensively on choosing reliable components with a high focus on robustness and stability.

The push connector was used in projects with both SharePoint 2010 and 2013 and tested with SharePoint 2007 internally. Unfortunately, SharePoint 2007 has a limited set of event receivers which limits the possibility of pure event driven indexing. Also, at the moment the connector cannot be used with SharePoint Online.

You will probably be able to add a few more examples to the use cases for event driven indexing listed in this post. Let us know what you think! And get in touch with us if you are interested in finding more about the benefits and implications of event driven indexing and learn about how to reach the next level of findability.

Continuous crawl in SharePoint 2013

Continuous crawl is one of the new features that comes with SharePoint 2013. As an alternative to incremental crawl, it promises to improve the freshness of the search results. That is, the time between when an item is updated in SharePoint by a user and when it becomes available in search.

Understanding how this new functionality works is especially important for SharePoint implementations where content changes often and/or where it’s a requirement that the content should instantly be searchable. Nonetheless, since many of the new SharePoint 2013 functionalities depend on search (see the social features, the popular items, or the content by search web parts), understanding continuous crawl and planning accordingly can help level the user expectation with the technical capabilities of the search engine.

Both the incremental crawl and the continuous crawl look for items that were added, changed or deleted since the last successful crawl, and update the index accordingly. However, the continuous crawl overcomes the limitation of the incremental crawl, since multiple continuous crawls can run at the same time. Previously, an incremental crawl would start only after the previous incremental crawl had finished.

Limitation to content sources

Content not stored in SharePoint will not benefit from this new feature. Continuous crawls apply only to SharePoint sites, which means that if you are planning to index other content sources (such as File Shares or Exchange folders) your options are restricted to incremental and full crawl only.

Example scenario

The image below shows two situations. In the image on the left (Scenario 1), we are showing a scenario where incremental crawls are scheduled to start at each 15 minutes. In the image on the right (Scenario 2), we are showing a similar scenario where continuous crawls are scheduled at each 15 minutes. After around 7 minutes from starting the crawl, a user is updating a document. Let’s also assume that in this case passing through all the items to check for updates would take 44 minutes.

Continuous crawl SharePoint 2013

Incremental vs continuous crawl in SharePoint 2013

In Scenario 1, although incremental crawls are scheduled at each 15 minutes, a new incremental crawl cannot be started while there is a running incremental crawl. The next incremental crawl will only start after the current one is finished. This means 44 minutes for the first incremental crawl to finish in this scenario, after which the next incremental crawl kicks in and finds the updated document and send it to the search index. This scenario shows that it could take around 45 minutes from the time the document was updated until it is available in search.

In Scenario 2, a new continuous crawl will start at each 15 minutes, as multiple continuous crawls can run in parallel. The second continuous crawl will see the updated document and send it to the search index. By using the continuous crawl in this case, we have reduced the time it takes for a document to be available in search from around 45 minutes to 15 minutes.

Not enabled by default

Continuous crawls are not enabled by default and enabling them is done from the same place as for the incremental crawl, from the Central Administration, from Search Service Application, per content source. The interval in minutes at which a continuous crawl will start is set to a default of 15 minutes, but it can be changed through PowerShell to a minimum of 1 minute if required. Lowering the interval will however increase the load on the server. Another number to take into consideration is the maximum number of simultaneous requests, and this is a configuration that is done again from the Central Administration.

Continuous crawl in Office 365

Unlike in SharePoint 2013 Server, continuous crawls are enabled in SharePoint Online by default but are managed by Microsoft. For those used to the Central Administration from the on-premise SharePoint server, it might sound surprising that this is not available in SharePoint Online. Instead, there is a limited set of administrative features. Most of the search features can be managed from this administrative interface, though the ability to manage the crawling on content sources is missing.

The continuous crawl for Office 365 is limited in the lack of control and configuration. The crawl frequency cannot be modified, but Microsoft targets between 15 minutes and one hour between a change and its availability in the search results, though in some cases it can take hours.

Closer to real-time indexing

The continuous crawl in SharePoint 2013 overcomes previous limitations of the incremental crawl by closing the gap between the time when a document is updated and when this is visible in the search index.

A different concept in this area is the event driven indexing, which we will explain in our next blog post. Stay tuned!

Microsoft is betting on cloud, mobile and social for SharePoint 2013 – Impressions from the SharePoint Conference 2012

Over 10,000 attendees from 85 countries, more than 200 sponsors and exhibitors, and over 250 sessions. Besides these impressive numbers, the 2012 SharePoint conference in Las Vegas has also marked the launch of the new version of SharePoint. Findwise was there to learn and is now sharing with you the news about enterprise search in SharePoint 2013.

In the keynote presentation on the first day of the conference, Jared Spataro (Senior Director, SharePoint Product Management at Microsoft) mentions the three big bets made for the SharePoint 2013 product: CLOUD, MOBILE, and SOCIAL. This post tries to provide a brief overview of what these three buzzwords mean for the enterprise search solution in SharePoint 2013. Before reading this, also check out our previous post about search in SharePoint 2013 to get a taste of what’s new in search.

Search in the cloud

While you have probably heard the saying that “the cloud has altered the economics of computing” (Jared Spataro), you might be wondering how to get there. How to go from where you are now to the so-called cloud. The answer for search is that SharePoint 2013 provides a hybrid approach that helps out in this transition. Hybrid search promises to be the bridge between on-premises and the cloud.

The search results from the cloud and those from on-premise can be shown on the same page with the use of the “result blocks”. The result block, new to SharePoint 2013, is a block of results that are individually ranked and are grouped according to a “query rule”. In short, a query rule defines a condition and an action to be fired when the condition is met. With the use of the result blocks, you can display the search results for content coming from the cloud when searching from an on-premises site and the other way around (depending whether you want the search to be one-way or bidirectional), and you can also conditionally enable these result blocks depending on the query (for example, queries matching specific words or regular expressions).

hybridsearch

Screenshot from the post Hybrid search of the Microsoft SharePoint Team Blog showing how results from the cloud are integrated in the search results page when the user searches from an on-premises SharePoint 2013 site.

Before making the decision to move to the cloud, it is wise to check the current features availability for both online and on-premise solutions on TechNet.

Mobile devices

With SharePoint 2013, Microsoft has added native mobile apps for Windows, Windows Phone, iPhone, and iPad, and support across different mobile devices (TechNet), which provides access to information and people wherever the users are searching from.

Also important to mention when talking about mobile, is that the improved REST API widens the extensibility options and allows easy development of custom user experiences across different platforms and devices. The search REST API provides access to the keyword query language parameters, and combining this with a bit of JavaScript and HTML allows developers to quickly start building Apps with custom search experiences and making all information available across devices.

Social search

In the same keynote, Jared Spataro said that Microsoft has “integrated social very deeply into the product, creating new experiences that are really designed to help people collaborate more easily and help companies become more agile.” This was also conveyed by the presence of the two founders of the enterprise social network Yammer in the keynote presentation. The new social features integration means that the information about people following content, people following other people, tags, mentions, posts, discussions, are not only searchable but can be used in improving the relevance of the search results and improving the user experience overall. Also, many of the social features are driven by search, such as the recommendations for people or documents to follow.

Whether you are trying to find an answer to a problem to which the solution has already been posted by somebody else, or whether you are trying to find a person with the right expertise through the people search, SharePoint 2013 provides a more robust and richer social search experience than its previous versions. And the possibilities to extend the out-of-the-box capabilities must be very attractive to businesses that are for example looking to combine the social interactivity inside SharePoint with people data stored in other sources (CRM solutions, file shares, time tracking applications, etc).

Stay tuned!

It was indeed an awesome conference, well organized, but most of the times it was hard to decide which presentation to choose from the many good sessions running at the same time. Luckily (or wisely), we had more than one Findwizard on location!

This post is part of our series of reports from the SharePoint 2012 Conference. Keep an eye on the Findability blog for part two of our report from the biggest SharePoint conference of 2012!

Search in SharePoint 2010

This week there has been a lot of buzz about Microsoft’s launch of SharePoint 2010 and Office 2010. Since SharePoint 2007 has been the quickest growing server product in the history of Microsoft, the expectations on SharePoint 2010 are tremendous. And also great expectations for search in Sharepoint 2010

Apart from a great deal of possibilities when it comes to content creation, collaboration and networking, easy business intelligence etc. the launch also holds another promise: that of even better capabilities for search in Sharepoint 2010 (with the integration of FAST).

Since Microsoft acquired FAST in 2008, there have been a lot of speculations about what the future SharePoint versions may include in terms of search. And since Microsoft announced that they will drop their Linux and UNIX versions in order to focus on higher innovation speed, Microsoft customer are expecting something more than the regular. In an early phase it was also clear that Microsoft is eager to take market shares from the growing market in internet business.

So, simply put, the solutions that Microsoft now provide in terms of search is solutions for Business productivity (where the truly sophisticated search capabilities are available if you have Enterprise CAL-licenses, i.e. you pay for the number of users you have) and Internet Sites (where the pricing is based on the number of servers). These can then be used in a number of scenarios, all dependent on the business and end-user needs.
Microsoft has chosen to describe it like this:

  • Foundation” is, briefly put, basic SharePoint search (Site Search).
  • Standard” adds collaboration features to the “Foundation” edition and allows it to tie into repositories outside of SharePoint.
  • Enterprise ” adds a number of capabilities, previously only available through FAST licenses, such as contextual search (recognition of departments, names, geographies etc), ability to tag meta data to unstructured content, more scalability etc.

I’m not going to go into detail, rather just conclude that the more Microsoft technology the company or organization already use, the more benefits it will gain from investing in SharePoint search capabilities.

And just to be clear:  non-SharePoint versions (stand-alone) of FAST are still available, even though they are not promoted as intense as the SharePoint ones.

Apart from Microsoft’s overview above, Microsoft Technet provides a more deepdrawing description of the features and functionality from both an end-user and administrator point of view.

We look forward describing the features and functions in more detail in our upcoming customer cases. If you have any questions to our SharePoint or FAST search specialist, don’t hesitate to post them here on the blog. We’ll make sure you get all the answers.