Continuous crawl in SharePoint 2013

Continuous crawl is one of the new features in SharePoint 2013. As an alternative to incremental crawl, it promises to improve the freshness of the search results: that is, to shorten the time between when an item is updated in SharePoint by a user and when it becomes available in search.

Understanding how this new functionality works is especially important for SharePoint implementations where content changes often and/or where it is a requirement that content be instantly searchable. Moreover, since many of the new SharePoint 2013 functionalities depend on search (see the social features, popular items, or the content by search web parts), understanding continuous crawl and planning accordingly can help align user expectations with the technical capabilities of the search engine.

Both the incremental crawl and the continuous crawl look for items that were added, changed or deleted since the last successful crawl, and update the index accordingly. The continuous crawl, however, overcomes a limitation of the incremental crawl: multiple continuous crawls can run at the same time, whereas a new incremental crawl would start only after the previous incremental crawl had finished.

Limited to SharePoint content sources

Content not stored in SharePoint does not benefit from this new feature. Continuous crawls apply only to SharePoint sites, which means that if you plan to index other content sources (such as file shares or Exchange folders), your options are restricted to incremental and full crawls.

Example scenario

The image below shows two situations. On the left (Scenario 1), incremental crawls are scheduled to start every 15 minutes. On the right (Scenario 2), continuous crawls are scheduled to start every 15 minutes. About 7 minutes after the first crawl starts, a user updates a document. Let’s also assume that passing through all the items to check for updates takes 44 minutes.


Incremental vs continuous crawl in SharePoint 2013

In Scenario 1, although incremental crawls are scheduled every 15 minutes, a new incremental crawl cannot start while another incremental crawl is running; the next one will only start after the current one has finished. In this scenario, the first incremental crawl takes 44 minutes to finish, after which the next incremental crawl kicks in, finds the updated document and sends it to the search index. It could thus take around 45 minutes from the time the document was updated until it is available in search.

In Scenario 2, a new continuous crawl starts every 15 minutes, as multiple continuous crawls can run in parallel. The second continuous crawl sees the updated document and sends it to the search index. By using continuous crawl, we have in this case reduced the time it takes for the document to become available in search from around 45 minutes to around 15 minutes.

Not enabled by default

Continuous crawls are not enabled by default. They are enabled from the same place as incremental crawls: in Central Administration, under the Search Service Application, per content source. The interval at which a new continuous crawl starts defaults to 15 minutes, but it can be lowered through PowerShell to a minimum of 1 minute if required; lowering the interval will, however, increase the load on the server. Another setting to take into consideration is the maximum number of simultaneous requests, which is also configured in Central Administration.
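For reference, here is a minimal PowerShell sketch of both settings, assuming a single Search service application and the default content source name "Local SharePoint sites" (verify the names against your own farm):

    # Run in the SharePoint 2013 Management Shell on a server in the farm
    $ssa = Get-SPEnterpriseSearchServiceApplication

    # Enable continuous crawls for a SharePoint content source
    $cs = Get-SPEnterpriseSearchCrawlContentSource -SearchApplication $ssa -Identity "Local SharePoint sites"
    $cs.EnableContinuousCrawls = $true
    $cs.Update()

    # Lower the interval between continuous crawl starts from the default
    # 15 minutes to 5 (minimum is 1; lower values mean more load on the server)
    $ssa.SetProperty("ContinuousCrawlInterval", 5)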

Continuous crawl in Office 365

Unlike in SharePoint Server 2013, continuous crawls are enabled by default in SharePoint Online, but they are managed by Microsoft. For those used to Central Administration in the on-premises SharePoint server, it might be surprising that it is not available in SharePoint Online. Instead, there is a limited set of administrative features: most of the search features can be managed from this administrative interface, but the ability to manage crawling on content sources is missing.

Continuous crawl in Office 365 is thus limited in terms of control and configuration. The crawl frequency cannot be modified, but Microsoft targets between 15 minutes and one hour between a change and its availability in the search results, though in some cases it can take hours.

Closer to real-time indexing

Continuous crawl in SharePoint 2013 overcomes previous limitations of the incremental crawl by closing the gap between the time a document is updated and the time it becomes visible in the search index.

A different concept in this area is event-driven indexing, which we will explain in our next blog post. Stay tuned!

Microsoft is betting on cloud, mobile and social for SharePoint 2013 – Impressions from the SharePoint Conference 2012

Over 10,000 attendees from 85 countries, more than 200 sponsors and exhibitors, and over 250 sessions. Beyond these impressive numbers, the 2012 SharePoint Conference in Las Vegas also marked the launch of the new version of SharePoint. Findwise was there to learn and is now sharing the news about enterprise search in SharePoint 2013 with you.

In the keynote presentation on the first day of the conference, Jared Spataro (Senior Director, SharePoint Product Management at Microsoft) mentioned the three big bets made for the SharePoint 2013 product: CLOUD, MOBILE, and SOCIAL. This post provides a brief overview of what these three buzzwords mean for the enterprise search solution in SharePoint 2013. Before reading on, also check out our previous post about search in SharePoint 2013 to get a taste of what’s new in search.

Search in the cloud

You have probably heard the saying that “the cloud has altered the economics of computing” (Jared Spataro), but you might be wondering how to get there: how to go from where you are now to the so-called cloud. For search, the answer is that SharePoint 2013 provides a hybrid approach that helps with this transition. Hybrid search promises to be the bridge between on-premises and the cloud.

Search results from the cloud and from on-premises can be shown on the same page with the use of “result blocks”. The result block, new to SharePoint 2013, is a block of results that are ranked individually and grouped according to a “query rule”. In short, a query rule defines a condition and an action to be fired when the condition is met. With result blocks, you can display search results for content from the cloud when searching from an on-premises site, and the other way around (depending on whether you want the search to be one-way or bidirectional). You can also enable these result blocks conditionally, depending on the query (for example, for queries matching specific words or regular expressions).


Screenshot from the post Hybrid search of the Microsoft SharePoint Team Blog showing how results from the cloud are integrated in the search results page when the user searches from an on-premises SharePoint 2013 site.

Before deciding to move to the cloud, it is wise to check the current feature availability for both the online and on-premises solutions on TechNet.

Mobile devices

With SharePoint 2013, Microsoft has added native mobile apps for Windows, Windows Phone, iPhone, and iPad, plus support across different mobile devices (TechNet), giving users access to information and people wherever they are searching from.

Also important to mention when talking about mobile: the improved REST API widens the extensibility options and allows easy development of custom user experiences across different platforms and devices. The search REST API exposes the Keyword Query Language parameters, and combining it with a bit of JavaScript and HTML lets developers quickly start building apps with custom search experiences, making all information available across devices.
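To give a feel for the API, here is a minimal sketch of a call to the search REST endpoint, shown in PowerShell for brevity; the site URL and query text are placeholders, and the same request works just as well from JavaScript:

    # Query the SharePoint 2013 search REST API and print the titles of the hits
    $url = "https://intranet.example.com/_api/search/query?querytext='findability'&rowlimit=10"
    $response = Invoke-RestMethod -Uri $url -UseDefaultCredentials -Headers @{ Accept = "application/json;odata=verbose" }

    # Each row holds the managed properties (Title, Path, ...) of one result
    $rows = $response.d.query.PrimaryQueryResult.RelevantResults.Table.Rows.results
    $rows | ForEach-Object { ($_.Cells.results | Where-Object { $_.Key -eq "Title" }).Value }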

Social search

In the same keynote, Jared Spataro said that Microsoft has “integrated social very deeply into the product, creating new experiences that are really designed to help people collaborate more easily and help companies become more agile.” This was also conveyed by the presence of the two founders of the enterprise social network Yammer in the keynote presentation. The new social integration means that information about people following content, people following other people, tags, mentions, posts and discussions is not only searchable but can also be used to improve the relevance of the search results and the user experience overall. In addition, many of the social features are themselves driven by search, such as the recommendations of people or documents to follow.

Whether you are trying to find the answer to a problem whose solution has already been posted by somebody else, or trying to find a person with the right expertise through people search, SharePoint 2013 provides a more robust and richer social search experience than its previous versions. The possibilities to extend the out-of-the-box capabilities should also be very attractive to businesses looking, for example, to combine the social interactivity inside SharePoint with people data stored in other sources (CRM solutions, file shares, time tracking applications, etc.).

Stay tuned!

It was indeed an awesome conference, well organized, but most of the time it was hard to choose between the many good sessions running at the same time. Luckily (or wisely), we had more than one Findwizard on location!

This post is part of our series of reports from the SharePoint Conference 2012. Keep an eye on the Findability blog for part two of our report from the biggest SharePoint conference of 2012!

Enterprise Search Stuffed with GIS

When I browsed through the marketing brochures of GIS (Geographic Information System) vendors, I noticed that their message is quite similar to that of search analytics: in general, it refers to integrating various separate sources into analyses based on geo-visualizations. I have recently seen a quite nice and powerful combination of enterprise search and GIS technologies, so I would like to describe it a little. Let us start with the basics.

Search result visualization

It is quite natural to use a map instead of a simple list of results to visualize what was returned for a query. This technique is frequently used in online search applications, especially in directory services like yellow pages or real estate websites. The list of things required to do this is pretty short:

– geocoding (geo-localization) of items – assigning accurate geo-coordinates to location names, addresses, zip codes or whatever else is expected to be shown on the map; geocoding services are available more or less for free from Google or Bing Maps

– a background map – a necessity, also provided by Google or Bing; there are also plenty of vendors of more specialized mapping applications

– returned results with geo-coordinates as metadata – so they can be placed on the map

Normally this kind of basic GIS visualization provides standard map operations like zooming, panning and different views, plus some additional data such as traffic, parks and shops. Results are usually shown as pins (Bing) or drops (Google).

Querying / filtering with the map

A step further in the integration of search and GIS is to use the map as a tool for defining the search query. One way is to let the user draw an area of interest on the map as a circle, rectangle or polygon; in the simplest case, the current map window itself serves as the query area. In this approach, the full-text query is refined to include only results belonging to the defined area.
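As an illustration, here is a rough sketch of such a map-window filter against a Solr backend (the host, the "coords" location field and the coordinates are assumptions; other engines offer similar spatial filters):

    # Full-text query restricted to the current map window, expressed as a
    # lat/lon bounding box filter on a "coords" location field in Solr
    $params = @{
        q  = "chinese restaurant"
        fq = "coords:[59.30,18.00 TO 59.35,18.10]"  # lower-left TO upper-right corner
        wt = "json"
    }
    $response = Invoke-RestMethod -Uri "http://localhost:8983/solr/select" -Body $params
    $response.response.docs  # only the hits inside the visible area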

Apart from the map, all other query refinement tools should be available as well, such as date-time sliders and any kind of navigation or fielded queries.

Simple geo-spatial analysis

Sometimes it is important to sort query results by distance from a reference point, for example to see the nearest Chinese restaurants in the neighborhood. I would also categorize as simple geo-spatial analysis the grouping of search results into GIS layers, such as density heatmaps or hot spots, using geographic and other information stored in the result metadata.
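A sketch of such a distance sort, again assuming a Solr backend with a "coords" location field (pt is the reference point, here central Stockholm):

    # Sort hits by distance from a reference point using Solr's geodist()
    $params = @{
        q      = "chinese restaurant"
        sfield = "coords"            # the location field to measure against
        pt     = "59.3293,18.0686"   # the user's position
        sort   = "geodist() asc"     # nearest first
        wt     = "json"
    }
    Invoke-RestMethod -Uri "http://localhost:8983/solr/select" -Body $params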

Advanced geo-spatial analysis

More advanced query definition and refinement would involve geo-spatial computations. Depending on actual needs, it could be possible, for example, to refine search results by the line-of-sight area from a picked reference point, or to select filtering areas such as those inside the borders of specific cities, districts or countries.

So the idea is to use relevant output from advanced GIS analysis as input for query refinement. In this way, all the power of GIS can be used to reach unstructured data through the search process.

What kinds of applications do you think could take advantage of search stuffed with really advanced GIS? Looking forward to your comments on this post.

Microsoft SharePoint Conference 2011: Contributor vs. Consumer

A couple of weeks ago I had the opportunity to attend the Microsoft SharePoint Conference 2011, Anaheim USA. This turned out to be an intense four-day conference covering just about any SharePoint 2010 topic you can imagine – from the geekiest developer session to business tracks with lessons learned.

To me, one of the most memorable sessions was Social Search with Dan Benson and Paul Summers, in which they showed us how social behaviours can be used to influence the ranking of search results. For instance, users’ interests entered in MySite can be used to boost (xrank) matching results accordingly. This was an eye opener, as it illustrated what is possible with quite simple means. Thanks for that!
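As a rough sketch of the idea (not the presenters' code), an FQL xrank query issued from PowerShell against FAST Search for SharePoint 2010 could look like this; the site URL, the interest term and the boost value are invented, so verify the syntax for your version:

    # Run in the SharePoint 2010 Management Shell
    $site = Get-SPSite "https://intranet.example.com"
    $query = New-Object Microsoft.Office.Server.Search.Query.KeywordQuery($site)
    $query.EnableFQL = $true
    # Match "project plan"; lift results that also mention the user's
    # MySite interest "agile", without excluding the ones that don't
    $query.QueryText = 'xrank(string("project plan", mode="and"), string("agile"), boost=500)'
    $results = $query.Execute()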

Another great session was Scott Jamison talking about findability in SharePoint. The key point of this session was to differentiate between contributor and consumer. Typically we focus on the contributor, building folder structures 100 levels deep with names that make sense to the contributor. However, we seem to forget about the consumers, who of course are the other key audience of an intranet. It is equally important to create a good support system for contributors as it is to focus on consumer needs. As Jamison put it: “Why have folders for both contributors and consumers?” SharePoint offers endless possibilities for creating logical views built on search, tags and filtering, aimed at filling the needs of the consumers.

So, keep the folders or whatever supports the contributors’ needs, but let your imagination run free when delivering best-in-class findability to the consumer!

Distributed processing + search == true?

In June 2011, I attended the Berlin Buzzwords conference. The main theme of the conference was undoubtedly the current paradigm shift in distributed processing, driven by the major success of Hadoop. Doug Cutting – founder of Apache projects such as Lucene, Nutch and Hadoop – held one of the keynotes. He focused on what he recognized as the new foundations for this paradigm shift:

– Commodity hardware
– Sequential file access
– Sharding
– Automated, high level reliability
– Open source

Hadoop handles distributed processing fairly well. Distributed search, on the other hand, is more or less limited to sharding and/or replicating the index. The downside of sharding is that you perform the same search on multiple servers and then need to combine the results; since ranking algorithms such as tf/idf rely on term statistics that are computed per shard, tasks like ranking results suffer. Andrzej Białecki (another frequent Lucene committer) held a presentation on this topic, and his view can be summarized as: use local search for as long as you can, and distribute only when the cost of local search limitations outweighs the cost of distributed search.
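To make the trade-off concrete, here is a sketch of a distributed Solr query (host names are assumptions). The shards parameter fans the query out to both nodes, and the receiving node merges the responses – but each shard ranks with its own local tf/idf statistics, which is what can skew the merged ranking:

    # Fan one query out over two Solr shards and merge the results
    $params = @{
        q      = "enterprise search"
        shards = "solr1:8983/solr,solr2:8983/solr"
    }
    Invoke-RestMethod -Uri "http://solr1:8983/solr/select" -Body $params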

The setup of automated replication and sharding, with help from ZooKeeper in the SolrCloud project, is a major step in the right direction, but the question of how to properly combine search results from different nodes still remains. One thing is sure, though: there is a lot of interesting work being done in this area.

Design Principles for Enterprise Search – The Philosophy of UX

In May I attended An Event Apart (AEA) in Boston. AEA is a two-day design conference for people who work with websites, created by the father of web design, Jeffrey Zeldman, and the CSS guru Eric Meyer. The conference takes a broad perspective, dealing with everything from how to write CSS3 and HTML5 to content strategy and graphic design. This post is about an AEA topic brought up by Whitney Hess: create design principles and use them to establish a philosophy for the user experience.

Hess wants to create universal principles for user experience, to communicate a shared understanding among team members and customers, and to create a basis for objective evaluation. The principles suggested by Hess are listed below, along with examples of how they can relate to search and search user interfaces.

Stay out of people’s way

When you do know what people want, stay out of their way

Google knows what people want to do when they visit Google.com: they get out of the way and make it easy to get things done. The point is not to disturb users with information they do not need – everything from modal popup windows to too many settings.

Create a hierarchy that matches people’s needs

Give crucial elements the greatest prominence

This means that the most used information should be easy to find and use. A classic example: on most university webpages it is almost impossible to find contact details for faculty members or the campus address, but very easy to find a statement of the school’s philosophy – yet the former is probably what users will most often try to find.

(Illustrated nicely by the xkcd comic “University Website”: xkcd.com/773/)

Limit distractions

This principle means that you should design for consecutive tasks and limit related information to what you know would help the user with her current task. Don’t include related information in a search user interface just because you can, if that information does not add value.

Provide strong information scent

There should be enough information in the search results for users to decide whether a result is relevant. On an e-commerce site, this can be the difference between selling and not selling. A search result will not be perceived as cluttered if the right data is shown.

Provide signposts and cues

Always make it clear how to start a new search, how to apply filters and what kind of actions can be applied to specific search results.

Provide context

Let the user know that there are different kinds of search results: display thumbnails for pictures and videos, or show instant messaging (MSN) availability in people search.

Use constraints appropriately

Prevent errors before they happen. Query suggestion is a good example, as it helps users correct spelling errors before they are made. This saves time and frustration for the user.

Make actions reversible

Make it obvious how to remove filters or reset other settings.

Provide feedback

Interaction is a conversation, so let the user know when something happens or when the search interface is fetching new results. Never leave the user guessing about what is happening.

Make a good first impression

You only get one chance to make a first impression, so it is important to spend time designing the first impression of any interface. Always aim to make the experience better for new users. This could mean optional tutorials or fun, good-looking welcome messages.

So now what?

Are universal principles enough? Probably not. Every project and company is different and needs its own principles to identify with. Hess ended her presentation with tips on how to create company principles to complement the universal ones. Maybe a future blog post will cover creating your own design principles.

So what are your company’s principles?

Delivering Information Where It is Needed: Location Based Information

I recently started working at Findwise after finishing my thesis on location-based information delivery on mobile phones. The purpose of my thesis was to:

  • Investigate how location-based information (as opposed to fixed locations) could be connected to search results
  • Improve the quality of location-based information delivery by considering the course and velocity of the user

To start with, I created an iPhone application with a location-based reminder system. Reminders described location constraints, and users could create reminders tied to single locations (at home) or groups of locations (at any pharmacy). To resolve these groups of locations, the system searched for locations with associated information (like nearby pharmacies) and delivered this information without users having to click Search repeatedly.
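To give a flavor of the triggering logic, here is a minimal sketch of the distance check behind such reminders, written in PowerShell for readability rather than as iPhone code; all names and the 200 m radius are illustrative, and the real system also weighed in course and velocity:

    # Fire a reminder when the user is within 200 m of any location in the
    # reminder's location group (great-circle distance via the haversine formula)
    function Get-DistanceMeters($lat1, $lon1, $lat2, $lon2) {
        $toRad = [Math]::PI / 180
        $dLat = ($lat2 - $lat1) * $toRad
        $dLon = ($lon2 - $lon1) * $toRad
        $a = [Math]::Sin($dLat / 2) * [Math]::Sin($dLat / 2) +
             [Math]::Cos($lat1 * $toRad) * [Math]::Cos($lat2 * $toRad) *
             [Math]::Sin($dLon / 2) * [Math]::Sin($dLon / 2)
        6371000 * 2 * [Math]::Atan2([Math]::Sqrt($a), [Math]::Sqrt(1 - $a))
    }

    $nearby = $reminderLocations | Where-Object {
        (Get-DistanceMeters $user.Lat $user.Lon $_.Lat $_.Lon) -lt 200
    }
    if ($nearby) { "Reminder: you are near " + $nearby[0].Name }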

This is an unusual approach to search, as the user is passive; instead, the system performs searches on the user’s behalf. However, to make the search results relevant, one has to add contextual constraints describing when, where and to whom a piece of information is relevant. When all constraints are met, the information should be relevant; if not, the system lacks some crucial contextual constraint.

When search is automated, the importance of relevant search results increases, and the more you know of the user’s world, the better you can adjust the results. Traditional search can also benefit from contextual information: it can be used as a filter, removing search results that are irrelevant in the current context, or as part of the relevance model, reordering results according to context. Hence, whereas automatic information delivery is probably undesirable for many types of information, contextual constraints can still be put to good use!

The people who tested my application created 25% of their reminders as groups of locations and found it useful as it helped them find places they weren’t aware of, facilitating opportunistic behavior. The course and velocity information reduced the number of false-positive information deliveries. Overall, the system worked well as a niche product.

Why Web Search is Like a Store Clerk

When someone uses the search function on your website – your web search – it tells you two things. First, they have a specific need, expressed by their search query. Second, and more importantly, they want you to fulfill that need. If users didn’t care where the service was delivered from, they would have gone straight to Google. Hence, the use of your search function signals trust in your capabilities. Even if the majority of your website visitors don’t use the search function, you know that the ones who do are committed to you. Imagine you are working as a clerk in a store: a customer who comes up and asks you something is probably more interested in doing business with you than the ones just browsing the goods.

This trust, however, can easily turn into frustration and bad will if the web search results are poor and users don’t find what they are looking for. Continuing the store analogy, this is much like looking for a product, wandering around for a few minutes, finally deciding to ask a clerk, and getting the answer: “If it’s not on the shelf, we don’t have it.” I would certainly leave the store, and the same applies to a website: if users fail when browsing and searching, they will probably leave your site. The consequence is that you might antagonize loyal customers or lose an easy sale. So how do you recognize a bad search function? A good way to start is to look at common search queries and try searching for them yourself. Then ask a few basic questions such as:

  • Does the sorting of the search results make sense?
  • Is it possible to decide which result is interesting based on the information in the result presentation?
  • Is there any possibility to continue navigating the results if the top hits are not what you are looking for?

Answering these questions yourself will tell you a lot about how your web search is performing. The first step towards a good user experience is to know where your challenges are; then you can start making changes to fix the issues you have found and make your customers happier. After all, who wants to be the snarky store clerk?

Google Instant – Can a Search Engine Predict What We Want?

On September 8th, Google released a new feature for their search engine: Google Instant. If you haven’t seen it yet, there is an introduction on Youtube that is worth spending 1:41 minutes on.

Simply put, Google Instant is a new way of displaying results that helps users find information faster. As you type, results are presented in the background; in most cases, two or three characters are enough for the results you expect to appear right in front of you.


The Swedish site Prisjakt has been using this approach for years, helping users achieve better precision in their searches.

At Google, you were previously guided by “query suggestions”, i.e. suggestions of what others had searched for before – a function also used by other search engines such as Bing (where it is called Type Ahead). Google Instant takes this one step further.

Looking at what the blog community has to say about the new feature, it seems to split users into two groups: you either hate it or love it.

So, what are the consequences? From an end-user perspective, we will most likely stop typing if something interesting appears and draws our attention. The result? The search results shown at the very top will generate more traffic, results will become more personalized over time, and we will most probably get better at phrasing our queries.

From an advertising perspective, this will most likely affect the way people work with search engine optimization. Some experts, like Steve Rubel, claim Google Instant will make SEO irrelevant, whereas others, like Matt Cutts, think it will change people’s behavior in a positive way over time – and explain why.

What Google is doing here is something they do constantly: change the way we consume information. So what is the next step?

CNN summarizes what Eric Schmidt, the CEO of Google, says:

“The next step of search is doing this automatically. When I walk down the street, I want my smartphone to be doing searches constantly: ‘Did you know … ?’ ‘Did you know … ?’ ‘Did you know … ?’ ‘Did you know … ?’ ”

Schmidt said at the IFA consumer electronics event in Berlin, Germany, this week.

“This notion of autonomous search — to tell me things I didn’t know but am probably interested in — is the next great stage, in my view, of search.”

Do you agree? Can we predict what the users want from search? Is this the sort of functionality that we want to use on the web and behind the firewall?