SharePoint optimized – part 2, Search power

Last week I wrote a post about how I fixed CSOM code in order to speed up the whole query execution. The final result was not that bad, though still not good enough:

  • 0.8s for fetching ~500 subsites
  • 6.5s for recursively fetching ~900 subsites across the whole subsite hierarchy

My aim is to fetch the whole subsite hierarchy within a time that is reasonable to wait (1-2s total).

In this post I’ll show you how to achieve it – we can fetch the whole subsite hierarchy in less than 2s!

When CSOM is not enough

In part 1 of this post we saw that CSOM has limitations that are hard to work around – one of them is getting all subsites of a site collection.

The main reason for the problem is that we need to make a separate query call for every subsite. In other words, if we need to get information about 100 subsites, we need to make 100 query calls. Making the CSOM calls asynchronously and limiting the loaded properties (i.e. only Title and URL) helps a lot, but still leaves us with a total time above our requirements.

There is also a problem related to hierarchy crawling – imagine you have the following structure:

  • Root site
    • Subsite A
      • Subsite A1

If you don’t have access to subsite A, you won’t get any information about subsite A1 – regardless of whether you have access to A1 or not. This is because you need access to subsite A in order to get information about its subsites. Of course, there is always the question “why would someone break permissions and give me access to A1 if I don’t have access to A?”, but that’s another story. Anyway – recursive hierarchy crawling with CSOM will not give us information about subsites of a parent that we don’t have access to, even if we have access to those subsites.

There are multiple ways of solving this – let’s briefly talk about the options we have.

Candidates

Local caching

We could keep the current solution (CSOM, that is) and try to improve it with some caching – put all subsites in the user’s local cache available in the browser. That would allow us to avoid fetching all sites every time a user opens the home page. The problem is that from time to time, when a user visits the home page for the first time in a session, he/she will have to fill that cache… and this will still take around ~6-7 seconds.
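
Just to illustrate the idea, here is a minimal sketch of such a browser-side cache, based on sessionStorage; fetchAllSubsitesViaCsom is a hypothetical stand-in for the CSOM code from part 1, not a real API:

// Hypothetical sketch of a session-scoped cache; fetchAllSubsitesViaCsom
// stands in for the CSOM fetching code from part 1, it is not a real API.
declare function fetchAllSubsitesViaCsom(): Promise<ISubsite[]>;

interface ISubsite {
  title: string;
  url: string;
}

const CACHE_KEY: string = 'subsiteHierarchy';

async function getSubsitesCached(): Promise<ISubsite[]> {
  const cached: string | null = sessionStorage.getItem(CACHE_KEY);
  if (cached) {
    // Cache hit: no CSOM calls at all.
    return JSON.parse(cached) as ISubsite[];
  }
  // Cache miss: the first visit in a session still pays the full ~6-7s.
  const subsites: ISubsite[] = await fetchAllSubsitesViaCsom();
  sessionStorage.setItem(CACHE_KEY, JSON.stringify(subsites));
  return subsites;
}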

Caching in a list

This is a very common solution that I have seen in many production environments: creating a list that contains all subsites as rows, and fetching the information from there using CSOM. This may work in terms of performance, because CSOM then has to retrieve information from only one list (so only 1 query call). However, there is one thing that needs to be taken into consideration – is it ok to maintain such an additional content source? Originally CSOM, even if it was slow, didn’t need any additional maintenance when there was a new customer or project. When a new subsite was created, it was picked up by the code on the fly. If we swap to “list caching”, we will need to keep the list updated, manually or via code, with newly created customers or projects. And what about deletion or synchronization? If we can accept some compromises, then using such a list may be a good idea. But if not…

External DB

This is basically the same concept as “Caching in a list”, just with a different architecture. We can store site information in an Azure DB, expose it via BCS using an external content type, import it as an external list in SPO, and then fetch it using CSOM (if rebuilding the code is still on the table, I would go for PnP-JS-Core). Syncing sites to the DB may be driven by a PowerShell script.

As a developer I would go for it – connecting different pieces of technology and making them talk to each other is great fun.

As a consultant – it’s a bit overcomplicated ;).

SharePoint Search

We could let SharePoint Search collect the data we need and then just use its results. From that point, the only things left are fetching the results and rendering them. Hm, sounds good? Because it is! It’s quick, it doesn’t need any additional handling when a new subsite is created, and there is also one extra hidden advantage that I’ll describe in a bit.

Do it with Search!

Search driven architecture has one big advantage (besides those I described in this post) – it uses SharePoint’s built-in capabilities, so we don’t reinvent the wheel. In my case SharePoint Search is a perfect candidate for a data source, since it already contains all the data, it’s very fast, and the results can be easily and precisely prepared. I used a SharePoint Search Result Source for that, because it allows limiting search results to the proper data set and it comes with an easy but powerful query builder with a test capability. Great stuff out-of-the-box. I used the query below in the result source (contentclass:STS_Web limits results to sites/webs, and the Path restriction keeps only subsites under /customers):

{searchTerms} contentclass:STS_Web Path:{Site.URL}/customers*

Fetching data from a result source can easily be done through the REST API, which allows making a call to the search endpoint with a query and a result source id. Thanks to that we can query the result source for all items (query=”*”) and be sure that only results from the result source will be returned. Like below:

/_api/search/query?querytext='*'&sourceid='<Your Result Source GUID>'

In our case those will be customer and project sites. And now the greatest part (that extra hidden advantage I’ve mentioned) – in order to narrow/extend/sort the data set you don’t need a developer, because you can easily change that in the SharePoint platform by editing the Result Source you’ve created.
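
To make that concrete, here is a minimal sketch of such a call from an SPFx web part, assuming the SPHttpClient from @microsoft/sp-http; the result source GUID is a placeholder and querySitesResultSource is just an illustrative name:

import { SPHttpClient, SPHttpClientResponse } from '@microsoft/sp-http';
import { WebPartContext } from '@microsoft/sp-webpart-base';

// Placeholder - put your own Result Source GUID here.
const RESULT_SOURCE_ID: string = '00000000-0000-0000-0000-000000000000';

export async function querySitesResultSource(context: WebPartContext): Promise<any[]> {
  const url: string =
    `${context.pageContext.web.absoluteUrl}/_api/search/query` +
    `?querytext='*'` +
    `&sourceid='${RESULT_SOURCE_ID}'` +
    `&selectproperties='Title,Path'` +
    `&rowlimit=500`;

  const response: SPHttpClientResponse =
    await context.spHttpClient.get(url, SPHttpClient.configurations.v1);
  const json: any = await response.json();

  // Search REST results come back as rows of Key/Value cells; depending on
  // the OData metadata level you may need to unwrap a d.query envelope first.
  const rows: any[] = json.PrimaryQueryResult.RelevantResults.Table.Rows;
  return rows.map((row: any) => {
    const item: any = {};
    row.Cells.forEach((cell: any) => { item[cell.Key] = cell.Value; });
    return item;
  });
}

Passing selectproperties keeps the payload small, although search will still add some system properties to every row (more on that in the performance section).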

Retrieving more than 500 items from search

There is one thing that needs to be taken into consideration when working with the SP Search REST API: the maximum number of items in the results is limited to 500. So in order to retrieve 1000 items we need to make 2 calls of 500 items each. Even though it’s not a big deal to implement, I’m going to put my helper class on GitHub so anyone can save an hour or two. I’ll write a post about it 🙂
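
As a sketch of the paging idea: startrow and rowlimit are the real query-string parameters of /_api/search/query, while fetchPage is a hypothetical wrapper around a call like the one above, extended with &startrow=<n>:

// Sketch: page through search results 500 at a time using the
// startrow/rowlimit parameters of /_api/search/query.
const PAGE_SIZE: number = 500;

export async function fetchAllRows(
  fetchPage: (startRow: number, rowLimit: number) => Promise<any[]>
): Promise<any[]> {
  const all: any[] = [];
  let startRow: number = 0;
  while (true) {
    const page: any[] = await fetchPage(startRow, PAGE_SIZE);
    all.push(...page);
    if (page.length < PAGE_SIZE) {
      break; // a short page means we reached the end of the results
    }
    startRow += PAGE_SIZE;
  }
  return all;
}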

Data parsing and rendering

SP Search returns results in a flat structure. So if we want to group the results or display them in a tree hierarchy, we need to assemble the data and, depending on the visualization tool used, prepare it in a specific format. I used SharePoint Framework with React and TypeScript, which is a great framework for quick development, testing and deployment regardless of the OS you use. Moreover, the way React manages rendering is very efficient and, I would say, elegant. To display the results in a tree view I used the react-treebeard component, which does not support TypeScript in its current version, however it was not very challenging to use. The final result is presented below:

As you can see, it allows searching not only in customers but also in projects. In the example above you can see that if the search phrase is found in a customer, the component shows all of that customer’s projects. If the search phrase is found in a project, it shows only that project under its customer. I’ve also implemented a simple toggle button (using the cool Fabric UI React toggle button, with OOTB SP design) that allows the user to choose whether to search in customers and projects, or only in customers.
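
For reference, here is a stripped-down sketch of that component, assuming react-treebeard’s default node shape ({ name, toggled, children }) and the office-ui-fabric-react Toggle (prop names differ a bit between Fabric versions, so treat this as illustrative):

import * as React from 'react';
import { Toggle } from 'office-ui-fabric-react';

// react-treebeard ships no TypeScript typings, so pull it in untyped.
const { Treebeard } = require('react-treebeard');

export interface ITreeNode {
  name: string;
  toggled?: boolean;
  children?: ITreeNode[];
}

export interface ISitesTreeProps {
  data: ITreeNode;
  onScopeChange: (searchProjectsToo: boolean) => void;
}

export class SitesTree extends React.Component<ISitesTreeProps> {
  public render(): React.ReactElement<ISitesTreeProps> {
    return (
      <div>
        {/* Lets the user choose: customers only, or customers and projects. */}
        <Toggle
          label='Search in projects too'
          defaultChecked={true}
          onChanged={(checked: boolean) => this.props.onScopeChange(checked)}
        />
        {/* Expand/collapse state is kept on the nodes themselves. */}
        <Treebeard
          data={this.props.data}
          onToggle={(node: ITreeNode, toggled: boolean) => {
            node.toggled = toggled;
            this.forceUpdate();
          }}
        />
      </div>
    );
  }
}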

Performance

I gathered some simple statistics, shown below:

  • Data fetching time – 1023ms (averaged – 1367.5ms)! That is making the 2 search REST API calls. What an awesome drop! From 10s to 1.4s! And remember that we fetched all items (not only customers). Moreover – from my observations, each search REST API request took about the same amount of time. Too small a data set for firm conclusions, but it gives hope that the time consumed by calls towards the result source scales linearly.
  • Extracting – 3.75ms. That is picking the proper properties out of the search results (even though we passed selected properties in the call, the results still contain extra properties). Anyway – negligible.
  • Extracting customers from the dataset – 2.2ms. Negligible.
  • Attaching projects to customers – 98ms. As I said, search results come in a flat structure, so we need to group them based on URL depth, i.e. https:///Contoso will be a customer and https:///Contoso/ProjectA will be a project. I created a special array of customers where each customer’s projects are kept as a children array (see the sketch after this list).
  • Mapping to tree model – 102ms. I tried out different tree viewers, so after grouping my results in the previous step I needed to rename some properties (i.e. name instead of title) and strip the TypeScript typing (the viewer takes a simple object array). There is definitely room for improvement here (phew! fun fun fun!) by merging “attaching projects to customers” and “mapping to tree model” into one step.
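
Here is a minimal sketch of that grouping, assuming every extracted result carries a Path property and that customers live directly under a /customers root (buildCustomerTree and the exact property names are illustrative). Since the output nodes already use name/children, it also sketches the merge of the last two steps mentioned above:

interface ISearchItem {
  Title: string;
  Path: string;
}

interface ITreeNode {
  name: string;
  children?: ITreeNode[];
}

// Group flat search results by URL depth: one segment below
// /customers is a customer, two segments below is a project.
export function buildCustomerTree(items: ISearchItem[], customersRootUrl: string): ITreeNode[] {
  const customers: { [url: string]: ITreeNode } = {};

  // First pass: create customer nodes (depth 1).
  for (const item of items) {
    const relative: string = item.Path.substring(customersRootUrl.length).replace(/^\//, '');
    if (relative && relative.indexOf('/') === -1) {
      customers[item.Path.toLowerCase()] = { name: item.Title, children: [] };
    }
  }

  // Second pass: attach project nodes (depth 2) to their customer.
  for (const item of items) {
    const relative: string = item.Path.substring(customersRootUrl.length).replace(/^\//, '');
    const segments: string[] = relative.split('/').filter(s => s.length > 0);
    if (segments.length === 2) {
      const parentUrl: string = `${customersRootUrl}/${segments[0]}`.toLowerCase();
      const parent: ITreeNode | undefined = customers[parentUrl];
      if (parent) {
        parent.children!.push({ name: item.Title });
      }
    }
  }

  return Object.keys(customers).map(key => customers[key]);
}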

And that’s it! As you can see, all component operations complete in almost 1.5s total!

Conclusion

As you can see, Search Driven Architecture can “do performance” too. However, please remember one thing – and I’ll use a kitchen analogy for that, one of my favorites:

There are different types of knives: chef’s knife, utility knife, paring knife, butcher knife, bread knife, carving knife, spatula, meat cleaver etc.
And you can use any of these to cut cheese.

There are different ways of achieving the same result. Every solution has its pros and cons, and may require different tools and skills. Choosing the best one always comes down to compromises.

So what are the pros and cons of using search as your data source?

Search Pros:

  • it’s fast
  • it can fetch information from different subsites or site collections
  • great for aggregation – with one query you can get different types of information (e.g. sites, items, files)
  • it can fetch information about items that you have access to even if you don’t have access to their parent (although that can be a con in some scenarios)

Search Cons:

  • results freshness depends on crawling
  • if you want to get additional information from columns (this does not apply to sites, of course) you must map them to managed properties in search and make sure they are retrievable
  • search returns a maximum of 500 items per query
  • fetching 500+ items from one list is not as efficient as other approaches

Final thought

To summarize when to use Search Driven Architecture, I would say:

the more your data is spread out between containers and/or the more different data types you want to get in one query,

the more likely it is that you should use search instead of anything else.


I hope this post will help you when considering search as your data source.

If you’re interested in the code I used in the described solution, feel free to contact me. It needs a small clean-up, but that’s absolutely doable 🙂
