Quality assuring an enterprise search solution is challenging, yet important. The challenge is to continuously follow up on the quality of the solution during implementation, but also after release, when the solution is in production and operated by an operations team. Testing is important, but it is also costly – unless it can be automated.
So what kind of testing is specific to a search application? And which parts of it can be automated?
The whole idea of Enterprise Search is to provide the right information to the right people at the right time. The information made findable is normally stored in many different information systems, and the information in these systems is constantly changing. In the end, every enterprise search solution operates in a context where the requirements of the end-users and the available content change on a daily basis. In other words, assuring the quality of enterprise search is about assuring the quality of the information and the way that information is accessed by and delivered to the end-users.
During our engagements over the years, we have established routines and developed tools for automated testing of enterprise search. What we specifically want to track in an automated fashion is:
- Completeness
- Freshness
- Access restrictions
- Metadata quality
- Performance
- Relevance
Allow me to take a few moments to describe what each of these means.
Completeness testing
Completeness tests aim to make sure that the search index is complete – that all information objects (such as web pages and documents) that are supposed to be searchable really are searchable. In addition, completeness testing verifies that the correct parts of the information objects are indexed for retrieval, e.g. all pages in a multi-page document, as well as titles and other searchable metadata. It is equally important to monitor that information that should not be searchable is indeed not indexed, e.g. headers and footers of web pages.
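To make this concrete, here is a minimal sketch of such a test in Python. The endpoint, its parameters, the response shape and the sample documents are all hypothetical – adapt them to your search engine's API and your own content:

```python
# Minimal completeness check: every entry in a reference list of known
# documents must be retrievable from the index, and boilerplate text that
# should be excluded (e.g. a footer phrase) must not match anything.
# SEARCH_URL, its parameters and the response shape are hypothetical --
# adapt them to your search engine's API.
import requests

SEARCH_URL = "https://search.example.com/api/search"  # hypothetical endpoint

KNOWN_DOCUMENTS = [
    # (query that uniquely identifies the object, expected document id)
    ('"Travel expense policy 2024"', "dms://policies/4711"),
]
EXCLUDED_PHRASES = ['"All rights reserved. Cookie settings"']  # footer text


def search_ids(query, rows=100):
    """Return the ids of the documents matching a query."""
    resp = requests.get(SEARCH_URL, params={"q": query, "rows": rows}, timeout=10)
    resp.raise_for_status()
    return [hit["id"] for hit in resp.json()["hits"]]


def test_known_documents_are_searchable():
    for query, doc_id in KNOWN_DOCUMENTS:
        assert doc_id in search_ids(query), f"{doc_id} not found for {query!r}"


def test_excluded_content_is_not_searchable():
    for phrase in EXCLUDED_PHRASES:
        assert search_ids(phrase) == [], f"boilerplate {phrase!r} is searchable"
```

The reference list of known documents is the valuable part here: it encodes, per source system, what "complete" actually means, and it should grow as new sources are connected.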
Freshness testing
Freshness tests aim to make sure that the search index is up to date, i.e. that new content added to a source (such as a document management system) becomes searchable, that deleted content is automatically removed from the search index and that changes to existing content are reflected in the index – all in due time.
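One practical pattern is to plant a uniquely named "canary" document in a source system and measure how long it takes to show up in the index. A minimal sketch, assuming hypothetical create_canary()/delete_canary() hooks into the source system and the search_ids() helper from the completeness example above:

```python
# Freshness probe: plant a uniquely named "canary" document in a source
# system and poll the index until it becomes searchable, failing if the
# agreed indexing SLA is exceeded. create_canary()/delete_canary() are
# hypothetical hooks into your source system; search_ids() is the helper
# from the completeness example.
import time
import uuid

INDEXING_SLA_SECONDS = 15 * 60  # e.g. new content searchable within 15 minutes


def wait_until(predicate, timeout, interval=30):
    """Poll predicate() every `interval` seconds until true or timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if predicate():
            return True
        time.sleep(interval)
    return False


def test_new_content_becomes_searchable():
    marker = f"freshness-canary-{uuid.uuid4()}"  # unique, improbable query term
    doc_id = create_canary(title=marker)         # add a document to the source
    try:
        found = wait_until(lambda: doc_id in search_ids(marker),
                           timeout=INDEXING_SLA_SECONDS)
        assert found, f"canary not searchable within {INDEXING_SLA_SECONDS}s"
    finally:
        delete_canary(doc_id)                    # clean up the source system
```

The same polling pattern covers deletions (delete the canary and assert it disappears within the SLA) and updates (change the title and assert the new term matches).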
Testing access restrictions
If an enterprise search solution provides access to access-controlled information, it is of utmost importance to be able to prove that security is never compromised. Testing access restrictions aims to do precisely that. What one needs to monitor is that existing document-level security works, i.e. that people who should have access to an information object really have access, and that people who shouldn't have access, don't. The tricky part is to monitor that a change in access privileges, for instance in Active Directory or in the access restrictions (the ACL) of a particular document, is propagated to the search index in due time as well.
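A minimal sketch of such a test, assuming the same hypothetical JSON endpoint as before and that the engine security-trims results based on the credentials passed with each query (how test users are impersonated is engine-specific):

```python
# Access-restriction check: the same query is issued on behalf of two test
# users, and a restricted document must only appear for the authorised one.
# The endpoint, test users and document id are hypothetical; per-request
# basic auth stands in for whatever impersonation mechanism your engine uses.
import requests

SEARCH_URL = "https://search.example.com/api/search"  # hypothetical endpoint

RESTRICTED_DOC = "dms://hr/salary-review-2024"
QUERY = '"salary review 2024"'


def search_ids_as(user, password, query):
    """Run a query on behalf of a given test user and return matching ids."""
    resp = requests.get(SEARCH_URL, params={"q": query},
                        auth=(user, password), timeout=10)
    resp.raise_for_status()
    return [hit["id"] for hit in resp.json()["hits"]]


def test_authorised_user_sees_restricted_document():
    assert RESTRICTED_DOC in search_ids_as("hr_user", "secret", QUERY)


def test_unauthorised_user_does_not_see_restricted_document():
    assert RESTRICTED_DOC not in search_ids_as("intern_user", "secret", QUERY)
```

To cover the tricky propagation case, the same pair of assertions can be re-run after a scripted ACL change, combined with the polling pattern from the freshness example.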
Testing metadata quality
Each information object in the search index contains a set of fields with metadata and text, e.g. a title, the text body, an author, a timestamp with the last modification date, information on file format, a keywords field and many more.
In an enterprise search setting, the many different information models implemented in the source systems need to be harmonized into one common domain model (schema/index profile/information model) in the search index. This means that information about the creator of an information object in one system and the publisher of an information object in another can be stored in a common author metadata field in the search index, in a common, defined format such as Firstname Lastname, regardless of how it is formatted in the source system. Unless you have a common model in the index, you can't provide features like cross-system filtering with facets.
So how do you make sure that the metadata in the search index stays in good shape? This is the aim of metadata testing. The test cases for metadata testing need to check that the metadata in the search index conforms to the defined domain model and formatting, even as the underlying content changes in the source systems.
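A minimal sketch of such a conformance check in Python – the field names, the formats and the sample_documents fixture (assumed to return a random sample of documents fetched from the index) are all illustrative assumptions:

```python
# Metadata conformance check: sample documents from the index and verify
# that mandatory fields exist and match the formats defined in the domain
# model (e.g. author as "Firstname Lastname", ISO 8601 timestamps).
# Field names and formats below are illustrative assumptions.
import re

FIELD_RULES = {
    "title":    re.compile(r"\S"),                                 # non-empty
    "author":   re.compile(r"^[A-ZÅÄÖ][\w'-]+ [A-ZÅÄÖ][\w'-]+$"),  # Firstname Lastname
    "modified": re.compile(r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}"),  # ISO 8601
}


def validate_document(doc):
    """Return a list of (field, problem) tuples for one indexed document."""
    problems = []
    for field, pattern in FIELD_RULES.items():
        value = doc.get(field)
        if value is None:
            problems.append((field, "missing"))
        elif not pattern.match(str(value)):
            problems.append((field, f"bad format: {value!r}"))
    return problems


def test_metadata_conforms_to_domain_model(sample_documents):
    # sample_documents is an assumed fixture returning a random sample
    # of documents fetched from the search index.
    failures = {d["id"]: validate_document(d) for d in sample_documents}
    failures = {k: v for k, v in failures.items() if v}
    assert not failures, f"metadata violations: {failures}"
```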
Performance testing
Performance testing is probably the easiest type of test to create and run. In the end you will have a threshold, or pain limit, in milliseconds within which a query to the enterprise search solution must return an answer, even at peak times with high query loads. Normally you will also monitor the RAM and processor usage of the software components of your solution, so that automatic alerts can be sent to the maintenance team if the hardware is under too much pressure.
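As a sketch, a lightweight latency check could look as follows – the endpoint, queries and threshold are hypothetical, and a check like this complements rather than replaces a proper load test under realistic concurrency with a tool such as JMeter or Locust:

```python
# Performance smoke test: fire a set of representative queries and require
# that the 95th-percentile response time stays under an agreed threshold.
# Endpoint, queries and threshold are hypothetical placeholders.
import statistics
import time

import requests

SEARCH_URL = "https://search.example.com/api/search"  # hypothetical endpoint
THRESHOLD_MS = 800                                    # agreed pain limit
QUERIES = ["invoice", "project plan", "travel policy", "annual report"]


def timed_query(query):
    """Run one query and return its response time in milliseconds."""
    start = time.perf_counter()
    requests.get(SEARCH_URL, params={"q": query}, timeout=10).raise_for_status()
    return (time.perf_counter() - start) * 1000


def test_p95_latency_under_threshold():
    samples = [timed_query(q) for q in QUERIES for _ in range(20)]
    p95 = statistics.quantiles(samples, n=20)[18]  # 95th percentile
    assert p95 < THRESHOLD_MS, f"p95 {p95:.0f} ms exceeds {THRESHOLD_MS} ms"
```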
Relevance testing
Quality assuring the relevance model of an enterprise search solution is tricky, largely because relevance in a result set is to some extent subjective. However, when implementing search, one does need to define a relevance model that builds on a set of business rules for what type of content is to be deemed more important than others. For example, when making documents in a document management system searchable, a typical business rule would be that documents tagged with Status=Approved must always be deemed more important than documents with any other status (such as Preliminary or Deprecated). Another typical rule is that a document where a query term occurs in the title or in the keywords metadata field is most likely more important than documents where the term is only found in the text body.
What it all boils down to is the definition of the business rules for relevance. Once you have defined the rules that govern how results are to be ranked, you can also create test cases, i.e. associate query terms with the information objects that must be returned as top results for those terms.
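Such cases translate directly into automated tests. A minimal sketch, reusing the hypothetical search_ids() helper from the completeness example – the query/document pairs are made up for illustration:

```python
# Relevance regression cases: each business rule is encoded as a query plus
# a document that must appear among the top results. The pairs below are
# made-up examples; real cases come from your own relevance test plan.
# search_ids() is the helper from the completeness example.

TOP_N = 3
RELEVANCE_CASES = [
    # (query, document that must rank in the top N)
    ("expense report", "dms://finance/expense-report-template"),
    ("vacation policy", "dms://hr/vacation-policy-approved"),
]


def test_expected_documents_rank_high():
    failures = []
    for query, doc_id in RELEVANCE_CASES:
        top = search_ids(query, rows=TOP_N)
        if doc_id not in top:
            failures.append((query, doc_id, top))
    assert not failures, f"relevance regressions: {failures}"
```

Collecting all failures before asserting, rather than stopping at the first one, gives the team a full picture of how far the relevance model has drifted after a change.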
Automating it all
Once you have defined your test cases for all of the above-mentioned types of tests in a test plan, you are ready to automate, i.e. enter the test plan into a test automation framework. The beauty of it all is that you can automate regression testing during the implementation phase of an enterprise search solution, i.e. continuously verify that new development does not break parts of the solution that worked as intended before. This is particularly important when you add new information sources to your enterprise search solution, since there is a high risk that the relevance model that worked fine yesterday all of a sudden gets out of order. In addition, after the release of the enterprise search solution, the test automation framework will assist the operations team in monitoring that the solution behaves as expected even after the implementation team has left the building. All in all, this leads to continuously good quality of the solution while lowering the cost of monitoring.
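As a final illustration, here is one way such a test plan could be wired into an automation framework – a minimal sketch using pytest, where the cases live as data (in practice typically a YAML file under version control). The search_ids() helper and all cases are the hypothetical ones from the earlier examples:

```python
# Data-driven test plan: the cases are plain data, and pytest parametrizes
# the test functions over them, so the same suite runs in CI during
# implementation and as a scheduled monitoring job in production.
# search_ids() is the hypothetical helper from the completeness example.
import pytest

TEST_PLAN = {
    "completeness": [('"Travel expense policy 2024"', "dms://policies/4711")],
    "relevance":    [("expense report", "dms://finance/expense-report-template")],
}


@pytest.mark.parametrize("query,doc_id", TEST_PLAN["completeness"])
def test_completeness(query, doc_id):
    assert doc_id in search_ids(query)


@pytest.mark.parametrize("query,doc_id", TEST_PLAN["relevance"])
def test_relevance_top3(query, doc_id):
    assert doc_id in search_ids(query, rows=3)
```

The point of keeping the plan as data is that connecting a new source or adding a new business rule only means adding rows to the plan, not writing new test code.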