Semantic Annotation (how to make stuff findable, and more)

With semantic annotation, your customers and employees can get the right information to make better decisions.

Why automatic Semantic Annotation?  

Empower customers & employees with the right information 

Moving data and services to the Cloud has many advantages, including more flexible work practices. COVID-19 has boosted this trend, and many organisations are benefiting from employees also being able to work from home. If employees are to be treated as customers themselves, they should expect a quality Search service. Semantic Annotation can help with this.

For many employees, finding information is still a problem. Poor Search does little to encourage people to use it, let alone to improve their decision-making, knowledge sharing, curiosity or innovation. And let’s not forget: better search also means less duplication.

Making data and content “smarter” makes it more findable. 


Data and content are rarely structured with good metadata or tagging (annotation) unless they are being used to sell something or are deemed business-critical. Generally, when we create data or content, we simply save it to storage.

We could tag manually, but research shows that we’re not good at this. Even if we bother to tag, we only do it from our own perspective, and even then, we do it inconsistently over time.  

Alternatively, we could let AI do the work: give data and content structure, meaning and context, automatically and consistently, so that it can be found.

The main driver for automatic Semantic Annotation? About 70-80% of the average organisation’s data is unstructured (textual). On top of this, even databases have textual labels and headings.

How to create automatic Semantic Annotation?  

Use stored knowledge (from an Enterprise Knowledge Graph) 

When thinking about the long-term data health of an organisation, the most effective and sustainable way to set up semantic annotation is to create your own Enterprise Knowledge Graph, which can then serve multiple use-case scenarios, not just annotation.

In an Enterprise Knowledge Graph (EKG), an organisation can store its key knowledge: taxonomies, thesauri, ontologies and business rules. Tooling now exists that lets business owners and domain experts add their knowledge collaboratively, without having to know about the underlying Semantic Web technologies. Those technologies are what allow your machines and applications to read this knowledge too, before making their decisions.
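To make “machine-readable knowledge” a little more concrete, here is a minimal sketch, assuming a toy in-memory store rather than a real graph database. Every fact is a (subject, predicate, object) triple, the same shape RDF uses. The concept identifiers (`ex:Laptop`, `ex:managedBy` and so on) are hypothetical, and the predicate names are only borrowed loosely from SKOS.

```python
from collections import defaultdict

class TripleStore:
    """Toy in-memory store: every fact is a (subject, predicate, object) triple, as in RDF."""

    def __init__(self):
        self.triples = set()
        self._by_subject = defaultdict(set)  # subject -> {(predicate, object)}

    def add(self, s, p, o):
        self.triples.add((s, p, o))
        self._by_subject[s].add((p, o))

    def objects(self, s, p):
        """All objects o for which (s, p, o) is stored, sorted for stable output."""
        return sorted(o for (pred, o) in self._by_subject[s] if pred == p)

ekg = TripleStore()
# Taxonomy/thesaurus knowledge; predicate names borrowed loosely from SKOS.
ekg.add("ex:Laptop", "skos:prefLabel", "laptop")
ekg.add("ex:Laptop", "skos:altLabel", "notebook")
ekg.add("ex:Laptop", "skos:broader", "ex:Computer")
# Ontology-style relation to another concept (hypothetical).
ekg.add("ex:Laptop", "ex:managedBy", "ex:ITDepartment")

print(ekg.objects("ex:Laptop", "skos:altLabel"))  # ['notebook']
print(ekg.objects("ex:Laptop", "skos:broader"))   # ['ex:Computer']
```

Because the knowledge is stored as data rather than buried in application code, any program that understands the triple shape can query it.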

Your EKG is best created using both human input and AI (NLP and ML: Natural Language Processing and Machine Learning). The AI part exploits your existing data plus any industry-standard terminologies or ontologies that fit your business needs (you may just want to link to them). While the automation of EKG creation is set to improve, EKG robustness can already be tested by running corpus analysis over your data to find any key business concepts that are missing.
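A very simple form of such a corpus analysis can be sketched as follows, assuming plain frequency counting of words and word pairs; real tooling would use proper NLP (lemmatisation, phrase extraction, statistical term weighting). The function name `candidate_gaps`, the stopword list and the sample documents are all hypothetical.

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "for", "is", "on", "with", "our", "we"}

def candidate_gaps(documents, known_labels, min_count=2):
    """Frequent unigrams/bigrams in the corpus that no EKG label covers, most frequent first."""
    known = {label.casefold() for label in known_labels}
    counts = Counter()
    for doc in documents:
        tokens = [t for t in re.findall(r"[a-z0-9]+", doc.lower()) if t not in STOPWORDS]
        counts.update(tokens)
        # Bigrams catch multi-word business terms such as "remote onboarding".
        counts.update(" ".join(pair) for pair in zip(tokens, tokens[1:]))
    return [(term, n) for term, n in counts.most_common()
            if n >= min_count and term not in known]

docs = [
    "Remote onboarding checklist for new staff",
    "Our remote onboarding process and the laptop policy",
    "Laptop policy update",
]
print(candidate_gaps(docs, known_labels=["laptop", "laptop policy"]))
```

Here “remote onboarding” surfaces as a recurring term that the graph does not yet cover, which a domain expert could then review and add as a concept.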

How does automatic Semantic Annotation work?  

Smart processing 

Despite improvements in search features and functionality, Search in the digital workplace may still suffer from the long tail: the less frequent queries that are harder to cater for. With an EKG-based annotation process, the quality of search results can improve significantly. Processing first extracts concepts from the resource asset to be annotated (Named Entity Recognition). It then finds all the relationships that link these concepts to other concepts within the graph. In doing so, an algorithm calculates the aboutness of the asset before the appropriate annotations are added. These annotations feed an improved index. The process essentially makes your data assets “smarter” and therefore more findable.
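The extract, relate, score and annotate steps can be illustrated with the minimal sketch below. It makes two simplifying assumptions: dictionary matching of concept labels stands in for real Named Entity Recognition, and the aboutness score (own mentions plus a weighted share of mentions of related concepts) is an invented formula, not the algorithm any particular product uses. The mini-graph, weights and threshold are hypothetical.

```python
import re
from collections import Counter

# Hypothetical mini-EKG: concept -> labels, plus "related" links between concepts.
LABELS = {
    "ex:Vaccination": ["vaccination", "vaccine", "immunisation"],
    "ex:Influenza": ["influenza", "flu"],
    "ex:Nursing": ["nurse", "nursing"],
}
RELATED = {
    "ex:Vaccination": {"ex:Influenza"},
    "ex:Influenza": {"ex:Vaccination"},
    "ex:Nursing": set(),
}

def extract_concepts(text):
    """Dictionary-based stand-in for NER: count whole-word label matches per concept."""
    counts = Counter()
    lowered = text.lower()
    for concept, labels in LABELS.items():
        for label in labels:
            counts[concept] += len(re.findall(rf"\b{re.escape(label)}\b", lowered))
    return {c: n for c, n in counts.items() if n > 0}

def aboutness(text, link_weight=0.5):
    """Score = own mentions + link_weight * mentions of graph-related concepts in the same text."""
    found = extract_concepts(text)
    scores = {}
    for concept, n in found.items():
        support = sum(found.get(other, 0) for other in RELATED[concept])
        scores[concept] = n + link_weight * support
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))

def annotate(text, threshold=1.5):
    """Tags = concepts whose aboutness clears an (arbitrary) threshold."""
    return [c for c, s in aboutness(text).items() if s >= threshold]

text = "The flu vaccine is offered to every nurse. Book your vaccination before influenza season."
print(annotate(text))  # ['ex:Vaccination', 'ex:Influenza']
```

Nursing is mentioned but has no supporting neighbours in the text, so it falls below the threshold; the two mutually related concepts reinforce each other and become the asset’s annotations, ready to be written to the index.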

Processing also includes shadow concept annotation: adding a concept tag where the concept itself does not appear in the resource asset but perfectly describes it (thanks to known concept relationships in the graph). Similarly, the quality of retrieved search results increases because the annotation process reduces ambiguity about the meaning of certain concepts. For example, it differentiates between Apple (the brand) and apple (the fruit) by virtue of their connections to other concepts, i.e. it can answer: are we talking tech or snacks?
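Both ideas can be sketched with set operations over a tiny, hypothetical graph fragment. Disambiguation here simply picks the sense whose neighbouring concepts overlap most with the surrounding words, and shadow concepts are taken to be the broader concepts of whatever was tagged. Both are simplifying assumptions for illustration; production systems weigh many more signals.

```python
# Hypothetical graph fragment: one label ("apple") shared by two concepts.
SENSES = {
    "apple": {
        "ex:Apple_Inc": {"iphone", "mac", "ios", "laptop"},
        "ex:Apple_Fruit": {"fruit", "orchard", "pie", "snack"},
    }
}
# Hypothetical skos:broader-style links.
BROADER = {"ex:Apple_Fruit": "ex:Fruit", "ex:Banana": "ex:Fruit", "ex:Apple_Inc": "ex:TechCompany"}

def disambiguate(label, context_words):
    """Pick the sense whose graph neighbours overlap most with the surrounding words.
    On a tie (e.g. no overlap at all) max() returns the first sense listed."""
    senses = SENSES[label]
    return max(senses, key=lambda sense: len(senses[sense] & context_words))

def shadow_concepts(tags):
    """Broader concepts implied by the graph that were not already tagged."""
    return {BROADER[t] for t in tags if t in BROADER} - set(tags)

tech = set("the new apple laptop runs ios".split())
food = set("an apple is a healthy snack from the orchard".split())
print(disambiguate("apple", tech))  # ex:Apple_Inc
print(disambiguate("apple", food))  # ex:Apple_Fruit
print(shadow_concepts({"ex:Apple_Fruit", "ex:Banana"}))  # {'ex:Fruit'}
```

The word “fruit” never appears in the second sentence, yet the document can still be tagged with `ex:Fruit` because the graph knows what an apple (the fruit) is.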

Your preferred tooling may be one that supports partly human, expert maintenance of key business language (taxonomies, including phrases, alternative labels, acronyms, synonyms etc.). The EKG can then cater for the differing language and cultural perspectives of both customers and employees (think Diversity & Inclusion). And of course, search simply gets better when linked to user profile concepts for personalisation.
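As a sketch of why alternative labels matter, assume each taxonomy concept has one preferred label plus a list of alternatives; a single lookup table then lets different audiences reach the same concept with their own words. The concept IDs and entries below are hypothetical, and `resolve` is an illustrative helper, not a product API.

```python
# Hypothetical taxonomy entries: a preferred label plus alternatives (acronyms, synonyms).
TAXONOMY = {
    "ex:GeneralPractitioner": {"pref": "general practitioner",
                               "alt": ["GP", "family doctor", "family physician"]},
    "ex:AccidentEmergency": {"pref": "accident and emergency",
                             "alt": ["A&E", "ER", "emergency room"]},
}

def build_label_index(taxonomy):
    """Map every preferred and alternative label (case-folded) to its concept."""
    index = {}
    for concept, labels in taxonomy.items():
        for label in [labels["pref"], *labels["alt"]]:
            index[label.casefold()] = concept
    return index

def resolve(query, index, taxonomy):
    """Return (concept, preferred label) for whatever wording the user chose, or None."""
    concept = index.get(query.strip().casefold())
    return (concept, taxonomy[concept]["pref"]) if concept else None

index = build_label_index(TAXONOMY)
print(resolve("A&E", index, TAXONOMY))             # ('ex:AccidentEmergency', 'accident and emergency')
print(resolve("Family Doctor ", index, TAXONOMY))  # ('ex:GeneralPractitioner', 'general practitioner')
print(resolve("podiatrist", index, TAXONOMY))      # None
```

A British user typing “A&E” and an American user typing “ER” land on the same concept, which is the kind of language and cultural variation the taxonomy is meant to absorb.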

Analysing search queries to find “new” language means that business language can be kept “alive” and reflect both your data and query trends (typed and spoken). The resulting APIs can offer many different UX options, e.g. for “misfired” queries: clickable related concepts that generate new searches, or broader/narrower concepts for decreased/increased search granularity.
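Both halves of this can be sketched in a few lines, assuming a search log of (query, hit count) pairs and a simple broader-concept hierarchy. Treating recurring zero-hit queries as “new language” is one plausible heuristic, not the only one. The log, hierarchy and function names are hypothetical.

```python
from collections import Counter

# Hypothetical hierarchy (concept -> broader concept) and preferred labels.
BROADER = {"ex:Laptop": "ex:Computer", "ex:Desktop": "ex:Computer", "ex:Computer": "ex:ITEquipment"}
LABEL = {"ex:Laptop": "laptop", "ex:Desktop": "desktop",
         "ex:Computer": "computer", "ex:ITEquipment": "IT equipment"}

def new_language(log, known_labels, min_count=2):
    """Recurring zero-hit queries that match no known label: candidate new alt labels."""
    known = {label.casefold() for label in known_labels}
    zero_hits = Counter(q.casefold() for q, hits in log if hits == 0)
    return sorted(q for q, n in zero_hits.items() if n >= min_count and q not in known)

def related_suggestions(concept):
    """Broader/narrower labels a UI could offer to widen or narrow a misfired query."""
    broader = [LABEL[BROADER[concept]]] if concept in BROADER else []
    narrower = sorted(LABEL[c] for c, b in BROADER.items() if b == concept)
    return {"broader": broader, "narrower": narrower}

# Hypothetical search log of (query, number_of_hits).
log = [("chromebook", 0), ("Chromebook", 0), ("laptop", 12), ("lappy", 0)]
print(new_language(log, LABEL.values()))   # ['chromebook']
print(related_suggestions("ex:Computer"))  # {'broader': ['IT equipment'], 'narrower': ['desktop', 'laptop']}
```

“chromebook” recurs without results, so it is a candidate for a new concept or alternative label; the broader/narrower lists are what a UI could render as clickable refinements.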

What are the alternatives? 

EKGs, AI enhancements and COTS 

There are several providers of commercial knowledge engineering and graph software on the market, many of which Findwise partners with. As EKGs are RDF-based, once built they are transferable between software products, should the need arise.
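Portability comes from RDF being a W3C standard with standard serialisations. As a minimal sketch, assuming a hand-rolled writer instead of a dedicated RDF library, the function below emits N-Triples, a line-based RDF format that RDF-aware tools can read. The SKOS namespace is the real one; the `https://example.org/ekg/` namespace and the concepts are hypothetical, and only IRIs and language-tagged literals are handled.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
EX = "https://example.org/ekg/"  # hypothetical namespace for your own concepts

def nt_term(term):
    """IRIs go in angle brackets; (text, lang) tuples become language-tagged literals."""
    if isinstance(term, tuple):
        text, lang = term
        escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
                       .replace("\n", "\\n").replace("\r", "\\r"))
        return f'"{escaped}"@{lang}'
    return f"<{term}>"

def to_ntriples(triples):
    """Serialise (subject, predicate, object) triples as N-Triples, one statement per line."""
    return "".join(f"{nt_term(s)} {nt_term(p)} {nt_term(o)} .\n" for s, p, o in triples)

triples = [
    (EX + "Laptop", SKOS + "prefLabel", ("laptop", "en")),
    (EX + "Laptop", SKOS + "altLabel", ("bärbar dator", "sv")),
    (EX + "Laptop", SKOS + "broader", EX + "Computer"),
]
print(to_ntriples(triples), end="")
```

In practice you would export through your graph software or an RDF library, but the point stands: the knowledge lives in a vendor-neutral format rather than a proprietary one.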

Incremental AI-based algorithms (e.g. classifiers, vector embeddings) can also be added to improve existing search, though these tend to have more of a single-focus, single-system perspective. Very often these same enhancement techniques can also provide input for improving and automating EKGs, just as the EKG can offer a logical basis and rules for a robust AI engineering strategy.

EKGs allow a hybrid architecture with open source search engines. There are, of course, commercial off-the-shelf solutions (COTS) that offer improved search over data assets (often also with a graph behind them). But before you commit to any vendor lock-in, check what you need and whether they cover all or any of the possible EKG-related scenarios:

Are they inclusive of all your data? Do they help formalise a data governance and accountability framework? Is the AI transparent enough to understand? Can your information and business model(s) be built in and reflected in data structures? How easy would it be to alter your business model(s) and see those changes reflected in data structures?

Does the software solution cope with other use cases, e.g. data findability or FAIR data? Do they have multilingual functionality? Can they help make your data interoperable or connected with your ecosystem or Web data? Do they support potential data-centric solutions, or just application-centric ones?


Semantic Annotation: How to make it happen? 

Your ultimate choice may be the degree to which you want or need to control your data and data assets, plus how important it is for your organisation to monitor their usage by customers and employees. 

EKGs are mostly introduced into an organisation via a single use case rather than as the result of a forward-looking, holistic, data-centric strategy, though the latter is not unheard of. That said, introducing automatic Semantic Annotation with an EKG could prove a great follow-up to your organisation’s Cloud project, as together they can dramatically increase the value of your data assets from the very first processing run.

For an example of an implemented semantic annotation use case, see the NHS Learning Hub, a collaborative project between Health Education England and Findwise.

Alternatively, check out Findability by Findwise and reach out to get the very best digital transformation roadmap for your organisation.

Peter Voisey
