Building a chatbot – that actually works

In the era of artificial intelligence and machine learning, chatbots have gained a lot of attention. Chatbots can for example help a user to book restaurants or schedule flights. But why should organizations use chatbots instead of simple user interaction (UI) systems? Considering that chatbots are both easier and more natural to interact with -compared to that of a UI system – endorses the implementation of chatbots in certain use cases. Additionally, a chatbot can engage a user for a longer time which can result in a company increasing its business. A chatbot needs to understand the natural language as there can be multiple ways to express one’s intention with language ambiguity. Natural Language Processing (NLP) helps us to achieve this to some extent.

Natural language processing – the foundation for a chatbot

Compared to rule-based solutions, chatbots using machine learning and language understanding are much more efficient. After years and new waves of statistical models, such as deep learning RNN, LSTM, transformers etc., these algorithms have now become market standards.

NLP is a part of linguistics and artificial intelligence, where algorithms are used to understand, analyze, manipulate and potentially generate human readable text. Usually, it contains two components: Natural Language Understanding (NLU) and Natural Language Generation (NLG).

To start with, the natural language input is mapped into useful representation for machine reading comprehension. This is achieved through using basics like: tokenization, stemming / lemmatization or tagging part of speech. There are also more advanced elements such as recognizing named entities or chunking. The latter is a processing method which organizes the individual terms found previously into a more prominent structure. For example: ’South Africa’ – is more useful as a chunk than the individual words ‘South’ and ‘Africa’.

FIGURE 1: A PROCESS OF BREAKING A USER’S MESSAGE INTO TOKENS

FIGURE 1: A PROCESS OF BREAKING A USER’S MESSAGE INTO TOKENS

From the other side, NLG is the process of producing meaningful phrases and sentences in natural language from an internal structural representation using e.g. content determination, discourse planning, sentence aggregation, lexicalization, referring expression generation or linguistic realization.

Open-domain and Goal-Driven Chatbot

Chatbots can be classified into two categories: Goal-driven and Open-domain. Goal-driven chatbots are built to solve specific problems such as a flight bookings or restaurant reservations. On the other hand, the Open-domain dialogue system attempts to establish a long-term connection with the user, such as psychological support and language learning.

Goal-driven chatbots are based on slot filling and handcrafted rules, which are reliable but restrictive in conversation. A user has to go through a predefined dialogue flow to accomplish a task.

FIGURE 2: ARCHITECTURE FOR GOAL-DRIVEN CHATBOT

FIGURE 2: ARCHITECTURE FOR GOAL-DRIVEN CHATBOT

Open domain chatbots are intended to converse coherently and engagingly with humans and maintain a long dialog flow with a user. However, we need to have big amounts of data to train these chatbots.

FIGURE 3: ARCHITECTURE FOR OPEN-DOMAIN CHATBOT

FIGURE 3: ARCHITECTURE FOR OPEN-DOMAIN CHATBOT

Knowledge graphs bring connections and data structures to information

Knowledge graphs provides a semantic layer on the top of your database which provides you with all possible entities and the relationships between them. There are a number of representation and modeling instruments available for building a knowledge graph, ontologies being one of them.

Ontology comprises of classes, relationships and attributes as shown in Figure 9. This offers a robust way to store information and concepts – similar to how humans store information.

FIGURE 4: OVERVIEW OF A KNOWLEDGE GRAPH WITH AN RDF SCHEMA

FIGURE 4: OVERVIEW OF A KNOWLEDGE GRAPH WITH AN RDF SCHEMA

A chatbot based on ontology can help to clarify the user’s context and intent – and it can dynamically suggest related topics. Knowledge graphs represent the knowledge of an organization,  as depicted in the following Figure 10. Consider a knowledge graph based on an organization (as shown on the right image in Figure 10) and a chatbot (as shown on the left image in Figure 10) which is based on the ontology of this knowledge graph. In the chatbot example in Figure 10, the user asks a question about a specific employee. The NLP detects the employee as an entity and also detects the intent behind asking a question about this entity. The chatbot matches the employee entity in the ontology and navigates to the node in the graph. From that node we now know all possible relationships of that entity and the chatbot will ask back for possible options, such as co-workers and projects, to navigate further.

FIGURE 5: A SCENARIO - HOW A CHATBOT CAN INTERACT WITH A USER WITH A KNOWLEDGE GRAPH.

FIGURE 5: A SCENARIO – HOW A CHATBOT CAN INTERACT WITH A USER WITH A KNOWLEDGE GRAPH.

Moreover, the knowledge graph also improves the NLU in a chatbot. For example, if a user asks the following;

  • ‘Which assignments was employee A part of?’. To navigate further in the knowledge graph, a rank system can be created for possible connections from the employee node. This rank system might be based on word vector space and a similarity score.
  • In this scenario, ‘worked in, projects’ will have the highest rank when calculating the score with ‘part of, assignments’. So, the chatbot would know it needs to return the list of corresponding projects.

Virtual assistants with Lucidworks Fusion

Lucidworks Fusion is an example of a platform that supports building conversation interfaces. Fusion includes NLP features to understand the meaning of content and user intent. In the end, it’s all about retrieving the right answer at the right time. Virtual assistants, with a more human level of understanding, goes beyond static rules and profiles. It uses machine learning to predict user intention and provides insights. Customers and employees can locate critical insights to help them move to their next best action.

FIGURE 6: LUCIDWORKS FUSION DATA FLOW

FIGURE 6: LUCIDWORKS FUSION DATA FLOW

Lucidworks recently announced Smart Answers – new Fusion’s feature. Smart Answers enhances the intelligence of chatbots and virtual assistants by using deep learning to understand natural language questions. It uses deep learning models and mathematical logic to match the similarity of a question (which can be asked in many different ways) to the most relevant answer. As users interact with the system, Smart Answers continues to rank all answers and improve relevancy.

Fusion is focused on understanding a user’s intent. Smart Answers includes model training and serving methods for different scenarios:

  • When FAQs or question-answer pairs exist, they can be easily integrated into Smart Answers’ model training framework,
  • When there are no FAQ or question-answer pairs, knowledge base documents can be used to train deep learning models and match existing knowledge for the best answers to incoming queries. Once users click on documents returned for specific queries, they become question-answers pairs signals and can enrich the FAQ model training framework,
  • When there are no documents internally, Smart Answers uses cold-start models trained on large online sources, available in multiple languages. Once it goes live, the models begin training on actual user signals.

Smart Answers’ API enables easy integration with any platform, knowledge base, adding value to existing applications. One of the strengths of Fusion Smart Answers is integration with Rasa, an open-source conversation engine. It’s a framework that helps with understanding user intention and maintaining dialogue flow. It also has prebuilt NLP components such as word vectors, tokenizers, intent classifiers and entity extractor. Rasa allows to configure the pipeline that processes a user’s message and analyze human language. Another part of this engine enables modeling dialogues, so chatbot knows what the next action or response should be.

intent:greet 
- Hi 
- Hey 
- Hi bot 
- Hey bot 
 
## intent:request_restaurant 
- im looking for a restaurant 
- can i get [swedish](cuisine) food in any area. 
- a restaurant that serves [caribbean](cuisine) food. 
- id like a restaurant 
- im looking for a restaurant that serves [mediterranean](cuisine) food 
- can i find a restaurant that serves [chinese](cuisine)

Building chatbots requires a lot of training examples for every intent and entity to make them understand the user intention, domain knowledge and to improve NLU of the chatbot. When building a simple chatbot, using prebuilt trained models can be useful and requires less training data. For example: If we build a chatbot where we only need to detect the common location entity, few examples and spaCy models can be enough. However, there might be cases when you need to build a chatbot for an organization where you need different contextual entities – which might not be available in the pretrained models. Knowledge graphs can then be helpful to have a domain knowledge for a chatbot and can balance the amount of work related to training data.

Conclusion

Two main chatbot usages are: 1/solving employee frustration in accessing e.g. corporate information and 2/providing customers with answers to support questions. Both examples above are looking for a solution to reduce time spent on finding information. Especially for online commerce, key performance indicators are clear and can relate to e.g. decreasing call center traffic or call deflection from web and email – examples of situations where ontology based chatbots can be very helpful. From a short-term perspective creating a knowledge graph can initially require a lot of effort – but from a long-term perspective it can also create a lot of value. Companies rely on digital portals to provide information to users; employees search for HR or organization policies documents. Online retailers try to increase customers’ self-service in solving their problems or simply want to improve discovery of their products and services. With solutions like e.g. Fusion Smart Answers, we are able to cut down time-to-resolution, increase customer retention and take knowledge sharing to the next level. It helps employees and customers resolve issues more quickly and empowers users to find the right answer immediately without seeking out additional, digital channels.

Authors: Pragya Singh, Pedro Custodio, Tomasz Sobczak

To read more:

  1. Ehud Reiter and Robert Dale. 1997. Building applied natural language generation systems. Nat. Lang. Eng. 3, 1 (March 1997), 57–87. DOI:https://doi.org/10.1017/S1351324997001502.
  2. Challenges in Building Intelligent Open-domain Dialog Systems by Huang, M.; Zhu, X.; Gao, J.
  3. A Novel Approach for Ontology-Driven Information Retrieving Chatbot for Fashion Brands by Aisha Nazir, Muhammad Yaseen Khan, Tafseer Ahmed, Syed Imran Jami, Shaukat Wasi
  4. https://medium.com/@BhashkarKunal/conversational-ai-chatbot-using-rasa-nlu-rasa-core-how-dialogue-handling-with-rasa-core-can-use-331e7024f733
  5. https://lucidworks.com/products/smart-answers/

Design Elements of Search – Zero Results Page

The sixth and last part in this series, Design Elements of Search is dedicated to the zero results page. This lonely place is where your users end up when the search solution doesn’t find anything. Do your best to be friendly and helpful to your users here, will you?

A blog series – Six posts about Design Elements of Search


A word on Technology and Relevance – a disclaimer

Equally important as having a good user interface is having the right technology and the right relevance model set-up. I will not cover technology and relevance in this blog series. If you wish to read more, these topics is well covered by Findwise since before: Improve search relevancy  and Findwise.com/technology.


Designing Zero Results Page

The design, function and layout of your zero results page gossip about the quality of your search solution. This page is often forgotten and discussed last (like in this series). Whenever I review existing search solutions, this is where I start, because a lot of problems with existing search solutions show up here. You need to understand that from the user’s perspective, ending up on a zero results page can be a frustrating experience. You need to help the user recover from this state. Below is a good example from one of our clients. The intranet of the Swedish courts. The page clearly explains what has happened, No documents were found.

zero results page clearly explains what has happened

A good zero results page that clearly explains “No documents were found”.

Providing further Help

Sometimes there is nothing the system can do to deliver results. The last resort is when it’s time to ask your user to alter their query. Sometimes the query is misspelled or otherwise not optimal. You can copy and use this text on your own zero results page if you like.

  • Check that all words are spelled correctly
  • Try a different search
  • Try a more general search
  • Use fewer search terms

Avoid digging a deeper hole

Microsoft’s OneDrive provides a beautiful zero results page below, but they make a big mistake by showing filtering options in this state. This makes no sense, if there already are no results, there will definitely not be more by narrowing down the search scope further. Avoid this mistake!

avoid providing more filtering options on you zero results page

Pretty looking, but bad zero results page because of the filters on the right hand side.

That was it! The whole Design Elements of Search series is done. This is not everything however, designing a search solution is deeper than this. Me and my friends at Findwise will gladly help you realize all of your dreams. Ok maybe not all of them, but your search related dreams maybe? Ok, that was awkward.

See you in the future, best regards //Emil Mauritzson

Get in touch

Contact Findwise

Contact Emil Mauritzson

Design Elements of Search – Landing Page

We have just covered the area of results in the previous post, I hope that was fun, you are still here. That means you are ready for more, awesome! Let’s get into it. Here is the fifth part in the series Design Elements of Search, landing pages, whatever can it be?

A blog series – Six posts about Design Elements of Search


A word on Technology and Relevance – a disclaimer

Equally important as having a good user interface is having the right technology and the right relevance model set-up. I will not cover technology and relevance in this blog series. If you wish to read more, these topics is well covered by Findwise since before: Improve search relevancy  and Findwise.com/technology.


Designing Landing Pages

What normally happens when you click a search result? The answer seems obvious, you are sent to that document or that webpage or that product. Easy peasy.

diagram for how traditional search sends users to another webpage when clicking results

Traditionally you leave the search solution when clicking results.

However, during my years of consulting, I have come across multiple cases where we don’t know where to send users, because there is no obvious destination. Consider a result for an employee, a product, a process or a project. Sometimes there is no existing holistic view for these information objects. In these cases, we suggest building that holistic view in something we at Findwise call landing pages. When we use landing pages for certain results, users remain inside the search application when they click a result like this. Unlike a traditional search interfaces that sends users away to another application, or document.

design landing page ux diagram for how modern search can send users to a landing page

Get to the landing pages from the ordinary results page.

Paving the path

On landing pages, we show relationships between a variety of information objects we have in the search index. Let me describe it this way.

Sarah works as an architect. In her daily work she needs to be up to date regarding certain types of projects within her area of expertise. Therefore, Sarah is now doing research on how a certain material was used in a certain type of construction. She searches for “concrete bridges” and sees that there are 12 project results. Sarah looks over the results and clicks the third project and sees the landing page for that project. Here, she can see high level information about the project, and also see who the project members have been. Sarah sees Arianna Fowler and also more people. Sarah is curious about the person Peter Fisher because that name sounds familiar. She now sees the landing page for Peter. Here she can see all the projects Peter has been working on. She sees Peters most recent documents. She sees his close collogues. Sarah sees that Peter has been working in multiple projects that has used concrete as the main material. However, when she calls Peter, she learns he is not available right now. Therefore, Sarah decides to call Peters closest colleague. The system has identified close colleagues by knowing how many projects people have been working on together. Sarah calls Donna Spencer instead, because Donna and Peter has collaborated in 12 projects in the last five years. Sarah gets to know everything she needed and is left in a good mood.

Interesting paths

Your specific use case determines what information makes sense to show in these landing pages. Whatever you choose, you will set your users up for interesting paths of information finding and browsing, by connecting at least two information objects with landing pages. See illustration below.

diagram for how modern search can set users up for content discovery

Infinite discovery made possible by linking landing pages together.

When you look past the old way of linking users directly to documents and systems and instead making it possible to find unexpected connections between things. You have widened the definition of what enterprise search can be. This is a new way of delivering value to your organization using search.

This marks the end of the fifth part, next up you’ll read about what happens when a search yields zero results, and what you should do about that.

Get in touch

Contact Findwise

Contact Emil Mauritzson

Design Elements of Search – Results

You are currently reading the fourth part in the series Design Elements of Search. This part is about the search results. The actual results certainly is the most central part of an entire search solution, so it’s important to get this part right. Don’t worry, I’ll show you how.

A blog series – Six posts about Design Elements of Search


A word on Technology and Relevance – a disclaimer

Equally important as having a good user interface is having the right technology and the right relevance model set-up. I will not cover technology and relevance in this blog series. If you wish to read more, these topics is well covered by Findwise since before: Improve search relevancy  and Findwise.com/technology.


Designing Results

Let’s say you are satisfied with the relevance model for now, how on earth do you design good looking and good performing results? If your indexed information mostly is text documents, your results will likely have a title and a snippet, that’s good – But it’s all the other things you include in the result that make it great. For each content source you have, you’ll need to think about what your target audience want to see. You’ll want your users to be able to understand if this seem like the right result or not.

Snippet

A snippet is the chunk of text presented on search results, usually below the title. If you have a 1000 words long PDF, and the user search for a word in a document. The search engine will show some words before the search term, and some words after. These snippets usually start with three dots … to indicate that the text is cut off. Snippets helps your user understand what this document is about. If it seems interesting, the user can decide to click on the result.

A regular search result

A regular search result from www.startpage.com.

Context

If you have indexed documents from a file share, provide the folder structure as breadcrumbs. Bonus points for making the individual folders clickable. If you have indexed webpages, show the URL as breadcrumbs. Make the individual pages clickable. Not all subpages make sense to navigate to, depending on your structure. Bonus points to you if you exclude these from being links. Below you see a webpage being located in “University -> Home -> Departments -> Mathematical Sciences -> Research”. This context is valuable information that helps your user understand what to expect of this search result.

providing the url for context is good on a search result

The url is used to communicate context, answering the question “where is this page located on the site”.

What Type is this Result?

When you index data sets from different sources and make them findable in a common search interface, you need to be as clear as possible about helping your user understand – “What is this result?”. Show clearly with a label if the result is a guide, a blogpost, a steering document, a product, a person, a case study, and so on. You want to have descriptive labels, not general ones like document, webpage or file. These general labels seldom make sense to users. Again, your labels and how you enable slicing and dicing of the data is the result of the IA work done, and not directly covered in this series.

Filetype

I just said above that the label “Document” doesn’t make much sense. That’s not the same thing as showing what filetype the current document has. It is sometimes helpful to know if this File is a PDF-file or a Word-file. Like Google and other search engines, show the filetype to the left of the title, in a little box. If your company uses the Microsoft Office, you can have labels like Word, Excel, PowerPoint. If you design for a general audience it makes more sense to use labels like DOC, XLS, PPT.

This is a good place to use colors, most word processors icons are blue, like Microsoft Word and Google Docs. Excel and Google Sheets is green. Adobe Reader is red. Regarding variations of filetypes, help your users by not bothering them with the difference of XLS and XLSX, or DOC and DOCX and so on. Just call them XLS and DOC. Since filetype also often is a filter. Excluding the different variants of the same file format will reduce the number of options in the list. Below we use colors, icons and labels to communicate filetype.

Showing the file extension and icon and a color is good for filetypes of a result

The filetype is clearly visible and communicated through text, icon and color.

Highlighting

Showing your users how results are matching the query is a key component of a well-liked and well understood search solution. In practice, highlighting means that if the user search for “summer vacation”, you provide a different styling on the words “summer” and “vacation” on the result. Most of the time, snippets come standard with highlighting, either in bold or in italics. In order to provide meaningful results, show highligting everywhere on the result. This means that if the matching terms are in the title, highlight that. If it’s in the breadcrumb, highlight that. Also, you can get creative and highlight in other ways than bold or italics, just see below.

showing where the search term matched os good

Search result with “summer” highlighted.

Here we try to mimic the look and feel of an actual highlighting pen, pretty neat.

highlighting looks like an actual pen

Highlighting up-close.

Time

When you are searching a webpage, an intranet or something else for that matter. Always show date of publication, or date of revision if you have that. Otherwise how would you know if the document “Release tables March 29” is recent, or very old? Many people get this basic thing wrong, don’t be one of them!

Be bold, but be Right

In order for your users to understand what data you are showing on the result, the data need a label describing it, like “Author: Emil Mauritzson”. All good so far. The most important thing is the data (Emil Mauritzson), not the label (Author). I see many getting this wrong and highlight the label. Highlight the data instead.

Visual focus on the data not the label is a best practice for search results

Make the most important thing most visible.

So, there’s that. The part about results is complete. If you are ready for more, get on to the next part, the one about what we call landing pages, whatever that can be…Exciting!

Get in touch

Contact Findwise

Contact Emil Mauritzson

Design Elements of Search – Filters

Hey, I’m happy you have found your way here, you are currently reading the third part in the series Design Elements of Search. This part is dedicated to filters, tabs and something we like to call filter tags.

A blog series – Six posts about Design Elements of Search


A word on Technology and Relevance – a disclaimer

Equally important as having a good user interface is having the right technology and the right relevance model set-up. I will not cover technology and relevance in this blog series. If you wish to read more, these topics is well covered by Findwise since before: Improve search relevancy  and Findwise.com/technology.


Designing Filters

When setting up new search solutions, we tend to spend a lot of time with the data structure. How should our users slice and dice the search-results? What makes sense? What does not? This is the part of the job sometimes classified as Information Architecture (IA). This text focuses more on the visual elements, the results of the IA work you can say.

Don’t make it difficult

The biggest pitfall when designing search is to overwhelm the user with too many options.

You got a million hits! – There are 345566 pages – Here are some results, Do you only want to see People results? – Sort by Price, Ascending or Descending?! – Click me – Did you mean: Coffee buns? – Click me – CLICK MEEEE! Yep, try to tone this down if you can.

Below you’ll see a disastrous layout. There is so many things screaming for users’ attention. If you look really hard, you can see a search result all the way down in the bottom of the picture.

image of a busy search interface

The original interface, very little room for results.

I said above that we spend a lot of time on the structure (IA). And we generally spend a lot of time on filters as well. This time is well spent. However, we need to realize that what is most important for our users. Do they find what they are looking for, or not? The order of the search results, i.e. the relevance is most important. Therefore, the actual search results should be totally in focus, visually in your interface.

Make it Easy

Instead of giving your users too many options up-front, consider hiding filters under a button or link. The button can say “Filter search results”, or “Refine results” or “Filter and Sort”. I’ll show you what I mean below. I have removed and renamed things from the above example, creating a design mockup. It’s not a perfect redesign, but you get my point, hopefully. All of a sudden there is room for three results on screen, success!

image of a not so busy search interface

A cleaned up interface, more room for results.

The second example is a sneak peek of White Arkitekter internal search solution. Here we can follow the user searching from the start page and applying a filter. The search results are in focus, and at the same time it’s easy to apply filters when needed. A good example.

animated gif showing a search interface and filters

Showing how easy a filter is applied.

Search inside Filters

In the best case, a specific filter will contain a handful of values that are easily scanned just by looking at the list. In reality however, often these lists of filter values are long. How should you sort the list? Often, we sort them by “most first”, sometimes alphabetically. When the list is not easily scannable, provide a way to “search” inside the filter. Like this:

animated gif showing a how to search inside filters

Typing inside this filter is helping the user more quickly find “Sweden”.

Filters values with Zero Results

Hey, if a filter value will yield zero results, like Calendar, Local files and Archived files below. Show the filter value but don’t make it clickable! Why on earth would you want that? You don’t want to send your users to a dead end. Sometimes they will end up there anyway, and then you have to help. Skip ahead to the part about the Zero Results Page to learn about how to help users recover.

You should not be able to click a filter with zero results

A filter with some values returning zero results. Good to show them, but important to make them not clickable.

Filter tags

I said above that the results should be the graphical element that stands out the most. And also, that making the first refinement should be easy to make. Well, this will mean that the filters will be hidden behind something. This does not mean, by the way, that the filter selection made by the user, should be hidden. On the contrary. You definitely want to be clear about what things affect the search results. This is normally the query, the filter selections and the sorting. A filter tag is simply a graphical element that is clearly visible above the search results when activated. It is also easy to remove it, simply by clicking on it. Below, I show you an example when the user has filtered on “News”.

filter is apllied and renders a filter tag

“News” is the active filter. A green filter tag is visible and is easy to see and easy to remove.

If you are up to a third example of filters check this case study out about Personalized search results in Netflix-style user interface.

This was all I had for you regarding filters. I hope some of it made sense, if not let’s get in touch, you can ask me about more details. Or perhaps tell me something I have missed. Always be learning! Next post will discuss results, see you over there.

Further reading

Information Architecture Basics

Filters vs. Facets: Definitions

Mobile Faceted Search with a Tray: New and Improved Design Pattern

Get in touch

Contact Findwise

Contact Emil Mauritzson

Design Elements of Search – Autocomplete Suggestions

You are currently reading the second part in the series Design Elements of Search, the one about autocomplete suggestions. When you’re typing text into the search bar, something is happening just below. A list of words relevant to the text appears. You probably know this from Google and around the web. I will share my findings and some best practices for autocomplete suggestions now. Call me a search-nerd, because I really enjoy implementing awesome autocomplete features!

A blog series – Six posts about Design Elements of Search


A word on Technology and Relevance – a disclaimer

Equally important as having a good user interface is having the right technology and the right relevance model set-up. I will not cover technology and relevance in this blog series. If you wish to read more, these topics is well covered by Findwise since before: Improve search relevancy  and Findwise.com/technology.


Designing Autocomplete Suggestions

I bet you recognize this? It just works right. But how do you get here? Read on and I will tell you.

animated god showing google autocompleter

How autocomplete works at google, a solid experience.

Instant Search

Autocomplete suggestions is a nice feature to offer when you expect your users to execute the query by clicking the search-icon or pressing the enter key. However sometimes your search solution is set up in such a way that for each character the user enters, a new search is performed automatically, this is called instant search. When this is the case you do not want autocomplete suggestions. Google experimented with instant search a few years ago. Google decided to revert back due to a few reasons. However, providing instant search in your use case might still be a good idea. In my experience instant search works well for structured data sets, like a product catalogue, or similar. When your information is diversified, the results could be either documents, web pages, images, people, videos and so on, you are probably better of providing traditional search in combination with autocomplete suggestions.

Suggestions based on User Queries

In my experience, using queries as the foundation for suggestions is the way to go. You can’t just take all queries and potentially suggest it to your entire user base though. What happens if you have a bad actor who want to troll and mess up your suggestions? Let’s say a popular query among your users is “money transfer” and your bad actor searches for something as nasty as “monkeyballs” 100 times. How do you make sure to provide the right suggestion when your user types “mon” in the search bar? You definitely don’t want your search team to actively monitor your potential autocomplete suggestions and manually weed out the bad ones.

One effective method we use is to check if the query matches any document in the index. Hopefully (!?) you do not have any document containing the word “monkeyballs” in your index, and therefore these terms will not be suggested to your users in the autocomplete suggestions. Using this method will make sure your suggestions is always domain specific to your particular case.

Another safeguard to ensure high quality suggestions is to have a threshold. A threshold means a query need to be performed X amount of times before it ends up as a potential suggested term. You can experiment with this threshold in your specific case for the best effect. This threshold will weed out “strange” queries like seemingly random numbers and other queries entered by mistake, that happens to yield some results.

Here is a high-level architecture of a successfully implemented autocomplete suggester at a large client.

architectural image showing autocomplete behind the scenes

Architectural overview of a good performing autocomplete suggester implemented at a client.

Right information, in the right time

So far, I have explained how to weed out the poor and nasty terms. More importantly however, how do you suggest terms in a good order? Basically, to achieve this, we consider the more people searching for something, the higher up the term will be in the list of suggestions. How do you solve the following case? Let’s say summer is coming up, and people are interested in “Vacation planning 2020”, how do you provide this suggestion above “Vacation planning 2019” in the spring of 2020? The term “Vacation Planning 2019” have been searched for 10.000 times and “Vacation planning 2020” only have been searched for 200 times?

Basically, you need to consider when these searches have been performed, and value recency together with number of searches. I don’t have an exact formula to share, but as you can see in the high-level architecture, we divide the queries on “last year, last month, last week”. Getting a good balance here will help boost recent queries that will be of interest to your users.

Add Static lists

Sometimes, you possess high quality lists of words that you want to appear in the autocomplete suggestions without the users first searching for them. Then you can populate the suggestions manually once. You may have a list of all the conference room names in your building, you may have a list of subjects that content creators use to tag documents. Please go ahead and use lists like this in your autocomplete suggestions.

Highlight the right thing

When presenting search results on the results page, you want to highlight where the query matched the document. Read about Results in the fourth part in this series. In the autocomplete suggestions however, you want to do the opposite. In this state, users know what characters they just entered, they are looking for what you are suggesting, this is what you highlight.

example of do and dont - highlight

Highlighting what comes after, not what the user has already entered.

Here we are, right at the end of autocomplete suggestions. Coming up in the next part, I will give you details about filters. Filters is surprisingly difficult to get right. But with some effort, it’s possible to make them shine. See you on the other side.

Further reading

13 Design Patterns for Autocomplete Suggestions

Get in touch

Contact Findwise

Contact Emil Mauritzson

Design Elements of Search – The Search Bar

Time for the first part in the series Design Elements of Search. How do you design a search solution so that it provides value to your organization? How do you make sure users enjoy, use and actually find what they expect? There are already so many great implementations of successful search applications, what can we learn from them? If these questions are in your domain, then you have reached the right place. Buckle up, you are in for a ride! Let’s dive into it right away by discussing the search bar.

A blog series – Six posts about Design Elements of Search


A word on Technology and Relevance – a disclaimer

Equally important as having a good user interface is having the right technology and the right relevance model set-up. I will not cover technology and relevance in this blog series. If you wish to read more, these topics is well covered by Findwise since before: Improve search relevancy  and Findwise.com/technology.


Designing the Search Bar

To set the scene and get cozy, here are some search bars.

Animated gif showing a variety of different search bars

A selection of search bars, for your pleasure.

Placing the search bar in the “right” place

Before discussing the individual graphical elements of the search bar, let’s consider where a search bar can be placed. On the search page itself, it normally resides in the top of the page (think Google). However, consider the vast landscape of your digital workplace and you might understand where I am going. A search bar can be placed on your intranet, usually in the header. It can be placed in the taskbar of your workforces’ computers. It can be placed in multiple other business applications in your control. From our perspective this is called entry points. It is well worth following up where your users come from. This is only one data point, you definitely want to follow up more usage statistics. You want to be data informed. In our client projects we usually use Kibana for statistics, showing graphs in custom dashboards. Before redesigning something, we first analyze existing usage statistics, and then follow up with users to draw conclusions that will inform design decisions. I’ll stop talking about usage statistics now, let’s go ahead and break down the search bar.

Placeholder Text

A placeholder text invites users to the search bar. The placeholder text explains what your users can expect to find in this search solution. While respecting the tone of voice of your application, it doesn’t hurt to be friendly and helpful here. Examples of good placeholder texts is: “What are you looking for today?” “How can we help?”  “Find people, projects and more”. H&M, the clothing store have implemented a dynamic placeholder text that animates in a neat way.

Placeholder text from IKEA that animates

Animated placeholder text that sparks interest in the different kind of things you can search for at IKEA.com

Google Photos is switching it around and suggests what you can search for based on the meta data of your uploaded photos, here are a few examples.

placeholder text from google showing a variety of different texts

A variety of placeholder texts helping the user discover what can be searched for. The text is also personalized.

The placeholder text should be gray, so that the text is not mistaken to be actual data entered into the search bar. The placeholder text should immediately disappear when your user starts typing.

Contrast

Make sure the color of the search bar and the background color of the page provides enough contrast so that the search bar is clearly visible. It’s is also fine to have the same color if you provide a border around the search bar with enough contrast. Here a few good examples, and some bad.

High Contrast

screenshot of bing start page

Clearly enough contrast on Bing.com

screenshot of Dustin.com providing good contrast

Easy to find the search bar on Dustin

Low Contrast

Google actually have low contrast on the border surrounding the search bar. The search bar also has the same color as the page. Normally this is something to avoid. There is few items on the page, and users expect to search at Google.com, so they get away with low contrast I guess. Still, Bing is better in this regard.

screenshot of Google.com providing poor contrast

Too little contrast on Google.

Screenshot of search bar with too little contrast

Where is the search bar? Look hard.

If you are unsure, check if your current colors provide enough contrast using an online Contrast Checker.Chances are your contrasts are too low and need improvement.

The Search Button

This is the button that performs the search. Many people use the Enter key on their keyboard instead of clicking this button. However, you still want to keep the search button for clarity and ease of use. Generally, all icons should have labels. The search button is one of the few icons for which it´s safe to skip the label. I can argue that the search icon is generally recognized, especially in the context of search. On the other hand, if you have the room. Why not use a label? I mean it cannot be clearer than this:

Screenshot of Försäkringskassan having good labels

Clearly labeled buttons, easy to comprehend.

Clear the search bar easily with an “X”

As frequently implemented on mobile applications, you should provide an easy way of clearing the text-field on your desktop application. This is accomplished by an “X”-icon. As discussed above, not many icons are recognized by majority of users. Therefore, it is common practice to provide labels for icons. For the “X”-icon in this specific context, is also fine to skip the label.

a search bar that makes it easy to remove the typed text

Make the text easy to remove.

Number of Results

After the query has been executed and results are showing, it is helpful to communicate how many results that were returned. This provides value in itself, and in combination with filters it is even more powerful. Telling the users how many results were returned is helping them understand how your search application is working, especially in combinations with applied filters. Skip ahead to Filters and read all about it. Avoid sounding like a robot, don’t say “Showing 10 of 28482 results on Pages 1-2849. Plainly say “Showing 123 results” or “123 results found”.

example of do and dont - number of results

Make your search solution friendly and approachable, not robotic and stiff.

Did you mean

Use the power of search technologies and query analysis to give your users the option to adjust the initial query for the better. Sometimes you will suggest a correctly spelled query when your user misspelled, or you can suggest alternative phrases or other related terms.

did you mean example

The search solution can help you spell words correctly.

Here we are, right at the end of the first part. I hope it was compelling, there is more where this came from, so keep on reading. To sum up this first part, when designing the search bar, just the obvious things need to be right. In the second part, you’ll get to know something called autocomplete suggestions. This feature helps your users formulate better queries, and that really is a good start.

Further reading

How to design: accessible search bars

Design a Perfect Search Box

Get in touch

Contact Findwise

Contact Emil Mauritzson

3 challenges for the internal service desk – and how to solve them

The digital transformation and the internal service desks

Most organizations today are focusing on creating a digital service desk experience. This transformation has of course been going on for many years and different organizations have different versions of ticket systems for reporting and solving internal (mostly IT) issues. A common trend, though, is efforts on creating a digital self-service.  Gartner has targeted self-service support as one of the top priority areas for 2020:

“Improve the customer service experience by reducing live contact volume by shifting from a live to a self-service functionality”

The self-service trend is mainly focusing on answering the “simple and reoccurring” questions. These are the types of questions, asked often and by different users, that typically have simple answers. Our experience at Findwise is that surprisingly many of all the questions handled by an internal service desk can be categorized as simple and reoccurring. We have targeted 3 challenges for the internal service desk – and suggestions on how to solve them.

The challenge of self-service in an internal service desk

In almost all organizations there is a need to handle internal support questions. It might be IT-related such as “how do I install a VPN to be able to work from home”, HR-related such as “how do I order terminal glasses”, Finance related such as “where do I report the financial result for last quarter “ etc.

This is generally handled by the “internal service desk” or “internal support”. It might be handled “case-by-case” using email by the responsible person or in a more structured form in a “ticket system”. Often IT has a structured and formalized way of working but other areas (HR, Finance etc.) might not be equally structured.

The business impact on an organization when the internal service desk does not deliver fast and accurate answers might be huge! People might not get their work done and instead need to “idle” in wait for a response or answer.

3 challenges to solve

Findwise has during the years created several digital, self-service, internal service desk portals with the ability to be proactive and give the users the fast and accurate answers they are looking for.

In this work we have learned that there are 3 main challenges you need to solve:

  1. Take control of your data

If you are going to provide proactive and self-service answers to simple and recurring questions you need to know where the answers are. You need to have control of your data!

In order to do that you need to have a plan and work continuously with the data challenges viewed in the picture below:

At Findwise we have measured Search and Findability in various organisations since 2012. As clearly shown in the result of the 2019 Search & Findability Survey, finding relevant information is still a major challenge to most organizations. When it comes to internal information, as many as 55% find it difficult or very difficult to find what they are looking for. Bad information quality is one of main reasons for poor findability.

Not only does insufficient information quality lead to poor findability, it also has a negative effect on digital transformation in general. To be able to extract value from data and create, in this case, a digital self-service, the first step always needs to be to “sort out the (information) mess”. Read more about, what we at Findwise call, “The pyramid of digital transformation” and why sorting the mess is fundamental.

  1. Create the appropriate platform

Creating a ticket in a ticket system is good for complex questions that are not occurring daily. They need to be handled by a person working in the internal support organization.

Finding an answer to the simple and reoccurring questions requires another kind of system. This is more of a search-platform than a ticket system. The user wants to find the answer – not create a ticket and then wait.

At Findwise we have created various search and information platforms, with service desk applications built on top, during the years. The solution depends on the user’s specific need, type of data and optimal way of consuming information.

service desk platform

An internal support service combining the ticket system for complex questions and the self-service portal for simple and recurring questions can handle any kind of internal issue in an efficient way!

  1. Make it easy to find the correct answer

Understanding user intent is hard. Luckily, we can use AI-technology to bridge the gap  in communication between a human and a system.

Users (supported by technological development) have moved from keyword search to searching in our Natural Language. Natural Language Processing is a substantial part of AI used for understanding the human language and being able to answer in the same way. Home assistants are a great example of NLP in our everyday life.

Digging deeper in the area of NLP you’ll find Name Entity Recognition (NER). This is how we at Findwise know that “surprisingly many of all the questions handled by an internal service desk can be categorized as simple and reoccurring”. Let’s look at some examples of questions that appear unique, but can actually be clustered.

In the case of the phone numbers the queries seem to be unique but since they all refer to the same “entity” they can be clustered and handled as “simple and recurring”.

There are obviously a lot of different ways of asking the same question! Natural Language Understanding, using dense vectors or embeddings, is likely the hottest area withing the deep learning NLP community today. Google’s BERT that was released late 2018 has even been able to outperform humans in question answering tasks.

But AI doesn’t need to be the silver bullet every time. Another example of how to make it easy to find the current answer is to work proactively. During a year different thing are important for different users. Approaching summer questions about vacation rules, vacation application etc. might be very common. Coming back after the summer vacation IT-departments might be bored with the simple and recurring question of “I have forgotten my password – how do I change it?”

Using search technology boosted with AI and a lot of common sense the support organization should be able to present answers to questions that they think many users will ask – at the right time of the year.

Summary

The trend towards a digital and self-service oriented internal service desk is rapidly gaining phase, in the short perspective driven by the fact that more people than ever are working from home.

The negative business impact of a poor service desk not giving fast and accurate answers can be significant.

Findwise experience within this filed can be summarized in three challenges that need to be solved:

  • Take control of your data
  • Create the appropriate platform
  • Make it easy to find the correct answer

If you want to know more about how Findwise solves these challenges and the solutions we provide, do not hesitate to contact us.

Toward data-centric solutions with Knowledge graphs

In the last blog posts [1, 2] in this series by Fredric Landqvist and Peter Voisey we have outlined for you, at a high level, about the benefits of making data smarter and F.A.I.R., ideally made findable through a shareable, but controlled, type of Information Commons. In this post, we introduce you to Knowledge Graphs (based on Semantic Web Technologies), the source for the magic of smart and FAIR data automation. Data that is findable, accessible, interoperable and reusable. They can help tackle a range of problems, from the data tsunami to the scarcity of (quality) data for that next AI project.

What is a Knowledge Graph?

There are several different types of graph and certainly many have been many attempted definitions of a Knowledge Graph. Here’s ours:

A Knowledge Graph is the structural representation of explicit knowledge for a domain, encoded in such a way that both humans and machines can read (process) it.

Ultimately, we are wanting to exploit data and their connections or relationships within the graph format in order to surface important and relevant data and information. Without these relationships, the understandings, the stories and the searches around our data tend to dry up fairly quickly. Our world is increasingly connected. So we hope, from an organisational perspective, you are asking: Why isn’t our data connected?!

Where does the term “Knowledge Graph” come from?

The term Knowledge Graph was coined by Google on the release of its own Knowledge Graph in 2012. More recently, organisations have been cottoning on to the collective benefits of employing a Knowledge Graph, so much so, that many refer to the Enterprise Knowledge Graph today.

What are the technologies behind the Enterprise Knowledge Graph?

The Enterprise Knowledge Graph is based on a stack of W3C-ratified Semantic Web Technologies. As their name alludes to, they form the basis of the Semantic Web. Their formulation began in 2001 with Sir Tim Berners-Lee. Sir Tim, not content with giving us the World Wide Web for free, pictured a web of connected data and concepts, besides the web of linked documents, so that machines would be able to understand our requests by virtue of known connections and relationships.

Why Enterprise Knowledge Graphs now?

These technologies are complex to the layperson and some of them are nearly 20 years old. What’s changed to make Enterprises take note of them now? Well worsening internal data management problems, the need for some knowledge input for most sustainable AI projects and the fact that Knowledge Graph building tools have improved to become collaborative and more user-friendly for the knowledge engineer, domain expert and business executive. The underlying technologies in new tools are more hidden from the end user’s perspective, allowing them to concentrate on encoding their knowledge so that it can be used across enterprise systems and applications. In essence, linking enterprise data.

Thanks to Google’s success in using their Knowledge Graph with their search, Enterprise Knowledge Graphs are becoming recognised as the difference between “googling” and using the sometimes-less-than-satisfying enterprise consumer-facing or intranet search.

The key takeaway here though is that real power of any knowledge graph is in its relationships/connections between concepts. We’ll look into this in more detail next.

RDF, at the heart of the Enterprise Knowledge Graphs (EKGs)

EKGs use the simple RDF graph data model at their base. RDF stands for Resource Description Framework – a framework for the way resources or things are described so that we can recognise more easily plus understand more about them.

An aside: We’re talking RDF (namespace) Knowledge Graphs here, rather than their sister graph type, Property Graphs, which we will cover in a future post. It is important to note that there are advantages with both types of graph and indeed new technologies are being developed, so processes can straddle both types.

The RDF graph data model describes a thing or a resource in terms of “triples”: Subject – predicate – Object. The diagram below illustrates this more clearly with an example.


Figure 1. What does a Knowledge Graph look like? The RDF elements of a Knowledge Graph

The graph consists of nodes (vertices) that represent entities (a.k.a. concepts both concrete and abstract, terms, phrases, but now think things, not strings), and edges (lines or arrows) representing the relationships between nodes. Each concept and each relationship have their own URI (a kind of ID), that helps a search engine or application understand their meaning to spot differences (disambiguation) e.g. homonyms (words spelt or pronounced similarly, but that have different meaning) or similarities e.g. alternative labels, synonyms, acronyms, misspellings, foreign language term equivalents etc.

Google uses its Knowledge Graph when it crawls websites to recognise entities like: People, Places, Products, Organisations and more recently Topics, plus all their known relationships between them. There is often a dire need within most organisations for readily available knowledge about People and their related Roles, Skills/Competencies, Projects, Organisations/Departments and Locations.

There are of course many other well-known Knowledge Graphs now including IBM’s Watson,  Microsoft’s Academic Knowledge Graph, Amazon’s Cortex Knowledge Graph, the Bing Knowledge Graph etc.

One thing to note about Google is that the space devoted to their organic (non-paid for) search results has reduced dramatically over the last ten years. In place, they have used their Knowledge Graph to better understand the end user’s query and context. Information too is served automatically based on query concept relationships, either within an Information Panel or as commonly known Questions and Answers (Q&As). Your employees (as consumers) of course are at home with this intuitive, easy-click user experience. While Google’s supply of information has become sharper, so has its automatic assessment of all webpage content, relying increasingly on websites to provide it with semantic information e.g. declaring their “aboutness” by using schema.org or other microformats in their markup rather than relying on SEO keywords.

How does Knowledge Graph engineering differ from traditional KM/IM processes?

In reality, not that much. We still want the same governing principles that can give data good structure, metadata, context and meaning.

Constructing a Knowledge Graph can still be likened to the development of taxonomy or thesaurus with their concepts and an ontology (the relationships between concepts). Here the relationships include firstly: poly-hierarchical relationships (in terms of the taxonomy): a concept may have several broader concepts meaning that the concept itself (with its own URI) can appear in multiple times within a taxonomy. This polyhierarchy can be exploited later for example in both search filtering and website navigation.

Secondly, relationships can also be associative/relational with regards to meaning and context – your organisation’s own made +/or industry-adopted concepts and the key relationships that define your business, and even its goals, strategy and workflows.

A key difference though is the way in which you can think about your data and its organisation. It is no longer flat or 2-D, but rather think 3-D and 360-degree concept- or consumer-centric views to see how they connect to other concepts.

A semantic layer for Automatic Annotation, smarter data & semantic search

We will look at the many different benefits of a Knowledge Graph and further use cases in the next post, but for now, we go with the magic that an EKG can sit virtually on top of any or all your data sources (with different formats and metadata) without the need to move or copy any data. Any data source or data catalogue then consumed via a processing pipeline can be automatically and consistently be annotated (“tagged”) and classified according to declared industry or in-house standards, thus becoming more structured and its meaning more readily “understood,” ready to be found and consumed in accordance with any known or stated conditions.

The classification may also extend to including levels of data security and sensitivity, provenance or trust or location, device and time accessibility.

Figure 2 The automatic annotation & classification process for making data/content smart by using an Enterprise Knowledge Graph

It’s often assumed, incorrectly, that there is only one Enterprise Knowledge Graph. Essentially an enterprise can have one or many, perhaps overlapping graphs for different purposes, subject domains or applications. The importance is that knowledge becomes encoded and readily usable for humans and machines.

What’s wrong with Relational Databases?

There’s nothing wrong with relational databases per se and Knowledge Graphs will not necessarily replace them any time soon. It’s good to note though that data in tabular format can be converted to RDF graph data (triples/tuples) relatively easily and stored in a triple store (Graph Database) or some equivalent. 

In relational databases, references to other rows and tables are indicated by referring to primary key attributes via foreign key columns. Joins are computed at query time by matching primary and foreign keys of all rows in the connected tables. 

Understanding the connections or relations is usually very cumbersome, and those types of costly join operations are often addressed by denormalizing the data to reduce the number of joins necessary, therefore breaking the data integrity of a relational database.

The data models for relational versus graph are different. If you are used to modelling with relational databases, remember the ease and beauty of a well-designed, normalized entity-relationship diagram (i.e using UML) –  a graph is exactly that – a clear model of the domain. Each node (entity or attribute) in the graph model directly and physically contains a list of relationship records that represent the relationships to other nodes. These relationship records are organized by type and direction and may hold additional attributes.

Querying relational databases is easy with SQL. The graph has something similar by using SPARQL, a query language for RDF. If you have ever tried to write a SQL statement with a large number of joins, you know that you quickly lose sight of what the query actually does. In SPARQL, the syntax remains concise and focused on domain components and the connections among them.

Toward data-centric solutions with RDF

With enterprise-linked-data, as with knowledge graphs, one is able to connect many different schemas (data models) and formats in different relational databases and build a connected worldview, domain of discourse. Herein lays the strengths with linking-data, and liberating data from lock-in mechanisms either by schemas (data models) or vendor (software). To do queries and inferencing to find new knowledge and insights that were not possible before due to time or human computation factors. Semantics support this reasoning!

Of course, having interoperable graph data means could well mean fewer code patches on individual systems and more sustainable and agile data-centric solutions in the future.

In conclusion

The expression “in the right place, at the right time” is generally associated with luck. We’ve been talking in our enterprises about “the right information, in the right place, at the right time” for ages, unfortunately sometimes with similar fortune attached. The opportunity is here now to embark on a journey to take back control of your data if you haven’t already, and make it an asset again in achieving your enterprise aims and goals.

More reading on graphs and linked enterprise data:

Next up in the series: Knowledge Graphs: The collective Why?

View Fredric Landqvist's LinkedIn profileFredric Landqvist research blog
View Peter Voisey's LinkedIn profilePeter Voisey

Making your data F.A.I.R and smart

This is the second post in a new series by Fredric Landqvist & Peter Voisey, explaining how your organisation could best shape its data landscape for the future.

How to create a smart data framework for your organisation

In our last post for you, we presented the benefits of F.A.I.R data, how to make data smarter for search engines and the potentials of an Information Commons. In this post, we’re giving you the pragmatic steps to make your data FAIR by creating and applying your own smart data framework. Your data-sharing dream, internally and externally, is possible.

A smart data framework, using FAIR data principles, encompasses the tooling, models and standards that govern datasets and the different context-specific information systems (registers, catalogues). The data is then ingested and processed (enriched/refined) into smart data, datasets and data catalogues. It can then be used and reused by different applications and e-services via open APIs. In this ecosystem, all actors and information behaviours (personas) interplay: provision agents, owners, builders, enrichers, end-user searchers and referrers.

The workings of a smart data framework

A smart data & metadata catalogue   

A smart data & metadata catalogue (illustrated below), provides an organisational capability that aligns data management with the FAIR data principles. View it not so much as one system to rule them all, but rather an ecosystem that is smart and sustainable. In order to simplify your complex and heterogeneous information environment, this set-up can be  instantiated, as one overarching mechanism. Although we are describing a data and metadata catalogue here, the exact same framework and set up would of course apply also to your organisation’s content, making it smarter and more findable (i.e. it gets the sustainable stamp).

Smart Data Catalogue
The necessary services and component of a smart data catalogue

The above picture illustrates the services and components that, together, build smart data and metadata catalogue capabilities. We now describe each one of them for you:

Processing (Ingestion & Enrichment) for great Findability & Interoperability

  • (A) Ingest, harvest and operate. Here you connect the heterogeneous data sources for ingestion.

The configured input mechanisms describe each of the data sources, with their data, datasets and metadata ready for your catalogue search. Hopefully, at the dataset upload stage, you have provided a good system/form that now provides your search engine with great metadata (i.e. we recommend you use the open data catalogue standard DCAT-AP). The concept upload is interchangeable with either machine-to-machine harvester mechanisms, as with open-data, traditional data integration, or manual provision by human upload effort. (D) Enterprise Metadata Repository: here is the persistent storage of data in both data catalogue, index and graph. All things get a persistent ID (how to design persistent URI) and rich metadata.

  • (B) Enrich, refine analyze, and curate. This is the AI part (NLP, Semantics, ML) that enriches the data and datasets, making them smarter. 

Concepts (read also entities, terms, phrases, synonyms, acronyms etc.) from the data sources are found using named entity extraction (NER). By referring to a Knowledge Graph in the Enricher, the appropriate resources are annotated (“tagged”) with the said concept. It does not end here, however. The concept also takes with it from the Knowledge Graph all of the known relationships it has with other concepts.

Essentially a Knowledge Graph is your encoded domain knowledge in a connected graph format. It is by reading these encoded relationships that the machine “understands” the meaning or aboutness of data.

This opens up a very nice Pandora’s box for your search (understanding query intent) and for your Graphical User Interface (GUI) as your data becomes smarter now through your ability to exploit the relationships and connections (semantics and context) between concepts.

You and AI can have a symbiotic relationship in the development of your Knowledge Graph. AI can suggest new concepts and relationships as new data is added. It is, however, you and your colleagues who determine the of concepts/relationships in the Knowledge Graph – concepts/relationships that are important to your department or business. Remember you can utilise more than one knowledge graph, or part of one, for a particular business need(s) or data source(s). The Knowledge Graph is a flexible expression of your business/information models that give structure to all your data and its access.

Extra optional step: If you can manage not only to index the dataset metadata but the datasets themselves, you can make your Pandora’s box even nicer. Those cryptic/nonsensical field names that your traditional database experts love to create can also be incorporated and mapped (one time only!) into your Knowledge Graph, thus increasing the machine “understanding” of the data. Thus, there is a better chance of the data asset being used more widely. 

The configuration of processing with your Knowledge Graph can take care of dataset versioning, lineage and add further specific classifications e.g. data sensitivity, user access and personal information.

Lastly on Processing, your cultural and system interoperability is immensely improved. We’re not talking everyone speaking the same language here, rather everyone talking their language (/culture) and still being able to find the same thing. In this open and FAIR vocabularies further, enrich the meaning to data and your metadata is linked. System interoperability is partially achieved by exploiting the graph of connections that now “sit over” your various data sources.

Controlled Access (Accessible and Reusable)

  • (C) Access, search and visualize APIs. These tools control and influence the delivery, representation, exploration and consumption/use of datasets and data catalogues via a smarter search (made so by smarter data) and a more intuitive Graphical User interface (GUI).

This means your search can now “understand” user intent from just one or two keyword queries (through known relationship connections in the Knowledge Graph). 

Your search now also caters for your searchers who are searching in an unfamiliar subject area or are just having a query off day. Besides offering the standard results page, the GUI can also present related information (again due to the Knowledge Graph), past related user queries, information and question-answer (Q&A) type material. So: search, discovery, learning, serendipity.

Your GUI can also now become more intuitive, changing its information presentation and facets/filters automatically, depending on the query itself (more sustainable front-end coding). 

An alternative to complex scenario coding also includes the possibility for you to create rules (set in your Knowledge Graph) that can control what data users can access (when, how and where) based on their profile, their role, their location, the time and on the device they are using. This same Knowledge Graph can help push and recommend data for certain users proactively. Accessibility will be possible by using standard communication protocols, open access (when possible), authentication where necessary, and always with metadata at hand.

Reusable: your new smart data framework can help increase the time your Data Managers (/Scientists, Analysts) spend using data (and not trying to find it, the 80/20 data science dilemma). It can also help reduce the risk to your AI projects (50% failure rate) by helping searchers find the right data, with its meaning and context, more easily.  Reuse will also be possible with the design that metadata multiple attributes, use licence and provenance in line with community standards

Users and information behaviour (personas)

Users and personas
User groups and services

From experience we have defined the following broad conceptual user-groups:

  • Data Managers, a.k.a. Data Op’s or Data Scientists
    Data Managers are i.e. knowledge engineers, taxonomists and analysts. 
  • Data Stewards
    Data Stewards are responsible for Data Governance, such as data lineage. 
  • Business Professionals/Business end-users
    Business Users may have a diverse background. Hence Business end-users.
  • Actor System are different information systems and applications and services that integrate information via the rich open APIs from the Smart Data Catalogue

The outlined collaborative actors (E-H user groups) and their interplay as information behaviour (personas) with the data (repository) and services (components), together, build the foundation for a more FAIR data management within your organisation, providing for you at the same time, the option to contribute to an even broader shared open FAIR information commons.

  • (E) Data Op’s workplace and dashboard is a combination of tools supporting Data Op’s data management processes in the information behaviours: data provision agents, enrichers and developers.
  • (F) Data Governance workplace is the tools to support Data Stewards collaborative data governance work with Data Managers in the information behaviours: data owner.
  • (G) Access, search, visualize APIs, is the user experience to explore, find and interact with the catalogue and data in the information behaviours: searcher and referrer.
  • (H) API, is the set of open APIs to support access to catalogue data for consuming information systems in the information behaviours: referrer (a.k.a. data exchange).

Potential tooling for this smart data framework:

We hope you enjoyed this post and understand the potential benefits such a smart data framework incorporating FAIR data principles can have on your data catalogue, or for that matter, your organisational content or even your data swamps.


In the next post, Toward data-centric solutions with Knowledge Graphs, we talk about Knowledge Graphs (KG) and its non-proprietary RDF semantic web tech, how you can create your KG(s) and the benefits they can bring to your future data landscape.

View Fredric Landqvist's LinkedIn profileFredric Landqvist research blog
View Peter Voisey's LinkedIn profilePeter Voisey