The sixth and last part in this series, Design Elements of Search, is dedicated to the zero results page. This lonely place is where your users end up when the search solution doesn’t find anything. Do your best to be friendly and helpful to your users here, will you?
A blog series – Six posts about Design Elements of Search
Equally important as having a good user interface is having the right technology and the right relevance model set up. I will not cover technology and relevance in this blog series. If you wish to read more, these topics are already well covered by Findwise: Improve search relevancy and Findwise.com/technology.
Designing Zero Results Page
The design, function and layout of your zero results page reveal a lot about the quality of your search solution. This page is often forgotten and discussed last (like in this series). Whenever I review existing search solutions, this is where I start, because a lot of problems with existing search solutions show up here. You need to understand that from the user’s perspective, ending up on a zero results page can be a frustrating experience. You need to help the user recover from this state. Below is a good example from one of our clients: the intranet of the Swedish courts. The page clearly explains what has happened: no documents were found.
A good zero results page that clearly explains “No documents were found”.
Providing further Help
Sometimes there is nothing the system can do to deliver results. As a last resort, it’s time to ask your user to alter their query, which may be misspelled or otherwise not optimal. You can copy and use this text on your own zero results page if you like:
Check that all words are spelled correctly
Try a different search
Try a more general search
Use fewer search terms
Avoid digging a deeper hole
Microsoft’s OneDrive provides a beautiful zero results page below, but they make a big mistake by showing filtering options in this state. This makes no sense: if there are already no results, narrowing the search scope further will certainly not produce more. Avoid this mistake!
A pretty-looking but bad zero results page, because of the filters on the right-hand side.
That was it! The whole Design Elements of Search series is done. This is not everything, however; designing a search solution goes deeper than this. My friends at Findwise and I will gladly help you realize all of your dreams. Ok, maybe not all of them, but your search-related dreams maybe? Ok, that was awkward.
See you in the future, best regards //Emil Mauritzson
We covered the area of results in the previous post. I hope that was fun, and you are still here. That means you are ready for more, awesome! Let’s get into it. Here is the fifth part in the series Design Elements of Search: landing pages. Whatever can that be?
A blog series – Six posts about Design Elements of Search
Equally important as having a good user interface is having the right technology and the right relevance model set up. I will not cover technology and relevance in this blog series. If you wish to read more, these topics are already well covered by Findwise: Improve search relevancy and Findwise.com/technology.
Designing Landing Pages
What normally happens when you click a search result? The answer seems obvious: you are sent to that document, that webpage or that product. Easy peasy.
Traditionally you leave the search solution when clicking results.
However, during my years of consulting, I have come across multiple cases where we don’t know where to send users, because there is no obvious destination. Consider a result for an employee, a product, a process or a project. Sometimes there is no existing holistic view for these information objects. In these cases, we suggest building that holistic view in something we at Findwise call landing pages. When we use landing pages for certain results, users remain inside the search application when they click a result like this, unlike a traditional search interface that sends users away to another application or document.
Get to the landing pages from the ordinary results page.
Paving the path
On landing pages, we show relationships between a variety of information objects we have in the search index. Let me describe it this way.
Sarah works as an architect. In her daily work she needs to stay up to date on certain types of projects within her area of expertise. Right now, Sarah is researching how a certain material was used in a certain type of construction. She searches for “concrete bridges” and sees that there are 12 project results. Sarah looks over the results, clicks the third project and sees the landing page for that project. Here, she can see high-level information about the project, and also who the project members have been. Sarah sees Arianna Fowler and several more people. Sarah is curious about Peter Fisher because that name sounds familiar. She clicks his name and sees the landing page for Peter. Here she can see all the projects Peter has been working on. She sees Peter’s most recent documents. She sees his close colleagues. Sarah sees that Peter has been working in multiple projects that have used concrete as the main material. However, when she calls Peter, she learns he is not available right now. Therefore, Sarah decides to call Peter’s closest colleague. The system has identified close colleagues by knowing how many projects people have worked on together. Sarah calls Donna Spencer instead, because Donna and Peter have collaborated in 12 projects in the last five years. Sarah finds out everything she needed and is left in a good mood.
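How might “close colleagues” be computed? Here is a minimal Python sketch, with made-up project data and names, that simply counts shared project memberships — an assumption about the underlying logic, which we don’t spell out in the post:

```python
from collections import Counter
from itertools import combinations

# Hypothetical project membership data: project id -> list of member names.
projects = {
    "bridge-01": ["Peter Fisher", "Donna Spencer", "Arianna Fowler"],
    "bridge-02": ["Peter Fisher", "Donna Spencer"],
    "tunnel-07": ["Peter Fisher", "Arianna Fowler"],
}

# Count how often each pair of people appears in the same project.
pair_counts = Counter()
for members in projects.values():
    for pair in combinations(sorted(members), 2):
        pair_counts[pair] += 1

def closest_colleagues(person, top_n=3):
    """Rank a person's colleagues by the number of shared projects."""
    scores = {
        (a if b == person else b): count
        for (a, b), count in pair_counts.items()
        if person in (a, b)
    }
    return sorted(scores.items(), key=lambda kv: -kv[1])[:top_n]

# Returns Donna Spencer and Arianna Fowler, each with 2 shared projects.
print(closest_colleagues("Peter Fisher"))
```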
Interesting paths
Your specific use case determines what information makes sense to show on these landing pages. Whatever you choose, you will set your users up for interesting paths of information finding and browsing by connecting at least two information objects with landing pages. See the illustration below.
Infinite discovery made possible by linking landing pages together.
When you look past the old way of linking users directly to documents and systems, and instead make it possible to find unexpected connections between things, you have widened the definition of what enterprise search can be. This is a new way of delivering value to your organization using search.
This marks the end of the fifth part, next up you’ll read about what happens when a search yields zero results, and what you should do about that.
You are currently reading the fourth part in the series Design Elements of Search. This part is about the search results. The actual results certainly are the most central part of an entire search solution, so it’s important to get this part right. Don’t worry, I’ll show you how.
A blog series – Six posts about Design Elements of Search
Equally important as having a good user interface is having the right technology and the right relevance model set up. I will not cover technology and relevance in this blog series. If you wish to read more, these topics are already well covered by Findwise: Improve search relevancy and Findwise.com/technology.
Designing Results
Let’s say you are satisfied with the relevance model for now; how on earth do you design good-looking and well-performing results? If your indexed information is mostly text documents, your results will likely have a title and a snippet. That’s good – but it’s all the other things you include in the result that make it great. For each content source you have, you’ll need to think about what your target audience wants to see. You want your users to be able to judge whether this seems like the right result or not.
Snippet
A snippet is the chunk of text presented on a search result, usually below the title. If you have a 1,000-word PDF and the user searches for a word in the document, the search engine will show some words before the search term and some words after. These snippets usually start with three dots (…) to indicate that the text is cut off. Snippets help your user understand what a document is about. If it seems interesting, the user can decide to click on the result.
A regular search result from www.startpage.com.
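For illustration, here is a minimal Python sketch of how such a snippet could be generated around the first occurrence of the search term. Real search engines do this internally; the window size here is arbitrary:

```python
def make_snippet(text, term, context=3):
    """Return `context` words on each side of the first match, with … at cut-offs."""
    words = text.split()
    for i, word in enumerate(words):
        if term.lower() in word.lower():
            start, end = max(0, i - context), i + context + 1
            prefix = "… " if start > 0 else ""
            suffix = " …" if end < len(words) else ""
            return prefix + " ".join(words[start:end]) + suffix
    return " ".join(words[: 2 * context]) + " …"  # fallback: document opening

doc = "The annual report covers vacation policies in detail for all staff members."
print(make_snippet(doc, "vacation"))
# … annual report covers vacation policies in detail …
```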
Context
If you have indexed documents from a file share, provide the folder structure as breadcrumbs. Bonus points for making the individual folders clickable. If you have indexed webpages, show the URL as breadcrumbs and make the individual pages clickable. Depending on your structure, not all subpages make sense to navigate to; bonus points if you exclude these from being links. Below you see a webpage located in “University -> Home -> Departments -> Mathematical Sciences -> Research”. This context is valuable information that helps your user understand what to expect of this search result.
The URL is used to communicate context, answering the question “where is this page located on the site?”.
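A minimal sketch of turning a URL path into clickable breadcrumb pairs. The label cleanup is a simplification; a real site would map path segments to page titles:

```python
from urllib.parse import urlparse

def breadcrumbs(url):
    """Split a URL into (label, link) breadcrumb pairs, one per path level."""
    parsed = urlparse(url)
    crumbs, path = [], ""
    for part in parsed.path.strip("/").split("/"):
        path += "/" + part
        label = part.replace("-", " ").title()
        crumbs.append((label, f"{parsed.scheme}://{parsed.netloc}{path}"))
    return crumbs

for label, link in breadcrumbs("https://university.edu/departments/mathematical-sciences/research"):
    print(label, "->", link)
# Departments -> https://university.edu/departments
# Mathematical Sciences -> https://university.edu/departments/mathematical-sciences
# Research -> https://university.edu/departments/mathematical-sciences/research
```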
What Type is this Result?
When you index data sets from different sources and make them findable in a common search interface, you need to be as clear as possible in helping your user understand: “What is this result?”. Show clearly with a label whether the result is a guide, a blog post, a steering document, a product, a person, a case study, and so on. You want descriptive labels, not general ones like document, webpage or file. These general labels seldom make sense to users. Again, your labels and how you enable slicing and dicing of the data are the result of the IA work done, and not directly covered in this series.
Filetype
I just said above that the label “Document” doesn’t make much sense. That’s not the same thing as showing what filetype the current document has. It is sometimes helpful to know whether a file is a PDF or a Word document. Like Google and other search engines, show the filetype to the left of the title, in a little box. If your company uses Microsoft Office, you can have labels like Word, Excel, PowerPoint. If you design for a general audience it makes more sense to use labels like DOC, XLS, PPT.
This is a good place to use colors: most word processors’ icons are blue, like Microsoft Word and Google Docs. Excel and Google Sheets are green. Adobe Reader is red. Regarding variations of filetypes, help your users by not bothering them with the difference between XLS and XLSX, or DOC and DOCX, and so on. Just call them XLS and DOC. Since filetype is often also a filter, merging the variants of the same file format will reduce the number of options in the list. Below we use colors, icons and labels to communicate filetype.
The filetype is clearly visible and communicated through text, icon and color.
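That normalization can be as simple as a lookup table; here is a sketch where the hex colors are rough approximations of the familiar brand colors, not official values:

```python
# Hypothetical mapping: collapse filetype variants into one label and one color.
FILETYPE_STYLES = {
    "doc":  ("DOC", "#2B579A"),   # blue, like most word processors
    "docx": ("DOC", "#2B579A"),
    "xls":  ("XLS", "#217346"),   # green, like spreadsheet apps
    "xlsx": ("XLS", "#217346"),
    "ppt":  ("PPT", "#D24726"),
    "pptx": ("PPT", "#D24726"),
    "pdf":  ("PDF", "#B30B00"),   # red, like Adobe Reader
}

def filetype_badge(filename):
    """Return (label, color) for a result badge; unknown types fall back to gray."""
    ext = filename.rsplit(".", 1)[-1].lower()
    return FILETYPE_STYLES.get(ext, (ext.upper(), "#777777"))

print(filetype_badge("budget_2020.xlsx"))  # ('XLS', '#217346')
print(filetype_badge("budget_2019.xls"))   # ('XLS', '#217346') – one filter value for both
```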
Highlighting
Showing your users how results match the query is a key component of a well-liked and well-understood search solution. In practice, highlighting means that if the user searches for “summer vacation”, you provide a different styling for the words “summer” and “vacation” in the result. Most of the time, snippets come standard with highlighting, either in bold or in italics. To provide meaningful results, show highlighting everywhere on the result. This means that if the matching terms are in the title, highlight that. If they are in the breadcrumb, highlight that. Also, you can get creative and highlight in other ways than bold or italics, just see below.
Search result with “summer” highlighted.
Here we try to mimic the look and feel of an actual highlighting pen, pretty neat.
Highlighting up-close.
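As a sketch, the mechanics can be as simple as wrapping matches in a <mark> tag and letting CSS mimic the highlighter-pen look. Real engines return pre-highlighted snippets, so treat this as an illustration:

```python
import re
from html import escape

def highlight(text, query):
    """Wrap every query term in <mark> so CSS can style it like a highlighter pen."""
    marked = escape(text)
    for term in query.split():
        marked = re.sub(
            rf"({re.escape(term)})",
            r"<mark>\1</mark>",
            marked,
            flags=re.IGNORECASE,
        )
    return marked

print(highlight("Summer vacation planning for 2020", "summer vacation"))
# <mark>Summer</mark> <mark>vacation</mark> planning for 2020
```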
Time
Whether you are searching a website, an intranet or something else for that matter, always show the date of publication, or the date of revision if you have it. Otherwise, how would you know whether the document “Release tables March 29” is recent or very old? Many people get this basic thing wrong; don’t be one of them!
Be bold, but be Right
In order for your users to understand what data you are showing on the result, the data needs a label describing it, like “Author: Emil Mauritzson”. All good so far. The most important thing is the data (Emil Mauritzson), not the label (Author). I see many getting this wrong and highlighting the label. Highlight the data instead.
Make the most important thing most visible.
So, there’s that. The part about results is complete. If you are ready for more, get on to the next part, the one about what we call landing pages, whatever that can be… Exciting!
Hey, I’m happy you have found your way here, you are currently reading the third part in the series Design Elements of Search. This part is dedicated to filters, tabs and something we like to call filter tags.
A blog series – Six posts about Design Elements of Search
Equally important as having a good user interface is having the right technology and the right relevance model set up. I will not cover technology and relevance in this blog series. If you wish to read more, these topics are already well covered by Findwise: Improve search relevancy and Findwise.com/technology.
Designing Filters
When setting up new search solutions, we tend to spend a lot of time on the data structure. How should our users slice and dice the search results? What makes sense? What does not? This is the part of the job sometimes classified as Information Architecture (IA). This text focuses more on the visual elements – the results of the IA work, you could say.
Don’t make it difficult
The biggest pitfall when designing search is to overwhelm the user with too many options.
You got a million hits! – There are 345566 pages – Here are some results, Do you only want to see People results? – Sort by Price, Ascending or Descending?! – Click me – Did you mean: Coffee buns? – Click me – CLICK MEEEE! Yep, try to tone this down if you can.
Below you’ll see a disastrous layout. There are so many things screaming for the users’ attention. If you look really hard, you can see a search result at the very bottom of the picture.
The original interface, very little room for results.
I said above that we spend a lot of time on the structure (IA). We generally spend a lot of time on filters as well. This time is well spent. However, we need to realize what is most important for our users: do they find what they are looking for, or not? The order of the search results – the relevance – matters most. Therefore, the actual search results should be totally in focus, visually, in your interface.
Make it Easy
Instead of giving your users too many options up-front, consider hiding filters under a button or link. The button can say “Filter search results”, “Refine results” or “Filter and Sort”. I’ll show you what I mean below. I have removed and renamed things from the example above, creating a design mockup. It’s not a perfect redesign, but you get my point, hopefully. All of a sudden there is room for three results on screen – success!
A cleaned up interface, more room for results.
The second example is a sneak peek of White Arkitekter’s internal search solution. Here we can follow the user searching from the start page and applying a filter. The search results are in focus, and at the same time it’s easy to apply filters when needed. A good example.
Showing how easily a filter is applied.
Search inside Filters
In the best case, a specific filter will contain a handful of values that are easily scanned just by looking at the list. In reality, however, these lists of filter values are often long. How should you sort the list? Often we sort them by “most first”, sometimes alphabetically. When the list is not easily scannable, provide a way to “search” inside the filter. Like this:
Typing inside this filter is helping the user more quickly find “Sweden”.
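A minimal sketch of that type-ahead narrowing, with prefix matches ranked before substring matches (the ranking choice is my assumption; any sensible ordering works):

```python
def filter_values(values, typed):
    """Narrow a long list of filter values as the user types."""
    typed = typed.lower()
    # Prefix matches first, then substring matches, keeping the original order.
    prefix = [v for v in values if v.lower().startswith(typed)]
    inside = [v for v in values if typed in v.lower() and v not in prefix]
    return prefix + inside

countries = ["Swaziland", "Sweden", "Switzerland", "Botswana", "Norway"]
print(filter_values(countries, "sw"))
# ['Swaziland', 'Sweden', 'Switzerland', 'Botswana']
```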
Filter values with Zero Results
Hey, if a filter value will yield zero results, like Calendar, Local files and Archived files below, show the filter value but don’t make it clickable! Why on earth would you want to send your users to a dead end? Sometimes they will end up there anyway, and then you have to help. Skip ahead to the part about the Zero Results Page to learn how to help users recover.
A filter with some values returning zero results. Good to show them, but important to make them not clickable.
Filter tags
I said above that the results should be the graphical element that stands out the most, and that making the first refinement should be easy. Well, this means that the filters will be hidden behind something. This does not mean, by the way, that the filter selection made by the user should be hidden. On the contrary. You definitely want to be clear about what affects the search results: normally the query, the filter selections and the sorting. A filter tag is simply a graphical element that is clearly visible above the search results when activated. It is also easy to remove, simply by clicking on it. Below, I show you an example where the user has filtered on “News”.
“News” is the active filter. A green filter tag is visible and is easy to see and easy to remove.
This was all I had for you regarding filters. I hope some of it made sense; if not, let’s get in touch and you can ask me for more details. Or perhaps tell me something I have missed. Always be learning! The next post will discuss results, see you over there.
You are currently reading the second part in the series Design Elements of Search, the one about autocomplete suggestions. When you’re typing text into the search bar, something is happening just below. A list of words relevant to the text appears. You probably know this from Google and around the web. I will share my findings and some best practices for autocomplete suggestions now. Call me a search-nerd, because I really enjoy implementing awesome autocomplete features!
A blog series – Six posts about Design Elements of Search
Equally important as having a good user interface is having the right technology and the right relevance model set up. I will not cover technology and relevance in this blog series. If you wish to read more, these topics are already well covered by Findwise: Improve search relevancy and Findwise.com/technology.
Designing Autocomplete Suggestions
I bet you recognize this. It just works, right? But how do you get there? Read on and I will tell you.
How autocomplete works at Google: a solid experience.
Instant Search
Autocomplete suggestions are a nice feature to offer when you expect your users to execute the query by clicking the search icon or pressing the Enter key. However, sometimes your search solution is set up in such a way that for each character the user enters, a new search is performed automatically; this is called instant search. When this is the case, you do not want autocomplete suggestions. Google experimented with instant search a few years ago but decided to revert for a number of reasons. Still, providing instant search in your use case might be a good idea. In my experience, instant search works well for structured data sets, like a product catalogue or similar. When your information is diversified – the results could be documents, web pages, images, people, videos and so on – you are probably better off providing traditional search in combination with autocomplete suggestions.
Suggestions based on User Queries
In my experience, using queries as the foundation for suggestions is the way to go. You can’t just take all queries and potentially suggest them to your entire user base, though. What happens if you have a bad actor who wants to troll and mess up your suggestions? Let’s say a popular query among your users is “money transfer” and your bad actor searches for something as nasty as “monkeyballs” 100 times. How do you make sure to provide the right suggestion when your user types “mon” in the search bar? You definitely don’t want your search team to actively monitor your potential autocomplete suggestions and manually weed out the bad ones.
One effective method we use is to check whether the query matches any document in the index. Hopefully (!?) you do not have any document containing the word “monkeyballs” in your index, and therefore such terms will not be suggested to your users in the autocomplete suggestions. Using this method makes sure your suggestions are always domain-specific to your particular case.
Another safeguard to ensure high-quality suggestions is to have a threshold: a query needs to be performed X times before it ends up as a potential suggested term. You can experiment with this threshold in your specific case for the best effect. The threshold will weed out “strange” queries, like seemingly random numbers and other queries entered by mistake that happen to yield some results.
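Put together, the two safeguards might look like this minimal Python sketch. The query log and the has_results check are hypothetical stand-ins for your own query logging and index lookup:

```python
from collections import Counter

THRESHOLD = 5  # minimum number of times a query must occur; tune per deployment

def build_suggestions(query_log, has_results):
    """query_log: iterable of raw query strings.
    has_results: callable that checks a query against the document index."""
    counts = Counter(q.strip().lower() for q in query_log)
    return {
        query: count
        for query, count in counts.items()
        if count >= THRESHOLD and has_results(query)
    }

# Example: "monkeyballs" is frequent but matches no document, so it is dropped;
# "slaary" matches nothing and is below the threshold anyway.
log = ["money transfer"] * 50 + ["monkeyballs"] * 100 + ["slaary"] * 2
index = {"money transfer", "salary"}
print(build_suggestions(log, lambda q: q in index))  # {'money transfer': 50}
```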
Here is a high-level architecture of a successfully implemented autocomplete suggester at a large client.
Architectural overview of a good performing autocomplete suggester implemented at a client.
The right information, at the right time
So far, I have explained how to weed out the poor and nasty terms. More important, however: how do you suggest terms in a good order? Basically, the more people search for something, the higher up the term appears in the list of suggestions. But how do you solve the following case? Let’s say summer is coming up and people are interested in “Vacation planning 2020”. How do you rank this suggestion above “Vacation planning 2019” in the spring of 2020, when “Vacation planning 2019” has been searched for 10,000 times and “Vacation planning 2020” only 200 times?
Basically, you need to consider when these searches were performed, and value recency together with the number of searches. I don’t have an exact formula to share, but as you can see in the high-level architecture, we bucket the queries into “last week, last month, last year”. Getting a good balance here will help boost recent queries that are of interest to your users.
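Since there is no exact formula, here is one purely illustrative weighting scheme that makes the point: with heavy enough recency weights, the 200 recent searches outrank the 10,000 mostly old ones.

```python
# Illustrative buckets and weights only – tune these for your own data.
WEIGHTS = {"last_week": 100, "last_month": 10, "older": 0.1}

def suggestion_score(counts):
    """Score a suggestion from its per-period search counts."""
    return sum(WEIGHTS[period] * n for period, n in counts.items())

vacation_2019 = {"last_week": 1,   "last_month": 5,  "older": 9994}  # ~10,000 total
vacation_2020 = {"last_week": 120, "last_month": 79, "older": 1}     # ~200 total

print(suggestion_score(vacation_2019))  # 100 + 50 + 999.4 = 1149.4
print(suggestion_score(vacation_2020))  # 12000 + 790 + 0.1 = 12790.1 – ranked first
```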
Add Static lists
Sometimes, you possess high-quality lists of words that you want to appear in the autocomplete suggestions without users first having searched for them. Then you can populate the suggestions manually, once. You may have a list of all the conference room names in your building, or a list of subjects that content creators use to tag documents. Please go ahead and use lists like these in your autocomplete suggestions.
Highlight the right thing
When presenting search results on the results page, you want to highlight where the query matched the document (read about this under Results in the fourth part of this series). In autocomplete suggestions, however, you want to do the opposite. In this state, users already know which characters they just entered; what they are looking for is what you are suggesting, so that is what you highlight.
Highlighting what comes after, not what the user has already entered.
Here we are, right at the end of autocomplete suggestions. Coming up in the next part, I will give you details about filters. Filters are surprisingly difficult to get right, but with some effort it’s possible to make them shine. See you on the other side.
Time for the first part in the series Design Elements of Search. How do you design a search solution so that it provides value to your organization? How do you make sure users enjoy, use and actually find what they expect? There are already so many great implementations of successful search applications, what can we learn from them? If these questions are in your domain, then you have reached the right place. Buckle up, you are in for a ride! Let’s dive into it right away by discussing the search bar.
A blog series – Six posts about Design Elements of Search
Equally important as having a good user interface is having the right technology and the right relevance model set up. I will not cover technology and relevance in this blog series. If you wish to read more, these topics are already well covered by Findwise: Improve search relevancy and Findwise.com/technology.
Designing the Search Bar
To set the scene and get cozy, here are some search bars.
A selection of search bars, for your pleasure.
Placing the search bar in the “right” place
Before discussing the individual graphical elements of the search bar, let’s consider where a search bar can be placed. On the search page itself, it normally resides at the top of the page (think Google). However, consider the vast landscape of your digital workplace and you might see where I am going. A search bar can be placed on your intranet, usually in the header. It can be placed in the taskbar of your workforce’s computers. It can be placed in multiple other business applications in your control. We call these entry points. It is well worth following up on where your users come from. This is only one data point; you definitely want to follow up on more usage statistics. You want to be data-informed. In our client projects we usually use Kibana for statistics, showing graphs in custom dashboards. Before redesigning something, we first analyze existing usage statistics, and then follow up with users to draw conclusions that will inform design decisions. I’ll stop talking about usage statistics now; let’s go ahead and break down the search bar.
Placeholder Text
A placeholder text invites users to the search bar. The placeholder text explains what your users can expect to find in this search solution. While respecting the tone of voice of your application, it doesn’t hurt to be friendly and helpful here. Examples of good placeholder texts are: “What are you looking for today?”, “How can we help?”, “Find people, projects and more”. H&M, the clothing store, has implemented a dynamic placeholder text that animates in a neat way.
Animated placeholder text that sparks interest in the different kind of things you can search for at IKEA.com
Google Photos switches it around and suggests what you can search for based on the metadata of your uploaded photos. Here are a few examples.
A variety of placeholder texts helping the user discover what can be searched for. The text is also personalized.
The placeholder text should be gray, so that it is not mistaken for actual data entered into the search bar. The placeholder text should disappear immediately when your user starts typing.
Contrast
Make sure the color of the search bar and the background color of the page provide enough contrast so that the search bar is clearly visible. It is also fine to use the same color if you provide a border around the search bar with enough contrast. Here are a few good examples, and some bad ones.
High Contrast
Clearly enough contrast on Bing.com
Easy to find the search bar on Dustin
Low Contrast
Google actually has low contrast on the border surrounding the search bar. The search bar also has the same color as the page. Normally this is something to avoid. There are few items on the page, though, and users expect to search at Google.com, so they get away with the low contrast, I guess. Still, Bing is better in this regard.
Too little contrast on Google.
Where is the search bar? Look hard.
If you are unsure, check whether your current colors provide enough contrast using an online contrast checker. Chances are your contrast is too low and needs improvement.
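Such checkers implement the WCAG 2.x contrast-ratio formula, which you can also compute yourself; a small Python sketch:

```python
def relative_luminance(rgb):
    """WCAG 2.x relative luminance for an (r, g, b) tuple with 0-255 channels."""
    def channel(c):
        c /= 255
        return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4
    r, g, b = (channel(c) for c in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def contrast_ratio(fg, bg):
    """Contrast ratio between two colors: (L_lighter + 0.05) / (L_darker + 0.05)."""
    l1, l2 = sorted((relative_luminance(fg), relative_luminance(bg)), reverse=True)
    return (l1 + 0.05) / (l2 + 0.05)

# A white search bar on a white page: ratio 1.0 – invisible without a border.
print(round(contrast_ratio((255, 255, 255), (255, 255, 255)), 2))  # 1.0
# A mid-gray (#767676) border on white just reaches the WCAG AA minimum of 4.5:1.
print(round(contrast_ratio((118, 118, 118), (255, 255, 255)), 2))  # 4.54
```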
The Search Button
This is the button that performs the search. Many people use the Enter key on their keyboard instead of clicking this button. However, you still want to keep the search button for clarity and ease of use. Generally, all icons should have labels; the search button is one of the few icons for which it’s safe to skip the label. I would argue that the search icon is generally recognized, especially in the context of search. On the other hand, if you have the room, why not use a label? I mean, it cannot be clearer than this:
Clearly labeled buttons, easy to comprehend.
Clear the search bar easily with an “X”
As frequently implemented in mobile applications, you should provide an easy way of clearing the text field in your desktop application. This is accomplished with an “X” icon. As discussed above, not many icons are recognized by the majority of users, so it is common practice to provide labels for icons. For the “X” icon in this specific context, it is also fine to skip the label.
Make the text easy to remove.
Number of Results
After the query has been executed and results are showing, it is helpful to communicate how many results were returned. This provides value in itself, and in combination with filters it is even more powerful. Telling users how many results were returned helps them understand how your search application works, especially in combination with applied filters. Skip ahead to Filters and read all about it. Avoid sounding like a robot: don’t say “Showing 10 of 28482 results on Pages 1-2849”. Plainly say “Showing 123 results” or “123 results found”.
Make your search solution friendly and approachable, not robotic and stiff.
Did you mean
Use the power of search technologies and query analysis to give your users the option to adjust the initial query for the better. Sometimes you will suggest a correctly spelled query when your user has misspelled one, or you can suggest alternative phrases or other related terms.
The search solution can help you spell words correctly.
Here we are, right at the end of the first part. I hope it was compelling, there is more where this came from, so keep on reading. To sum up this first part, when designing the search bar, just the obvious things need to be right. In the second part, you’ll get to know something called autocomplete suggestions. This feature helps your users formulate better queries, and that really is a good start.
In the last blog posts [1, 2] in this series by Fredric Landqvist and Peter Voisey, we outlined for you, at a high level, the benefits of making data smarter and F.A.I.R., ideally made findable through a shareable, but controlled, type of Information Commons. In this post, we introduce you to Knowledge Graphs (based on Semantic Web Technologies), the source of the magic of smart and FAIR data automation: data that is findable, accessible, interoperable and reusable. They can help tackle a range of problems, from the data tsunami to the scarcity of (quality) data for that next AI project.
What is a Knowledge Graph?
There are several different types of graph, and there have certainly been many attempted definitions of a Knowledge Graph. Here’s ours:
A Knowledge Graph is the structural representation of explicit knowledge for a domain, encoded in such a way that both humans and machines can read (process) it.
Ultimately, we want to exploit data and their connections or relationships within the graph format in order to surface important and relevant data and information. Without these relationships, the understandings, the stories and the searches around our data tend to dry up fairly quickly. Our world is increasingly connected. So we hope, from an organisational perspective, you are asking: why isn’t our data connected?!
Where does the term “Knowledge Graph” come from?
The term Knowledge Graph was coined by Google on the release of its own Knowledge Graph in 2012. More recently, organisations have been cottoning on to the collective benefits of employing a Knowledge Graph, so much so, that many refer to the Enterprise Knowledge Graph today.
What are the technologies behind the Enterprise Knowledge Graph?
The Enterprise Knowledge Graph is based on a stack of W3C-ratified Semantic Web Technologies. As their name alludes to, they form the basis of the Semantic Web. Their formulation began in 2001 with Sir Tim Berners-Lee. Sir Tim, not content with giving us the World Wide Web for free, pictured a web of connected data and concepts, besides the web of linked documents, so that machines would be able to understand our requests by virtue of known connections and relationships.
Why Enterprise Knowledge Graphs now?
These technologies are complex to the layperson and some of them are nearly 20 years old. What’s changed to make enterprises take note of them now? Worsening internal data management problems, the need for knowledge input to most sustainable AI projects, and the fact that Knowledge Graph building tools have become collaborative and more user-friendly for the knowledge engineer, domain expert and business executive. The underlying technologies in new tools are hidden from the end user’s perspective, allowing them to concentrate on encoding their knowledge so that it can be used across enterprise systems and applications. In essence, linking enterprise data.
Thanks to Google’s success in using their Knowledge Graph with their search, Enterprise Knowledge Graphs are becoming recognised as the difference between “googling” and using the sometimes-less-than-satisfying enterprise consumer-facing or intranet search.
The key takeaway here, though, is that the real power of any knowledge graph lies in the relationships/connections between its concepts. We’ll look into this in more detail next.
RDF, at the heart of the Enterprise Knowledge Graphs (EKGs)
EKGs use the simple RDF graph data model at their base. RDF stands for Resource Description Framework – a framework for describing resources or things so that we can recognise them more easily and understand more about them.
An aside: We’re talking RDF (namespace) Knowledge Graphs here, rather than their sister graph type, Property Graphs, which we will cover in a future post. It is important to note that there are advantages with both types of graph and indeed new technologies are being developed, so processes can straddle both types.
The RDF graph data model describes a thing or a resource in terms of “triples”: Subject – Predicate – Object. The diagram below illustrates this more clearly with an example.
The graph consists of nodes (vertices) that represent entities (a.k.a. concepts, both concrete and abstract, terms, phrases – but now think things, not strings), and edges (lines or arrows) representing the relationships between nodes. Each concept and each relationship has its own URI (a kind of ID) that helps a search engine or application understand its meaning – to spot differences (disambiguation), e.g. homonyms (words spelt or pronounced the same but with different meanings), or similarities, e.g. alternative labels, synonyms, acronyms, misspellings, foreign-language term equivalents etc.
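To make the triple idea concrete, here is a minimal sketch using the Python rdflib library. The example.org namespace and the facts are made up; note how every subject and predicate is a URI, while literals carry plain values:

```python
from rdflib import Graph, Namespace, Literal
from rdflib.namespace import RDFS

EX = Namespace("http://example.org/")
g = Graph()

# Subject – Predicate – Object: "Peter Fisher works on the Concrete Bridge project".
g.add((EX.PeterFisher, EX.worksOn, EX.ConcreteBridgeProject))
# A human-readable label for the node, so applications can display "Peter Fisher".
g.add((EX.PeterFisher, RDFS.label, Literal("Peter Fisher")))

for subject, predicate, obj in g:
    print(subject, predicate, obj)
```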
Google uses its Knowledge Graph when it crawls websites to recognise entities like People, Places, Products, Organisations and, more recently, Topics, plus all the known relationships between them. Within most organisations there is often a dire need for readily available knowledge about People and their related Roles, Skills/Competencies, Projects, Organisations/Departments and Locations.
There are of course many other well-known Knowledge Graphs now including IBM’s Watson, Microsoft’s Academic Knowledge Graph, Amazon’s Cortex Knowledge Graph, the Bing Knowledge Graph etc.
One thing to note about Google is that the space devoted to its organic (non-paid-for) search results has shrunk dramatically over the last ten years. In its place, Google has used its Knowledge Graph to better understand the end user’s query and context. Information is also served automatically based on query-concept relationships, either within an Information Panel or as the familiar Questions and Answers (Q&As). Your employees (as consumers) are of course at home with this intuitive, easy-click user experience. While Google’s supply of information has become sharper, so has its automatic assessment of webpage content, relying increasingly on websites to provide it with semantic information, e.g. declaring their “aboutness” by using schema.org or other microformats in their markup rather than relying on SEO keywords.
How does Knowledge Graph engineering differ from traditional KM/IM processes?
In reality, not that much. We still want the same governing principles that can give data good structure, metadata, context and meaning.
Constructing a Knowledge Graph can still be likened to developing a taxonomy or thesaurus with their concepts, plus an ontology (the relationships between concepts). Firstly, the relationships include poly-hierarchical relationships (in terms of the taxonomy): a concept may have several broader concepts, meaning that the concept itself (with its own URI) can appear multiple times within a taxonomy. This polyhierarchy can be exploited later, for example in both search filtering and website navigation.
Secondly, relationships can also be associative/relational with regard to meaning and context – your organisation’s own and/or industry-adopted concepts and the key relationships that define your business, and even its goals, strategy and workflows.
A key difference, though, is the way you can think about your data and its organisation. It is no longer flat or 2-D; think instead 3-D, 360-degree concept- or consumer-centric views that show how concepts connect to other concepts.
A semantic layer for Automatic Annotation, smarter data & semantic search
We will look at the many different benefits of a Knowledge Graph and further use cases in the next post, but for now we go with the magic: an EKG can sit virtually on top of any or all of your data sources (with different formats and metadata) without the need to move or copy any data. Any data source or data catalogue consumed via a processing pipeline can then be automatically and consistently annotated (“tagged”) and classified according to declared industry or in-house standards, thus becoming more structured and its meaning more readily “understood”, ready to be found and consumed in accordance with any known or stated conditions.
The classification may also extend to levels of data security and sensitivity, provenance or trust, or location-, device- and time-based accessibility.
Figure 2: The automatic annotation & classification process for making data/content smart by using an Enterprise Knowledge Graph.
It’s often assumed, incorrectly, that there is only one Enterprise Knowledge Graph. In fact, an enterprise can have one or many, perhaps overlapping, graphs for different purposes, subject domains or applications. What matters is that knowledge becomes encoded and readily usable for humans and machines.
What’s wrong with Relational Databases?
There’s nothing wrong with relational databases per se, and Knowledge Graphs will not necessarily replace them any time soon. It’s good to note, though, that data in tabular format can be converted to RDF graph data (triples/tuples) relatively easily and stored in a triple store (graph database) or some equivalent.
In relational databases, references to other rows and tables are indicated by referring to primary key attributes via foreign key columns. Joins are computed at query time by matching primary and foreign keys of all rows in the connected tables.
Understanding the connections or relations is usually very cumbersome, and those costly join operations are often addressed by denormalizing the data to reduce the number of joins necessary, thereby breaking the data integrity of a relational database.
The data models for relational versus graph are different. If you are used to modelling with relational databases, remember the ease and beauty of a well-designed, normalized entity-relationship diagram (e.g. using UML) – a graph is exactly that: a clear model of the domain. Each node (entity or attribute) in the graph model directly and physically contains a list of relationship records that represent its relationships to other nodes. These relationship records are organized by type and direction and may hold additional attributes.
Querying relational databases is easy with SQL. Graphs have something similar in SPARQL, a query language for RDF. If you have ever tried to write a SQL statement with a large number of joins, you know that you quickly lose sight of what the query actually does. In SPARQL, the syntax remains concise and focused on the domain components and the connections among them.
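A minimal illustration with rdflib again (same made-up namespace as before): the triple patterns in the WHERE clause read like the domain itself, with no primary/foreign-key bookkeeping.

```python
from rdflib import Graph, Namespace

EX = Namespace("http://example.org/")
g = Graph()
g.add((EX.PeterFisher, EX.worksOn, EX.ConcreteBridge))
g.add((EX.DonnaSpencer, EX.worksOn, EX.ConcreteBridge))

# "Who works on the Concrete Bridge project?" – the pattern mirrors the question.
results = g.query("""
    PREFIX ex: <http://example.org/>
    SELECT ?person WHERE { ?person ex:worksOn ex:ConcreteBridge . }
""")
for row in results:
    print(row.person)
```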
Toward data-centric solutions with RDF
With enterprise linked data, as with knowledge graphs, one is able to connect many different schemas (data models) and formats in different relational databases and build a connected worldview, a domain of discourse. Herein lies the strength of linking data: liberating data from lock-in mechanisms, whether schemas (data models) or vendors (software), and enabling queries and inferencing that surface new knowledge and insights that were not possible before, due to time or human-computation constraints. Semantics support this reasoning!
Of course, having interoperable graph data could well mean fewer code patches on individual systems, and more sustainable and agile data-centric solutions in the future.
In conclusion
The expression “in the right place, at the right time” is generally associated with luck. We’ve been talking in our enterprises about “the right information, in the right place, at the right time” for ages, unfortunately sometimes with similar fortune attached. The opportunity is here now to embark on a journey to take back control of your data if you haven’t already, and make it an asset again in achieving your enterprise aims and goals.
This is the second post in a new series by Fredric Landqvist & Peter Voisey, explaining how your organisation could best shape its data landscape for the future.
How to create a smart data framework for your organisation
In our last post, we presented the benefits of F.A.I.R. data, how to make data smarter for search engines, and the potential of an Information Commons. In this post, we give you the pragmatic steps to make your data FAIR by creating and applying your own smart data framework. Your data-sharing dream, internally and externally, is possible.
A smart data framework, using FAIR data principles, encompasses the tooling, models and standards that govern datasets and the different context-specific information systems (registers, catalogues). The data is then ingested and processed (enriched/refined) into smart data, datasets and data catalogues. It can then be used and reused by different applications and e-services via open APIs. In this ecosystem, all actors and information behaviours (personas) interplay: provision agents, owners, builders, enrichers, end-user searchers and referrers.
A smart data & metadata catalogue
A smart data & metadata catalogue (illustrated below) provides an organisational capability that aligns data management with the FAIR data principles. View it not so much as one system to rule them all, but rather as an ecosystem that is smart and sustainable. In order to simplify your complex and heterogeneous information environment, this set-up can be instantiated as one overarching mechanism. Although we describe a data and metadata catalogue here, the exact same framework and set-up would of course also apply to your organisation’s content, making it smarter and more findable (i.e. it gets the sustainable stamp).
The above picture illustrates the services and components that, together, build smart data and metadata catalogue capabilities. We now describe each one of them for you:
Processing (Ingestion & Enrichment) for great Findability & Interoperability
(A) Ingest, harvest and operate. Here you connect the heterogeneous data sources for ingestion.
The configured input mechanisms describe each of the data sources, with their data, datasets and metadata ready for your catalogue search. Hopefully, at the dataset upload stage, you have provided a good system/form that now provides your search engine with great metadata (we recommend the open data catalogue standard DCAT-AP). The concept of upload covers machine-to-machine harvester mechanisms (as with open data), traditional data integration, and manual provision by human upload.

(D) Enterprise Metadata Repository: the persistent storage of data in data catalogue, index and graph alike. Everything gets a persistent ID (see how to design persistent URIs) and rich metadata.
(B) Enrich, refine, analyse and curate. This is the AI part (NLP, semantics, ML) that enriches the data and datasets, making them smarter.
Concepts (read also: entities, terms, phrases, synonyms, acronyms etc.) from the data sources are found using named entity recognition (NER). By referring to a Knowledge Graph in the Enricher, the appropriate resources are annotated (“tagged”) with the concept in question. It does not end there, however: the concept also brings with it from the Knowledge Graph all of the known relationships it has with other concepts.
Essentially a Knowledge Graph is your encoded domain knowledge in a connected graph format. It is by reading these encoded relationships that the machine “understands” the meaning or aboutness of data.
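A toy version of that Enricher step in Python: annotate a document with every concept whose label or synonym appears in the text, then pull in the related concepts from the graph. The labels and relations are invented for the example:

```python
# Hypothetical knowledge graph fragment: concept -> synonyms and related concepts.
knowledge_graph = {
    "concrete": {"synonyms": ["cement-based"], "related": ["bridge", "construction"]},
    "bridge":   {"synonyms": [],               "related": ["infrastructure"]},
}

def annotate(text):
    """Tag a text with matching concepts and their known related concepts."""
    text = text.lower()
    tags, related = set(), set()
    for concept, info in knowledge_graph.items():
        if concept in text or any(s in text for s in info["synonyms"]):
            tags.add(concept)
            related.update(info["related"])
    return {"tags": sorted(tags), "related": sorted(related)}

print(annotate("Case study: a cement-based bridge in northern Sweden"))
# {'tags': ['bridge', 'concrete'],
#  'related': ['bridge', 'construction', 'infrastructure']}
```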
This opens up a very nice Pandora’s box for your search (understanding query intent) and for your Graphical User Interface (GUI) as your data becomes smarter now through your ability to exploit the relationships and connections (semantics and context) between concepts.
You and AI can have a symbiotic relationship in the development of your Knowledge Graph. AI can suggest new concepts and relationships as new data is added. It is, however, you and your colleagues who determine the concepts and relationships in the Knowledge Graph – those that are important to your department or business. Remember you can utilise more than one knowledge graph, or part of one, for particular business needs or data sources. The Knowledge Graph is a flexible expression of your business/information models that gives structure to all your data and its access.
Extra optional step: if you can manage to index not only the dataset metadata but also the datasets themselves, you can make your Pandora’s box even nicer. Those cryptic/nonsensical field names that your traditional database experts love to create can also be incorporated and mapped (one time only!) into your Knowledge Graph, increasing the machine’s “understanding” of the data. Thus, there is a better chance of the data asset being used more widely.
The configuration of processing with your Knowledge Graph can take care of dataset versioning, lineage and add further specific classifications e.g. data sensitivity, user access and personal information.
Lastly on Processing: your cultural and system interoperability is immensely improved. We’re not talking about everyone speaking the same language here, but rather everyone talking their own language (/culture) and still being able to find the same thing. In this way, open and FAIR vocabularies further enrich the meaning of data, and your metadata is linked. System interoperability is partially achieved by exploiting the graph of connections that now “sits over” your various data sources.
Controlled Access (Accessible and Reusable)
(C) Access, search and visualize APIs. These tools control and influence the delivery, representation, exploration and consumption/use of datasets and data catalogues, via a smarter search (made so by smarter data) and a more intuitive Graphical User Interface (GUI).
This means your search can now “understand” user intent from just one or two keyword queries (through known relationship connections in the Knowledge Graph).
Your search now also caters for your searchers who are searching in an unfamiliar subject area or are just having a query off day. Besides offering the standard results page, the GUI can also present related information (again due to the Knowledge Graph), past related user queries, information and question-answer (Q&A) type material. So: search, discovery, learning, serendipity.
Your GUI can also now become more intuitive, changing its information presentation and facets/filters automatically, depending on the query itself (more sustainable front-end coding).
An alternative to complex scenario coding also includes the possibility for you to create rules (set in your Knowledge Graph) that can control what data users can access (when, how and where) based on their profile, their role, their location, the time and on the device they are using. This same Knowledge Graph can help push and recommend data for certain users proactively. Accessibility will be possible by using standard communication protocols, open access (when possible), authentication where necessary, and always with metadata at hand.
Reusable: your new smart data framework can help increase the time your Data Managers (/Scientists, Analysts) spend using data rather than trying to find it (the 80/20 data science dilemma). It can also help reduce the risk to your AI projects (50% failure rate) by helping searchers find the right data, with its meaning and context, more easily. Reuse is also enabled by design, with rich metadata attributes, usage licences and provenance in line with community standards.
Users and information behaviour (personas)
From experience we have defined the following broad conceptual user-groups:
Data Managers, a.k.a. Data Ops or Data Scientists: e.g. knowledge engineers, taxonomists and analysts.
Data Stewards: responsible for data governance, such as data lineage.
Business Professionals/Business end-users: business users with potentially diverse backgrounds, hence “business end-users”.
Actor Systems: the different information systems, applications and services that integrate information via the rich open APIs of the Smart Data Catalogue.
The outlined collaborative actors (user groups E–H) and their interplay as information behaviours (personas) with the data (repository) and services (components) together build the foundation for FAIRer data management within your organisation, providing you at the same time with the option to contribute to an even broader shared open FAIR information commons.
(E) Data Ops workplace and dashboard: a combination of tools supporting Data Ops’ data management processes, in the information behaviours: data provision agent, enricher and developer.
(F) Data Governance workplace: the tools supporting Data Stewards’ collaborative data governance work with Data Managers, in the information behaviour: data owner.
(G) Access, search and visualize APIs: the user experience for exploring, finding and interacting with the catalogue and data, in the information behaviours: searcher and referrer.
(H) API: the set of open APIs supporting access to catalogue data for consuming information systems, in the information behaviour: referrer (a.k.a. data exchange).
We hope you enjoyed this post and understand the potential benefits such a smart data framework incorporating FAIR data principles can have on your data catalogue, or for that matter, your organisational content or even your data swamps.
In the next post, Toward data-centric solutions with Knowledge Graphs, we talk about Knowledge Graphs (KGs) and their non-proprietary RDF Semantic Web technology, how you can create your KG(s), and the benefits they can bring to your future data landscape.
This is the first post in a new series by Fredric Landqvist and Peter Voisey, explaining how your organisation could best shape its data landscape for the future.
A Quest for a FAIR Information Commons
You might have heard recently of the phrase, “data that saves lives”. It certainly can, but just as you need to be in shape to do your work, so does data, to work its magic. Data too needs to be shaped by governing principles that we can apply along their life journey, in order that we can reap the consequential rewards and benefits that are there to be had. Data in shape, saves lives.
We all need to fix problems, usually quickly – hence the presence of the closed model, data silos and poor data interoperability. It has had to happen this way, it will continue to do so, and there’s no shame in that. But if we can be part of a reliable data-sharing community, whose data can help us collaborate and solve problems better, well, we’d be foolish to turn it down.
So imagine a type of information commons. This isn’t so far-fetched, we just need to widen our horizons and collaborative ecosystem for it to happen, and perhaps take the same model advice internally for our own organisations.
The challenge of really saving lives with data requires new collaborators. As collaborators we require trust. In essence, then, to be part of this challenge we need to be willing to share data (we use the terms data, content and information interchangeably here). Proof of that trust is to sign an agreement to be part of an information commons, where data has certain principles (a.k.a. terms & conditions, T&Cs). In essence, rules of engagement!
Declare interest
Sign future rules of engagement to share and access data
Get ready to adhere to them
The T&Cs largely apply to the condition of the data being shared and the information about them. They match precisely how you would hope to find data in this new treasure trove. They may also be known as F.A.I.R. – data that is Findable, Accessible, Interoperable and Reusable. FAIR obviously alludes also to the fairness in collaboration and the F.A.I.R data principles originate from a good sharing place. Here’s a great summary in image form, from Australian National Data Service [ANDS]:
How FAIR is your data today? Simply answer by following a brief checklist or later go for a more comprehensive description at Go FAIR.
Still here? Great! Let’s get started then with Findable!
Findable
We can only truly make data findable when we really think about the range of people who might want to find it and how they might want to use or reuse it (their need determines how they will ask for it). The reality of different data sources, formats, protocols and their possible attributes or descriptors makes describing data for others problematic – plus, do you really have the time for tagging? Regardless of time, we’re not very good at putting ourselves in somebody else’s shoes (unless of course we’re selling something), and we’re certainly not able to cover the variation in how people (with differing perspectives) search for data.
The best answer we have at the moment is to describe data or datasets using agreed standards, which may vary a bit from domain to domain. Sharing or uploading data to the “ether” feels different from uploading data that matters to a known shared source, accessed by users who understand its value. Doing so may inspire us to describe data according to a collective standard, with the feeling of having done something good for a bigger cause.
But hang on. Why is the onus on the end user? We have the tech here now to automate much of this process. We just need a good sharing and upload design that can recognise the (hopefully changeable) standards of description (metadata). By processing data on upload, we can get a better understanding of data with reference to our standards and rules. According to what the machine recognises (pattern matching) or “understands” (by way of concept relationships in a knowledge graph), it can annotate the data, ready to serve the requests of data searchers and data applications, or at least be able to offer a related alternative.
Such processing is done using AI (NLP, ML etc.), but it’s not magic. We still have to teach our machines the agreed standards and rules in the first place. While that may sound cumbersome to some, it’s not as if you have to keep teaching them repeatedly. Conversely, the student (AI) can also suggest new rules and annotations, keeping them current according to the data being processed. The beauty, though, is that we can employ more than one descriptive rule set for different data or datasets. Depending on data source, format and context, the machine can activate different metadata rule sets. The smart part for the uploader is the presentation of a semi-automated metadata form for their data, leaving them to confirm or alter it before hitting send. The “uploader” in this context is a broad concept covering any agent that contributes data to the shared information space, be it programmatic or human.
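As a toy illustration of activating different metadata rule sets by source and format, with the rule sets and fields invented for the example:

```python
# Hypothetical rule sets: which metadata fields to pre-fill, per data context.
RULE_SETS = {
    ("open-data", "csv"):  ["title", "description", "licence", "publisher"],  # DCAT-style
    ("internal", "xlsx"):  ["title", "owner", "sensitivity", "retention"],
    ("sensor",   "json"):  ["title", "device_id", "sampling_rate", "location"],
}

def metadata_form(source, fmt, detected):
    """Pre-fill a semi-automated metadata form; the uploader confirms or edits it."""
    fields = RULE_SETS.get((source, fmt), ["title", "description"])
    return {field: detected.get(field, "") for field in fields}

form = metadata_form("open-data", "csv", {"title": "Air quality 2020", "licence": "CC-BY"})
print(form)
# {'title': 'Air quality 2020', 'description': '', 'licence': 'CC-BY', 'publisher': ''}
```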
Let’s not forget we’re at the stage where we can use “search” not only for indexing and this automatic annotation, but also for calculations during parsing, to potentially annotate with even higher understanding. Such a solution fits well with the increasing demand for real-time data too.
So Findable, is really both about making data smarter, and findable.
Accessible
There’s nothing worse than finding something you want, only to be told you can’t use it.
While the premise of an Information Commons is sharing, it doesn’t necessarily mean that everything is accessible by everyone – the reason why some readers left this page at the third paragraph.
Let's be clever about this. There are lots of ways to control accessibility automatically and to police it automatically. This could be technical: IP address, sign-on, authorisation (classification of user) and so on. But it could also be done by processing data on upload, determining its sensitivity level and/or flagging indicators of GDPR-relevant data.
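As a toy illustration of that last idea, here is a hedged sketch of upload-time sensitivity screening. The indicator patterns (a Swedish-style personal identity number and an e-mail address) and the three access levels are invented for the example:

```python
import re

# Illustrative GDPR-style indicators -- not a complete detector.
GDPR_INDICATORS = [
    re.compile(r"\b\d{6}[-+]\d{4}\b"),            # personnummer-like identifier
    re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),   # e-mail address
]

def classify_sensitivity(content: str) -> str:
    """Assign an access level based on detected personal-data indicators."""
    hits = sum(1 for pattern in GDPR_INDICATORS if pattern.search(content))
    if hits >= 2:
        return "restricted"   # requires explicit authorisation
    if hits == 1:
        return "internal"     # signed-in users only
    return "open"

print(classify_sensitivity("Contact: anna@example.org, id 850101-1234"))
# -> 'restricted'
```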
Back to the end-user: they don't want to see stuff they can't use, but they do want to know from the get-go whether they need any new software to access the data they are interested in.
Interoperable
Now for the hard part. The reality is that the variety in data sources, protocols and formats ain't going away any time soon. We have to accept that. We've just touched on technical interoperability under Accessible. There's also linguistic and cultural interoperability, which can again be addressed by using a knowledge graph with search (tinkering with knowledge graphs, just like Google does).
Lastly, there's data interoperability. Barriers preventing data and system interoperability are slowly being brought down through collaboration. In the meantime, it is possible for us to convert key data into the same data format, so AI and inferencing can be used across different (previously incompatible) datasets – the kind of thing that can lead to computation-derived insights a human on their own couldn't make. Converting data to RDF could be a case in point: a real lingua franca of data, and one connected to the Web.
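A minimal sketch of lifting a row of tabular data into RDF, here using the Python rdflib library. The namespace and property names are made up for illustration; real mappings should reuse shared vocabularies wherever possible:

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, XSD

# Hypothetical namespace for our shared commons.
EX = Namespace("http://example.org/commons/")

g = Graph()
g.bind("ex", EX)

# One row of previously siloed tabular data.
row = {"station": "STN-042", "pollutant": "PM2.5", "value": 12.3}

subject = EX[f"measurement/{row['station']}"]
g.add((subject, RDF.type, EX.Measurement))
g.add((subject, EX.pollutant, Literal(row["pollutant"])))
g.add((subject, EX.value, Literal(row["value"], datatype=XSD.double)))

# Serialise to Turtle -- the same triples could now be queried or
# inferenced over together with other RDF-converted datasets.
print(g.serialize(format="turtle"))
```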
Reusable
The "F.A.I." part of FAIR really already covers Reusable. We want to be able to find data that we can reuse. To do this, when finding content, information, datasets and data catalogues, we need to be able to see the how, what, when, who, why and where of their potential usage. More standing on the shoulders of giants, less reinventing the wheel. The information (rich metadata) associated with Reusable also speaks to usefulness: value, age and provenance.
Healthcare Data Commons
There is an emerging FAIR data paradigm shift within the health informatics and research communities, sparked by those within the bio- and life-science domains.
There are obvious regulatory constraints around patient data, or health data, that any data commons arena will have to nail upfront.
Health data – quality-register data, EHR data and the patient's self-created data – would together be a real gold mine in the pursuit of personalised medicine and health care. Patient-centric data and FAIR data governance will be key.
The outlined scenario for a FAIR Data Commons
The illustration above shows a FAIR data commons. It is the foundation framework for all information systems (registers) in the data ecology, and these information systems need to harmonise and align to become FAIR. There is a set of generic agent information-behaviour patterns (user personas):
Data provision agent: an information behaviour where either a human actor uploads (provisions) data or machine-to-machine integration contributes data to the datasets in the register.
Data owner: an information behaviour relating to governance, ownership and stewardship of the datasets in the register.
Application builder: an information behaviour relating to building capabilities through the use and reuse of datasets in the register.
Data enricher: an information behaviour relating to expanding the models and enriching the datasets, e.g. using linked data, semantics and more to create richer metadata.
Searcher: an information behaviour relating to finding and acting upon data.
Referrer: an information behaviour relating to using data in information flows and data exchange to support different kinds of processes, activities and actions with other actors in the ecology.
The business value (effect) realised from the FAIR Data Commons will come via different means: through e-services, used in the searcher and referrer scenarios, but also through improved efficiency and improved data quality in the other information behaviours.
The life-science industry, together with healthcare, has some impressive initiatives in line with FAIR data, e.g. Electronic Health Records 4 Clinical Research [EHR4CR] with its information platform InSite (by TriNetX).
This is the first joint post in a series where Findwise & SearchExplained together decompose Microsoft's realm, with a focus on knowledge graphs and AI. The advent of graph technologies, and more specifically knowledge graphs, has placed them at the epicentre of the AI hyperbole.
The use of a symbolic representation of the world, as with ontologies (domain models) within AI, is nothing new. The Cyc project, for instance, started back in the 80s. The most common use for the average Joe would be the Google Knowledge Graph, which links things and concepts. In the world of Microsoft, this has become a foundational platform capacity with the Microsoft Graph.
It is key to separate the wheat from the chaff, since the Microsoft Graph is by no means a knowledge graph. It is a highly platform-centric way to connect things: applications, users, information and data. That is good, but it still lacks the capacity to disambiguate the complex things of the world, since building a knowledge graph (i.e. an ontology) is not its core functionality.
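To illustrate the difference: the Microsoft Graph answers tenant-centric questions such as "which people are relevant to me?", while a knowledge graph answers concept-centric ones. A hedged sketch in Python (token acquisition omitted; the SPARQL ontology and its terms are hypothetical):

```python
import requests

# Microsoft Graph: tenant-centric signals about people, files, activity.
# ACCESS_TOKEN is assumed to be a valid delegated token with the
# People.Read permission.
ACCESS_TOKEN = "<token>"

resp = requests.get(
    "https://graph.microsoft.com/v1.0/me/people",
    headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
)
relevant_people = resp.json().get("value", [])

# A knowledge graph, by contrast, disambiguates things of the world.
# Hypothetical SPARQL against an enterprise ontology endpoint:
SPARQL = """
SELECT ?project WHERE {
  ?project a ex:Project ;
           ex:usesTechnology ex:MachineLearning .
}
"""
```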
From a Microsoft-centric worldview, one should combine the Microsoft Graph with different applications and AI to automate and augment life with Microsoft at work. The reality is that most enterprises do not use Microsoft alone to envelop the enterprise information landscape. The information environment goes far beyond it, into a multitude of organising systems within and outside the company walls.
Question: how does one connect the dots in this maze-like workplace? By building knowledge graphs and infusing them into the Microsoft Graph realm?
The model, artefacts and pragmatics
People at work continuously have to balance between modalities (provision/find/act) when dealing with data and information, independent of work practice or discipline. People also have to interact with groups and imagined entities (i.e. organisations, corporations and institutions). These interactions become the mould from which shared narratives emerge.
Knowledge graphs (ontologies) are the pillar artefacts where users find a level playing field for communication and the codification of knowledge in organising systems. When the knowledge graphs are linked, with a smart semantic information engine as the utility, we get enterprise linked data that connects the dots: a sustainable, resilient model in the content continuum.
Microsoft at work – the platform, as with Office 365 – has some key building blocks: a content model that cuts across applications and services. The Meccano pieces, like collections [libraries/sites] and resources [documents, pages, feeds, lists], should be configured with sound resource descriptions (metadata) and organising principles. One of the back-end services dealing with this is the Managed Metadata Service, with its cumbersome TermStore (it is not a taxonomy management system!). The pragmatic approach is to infuse/integrate the smart semantic information engine (knowledge graphs) with these foundation blocks. One outstanding question is why Microsoft has left these services largely unchanged, with few improvements, for many years.
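A hedged sketch of what such an integration could look like: pushing concept labels from a knowledge graph into a TermStore term set via the Microsoft Graph term store endpoints (assumed available in the tenant; SITE_ID, SET_ID and the token are placeholders, and the concept list stands in for a real ontology export):

```python
import requests

ACCESS_TOKEN = "<token>"
SITE_ID = "<site-id>"
SET_ID = "<term-set-id>"

# Concept labels exported from a knowledge graph (invented examples).
concepts = ["Clinical Trial", "Pharmacovigilance"]

for label in concepts:
    # Create each concept as a term under the given term set.
    requests.post(
        f"https://graph.microsoft.com/v1.0/sites/{SITE_ID}/termStore/sets/{SET_ID}/children",
        headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
        json={"labels": [{"languageTag": "en-US", "name": label, "isDefault": True}]},
    )
```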
The unabridged pathway and lifecycle of content provision, such as the creation of sites curating documents, will (in the best of worlds) be a guided route, automated and augmented with AI and semantics. The Microsoft Graph and its set of APIs and connectors push the envelope with people at the centre. As mentioned, it is a platform-centric graph service, but it lacks a connection to shared narratives (as captured in knowledge graphs). It is fuzzy logic, where end-user profiles and behaviour patterns connect content and people, with no, or very limited, opportunity to fine-tune or align these patterns to the models (concepts and facts).
Akin to the provision-modality pragmatics above is the find (search, navigate and link) domain in Office 365. Microsoft's search roadmap, like a yellow brick road, envisions a cohesive experience across all applications. The reality: it is still siloed search 😉 The Microsoft Graph goes hand in hand with realising personalised search, but it is still constrained in its means to deliver a targeted search experience (search-driven applications) in the modern search. That is problematic, to say the least. And neither the back-end processing steps nor the user experience lean upon the models to deliver e.g. semantic search that connects the dots. Using only end-user behaviour patterns and end-user tags (/system/keyword), search surfaces as a disjointed experience with low precision and recall.
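For contrast, a minimal sketch of the kind of knowledge-graph-backed query expansion that a semantic-search step could add. The tiny in-memory graph here is an illustrative stand-in for a real ontology:

```python
# Illustrative stand-in for a knowledge graph: concepts with synonym
# and narrower-term relationships.
KNOWLEDGE_GRAPH = {
    "ml": {"synonyms": ["machine learning"], "narrower": ["deep learning"]},
    "gdpr": {"synonyms": ["general data protection regulation"], "narrower": []},
}

def expand_query(query: str) -> list[str]:
    """Expand a user's query with concept labels from the graph."""
    terms = [query]
    node = KNOWLEDGE_GRAPH.get(query.lower())
    if node:
        terms += node["synonyms"] + node["narrower"]
    return terms

print(expand_query("ML"))  # ['ML', 'machine learning', 'deep learning']
```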
The smart semantic information engine will usually be a mix of services or platforms that work in tandem. An example: