Building a chatbot – that actually works

In the era of artificial intelligence and machine learning, chatbots have gained a lot of attention. Chatbots can for example help a user to book restaurants or schedule flights. But why should organizations use chatbots instead of simple user interaction (UI) systems? Considering that chatbots are both easier and more natural to interact with -compared to that of a UI system – endorses the implementation of chatbots in certain use cases. Additionally, a chatbot can engage a user for a longer time which can result in a company increasing its business. A chatbot needs to understand the natural language as there can be multiple ways to express one’s intention with language ambiguity. Natural Language Processing (NLP) helps us to achieve this to some extent.

Natural language processing – the foundation for a chatbot

Compared to rule-based solutions, chatbots using machine learning and language understanding are much more efficient. After years and new waves of statistical models, such as deep learning RNN, LSTM, transformers etc., these algorithms have now become market standards.

NLP is a part of linguistics and artificial intelligence, where algorithms are used to understand, analyze, manipulate and potentially generate human readable text. Usually, it contains two components: Natural Language Understanding (NLU) and Natural Language Generation (NLG).

To start with, the natural language input is mapped into useful representation for machine reading comprehension. This is achieved through using basics like: tokenization, stemming / lemmatization or tagging part of speech. There are also more advanced elements such as recognizing named entities or chunking. The latter is a processing method which organizes the individual terms found previously into a more prominent structure. For example: ’South Africa’ – is more useful as a chunk than the individual words ‘South’ and ‘Africa’.

FIGURE 1: A PROCESS OF BREAKING A USER’S MESSAGE INTO TOKENS

FIGURE 1: A PROCESS OF BREAKING A USER’S MESSAGE INTO TOKENS

From the other side, NLG is the process of producing meaningful phrases and sentences in natural language from an internal structural representation using e.g. content determination, discourse planning, sentence aggregation, lexicalization, referring expression generation or linguistic realization.

Open-domain and Goal-Driven Chatbot

Chatbots can be classified into two categories: Goal-driven and Open-domain. Goal-driven chatbots are built to solve specific problems such as a flight bookings or restaurant reservations. On the other hand, the Open-domain dialogue system attempts to establish a long-term connection with the user, such as psychological support and language learning.

Goal-driven chatbots are based on slot filling and handcrafted rules, which are reliable but restrictive in conversation. A user has to go through a predefined dialogue flow to accomplish a task.

FIGURE 2: ARCHITECTURE FOR GOAL-DRIVEN CHATBOT

FIGURE 2: ARCHITECTURE FOR GOAL-DRIVEN CHATBOT

Open domain chatbots are intended to converse coherently and engagingly with humans and maintain a long dialog flow with a user. However, we need to have big amounts of data to train these chatbots.

FIGURE 3: ARCHITECTURE FOR OPEN-DOMAIN CHATBOT

FIGURE 3: ARCHITECTURE FOR OPEN-DOMAIN CHATBOT

Knowledge graphs bring connections and data structures to information

Knowledge graphs provides a semantic layer on the top of your database which provides you with all possible entities and the relationships between them. There are a number of representation and modeling instruments available for building a knowledge graph, ontologies being one of them.

Ontology comprises of classes, relationships and attributes as shown in Figure 9. This offers a robust way to store information and concepts – similar to how humans store information.

FIGURE 4: OVERVIEW OF A KNOWLEDGE GRAPH WITH AN RDF SCHEMA

FIGURE 4: OVERVIEW OF A KNOWLEDGE GRAPH WITH AN RDF SCHEMA

A chatbot based on ontology can help to clarify the user’s context and intent – and it can dynamically suggest related topics. Knowledge graphs represent the knowledge of an organization,  as depicted in the following Figure 10. Consider a knowledge graph based on an organization (as shown on the right image in Figure 10) and a chatbot (as shown on the left image in Figure 10) which is based on the ontology of this knowledge graph. In the chatbot example in Figure 10, the user asks a question about a specific employee. The NLP detects the employee as an entity and also detects the intent behind asking a question about this entity. The chatbot matches the employee entity in the ontology and navigates to the node in the graph. From that node we now know all possible relationships of that entity and the chatbot will ask back for possible options, such as co-workers and projects, to navigate further.

FIGURE 5: A SCENARIO - HOW A CHATBOT CAN INTERACT WITH A USER WITH A KNOWLEDGE GRAPH.

FIGURE 5: A SCENARIO – HOW A CHATBOT CAN INTERACT WITH A USER WITH A KNOWLEDGE GRAPH.

Moreover, the knowledge graph also improves the NLU in a chatbot. For example, if a user asks the following;

  • ‘Which assignments was employee A part of?’. To navigate further in the knowledge graph, a rank system can be created for possible connections from the employee node. This rank system might be based on word vector space and a similarity score.
  • In this scenario, ‘worked in, projects’ will have the highest rank when calculating the score with ‘part of, assignments’. So, the chatbot would know it needs to return the list of corresponding projects.

Virtual assistants with Lucidworks Fusion

Lucidworks Fusion is an example of a platform that supports building conversation interfaces. Fusion includes NLP features to understand the meaning of content and user intent. In the end, it’s all about retrieving the right answer at the right time. Virtual assistants, with a more human level of understanding, goes beyond static rules and profiles. It uses machine learning to predict user intention and provides insights. Customers and employees can locate critical insights to help them move to their next best action.

FIGURE 6: LUCIDWORKS FUSION DATA FLOW

FIGURE 6: LUCIDWORKS FUSION DATA FLOW

Lucidworks recently announced Smart Answers – new Fusion’s feature. Smart Answers enhances the intelligence of chatbots and virtual assistants by using deep learning to understand natural language questions. It uses deep learning models and mathematical logic to match the similarity of a question (which can be asked in many different ways) to the most relevant answer. As users interact with the system, Smart Answers continues to rank all answers and improve relevancy.

Fusion is focused on understanding a user’s intent. Smart Answers includes model training and serving methods for different scenarios:

  • When FAQs or question-answer pairs exist, they can be easily integrated into Smart Answers’ model training framework,
  • When there are no FAQ or question-answer pairs, knowledge base documents can be used to train deep learning models and match existing knowledge for the best answers to incoming queries. Once users click on documents returned for specific queries, they become question-answers pairs signals and can enrich the FAQ model training framework,
  • When there are no documents internally, Smart Answers uses cold-start models trained on large online sources, available in multiple languages. Once it goes live, the models begin training on actual user signals.

Smart Answers’ API enables easy integration with any platform, knowledge base, adding value to existing applications. One of the strengths of Fusion Smart Answers is integration with Rasa, an open-source conversation engine. It’s a framework that helps with understanding user intention and maintaining dialogue flow. It also has prebuilt NLP components such as word vectors, tokenizers, intent classifiers and entity extractor. Rasa allows to configure the pipeline that processes a user’s message and analyze human language. Another part of this engine enables modeling dialogues, so chatbot knows what the next action or response should be.

intent:greet 
- Hi 
- Hey 
- Hi bot 
- Hey bot 
 
## intent:request_restaurant 
- im looking for a restaurant 
- can i get [swedish](cuisine) food in any area. 
- a restaurant that serves [caribbean](cuisine) food. 
- id like a restaurant 
- im looking for a restaurant that serves [mediterranean](cuisine) food 
- can i find a restaurant that serves [chinese](cuisine)

Building chatbots requires a lot of training examples for every intent and entity to make them understand the user intention, domain knowledge and to improve NLU of the chatbot. When building a simple chatbot, using prebuilt trained models can be useful and requires less training data. For example: If we build a chatbot where we only need to detect the common location entity, few examples and spaCy models can be enough. However, there might be cases when you need to build a chatbot for an organization where you need different contextual entities – which might not be available in the pretrained models. Knowledge graphs can then be helpful to have a domain knowledge for a chatbot and can balance the amount of work related to training data.

Conclusion

Two main chatbot usages are: 1/solving employee frustration in accessing e.g. corporate information and 2/providing customers with answers to support questions. Both examples above are looking for a solution to reduce time spent on finding information. Especially for online commerce, key performance indicators are clear and can relate to e.g. decreasing call center traffic or call deflection from web and email – examples of situations where ontology based chatbots can be very helpful. From a short-term perspective creating a knowledge graph can initially require a lot of effort – but from a long-term perspective it can also create a lot of value. Companies rely on digital portals to provide information to users; employees search for HR or organization policies documents. Online retailers try to increase customers’ self-service in solving their problems or simply want to improve discovery of their products and services. With solutions like e.g. Fusion Smart Answers, we are able to cut down time-to-resolution, increase customer retention and take knowledge sharing to the next level. It helps employees and customers resolve issues more quickly and empowers users to find the right answer immediately without seeking out additional, digital channels.

Authors: Pragya Singh, Pedro Custodio, Tomasz Sobczak

To read more:

  1. Ehud Reiter and Robert Dale. 1997. Building applied natural language generation systems. Nat. Lang. Eng. 3, 1 (March 1997), 57–87. DOI:https://doi.org/10.1017/S1351324997001502.
  2. Challenges in Building Intelligent Open-domain Dialog Systems by Huang, M.; Zhu, X.; Gao, J.
  3. A Novel Approach for Ontology-Driven Information Retrieving Chatbot for Fashion Brands by Aisha Nazir, Muhammad Yaseen Khan, Tafseer Ahmed, Syed Imran Jami, Shaukat Wasi
  4. https://medium.com/@BhashkarKunal/conversational-ai-chatbot-using-rasa-nlu-rasa-core-how-dialogue-handling-with-rasa-core-can-use-331e7024f733
  5. https://lucidworks.com/products/smart-answers/

3 challenges for the internal service desk – and how to solve them

The digital transformation and the internal service desks

Most organizations today are focusing on creating a digital service desk experience. This transformation has of course been going on for many years and different organizations have different versions of ticket systems for reporting and solving internal (mostly IT) issues. A common trend, though, is efforts on creating a digital self-service.  Gartner has targeted self-service support as one of the top priority areas for 2020:

“Improve the customer service experience by reducing live contact volume by shifting from a live to a self-service functionality”

The self-service trend is mainly focusing on answering the “simple and reoccurring” questions. These are the types of questions, asked often and by different users, that typically have simple answers. Our experience at Findwise is that surprisingly many of all the questions handled by an internal service desk can be categorized as simple and reoccurring. We have targeted 3 challenges for the internal service desk – and suggestions on how to solve them.

The challenge of self-service in an internal service desk

In almost all organizations there is a need to handle internal support questions. It might be IT-related such as “how do I install a VPN to be able to work from home”, HR-related such as “how do I order terminal glasses”, Finance related such as “where do I report the financial result for last quarter “ etc.

This is generally handled by the “internal service desk” or “internal support”. It might be handled “case-by-case” using email by the responsible person or in a more structured form in a “ticket system”. Often IT has a structured and formalized way of working but other areas (HR, Finance etc.) might not be equally structured.

The business impact on an organization when the internal service desk does not deliver fast and accurate answers might be huge! People might not get their work done and instead need to “idle” in wait for a response or answer.

3 challenges to solve

Findwise has during the years created several digital, self-service, internal service desk portals with the ability to be proactive and give the users the fast and accurate answers they are looking for.

In this work we have learned that there are 3 main challenges you need to solve:

  1. Take control of your data

If you are going to provide proactive and self-service answers to simple and recurring questions you need to know where the answers are. You need to have control of your data!

In order to do that you need to have a plan and work continuously with the data challenges viewed in the picture below:

At Findwise we have measured Search and Findability in various organisations since 2012. As clearly shown in the result of the 2019 Search & Findability Survey, finding relevant information is still a major challenge to most organizations. When it comes to internal information, as many as 55% find it difficult or very difficult to find what they are looking for. Bad information quality is one of main reasons for poor findability.

Not only does insufficient information quality lead to poor findability, it also has a negative effect on digital transformation in general. To be able to extract value from data and create, in this case, a digital self-service, the first step always needs to be to “sort out the (information) mess”. Read more about, what we at Findwise call, “The pyramid of digital transformation” and why sorting the mess is fundamental.

  1. Create the appropriate platform

Creating a ticket in a ticket system is good for complex questions that are not occurring daily. They need to be handled by a person working in the internal support organization.

Finding an answer to the simple and reoccurring questions requires another kind of system. This is more of a search-platform than a ticket system. The user wants to find the answer – not create a ticket and then wait.

At Findwise we have created various search and information platforms, with service desk applications built on top, during the years. The solution depends on the user’s specific need, type of data and optimal way of consuming information.

service desk platform

An internal support service combining the ticket system for complex questions and the self-service portal for simple and recurring questions can handle any kind of internal issue in an efficient way!

  1. Make it easy to find the correct answer

Understanding user intent is hard. Luckily, we can use AI-technology to bridge the gap  in communication between a human and a system.

Users (supported by technological development) have moved from keyword search to searching in our Natural Language. Natural Language Processing is a substantial part of AI used for understanding the human language and being able to answer in the same way. Home assistants are a great example of NLP in our everyday life.

Digging deeper in the area of NLP you’ll find Name Entity Recognition (NER). This is how we at Findwise know that “surprisingly many of all the questions handled by an internal service desk can be categorized as simple and reoccurring”. Let’s look at some examples of questions that appear unique, but can actually be clustered.

In the case of the phone numbers the queries seem to be unique but since they all refer to the same “entity” they can be clustered and handled as “simple and recurring”.

There are obviously a lot of different ways of asking the same question! Natural Language Understanding, using dense vectors or embeddings, is likely the hottest area withing the deep learning NLP community today. Google’s BERT that was released late 2018 has even been able to outperform humans in question answering tasks.

But AI doesn’t need to be the silver bullet every time. Another example of how to make it easy to find the current answer is to work proactively. During a year different thing are important for different users. Approaching summer questions about vacation rules, vacation application etc. might be very common. Coming back after the summer vacation IT-departments might be bored with the simple and recurring question of “I have forgotten my password – how do I change it?”

Using search technology boosted with AI and a lot of common sense the support organization should be able to present answers to questions that they think many users will ask – at the right time of the year.

Summary

The trend towards a digital and self-service oriented internal service desk is rapidly gaining phase, in the short perspective driven by the fact that more people than ever are working from home.

The negative business impact of a poor service desk not giving fast and accurate answers can be significant.

Findwise experience within this filed can be summarized in three challenges that need to be solved:

  • Take control of your data
  • Create the appropriate platform
  • Make it easy to find the correct answer

If you want to know more about how Findwise solves these challenges and the solutions we provide, do not hesitate to contact us.

Making your data F.A.I.R and smart

This is the second post in a new series by Fredric Landqvist & Peter Voisey, explaining how your organisation could best shape its data landscape for the future.

How to create a smart data framework for your organisation

In our last post for you, we presented the benefits of F.A.I.R data, how to make data smarter for search engines and the potentials of an Information Commons. In this post, we’re giving you the pragmatic steps to make your data FAIR by creating and applying your own smart data framework. Your data-sharing dream, internally and externally, is possible.

A smart data framework, using FAIR data principles, encompasses the tooling, models and standards that govern datasets and the different context-specific information systems (registers, catalogues). The data is then ingested and processed (enriched/refined) into smart data, datasets and data catalogues. It can then be used and reused by different applications and e-services via open APIs. In this ecosystem, all actors and information behaviours (personas) interplay: provision agents, owners, builders, enrichers, end-user searchers and referrers.

The workings of a smart data framework

A smart data & metadata catalogue   

A smart data & metadata catalogue (illustrated below), provides an organisational capability that aligns data management with the FAIR data principles. View it not so much as one system to rule them all, but rather an ecosystem that is smart and sustainable. In order to simplify your complex and heterogeneous information environment, this set-up can be  instantiated, as one overarching mechanism. Although we are describing a data and metadata catalogue here, the exact same framework and set up would of course apply also to your organisation’s content, making it smarter and more findable (i.e. it gets the sustainable stamp).

Smart Data Catalogue
The necessary services and component of a smart data catalogue

The above picture illustrates the services and components that, together, build smart data and metadata catalogue capabilities. We now describe each one of them for you:

Processing (Ingestion & Enrichment) for great Findability & Interoperability

  • (A) Ingest, harvest and operate. Here you connect the heterogeneous data sources for ingestion.

The configured input mechanisms describe each of the data sources, with their data, datasets and metadata ready for your catalogue search. Hopefully, at the dataset upload stage, you have provided a good system/form that now provides your search engine with great metadata (i.e. we recommend you use the open data catalogue standard DCAT-AP). The concept upload is interchangeable with either machine-to-machine harvester mechanisms, as with open-data, traditional data integration, or manual provision by human upload effort. (D) Enterprise Metadata Repository: here is the persistent storage of data in both data catalogue, index and graph. All things get a persistent ID (how to design persistent URI) and rich metadata.

  • (B) Enrich, refine analyze, and curate. This is the AI part (NLP, Semantics, ML) that enriches the data and datasets, making them smarter. 

Concepts (read also entities, terms, phrases, synonyms, acronyms etc.) from the data sources are found using named entity extraction (NER). By referring to a Knowledge Graph in the Enricher, the appropriate resources are annotated (“tagged”) with the said concept. It does not end here, however. The concept also takes with it from the Knowledge Graph all of the known relationships it has with other concepts.

Essentially a Knowledge Graph is your encoded domain knowledge in a connected graph format. It is by reading these encoded relationships that the machine “understands” the meaning or aboutness of data.

This opens up a very nice Pandora’s box for your search (understanding query intent) and for your Graphical User Interface (GUI) as your data becomes smarter now through your ability to exploit the relationships and connections (semantics and context) between concepts.

You and AI can have a symbiotic relationship in the development of your Knowledge Graph. AI can suggest new concepts and relationships as new data is added. It is, however, you and your colleagues who determine the of concepts/relationships in the Knowledge Graph – concepts/relationships that are important to your department or business. Remember you can utilise more than one knowledge graph, or part of one, for a particular business need(s) or data source(s). The Knowledge Graph is a flexible expression of your business/information models that give structure to all your data and its access.

Extra optional step: If you can manage not only to index the dataset metadata but the datasets themselves, you can make your Pandora’s box even nicer. Those cryptic/nonsensical field names that your traditional database experts love to create can also be incorporated and mapped (one time only!) into your Knowledge Graph, thus increasing the machine “understanding” of the data. Thus, there is a better chance of the data asset being used more widely. 

The configuration of processing with your Knowledge Graph can take care of dataset versioning, lineage and add further specific classifications e.g. data sensitivity, user access and personal information.

Lastly on Processing, your cultural and system interoperability is immensely improved. We’re not talking everyone speaking the same language here, rather everyone talking their language (/culture) and still being able to find the same thing. In this open and FAIR vocabularies further, enrich the meaning to data and your metadata is linked. System interoperability is partially achieved by exploiting the graph of connections that now “sit over” your various data sources.

Controlled Access (Accessible and Reusable)

  • (C) Access, search and visualize APIs. These tools control and influence the delivery, representation, exploration and consumption/use of datasets and data catalogues via a smarter search (made so by smarter data) and a more intuitive Graphical User interface (GUI).

This means your search can now “understand” user intent from just one or two keyword queries (through known relationship connections in the Knowledge Graph). 

Your search now also caters for your searchers who are searching in an unfamiliar subject area or are just having a query off day. Besides offering the standard results page, the GUI can also present related information (again due to the Knowledge Graph), past related user queries, information and question-answer (Q&A) type material. So: search, discovery, learning, serendipity.

Your GUI can also now become more intuitive, changing its information presentation and facets/filters automatically, depending on the query itself (more sustainable front-end coding). 

An alternative to complex scenario coding also includes the possibility for you to create rules (set in your Knowledge Graph) that can control what data users can access (when, how and where) based on their profile, their role, their location, the time and on the device they are using. This same Knowledge Graph can help push and recommend data for certain users proactively. Accessibility will be possible by using standard communication protocols, open access (when possible), authentication where necessary, and always with metadata at hand.

Reusable: your new smart data framework can help increase the time your Data Managers (/Scientists, Analysts) spend using data (and not trying to find it, the 80/20 data science dilemma). It can also help reduce the risk to your AI projects (50% failure rate) by helping searchers find the right data, with its meaning and context, more easily.  Reuse will also be possible with the design that metadata multiple attributes, use licence and provenance in line with community standards

Users and information behaviour (personas)

Users and personas
User groups and services

From experience we have defined the following broad conceptual user-groups:

  • Data Managers, a.k.a. Data Op’s or Data Scientists
    Data Managers are i.e. knowledge engineers, taxonomists and analysts. 
  • Data Stewards
    Data Stewards are responsible for Data Governance, such as data lineage. 
  • Business Professionals/Business end-users
    Business Users may have a diverse background. Hence Business end-users.
  • Actor System are different information systems and applications and services that integrate information via the rich open APIs from the Smart Data Catalogue

The outlined collaborative actors (E-H user groups) and their interplay as information behaviour (personas) with the data (repository) and services (components), together, build the foundation for a more FAIR data management within your organisation, providing for you at the same time, the option to contribute to an even broader shared open FAIR information commons.

  • (E) Data Op’s workplace and dashboard is a combination of tools supporting Data Op’s data management processes in the information behaviours: data provision agents, enrichers and developers.
  • (F) Data Governance workplace is the tools to support Data Stewards collaborative data governance work with Data Managers in the information behaviours: data owner.
  • (G) Access, search, visualize APIs, is the user experience to explore, find and interact with the catalogue and data in the information behaviours: searcher and referrer.
  • (H) API, is the set of open APIs to support access to catalogue data for consuming information systems in the information behaviours: referrer (a.k.a. data exchange).

Potential tooling for this smart data framework:

We hope you enjoyed this post and understand the potential benefits such a smart data framework incorporating FAIR data principles can have on your data catalogue, or for that matter, your organisational content or even your data swamps.


In the next post, Toward data-centric solutions with Knowledge Graphs, we talk about Knowledge Graphs (KG) and its non-proprietary RDF semantic web tech, how you can create your KG(s) and the benefits they can bring to your future data landscape.

View Fredric Landqvist's LinkedIn profileFredric Landqvist research blog
View Peter Voisey's LinkedIn profilePeter Voisey

Beyond Office 365 – knowledge graphs, Microsoft Graph & AI!

This is the first joint post in a series where Findwise & SearchExplained, together decompose Microsoft’s realm with the focus on knowledge graphs and AI. The advent of graph technologies and more specific knowledge graphs have become the epicentre of the AI hyperbole.

microsoft_graph

The use of a symbolic representation of the world, as with ontologies (domain models) within AI is by far nothing new. The CyC project, for instance, started back in the 80’s. The most common use for average Joe would be by the use of Google Knowlege Graph that links things and concepts. In the world of Microsoft, this has become a foundational platform capacity with the Microsoft Graph.

It is key to separate the wheat from the chaff since the Microsoft Graph is by no means a Knowledge Graph. It is a highly platform-centric way to connect things, applications, users and information and data. Which is good, but still it lacks the obvious capacity to disambiguate complex things of the world, since this is not its core functionality to build a knowledge graph (i.e ontology).

From a Microsoft centric worldview, one should combine the Microsoft Graph with different applications with AI to automate, and augment the life with Microsoft at Work. The reality is that most enterprises do not use Microsoft only to envelop the enterprise information landscape. The information environment goes far beyond, into a multitude of organising systems within or outside to company walls.

Question: How does one connect the dots in this maze-like workplace? By using knowledge graphs and infuse them into the Microsoft Graph realm?

Office 365 MDM

The model, artefacts and pragmatics

People at work continuously have to balance between modalities (provision/find/act) independent of work practice, or discipline when dealing with data and information. People also have to interact with groups, and imaged entities (i.e. organisations, corporations and institutions). These interactions become the mould whereupon shared narratives emerge.

Knowledge Graphs (ontologies) are the pillar artefacts where users will find a level playing field for communication and codification of knowledge in organising systems. When linking the knowledge graphs, with a smart semantic information engine utility, we get enterprise-linked-data that connect the dots. A sustainable resilient model in the content continuum.

Microsoft at Work – the platform, as with Office 365 have some key building blocks, the content model that goes cross applications and services. The Meccano pieces like collections [libraries/sites] and resources [documents, pages, feeds, lists] should be configured with sound resource descriptions (metadata) and organising principles. One of the back-end service to deal with this is Managed Metadata Service and the cumbersome TermStore (it is not a taxonomy management system!). The pragmatic approach will be to infuse/integrate the smart semantic information engine (knowledge graphs) with these foundation blocks. One outstanding question, is why Microsoft has left these services unchanged and with few improvements for many years?

The unabridged pathway and lifecycle to content provision, as the creation of sites curating documents, will be a guided (automated and augmented [AI & Semantics]) route ( in the best of worlds). The Microsoft Graph and the set of API:s and connectors, push the envelope with people at centre. As mentioned, it is a platform-centric graph service, but it lacks connection to shared narratives (as with knowledge graphs).  Fuzzy logic, where end-user profiles and behaviour patterns connect content and people. But no, or very limited opportunity to fine-tune, or align these patterns to the models (concepts and facts).

Akin to the provision modality pragmatics above is the find (search, navigate and link) domain in Office 365. The Search road-map from Microsoft, like a yellow brick road, envision a cohesive experience across all applications. The reality, it is a silo search still 😉 The Microsoft Graph will go hand in hand to realise personalised search, but since it is still constraint in the means to deliver a targeted search experience (search-driven-application) in the modern search. It is problematic, to say the least. And the back-end processing steps, as well as the user experience do not lean upon the models to deliver i.e semantic-search to connect the dots. Only using the end-user behaviour patterns, end-user tags (/system/keyword) surface as a disjoint experience with low precision and recall.

The smart semantic information engine will usually be a mix of services or platforms that work in tandem,  an example:

  1. Semantic Tools (PoolParty, Semaphore)
  2. Search and Analytics (i3, Elastic Stack)
  3. Data Integration (Marklogic, Biztalk)
  4. AI modules (MS Cognitive stack)

In the forthcoming post on the theme Beyond Office 365 unpacking the promised land with knowledge graphs and AI, there will be some more technical assertions.
View Fredric Landqvist's LinkedIn profileFredric Landqvist research blog
View Agnes Molnar's LinkedIn profileAgnes Molnar SearchExplained

.

Query Completion with Apache Solr

There are plenty of names for this functionality: query completion, suggestions, auto-complete, auto-suggest, word completion, type ahead and maybe some more. Even if we may point slight differences between them (suggestions can base on your index documents or external input such users queries), from technical point of view it’s all about the same: to propose a query for the end user.

google-suggestearly Google Suggest from 2008. Source: http://www.wpromote.com/blog/4-things-in-08-that-changed-the-face-of-search/

 

Suggester feature was started 8 years ago by Google, in 2008. Users got used to the query completion and nowadays it’s a common feature of all mature search engines, e-commerce platforms and even internal enterprise search solutions.

Suggestions help with navigating users through the web portal, allow to discover relevant content and recommend popular phrases (and thus search results). In the e-commerce area they are even more important because well implemented query completion is able to high up conversion rate and finally – increase sales revenue. Word completion never can lead to zero results, but this kind of mistake is made frequently.

And as many names describe this feature there are so many ways to build it. But still it’s not so trivial task to implement good working query completion. Software like Apache Solr doesn’t solve whole problem. Building auto-suggestions is also about data (what should we present to users), its quality (e.g. when we want to suggest other users’ queries), suggestions order (we got dozens matches, but we can show only 5; which are the most important?) or design (user experience or similar).

Going back to the technology. Query completion can be built in couple of ways with Apache Solr. You can use mechanisms like facets, terms, dedicated suggest component or just do a query (with e.g. dismax parser).

Take a look at Suggester. It’s very easy to run. You just need to configure searchComponent and requestHandler. Example:

<searchComponent name="suggester" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">suggester1</str>
    <str name="lookupImpl">FuzzyLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">title</str>
    <str name="weightField">popularity</str>
    <str name="suggestAnalyzerFieldType">text</str>
  </lst>
</searchComponent>
<requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="suggest">true</str>
    <str name="suggest.count">10</str>
  </lst>
  <arr name="components">
    <str>suggester</str>
  </arr>
</requestHandler>

SuggestComponent is a ready-to-use implementation, which is responsible for serving up suggestions based on commands and queries. It’s an efficient solution, i.e. because it works on structure separated from main index and it’s being kept in memory. There are some basic settings like field used for autocompleting or defining text analyzing chain. LookImpl defines how to match terms in index. There are about 10 algorithms with different purpose. Probably the most popular are:

  • AnalyzingLookupFactory (default, finds matches based on prefix)
  • FuzzyLookupFactory (finds matches with misspellings),
  • AnalyzingInfixLookupFactory (finds matches anywhere in the text),
  • BlendedInfixLookupFactory (combines matches based on prefix and infix lookup)

You need to choose the one which fulfill your requirements. The second important parameter is dictionaryImpl which represents how indexed suggestions are stored. And again, you can choose between couple of implementations, e.g. DocumentDictionaryFactory (stores terms, weights, and optional payload) or HighFrequencyDictionaryFactory (works when very common terms overwhelm others, you can set up proper threshold).

There are plenty of different settings you can use to customize your suggester. SuggestComponent is a good start and probably covers many cases, but like everything, there are some limitations like e.g. you can’t easily filter out results.

Example execution:

http://localhost:8983/solr/index/suggest?wt=json&suggest.dictionary=analyzingSuggester&suggest.q=lond

suggestions: [
  { term: "london" },
  { term: "londonderry" },
  { term: "londoño" },
  { term: "londoners" },
  { term: "londo" }
]

Another way to build a query completion is to use mechanisms like faceting, terms or highlighting.

The example of QC built on facets:

http://localhost:8983/solr/index/select?q=*:*&facet=on&facet.field=title_keyword&facet.mincount=1&facet.contains=lon&rows=0&wt=json

title_keyword: [
  "blonde bombshell", 2,
  "12-pounder long gun", 1,
  "18-pounder long gun", 1,
  "1957 liga española de baloncesto", 1,
  "1958 liga española de baloncesto", 1
]

Please notice that here we have used facet.contains method, so query matches also in the middle of phrase. It works on the basis of regular expression. Additionally, we have a count for every suggestion in Solr response.

TermsComponent (returns indexed terms and the number of documents which contain each term) and highlighting (originally, emphasize fragments of documents that match the user’s query) can be also used, what is presented below.

Terms example:

<searchComponent name="terms" class="solr.TermsComponent"/>
<requestHandler name="/terms" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <bool name="terms">true</bool>
    <bool name="distrib">false</bool>
  </lst>
  <arr name="components">
    <str>terms</str>
  </arr>
</requestHandler>
http://localhost:8983/solr/index/terms?terms.fl=title_general&terms.prefix=lond&terms.sort=index&wt=json

title_general: [
  "londinium",
  "londo",
  "london",
  "london's",
  "londonderry"
]

Highlighting example:

http://localhost:8983/solr/index/select?q=title_ngram:lond &fl=title&hl=true&hl.fl=title&hl.simple.pre=&hl.simple.post=

title_ngram: [
  "londinium",
  "londo",
  "london",
  "london's",
  "londonderry"
]

You can also do auto-complete even with usual, full-text query. It has lots of advantages: Lucene scoring is working, you have filtering, boosts, matching through many fields and whole Lucene/Solr queries syntax. Take a look at this eDisMax example:

http://localhost:8983/solr/index/select?q=lond&qf=title_ngram&fl=title&defType=edismax&wt=json

docs: [
  { title: "Londinium" },
  { title: "London" },
  { title: "Darling London" },
  { title: "London Canadians" },
  { title: "Poultry London" }
]

The secret is an analyzer chain whether you want to base on facets, query or SuggestComponent. Depending on what effect you want to achieve with your QC, you need to index data in a right way. Sometimes you may want to suggest single terms, another time – whole sentences or product names. If you want to suggest e.g. letter by letter you can use Edge N-Gram Filter. Example:

<fieldType name="text_ngram" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory minGramSize="1" maxGramSize="50" />
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

N-Gram is a structure of n items (size depends on given range) from a given sequence of text. Example: term Findwise, minGramSize = 1 and maxGramSize = 10 will be indexed as:

F
Fi
Fin
Find
Findw
Findwi
Findwis
Findwise

With such indexed text you can easily achieve functionality where user is able to see changing suggestions after each letter.

Another case is an ability to complete word after word (like Google does). It isn’t trivial, but you can try with shingle structure. Shingles are similar to N-Gram, but it works on whole words. Example: Searching is really awesome, minShingleSize = 2 and minShingleSize = 3 will be indexed as:

Searching is
Searching is really
is really
is really awesome
really awesome

Example of Shingle Filter:

<fieldType name="text_shingle" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ShingleFilterFactory" maxShingleSize="10" />
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

What if your users could use QC which supports synonyms? Then they could put e.g. abbreviation and find a full suggestion (NYC -> New York City, UEFA -> Union Of European Football Associations). It’s easy, just use Synonym Filter in your text field:

<fieldType name="text_synonym" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
  </analyzer>
</fieldType>

And then just do a query:

http://localhost:8983//select?defType=edismax&fl=title&q=nyc&qf=title_synonym&wt=json

docs: [
  { title: "New York City" },
  { title: "New York New York" },
  { title: "Welcome to New York City" },
  { title: "City Club of New York" },
  { title: "New York" }
]

Another very similar example concerns language support and matching suggestions regardless of the terms’ form. It can be especially valuable for languages with  the rich grammar rules and declination. In the same way how SynonymsFilter is used, we can configure a stemmer / lemmatization filter e.g. for English (take a look here and remember to put language filter both for index and query time) and expand matching suggestions.

As you can see, there are many ways to run query completion, you need to adjust right mechanism and text analysis based on your own limitations and also on what you want to achieve.

There are also other topics connected with preparing type ahead solution. You need to consider performance issues, they are mostly centered on response time and memory consumption. How many requests will generate QC? You can assume that at least 3 times more than your regular search service. You can handle traffic growth by optimizing Solr caches, installing separated Solr instanced only for suggesting service. If you’ll create n-gram, shingles or similar structures, be aware that your index size will increase. Remember that if you decided to use facets or highlighting for some reason to provide suggester, this both mechanisms make your CPU heavy loaded.

In my opinion, the most challenging issue to resolve is choosing a data source for query completion mechanism. Should you suggest parts of your documents (like titles, keywords, authors)? Or use NLP algorithms to extract meaningful phrases from your content? Maybe parse search/application logs and use the most popular users queries? Be careful, filter out rubbish, normalize users input). I believe the answer is YES – to all. Suggestions should be diversified (to lead your users to a wide range of search resources) and should come from variety of sources. More than likely, you will need to do a hard job when processing documents – remember that data cleaning is crucial.

Similarly, you need to take into account different strategies when we talk about the order of proposed suggestions. It’s good to show them in alphanumeric order (still respect scoring!), but you can’t stop here. Specificity of QC is that application can return hundreds of matches, but you can present only 5 or 10 of them. That’s why you need to promote suggestions with the highest occurrence in index or the most popular among the users. Further enhancements can involve personalizing query completion, using geographical coordinates or implementing security trimming (you can see only these suggestions you are allowed to).

I’m sure that this blog post doesn’t exhaust the subject of building query completion, but I hope I brought this topic closer and showed the complexity of such a task. There are many different dimension which you need to handle, like data source of your suggestions, choosing right indexing structure, performance issues, ranking or even UX and designing (how would you like to present hints – simple text or with some graphics/images? Would you like to divide suggestions into categories? Do you always want to show result page after clicked suggestion or maybe redirect to particular landing page?).

Search engine like Apache Solr is a tool, but you still need an application with whole business logic above it. Do you want to have a prefix-match and infix-match? To support typos and synonyms? To suggest letter after the letter or word by word? To implement security requirements or advanced ranking to propose the best tips for your users? These and even more questions need to be think over to deliver successful query completion.

Generational renewal at work – a search challenge

The big generational shift

There have been discussions surrounding the great generational renewal in the workplace for a while. The 50’s generation, who have spent a large part of their working lives within the same company, are being replaced by an agile bunch born in the 90’s. We are not taken by tabloid claims that this new generation does not want to work, or that companies do not know how to attract them. What we are concerned with is that businesses are not adapting fast enough to the way the new generation handle information to enable the transfer of knowledge within the organisation.

Working for the same employer for decades

Think about it for a while, for how long have the 50’s generation been allowed to learn everything they know? We see it all the time, large groups of employees ready to retire, after spending their whole working lives within the same organisation. They began their careers as teenagers working on the factory floor or in a similar role, step by step growing within the company, together with the company. These employees have tended to carry a deep understanding of how their organisation work and after years of training, they possess a great deal of knowledge and experience. How many companies nowadays are willing to offer the 90’s workers the same kind of journey? Or should they even?

2016 – It’s all about constant accessibility

The world is different today, than 50 years ago. A number of key factors are shaping the change in knowledge-intense professions:

  • Information overload – we produce more and more information. Thanks to the Internet and the World Wide Web, the amount of information available is greater than ever.
  • Education has changed. Employees of the 50’s grew up during a time when education was about learning facts by rote. The schools of today focus more on teaching how to learn through experience, to find information and how to assess its reliability.
  • Ownership is less important. We used to think it was important to own music albums, have them in our collection for display. Nowadays it’s all about accessibility, to be able to stream Spotify, Netflix or an online game or e-book on demand. Similarly we can see the increasing trend of leasing cars over owning them. Younger generations take these services and the accessibility they offer for granted and they treat information the same way, of course. Why wouldn’t they? It is no longer a competitive advantage to know something by heart, since that information is soon outdated. A smarter approach of course is to be able to access the latest information. Knowing how to search for information – when you need it.

Factors supporting the need for organising the free flow of the right information:

  • Employees don’t stay as long as they used to in the same workplace anymore, which for example, requires a more efficient on boarding process. It’s no longer feasible to invest the same amount of time and effort on training one individual since he/she might be changing workplace soon enough anyway.
  • It is much debated whether it is possible to transfer knowledge or not. Current information on the other hand is relatively easy to make available to others.
  • Access to information does not automatically mean that the quality of information is high and the benefits great.

Organisations lack the right tools

Knowing a lot of facts and knowledge about a gradually evolving industry was once a competitive advantage. Companies and organisations have naturally built their entire IT infrastructure around this way of working. A lot of IT applications used today were built for a previous generation with another way of working and thinking. Today most challenges involve knowing where and how to find information. This is something we experience in our daily work with clients. Organisations more or less lack the necessary tools to support the needs of the newer generation in their daily work.

To summarize the challenge: organisations need to be able to supply their new workforce with the right tools to constantly find (and also manipulate) the latest and best information required for them to shine.

Success depends on finding the right information

In order for the new generation to succeed, companies must regularly review how information is handled plus the tools supporting information-heavy work tasks.

New employees need to be able to access the information and knowledge left by retiring employees, while creating and finding new content and information in such a way that information realises its true value as an asset.

Efficiency, automation… And Information Management!

There are several ways of improving efficiency, the first step is often to investigate if parts, or perhaps the entire creating and finding process can be automated. Secondly, attack the information challenges.

When we get a grip of the information we are to handle, it’s time to look into the supporting IT systems. How are employees supposed to find what they are looking for? How do they want to?

We have gotten used to find answers by searching online. This is in the DNA of the 90’s employee. By investing in a great search platform and developing processes to ensure high information quality within the organisation, we are certain the organisation will not only manage the generational renewal but excel in continuously developing new information centric services.

Written by: Maria “Ia” Björk & Joar Svensson

Sensemaking or Digital Despair

Finding our way in the bright, futuristic, data-driven & intertwined world, often taxes us and our digital-hungry senses. Fast rewind to the recent FindabilityDay 2015 and the parade of brilliant speaker talents on stage. Starting of with our dear friend and peer, Martin White, on the topic the future of search.

Human factors, from idea inception to design and practical UX of our digital artifacts. The key has been make-do and ship. This is the reason the more technically-advanced mobiles fell by the wayside 8 years ago Apple’s iPhone.

The social life with information, shapes our daily lives, in a hyper-connected world. It’s still very hard to find that information needle in the haystack, and most days we feel despair when losing the scent of information nuggets. The results from the Findability Survey, spoke clearly. Without sound organising principles to information and data, and a pliable recorded vision, we won’t find anything of value.

Next, moving into an old business model, with Luna’s and Sara’s presentation, a great example, where we see that the orchestration and choreography of their data assets will determine their survival or demise – in conjunction with infused means to information management practices, processes and tools. They showed a new set of facets to delivering on their mission in their line-of business.

Regardless of the line of business, it becomes clear that our fragmented workplace setting now only partly “on tap”. It makes our daily lives a mess, since things do not interoperate. The vision should show the way to a shared information commons, where we all cultivate.

So finally, How do we make sense of any mess?

Answer: Architect a place where you can find comfort with social conventions shared on the information used. Abby Covert, laid out a beautiful tapestry of things we all need to take on, to make sense in everyday life, and life at work. With clear and distinct guardrails, and signposts we don’t feel so distracted or lost. Her talk was a true enlightenment for me, being of the same profession, Information Architect.

View Fredric Landqvist's LinkedIn profileFredric Landqvist research blog

Stay Cleaning and moving boxes for cloud

This is the seventh post in a series (1, 2, 3, 4, 5, 6) on the challenges organisations face as they move from having online content and tools hosted firmly on their estate to renting space in the cloud.  We will help you to consider the options and guide on the steps you need to take.

Starting from our first post we have covered different aspects you need to consider as you take each step including information structure and how it is managed using Office 365 and SharePoint as a technology example.  Planning for migration.

Moving Boxes

Do not even think about moving into the cloud apartment without a proper  cleaning of the content buckets. Moving from an architected household to a rented place, taxes a structured audit. Clean out all redundant, outdated and trivial matter (ROT). The very same habit you have cleaning up the attic when moving out from your old house.

It is also a good idea to decorate and add any features to your new cloud apartment before the content furniture is there.  It means the content will fit with any new design and adapt to any extra functionality with new features like windows and doors.  This can be done by reviewing and updating your publishing templates at the same time.  This will save time in the future.

Leaning upon the information governance standards, it should be easy to address the cleaning before moving, for all content owners who have been appointed to a set of collections or habitats. Most organisations could use a content vacuum cleaner, or rather use the search facilities and metric means to deliver up to date reports on:

  1. Active / in-Active habitats
  2. No clear ownership or the owner has left the building
  3. Metadata and link quality to content and collections to be moved across to the cloud apartments.
  4. Review publishing templates and update features or design to be used in the Cloud

When all active habitats and qualified content buckets have been revisited by their set of curators and information owners. The preparation and use of moving boxes, should be applied.

All moving boxes do need proper tagging, so that any moving company will be able to sort out where about the stuff should be placed in the new house, or building. For collections, and habitats, this means using the very same set of questions stated for adding a new habitat or collection to the cloud apartment house. Who, why, where and so forth, through the use of a structured workflow and form. When this first cleaning steps have been addressed, there should be automatic metadata enhancement, aligned with the information management processes to be used in the new cloud.

With decent resource descriptions and cleaned up content through the audit (ROT), this last step will auto-tag content based upon the business rules applied for the collection or habitat. Then been loaded into the content moving truck, or loading dock. Ready to added to the cloud.

All content that neither have proper assigned information ownership, or are in such a shape that migration can’t be done should persist on the estate or be archived or purged. This means that all metadata and links to either content bucket or habitat that won’t be moved in the first instances, should at least have correct and unique uri:s, address, to this content. And in the case a bucket or habitat have been run down by a demolition firm, purged. All inter-linkage to that piece of content or collection have to be changed.

This is typically a perfect quality report, to the information owners and content editors, that they need to work through prior to actually loading the content on the content dock.

Rubbish and Weed
Finally when all rotten data, deserted habitats and unmanageable buckets have been weeded out. It is time to prepare the moving truck, sending the content into its new destination.

Our final thread will cover how will the organisation and it habitants will be able to find content in this mix of clouds, and things left behind on the old estate? Cloud Search and Enterprise Search, seamless or a nightmare?

Please join our Live Stream on YouTube the 20th November 8.30AM – 10AM Central European Time
View Fredric Landqvist's LinkedIn profileFredric Landqvist research blog
View Mark Morrell's LinkedIn profileMark Morell intranet-pioneer

Placemaking, wayfinding and game rules in the Clouds

This is the sixth post in a series (1, 2, 3, 4, 5, 7) on the challenges organisations face as they move from having online content and tools hosted firmly on their estate to renting space in the cloud.  We will help you to consider the options and guide on the steps you need to take.

Starting from our first post we have covered different aspects you need to consider as you take each step including information structure and how it is managed using Office 365 and SharePoint as a technology example.  We will cover more about SharePoint in this post, and placemaking in the cloud.
Funky Village
In SharePoint there are a set of logic chunks. One could decompose the digital workplace into intranet sites, as departmental and organisational buckets; team sites where groups collaborate, and lastly your personal domain being the my site collection. Navigating between these, is a mix of traditional information architecture and search driven content.  When being within a such a habitat as a teamsite, it is not always obvious how to cross-link or navigate to other domains within the digital workplace hosted in Sharepoint.

One way to overcome this, is to render different forms of portals, based upon dynamic navigation. These intersections and aggregates help users to move around the maze of buckets and collections of the content. Sharepoint have very good features, and options to create search-based content delivery mechanisms.

 A metadata and search-based content model, gives us cues for the future design of the digital workplace, with connected habitats and sustainable information architecture. Where people don’t get lost, and have wayfinding means to survive everyday work practices.

This is where how you manage the content in SharePoint and Office 365 is critical.  As we said in our first post it is important you have a good information architecture combined with a good governance framework that helps you to transform your buckets of content from the estate into the cloud.  We have covered information architecture so we now move more towards how governance completes the picture for you.

There are three approaches to the governance your organisation needs to have with SharePoint and Office 365.  You don’t have to use just one.  You can combine some of each to find the right blend for your organisation.  What works best for you will depend on a number of different factors.  Among them:

  • Restricting use – stopping some features from being used e.g. SharePoint Designer
  • Encouraging best practice – guidance and training available
  • Preventing problems – checking content before it is published

Each of these approaches can support your governance strategy.  The key is to understand what you need to use.

Restricting use

You need to be clear why your organisation is using SharePoint and Office 365 and the benefits expected.  This will shape how tight or loose your governance needs to be.

Once you are clear on this, you then need to consider the strategic benefits and drawbacks such as SharePoint Designer and site collection administration rights.

Benefits

  • You control what is being used.
  • You decide who uses a feature e.g. SharePoint Designer.
  • You manage the level of autonomy each site owner has.
  • You find out why someone needs to use a feature.
  • You monitor costs for licences, users, servers, etc.
  • You measure who is using what and why for reporting.

Drawbacks

  • You stifle innovation by not allowing people to test out ideas.
  • You stop legitimate use by asking for permission to use features.
  • You prevent people being able to share knowledge how they wish to.
  • You may be unable to realise the maximum potential of SharePoint.
  • You create unnecessary administration.
  • You risk adding costs without any value to offset them with.

You need to get the balance right with governance that gives you maximum value for the effort needed managing SharePoint and Office 365.

Encourage best practice

The goal from implementing SharePoint and Office 365 is to have an environment that enables employees to publish, share, find and use information easily to help with their work.  They are confident the information is reliable and appropriate, whatever their need for it is.  People also feel comfortable using these tools rather than alternative methods like calling helpdesks or emailing other employees for help.

Encouraging best practice by giving them the opportunity to test to meet their needs is one approach to achieving this.  There are factors you need to consider that can help or hinder the success of using this approach.

Benefits

  • You inform employees of all the benefits to be gained.
  • You train people to use the right tools.
  • You design a registration process to direct people to the right tools.
  • You point employees to guidance on how to follow best practice.
  • You encourage innovation by giving everyone freedom of use.

Drawbacks

  • You can’t prevent people using different tools to those you recommend.
  • You risk confusing employees using content unsure of its integrity.
  • You can’t prevent everyone ignoring best practice when publishing.
  • You may make it difficult for people to share knowledge effectively.
  • Your governance model may be ineffective and need improving.

Getting the balance right between encouraging best practice and the level of governance to deter behaviour which can destroy the value from using SharePoint and Office 365 is critical.

Preventing problems

As well as encouraging best practice, preventing problems helps to reduce time and costs wasted on sorting out unnecessary issues.  While that is the aim of most organisations the practical realities as it is rolled out can divert plans from achieving this.

You need to get the right level of governance in place to prevent problems.  Is it encouraging innovation and keeping governance light touch?  Is it a heavier touch to prevent the ‘wrong’ behaviour and minimise risk of your brand and reputation being damaged?  How much do you want to spend preventing problems?  What does your cost/benefit analysis show?

Benefits

  • People using SharePoint and Office 365 have a great experience (especially the first time they use it).
  • Everyone is confident they can use it for what they need it for without experience problems.
  • Employees don’t waste time calling the helpdesk because many problems have been prevented.
  • Effective governance encourages early adoption and increased knowledge sharing.
  • Costs spent preventing problems are justified by increased productivity and reduced risk of errors.

Drawbacks

  • People find registering difficult and lengthy because of extra steps taken to prevent problems and don’t bother.
  • People find it too restrictive for their needs and it stifles innovation.
  • People turn to other tools (maybe not approved) to meet their needs and ask other people for help to use them.
  • Too restrictive governance prevents most beneficial use by raising the barrier too high for people to use.
  • Costs of preventing problems are higher than benefits to be gained and not justified.

You need to consider the potential benefits and drawbacks before deciding on the level of governance that is right for your organisation.

Remember, it is possible and probably desirable to have different levels of governance for each feature.  It may be lighter for personal views and opinions expressed in MyProfile and MySite but tighter for policies and formal news items in TeamSites.

That is the challenge!  You have so much flexibility to configure the tools to meet your organisation’s needs.  Don’t be afraid to test out on part of your intranet to see what effect it has and involve employees to feed back on their experience before launching it.

The way forward is to create a sustainable information architecture, that supports an information environment that is available on any platform, everywhere, anytime and on any device.  A governance  framework can show roles and responsibilities, how they fit with a strategy and plan with publishing standards as the foundation to a consistently good user experience.

Combining a governance framework and information architecture with the same scope avoids any gaps in your buckets of content being managed or not being found.  It helps you transform from your estate to the cloud successfully.

In our last concluding posts we will dive into more design oriented topics with a helping hand from findability experts and developers. Adding migration thoughts in next post. But first navigating the social graph being people centric, leaving some outstanding questions. How will the graph interoperate if your business runs several clouds, and still have buckets of content elsewhere?

Please join our Live Stream on YouTube the 20th November 8.30AM – 10AM Central European Time
View Fredric Landqvist's LinkedIn profileFredric Landqvist research blog
View Mark Morrell's LinkedIn profileMark Morell intranet-pioneer