Query Completion with Apache Solr

There are plenty of names for this functionality: query completion, suggestions, auto-complete, auto-suggest, word completion, type-ahead and maybe some more. Even if we can point out slight differences between them (suggestions can be based on your indexed documents or on external input such as users' queries), from a technical point of view it is all about the same thing: proposing a query to the end user.

Early Google Suggest from 2008. Source: http://www.wpromote.com/blog/4-things-in-08-that-changed-the-face-of-search/

The suggester feature was launched by Google eight years ago, in 2008. Users have become used to query completion, and nowadays it is a common feature of all mature search engines, e-commerce platforms and even internal enterprise search solutions.

Suggestions help users navigate a web portal, discover relevant content and find popular phrases (and thus search results). In e-commerce they are even more important, because a well-implemented query completion can lift the conversion rate and ultimately increase sales revenue. Query completion should never lead to zero results, yet this kind of mistake is made frequently.

And just as many names describe this feature, there are many ways to build it. Still, implementing query completion that works well is not a trivial task. Software like Apache Solr doesn't solve the whole problem. Building auto-suggestions is also about the data (what should we present to users), its quality (e.g. when we want to suggest other users' queries), the order of suggestions (we may get dozens of matches but can show only 5; which are the most important?) and design (user experience and the like).

Going back to the technology: query completion can be built in a couple of ways with Apache Solr. You can use mechanisms like facets, terms, the dedicated suggest component, or just run a query (e.g. with the dismax parser).

Take a look at the Suggester first. It is very easy to run: you just need to configure a searchComponent and a requestHandler. Example:

<searchComponent name="suggester" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">suggester1</str>
    <str name="lookupImpl">FuzzyLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">title</str>
    <str name="weightField">popularity</str>
    <str name="suggestAnalyzerFieldType">text</str>
  </lst>
</searchComponent>
<requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="suggest">true</str>
    <str name="suggest.count">10</str>
  </lst>
  <arr name="components">
    <str>suggester</str>
  </arr>
</requestHandler>

SuggestComponent is a ready-to-use implementation responsible for serving suggestions based on commands and queries. It is an efficient solution, e.g. because it works on a structure separate from the main index, which is kept in memory. There are some basic settings, like the field used for autocompletion and the text analysis chain. The lookupImpl parameter defines how terms are matched in the index. There are about 10 algorithms with different purposes. Probably the most popular are:

  • AnalyzingLookupFactory (finds matches based on prefix)
  • FuzzyLookupFactory (finds matches despite misspellings)
  • AnalyzingInfixLookupFactory (finds matches anywhere in the text)
  • BlendedInfixLookupFactory (combines prefix and infix matching)

You need to choose the one that fulfills your requirements. The second important parameter is dictionaryImpl, which defines how the indexed suggestions are stored. Again, you can choose among a couple of implementations, e.g. DocumentDictionaryFactory (stores terms, weights and an optional payload) or HighFrequencyDictionaryFactory (useful when very common terms would overwhelm the others; you can set up a proper threshold).
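
If very common terms drown out the rest, switching the dictionary is a small change. A minimal sketch, assuming a second, hypothetical suggester and an illustrative threshold (the fraction of documents a term must appear in, between 0 and 1):

<searchComponent name="suggester" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">suggester2</str>
    <str name="lookupImpl">AnalyzingInfixLookupFactory</str>
    <str name="dictionaryImpl">HighFrequencyDictionaryFactory</str>
    <!-- skip terms appearing in less than 1% of documents -->
    <float name="threshold">0.01</float>
    <str name="field">title</str>
    <str name="suggestAnalyzerFieldType">text</str>
  </lst>
</searchComponent>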

There are plenty of other settings you can use to customize your suggester. SuggestComponent is a good start and probably covers many cases, but like everything it has some limitations; for example, you can't easily filter out results.

Example execution:

http://localhost:8983/solr/index/suggest?wt=json&suggest.dictionary=suggester1&suggest.q=lond

suggestions: [
  { term: "london" },
  { term: "londonderry" },
  { term: "londoño" },
  { term: "londoners" },
  { term: "londo" }
]

Another way to build query completion is to use mechanisms like faceting, terms or highlighting.

An example of QC built on facets:

http://localhost:8983/solr/index/select?q=*:*&facet=on&facet.field=title_keyword&facet.mincount=1&facet.contains=lon&rows=0&wt=json

title_keyword: [
  "blonde bombshell", 2,
  "12-pounder long gun", 1,
  "18-pounder long gun", 1,
  "1957 liga española de baloncesto", 1,
  "1958 liga española de baloncesto", 1
]

Notice that we have used the facet.contains parameter here, so the query also matches in the middle of a phrase (facet.contains performs a simple substring match). Additionally, we get a count for every suggestion in the Solr response.

TermsComponent (which returns indexed terms and the number of documents containing each term) and highlighting (originally meant to emphasize fragments of documents that match the user's query) can also be used, as presented below.

Terms example:

<searchComponent name="terms" class="solr.TermsComponent"/>
<requestHandler name="/terms" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <bool name="terms">true</bool>
    <bool name="distrib">false</bool>
  </lst>
  <arr name="components">
    <str>terms</str>
  </arr>
</requestHandler>
http://localhost:8983/solr/index/terms?terms.fl=title_general&terms.prefix=lond&terms.sort=index&wt=json

title_general: [
  "londinium",
  "londo",
  "london",
  "london's",
  "londonderry"
]

Highlighting example:

http://localhost:8983/solr/index/select?q=title_ngram:lond&fl=title&hl=true&hl.fl=title&hl.simple.pre=&hl.simple.post=

title_ngram: [
  "londinium",
  "londo",
  "london",
  "london's",
  "londonderry"
]

You can also build auto-complete with a usual full-text query. This has lots of advantages: Lucene scoring works, and you get filtering, boosts, matching across many fields and the whole Lucene/Solr query syntax. Take a look at this eDisMax example:

http://localhost:8983/solr/index/select?q=lond&qf=title_ngram&fl=title&defType=edismax&wt=json

docs: [
  { title: "Londinium" },
  { title: "London" },
  { title: "Darling London" },
  { title: "London Canadians" },
  { title: "Poultry London" }
]

The secret is the analyzer chain, whether you base your solution on facets, queries or the SuggestComponent. Depending on what effect you want to achieve with your QC, you need to index the data in the right way. Sometimes you may want to suggest single terms, other times whole sentences or product names. If you want to suggest letter by letter, you can use the Edge N-Gram Filter. Example:

<fieldType name="text_ngram" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="50"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
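
To put the field type to use, you would add an n-gram field to the schema and copy the source field into it. A minimal sketch with assumed field names:

<field name="title" type="text_general" indexed="true" stored="true"/>
<!-- the n-grammed shadow field, used only for completion -->
<field name="title_ngram" type="text_ngram" indexed="true" stored="true"/>
<copyField source="title" dest="title_ngram"/>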

An n-gram is a sequence of n items (the size depending on the given range) from a given text. Example: the term Findwise with minGramSize = 1 and maxGramSize = 10 will be indexed as:

F
Fi
Fin
Find
Findw
Findwi
Findwis
Findwise

With text indexed this way, you can easily achieve functionality where the user sees the suggestions change after each typed letter.

Another case is the ability to complete word after word (like Google does). It isn't trivial, but you can try the shingle structure. Shingles are similar to n-grams, but they work on whole words. Example: the phrase Searching is really awesome with minShingleSize = 2 and maxShingleSize = 3 will be indexed as:

Searching is
Searching is really
is really
is really awesome
really awesome

Example of Shingle Filter:

<fieldType name="text_shingle" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ShingleFilterFactory" maxShingleSize="10" />
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

What if your users could use QC that supports synonyms? Then they could type e.g. an abbreviation and find the full suggestion (NYC -> New York City, UEFA -> Union of European Football Associations). It's easy: just use the Synonym Filter in your text field:

<fieldType name="text_synonym" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
  </analyzer>
</fieldType>
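
The filter reads its mappings from synonyms.txt. A minimal sketch matching the example below; with explicit mappings (=>), the left-hand side is rewritten to the right-hand side at query time:

# synonyms.txt (ignoreCase="true" makes the matching case-insensitive)
NYC => New York City
UEFA => Union of European Football Associations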

And then just do a query:

http://localhost:8983/solr/index/select?defType=edismax&fl=title&q=nyc&qf=title_synonym&wt=json

docs: [
  { title: "New York City" },
  { title: "New York New York" },
  { title: "Welcome to New York City" },
  { title: "City Club of New York" },
  { title: "New York" }
]

Another, very similar example concerns language support: matching suggestions regardless of the form of the terms. This can be especially valuable for languages with rich grammar and declension. In the same way the SynonymFilter is used, we can configure a stemmer or lemmatization filter, e.g. for English (remember to put the language filter in both the index and query chains), and expand the matching suggestions.
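
A minimal sketch of such a field type, assuming English and the Snowball stemmer that ships with Solr; an <analyzer> without a type attribute applies at both index and query time:

<fieldType name="text_stem" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- reduces e.g. "running" and "runs" to the common stem "run" -->
    <filter class="solr.SnowballPorterFilterFactory" language="English"/>
  </analyzer>
</fieldType>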

As you can see, there are many ways to implement query completion; you need to choose the right mechanism and text analysis based on your own constraints and on what you want to achieve.

There are also other topics connected with preparing a type-ahead solution. You need to consider performance: the issues are mostly centered on response time and memory consumption. How many requests will QC generate? You can assume at least 3 times more than your regular search service. You can handle traffic growth by optimizing Solr caches or by installing a separate Solr instance only for the suggestion service. If you create n-grams, shingles or similar structures, be aware that your index size will increase. And remember that if you decide to use facets or highlighting to provide the suggester, both of these mechanisms put a heavy load on the CPU.
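
Cache settings live in solrconfig.xml. The sizes below are assumptions to tune against your own traffic rather than recommendations; autowarmCount re-populates a new cache after each commit:

<!-- in solrconfig.xml, inside the <query> section -->
<filterCache class="solr.FastLRUCache" size="2048" initialSize="512" autowarmCount="256"/>
<queryResultCache class="solr.LRUCache" size="4096" initialSize="1024" autowarmCount="512"/>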

In my opinion, the most challenging issue is choosing a data source for the query completion mechanism. Should you suggest parts of your documents (like titles, keywords, authors)? Use NLP algorithms to extract meaningful phrases from your content? Or maybe parse search/application logs and use the most popular user queries (be careful: filter out rubbish and normalize the users' input)? I believe the answer is YES, to all. Suggestions should be diversified (to lead your users to a wide range of search resources) and should come from a variety of sources. More than likely you will have to do some hard work when processing the documents; remember that data cleaning is crucial.

Similarly, you need to consider different strategies for the order of the proposed suggestions. It's good to show them in alphanumeric order (while still respecting scoring!), but you can't stop there. A specific property of QC is that the application can return hundreds of matches while you can present only 5 or 10 of them. That's why you need to promote the suggestions with the highest occurrence in the index, or those most popular among the users. Further enhancements can involve personalizing query completion, using geographical coordinates or implementing security trimming (you only see the suggestions you are allowed to see).

I'm sure this blog post doesn't exhaust the subject of building query completion, but I hope I have brought the topic closer and shown the complexity of the task. There are many different dimensions you need to handle, like the data source of your suggestions, choosing the right indexing structure, performance issues, ranking, or even UX and design (how would you like to present hints: simple text or with some graphics/images? Would you like to divide suggestions into categories? Do you always want to show a result page after a clicked suggestion, or maybe redirect to a particular landing page?).

A search engine like Apache Solr is a tool, but you still need an application with the whole business logic on top of it. Do you want prefix matching and infix matching? Support for typos and synonyms? Suggesting letter after letter, or word by word? Security requirements, or advanced ranking to propose the best tips for your users? These and more questions need to be thought over to deliver successful query completion.

Generational renewal at work – a search challenge

The big generational shift

There have been discussions surrounding the great generational renewal in the workplace for a while. The 50s generation, who have spent a large part of their working lives within the same company, are being replaced by an agile bunch born in the 90s. We are not taken in by tabloid claims that this new generation does not want to work, or that companies do not know how to attract them. What concerns us is that businesses are not adapting fast enough to the way the new generation handles information, which is needed to enable the transfer of knowledge within the organisation.

Working for the same employer for decades

Think about it for a while: for how long have the 50s generation been allowed to learn everything they know? We see it all the time: large groups of employees ready to retire after spending their whole working lives within the same organisation. They began their careers as teenagers, working on the factory floor or in a similar role, growing step by step within the company, together with the company. These employees tend to carry a deep understanding of how their organisation works, and after years of training they possess a great deal of knowledge and experience. How many companies nowadays are willing to offer the 90s workers the same kind of journey? Or should they even?

2016 – It’s all about constant accessibility

The world is different today than it was 50 years ago. A number of key factors are shaping the change in knowledge-intense professions:

  • Information overload – we produce more and more information. Thanks to the Internet and the World Wide Web, the amount of information available is greater than ever.
  • Education has changed. Employees of the 50’s grew up during a time when education was about learning facts by rote. The schools of today focus more on teaching how to learn through experience, to find information and how to assess its reliability.
  • Ownership is less important. We used to think it was important to own music albums, have them in our collection for display. Nowadays it’s all about accessibility, to be able to stream Spotify, Netflix or an online game or e-book on demand. Similarly we can see the increasing trend of leasing cars over owning them. Younger generations take these services and the accessibility they offer for granted and they treat information the same way, of course. Why wouldn’t they? It is no longer a competitive advantage to know something by heart, since that information is soon outdated. A smarter approach of course is to be able to access the latest information. Knowing how to search for information – when you need it.

Factors supporting the need for organising the free flow of the right information:

  • Employees don't stay as long in the same workplace as they used to, which, for example, requires a more efficient onboarding process. It's no longer feasible to invest the same amount of time and effort in training one individual, since he or she might soon be changing workplace anyway.
  • It is much debated whether it is possible to transfer knowledge or not. Current information on the other hand is relatively easy to make available to others.
  • Access to information does not automatically mean that the quality of information is high and the benefits great.

Organisations lack the right tools

Knowing a lot of facts about a gradually evolving industry was once a competitive advantage. Companies and organisations have naturally built their entire IT infrastructure around this way of working. A lot of IT applications used today were built for a previous generation with another way of working and thinking. Today most challenges involve knowing where and how to find information. This is something we experience in our daily work with clients. Organisations more or less lack the necessary tools to support the needs of the newer generation in their daily work.

To summarize the challenge: organisations need to be able to supply their new workforce with the right tools to constantly find (and also manipulate) the latest and best information required for them to shine.

Success depends on finding the right information

In order for the new generation to succeed, companies must regularly review how information is handled plus the tools supporting information-heavy work tasks.

New employees need to be able to access the information and knowledge left by retiring employees, while creating and finding new content and information in such a way that information realises its true value as an asset.

Efficiency, automation… And Information Management!

There are several ways of improving efficiency. The first step is often to investigate whether parts of the creating and finding process, or perhaps the entire process, can be automated. Secondly, attack the information challenges.

When we have a grip on the information we are to handle, it's time to look into the supporting IT systems. How are employees supposed to find what they are looking for? How do they want to?

We have gotten used to finding answers by searching online. This is in the DNA of the 90s employee. By investing in a great search platform and developing processes that ensure high information quality within the organisation, we are certain the organisation will not only manage the generational renewal but excel at continuously developing new information-centric services.

Written by: Maria “Ia” Björk & Joar Svensson

Sensemaking or Digital Despair

Finding our way in the bright, futuristic, data-driven and intertwined world often taxes us and our digital-hungry senses. Fast rewind to the recent FindabilityDay 2015 and the parade of brilliant speaker talents on stage, starting off with our dear friend and peer Martin White on the topic of the future of search.

Human factors matter all the way from idea inception through design to the practical UX of our digital artifacts. The key has been make-do and ship. This is the reason the more technically advanced mobiles fell by the wayside eight years ago with the arrival of Apple's iPhone.

The social life with information shapes our daily lives in a hyper-connected world. It is still very hard to find that information needle in the haystack, and most days we feel despair when losing the scent of information nuggets. The results from the Findability Survey spoke clearly: without sound organising principles for information and data, and a pliable recorded vision, we won't find anything of value.

Next we moved into an old business model with Luna's and Sara's presentation: a great example where we see that the orchestration and choreography of their data assets will determine their survival or demise, in conjunction with infused information management practices, processes and tools. They showed a new set of facets for delivering on their mission in their line of business.

Regardless of the line of business, it becomes clear that our fragmented workplace setting is now only partly "on tap". It makes our daily lives a mess, since things do not interoperate. The vision should show the way to a shared information commons which we all cultivate.

So finally: how do we make sense of any mess?

Answer: architect a place where you can find comfort in the social conventions shared around the information used. Abby Covert laid out a beautiful tapestry of things we all need to take on to make sense of everyday life, and of life at work. With clear and distinct guardrails and signposts we don't feel so distracted or lost. Her talk was a true enlightenment for me, being of the same profession, an Information Architect.

Fredric Landqvist research blog

Stay Cleaning and moving boxes for cloud

This is the seventh post in a series (1, 2, 3, 4, 5, 6) on the challenges organisations face as they move from having online content and tools hosted firmly on their estate to renting space in the cloud. We will help you to consider the options and guide you on the steps you need to take.

Starting from our first post, we have covered the different aspects you need to consider as you take each step, including information structure and how it is managed, using Office 365 and SharePoint as a technology example, as well as planning for migration.

Moving Boxes

Do not even think about moving into the cloud apartment without a proper cleaning of the content buckets. Moving from an architected household to a rented place calls for a structured audit. Clean out all redundant, outdated and trivial matter (ROT), the very same habit you have of cleaning up the attic when moving out of your old house.

It is also a good idea to decorate and add any features to your new cloud apartment before the content furniture arrives. That way the content will fit with any new design and adapt to any extra functionality, with new features like windows and doors. This can be done by reviewing and updating your publishing templates at the same time, which will save time in the future.

Leaning on information governance standards, it should be easy to address the cleaning before moving for all content owners who have been appointed to a set of collections or habitats. Most organisations could use a content vacuum cleaner, or rather use the search facilities and metrics to deliver up-to-date reports on:

  1. Active / inactive habitats
  2. Content with no clear ownership, or where the owner has left the building
  3. Metadata and link quality for the content and collections to be moved across to the cloud apartments
  4. Publishing templates to review, with features or design to be updated for use in the cloud

When all active habitats and qualified content buckets have been revisited by their set of curators and information owners, the moving boxes can be prepared and put to use.

All moving boxes need proper tagging, so that any moving company will be able to sort out where the stuff should be placed in the new house or building. For collections and habitats, this means using the very same set of questions stated for adding a new habitat or collection to the cloud apartment house: who, why, where and so forth, through the use of a structured workflow and form. When these first cleaning steps have been addressed, there should be automatic metadata enhancement, aligned with the information management processes to be used in the new cloud.

With decent resource descriptions, and content cleaned up through the audit (ROT), this last step will auto-tag content based upon the business rules applied for the collection or habitat. The content can then be loaded onto the content moving truck, or loading dock, ready to be added to the cloud.

All content that neither has properly assigned information ownership nor is in such shape that migration can be done should remain on the estate, or be archived or purged. All metadata and links pointing to a content bucket or habitat that won't be moved in the first instance should at least have a correct and unique URI addressing that content. And in case a bucket or habitat has been run down by a demolition firm, i.e. purged, all inter-linkage to that piece of content or collection has to be changed.

This is typically a perfect quality report for the information owners and content editors, which they need to work through prior to actually loading the content on the content dock.

Rubbish and Weed
Finally, when all rotten data, deserted habitats and unmanageable buckets have been weeded out, it is time to prepare the moving truck and send the content to its new destination.

Our final thread will cover how the organisation and its inhabitants will be able to find content in this mix of clouds and things left behind on the old estate. Cloud search and enterprise search: seamless, or a nightmare?

Please join our Live Stream on YouTube the 20th November 8.30AM – 10AM Central European Time
Fredric Landqvist research blog
Mark Morrell, intranet pioneer

Placemaking, wayfinding and game rules in the Clouds

This is the sixth post in a series (1, 2, 3, 4, 5, 7) on the challenges organisations face as they move from having online content and tools hosted firmly on their estate to renting space in the cloud. We will help you to consider the options and guide you on the steps you need to take.

Starting from our first post we have covered different aspects you need to consider as you take each step including information structure and how it is managed using Office 365 and SharePoint as a technology example.  We will cover more about SharePoint in this post, and placemaking in the cloud.
Funky Village
In SharePoint there is a set of logical chunks. One could decompose the digital workplace into intranet sites, as departmental and organisational buckets; team sites, where groups collaborate; and lastly your personal domain, the My Site collection. Navigating between these is a mix of traditional information architecture and search-driven content. When inside such a habitat as a team site, it is not always obvious how to cross-link or navigate to other domains within the digital workplace hosted in SharePoint.

One way to overcome this is to render different forms of portals based upon dynamic navigation. These intersections and aggregates help users move around the maze of buckets and collections of content. SharePoint has very good features and options for creating search-based content delivery mechanisms.

A metadata- and search-based content model gives us cues for the future design of the digital workplace, with connected habitats and a sustainable information architecture where people don't get lost and have the wayfinding means to survive everyday work practices.

This is where how you manage the content in SharePoint and Office 365 becomes critical. As we said in our first post, it is important that you have a good information architecture combined with a good governance framework that helps you transform your buckets of content from the estate into the cloud. We have covered information architecture, so we now move towards how governance completes the picture for you.

There are three approaches to governance your organisation can take with SharePoint and Office 365. You don't have to use just one; you can combine elements of each to find the right blend for your organisation. What works best for you will depend on a number of different factors. The approaches are:

  • Restricting use – stopping some features from being used e.g. SharePoint Designer
  • Encouraging best practice – guidance and training available
  • Preventing problems – checking content before it is published

Each of these approaches can support your governance strategy.  The key is to understand what you need to use.

Restricting use

You need to be clear why your organisation is using SharePoint and Office 365 and the benefits expected.  This will shape how tight or loose your governance needs to be.

Once you are clear on this, you then need to consider the strategic benefits and drawbacks of features such as SharePoint Designer and site collection administration rights.

Benefits

  • You control what is being used.
  • You decide who uses a feature e.g. SharePoint Designer.
  • You manage the level of autonomy each site owner has.
  • You find out why someone needs to use a feature.
  • You monitor costs for licences, users, servers, etc.
  • You measure who is using what and why for reporting.

Drawbacks

  • You stifle innovation by not allowing people to test out ideas.
  • You stop legitimate use by asking for permission to use features.
  • You prevent people being able to share knowledge how they wish to.
  • You may be unable to realise the maximum potential of SharePoint.
  • You create unnecessary administration.
  • You risk adding costs without any value to offset them with.

You need to get the balance right with governance that gives you maximum value for the effort needed managing SharePoint and Office 365.

Encourage best practice

The goal of implementing SharePoint and Office 365 is to have an environment that enables employees to publish, share, find and use information easily to help with their work. They are confident the information is reliable and appropriate, whatever their need for it. People also feel comfortable using these tools, rather than alternative methods like calling helpdesks or emailing other employees for help.

Encouraging best practice, by giving people the opportunity to test tools against their needs, is one approach to achieving this. There are factors you need to consider that can help or hinder the success of this approach.

Benefits

  • You inform employees of all the benefits to be gained.
  • You train people to use the right tools.
  • You design a registration process to direct people to the right tools.
  • You point employees to guidance on how to follow best practice.
  • You encourage innovation by giving everyone freedom of use.

Drawbacks

  • You can’t prevent people using different tools to those you recommend.
  • You risk confusing employees using content unsure of its integrity.
  • You can’t prevent everyone ignoring best practice when publishing.
  • You may make it difficult for people to share knowledge effectively.
  • Your governance model may be ineffective and need improving.

Getting the balance right between encouraging best practice and the level of governance to deter behaviour which can destroy the value from using SharePoint and Office 365 is critical.

Preventing problems

As well as encouraging best practice, preventing problems helps to reduce the time and costs wasted on sorting out unnecessary issues. While that is the aim of most organisations, the practical realities of a rollout can divert plans from achieving this.

You need to get the right level of governance in place to prevent problems.  Is it encouraging innovation and keeping governance light touch?  Is it a heavier touch to prevent the ‘wrong’ behaviour and minimise risk of your brand and reputation being damaged?  How much do you want to spend preventing problems?  What does your cost/benefit analysis show?

Benefits

  • People using SharePoint and Office 365 have a great experience (especially the first time they use it).
  • Everyone is confident they can use it for what they need it for without experiencing problems.
  • Employees don’t waste time calling the helpdesk because many problems have been prevented.
  • Effective governance encourages early adoption and increased knowledge sharing.
  • Costs spent preventing problems are justified by increased productivity and reduced risk of errors.

Drawbacks

  • People find registering difficult and lengthy because of extra steps taken to prevent problems and don’t bother.
  • People find it too restrictive for their needs and it stifles innovation.
  • People turn to other tools (maybe not approved) to meet their needs and ask other people for help to use them.
  • Too restrictive governance prevents most beneficial use by raising the barrier too high for people to use.
  • Costs of preventing problems are higher than benefits to be gained and not justified.

You need to consider the potential benefits and drawbacks before deciding on the level of governance that is right for your organisation.

Remember, it is possible and probably desirable to have different levels of governance for each feature.  It may be lighter for personal views and opinions expressed in MyProfile and MySite but tighter for policies and formal news items in TeamSites.

That is the challenge!  You have so much flexibility to configure the tools to meet your organisation’s needs.  Don’t be afraid to test out on part of your intranet to see what effect it has and involve employees to feed back on their experience before launching it.

The way forward is to create a sustainable information architecture that supports an information environment available on any platform, everywhere, anytime and on any device. A governance framework can show roles and responsibilities, and how they fit with a strategy and plan, with publishing standards as the foundation of a consistently good user experience.

Combining a governance framework and information architecture with the same scope avoids any gaps in your buckets of content being managed or not being found.  It helps you transform from your estate to the cloud successfully.

In our concluding posts we will dive into more design-oriented topics, with a helping hand from findability experts and developers, and add migration thoughts in the next post. But first: navigating the social graph, being people-centric, leaves some outstanding questions. How will the graph interoperate if your business runs several clouds and still has buckets of content elsewhere?

Please join our Live Stream on YouTube the 20th November 8.30AM – 10AM Central European Time
Fredric Landqvist research blog
Mark Morrell, intranet pioneer

Content Governance – life cycle and reach

This is the fifth post in a series (1, 2, 3, 4, 6, 7) on the challenges organisations face as they move from having online content and tools hosted firmly on their estate to renting space in the cloud. We will help you to consider the options and guide you on the steps you need to take.

Starting from our first post, we have covered the different aspects you need to consider as you take each step, including information structure and how it is managed, using Office 365 and SharePoint as a technology example. In this post we will cover governance and how content should be managed in the cloud.

content buckets

Content created within a context, such as a departmental site or team habitat, usually has reach and bearing only for the local context: fellow members of staff within that unit. Other pieces of content have coverage that stretches across all parts of the business. One simple example is the bucket of content that makes up the management system, with governing principles, strategies, policies and guidelines describing the core processes, activities, roles and so forth within an organisation.

Yet other content, such as the outcome of a project, will build a bucket of content that either lives on in a new context, improves an existing bucket of content, or feeds into yet another project.

From an information management perspective, it is vital that you have organising principles for all your content that cover all these layers: both the reach and the life cycle of each set of content.

You need a governance framework that reaches out to every bucket of content.  This covers what is still on your estate as well as the growing amount in the cloud.  All content needs to be managed to remove risks of leakage of sensitive information and prevent people having an inconsistent user experience as they move from one bucket of content in the cloud to another content bucket still on the estate.

You need to make sure people do not see a difference between buckets of content on the estate and content buckets in the cloud. People using your content to help with their work don't need to know where the content is kept. They need to find it as easily as before, preferably even easier! Content in the cloud should feel the same and be a natural extension of the digital environment people are already used to. Manage it with a governance framework that covers every bucket of content, and it becomes easier to adopt quickly and use often, without caution or delay.

Part of your governance needs to cover publishing standards based on business needs, so content is easy to access from any device (e.g. laptops, tablets and smartphones) and to view without unnecessary authentication levels. This helps to create the consistently good user experience that encourages people to use your content, whether the bucket is in the cloud or not.

A professional team from group HR might work in their local team site, with ongoing conversations, work-in-progress documents and so forth. Pieces of their content production lead to governing policies that have a global reach within the organisation and need to be linked from the corporate intranet spaces, with versioning and good-quality resource descriptions (metadata). This practice and professional network of HR people also shares content on a departmental site, with links and resources that have a direct impact on their internal processes. The group has outreaching triggers and in-bound conversations, and has to balance these two states.

When it comes to temporal content buckets, like a project team site, there are several considerations to capture. First, where will the outcome and result be stored when the project is finished, and to which context will these content pieces contribute? Second, what should be captured from all ongoing conversations (social elements) and from the work-in-progress and drafts developed during the project's life cycle? Should a project habitat be searchable after closing down? Or does the habitat change status, so that all documentation stays within the collection but the overarching state of the habitat changes? Within SharePoint, these temporal states, versions, workflows and properties all sum up the organising principles.

If these principles haven't been ironed out, described and decided, there will inevitably be emerging ghost towns of dead habitats and lost collections of content, with no governance or ownership whatsoever. All this will become a digital landfill.

We will cover more about SharePoint in our next post in this series. Please visit Michael Sampson‘s recent slides where he takes you through strategy, planning, governance and user adoption for collaboration!
Please join our Live Stream on YouTube the 20th November 8.30AM – 10AM Central European Time
Fredric Landqvist research blog
Mark Morrell, intranet pioneer

The Curator – how to cultivate the habitat

This is the fourth post in a series (1, 2, 3, 5, 6, 7) on the challenges organisations face as they move from having online content and tools hosted firmly on their estate to renting space in the cloud. We will help you to consider the options and guide you on the steps you need to take.

In the first post we set out the most common challenges you are likely to face and how you may overcome these.  In the second post we focused on how Office 365 and SharePoint can play a part in moving to the cloud.  In the third post we covered how they can help join up your organisation online using their collaboration tools and features.

In this post we will cover engagement and how sorting and categorisation of artifacts, according to a simple-to-understand and easy-to-use standard, will form the bits and parts of the curation and cultivation process.

All document libraries should have one standard listing of all items, with two very distinct audiences: the actors within the habitat, i.e. the people contributing, acting and joining the daily conversation; and secondly the visitors who pass by the habitat to collect, link and act upon the content presented within the habitat's realm.

This makes it very easy for visitors to find their way around a habitat if the visitors' area (business lounge) is aligned with the overarching theme of the site, and all artifacts that the project team would like to share more widely are listed in a virtual bookshelf, with major versions only. The visitors' area has all the relevant data presented upfront: basically the answers to the questions set when starting the project. The visitors' area shouldn't be a backdrop, but rather a storefront, and the content has to be of good quality. Then there should be options to engage with the inner living room of the habitat and enter the messy ongoing conversations, depending on access rights. But the default setting should always be open for unexpected "internal" visitors (within the realm of the organisation). If the visitors' area is compiled in a nice and easy-to-use manner, most visitors are just happy to pick the best read from the bookshelf, or at least raise a question for the team! The social construct for this is "welcoming a stranger", since that visitor might link to your team's content, cross-linking it into their own social spaces.

The habitat's living room and its social conversations will require new context-specific organising principles. A team might want to add new list items, sort categories or introduce very local what-goes-where themes. This may be especially so when the team consists of actors who have different roles and responsibilities with regard to the overall outcome. Because of this, there may be a certain mix of tools or services in this one habitat of many where they hang out for project tasks.

The contextual adjustment is where the curator has to work on a cultivation process that glues the team together. The shared terminology within a group conversation is what stitches their practices together. At inception, the curator picks a bouquet of on-topic terms from the controlled vocabularies. Mixed with everyday use and contributions from all members, this can produce fruitful and semantically enhanced conversations with end-user-generated tags, or "folksonomies". The same goes for the interior design of links, tools, chosen content types and other forms of artifacts that the team will need to fulfil their goals and outcome.

The governance of the habitat leans very much on the shared experiences in the group and on assigned responsibilities for stewardship and curation, where publishing standards, guidelines and training should be part of the mix.

We will cover more on governance and how content should be managed in the cloud in our next post.
Please join our Live Stream on YouTube the 20th November 8.30AM – 10AM Central European Time
Fredric Landqvist research blog
Mark Morrell, intranet pioneer

Housekeeping rules within the Habitat

This is the third post in a series (1, 2, 4, 5, 6, 7) on the challenges organisations face as they move from having online content and tools hosted firmly on their estate to renting space in the cloud. We will help you to consider the options and guide you on the steps you need to take.

In the first post we set out the most common challenges you are likely to face and how you may overcome them. In the second post we focused on how Office 365 and SharePoint can play a part in moving to the cloud. Here we cover how they can help join up your organisation online using their collaboration tools and features.

Habitat

When arranging the habitat, it is key to address the theme of collaboration, since each theme calls for different feature settings, artifacts and services. In many cases, teamwork is situated in the context of a project. Other themes for collaboration are line-of-business unit teamwork, or learning networks, a.k.a. communities of practice. I will leave these latter themes for now.

Most enterprises have some project management process (e.g. PMP) that all projects have to adhere to, with complementary documentation and reporting mechanisms. This is so the leadership of the organisation is able to align resources and govern the change portfolio across different business units. Given this structure it is very easy to depict measurable outcomes, as project documents have to be produced regardless of what the project is supposed to contribute.

The construction of a habitat, or the design of a joint workplace, boils down to pragmatic steps aligned with the overarching project framework at hand: answering a few simple questions (the inverted pyramid):

  • Who? will be participating, and who will own the outcome of the joint project effort (dc.contributor; dc.creator; dc.provenance) and its reach (dc.coverage; dc.audience)
  • What? is the project all about: topic and theme (dc.subject; dc.title; dc.description; dc.type)
  • When? will this project be running, and the timeline for ending it: all temporal themes around the life of a project (dc.date)
  • Where? will participants contribute: what goes where and why? (dc.source; dc.format; dc.identifier)
  • Why? usually defined in the project description, setting common ground for the goals and expected outcome (dc.description)
  • How? defines the processes, practices and tools used to create the expected outcome of the project, with links to common resources such as the PMP framework, but also links to other key data sets, like ERP record keeping and master data for project numbers and other measures not stored in the habitat but still pillars to align with in the overarching model (dc.relation)

When these questions have been answered, the resource description for the habitat is set; in SharePoint this corresponds to the property bag feature. During the lifespan of the ongoing project, all contributions, conversations and created artifacts can inherit rule-based metadata from the collection's resource description. This reduces the burden on the actors creating the content, by enabling automagic metadata completion where applicable. And for wayfinding and findability within and between habitats, these resource descriptions will be the building blocks of a sustainable information architecture.
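
As a purely hypothetical illustration, the answers could be captured as a Dublin Core resource description attached to the habitat; every value below is invented:

<metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Project X team site</dc:title>                  <!-- What? -->
  <dc:creator>Group HR</dc:creator>                         <!-- Who? -->
  <dc:audience>Supply chain division</dc:audience>          <!-- reach -->
  <dc:date>2015-11-20</dc:date>                             <!-- When? -->
  <dc:identifier>https://intranet.example.org/sites/project-x</dc:identifier>  <!-- Where? -->
  <dc:description>Joint effort to replace the legacy HR self-service portal.</dc:description>  <!-- Why? -->
  <dc:relation>https://erp.example.org/projects/4711</dc:relation>  <!-- How? link to master data -->
</metadata>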

In our next post we will cover how to encourage employee engagement with your content.

Please join our Live Stream on YouTube the 20th November 8.30AM – 10AM Central European Time
Fredric Landqvist research blog
Mark Morrell, intranet pioneer

Wagon Trains to the Cloud

This is the first post in a series (2, 3, 4, 5, 6, 7) on the challenges organisations face when they move from having online content and tools hosted firmly on their estate to renting space in the cloud. We will help you to consider the options and guide you on the steps you need to take.

In this first post we show you the most common challenges you are likely to face and how you may overcome them.

A fast migration path to become tenants in a cloud apartment house unfolds a set of business-critical issues that have to be mitigated:

  • Wayfinding in a maze of content buckets and social habitats.
  • Emerging digital Ghost Towns due to lack of information governance.
  • Digital Landfills without organising principles for information and data.
  • Digital Litter with little or no governance or principles for ownership, with redundant, outdated and trivial (ROT) content.
  • Having no strategy or plan, which erodes any possibility of a positive business outcome from moving to the clouds.

"WagonTrn" by Tillman at en.wikipedia. Transferred from en.wikipedia by SreeBot. Licensed under public domain via Wikimedia Commons.

The way forward is to settle on a sustainable information architecture that supports an information environment in constant flux, with information and data interoperable on any platform, everywhere, anytime and on any device.

You need to show how everything is managed and how everyone fits together. A governance framework can help do this. It can show who is responsible for the intranet, what their responsibilities are and how they fit with the strategy and plan. Making it available to everyone on the intranet helps their understanding of how it is managed and how it supports the business.

The main point is to have a governance framework and information architecture with the same scope to avoid gaps in content being managed or not being found.

Both need to be in harmony and included in any digital strategy. This avoids competing information architectures and governance frameworks created by different people, which cause inconsistent experiences where people do not find what they need and resort to alternative, less efficient ways to find what helps with their work.

Background

Building huts, houses and villages is an emergent social construction. As humans we coordinate our common resources, tools and practices. A habitat populated by people needs housekeeping rules, with available resources for cooking, cleaning, social life and so on: routines that define who does what task and by when, in order to keep everything in order.

A framework with governing principles that sets out roles and responsibilities, along with standards that set out the expected level of quality and quantity of each task, which everyone is engaged in and complies with, is similar to how the best intranets and digital workplaces are managed.

In the early stages, with a small number of habitats, the rules for coordination are pretty simple, both for shared resources between the groups and for the pathways that connect them. The bigger a village gets, the more it taxes the structures to keep things smooth. When we move ahead to mega cities with 20+ million people living close together, it boils down to a general overarching plan and common infrastructure, but you also need local networked communities in order to find feasible solutions for living together.

Like villages and mega cities, there is a need for the consistency that helps everyone to work and live together. Whenever you go out, you know that there are pavements to walk on, roads for driving, traffic lights that we stop at when they turn red, and signs to show us the easiest way to get to our destination.

Sustainable architecture and governance create a consistent user experience. A well-structured information architecture aligned with a clear governance framework sets out roles and responsibilities, and publishing standards based on business needs support the publishers who follow them. This means that wherever content is published, whether accredited or collaborative, it will appear consistent to people and be located where they expect it to be. This encourages a natural way of moving through a digital environment, with recognisable headings and consistently placed search and other features.

This allegory fits like a glove when moving into large enterprise-wide shared spaces for collaboration, whether cloud-based, on-premises or a mix thereof. The social constructions and constraints remain the same. As IT services on tap, the cloud certainly puts constraints on a flexible and adjustable habitat construction, in order to host as many similar habitats as possible, but it offers a key solution you can move into instantly! Tenants share the same apartment building (SharePoint Online).

When the set of habitats grows, navigating this maze becomes a hazard for most of us. Wayfinding in a digital mega city is extremely difficult. To a large extent, enterprises moving into collaboration suites suffer from the same stigma, regardless of whether it is SharePoint, IBM Connections, Google Apps for Work or a similar setting. It is not a question of which type of house to choose, but rather which architecture and plan will work in the emerging environment.

Information Architecture for Digital Habitats

If one leans upon linked data, linked open data, and the emerging semantic web and web-of-data standards, there is a set of very simple guidelines to adhere to when building a digital village or mega city: the 5 stars, our beacon of light!

All collections and shared spaces should have persistent URIs, which is the fourth star on the ladder. When it comes to the third star, non-proprietary formats, it obviously becomes a bit tricky, since e.g. Microsoft SharePoint and Office like to encourage their own formats. But if one adds resource descriptions to collections and artifacts using Dublin Core elements, it becomes possible to connect different types of matter. With feasible and standardised resource descriptions it is possible to add schemas and structures that tell us a little bit more about the artifacts or collections thereof, hence the option to adhere to the second star. The first star, inside the corporate setting, becomes key to connecting different business units and areas, with open licenses, with restrictions to internal use only, and in some cases open to external parties.

Linking data sets, that is collections or habitats, with different artifacts is the fifth star. This is where it all starts to make sense, enabling a connected digital workplace: building a city plan with pathways, traffic signals and rules, highways, roads, neighbourhoods, infrastructural services and more. In other words, placemaking!

Placemaking is a multi-faceted approach to the planning, design and management of public spaces. Placemaking capitalizes on a local community’s assets, inspiration, and potential, with the intention of creating public spaces that promote people’s health, happiness, and well being.
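
As a sketch of how the stars could look in markup: a hypothetical habitat with a persistent URI (the fourth star), described with Dublin Core and linked to another data set (the fifth star), in RDF/XML; all addresses are invented:

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="https://intranet.example.org/habitats/project-x">
    <dc:title>Project X team site</dc:title>
    <!-- the link that turns isolated huts into a connected city -->
    <dc:relation rdf:resource="https://intranet.example.org/habitats/warehouse-community"/>
  </rdf:Description>
</rdf:RDF>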

We will cover more about how this applies to Office 365 and SharePoint in our next post.

Please join our Live Stream on YouTube the 20th November 8.30AM – 10AM Central European Time
Fredric Landqvist research blog
Mark Morrell, intranet pioneer