Making your data F.A.I.R and smart

This is the second post in a new series by Fredric Landqvist & Peter Voisey, explaining how your organisation could best shape its data landscape for the future.

How to create a smart data framework for your organisation

In our last post, we presented the benefits of FAIR data, how to make data smarter for search engines, and the potential of an Information Commons. In this post, we give you the pragmatic steps to make your data FAIR by creating and applying your own smart data framework. Your data-sharing dream, internally and externally, is possible.

A smart data framework, using FAIR data principles, encompasses the tooling, models and standards that govern datasets and the different context-specific information systems (registers, catalogues). The data is then ingested and processed (enriched/refined) into smart data, datasets and data catalogues. It can then be used and reused by different applications and e-services via open APIs. In this ecosystem, all actors and information behaviours (personas) interplay: provision agents, owners, builders, enrichers, end-user searchers and referrers.

The workings of a smart data framework

A smart data & metadata catalogue   

A smart data & metadata catalogue (illustrated below) provides an organisational capability that aligns data management with the FAIR data principles. View it not so much as one system to rule them all, but rather as a smart and sustainable ecosystem. To simplify your complex and heterogeneous information environment, this set-up can be instantiated as one overarching mechanism. Although we describe a data and metadata catalogue here, the exact same framework and set-up would of course also apply to your organisation’s content, making it smarter and more findable (i.e. it gets the sustainable stamp).

Smart Data Catalogue: the necessary services and components of a smart data catalogue

The above picture illustrates the services and components that, together, build smart data and metadata catalogue capabilities. We now describe each one of them for you:

Processing (Ingestion & Enrichment) for great Findability & Interoperability

  • (A) Ingest, harvest and operate. Here you connect the heterogeneous data sources for ingestion.

The configured input mechanisms describe each of the data sources, with their data, datasets and metadata ready for your catalogue search. Hopefully, at the dataset upload stage, you have provided a good system/form that now supplies your search engine with great metadata (e.g. we recommend the open data catalogue standard DCAT-AP). Upload here is interchangeable with machine-to-machine harvester mechanisms (as with open data), traditional data integration, or manual provision by human upload effort.

  • (D) Enterprise Metadata Repository. This is the persistent storage of data in the data catalogue, index and graph. All things get a persistent ID (see how to design persistent URIs) and rich metadata.
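As an illustration, here is a minimal sketch of what a catalogue entry with a persistent URI and rich metadata can look like when expressed with DCAT terms, using Python’s rdflib; the dataset, URIs and values are invented placeholders, not a prescribed layout:

```python
# A minimal sketch of DCAT dataset metadata, using Python's rdflib.
# The organisation URI and dataset details are illustrative placeholders.
from rdflib import Graph, Literal, URIRef
from rdflib.namespace import DCAT, DCTERMS, RDF

g = Graph()
g.bind("dcat", DCAT)
g.bind("dcterms", DCTERMS)

# A persistent, resolvable URI is the anchor for all metadata.
dataset = URIRef("https://data.example.org/dataset/energy-usage-2020")

g.add((dataset, RDF.type, DCAT.Dataset))
g.add((dataset, DCTERMS.title, Literal("Energy usage 2020", lang="en")))
g.add((dataset, DCTERMS.description,
       Literal("Monthly energy usage per building.", lang="en")))
g.add((dataset, DCTERMS.publisher, URIRef("https://example.org/org/facilities")))
g.add((dataset, DCAT.keyword, Literal("energy")))
g.add((dataset, DCAT.keyword, Literal("buildings")))

# Serialise as Turtle, ready for ingestion into the catalogue index.
print(g.serialize(format="turtle"))
```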

  • (B) Enrich, refine, analyze, and curate. This is the AI part (NLP, semantics, ML) that enriches the data and datasets, making them smarter. 

Concepts (read also: entities, terms, phrases, synonyms, acronyms etc.) from the data sources are found using named entity recognition (NER). By referring to a Knowledge Graph in the Enricher, the appropriate resources are annotated (“tagged”) with the said concept. It does not end there, however. The concept also brings with it from the Knowledge Graph all of the known relationships it has with other concepts.
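To make the enrichment step concrete, here is a simplified sketch that pairs an off-the-shelf NER library (spaCy is assumed here) with a toy dictionary standing in for the Knowledge Graph lookup; in a real pipeline the lookup would query your actual graph:

```python
# Simplified enrichment step: extract entities with spaCy, then annotate
# them with concept URIs and related concepts from a (toy) knowledge graph.
import spacy

nlp = spacy.load("en_core_web_sm")

# Stand-in for a real Knowledge Graph lookup: concept URI plus known relations.
knowledge_graph = {
    "Gothenburg": {
        "uri": "http://example.org/concept/gothenburg",
        "related": ["Sweden", "Port city"],
    },
}

doc = nlp("The dataset covers energy usage in Gothenburg.")
for ent in doc.ents:
    concept = knowledge_graph.get(ent.text)
    if concept:
        # The annotation carries the concept's relationships with it.
        print(ent.text, ent.label_, concept["uri"], concept["related"])
```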

Essentially a Knowledge Graph is your encoded domain knowledge in a connected graph format. It is by reading these encoded relationships that the machine “understands” the meaning or aboutness of data.

This opens up a very nice Pandora’s box for your search (understanding query intent) and for your Graphical User Interface (GUI), as your data becomes smarter through your ability to exploit the relationships and connections (semantics and context) between concepts.

You and AI can have a symbiotic relationship in the development of your Knowledge Graph. AI can suggest new concepts and relationships as new data is added. It is, however, you and your colleagues who determine the set of concepts and relationships in the Knowledge Graph – the concepts and relationships that are important to your department or business. Remember you can utilise more than one knowledge graph, or part of one, for particular business needs or data sources. The Knowledge Graph is a flexible expression of your business/information models that gives structure to all your data and its access.

Extra optional step: if you can manage to index not only the dataset metadata but the datasets themselves, you can make your Pandora’s box even nicer. Those cryptic/nonsensical field names that your traditional database experts love to create can also be incorporated and mapped (one time only!) into your Knowledge Graph, thus increasing the machine “understanding” of the data and giving the data asset a better chance of being used more widely. 
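A hypothetical illustration of that one-time mapping, with invented field names and concept URIs:

```python
# One-time mapping of cryptic database field names to Knowledge Graph
# concepts. All names and URIs here are illustrative.
FIELD_TO_CONCEPT = {
    "CUST_NO": "http://example.org/concept/customer-identifier",
    "TX_AMT":  "http://example.org/concept/transaction-amount",
    "DLV_DT":  "http://example.org/concept/delivery-date",
}

def annotate_fields(dataset_fields):
    """Attach concept URIs to raw field names during indexing."""
    return {f: FIELD_TO_CONCEPT.get(f, "unmapped") for f in dataset_fields}

print(annotate_fields(["CUST_NO", "TX_AMT", "FOO"]))
```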

The configuration of processing with your Knowledge Graph can take care of dataset versioning, lineage and add further specific classifications e.g. data sensitivity, user access and personal information.

Lastly on Processing: your cultural and system interoperability is immensely improved. We’re not talking about everyone speaking the same language here, but rather everyone speaking their own language (and culture) and still being able to find the same thing. Open and FAIR vocabularies further enrich the meaning of your data, and your metadata becomes linked. System interoperability is partially achieved by exploiting the graph of connections that now “sits over” your various data sources.

Controlled Access (Accessible and Reusable)

  • (C) Access, search and visualize APIs. These tools control and influence the delivery, representation, exploration and consumption/use of datasets and data catalogues via a smarter search (made so by smarter data) and a more intuitive Graphical User Interface (GUI).

This means your search can now “understand” user intent from just one or two keyword queries (through known relationship connections in the Knowledge Graph). 
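One way to picture this query understanding is simple query expansion over graph relationships. The sketch below uses a three-triple SKOS graph in rdflib; the concepts and relations are invented for illustration:

```python
# Sketch of query expansion: look up a keyword's related concepts in a
# small SKOS graph and broaden the search query with them.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import SKOS

EX = Namespace("http://example.org/concept/")
g = Graph()
g.add((EX.diabetes, SKOS.prefLabel, Literal("diabetes", lang="en")))
g.add((EX.diabetes, SKOS.related, EX.insulin))
g.add((EX.insulin, SKOS.prefLabel, Literal("insulin", lang="en")))

def expand(term):
    """Return the term plus labels of concepts related to it in the graph."""
    terms = {term}
    for concept in g.subjects(SKOS.prefLabel, Literal(term, lang="en")):
        for related in g.objects(concept, SKOS.related):
            for label in g.objects(related, SKOS.prefLabel):
                terms.add(str(label))
    return terms

print(expand("diabetes"))  # {'diabetes', 'insulin'}
```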

Your search now also caters for your searchers who are searching in an unfamiliar subject area or are just having a query off day. Besides offering the standard results page, the GUI can also present related information (again due to the Knowledge Graph), past related user queries, information and question-answer (Q&A) type material. So: search, discovery, learning, serendipity.

Your GUI can also now become more intuitive, changing its information presentation and facets/filters automatically, depending on the query itself (more sustainable front-end coding). 

As an alternative to complex scenario coding, you can also create rules (set in your Knowledge Graph) that control what data users can access (when, how and where) based on their profile, role, location, the time, and the device they are using. This same Knowledge Graph can proactively push and recommend data to certain users. Accessibility is made possible by using standard communication protocols, open access (when possible), authentication where necessary, and always with metadata at hand.
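A minimal sketch of such profile-driven rules, here as a plain Python stand-in for rules that would in practice live in the Knowledge Graph (the dataset names and roles are invented):

```python
# Minimal sketch of profile-driven access rules. In practice these rules
# would be expressed in the Knowledge Graph; here a plain dict stands in.
ACCESS_RULES = {
    "hr-salaries": {"roles": {"hr-manager"}, "locations": {"office"}},
    "open-data":   {"roles": {"any"}, "locations": {"any"}},
}

def can_access(dataset, role, location):
    """Evaluate whether a user profile may access a dataset."""
    rule = ACCESS_RULES.get(dataset)
    if rule is None:
        return False
    role_ok = "any" in rule["roles"] or role in rule["roles"]
    loc_ok = "any" in rule["locations"] or location in rule["locations"]
    return role_ok and loc_ok

print(can_access("hr-salaries", "hr-manager", "office"))  # True
print(can_access("hr-salaries", "analyst", "home"))       # False
```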

Reusable: your new smart data framework can help increase the time your Data Managers (and Scientists, Analysts) spend using data rather than trying to find it (the 80/20 data science dilemma). It can also help reduce the risk to your AI projects (50% failure rate) by helping searchers find the right data, with its meaning and context, more easily. Reuse is also enabled by design: metadata carries multiple attributes, usage licences and provenance in line with community standards.

Users and information behaviour (personas)

Users and personas: user groups and services

From experience we have defined the following broad conceptual user-groups:

  • Data Managers, a.k.a. Data Ops or Data Scientists
    Data Managers are e.g. knowledge engineers, taxonomists and analysts. 
  • Data Stewards
    Data Stewards are responsible for data governance, such as data lineage. 
  • Business Professionals/Business end-users
    Business users may come from diverse backgrounds – hence the broad label Business end-users.
  • Actor Systems
    Actor Systems are the different information systems, applications and services that integrate information via the rich open APIs of the Smart Data Catalogue.

The outlined collaborative actors (user groups E–H) and their interplay as information behaviours (personas) with the data (repository) and services (components) together build the foundation for more FAIR data management within your organisation, while at the same time giving you the option to contribute to an even broader shared open FAIR information commons.

  • (E) Data Ops workplace and dashboard: a combination of tools supporting Data Ops’ data management processes, covering the information behaviours of data provision agents, enrichers and developers.
  • (F) Data Governance workplace: the tools supporting Data Stewards’ collaborative data governance work with Data Managers, covering the information behaviour of the data owner.
  • (G) Access, search and visualize APIs: the user experience for exploring, finding and interacting with the catalogue and data, covering the information behaviours of searcher and referrer.
  • (H) API: the set of open APIs supporting access to catalogue data for consuming information systems, covering the information behaviour of the referrer (a.k.a. data exchange).


We hope you enjoyed this post and understand the potential benefits such a smart data framework incorporating FAIR data principles can have on your data catalogue, or for that matter, your organisational content or even your data swamps.


In the next post, Toward data-centric solutions with Knowledge Graphs, we talk about Knowledge Graphs (KGs) and their non-proprietary RDF semantic web tech, how you can create your KG(s), and the benefits they can bring to your future data landscape.


A Health Care Information Commons Vision: from frozen assets to liquid gold

This is the second post in a series (1), unpacking interoperability in the healthcare system. The focus of this post is semantic and technical interoperability, hence a systemic overview.

The future of health care relies on the improved flow of captured patient health information across the whole care continuum. This means a shared information system linking systems and devices from participating health care organisations while maintaining patient privacy and security standards. Such a realization would not only enhance the clinician and patient experience but also enable faster treatment and better care coordination for patients.

Information Commons is an information system, …, that exists to produce, conserve, and preserve information for current and future generations.

A seamless and secure, heavily-linked hub, providing point-of-care access to critical patient data and care-decision-support information for the delivery of timely care, reducing the duplication of tests and procedures.

All in all, this has to be built upon a participatory community paradigm, where clinicians, policy makers, leaders and patients share a vision to create an interoperable information space – one that is sustainable, regardless of the lock-in mechanisms previously set by different technical and semantic standards, vendors, processes and policy making.

Healthcare Information Commons

How do we create an interoperability climate?

Changes for interoperability lie in the development of new pilots with strong collaboration. They are generally more successful when based on patient or illness groups, value-orientated, open and scalable. Post-requirements phase, iteration based on early adopters’ feedback can identify the need for improvements and enhancements around the relevancy, format and visual display of data and information and the usability of the solution, and can provide insight into workflow impact. The Information Commons is also a good arena for clinicians to share positive anecdotes from their experiences, upon which scalable pilots can be expanded.

Such developed infrastructure and services can also support or be leveraged by other national or regional health initiatives.

Technical Layers of interoperability

Interoperability can cover many layers, but at its base would be an interoperable access layer that integrates and securely shares clinical data from multiple sources, giving one point of access. The user interface (GUI) could then provide and display data and information according to the stakeholder user and the medical/situational context.

Such a layer would have to accommodate and support various data from the distributed system of actors, aligning both to open standards while at the same time being plastic enough in design and instantiation.
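To give a concrete flavour of a standards-based access layer, here is a hedged sketch of reading patient data over an HL7 FHIR REST API (a standard we return to later in this series); the public HAPI test server is assumed, and the results it returns are not guaranteed:

```python
# Sketch of a standards-based access layer in practice: a patient search
# over an HL7 FHIR REST API. The server is a public test instance, and
# resource availability is not guaranteed.
import requests

FHIR_BASE = "https://hapi.fhir.org/baseR4"  # public HAPI FHIR test server

resp = requests.get(
    f"{FHIR_BASE}/Patient",
    params={"family": "Smith", "_count": 3},   # standard FHIR search params
    headers={"Accept": "application/fhir+json"},
    timeout=30,
)
resp.raise_for_status()
bundle = resp.json()
for entry in bundle.get("entry", []):
    patient = entry["resource"]
    print(patient.get("id"), patient.get("name", [{}])[0].get("family"))
```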

Interoperability not only covers the sharing of information but also its usage. This may include added functionality by the EHR vendor themselves or the creation of further value-adding knowledge layers that can take advantage of both structured and (the untapped wealth of) unstructured data within EHRs.

Findwise in its EU funded KConnect project is doing just that. It is currently collecting use case studies from Jönköping (RJI/Qulturum) in order to create a pilot solution for clinicians to take advantage of ‘hidden’ textual data.

Questions of interoperability also lie in the physical user experience of the systems themselves. Should the basic layer provided by EHR vendors be open to include value-added software from other parties? Should it be embedded, or made into another GUI? Which is ultimately best for clinician workflow and for the agility of software solutions in supporting new value-based outcomes and iterating towards improved efficiency and effectiveness?

Semantic Transformer

The annotations made in healthcare systems across different domains all have a very similar outset, but lack a coherent interoperability mechanism to work smoothly outside the local context. At international, national and regional levels there should be services that act like the electric grid that provides society with energy to be used in many contexts: a semantic grid that hosts controlled vocabularies within the domain, but also shares practices and processes. With the use of open standards, these could bridge across organisational boundaries and help clean up the current messy healthcare information space.

The Healthcare Information Commons does not per se have to be one system, but rather an interoperable set of services/systems that share standards in order to exchange information and data – very similar to the way the Internet and linked data work today, not restricted by walled gardens. The governance of the commons should be a matter for public services, with sustainable resources and an open governance agenda that invites participation and engagement. No single actor in the network, be it a large hospital, private caretaker or regional public governing body, will be able to take care of this single-handedly. It should be a true “commons” undertaking!

The infusion of the Information Commons into everyday healthcare provisioning use cases with semantic transformer applications could be in several modalities: finding and acting upon information or contributing in the local context.

At the data entry or capture point, there will be options to add semantic layers and attributes to the type of content and data provisioned. An easy way to illustrate this is the emerging use of schema.org templated entities and properties for medical types, medical conditions, drugs and guidelines, with codes from controlled vocabularies like SNOMED CT, MeSH, ICD-10 and the like.
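As an illustration, this is roughly what such a schema.org annotation could look like when generated as JSON-LD from Python; the condition and its real ICD-10 code (E11) are genuine, while the page it would be embedded in is hypothetical:

```python
# Illustrative schema.org markup for a medical condition, generated as
# JSON-LD. The ICD-10 code shown (E11, type 2 diabetes) is a real code;
# the page it would be embedded in is hypothetical.
import json

condition = {
    "@context": "https://schema.org",
    "@type": "MedicalCondition",
    "name": "Type 2 diabetes mellitus",
    "code": {
        "@type": "MedicalCode",
        "codeValue": "E11",
        "codingSystem": "ICD-10",
    },
}

print(json.dumps(condition, indent=2))
```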

Analogously, using digital cameras in smartphones or other devices means that the user might add “some” metadata or tags about the picture. Devices and sensors add more layers of granularity, with attributes that most end-users never see or bother about. These extra resource descriptions interplay with cloud-based services such as Google Photos, where different algorithms reformat and package the content into new forms, such as contextual albums, scenes and so forth.

A set of semantic transformer application layers should be intertwingled with the Healthcare Information Commons: firstly to make easy linkages between data sets – as the Web of Data scenarios and Linked Data propose – but also to provide smarter integration points in back-end supporting processes in the healthcare systems, where more private and locked-in data sets exist about patient conditions, treatments, drugs and so on.

The semantic transformer applications could be open APIs developed by the community for the commons, but could also be commercial applications provided by line-of-business specialist software vendors – as long as all of these layers are compliant with the open standards!

For legacy systems such as EHRs, and for off-the-shelf healthcare and business applications that are semantically impaired, these semantic transformer applications could work as a repair kit for already broken old systems. Consequently there would be no need to overhaul all legacy software within the caretaker’s organisation – a smoother migration path to interoperability.

There also exists the need for semantic interoperability between the contextual patient information within the EHR and the provision of clinical decision support information. This could be in the form of internal medical guidelines and best practices, or from external resources such as medical journals or clinical trial reports.

The KConnect project is providing semantic annotation and semantic search services in different languages for clinicians and researchers to access the very latest in medical literature. This is achieved by semantically annotating the required medical information (EHRs, guidelines, journals etc.) and having the semantic search engine take full advantage of known key medical entities/concepts and their relationships.

Through the indexing of new information about drug usage, best practices, guidelines, new clinical trials and journals, clinicians can then access up-to-date, relevant information whenever they need it.

In the near future to maximise both clinician and patient user engagement with EHRs, different uses and views of the EHR will have to be driven by suitable context and stakeholder semantics.

Shared Decision making

When moving into value-based health care and outcome measurement (as presented here by Sveus), it is critical that all actors participate on a connected and level playing field, so that communication between healthcare practitioners, patients and their social networks works. This includes the need for shared norms and definitions, systems to support the decision making and, obviously, a harmonised set of metrics to measure outcomes.

As presented by Peter Ubel in his talks and recent book, Critical Decisions, it is key that the clinician and the patient are able to share a common view. All practitioners share jargon that does not always communicate well to the receiver. Hence plenty of communication breakdowns are recorded in everyday practice, leading in the worst cases to “malpractice” for the patient. In the last couple of decades, there has been a shift in power relations between healthcare professionals and patients and their families. Patient empowerment is a good thing, but if things get lost in translation, there is the risk that critical decisions are not fully supported.

With a Healthcare Information Commons pool of resources, there lie opportunities to guide patients and practitioners in their critical decision making, but also to strengthen learning and innovation within the communities of practice, with open feedback loops to the pool.

Privacy & Security upfront

Just as data interoperability can be seen as the sharing of data, data security can be seen as the sharing of data in the right way and data privacy seen as the sharing of data with the right person in the right way. We are naturally concerned as to who may be using our data and want to be able to control its use.

The boundary between citizens’ App data and their medical data is blurring rapidly as App developments and sensors continue to provide new and different data that the individual, health care and clinical research can capitalise on in the effort to move towards better wellbeing and more value-based healthcare.

While data privacy and security have become the headline darlings of the media, they can often be distractors from innovation, masking the true benefits of the flow of information. Just as with physical assets, there are best practices for data misuse prevention, protection and policing. The majority of misuse or abuse of personal data is caused by human error and misjudgement rather than by the failure of technology.

Data interoperability can be better supported when services have clear guidelines to inform citizens as to who, when and how their data is shared, for what purpose, and the available steps to alter said process. A better-informed public would then see more free data resources being used for clinical research, e.g. the Million Hearts initiative in the US, where citizen data is being used to reduce heart attacks and strokes.

Open regulations, collaboration and coordination, along with risk assessment and protection practices such as encryption, anonymisation and de-identification, can all go a long way towards allowing secure data interoperability, be it for personal or aggregated data. IT also has the potential for rule-based access and forensic data access reports. No system can be made fool-proof; however, precautions and the presence of a well-designed data breach response plan are achievable.

Obviously we do not want all our healthcare records to be out in the open for anybody to use or read, any more than we want our financial records to be. Privacy really is key! For this reason, the Information Commons should work with aggregated data, not the singular set of records for one patient.

Patient safety also drives the need for a freer flow of data between actor systems. The medical conditions and contexts set the standards for sharing, where extracts or segments should be possible to share in line with privacy policies.

Future real-life experience exposé

With a recent Swedish report on diabetes care and outcome measurement in mind, it makes sense to illustrate the case of a diabetes patient living and acting in Göteborg, in the west of Sweden. They have a medical condition that is a lifelong journey with an endocrine system out of order. This has a great impact on the patient’s everyday life, with diabetes-related complications. With a good life balance of training, exercise and eating habits, it is possible to keep glucose patterns in such a way that your life expectancy will equal anybody else’s.

The use of personal choices to trigger improved behaviour gives the person options to choose selected wellbeing (e.g. Weight Watchers), fitness (e.g. Runkeeper) and health monitoring applications. In most cases these are closed ecosystems – e.g. the iOS-included Health app – with options to share progress in social media (about eating well or improving your personal training). Many life science corporations are developing health monitoring applications specific to a medical condition, disease area or treatment (e.g. FreeStyle Libre from Abbott for improving glucose monitoring) that clinicians recommend during patient consultations.

For clinical researchers there are ecosystem-specific toolkits, like the open-sourced Apple ResearchKit. The existence of a closed ecosystem naturally makes it more problematic to share and exchange data. In this space, an Information Commons based on open standards makes sense too – where semantic translators could improve the transmission of data from one closed ecosystem to another, without privacy infringement.

A Personal Health Record (PHR) is a health record where health data and information related to the care of a patient is maintained by the patient.

In a future, more seamlessly interoperable world, the citizen/patient should be provided with one secure access point to his/her health account – in Sweden, e.g. 1177, Mina Vårdkontakter and Hälsa för mig.

The outstanding question: how do we get interoperability between PHRs and wellbeing, fitness and health apps, where it is easy to share vital data bits in a sound manner?

In this scene, open standards should be applied to create a make-do semantic transformation.

Lastly – interoperability within the Professional Clinician Workplace?

Statements and real-life stories from the trenches of any clinical workplace show a mess of supporting information systems – EHRs that neither cooperate nor interoperate. Many clinicians find they have to provision data into a handful of systems, with significant double manual workload. This comes with risks, given the stressful environment, and many “malpractice” incidents can arise from this workplace disorder.

Each system supports its part of the process. While some software suites try to close down into a one-system-to-rule-them-all paradigm, they still barely lean upon any open standards, and they lack semantic and structured ways to use data and information outside the supporting system’s narrow scope.

A diabetes nurse (post patient consultation) has to enter data into more than 10 different areas, including quality assurance and measurement systems, e.g. NDR in Sweden. In some cases integrated point-to-point solutions have been put in place, but mostly this is not the case, and so unnecessary frustration is created.

In every intervention where clinicians and patients communicate, regardless of whether it is online, remote or on-site, there should be opportunities to tap into the Healthcare Information Commons space, with the potential to find recent new medical treatments, emerging standards and guidelines, and breaking news for clinicians, as well as patient-oriented and patient-formatted communications. In the best of worlds, semantic translator applications will bridge between ecosystems inside the personal health space as well as into the workplace environment for clinicians – helping, guiding and improving all dimensions of interoperability.

Concluding remarks

Having the value-based healthcare and outcome measurement domain as a specific change driver will push the use of standards on all levels to the limit. In the following blog post in this series, the ambition is to unpack information governance, since data ownership and trust also have to be ironed out. And as stated by Prof Michael E. Porter, the capture of data to do proper outcome measurement is one of the major road-blocks ahead. The orchestration of all resources and governance still has to unfold. Happily, some building blocks for the Healthcare Information Commons have emerged, so we do not need to reinvent the wheel:

  • The Wikimedia realm (“commons”), with all its entries of semantically useful data in wikidata.org.
  • Standard sets for medical conditions from international collaboration at ICHOM and, in Sweden, Sveus; standards from HL7 FHIR, the W3C and the Web of Data / Semantic Web. The Swedish National Board of Health and Welfare has an embryonic information structure (not yet in semantic, machine-readable RDF format). Information intermediaries like Google have settled on simple schemas for health and medicine.
  • Open innovation and the “open” paradigm will change evidence-based medicine, Bad Pharma and science on a societal level, as stated by Ben Goldacre (TED), where we as patients, together with clinicians, are able to question treatments based on open data and improve the quality of the Healthcare Information Commons.
  • The technology stack of smarter devices, sensors and things, along with Internet anywhere, cognitive computing and computational knowledge on top of the commons, will bring forward semantic translators.
  • New leaps in collaborative work and development through the use of notebooks – language- and platform-agnostic ways of working.

Making sense, defrosting health data into liquid gold: improving healthcare for all.

For more information on Findwise research, please visit KConnect and Orios (Open Standards)



Enterprise-Linked-Data and the Connected Digital Workplace

The emerging hyper-connected and agile enterprises of today are stigmatised by their IS/IT-legacy, so the question is: Will emerging web and semantic technologies and practices undo this stigma?

The Shift

Semantic technologies and Linked Open Data (LOD) have evolved since Tim Berners-Lee introduced their basic concepts, and they are now part of everyday business on the Internet, thanks mainly to their uptake by information- and data-run companies like Google, social networks like Facebook and large content sites like Wikipedia. The enterprise information landscape is ready to be enhanced by the semantic web, to increase findability and usability. This change will enable a more agile digital workplace where members of staff can use cloud-based services anywhere, anytime, on any device, in combination with the set of legacy systems backing their line of business. All in all: more efficient organising principles for information and data.

The Corporate Information Landscape of today

In the everyday workplace we use digital tools to cope with the tasks at hand. These tools have been set into action around meta-models intended to structure the social life of dealing with information and data. The legacy of more than 60 years of digital record-keeping has left us in an extremely complex environment, where most end-users have a multitude of spaces in which they are supposed to contribute. In many cases their information environment lacks interoperability.

A good, or rather bad, example of this is the electronic health record (EHR) environment of a hospital, where several different health professionals try to codify their ongoing work in order to make better-informed decisions regarding the different medical treatments. While this is a good thing, it is heavily hampered by closed-down silos of data that do not work in conjunction with the new, more agile work practices. It is not uncommon to have more than 20 different information systems to provision during a workday.

The information systems architecture in any organisation or enterprise may comprise home-grown legacy systems from the past, off-the-shelf software suites, and extremely complex enterprise-wide information systems like ERP, BI, CRM and the like. The connections between these information systems (the integration points) often resemble the “spaghetti” syndrome: point-to-point. The work practice for many IT professionals is to map this landscape of connections and information flows, using for example Enterprise Architecture models. Many organisations use information integration engines, like enterprise service bus applications or master data applications, as a means to decouple the tight integration and get away from proprietary software lock-in.

On top of all these schema-based, structured-data information systems lies the social and collaborative layer of services: intranets (web-based applications), document management, enterprise-wide social networks (e.g. Yammer), collaborative platforms (e.g. SharePoint) and, more obviously, e-mail, instant messaging and voice/video meeting applications. All of these platforms and spaces where one carries out work tasks hold either semi-structured (document management) or unstructured data.

Wayfinding

Survival in the enterprise information environment requires a large dose of endurance and skill. Many end-users get lost in their quest to find the relevant data when they should be concentrating on making well-informed decisions. Wayfinding is our in-built, adaptive way of coping with the unexpected and dealing with it: finding different pathways and means to solve the issues. In other words … findability.

Outside-in and Inside-Out

Today, workers in most organisations and enterprises act on the edge of the corporate landscape – in network conversations with customers, clients, patients/citizens, partners or even competitors, often employing means not necessarily hosted inside the corporate walls. On the Internet we see newly emerging technologies adopted at a faster rate, and in a more seamless fashion, than the existing cumbersome ones of the internal information landscape. So the obvious question raised in all this flux is: why can’t our digital workplace (the inside information landscape) be as easy to use, and as easy to find things in, as the external digital landscape? Why do I find knowledgeable peers in communities of practice more easily outside than I do on the inside? Knowledge sharing on the outposts of the corporate wall is vivid and truly passionate, whereas inside it is pretty stale and lame, to say the least.

Release the DATA now

Aggregate technologies, such as business intelligence and data warehouses, use a capture, clean-up, transform and load (ETL) mechanism drawing from all the existing supporting information systems. The problem is that the schemas and structures of things do not compile that easily. Different uses and contexts make even the most central terms difficult to unleash into a new context. This simply does not work. The same problem can be seen in the enterprise search realm, where we try to cope with both unstructured and semi-structured data. One way of solving all this is to create one standard that all the others have to follow, including a least common denominator, combined with master data management. In some cases this can work, but often the set of failures from such efforts is bigger than those arising from trying to squeeze an enterprise into a one-size-fits-all mega-matrix ERP system.

“Why is that?” you might ask; from the blueprint it sounds compelling. Just align the business processes and all data flows will follow a common path. The reality, unfortunately, is far more complex, because any organisation comprises several different processes, practices, professions and disciplines. These all have different perspectives on the information and data that is to be shared. This is precisely why we have so many applications in the first place! To what extent can we solve this with emerging semantic technologies? These technologies are not a silver bullet, far from it! The Web, however, shows a very different way of thinking about integration, with interoperability and standards becoming the main pillars on which all the other things rely. If you use agreed and controlled vocabularies and standards, there is a better chance of actually being able to sort out all the other things.

Remember that most members of staff work on the edges of the corporate body, so they have to align themselves with the lingo of all the external actor-networks and then translate it all into codified knowledge for the inside.

Semantic Interoperability

Today most end-users use Internet applications and services that already use semantic enhancements to bridge the gap between things, without ever having to think about it. One very omnipresent social network is Facebook, which relies upon the FOAF (Friend-of-a-Friend) standard for its Open Graph. Using a graph to connect data is the very cornerstone of linked data and the semantic web. A thing (entity) has descriptive properties and relations to other entities; one entity’s property might be another entity in the graph – the simple relationship of subject-predicate-object. From the graph we thus get a very flexible and resilient platform, in stark contrast to the more traditional fixed schemas.
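The subject-predicate-object model in miniature, sketched with Python’s rdflib and the FOAF vocabulary (the people and URIs are invented):

```python
# The subject-predicate-object model in miniature, using rdflib and FOAF.
from rdflib import Graph, Literal, URIRef
from rdflib.namespace import FOAF, RDF

g = Graph()
alice = URIRef("http://example.org/people/alice")
bob = URIRef("http://example.org/people/bob")

g.add((alice, RDF.type, FOAF.Person))        # subject, predicate, object
g.add((alice, FOAF.name, Literal("Alice")))
g.add((alice, FOAF.knows, bob))              # one entity's property is another entity

for s, p, o in g:
    print(s, p, o)
```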

The Semantic Web and Linked Data are a way to link different data sets that may grow from a multitude of schemas and contexts into one fluid, interlinked experience. If all internal supporting systems, or at least the aggregate engines, could simply apply a semantic texture to all the bits and bytes flowing around, it could well provide a solution in the area where other set-ups have failed. Remember that these linked data sets are resilient by nature.

There is a set of controlled vocabularies (thesauri, ontologies and taxonomies) that capture all of the topics, themes and entities that make up the world. These vocabularies have to some extent already been developed, classified and given sound resource descriptions (RDF). The Linked Open Data clouds are experiencing a rapid growth of meaningful expressions: Wikidata, DBpedia, Freebase and many more ontologies hold a vast set of crisp and useful data that, when intersected with internal vocabularies, can make things so much easier. Very good examples of such useful vocabularies, developed by professional information science people, are the Getty Institute’s recently released thesauri: AAT (Art and Architecture Thesaurus), CONA (Cultural Objects Name Authority) and TGN (Thesaurus of Geographic Names). These are very trustworthy resources, and using linked data, anybody developing a web or mobile app can reuse their namespaces for free and with high accuracy. The same goes for all the other data sets in the Linked Open Data cloud. Many governments have declared open data to be the main innovation space in which to release their things, under the realm of the “commons”.
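As a small, hedged example of reusing such a namespace, the sketch below queries the Getty SPARQL endpoint for an AAT concept by its preferred label, using the SPARQLWrapper library; it assumes the endpoint is reachable and that the label form matches:

```python
# Sketch: look up a concept in the Getty vocabularies by preferred label.
# Assumes the public Getty SPARQL endpoint is reachable and that the
# label "cathedrals"@en exists in AAT in this form.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("http://vocab.getty.edu/sparql")
sparql.setQuery("""
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?concept WHERE {
  ?concept skos:prefLabel "cathedrals"@en .
} LIMIT 5
""")
sparql.setReturnFormat(JSON)

results = sparql.query().convert()
for row in results["results"]["bindings"]:
    print(row["concept"]["value"])   # reusable, persistent Getty URIs
```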

In addition to this, all major search engines have agreed on a set of very simple-to-use schemas captured in the www.schema.org world. These schemas have been very well received by the webmaster community from their very inception. All of these feed into the Google Knowledge Graph and all the other smart (search-enabled) things we use daily.

From the corporate world, these Internet mega-trends, have, or should have, a big impact on the way we do information management inside the corporate walls. This would be particularly the case if the siloed repositories and records were semantically enhanced from their inception (creation), for subsequent use and archiving. We would then see more flexible and fluid information management within the digital workplace.

The name of the game is interoperability at every level: not just technical device specifics, but interoperability at the semantic level and at the level we use governing principles for how we organise our data and information, regardless of their origin.

Stepping down to some real-life examples

In the law enforcement system in any country, there is a set of actor-networks at play: the police, attorneys, courts, prisons and the like. All of them work within an inter-organisational process from capturing a suspect, filing a case, running a court session, judgement, sentencing and imprisonment; followed at the end by a reassimilated member of society.  Each of these actor-networks or public agencies have their own internal information landscape with supporting information systems, and they all rely on a coherent and smooth flow of information and data between each other. The problem is that while they may use similar vocabularies, the contexts in which they are used may be very different due to their different responsibilities and enacted environment (laws, regulations, policies, guidelines, processes and practices) when looking from a holistic perspective.


A way to supersede this would be to infuse semantic technologies and shared controlled vocabularies throughout, so that the mix of internal information systems could become interoperable regardless of the supporting information system or storage type. In such a case, linked open data and semantic enhancements could glue and bridge the gaps to form one united composite around a single individual’s record keeping. The actual content would not be exposed; rather, a metadata schema would be employed to cross any of the previously existing boundaries.

This is a win-win situation, as semantic technologies and any linked-open-data tinkering use the shared conversation (terms and terminologies) that already exists within the various parts of the process. While all parts cohere to the semantic layers, there is no need to reconfigure internal processes or apply other parties’ resource descriptions and elements. In this way, only the parts of schemas that are context-specific for a given part of a process are used, allowing the lingo of the related practices and professions to be aligned.

This is already happening in practice in the internal workplace environment of an existing court, where a shared intranet based on the organising principles already mentioned applies sound and pragmatic information management practices and metadata standards like Dublin Core and common vocabularies – all of which are infused into content provisioning.

For the members of staff working inside a court setting, this is a major improvement, as they use external databases every day to gain insights in order to carry out their duties. And when the internal workplace uses such a set-up, their knowledge sharing can grow – leading to both improved wayfinding and findability.

Yet another interesting case is a service company that operates on a global scale. They are an authoritative resource in their line of business, maintaining a resource of rules and regulations that has become a canonical reference. By moving into a new, expanded digital workplace environment (internet, extranet and intranet) and using semantic enhancement and search, they get a linked-data set that can be used by clients, competitors and all others working within their environment. At the same time, their members of staff can use the very same vocabularies to semantically enhance their provisioning of information and data into the different internal information systems.

The last example is an industrial company with a mix of products within their line of business. They have grown through M&A over the years and ended up in a dead-end mess of information systems that do not interoperate at all. A way to overcome the effects of past mergers and acquisitions was to create an information governance framework. Applying it with MDM and semantic search, they were able to decouple data and information, making their workplace more resilient in a world of constant flux.

One could potentially apply these pragmatic steps to any line of business, since most themes and topics have already been created and captured by the emerging semantic web and linked-data realm. It is only a matter of time before more organisations jump on this bandwagon, taking advantage of changes that can make them a canonical reference and a market leader. Just think of the film industry’s IMDb.

A final thought: Are the vendors ready and open-minded enough to alter their software and online services in order to realise this outlined future enterprise information landscape?

For more information please read these online resources, or go for the executive brief video clip:
Enterprise-Linked-Data
http://testing.rachaelkalicun.info/led_book/led-contents.html

Exec Brief

Europeana brief for memory institutions using linked-open-data:
http://en.wikipedia.org/wiki/File:Linked-open-data-Europeana-video.ogv

Linked-Open-Data network Sweden 2014 presentation:
http://livingarchives.mah.se/2014/03/linked-data-2014/
and Fredric’s talk about semantic enhanced citizen participation and slides.

The future linked-data enterprise, from Intranätverk conference in Göteborg, in May 2014
Fredric Landqvist and Kerstin Forsberg’s talk, and slides.