Major highlights from Elastic{ON} 2018 – Findwise reporting

Two Elastic fans have just returned from San Francisco and the Elastic{ON} 2018 conference. With almost 3,000 participants this year, Elastic{ON} is the biggest Elastic conference in the world.

Findwise regularly organises events and meetups, covering Elastic among other topics. Keep an eye out for an event close to you.

Here are some of the main highlights from Elastic{ON} 2018.

Let’s start with the biggest announcement of them all: Elastic is opening the source code of X-Pack. This means that you will now be able to access not only the Elastic Stack source code but also the subscription-based X-Pack code that has been inaccessible until now. This opens up the opportunity for you as a developer to contribute code back.


Data rollups are a great new feature for anyone who needs to look at old data but feels the storage costs are too high. With rollups, only predetermined metrics and terms are stored. This still allows you to analyse those dimensions of your data, but you will no longer be able to view the individual documents.
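As a rough illustration (not from the conference talk), a rollup job is defined by telling Elasticsearch which indices to summarise, how to bucket the data and which metrics to keep. The sketch below uses Python with the requests library; the index names, field names and credentials are assumptions, and the exact endpoint and request body may differ between 6.x releases.

```python
import requests

# Hypothetical rollup job: summarise raw metrics-* indices into hourly buckets.
# Index names, field names and credentials are invented for illustration only.
rollup_job = {
    "index_pattern": "metrics-*",          # raw indices to roll up
    "rollup_index": "metrics_rollup",      # where the summarised documents go
    "cron": "0 0 * * * ?",                 # run the job once every hour
    "page_size": 1000,
    "groups": {
        "date_histogram": {"field": "@timestamp", "interval": "1h"},
        "terms": {"fields": ["host.name"]}
    },
    "metrics": [
        {"field": "system.cpu.load", "metrics": ["min", "max", "avg"]}
    ]
}

response = requests.put(
    "http://localhost:9200/_xpack/rollup/job/hourly_metrics",
    json=rollup_job,
    auth=("elastic", "changeme"),
)
print(response.status_code, response.json())
```

Once the job has run, only the rolled-up aggregates remain queryable in the rollup index, and the raw indices can be deleted to save storage.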

Azure monitoring available in X-Pack Basic. Elastic will ship an Azure Monitoring module in an upcoming 6.x release, consisting of a bundle of Kibana dashboards that make it really easy to get started exploring your Azure infrastructure. The monitoring module will be released as part of the X-Pack Basic licence – in other words, it will be free to use.

Forecasting was the big new thing in X-Pack’s machine learning component. As the name suggests, the machine learning module can now not only spot anomalies in your data but also predict how the data will change in the future.

Security in Kibana will get an update to make it work more like the security module in Elasticsearch. This also means that one of the most requested security features for Kibana will be delivered: giving users access to only some dashboards.

Dashboards are great and a fundamental part of Kibana, but sometimes you want to present your data in more dynamic ways with less focus on data density. This is where Canvas comes in. Canvas is a new Kibana module for producing infographics rather than dashboards, while still using live data from Elasticsearch.

Monitoring of Kubernetes and Docker containers will be made a lot easier with the Elastic Stack. A new infrastructure component will be created just for this growing use case. It will be powered by data collected by Beats, which now also has autodiscovery functionality within Kubernetes. This will give an overview not only of your Kubernetes cluster but also of the individual containers within the cluster.

Geo capabilities within Kibana will be extended to support multiple map layers. This will make it possible to do more kinds of visualisations on maps. Furthermore, work is being done on supporting not only geo points but also geo shapes.

One problem some have had with maps is that you need access to the Elastic Maps Service, and if you deploy the Elastic Stack within a company network this service might not be reachable. To solve this, work is being done to make it possible to deploy the Elastic Maps Service locally.

Elastic acquired the SaaS solution Swiftype last year. Since then, Swiftype has been busy adding even more features to its portfolio. Swiftype currently comes in three different versions:

  • Swiftype Site Search – an out-of-the-box (OOTB) solution for website search
  • Swiftype Enterprise Search – currently in beta, focused (for now) on internal, cloud-based data sources like G Suite, Dropbox, O365 and Zendesk
  • Swiftype App Search – a set of APIs and developer tools that make it quick to build user-facing search applications

 

Elastic has also started to look at replacing the Zen protocol used to keep clusters in sync. A PoC is currently under way to create a consensus algorithm that follows modern academic best practice, with the added benefit of removing the minimum master nodes setting, currently one of the most common pitfalls when running Elasticsearch in production.

ECE – Elastic Cloud Enterprise – is a big focus for Elastic and makes it possible for customers to set up a fully service-based search solution maintained by Elastic.

If you are interested in hearing more about Elastic or Findwise visit https://findwise.com/en/technology/elastic-elasticsearch


Writers: Mads Elbrond, regional manager Findwise Denmark & Torsten Landergren, senior expert consultant

Plan for General Data Protection Regulation (GDPR)

Another new regulation from the EU? Will this affect us? It seems so complex. Can’t we just sit back, wait for the first fine to come and then act if necessary?

We have to care and act – start planning now!

I think we have to care and act now. Start planning now so you get it right. The GDPR is a good thing. This is not another EU ruling about the right size of a strawberry or how bendy a banana may be; it is about the fact that all individuals should feel safe giving their personal information to businesses. Cyber security is a good thing, and failing to protect our data and our customers’ data is bad for us. Credit card numbers and personal data leak out of companies worldwide, with large business risks: companies don’t just face fines or reputational damage, they can have their permission to issue credit cards and other financial services products withdrawn by the regulator, and responsible employees can face imprisonment. We can only guess whether being GDPR compliant will become a requirement for taking part in bidding processes.

What is the General Data Protection Regulation (GDPR)?

The General Data Protection Regulation (GDPR) is a new legal framework approved by the European Union (EU) to strengthen and unify data protection of personal information. GDPR will replace the current data protection directive (in Sweden Personuppgiftslagen, PUL) and applies from 25 May 2018.

Who is affected?

GDPR has global reach and applies to all companies worldwide that process personal data of European Union citizens.

Identify personal data and protect it

The GDPR defines personal data broadly. Organisations need to fully understand what information they have, where it is located and how it was collected. Discover, classify and manage all information, both structured and unstructured, and secure it.

Data breach notifications

GDPR requires organisations to notify the local data protection authority of a data breach within 72 hours after discovery.

Do you have the right to store this information? Explicit consent

Personal data should be gathered under strict conditions. Organisations need to ask for consent to collect personal data and they need to be clear about how they will use the information.

The right of access

Individuals will have the right to obtain access to their personal data and other supplementary information in a portable format. You must provide a copy of the information free of charge. The GDPR also gives individuals the right to have personal data corrected if it is inaccurate or incomplete.

The right to be forgotten

The GDPR also introduces the right to be forgotten, or erased. Data is not to be held for any longer than absolutely necessary, and it should not be used in any way other than that for which it was originally collected.

Penalties and fines

Companies that fail to protect customer data adequately face significant fines of up to €20m or 4% of global annual turnover, whichever is higher. This should be a serious incentive to start preparing now.

First steps to GDPR compliance

  1. Create awareness and allocate resources
    The first step is to make sure that your organisation is aware of the new EU legislation and what it means for you. How will your business be affected by the new regulation? You need to allocate enough resources and make sure you involve decision-makers and stakeholders in your organisation. Last, but not least, start today!
  2. Content inventory
    The second step is to discover and classify all your information to identify exactly what types of personally identifiable data you have, where you have it and how it is collected.

Findwise can assist you in this process. Please contact Maria Sunnefors and visit our website for more information.

Want to read more?

Read more about the GDPR at Datainspektionen (in Swedish) or at the ICO.

How it all began: a brief history of Intranet Search

According to most sources, the intranet was born somewhere between 1994 and 1996 – true prehistory from an IT systems point of view. Intranet history is bound up with the development of the Internet, the global network. The idea of the WWW, proposed in 1989 by Tim Berners-Lee and others with the aim of enabling connections to and access to many different sources, became the prototype for the first internal networks. The goal of the intranet was to increase employee productivity through easier access to documents, faster circulation of them and more effective communication. Although access to information was always the crucial matter, the intranet in fact offered many more functionalities, e.g. e-mail, support for group work, audio and video communication, and searching for text or personal data.

Overload of information

Over the years, the content placed on WWW servers became more important than the other intranet components. First, managing ever more complicated software and the hardware it required led to the development of new specialisations. Second, and paradoxically, the ease of publishing information became a source of serious problems. There was too much information; documents were partly outdated, duplicated and without a homogeneous structure or hierarchy. Difficulties in content management and the lack of people responsible for this process led to a situation where the end user was unable to reach the desired piece of information, or could do so only with excessive effort.

Google to the rescue

As early as 1998, Gartner published a document describing this state of the Internet as a “Wild West”. On the Internet, the problem was addressed by Yahoo and then Google, which became the global leader in information searching. In internal networks it had to be addressed by rules for publishing information and by CMS and Enterprise Search software. In many organisations the struggle for easier access to information is still ongoing; in others, it has only just begun.


And then Search arrived

It is the search engine that has had the biggest impact on how the intranet is perceived. On the one hand, the search engine is directly responsible for realising the basic assumptions of knowledge management in the company. On the other, it is the main source of complaints and frustration among internal network users. There are many reasons for this status quo: wrong or unreadable search results, missing documents, security problems and poor access to some resources. What are the consequences of this situation? First and foremost, they can be observed in high work costs (duplication of tasks, lower quality, wasted time, less efficient cooperation) as well as in lost business opportunities. It must not be forgotten that search engine problems often overshadow the use of the intranet as a whole.

How to measure efficiency?

In 2002, Nielsen Norman Group consultants estimated that the productivity difference between employees using the best and the worst corporate networks is about 43%. Meanwhile, the annual Enterprise Search and Findability Survey shows that while almost 60% of companies stress the high importance of information searching for their business, nearly 45% of employees have problems finding the information they need.

Leaving aside comfort and employee satisfaction, the natural effect of implementing and improving Enterprise Search solutions is financial benefit. Contrary to popular belief, the return on investment and the savings from reaching information faster can be calculated. Preparing such calculations is not easy, though. The first steps are to estimate the time employees spend searching for information, to calculate what percentage of searches end in failure, and to determine how long it takes to perform a task without the necessary materials. It should be pointed out that findings from companies such as IDC and AIIM show that office workers spend at least 15-35% of their working hours searching for the information they need.

Problems with searching are rarely down to technical issues. The search engines currently on the market are mature products, whether commercial or open source. Usually, the problem is a default installation where the system is left untouched straight “out of the box”. Each search engine deployment is different because it deals with a different collection of documents. Another factor is that user expectations and business requirements change continually. In short, ensuring good quality search is an unremitting process.
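To make the calculation described above concrete, here is a minimal back-of-the-envelope sketch in Python. All figures (number of employees, hourly cost, share of time spent searching, failure rate) are hypothetical placeholders to be replaced with your own estimates.

```python
# Back-of-the-envelope cost of poor findability. All inputs are assumptions.
employees = 500                 # knowledge workers in the organisation
hourly_cost = 50.0              # fully loaded cost per employee hour (EUR)
hours_per_year = 1600           # working hours per employee per year
share_searching = 0.20          # within the 15-35% range reported by IDC/AIIM
failure_rate = 0.45             # share of searches that end in failure
rework_factor = 0.5             # extra time to redo work when information is not found

search_hours = employees * hours_per_year * share_searching
wasted_hours = search_hours * failure_rate * (1 + rework_factor)
annual_cost = wasted_hours * hourly_cost

print(f"Hours spent searching per year: {search_hours:,.0f}")
print(f"Hours lost to failed searches:  {wasted_hours:,.0f}")
print(f"Estimated annual cost:          EUR {annual_cost:,.0f}")
```

Even with conservative inputs, the number that comes out tends to dwarf the cost of improving the search platform itself.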

Knowledge workers’ main tool?

The intranet has become a comprehensive tool for accomplishing company goals. It supports employee commitment and effectiveness, internal communication and knowledge sharing. However, its main task is to help people find information, which is often hidden in stacks of documents or dispersed among various data sources. Equipped with a search engine, the intranet has become an invaluable working tool in practically all sectors, especially in departments such as customer service and administration.

So, how is your company’s access to information?


This text is an introduction to a series of articles dedicated to intranet search. Subsequent articles will deal with the function of the search engine in the organisation, the benefits of using Enterprise Search, the requirements for a search system, the most frequent errors and obstacles in implementations, and systems architecture.

Spring cleaning and moving boxes for the cloud

This is the seventh post in a series (1, 2, 3, 4, 5, 6) on the challenges organisations face as they move from having online content and tools hosted firmly on their estate to renting space in the cloud. We will help you to consider the options and guide you on the steps you need to take.

Starting with our first post, we have covered the different aspects you need to consider as you take each step, including information structure and how it is managed, using Office 365 and SharePoint as a technology example. Now it is time to plan for the migration itself.

Moving Boxes

Do not even think about moving into the cloud apartment without a proper cleaning of the content buckets. Moving from an architected household to a rented place takes a structured audit. Clean out all redundant, outdated and trivial matter (ROT) – the very same habit you have of cleaning up the attic when moving out of your old house.

It is also a good idea to decorate and add any features to your new cloud apartment before the content furniture is there.  It means the content will fit with any new design and adapt to any extra functionality with new features like windows and doors.  This can be done by reviewing and updating your publishing templates at the same time.  This will save time in the future.

Leaning on your information governance standards, it should be easy for all content owners who have been appointed to a set of collections or habitats to address the cleaning before moving. Most organisations could use a content vacuum cleaner – or rather, use their search facilities and metrics to deliver up-to-date reports on the points below (a rough sketch of such a report follows the list):

  1. Active / inactive habitats
  2. Content with no clear ownership, or where the owner has left the building
  3. Metadata and link quality for content and collections to be moved across to the cloud apartments
  4. Publishing templates to review, with features or design updated for use in the cloud
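As a rough sketch of what such a report could look like, the snippet below walks through an invented inventory of habitats and flags candidates for the ROT clean-up. The field names and thresholds are assumptions for illustration, not a real SharePoint or search API.

```python
from datetime import date, timedelta

# Invented inventory extract: in practice this would come from your search
# platform or site-usage metrics, not a hard-coded list.
habitats = [
    {"name": "Project Alpha", "owner": "anna", "last_activity": date(2017, 9, 1), "broken_links": 2},
    {"name": "Old Intranet",  "owner": None,   "last_activity": date(2014, 3, 10), "broken_links": 40},
    {"name": "Sales Team",    "owner": "bo",   "last_activity": date(2017, 10, 28), "broken_links": 0},
]

STALE_AFTER = timedelta(days=365)   # assumed threshold for "inactive"
today = date(2017, 11, 1)

for habitat in habitats:
    issues = []
    if today - habitat["last_activity"] > STALE_AFTER:
        issues.append("inactive for more than a year")
    if habitat["owner"] is None:
        issues.append("no clear ownership")
    if habitat["broken_links"] > 10:
        issues.append(f'{habitat["broken_links"]} broken links')
    status = "; ".join(issues) if issues else "OK to move"
    print(f'{habitat["name"]:<15} {status}')
```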

When all active habitats and qualified content buckets have been revisited by their curators and information owners, it is time to prepare and use the moving boxes.

All moving boxes need proper tagging, so that any moving company will be able to sort out where the stuff should be placed in the new house or building. For collections and habitats, this means using the very same set of questions stated for adding a new habitat or collection to the cloud apartment house – who, why, where and so forth – through a structured workflow and form. When these first cleaning steps have been addressed, there should be automatic metadata enhancement, aligned with the information management processes to be used in the new cloud.

With decent resource descriptions and content cleaned up through the audit (ROT), this last step will auto-tag content based upon the business rules applied for the collection or habitat. The content can then be loaded onto the content moving truck, or loading dock, ready to be added to the cloud.
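As a simple illustration of rule-based auto-tagging (not tied to any particular product), the sketch below applies a collection’s resource description as default metadata to every item in its moving box. The rule set and field names are invented for the example.

```python
# Invented example: inherit default metadata from the collection's resource
# description; values set explicitly on a document always win.
collection_defaults = {
    "dc.creator": "Project Alpha team",
    "dc.coverage": "Internal",
    "dc.subject": "cloud migration",
}

documents = [
    {"title": "Kick-off notes"},
    {"title": "Budget 2018", "dc.subject": "finance"},
]

def auto_tag(doc, defaults):
    """Fill in missing metadata fields from the collection defaults."""
    tagged = dict(defaults)
    tagged.update(doc)          # explicit values on the document override the defaults
    return tagged

for doc in documents:
    print(auto_tag(doc, collection_defaults))
```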

All content that either has no properly assigned information ownership, or is in such a shape that migration can’t be done, should stay on the estate or be archived or purged. This means that all metadata and links pointing to any content bucket or habitat that won’t be moved in the first instance should at least have a correct and unique URI, an address, for that content. And where a bucket or habitat has been run down by the demolition firm, i.e. purged, all inter-linkage to that piece of content or collection has to be changed.

This is typically a perfect quality report for the information owners and content editors to work through prior to actually loading the content onto the content dock.

Rubbish and Weed
Finally, when all rotten data, deserted habitats and unmanageable buckets have been weeded out, it is time to prepare the moving truck and send the content to its new destination.

Our final post will cover how the organisation and its inhabitants will be able to find content in this mix of clouds and things left behind on the old estate. Cloud search and Enterprise Search – seamless or a nightmare?

Please join our Live Stream on YouTube the 20th November 8.30AM – 10AM Central European Time
Fredric Landqvist’s research blog
Mark Morrell – intranet pioneer

The Curator – how to cultivate the habitat

This is the fourth post in a series (1, 2, 3, 5, 6, 7) on the challenges organisations face as they move from having online content and tools hosted firmly on their estate to renting space in the cloud. We will help you to consider the options and guide you on the steps you need to take.

In the first post we set out the most common challenges you are likely to face and how you may overcome these.  In the second post we focused on how Office 365 and SharePoint can play a part in moving to the cloud.  In the third post we covered how they can help join up your organisation online using their collaboration tools and features.

In this post we will cover engagement and how sorting and categorisation of artifacts, according to a simple-to-understand and easy-to-use standard, will form the bits and parts of the curation and cultivation process.

All document libraries should have one standard listing of all items – with two very distinct audiences: first, the actors within the habitat, the people contributing, acting and joining the daily conversation; and second, the visitors who pass by the habitat to collect, link and act upon the content presented within the habitat’s realm.

This makes it very easy for visitors to find their way around a habitat, if the visitors’ area (the business lounge) is pretty much aligned with the overarching theme of the site and all artifacts that the project team would like to share more widely are listed in a virtual bookshelf, with major versions only. The visitors’ area has all the relevant data presented up front – basically the answers to the questions set when starting the project. The visitors’ area shouldn’t be a backdrop, but rather a storefront, and the content has to be of good quality. Then there should be options to engage with the inner living room of the habitat and enter the messy ongoing conversations, depending on access rights. But the default setting should always be open for unexpected “internal” (within the realm of the organisation) visitors. If the visitors’ area is compiled in a nice and easy-to-use manner, most visitors are just happy to pick the best read from the bookshelf, or at least raise a question for the team! The social construct for this is “welcoming a stranger”, since that visitor might link to your team’s content, cross-linking it into their own social spaces.

The habitat’s living room and social conversations will call for new context-specific organising principles. A team might want to add new list items, sort categories or introduce very local what-goes-where themes. This may be especially so when the team consists of actors who have different roles and responsibilities with regard to the overall outcome. Because of this, there may be a certain mix of tools or services in this one habitat of many, where they hang out for project tasks.

The contextual adjustment is where the curator has to work on a cultivation process that glues the team together. The shared terminology within a group conversation is what matches their practices together. At inception, the curator picks a bouquet of on-topic terms from the controlled vocabularies. Mixed with everyday use and contributions from all members, this can lead to fruitful, semantically enhanced conversations with end-user generated tags, or “folksonomies”. The same goes for the interior design of links, tools, chosen content types and other forms of artifacts that the team will need to fulfil their goals and outcome.

The governance of the habitat leans very much on the shared experiences in the group and the assigned responsibilities for stewardship and curation – where publishing standards, guidelines and training should be part of the mix.

We will cover more on governance and how content should be managed in the cloud in our next post.
Please join our Live Stream on YouTube the 20th November 8.30AM – 10AM Central European Time
Fredric Landqvist’s research blog
Mark Morrell – intranet pioneer

Housekeeping rules within the Habitat

This is the third post in a series (1, 2, 4, 5, 6, 7) on the challenges organisations face as they move from having online content and tools hosted firmly on their estate to renting space in the cloud. We will help you to consider the options and guide you on the steps you need to take.

In the first post we set out the most common challenges you are likely to face and how you may overcome these. In the second post we focused on how Office 365 and SharePoint can play a part in moving to the cloud. Here we cover how they can help join up your organisation online using their collaboration tools and features.

Habitat

When arranging the habitat, it is key to address the theme of collaboration, since each of these themes drives different feature settings for artifacts and services. In many cases, teamwork is situated in the context of a project. Other themes for collaboration are line-of-business unit teamwork, or the more learning-oriented networks, a.k.a. communities of practice. I will leave these latter themes for now.

Most enterprises have some project management process (e.g. PMP) that all projects have to adhere to, with complementary documentation and reporting mechanisms. This is so the leadership within the organisation is able to align resources and govern the change portfolio across different business units. Given this structure, it is very easy to depict measurable outcomes, as project documents have to be produced regardless of what the project is supposed to contribute towards.

The construction of a habitat, or the design of a joint workplace, boils down to pragmatic steps that are aligned with the overarching project framework at hand – answering a few simple questions (the inverted pyramid), as in the sketch after this list:

  • Who? Who will be participating, who will own (as an organisation) the outcome of the joint effort of pulling together a project (dc.contributor; dc.creator; dc.provenance), and what is its reach (dc.coverage; dc.audience)?
  • What? What is the project all about – topic and theme (dc.subject; dc.title; dc.description; dc.type)?
  • When? When will this project be running, and what is the timeline for ending it? All temporal themes around the life of a project (dc.date).
  • Where? Where will participants contribute? What goes where and why? (dc.source; dc.format; dc.identifier)
  • Why? Usually defined in the project description, setting common ground for the goals and expected outcome (dc.description).
  • How? Defines the processes, practices and tools used to create the expected outcome of the project, with links to common resources such as the PMP framework, but also links to other key data sets, like ERP record keeping and master data, for the project number and other measures not stored in the habitat but still pillars to align with the overarching model (dc.relation).
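To make this more tangible, here is a minimal sketch of how the answers could be captured as a Dublin Core style resource description for a habitat. All the values are invented, and the property-bag remark in the comment is only indicative, not actual SharePoint API code.

```python
# Hypothetical resource description for a new project habitat,
# keyed by Dublin Core elements. All values are invented examples.
habitat_description = {
    "dc.title":       "Project Alpha collaboration habitat",
    "dc.creator":     "Business Unit North",
    "dc.contributor": ["anna", "bo", "carin"],
    "dc.audience":    "internal",
    "dc.coverage":    "Nordics",
    "dc.subject":     ["cloud migration", "Office 365"],
    "dc.description": "Joint workspace for the Alpha migration project.",
    "dc.type":        "project",
    "dc.date":        "2017-11-01/2018-06-30",   # project timeline
    "dc.source":      "https://example.sharepoint.com/sites/project-alpha",
    "dc.identifier":  "PRJ-0042",                # project number from the ERP
    "dc.relation":    ["PMP framework", "ERP master data"],
}

# In SharePoint, a description like this could be stored in the site's property
# bag, so that lists and libraries can inherit the values as default metadata.
for element, value in habitat_description.items():
    print(f"{element}: {value}")
```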

When these questions have been answered, the resource description for the habitat is set – in SharePoint, via the property bag feature. During the lifespan of the ongoing project, all contributions, conversations and created things can inherit rule-based metadata for the artifacts from the collection’s resource description. This reduces the burden on the actors building the content by enabling automagic metadata completion where applicable. And for wayfinding and findability within and between habitats, these resource descriptions will be the building blocks of a sustainable information architecture.

In our next post we will cover how to encourage employee engagement with your content.

Please join our Live Stream on YouTube the 20th November 8.30AM – 10AM Central European Time
Fredric Landqvist’s research blog
Mark Morrell – intranet pioneer

Enterprise-Linked-Data and the Connected Digital Workplace

The emerging hyper-connected and agile enterprises of today are stigmatised by their IS/IT-legacy, so the question is: Will emerging web and semantic technologies and practices undo this stigma?

The Shift

Semantic technologies and Linked Open Data (LOD) have evolved since Tim Berners-Lee introduced their basic concepts, and they are now part of everyday business on the Internet, thanks mainly to their uptake by information- and data-driven companies like Google, social networks like Facebook and large content sites like Wikipedia. The enterprise information landscape is ready to be enhanced by the semantic web, to increase findability and usability. This change will enable a more agile digital workplace where members of staff can use cloud-based services anywhere, anytime, on any device, in combination with the set of legacy systems backing their line of business. All in all, more efficient organising principles for information and data.

The Corporate Information Landscape of today

In the everyday workplace we use digital tools to cope with the tasks at hand. These tools have been set into action to apply meta models that structure the social life of dealing with information and data. The legacy of more than 60 years of digital record keeping has left us in an extremely complex environment, where most end-users have a multitude of spaces in which they are supposed to contribute. In many cases their information environment lacks interoperability.

A good, or rather bad, example of this is the electronic health record (EHR) environment of a hospital, where several different health professionals try to codify their ongoing work in order to make better-informed decisions regarding the different medical treatments. While this is a good thing, it is heavily hampered by closed-down silos of data that do not work in conjunction with the new, more agile work practices. It is not uncommon to have more than 20 different information systems in use for provisioning during a workday.

The information systems architecture of any organisation or enterprise may comprise home-grown legacy systems from the past, off-the-shelf software suites and extremely complex enterprise-wide information systems like ERP, BI, CRM and the like. The connections between these information systems (the integration points) often resemble a point-to-point “spaghetti” syndrome. The work practice for many IT professionals is to map this landscape of connections and information flows, using for example Enterprise Architecture models. Many organisations use information integration engines, such as enterprise service bus applications or master data applications, as a means to decouple the tight integration and get away from proprietary software lock-in.

On top of all these schema-based, structured-data information systems lies the social and collaborative layer of services, with things like the intranet (web-based applications), document management, enterprise-wide social networks (e.g. Yammer), collaborative platforms (e.g. SharePoint) and, more obviously, e-mail, instant messaging and voice/video meeting applications. All of these platforms and spaces where one carries out work tasks hold either semi-structured (document management) or unstructured data.

Wayfinding

Surviving in the enterprise information environment requires a large dose of endurance and skill. Many end-users get lost in their quest to find the relevant data when they should be concentrating on making well-informed decisions. Wayfinding is our in-built, adaptive way of coping with the unexpected and dealing with it – finding different pathways and means to solve the issues. In other words: findability.

Outside-in and Inside-Out

Today most workers in organisations and enterprises act on the edge of the corporate landscape – in networked conversations with customers, clients, patients/citizens, partners, or even competitors, often employing means not necessarily hosted inside the corporate walls. On the Internet we see newly emerging technologies adopted and adapted at a faster rate, and in a more seamless fashion, than the existing cumbersome ones of the internal information landscape. So the obvious question raised in all this flux is: why can’t our digital workplace (the inside information landscape) be as easy to use, and as easy to find things and information in, as the external digital landscape? Why do I find knowledgeable peers in communities of practice more easily outside than I do on the inside? Knowledge sharing on the outposts of the corporate wall is vivid and truly passionate, whereas inside it is pretty stale and lame, to say the least.

Release the DATA now

Aggregate technologies, such as Business Intelligence and Data Warehouse, use a capture, clean-up, transform and load (ETL) mechanism that pulls from all the existing supporting information systems. The problem is that the schemas and structures of things do not combine that easily. Different uses and contexts make even the most central terms difficult to release into a new context. This simply does not work. The same problem can be seen in the enterprise search realm, where we try to cope with both unstructured and semi-structured data. One way of solving all this is to create one standard that all the others have to follow, including a least common denominator, combined with master data management. In some cases this can work, but often the set of failures from such efforts is bigger than that arising from trying to squeeze an enterprise into a one-size-fits-all mega-matrix ERP system.

Why is that, you might ask? From the blueprint it sounds compelling: just align the business processes and then all data flows will follow a common path. The reality unfortunately is way more complex, because any organisation comprises several different processes, practices, professions and disciplines. These all have different perspectives on the information and data that is to be shared. This is precisely why we have so many applications in the first place! To what extent are we able to solve this with emerging semantic technologies? These technologies are not a silver bullet, far from it! The Web, however, shows a very different way of thinking about integration, with interoperability and standards becoming the main pillars on which everything else relies. If you use agreed and controlled vocabularies and standards, there is a better chance of actually being able to sort out all the other things.

Remember that most members of staff work on the edges of the corporate body, so they have to align themselves with the lingo of all the external actor-networks and then translate it all into codified knowledge for the inside.

Semantic Interoperability

Today most end-users use internet applications and services that already use semantic enhancements to bridge the gap between things, without ever having to think about it. One very omnipresent social network is Facebook, which relies upon the FOAF (Friend-of-a-Friend) standard for its Open Graph. Using a graph to connect data is the very cornerstone of linked data and the semantic web. A thing (entity) has descriptive properties and relations to other entities; one entity’s property might be another entity in the graph. This is the simple subject-predicate-object relationship. From the graph we thus get a very flexible and resilient platform, in stark contrast to the more traditional fixed schemas.
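As a small, concrete illustration of the subject-predicate-object idea, the sketch below builds a tiny FOAF-flavoured graph with the Python rdflib library. The people and the namespace are made up; the point is only to show that each statement is a triple and that one entity’s property can itself be another entity.

```python
from rdflib import Graph, URIRef, Literal, Namespace
from rdflib.namespace import FOAF, RDF

EX = Namespace("http://example.org/people/")   # made-up namespace for the example

g = Graph()
alice = EX.alice
bob = EX.bob

# Each add() call is one subject-predicate-object triple.
g.add((alice, RDF.type, FOAF.Person))
g.add((alice, FOAF.name, Literal("Alice")))
g.add((bob, RDF.type, FOAF.Person))
g.add((bob, FOAF.name, Literal("Bob")))
g.add((alice, FOAF.knows, bob))     # the object here is another entity, not a literal

print(g.serialize(format="turtle"))
```

New facts can be added from any source at any time without redesigning a schema, which is exactly the resilience referred to above.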

The Semantic Web and Linked Data are a way to link different data sets, which may grow from a multitude of schemas and contexts, into one fluid, interlinked experience. If all internal supporting systems, or at least the aggregate engines, could simply apply a semantic texture to all the bits and bytes flowing around, it could well provide a solution where other set-ups have failed. Remember that these linked data sets are resilient by nature.

There is a set of controlled vocabularies (thesauri, ontologies and taxonomies) that capture the topics, themes and entities that make up the world. These vocabularies have to some extent already been developed, classified and given sound resource descriptions (RDF). The Linked Open Data cloud is experiencing rapid growth in meaningful expressions. Wikidata, DBpedia, Freebase and many more ontologies hold a vast set of crisp and useful data that, when intersected with internal vocabularies, can make things so much easier. Very good examples of such useful vocabularies, developed by professional information science people, are the Getty Institute’s recently released thesauri: AAT (Art and Architecture Thesaurus), CONA (Cultural Objects Name Authority) and TGN (Thesaurus of Geographic Names). These are very trustworthy resources, and using linked data, anybody developing a web or mobile app can reuse their namespaces for free and with high accuracy. The same goes for all the other data sets in the linked open data cloud. Many governments have declared open data the main innovation space in which to release their things, under the realm of the “Commons”.

In addition to this, all major search engines have agreed on a set of very simple-to-use schemas, captured in the www.schema.org world. These schemas have been very well received by the webmaster community from their very inception. All of these feed into the Google Knowledge Graph and all the other smart (search-enabled) things we use daily.

Seen from the corporate world, these Internet mega-trends have, or should have, a big impact on the way we do information management inside the corporate walls. This would particularly be the case if the siloed repositories and records were semantically enhanced from their inception (creation), for subsequent use and archiving. We would then see more flexible and fluid information management within the digital workplace.

The name of the game is interoperability at every level: not just technical device specifics, but interoperability at the semantic level and at the level of the governing principles for how we organise our data and information, regardless of their origin.

Stepping down to some real-life examples

In the law enforcement system of any country, there is a set of actor-networks at play: the police, attorneys, courts, prisons and the like. All of them work within an inter-organisational process, from capturing a suspect, filing a case, running a court session, judgement, sentencing and imprisonment, followed at the end by a reassimilated member of society. Each of these actor-networks or public agencies has its own internal information landscape with supporting information systems, and they all rely on a coherent and smooth flow of information and data between each other. The problem is that while they may use similar vocabularies, the contexts in which they are used may be very different, due to their different responsibilities and enacted environments (laws, regulations, policies, guidelines, processes and practices), when looked at from a holistic perspective.


A way to supersede this would be to infuse semantic technologies and shared controlled vocabularies throughout, so that the mix of internal information systems could become interoperable regardless of the supporting information system or storage type. In such a case, linked open data and semantic enhancements could glue and bridge the gaps to form one united composite, managed around one individual’s record keeping. In this way the actual content would not be exposed; rather, a metadata schema would be employed to cross any of the previously existing boundaries.

This is a win-win situation, as semantic technologies and any linked-open-data tinkering use the shared conversation (terms and terminologies) that already exists within the various parts of the process. As long as all parts cohere to the semantic layers, there is no need to reconfigure internal processes or apply other parties’ resource descriptions and elements. In this way, only the parts of the schemas that are context-specific for a given part of the process are used, allowing the lingo of the related practices and professions to be aligned.

This is already happening in practice in the internal workplace environment of an existing court, where a shared intranet based on the organising principles already mentioned applies sound and pragmatic information management practices and metadata standards like Dublin Core and common vocabularies – all of which are infused into content provisioning.

For the members of staff working inside a court setting, this is a major improvement, as they use external databases every day to gain the insights they need to carry out their duties. And when the internal workplace uses such a set-up, their knowledge sharing can grow – leading to both improved wayfinding and findability.

Yet another interesting case is a service company that operates on a global scale. They are an authoritative resource in their line of business, maintaining a body of rules and regulations that has become a canonical reference. By moving into a new, expanded digital workplace environment (internet, extranet and intranet) and using semantic enhancement and search, they get a linked data set that can be used by clients, competitors and everyone else working within their environment. At the same time, their members of staff can use the very same vocabularies to semantically enhance their provision of information and data into the different internal information systems.

The last example is an industrial company with a mix of products within their line of business. They have grown through M&A over the years and ended up in a dead-end mess of information systems that do not interoperate at all. A way to overcome the effects of past mergers and acquisitions was to create an information governance framework. Applying it together with MDM and semantic search, they were able to decouple data and information, making their workplace more resilient in a world of constant flux.

One could potentially apply these pragmatic steps to any line of business, since most themes and topics have already been created and captured by the emerging semantic web and linked data realm. It is only a matter of time before more organisations jump on this bandwagon to take advantage of changes that have the ability to make them a canonical reference and a market leader. Just think of the film industry’s IMDb.

A final thought: Are the vendors ready and open-minded enough to alter their software and online services in order to realise this outlined future enterprise information landscape?

For more information please read these online resources, or go for the executive brief video clip:
Enterprise-Linked-Data
http://testing.rachaelkalicun.info/led_book/led-contents.html

Exec Brief

Europeana brief for memory institutions using linked-open-data:
http://en.wikipedia.org/wiki/File:Linked-open-data-Europeana-video.ogv

Linked-Open-Data network Sweden 2014 presentation:
http://livingarchives.mah.se/2014/03/linked-data-2014/
and Fredric’s talk about semantically enhanced citizen participation, and slides.

The future linked-data enterprise, from Intranätverk conference in Göteborg, in May 2014
Fredric Landqvist and Kerstin Forsberg’s talk, and slides.

Swedish language support (natural language processing) for IBM Content Analytics (ICA)

Findwise has now extended the NLP (natural language processing) in ICA to include both support for Swedish PoS tagging and Swedish sentiment analysis.

IBM Content Analytics with Enterprise Search (ICA) has its strength in natural language processing (NLP), which is achieved in the UIMA pipeline. From a Swedish perspective, one concern with ICA has always been its lack of NLP for Swedish. Previously, the Swedish support in ICA consisted only of dictionary-based lemmatization (word: “sprang” -> lemma: “springa”). However, for a number of other languages ICA has also provided part-of-speech (PoS) tagging and sentiment analysis. One of the benefits of the PoS tagger is its ability to disambiguate words that belong to multiple classes (e.g. “run” can be both a noun and a verb), as well as to assign tags to words that are not found in the dictionary. Furthermore, the PoS tagger is crucial when it comes to improving entity extraction, which is important when a deeper understanding of the indexed text is needed.
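ICA’s UIMA-based pipeline is proprietary, but the disambiguation benefit itself is easy to demonstrate with an open toolkit. The sketch below uses NLTK’s English PoS tagger on the word “run” in two contexts; it is only an illustration of the concept, not Findwise’s Swedish ICA annotator.

```python
import nltk

# Download the tokenizer and tagger models on first run.
nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

sentences = [
    "I went for a run this morning",   # "run" used as a noun
    "They run the tests every night",  # "run" used as a verb
]

for sentence in sentences:
    tokens = nltk.word_tokenize(sentence)
    tags = nltk.pos_tag(tokens)        # context decides noun (NN) vs verb (VBP) for "run"
    print(tags)
```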

The two images below show simple examples of the Swedish PoS support.

Examples of ICA using NLP to analyse the strings “ICA är en produkt som klarar entitetsextrahering” and “Watson deltog i jeopardy”.

The question is: how can this extended functionality be used?

IBM uses ICA and its NLP support together with several of its products. The Jeopardy-playing computer Watson may be the most famous example, even if it is not a real product. Watson used NLP in its UIMA pipeline when it analysed data from sources such as Wikipedia and IMDb.

One product that leverages ICA and its NLP capabilities is Content and Predictive Analytics for Healthcare. This product helps doctors determine which action to take for a patient, given the patient’s journal and symptoms. By also leveraging the predictive analytics from SPSS, it is possible to suggest the next action for the patient.

ICA can also be connected directly to IBM Cognos or SPSS, with ICA acting as the tool that brings structure to unstructured data. By using the NLP or sentiment analytics in ICA, structured data can be extracted from text documents. This data can then be fed to IBM Cognos, SPSS or non-IBM products such as Splunk.

ICA can also be used on its own as a text miner or a search platform, but in many cases ICA delivers its maximum value together with other products. ICA is a product that helps enrich data by bringing structure to unstructured data. The processed data can then be used by other products that normally work with structured data.