What are organisations planning to focus on to improve Search and Findability?

This year’s Search and Findability survey gave us a good indication of upcoming trends on the market. The activities and technologies that organisations are planning to start working with are all connected to improving effectiveness. By using technology to automatically perform tasks, and by understanding the users’ needs and giving them a tailored search experience, there is a lot of potential to save time and effort.

Top 5 activities organisations will focus on:

  • Natural language search interface, e.g. Query aid or chatbots (29%)
  • Personalisation e.g. tailored search experience (27%)
  • Automatic content tagging (24%)
  • Natural Language Processing, NLP (22%)
  • Machine Learning (20%)

The respondents planning to start working with one of these areas are more likely to be interested in, or already working with, the other areas in the top 5. For example, out of the respondents saying that they are planning to use a natural language search interface, 44% are planning to start with personalisation as well. If you were to add the respondents already working with personalisation to that amount, it would increase by 75%. This might not be a big surprise, since the different areas are closely related to one another. A natural language search interface can support a tailored search experience, in other words – lead to personalisation. Automatic content tagging can be enabled by using techniques such as NLP and Machine Learning.

A Natural Language Search interface is a way of trying to find targeted answers to user questions. Instead of searching based on keywords, the goal is to understand the question and generate answers with higher relevance. Since a large share of the questions asked in an organisation are similar, you could save a lot of time by clustering them and/or providing answers automatically using a conversational UI. Learn more about Conversational UI.

One way to improve the Natural Language Search interface is by using Natural Language Processing (NLP). The aim of NLP is to improve a computer’s understanding of human language, for example by interpreting synonyms and spelling mistakes. NLP started out as a rule-based technique that was manually coded, but the introduction of Machine Learning (ML) improved the technology further. By using statistical techniques, ML makes it possible to learn from data without having to manually program the computer system. Read more about improving search with NLP.

Automatic content tagging is a trend that we see within the area of Information Management. Instead of relying on user-created tags (of varying quality), the tags are created automatically based on different patterns. The advantage of using automatic content tagging is that the metadata will be consistent and that the data will be easier to analyse.

Personalisation, e.g. a tailored search experience, is a way to sort out information based on the user profile. Basically, search results are adapted to the user’s needs, for example by not showing things that the user does not have access to and by promoting search results that the user frequently looks for. Our findings in this year’s survey show that respondents who are currently working with personalisation consider that users on both the internal and external sites find information more easily. Users who easily find the information they search for tend to be more satisfied with the search solution.
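
The two mechanisms mentioned above, hiding what the user cannot access and promoting what the user frequently looks for, can be sketched roughly as follows. This is only an illustration: the result fields (`id`, `score`, `allowed_groups`), the click counts and the boost weight of 0.1 are all hypothetical and not taken from the survey or from any particular search engine.

```python
from typing import Dict, List, Set


def personalise(results: List[Dict], user_groups: Set[str],
                click_counts: Dict[str, int], boost: float = 0.1) -> List[Dict]:
    """Filter out results the user may not see, then promote frequently used ones."""
    # Keep only results that share at least one group with the user.
    permitted = [r for r in results if r["allowed_groups"] & user_groups]
    # Add a small, arbitrary boost per earlier click (hypothetical weighting).
    return sorted(
        permitted,
        key=lambda r: r["score"] + boost * click_counts.get(r["id"], 0),
        reverse=True,
    )


results = [
    {"id": "a", "score": 1.0, "allowed_groups": {"hr"}},
    {"id": "b", "score": 0.9, "allowed_groups": {"all"}},
    {"id": "c", "score": 0.5, "allowed_groups": {"all"}},
]
ranked = personalise(results, user_groups={"all"}, click_counts={"c": 10})
print([r["id"] for r in ranked])
```

In a real solution the access filter would normally be enforced by the search engine itself rather than in application code.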


Results from this year’s survey indicate that organisations are working with, or planning to work with, AI and cognitive-related techniques. The percentage doing so has grown compared to previous surveys.

Do you want to learn more about cognitive search?

Author: Angelica Lahti, Findability Business Consultant

Comparison of two different methods for generating tree facets, with Elasticsearch and Solr

Let’s try to explain what a tree facet is, by starting with a common use case of a “normal” facet. It consists of a list of filters, each corresponding to a value of a common search engine field and a count representing the number of documents matching that value. The main characteristic of a tree facet is that its filters each may have a list of child filters, each of which may have a list of child filters, etc. This is where the “tree” part of its name comes from.

Tree facets are therefore well suited to represent data that is inherently hierarchical, e.g. a decision tree, a taxonomy or a file system.

Two common methods of generating tree facets, using either Elasticsearch or Solr, are the pivot approach and the path approach. Some of the characteristics, benefits and drawbacks of each method are presented below.

While ordinary facets consist of a flat list of buckets, tree facets consist of multiple levels of buckets, where each bucket may have child buckets, etc. If applying a filter query equivalent to some bucket, all documents matching that bucket, or any bucket in that sub-tree of child buckets, are returned.

Tree facets with Pivot

The name is taken from Solr (pivot faceting); the technique allows faceting within the results of the parent facet. This is a recursive setting, so pivot faceting can be configured for any number of levels. Think of pivot faceting as a Cartesian product of field values.

A list of fields is provided, where the first element in the list will generate the root level facet, the second element will generate the second level facet, and so on. In Elasticsearch, the same result is achieved by using the more general concept of aggregations. If we take a terms aggregation as an example, this simply means a terms aggregation within a parent terms aggregation, and so on.
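
As a rough sketch of the idea, a list of fields can be turned into nested terms aggregations for Elasticsearch, or into a `facet.pivot` parameter for Solr. The field names (`country`, `city`), the aggregation names (`level_0`, `level_1`) and the helper functions are hypothetical; only the request shapes (`aggs`/`terms` and `facet.pivot`) come from the search engines.

```python
from typing import Any, Dict, List


def nested_terms_aggregation(fields: List[str], size: int = 10) -> Dict[str, Any]:
    """Build an Elasticsearch "aggs" body: one terms aggregation per field,
    each nested inside the aggregation of the previous field."""
    aggs: Dict[str, Any] = {}
    # Build from the innermost (last) field outwards.
    for level in range(len(fields) - 1, -1, -1):
        node: Dict[str, Any] = {"terms": {"field": fields[level], "size": size}}
        if aggs:
            node["aggs"] = aggs
        aggs = {f"level_{level}": node}
    return aggs


def solr_pivot_params(fields: List[str]) -> Dict[str, str]:
    """Build the equivalent Solr request parameters for pivot faceting."""
    return {"facet": "true", "facet.pivot": ",".join(fields)}


# Hypothetical two-level tree: country at the root, city below it.
body = {"size": 0, "aggs": nested_terms_aggregation(["country", "city"])}
params = solr_pivot_params(["country", "city"])
```

Changing the structure of the tree is then only a matter of changing the list of fields passed in at query time.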

Benefits

The major benefit of pivot faceting is that it can all be configured at query time and the data does not need to be indexed in any specific way. For example, the list of fields can be modified to change the structure of the returned facet, without having to re-index any content.

The values of the returned facet/aggregation are already in a structured, hierarchical format. There is no need for any parsing of paths to build the tree.

Drawbacks

The number of levels in the tree must be known at query time. Since each field must be specified explicitly, it puts a limit on the maximum depth of the tree. If the tree should be extended to allow for more levels, then content must be indexed to new fields and the query needs to include these new fields.

Pivot faceting assumes a uniformity in the data, in that the values on each level in the tree, regardless of their parent, are of the same type. This is because all values on a specific level come from the same field.

When to use

At least one of the following statements holds:

  • The data is homogeneous – different objects share similar sets of properties
  • The data will, structurally, not change much over time
  • There is a requirement for a high level of query time flexibility
  • There is a requirement for a high level of flexibility without re-indexing documents

Tree facets with Path

Data is indexed into a single field, in a Unix-style file path format, e.g. root/middle/leaf (the path separator is configurable). The index analyzer of this field should use a path hierarchy tokenizer (Elasticsearch, Solr). It will expand the path so that a filter query for some node in the tree will include the nodes in the sub-tree below the node. The example path above would be expanded to root, root/middle, root/middle/leaf. These represent the filter queries for which the document with this path should be returned. Note that the query analyzer should be keyword/string so that queries are interpreted verbatim.
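
The expansion described above can be imitated in a few lines of Python. This is only a sketch that mimics the tokens for a path such as root/middle/leaf; it is not the tokenizer of either search engine, and the function names are hypothetical.

```python
from typing import List


def expand_path(path: str, separator: str = "/") -> List[str]:
    """Return every prefix of the path, from the root down to the full path."""
    parts = [part for part in path.split(separator) if part]
    return [separator.join(parts[: i + 1]) for i in range(len(parts))]


def matches_filter(document_path: str, filter_path: str) -> bool:
    """A verbatim filter matches if it equals one of the indexed prefixes."""
    return filter_path in expand_path(document_path)


print(expand_path("root/middle/leaf"))
```

A document indexed with root/middle/leaf is thereby returned for the filters root, root/middle and root/middle/leaf, but not for a sibling such as root/other.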

Once the values have been indexed, a normal facet or terms aggregation is put on the field. This will return all possible paths and sub-paths, which can be a large number, so make sure to request all of them. Once the facet/aggregation is returned, its values need to be parsed and built into a tree structure.
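
A minimal sketch of that parsing step, assuming the facet/aggregation response has already been reduced to a mapping from path to document count. The input shape, the node layout (`count`, `children`) and the function name are assumptions made for illustration.

```python
from typing import Any, Dict


def build_tree(path_counts: Dict[str, int], separator: str = "/") -> Dict[str, Any]:
    """Turn flat path -> count pairs into nested nodes with counts and children."""
    root: Dict[str, Any] = {"count": 0, "children": {}}
    for path, count in path_counts.items():
        node = root
        for part in path.split(separator):
            # Create intermediate nodes on demand, so input order does not matter.
            node = node["children"].setdefault(part, {"count": 0, "children": {}})
        node["count"] = count
    return root["children"]


# Hypothetical facet values, as they might be returned for the path field.
facet_values = {
    "root": 3,
    "root/middle": 2,
    "root/middle/leaf": 1,
    "root/other": 1,
}
tree = build_tree(facet_values)
```

If the response is truncated, missing intermediate paths end up with a count of 0 here, which is one reason to request all values.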

Benefits

The path approach can handle any number of levels in the tree, without any configuration explicitly stating how many levels there are, both on the indexing side and on the query side. It is also a natural way of handling different depths in different places in the tree: not all branches need to be the same length.

Closely related to the above-mentioned benefit, is the fact that the path approach does not impose any restrictions on the uniformity of the tree. Nodes on a specific level in the tree may represent different concepts, dependent only on their parent. This fits very well with many real-world applications, as different objects and entities have different sets of properties.

Drawbacks

Data must be formatted at index time. If any structural changes to the tree are required, affected documents need to be re-indexed.

To construct a full tree representation of the paths returned in the facet/aggregation, all paths need to be requested. If the tree is big, this can become costly, both for the search engines to generate and with respect to the size of the response payload.

Data is not returned in a hierarchical format and must be parsed to build the tree structure.

When to use

At least one of the following statements holds:

  • The data is heterogeneous – different objects have different sets of properties, varying numbers of levels needed in different places in the tree
  • The data could change structurally over time
  • The content and structure of the tree should be controlled by content only, no configuration changes

Tree facets – Conclusion

The listed benefits and drawbacks of each method can be used as a guide to find the best method from case to case.

When there is no clear choice, I personally tend to go for the path approach, just because it is so powerful and dynamic. This comes with the main drawback of added cost of configuration for index time data formatting, but it is usually worth it in my opinion.

Author: Martin Johansson, Senior Search Consultant at Findwise

Beyond Office 365 – knowledge graphs, Microsoft Graph & AI!

This is the first joint post in a series where Findwise & SearchExplained together decompose Microsoft’s realm with a focus on knowledge graphs and AI. The advent of graph technologies, and more specifically knowledge graphs, has become the epicentre of the AI hyperbole.

The use of a symbolic representation of the world, as with ontologies (domain models), within AI is by no means new. The Cyc project, for instance, started back in the 1980s. The most familiar example for the average Joe is the Google Knowledge Graph, which links things and concepts. In the world of Microsoft, this has become a foundational platform capacity with the Microsoft Graph.

It is key to separate the wheat from the chaff, since the Microsoft Graph is by no means a knowledge graph. It is a highly platform-centric way to connect things, applications, users, information and data. That is good, but it still lacks the capacity to disambiguate complex things of the world, since building a knowledge graph (i.e. an ontology) is not its core functionality.

From a Microsoft-centric worldview, one should combine the Microsoft Graph with different applications and AI to automate and augment life with Microsoft at Work. The reality is that most enterprises do not use Microsoft alone to envelop the enterprise information landscape. The information environment goes far beyond, into a multitude of organising systems within or outside the company walls.

Question: How does one connect the dots in this maze-like workplace? By using knowledge graphs and infusing them into the Microsoft Graph realm?

The model, artefacts and pragmatics

People at work continuously have to balance between modalities (provision/find/act), independent of work practice or discipline, when dealing with data and information. People also have to interact with groups and imagined entities (i.e. organisations, corporations and institutions). These interactions become the mould whereupon shared narratives emerge.

Knowledge Graphs (ontologies) are the pillar artefacts where users will find a level playing field for communication and codification of knowledge in organising systems. When linking the knowledge graphs with a smart semantic information engine utility, we get enterprise linked data that connects the dots: a sustainable, resilient model in the content continuum.

Microsoft at Work, the platform as with Office 365, has some key building blocks, such as the content model that cuts across applications and services. The Meccano pieces, like collections [libraries/sites] and resources [documents, pages, feeds, lists], should be configured with sound resource descriptions (metadata) and organising principles. One of the back-end services dealing with this is the Managed Metadata Service and the cumbersome TermStore (it is not a taxonomy management system!). The pragmatic approach will be to infuse/integrate the smart semantic information engine (knowledge graphs) with these foundation blocks. One outstanding question is why Microsoft has left these services unchanged, with few improvements, for many years.

The unabridged pathway and lifecycle of content provision, such as the creation of sites curating documents, will be a guided (automated and augmented [AI & Semantics]) route (in the best of worlds). The Microsoft Graph and its set of APIs and connectors push the envelope with people at the centre. As mentioned, it is a platform-centric graph service, but it lacks a connection to shared narratives (as with knowledge graphs). It relies on fuzzy logic, where end-user profiles and behaviour patterns connect content and people, with no, or very limited, opportunity to fine-tune or align these patterns to the models (concepts and facts).

Akin to the provision modality pragmatics above is the find (search, navigate and link) domain in Office 365. The search road-map from Microsoft, like a yellow brick road, envisions a cohesive experience across all applications. In reality, it is still a silo search 😉 The Microsoft Graph will go hand in hand with this to realise personalised search, but it is still constrained in its means to deliver a targeted search experience (search-driven application) in the modern search, which is problematic, to say the least. Neither the back-end processing steps nor the user experience lean upon the models to deliver e.g. semantic search to connect the dots. Relying only on end-user behaviour patterns and end-user tags (/system/keyword) surfaces as a disjointed experience with low precision and recall.

The smart semantic information engine will usually be a mix of services or platforms that work in tandem, for example:

  1. Semantic Tools (PoolParty, Semaphore)
  2. Search and Analytics (i3, Elastic Stack)
  3. Data Integration (MarkLogic, BizTalk)
  4. AI modules (MS Cognitive stack)

In the forthcoming post on the theme Beyond Office 365 unpacking the promised land with knowledge graphs and AI, there will be some more technical assertions.
Authors: Fredric Landqvist (research blog), Agnes Molnar (SearchExplained)

Tinkering with knowledge graphs

I don’t want to sail with this ship of fools, on the opulent data sea, where people are drowning without any sense-making knowledge shores in sight. You don’t see the edge before you drop!

Knowledge Engineering: Echoencephalogram (Lars Leksell) and neural networks

How do organisations reach a level playing field, where it is possible to create a sustainable learning organisation [cybernetics]?
(Enacted Knowledge Management practices and processes)

Sadly, in many cases, we face the tragedy of the commons!

There is an urgent need to iron out the social dilemmas and focus on motivational solutions that strive for cooperation and collective action. Knowledge deciphered with the notion of intelligence and emerging utilities with AI as an assistant with us humans. We the peoples!

To make a model of the world, to codify our knowledge and to apply worldviews to complex data is nothing new per se. A Knowledge Graph is, in essence, a constituted shared narrative within the collective imagination (i.e. the organisation), where facts about things and their inherited relationships and constraints define the model to be used to master the matrix. These concepts and topics are our means of communication to bridge between groups of people: shared nomenclatures and vocabularies.

Terminology Management

Knowledge Engineering in practice


At work, when building a knowledge graph, there are some pillars that the architecture rests upon. First and foremost is the language we use every day to undertake our practices within an organisation: the corpus of concepts, topics and things that revolve around the overarching theme. No entity acts in a vacuum with no shared concepts. Humans coordinate work practices through shared narratives embedded in concepts and their translations from person to person. This communication might use different means, like cuneiform (in ancient Babel) or the digital tools of today. To curate, cultivate and nurture a good organisational vocabulary, we also need to develop practices and disciplines that to some extent bear similarities to those of ancient clay-tablet librarians: organising principles for the organising system (information system, applications). This discipline could be defined as that of taxonomists (taxonomy managers), knowledge engineers or information architects.

Set the scope – no need to boil the ocean


All organisations, independent of business vertical, have known domain concepts that are defined by standards, code systems or open vocabularies. A good idea is obviously to first go foraging in the sea of terminologies, to link, re-hash/re-use and manage the domain. The second task in this scoping effort is to audit and map the internal terrain of content corpora. Information is scattered across a multitude of organising systems, but within these there are pockets of structure. Here we will find glossaries, controlled vocabularies, data models and the like. The taxonomist will then, together with subject matter experts, arrange governance principles, engage in conversations on how the outer and inner loops of concepts link, and start to build domain-specific taxonomies, preferably using the Simple Knowledge Organization System (SKOS) standard.
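
To give a feel for what a SKOS concept looks like, here is a small sketch that builds one as JSON-LD using only the Python standard library. The `skos:` terms (`Concept`, `prefLabel`, `altLabel`, `broader`) come from the SKOS vocabulary; the example.org URIs, the labels and the helper function are made up for illustration.

```python
import json
from typing import Any, Dict, Optional, Sequence

SKOS_NAMESPACE = "http://www.w3.org/2004/02/skos/core#"


def skos_concept(uri: str, pref_label: str, alt_labels: Sequence[str] = (),
                 broader: Optional[str] = None, lang: str = "en") -> Dict[str, Any]:
    """Describe one concept with a preferred label, synonyms and a broader concept."""
    concept: Dict[str, Any] = {
        "@context": {"skos": SKOS_NAMESPACE},
        "@id": uri,
        "@type": "skos:Concept",
        "skos:prefLabel": {"@value": pref_label, "@language": lang},
    }
    if alt_labels:
        concept["skos:altLabel"] = [
            {"@value": label, "@language": lang} for label in alt_labels
        ]
    if broader:
        concept["skos:broader"] = {"@id": broader}
    return concept


# Hypothetical concept from an insurance taxonomy.
concept = skos_concept(
    "https://example.org/vocab/car-insurance",
    "Car insurance",
    alt_labels=["Motor insurance"],
    broader="https://example.org/vocab/insurance",
)
print(json.dumps(concept, indent=2))
```

In practice the concepts would be maintained in a taxonomy management tool rather than written by hand; the sketch only shows the shape of the data.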

Participatory Design from inception


Concepts and their resource description will need to be evaluated and semantically enhanced with several different worldviews from all practices and disciplines within the organisation. Concepts might have a different meaning. Meaning is subjective, demographic, socio-political, and complex. Meaning sometimes gets lost in translation (between different communities of practices).

The best approach to get a highly participatory design in the development of a sustainable model is to simply publish the concepts as open thesauri. A great example is the HealthDirect thesaurus. This service becomes a canonical reference that people are able to search, navigate and annotate.

It is smart to let people edit, refine and comment (annotate) in the same manner as Wikipedia evolves, e.g. by editing Wikidata entries. These annotations will then feed back to the governance network of the terminologies.

Term Update

Link to organising systems

All models (taxonomies, vocabularies, ontologies etc.) should be interlinked with the existing base of organising systems (information systems [IS]) or platforms. Most ISs have schemas, in-built models and business rules to serve as applications for a specific use case. This also implies the use of concepts to define and describe the data in metadata, as reference data tables or as user experience controls. In all these Lego pieces within an IS or platform, there are opportunities to link these concepts to the shared narratives in the terminology service: linked enterprise data, building a web of meaning and opening up for a more interoperable information landscape.

One omnipresent quest is to set up a sound content model and design for e.g. Office 365, where content types, collections, resource descriptions and metadata have to be concerted in back-end services such as the Managed Metadata Service. Within these features and capacities, it is wise to integrate with the semantic layer (terminologies and graphs). Other highly relevant integrations relate to search-as-a-service, where the semantic layer co-acts in the pipeline steps: add semantics, link, auto-classify and disambiguate with entity extraction. In the user experience journey, the semantic layer augments and connects things, which is for instance how the Microsoft Graph has been ingrained all through their platform. Search and semantics push the envelope 😉

Data integration and information mechanics

A decoupled information systems architecture using an enterprise service bus (messaging techniques) is by far the most used model. To enable sustainable data integration, there is a need for a data architecture and a clear integration design. Adjacent to the data integration are means for cleaning up data and harmonising data sets into a cohesive whole: extract, transform, load (ETL). Data governance is essential! In this ballpark we also find cues to master data management. Data and information have fluid properties, and the flow has to be seamless and smooth.

When defining the (asynchronous) message structure in information exchange protocols and packages, it is highly desirable to rely on standards and well-defined models (ontologies), as within the healthcare & life science domain using HL7/FHIR. These standards have domain models with entities, properties, relations and graphs. The data serialisation for data exchange might use XML or RDF (JSON-LD, Turtle etc.). The value sets (namespaces) for properties can be linked to SKOS vocabularies with terms.

Query the graph

Knowledge engineering is about putting the useful terminologies into action, but also about loading, refining and developing ontologies (information models, data models). There are many very useful open ontologies that could or should be used and refined by the taxonomists, e.g. the ISA² Core Vocabularies. With data sets stored in a graph (triplestore) there are many ways to query the graph to get results and insights (links): either by using SPARQL (similar to SQL in schema-based systems), possibly combined with SHACL (constraints), or via RESTful APIs.
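
The idea of querying a graph of triples can be illustrated with a toy in-memory example. This is not SPARQL and not a triplestore, just plain Python that matches a single triple pattern; the `ex:` concepts and the `match` helper are hypothetical. The comment shows what a roughly equivalent SPARQL query could look like.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical triples, as they might exist for a small SKOS taxonomy.
TRIPLES: List[Triple] = [
    ("ex:CarInsurance", "skos:broader", "ex:Insurance"),
    ("ex:HomeInsurance", "skos:broader", "ex:Insurance"),
    ("ex:Insurance", "skos:prefLabel", "Insurance"),
]


def match(triples: List[Triple], s: Optional[str] = None,
          p: Optional[str] = None, o: Optional[str] = None) -> List[Triple]:
    """Return the triples matching the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


# Roughly: SELECT ?narrower WHERE { ?narrower skos:broader ex:Insurance }
narrower = [subject for subject, _, _ in match(TRIPLES, p="skos:broader", o="ex:Insurance")]
print(narrower)
```

A real triplestore adds joins across several patterns, inference and indexing on top of this basic pattern matching.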

These means of querying the knowledge graph are one reason to add semantics to data integration as described above.

Adding smartness and we are all done…

Semantic AI, or the means to bridge between symbolic representation (semantics) and machine learning (ML), natural language processing (NLP) and deep learning, is where all things come together.

In the work (knowledge engineering) to build the knowledge graph and govern it, it takes many manual steps, such as mapping models, standards and large corpora of terminologies. Here AI capacities enable automation and continuous improvement with learning networks. Understanding human capacities and intelligence, and unpacking the neurosciences (as Lars Leksell did), combined with neural networks, will be our road ahead to safe and sustainable uses of AI.
Author: Fredric Landqvist (research blog)

Benevolent & sustainable smart city development

The digitisation of society emerges in all sectors, and the key driver of all this is the abundance of data that needs to be brought into context and use.

Participation

When discussing digitisation, people commonly think of data highways and server farms as being the infrastructure. Access to comprehensive information resources is increasingly becoming a commodity, enabling and enhancing societal living conditions. To achieve this, sense-making of data has to be an integral part of the digital infrastructure. Compared with traditional patterns, digital roads need junctions, signs and semaphores to function, just like their physical counterparts.

The ambition with AI and smart society and cities should be for the benefit of its inhabitants, but without a blueprint to get a coherent model that will be working in all these utilities, it will all break. Second to this, benevolence, participation and sustainability, have to be the overarching theme, to contrast dystopian visions with citizen surveillance and fraudulent behaviour.

Data needs context to make sense and create value, and this frame of reference will be realised through domain models of the world, with shared vocabularies to disambiguate concepts. In short a semantic layer. It is impossible to boil the ocean, which makes us rather lean toward a layered approach.

All complex systems (or complex adaptive system, CAS) revolve around a set of autonomous agents, for example, cells in a human body or citizens in an urban city. The emergent behaviour in CAS is governed by self-organising principles. A City Information Architecture is by nature a CAS, and hence the design has to be resilient and coherent.

What infrastructural dimensions should a smart city design build upon?

  • Urban Environment, the physical spaces described by geodata, the cadastre (real-estate register), roads and other things in the landscape.
  • Movable Objects, with mobile sensing platforms capturing things like vehicles, traffic and more, in short, the dynamics of a city environment.
  • Human actor networks, the socio-economic mobility, culture and community in the habitat.
  • Virtual Urban Systems, augmented and immersive platforms to model the present or envision future states of the city environment.

Each of these organising systems and categories holds many different types of data, but the data flows also intertwine. Many of the things described in the geospatial and urban environment domain might be enveloped in a set of building information models (BIM) and geographical information systems (GIS). The resource descriptions link the objects, moving from one building to a city block or area. Similar behaviour will be found in the movable objects domain, because the agents moving around will by nature do so in the physical spaces. So when building information infrastructures, the design has to be able to cross boundaries with linked models for all useful concepts. One way to express this is through a city information model (CIM).

When you add the human actor networks layer to your data, things will become messy. In an urban system, there are many organisations, and some of these act as public agencies to serve the citizens throughout their life and business events. This socially knitted interaction model uses the urban environment and, in many cases, movable objects. The social life of information, when people work together, co-act and collaborate, becomes the shared content continuum.

Lastly, data from all the above-mentioned categories also feeds into the virtual urban system, which either augments the perceived real city environment or supports the city information modelling used to create instrumental scenarios of the future state of the complex system.

Everything is deeply intertwingled

Connect people and things using semantics and artificial intelligence (AI) companions. There will be no useful AI without a sustainable information architecture (IA). Interoperability on all levels is the prerequisite: systemic (technical and semantic) and organisational (process and climate).

Only when we follow the approach of integration and the use of a semantic layer to glue together all the different types and models – thereby linking heterogeneous information and data from several sources to solve the data variety problem – are we able to develop an interoperable and sustainable City Information Model (CIM).

Such a model can be used not only inside one city or municipality; it should also be used to interlink and exchange data and information between cities, as well as between cities and provinces, regions, countries and societal digitalisation transformation.

A semantic layer completes the four-layered Data & Content Architecture that systems usually have in place:

Fig.: Four layered content & data architecture

Use standards (such as ISA²), and meld them into contextual schemas and models (ontologies), disambiguate concepts and link these with verbatim thesauri and taxonomies (e.g. SKOS). Start making sense and let AI co-act as companions (deep-learning AI) in the real and virtual smart city, applying semantic search technologies over various sources to provide new insights. Participation and engagement from all actor networks will be the default value chain, the drivers being new, cheaper and more efficient smart services: the building blocks for the city innovation platform.

The recorded webinar and also the slides presented

 

Authors: Fredric Landqvist (research blog), Peter Voisey, Martin Kaltenböck, Sebastian Gabler

Trials & Jubilations: the two sides of the GDPR coin

We have all heard about the totally unhip GDPR and the potential wave of fines and lawsuits. The long arm of the law and its stick have been noted. Less talked about but infinitely more exciting is the other side. Turn over the coin and there’s a whole A-Z of organisational and employee carrots. How so?

Sign up for the joint webinar on the 18th of April at 3 PM CET with Smartlogic & Findwise to find out more.

https://flic.kr/p/fJD1eA

Signal Tools

We all leave digital trails behind us, trails about us. Others who have access to these trails can use our data and information. The new European General Data Protection Regulation (GDPR) intends the usage of such Personally Identifiable Information (PII) to be correct and regulated, with the power to decide given to the individual.

Some organisations are wondering how on earth they can become GDPR compliant when they already have a business to run. But instead of a chore, setting a pathway to allow for some more principled digital organisational housekeeping can bring big organisational gains sooner rather than later.

Many enterprises are now beginning to realise the extra potential gains of having introduced new organisational principles to become compliant. The initial fear of painful change soon subsides when the better quality data comes along to make business life easier. With the further experience of new initiatives from new data analysis, NLP, deep learning and AI comes the feeling: why didn’t we just do this sooner?

Most organisations have one or more systems in place holding PII data, even if getting the right data out in the right format remains problematic. The organisation of data for GDPR compliance is best achieved by transforming it into part of a semantic data layer. With such a layer, knowing all the related data from different sources you have on Joe Bloggs becomes so much easier when he asks for a copy of the data you have about him. Such a semantic data layer will also bring other far-reaching and organisation-wide benefits.

Semantic Data Layer

For example, heterogeneous data in different formats and from different sources can become unified for all sorts of new smart applications, new insights and new innovation that would have been previously unthinkable. Data can stay where it is… no need to change that relational database yet again because of a new type of data. The same information principles and technologies involved in keeping an eye on PII use, can also be used to improve processes or efficiencies and detect consumer behaviour or market changes.

But it’s not just the business operations that benefit: empowered employees become happier having the right information at hand to do their job. This is often difficult to achieve, as in many organisations no one area “owns” search, making it usually somebody else’s problem to solve. For the Google-loving employee, not finding stuff at work to help them in their job can be downright frustrating. Well-ordered data (better still in a semantic layer) can give them the empowering results page they need. It’s easy to forget that Google only deals with the best structured and linked documentation, so why shouldn’t we do the same in our organisations?

Just as the combination of (previously heterogeneous) datasets can give us new insights for innovation, we also observe that innovation increasingly comes in the form of external collaboration. Such collaboration of course increases the potential GDPR risk through data sharing, Facebook being a very current case in point. This brings in the need for organisational policy covering data access, the use and handling of existing data and any new (extra) data created through its use. Such policy should, for example, cover newly created personal data from statistical inference analysis.

While a semantic layer may in fact make human error in data usage more likely through increased access, it also offers a better way to prevent misuse: metadata can be baked into the data to classify information “sensitivity” and to control user access rights.
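As a rough illustration of how metadata carried with the data could drive access decisions, here is a minimal Python sketch. The sensitivity levels, the `Record` and `User` types and the `can_read` helper are all hypothetical names made up for this example, not part of any particular product.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List


class Sensitivity(IntEnum):
    """Hypothetical sensitivity levels; a higher value means more restricted."""
    PUBLIC = 0
    INTERNAL = 1
    PII = 2


@dataclass(frozen=True)
class Record:
    """A piece of data that carries its own classification metadata."""
    subject: str
    value: str
    sensitivity: Sensitivity


@dataclass(frozen=True)
class User:
    name: str
    clearance: Sensitivity


def can_read(user: User, record: Record) -> bool:
    """Allow access only when the user's clearance covers the record."""
    return user.clearance >= record.sensitivity


def visible_records(user: User, records: List[Record]) -> List[Record]:
    """Filter a result set down to what this user is allowed to see."""
    return [record for record in records if can_read(user, record)]


records = [
    Record("Joe Bloggs", "policy brochure", Sensitivity.PUBLIC),
    Record("Joe Bloggs", "home address", Sensitivity.PII),
]
analyst = User("analyst", Sensitivity.INTERNAL)
dpo = User("data protection officer", Sensitivity.PII)
```

Because the classification travels with each record, the same filter can be applied wherever the data is surfaced, for instance on a search results page.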

So how does one start?

The first step is to apply some organising principles to any digital domain, be it inside or outside the corporate walls [The Discipline of Organizing, Robert Glushko], and to ask the key questions:

  1. What is being organised?
  2. Why is it being organised?
  3. How much of it is being organised?
  4. When is it being organised?
  5. Where is it being organised?

Secondly, start small: apply organising principles by focusing on the low-hanging fruit, the already structured data within systems. The creation of quality data with added metadata in a semantic layer can have a magnetic effect within an organisation (build that semantic platform and they will come).

Step three: start being creative and agile.

A case story

A recent case within the insurance industry gives some clues as to why this set of tools will improve signals and attention when becoming more compliant with regulations dealing with PII. Our client knew about a set of collections (file shares) where PII might be found. Adding search and NLP/ML opened up Pandora’s box with visual analytic tools. This is the simple starting point: finding, for example, names or personal number concepts in the text. The second step is to add semantics, where industry-standard terminologies and ontologies can further help define the meaning of things.
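To make that starting point concrete, the following Python sketch scans free text for personal-number candidates. It assumes a Swedish-style identity number (YYMMDD-NNNN or YYYYMMDD-NNNN, validated with a Luhn checksum), which is our assumption and not something stated about the client case; the function names are hypothetical. Finding names would need a trained entity-recognition model and is not shown here.

```python
import re

# Assumption: "personal number" means a Swedish-style identity number,
# written as YYMMDD-NNNN or YYYYMMDD-NNNN. Adapt the pattern to your own data.
PERSONAL_NUMBER = re.compile(r"\b(?:\d{2})?(\d{6})[-+]?(\d{4})\b")


def luhn_valid(ten_digits: str) -> bool:
    """Checksum used to discard digit runs that merely look like a match."""
    total = 0
    for index, char in enumerate(ten_digits):
        digit = int(char)
        if index % 2 == 0:  # every other digit, starting with the first, is doubled
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def find_personal_numbers(text: str) -> list:
    """Return personal-number candidates found in free text."""
    hits = []
    for match in PERSONAL_NUMBER.finditer(text):
        if luhn_valid(match.group(1) + match.group(2)):
            hits.append(match.group(0))
    return hits


sample = "Claim filed by customer 811228-9874, reference 123456-7890."
candidates = find_personal_numbers(sample)
```

A pattern like this only produces signals for a human or a downstream model to review; the checksum reduces false positives but does not prove that a number belongs to a real person.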

In all corporate settings, there exist both well-cultivated and governed collections of information resources, but usually also a massive unmapped terrain of content collections, where no one has a clue if there might be PII hidden amongst it. The strategy of using a semantic data layer should always be combined with operations to narrow down the collections that become part of the signalling system: it is generally not a good idea to boil the whole data ocean in the enterprise information environment. Rather, through such work practices, workers become aware of the data hot-spots, the well-cultivated collections of information and that unmapped terrain. Having the additional notion of PII to contend with will make it just that bit easier to recognise those places where semantic enhancement is needed.

Running with the same pipeline (with the option of further models to refine and improve certain data) will not only allow for the discovery of multiple occurrences of named entities (individuals) but also the narrative and context in which they appear.
Having a targeted model and terminology for the insurance industry will improve this semantic process further. This process can certainly ease what may currently be manual processes, or processes that don’t exist because of their manual pain: for example, finding sensitive textual information in documents within applications or in online textual chats. Developing such a smart information platform enables the smarter linking of other things from the model, such as service packages, service units or organisational entities, spatial data as named places or timelines, or medical treatments: things you perhaps currently have less control over.

There’s not much time before the 25th May and the new GDPR, but we’ll still be here afterwards to help you with a compliance burden or a creative pathway, depending on your outlook.

Alternatively, sign up for the joint webinar with Smartlogic & Findwise on the 11th of April at 3 PM CET to find out more.

Writers: Fredric Landqvist (research blog), Peter Voisey & James Morris

Major highlights from Elastic{ON} 2018 – Findwise reporting

Two Elastic fans have just returned from San Francisco and the Elastic{ON} 2018 conference. With almost 3,000 participants this year, Elastic{ON} is the biggest Elastic conference in the world.

Findwise regularly organises events and meetups, covering Elastic among other topics. Keep an eye out for an event close to you.

Here are some of the main highlights from Elastic{ON} 2018.

Let’s start with the biggest announcement of them all: Elastic is opening the source code of X-Pack. This means that you will now be able to access not only the Elastic stack source code, but also the subscription-based code of X-Pack that has been inaccessible up until now. This opens the opportunity for you as a developer to contribute code back.

Data rollups are a great new feature for anyone who needs to look at old data but feels the storage costs are too high. With rollups, only predetermined metrics and terms will be stored. You can still analyze these dimensions of your data, but you can no longer view the individual documents.
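The idea behind a rollup can be sketched in a few lines of plain Python. This is only a conceptual illustration, not the Elasticsearch rollup API: the `rollup` function, the hourly bucket size and the field names are assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime


def rollup(documents, term_field, metric_field):
    """Collapse raw documents into one summary row per (hour, term) bucket."""
    buckets = defaultdict(lambda: {"count": 0, "sum": 0.0, "max": float("-inf")})
    for doc in documents:
        # Truncate the timestamp to the start of its hour.
        hour = datetime.fromisoformat(doc["timestamp"]).replace(
            minute=0, second=0, microsecond=0
        )
        bucket = buckets[(hour.isoformat(), doc[term_field])]
        bucket["count"] += 1
        bucket["sum"] += doc[metric_field]
        bucket["max"] = max(bucket["max"], doc[metric_field])
    return dict(buckets)


raw = [
    {"timestamp": "2018-03-01T10:05:00", "host": "web-1", "latency_ms": 120.0},
    {"timestamp": "2018-03-01T10:40:00", "host": "web-1", "latency_ms": 80.0},
    {"timestamp": "2018-03-01T11:10:00", "host": "web-1", "latency_ms": 100.0},
]
summary = rollup(raw, term_field="host", metric_field="latency_ms")
```

Three raw documents become two summary rows here. Averages and maxima per host and hour can still be computed from the summary, but the individual requests are gone, which is exactly the trade-off described above.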

Azure monitoring available in X-Pack Basic. In an upcoming 6.x release, Elastic will ship an Azure Monitoring Module, which will consist of a bundle of Kibana dashboards and make it really easy to get started exploring your Azure infrastructure. The monitoring module will be released as part of the X-Pack Basic version – in other words, it will be free to use.

Forecasting was the big new thing in X-Pack’s machine learning component. As the name suggests, the machine learning module can now not only spot anomalies in your data but also predict how it will change in the future.

Security in Kibana will get an update to make it work more like the Security module in Elasticsearch. This will also mean that one of the most requested security features for Kibana will arrive: giving users access to only some dashboards.

Dashboards are great and a fundamental part of Kibana, but sometimes you want to present your data in more dynamic ways with less focus on data density. This is where Canvas comes in. Canvas is a new Kibana module for producing infographics rather than dashboards, while still using live data from Elasticsearch.

Monitoring of Kubernetes and Docker containers will be made a lot easier with the Elastic stack. A new infra component will be created just for this growing use case. This component will be powered by data collected by Beats, which now also has autodiscovery functionality within Kubernetes. This will give an overview of not only your Kubernetes cluster but also the individual containers within the cluster.

Geo capabilities within Kibana will be extended to support multiple map layers. This will make it possible to do more kinds of visualizations on maps. Furthermore, work is being done on supporting not only Geo points but also shapes.

One problem some have had with maps is that you need access to the Elastic maps service, and if you deploy the Elastic stack within a company network this might not be reachable. To solve this, work is being done to make it possible to deploy the Elastic maps service locally.

Elastic acquired the SaaS solution Swiftype last year. Since then, Swiftype has been busy adding even more features to its portfolio. Swiftype currently comes in three different versions:

  • Swiftype Site Search – An out-of-the-box (OOTB) solution for website search
  • Swiftype Enterprise Search – Currently in beta, with a focus (for now) on internal, cloud-based data sources like G Suite, Dropbox, O365, Zendesk etc.
  • Swiftype App Search – A set of APIs and developer tools that make it quick to build user-facing search applications

 

Elastic has also started to look at replacing the Zen protocol used to keep clusters in sync. A PoC is currently being made to create a consensus algorithm that follows modern academic best practices. An added benefit is that the minimum master nodes setting, currently one of the most common pitfalls when running Elasticsearch in production, can be removed.
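To see why that setting is such a common pitfall, recall that in the 6.x line `discovery.zen.minimum_master_nodes` has to be kept at a majority of the master-eligible nodes by hand. The small Python sketch below shows the arithmetic; the helper names are ours and purely illustrative.

```python
def minimum_master_nodes(master_eligible_nodes: int) -> int:
    """Smallest majority of master-eligible nodes: floor(n / 2) + 1."""
    if master_eligible_nodes < 1:
        raise ValueError("a cluster needs at least one master-eligible node")
    return master_eligible_nodes // 2 + 1


def split_brain_possible(master_eligible_nodes: int, configured_minimum: int) -> bool:
    """True when two disjoint groups of nodes could each reach the configured minimum."""
    return 2 * configured_minimum <= master_eligible_nodes


# A three-node cluster left at a minimum of 1 can elect two masters
# if the network partitions; raising the value to 2 prevents that.
quorum_for_three = minimum_master_nodes(3)
```

The value has to be revisited every time master-eligible nodes are added or removed, which is easy to forget in practice.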

ECE – Elastic Cloud Enterprise is a big focus for Elastic and makes it possible for customers to set up a fully service-based search solution maintained by Elastic.

If you are interested in hearing more about Elastic or Findwise, visit https://findwise.com/en/technology/elastic-elasticsearch

Writers: Mads Elbrond, regional manager Findwise Denmark & Torsten Landergren, senior expert consultant

Summary from Enterprise Search and Discovery Summit 2017

This year at Enterprise Search and Discovery Summit, Findwise was represented by us – search experts Simon Stenström and Amelia Andersson. With over a thousand attendees at the event, we’ve enjoyed the company of many peers. Let’s stay in touch for inspiration and to create magic over the Atlantic – you know who you are!

Back to the event: We opened the Enterprise Search-track with our talk on how you can improve your search solutions through taking several aspects of relevance into account. (The presentation can be found in full here, no video unfortunately). If you want to know more about how to improve relevancy feel free to contact us or download the free guide on Improved search relevancy.

A few themes kept recurring during the Enterprise Search-track: machine learning and NLP; bots and digital assistants; statistics and logs; and GDPR. We’ve summarized our main takeaways from these topics below.

 

Machine learning and NLP

Machine learning and NLP were the unchallenged buzzwords of the conference. Everybody wants to do it, some have already started working with it, and some provided products for working with it. Unfortunately, few concrete examples of how organizations are using machine learning were presented, giving us the feeling that few organizations are there yet. We’re at the forefront!

 

Bots, QA systems and digital assistants

Everyone is walking around with Siri or Google Assistant in their pocket, but still our enterprise search solutions don’t make use of it. Panels discussed voice-based search (TV remote controls that could search content on all TV channels to set the right channel, a demo of Amazon Alexa providing answers for simple procedures for medical treatments etc.), pointing out that voice-to-text now works well enough (at least in English) to use in many mobile use cases.

But bots can of course be used without voice input. A few different examples of using bots in a dialog setting were shown. One of the most exciting demos showed a search-engine-powered bot that used facet values to ask questions to specify what information the user was looking for.
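We do not know how the demoed bot was implemented, but the general idea of letting facet values drive the dialog can be sketched in Python as follows. The `next_question` function and its scoring rule (prefer the facet whose most common value leaves the most results still to be filtered) are our own assumptions.

```python
def next_question(facet_counts):
    """Pick the facet whose values split the current results most evenly."""
    best_facet, best_score = None, 0.0
    for facet, counts in facet_counts.items():
        total = sum(counts.values())
        if total == 0 or len(counts) < 2:
            continue  # asking about this facet would not narrow anything
        # Share of results that the most common answer would filter out.
        score = 1.0 - max(counts.values()) / total
        if score > best_score:
            best_facet, best_score = facet, score
    if best_facet is None:
        return None
    values = facet_counts[best_facet]
    options = sorted(values, key=values.get, reverse=True)
    return f"Which {best_facet} are you interested in: {', '.join(options)}?"


# Facet counts as they might come back from a search engine for one query.
facets = {
    "document type": {"policy": 40, "form": 35, "news": 25},
    "language": {"en": 95, "sv": 5},
}
question = next_question(facets)
```

After the user answers, the chosen value is applied as a filter, the query is run again, and the new facet counts decide the next question until the result list is short enough to show.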

 

Statistics and logs

Collect logs! And when you’ve done that: use them! A clear theme was how logs were stored, displayed and used. Knowledge management systems where content creators could monitor how users were finding their information inspired us to consider dashboards for intranet content creators as well. If we can help our content creators understand how their content is found, maybe they will be encouraged to use better metadata or wording, or to create information that their users are missing.
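As a small example of putting logs to use, the Python sketch below summarises a search log into the most common queries and the queries that returned nothing, two figures a content-creator dashboard could start from. The log format (a `query` string and a `hits` count per event) is an assumption made for the example.

```python
from collections import Counter


def summarise_search_log(events, top_n=3):
    """Return the most common queries and the queries that found nothing."""
    queries = Counter()
    zero_hits = Counter()
    for event in events:
        # Normalise so that "Vacation policy" and "vacation policy " count as one query.
        query = event["query"].strip().lower()
        queries[query] += 1
        if event["hits"] == 0:
            zero_hits[query] += 1
    return {
        "top_queries": queries.most_common(top_n),
        "zero_hit_queries": zero_hits.most_common(top_n),
    }


log = [
    {"query": "Vacation policy", "hits": 12},
    {"query": "vacation policy ", "hits": 12},
    {"query": "parental leave form", "hits": 0},
]
report = summarise_search_log(log)
```

Zero-hit queries are a direct hint about information users are missing, or about wording that differs from what content creators use.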

 

GDPR

Surprisingly, GDPR is not only a “European thing”, but will have a global impact following the legislation change in May. American companies will have to look at how they handle the personal information of their EU customers. This statement took many attendees by surprise and there were many worried questions on what was considered non-compliant with GDPR.

 

We’ve had an exciting time in Washington and can happily say that we are able to bring back inspiration and new experience to our customers and colleagues at Findwise. On the same subject, a couple of weeks ago some of our fellow experts at Findwise wrote the report “In search for Insight”, addressing the new trends (machine learning, NLP etc.) in Enterprise Search. Make sure to get your copy of the report if you are interested in this area.

Most of the presentations from Enterprise Search and Discovery Summit can be found here.

 

Authors: Amelia Andersson and Simon Stenström, search experts from Findwise

Microsoft Ignite 2017 – from a Search and Findability perspective

Microsoft Ignite – the biggest Microsoft conference in the world. 700+ sessions, insights and roadmaps from industry leaders, and deep dives and live demos on the products you use every day. And yes, Findwise was there!

But how do you summarize a conference with more than 700 different sessions?

Well – you focus on one subject (search and findability in this case) and then you collaborate with some of the most brilliant and experienced people around the world within that subject. Add a little bit of your own knowledge – and the result is this Podcast.

Enjoy!

Expert Panel Shares Highlights and Opportunities in Microsoft’s Latest Announcements

Do you want to know more about Findwise and Microsoft? Find out how you can make SharePoint and Office 365 more powerful than ever before.