Activate conference 2018

Open source has won! Now, what about AI?

Grant Ingersoll is on stage at the opening of Activate18, explaining the reasoning behind the name change.

The revolution is won. Open source won. Search, as a concept to be reckoned with, won. They all won.

These days, the occasions when I come across a new search project pushing anything but open source search are few and far between.

Since search has taken a turn towards AI, a merge with that topic seems reasonable, not to say obvious. But AI in this context should probably be interpreted as AI in support of good search results – at least judging from the talks I attended. Interesting steps forward are expert systems and the like, though none of those were discussed extensively, as far as I know. That is the kind of system we work with at Findwise, for instance using NLP, machine learning and text analytics to improve customer service.

Among the more interesting talks I attended was Doug Turnbull's on the neural search frontier. Some of the matrix math threw me back to an ANN course I took ten years ago, way before I had ever learned any matrix maths. Now, way past remembering any matrix math course I ever took, it's equally confusing, possibly on a slightly higher level. But he pointed out interesting aspects and showed conceptually how Word2Vec vectors work and where they fall short. Simon Hughes' talk "Vectors in search – Towards more semantic matching" is in the same area, but leans more towards actually using it.

Machine Learning is finally mainstream

If we have a look at the overall distribution of talks, I think it's safe to say that almost all touched on machine learning in some way, most commonly using Learning to Rank (LTR) and Word2Vec. Neither of these techniques is new (our own Mickaël Delaunay wrote a nice blog post about how to use LTR for personalization a couple of years ago), and they have been covered before to some extent, but this time around we saw some proper, large-scale implementations that utilize them. Bloomberg gave a really interesting presentation on what their evolution from hand-tuned relevance to LTR over millions of queries has been like. Even if many talks were held on a theoretical/demo level, one thing is now very clear: it is fully possible and feasible to build actual, useful and ROI-reasonable machine learning into your solutions.

As Trey Grainger pointed out, there are different generations of this conference. A couple of years ago Hadoop was everywhere. Before that, everything was SolrCloud. This year not one talk description referenced the Apache elephant (though migration to the cloud was still referenced, albeit not in the topics). Probably not because big data has gone out of fashion, even though that point was kind of made, but rather because we have other ways of handling and managing it these days.

Don’t forget: shit in > shit out!

And of course, there was the mandatory share of how-we-handle-our-massive-data talks, most prominently presented by Slack, every developer's favourite tool. They showed a MapReduce offline indexing pipeline that not only enabled them to handle their 100 billion documents, but also gave them an environment that was quick on its feet and well suited for testing new things and experimenting – something an environment of that size usually blocks completely, due to re-indexing times, fear of bogging down your search machines and just general sluggishness.

Among all these super interesting technical solutions to our problems, it's really easy to forget that loads of time still has to be spent getting all that good data into our systems: doing the groundwork, building connectors and optimizing data analysis. It doesn't make for very good talks, though. At Findwise we usually do that using our i3 framework, which enables you to ingest, process, index and query your unstructured data.

I now look forward to doing the not-so-groundwork, drawing inspiration from the many interesting solutions presented here at Activate.

Thanks so much for this year!

Eventually, the presentations will appear on YouTube in Lucidworks' playlist for Activate18. If history is any guide, that might take a couple of weeks.

 

Author and event participant: Johan Persson Tingström, Findability Expert at Findwise

Tinkering with knowledge graphs

I don’t want to sail with this ship of fools, on the opulent data sea, where people are drowning without any sense-making knowledge shores in sight. You don’t see the edge before you drop!

Knowledge Engineering and neural networks

How do organisations reach a level playing field, where it is possible to create a sustainable learning organisation [cybernetics]?
(Enacted Knowledge Management practices and processes)

Sadly, in many cases, we face the tragedy of the commons!

There is an urgent need to iron out the social dilemmas and focus on motivational solutions that strive for cooperation and collective action, where knowledge is deciphered with the notion of intelligence and emerging utilities, and AI acts as an assistant to us humans. We the peoples!

To make a model of the world, to codify our knowledge and apply worldviews to complex data, is nothing new per se. A knowledge graph is in essence a constituted shared narrative within the collective imagination (i.e. an organisation), where facts about things and their inherited relationships and constraints define the model used to master the matrix. These concepts and topics are our means of communication to bridge between groups of people: shared nomenclatures and vocabularies.

Terminology Management

Knowledge Engineering in practice


At work – building a knowledge graph – there are some pillars that the architecture rests upon. First and foremost is the language we use every day to undertake our practices within an organisation: the corpus of concepts, topics and things that revolve around the overarching theme. No entity acts in a vacuum with no shared concepts. Humans coordinate work practices through shared narratives embedded into concepts and their translations from person to person. This communication might use different means, like cuneiform (in ancient Babel) or the digital tools of today. To curate, cultivate and nurture a good organisational vocabulary, we also need to develop practices and disciplines that to some extent resemble those of the ancient clay-tablet librarians: organising principles for the organising system (information systems, applications). This discipline could be called taxonomist (taxonomy manager), knowledge engineer or information architect.

Set the scope – no need to boil the ocean


All organisations, independent of business vertical, have known domain concepts that are defined by standards, code systems or open vocabularies. A good idea is obviously to first go foraging in the sea of terminologies, to link, re-hash/re-use and manage the domain. The second task in this scoping effort is to audit and map the internal terrain of content corpora. Information is scattered across a multitude of organising systems, but within these there are pockets of structure: here we find glossaries, controlled vocabularies, data models and the like. The taxonomist will then, together with subject matter experts, arrange governance principles and engage in conversations on how the outer and inner loops of concepts link, and start to build domain-specific taxonomies, preferably using the Simple Knowledge Organization System (SKOS) standard.
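As a minimal sketch of what such a domain taxonomy could look like in SKOS (the concepts and example.org URIs are made up for illustration), expressed in Turtle:

@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix ex:   <http://example.org/taxonomy/> .

ex:InsurancePolicy a skos:Concept ;
    skos:prefLabel "Insurance policy"@en ;
    skos:altLabel "Policy document"@en ;
    skos:narrower ex:TravelInsurancePolicy .

ex:TravelInsurancePolicy a skos:Concept ;
    skos:prefLabel "Travel insurance policy"@en ;
    skos:broader ex:InsurancePolicy .

The preferred and alternative labels capture the different words communities use for the same concept, while the broader/narrower links give the hierarchy that organising systems can navigate.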

Participatory Design from inception


Concepts and their resource descriptions will need to be evaluated and semantically enhanced with several different worldviews from all practices and disciplines within the organisation. Concepts might have different meanings: meaning is subjective, demographic, socio-political and complex, and it sometimes gets lost in translation (between different communities of practice).

The best approach to achieve a highly participatory design in the development of a sustainable model is to simply publish the concepts as open thesauri. A great example is the HealthDirect thesaurus. This service becomes a canonical reference that people are able to search, navigate and annotate.

It is smart to let people edit, refine and comment on (annotate) the concepts in the same manner as Wikipedia evolves, i.e. editing Wikidata entries. These annotations then feed back into the governance network for the terminologies.

Term Update

Link to organising systems

All models (taxonomies, vocabularies, ontologies etc.) should be interlinked with the existing base of organising systems (information systems [IS]) or platforms. Most IS's have schemas, in-built models and business rules to serve the applications of a specific use case. This also implies the use of concepts to define and describe the data in metadata, as reference data tables or as user experience controls. In all these Lego pieces within an IS or platform, there are opportunities to link the concepts to the shared narratives in the terminology service: linked enterprise data, building a web of meaning and opening up for a more interoperable information landscape.

One omnipresent quest is to set up a sound content model and design for e.g. Office 365, where content types, collections, resource descriptions and metadata have to be coordinated in back-end services such as the managed metadata service. Within these features and capacities, it is wise to integrate with the semantic layer (terminologies and graphs). Other highly relevant integrations relate to search-as-a-service, where the semantic layer co-acts in the pipeline steps: adding semantics, linking, auto-classifying and disambiguating with entity extraction. In the user experience journey, the semantic layer augments and connects things, which is, for instance, how the Microsoft Graph has been ingrained all through their platform. Search and semantics push the envelope 😉

Data integration and information mechanics

A decoupled information systems architecture using an enterprise service bus (messaging techniques) is by far the most used model. To enable sustainable data integration, there is a need for a data architecture and a clear integration design. Adjacent to the data integration are means for cleaning up data and harmonising data sets into a cohesive whole: extract, transform, load (ETL). Data governance is essential! In this ballpark we also find cues to master data management. Data and information have fluid properties, and the flow has to be seamless and smooth.

When defining the (asynchronous) message structure in information exchange protocols and packages, it is highly desirable to rely on standards and well-defined models (ontologies), such as HL7 FHIR within the healthcare and life science domain. These standards have domain models with entities, properties, relations and graphs. The data serialisation for data exchange might use XML or RDF (JSON-LD, Turtle etc.), and the value sets (namespaces) for properties can be linked to SKOS vocabularies with terms.
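As a small sketch (the example.org URIs are placeholders, not any real standard), a JSON-LD payload can carry a coded value that links straight into a SKOS vocabulary:

{
  "@context": {
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "code": { "@id": "http://example.org/vocab/code", "@type": "@id" }
  },
  "@id": "http://example.org/observation/42",
  "code": {
    "@id": "http://example.org/terms/hypertension",
    "skos:prefLabel": "Hypertension"
  }
}

The receiving system can then resolve the code against the terminology service instead of relying on free-text strings.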

Query the graph

Knowledge engineering is both about setting the useful terminologies into action and about loading, refining and developing ontologies (information models, data models). There are many very useful open ontologies that could or should be used and refined by the taxonomists, e.g. the ISA² Core Vocabularies. With data sets stored in a graph (triplestore), there are many ways to query the graph to get results and insights (links): either by using SPARQL (similar to SQL in schema-based systems), possibly combined with SHACL (constraints), or via RESTful APIs.
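As a hedged sketch against the made-up insurance taxonomy above, a SPARQL query that walks the hierarchy could look like this (skos:broader* is a SPARQL 1.1 property path that follows the broader relation any number of steps):

PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

SELECT ?concept ?label
WHERE {
  ?concept skos:broader* <http://example.org/taxonomy/InsurancePolicy> ;
           skos:prefLabel ?label .
}

This returns the concept itself and everything narrower than it, together with the preferred labels – the kind of traversal that is awkward to express in schema-based SQL systems.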

These means of querying the knowledge graph are one reason to add semantics to data integration, as described above.

Adding smartness and we are all done…

Semantic AI – the means to bridge between symbolic representation (semantics) and machine learning (ML), natural language processing (NLP) and deep learning – is where all things come together.

The work (knowledge engineering) to build the knowledge graph and govern it takes many manual steps, such as mapping models, standards and large corpora of terminologies. Here AI capabilities enable automation and continuous improvement with learning networks. Understanding human capacities and intelligence, unpacking the neurosciences (as Lars Leksell did), combined with neural networks, will be our road ahead towards safe and sustainable uses of AI.
Fredric Landqvist research blog

Reflection, part 2

Some time ago I wrote about the Reflection mechanism in the .NET Framework.

This time I will show you a use case where it's better NOT TO USE Reflection.

Introduction

In the previous post about Reflection I mentioned some doubts about using this mechanism, and one of them actually has its justification.

So, when is it better not to use Reflection, and why?

Consider a method that accepts some objects, where we want to access a certain property on those objects inside the method.

private void MyUniversalMethod(object obj)
{
    if (obj.GetType().GetProperty("MyPropertyName") is System.Reflection.PropertyInfo myProperty) //Check if our object actually has the property we're interested in.
    {
        var myPropertyValue = myProperty.GetValue(obj); //Get the property value on our object.
        myProperty.SetValue(obj, new object()); //Set the property value on our object.
    }
} 

Although, technically, there’s nothing wrong with this approach, it should be avoided in most cases, because it totally breaks the concept of strong typing.

How do we do it properly?

If we are in control of the classes we are using, we should always extract the property we want to access in such a method into an interface.

interface IMyInterface
{
    object MyProperty { get; set; }
} 

This way our method will look a lot simpler and, what's most important, the compiler upholds the code's integrity, so we don't have to worry whether the property exists or is accessible on our object, because the interface enforces it for us:

private void MyUniversalMethod(IMyInterface obj)
{
    var myPropertyValue = obj.MyProperty; //Get the property value on our object.
    obj.MyProperty = new object(); //Set the property value on our object.
} 

But, what if we have no control over the classes?

There are scenarios where we have to use someone else's code and adapt our code to the already existing one. And, what's worse, the property we are interested in is not defined in any interface, but there are several classes that contain such a property.

Even then, it is still recommended not to use Reflection.

Instead, we should filter the objects that come into our method down to the specific types that actually contain the property we are interested in.

private void MyUniversalMethod(object obj)
{
    if (obj is TheirClass theirClass) //Check if our object is of type that has the property we're interested in. If so, assign it to a temporary variable.
    {
        var theirPropertyValue = theirClass.TheirProperty; //Get the property value on our object.
        theirClass.TheirProperty = new object(); //Set the property value on our object.
    }
} 

There's an inconvenience in the example above: we have to specify all the types that might contain the property we are interested in and handle each of them separately. On the other hand, this protects us from cases where a property of the same name is of a different type in different classes – here we keep full control over what happens with a strongly typed property.
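With C# 7 pattern matching, the per-type handling can at least be gathered into one readable switch. A sketch, where TheirClassA and TheirClassB are hypothetical stand-ins for the third-party classes:

private void MyUniversalMethod(object obj)
{
    switch (obj)
    {
        case TheirClassA theirA: //First third-party type exposing the property.
            theirA.TheirProperty = new object();
            break;
        case TheirClassB theirB: //Second type, where the same-named property may even be of another type.
            theirB.TheirProperty = new object();
            break;
        default: //Not a type we know - do nothing instead of failing at runtime.
            break;
    }
}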

Then, what is Reflection good for?

Although I said it is not recommended in most cases, there are cases where the Reflection approach is the preferred way.

Consider a list of objects of various classes, where we want to display a name for each of them.

We create a method that retrieves the name for us to display:

private string GetName(object obj)
{
    var type = obj.GetType();
    return type.GetProperty("Name") //Try to get the "Name" property (GetProperty already returns PropertyInfo, so no cast is needed).
        ?.GetValue(obj) //Try to get the "Name" property value from the object.
        ?.ToString() //Get the string representation of the value; if it's a string, it just returns its value.
        ?? type.Name; //If the tries above fail, fall back to the type name.
}

The property “Name” is commonly used by many classes, though it's very rarely defined in an interface. We can also be almost certain that it will be a string. We can then just look for this property via Reflection and, if we don't find it, use the type name instead. This approach is commonly used in Windows Forms PropertyGrid collection editors.
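For example, feeding a mixed list through it (Customer is a hypothetical class with a Name property):

var items = new List<object> { new Customer { Name = "Contoso" }, new TimeSpan() }; //Customer is hypothetical here.
foreach (var item in items)
{
    Console.WriteLine(GetName(item)); //Prints "Contoso", then "TimeSpan" (the fallback to the type name).
}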

Use of dynamic keyword

Once we are certain we don't want to rely on strong typing, we can access the properties at runtime in an even simpler way: by using the dynamic keyword, which introduces the flexibility of duck typing.

private void MyUniversalMethod(dynamic obj)
{
    var theirPropertyValue = obj.TheirProperty; //Get the property value on our object.
    obj.TheirProperty = new object(); //Set the property value on our object.
} 

This is very useful in cases where we don't know the type of the object passed to the method at design time. It is also required by some interop interfaces.

But be careful what you pass to this method: if you try to access a member which doesn't exist or is inaccessible, you will get a RuntimeBinderException.

Note that all members you access on a dynamic object are also dynamic, and IntelliSense is disabled for them – you're on your own.
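If you cannot fully trust the objects being passed in, you can guard the dynamic access. A sketch:

private void MyGuardedMethod(dynamic obj)
{
    try
    {
        var theirPropertyValue = obj.TheirProperty; //Resolved at runtime, not at compile time.
    }
    catch (Microsoft.CSharp.RuntimeBinder.RuntimeBinderException)
    {
        //The object has no accessible TheirProperty member - handle it gracefully here.
    }
}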

Are the messages on the election posters just empty words?

It is impossible not to notice all the political conversations in Sweden now, less than two weeks before election day. During times like these, parties focus a lot of energy on getting their point across to the public. But how much is just slogans that sound good when printed on a poster, and how much is rooted in the everyday work of their organisation?

Are the words printed on the posters present on every street corner really the same as the ones being exchanged within the walls of the Swedish parliament building?

While ferociously staying away from the subject of who is right or wrong, let's see if there is a way to evaluate whether what they talk about in the parliament's everyday sessions is the same as what was printed in the manifestos released for the last two elections (2014 and 2018 respectively).

Benevolent & sustainable smart city development

The digitisation of society is emerging in all sectors, and the key driver behind it all is the abundance of data that needs to be brought into context and use.

Participation

When discussing digitisation, people commonly think of data highways and server farms as the infrastructure. Access to comprehensive information resources is increasingly becoming a commodity, enabling and enhancing societal living conditions. To achieve this, sense-making of data has to be an integral part of the digital infrastructure. Mapping this onto traditional patterns: digital roads need junctions, signs and semaphores to function, just like their physical counterparts.

The ambition with AI and the smart society and city should be the benefit of their inhabitants, but without a blueprint for a coherent model that works across all these utilities, it will all break. Beyond this, benevolence, participation and sustainability have to be the overarching themes, to contrast dystopian visions of citizen surveillance and fraudulent behaviour.

Data needs context to make sense and create value, and this frame of reference will be realised through domain models of the world, with shared vocabularies to disambiguate concepts – in short, a semantic layer. It is impossible to boil the ocean, which makes us lean towards a layered approach.

All complex systems (or complex adaptive systems, CAS) revolve around a set of autonomous agents, for example cells in a human body or citizens in an urban city. The emergent behaviour in a CAS is governed by self-organising principles. A city information architecture is by nature a CAS, and hence its design has to be resilient and coherent.

What infrastructural dimensions should a smart city design build upon?

  • Urban environment: the physical spaces, comprising geodata, the cadastre register (real estate), roads and other things in the landscape.
  • Movable objects: mobile sensing platforms capturing things like vehicles, traffic and more – in short, the dynamics of a city environment.
  • Human actor networks: the social and economic mobility, culture and community in the habitat.
  • Virtual urban systems: augmented and immersive platforms to model the present or envision future states of the city environment.

Each of these organising systems and categories holds many different types of data, but the data flows also intertwine. Many of the things described in the geospatial and urban environment domain might be enveloped in a set of building information models (BIM) and geographical information systems (GIS). The resource descriptions link the objects, moving from one building to a city block or area. Similar behaviour is found in the movable objects domain, because the agents moving around will by nature do so in the physical spaces. So when building information infrastructures, the design has to be able to cross boundaries, with linked models for all useful concepts. One way to express this is through a city information model (CIM).

When you add the human actor networks layer to your data, things become messy. In an urban system there are many organisations, and some of these act as public agencies serving the citizens through all life and business events. This socially knitted interaction model uses the urban environment and, in many cases, movable objects. The social life of information, when people work together, co-act and collaborate, becomes the shared content continuum.
Lastly, data from all the above-mentioned categories also feeds into the virtual urban systems, which either augment the perceived real city environment or feed the city information modelling used to create instrumental scenarios of future states of the complex system.

Everything is deeply intertwingled

Connect people and things using semantics and artificial intelligence (AI) companions. There will be no useful AI without a sustainable information architecture (IA). Interoperability on all levels is the prerequisite: systemic (technical and semantic) and organisational (process and climate).

Only when we follow the approach of integration and the use of a semantic layer to glue together all the different types and models – thereby linking heterogeneous information and data from several sources to solve the data variety problem – are we able to develop an interoperable and sustainable City Information Model (CIM).

Such a model cannot only be used inside one city or municipality – it should also be used to interlink and exchange data and information between cities, as well as between cities and provinces, regions and countries, throughout the societal digitalisation transformation.

A semantic layer completes the four-layered data & content architecture that most systems have in place:


Fig.: Four layered content & data architecture

Use standards (such as ISA²) and meld them into contextual schemas and models (ontologies), disambiguate concepts and link these with thesauri and taxonomies (e.g. SKOS). Start making sense, and let AI co-act as a companion (deep-learning AI) in the real and virtual smart city, applying semantic search technologies over various sources to provide new insights. Participation and engagement from all actor networks will be the default value chain, the drivers being new, cheaper and more efficient smart services – the building blocks of the city innovation platform.

The recorded webinar and also the slides presented

 

View Fredric Landqvist's LinkedIn profileFredric Landqvist research blog
View Peter Voisey's LinkedIn profilePeter Voisey
View Martin Kaltenböck's LinkedIn profileMartin Kaltenböck
View Sebastian Gabler's LinkedIn profileSebastian Gabler

Analytical power at your fingertips with natural language and modern visualisation

Today we are all getting used to interactive dashboards and plots in self-service business intelligence (BI) solutions, to drill down and slice our facts and figures. The market for BI tools has seen increased competition recently, with Microsoft Power BI challenging proven solutions such as Tableau, Qlik, IBM Cognos, SAP Lumira and others. At the same time, it is hard to benchmark the tools against each other, as they all come with very similar features. Has BI development saturated?

Compared to how we are used to consuming graphics and information, the BI approach to interactive analysis is somewhat different. For instance: a dashboard or report is typically presented in a printer-oriented flat layout on a white background, weeks of user training are typically needed before "self-service" can be reached, and interactions are heavily click-oriented – you can almost feel it in your mouse elbow when opening the BI frontend.

On the other hand, when surfing top internet sites and using social media, our interactions are centred around the search box and the natural interface of typing or speaking. Furthermore, there is typically no training needed to make use of Google, Facebook, LinkedIn, Pinterest, Twitter, etc. Through an intuitive interface, we learn along the way. And looking at graphics and visualization, we can learn a lot from the gaming industry, where players are presented with well-designed artwork – including statistics presented in an intuitive way to maximize the graphical impression.

Take a look at this live presentation to see what a visual analysis using natural language can look like.


Rethink your business analytics

It appears as if BI tools are sub-optimized for a limited scope and use case. To really drive digitalization and make use of our full information potential, we need a new way of thinking about business analytics. Not just continuous development – rather a revolution in the business intelligence approach. Remember: e-mail was not a consequence of the continuous development of post offices and mail handling. We need to rethink business analytics.

At Findwise, we see that the future for business analytics involves:

  • adding value by enriching information with new unstructured sources,
  • utilizing the full potential of visualization and graphics to explore our information,
  • using natural language to empower colleagues to draw their own conclusions intuitively and securely.

 

Enrich data

There is a lot of talk about data science today: how we can draw conclusions from our data and make predictions about the future. This power largely depends on the value of the data we possess. Enriching data is all about adding new value. The enrichment may include a multitude of sources, internal and external, for instance:

  • detailed customer transaction logs
  • weather history and forecasts
  • geospatial data (locations and maps)
  • user tracking and streams
  • social media and (fake) news

Compared with existing data, a new data source can be orthogonal to what we already have and add a completely new understanding. Business solutions of today are often limited to highly structured information sources or information providers. There is great power in unstructured, often untouched, information sources. However, it is not as straightforward as launching a data warehouse integration, since big data techniques are required to handle the volume, velocity and variety.

At Findwise, utilizing unstructured data has always been the key to developing unique solutions for search and analytics. The power of our solutions lies in incorporating multiple sources online and continuously enriching them with new aspects. For this we even developed our own framework, i3, with over a hundred connectors for unstructured data sources. A modern search engine (or insight engine) scales horizontally for big data applications and easily consumes billions of texts, logs, geospatial and other unstructured – as well as structured – data. This is where search meets analytics, and where all the enrichment takes place to add unique information value.

 

Visually explore

As human beings we have very strong visual and cognitive abilities, developed over millions of years to distinguish complex patterns and scenarios. Visualization of data is all about packaging information in such a way that we can utilize our cognitive skills to make sense out of the noise. Great visualization and interaction unleash the human power of perception and derivation. They allow us to make sense of the complex world around us.

When it comes to computer visualization, we have seen strong development in the use of graphical processors (GPUs) for games, and recently also for analytics – not least in deep learning, where powerful GPUs solve heavy computations. For visualisation, however, typical business intelligence tools today use only a minimal fraction of the total power of our modern devices. As a comparison: a typical computer game renders millions of pixels in 3D several times per second (even via the web browser). In a modern BI tool, however, we may struggle to display 20 000 distinct points in a plot.

There are open standards and interfaces to fully utilize the graphical power of a modern display. Computer games often build on OpenGL to interact with the GPU. In web browsers, similar performance can be reached with WebGL and JavaScript libraries. And this is not only about regular computers or installed applications: The Manhattan Population Explorer (built with JavaScript on D3.js and Mapbox GL JS) is a notable example of an interactive and visually appealing analysis application that runs very well on a regular smart phone.


Example from one of our prototypes: analysing the housing market – plotting 500 000 points interactively utilizing OpenGL.

Current analysis solutions and applications built with advanced graphical analysis are typically custom-made for a specific purpose and topic, as in the example above. This is very similar to how BI solutions were built before self-service BI came into play – specific solutions hand-crafted for a few use cases. In contrast, open graphical libraries, incorporated at the core of visualizations and with inspiration from gaming artwork, can spark a revolution in how we visually consume and utilize information.


Natural language empowers

The process of interpreting and working with speech and text is referred to as Natural Language Processing (NLP). NLP interfaces are moving towards becoming the default interface for interaction. For instance, Google's search engine can give you instant replies to questions such as "weather London tomorrow", and with Google Duplex (under development) NLP is used to automate phone calls, making appointments for you. Other examples include the search box popping up as a central feature on many larger web sites, and voice services such as Amazon Alexa, Microsoft Cortana, Apple Siri, etc.

When it comes to analysis tools, we have seen some movement in this direction lately. In Power BI Service (web), Cortana can be activated to allow for simple Q&A on your prepared reports. Tableau has started talking about NLP for data exploration with "research prototypes you might see in the not too distant future". The clearest example in this direction is probably ThoughtSpot, built with a search-driven analytics interface. Yet for most of the business analytics carried out today, clicking is still in focus, and clicking is what is taught in trainings. How can this be, when our other interactions with information move towards natural language interfaces? The key to moving forward is to give NLP and advanced visualization a vital role in our solutions, allowing for an entirely natural interface.

Initially it may appear hard to know exactly what to type to get the data right. Isn't training needed with an NLP interface too? This is where AI comes in, to help us interpret our requests and provide us with smart feedback. Looking at Google again, we continuously get recommendations, automatic spelling correction and lookup of synonyms to optimize our search and hits. With a modern NLP interface, we learn along the way as we use it. Frankly speaking, though, a natural language interface is best suited for common queries that aren't too advanced. For more advanced data munging and customized analysis, a data scientist's skill set and environment may well be needed. However, the power of e.g. scientific Python or the R language could easily be incorporated into an NLP interface, where query suggestions turn into code completion. Scripting is a core part of the data science workflow.

An analytical interface built around natural language helps direct focus and fine-tune your analysis, to arrive at intuitive facts and figures explaining relevant business questions. This is all about empowering all users, friends and colleagues to draw their own conclusions and spreading a data-driven mentality. Data science and machine learning techniques fit well into this concept, to leverage deeper insights.

 

Conclusion – Business data at everyone’s fingertips

We have highlighted the importance of enriching data with attention to unstructured data sources, demonstrated the importance of visual exploration in engaging our cognitive abilities, and finally shown how a natural language interface empowers colleagues to draw their own conclusions.

Compared with the current state of the art for analysis and business intelligence tools, we stand before a paradigm shift. Standardized self-service tools built on clicking, basic graphics and a focus on structured data will be overrun by a new way of thinking about analysis. We all want to create intuitive insights without thorough training on how to use a tool. And we all want our insights and findings to be visually appealing. Seeing is believing. To communicate our findings, conclusions and decisions, we need to show the why, convincingly. This is where advanced graphics and art will help us. Natural language is the interface we use for more and more services, and it can easily be powered by voice as well. With a natural interface, anyone can learn to utilize the analytical power in the information and draw conclusions. Business data at everyone's fingertips!

To experience our latest prototype, where we demonstrate the concepts of data enrichment, advanced visualization and natural language interfaces, take a look at this live presentation.

 

Author: Fredrik Moeschlin, senior Data Scientist at Findwise

Trials & Jubilations: the two sides of the GDPR coin

We have all heard about the totally unhip GDPR and the potential wave of fines and lawsuits. The long arm of the law and its stick have been noted. Less talked about, but infinitely more exciting, is the other side. Turn over the coin and there's a whole A-Z of organisational and employee carrots. How so?

Sign up for the joint webinar with Smartlogic & Findwise on the 18th of April, 3PM CET, to find out more.


Signal Tools

We all leave digital trails behind us, trails about us. Others that have access to these trails can use our data and information. The new European General Data Protection Regulation (GDPR) intends the usage of such Personally Identifiable Information (PII) to be correct and regulated, with the power to decide given to the individual.

Some organisations are wondering how on earth they can become GDPR compliant when they already have a business to run. But instead of a chore, setting a pathway to some more principled digital organisational housekeeping can bring big organisational gains sooner rather than later.

Many enterprises are now beginning to realise the extra potential gains of having introduced new organisational principles to become compliant. The initial fear of painful change soon subsides when better quality data comes along to make business life easier. With the further experience of new initiatives from new data analysis, NLP, deep learning and AI comes the feeling: why didn't we just do this sooner?

Most organisations have a system (or systems) in place holding PII data, even if getting the right data out in the right format remains problematic. The organisation of data for GDPR compliance is best achieved by transforming it to become part of a semantic data layer. With such a layer, knowing all the related data you hold on Joe Bloggs, across different sources, becomes so much easier when he asks for a copy of the data you have about him. Such a semantic data layer will also bring other far-reaching, organisation-wide benefits.


Semantic Data Layer

For example, heterogeneous data in different formats and from different sources can be unified for all sorts of new smart applications, new insights and new innovation that would previously have been unthinkable. Data can stay where it is… no need to change that relational database yet again because of a new type of data. The same information principles and technologies involved in keeping an eye on PII use can also be used to improve processes or efficiencies and to detect consumer behaviour or market changes.

But it's not just business operations that benefit: empowered employees become happier when they have the right information at hand to do their job. Something that is often difficult to achieve, as in many organisations no one area "owns" search, making it usually somebody else's problem to solve. For the Google-loving employee, not finding stuff at work to help them do their job can be downright frustrating. Well-ordered data (better still, in a semantic layer) can give them the empowering results page they need. It's easy to forget that Google only deals with the best structured and linked documentation – why shouldn't we do the same in our organisations?

Just as the combination of (previously heterogeneous) datasets can give us new insights for innovation, we also observe that innovation increasingly comes in the form of external collaboration. Such collaboration of course increases the potential GDPR risk through data sharing – Facebook being a very current case in point. This brings in the need for organisational policy covering data access, the use and handling of existing data, and any new (extra) data created through its use. Such policy should, for example, cover newly created personal data from statistical inference analysis.

While having a semantic layer may in fact make human error in data usage more possible through increased access, it also provides a better potential solution to prevent misuse, as metadata can be baked into the data to classify both information "sensitivity" and user access rights.

So how does one start?

The first step is to apply some organising principles to any digital domain, be it inside or outside the corporate walls [The Discipline of Organizing, Robert Glushko], and to ask the key questions:

  1. What is being organised?
  2. Why is it being organised?
  3. How much of it is being organised?
  4. When is it being organised?
  5. Where is it being organised?

Secondly, start small: apply organising principles by focusing on the low-hanging fruit – the already structured data within systems. The creation of quality data with added metadata in a semantic layer can have a magnetic effect within an organisation (build that semantic platform and they will come).

Step three: start being creative and agile.

A case story

A recent case within the insurance industry reveals some cues as to why this set of tools improves the signals and attention needed to become more compliant with regulations dealing with PII. Our client knew about a set of collections (file shares) where PII might be found. Adding search and NLP/ML opened up the Pandora's box, with visual analytic tools. The simple starting point is finding e.g. names or personal-number concepts in the text. Second to this is adding semantics, where industry-standard terminologies and ontologies can further help define the meaning of things.

In all corporate settings there exist well-cultivated and governed collections of information resources, but usually also a massive unmapped terrain of content collections, where no one has a clue whether PII might be hidden amongst it. The strategy of using a semantic data layer should always be combined with operations to narrow down the collections that become part of the signalling system – it is generally not a good idea to boil the whole data ocean in the enterprise information environment. Rather, through such work practices, workers become aware of the data hot-spots: the well-cultivated collections of information and the unmapped terrain. Having the additional notion of PII to contend with will make it just that bit easier to recognise the places where semantic enhancement is needed.


Running the same pipeline (with the option of further models to refine and improve certain data) will allow for the discovery not only of multiple occurrences of named entities (individuals), but also of the narrative and context in which they appear.
Having a targeted model & terminology for the insurance industry will only improve this semantic process further. It can certainly ease processes that are currently manual, or that don't exist because of their manual pain: for example, finding sensitive textual information in documents within applications or in online text chats. Developing such a smart information platform enables smarter linking of other things from the model, such as service packages, service units or organisational entities, spatial data as named places or timelines, or medical treatments – things you perhaps currently have less control over.

There’s not much time before the 25th May and the new GDPR, but we’ll still be here afterwards to help you with a compliance burden or a creative pathway, depending on your outlook.

Alternatively, sign up for the joint webinar with Smartlogic & Findwise on the 11th of April, 3PM CET, to find out more.

Fredric Landqvist research blog
Peter Voisey
James Morris

Major highlights from Elastic{ON} 2018 – Findwise reporting

Two Elastic fans have just returned from San Francisco and the Elastic{ON} 2018 conference. With almost 3,000 participants this year, Elastic{ON} is the biggest Elastic conference in the world.

Findwise regularly organises events and meetups, covering among other topics Elastic. Keep an eye out for an event close to you.

Here are some of the main highlights from Elastic{ON} 2018.

Let's start with the biggest announcement of them all: Elastic is opening up the source code of X-Pack. This means that you will now be able to access not only the Elastic stack source code, but also the subscription-based code of X-Pack that has been inaccessible up until now. This opens up the opportunity for you as a developer to contribute code back.


 

Data rollups are a great new feature for anyone who needs to look at old data but feels the storage costs are too high. With rollups, only predetermined metrics and terms are stored, still allowing you to analyze these dimensions of your data, though no longer to view the individual documents.
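As a sketch of what defining such a job could look like, based on the rollup API Elastic previewed for the 6.x line (the exact syntax may change, and the index, field and job names here are made up):

PUT _xpack/rollup/job/sensor_rollup
{
  "index_pattern": "sensor-*",
  "rollup_index": "sensor_rollup",
  "cron": "0 0 * * * ?",
  "page_size": 1000,
  "groups": {
    "date_histogram": { "field": "timestamp", "interval": "60m" },
    "terms": { "fields": ["node"] }
  },
  "metrics": [
    { "field": "temperature", "metrics": ["min", "max", "avg"] }
  ]
}

Only the hourly min/max/avg per node would be kept, which is what you can later aggregate on.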

Azure monitoring will be available in X-Pack Basic. In an upcoming 6.x release, Elastic will ship an Azure monitoring module, which will consist of a bundle of Kibana dashboards and make it really easy to get started exploring your Azure infrastructure. The monitoring module will be released as part of the X-Pack Basic tier – in other words, it will be free to use.

Forecasting was the big new thing in X-Pack's machine learning component. As the name suggests, the machine learning module can now not only spot anomalies in your data but also predict how it will change in the future.

Security in Kibana will get an update to make it work more like the security module in Elasticsearch. This also means that one of the most requested security features for Kibana will be delivered: giving users access to only some dashboards.

Dashboards are great and a fundamental part of Kibana, but sometimes you want to present your data in more dynamic ways, with less focus on data density. This is where Canvas comes in. Canvas is a new Kibana module for producing infographics rather than dashboards, but still using live data from Elasticsearch.

Monitoring of Kubernetes and Docker containers will be made a lot easier with the Elastic stack. A new infrastructure component will be created just for this growing use case, powered by data collected by Beats, which now also has auto-discovery functionality within Kubernetes. This will give an overview not only of your Kubernetes cluster but also of the individual containers within it.

Geo capabilities within Kibana will be extended to support multiple map layers. This will make it possible to do more kinds of visualizations on maps. Furthermore, work is being done on supporting not only geo points but also shapes.

One problem some have had with maps is that you need access to the Elastic Maps Service, and if you deploy the Elastic stack within a company network, this might not be reachable. To solve this, work is being done to make it possible to deploy the Elastic Maps Service locally.

Elastic acquired the SaaS solution Swiftype last year. Since then, Swiftype has been busy adding even more features to its portfolio. Currently, Swiftype comes in three different versions:

  • Swiftype Site Search – an out-of-the-box (OOTB) solution for website search
  • Swiftype Enterprise Search – currently in beta, focused on internal, cloud-based data sources (for now) like G Suite, Dropbox, O365, Zendesk etc.
  • Swiftype App Search – a set of APIs and developer tools that make it quick to build user-facing search applications

 

Elastic has also started looking at replacing the Zen protocol used to keep clusters in sync. Currently a PoC is underway to create a consensus algorithm that follows modern academic best practices, with the added benefit of removing the minimum master nodes setting, currently one of the most common pitfalls when running Elasticsearch in production.

ECE – Elastic Cloud Enterprise – is a big focus for Elastic and makes it possible for customers to set up a fully service-based search solution maintained by Elastic.

If you are interested in hearing more about Elastic or Findwise visit https://findwise.com/en/technology/elastic-elasticsearch


 

Writers: Mads Elbrond, regional manager Findwise Denmark & Torsten Landergren, senior expert consultant

XRANK in SharePoint Search REST API

I have been working with SharePoint Search for some time now. Since many clients need assistance with search optimization, KQL is one of my best mates. XRANK especially is a very powerful function that extends KQL's capabilities, but also adds to its complexity. Anyway, I feel quite sure about what we can achieve using KQL and how. However, last week a colleague of mine asked me about the proper syntax of XRANK in a REST search query… and I was like "emmm…".

There are many non-obvious questions – which characters need to be encoded? Is the syntax the same as in a common KQL query?

I did a quick documentation check as well as googling for an answer, but there were no satisfying results at all (and if there is no answer on Stack Overflow, the web contains no answer).

So this post is a clarification of XRANK syntax in REST API calls.

Use Search Query Tool

The old saying goes: "Do not break open doors". That's why I did not investigate the topic by myself, trying different REST queries against SP Search. Instead I used a great, great, great tool called Search Query Tool. It really makes your work with search easier and faster. You can build any kind of KQL query in it, and it will be translated into a REST query, since that is what the tool uses to communicate with SharePoint.

So, for instance, if you want to execute the following KQL query:

*  XRANK(cb=1) Position:Manager

Its REST equivalent will be:

<SearchEndpointURL>?querytext=’*+XRANK(cb%3d1)+Position:Manager’

As you can see, the syntax is the same as in a common KQL query. However, the '=' character has been encoded into URI format in order to be properly understood by the browser and the endpoint, and any spaces have been replaced by '+'.

Complex XRANK queries

Remember that in order to build complex XRANK queries you must use parentheses properly. For instance, if you want to apply multiple XRANK boosts, you need to arrange them in the following way:

(SearchQuery XRANK(cb=1) condition1) XRANK(cb=1) condition2

In other words, if you want to add boosting for position AND for date freshness your KQL will look like below:

(* XRANK(cb=1) Position:Manager) XRANK(cb=0.5) LastModifiedTime>{Today-30}

and your REST query text will be like following:

querytext='(*+XRANK(cb%3d1)+Position:Manager)+XRANK(cb%3d0.5)+LastModifiedTime>{Today-30}’

which gives you the following results:

  • results older than 30 days where the person's position does not contain "Manager" will get 0 extra ranking points
  • results modified less than 30 days ago where the person's position does not contain "Manager" will get 0.5 extra ranking points
  • results older than 30 days where the person's position does contain "Manager" will get 1 extra ranking point
  • results modified less than 30 days ago where the person's position does contain "Manager" will get 1.5 extra ranking points
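If you do not want to encode such queries by hand, a small C# helper can do it for you. A sketch – note that Uri.EscapeDataString may encode a few more characters (such as ':') than strictly necessary, which does no harm, since SharePoint URL-decodes the querytext parameter as a whole:

private static string ToRestQueryText(string kql)
{
    //Percent-encode reserved characters (e.g. '=' becomes %3D), then use '+' for spaces as in the examples above.
    return Uri.EscapeDataString(kql).Replace("%20", "+");
}

//Usage:
//var query = ToRestQueryText("(* XRANK(cb=1) Position:Manager) XRANK(cb=0.5) LastModifiedTime>{Today-30}");
//var url = searchEndpointUrl + "?querytext='" + query + "'";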

 

Hope this helps you in using XRANK and KQL in REST API queries.

 

Thanks & have a great day!

How to execute ANY SharePoint PowerShell command programmatically using C#

In one of my projects my team faced the following challenges:

  • How to add query rules programmatically using C#
  • How to update thesaurus programmatically using C#

I tried to find information in the official documentation, but it was not very helpful, and neither was googling.

PowerShell cmdlets to C# assembly mapping

In my team we were thinking about what to do in this situation, and one of my colleagues came up with a brilliant idea: he searched for the PowerShell cmdlet in the file explorer, with the search-in-file-contents option turned on.

The result? What he found was exactly what we were looking for.

In the location “C:\Program Files\Common Files\microsoft shared\Web Server Extensions\16\CONFIG\PowerShell\Registration” there is a file named OSSSearchCmdlets.xml.

It contains an XML structure with entries of the following form:

<ps:Cmdlet>

<ps:VerbName>Get-SPEnterpriseSearchCrawlContentSource</ps:VerbName>

<ps:ClassName>Microsoft.Office.Server.Search.Cmdlet.GetSearchCrawlContentSource</ps:ClassName>

<ps:HelpFile>Microsoft.Office.Server.Search.dll-help.xml</ps:HelpFile>

</ps:Cmdlet>

 

My eyes see this simply as:

<PowershellToAssemblyMapping>

<PowerShellCmdName>What-I-Have</PowerShellCmdName>

<C#NameAndLocation>What-I-Am-Looking-For</C#NameAndLocation>

<Whatever>Whatever.xml</Whatever>

</PowershellToAssemblyMapping>

Maps for Search, WSS and many more

The OSSSearchCmdlets.xml file contains the PowerShell cmdlet to .NET assembly mapping only for SharePoint Search.

But in the same location there is also another file, called WSSCmdlet.xml, that contains all kinds of cmdlet mappings, like:

  • Enable-SPFeature
  • New-SPContentDatabase
  • Get-SPFarm
  • Etc.

In short: everything that you can do with a SharePoint application using PowerShell.
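If you want to look the mapping up programmatically instead of eyeballing the XML, here is a quick sketch with LINQ to XML (matching on local names, so the ps: namespace does not get in the way):

using System;
using System.Linq;
using System.Xml.Linq;

class CmdletMappingLookup
{
    static void Main()
    {
        //Path and cmdlet name as given earlier in the post.
        var path = @"C:\Program Files\Common Files\microsoft shared\Web Server Extensions\16\CONFIG\PowerShell\Registration\OSSSearchCmdlets.xml";
        var cmdletName = "Get-SPEnterpriseSearchCrawlContentSource";

        //Find the <Cmdlet> entry whose <VerbName> matches, and read its <ClassName>.
        var className = XDocument.Load(path)
            .Descendants().Where(e => e.Name.LocalName == "Cmdlet")
            .Where(c => (string)c.Elements().FirstOrDefault(e => e.Name.LocalName == "VerbName") == cmdletName)
            .Select(c => (string)c.Elements().FirstOrDefault(e => e.Name.LocalName == "ClassName"))
            .FirstOrDefault();

        Console.WriteLine(className ?? "No mapping found"); //Prints the implementing .NET class name.
    }
}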

 

If you just want to quickly check what those files contain, I've uploaded them to my GitHub. I also put more files there, e.g. for Reporting Services, Workflows etc. You can check it here.

Did you find this tip useful? Maybe you know an alternative way? Share it in the comments!

Thanks & Have a great day! 🙂