Analytical power at your fingertips with natural language and modern visualisation

Today we are all getting used to interactive dashboards and plots in self-service business intelligence (BI) solutions to drill down and slice our facts and figures. The market for BI tools has seen increased competition recently, with Microsoft Power BI challenging proven solutions such as Tableau, Qlik, IBM Cognos, SAP Lumira and others. At the same time, it is hard to benchmark the tools against each other as they all come with very similar features. Has BI development reached saturation?

Compared to how we are used to consuming graphics and information, the BI approach to interactive analysis is somewhat different. For instance: a dashboard or report is typically presented in a printer-oriented flat layout on a white background, weeks of user training are typically needed before “self-service” is reached, and interactions are heavily click-oriented – you can almost feel it in your mouse elbow when opening the BI frontend.

On the other hand, when surfing top internet sites and utilizing social media, our interactions are centred around the search box and the natural interface of typing or speaking. Furthermore, there is typically no training needed to make use of Google, Facebook, LinkedIn, Pinterest, Twitter, etc. Through an intuitive interface we learn along the way. And looking at graphics and visualization, we can learn a lot from the gaming industry where players are presented with well-designed artwork – including statistics presented in an intuitive way to maximize the graphical impression.

Join our breakfast seminar in Stockholm on May 23rd to see what a visual analysis using natural language can look like.


Rethink your business analytics

It appears as if BI tools are sub-optimized for a limited scope and use case. To really drive digitalization and make use of our full information potential, we need a new way of thinking about business analytics. Not just continuous development, but rather a revolution in the business intelligence approach. Remember: e-mail was not a consequence of the continuous development of post offices and mail handling. We need to rethink business analytics.

At Findwise, we see that the future for business analytics involves:

  • adding value by enriching information with new unstructured sources,
  • utilizing the full potential of visualization and graphics to explore our information,
  • using natural language to empower colleagues to draw their own conclusions intuitively and securely.

 

Enrich data

There is a lot of talk about data science today; how we can draw conclusions from our data and make predictions about the future. This power largely depends on the value in the data we possess. Enriching data is all about adding new value. The enrichment may include a multitude of sources, internal and external, for instance:

  • detailed customer transaction logs
  • weather history and forecasts
  • geospatial data (locations and maps)
  • user tracking and streams
  • social media and (fake) news

Compared with existing data, a new data source can be orthogonal to what already exists and add a completely new understanding. Business solutions of today are often limited to highly structured information sources or information providers. There is great power in unstructured, often untouched, information sources. However, it is not as straightforward as launching a data warehouse integration, since big data techniques are required to handle the volume, velocity and variety.

At Findwise, utilizing unstructured data has always been key to developing unique solutions for search and analytics. The power of our solutions lies in incorporating multiple sources online and continuously enriching them with new aspects. For this we even developed our own framework, i3, with over a hundred connectors for unstructured data sources. A modern search engine (or insight engine) scales horizontally for big data applications and easily consumes billions of texts, logs, geospatial records and other unstructured – as well as structured – data. This is where search meets analytics, and where all the enrichment takes place to add unique information value.

 

Visually explore

As human beings we have very strong visual and cognitive abilities, developed over millions of years to distinguish complex patterns and scenarios. Visualization of data is all about packaging information in such a way that we can utilize our cognitive skills to make sense out of the noise. Great visualization and interaction unleash the human power of perception and derivation. They allow us to make sense of the complex world around us.

When it comes to computer visualization, we have seen strong development in the use of graphics processors (GPUs) for games, and recently also for analytics – not least in deep learning, where powerful GPUs handle heavy computations. For visualisation, however, typical business intelligence tools today use only a minimal fraction of the power of our modern devices. As a comparison: a typical computer game renders millions of pixels in 3D several times per second (even in the web browser). In a modern BI tool, however, we may struggle to display 20 000 distinct points in a plot.

There are open standards and interfaces to fully utilize the graphical power of a modern display. Computer games often build on OpenGL (https://www.opengl.org/) to interact with the GPU. In web browsers, similar performance can be reached with WebGL and JavaScript libraries. Nor is this limited to regular computers or installed applications: The Manhattan Population Explorer (built with JavaScript on D3.js and Mapbox GL JS) is a notable example of an interactive and visually appealing analysis application that runs well on a regular smartphone.


Example from one of our prototypes: analysing the housing market – plotting 500 000 points interactively utilizing OpenGL.

Current analysis solutions and applications built with advanced graphical analysis are typically custom-made for a specific purpose and topic, as in the example above. This is very similar to how BI solutions were built before self-service BI came into play – specific solutions hand-crafted for a few use cases. In contrast, open graphical libraries, incorporated at the core of visualizations and with inspiration from gaming artwork, can spark a revolution in how we visually consume and utilize information.


Natural language empowers

The process of interpreting and working with speech and text is referred to as Natural Language Processing (NLP). NLP interfaces are becoming the default way to interact. For instance, Google’s search engine can give you instant replies to questions such as “weather London tomorrow”, and with Google Duplex (under development) NLP is used to automate phone calls that make appointments for you. Other examples include the search box popping up as a central feature on many larger web sites and voice services such as Amazon Alexa, Microsoft Cortana, Apple Siri, etc.

When it comes to analysis tools, we have seen some movement in this direction lately. In Power BI Service (web), Cortana can be activated to allow simple Q&A on your prepared reports. Tableau has started talking about NLP for data exploration with “research prototypes you might see in the not too distant future”. The clearest example in this direction is probably ThoughtSpot (https://www.thoughtspot.com/), built around a search-driven analytics interface. Still, for most of the business analytics carried out today, clicking remains the focus and clicking is what is taught in training sessions. How can this be, when our other interactions with information are moving towards natural language interfaces? The key to moving forward is to give NLP and advanced visualization a vital role in our solutions, allowing for an entirely natural interface.

Initially it may appear hard to know exactly what to type to get the data right. Isn’t training needed also with an NLP interface? This is where AI comes in to help us interpret our requests and provide us with smart feedback. Having a look at Google again, we continuously get recommendations, automatic spelling correction and lookup of synonyms to optimize our search and hits. With a modern NLP interface, we learn along the way as we utilize it. Frankly speaking though, a natural language interface is best suited for common queries that aren’t too advanced. For more advanced data munging and customized analysis, a data scientist skillset and environment may well be needed. However, the power of e.g. Scientific Python or the R language could easily be incorporated into an NLP interface, where query suggestions turn into code completion. Scripting is a core part of the data science workflow.

An analytical interface built around natural language helps direct focus and fine-tunes your analysis to arrive at intuitive facts and figures, explaining relevant business questions. This is all about empowering all users, friends and colleagues to draw their own conclusions and spread a data-driven mentality. Data science and machine learning techniques fit well into this concept to leverage deeper insights.

 

Conclusion – Business data at everyone’s fingertips

We have highlighted the importance of enriching data with attention to unstructured data sources, demonstrated the importance of visual exploration to engage our cognitive abilities, and finally shown how a natural language interface empowers colleagues to draw their own conclusions.

Compared with the current state of the art for analysis and business intelligence tools, we stand before a paradigm shift. Standardized self-service tools built on clicking, basic graphics and a focus on structured data will be overrun by a new way of thinking about analysis. We all want to create intuitive insights without thorough training on how to use a tool. And we all want our insights and findings to be visually appealing. Seeing is believing. To communicate our findings, conclusions and decisions we need to show the why – convincingly. This is where advanced graphics and art will help us. Natural language is the interface we use for more and more services, and it can easily be powered by voice as well. With a natural interface, anyone can learn to utilize the analytical power in the information and draw conclusions. Business data at everyone’s fingertips!

To experience our latest prototype where we demonstrate the concept of data enrichment, advanced visualization and natural language interfaces, come join our breakfast event in Stockholm next week.

 

Author: Fredrik Moeschlin, senior Data Scientist at Findwise

Trials & Jubilations: the two sides of the GDPR coin

We have all heard about the totally unhip GDPR and the potential wave of fines and lawsuits. The long arm of the law and its stick have been noted. Less talked about but infinitely more exciting is the other side. Turn over the coin and there’s a whole A–Z of organisational and employee carrots. How so?

Sign up for the joint webinar with Smartlogic & Findwise on the 18th of April at 3PM CET to find out more.


Signal Tools

We all leave digital trails behind us, trails about us. Others with access to these trails can use our data and information. The new European General Data Protection Regulation (GDPR) intends the usage of such Personally Identifiable Information (PII) to be correct and regulated, with the power to decide given to the individual.

Some organisations are wondering how on earth they can become GDPR compliant when they already have a business to run. But instead of a chore, setting a pathway to allow for some more principled digital organisational housekeeping can bring big organisational gains sooner rather than later.

Many enterprises are now beginning to realise the extra potential gains of having introduced new organisational principles to become compliant. The initial fear of painful change soon subsides when better quality data comes along to make business life easier. With the further experience of new initiatives in data analysis, NLP, deep learning and AI comes the feeling: why didn’t we just do this sooner?

Most organisations have systems in place holding PII data, even if getting the right data out in the right format remains problematic. Organising data for GDPR compliance is best achieved by transforming it into part of a semantic data layer. With such a layer, knowing all the related data you hold on Joe Bloggs across different sources becomes much easier when he asks for a copy of the data you have about him. Such a semantic data layer will also bring other far-reaching and organisation-wide benefits.


Semantic Data Layer

For example, heterogeneous data in different formats and from different sources can become unified for all sorts of new smart applications, new insights and new innovation that would have been previously unthinkable. Data can stay where it is… no need to change that relational database yet again because of a new type of data. The same information principles and technologies involved in keeping an eye on PII use, can also be used to improve processes or efficiencies and detect consumer behaviour or market changes.

But it’s not just business operations that benefit: empowered employees become happier when they have the right information at hand to do their job. That is often difficult to achieve, as in many organisations no one area “owns” search, making it usually somebody else’s problem to solve. For the Google-loving employee, not finding the things they need at work can be downright frustrating. Well-ordered data (better still, in a semantic layer) can give them the empowering results page they need. It’s easy to forget that Google only deals with the best structured and linked documentation – why shouldn’t we do the same in our organisations?

Just as the combination of (previously heterogeneous) datasets can give us new insights for innovation, we also observe that innovation increasingly comes in the form of external collaboration. Such collaboration of course increases the potential GDPR risk through data sharing, Facebook being a very current case in point. This brings in the need for organisational policy covering data access, the use and handling of existing data and any new (extra) data created through its use. Such policy should, for example, cover newly created personal data from statistical inference analysis.

While a semantic layer may in fact make human error in data usage more likely through increased access, it also provides a better means of preventing misuse, as metadata can be baked into the data to classify information “sensitivity” and control user access rights.

So how does one start?

The first step is to apply some organising principles to any digital domain, be it inside or outside the corporate walls [The Discipline of Organizing, Robert Glushko], and to ask the key questions:

  1. What is being organised?
  2. Why is it being organised?
  3. How much of it is being organised?
  4. When is it being organised?
  5. Where is it being organised?

Secondly, start small: apply organising principles by focusing on the low-hanging fruit – the already structured data within systems. The creation of quality data with added metadata in a semantic layer can have a magnetic effect within an organisation (build that semantic platform and they will come).

Step three: start being creative and agile.

A case story

A recent case within the insurance industry reveals some cues as to why this set of tools improves signals and attention when becoming more compliant with regulations dealing with PII. Our client knew about a set of collections (file shares) where PII might be found. Adding search, NLP/ML and visual analytics tools opened up Pandora’s box. The simple starting point is finding, for example, names or personal identity numbers in the text. The second step is to add semantics, where industry-standard terminologies and ontologies can further help define the meaning of things.
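As a purely illustrative sketch (not the client’s actual pipeline), the starting point of flagging candidate Swedish personal identity numbers in free text could look something like the C# below; a real solution combines NLP/ML entity extraction with the terminologies and ontologies mentioned here.

using System;
using System.Text.RegularExpressions;

class PiiSignalSketch
{
    // Very rough pattern for Swedish personal identity numbers (YYMMDD-XXXX or YYYYMMDD-XXXX).
    // A real pipeline would also validate dates and checksums and use ML-based entity extraction.
    static readonly Regex Personnummer = new Regex(@"\b(\d{6}|\d{8})[-+]?\d{4}\b", RegexOptions.Compiled);

    static void Main()
    {
        string text = "Claim filed by Anna, personnummer 850709-1234, regarding policy QX-55.";

        foreach (Match m in Personnummer.Matches(text))
            Console.WriteLine($"Possible PII at index {m.Index}: {m.Value}");
    }
}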

In all corporate settings there exist well-cultivated and governed collections of information resources, but usually also a massive unmapped terrain of content collections where no one has a clue whether PII might be hidden amongst it. The semantic data layer strategy should always be combined with operations to narrow down the collections that become part of the signalling system – it is generally not a good idea to try to boil the whole data ocean in the enterprise information environment. Rather, through such work practices, workers become aware of the data hot-spots: the well-cultivated collections of information and that unmapped terrain. Having the additional notion of PII to contend with makes it that bit easier to recognise the places where semantic enhancement is needed.


Running the same pipeline (with the option of further models to refine and improve certain data) allows not only for the discovery of multiple occurrences of named entities (individuals) but also of the narrative and context in which they appear.
Having a targeted model and terminology for the insurance industry will only improve this semantic process further. It can certainly ease processes that are currently manual, or that don’t exist because of their manual pain: for example, finding sensitive textual information in documents within applications or in online chats. Developing such a smart information platform also enables smarter linking of other things from the model – such as service packages, service units or organisational entities, spatial data such as named places or timelines, or medical treatments – things you perhaps currently have less control over.

There’s not much time before the 25th May and the new GDPR, but we’ll still be here afterwards to help you with a compliance burden or a creative pathway, depending on your outlook.

Alternatively, sign up for the joint webinar with Smartlogic & Findwise on the 11th of April at 3PM CET to find out more.

Authors: Fredric Landqvist, Peter Voisey and James Morris

Major highlights from Elastic{ON} 2018 – Findwise reporting

Two Elastic fans have just returned from San Francisco and the Elastic{ON} 2018 conference. With almost 3,000 participants this year, Elastic{ON} is the biggest Elastic conference in the world.

Findwise regularly organises events and meetups covering, among other topics, Elastic. Keep an eye out for an event close to you.

Here are some of the main highlights from Elastic{ON} 2018.

Let’s start with the biggest announcement of them all: Elastic is opening the source code of X-Pack. This means that you will now be able to access not only the Elastic Stack source code but also the subscription-based X-Pack code that has been inaccessible up until now. This opens up the opportunity for you as a developer to contribute code back.


 

Data rollups are a great new feature for anyone who needs to look at old data but feels the storage costs are too high. With rollups, only predetermined metrics and terms are stored, still allowing you to analyze these dimensions of your data but no longer letting you view the individual documents.

Azure monitoring available in X-Pack Basic. Elastic will, in an upcoming 6.x release, ship an Azure Monitoring Module consisting of a bundle of Kibana dashboards that makes it really easy to get started exploring your Azure infrastructure. The module will be released as part of the X-Pack Basic version – in other words, it will be free to use.

Forecasting was the big new thing in X-Pack’s machine learning component. As the name suggests, the machine learning module can now not only spot anomalies in your data but also predict how it will change in the future.

Security in Kibana will get an update to make it work more like the Security module in Elasticsearch. This also means that one of the most requested security features for Kibana will be delivered: giving users access to only some dashboards.

Dashboards are great and a fundamental part of Kibana, but sometimes you want to present your data in more dynamic ways with less focus on data density. This is where Canvas comes in: a new Kibana module for producing infographics rather than dashboards, still using live data from Elasticsearch.

Monitoring of Kubernetes and Docker containers will be made a lot easier with the Elastic Stack. A new infra component will be created just for this growing use case. This component will be powered by data collected by Beats, which now also has auto-discovery functionality within Kubernetes. This gives an overview of not only your Kubernetes cluster but also the individual containers within it.

Geo capabilities within Kibana will be extended to support multiple map layers. This will make it possible to do more kinds of visualizations on maps. Furthermore, work is being done on supporting not only Geo points but also shapes.

One problem some have had with maps is that you need access to the Elastic Maps Service, and if you deploy the Elastic Stack within a company network this might not be reachable. To solve this, work is being done to make it possible to deploy the Elastic Maps Service locally.

Elastic acquired the SaaS solution Swiftype last year. Since then, Swiftype has been busy adding even more features to its portfolio. Currently, Swiftype comes in three different versions:

  • Swiftype Site Search – An out-of-the-box (OOTB) solution for website search
  • Swiftype Enterprise Search – Currently in beta, focused (for now) on internal, cloud-based data sources like G Suite, Dropbox, O365, Zendesk etc.
  • Swiftype App Search – A set of APIs and developer tools that make it quick to build user-facing search applications

 

Elastic has also started to look at replacing the Zen protocol used to keep clusters in sync. Currently a PoC is being built to try to create a consensus algorithm that follows modern academic best practices, with the added benefit of removing the minimum master nodes setting – currently one of the most common pitfalls when running Elasticsearch in production.

ECE – Elastic Cloud Enterprise – is a big focus for Elastic and makes it possible for customers to set up a fully service-based search solution maintained by Elastic.

If you are interested in hearing more about Elastic or Findwise visit https://findwise.com/en/technology/elastic-elasticsearch


 

Writers: Mads Elbrond, regional manager Findwise Denmark & Torsten Landergren, senior expert consultant

XRANK in SharePoint Search REST API

I have worked with SharePoint Search for some time now. Since many clients need assistance with search optimization, KQL is one of my best mates. XRANK in particular is a very powerful function that extends KQL’s capabilities but also adds to its complexity. Anyway, I feel quite sure about what we can achieve using KQL and how. However, last week a colleague of mine asked me about the proper syntax of XRANK in a REST search query… and I was like “emmm…”.

There are many non-obvious questions – which characters need to be encoded? Is the syntax the same as in a common KQL query?

I did a quick documentation check as well as googling for an answer, but there were no satisfying results at all (if there is no answer on Stack Overflow, the web contains no answer).

So this post clarifies the XRANK syntax in REST API calls.

Use Search Query Tool

The old saying goes: “Do not break open doors.” That’s why I did not investigate the topic myself by trying different REST queries against SharePoint Search. Instead I used a great tool called Search Query Tool. It really makes your work with search easier and faster. You can build any kind of KQL query in it and it will be translated to a REST query, because that is what the tool uses to communicate with SharePoint.

So, for instance, if you want to execute the following KQL query

*  XRANK(cb=1) Position:Manager

Its REST equivalent will be:

<SearchEndpointURL>?querytext=’*+XRANK(cb%3d1)+Position:Manager’

As you can see, the syntax is the same as in a common KQL query; however, the ‘=’ character has been encoded in URI format so that it is properly understood by the browser and endpoint, and any spaces have been replaced by “+”.

Complex XRANK queries

Remember that in order to build complex XRANK queries you must use parentheses properly. For instance, if you want to apply multiple XRANK boosts, you need to arrange them in the following way:

(SearchQuery XRANK(cb=1) condition1) XRANK(cb=1) condition2

In other words, if you want to add boosting for position AND for date freshness your KQL will look like below:

(* XRANK(cb=1) Position:Manager) XRANK(cb=0.5) LastModifiedTime>{Today-30}

and your REST query text will be like following:

querytext='(*+XRANK(cb%3d1)+Position:Manager)+XRANK(cb%3d0.5)+LastModifiedTime>{Today-30}’

which gives you the following results:

  • results older than 30 days where the person’s position does not contain “Manager” get 0 ranking points
  • results modified less than 30 days ago where the position does not contain “Manager” get 0.5 ranking points
  • results older than 30 days where the position does contain “Manager” get 1 ranking point
  • results modified less than 30 days ago where the position does contain “Manager” get 1.5 ranking points
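If you would rather issue such a query from code than from the Search Query Tool, a minimal C# sketch could look like the one below. The tenant URL is a placeholder and authentication is omitted; Uri.EscapeDataString handles the ‘=’ and space encoding discussed above.

using System;
using System.Net.Http;
using System.Threading.Tasks;

class XrankRestClient
{
    static async Task Main()
    {
        // The same boosted query as above; the site URL below is only a placeholder.
        string kql = "(* XRANK(cb=1) Position:Manager) XRANK(cb=0.5) LastModifiedTime>{Today-30}";
        string url = "https://yourtenant.sharepoint.com/_api/search/query?querytext='"
                     + Uri.EscapeDataString(kql) + "'";

        using var client = new HttpClient();
        client.DefaultRequestHeaders.Add("Accept", "application/json;odata=verbose");

        // Authentication (cookies, OAuth bearer token, ...) is omitted for brevity.
        HttpResponseMessage response = await client.GetAsync(url);
        Console.WriteLine(await response.Content.ReadAsStringAsync());
    }
}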

 

I hope this helps you use XRANK and KQL in REST API queries.

 

Thanks & have a great day!

How to execute ANY SharePoint PowerShell command programmatically using C#

In one of my projects my team faced the following challenges:

  • How to add query rules programmatically using C#
  • How to update thesaurus programmatically using C#

I tried to find information in the official documentation, but it was not very helpful, and neither was googling.

PowerShell cmdlets to C# assembly mapping

In my team we were thinking about what to do in this situation, and one of my colleagues came up with a brilliant idea – he searched for the PowerShell cmdlet name in File Explorer with the “search in file contents” option turned on.

Result? What he found was exactly what we were looking for.

In the location “C:\Program Files\Common Files\microsoft shared\Web Server Extensions\16\CONFIG\PowerShell\Registration” there is a file named OSSSearchCmdlets.xml.

It contains XML entries with the following structure:

<ps:Cmdlet>
  <ps:VerbName>Get-SPEnterpriseSearchCrawlContentSource</ps:VerbName>
  <ps:ClassName>Microsoft.Office.Server.Search.Cmdlet.GetSearchCrawlContentSource</ps:ClassName>
  <ps:HelpFile>Microsoft.Office.Server.Search.dll-help.xml</ps:HelpFile>
</ps:Cmdlet>

 

My eyes see this just as below:

<PowershellToAssemblyMapping>
  <PowerShellCmdName>What-I-Have</PowerShellCmdName>
  <C#NameAndLocation>What-I-Am-Looking-For</C#NameAndLocation>
  <Whatever>Whatever.xml</Whatever>
</PowershellToAssemblyMapping>

Maps for Search, WSS and many more

The OSSSearchCmdlets.xml file contains PowerShell cmdlet to .NET assembly mappings only for SharePoint Search.

But in the same location there is also another file called WSSCmdlet.xml that contains all kinds of cmdlet mappings, like

  • Enable-SPFeature
  • New-SPContentDatabase
  • Get-SPFarm
  • Etc.

In short, everything that you can do with a SharePoint application using PowerShell.

 

If you just want to quickly check what those files contain, I’ve uploaded them to my GitHub. I’ve also put more files there, e.g. for Reporting Services, Workflows etc. You can check it here.
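As a small illustration, the mapping files can also be read programmatically. The sketch below (the path and the “ps” namespace prefix are assumptions based on the excerpt above) lists which .NET class implements each cmdlet:

using System;
using System.Linq;
using System.Xml.Linq;

class CmdletMappingLookup
{
    static void Main()
    {
        // Path as found on a SharePoint 2016 server; adjust the hive number for other versions.
        string path = @"C:\Program Files\Common Files\microsoft shared\Web Server Extensions\16\CONFIG\PowerShell\Registration\OSSSearchCmdlets.xml";

        XDocument doc = XDocument.Load(path);

        // Resolve the "ps" prefix from the document root; fall back to the default namespace.
        XNamespace ps = doc.Root.GetNamespaceOfPrefix("ps") ?? doc.Root.GetDefaultNamespace();

        var mappings = doc.Descendants(ps + "Cmdlet")
            .Select(c => new
            {
                Cmdlet = (string)c.Element(ps + "VerbName"),
                Class = (string)c.Element(ps + "ClassName")
            });

        foreach (var m in mappings)
            Console.WriteLine($"{m.Cmdlet} -> {m.Class}");
    }
}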

Have you found this tip useful? Maybe you know an alternative way? Share it in the comments!

Thanks & Have a great day! 🙂

Reflection… is like violence

“If it doesn’t solve your problems, you are probably not using enough of it”

Introduction

What is Reflection?

You can think of your class like it is your body. The code contained in it doesn’t “know” anything about its contents (unless it’s statically referenced, but I wouldn’t consider it “knowing about”), just like you don’t know what is inside your body.

The Reflection is then like an encyclopedia of a human body that contains all the information about it. You can use it to discover all the information about all your body parts.

So, what is that Reflection in computer programming?

The Reflection is a mechanism that allows you to discover the information about what your class is made of.

You can enumerate at runtime through properties, methods, events and fields, no matter whether they are static (Shared in Visual Basic) or instance, public or non-public.

But it is not limited to discovering information about your own classes. You can actually get information about all the classes currently loaded into your application domain.

This means, Reflection gives you access to the places you can’t get to in a “normal” way.

The purpose of using reflection

Some say that there’s no point in getting access to class members through Reflection because:

  • Public members are accessible in the “normal” way, so using Reflection just breaks the strong typing – the compiler will no longer tell you if you are doing something wrong;
  • Accessing non-public members is just hacking.

They may be right, in both cases, but

  • Sometimes it’s way more convenient to let the program discover members at runtime so we don’t need to access them explicitly one by one; it’s more refactoring-proof; and sometimes it’s the only way to implement a well-known design pattern;
  • And sometimes, hacking is the only reasonable way of achieving a goal.

Still in doubt? Let’s take a look at some examples below that describe when using Reflection pays off and when it’s more of a bad practice…

Use-case 1: Enumerate your own members

One of the most popular use cases for Reflection is when you have to serialize your class and send the prepared data over the net, e.g. in JSON format. A serializer (a class containing one essential method responsible for turning your class object into data ready to send, and probably a second method to restore the data back into a class object) does nothing more than enumerate through all the properties of the class, read their values from the object and format them into the desired output (such as JSON).

Static typing approach – specify properties explicitly when assembling the output string

Now imagine you have to serialize a class without using the reflection. Let’s take a look at the example below:

Suppose we have a model class

public class SampleClass
{
    public int Id { get; set; }
    public string Name { get; set; }
}

And we want to serialize an instance of this class to a comma separated string.

Sounds simple, so let’s do it. We create a serializer class:

public static class SampleClassSerializer
{
    public static string Serialize(SampleClass obj)
    {
        return $"{obj.Id},{obj.Name}";
    }
} 

Looks simple and straightforward.

But what if we need to add a property to our class? What if we have to serialize another class? All these operations require changing the serializer class, which involves additional frustrating work each time something changes in the model class.

Reflection approach – enumerate properties automatically when assembling the output string

How to do it properly, then?

We create a serializer method that seems a bit more complicated:

 

//Make the method generic so object of any class can be passed.
//Optionally constrain it so only object of a class specified in a constraint,
//a class derived from a class specified in a constraint
//or a class or an interface implementing an interface specified in a constraint can be passed.
public static string Serialize<T>(T obj) where T : SampleClass
{
    //Get System.Type that the passed object is of.
    //An object of System.Type contains all the information provided by the Reflection mechanism we need to serialize an object.
    System.Type type = obj.GetType();

    //Get an IEnumerable of System.Reflection.PropertyInfo containing a collection of information about every discovered property.
    IEnumerable<System.Reflection.PropertyInfo> properties = type.GetProperties()
    //Filter properties to ones that are not decorated with Browsable(false) attribute
    //in case we want to prevent them from being discovered by the serializer.
        .Where(property => property.GetCustomAttributes(false).OfType<BrowsableAttribute>().FirstOrDefault()?.Browsable ?? true);

    return string.Join(",", //Format a comma separated string.
        properties.Select(property => //Enumerate through discovered properties.
        {
            //Get the value of the passed object's property.
            //At this point we don't know the type of the property but we don't care about it,
            //because all we want to do is to get its value and format it to string.
            object value = property.GetValue(obj);

            //Return value formatted to string. If a custom formatting is required,
            //in this simple example the type of the property should override the Object.ToString() method.
            return value.ToString();
        }));
} 

The example above shows that adding a few simple instructions to the serializer means we can virtually forget about it. Now we can add, delete or change any of the properties, or create new classes, and the Reflection used in the serializer will still discover all the properties we need.

With this approach all the changes we make to the class are automatically reflected by the serializer and the way of changing its behavior for certain properties is to decorate them with appropriate attributes.

For example, we add a property to our model class that we don’t want to be serialized. We can use the BrowsableAttribute to achieve that. Now our class will look like this:

public class SampleClass
{
    public int Id { get; set; }

    public string Name { get; set; }

    [Browsable(false)]
    public object Tag { get; set; }
} 

Note that after the GetProperties method in the serializer we use the Where method to filter out properties that have the Browsable attribute specified and set to false. This prevents the newly added property Tag (with [Browsable(false)]) from being serialized.
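For completeness, here is a quick usage sketch, assuming the generic Serialize<T> method shown above is in scope:

var obj = new SampleClass { Id = 1, Name = "Test", Tag = new object() };

// Tag is decorated with [Browsable(false)], so the serializer skips it.
string csv = SampleClassSerializer.Serialize(obj);

Console.WriteLine(csv); // prints "1,Test"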

Use-case 2: Enumerate class’ methods

This is an academic example. Suppose we are taking a test at a university where our objective is to create a simple calculator that processes two decimal numbers. We’ve been told that to pass the test, the calculator must perform addition and subtraction operations.

So we create the following Form:

There are two rows, each using a different approach to the problem. The upper row uses the common “academic” approach, which involves the switch statement. The lower row uses Reflection.

The upper row consists of the following controls (from left to right): numericUpDown1, comboBox1, numericUpDown2, button1, label1

The lower row consists of the following controls (from left to right): numericUpDown3, comboBox2, numericUpDown4, button2, label2

In a real case we would create only one row.

Static typing approach – use the switch statement to decide which method to use based on user input

Next, to do it properly, we create a class containing some static arithmetic methods that will do the math for us:

public class Calc
{
    public static decimal Add(decimal x, decimal y)
    {
        return x + y;
    }

    public static decimal Subtract(decimal x, decimal y)
    {
        return x - y;
    }
} 

And then we leverage our newly created class. Since it’s an academic test, we don’t want to waste precious time thinking about how to do it properly, so we choose the “switch” approach:

private void button1_Click(object sender, EventArgs e)
{
    switch (comboBox1.SelectedItem)
    {
        case "+":
            label1.Text = Calc.Add(numericUpDown1.Value, numericUpDown2.Value).ToString();
            break;

        case "-":
            label1.Text = Calc.Subtract(numericUpDown1.Value, numericUpDown2.Value).ToString();
            break;

        default:
            break;
    }
} 

Plus WinForms Designer generated piece of code that populates the operation combobox:

this.comboBox1.Items.AddRange(new object[] 
{
    "+",
    "-"
}); 

The code is pretty simple and self-explanatory.

But then the lecturer says that if we want an A grade, the calculator must perform multiplication and division operations as well; otherwise we will get a B grade at most.

An additional hazard (by hazard I mean a potential threat that may, in some circumstances, lead to an unhandled exception or other unexpected behavior) in this approach is that we are using “magic constants”, the “+” and “-” strings. Note that for each arithmetic operation one instance of the string is generated by the designer and another instance is used in the case statement.

If for any reason we decide to change one instance, the other one will remain unchanged, and there’s no chance that IntelliSense will catch such a misspelling – for the compiler everything is fine.

But, apart from that, let’s hope the lecturer likes the “+” and “-” strings; we need to implement two additional arithmetic operations in order to get our scholarship.

So, we implement two additional methods in our little arithmetic class:

public static decimal Multiply(decimal x, decimal y)
{
    return x * y;
}

public static decimal Divide(decimal x, decimal y)
{
    return x / y; //Since we are using decimals instead of doubles, this can produce divide by zero exception, but error handling is out of scope of this test.
} 

But obviously that’s not enough. We have to add two more cases to our switch statement:

case "×":
    label1.Text = Calc.Multiply(numericUpDown1.Value, numericUpDown2.Value).ToString();
    break;

case "÷":
    label1.Text = Calc.Divide(numericUpDown1.Value, numericUpDown2.Value).ToString();
    break; 

and two more symbols to the combobox Items collection, so the WinForms Designer-generated piece of code looks like this:

this.comboBox1.Items.AddRange(new object[] 
{
    "+",
    "-",
    "×",
    "÷"
}); 

We must also remember to change the method names used if we copy/paste an existing piece of code – it can be a real hell if we have to add multiple methods or if there are multiple usages.

Reflection approach – enumerate class methods and let user choose one of them

So, how can we do it better?

Instead of using the evil switch statement, we use Reflection.

First, we discover our arithmetic methods and populate the combobox with them – literally, the combobox items are the System.Reflection.MethodInfo objects, which can be directly invoked. This can be done in the form constructor or in the form’s Load event handler:

private void Form1_Load(object sender, EventArgs e)
{
    //Populate the combobox.
    comboBox2.DataSource = typeof(Calc).GetMethods(BindingFlags.Static | BindingFlags.Public).ToList();

    //Set some friendly display name for combobox items.
    comboBox2.DisplayMember = nameof(MethodInfo.Name);
} 

Then in the “=” button Click event handler we just have to retrieve the System.Reflection.MethodInfo from the selected combobox item and simply call its Invoke method, which will directly invoke the desired method:

private void button2_Click(object sender, EventArgs e)
{
    label2.Text =                              //Set the result directly in the label's Text property.
        ((MethodInfo)comboBox2.SelectedValue)? //Direct cast the selected value to System.Reflection.MethodInfo,
                                               //we have the control over what is in the combobox, so we are not expecting any surprises.
        .Invoke(                               //Invoke the method.
            null,                              //Null for static method invoke.
            new object[] { numericUpDown3.Value, numericUpDown4.Value }) //Pass parameters in almost the same way as in normal call,
                                                                         //except that we must pack them into the object[] array.
        .ToString();
} 

That’s all. We reduced the whole switch, the designer-populated list of symbols and the magic strings to three simple lines.

Not only is the code smaller, its refactoring tolerance is greatly improved. Now, when the lecturer says that we need to implement two more arithmetic operations, we simply add the static methods to our arithmetic class (as shown before) and we’re done.

However, this approach has a hazard. The usage in the button2_Click event handler assumes that all arithmetic methods in the Calc class accept exactly two parameters of a type that is implicitly convertible from decimal.

In other words, if someone adds a method that processes 3 numbers, the operation represented by this method will show up in the combobox, but an attempt to execute it will result in an exception at invocation time.

If we tried to do the same in the previous approach, we would get an error as well. The difference is that it would be a compile-time error saying that we are trying to use a method that accepts 3 parameters but have provided only 2 arguments.

That’s one advantage of static typing: the compiler upholds code integrity and IntelliSense informs us about any misuse long before we even hit the compile button. When we use dynamic invocation, we must anticipate any exceptions and handle them ourselves.

Conclusion

The examples above describe how the Reflection mechanism can improve code maintainability and robustness. They also show that using Reflection, though handy, puts more responsibility on the developer, as compatibility checking moves from design time to run time and the compiler no longer lends a hand when invoking dynamically typed members.

In this article we learned how we can use Reflection to retrieve data from properties and how to invoke methods that are not explicitly specified at design time – they can change over time, but the code that uses them will remain unchanged.

In the next article I’ll show you more advanced usages of Reflection, such as storing data, accessing private members and instantiating classes, using some interesting examples like building an object from a database or implementing a plugin system. You will also eventually find out why Reflection is like violence.

Pragmatic or spontaneous – What are the most common personal qualities in IT-job ads?

Open Data Analytics

At Findwise we regularly have company-wide hackathons with different themes. The latest theme was insights from open data, which I personally find very interesting.

Our group chose to fetch data from Arbetsförmedlingen (the Swedish Employment Agency), where ten years of job ads are available. There are about 4 million job ads in total for this time period, so there is some material to analyze.

To enable ad hoc analysis more easily, we started off by extracting competences and personal traits mentioned in the job ads. This would allow us to spot trends in competences over time or across regions, or to correlate competences and traits. Lots of possibilities.
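As a rough, hypothetical sketch of that extraction step (the real hackathon pipeline and its dictionaries are not shown here; the keyword lists below are made up for illustration):

using System;
using System.Collections.Generic;
using System.Linq;

class AdExtractionSketch
{
    // Illustrative dictionaries only – the real lists were far larger.
    static readonly string[] Competences = { "javascript", "big data", "system architecture", "support" };
    static readonly string[] Traits = { "social", "driven", "passionate", "communicative", "pragmatic", "spontaneous" };

    static (List<string> Competences, List<string> Traits) Extract(string adText)
    {
        string lower = adText.ToLowerInvariant();
        return (Competences.Where(lower.Contains).ToList(),
                Traits.Where(lower.Contains).ToList());
    }

    static void Main()
    {
        var (competences, traits) = Extract("We are looking for a driven and communicative JavaScript developer.");
        Console.WriteLine($"Competences: {string.Join(", ", competences)} | Traits: {string.Join(", ", traits)}");
    }
}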

 

Personal qualities and IT competences

As an IT-ninja I find it more exciting to focus on jobs, competences and traits within the IT industry. A lot is happening, and it is easy for me to relate to this area, of course. A report from Almega suggests that there is a huge demand for IT competences in the coming years, and it brings up a lot of examples of lacking technical skills. What is rarely addressed is which personality types are connected to these specific competences. We’re able to answer this interesting question from our data:

 

What personal traits are common complementary to the competences that are in demand?


Figure 1 – Relevant work titles, competences and traits for the search term “big data”

 

The most wanted personal traits are in general “social, driven, passionate, communicative”. All these results should of course be taken with a grain of salt, since a few staffing and general IT consulting companies account for a large share of the job ads within IT. But we can also look at a single competence and answer the question:

 

What traits are more common with this competence than in general? (Making the question a bit more specific.)

Some examples of competences in demand are system architecture, support and JavaScript. The most outstanding traits for system architecture are sharp, quality orientated and experienced. It can always be discussed whether experienced is a trait (although our model thought so), but it makes sense in any case since system architecture tends to be more common among senior roles. For support we find traits such as service orientated, happy and nice, which is not unexpected. Lastly, for job ads needing JavaScript competence, personal traits such as quality orientated, quality aware and creative are the most predominant.

 

Differences between Stockholm and Gothenburg

Or let’s have a look at geographical differences between Sweden’s two largest cities when it comes to personal qualities in IT-job ads. In Gothenburg there is a stronger correlation to the traits spontaneous, flexible and curious while Stockholm correlates with traits such as sharp, pragmatic and delivery-focused.

 

What is best suitable for your personality?

You could also look at it the other way around and start with the personal traits to see which jobs/competences suit you. If you are analytical, then jobs as a controller or accountant could be for you. If you are an optimist, then job coach or guidance counselor seems to be a good fit. We created a small application where you can type in competences or personal traits and get suggested jobs this way. Try it out here!

 

Learn more about Open Data Analytics

In addition, we’re hosting a breakfast seminar on December 12th where we’ll use the open data from Arbetsförmedlingen to show a process for making more data-driven decisions. More information and registration (the seminar will be held in Swedish).

 

Author: Henrik Alburg, Data Scientist

SharePoint optimized – part 2, Search power

Last week I wrote a post about how I fixed CSOM code in order to accelerate the whole query execution. The final result was not that bad, though still not good enough:

  • 0.8s for fetching ~500 subsites
  • 6.5s for fetching ~900 subsites recursively for whole subsites hierarchy

My aim is to fetch the whole subsite hierarchy within a reasonable waiting time (1-2s total).

In this post I show you how to achieve it – we can fetch the whole subsite hierarchy in less than 2s!

Continue reading

Summary from Enterprise Search and Discovery Summit 2017

This year at Enterprise Search and Discovery Summit, Findwise was represented by us – search experts Simon Stenström and Amelia Andersson. With over a thousand attendees at the event, we’ve enjoyed the company of many peers. Let’s stay in touch for inspiration and to create magic over the Atlantic – you know who you are!


Amelia Andersson and Simon Stenström, search experts from Findwise

 

Back to the event: we opened the Enterprise Search track with our talk on how you can improve your search solutions by taking several aspects of relevance into account. (The presentation can be found in full here; no video unfortunately.) If you want to know more about how to improve relevancy, feel free to contact us or download the free guide on Improved search relevancy.

A few themes kept recurring during the Enterprise Search track: machine learning and NLP, bots and digital assistants, statistics and logs, and GDPR. We’ve summarized our main takeaways from these topics below.

 

Machine learning and NLP

Machine learning and NLP were the unchallenged buzzwords of the conference. Everybody wants to do it, some have already started working with it, and some provided products for working with it. Not a lot of concrete examples of how organizations are using machine learning were presented unfortunately, giving us the feeling that few organizations are there yet. We’re at the forefront!

 

Bots, QA systems and digital assistants

Everyone is walking around with Siri or Google Assistant in their pocket, but our enterprise search solutions still don’t make use of this. Panels discussed voice-based search (TV remote controls that can search content across all TV channels to select the right channel, a demo of Amazon Alexa providing answers about simple procedures for medical treatments, etc.), pointing out that voice-to-text now works well enough (at least in English) to use in many mobile use cases.

But bots can of course be used without voice input. A few different examples of using bots in a dialogue setting were shown. One of the most exciting demos showed a search-engine-powered bot that used facet values to ask questions to narrow down what information the user was looking for.

 

Statistics and logs

Collect logs! And when you’ve done that: use them! A clear theme was how logs are stored, displayed and used. Knowledge management systems where content creators can monitor how users find their information inspired us to consider dashboards for intranet content creators as well. If we can help our content creators understand how their content is found, maybe they will be encouraged to use better metadata or wording, or to create information that their users are missing.

 

GDPR

Surprisingly, GDPR is not only a “European thing” but will have a global impact following the legislation change in May. American companies will have to look at how they handle the personal information of their EU customers. This statement took many attendees by surprise and there were many worried questions about what is considered non-compliant with GDPR.

 

We’ve had an exciting time in Washington and can happily say that we are able to bring back inspiration and new experiences to our customers and colleagues at Findwise. On the same subject, a couple of weeks ago some of our fellow experts at Findwise wrote the report “In search for Insight”, addressing the new trends (machine learning, NLP etc.) in enterprise search. Make sure to get your copy of the report if you are interested in this area.

Most of the presentations from Enterprise Search and Discovery Summit can be found here.

 

Authors: Amelia Andersson and Simon Stenström, search experts from Findwise

SharePoint optimized – part 1, CSOM calls

An intranet home page should contain all the information needed on a daily basis. In fact, many companies use the home page as a traffic node where everybody comes just to find a navigation link pointing to another part of the intranet. In my current company, Findwise, we do that too. However, one of our components, which allows us to quickly navigate through intranet sites, has been getting slower and slower year by year. Currently its loading time is almost 10 seconds! I decided to fix it, or even rebuild it if needed – especially since a few weeks ago, at the ShareCon 365 conference, I talked about SharePoint Framework in a search-driven architecture, where I described the customer case of PGNIG Termika, who saved about 600k PLN (~$165,000) per year thanks to their information accessibility improvements (information access time dropped from 5-10 minutes to 1-2 seconds).

In this post I want to show you what the problem was, how I fixed it and how my fix cut the component’s loading time by a factor of six!

Continue reading