Trials & Jubilations: the two sides of the GDPR coin

We have all heard about the totally unhip GDPR and the potential wave of fines and lawsuits. The long arm of the law and its stick have been noted. Less talked about but infinitely more exciting is the other side. Turn over the coin and there’s a whole A-Z of organisational and employee carrots. How so?

Sign up for the joint webinar with Smartlogic & Findwise on the 18th of April at 3PM CET to find out more.

https://flic.kr/p/fJD1eA

Signal Tools

We all leave digital trails behind us, trails about us. Others that have access to these trails can use our data and information. The new European General Data Protection Regulation (GDPR) is intended to ensure that the use of such Personally Identifiable Information (PII) is correct and regulated, with the power to decide given to the individual.

Some organisations are wondering how on earth they can become GDPR compliant when they already have a business to run. But instead of a chore, setting a pathway to allow for some more principled digital organisational housekeeping can bring big organisational gains sooner rather than later.

Many enterprises are now beginning to realise the extra potential gains of having introduced new organisational principles to become compliant. The initial fear of painful change soon subsides when the better quality data comes along to make business life easier. With the further experience of new initiatives from new data analysis, NLP, deep learning and AI comes the feeling: why didn’t we just do this sooner?

Most organisations have one or more systems in place holding PII data, even if getting the right data out in the right format remains problematic. Data is best organised for GDPR compliance by transforming it into part of a semantic data layer. With such a layer, finding all the related data you hold on Joe Bloggs across different sources becomes so much easier when he asks for a copy of it. Such a semantic data layer will also bring other far-reaching and organisation-wide benefits.

Semantic Data Layer

For example, heterogeneous data in different formats and from different sources can become unified for all sorts of new smart applications, new insights and new innovation that would have been previously unthinkable. Data can stay where it is… no need to change that relational database yet again because of a new type of data. The same information principles and technologies involved in keeping an eye on PII use, can also be used to improve processes or efficiencies and detect consumer behaviour or market changes.

But it’s not just the business operations that benefit: empowered employees become happier when they have the right information at hand to do their job. That is often difficult to achieve because, in many organisations, no one area “owns” search, making it somebody else’s problem to solve. For the Google-loving employee, not finding stuff at work to help them in their job can be downright frustrating. Well-ordered data (better still in a semantic layer) can give them the empowering results page they need. It’s easy to forget that Google only deals with the best structured and linked documentation. Why shouldn’t we do the same in our organisations?

Just as the combination of (previously heterogeneous) datasets can give us new insights for innovation, we also observe that innovation increasingly comes in the form of external collaboration. Such collaboration of course increases the potential GDPR risk through data sharing, Facebook being a very current case in point. This brings in the need for organisational policy covering data access, the use and handling of existing data and any new (extra) data created through its use. Such policy should, for example, cover newly created personal data from statistical inference analysis.

While a semantic layer may in fact make human error in data usage more likely through increased access, it also provides a better way to prevent misuse: metadata can be baked into the data to classify information “sensitivity” and to control user access rights.
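
As a rough illustration of metadata travelling with the data, the sketch below attaches a sensitivity label to each record and filters on it. The `Sensitivity` levels, the `Record` fields and the `visible_records` helper are all hypothetical names made up for this example; a real classification scheme would come from your own data policy.

```python
from dataclasses import dataclass
from enum import IntEnum


class Sensitivity(IntEnum):
    """Hypothetical sensitivity levels; a real scheme would come from policy."""
    PUBLIC = 0
    INTERNAL = 1
    PERSONAL = 2   # contains PII
    SPECIAL = 3    # e.g. health data


@dataclass(frozen=True)
class Record:
    subject: str              # who the data is about
    source: str               # system the data lives in
    content: str
    sensitivity: Sensitivity  # metadata carried together with the data


def visible_records(records, clearance):
    """Return only the records a user with the given clearance may see."""
    return [r for r in records if r.sensitivity <= clearance]


records = [
    Record("Joe Bloggs", "crm", "Newsletter opt-in", Sensitivity.INTERNAL),
    Record("Joe Bloggs", "claims", "Injury claim details", Sensitivity.SPECIAL),
]
allowed = visible_records(records, Sensitivity.PERSONAL)
print([r.source for r in allowed])
```

Because the label sits on the record itself, the same check can be reused by search, analytics or export tools without each of them needing its own rules.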

So how does one start?

The first step is to apply some organising principles to any digital domain, be it in or outside the corporate walls [The Discipline of Organizing, Robert Glushko] and to ask the key questions:

  1. What is being organised?
  2. Why is it being organised?
  3. How much of it is being organised?
  4. When is it being organised?
  5. Where is it being organised?

Secondly start small, apply organising principles by focusing on the low-hanging fruit: the already structured data within systems. The creation of quality data with added metadata in a semantic layer can have a magnetic effect within an organisation (build that semantic platform and they will come).

Step three: start being creative and agile.

A case story

A recent case within the insurance industry gives some clues as to why this set of tools improves the signals and attention needed to become more compliant with regulations dealing with PII. Our client knew about a set of collections (file shares) where PII might be found. Adding search and NLP/ML, together with visual analytic tools, opened up Pandora’s box. This is the simple starting point: finding, for example, names or personal number concepts in the text. The second step is to add semantics, where industry standard terminologies and ontologies can further help define the meaning of things.
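
To make that starting point concrete, here is a minimal sketch of pattern-based detection over a small collection. It is not the pipeline used in the case: the regular expression assumes Swedish-style personal numbers written as YYMMDD-NNNN or YYYYMMDD-NNNN, the helper names are hypothetical and the sample documents are invented. A real setup would add checksum validation and NLP-based name recognition on top.

```python
import re

# Assumed pattern: personal numbers written as YYMMDD-NNNN or YYYYMMDD-NNNN.
# This only finds candidates; it does not validate dates or checksums.
PERSONAL_NUMBER = re.compile(r"\b(?:\d{2})?\d{6}[-+]\d{4}\b")


def find_personal_numbers(text):
    """Return (start, end, match) for each candidate personal number in text."""
    return [(m.start(), m.end(), m.group()) for m in PERSONAL_NUMBER.finditer(text)]


def scan_collection(documents):
    """Map document name -> candidate hits, skipping documents with no hits."""
    hits = {}
    for name, text in documents.items():
        found = find_personal_numbers(text)
        if found:
            hits[name] = found
    return hits


# Invented sample documents for illustration.
docs = {
    "claim_001.txt": "Claimant 850709-1234 reported water damage.",
    "menu.txt": "Lunch is served at 11:30.",
}
print(scan_collection(docs))
```

Keeping the character offsets alongside each hit makes it possible to show the surrounding context later, which is where the semantic step comes in.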

In all corporate settings, there exist both well-cultivated and governed collections of information resources, but usually also a massive unmapped terrain of content collections, where no one has a clue if there might be PII hidden amongst it. The strategy of using a semantic data layer should always be combined with operations to narrow down the collections that become part of the signalling system; it is generally not a good idea to boil the whole data ocean in the enterprise information environment. Rather, through such work practices, workers become aware of the data hot-spots, the well-cultivated collections of information and that unmapped terrain. Having the additional notion of PII to contend with will make it just that bit easier to recognise those places where semantic enhancement is needed.

Running with the same pipeline (with the option of further models to refine and improve certain data) will not only allow for the discovery of multiple occurrences of named entities (individuals) but also the narrative and context in which they appear.
Having a targeted model and terminology for the insurance industry will improve this semantic process further. This process can certainly ease what may currently be manual processes, or processes that don’t exist because of their manual pain: for example, finding sensitive textual information in documents within applications or in online textual chats. Developing such a smart information platform enables the smarter linking of other things from the model, such as service packages, service units or organisational entities, spatial data such as named places, timelines, or medical treatments: things you perhaps currently have less control over.

There’s not much time before the 25th May and the new GDPR, but we’ll still be here afterwards to help you with a compliance burden or a creative pathway, depending on your outlook.

Alternatively, sign up for the joint webinar with Smartlogic & Findwise on the 11th of April at 3PM CET to find out more.

Authors: Fredric Landqvist, Peter Voisey and James Morris

Major highlights from Elastic{ON} 2018 – Findwise reporting

Two Elastic fans have just returned from San Francisco and the Elastic{ON} 2018 conference. With almost 3,000 participants this year, Elastic{ON} is the biggest Elastic conference in the world.

Findwise regularly organises events and meetups covering Elastic, among other topics. Keep an eye out for an event close to you.

Here are some of the main highlights from Elastic{ON} 2018.

Let’s start with the biggest announcement of them all: Elastic is opening the source code of X-Pack. This means that you will now be able to access not only the Elastic Stack source code but also the subscription-based X-Pack code that has been inaccessible until now. This opens the opportunity for you as a developer to contribute code back.

Data rollups are a great new feature for anyone who needs to look at old data but feels the storage costs are too high. With rollups, only predetermined metrics and terms are stored. You can still analyze these dimensions of your data, but you can no longer view the individual documents.
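
The idea behind rollups can be sketched in a few lines of plain Python. This is a conceptual illustration only, not the Elasticsearch rollup API: the `roll_up` function, the field names and the sample documents are invented for the example.

```python
from collections import defaultdict
from datetime import datetime


def roll_up(docs, term_field, metric_field):
    """Collapse documents into per-day, per-term summaries of one metric."""
    buckets = defaultdict(lambda: {"count": 0, "sum": 0.0, "max": float("-inf")})
    for doc in docs:
        day = datetime.fromisoformat(doc["timestamp"]).date().isoformat()
        bucket = buckets[(day, doc[term_field])]
        bucket["count"] += 1
        bucket["sum"] += doc[metric_field]
        bucket["max"] = max(bucket["max"], doc[metric_field])
    return dict(buckets)


# Invented sample documents; only the summaries survive the rollup.
docs = [
    {"timestamp": "2018-03-01T10:00:00", "host": "web-1", "latency_ms": 120},
    {"timestamp": "2018-03-01T18:30:00", "host": "web-1", "latency_ms": 80},
    {"timestamp": "2018-03-02T09:15:00", "host": "web-2", "latency_ms": 200},
]
summary = roll_up(docs, term_field="host", metric_field="latency_ms")
print(summary[("2018-03-01", "web-1")])
```

Averages can still be derived from `sum` and `count`, but the two original web-1 documents can no longer be told apart, which is the trade-off described above.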

Azure monitoring will be available in X-Pack Basic. In an upcoming 6.x release, Elastic will ship an Azure Monitoring Module consisting of a bundle of Kibana dashboards that make it really easy to get started exploring your Azure infrastructure. The monitoring module will be released as part of the X-Pack Basic version; in other words, it will be free to use.

Forecasting was the big new thing in X-Pack’s machine learning component. As the name suggests, the machine learning module can now not only spot anomalies in your data but also predict how the data will change in the future.

Security in Kibana will get an update to make it work more like the Security module in Elasticsearch. This also means that one of the most requested security features for Kibana will arrive: giving users access to only some dashboards.

Dashboards are great and a fundamental part of Kibana, but sometimes you want to present your data in more dynamic ways with less focus on data density. This is where Canvas comes in. Canvas is a new Kibana module for producing infographics rather than dashboards, while still using live data from Elasticsearch.

Monitoring of Kubernetes and Docker containers will be made a lot easier with the Elastic stack. A new infra component will be created just for this growing use case. This component will be powered by data collected by Beats which now also has an auto discovery functionality within Kubernetes. This will give an overview of not only your Kubernetes cluster but also the individual containers within the cluster.

Geo capabilities within Kibana will be extended to support multiple map layers. This will make it possible to do more kinds of visualizations on maps. Furthermore, work is being done on supporting not only Geo points but also shapes.

One problem some have had with maps is that you need access to the Elastic Maps Service, which might not be reachable if you deploy the Elastic Stack within a company network. To solve this, work is being done to make it possible to deploy the Elastic Maps Service locally.

Elastic acquired the SaaS solution Swiftype last year. Since then Swiftype has been busy adding even more features to its portfolio. Swiftype currently comes in three versions:

  • Swiftype Site Search – An out of the box (OOTB) solution for website search
  • Swiftype Enterprise Search – Currently in beta, with a focus (for now) on internal, cloud-based data sources like G Suite, Dropbox, O365, Zendesk etc.
  • Swiftype App Search – A set of APIs and developer tools that make it quick to build user-facing search applications

 

Elastic has also started to look at replacing the Zen protocol used to keep clusters in sync. A PoC is currently being built to create a consensus algorithm that follows modern academic best practices. An added benefit would be removing the minimum master nodes setting, currently one of the most common pitfalls when running Elasticsearch in production.
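
To see why that setting is easy to get wrong, recall the usual rule of thumb for `discovery.zen.minimum_master_nodes`: a strict majority of the master-eligible nodes. The helper below is just that arithmetic written out; the function name is made up for the example.

```python
def minimum_master_nodes(master_eligible_nodes):
    """Quorum size: a strict majority of the master-eligible nodes."""
    if master_eligible_nodes < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible_nodes // 2 + 1


for n in (1, 2, 3, 5):
    print(n, minimum_master_nodes(n))
```

The value has to be recalculated by hand whenever master-eligible nodes are added or removed, and a stale value risks a split brain, which is presumably why a protocol that manages the quorum itself is attractive.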

ECE (Elastic Cloud Enterprise) is a big focus for Elastic and makes it possible for customers to set up a fully service-based search solution maintained by Elastic.

If you are interested in hearing more about Elastic or Findwise visit https://findwise.com/en/technology/elastic-elasticsearch

Writers: Mads Elbrond, regional manager Findwise Denmark & Torsten Landergren, senior expert consultant

XRANK in SharePoint Search REST API

I have been working with SharePoint Search for some time now. Since many clients need help with search optimization, KQL is one of my best mates. XRANK in particular is a very powerful function that extends KQL’s capabilities but also adds to its complexity. I feel quite sure about what we can achieve using KQL and how. However, last week a colleague of mine asked me what the proper syntax of XRANK is in a REST search query…and I was like “emmm…”.

There are many non-obvious questions: which characters need to be encoded? Is the syntax the same as in a regular KQL query?

I did a quick documentation check and some googling for an answer, but found no satisfying results at all (if there is no answer on Stack Overflow, the web contains no answer).

So this post clarifies the XRANK syntax in REST API calls.

Use Search Query Tool

As the old saying goes, “Do not force an open door”. That’s why I did not investigate the topic myself by trying different REST queries against SharePoint Search. Instead I used a great, great, great tool called Search Query Tool. It really makes your work with search easier and faster. You can build any kind of KQL query in it, and it will translate the query to REST because it uses the REST API to communicate with SharePoint.

So, for instance, if you want to execute the following KQL query

* XRANK(cb=1) Position:Manager

its REST equivalent will be:

<SearchEndpointURL>?querytext='*+XRANK(cb%3d1)+Position:Manager'

As you can see, the syntax is the same as in a regular KQL query. However, the ‘=’ character has been percent-encoded so that it is properly understood by the browser and the endpoint, and every space has been replaced by “+”.
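
If you need to produce the same string from code, a small helper based on the standard library’s `urllib.parse.quote_plus` is one way to do it. This is a sketch under assumptions: the `build_querytext` name and the choice of characters to leave unencoded are mine, picked to mirror the example above, and Python writes the escape as `%3D` rather than `%3d` (percent-encoding is case-insensitive, so the two are equivalent). KQL that itself contains single quotes is not handled here.

```python
from urllib.parse import quote_plus

# Characters left unencoded to mirror the example above; everything else
# outside letters, digits and "_.-~" is percent-encoded.
SAFE_CHARS = "*():"


def build_querytext(kql):
    """Return the querytext parameter for a KQL string, wrapped in single quotes."""
    return "querytext='" + quote_plus(kql, safe=SAFE_CHARS) + "'"


print(build_querytext("* XRANK(cb=1) Position:Manager"))
```

Append the result to your search endpoint URL after the `?`, exactly as in the hand-written example.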

Complex XRANK queries

Remember that complex queries need parentheses in the right places. For instance, if you want to apply multiple XRANK boosts, you need to arrange them in the following way:

(SearchQuery XRANK(cb=1) condition1) XRANK(cb=1) condition2

In other words, if you want to add boosting for position AND for date freshness, your KQL will look like this:

(* XRANK(cb=1) Position:Manager) XRANK(cb=0.5) LastModifiedTime>{Today-30}

and your REST query text will look like this:

querytext='(*+XRANK(cb%3d1)+Position:Manager)+XRANK(cb%3d0.5)+LastModifiedTime>{Today-30}'

which gives you the following results:

  • results older than 30 days for a person whose position does not contain “Manager” get 0 extra ranking points
  • results modified less than 30 days ago for a person whose position does not contain “Manager” get 0.5 extra ranking points
  • results older than 30 days for a person whose position contains “Manager” get 1 extra ranking point
  • results modified less than 30 days ago for a person whose position contains “Manager” get 1.5 extra ranking points
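
The nesting pattern from this section can also be generated rather than typed by hand. The sketch below uses two hypothetical helpers: `add_boosts` wraps the query so far in parentheses before each additional XRANK clause, and `total_boost` sums the constant boosts (`cb`) of the conditions a result matches, assuming each matching condition simply adds its `cb` value.

```python
def add_boosts(base_query, boosts):
    """Wrap the query in parentheses before each additional XRANK clause."""
    query = base_query
    for index, (cb, condition) in enumerate(boosts):
        if index > 0:
            query = "(" + query + ")"
        query = f"{query} XRANK(cb={cb}) {condition}"
    return query


def total_boost(is_manager, modified_recently):
    """Sum the constant boosts a result collects: 1 for Manager, 0.5 for fresh."""
    return (1.0 if is_manager else 0.0) + (0.5 if modified_recently else 0.0)


kql = add_boosts("*", [(1, "Position:Manager"), (0.5, "LastModifiedTime>{Today-30}")])
print(kql)

for is_manager in (False, True):
    for recent in (False, True):
        print(is_manager, recent, total_boost(is_manager, recent))
```

The generated string is plain KQL, so it still needs the same encoding as any other query before it goes into `querytext`.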

 

Hope it helps you in using XRANK and KQL in REST API queries.

 

Thanks & have a great day!

How to execute ANY SharePoint powershell command programmatically using C#

In one of my projects my team faced the following challenges:

  • How to add query rules programmatically using C#
  • How to update thesaurus programmatically using C#

I tried to find information in the official documentation, but it was not very helpful, and neither was googling.

PowerShell cmdlets to C# assembly mapping

My team was wondering what to do in this situation when one of my colleagues came up with a brilliant idea: he searched for the PowerShell cmdlet in File Explorer with the option to search file contents turned on.

Result? What he found was exactly what we were looking for.

In the location “C:\Program Files\Common Files\microsoft shared\Web Server Extensions\16\CONFIG\PowerShell\Registration” there is a file named OSSSearchCmdlets.xml.

It contains XML with the following structure:

<ps:Cmdlet>

<ps:VerbName>Get-SPEnterpriseSearchCrawlContentSource</ps:VerbName>

<ps:ClassName>Microsoft.Office.Server.Search.Cmdlet.GetSearchCrawlContentSource</ps:ClassName>

<ps:HelpFile>Microsoft.Office.Server.Search.dll-help.xml</ps:HelpFile>

</ps:Cmdlet>

 

To my eyes, this reads as:

<PowershellToAssemblyMapping>

<PowerShellCmdName>What-I-Have</PowerShellCmdName>

<C#NameAndLocation>What-I-Am-Looking-For</C#NameAndLocation>

<Whatever>Whatever.xml</Whatever>

</PowershellToAssemblyMapping>

Maps for Search, WSS and many more

The OSSSearchCmdlets.xml file contains the mapping from PowerShell cmdlets to .NET assemblies only for SharePoint Search.

But in the same location there is another file, WSSCmdlet.xml, that contains mappings for all kinds of cmdlets, such as

  • Enable-SPFeature
  • New-SPContentDatabase
  • Get-SPFarm
  • Etc.

In short, everything that you can do with a SharePoint application using PowerShell.
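
If you would rather look cmdlets up with a script than by eye, the registration files can be read with a few lines of code. The sketch below is written in Python purely for brevity and relies only on the `Cmdlet`, `VerbName` and `ClassName` elements shown earlier. The function names are hypothetical, and the root element and namespace URI in the sample are placeholders rather than values copied from the real file, which is why the code matches on local tag names only.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix ElementTree puts in front of tag names."""
    return tag.rsplit("}", 1)[-1]


def cmdlet_to_class(xml_text):
    """Return {cmdlet name: .NET class name} from a cmdlet registration file."""
    mapping = {}
    for element in ET.fromstring(xml_text).iter():
        if local_name(element.tag) != "Cmdlet":
            continue
        fields = {local_name(child.tag): (child.text or "").strip() for child in element}
        if "VerbName" in fields and "ClassName" in fields:
            mapping[fields["VerbName"]] = fields["ClassName"]
    return mapping


# Sample shaped like the snippet shown earlier. The root element name and the
# namespace URI are placeholders, not copied from the real file.
SAMPLE = """<ps:Config xmlns:ps="urn:example:placeholder">
  <ps:Cmdlet>
    <ps:VerbName>Get-SPEnterpriseSearchCrawlContentSource</ps:VerbName>
    <ps:ClassName>Microsoft.Office.Server.Search.Cmdlet.GetSearchCrawlContentSource</ps:ClassName>
    <ps:HelpFile>Microsoft.Office.Server.Search.dll-help.xml</ps:HelpFile>
  </ps:Cmdlet>
</ps:Config>"""

print(cmdlet_to_class(SAMPLE))
```

The script only tells you which class sits behind a cmdlet; calling that class from your C# code is still up to you.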

 

If you just want to quickly check what those files contain, I’ve uploaded them to my GitHub. I also put more files there, such as those for Reporting Services, Workflows etc. You can check it here.

Did you find this tip useful? Maybe you know an alternative way? Share it in the comments!

Thanks & Have a great day! 🙂

Pragmatic or spontaneous – What are the most common personal qualities in IT-job ads?

Open Data Analytics

At Findwise we regularly have companywide hackathons with different themes. The latest theme was insights in open data, which I personally find very interesting.

Our group chose to fetch data from the Arbetsförmedlingen (Swedish Employment Agency), where ten years of job ads are available. There are about 4 million job ads in total during this time-period, so there is some material to analyze.

To make it easier to enable ad hoc analysis, we started off by extracting competences and personal traits mentioned in the job ads. This would allow us to spot trends in competences over time or in different regions, or to correlate competences and traits. Lots of possibilities.

 

Personal qualities and IT competences

As an IT-ninja I find it more exciting to focus on jobs, competences and traits within the IT industry. A lot is happening, and it is easy for me to relate to this area, of course. A report from Almega suggests that there is a huge demand for IT competences in the coming years, and it brings up a lot of examples of technical skills that are lacking. What is rarely addressed is which personality types are connected to these specific competences. We’re able to answer this interesting question from our data:

 

What personal traits commonly complement the competences that are in demand?

Figure 1 – Relevant job titles, competences and traits for the search term “big data”

 

The most wanted personal traits are in general “Social, driven, passionate, communicative”. All these results should of course be taken with a grain of salt, since a few staffing/general IT consulting companies account for a large share of the job ads within IT. But we can also look at a single competence and answer the question:

 

What traits are more common with this competence than in general? (Making the question a bit more specific.)

Some examples of competences in demand are system architecture, support and JavaScript. The most prominent traits for system architecture are sharp, quality-oriented and experienced. Whether experienced is a trait can always be discussed (although our model thinks so), but it makes sense in any case since system architecture tends to be more common among senior roles. For support we find traits such as service-oriented, happy and nice, which is not unexpected. Lastly, for job ads requiring JavaScript competence, personal traits such as quality-oriented, quality-aware and creative are the most predominant.
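
One simple way to put a number on “more common with this competence than in general” is lift: the share of ads with the trait among ads mentioning the competence, divided by the share among all ads. The sketch below shows that calculation on made-up ads. It is not necessarily the method our hackathon model used, and the `trait_lift` function and the data layout are assumptions for illustration.

```python
from collections import Counter


def trait_lift(ads, competence):
    """Return {trait: lift} for ads mentioning the competence.

    lift = P(trait | competence) / P(trait); values above 1 mean the trait is
    more common alongside the competence than in job ads overall.
    """
    overall = Counter(t for ad in ads for t in set(ad["traits"]))
    subset = [ad for ad in ads if competence in ad["competences"]]
    if not subset:
        return {}
    within = Counter(t for ad in subset for t in set(ad["traits"]))
    return {
        trait: (count / len(subset)) / (overall[trait] / len(ads))
        for trait, count in within.items()
    }


# Made-up ads for illustration only.
ads = [
    {"competences": {"support"}, "traits": {"happy", "social"}},
    {"competences": {"support"}, "traits": {"happy"}},
    {"competences": {"javascript"}, "traits": {"creative", "social"}},
    {"competences": {"javascript"}, "traits": {"creative"}},
]
print(trait_lift(ads, "support"))
```

In this toy data “happy” gets a lift of 2 for support, while “social” gets 1 because it is equally common everywhere.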

 

Differences between Stockholm and Gothenburg

Or let’s have a look at geographical differences between Sweden’s two largest cities when it comes to personal qualities in IT-job ads. In Gothenburg there is a stronger correlation to the traits spontaneous, flexible and curious while Stockholm correlates with traits such as sharp, pragmatic and delivery-focused.

 

What best suits your personality?

You could also look at it the other way around and start with the personal traits to see which jobs/competences are meant for you. If you are analytical, then controller or accountant could be jobs for you. If you are an optimist, then job coach or guidance counselor seems to be a good fit. We created a small application where you can type in competences or personal traits and get suggested jobs in this way. Try it out here!

 

Learn more about Open Data Analytics

In addition, we’re hosting a breakfast seminar December 12th where we’ll use the open data from Arbetsförmedlingen to show a process of how to make more data driven decisions. More information and registration (the seminar will be held in Swedish)

 

Author: Henrik Alburg, Data Scientist

SharePoint optimized – part 2, Search power

Last week I wrote a post about how I fixed CSOM code in order to speed up the whole query execution. The final result was not that bad, though still not good enough:

  • 0.8s for fetching ~500 subsites
  • 6.5s for fetching ~900 subsites recursively for whole subsites hierarchy

My aim is to fetch the whole subsite hierarchy within a time that is reasonable to wait (1-2s total).

In this post I show you how to achieve it: we can fetch the whole subsite hierarchy in less than 2s!

Continue reading

Summary from Enterprise Search and Discovery Summit 2017

This year at Enterprise Search and Discovery Summit, Findwise was represented by us – search experts Simon Stenström and Amelia Andersson. With over a thousand attendees at the event, we’ve enjoyed the company of many peers. Let’s stay in touch for inspiration and to create magic over the Atlantic – you know who you are!

Amelia Andersson and Simon Stenström, search experts from Findwise

 

Back to the event: We opened the Enterprise Search-track with our talk on how you can improve your search solutions through taking several aspects of relevance into account. (The presentation can be found in full here, no video unfortunately). If you want to know more about how to improve relevancy feel free to contact us or download the free guide on Improved search relevancy.

A few themes kept recurring during the Enterprise Search-track: machine learning and NLP, bots and digital assistants, statistics and logs, and GDPR. We’ve summarized our main takeaways from these topics below.

 

Machine learning and NLP

Machine learning and NLP were the unchallenged buzzwords of the conference. Everybody wants to do it, some have already started working with it, and some provided products for working with it. Not a lot of concrete examples of how organizations are using machine learning were presented unfortunately, giving us the feeling that few organizations are there yet. We’re at the forefront!

 

Bots, QA systems and digital assistants

Everyone is walking around with Siri or Google Assistant in their pocket, but our enterprise search solutions still don’t make use of them. Panels discussed voice-based search (TV remote controls that could search content on all TV channels to set the right channel, a demo on Amazon Alexa providing answers for simple medical treatment procedures etc.), pointing out that voice-to-text now works well enough (at least in English) to use in many mobile use cases.

But bots can of course be used without voice input. A few different examples of using bots in a dialog setting were shown. One of the most exciting demos showed a search engine-powered bot that used facet values to ask questions that narrowed down what information the user was looking for.

 

Statistics and logs

Collect logs! And when you’ve done that: Use them! A clear theme was how logs were stored, displayed and used. Knowledge management systems where content creators could monitor how users were finding their information inspired us to consider dashboards for intranet content creators as well. If we can help our content creators understand how their content is found, maybe they will be encouraged to use better metadata or wording, or to create information that their users are missing.

 

GDPR

Surprisingly, GDPR is not only a “European thing”, but will have a global impact following the legislation change in May. American companies will have to look at how they handle the personal information of their EU customers. This statement took many attendees by surprise and there were many worried questions on what was considered non-compliant with GDPR.

 

We’ve had an exciting time in Washington and can happily say that we are able to bring back inspiration and new experience to our customers and colleagues at Findwise. On the same subject, a couple of weeks ago some of our fellow experts at Findwise wrote the report “In search for Insight”, addressing the new trends (machine learning, NLP etc) in Enterprise Search. Make sure to get your copy of the report if you are interested in this area.

Most of the presentations from Enterprise Search and Discovery Summit can be found here.

 

Authors: Amelia Andersson and Simon Stenström, search experts from Findwise

SharePoint optimized – part 1, CSOM calls

An intranet home page should contain all the information that is needed on a daily basis. In fact, many companies use the home page as a traffic node where everybody comes just to find a navigation link pointing to another part of the intranet. In my current company, Findwise, we do that too. However, one of our components that allows us to quickly navigate through intranet sites has been getting slower and slower year by year. Currently its loading time is almost 10 seconds! I decided to fix it, or even rebuild it if needed, especially since a few weeks ago at the ShareCon 365 conference I talked about SharePoint Framework in Search Driven Architecture, where I described the customer case of PGNIG Termika, who saved about 600k PLN (~$165,000) per year thanks to improved information accessibility (information access time dropped from 5-10 minutes to 1-2 seconds).

In this post I want to show you what the problem was, how I fixed it and how my fix cut the component loading time 6 times!

Continue reading

Microsoft Ignite 2017 – from a Search and Findability perspective

Microsoft Ignite – the biggest Microsoft conference in the world. 700+ sessions, insights and roadmaps from industry leaders, and deep dives and live demos on the products you use every day. And yes, Findwise was there!

But how do you summarize a conference with more than 700 different sessions?

Well – you focus on one subject (search and findability in this case) and then you collaborate with some of the most brilliant and experienced people around the world within that subject. Add a little bit of your own knowledge – and the result is this Podcast.

Enjoy!

Expert Panel Shares Highlights and Opportunities in Microsoft’s Latest Announcements

microsoft ignite podcast findwise

Do you want to know more about Findwise and Microsoft? Find out how you can make SharePoint and Office 365 more powerful than ever before.

 

SharePoint Framework in SharePoint Search Driven Architecture

On 16.10.2017 I had the privilege of being one of the speakers at ShareCon365. I gave a technical talk where I showed how to build SharePoint Framework (SPFx) apps in Search Driven Architecture. If you attended my talk, you are probably interested in the materials, which you can find here: My presentation materials.

If you did not…then keep reading 🙂

Continue reading