Well-known findability challenges in the AI hype

Organisations are facing new types of information challenges in the AI hype. The use cases, the data and the technology may be different, but the recommended approach and the findability challenges currently experienced remain the same.

Findability is getting worse as the data landscape is changing

As clearly shown in the results of the 2019 Search & Findability Survey, finding relevant information is still a major challenge for most organisations. In the internal context, as many as 55% find it difficult or very difficult to find information, which brings us back to the same levels that we saw in the survey results from 2012 and 2013.

Given the main obstacles that the respondents report when trying to improve search and findability, this is not very surprising:

  • Lack of resources/staff
  • Lack of ownership/mandate
  • Poor information quality

One reason behind the poor information quality might be the decreasing focus and effort spent on traditional information management activities such as content life cycle, controlled vocabularies and metadata standards, as illustrated in the diagrams below*. In 2015-16 we saw an increase in these activities, which made perfect sense since “lack of tags” or “inconsistent tagging” were considered the largest obstacles to findability in 2013-2015. Unfortunately, the reduced attention to these areas does not seem to have improved information quality; rather the opposite.

(*percent working with the noted areas)

A likely reason behind the experienced obstacles and the lack of resources to improve search and findability is a shift of focus in data and metadata management efforts, following the rapid restructuring of the data landscape. In the era of digital transformation, attention is rather on the challenge of identifying, collecting and storing the massive amounts of data that are being generated from all sorts of systems and sensors, both within and outside the enterprise. As a result, it is no longer only unstructured information and documents that are hard to find, but all sorts of data being aggregated in data lakes and similar data storage solutions.

Does this mean that search and findability of unstructured information is no longer relevant? No, but in addition to finding individual documents, the target groups in focus (typically data scientists) also need to find relevant and related data(sets) from various parts of the organisation in order to perform their analyses.

Digital (or data-driven) transformation is often focused on utilising data in combination with new technology to reach levels 3 and 4 in the “pyramid of data-driven transformation” below (from In search for insight):

This is also illustrated by the technology trends that we can see in the survey results and that are presented in the article “What are organisations planning to focus on to improve Search and Findability?”. Two of the fastest-emerging technologies are Natural Language Processing (NLP) and Machine Learning, which are both key components in what is often labelled “AI”. Using AI to drive transformation has become the ultimate goal for many organisations.

However, as the pyramid clearly shows, to realise digital transformation, automation and AI, you must start by sorting out the mess. If not, the mess will grow by the minute, quickly turning the data lake into a swamp. One of the biggest challenges for organisations in realising digital transformation initiatives still lies in how to access and use the right data.  

New data and use cases – same approach and challenges

The survey results indicate that, irrespective of what type of data you want to make useful, you need to take a holistic approach to succeed. In other words, if you want to get past the POC phase and achieve true digital transformation you must consider all perspectives:

  • Business – Identify the business challenge and form a common vision of the solution
  • User – Get to know your users and what it takes to form a successful solution
  • Information – Identify relevant data and make it meaningful and F.A.I.R.*
  • Technology – Evaluate and select the technology that is best fit for purpose
  • Organisation – Establish roles and responsibilities to manage and improve the solution over time

You might recognise the five findability dimensions that were originally introduced back in 2010 and that are more relevant than ever in the new data landscape. The survey results and the experienced obstacles indicate that the main challenges will remain, and even increase, within the dimensions of information and organisation.

Also, it is important to remember that to create value from information it is not always necessary to aim for the top of the pyramid. In many cases it is enough to extract knowledge, and thereby provide better insights and decision support, by aggregating relevant data from different sources. That is, provided that the data quality is good enough.

*A strategy for sustainable data management implies leaning on the FAIR Data Principles (see the illustrative sketch after the list):

  1. Make data Findable, through persistent IDs, rich metadata and indexes, and by combining IDs and indexes.
  2. Make data Accessible, through standard, open and free communication protocols, with authentication mechanisms where necessary, and by always keeping metadata available.
  3. Make data Interoperable, through the use of vocabularies, terminologies and glossaries, by using open vocabularies/models and by linking the metadata.
  4. Finally, make data Reusable, by using multiple metadata attributes, setting constraints based on licenses, and expressing provenance to build trusted, quality datasets that lean upon community standards.
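As a concrete illustration of these principles, a dataset description could be captured in a small metadata record. The sketch below is a hypothetical example, not a prescribed schema; all field names and values are made up for illustration.

```python
# Hypothetical FAIR-style metadata record for a dataset (illustrative only).
dataset_metadata = {
    "id": "doi:10.1234/example-dataset",                     # persistent identifier (Findable)
    "title": "Customer transaction logs 2019",
    "keywords": ["transactions", "retail"],                  # rich metadata for indexing (Findable)
    "access_url": "https://data.example.org/datasets/123",   # standard, open protocol (Accessible)
    "vocabulary": "http://www.w3.org/ns/dcat",               # open vocabulary/model (Interoperable)
    "license": "CC-BY-4.0",                                  # reuse constraints (Reusable)
    "provenance": "Exported nightly from the ERP system",    # provenance for trust (Reusable)
}
```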

Author: Mattias Ellison, Findability Business Consultant

What are organisations planning to focus on to improve Search and Findability?

This year’s Search and Findability survey gave us a good indication of upcoming trends on the market. The activities and technologies that organisations are planning to start working with are all connected to improving effectiveness. By using technology to automatically perform tasks, and by understanding the users’ needs and giving them a tailored search experience, there is a lot of potential to save time and effort.

Top 5 activities organisations will focus on:

  • Natural language search interface, e.g. Query aid or chatbots (29%)
  • Personalisation e.g. tailored search experience (27%)
  • Automatic content tagging (24%)
  • Natural Language Processing, NLP (22%)
  • Machine Learning (20%)

The respondents planning to start working with one of these areas are more likely to be interested in, or already working with, the other areas in the top 5. For example, out of the respondents saying that they are planning to use a natural language search interface, 44% are planning to start with personalisation as well. If you were to add the respondents already working with personalisation to that amount, it would increase by 75%. This might not be a big surprise, since the different areas are closely related to one another. A natural language search interface can support a tailored search experience, in other words, lead to personalisation. Automatic content tagging can be enabled by using techniques such as NLP and Machine Learning.

A natural language search interface is a way of trying to find targeted answers to user questions. Instead of searching based on keywords, the goal is to understand the question and generate answers with higher relevancy. Since a large share of the questions asked in an organisation are similar, you could save a lot of time by clustering them and/or providing answers automatically using a conversational UI. Learn more about Conversational UI.

One way to improve a natural language search interface is by using Natural Language Processing (NLP). The aim of NLP is to improve a computer’s ability to interpret human language, for example by handling synonyms and spelling mistakes. NLP started out as a rule-based technique which was manually coded, but the introduction of Machine Learning (ML) improved the technology further. By using statistical techniques, ML makes it possible to learn from data without having to manually program the computer system. Read more about improving search with NLP.
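As a minimal sketch of the rule-based end of this spectrum, the example below snaps misspelled query terms to a known vocabulary and expands them with synonyms before searching. The vocabulary and synonym lists are invented for illustration; a real solution would derive them from the index and a maintained thesaurus, or learn them with ML.

```python
import difflib

# Illustrative vocabulary and synonyms; not taken from any real index.
VOCABULARY = ["invoice", "contract", "vacation", "policy"]
SYNONYMS = {"vacation": ["holiday", "leave"], "policy": ["guideline"]}

def expand_query(query: str) -> str:
    terms = []
    for term in query.lower().split():
        # Spelling correction: snap the term to the closest known word, if any.
        match = difflib.get_close_matches(term, VOCABULARY, n=1, cutoff=0.8)
        if match:
            term = match[0]
        terms.append(term)
        # Synonym expansion: broaden recall with configured synonyms.
        terms.extend(SYNONYMS.get(term, []))
    return " OR ".join(terms)

print(expand_query("vaccation policy"))
# -> "vacation OR holiday OR leave OR policy OR guideline"
```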

Automatic content tagging is a trend that we see within the area of Information Management. Instead of relying on user-created tags (of varying quality), tags are created automatically based on patterns in the content. The advantages of using automatic content tagging are that the metadata will be consistent and that the data will be easier to analyse.
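One simple pattern-based approach, sketched below under the assumption that a recent scikit-learn is available, is to suggest tags from the highest-weighted TF-IDF terms of each document; a production solution would typically combine this with a controlled vocabulary or a trained classifier.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

def suggest_tags(documents, tags_per_doc=3):
    """Suggest tags by picking the highest-weighted TF-IDF terms per document."""
    vectorizer = TfidfVectorizer(stop_words="english")
    tfidf = vectorizer.fit_transform(documents)
    terms = vectorizer.get_feature_names_out()
    suggestions = []
    for row in tfidf:                       # one sparse row per document
        weights = row.toarray().ravel()
        top = weights.argsort()[::-1][:tags_per_doc]
        suggestions.append([terms[i] for i in top if weights[i] > 0])
    return suggestions

docs = ["The new vacation policy applies to all employees",
        "Invoice routines for contractors and suppliers"]
print(suggest_tags(docs))
```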

Personalisation, e.g. a tailored search experience, is a way to filter information based on the user profile. Basically, search results are adapted to the user’s needs, for example by not showing things that the user does not have access to and by promoting search results that the user frequently looks for. Our findings in this year’s survey show that respondents who are currently working with personalisation consider that users on both the internal and external site find information more easily. Users who easily find the information they search for tend to be more satisfied with the search solution.


Results from this year’s survey indicate that organisations are working with, or planning to work with, AI and cognitive-related techniques. The percentage doing so has grown compared to previous surveys.

Do you want to learn more about cognitive search?

Author: Angelica Lahti, Findability Business Consultant

Comparison of two different methods for generating tree facets, with Elasticsearch and Solr

Let’s try to explain what a tree facet is, by starting with a common use case of a “normal” facet. It consists of a list of filters, each corresponding to a value of a common search engine field and a count representing the number of documents matching that value. The main characteristic of a tree facet is that its filters each may have a list of child filters, each of which may have a list of child filters, etc. This is where the “tree” part of its name comes from.

Tree facets are therefore well suited to represent data that is inherently hierarchical, e.g. a decision tree, a taxonomy or a file system.

Two common methods of generating tree facets, using either Elasticsearch or Solr, are the pivot approach and the path approach. Some of the characteristics, benefits and drawbacks of each method are presented below.

While ordinary facets consist of a flat list of buckets, tree facets consist of multiple levels of buckets, where each bucket may have child buckets, etc. If applying a filter query equivalent to some bucket, all documents matching that bucket, or any bucket in that sub-tree of child buckets, are returned.

Tree facets with Pivot

The name is taken from Solr (pivot faceting), which allows faceting within the results of the parent facet. This is a recursive setting, so pivot faceting can be configured for any number of levels. Think of pivot faceting as a Cartesian product of field values.

A list of fields is provided, where the first element in the list will generate the root level facet, the second element will generate the second level facet, and so on. In Elasticsearch, the same result is achieved by using the more general concept of aggregations. If we take a terms aggregation as an example, this simply means a terms aggregation within a parent terms aggregation, and so on.
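For illustration, a two-level tree facet in Elasticsearch could be requested with a nested terms aggregation roughly like the sketch below. The field names category and subcategory are assumptions for the example, not part of any specific schema.

```python
# Nested terms aggregations: the child aggregation is computed within each
# bucket of the parent, producing a two-level tree facet.
pivot_facet_query = {
    "size": 0,
    "aggs": {
        "level_1": {
            "terms": {"field": "category", "size": 50},
            "aggs": {
                "level_2": {
                    "terms": {"field": "subcategory", "size": 50}
                }
            }
        }
    }
}
# Sent with e.g. es.search(index="my-index", body=pivot_facet_query) using the
# official Python client. Adding another level means adding another nested
# "aggs" block (and another field) to the query.
```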

Benefits

The major benefit of pivot faceting is that it can all be configured at query time and the data does not need to be indexed in any specific way. E.g. the list of fields can be modified to change the structure of the returned facet, without having to re-index any content.

The values of the returned facet/aggregation are already in a structured, hierarchical format. There is no need for any parsing of paths to build the tree.

Drawbacks

The number of levels in the tree must be known at query time. Since each field must be specified explicitly, it puts a limit on the maximum depth of the tree. If the tree should be extended to allow for more levels, then content must be indexed to new fields and the query needs to include these new fields.

Pivot faceting assumes a uniformity in the data, in that the values on each level in the tree, regardless of their parent, are of the same types. This is because all values on a specific level come from the same field.

When to use

At least one of the following statements holds:

  • The data is homogeneous – different objects share similar sets of properties
  • The data will, structurally, not change much over time
  • There is a requirement on a high level of query time flexibility
  • There is a requirement on a high level of flexibility without re-indexing documents

Tree facets with Path

Data is indexed into a single field, in a Unix-style file path format, e.g. root/middle/leaf (the path separator is configurable). The index analyzer of this field should be using a path hierarchy tokenizer (Elasticsearch, Solr). It will expand the path so that a filter query for some node in the tree will include the nodes in the sub-tree below that node. The example path above would be expanded to root, root/middle, root/middle/leaf. These represent the filter queries for which the document with this path should be returned. Note that the query analyzer should be keyword/string so that queries are interpreted verbatim.
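A sketch of what such index settings could look like in Elasticsearch is shown below. The field name category_path and the analyzer names are assumptions for the example; note that aggregating on the analysed text field requires fielddata to be enabled.

```python
# Index settings/mappings sketch for the path approach (Elasticsearch).
path_index_body = {
    "settings": {
        "analysis": {
            "tokenizer": {
                "path_tokenizer": {"type": "path_hierarchy", "delimiter": "/"}
            },
            "analyzer": {
                "path_analyzer": {"type": "custom", "tokenizer": "path_tokenizer"}
            },
        }
    },
    "mappings": {
        "properties": {
            "category_path": {
                "type": "text",
                "analyzer": "path_analyzer",   # expands root/middle/leaf into sub-paths
                "search_analyzer": "keyword",  # queries are interpreted verbatim
                "fielddata": True,             # needed to aggregate on the analysed tokens
            }
        }
    },
}
# Created with e.g. es.indices.create(index="my-index", body=path_index_body).
```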

Once the values have been indexed, a normal facet or terms aggregation is put on the field. This will return all possible paths and sub-paths, which can be a large number, so make sure to request all of them. Once the facet/aggregation is returned, its values need to be parsed and built into a tree structure.
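A minimal way to parse the returned buckets into a tree, assuming bucket keys like root/middle/leaf, could look like this:

```python
def build_tree(buckets, separator="/"):
    """Turn flat path buckets, e.g. [{"key": "root/middle", "doc_count": 5}, ...],
    into a nested dict of nodes with counts and children."""
    tree = {}
    for bucket in buckets:
        parts = bucket["key"].split(separator)
        node = {"children": tree}
        for part in parts:
            node = node["children"].setdefault(part, {"doc_count": 0, "children": {}})
        node["doc_count"] = bucket["doc_count"]  # count for this exact (sub-)path
    return tree
```

Since every prefix path is itself returned as a bucket, parent counts come directly from their own buckets regardless of the order in which the buckets are processed.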

Benefits

The path approach can handle any number of levels in the tree, without any configuration explicitly stating how many levels there are, both on the indexing side and on the query side. It is also a natural way of handling different depths in different places in the tree; not all branches need to be the same length.

Closely related to the above-mentioned benefit, is the fact that the path approach does not impose any restrictions on the uniformity of the tree. Nodes on a specific level in the tree may represent different concepts, dependent only on their parent. This fits very well with many real-world applications, as different objects and entities have different sets of properties.

Drawbacks

Data must be formatted at index time. If any structural changes to the tree are required, affected documents need to be re-indexed.

To construct a full tree representation of the paths returned in the facet/aggregation, all paths need to be requested. If the tree is big, this can become costly, both for the search engines to generate and with respect to the size of the response payload.

Data is not returned in a hierarchical format and must be parsed to build the tree structure.

When to use

At least one of the following statements holds:

  • The data is heterogeneous – different objects have different sets of properties, with varying numbers of levels needed in different places in the tree
  • The data could change structurally over time
  • The content and structure of the tree should be controlled by content only, no configuration changes

Tree facets – Conclusion

The listed benefits and drawbacks of each method can be used as a guide to find the best method from case to case.

When there is no clear choice, I personally tend to go for the path approach, just because it is so powerful and dynamic. This comes with the main drawback of added cost of configuration for index time data formatting, but it is usually worth it in my opinion.


Author: Martin Johansson, Senior Search Consultant at Findwise

Activate conference 2018

Opensource has won! Now, what about AI?

Grant Ingersoll is on stage at the opening of Activate18 explaining the reasoning behind changing the name.

The revolution is won, opensource won, search as a concept to reckon with, they all won.

The times I come across a new search project where someone is pushing anything but opensource search are few and far between these days.

Since Search has taken a turn towards AI, a merge with that topic seems reasonable, not to say obvious. But AI in this context should probably be interpreted as AI to support good search results, at least judging from the talks I attended. Interesting steps forward are expert systems and similar, none of which were extensively discussed as far as I know. That is a kind of system we work with at Findwise, for instance using NLP, machine learning and text analytics to improve customer service.

Among the more interesting talks I attended was Doug Turnbull’s talk on the Neural Search Frontier. Some of the matrix maths threw me back to an ANN course I took 10 years ago, way before I ever learned any matrix maths. Now, long past remembering any matrix maths course I ever took, it’s equally confusing, possibly on a slightly higher level. But he pointed out interesting aspects and showed conceptually how Word2Vec vectors work and won’t work. Simon Hughes’ talk “Vectors in search – Towards more semantic matching” is in the same area but more towards actually using it.

Machine Learning is finally mainstream

If we have a look at the overall distribution of talks, I think it’s safe to say that almost all talks touched on machine learning in some way, most commonly using Learning to Rank and Word2Vec. None of these are new techniques (our own Mickaël Delaunay wrote a nice blog post about how to use LTR for personalization a couple of years ago). They have been covered before to some extent, but this time around we see some proper, big-scale implementations that utilize the techniques. Bloomberg gave a really interesting presentation on what their evolution from hand-tuned relevance to LTR over millions of queries has been like. Even if many talks were held on a theoretical/demo level, it is now very clear: it’s fully possible and feasible to build actual, useful and ROI-reasonable machine learning into your solutions.

As Trey Grainger pointed out, there are different generations of this conference. A couple of years ago Hadoop was everywhere. Before that everything was SolrCloud. This year not one talk description referenced the Apache elephant (but migration to the cloud was still referenced, albeit not in the topic). Probably not because big data has grown out of fashion, even though that point was kind of made, but rather because we have other ways of handling and managing it these days.

Don’t forget: shit in > shit out!

And of course, there was the mandatory share of how-we-handle-our-massive-data talks, most prominently presented by Slack, every developer’s favourite tool. They showed a MapReduce offline indexing pipeline that not only enabled them to handle their 100 billion documents, but also gave them an environment which was quick on its feet and super suitable for testing new stuff and experimenting. Something an environment of that size usually completely blocks, due to re-indexing times, fear of bogging down your search machines and just general sluggishness.

Among all these super interesting technical solutions to our problems, it’s really easy to forget that loads of time still has to be spent getting all that good data into our systems: doing the groundwork, building connectors and optimizing data analysis. It doesn’t make for such good talks though. At Findwise we usually do that using our i3 framework, which enables you to ingest, process, index and query your unstructured data in a nice framework.

I now look forward to doing the not-so-ground work, using inspiration from loads of interesting solutions here at Activate.

Thanks so much for this year!

The presentations from the conference are available on YouTube in Lucidworks playlist for Activate18.

Author and event participant: Johan Persson Tingström, Findability Expert at Findwise

Analytical power at your fingertips with natural language and modern visualisation

Today we are all getting used to interactive dashboards and plots in self-service business intelligence (BI) solutions to drill down and slice our facts and figures. The market for BI tools has seen increased competition recently, with Microsoft Power BI challenging proven solutions such as Tableau, Qlik, IBM Cognos, SAP Lumira and others. At the same time, it is hard to benchmark tools against each other as they all come with very similar features. Has BI development become saturated?

Compared to how we are used to consuming graphics and information, the BI approach to interactive analysis is somewhat different. For instance: a dashboard or report is typically presented in a printer-oriented flat layout on a white background, weeks of user training are typically needed before “self-service” can be reached, and interactions are heavily click-oriented – you can almost feel it in your mouse elbow when opening the BI frontend.

On the other hand, when surfing top internet sites and utilizing social media, our interactions are centred around the search box and the natural interface of typing or speaking. Furthermore, there is typically no training needed to make use of Google, Facebook, LinkedIn, Pinterest, Twitter, etc. Through an intuitive interface we learn along the way. And looking at graphics and visualization, we can learn a lot from the gaming industry where players are presented with well-designed artwork – including statistics presented in an intuitive way to maximize the graphical impression.

Take a look at this live presentation to see what a visual analysis using natural language can look like.


Rethink your business analytics

It appears as if BI tools are sub-optimized for a limited scope and use case. To really drive digitalization and make use of our full information potential, we need a new way of thinking about business analytics. Not just continuous development, but rather a revolution in the business intelligence approach. Remember: e-mail was not a consequence of the continuous development of post offices and mail handling. We need to rethink business analytics.

At Findwise, we see that the future for business analytics involves:

  • added value by enriching information with new unstructured sources,
  • utilizing the full potential of visualization and graphics to explore our information,
  • using natural language to empower colleagues to draw their own conclusions intuitively and securely

 

Enrich data

There is a lot of talk about data science today; how we can draw conclusions from our data and make predictions about the future. This power largely depends on the value in the data we possess. Enriching data is all about adding new value. The enrichment may include a multitude of sources, internal and external, for instance:

  • detailed customer transaction logs
  • weather history and forecasts
  • geospatial data (locations and maps)
  • user tracking and streams
  • social media and (fake) news

Compared with existing data, a new data source could be orthogonal to the existing data and add a completely new understanding. Business solutions of today are often limited to highly structured information sources or information providers. There is great power in unstructured, often untouched, information sources. However, it is not as straightforward as launching a data warehouse integration, since big data techniques are required to handle the volume, velocity and variety.

At Findwise, utilizing unstructured data has always been the key in developing unique solutions for search and analytics. The power of our solutions lies in incorporating multiple sources online and continuously enriching them with new aspects. For this we even developed our own framework, i3, with over a hundred connectors for unstructured data sources. A modern search engine (or insight engine) scales horizontally for big data applications and easily consumes billions of texts, logs, geospatial and other unstructured – as well as structured – data. This is where search meets analytics, and where all the enrichment takes place to add unique information value.

 

Visually explore

As human beings we have very strong visual and cognitive abilities, developed over millions of years to distinguish complex patterns and scenarios. Visualization of data is all about packaging information in such a way that we can utilize our cognitive skills to make sense out of the noise. Great visualization and interaction unleash the human power of perception and derivation. They allow us to make sense of the complex world around us.

When it comes to computer visualization, we have seen strong development in the use of graphical processors (GPUs) for games, and recently also for analytics – not least in deep learning, where powerful GPUs solve heavy computations. For visualisation, however, typical business intelligence tools today only use a minimal fraction of the total power of our modern devices. As a comparison: a typical computer game renders millions of pixels in 3D several times per second (even via the web browser). In a modern BI tool, however, we may struggle to display 20 000 distinct points in a plot.

There are open standards and interfaces to fully utilize the graphical power of a modern display. Computer games often build on OpenGL to interact with the GPU. In web browsers, similar performance can be reached with WebGL and JavaScript libraries. Thus, this is not only about regular computers or installed applications. The Manhattan Population Explorer (built with JavaScript on D3.js and Mapbox GL JS) is a notable example of an interactive and visually appealing analysis application that runs very well on a regular smartphone.


Example from one of our prototypes: analysing the housing market – plotting 500 000 points interactively utilizing OpenGL.
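A scatter plot of similar scale can be rendered with GPU-backed WebGL from Python as well. The sketch below is a minimal illustration assuming the plotly library, not the prototype shown above; Scattergl draws via WebGL, so hundreds of thousands of points stay pannable and zoomable in the browser.

```python
import numpy as np
import plotly.graph_objects as go

# Generate a synthetic point cloud of 500 000 points.
n = 500_000
x = np.random.standard_normal(n)
y = x * 0.5 + np.random.standard_normal(n)

# Scattergl renders through WebGL instead of SVG, keeping the plot interactive.
fig = go.Figure(go.Scattergl(
    x=x, y=y, mode="markers",
    marker=dict(size=2, opacity=0.3),
))
fig.show()
```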

Current analysis solutions and applications built with advanced graphical analysis are typically custom made for a specific purpose and topic, as in the example above. This is very similar to how BI solutions were built before self-service BI came into play – specific solutions hand crafted for a few use cases. In contrast, open graphical libraries incorporated at the core of visualizations, with inspiration from gaming artwork, can spark a revolution in how we visually consume and utilize information.


Natural language empowers

The process of interpreting and working with speech and text is referred to as Natural Language Processing (NLP). NLP interfaces are moving towards becoming the default interface for interaction. For instance, Google’s search engine can give you instant replies to questions such as “weather London tomorrow”, and with Google Duplex (under development) NLP is used to automate phone calls, making appointments for you. Other examples include the search box popping up as a central feature on many larger web sites and voice services such as Amazon Alexa, Microsoft Cortana, Apple Siri, etc.

When it comes to analysis tools, we have seen some movement in this direction lately. In Power BI Service (web), Cortana can be activated to allow for simple Q&A on your prepared reports. Tableau has started talking about NLP for data exploration with “research prototypes you might see in the not too distant future”. The clearest example in this direction is probably ThoughtSpot, built with a search-driven analytics interface. Yet for most of the business analytics carried out today, clicking is still in focus and clicking is what is being taught in trainings. How can this be, when our other interactions with information move towards natural language interfaces? The key to moving forward is to give NLP and advanced visualization a vital role in our solutions, allowing for an entirely natural interface.

Initially it may appear hard to know exactly what to type to get the data right. Isn’t training needed also with an NLP interface? This is where AI comes in to help us interpret our requests and provide us with smart feedback. Having a look at Google again, we continuously get recommendations, automatic spelling correction and lookup of synonyms to optimize our search and hits. With a modern NLP interface, we learn along the way as we utilize it. Frankly speaking though, a natural language interface is best suited for common queries that aren’t too advanced. For more advanced data munging and customized analysis, a data scientist skillset and environment may well be needed. However, the power of e.g. Scientific Python or the R language could easily be incorporated into an NLP interface, where query suggestions turn into code completion. Scripting is a core part of the data science workflow.
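A toy sketch of the idea, assuming pandas is available, could map a restricted set of natural-language question patterns onto dataframe operations. The pattern, column names and data below are made up for illustration; a real solution would add NLP for synonyms, spelling and more flexible phrasing.

```python
import re
import pandas as pd

def answer(question: str, df: pd.DataFrame):
    """Handle questions of the form 'average/sum <measure> per <dimension>'."""
    match = re.match(r"(average|sum)\s+(\w+)\s+per\s+(\w+)", question.lower())
    if not match:
        return "Sorry, I only understand 'average/sum <measure> per <dimension>'."
    agg, measure, dimension = match.groups()
    func = "mean" if agg == "average" else "sum"
    return df.groupby(dimension)[measure].agg(func)

sales = pd.DataFrame({"region": ["north", "south", "north", "south"],
                      "price": [120, 150, 130, 160]})
print(answer("average price per region", sales))
```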

An analytical interface built around natural language helps direct focus and fine-tunes your analysis to arrive at intuitive facts and figures, explaining relevant business questions. This is all about empowering all users, friends and colleagues to draw their own conclusions and spread a data-driven mentality. Data science and machine learning techniques fit well into this concept to leverage deeper insights.

 

Conclusion – Business data at everyone’s fingertips

We have highlighted the importance of enriching data with unstructured data sources taken into account, demonstrated the importance of visual exploration to enable our cognitive abilities, and finally discussed empowering colleagues to draw conclusions through a natural language interface.

Compared with the current state of the art for analysis and business intelligence tools, we stand before a paradigm shift. Standardized self-service tools built on clicking, basic graphics and a focus on structured data will be overrun by a new way of thinking about analysis. We all want to create intuitive insights without the need for thorough training on how to use a tool. And we all want our insights and findings to be visually appealing. Seeing is believing. To communicate our findings, conclusions and decisions we need to show the why. Convincingly. This is where advanced graphics and art will help us. Natural language is the interface we use for more and more services, and it can easily be powered by voice as well. With a natural interface, anyone will learn to utilize the analytical power in the information and draw conclusions. Business data at everyone’s fingertips!

To experience our latest prototype where we demonstrate the concept of data enrichment, advanced visualization and natural language interfaces, take a look at this live presentation.

 

Author: Fredrik Moeschlin, senior Data Scientist at Findwise

Major highlights from Elastic{ON} 2018 – Findwise reporting

Two Elastic fans have just returned from San Francisco and the Elastic{ON} 2018 conference. With almost 3,000 participants this year, Elastic{ON} is the biggest Elastic conference in the world.

Findwise regularly organises events and meetups, covering among other topics Elastic. Keep an eye out for an event close to you.

Here are some of the main highlights from Elastic{ON} 2018.

Let’s start with the biggest announcement of them all: Elastic is opening the source code of X-Pack. This means that you will now be able to access not only the Elastic Stack source code but also the subscription-based code of X-Pack, which up until now has been inaccessible. This opens the opportunity for you as a developer to contribute code back.


 

Data rollups are a great new feature for anyone who needs to look at old data but feels the storage costs are too high. With rollups, only predetermined metrics and terms are stored, still allowing you to analyze these dimensions of your data but no longer letting you view the individual documents.

Azure monitoring available in X-Pack Basic. Elastic will, in an upcoming 6.x release, ship an Azure Monitoring Module, which will consist of a bundle of Kibana dashboards and make it really easy to get started exploring your Azure infrastructure. The monitoring module will be released as part of the X-Pack Basic version – in other words, it will be free to use.

Forecasting was the big new thing in X-Pack’s machine learning component. As the name suggests, the machine learning module can now not only spot anomalies in your data but also predict how it will change in the future.

Security in Kibana will get an update to make it work more like the security module in Elasticsearch. This also means that one of the most requested security features for Kibana will finally be addressed: giving users access to only some dashboards.

Dashboards are great and a fundamental part of Kibana, but sometimes you want to present your data in more dynamic ways with less focus on data density. This is where Canvas comes in. Canvas is a new Kibana module for producing infographics rather than dashboards, while still using live data from Elasticsearch.

Monitoring of Kubernetes and Docker containers will be made a lot easier with the Elastic Stack. A new infra component will be created just for this growing use case. This component will be powered by data collected by Beats, which now also has auto-discovery functionality within Kubernetes. This will give an overview of not only your Kubernetes cluster but also the individual containers within the cluster.

Geo capabilities within Kibana will be extended to support multiple map layers. This will make it possible to do more kinds of visualizations on maps. Furthermore, work is being done on supporting not only Geo points but also shapes.

One problem some have had with maps is that you need access to the Elastic Maps Service, and if you deploy the Elastic Stack within a company network this might not be reachable. To solve this, work is being done to make it possible to deploy the Elastic Maps Service locally.

Elastic acquired the SaaS solution Swiftype last year. Since then Swiftype has been busy adding even more features to its portfolio. Currently Swiftype comes in 3 different versions:

  • Swiftype Site Search – An out-of-the-box (OOTB) solution for website search
  • Swiftype Enterprise Search – Currently in beta, with a focus on internal, cloud-based data sources (for now) like G Suite, Dropbox, O365, Zendesk etc.
  • Swiftype App Search – A set of APIs and developer tools that make it quick to build user-facing search applications

 

Elastic has also started to look at replacing the Zen protocol used to keep clusters in sync. Currently a PoC is being made to try to create a consensus algorithm that follows modern academic best practices, with the added benefit of removing the minimum master nodes setting, currently one of the most common pitfalls when running Elasticsearch in production.

ECE (Elastic Cloud Enterprise) is a big focus for Elastic and makes it possible for customers to set up a fully service-based search solution maintained by Elastic.

If you are interested in hearing more about Elastic or Findwise visit https://findwise.com/en/technology/elastic-elasticsearch


 

Writers: Mads Elbrond, regional manager Findwise Denmark & Torsten Landergren, senior expert consultant

Pragmatic or spontaneous – What are the most common personal qualities in IT-job ads?

Open Data Analytics

At Findwise we regularly have companywide hackathons with different themes. The latest theme was insights in open data, which I personally find very interesting.

Our group chose to fetch data from the Arbetsförmedlingen (Swedish Employment Agency), where ten years of job ads are available. There are about 4 million job ads in total during this time-period, so there is some material to analyze.

To make it easier to enable ad hoc analysis, we started off by extracting the competences and personal traits mentioned in the job ads. This allows us to spot trends in competences over time or in different regions, or to correlate competences and traits. Lots of possibilities.
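A simplified version of that extraction step, assuming a curated list of known traits and competences, could be a plain dictionary match over the ad text. The lists below are made up for illustration; they are not the vocabularies used in the hackathon.

```python
# Illustrative vocabularies; a real solution would maintain much larger lists
# or train a model to recognise new terms.
KNOWN_COMPETENCES = {"javascript", "java", "big data", "system architecture"}
KNOWN_TRAITS = {"social", "driven", "passionate", "communicative", "pragmatic"}

def extract(ad_text: str, vocabulary: set) -> set:
    """Return the vocabulary entries that occur in the (lower-cased) ad text."""
    text = ad_text.lower()
    return {term for term in vocabulary if term in text}

ad = "We are looking for a driven and communicative developer with JavaScript skills."
print(extract(ad, KNOWN_COMPETENCES))  # {'javascript'}
print(extract(ad, KNOWN_TRAITS))       # e.g. {'driven', 'communicative'}
```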

 

Personal qualities and IT competences

As an IT ninja I find it more exciting to focus on jobs, competences and traits within the IT industry. A lot is happening, and it is easy for me to relate to this area, of course. A report from Almega suggests that there is a huge demand for competences within IT in the coming years, and it brings up a lot of examples of lacking technical skills. What is rarely addressed is which personality types are connected to these specific competences. We’re able to answer this interesting question from our data:

 

What personal traits are common complementary to the competences that are in demand?


Figure 1 – Relevant work titles, competences and traits for the search term “big data”

 

The most wanted personal traits are in general “social, driven, passionate, communicative”. All these results should of course be taken with a grain of salt, since a few staffing and general IT consulting companies account for a large share of the job ads within IT. But we can also look at a single competence and answer the question:

 

What traits are more common with this competence than in general? (Making the question a bit more specific.)

Some examples of competences in demand are system architecture, support and JavaScript. The most outstanding traits for system architecture are sharp, quality orientated and experienced. It can always be discussed whether experienced is a trait (although our model thought so), but it makes sense in any case since system architecture tends to be more common among senior roles. For support we find traits such as service orientated, happy and nice, which is not unexpected. Lastly, for job ads requiring JavaScript competence, personal traits such as quality orientated, quality aware and creative are the most predominant.

 

Differences between Stockholm and Gothenburg

Or let’s have a look at geographical differences between Sweden’s two largest cities when it comes to personal qualities in IT-job ads. In Gothenburg there is a stronger correlation to the traits spontaneous, flexible and curious while Stockholm correlates with traits such as sharp, pragmatic and delivery-focused.

 

What is best suitable for your personality?

You could also look at it the other way around and start with the personal traits to see which jobs/competences are meant for you. If you are analytical, then jobs as a controller or accountant could be for you. If you are an optimist, then job coach or guidance counselor seems to be a good fit. We created a small application where you can type in competences or personal traits and get suggested jobs in this way. Try it out here!

 

Learn more about Open Data Analytics

In addition, we’re hosting a breakfast seminar December 12th where we’ll use the open data from Arbetsförmedlingen to show a process of how to make more data driven decisions. More information and registration (the seminar will be held in Swedish)

 

Author: Henrik Alburg, Data Scientist

Summary from Enterprise Search and Discovery Summit 2017

This year at Enterprise Search and Discovery Summit, Findwise was represented by us – search experts Simon Stenström and Amelia Andersson. With over a thousand attendees at the event, we’ve enjoyed the company of many peers. Let’s stay in touch for inspiration and to create magic over the Atlantic – you know who you are!


Amelia Andersson and Simon Stenström, search experts from Findwise

 

Back to the event: We opened the Enterprise Search track with our talk on how you can improve your search solutions by taking several aspects of relevance into account. (The presentation can be found in full here, no video unfortunately.) If you want to know more about how to improve relevancy, feel free to contact us or download the free guide on Improved search relevancy.

A few themes kept recurring during the Enterprise Search track: machine learning and NLP, bots and digital assistants, statistics and logs, and GDPR. We’ve summarized our main takeaways from these topics below.

 

Machine learning and NLP

Machine learning and NLP were the unchallenged buzzwords of the conference. Everybody wants to do it, some have already started working with it, and some provided products for working with it. Unfortunately, not a lot of concrete examples of how organizations are actually using machine learning were presented, giving us the feeling that few organizations are there yet. We’re at the forefront!

 

Bots, QA systems and digital assistants

Everyone is walking around with Siri or Google Assistant in their pocket, but our enterprise search solutions still don’t make use of it. Panels discussed voice-based search (TV remote controls that could search content on all TV channels to set the right channel, a demo of Amazon Alexa providing answers on simple procedures for medical treatments, etc.), pointing out that voice-to-text now works well enough (at least in English) to use in many mobile use cases.

But bots can of course be used without voice input. A few different examples of using bots in a dialogue setting were shown. One of the most exciting demos showed a search-engine-powered bot that used facet values to ask questions in order to narrow down what information the user was looking for.

 

Statistics and logs

Collect logs! And when you’ve done that: use them! A clear theme was how logs are stored, displayed and used. Knowledge management systems where content creators can monitor how users find their information inspired us to consider dashboards for intranet content creators as well. If we can help our content creators understand how their content is found, maybe they will be encouraged to use better metadata or wordings, or to create information that their users are missing.

 

GDPR

Surprisingly, GDPR is not only a “European thing” but will have a global impact following the legislation change in May. American companies will have to look at how they handle the personal information of their EU customers. This statement took many attendees by surprise and there were many worried questions about what was considered non-compliant with GDPR.

 

We’ve had an exciting time in Washington and can happily say that we are able to bring back inspiration and new experience to our customers and colleagues at Findwise. On the same subject, a couple of weeks ago some of our fellow experts at Findwise wrote the report “In search for Insight”, addressing the new trends (machine learning, NLP etc.) in Enterprise Search. Make sure to get your copy of the report if you are interested in this area.

Most of the presentations from Enterprise Search and Discovery Summit can be found here.

 

Authors: Amelia Andersson and Simon Stenström, search experts from Findwise

Microsoft Ignite 2017 – from a Search and Findability perspective

Microsoft Ignite – the biggest Microsoft conference in the world. 700+ sessions, insights and roadmaps from industry leaders, and deep dives and live demos on the products you use every day. And yes, Findwise was there!

But how do you summarize a conference with more than 700 different sessions?

Well – you focus on one subject (search and findability in this case) and then you collaborate with some of the most brilliant and experienced people around the world within that subject. Add a little bit of your own knowledge – and the result is this Podcast.

Enjoy!

Expert Panel Shares Highlights and Opportunities in Microsoft’s Latest Announcements


Do you want to know more about Findwise and Microsoft? Find out how you can make SharePoint and Office 365 more powerful than ever before.

 

Time of intelligence: from Lucene/Solr revolution 2017

Lucene/Solr revolution 2017 has ended with me, Daniel Gómez Villanueva, and Tomasz Sobczak from Findwise on the spot.

First of all, I would like to thank LucidWorks for such a great conference, gathering this talented community and engaging companies all together. I would also like to thank all the companies reaching out to us. We will see you all very soon.

Some takeaways from Lucene/Solr revolution 2017

The conference basically met all of my expectations, especially when it comes to the session talks. They gave ideas, inspired, and reflected the capabilities of Solr and what a competent platform it is when it comes to search and analytics.

So, what is the key takeaway from this year’s conference? As usual, the talks about relevance attracted the largest audiences, showing that relevance is still a concern for search experts and companies out there. What is different in this year’s relevance talks compared to previous years is the message that, if you want to achieve better results, you need to add intelligent layers on top of or into your platform. It is no longer lucrative or profitable to spend time tuning field weights and boosts to satisfy the end users. The talk from The Home Depot, “User Behaviour Driven Intelligent Query Re-ranking and Suggestion”, “Learning to rank with Apache Solr and bees” from Bloomberg, and “An Intelligent, Personalized Information Retrieval Environment” from Sandia National Laboratories are just a few examples of the many talks showing how intelligence comes to the rescue and lets us achieve what is desired.

Get smarter with Solr

Even if we want to use what is provided out of the box by Solr, we need to be smarter. “A Multifaceted Look at Faceting – Using Facets ‘Under the Hood’ to Facilitate Relevant Search” by LucidWorks shows how they use faceting techniques to extract keywords, understand query language and rescore documents. “Art and Science Come Together When Mastering Relevance Ranking” by Wolters Kluwer is another example, where they change/tune Solr’s default similarity model and apply advanced index-time boost techniques to achieve better results. All of this shows that we need to be smarter when it comes to relevance engineering. The time of tuning and tweaking is over. It is the time of intelligence: human intelligence, if I may call it that.

Thanks again to LucidWorks and the amazing Solr Community. See you all next year. If not sooner.

Writer: Mohammad Shadab, search consultant / head of Solr and fusion solutions at Findwise