Activate conference 2018

Opensource has won! Now, what about AI?

Grant Ingersoll is on stage at the opening of Activate18 explaining the reasoning behind changing the name.

The revolution is won, opensource won, search as a concept to reckon with, they all won.

The times I come across a new search project where someone is pushing anything but opensource search is few and far between these days.

Since Search has taken a turn towards AI, a merge with that topic seems reasonable, not to say obvious. But AI in this context should probably be interpreted as AI to support good search results. At least if judging from the talks I attended. Interesting steps forward is expert systems and similar, none which was extensively discussed as of my knowledge. A kind of system we work with at Findwise. For instance, using NLP, machine learning and text analytics to improve a customer service.

Among the more interesting talks I attended was Doug Turnbulls talk on Neural Search Frontier. Some of the matrix-math threw me back to a ANN-course I took 10 years ago. Way before I ever learned any matrix maths. Now, way post remembering any matrix math-course I ever took, it’s equally confusing, possibly on a bit higher level. But he pointed out interesting aspects and show conceptually how Word2Vec-vectors work and won’t work. Simon Hughes talk “Vectors in search – Towards more semantic matching” is in the same area but more towards actually using it.

Machine Learning is finally mainstream

If we have a look at the overall distribution of talks, I think it’s safe to say that almost all talks touched on machine learning in some way. Most commonly using Learning to Rank and Word2Vec. None of these are new techniques (Our own Mickaël Delaunay wrote a nice blog-post about how to use LTR for personalization a couple of years ago. They have been covered before to some extent but this time around we see some proper, big scale implementations that utilizes the techniques. Bloomberg gave a really interesting presentation on what their evolution from hand tuned relevance to LTR over millions of queries have been like. Even if many talks were held on a theoretical/demo-level it is now very clear. It’s fully possible and feasible to build actual, useful and ROI-reasonable Machine Learning into your solutions.

As Trey Grainer pointed out, there are different generations of this conference. A couple of years ago Hadoop were everywhere. Before that everything was Solr cloud. This year not one talk-description referenced the Apache elephant (but migration to cloud was still referenced, albeit not in the topic). Probably not because big data has grown out of fashion, even though that point was kind of made, but rather that we have other ways of handling and manage it these days.

Don’t forget: shit in > shit out!

And of course, there were the mandatory share of how-we-handle-our-massive-data-talks. Most prominently presented by Slack, all developers favourite tool. They did show a MapReduce offline indexing pipeline that not only enabled them to handle their 100 billion documents, but also gave them an environment which was quick on its feet and super suitable for testing new stuff and experimenting. Something an environment that size usually completely blocks due to re-indexing times, fear of bogging down your search-machines and just general sluggishness.

Among all these super interesting technical solutions to our problems, it’s really easy to forget that loads of time still have to be spent getting all that good data into our systems. Doing the groundwork, building connectors and optimizing data analysis. It doesn’t make for so good talks though. At Findwise we ususally do that using our i3-framework which enables you to ingest, process, index and query your unstructured data in a nice framework.activate 2018 solr lucid opensource

I now look forward to doing the not so ground work using inspiration from loads of interesting solutions here at Activate.

Thanks so much for this year!

The presentations from the conference are available on YouTube in Lucidworks playlist for Activate18.

Author and event participant: Johan Persson Tingström, Findability Expert at Findwise

Major highlights from Elastic{ON} 2018 – Findwise reporting

Two Elastic fans have just returned from San Francisco and the Elastic{ON} 2018 conference. With almost 3.000 participants this year Elastic{ON} is the biggest Elastic conference in the world.

Findwise regularly organises events and meetups, covering among other topics Elastic. Keep an eye for an event close to you.

Here are some of the main highlights from Elastic{ON} 2018.

Let’s start with the biggest announcement of them all, Elastic is opening the source code of the XPack. This mean that you now not only will be able to access the Elastic stack source code, but also the subscription-based code of XPack that up until now have been inaccessible. This opens the opportunity for you as a developer to contribute back code.

news-elasticon-2018

 

Data rollups is a great new feature for anyone with the need to look at old data but feel the storage costs are too high. With rollups only predetermined metrics and terms will be stored. Still allowing you to analyze these dimensions of your data but no longer being able to view the individual documents.

Azure monitoring available in Xpack Basic. Elastic will in an upcoming 6.x release an Azure Monitoring Module, which will consist of a bundle of Kibana dashboards and make it really easy to get started exploring your Azure infrastructure. The monitoring module will be released as part of the XPack basic version – in other words, it will be free to use.

Forecasting was the big new thing in X-packs Machine learning component. As the name suggest the machine learning module can now not only spot anomalies in your data but also predict how it will change in the future.

Security in Kibana will get an update to make it work more like the Security module in Elasticsearch. This will also mean that one of the most requested security questions for Kibana will be resolved, giving users access to only some dashboards.

Dashboard are great and a fundamental part of Kibana but sometimes you want to present your data in more dynamic ways with less focus on data density. This is where Canvas comes in. Canvas is a new Kibana module to produce infographics rather than dashboards but still using live data from Elasticsearch.

Monitoring of Kubernetes and Docker containers will be made a lot easier with the Elastic stack. A new infra component will be created just for this growing use case. This component will be powered by data collected by Beats which now also has an auto discovery functionality within Kubernetes. This will give an overview of not only your Kubernetes cluster but also the individual containers within the cluster.

Geo capabilities within Kibana will be extended to support multiple map layers. This will make it possible to do more kinds of visualizations on maps. Furthermore, work is being done on supporting not only Geo points but also shapes.

One problem some have had with maps is that you need access to the Elastic map service and if you deploy the Elastic stack within a company network this might not be reachable. To solve this work is being done to make it possible to deploy the Elastic maps service locally.

Elastic acquired SaaS solution Swiftype last year. Since then Swiftype have been busy developing even more features to its portfolio. At current Swiftype comes in 3 different version:

  • Swiftype site Search – An out of the box (OOTB) solution for website search
  • Swiftype Enterprise Search – Currently in beta version, but with focus on internal, cloud based datasources (for now) like G Suite, Dropbox, O365, Zendesk etc.
  • Swiftype App Search – A set of API’s and developer tools that makes it quick to build user faced search applications

 

Elastic has also started to look at replacing the Zen protocol used to keep clusters in sync. Currently a PoC is being made to try to create a consensus algorithm that follow modern academic best practices. With the added benefit to remove the minimum master nodes setting, currently one of the most common pitfalls when running Elasticsearch in production.

ECE – Elastic Cloud Enterprise is big focus for Elastic and make it possible for customers to setup a fully service-based search solution being maintained by Elastic.

If you are interested in hearing more about Elastic or Findwise visit https://findwise.com/en/technology/elastic-elasticsearch

elasticon 2018

 

Writers: Mads Elbrond, regional manager Findwise Denmark & Torsten Landergren, senior expert consultant

Digital wizardry for customers & employees – the next elements

A reflection on Mobile World Congress topics mobility, digitalisation, IoT, the Fourth Industrial Revolution and sustainability

MWC2017Commerce has always had a conversational, today it is digital. Organisations are asking how to organise clean effective data for an open digital conversation. 

Digitalization’s aim is to answer customer/consumer-centric demands effectively (with relevant and related data) and in an efficient manner. [for the remainder of the article read consumer and customer interchangeably]

This essentially means Joining the dots between clean data and information and being able to recognise the main and most consumer-valuable use cases, be it common transaction behaviour or negating their most painful user experiences.

This includes treading the fine line between being able to offer “intelligent” information (intelligent in terms of relevance and context)  to the consumer effectively and not seeming to be freaky or stalker-like. The latter is dealt with by forming a digital conversation where the consumer understands the use of their information only being used for their end needs or wants.

While clean, related data from the many various multi-channel customer touch-points forms the basis of an agile digital organisation, it is the combination of significant data analysis insight of user demand & behaviour (clicks, log analysis etc), machine learning and sensible prediction that forms the basis of artificial Intelligence. Artificial intelligence broken down is essentially resultant action based on the inferences of knowing certain information, i.e. the elementary Dr Watson, but done by computers.

This new digital data basis means being able to take data from what were previous data silos and combine it effectively in a meaningful way, for a valuable purpose. While the tag of Big Data becomes weary in a generalised context, key is the picking of data/information to get relevant answers to the mosts valuable questions, or in consumer speak, to get a question answered or a job done effectively.

Digitalisation (and then the following artificial intelligence) relies obviously on computer automation, but it still requires some thoughtful human-related input. Important steps in the move towards digitalization include:

  • Content and Data Inventory, to clean data/ the cleansing of data and information;
  • Its architecture (information modelling, content analysis, automatic classification and annotation/tagging);
  • Data analysis in combination with text analysis (or NLP: natural language processing for the more abundant unstructured data, content), the latter to put flesh on the bone as it were, or adding meaning and context
  • Information Governance: the process of being responsible for the collection, proper storage and use of important digital information (now made less ignorable with new citizen-centric data laws (GDPR) and the need for data agility or anonymization of data)
  • Data/system Interoperability: which data formats, structures, and standards, are most appropriate for you? What data collections are most Relational databases, Linked/graph data, data lakes etc.?); 
  • Language/cultural interoperability: letting people with different perspectives accessing the same information topics using their own terminology.
  • Interoperability for the future also means being able to link everything in your business ecosystem for collaboration, in- and outbound conversations, endless innovation and sustainability.
  • IoT or the Internet of Things is making the physical world digital and adding further to the interlinked network, soon to be superseded by the AoT (the analysis of things)
  • Newer steps of Machine learning (learning about consumer preferences and behaviour etc.) and artificial intelligence (being able to provide seemingly impossible relevant information, clever decision-making and/or seamless user experience).

The fusion of technologies continues further as the lines between the physical, digital, and biological spheres with developments in immersive Internet, as with Augmented Reality (AR) and Virtual Reality (VR).

The next elements are here already: semantic (‘intelligent’) search, virtual assistants, robots, chat bots… with 5G around the corner to move more data, faster.

Progress within mobility paves the way for a more sustainable world for all of us (UN Sustainable Development), with a future based on participation. In emerging markets we are seeing giant leaps in societal change. Rural areas now have access to the vast human resources of knowledge to service innovation e.g. through free-access to Wikipedia on cheap mobile devices and Open Campuses. Gender equality with changed monetary and mobile financial practices and blockchain means to raise to the challenge with interoperability. We have to address the open paradigm (e.g Open Data) and the participation economy, building the next elements. Shared experience and information commons. This also falls back to the intertwingled digital workplace, and practices to move into new cloud based arenas.

Some last remarks on the telecom industry, it is loaded with acronyms and for laymen in the area sometimes a maze to navigate and to build some sensemaking.

So are these steps straightforward, or is the reality still a potential headache for your organisation? 

Contact Findwise now to ease the process, before your competitor does 😉
View Fredric Landqvist's LinkedIn profileFredric Landqvist research blog
View Peter Voisey's LinkedIn profilePeter Voisey

What will happen in the information sector in 2017?

As we look back at 2016, we can say that it has been an exciting and groundbreaking year that has changed how we handle information. Let’s look back at the major developments from 2016 and list key focus areas that will play an important role in 2017.

3 trends from 2016 that will lay basis for shaping the year 2017

Cloud

There has been a massive shift towards cloud, not only using the cloud for hosting services but building on top of cloud-based services. This has affected all IT projects, especially the Enterprise Search market when Google decided to discontinue GSA and replace it with a cloud based Springboard. More official information on Springboard is still to be published in writing, but reach out to us if you are keen on hearing about the latest developments.

There are clear reasons why search is moving towards the cloud. Some of the main ones being machine learning and the amount of data. We have an astonishing amount of information available, and the cloud is simply the best way to handle this overflow. Development in the cloud is faster, the cloud gives practically unlimited processing power and the latest developments available in the cloud are at an affordable price.

Machine learning

One area that has taken huge steps forward has been machine learning. It is nowadays being used in everyday applications. Google wrote a very informative blog post about how they use Cloud machine learning in various scenarios. But Google is not alone in this space – today, everyone is doing machine learning. A very welcome development was the formation of Partnership on AI by Amazon, Google, Facebook, IBM and Microsoft.

We have seen how machine learning helps us in many areas. One good example is health care and IBM Watson managing to find a rare type of leukemia in 10 minutes. This type of expert assistance is becoming more common. While we know that it is still a long path to come before AI becomes smarter than human beings, we are taking leaps forward and this can be seen by DeepMind beating a human at the complex board game Go.

Internet of Things

Another important area is IoT. In 2016 most IoT projects have, in addition to consumer solutions, touched industry, creating a smart city, energy utilization or connected cars. Companies have realized that they nowadays can track any physical object to the benefits of being able to serve machines before they break, streamline or build better services or even create completely new business based on data knowledge. On the consumer side, we’ve in 2016 seen how IoT has become mainstream with unfortunate effect of poorly secured devices being used for massive attacks.

 

3 predictions for key developments happening in 2017

As we move towards the year 2017, we see that these trends from 2016 have positive effects on how information will be handled. We will have even more data and even more effective ways to use it. Here are three predictions for how we will see the information space evolve in 2017.

Insight engine

The collaboration with computers are changing. For decades, we have been giving tasks to computers and waited for their answer. This is slowly changing so that we start to collaborate with computers and even expect computers to take the initiative. The developments behind this is in machine learning and human language understanding. We no longer only index information and search it with free text. Nowadays, we can build a computer understanding information. This information includes everything from IoT data points to human created documents and data from other AI systems. This enables building an insight engine that can help us formulate the right question or even giving us insight based on information to a question we never ask. This will revolutionize how we handle our information how we interact with our user interfaces.

We will see virtual private assistants that users will be happy to use and train so that they can help us to use information like never before in our daily live. Google Now, in its current form, is merely the first step of something like this, being active towards bringing information to the user.

Search-driven analytics

The way we use and interact with data is changing. With collected information about pretty much anything, we have almost any information right at our fingertips and need effective ways to learn from this – in real time. In 2017, we will see a shift away from classic BI systems towards search-driven evolutions of this. We already have Kibana Dashboards with TimeLion and ThoughtSpot but these are only the first examples of how search is revolutionizing how we interact with data. Advanced analytics available for anyone within the organization, to get answers and predictions directly in graphs and diagrams, is what 2017 insights will be all about.

Conversational UIs

We have seen the rise of Chatbots in 2016. In 2017, this trend will also take on how we interact with enterprise systems. A smart conversational user interface builds on top of the same foundations as an enterprise search platform. It is highly personalized, contextually smart and builds its answers from information in various systems and information in many forms.

Imagine discussing future business focus areas with a machine that challenges us in our ideas and backs everything with data based facts. Imagine your enterprise search responding to your search with a question asking you to detail what you actually are achieving.

 

What are your thoughts on the future developement?

How do you see the 2017 change the way we interact with our information? Comment or reach out in other ways to discuss this further and have a Happy Year 2017!

 

Written by: Ivar Ekman

Understanding politics with Watson using Text Analytics

To understand the topics that actually are important to different political parties is a difficult task. Can text analytics together with an search index be an approach to given a better understanding?

This blog post describes how IBM Watson Explorer Content Analytics (WCA) can be used to make sense of Swedish politics. All speeches (in Swedish: anföranden) in the Swedish Parliament from 2004 to 2015 are analyzed using WCA. In total 139 110 transcribed text documents were analyzed. The Swedish language support build by Findwise for WCA is used together with a few text analytic processing steps which parses out person names, political party, dates and topics of interest. The selected topics in this analyzed are all related to infrastructure and different types of fuels.

We start by looking at how some of the topics are mentioned over time.

Analyze of terms of interets in Swedsih parlament between 2004 and 2014.

Analyze of terms of interest in Swedish parliament between 2004 and 2014.

The view shows topic which has a higher number of mentions compared to what would be expected during one year. Here we can see among other topics that the topic flygplats (airport) has a high increase in number of mentioning during 2014.

So let’s dive down and see what is being said about the topic flygplats during 2014.

Swedish political parties mentioning Bromma Airport.

Swedish political parties mentioning Bromma Airport during 2014.

The above image shows how the different political parties are mentioning the topic flygplats during the year 2014. The blue bar shows the number of times the topic flygplats was mentioned by each political party during the year. The green bar shows the WCA correlation value which indicates how strongly related a term is to the current filter. What we can conclude is that party Moderaterna mentioned flygplats during 2014 more frequently than other parties.

Reviewing the most correlated nouns when filtering on flygplats and the year 2014 shows among some other nouns: Bromma (place in Sweden), airport and nedläggning (closing). This gives some idea what was discussed during the period. By filtering on the speeches which was held by Moderaterna and reading some of them makes it clear that Moderaterna is against a closing of Bromma airport.

The text analytics and the index provided by WCA helps us both discover trending topics over time and gives us a tool for understanding who talked about a subject and what was said.

All the different topics about infrastructure can together create a single topic for infrastructure. Speeches that are mentioning tåg (train), bredband (broadband) or any other defined term for infrastructure are also tagged with the topic infrastructure. This wider concept of infrastructure can of course also be viewed over time.

Discussions in Swedish parliament mentioning the defined terms which builds up the subject infrastructure 2004 to 2015.

Discussions in Swedish parliament mentioning the defined terms which builds up the subject infrastructure 2004 to 2015.

Another way of finding which party that are most correlated to a subject is by comparing pair of facets. The following table shows parties highly related to terms regarding infrastructure and type of fuels.

Political parties highly correlated to subjects regarding infrastructure and types of fuel.

Swedish political parties highly correlated to subjects regarding infrastructure and types of fuel.

Let’s start by explain the first row in order to understand the table. Mobilnät (mobile net) has only been mentioned 44 times by Centerpartiet, but Centerpartiet is still highly related to the term with a WCA correlation value of 3.7. This means that Centerpartiet has a higher share of its speeches mentioning mobilnät compared to other parties. The table indicates that two parties Centerpartiet and Miljöpartiet are more involved about the subject infrastructure topics than other political parties.

Swedish parties mentioning the defined concept of infrastructure.

Swedish parties mentioning the defined concept of infrastructure.

Filtering on the concept infrastructure also shows that Miljöpartiet and Centerpartiet are the two parties which has the highest share of speeches mentioning the defined infrastructure topics.

Interested to dig deeper into the data? Parsing written text with text analytics is a successful approach for increasing an understanding of subjects such as politics. Using IBM Watson Explorer Content Analytics makes it easy. Most of the functionality used in this example is also out of the box functionalities in WCA.

Predictive Analytics World 2012

At the end of November 2012 top predictive analytics experts, practitioners, authors and business thought leaders met in London at Predictive Analytics World conference. Cameral nature of the conference combined with great variety of experiences brought by over 60 attendees and speakers made a unique opportunity to dive into the topic from Findwise perspective.

Dive into Big Data

In the Opening Keynote, presented by Program Chairman PhD Geert Verstraeten, we could hear about ways to increase the impact of Predictive Analytics. Unsurprisingly a lot of fuzz is about embracing Big Data.  As analysts have more and more data to process, their need for new tools is obvious. But business will cherish Big Data platforms only if it sees value behind it. Thus in my opinion before everything else that has impact on successful Big Data Analytics we should consider improving business-oriented communication. Even the most valuable data has no value if you can’t convince decision makers that it’s worth digging it.

But beeing able to clearly present benefits is not everything. Analysts must strive to create specific indicators and variables that are empirically measurable. Choose the right battles. As Gregory Piatetsky (data mining and predictive analytics expert) said: more data beats better algorithms, but better questions beat more data.

Finally, aim for impact. If you have a call center and want to persuade customers not to resign from your services, then it’s not wise just to call everyone. But it might also not be wise to call everyone you predict to have high risk of leaving. Even if as a result you loose less clients, there might be a large group of customers that will leave only because of the call. Such customers may also be predicted. And as you split high risk of leaving clients into “persuadable” ones and “touchy” ones, you are able to fully leverage your analytics potencial.

Find it exciting

Greatest thing about Predictive Analytics World 2012 was how diverse the presentations were. Many successful business cases from a large variety of domains and a lot of inspiring speeches makes it hard not to get at least a bit excited about Predictive Analytics.

From banking and financial scenarios, through sport training and performance prediction in rugby team (if you like at least one of: baseball, Predictive Analytics or Brad Pitt, I recommend you watch Moneyball movie). Not to mention Case Study about reducing youth unemployment in England. But there are two particular presentations I would like to say a word about.

First of them was a Case Study on Predicting Investor Behavior in First Social Media Sentiment-Based Hedge Fund presented by Alexander Farfuła – Chief Data Scientist at MarketPsy Capital LLC. I find it very interesting because it shows how powerful Big Data can be. By using massive amount of social media data (e.g. Twitter), they managed to predict a lot of global market behavior in certain industries. That is the essence of Big Data – harness large amount of small information chunks that are useless alone, to get useful Big Picture.

Second one was presented by Martine George – Head of Marketing Analytics & Research at BNP Paribas Fortis in Belgium. She had a really great presentation about developing and growing teams of predictive analysts. As the topic is brisk at Findwise and probably in every company interested in analytics and Big Data, I was pleased to learn so much and talk about it later on in person.

Big (Data) Picture

Day after the conference John Elder from Elder Research led an excellent workshop. What was really nice is that we’ve concentrated on the concepts not the equations. It was like a semester in one day – a big picture that can be digested into technical knowledge over time. But most valuable general conclusion was twofold:

  • Leverage – an incremental improvement will matter! When your turnover can be counted in millions of dollars even half percent of saving mean large additional revenue.
  • Low hanging fruit – there is lot to gain what nobody else has tried yet. That includes reaching for new kinds of data (text data, social media data) and daring to make use of it in a new, cool way with tools that weren’t there couple of years ago.

Plateau of Productivity

As a conclusion, I would say that Predictive Analytics has become a mature, one of the most useful disciplines on the market. As in the famous Gartner Hype, Predictive Analytics reached has reached the Plateau of Productivity. Though often ungrateful, requiring lots of resources, money and time, it can offer your company a successful future.

Video: Introducing Hydra – An Open Source Document Processing Framework

Introducing Hydra – An Open Source Document Processing Framework from presented at Lucene Revolution hosted on Vimeo.

Presented by Joel Westberg, Findwise AB
This presentation details the document-processing framework called Hydra that has been developed by Findwise. It is intended as a description of the framework and the problem it aims to solve. We will first discuss the need for scalable document processing, outlining that there is a missing link between the open source chain to bridge the gap between source system and the search engine, then will move on to describe the design goals of Hydra, as well as how it has been implemented to meet those demands on flexibility, robustness and ease of use. This session will end by discussing some of the possibilities that this new pipeline framework can offer, such as freely seamlessly scaling up the solution during peak loads, metadata enrichment as well as proposed integration with Hadoop for Map/Reduce tasks such as page rank calculations.