Understanding politics with Watson using Text Analytics

Understanding which topics actually matter to different political parties is a difficult task. Can text analytics, combined with a search index, be an approach that gives us a better understanding?

This blog post describes how IBM Watson Explorer Content Analytics (WCA) can be used to make sense of Swedish politics. All speeches (in Swedish: anföranden) in the Swedish Parliament from 2004 to 2015 were analyzed using WCA, in total 139,110 transcribed text documents. The Swedish language support built by Findwise for WCA is used together with a few text analytic processing steps that parse out person names, political parties, dates and topics of interest. The topics selected for this analysis are all related to infrastructure and different types of fuel.

We start by looking at how some of the topics are mentioned over time.


Analysis of terms of interest in the Swedish parliament between 2004 and 2014.

The view shows topics that are mentioned more often than would be expected during a given year. Among other things, we can see that the topic flygplats (airport) shows a sharp increase in mentions during 2014.
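
The idea of "more mentions than expected" can be sketched in a few lines of Python. The counts below are invented, and the average-based threshold is a simple heuristic for illustration, not WCA's actual trend detection:

```python
from statistics import mean

# Invented per-year mention counts for one topic (not the real WCA data)
mentions = {2010: 31, 2011: 28, 2012: 35, 2013: 30, 2014: 74}

expected = mean(mentions.values())  # naive expectation: the average year
# Flag years whose count clearly exceeds the expectation (threshold is arbitrary)
spikes = [year for year, n in mentions.items() if n > 1.5 * expected]
print(spikes)  # [2014]
```

With these numbers, only 2014 rises clearly above the average, which is the kind of signal the view above surfaces.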

So let’s dive down and see what is being said about the topic flygplats during 2014.


Swedish political parties mentioning Bromma Airport during 2014.

The above image shows how often the different political parties mentioned the topic flygplats during 2014. The blue bar shows the number of times each party mentioned the topic during the year. The green bar shows the WCA correlation value, which indicates how strongly related a term is to the current filter. We can conclude that Moderaterna mentioned flygplats more frequently than the other parties during 2014.

Reviewing the most correlated nouns when filtering on flygplats and the year 2014 turns up, among others: Bromma (a place in Sweden), airport and nedläggning (closure). This gives some idea of what was discussed during the period. Filtering on the speeches held by Moderaterna and reading some of them makes it clear that Moderaterna was against closing Bromma airport.

The text analytics and the index provided by WCA help us both discover trending topics over time and understand who talked about a subject and what was said.

All the different infrastructure topics can together form a single, wider topic: infrastructure. Speeches that mention tåg (train), bredband (broadband) or any other defined infrastructure term are also tagged with the topic infrastructure. This wider concept of infrastructure can of course also be viewed over time.
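
The tagging rule can be sketched like this; the term list and function are illustrative, not the actual WCA annotator configuration:

```python
# Hypothetical term list defining the wider "infrastructure" concept
INFRASTRUCTURE_TERMS = {"tåg", "bredband", "flygplats", "mobilnät"}

def tag_speech(text):
    """Tag a speech with 'infrastructure' if any defined term occurs in it."""
    words = set(text.lower().split())
    return {"infrastructure"} if words & INFRASTRUCTURE_TERMS else set()

print(tag_speech("Vi behöver satsa mer på bredband i hela landet"))
# {'infrastructure'}
```

In a real pipeline the matching would be done by the engine's dictionary annotator rather than naive whitespace splitting, but the principle is the same: any defined term pulls the document into the wider topic.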


Discussions in the Swedish parliament mentioning the defined terms that make up the subject infrastructure, 2004 to 2015.

Another way of finding which party is most correlated to a subject is to compare pairs of facets. The following table shows parties highly related to terms regarding infrastructure and types of fuel.


Swedish political parties highly correlated to subjects regarding infrastructure and types of fuel.

Let's explain the first row in order to understand the table. Mobilnät (mobile network) has only been mentioned 44 times by Centerpartiet, but Centerpartiet is still highly related to the term, with a WCA correlation value of 3.7. This means that Centerpartiet devotes a higher share of its speeches to mobilnät than other parties do. The table indicates that two parties, Centerpartiet and Miljöpartiet, are more engaged in the infrastructure topics than the other political parties.
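
The "share of speeches" intuition behind the correlation value can be sketched as a simple ratio. This is an illustration of the idea, not WCA's actual correlation formula, and the counts are invented:

```python
def share_ratio(party_hits, party_total, all_hits, all_total):
    """Ratio of a party's share of speeches mentioning a term to the
    overall share across all parties; values above 1 mean the party
    is over-represented on the term."""
    return (party_hits / party_total) / (all_hits / all_total)

# Invented counts: 44 mentions in 2,000 party speeches versus
# 120 mentions in 20,000 speeches overall
print(round(share_ratio(44, 2000, 120, 20000), 1))  # 3.7
```

A raw mention count of 44 looks small, but relative to how often everyone else mentions the term it stands out, which is exactly why a correlation-style measure is more informative than counts alone.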

Swedish parties mentioning the defined concept of infrastructure.


Filtering on the concept infrastructure also shows that Miljöpartiet and Centerpartiet are the two parties with the highest share of speeches mentioning the defined infrastructure topics.

Interested in digging deeper into the data? Parsing written text with text analytics is a successful approach to increasing the understanding of subjects such as politics, and IBM Watson Explorer Content Analytics makes it easy. Most of the functionality used in this example is available out of the box in WCA.

Presentation: Enterprise Search and Findability in 2013

This was presented 8 November at J. Boye 2012 Conference in Aarhus, Denmark, by Kristian Norling.

Presentation Summary

There is a lot of talk about social, big data, cloud, digital workplace and semantic web. But what about search, is there anything interesting happening within enterprise search and findability? Or is enterprise search dead?

In the spring of 2012, we conducted a global survey on Enterprise Search and Findability. The resulting report, based on the survey answers, tells us what the leading practitioners are doing and gives guidance on what you can do to make your organisation’s enterprise search and findability better in 2013.

This presentation will give you a sneak peek into the near future and trends of enterprise search, based on data from the survey and on what the leaders who are satisfied with their search solutions do.

Topics on Enterprise Search

  • Help me! Content overload!
  • The importance of context
  • Digging for gold with search analytics
  • What has trust to do with enterprise search?
  • Social search? Are you serious?
  • Oh, and that mobile thing

Presentation: The Why and How of Findability

“The Why and How of Findability” presented by Kristian Norling at the ScanJour Kundeseminar in Copenhagen, 6 September 2012. We can make information findable with good metadata. The metadata makes it possible to create browsable, structured and highly findable information. We can make findability (and enterprise search) better by looking at findability in five different dimensions.

Five dimensions of Findability

1. BUSINESS – Build solutions to support your business processes and goals

2. INFORMATION – Prepare information to make it findable

3. USERS – Build usable solutions based on user needs

4. ORGANISATION – Govern and improve your solution over time

5. SEARCH TECHNOLOGY – Build solutions based on state-of-the-art search technology

Video: Search Analytics in Practice

Search Analytics in Practice from Findwise on Vimeo.

This presentation is about how to use search analytics to improve the search experience. A small investment in time and effort can really improve the search on your intranet or website. You will get practical advice on what metrics to look at and what actions can be taken as a result of the analysis.

The video is in Swedish: “Sökanalys i praktiken” (Search analytics in practice).

The presentation was recorded in Gothenburg on the 4th of May 2012.

The presentation featured in the video:

Search Analytics in Practice


Architecture of Search Systems and Measuring the Search Effectiveness

Lecture given on 19 April 2012 at the Warsaw University of Technology. This is the 9th lecture in the regular master’s-level course “Introduction to text mining”.


Book Review: Search Analytics for Your Site

Lou Rosenfeld is the founder and publisher of Rosenfeld Media and also the co-author (with Peter Morville) of the best-selling book Information Architecture for the World Wide Web, which is considered one of the best books on information management.

In Lou Rosenfeld’s latest book he shows how to successfully work with Site Search Analytics (SSA). With SSA you analyse saved search logs to find emerging patterns in what your users are searching for. This information can be a great help in figuring out what users want and need from your site. The search terms used on your site offer more clues to why the user is on your site, compared with search queries from Google (which reveal how they got to your site).

So what’s in the book?

Part I – Introducing Site Search Analytics

In part one the reader gets a great example of why to use SSA and an introduction to what SSA is. In the first chapters you follow John Ferrara, who worked at a company called Vanguard, and see how he analysed search logs to prove that a newly bought search engine performed poorly, then used the same statistics to improve it. This is a great real-world example of how to use SSA to measure search quality AND to set up goals for improvement.

a word cloud is one way to play with the data

Part II – Analysing the data

In this part Lou gets hands-on with user logs and shows you how to analyse the data. He makes it fun and emphasizes the need to play with user data; without that emphasis on play, the task of analysing user data may seem daunting. Also, with real-world examples from different companies and institutions it is easy to understand the different methods of analysis. Personally, I feel the use of real data in the book makes the subject easier (and more interesting) to understand.

From which pages do users search?

Part III – Improving your site

In the third part of the book, Rosenfeld shows how to apply the findings from your analysis. If you’ve worked with SSA before, most of it will be familiar (improving best bets, zero hits, query completion and synonyms), but even for experienced professionals there is good information about how to improve everything from site navigation to site content, and even how to connect your SSA to your site KPIs.

Conclusion

Search Analytics for Your Site shows how easy it is to get started with SSA, but also its depth and usefulness. The book is easy to read and also quite funny. It is quite short, which in this day and age isn’t a bad thing. It reminded me of the importance of search analytics, and I really hope more companies and sites take the lessons in this book to heart and focus on search analytics.

Findability in Customer Service Search

We have previously introduced Findability by Findwise, involving solutions that make optimal use of search technology to support and strengthen the business of our customers. In a series of blog posts we will present how findability solutions can be deployed within different parts of your organisation. Initially I will focus on how an efficient implementation of search technology, in the form of a good customer service search, can improve your customer service offering.

Ultimately, the goal of most customer service interactions is to increase customer satisfaction and thereby improve customer retention in a cost efficient way. In times when the amount of available information increases by the minute, one key success factor is to provide both customer service agents and customers with quick and easy access to relevant information. A findability solution based on state-of-the-art search technology and optimised along the findability dimensions will fuel your customer service search offering in two primary ways:

  1. Improved support to customer service agents
  2. Improved online customer service

Example of customer service search

Improved support to customer service agents

While more traditional customer service solutions tend to be based on a knowledge database that needs to be built and maintained, a findability solution is more dynamic in nature: it is based on a search index created from the data already residing in corporate systems. In other words, the solution makes optimal use of existing information and systems to support customer service agents in accessing relevant information. The positive effects are illustrated by the case study below.

Case study: Telecom call centre

Findwise implemented a findability solution at a call centre for a large Swedish mobile operator. The solution introduced the ability to search the most important information source, which had previously only been accessible via tree-structure navigation.

The graph below presents the result of a test performed by the call centre agents to evaluate the new search function. The test encompassed a number of tasks in which the agents compared using the search functionality to the traditional navigation, in terms of both level of difficulty and time consumption in finding desired information. The graph shows that the agents found the search function very helpful, making the information both easier and less time consuming to find.


The most evident effects of improved support and information access via search technology are:

  • Reduced handling time
  • Higher first time resolution
  • Reduced Tier-2 escalations
  • Increased customer service agent satisfaction
  • Increased agent productivity
  • Less training needed to introduce new agents

In a white paper, Google has also pinpointed, and quantified, the above benefits of implementing a findability solution in call centre operations, in this case fuelled by the Google Search Appliance (GSA) search platform. For example, Google states that handling time can be reduced by up to 20% on average and that it is possible to save up to 25% on training costs for each new call centre agent. The full article is available here.

Improved online customer service

Naturally a Findability solution can also improve your online customer service offering. Below I have outlined three solution elements that will help drive customer self-service and thereby deflect issues from being forwarded to the customer service organisation.

Improved search functionality

As in the case of agent support, powerful search functionality that provides relevant information from all required sources in a user-friendly way will increase customers’ ability to resolve issues themselves.

Personalised user interface

Using the power of an enterprise search platform you can dynamically customise the self-service experience to the individual and the incident, simplifying and speeding up the process of finding answers.

Dynamic FAQ

Self-service can also be fuelled by providing a relevant and updated FAQ section. The information can be made dynamic and include answers to the most recent questions by using both query log information, i.e. what users are searching for, and call centre comments as input to the FAQs.

For many enterprises, self-service is seen as the solution that can provide customers with the support they need while significantly reducing customer service costs. However, self-service must do more than just cut costs. When customers perceive self-service as simply a means to shift interaction costs onto their shoulders, it can reduce customer satisfaction. Customers need a self-service experience that provides them with higher levels of interaction convenience and information availability, faster issue resolution and more personalised interactions. A Findability solution including the above elements provides that.

The most evident effects of an improved online customer service offering gained from the use of search technology and search analytics are:

  • Fewer incoming calls and e-mails
  • Increased customer satisfaction
  • Increased browser-to-buyer conversion rate
  • Increased knowledge of user interests and behaviour (to fuel additional sales)

Visit our website to learn more about findability solutions that make our customers truly benefit from state-of-the-art search technology.

Quick Website Diagnostics with Search Analytics

I have recently been giving courses directed to web editors on how to successfully apply search technology on a public website. One of the things we stress is how to use search analytics as a source of user feedback. Search analytics is like performing a medical checkup. Just as physicians examine patients in search of symptoms of disease, we want to be able to inspect a website in search of problems hampering the user experience. When such symptoms are discovered, a suitable remedy is prescribed.

Search analytics is a vast field but as usual a few tips and tricks will take you a long way. I will describe three basic analysis steps to get you started. Search usage on public websites can be collected and inspected using an array of analytics toolkits, for example Google Analytics.

How many users are using search?

For starters, have a look at how many of your users are actually using search. Obviously having a large portion of users doing so means that search is becoming very important to your business. A simple conclusion stemming from such evidence is that search simply has to work satisfactorily, otherwise a large portion of your users are getting disappointed.
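
Computing the share of searching visits is straightforward once you have exported visit data from your analytics tool. The records below are invented for illustration:

```python
# Hypothetical visit records: each visit flags whether search was used
visits = [
    {"id": 1, "used_search": True},
    {"id": 2, "used_search": False},
    {"id": 3, "used_search": True},
    {"id": 4, "used_search": False},
    {"id": 5, "used_search": False},
]

searchers = sum(v["used_search"] for v in visits)
share = searchers / len(visits)
print(f"{share:.0%} of visits used search")  # 40% of visits used search
```

Tracked over time, this single percentage already tells you whether search is gaining or losing importance on your site.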

Having many searchers also raises some questions. Are users using search because they want to or because they are forced to, because of tricky site navigation for example? If you feel that the latter seems reasonable you may find that as you improve site navigation your number of searchers will decrease while overall traffic hopefully increases.

Just as with high numbers, low numbers can be ambiguous. Low scores especially coupled with a good amount of overall site traffic may mean that users don’t need search in order to find what they are looking for. On the other hand it may mean that users haven’t found the search box yet, or that the search tool is simply too complicated for the average user.

Aside from the business, knowing how popular search is can be beneficial to you personally. It’s a great feeling to know that you are responsible for one of the most used subsystems of your site. Rub it in the face of your colleague!

From where are searches being initiated?

One of the first recommendations you will get when implementing a search engine for your web site is to include the search box on each and every page, preferably in a standardized easy-to-find place like the top right corner. The point of having the search box available wherever your users happen to be is to enable them to search, typically after they have failed to find what they are looking for through browsing.

Now that we know that searches are being initiated everywhere, we should keep an eye out for pages that frequently emit searches. Knowing which pages those are lets us improve the user experience by altering or completing the information on them.
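
Finding the search-emitting pages is a simple frequency count over the search log. The paths below are invented:

```python
from collections import Counter

# Hypothetical search log: the page each query was issued from
search_origins = [
    "/products", "/support/faq", "/products",
    "/about", "/products", "/support/faq",
]

top_emitters = Counter(search_origins).most_common(2)
print(top_emitters)  # [('/products', 3), ('/support/faq', 2)]
```

In this made-up log, /products would be the first page to review: users landing there apparently cannot find what they need by browsing.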

Which are the most common queries?

The most frequently issued queries to a search system make up a significant share of the total number of served queries. These are known as head queries. By improving the quality of search for head queries you can offer a better search experience to a large number of users.
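
Identifying the head queries, and how much of total traffic they cover, is again a frequency count. The query log below is invented:

```python
from collections import Counter

# Hypothetical query log
queries = ["opening hours", "contact", "opening hours", "price list",
           "opening hours", "contact", "refund", "opening hours"]

counts = Counter(queries)
head = counts.most_common(2)  # the head of the query distribution
head_share = sum(n for _, n in head) / len(queries)
print(head, f"cover {head_share:.0%} of all queries")
```

Here just two queries cover 75% of the log, which is why tuning only the head already pays off for most users.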

A simple but effective way of working with search tuning is this. For each of the 10, 20 or 50 most frequent queries to the system:

  1. Imagine what the user was looking for when typing that query
  2. Perform that query yourself
  3. Examine the 5-10 top results in the result list:
    • Do you think that the user was content with those results?
    • If yes, pat yourself on the back 🙂
    • If not, tweak using synonyms or best bets.

Go through this at least once a month. If the information on your site is static you might not need to change much each time, but if your content or your users’ behavior is changing, you may need to adjust a few things.

Systematic Relevance: Evaluation

Perfect relevance is the holy grail of Search. If possible we would like to give every user the document or piece of information they are looking for. Unfortunately, our chances of doing so are slim. Not even Google, the great librarian of our age, manages to do so. Google is good but not perfect.

Nevertheless, as IT professionals, search experts and information architects we try. We construct complicated document processing pipelines in order to tidy up our data and to extract new metadata. We experiment endlessly with stop words, synonym expansion, best bets and different ways to weigh sources and fields. Are we getting any closer? Well, probably. But how can we know?

There are a myriad of knobs and dials to tune in an enterprise search engine. This fact alone should convince us that we need a systematic approach to relevance; with so many parameters to work with, the risk of breaking relevance seems at least as great as the chance of improving on it. Another reason is that relevance doesn’t age gracefully: even if we do manage to find a configuration that we feel is decent, it will probably need to be reworked in a few months’ time. At Lucene Eurocon, Grant Ingersoll said:

“I urge you to be empirical when working with relevance”

I favor the trial and error approach to most things in life, relevance tuning included. Borrowing concepts from information retrieval, one usually starts off by creating a gold standard. A gold standard is a depiction of the world as it should be: a list of queries, preferably popular or otherwise important, and the documents that should be present in the result list for each of those queries. If the search engine were capable of perfect relevance, its results would match the gold standard with 100% accuracy.

The process of creating such a gold standard is an art in itself. I suggest choosing 50 or so queries. You may already have an idea of which ones are interesting to your system; otherwise search analytics can provide this information for you. Furthermore, you need to decide which documents should be shown for each of the queries. Since users are usually only content if their document is among the top 3 or 5 hits in the result list, you should include up to that many documents for each query in your gold standard. You can select these documents yourself if you like. However, arguably the best way is to sit down with a focus group selected from among your target audience and have them decide which documents to include. Ideally you want a gold standard that is representative of the queries that your users are issuing. Any improvements achieved through tuning should boost the overall relevance of the search engine and not just for the queries we picked out.

The next step is to determine a baseline. The baseline is our starting point, that is, how well the search engine compares out of the box to the gold standard. In most cases this will be significantly below 100%. As we proceed to tune the search engine its accuracy, as compared to the gold standard, should move from the baseline toward 100%. Should we end up with accuracy below that of the baseline then our work has probably had little effect. Either relevance was as good as it gets using the default settings of the search engine, or, more likely, we haven’t been turning the right knobs.
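
The baseline measurement can be sketched as follows. The metric here is a simple "fraction of gold documents found in the top k", and all query and document names are invented:

```python
def accuracy_at_k(results, gold, k=5):
    """Fraction of gold-standard documents found in the top-k results."""
    return len(set(results[:k]) & set(gold)) / len(gold)

# Invented gold standard: the documents that should surface per query
gold_standard = {
    "holiday policy": ["doc_12", "doc_45"],
    "expense report": ["doc_7"],
}

# Invented engine output before any tuning
engine_results = {
    "holiday policy": ["doc_45", "doc_3", "doc_12", "doc_9", "doc_1"],
    "expense report": ["doc_2", "doc_5", "doc_8", "doc_6", "doc_4"],
}

scores = {q: accuracy_at_k(engine_results[q], gold)
          for q, gold in gold_standard.items()}
baseline = sum(scores.values()) / len(scores)
print(scores, baseline)
```

Re-running this small evaluation after every tuning change tells you immediately whether the change moved you above or below the baseline.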

Using a systematic approach like the one above greatly simplifies the process of working with relevance. It allows us to determine which tweaks are helpful and keeps us on track toward our ultimate goal: perfect relevance. A goal that, although unattainable, is well worth striving toward.