Understanding politics with Watson using Text Analytics

Understanding which topics actually matter to different political parties is a difficult task. Can text analytics combined with a search index give us a better understanding?

This blog post describes how IBM Watson Explorer Content Analytics (WCA) can be used to make sense of Swedish politics. All speeches (in Swedish: anföranden) in the Swedish Parliament from 2004 to 2015 were analyzed using WCA, 139 110 transcribed text documents in total. The Swedish language support built by Findwise for WCA is used together with a few text analytics processing steps that parse out person names, political parties, dates and topics of interest. The topics selected for this analysis are all related to infrastructure and different types of fuel.

We start by looking at how some of the topics are mentioned over time.

Analysis of terms of interest in the Swedish parliament between 2004 and 2014.

The view shows topics that were mentioned more often during a given year than would be expected. Among other things, we can see that the topic flygplats (airport) shows a sharp increase in mentions during 2014.
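
To make the idea of "more mentions than expected" concrete, here is a minimal sketch in Python. It is not how WCA computes its view; the function name trending_topics, the threshold value and the definition of "expected" (a topic's total mentions spread across years in proportion to how much was said each year) are assumptions made for illustration.

```python
from collections import Counter

def trending_topics(mentions, year, threshold=1.5):
    """Return topics whose mentions in `year` exceed the expected count.

    mentions: iterable of (year, topic) pairs, one pair per mention.
    The expected count is an illustrative assumption, not WCA's formula.
    """
    per_topic = Counter()
    per_year = Counter()
    per_topic_year = Counter()
    for y, topic in mentions:
        per_topic[topic] += 1
        per_year[y] += 1
        per_topic_year[(topic, y)] += 1

    total = sum(per_year.values())
    result = {}
    for topic, topic_total in per_topic.items():
        # Expected mentions if the topic followed the overall yearly volume.
        expected = topic_total * per_year[year] / total
        observed = per_topic_year[(topic, year)]
        if expected > 0 and observed / expected >= threshold:
            result[topic] = observed / expected
    return result
```

Calling trending_topics(mentions, 2014) on a list of (year, topic) pairs returns the topics that stand out in 2014 together with their observed-to-expected ratio.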

So let’s dive down and see what is being said about the topic flygplats during 2014.

Swedish political parties mentioning Bromma Airport during 2014.

The above image shows how often the different political parties mentioned the topic flygplats during 2014. The blue bar shows the number of times the topic flygplats was mentioned by each political party during the year. The green bar shows the WCA correlation value, which indicates how strongly related a term is to the current filter. We can conclude that Moderaterna mentioned flygplats more frequently during 2014 than the other parties did.

Reviewing the most correlated nouns when filtering on flygplats and the year 2014 shows, among others, Bromma (a place in Sweden), airport and nedläggning (closing). This gives some idea of what was discussed during the period. Filtering on the speeches held by Moderaterna and reading some of them makes it clear that Moderaterna is against closing Bromma airport.

The text analytics and the index provided by WCA help us discover trending topics over time and give us a tool for understanding who talked about a subject and what was said.

The different infrastructure topics can be combined into a single, wider topic for infrastructure. Speeches that mention tåg (train), bredband (broadband) or any other defined infrastructure term are also tagged with the topic infrastructure. This wider concept of infrastructure can of course also be viewed over time.
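
The roll-up from individual terms to the wider topic can be sketched in a few lines of Python. This is only an illustration: the term set below contains just the terms named in this post (the full list is not given), the names INFRASTRUCTURE_TERMS and tag_topics are made up, and the exact word matching used here will miss inflected Swedish forms that proper language support would handle.

```python
import re

# Terms taken from this post; the complete term list is an unknown here.
INFRASTRUCTURE_TERMS = {"tåg", "bredband", "flygplats", "mobilnät"}

def tag_topics(text, terms=INFRASTRUCTURE_TERMS, parent="infrastructure"):
    """Return the defined terms found in `text`, plus the parent topic
    whenever at least one of them is present."""
    words = set(re.findall(r"\w+", text.lower()))
    tags = words & terms
    if tags:
        # Any single term is enough to tag the speech with the wider topic.
        tags = tags | {parent}
    return tags
```

A speech mentioning both bredband and tåg gets three tags: the two terms and infrastructure. A speech mentioning none of the terms gets an empty set.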

Discussions in the Swedish parliament mentioning the defined terms that make up the subject infrastructure, 2004 to 2015.

Another way of finding which parties are most correlated with a subject is to compare pairs of facets. The following table shows parties highly related to terms regarding infrastructure and types of fuel.

Swedish political parties highly correlated to subjects regarding infrastructure and types of fuel.

Let’s start by explaining the first row in order to understand the table. Mobilnät (mobile network) has only been mentioned 44 times by Centerpartiet, but Centerpartiet is still highly related to the term with a WCA correlation value of 3.7. This means that a higher share of Centerpartiet’s speeches mention mobilnät compared to other parties. The table indicates that two parties, Centerpartiet and Miljöpartiet, are more engaged in infrastructure topics than other political parties.
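
The intuition that a small absolute count can still mean a strong relation is easy to show with a simple ratio of shares. The sketch below is not WCA's correlation formula, which is not described in this post; share_ratio is a hypothetical helper and any numbers passed to it in examples are invented.

```python
def share_ratio(party_hits, party_total, other_hits, other_total):
    """Compare the share of one party's speeches that mention a term
    with the share among all other parties' speeches.

    A value above 1 means the party mentions the term in a larger share
    of its speeches than the others do. Illustrative only.
    """
    if party_total <= 0 or other_total <= 0:
        raise ValueError("speech totals must be positive")
    party_share = party_hits / party_total
    other_share = other_hits / other_total
    if other_share == 0:
        # Nobody else mentions the term at all.
        return float("inf") if party_hits else 0.0
    return party_share / other_share
```

With invented numbers, a party mentioning a term in 10 of its 200 speeches, while the other parties mention it in 30 of 1800, gets a ratio of about 3 despite the low count.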

Swedish parties mentioning the defined concept of infrastructure.

Filtering on the concept infrastructure also shows that Miljöpartiet and Centerpartiet are the two parties with the highest share of speeches mentioning the defined infrastructure topics.

Interested in digging deeper into the data? Parsing written text with text analytics is a successful approach for increasing the understanding of subjects such as politics, and IBM Watson Explorer Content Analytics makes it easy. Most of the functionality used in this example is available out of the box in WCA.

Analytics and Big Data at IBM Information On Demand 2011

The big trend these days is Big Data: how you can analyze large amounts of information to gain important insights and, from those insights, take the right action. This trend was a hot topic at the IBM Information On Demand (IOD) conference in Las Vegas earlier this year. IBM has a very strong position in this field. It’s hard to have missed how their computer system Watson recently challenged the top players of all time in Jeopardy, and won!

Now IBM has taken the technology behind Watson and started to apply it in their different analytics products, where one specific area that is being targeted is healthcare. For this area IBM released a new product during IOD called IBM Content and Predictive Analytics for Healthcare, which can for example be used as a tool for physicians to support them in their diagnosis of patients.

In April this year IBM merged two of their products: their search engine OmniFind and Content Analytics, their product for analyzing large amounts of unstructured information. The new product is called IBM Content Analytics with Enterprise Search, and it too is based on much of the same technology that is used in Watson. More specifically, it uses the same Natural Language Processing techniques. This means that it has the ability to understand text on a level just as sophisticated as that of Watson.

Content Analytics with Enterprise Search scales very well to many millions of documents. However, when there is a need to analyze really enormous data sets, on the order of petabytes or even exabytes, IBM has developed what they call their BigData platform. This platform mainly revolves around two products, InfoSphere Streams and InfoSphere BigInsights, and it builds on a foundation of open source software, such as Apache Hadoop and Apache Lucene. InfoSphere Streams is used for real time analysis of information in motion. This helps you understand what’s happening right at this moment in your organization and supports you in taking appropriate action as things are happening. InfoSphere BigInsights on the other hand lets you analyze and draw insight from massive amounts of already existing data.

Studies have shown how organizations that fall short in this area are overtaken by those who understand how to use the power of analytics.

IBM has surely chosen an interesting path when merging Analytics with Findability.

OmniFind Enterprise Edition 9.1 – New Capabilities Discussed Over Breakfast

During the last year a number of interesting things have happened to IBM’s search platform, and the new version, OmniFind 9.1, was released this summer. Apart from a large number of improvements in the interface, the move to basing the new solution on open source (Lucene) has proven to be an ingenious way around some of OmniFind’s previous shortcomings.

The licensing model is still quite complicated, something Stephen E Arnold highlighted earlier this year. Since a number of our customers have chosen to take a closer look at OmniFind as a search solution, we decided to host a breakfast seminar together with IBM last Thursday, in order to discuss the new features and show how some of our customers are working with it.

Without a doubt, the most interesting part is always to discuss how the solution can be utilized for intranets, extranets, external sites and e-business purposes.

Apart from this, we also took a look at some of the new features:
Type ahead (query suggestion), based on either search statistics or indexed content

Type ahead

Faceted search, i.e. the ability to filter on dates, locations, format etc., as well as on numeric and date ranges. The latter is of course widely used within e-business.

Facets for e-business

Thumbnail views of documents (yes, exactly what it sounds like: a thumbnail of the first page of each document on the results page)

Thumbnail of a document

Search analytics in OmniFind 9.1 offers a number of interesting statistics. Some worth mentioning are number of queries, query popularity, number of users, average response time (ms) and worst response time (ms).

Saved searches (to be able to go back and see if new information has been included), search within result sets (to further narrow your results within a given result set) and did-you-mean functionality (spell checking) are also included.

...and improvements on the administrator side, to mention just a few:
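
For readers who want a feel for what these statistics amount to, here is a small Python sketch that derives them from a query log. It says nothing about how OmniFind stores or computes them; the log format of (user, query, response time in ms) tuples and the function name summarize_query_log are assumptions.

```python
from collections import Counter
from statistics import mean

def summarize_query_log(entries):
    """Summarize a query log given as (user, query, response_ms) tuples.

    The log format is a made-up example, not OmniFind's own.
    """
    # Normalize queries so "Bromma" and "bromma " count as the same query.
    queries = [query.strip().lower() for _, query, _ in entries]
    times = [ms for _, _, ms in entries]
    return {
        "query_count": len(entries),
        "popular_queries": Counter(queries).most_common(3),
        "user_count": len({user for user, _, _ in entries}),
        "avg_response_ms": mean(times) if times else 0,
        "worst_response_ms": max(times, default=0),
    }
```

The result is a plain dictionary with one entry per statistic, which is enough to spot popular queries and slow responses in a small log.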

  • Ability to change the relevancy i.e. to adjust and give certain types of information higher ranking
  • Support for incremental indexing i.e. to only re-index the information that is new or changed since the last time you made it searchable
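
The idea behind incremental indexing can be sketched with content hashes. This is one common approach and not a description of OmniFind's internals; documents_to_reindex and the in-memory dictionary of hashes are hypothetical stand-ins for whatever a real crawler would persist between runs.

```python
import hashlib

def documents_to_reindex(documents, seen_hashes):
    """Return the ids of documents that are new or changed since last run.

    documents:   dict mapping document id to its current text.
    seen_hashes: dict mapping document id to the hash recorded on the
                 previous run; it is updated in place.
    """
    changed = []
    for doc_id, text in documents.items():
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if seen_hashes.get(doc_id) != digest:
            # New document or modified content: schedule it for indexing.
            changed.append(doc_id)
            seen_hashes[doc_id] = digest
    return changed
```

On the first run every document is returned; on later runs only new or modified documents are, so unchanged content is never processed twice.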

To conclude: IBM is making a whole lot of improvements in the new version, and they are worth taking a closer look at. During the spring we are running upgrade projects for some of our customers, and we will keep you up to date on the different application areas OmniFind Enterprise Edition 9.1 is being used for. Please let us know if you have any particular questions or areas that you are interested in.