Swedish language support (natural language processing) for IBM Content Analytics (ICA)

Findwise has now extended the NLP (natural language processing) in ICA to include both support for Swedish PoS tagging and Swedish sentiment analysis.

IBM Content Analytics with Enterprise Search (ICA) has its strength in natural language processing (NLP) which is achieved in the UIMA pipeline. From a Swedish perspective, one concern with ICA has always been its lack of NLP for Swedish. Previously the Swedish support in ICA consisted only of dictionary-based lemmatization (word: “sprang” -> lemma: “springa”). However, for a number of other languages ICA has also provided part of speech (PoS) tagging and sentiment analysis. One of the benefits of the PoS tagger is its ability to disambiguate words, which belong to multiple classes (e.g. “run” can be both a noun and a verb) as well as assign tags to words, which are not found in the dictionary. Furthermore, the POS tagger is crucial when it comes to improving entity extraction, which is important when a deeper understanding of the indexed text is needed.

Findwise has now extended the NLP in ICA to include both support for Swedish PoS tagging and Swedish sentiment analysis. The two images below shows simple examples of the PoS support.

Example when ICA uses NLP to analyse the string "ICA är en produkt som klarar entitetsextrahering"Example when ICA uses NLP to analyse the string "Watson deltog i jeopardy"

The question is how this extended functionality could be used?

IBM uses ICA and its NLP support together with several of their products. The jeopardy playing computer Watson may be the most famous example, even if it is not a real product. Watson used NLP in its UIMA pipeline when it analyzed its data from sources such as Wikipedia and Imdb.

One product which leverage from ICA and its NLP capabilities is Content and Predictive Analytics for Healthcare. This product helps doctors to determine which action to take for a patient given the patient’s journal and the symptoms. By also leveraging the predictive analytics from SPSS it is possible to suggest the next action for the patient.

ICA can also be connected directly to IBM Cognos or SPSS where ICA is the tool which creates structure to unstructured data. By using the NLP or sentiment analytics in ICA, structured data can be extracted from text documents. This data can then be fed to IBM Cognos, SPSS or non IBM products such as Splunk.

ICA can also be used on its own as a text miner or a search platform, but in many cases ICA delivers its maximum value together with other products. ICA is a product which helps enriching data by creating structure to unstructured data. The processed data can then be used by other products which normally work with structured data.

Enterprise Search and Business Intelligence?

Business Intelligence (BI) and Enterprise Search is a never ending story

A number of years ago Gartner coined “Biggle” – which was an expression for BI meeting Google. Back then a number of BI vendors, among them Cognos and SAS, claimed that they were working with enterprise search strategically (e.g. became Google One-box partners). Search vendors, like FAST, Autonomy and IBM also started to cooperate with companies such as Cognos. “The Adaptive Warehouse” and “BI for the masses” soon became buzzwords that spread in the industry.

The skeptics claimed that enterprise search never would be good at numbers and that BI would never be good with text.

Since then a lot a lot has happened and today the major vendors within Enterprise Search all claim to have BI solutions that can be fully integrated (and the other way around – BI solutions that can integrate with enterprise search).

The aim is the same now as back then:  to provide unified access to both structured (database) and unstructured (content) corporate information. As FAST wrote in a number of ‘Special Focus’:

“Users should have access to a wide variety of data from just one, simple search interface, covering reports, analysis, scorecards, dashboards and other information from the BI side, along with documents, e-mail and other forms of unstructured information.”

And of course, this seems appealing to customers. But does access to all information really make us more likely to take the right decisions in terms of Business Intelligence. Gartner is in doubt.

Nigel Rayner, research vice president at Gartner Inc, says that:

”The problem isn’t that they (users) don’t have access to information or tools; they already have too much information, and that’s just in the structured BI world. Now you want to couple it with unstructured data? That’s a whole load of garbage coming from the outside world”.

But he also states that search can be used as one part of BI:

“Part of the problem with traditional BI is that it’s very focused on structured information. Search can help with getting access to the vast amount of structured information you have”

Looking at the discussions going on in forums, in blogs and in the research domain most people seem to agree with Gartner’s view: enterprise search and business intelligence makes a powerful combination, but the integrations needs to be made with a number of things in mind:

Data quality

As mentioned before, if one wants to make unstructured and structured information available as a complement to BI it needs to be of a good quality. Knowing that the information found is the latest copy and written by someone with knowledge of the area is essential. Bad information quality is a threat to an Enterprise Search solution, to a combined BI- and search solution it can be devastating. Having Content Lifecycles in place (reviewing, deleting, archiving etc) is a fundamental prerequisite.

Data analysis

Business Intelligence in traditionally built on pre-thought ideas of what data the users need, whereas search gives access to all information in an ad-hoc manner. To combine these two requires a structured way of analyzing the data. If the unstructured information is taken out of its context there is a risk that decisions are built on assumptions and not fact.

BI for the masses?

The old buzzwords are still alive, but the question mark remains. If one wants to give everyone access to BI-data it has to be clear what the purpose is. Giving people a context, for example combining the latest sales statistics with searches for information about the ongoing marketing activities serves a purpose and improves findability. Just making numbers available does not.

enterprise search and business intelligence dashboard

Business intelligence and enterprise search in a combined dashboard – vision or reality within a near future?

So, to conclude: Gartner’s vision of “Biggle” is not yet fulfilled. There are a number of interesting opportunities for the business to create findability solutions that combines business intelligence and enterprise search, but the strategies for adopting it needs to be developed in order to create the really interesting cases.

Have you come across any successful enterprise search and business intelligence integrations? What is your vision? Do you think the integration between the two is a likely scenario?

Please let us know by posting your comments.

It’s soon time for us to go on summer vacation.

If you are Swedish, Nicklas Lundblad from Google had an interesting program about search (Sommar i P1) the other day, which is available as a podcast.

Have a nice summer all of you!

Basic Enterprise Search is a Commodity – Let’s Go Further! Conclusions of Trends 2007

Looking back at the search trends that were predicted for 2007 one can conclude that many of the larger research institutes, such as Forrester and Gartner, made a great forecast.

2007 was supposed to raise the question of 2.0 for search technology within the companies (and it seems like wikis, blogs and collaborative tools was all that we heard about for some months). Further on, there was a discussion of integration with business tools, such as BI and search, to create more powerful ways to extract critical data from several sources. The fact that IBM bought Cognos and FAST Radar says something about what we can expect in the future.

During the last months there has been a discussion of more sophisticated ways to develop enterprise search. The leading niche vendors such as Autonomy, FAST and Endeca have for the last few years been evangelists for search that goes beyond spellchecking and synonyms – talking in terms of information retrieval and knowledge management. It seems like a lot of companies have evaluated search capabilities, starting with basic functions, simply to realise that search actually solves a lot of problems they haven’t even considered.

It’s not a coincidence that giants such as IBM and Microsoft have strategies that will bring their enterprise search capabilities to new levels 2008.