Open or Opaque Artificial Intelligence

Data is the black gold of the information era, with a value chain and ecology similar to petroleum's. Like crude oil, raw data must be refined before it yields meaning and usefulness in any domain.

AI and its constituent fields (machine learning, natural language processing, deep learning, etc.) are set to be a societal game changer across every domain of collective human imagination.

The opaque paradigm

The ambition should be to design for a sustainable AI future, aiming to incorporate the UN's 17 Sustainable Development Goals with ethics at the core. One omnipresent hurdle remains the black box, or opaque, setting: being able to understand how, why and where different AI systems operate and exert influence.

The open paradigm

Since every known AI utility follows a simple model:

input → model → output, with feedback (learning),

there is a need to shift control from the computer back towards the human, and thereby enable the addition of meaning and semantics along with conceptual models.
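A minimal sketch of this loop in code, using a toy one-parameter model (all names here are illustrative, not from any particular library):

```python
# input -> model -> output, with feedback (learning): the generic loop.

def train(examples, lr=0.01, epochs=100):
    """Fit a one-parameter model y = w * x from error feedback."""
    w = 0.0
    for _ in range(epochs):
        for x, target in examples:
            output = w * x            # input -> model -> output
            error = output - target   # feedback signal
            w -= lr * error * x       # learning: refine the model
    return w

data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # generated by y = 2x
print(train(data))                           # approaches 2.0
```

Keeping the human in this loop means deciding what the examples mean and whether the feedback signal actually encodes the intended semantics.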

By using open innovation, open standards, open models (knowledge graphs, ontologies, terminologies, code systems and the like), open software and open platforms (technology stacks such as SingularityNET) in the design of future AI utilities and cognitive computing, there are opportunities to leverage learning in a meaningful way, away from the opaque regime and towards cognition-informed artificial intelligence. Interoperability enables efficient communication that can accommodate data from semantic domains that have traditionally been kept separate. Open domain knowledge and open data-sets (as linked data) provide a strong platform for continuously improving the data within the AI loop, both by refining it and addressing its context, and by enabling better precision and outcomes.
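As a hedged sketch of what reusing such open, linked data can look like in practice (assuming Python with the rdflib library, and assuming the DBpedia endpoint serves RDF via content negotiation; the resource chosen is just a familiar example):

```python
# Pull one resource's triples from an open linked-data set and query it.
from rdflib import Graph

g = Graph()
g.parse("http://dbpedia.org/resource/Gothenburg")  # fetches RDF triples

# SPARQL over the fetched triples: the resource's multilingual labels.
results = g.query("""
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    SELECT ?label WHERE {
        <http://dbpedia.org/resource/Gothenburg> rdfs:label ?label .
    } LIMIT 5
""")
for row in results:
    print(row.label)
```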

Informative communication means that a word's meaning should allow accurate mental reconstruction of the sender's intended meaning, but we are well aware of the human messiness (complexity) within language, as described by the information bottleneck (Tishby) and rate-distortion theory (Shannon).
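For reference, the standard formulations behind those two citations (stated here for orientation, not derived in this text): the information bottleneck compresses a signal X into a representation T while preserving information about the intended meaning Y, and rate-distortion gives the minimum rate achievable once a distortion level D is tolerated.

```latex
% Information bottleneck (Tishby): trade compression of X against
% retained information about the meaning Y.
\min_{p(t \mid x)} \; I(X;T) - \beta \, I(T;Y)

% Rate-distortion (Shannon): minimum bits per symbol at distortion D.
R(D) = \min_{p(\hat{x} \mid x)\,:\, \mathbb{E}[d(X,\hat{X})] \le D} I(X;\hat{X})
```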

To take on the challenges and opportunities within AI, there are strong undercurrents to build interdisciplinary capacities, as with Chalmers AI Research, AI Innovation of Sweden and the like, where computer science, cognitive science, data science, information science, the social sciences and other disciplines meet and swap ideas to improve value creation within different domains, while at the same time beginning to blend industry, the public sector, academia and society together.

The societal challenges that lie ahead open up room for innovation, where AI-assisted utilities will augment and automate for the benefit of humankind and the earth. Doing so requires a balancing act in which the open paradigm is favoured. AI is designed, an artefact, hence we need to address ethics in its design with ART (Accountability, Responsibility and Transparency), as in the EU draft guidelines on AI ethics.

Tinkering with AI

The emerging development of AI follows a different pathway than traditional software engineering. All emerging machine learning, NLP and/or deep-learning machinery relies on a tinkering approach of trial and error (re-model, refine the data-set, test-bed with different outcomes and behaviours) before it can reach the maturity required for the industrial stages of digital infrastructure, as with Google Cloud or similar services. A great example is image recognition and computer vision, with its data-optimisation algorithms and processing steps, where each advance has emerged from previous learnings and tinkering. Sometimes the development and use of mathematical models alone simply does not measure up to real AI needs and utilities.

Here, in the value creation, the why in the first place, we should design and use ML, NLP and deep learning with an expected outcome in mind. AI is not, and never will be, the silver bullet for all problem domains in computing! What is needed, in essence, is to start making sense, with contextual use-cases and utilities, long before we reach artificial general intelligence.

On the 25th of April, an event will cover Sustainable Knowledge Graphs and AI together with the Linked Data Sweden network.

Semantic Search Engine – What is the Meaning?

The shortest dictionary definition of semantics is: the study of meaning. A more complex explanation of the term leads to a relationship that maps words, terms and written expressions onto a common-sense understanding of objects and phenomena in the real world. It is worth mentioning that objects, phenomena and the relationships between them are language-independent: the same semantic network of concepts can map to multiple languages, which is useful in automatic translation or cross-lingual search.
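To make that language independence concrete, here is a hedged sketch (illustrative identifiers, not from the article) of one concept node mapped to surface forms in several languages, so the same node can serve automatic translation or cross-lingual search:

```python
# One language-independent concept with per-language surface forms.
CONCEPT_COFFEE = {
    "id": "concept:beverage/coffee",
    "is_a": "concept:beverage",
    "labels": {
        "en": ["coffee"],
        "sv": ["kaffe"],
        "pl": ["kawa"],
    },
}

def cross_lingual_lookup(word, concepts):
    """Return ids of all concepts expressed by `word` in any language."""
    return [
        c["id"]
        for c in concepts
        if any(word in forms for forms in c["labels"].values())
    ]

print(cross_lingual_lookup("kawa", [CONCEPT_COFFEE]))
# -> ['concept:beverage/coffee']  (Polish "kawa" resolves to the same node)
```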

The approach

In the proposed approach, semantics will be modeled as a defined ontology, making it possible for the web to “understand” and satisfy the requests and intents of people and machines using web content. The ontology is a model that encapsulates knowledge from a specific domain and consists of a hierarchical structure of classes (a taxonomy) representing concepts of things, phenomena, activities and so on. Each concept has a set of attributes that map that particular concept to the words and phrases representing it in written language (as shown at the top of the figure below). Moreover, the proposed ontology model will have horizontal relationships between concepts, e.g. linguistic relationships (synonymy, homonymy etc.) or domain-specific relationships (medicine, law, military, biological, chemical etc.). Such a defined ontology model will be called a Semantic Map and will be used in the proposed search engine. An example fragment of an enriched ontology of beverages is shown in the figure below. The ontology is enriched so that concepts can be easily identified in text, using attributes such as the concept's representation in written text.
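A minimal data-structure sketch of such an enriched ontology node (the class and field names are assumptions for illustration; a production Semantic Map would more likely live in OWL/RDF):

```python
# One ontology concept: taxonomy edge, surface-form attributes, and
# horizontal (linguistic or domain) relationships to other concepts.
from dataclasses import dataclass, field

@dataclass
class Concept:
    name: str                                                # e.g. "Coffee"
    parent: str | None = None                                # is-a (taxonomy)
    surface_forms: list[str] = field(default_factory=list)   # written forms
    related: dict[str, str] = field(default_factory=dict)    # horizontal links

beverage = Concept("Beverage")
coffee = Concept(
    "Coffee",
    parent="Beverage",
    surface_forms=["coffee", "espresso"],
    related={"synonym": "Java", "domain:botany": "Coffea"},
)
```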

Semantic Map

The Semantic Map is an ontology used for the bidirectional mapping of textual representations of concepts into the space of their meanings and associations. In this manner, it becomes possible to transform user queries into concepts, ideas and intent that can be matched against an indexed set of similar concepts (and their relationships) derived from documents, which are returned as the result set. Moreover, users will be able to specify and describe their intents using visualised facets of the concept taxonomy, concept attributes and horizontal (domain) relationships. The search module will also be able to discover users' intents based on their query history and other relevant factors, e.g. ontological axioms and restrictions. A potentially interesting approach would retrieve additional information about the specific user profile from publicly available information in social portals like Facebook, blog sites etc., as well as from the user's own bookmarks and similar private resources, enabling deeper intent discovery.
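The bidirectional character of the mapping can be sketched as two inverse lookups over the same table (a toy illustration; the names are not from the article):

```python
# Surface form -> concept (for interpreting queries) and
# concept -> surface forms (for matching indexed documents).
SEMANTIC_MAP = {
    "coffee": "concept:Coffee",
    "espresso": "concept:Coffee",
    "tea": "concept:Tea",
}

def query_to_concepts(query):
    """Forward direction: resolve the user's words to concepts."""
    return {SEMANTIC_MAP[w] for w in query.lower().split() if w in SEMANTIC_MAP}

def concept_to_forms(concept):
    """Reverse direction: all textual forms that express a concept."""
    return [w for w, c in SEMANTIC_MAP.items() if c == concept]

print(query_to_concepts("strong espresso"))  # {'concept:Coffee'}
print(concept_to_forms("concept:Coffee"))    # ['coffee', 'espresso']
```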

Semantic Search Map

Semantic Search Engine

The search engine will be composed of the following components; a minimal end-to-end sketch in code follows the list:

  • Connector – This module will be responsible for acquiring data from external repositories and passing it to the search engine. The connector also extracts text and relevant metadata from files and external systems and passes them on to the further processing components.
  • Parser – This module will be responsible for text processing, including tokenization (breaking text into lexemes – words or phrases), lemmatization (normalisation of grammatical forms), exclusion of stop-words, and paragraph and sentence boundary detection. The result of the parsing stage is structured text with additional annotations, which is passed to the semantic Tagger.
  • Tagger – This module is responsible for adding semantic information to each lexeme extracted from the processed text. Technically, it attaches identifiers of the relevant concepts stored in the Semantic Map to each lexeme. Moreover, phrases consisting of several words are identified, and disambiguation is performed based on the derived context. Consider the example illustrated in the figure.
  • Indexer – This module is responsible for taking all the processed information, transforming it and storing it in the search index. It will be enriched with methods of semantic indexing using the ontology (Semantic Map) and language tools.
  • Search index – The central storage of processed documents (the document repository), structured to manage the full text of the documents, their metadata and all relevant semantic information (the document index). The structure is optimised for search performance and accuracy.
  • Search – This module is responsible for running queries against the search index and retrieving relevant results. The search algorithms will be enriched to use user intents (in compliance with data privacy) and the prepared Semantic Map to match the semantic information stored in the search index.
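The chain of components can be sketched end to end as follows (every function body is a deliberately naive stand-in; only the data flow between the modules is the point):

```python
# Connector -> Parser -> Tagger -> Indexer -> Search index -> Search.

def connector(source):
    """Acquire raw text plus metadata from an external repository."""
    return {"text": source, "metadata": {}}

def parser(doc):
    """Tokenize, normalise and drop stop-words (toy versions)."""
    stop = {"the", "a", "of"}
    doc["lexemes"] = [w.lower() for w in doc["text"].split()
                      if w.lower() not in stop]
    return doc

def tagger(doc, semantic_map):
    """Attach Semantic Map concept ids to each lexeme (None if unknown)."""
    doc["concepts"] = [semantic_map.get(w) for w in doc["lexemes"]]
    return doc

def indexer(doc, index):
    """Store the document in the search index under its concepts."""
    for concept in filter(None, doc["concepts"]):
        index.setdefault(concept, []).append(doc["text"])
    return index

def search(query, index, semantic_map):
    """Resolve the query to concepts, then retrieve matching documents."""
    hits = []
    for word in query.lower().split():
        hits.extend(index.get(semantic_map.get(word), []))
    return hits

semantic_map = {"coffee": "concept:Coffee", "espresso": "concept:Coffee"}
doc = tagger(parser(connector("The best espresso of the town")), semantic_map)
index = indexer(doc, {})
print(search("coffee", index, semantic_map))
# -> ['The best espresso of the town']  (matched via the shared concept)
```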

What do you think? Please let us know by writing a comment.