Graph Search from Down Under

We’ve already written about the new concept called Graph Search, which is being popularized by Facebook. Wouldn’t it be cool if we applied this to the enterprise as well, as I suggested in an earlier blog post on Enterprise Graph Search? That’s what the Australian startup Lumanetix thinks; they have created the SPAR-K graph search engine for the enterprise.

Applied graph search

As seen in the screenshots, the product runs queries against relational databases with linked data objects, such as Movies linked to People in Casts, or Managers of Departments in an organization. One difference from Facebook’s graph search is the more Google-like, keyword-based query syntax, whereas Facebook uses natural language processing to describe specific queries.

Graph search applied to the enterprise

It’s exciting to see that the market is picking up speed with new innovations in the enterprise search field, of which Lumanetix SPAR-K is one example.


/Christian Ubbesen

Enterprise Graph Search

Facebook will soon launch their new Graph Search to the general public, and it has received a lot of interest lately.

With graph search, users will be able to query the social graph that millions of people have constructed over the years while friending each other and putting more and more personal information about themselves and their friends into the vast Facebook database. It will be possible to query for friends of friends who have similar interests to yours and invite them to a party, or to query for companies where people with similar beliefs to yours work, and so on. The information that is already available will suddenly become much more accessible through the power of graph search.

How can we bring this to an enterprise search environment? Well, there are lots of graphs in the enterprise to query as well, both social and of other types. For example, how about being able to query for people who, in the last three years, have been members of a project that successfully brought a new product to market? This would be an interesting list of people to know about if you’re a marketing director who wants to assemble a team in the company to create a new product and make sure it succeeds in the market.

If we dissect graph search, we will find three important concepts:

  1. The information we want to query against doesn’t just need to be indexed into one central search engine; the relations and attributes of all information objects also need to be normalized, to create the relational graph and provide standard attributes to query against. We could use the Open Graph Protocol as the foundation.
  2. We need a parser that takes human language and converts it into a formal query language that a search engine understands. We might want to query in different human languages as well.
  3. The presentation of results should be adapted to the kind of information sought. In Facebook’s case, if you query for people you get a list of people with their pictures and some relevant personal information in the result list, and if you query for pictures you get a collage of pictures (similar to Google image search).
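To make the first two concepts concrete, here is a minimal sketch of a normalized enterprise graph and a query against it, using the marketing-director example from above. All the data, node types and the helper function are invented for illustration; a real system would build this on a search engine with a proper query parser rather than plain Python structures.

```python
from datetime import date

# Hypothetical normalized graph: every object has a type and standard
# attributes, and every edge carries a relation name (cf. the Open Graph
# Protocol idea of shared object types).
nodes = {
    "p1":   {"type": "person", "name": "Anna"},
    "p2":   {"type": "person", "name": "Björn"},
    "prj1": {"type": "project", "outcome": "product_launched",
             "ended": date(2012, 6, 1)},
    "prj2": {"type": "project", "outcome": "cancelled",
             "ended": date(2011, 3, 1)},
}
edges = [
    ("p1", "member_of", "prj1"),
    ("p2", "member_of", "prj2"),
]

def people_in_successful_projects(since):
    """People who were members of a project that launched a product after `since`."""
    hits = []
    for src, rel, dst in edges:
        target = nodes[dst]
        if (rel == "member_of"
                and target["type"] == "project"
                and target["outcome"] == "product_launched"
                and target["ended"] >= since):
            hits.append(nodes[src]["name"])
    return hits

print(people_in_successful_projects(date(2010, 1, 1)))  # ['Anna']
```

The point of the sketch is that once relations and attributes are normalized, the query itself becomes a simple traversal with filters; the hard work lies in the information modeling, not in the query.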

So the recipe for success is to give the information management part of the project a strong focus, making sure to create a unified information model of the content to be indexed. Then create a query parser for natural language based on actual user behavior; the same user studies would also tell us how to visualize the different result set types.

I believe we will see more solutions of this kind in the enterprise search market in the coming years, and I look forward to exploring the possibilities together with our clients.

Semantic Search Engine – What is the Meaning?

The shortest dictionary definition of semantics is: the study of meaning. A more complete explanation of the term leads to a relationship that maps words, terms and written expressions onto a common understanding of objects and phenomena in the real world. It is worth mentioning that objects, phenomena and the relationships between them are language independent. This means that the same semantic network of concepts can map to multiple languages, which is useful in automatic translation or cross-lingual search.

The approach

In the proposed approach, semantics will be modeled as a defined ontology, making it possible for the web to “understand” and satisfy the requests and intents of people and machines using the web content. The ontology is a model that encapsulates knowledge from a specific domain and consists of a hierarchical structure of classes (a taxonomy) representing concepts of things, phenomena, activities etc. Each concept has a set of attributes that capture the mapping of that particular concept to the words and phrases that represent it in written language (as shown at the top of the figure below). Moreover, the proposed ontology model will have horizontal relationships between concepts, e.g. linguistic relationships (synonymy, homonymy etc.) or domain-specific relationships (medicine, law, military, biological, chemical etc.). Such a defined ontology model will be called a Semantic Map and will be used in the proposed search engine. An example of part of an enriched ontology of beverages is shown in the figure below. The ontology is enriched so that the concepts can be easily identified in text, using attributes such as the representation of the concept in written text.
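As a rough illustration of what such a Semantic Map could look like in code, here is a minimal sketch: each concept has a place in the taxonomy (a parent), surface forms that represent it in text, and room for horizontal relations. The beverage concepts, surface forms and function names are invented for the example, not taken from the actual figure.

```python
# Minimal Semantic Map sketch: taxonomy (parent), textual attributes
# (surface forms) and horizontal relations per concept. Illustrative data.
concepts = {
    "beverage": {"parent": None, "forms": ["beverage", "drink"], "relations": {}},
    "coffee":   {"parent": "beverage", "forms": ["coffee", "espresso", "java"],
                 "relations": {"synonym_of": []}},
    "tea":      {"parent": "beverage", "forms": ["tea"], "relations": {}},
}

# Invert the surface forms so words found in text map back to concepts.
form_to_concept = {form: cid
                   for cid, c in concepts.items()
                   for form in c["forms"]}

def concepts_in(text):
    """Map each word of a text to a concept id, if the Semantic Map knows it."""
    return [form_to_concept[w] for w in text.lower().split()
            if w in form_to_concept]

print(concepts_in("a cup of espresso or tea"))  # ['coffee', 'tea']
```

Note that a surface form like “java” could equally represent a programming language or an island; this is exactly the homonymy that the horizontal relations and the disambiguation step described later have to resolve.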

Semantic Map

The Semantic Map is an ontology that is used for bidirectional mapping of the textual representation of concepts into the space of their meanings and associations. In this manner, it becomes possible to transform user queries into concepts, ideas and intents that can be matched with an indexed set of similar concepts (and their relationships) derived from documents, which are returned in the form of a result set. Moreover, users will be able to refine and describe their intents using visualized facets of the concept taxonomy, concept attributes and horizontal (domain) relationships. The search module will also be able to discover users’ intents based on the history of queries and other relevant factors, e.g. ontological axioms and restrictions. A potentially interesting approach would be to retrieve additional information about the specific user profile from publicly available information in social portals like Facebook, blog sites etc., as well as from the user’s own bookmarks and similar private resources, enabling deeper intent discovery.

Semantic Search Map

Semantic Search Engine

The search engine will be composed of the following components:

  • Connector – This module is responsible for acquiring data from external repositories and passing it to the search engine. The connector also extracts text and relevant metadata from files and external systems and passes it on to the further processing components.
  • Parser – This module is responsible for text processing, including activities like tokenization (breaking text into lexemes – words or phrases), lemmatization (normalization of grammatical forms), exclusion of stop-words, and paragraph and sentence boundary detection. The result of the parsing stage is structured text with additional annotations, which is passed to the semantic Tagger.
  • Tagger – This module is responsible for adding semantic information to each lexeme extracted from the processed text. Technically, this means attaching the identifiers of the relevant concepts stored in the Semantic Map to each lexeme. Moreover, phrases consisting of several words are identified, and disambiguation is performed based on the derived context. Consider the example illustrated in the figure.
  • Indexer – This module is responsible for transforming all the processed information and storing it in the search index. It will be enriched with methods of semantic indexing using the ontology (Semantic Map) and language tools.
  • Search index – The central storage of processed documents (document repository), structured to manage the full text of the documents, their metadata and all relevant semantic information (document index). The structure is optimized for search performance and accuracy.
  • Search – This module is responsible for running queries against the search index and retrieving relevant results. The search algorithms will be enriched to use user intents (while complying with data privacy) and the prepared Semantic Map to match semantic information stored in the search index.
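Leaving out the Connector, the rest of the pipeline can be sketched end to end in a few lines. The semantic map data and all names below are invented; lemmatization, phrase detection and disambiguation are deliberately omitted to keep the sketch small.

```python
import re
from collections import defaultdict

# Illustrative semantic map: surface form -> concept id (assumed data).
SEMANTIC_MAP = {"coffee": "C_COFFEE", "espresso": "C_COFFEE", "tea": "C_TEA"}
STOP_WORDS = {"a", "of", "the", "and", "or", "in"}

index = defaultdict(set)  # concept id -> set of document ids

def parse(text):
    """Parser: tokenize and drop stop-words (lemmatization omitted here)."""
    return [t for t in re.findall(r"\w+", text.lower()) if t not in STOP_WORDS]

def tag(tokens):
    """Tagger: attach concept ids from the semantic map to each lexeme."""
    return [SEMANTIC_MAP[t] for t in tokens if t in SEMANTIC_MAP]

def index_document(doc_id, text):
    """Indexer: store the document under every concept it mentions."""
    for concept in tag(parse(text)):
        index[concept].add(doc_id)

def search(query):
    """Search: map the query to concepts and return matching documents."""
    hits = set()
    for concept in tag(parse(query)):
        hits |= index[concept]
    return hits

index_document("doc1", "A cup of espresso in the morning")
index_document("doc2", "Green tea ceremony")
print(search("coffee"))  # {'doc1'}
```

The interesting effect, even in this toy version, is that the query “coffee” finds a document that only mentions “espresso”, because both were mapped to the same concept at indexing time – this is the semantic matching that plain full-text search misses.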

What do you think? Please let us know by writing a comment.

How to Create Knowledge Sharing Intranets and the Role of Search

“If only HP knew what HP knows, we would be three times more productive”

The quote is a statement by the former chief executive of Hewlett-Packard, Lew Platt, and summarizes this week’s discussion on knowledge sharing intranets at the conference “Sociala intranät” (Social Intranets) in Stockholm.

For two days, intranet managers, editors, web strategists and communication managers gathered in Stockholm to talk about the benefits (and pitfalls) of knowledge sharing intranets, where the end-users share and contribute their own and their colleagues’ information – and about what role search plays in a social intranet.

A number of larger companies and organizations, such as TeliaSonera, Thomas Cook, Manpower and Perstorp, have started their second generation of intranets, where blogs, collaborative areas, wikis, personalization, micro blogging (see the twitter flow from the conference) and Facebook-inspired solutions finally seem to work on a larger scale.

The pioneers, such as Fredrik Heidenholm from Skånemejerier, have been doing it without a large budget – proving that social intranets are more about users than expensive technical solutions.

Read interviews with Fredrik Heidenholm, Gunilla Rehnberg (Röda Korset), Hans Gustafsson (Boverket) and Lisa Thorngren (Thomas Cook Northern Europe – Ving).

In general, the speakers as well as the attendees seemed to agree with one another: having the whole organization contribute its knowledge is a prerequisite for keeping knowledge sharing intranets alive.

But letting everyone create information requires a good enterprise search solution, something some of Findwise’s customers, such as Ericsson and Landstinget i Jönköping, talked about: “Search promotes the value of our social intranet,” said Karin Hamberg, Enterprise Architect at Ericsson. Search makes it possible to gather information from all kinds of sources and make it accessible from one entrance. However, this also requires strategies for handling security restrictions (who should have access to what?), metadata models, user experience (expectations and behavior) and ranking (who determines which results should appear at the very top?).

Sven-Åke Svensson, from Landstinget i Jönköping, had the same experience and emphasized the need for a good prestudy (workshop method) and tools for the editors, such as a metadata service to help contributors write good metadata tags. Sven-Åke also gave a demo of the new intranet (if you read Swedish, the blog post “Landsting på väg mot det social intranätet” gives a great overview of the solution).

The two days covered most angles of Lew Platt’s vision – and apart from a number of good speakers, the informal talk at coffee breaks and lunch gave good insight into the fact that Swedish companies are working hard to provide knowledge sharing intranets that serve consumers as well as contributors.

Did you visit the conference? Was there anything in particular you found interesting? Please feel free to comment and share your thoughts.

P.S. If you want to read more about social intranets, take a look at Oscar Berg’s blog post “The business case for social intranets” – an inspiring summary of the topic.

Real Time Search in the Enterprise

Real time search is a big buzz on the global network called the Internet. Major search engines like Google and Bing now provide users with real time search results from Facebook, Twitter, blogs and other social media sites. Real time search means that as soon as content is created or updated, it is immediately searchable. This might seem obvious, like a basic requirement, but if you work with search you know that most of the time this is not the case. Looking inside the firewall, in the enterprise, I dare say that real time search is far from common. Sometimes content does not change very frequently, so it is not necessary to make it instantly searchable. In many cases, though, it is the technical architecture that limits a real time search implementation.

The most common way of indexing content is by using a web crawler or a connector. Either way, you schedule them to go out and fetch new, updated or deleted content at specific intervals during the day. This is the basic architecture for search platforms these days. The advantage of this approach is that the content systems do not need to adapt to the search platform; they just deliver content through their ordinary APIs during indexing. The drawback is that new or updated content is not available until the next scheduled indexing run. Depending on the system, this might take several hours. For several reasons, mostly performance, you do not want to schedule connectors or web crawlers to fetch content too often. Instead, to provide real time search you have to go the other way around: let the content system push content to the search platform.

Most systems have some sort of event system that triggers an event when content is created, updated or deleted. By listening for these events, the system can send the content to the search platform at the same time it is stored in the content system. The search platform can immediately index the pushed content and make it searchable. This requires adapting the content system to the search platform, but in this case I think the advantages outweigh the disadvantages. Modern content systems provide (or should provide) a plug-in architecture, so you should fairly easily be able to plug in this kind of code. Such plug-ins could also be provided by the search platform vendors, just as ordinary connectors are provided today.
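The push approach can be sketched as a simple event listener. The class and method names below are invented for illustration; in practice the listener would be a plug-in inside the content system calling the search platform’s indexing API.

```python
# Sketch of push-based indexing: the content system fires an event on every
# save, and a listener forwards the document to the search platform
# immediately, instead of waiting for a scheduled crawl. Illustrative names.
class SearchPlatform:
    def __init__(self):
        self.index = {}

    def push(self, doc_id, content):
        # Indexed the moment it arrives, so it is instantly searchable.
        self.index[doc_id] = content

class ContentSystem:
    def __init__(self):
        self.listeners = []
        self.store = {}

    def on_change(self, listener):
        self.listeners.append(listener)

    def save(self, doc_id, content):
        self.store[doc_id] = content
        for notify in self.listeners:  # fire the change event
            notify(doc_id, content)

cms = ContentSystem()
search_platform = SearchPlatform()
cms.on_change(search_platform.push)  # the plug-in wiring the two systems

cms.save("news-1", "Quarterly report published")
print("news-1" in search_platform.index)  # True
```

The same event hook would also handle updates and deletes; the design choice is simply to move the integration point from a polling schedule to the content system’s own change events.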

Do you agree, or have I been living in a cave for the past years? I’d love to hear your comments on this subject!

Internet life in the Future

I always think it’s nice when I hear people talking about the same things that are on my mind these days. It makes me reflect upon things in new ways and also makes me realize that I’m on to something. I attended a presentation by Björn Jeffery from Good Old (hosted by Region Västra Götaland). His talk on internet strategy was interesting and had many things in common with the keynote by Elizabeth Churchill (Yahoo) that I recently heard at the HCI2007 conference. Two things interested me most: the future of mobility and the inevitable question of integrity. So here are my thoughts today, on internet strategy and the future of internet usage.

Integrity

Today, young people have become used to using different web 2.0 technologies such as Flickr, Facebook, Delicious etc., and we have seen the emergence of things such as social search and folksonomies. People gladly contribute information about themselves and what they think and like. I believe this is a good thing, but there are also risks. Once something is on the internet and has been indexed, it is out there, and it stays there. Many people are not aware of that fact. How do you keep your integrity when everything about you can be found online? Integrity is very important when implementing these solutions in an enterprise setting.

How can people contribute without having to share their stuff with everyone else if they don’t want to? Björn Jeffery mentioned that we have gone from sharing nothing with no one to sharing everything with everyone, and that he thought this would change back to us sharing a lot of things with many people. I hope he’s right. Teenagers might not care who they share their stuff with, but security and integrity are vital issues when considering enterprise solutions.

Mobility

These days, mobility has become important. We not only expect to be able to find the information we need, but to find it whenever we want, from wherever we want. I am actually writing this blog post on a train, and of course I expect to have access to all Findwise and other resources from here as well. As technology changes, our behavior and expectations change with it, and so does society. (I covered excitement generators in a previous post about Jared Spool’s keynote at HCI2007.)

“I don’t use computers, love. This is just the internet”.

a quote from Elizabeth Churchill’s keynote

Today there is no longer an automatic association between the internet and the computer screen. Mobile phones have become an increasingly popular way of accessing the internet. So, you can use search to access all your company’s information from a single point of access whenever you need it. Then maybe the next step is mobile search on your intranet? That would make information available not only at all times, but from wherever you might be, exactly when you want it.

So, in conclusion of these talks: I think that in the future we will want to be able to access everything from everywhere at any time. We used to talk about the time we spent online; that distinction isn’t really there any more. Today our tasks are interwoven – we don’t separate the time we spend online from the time we spend offline. (Something that becomes painfully obvious when trying to work on the train when you’ve forgotten the USB connection for the mobile internet.) And in the time we spend online, we also need to define what things we want to share with whom. If we as designers can solve these things, I think we’re on to something promising.

Find People with Spock

Today, Google is the main source for finding information on the web, regardless of the kind of information you’re looking for. Be it company information, diseases or people – Google is used to find everything. While Google does a great job of finding relevant information, it can be good to explore alternatives that concentrate on a more specific target.

In the previous post, Karl blogged about alternatives to Google that provide a different user interface. Earlier, Caroline enlightened us about search engines that lead to new ways of using search. Today I am going to continue on these tracks and tell you a bit about a new challenger, Spock, and my first impressions of using it.

Spock, released last week in beta, is a search engine for finding people. Interest in finding people, both celebrities and ordinary people, has risen over the past years; just look at the popularity of social networking sites such as LinkedIn and Facebook. By using a search engine dedicated to finding people, you get more relevant hits and more information in each hit. Spock crawls the above-mentioned sites, as well as a bunch of others, to gather the information about the people you want to find.

When you begin to use Spock, you instantly see the difference in search results compared to Google. Searching for “Java developer Seattle” in Spock returns a long list of Java developers based in Seattle; with Google, you get a bunch of job listings. Searching for a famous person like Steve Jobs with Google, you find yourself with thousands of pages about the CEO of Apple. Using Spock, you will learn that there are a lot of other people around the world also named Steve Jobs. With each hit, you find more information such as pictures, related people, links to pages the person is mentioned on, etc.

In true Web 2.0 fashion, Spock uses tags to place people into categories. By exploring these tags, you will find even more people that might be of interest. Users can even register on Spock to add and edit tags and information about people.

Overall, Spock seems like a great search engine to me. The fact that users can contribute to the content – a fact that has made Wikipedia what it is today – combined with good relevancy and a clean interface gives it a promising future. It also shows how it is possible to compete with Google and the other giants in the search market by focusing on a specific target and delivering an excellent search experience in that particular area.