Presentation: Enterprise Search – Simple, Complex and Powerful

Every second, more and more information is created and stored in various applications: corporate websites, intranets, SharePoint sites, document management systems, social platforms and many more. Inside the firewall, the growth of information is similar to that of the internet. However, even though major players on the web have shown that navigation can’t compete with search, the Enterprise Search and Findability Report shows that most organisations have only a small or even non-existent budget for search.

Web Search and Enterprise Search

Web search engines like Google have made search look easy. For enterprise search, some vendors promise a magic box: buy a search engine, plug it in and wait for the magic to happen! Imagine the disappointment when both search results and performance are poor and users can’t find what they are looking for…

When you start planning your enterprise search project you soon realize the complexity and challenge – how do you meet the expectations created by Google?

The Presentation

This presentation was originally given by Mattias Brunnert at the joint NSW KM Forum and IIM September event in Sydney, Australia. It covers topics such as:

  • Why search is important and how to measure success
  • Why Enterprise Search and Information Management should be friends
  • How to kick off your search program

The Enterprise Search and Findability Report 2012 is ready

No strategy, no budget, no resources. This is the common scenario for enterprise search and findability in many organisations today. Still, enterprise search is considered a critical success factor in 75% of the organisations that responded to the global survey that ran from March to May this year.

The Enterprise Search and Findability Report 2012 is now ready for download.

The Enterprise Search and Findability Report 2012 shows that 60% of the respondents find it very or moderately hard to find the right information. Only 11% stated that it is fairly easy to search for information, and as few as 3% consider it very easy to find the desired information. This shows that there is still a large untapped potential for any organisation to get great value from investing in enterprise search. For a relatively small investment, preferably in personnel, it is possible to make search a lot better. The survey also reveals that organisations that are very satisfied with their search have a (larger) budget, more resources and work systematically with analysing search.

Figure. What is your primary goal for utilising search technology in your organisation?

The primary goals for using search are to accelerate retrieval of known information sources (91%) and to improve the re-use of content (information/knowledge) (72%). This indicates that search within organisations is often used as a discovery tool for what is already known. Looking over the next three years, as many as 77% think that the amount of information in the organisation will increase. This means that every year it will be even more important to be able to find the right information, so enterprise search is still very much needed, as stated in the following presentations (on video): Why Business Success Depends on Enterprise Search (by Martin White of Intranet Focus) and The Enterprise Search Market – What should be on your radar? (by Alan Pelz-Sharpe of 451 Research).

Download the full report.

Semantic Search Engine – What is the Meaning?

The shortest dictionary definition of semantics is: the study of meaning. A fuller explanation describes it as a relationship that maps words, terms and written expressions onto a common-sense understanding of objects and phenomena in the real world. It is worth mentioning that objects, phenomena and the relationships between them are language independent. This means that the same semantic network of concepts can map to multiple languages, which is useful in automatic translation and cross-lingual search.

The approach

In the proposed approach, semantics will be modeled as a defined ontology, making it possible for the web to “understand” and satisfy the requests and intents of people and machines that use web content. The ontology is a model that encapsulates knowledge from a specific domain and consists of a hierarchical structure of classes (a taxonomy) that represent concepts of things, phenomena, activities etc. Each concept has a set of attributes that map that particular concept to the words and phrases that represent it in written language (as shown at the top of the figure below). Moreover, the proposed ontology model will have horizontal relationships between concepts, e.g. linguistic relationships (synonymy, homonymy etc.) or domain-specific relationships (medicine, law, military, biological, chemical etc.). Such a defined ontology model will be called a Semantic Map and will be used in the proposed search engine. An example part of an enriched ontology of beverages is shown in the figure below. The ontology is enriched so that the concepts can be easily identified in text using attributes such as the representation of the concept in written text.
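As a rough illustration of the structure described above, the following Python sketch models concepts with a taxonomy parent, per-language textual attributes and horizontal relationships. The class names, the relation name and the beverage concepts are all hypothetical and chosen only for the example; they are not taken from the actual Semantic Map or its figure.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Concept:
    concept_id: str
    parent: Optional[str] = None  # taxonomy (vertical) relationship
    # language code -> words/phrases that represent the concept in text
    labels: Dict[str, List[str]] = field(default_factory=dict)
    # relation name -> related concept ids (horizontal relationships)
    related: Dict[str, List[str]] = field(default_factory=dict)


class SemanticMap:
    """Toy ontology with a bidirectional text <-> concept mapping."""

    def __init__(self) -> None:
        self.concepts: Dict[str, Concept] = {}
        self._label_index: Dict[str, set] = {}

    def add(self, concept: Concept) -> None:
        self.concepts[concept.concept_id] = concept
        for forms in concept.labels.values():
            for form in forms:
                self._label_index.setdefault(form.lower(), set()).add(concept.concept_id)

    def concepts_for(self, text: str) -> List[str]:
        """Text -> concepts. Several ids come back for a homonym."""
        return sorted(self._label_index.get(text.lower(), set()))

    def labels_for(self, concept_id: str, language: str) -> List[str]:
        """Concept -> text in the requested language."""
        return self.concepts[concept_id].labels.get(language, [])

    def ancestors(self, concept_id: str) -> List[str]:
        """Walk up the taxonomy from a concept to the root."""
        chain = []
        parent = self.concepts[concept_id].parent
        while parent is not None:
            chain.append(parent)
            parent = self.concepts[parent].parent
        return chain


# Hypothetical example data
smap = SemanticMap()
smap.add(Concept("beverage", labels={"en": ["beverage", "drink"], "sv": ["dryck"]}))
smap.add(Concept("coffee", parent="beverage",
                 labels={"en": ["coffee", "java"], "sv": ["kaffe"]},
                 related={"contains": ["caffeine"]}))
smap.add(Concept("espresso", parent="coffee", labels={"en": ["espresso"]}))
smap.add(Concept("java_island", labels={"en": ["java"]}))

print(smap.concepts_for("java"))       # homonym: two candidate concepts
print(smap.labels_for("coffee", "sv")) # same concept, another language
print(smap.ancestors("espresso"))
```

Because the labels are stored per language while the concepts themselves are language independent, a query in one language can be resolved to a concept and then rendered in another, which is the cross-lingual property mentioned earlier.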

Semantic Map

The Semantic Map is an ontology that is used for bidirectional mapping between textual representations of concepts and a space of their meanings and associations. In this manner, it becomes possible to transform user queries into concepts, ideas and intent that can be matched against an indexed set of similar concepts (and their relationships) derived from documents, which are then returned in the form of a result set. Moreover, users will be able to specify and describe their intents using visualized facets of the concept taxonomy, concept attributes and horizontal (domain) relationships. The search module will also be able to discover users’ intents based on the history of queries and other relevant factors, e.g. ontological axioms and restrictions. A potentially interesting approach would be to retrieve additional information about the specific user profile from publicly available sources such as social portals like Facebook, blog sites etc., as well as from the user’s own bookmarks and similar private resources, enabling deeper intent discovery.

Semantic Search Map

Semantic Search Engine

The search engine will be composed of the following components:

  • Connector – This module will be responsible for acquiring data from external repositories and passing it to the search engine. The connector also extracts text and relevant metadata from files and external systems and passes them on to the subsequent processing components.
  • Parser – This module will be responsible for text processing, including activities such as tokenization (breaking text into lexemes – words or phrases), lemmatization (normalization of grammatical forms), exclusion of stop-words, and paragraph and sentence boundary detection. The result of the parsing stage is structured text with additional annotations that is passed to the semantic Tagger.
  • Tagger – This module is responsible for adding semantic information to each lexeme extracted from the processed text. Technically, this means attaching to each lexeme the identifiers of the relevant concepts stored in the Semantic Map. Moreover, phrases consisting of several words are identified, and disambiguation is performed based on the derived contexts. Consider the example illustrated in the figure.
  • Indexer – This module is responsible for taking all the processed information, transforming it and storing it in the search index. This module will be enriched with methods of semantic indexing that use the ontology (Semantic Map) and language tools.
  • Search index – The central storage of processed documents (document repository), structured to manage the full text of the documents, their metadata and all relevant semantic information (document index). The structure is optimized for search performance and accuracy.
  • Search – This module is responsible for running queries against the search index and retrieving relevant results. The search algorithms will be enriched to use user intents (in compliance with data privacy requirements) and the prepared Semantic Map to match the semantic information stored in the search index.
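To make the flow between these components a bit more concrete, here is a minimal Python sketch of the connector, parser, tagger, indexer and search steps. It is only an illustration under simplifying assumptions: the stop-word list, the lemma table and the concept table are tiny hypothetical stand-ins for real language tools and for the Semantic Map, and phrase detection and disambiguation are left out entirely.

```python
import re
from collections import defaultdict

# Hypothetical stand-ins for language tools and the Semantic Map
STOP_WORDS = {"a", "an", "the", "of", "and", "is", "in", "with"}
LEMMAS = {"teas": "tea", "coffees": "coffee", "drinks": "drink"}
CONCEPTS = {
    "tea": "c:tea",
    "coffee": "c:coffee",
    "java": "c:coffee",      # synonym mapped to the same concept
    "drink": "c:beverage",
    "beverage": "c:beverage",
}


def connector(repository):
    """Connector: yield (doc_id, text) pairs from an external source (a dict here)."""
    for doc_id, text in repository.items():
        yield doc_id, text


def parse(text):
    """Parser: tokenize, drop stop-words, lemmatize."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [LEMMAS.get(t, t) for t in tokens if t not in STOP_WORDS]


def tag(lexemes):
    """Tagger: attach a concept id (or None) to every lexeme."""
    return [(lexeme, CONCEPTS.get(lexeme)) for lexeme in lexemes]


class SearchIndex:
    """Search index: documents plus term and concept postings."""

    def __init__(self):
        self.documents = {}
        self.by_term = defaultdict(set)
        self.by_concept = defaultdict(set)

    def add(self, doc_id, text):
        """Indexer: store the document and its semantic annotations."""
        self.documents[doc_id] = text
        for lexeme, concept in tag(parse(text)):
            self.by_term[lexeme].add(doc_id)
            if concept:
                self.by_concept[concept].add(doc_id)

    def search(self, query):
        """Search: match on literal terms and on concepts."""
        hits = set()
        for lexeme, concept in tag(parse(query)):
            hits |= self.by_term.get(lexeme, set())
            if concept:
                hits |= self.by_concept.get(concept, set())
        return sorted(hits)


repository = {
    "d1": "A cup of java in the morning",
    "d2": "Green teas of Asia",
}
index = SearchIndex()
for doc_id, text in connector(repository):
    index.add(doc_id, text)

print(index.search("coffee"))  # d1 matches through the shared concept, not the literal word
```

The point of the concept postings is that a query for “coffee” can find a document that only says “java”, which a purely term-based index would miss.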

What do you think? Please let us know by writing a comment.

Enterprise Search and Findability Survey

A few days ago we launched the “Enterprise Search and Findability Survey”. The survey closes at the end of March.

If you complete the survey you will get the report when it is finished.

Take me to the Survey!: http://svy.mk/xJz2DM

The survey is for people who are responsible for search in their organisations. If you are a search manager, intranet manager, product owner of search, search editor, in-house developer for search, this survey is for you!

The survey aims to help you by finding out your views about Enterprise Search and Findability. The research will help show what business value an Enterprise Search solution can provide.

The survey is structured into five sections, each of which provides a specific perspective on Findability:
• Business
• Organisation
• User
• Information
• Search Technology

More information about the perspectives is provided in each section.

The survey will take approximately 20-30 minutes of your time. If you need a break, you can continue answering the survey at the question where you left off. If you give us your contact information, we will send you the report based on this survey when it is finished. We are aiming to have it finished in June.

The survey results will be presented at Enterprise Search Europe 2012 (London, 30-31 May 2012) and Enterprise Search Summit (New York, 15-16 May 2012).

Findability, a holistic approach to implementing search technology

We are proud to present the first video on our new Vimeo channel. Enjoy!

Findability Dimensions

A successful search project involves more than technology and having the most skilled developers; those alone are not enough. To utilise the full potential of search technology and get a return on the investment, there are five main dimensions (or perspectives) that all need to be in focus when developing search solutions, and that require additional competencies to be involved.

This holistic approach to implementing search technology we call Findability by Findwise.

Search Driven Navigation and Content

At the beginning of October I attended the Microsoft SharePoint Conference 2011 in Anaheim, USA. A lot of interesting and useful topics were discussed. One really interesting session was Content Targeting with the FAST Search Web Part by Martin Harwar.

Martin Harwar talked about how search can be used to show content on a web page. The most common search-driven content is of course the traditional search result list, but there is a lot more content that can be retrieved by search. One example is search-driven navigation and content. Search-driven navigation means that instead of having static links on a page, we can render them depending on the query the user typed in. If, for example, a user on a health care site has recently searched for “ear infection”, the page can show links to ear specialist departments. When the user does another search and returns to the same page, the links will be different.

In the same way we can render content on the page. Imagine the webpage of a tools business that has two lists of products on its start page: most popular and newest tools. To make these lists better adapted to a user, we only want to show products that are of interest to that user. Instead of only showing the most popular and newest tools overall, the lists can also be filtered on the last query the user typed. Assume a user searches for “saw” and then returns to the page with the product lists. The lists will now show the most popular saws and the newest saws. This can also be used when a user finds the company’s webpage by searching for “saw” on, for instance, Google.
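The tools example can be sketched in a few lines of Python. This is not how the FAST Search Web Part is configured; it is only a plain illustration of the idea, with a hypothetical in-memory catalogue and a simple substring match standing in for a real search query.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Product:
    name: str
    popularity: int
    added: date


def matches(product, last_query):
    """No stored query means no filtering; otherwise a naive substring match."""
    return not last_query or last_query.lower() in product.name.lower()


def most_popular(products, last_query=None, limit=3):
    hits = [p for p in products if matches(p, last_query)]
    return sorted(hits, key=lambda p: p.popularity, reverse=True)[:limit]


def newest(products, last_query=None, limit=3):
    hits = [p for p in products if matches(p, last_query)]
    return sorted(hits, key=lambda p: p.added, reverse=True)[:limit]


# Hypothetical catalogue
catalogue = [
    Product("Cordless drill", 95, date(2011, 3, 1)),
    Product("Hand saw", 40, date(2011, 9, 15)),
    Product("Circular saw", 70, date(2011, 6, 1)),
    Product("Claw hammer", 80, date(2011, 10, 1)),
]

# First visit: no query yet, so the lists are unfiltered
print([p.name for p in most_popular(catalogue)])

# After the user has searched for "saw", the same lists are filtered on it
print([p.name for p in most_popular(catalogue, "saw")])
print([p.name for p in newest(catalogue, "saw")])
```

The last query could come from the site’s own search box or, as mentioned above, from the referring search engine when that information is available.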

This shows that search can be used in many ways to personalize a webpage and thereby increase Findability.

Inspiration from the Enterprise Search Europe 2011 conference

A couple of weeks ago, some of my colleagues and I attended the Enterprise Search Europe conference in London. We’re very grateful to the organizer, Martin White at IntranetFocus, for arranging the event and having us as one of the gold sponsors.

For me it was the first time in years I attended a conference like this, and while it was “same old, same old” for many of the attendees, for me it was enlightening to meet up with the industry and have a discussion on where we are as an industry.

The attendees were mainly software vendors and professional services/consultants, as well as a few customers or actual users of enterprise search… and I think the consensus of the two days was that we in the industry STILL haven’t really figured out what we should do with the enterprise search concept, and how to make it valuable for our customers. We at Findwise are not alone with this challenge; rather, it is an industry challenge. There are some vendors who seem to be doing good work delivering real value to customers, and there are also a few of our industry colleagues who do good professional services/consultant work. At first it was a bit of a downer to realize that we haven’t progressed more during the 10 years I’ve been in the business, but at the same time it was very inspirational to see that we at Findwise, together with a few other players, seem to be on the right track with our hard work, and that we are in a position to solve some of the real industry challenges we’re facing.

As I see it, if we gather our forces and make a focused “push forward” together now, we will be able to take the industry to a new maturity level where we better solve real business challenges with enterprise search (or search-driven Findability solutions, as we like to call them).

My simple analysis of all the discussions at the conference is that we need to do two things:

  1. Manage the whole “full picture” of enterprise search – from strategy to organizational governance, involving necessary competencies to cover all aspects of a successful Findability solution.
  2. Break down the customer challenge into manageable chunks, and solve actual business problems, not just solving the traditional “finding stuff when needed” challenge.

I think we are on the right track, and it’s going to be a very interesting journey from here on!

Microsoft SharePoint Conference 2011: Contributor vs. Consumer

A couple of weeks ago I had the opportunity to attend the Microsoft SharePoint Conference 2011 in Anaheim, USA. This turned out to be an intense four-day conference covering just about any SharePoint 2010 topic you can imagine – from the geekiest developer session to business tracks with lessons learned.

To me, one of the most memorable sessions was Social Search with Dan Benson and Paul Summers, in which they showed us how social behaviours can be used to influence the rank of search results. For instance, users’ interests entered in MySite can be used to boost (xrank) search results accordingly. This was an eye opener, as it illustrated what’s possible with quite simple means. Thanks for that!
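The general idea of boosting by interests can be shown with a small Python sketch. Note that this is not FAST query syntax and not what was shown in the session; it is a hypothetical re-ranking function that simply adds a fixed boost to results whose tags overlap with the user’s stated interests.

```python
def boost_by_interests(results, interests, boost=2.0):
    """Re-rank (title, score, tags) results with an additive boost per matching interest.

    The boost value and the data layout are assumptions made for this example.
    """
    wanted = {interest.lower() for interest in interests}
    rescored = []
    for title, score, tags in results:
        overlap = wanted & {tag.lower() for tag in tags}
        rescored.append((title, score + boost * len(overlap)))
    return sorted(rescored, key=lambda item: item[1], reverse=True)


# Hypothetical base ranking from the search engine
results = [
    ("Budget report", 5.0, ["finance"]),
    ("Search tuning guide", 4.0, ["search", "sharepoint"]),
]

# A user who has listed "Search" as an interest in their profile
print(boost_by_interests(results, ["Search"]))
```

With the interest applied, the search tuning guide moves above the budget report even though its base score was lower.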

Another great session was Scott Jamison talking about Findability in SharePoint. The key ingredient in this session was to differentiate between contributor and consumer. Typically we focus on the contributor, building 100-level folder structures with names that make sense to the contributor. However, we seem to forget about the consumers, who of course are the other key aspect of an intranet. Creating a good support system for contributors is important, but it is equally important to focus on consumer needs. As Jamison said, “why have folders for both contributors and consumers?”. SharePoint offers endless possibilities when it comes to creating logical views built on search, tags and filtering that are aimed at filling the needs of the consumers.

So, keep the folders or whatever support the contributor needs, but let your imagination run free to deliver best-in-class Findability to the consumer!

Google Search Appliance (GSA) 6.12 released

Google has released yet another version of the Google Search Appliance (GSA). It is good to see that Google stays active when it comes to improving its enterprise search product! Below is a list of the new features:

Dynamic navigation for secure search

The facet feature, new since 6.8, is still being improved. When filters are created, it is now possible to make sure that they only take into account secure documents that the user is authorized to see.

Nested metadata queries

In previous Search Appliance releases there were restrictions for nesting meta tags in search queries. In this release many of those restrictions are lifted.

LDAP authentication with Universal Login

You can configure a Universal Login credential group for LDAP authentication.

Index removal and backoff intervals

When the Search Appliance encounters a temporary error while trying to fetch a document during crawl, it retains the document in the crawl queue and index. It schedules a series of retries after certain time intervals, known as “backoff” intervals, before removing the URL from the index.

One example of when this is useful is the processing pipeline that we have implemented for the GSA. The GSA uses an external component to index the content. If that component goes down, the GSA will receive a “404 – page does not exist” when trying to crawl, and this may cause mass removal from the index. With this functionality turned on, that can be avoided.
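The retry behaviour can be illustrated with a small Python sketch. This is not the appliance’s implementation and the interval values are made up; it only shows the principle that a failing URL is kept and retried a number of times before it is finally removed.

```python
def handle_fetch_result(url, succeeded, failures, backoff_intervals=(60, 300, 3600)):
    """Decide what to do with a URL after a fetch attempt.

    failures maps url -> number of consecutive failed fetches.
    backoff_intervals are hypothetical waiting times in seconds.
    Returns (action, wait_seconds) where action is "keep", "retry" or "remove".
    """
    if succeeded:
        failures.pop(url, None)  # a success resets the failure count
        return ("keep", None)

    count = failures.get(url, 0) + 1
    failures[url] = count
    if count <= len(backoff_intervals):
        # Still within the retry budget: the document stays in the index
        return ("retry", backoff_intervals[count - 1])

    # All backoff intervals used up: only now is the URL removed
    failures.pop(url, None)
    return ("remove", None)


failures = {}
url = "http://intranet.example.com/doc"  # hypothetical URL
for attempt in range(4):
    print(handle_fetch_result(url, False, failures))
```

If the external component comes back before the last interval has passed, the next successful fetch resets the counter and nothing is removed.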

Specify URLs to crawl immediately in feeds

Release 6.12 provides the ability to specify URLs to crawl immediately in a feed by using the crawl-immediately attribute. This is a nice way to prioritise content that needs to get indexed quickly.
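As a sketch of what such a feed might look like, the Python snippet below builds a small feed document with the crawl-immediately attribute set on one record. The element names follow the general shape of a GSA metadata-and-url feed as we understand it; the datasource name, the URLs and the attribute value “true” are assumptions for the example, so check the feeds documentation for the exact format before using it.

```python
import xml.etree.ElementTree as ET


def build_feed(datasource, urls, urgent_urls=()):
    """Build a feed document; URLs in urgent_urls get crawl-immediately set."""
    root = ET.Element("gsafeed")
    header = ET.SubElement(root, "header")
    ET.SubElement(header, "datasource").text = datasource
    ET.SubElement(header, "feedtype").text = "metadata-and-url"
    group = ET.SubElement(root, "group")
    for url in urls:
        record = ET.SubElement(group, "record", {"url": url, "mimetype": "text/html"})
        if url in urgent_urls:
            # Attribute named in the release notes; the value is an assumption
            record.set("crawl-immediately", "true")
    return ET.tostring(root, encoding="unicode")


# Hypothetical datasource and URLs
feed_xml = build_feed(
    "example_source",
    ["http://intranet.example.com/news/today", "http://intranet.example.com/archive/2009"],
    urgent_urls={"http://intranet.example.com/news/today"},
)
print(feed_xml)
```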

X-robots-tag support

The Appliance now supports the x-robots-tag, which makes it possible to exclude non-HTML documents from the index. Since such documents cannot carry a robots meta tag in their markup, the instruction is sent as an HTTP response header instead.
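On the content server side, this comes down to adding a response header for the files that should stay out of the index. The Python sketch below shows the idea with a hypothetical helper that picks headers by file extension; the list of extensions is an assumption, and in practice this is usually done in the web server configuration.

```python
import os

# Hypothetical choice of non-HTML file types to keep out of the index
EXCLUDED_EXTENSIONS = {".pdf", ".doc", ".xls"}


def extra_headers_for(path):
    """Return additional HTTP response headers for the requested path."""
    extension = os.path.splitext(path)[1].lower()
    if extension in EXCLUDED_EXTENSIONS:
        return {"X-Robots-Tag": "noindex"}
    return {}


print(extra_headers_for("/reports/draft-budget.PDF"))
print(extra_headers_for("/index.html"))
```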

Google Search Appliance documentation page