Enterprise Search and Findability discussions at World Cafe in Oslo

Yesterday we (Kristian Hjelseth and Kristian Norling) participated in a great World Cafe event arranged by Steria in Norway. We did a Pecha Kucha-inspired presentation (scroll down to the bottom of this blog post for the presentation) to introduce the subject of Enterprise Search and Findability and how to work more efficiently with the help of enterprise search. Afterwards there were three round-table workshops with practitioners, where search-related issues were discussed. We found the discussions very interesting, so we thought we should share some of the topics with a broader audience.

The attendees had answered a survey before coming to the World Cafe, in which 83.3% stated that finding the right information was critical for their business goals. Yet only 20.3% were satisfied with their current search solution, and 75% said it was hard or very hard to find the right information. More stats are available from a global survey on enterprise search that asked the same questions.

Unified Search

Having all the information you would like to find available through a single search was deemed very important for findability by the participants. In our experience, users often don't know what to search for, and, to make matters worse, they don't know where to look for the information either! This is also confirmed by the Enterprise Search and Findability Survey that was done earlier this year. The report is available for download.


Google web search always comes up as an example of something that “just works”. And it works because Google found a clever algorithm, PageRank, that essentially measures the trustworthiness of information. Since PageRank is heavily dependent on inbound links, this way of measuring trust is, in our experience, unlikely to work on an intranet, where cross-referencing is far less common. Much of the time it is not even possible to link to content on an intranet, since the information is not accessible over HTTP. Read more about it in the great in-depth article series on the differences between web search and enterprise search by Mark Bennett.
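As a side note, the heart of PageRank is simple enough to sketch in a few lines of Python. The link graph, page names and damping factor below are purely illustrative, not taken from any real site:

```python
# Minimal PageRank power iteration over a tiny, invented link graph.
def pagerank(links, damping=0.85, iterations=50):
    """links: dict mapping each page to the list of pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            if not outlinks:
                # Dangling page: spread its rank evenly over all pages.
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
            else:
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
        rank = new_rank
    return rank

graph = {"home": ["about", "news"], "about": ["home"], "news": ["home", "about"]}
scores = pagerank(graph)
```

Pages that attract many inbound links from other well-linked pages end up with the highest scores, which is exactly the signal that is missing on a link-poor intranet.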

So how can we make search inside the firewall as good as web search? I think the answer is connecting the information to its author. Trust builds between people based on their views of each other. Simply put, someone has authority over her peers either through rank (the organisation chart) or through trust. That trustworthiness can be based on a person's ability to connect to other people (we all probably know someone who knows “everyone”), or we trust someone based on that person's knowledge. More reading on the importance of trust in organisations. How to do this in practice? Some ideas in this post by Bill Ives. Also a good read: “How social is Enterprise Search?” by Jed Cawthorne. And finally another good post to read.


By adding relevant metadata to information, we can make it more findable. There were discussions on the importance of strict, controlled metadata and how to handle user tagging. For an idea on how to think about metadata, read the blog post on how VGR used metadata by Kristian Norling.

Search Analytics

Before you start any major work on your current enterprise search solution, look at the search log files and analyze the data. You might be surprised by what you find. Search analytics is great if you want insight into what users expect to find when they search. Watch this video for an introduction to Search Analytics in Practice.
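As a rough sketch of the idea, assuming a simplified log of (query, hit count) pairs, just counting the most frequent queries and the queries that return zero results already tells you a lot:

```python
# Basic search-log analysis: top queries and zero-result queries.
# The (query, result_count) log format is a simplifying assumption.
from collections import Counter

def analyze_log(entries):
    """entries: list of (query, result_count) tuples from a search log."""
    freq = Counter(q.strip().lower() for q, _ in entries)
    zero_results = Counter(q.strip().lower() for q, hits in entries if hits == 0)
    return freq.most_common(5), zero_results.most_common(5)

log = [("vacation policy", 12), ("vacation policy", 8), ("payroll", 0),
       ("expense report", 3), ("payroll", 0)]
top, failing = analyze_log(log)
```

The zero-result list is often the most actionable output: it names the content (or synonyms) your users expect but your index does not deliver.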

Other subjects

  • Access control and transparency
  • Who owns search?
  • Who owns the information?
  • Personalization of search results
All these subjects and many more were discussed at the workshops, but that will have to wait for another blog post!
As always, your thoughts and comments are most welcome!

Search Driven Navigation and Content

In the beginning of October I attended Microsoft SharePoint Conference 2011 in Anaheim, USA. There were a lot of interesting and useful topics that were discussed. One really interesting session was Content Targeting with the FAST Search Web Part by Martin Harwar.

Martin Harwar talked about how search can be used to show content on a web page. The most common search-driven content is of course the traditional results list, but much more content can be retrieved by search, one example being search-driven navigation. Search-driven navigation means that instead of having static links on a page, we can render links depending on the query the user typed in. If a user on a health care site, for example, has recently done a search on “ear infection”, the page can show links to ear specialist departments. When the user does another search and returns to the same page, the links will be different.

In the same way we can render content on the page. Imagine the web page of a tools business whose start page has two lists of products: the most popular and the newest tools. To make these lists better adapted to a user, we only want to show products that are of interest to that user. Instead of only showing the most popular and newest tools overall, the lists can also be filtered on the last query the user has typed. Assume a user searches for “saw” and then returns to the page with the product lists. The lists will now show the most popular saws and the newest saws. This can also be used when a user finds the company's web page by searching for “saw” on, for instance, Google.
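A minimal sketch of that filtering step, with invented product data and field names, could look like this:

```python
# Search-driven content: re-filter a product list by the user's last query,
# falling back to the unfiltered list when nothing matches.
def filter_by_last_query(products, last_query):
    q = last_query.lower()
    matches = [p for p in products if q in p["name"].lower()]
    return matches or products  # no match: show the generic list instead

popular = [{"name": "Hammer Pro"}, {"name": "Circular Saw"}, {"name": "Hand Saw"}]
personalized = filter_by_last_query(popular, "saw")
```

The fallback matters in practice: a personalized list that comes back empty is worse for conversion than the generic “most popular” list.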

This shows that search can be used in many ways to personalize a webpage and thereby increase Findability.

Distributed processing + search == true?

In June 2011, I attended the Berlin Buzzwords conference. The main theme of the conference was undoubtedly the current paradigm shift in distributed processing, driven by the major success of Hadoop. Doug Cutting – founder of Apache projects such as Lucene, Nutch and Hadoop – held one of the keynotes. He focused on what he recognized as the new foundations for this paradigm shift:

– Commodity hardware
– Sequential file access
– Sharding
– Automated, high level reliability
– Open source

Distributed processing is handled fairly well by Hadoop. Distributed search, on the other hand, is more or less limited to sharding and/or replicating the index. The downside of sharding is that you perform the same search on multiple servers and then need to combine the results. Due to the nature of ranking algorithms such as tf-idf, tasks like ranking results suffer. Andrzej Białecki (another frequent Lucene committer) held a presentation on this topic, and his view can be summarized as: use local search as long as you can, and distribute only when the cost of local search limitations outweighs the cost of distributed search.
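To see why sharding hurts tf-idf ranking, consider the inverse document frequency computed per shard. With an invented corpus split across two shards, the same term gets a different idf depending on where the scoring happens:

```python
# Why tf-idf suffers under sharding: each shard computes idf from its
# *local* document frequencies, so identical documents can score
# differently on different shards. The corpus here is invented.
import math

def idf(term, docs):
    df = sum(1 for d in docs if term in d)
    return math.log(len(docs) / df) if df else 0.0

shard_a = [{"hadoop", "search"}, {"hadoop"}, {"hadoop", "lucene"}]  # "hadoop" everywhere
shard_b = [{"hadoop"}, {"lucene", "search"}, {"zookeeper"}]         # "hadoop" rare

local_a = idf("hadoop", shard_a)              # 0.0: term looks worthless here
local_b = idf("hadoop", shard_b)              # log(3): term looks distinctive here
global_idf = idf("hadoop", shard_a + shard_b) # the value both shards *should* use
# Merging hit lists scored with local_a and local_b is apples-to-oranges;
# a correct merge step needs global document-frequency statistics.
```

This is exactly the gap the post refers to: distributing the index is easy, but combining the per-node scores into one consistent ranking is not.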

The setup of automated replication and sharding, with help from ZooKeeper in the SolrCloud project, is a major step in the right direction, but the question of how to properly combine search results from different nodes still remains. One thing is sure, though: there is a lot of interesting work being done in this area.

Google Search Appliance (GSA) 6.12 released

Google has released yet another version of the Google Search Appliance (GSA). It is good to see that Google stays active when it comes to improving their enterprise search product! Below is a list of the new features:

Dynamic navigation for secure search

The facet feature, new since 6.8, is still being improved. When filters are created, it is now possible to take into account that they should only include secure documents which the user is authorized to see.

Nested metadata queries

In previous Search Appliance releases there were restrictions on nesting meta tags in search queries. In this release many of those restrictions are lifted.

LDAP authentication with Universal Login

You can configure a Universal Login credential group for LDAP authentication.

Index removal and backoff intervals

When the Search Appliance encounters a temporary error while trying to fetch a document during crawl, it retains the document in the crawl queue and index and schedules a series of retries after certain time intervals, known as “backoff” intervals, before removing the URL from the index.

An example of when this is useful is the processing pipeline that we have implemented for the GSA. The GSA uses an external component to index the content; if that component goes down, the GSA will receive a “404 – page does not exist” when trying to crawl, which may cause mass removal from the index. With this functionality turned on, that can be avoided.

Specify URLs to crawl immediately in feeds

Release 6.12 provides the ability to specify URLs to crawl immediately in a feed by using the crawl-immediately attribute. This is a nice feature for prioritising what needs to be indexed quickly.

X-robots-tag support

The Appliance now supports excluding non-HTML documents from the index by using the x-robots-tag HTTP header.

Google Search Appliance documentation page

Google Search Appliance (GSA) 6.10 released

Last week, Google released version 6.10 of the software to their Google Search Appliance (GSA).

This is a minor update, and the focus for the Google teams has been bug fixes and increased stability. Looking at the release notes, there are indeed plenty of bugs that have been solved.

However, there are also some new features in this release. Some of the more interesting, in my opinion, are:

Multiple front-end configuration for Dynamic Navigation

Since the 6.8 release, the GSA has been able to provide facets, or Dynamic Navigation as Google calls it. However, the facets have been global, so you couldn't have two front ends with different facets. This is now possible.
More feeds statistics and adjustable PageRank in feeds

More statistics on what is happening with the feeds you push into the GSA is a very welcome feature. The possibility to adjust PageRank allows for some more control over relevancy in feeds.

Crawl-time Kerberos support and indexing of large files

Google is working hard on security, and every release since 6.0 has included some security improvements. Nice to see that this continues. Since the beginning, the GSA has simply dropped files bigger than 30 MB. Now it will index larger files (you can configure how large), but still only the first 2.5 MB of the content will be indexed.

Stopword lists for different languages

Centralized configuration for scalability

For a multi-node GSA setup, you can now specify the configuration on the master node and have it propagated to the slaves.

For a complete list of new features, see the New and Changed Features page in the documentation.

Findability on an E-commerce Site

Findability on any e-commerce site is a beast all on its own. What if visitors’ searches return no results? Will they continue to search or did you lose your chance at a sale?

While product findability is a key success factor in e-commerce, it is predominantly served by simple search alone. And while simple search usually doesn't fulfill users' more complex needs, website developers and owners still regard advanced search as just another boring to-do item during development. Owners won't go so far as to leave it out, because every e-commerce website has some kind of advanced search functionality, but they probably don't believe it brings in much revenue.

Research shows:

  • 50% of online buyers go straight to the search function
  • 34% of visitors leave the site if they can’t find an (available) product
  • Buyers are more likely than Browsers to use search (91%)

What can’t be found, can’t be bought:

  • Search is often mission critical in e-commerce
  • Users don’t know how to spell
  • Users often don’t even know how to describe it

First of all, Findability can accelerate the sales process. And faster sales can increase conversions, because you will not be losing customers who give up trying to find products. Furthermore, fast, precise and successful searches increase your customers’ trust.

On both e-commerce and shopping comparison sites, users can find products in two different ways: searching and browsing. Searching obviously means using the site search, whilst browsing involves drilling down through the categories provided by the website. The most common location for a site search on e-commerce sites is at the top of the page, generally on the right side. Many e-commerce sites have the site search, user login, and shopping cart info all located in the same general area. Keeping the site search in such a common location makes it easier to find for visitors who are accustomed to this convention.

Faceted search should be the de facto standard for an e-commerce website: the user performs a simple search first, and then, on the results page, narrows the search through drill-down links (for a single choice) or check box selections (for multiple non-overlapping choices). The structure of the search results page must also be crystal clear. The results must be ranked in a logical order (logical for the user, that is, not for you) by relevance. Users should be able to scan and comprehend the results easily. Queries should be easy to refine and resubmit, and the search results page should show the query itself.
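The drill-down links behind faceted search boil down to counting field values across the current result set. A minimal sketch, with invented product data and field names:

```python
# Facet counts for a results page: count values per facet field so the
# UI can render drill-down links like "Brand: Acme (2)".
from collections import Counter

def facet_counts(results, field):
    return Counter(r[field] for r in results if field in r)

results = [
    {"name": "Hand Saw", "brand": "Acme", "type": "saw"},
    {"name": "Circular Saw", "brand": "Bolt", "type": "saw"},
    {"name": "Hammer", "brand": "Acme", "type": "hammer"},
]
brands = facet_counts(results, "brand")
```

A real engine computes these counts inside the index rather than over a result list, but the contract toward the UI is the same: each facet value plus how many hits it would leave after drilling down.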

Spell-check is also crucial. Many products have names that are hard to remember or type correctly. Users might think to correct their misspelling when they find poor results, but they will be annoyed at having to do that… or worse, they might think that the website either doesn’t work properly or does not have their product.
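A toy “did you mean” can be built from plain edit distance against the indexed vocabulary. Real engines use smarter structures (n-grams, phonetic keys) instead of this brute force, so treat the sketch below, with its invented vocabulary, purely as an illustration of the idea:

```python
# "Did you mean" via Levenshtein edit distance over the index vocabulary.
def edit_distance(a, b):
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def suggest(query, vocabulary, max_distance=2):
    best = min(vocabulary, key=lambda w: edit_distance(query.lower(), w))
    return best if edit_distance(query.lower(), best) <= max_distance else None

vocab = ["screwdriver", "circular saw", "hammer"]
suggestion = suggest("screwdrivr", vocab)
```

The `max_distance` cutoff is the important design choice: without it, every nonsense query gets a (useless) suggestion, which is exactly the kind of behavior that erodes trust in the search function.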

Query completion can reduce the problems caused by mistyping or not knowing the proper terminology. Queries usually start with words, so unambiguous character input is crucial.
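A bare-bones query completion can be sketched as prefix matching against past queries, ranked by how often each was searched; the query history below is invented:

```python
# Prefix-based query completion ranked by historical query frequency.
from collections import Counter

def complete(prefix, history, limit=3):
    counts = Counter(history)
    matches = [q for q in counts if q.startswith(prefix.lower())]
    return sorted(matches, key=lambda q: -counts[q])[:limit]

history = ["ear infection", "ear infection", "ear wax", "eye exam"]
suggestions = complete("ear", history)
```

Production systems replace the linear scan with a trie or a sorted index, but the ranking signal (what other users actually searched for) is the same.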

Search analytics, contextual advertising and behavioral targeting are about more than just finding a page or a product. When people search they tell you something about their interests, their time, their location and what is in demand right now; they say something about search quality by the way they navigate and click in the result pages; and finally they reveal what they do after they have found what they were looking for.

A good e-commerce solution uses search technology to:

  • Dynamically tailor a site to suit the visitors’ interests
  • Help the user to find and explore
  • Relate information and promote up- and cross sales
  • Improve visitor satisfaction
  • Increase stickiness
  • Increase sales of related products or accessories
  • Inspire visitors to explore new products/areas
  • Provide increased understanding of visitor needs/preferences

–> Convert visitors into returning customers!

Why Web Search is Like a Store Clerk

When someone is using the search function on your web site, your web search, it tells you two things. First, they have a specific need, expressed by their search query. Second, and more importantly, he or she wants you to fulfill that need. If users didn't care where the service was delivered from, they would have gone straight to Google. Hence, the use of your search function signals trust in your capabilities. This means that even if the majority of your website visitors don't use the search function, you know that the ones who do have a commitment to you. Imagine you are working in a store as a clerk; the customer coming up to you and asking you something is probably more interested in doing business with you than the ones just browsing the goods.

This trust, however, can easily be turned into frustration and bad will if the web search result is poor and users don't find what they are looking for. Continuing our analogy with the store, this is much like the experience of looking for a product, wandering around for a few minutes, finally deciding to ask a clerk and getting the answer “If it's not on the shelf we don't have it”. I certainly would leave the store, and the same applies to a web site. If users fail when browsing and searching, they will probably leave your site. The consequence is that you might antagonize loyal customers or lose an easy sale. So how do you recognize a bad search function? A good way to start is to look at common search queries and try searching for them yourself. Then start asking a few basic questions such as:

  • Does the sorting of the search results make sense?
  • Is it possible to decide which result is interesting based on the information in the result presentation?
  • Is there any possibility to continue navigating the results if the top hits are not what you are looking for?

Answering these questions yourself will tell you a lot about how your web search is performing. The first step to a good user experience is to know where your challenges are, then you can start making changes to improve the issues you have found in order to make your customers happier. After all, who wants to be the snarky store clerk?

Bridging the Gap Between People and (Enterprise Search) Technology

Tony Russell-Rose recently wrote about the changing face of search, a post that summed up the discussion about the future of enterprise search that took place at the recent Search Solutions conference. This is indeed an interesting topic. My colleague Ludvig also touched on it in his recent post, where he expressed his disappointment in the lack of visionary presentations at this year's KMWorld conference.

At our last monthly staff meeting we had a visit from Dick Stenmark, associate professor of Informatics at the Department of Applied IT at Gothenburg University. He spoke about his view of the intranets of the future. One of the things he talked about was the big gap between the user's vague representation of her information need (e.g. the search query) and the representation of the documents indexed by the intranet enterprise search engine. If a user has a hard time defining what she is looking for, it will of course be very hard for the search engine to interpret the query and deliver relevant results. What is needed, according to Dick Stenmark, is a way to bridge the gap between technology (the search engine) and people (the users of the search engine).

As I see it there are two ways you can bridge this gap:

  1. Help users become better searchers
  2. Customize search solutions to fit the needs of different user groups

Helping users become better searchers

I have mentioned this topic in one of my earlier posts. Users are not good at describing the information they are seeking, so it is important that our search solutions help them do so. Existing functionality, such as query completion and related searches, can help users create and use better queries.

Query completion often includes common search terms, but what if we combined them with the search terms we would have wanted users to search for? This requires that you learn something about your users and their information needs. If you do take the time to learn this, it is possible to create suggestions that help the user not only spell correctly but also formulate a more specific query. Some search solutions (such as homedepot.com) also use a sort of query disambiguation, where the user's search returns not only results but also a list of matching categories, and the user is asked to choose which category of products her search term belongs to. This helps the search engine both return the correct set of results and display the most relevant set of facets for that product category. Likewise, Google displays a list of related searches at the bottom of the search results list.
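The category-disambiguation idea can be sketched as returning, next to the hits themselves, a count of the categories they fall into; the catalog below is invented, not homedepot.com's actual data:

```python
# Query disambiguation: alongside the matching products, return the
# categories they belong to so the user can pick one and get the
# category's facets.
from collections import Counter

def matching_categories(query, catalog):
    q = query.lower()
    hits = [p for p in catalog if q in p["name"].lower()]
    return hits, Counter(p["category"] for p in hits)

catalog = [
    {"name": "Hand Saw", "category": "Hand Tools"},
    {"name": "Circular Saw", "category": "Power Tools"},
    {"name": "Saw Blade 10in", "category": "Accessories"},
]
hits, categories = matching_categories("saw", catalog)
```

When the hits spread over several categories, as here, showing the category list first is usually more helpful than a mixed result list.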

These are some examples of functionality that can help users become better searchers. If you want to learn more, have a look at Dan Russell's presentation linked from my previous post.

Customize search solutions to fit the needs of different user groups

One of the things Dick Stenmark talked about in his presentation for us at Findwise was how differently users behave when searching for information. Users have different information needs and different ways of searching for information. However, when it comes to designing the experience of finding information, most companies still try to achieve a one-size-fits-all solution. A public website can maybe get by supporting 90% of its visitors, but an intranet that only supports some of the employees is a failure. Still, very few companies work on personalizing their search applications for different user groups. (Some don't even seem to care that they have different user groups, and therefore treat all their users as one and the same.) The search engine needs to know and care more about its users in order to deliver better results and a better search experience as a whole. For search to be really useful, personalization in some form is a must, and I think and hope we will see more of this in the future.

Google Instant – Can a Search Engine Predict What We Want?

On September 8th Google released a new feature for their search engine: Google instant.
If you haven’t seen it yet, there is an introduction on Youtube that is worth spending 1:41 minutes on.

Simply put, Google Instant is a new way of displaying results and helping users find information faster. As you type, results are presented in the background. In most cases two or three characters are enough for the results you expect to appear right in front of you.

Google instant

The Swedish site Prisjakt has been using this for years, helping the users to get a better precision in their searches.

At Google you have previously been guided by “query suggestion”, i.e. you got suggestions of what others have searched for before, a function also used by other search engines such as Bing (called Type Ahead). Google Instant takes this one step further.

Looking at what the blog community has to say about the new feature, it seems to split users into two groups: you either hate it or love it.

So, what are the consequences? From an end-user perspective, we will most likely stop typing if something interesting appears that draws our attention. The result?
The search results shown at the very top will generate more traffic, results will become more personalized over time, and we will most probably get better at phrasing our queries.

From an advertising perspective, this will most likely affect the way people work with search engine optimization. Some experts, like Steve Rubel, claim Google Instant will make SEO irrelevant, whereas others, like Matt Cutts, think it will change people's behavior in a positive way over time, and explain why.

What Google is doing is something that they constantly do: change the way we consume information. So what is the next step?

CNN summarizes what Eric Schmidt, the CEO of Google, says:

“The next step of search is doing this automatically. When I walk down the street, I want my smartphone to be doing searches constantly: ‘Did you know … ?’ ‘Did you know … ?’ ‘Did you know … ?’ ‘Did you know … ?’ ”

Schmidt said at the IFA consumer electronics event in Berlin, Germany, this week.

“This notion of autonomous search — to tell me things I didn’t know but am probably interested in — is the next great stage, in my view, of search.”

Do you agree? Can we predict what the users want from search? Is this the sort of functionality that we want to use on the web and behind the firewall?