Google Search Appliance (GSA) 6.12 released

Google has released yet another version of the Google Search Appliance (GSA). It is good to see that Google stays active when it comes to improving its enterprise search product! Below is a list of the new features:

Dynamic navigation for secure search

The facet feature, introduced in 6.8, is still being improved. When the filters are created, the appliance can now take secure documents into account, so that the filters only include documents the user is authorized to see.

Nested metadata queries

In previous Search Appliance releases there were restrictions for nesting meta tags in search queries. In this release many of those restrictions are lifted.

LDAP authentication with Universal Login

You can configure a Universal Login credential group for LDAP authentication.

Index removal and backoff intervals

When the Search Appliance encounters a temporary error while trying to fetch a document during crawl, it retains the document in the crawl queue and index. It schedules a series of retries after certain time intervals, known as “backoff” intervals, before removing the URL from the index.

One case where this is useful is the processing pipeline that we have implemented for the GSA. There, the GSA relies on an external component to index the content. If that component goes down, the GSA receives a “404 – page does not exist” when trying to crawl, which may cause mass removal from the index. With this functionality turned on, that can be avoided.
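The retry behaviour can be sketched roughly as follows. This is a minimal illustration and not the appliance's actual implementation: the function name next_action, the interval values and the idea of counting consecutive failures are all assumptions made for the example.

```python
from datetime import timedelta

# Hypothetical backoff schedule; the real intervals are configured on the appliance.
BACKOFF_INTERVALS = [
    timedelta(minutes=15),
    timedelta(hours=1),
    timedelta(hours=6),
    timedelta(days=1),
]


def next_action(consecutive_failures: int):
    """Decide what to do with a URL after a failed fetch.

    Returns ("retry", delay) while backoff intervals remain and
    ("remove", None) once every scheduled retry has been used up.
    """
    if consecutive_failures < 1:
        raise ValueError("consecutive_failures must be at least 1")
    if consecutive_failures <= len(BACKOFF_INTERVALS):
        return "retry", BACKOFF_INTERVALS[consecutive_failures - 1]
    return "remove", None
```

The point of the design is that a short outage of the external component only consumes the first retries, so the documents stay in the index until the component is back.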

Specify URLs to crawl immediately in feeds

Release 6.12 provides the ability to specify URLs to crawl immediately in a feed by using the crawl-immediately attribute. This is a handy way to prioritise content that needs to be indexed quickly.
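As a rough sketch, a feed that uses the attribute could be generated like this. The element and attribute names follow the feeds protocol as we understand it, so check them against the Feeds Protocol documentation before relying on them. The helper build_feed, the datasource name and the URLs passed to it are hypothetical, and the DOCTYPE declaration and the step of posting the feed to the appliance are left out.

```python
import xml.etree.ElementTree as ET


def build_feed(datasource: str, urls: list[str]) -> str:
    """Build a metadata-and-url feed that asks for each URL to be crawled immediately."""
    root = ET.Element("gsafeed")
    header = ET.SubElement(root, "header")
    ET.SubElement(header, "datasource").text = datasource
    ET.SubElement(header, "feedtype").text = "metadata-and-url"
    group = ET.SubElement(root, "group")
    for url in urls:
        # One record per URL; crawl-immediately is the attribute new in 6.12.
        ET.SubElement(group, "record", {
            "url": url,
            "mimetype": "text/html",
            "crawl-immediately": "true",
        })
    return ET.tostring(root, encoding="unicode")
```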

X-robots-tag support

The Appliance now supports the x-robots-tag, which makes it possible to exclude non-html documents from the index.
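The header is sent by the content server, not by the appliance. A minimal sketch of how a server might decide when to send it is shown below. The helper robots_headers and the list of file suffixes are hypothetical, and which directives the appliance honours should be verified in the documentation.

```python
def robots_headers(path: str, excluded_suffixes=(".pdf", ".doc", ".xls")) -> dict:
    """Return extra HTTP response headers for the requested path.

    Non-html documents cannot carry a robots meta tag, so the
    instruction is sent as an HTTP header instead.
    """
    if path.lower().endswith(tuple(excluded_suffixes)):
        return {"X-Robots-Tag": "noindex"}
    return {}
```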

Google Search Appliance documentation page

Why Web Search is Like a Store Clerk

When someone uses the search function on your web site, your web search, it tells you two things. First, they have a specific need, expressed by their search query. Second, and more importantly, they want you to fulfill that need. If users didn’t care where the service was delivered from, they would have gone straight to Google. Hence, the use of your search function signals trust in your capabilities. This means that even if the majority of your website visitors don’t use the search function, you know that the ones who do have a commitment to you. Imagine you are working in a store as a clerk; the customer coming up to you and asking you something is probably more interested in doing business with you than the ones just browsing the goods.

This trust, however, can easily turn into frustration and ill will if the web search results are poor and users don’t find what they are looking for. Continuing our analogy with the store, this is much like the experience of looking for a product, wandering around for a few minutes, finally deciding to ask a clerk and getting the answer “If it’s not on the shelf we don’t have it”. I certainly would leave the store, and the same applies to a web site. If users fail when browsing and searching, then they will probably leave your site. The consequence is that you might antagonize loyal customers or lose an easy sale. So how do you recognize a bad search function? A good way to start is to look at common search queries and try searching for them yourself. Then start asking a few basic questions such as:

  • Does the sorting of the search results make sense?
  • Is it possible to decide which result is interesting based on the information in the result presentation?
  • Is there any possibility to continue navigating the results if the top hits are not what you are looking for?

Answering these questions yourself will tell you a lot about how your web search is performing. The first step to a good user experience is to know where your challenges are. Then you can start making changes to address the issues you have found and make your customers happier. After all, who wants to be the snarky store clerk?
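If your search logs are available as a plain list of query strings, finding the common queries to test takes only a few lines. The sketch below assumes exactly that; the function name top_queries and the normalization (lower-casing and trimming whitespace) are our own choices for the example.

```python
from collections import Counter


def top_queries(queries, n=10):
    """Return the n most common queries, ignoring case and surrounding whitespace."""
    normalized = (q.strip().lower() for q in queries if q.strip())
    return Counter(normalized).most_common(n)
```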

Knowledge Management: Retrieve, Visualize and Communicate!

As noted by Swedish daily paper Metro, Findwise is working with JCDEC, the Joint Concept Development & Experimentation Centre at Swedish Military Headquarters. In Metro’s words the project aims at developing a knowledge management system for the headquarters of tomorrow. The system is expected to be up and running in time for the international military exercise VIKING 11, to be executed in April of 2011.

Good decisions stem from good information; this is true for both military and civilian enterprises. Vast amounts of time and resources are being invested in order to collect information. But to what end? Granted, somewhere among that information there is probably something you will find useful. But large amounts of information quickly become incomprehensible. In order to combat information overload you need a select-and-filter tool such as Search, and that’s where Findwise comes in.

However, for JCDEC it is not enough to simply locate the information they have available. Captain Alexandra Larsson, Concept Development Lead for Knowledge Support, makes this fact very clear. It is just as important to get an idea of what information is not there. In essence, JCDEC is in the process of creating information from information. This is also one of the great differences between the kind of web-based search and retrieval systems we have come to depend on and a state-of-the-art knowledge management system. The latter is not just a retrieval tool; it is an information workbench where the user can select, retrieve, examine and manipulate information.

The key to finding information gaps is to study patterns. For example, consider the trivial problem of birthday distributions. Without any prior knowledge one would probably expect there to be roughly as many births in May as in August or November. This is not always the case. Depending on where you are in the world birth figures may actually be skewed so that one month has significantly more births than other months do. Why does this happen? Being able to pose that exact question may in turn teach us a lot about the mysterious workings of the Stork.

In military intelligence the filling of information gaps may mean the difference between victory and defeat. Why is there an increase in partisan activity in that district? Why were eight weapons silos raided over the course of two days? Why at this moment in time? These questions are expected to lead to insights into the plans and activities of suspects and to notify those in command of looming threats.

Retrieve, Visualize, Communicate

The envisioned workflow for JCDEC information operators is threefold: retrieval, visualization and communication.

Retrieval

Each research session will typically be initiated through a keyword search interface, much like you would issue a web query on Google. Just like its online counterpart, the system would present the results ordered according to their expected relevance to the operator’s query. Using facets and query refinement, the result set can be narrowed down until the information in front of the operator is expected to contain what is being sought.
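To illustrate the idea of facets and refinement, here is a toy, in-memory sketch. It is not how JCDEC's system or any search engine implements facets (real engines compute the counts from the index), and the function names and metadata fields are hypothetical.

```python
from collections import Counter


def facet_counts(docs, field):
    """Count how many documents carry each value of the given metadata field."""
    return Counter(doc[field] for doc in docs if field in doc)


def refine(docs, field, value):
    """Keep only the documents whose field equals the selected facet value."""
    return [doc for doc in docs if doc.get(field) == value]
```

Calling facet_counts on the output of refine gives the counts for the narrowed result set, which is what lets the operator keep drilling down step by step.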

Captain Alexandra Larsson hints at another strategy for getting to information. Facets are so speedy these days that they can be applied on the full document set without any delays. Clearly, JCDEC is using search technology to provide directory listings much like websites such as the Open Directory Project, although completely dynamic. The option of simply browsing these directories is also available to operators.

Visualization

The next step, visualization, employs an array of tools for visually displaying the results. Among other things, these include plotting objects on maps and timelines and looking for groupings where objects have a disproportionately dense distribution, so-called cluster analysis. This is where clues are uncovered and questions posed: why there, at that time, with those people? In some cases a field investigation is necessary in order to answer these questions. Other times the answers can be deduced from the tools themselves. The tools also allow the operator to formulate new search queries based on the visual information. The operator may choose to limit the scope of the search to one or more of the clusters in the timeline or map, for example.
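As a simplified illustration of looking for dense groupings on a timeline, the sketch below groups events that lie close together in time. It is a basic gap-based grouping chosen for brevity, not the cluster analysis used in JCDEC's tools, and the function name and thresholds are assumptions.

```python
from datetime import datetime, timedelta


def timeline_clusters(timestamps, max_gap=timedelta(hours=12), min_size=3):
    """Group events whose neighbours are at most max_gap apart.

    Only groups with at least min_size events are returned as clusters.
    """
    clusters, current = [], []
    for ts in sorted(timestamps):
        # A gap larger than max_gap closes the current group.
        if current and ts - current[-1] > max_gap:
            if len(current) >= min_size:
                clusters.append(current)
            current = []
        current.append(ts)
    if len(current) >= min_size:
        clusters.append(current)
    return clusters
```

The first and last timestamp of each returned cluster could then serve as a date range for a new, narrower search query.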

Communication

If or when the operator finds something interesting this should be recorded. But to JCDEC it is not necessarily the results themselves that are important. The act of getting to the information is valuable in itself. The reason for this is that different operators have different backgrounds and possess different types of information. Where one operator filters or deduces information from a search result in one way, another operator might choose a completely different approach and unveil other clues.

According to Captain Alexandra Larsson it is absolutely necessary that operators share knowledge as well as refinement strategies as part of their work. One of the paradigms that JCDEC is looking to experiment with is social bookmarking along with the ability to search through sets of bookmarked objects. Objects can be both tagged and commented on, which is useful for conveying meta-information to fellow operators. It is likely that there will be custom filters, where an operator can inform the system of types of objects that do and do not interest him or her and have the system automatically filter the result sets based on this information. These filters can of course also be shared with other operators.

An evolving system

The process of retrieval, visualization and communication is only one, albeit the most prominent, feature of the JCDEC knowledge management system. The system itself will be put to use in the spring of 2011 and development will surely continue beyond that point. The ideas and concepts at work today will most likely be refined over time as Captain Alexandra Larsson and JCDEC gain hands-on experience of working with information. And as evolution progresses I hope to be able to go into more detail on some of the other tidbits.