SLTC 2012 in retrospect – two cutting-edge components

The 4th Swedish Language Technology Conference (SLTC) was held in Lund on 24-26 October 2012.
It is a biennial event organized by prominent research centres in Sweden.
The conference is, therefore, an excellent venue to exchange ideas with Swedish researchers in the field of Natural Language Processing (NLP), as well as to present one's own research and stay up to date with the state of the art in most areas of Text Analytics (TA).

This year Findwise participated in two tracks – in a workshop and in the main conference.
As the area of Search Analytics (SA) is very important to us, we decided to be proactive and applied to organize a workshop on the topic of “Exploratory Query Log Analysis” in connection with the main conference. The application was granted and the workshop was very successful. It gathered researchers who work in the area of SA from very different perspectives – from utilizing deep Machine Learning to discover users’ intent, to looking at query logs as a totally new genre. I will do a follow-up on that in another post. All the contributions to the workshop will also be uploaded on our research page.

As for the main conference, we had two papers accepted for presentation. The first one dealt with the topic of document summarization – both single and multidocument summarization
(http://www.slideshare.net/findwise/extractive-document-summarization-an-unsupervised-approach).
The second paper was about detecting Named Entities in Swedish
(http://www.slideshare.net/findwise/identification-of-entities-in-swedish).

These two papers presented de facto state-of-the-art results for Swedish, both for document summarization and for Named Entity Recognition (NER). For the former task, there is no standard corpus for evaluating summarization systems, and there are few previous results and few other systems, which made it infeasible to compare our own system with others. Thus, we have contributed two things to the research in document summarization – a Swedish corpus based on featured Wikipedia articles to be used for evaluation, and a system based on unsupervised Machine Learning which, by relying on domain boosting, achieves state-of-the-art results for English and Swedish. Our system can be further improved by relying on our enhanced NER and Coreference resolution modules.

As for the NER paper, our Entity recognition system for Swedish achieves 74.0% F-score, which is 4% higher than another study presented simultaneously at SLTC (http://www.ling.su.se/english/nlp/tools/stagger). Both systems were evaluated on the same corpus, which is considered a de facto standard for evaluation of different NLP resources for Swedish. The unlabelled score (i.e. no fine-grained division of classes but just entity vs non-entity) of our system achieved 91.3% F-score (93.1% Precision and 89.6% Recall). When identifying people, the Findwise NER system achieves 78.1% Precision and 90.5% Recall (83.9% F-score).
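For readers who want to check the arithmetic: the F-score is commonly computed as the harmonic mean of precision and recall. The small sketch below assumes that balanced definition (the papers themselves are the authoritative source for the exact metric), and the function name is purely illustrative.

```python
def f_score(precision: float, recall: float) -> float:
    """Balanced F-score: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Unlabelled entity detection as reported above: 93.1% precision, 89.6% recall.
unlabelled = f_score(93.1, 89.6)
print(f"Unlabelled F-score: {unlabelled:.1f}%")
```

With the reported precision and recall as input, the result rounds to the 91.3% quoted for the unlabelled score.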

So, what did we take home from the conference? We were really happy to see that the tools we develop for our customers are not mediocre but of very high quality, and represent the state of the art in Swedish NLP. We actively share our results and our corpora for research purposes. Findwise showed keen interest in cooperating with other researchers in developing better tools and systems in the area of NLP and Text Analytics. And this, I think, is a huge bonus to all our current and prospective customers – we actively follow the current trends in the research community and cooperate with researchers, and our products incorporate the latest findings in the field, which lets us offer both high quality and cutting-edge technology.

As we continuously improve our products, we have also released a Polish NER and some work has been initiated on Danish and Norwegian ones. More NLP components will soon be available for demo and testing on our research page.

Enterprise Search and Findability discussions at World Cafe in Oslo

Yesterday we (Kristian Hjelseth and Kristian Norling) participated in a great World Cafe event arranged by Steria in Norway. We did a Pecha Kucha inspired presentation (scroll down to the bottom of this blog post for the presentation) to introduce the subject of Enterprise Search and Findability and how to work more efficiently with the help of enterprise search. Afterwards there was a set of three round-table workshops with practitioners, where search-related issues were discussed. We found the discussions very interesting, so we thought we should share some of the topics with a broader audience.

The attendees had answered a survey before coming to the World Cafe, in which 83.3% stated that finding the right information was critical for their business goals. But only 20.3% were satisfied with their current search solution, and 75% said it was hard or very hard to find the right information. More statistics are available from a global survey on enterprise search that asked the same questions.

Unified Search

To have all the information that you would like to find in the same search was deemed very important for findability by the participants. The experience of search is that the users don’t know what to search for, but to make it even worse, they do not know where to look for the information! This is also confirmed by the Enterprise Search and Findability Survey that was done earlier this year. The report is available for download.

Trust

Google web search always comes up as an example of what “just works”. And it does work because they found a clever algorithm, PageRank, that basically measures the trustworthiness of information. Since PageRank is heavily dependent on inbound links, this way of measuring trust is probably not going to work on an intranet, where cross-referencing is not as common in our experience. Most of the time it is not even possible to link to content on the intranet, since the information is not accessible through HTTP. Read more about it in this great in-depth article series on the difference between web search and enterprise search by Mark Bennet.
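To make the dependence on inbound links concrete, here is a minimal sketch of the simplified, textbook form of PageRank computed by power iteration. It is not Google's production ranking; the toy link graph, the page names and the damping value of 0.85 are assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration.

    links maps each page to the list of pages it links to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page gets a small baseline score regardless of links.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on in equal shares to the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outbound links spreads its score evenly.
                for other in pages:
                    new_rank[other] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical toy graph: several pages link to "policy", nothing links to "orphan".
toy_links = {
    "home": ["policy", "news"],
    "news": ["policy"],
    "wiki": ["policy", "home"],
    "policy": ["home"],
    "orphan": ["policy"],
}
ranks = pagerank(toy_links)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{page:8s}{score:.3f}")
```

In this toy graph the pages that nobody links to end up with only the baseline score, however good their content is, which is roughly the situation for much intranet content.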

So how can we make search inside the firewall as good as web search? I think by connecting the information to the author. Trust builds between people based on their views of others. Simply put, someone has authority over her peers either through rank (=organisation chart) or through trust. The trustworthiness can be based on the person’s ability to connect to other people (we all probably know someone who knows “everyone”) or we trust someone based on the person’s knowledge. More reading on the importance of trust in organisations. How to do this in practice? Some ideas in this post by Bill Ives. Also a good read: “How social is Enterprise Search?” by Jed Cawthorne. And finally another good post to read.

Metadata

By adding relevant metadata to information, we can make it more findable. There were discussions on the importance of strict and controlled metadata and how to handle user tagging. For an idea on how to think about metadata, read a blog post on how VGR used metadata by Kristian Norling.

Search Analytics

Before you start to do any major work with your current enterprise search solution, look at the search log files and analyze the data. You might be surprised by what you find. Search analytics is great if you want insight into what users expect to find when they search. Watch this video for an introduction to Search Analytics in Practice.

Other subjects

  • Access control and transparency
  • Who owns search?
  • Who owns the information?
  • Personalization of search results
All these subjects and many more were discussed at the workshops, but that will have to wait for another blog post!
As always, your thoughts and comments are most welcome!

Video: Search Analytics in Practice

Search Analytics in Practice from Findwise on Vimeo.

This presentation is about how to use search analytics to improve the search experience. A small investment in time and effort can really improve the search on your intranet or website. You will get practical advice on what metrics to look at and what actions can be taken as a result of the analysis.

Video in Swedish: “Sökanalys i praktiken”.

The presentation was recorded in Gothenburg on the 4th of May 2012.

The presentation featured in the video:

Search Analytics in Practice


Book Review: Search Analytics for Your Site

Lou Rosenfeld is the founder and publisher of Rosenfeld Media and also the co-author (with Peter Morville) of the best-selling book Information Architecture for the World Wide Web, which is considered one of the best books about information management.

In his latest book, Lou Rosenfeld explains how to work successfully with Site Search Analytics (SSA). With SSA you analyse the saved search logs of what your users are searching for to try to find emerging patterns. This information can be a great help in figuring out what users want and need from your site. The search terms used on your site will offer more clues to why the user is on your site compared to search queries from Google (which reveal how they get to your site).

So what’s in the book?

Part I – Introducing Site Search Analytics

In part one the reader gets a great example of why to use SSA and an introduction to what SSA is. In the first chapters you follow John Ferrara, who worked at a company called Vanguard, and see how he analysed search logs to prove that a newly bought search engine performed poorly, and then used the same statistics to improve it. This is a great real-world example of how to use SSA for measuring quality of search AND to set up goals for improvement.

a word cloud is one way to play with the data

Part II – Analysing the data

In this part Lou gets hands-on with user logs and shows you how to analyse the data. He makes it fun and emphasizes the need to play with user data. Without the emphasis on playing, the task of analysing user data may seem daunting. Also, with real-world examples from different companies and institutions it is easy to understand the different methods for analysis. Personally, I feel the use of real data in the book makes the subject easier (and more interesting) to understand.

From which pages do users search?

Part III – Improving your site

In the third part of the book, Rosenfeld shows how to apply the findings from your analysis. If you’ve worked with SSA before, most of it will be familiar (improving best bets, zero hits, query completion and synonyms), but even for experienced professionals there is good information about how to improve everything from site navigation to site content, and even how to connect your SSA to your site KPIs.

Conclusion

Search Analytics For Your Site shows how easy it is to get started with SSA but also the depth and usefulness of it. This book is easy to read and also quite funny. The book is quite short, which in this day and age isn’t a negative. This book reminded me of the importance of search analytics, and I really hope more companies and sites take the lessons in this book to heart and focus on search analytics.

Findability on an E-commerce Site

Findability on any e-commerce site is a beast all on its own. What if visitors’ searches return no results? Will they continue to search or did you lose your chance at a sale?

While product findability is a key factor of success in e-commerce, it is predominantly enabled by simple search alone. And while simple search usually doesn’t fulfill complex needs among users, website developers and owners still regard advanced search as just another boring to-do item during development. Owners won’t go so far as to leave it out, because every e-commerce website has some kind of advanced search functionality, but they probably do not believe it brings in much revenue.

Research shows:

  • 50% of online buyers go straight to the search function
  • 34% of visitors leave the site if they can’t find an (available) product
  • Buyers are more likely than Browsers to use search (91%)

What can’t be found, can’t be bought:

  • Search is often mission critical in e-commerce
  • Users don’t know how to spell
  • Users often don’t even know how to describe what they want

First of all, Findability can accelerate the sales process. And faster sales can increase conversions, because you will not be losing customers who give up trying to find products. Furthermore, fast, precise and successful searches increase your customers’ trust.

On both e-commerce and shopping comparison sites, users can find products in two different ways: searching and browsing. Searching obviously means using the site search, whilst browsing involves drilling down through the categories provided by the website. The most common location for a site search on e-commerce sites is at the top of the page, and generally on the right side. Many e-commerce sites have a site search, user login, and shopping cart info all located in the same general area. Keeping the site search in this common location makes it easier to find for visitors who are accustomed to the convention.

Faceted search should be the de facto standard for an e-commerce website. The user performs a simple search first and then, on the results page, narrows the search through a drill-down link (for a single choice) or a check box selection (for multiple non-overlapping choices). The structure of the search results page must also be crystal clear. The results must be ranked in a logical order (i.e. for the user, not for you) by relevance. Users should be able to scan and comprehend the results easily. Queries should be easy to refine and resubmit, and the search results page should show the query itself.
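As a rough illustration of the drill-down behaviour described above, the sketch below counts facet values for a result list and narrows it by the user's selections. The product records, facet names and helper functions are hypothetical and not taken from any particular search engine.

```python
from collections import Counter


def facet_counts(results, facet):
    """Count how many results carry each value of the given facet."""
    return Counter(item[facet] for item in results if facet in item)


def narrow(results, selections):
    """Keep only the results that match every selected facet.

    selections maps a facet name to the set of accepted values:
    values within one facet are alternatives, facets are combined.
    """
    return [
        item
        for item in results
        if all(item.get(facet) in values for facet, values in selections.items())
    ]


# Hypothetical results for the simple search "jacket".
results = [
    {"name": "Rain jacket", "brand": "Nordic", "colour": "blue"},
    {"name": "Down jacket", "brand": "Nordic", "colour": "black"},
    {"name": "Denim jacket", "brand": "Urban", "colour": "blue"},
    {"name": "Fleece jacket", "brand": "Trail", "colour": "green"},
]

# Counts shown next to each drill-down link or check box.
print(facet_counts(results, "brand"))

# The user ticks the colour "blue" and the brands "Nordic" and "Urban".
narrowed = narrow(results, {"colour": {"blue"}, "brand": {"Nordic", "Urban"}})
print([item["name"] for item in narrowed])
```

Showing the counts next to each facet value lets users see how many products remain before they click, which helps them avoid dead ends.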

Spell-check is also crucial. Many products have names that are hard to remember or type correctly. Users might think to correct their misspelling when they find poor results, but they will be annoyed at having to do that… or worse, they might think that the website either doesn’t work properly or does not have their product.
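A “did you mean” suggestion can be sketched with nothing more than Python's standard library. This is only a minimal illustration: the catalogue, the function name and the similarity cutoff of 0.75 are assumptions, and a production search engine would normally use its own spell-checking component.

```python
import difflib


def did_you_mean(query, catalogue, cutoff=0.75):
    """Suggest the closest product name for a possibly misspelled query."""
    lookup = {name.lower(): name for name in catalogue}
    matches = difflib.get_close_matches(
        query.lower(), list(lookup), n=1, cutoff=cutoff
    )
    return lookup[matches[0]] if matches else None


# Hypothetical product names.
catalogue = ["Bluetooth headphones", "Espresso machine", "Yoga mat"]

print(did_you_mean("expresso machine", catalogue))
print(did_you_mean("garden hose", catalogue))
```

The first query is close enough to a known product to yield a suggestion, while the second returns None, so the site can fall back to a plain “no results” message instead of a misleading correction.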

Query completion can decrease the problems caused by mistyping or not knowing the proper terminology. Queries usually start with words, so unambiguous character input is crucial.
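One simple way to picture query completion is to suggest the most popular earlier queries that start with what the user has typed so far. The sketch below assumes a hypothetical table of past queries and their frequencies; real implementations usually rely on dedicated index structures rather than a linear scan.

```python
def complete(prefix, query_counts, limit=5):
    """Return the most popular past queries that start with the typed prefix."""
    prefix = prefix.strip().lower()
    if not prefix:
        return []
    candidates = [
        (query, count)
        for query, count in query_counts.items()
        if query.startswith(prefix)
    ]
    # Most frequent first, alphabetical order as a tie-breaker.
    candidates.sort(key=lambda item: (-item[1], item[0]))
    return [query for query, _ in candidates[:limit]]


# Hypothetical query frequencies taken from a search log.
query_counts = {
    "running shoes": 120,
    "running shorts": 45,
    "rucksack": 30,
    "rain jacket": 80,
}

print(complete("ru", query_counts))
```

Because the suggestions come from queries that were actually issued, users are steered towards spellings and terminology that already work on the site.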

Search analytics, contextual advertisement and behavioral targeting are about more than just finding a page or a product. When people search, they tell you something about their interests, time, location and what is in demand right now. They also say something about search quality by the way they navigate and click in result pages and, finally, by what they do after they have found what they were looking for.

A good e-commerce solution uses search technology to:

  • Dynamically tailor a site to suit the visitors’ interests
  • Help the user to find and explore
  • Relate information and promote up- and cross-sales
  • Improve visitor satisfaction
  • Increase stickiness
  • Increase sales of related products or accessories
  • Inspire visitors to explore new products/areas
  • Provide increased understanding of visitor needs/preferences

–> Convert visitors into returning customers!

KMWorld 2010 Reflections: Search is a Journey Not a Destination

Two weeks ago Christopher Wallström and I, Ludvig Johansson, attended KMWorld’s quadruple conference in Washington D.C. The event consisted of four different conferences: KMWorld, Enterprise Search Summit, Taxonomy Bootcamp and SharePoint Symposium. I focused on Enterprise Search Summit and SharePoint Symposium, and Christopher mainly covered Taxonomy Bootcamp as well as the Enterprise Search Summit. (Christopher will soon write a blog post about this as well.)

During the conferences there was some good-quality content; however, most of it was old news, with speakers mainly focusing on outputs of their own products. This was disappointing since I had hoped to see the newest and coolest solutions within my area. Speakers presented systems from their corporations, where the newest and coolest functionality they described was shallow filters on a Google Search Appliance. From my perspective this is not new or cool. I would rather consider this standard functionality in today’s search solutions.

However, some sessions were really good. Daniel W. Rasmus talked about the Evolution of Search in quite a fun and thoughtful way. One thing he wanted to see in the near future was more personalization of search. Search needs to know the user and adapt to him/her and not simply use a standardized algorithm. As Rasmus said: “my search engine is not that into me”. This is, as I would put it, spot on how we see it at Findwise. Today’s customer wants standard search with components that have existed for years now. It’s time for search to take the next step in the evolution and for us to start delivering Findability solutions adapted to your needs as an individual. In line with this, Rasmus ended with another good quote: “Don’t let your search vendors set your expectations too low”. I think this more or less speaks for itself. If we want contextual search then we should push the vendors out there to start delivering!

Another good session was delivered by Ellen Feaheny on how to utilize both old and new systems smarter. The title of this post originates from this session: “It’s a journey not a destination”. I think this sums up what we feel every day in our projects. It’s common that customers want projects to have a clear start and end. However, with search and Findability we see it as a journey. I can even go as far as to say it’s a journey without an end. We have customers coming and complaining about their search, saying “It doesn’t work anymore” or “The content is old”, to give two examples. The problem is that search is not a one-time problem that you solve and then never have to think about again. If you don’t work with your search solution and treat search as a journey, continually improving relevance and content and investing time in search analytics, your solution will soon get dusty and not deliver what your employees or customers want.

Search is a journey not a destination.

Quick Website Diagnostics with Search Analytics

I have recently been giving courses directed to web editors on how to successfully apply search technology on a public web site. One of the things we stress is how to use search analytics as a source of user feedback. Search analytics is like performing a medical checkup. Just as physicians inspect patients in search of symptoms of illness, we want to be able to inspect a website in search of problems hampering user experience. When such symptoms are discovered, a reasonable resolution is prescribed.

Search analytics is a vast field but as usual a few tips and tricks will take you a long way. I will describe three basic analysis steps to get you started. Search usage on public websites can be collected and inspected using an array of analytics toolkits, for example Google Analytics.

How many users are using search?

For starters, have a look at how many of your users are actually using search. Obviously having a large portion of users doing so means that search is becoming very important to your business. A simple conclusion stemming from such evidence is that search simply has to work satisfactorily, otherwise a large portion of your users are getting disappointed.

Having many searchers also raises some questions. Are users using search because they want to or because they are forced to, because of tricky site navigation for example? If you feel that the latter seems reasonable you may find that as you improve site navigation your number of searchers will decrease while overall traffic hopefully increases.

Just as with high numbers, low numbers can be ambiguous. Low scores especially coupled with a good amount of overall site traffic may mean that users don’t need search in order to find what they are looking for. On the other hand it may mean that users haven’t found the search box yet, or that the search tool is simply too complicated for the average user.

Aside from the business, knowing how popular search is can be beneficial to you personally. It’s a great feeling to know that you are responsible for one of the most used subsystems of your site. Rub it in the face of your colleague!

From where are searches being initiated?

One of the first recommendations you will get when implementing a search engine for your web site is to include the search box on each and every page, preferably in a standardized easy-to-find place like the top right corner. The point of having the search box available wherever your users happen to be is to enable them to search, typically after they have failed to find what they are looking for through browsing.

Now that we know that search is being conducted everywhere, we should be keeping an eye out for pages that frequently emit searches. Knowing what those pages are will let us improve the user experience by altering or completing the information there.
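If your analytics tool can export search events together with the page the search was started from, finding the pages that emit the most searches is a simple count. The record layout below (an "origin_page" and a "query" field) is a hypothetical export format, not the schema of any specific toolkit.

```python
from collections import Counter


def top_search_origins(search_events, limit=3):
    """Count searches per originating page and return the most frequent ones."""
    counts = Counter(event["origin_page"] for event in search_events)
    return counts.most_common(limit)


# Hypothetical exported search events.
sample_events = [
    {"origin_page": "/support", "query": "reset password"},
    {"origin_page": "/support", "query": "invoice copy"},
    {"origin_page": "/", "query": "opening hours"},
    {"origin_page": "/support", "query": "contact"},
    {"origin_page": "/products", "query": "price list"},
]

for page, searches in top_search_origins(sample_events):
    print(page, searches)
```

Reading the queries issued from the top pages then tells you what information visitors expected to find there.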

Which are the most common queries?

The most frequently issued queries to a search system make up a significant share of the total number of served queries. These are known as head queries. By improving the quality of search for head queries you can offer a better search experience to a large number of users.
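To see how much of your traffic the head queries actually cover, you can count normalized queries and sum up the top few. The sketch below uses made-up queries and a hypothetical helper name; the normalization (lowercasing and trimming whitespace) is deliberately simple.

```python
from collections import Counter


def head_query_share(queries, head_size=10):
    """Return the top queries and the share of all searches they account for."""
    normalized = [query.strip().lower() for query in queries if query.strip()]
    counts = Counter(normalized)
    head = counts.most_common(head_size)
    total = sum(counts.values())
    share = sum(count for _, count in head) / total if total else 0.0
    return head, share


# Hypothetical raw queries from a search log.
sample_queries = [
    "opening hours",
    "Opening hours",
    "contact",
    "opening hours ",
    "jobs",
    "contact",
    "parking permit",
]

head, share = head_query_share(sample_queries, head_size=2)
print(head)
print(f"{share:.0%} of all searches")
```

If a short head list covers a large share of all searches, tuning just those queries pays off quickly.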

A simple but effective way of working with search tuning is this. For each of the 10, 20 or 50 most frequent queries to the system:

  1. Imagine what the user was looking for when typing that query
  2. Perform that query yourself
  3. Examine the 5-10 top results in the result list:
    • Do you think that the user was content with those results?
    • If yes, pat your back 🙂
    • If not, tweak using synonyms or best bets.

Go through this at least once a month. If the information on your site is static you might not need to change a lot of things every time, but if your content or the behavior of your users is changing, you may need to adjust a few things.

Get to Know Your Users with Search Analytics

At Findwise we are currently looking deeply into search analytics for enterprise search, a way not only to assure quality and relevance for your results, but to actually know and understand the users better.

Web analytics has been around for quite some time, but there are several things that make search special.

There are simple ways to look at ‘top queries’ (most frequently asked), ‘zero-result hits’ (which of course can be a result of bad spelling, but are many times caused by a lack of information) and popular searches over time (for trends etc.). ‘Top queries’ can be fixed by static tools, bad spelling by good spell-checking, and lack of information by synonyms and adding the missing pieces. But I believe we are missing something important here:

When a user conducts a search, he is using it to either:

  • find a specific piece of information or
  • find more and/or related information about a topic

But, by doing so, he might find information that brings new perspectives, such as information he didn’t know existed.

The process of search should always be a dialogue between the user and the search application. Put simply: the ‘what‘-questions always have to lead to the ‘why‘-questions.
Users don’t type a query for fun; they have an intention when doing so. Why does the user ask for a particular piece or area of information? Depending on the intention of the user (specific piece, related or general information), different tools can be used to enhance information retrieval.

Done right, search analytics can be used for tuning your search engine (weighting of documents, improvements of spellchecking, synonyms etc) and clearly improve information retrieval, but just as important, work as a tool for information quality assurance and management.
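As a small example of the ‘what’ side of the analysis, the sketch below lists the most frequent queries that returned no results. The log format (a "query" and a "hits" field per entry) and the sample data are assumptions; finding out why users asked still requires a human reading the list.

```python
from collections import Counter


def zero_result_queries(log_entries, limit=10):
    """Return the most frequent queries that returned no hits."""
    counts = Counter(
        entry["query"].strip().lower()
        for entry in log_entries
        if entry["hits"] == 0
    )
    return counts.most_common(limit)


# Hypothetical search log entries.
sample_log = [
    {"query": "Vacation policy", "hits": 12},
    {"query": "holiday rules", "hits": 0},
    {"query": "Holiday rules", "hits": 0},
    {"query": "vaccation policy", "hits": 0},
    {"query": "expense report", "hits": 4},
]

for query, count in zero_result_queries(sample_log):
    print(query, count)
```

In this made-up log, one zero-result query looks like a candidate for a synonym and the other like a spelling problem, which are two quite different fixes.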

Within the next couple of weeks this blog will cover further aspects and thoughts on this subject. If you haven’t read Maria’s ‘What differentiates a good search engine from a bad one?’ already, I recommend that you do so.