4 quick ways to improve your search!

Findwise recently published its annual report on Enterprise Search and Findability. It shows that a lot of people complain that their search engine performs poorly: in 2015, 36% of users were dissatisfied. Is there a simple recipe for that? I bet there are some things that can be applied almost immediately!


Data

It is quite common that the reason for bad results is content quality – there is simply no good content in a search engine!

Solution = editors and cleaning: Educate editors to produce better content! Decide on the main message, set informative titles and be consistent with internal wording and phrasing. Also, look over your index, archive outdated content and remove duplicates. Don’t be afraid to remove entire data sources from your index.


Metadata

If the data is already indexed, it is much easier to search using additional filters. When we look for a new washing machine in any good online store, we can easily filter by features such as energy class, manufacturer, price range, etc. The same can happen with corporate data, provided that our documents are tagged in a consistent manner.

Solution = tagging: Check the consistency of tags and metadata for the documents you search through. If the quality of the tagging leaves much to be desired, correct it (note: this can be done automatically to a large extent!). Then consider which filters are the most useful for your company's search and implement them in your search interface.
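To make the idea concrete, here is a minimal sketch of facet counting over tagged documents. The field names and documents are invented for illustration; in a real deployment the search platform (e.g. Solr or Elasticsearch) computes facets for you.

```python
from collections import Counter

# Hypothetical tagged documents; in practice these come from your search index.
documents = [
    {"title": "Annual report 2015", "department": "Finance", "doc_type": "report"},
    {"title": "Travel expense form", "department": "Finance", "doc_type": "form"},
    {"title": "Onboarding checklist", "department": "HR", "doc_type": "form"},
]

def facet_counts(docs, field):
    """Count how many documents carry each value of a metadata field."""
    return Counter(doc[field] for doc in docs if field in doc)

print(facet_counts(documents, "department"))  # Counter({'Finance': 2, 'HR': 1})
print(facet_counts(documents, "doc_type"))    # Counter({'form': 2, 'report': 1})
```

Consistent tagging is what makes these counts meaningful: if "Finance" and "finance dept" both appear as values, the facet splits and misleads users.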

 

Accuracy

Users’ expectations are very important. When they search, they usually want and need something specific, e.g. the current lunch menu, a financial settlement form, a specific procedure for calculating credit risk, the sales report for the previous quarter, etc. This unique need of each user is expressed through a simple query. And here we encounter a significant problem: these queries are not always well interpreted by the search engine. If you don’t see the desired document or answer in the first five slots of the results list, even after 2-3 attempts with various queries, you quickly conclude that the search engine doesn’t work (well).

Solution = user feedback: It is fundamental to regularly collect user feedback on the search engine. If you receive signals that something does not work, you absolutely need to examine which specific search scenarios are failing. These things can usually be fixed pretty easily with synonyms, promotions or by changing the order in which results are displayed.
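As an illustration of the synonym fix, here is a minimal query-expansion sketch. The synonym groups are hypothetical and would in practice be curated from the feedback you collect; search platforms typically apply such lists inside the engine itself.

```python
# Hypothetical synonym groups, maintained from real user feedback.
SYNONYMS = {
    "lunch": {"lunch", "canteen", "menu"},
    "expense": {"expense", "reimbursement", "settlement"},
}

def expand_query(query):
    """Expand each query term with its synonyms so more phrasings match."""
    expanded = []
    for term in query.lower().split():
        group = SYNONYMS.get(term, {term})
        expanded.append("(" + " OR ".join(sorted(group)) + ")")
    return " AND ".join(expanded)

print(expand_query("lunch menu"))
# (canteen OR lunch OR menu) AND (menu)
```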


Monitoring

It is not easy to gather everyone's opinion in a large organization, as there might be thousands of users. A search engine, like everything else, sometimes breaks down, takes too long to answer queries, gives silly results, or even no results at all. Moreover, it is not obvious whether the search engine contributes anything to the organization, or who actually uses it.

Solution = logging: Log analysis gives a lot of information about how the search engine is really used. Logs tell us how many people are looking for something, what they ask for, how fast the search engine responds and when it returns zero results. This is priceless information for understanding what works, who really benefits, which contents and questions are most popular, and what needs to be improved. It is crucial to do this on a regular basis.
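As a sketch of what such analysis can look like, the snippet below assumes a simple CSV log format (timestamp, query, hit count, response time in milliseconds). The format and the one-second threshold are illustrative, not a standard.

```python
import csv
from collections import Counter

def analyse_search_log(path):
    """Summarize a search log: top queries, zero-result queries, slow responses."""
    queries, zero_hits, slow = Counter(), Counter(), 0
    with open(path, newline="") as f:
        for ts, query, hits, ms in csv.reader(f):
            queries[query] += 1
            if int(hits) == 0:
                zero_hits[query] += 1
            if int(ms) > 1000:
                slow += 1
    print("Top queries:", queries.most_common(5))
    print("Zero-result queries:", zero_hits.most_common(5))
    print("Responses slower than 1s:", slow)

# analyse_search_log("search.log")  # path is hypothetical
```

Zero-result queries in particular are a direct to-do list: each one is a user need the engine failed to meet.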


Summary

And now, once you have fixed all four of these points, please tell me that your search engine still malfunctions. I’ve yet to hear of such a case 🙂

Read more about Enterprise Search at Findwise

Big Data is a Big Challenge

Big Data is also a Big Challenge for a number of companies that would like to be ahead of the competition. I think Findwise can help a lot here, both with technical expertise in text analytics and search technology, and with how to put Big Data to use in a business.

During the last days of February I had the pleasure of attending the IDG Big Data conference in Warsaw, Poland. It brought together plenty of people from both vendors and industry who shared interesting insights on the topic. In general, big vendors trying to be associated with Big Data dominated the conference: IBM, SAS, SAP and Teradata provided massive amounts of marketing information on software products and capabilities around Big Data. Interestingly, every single presentation had its own definition of what Big Data is. This is probably because everybody tries to find the definition that best fits their own products.

From my perspective it was very nice to hear that everyone agrees text analytics and search components are of great importance in any Big Data solution. In many analytical applications (both predictive and deductive), and for mass social media analysis, one must use advanced linguistic techniques to retrieve and structure the data streams. This came through especially strongly in the IBM and SAS presentations.

A couple of companies revealed what they have already achieved in the so-called Big Data field. Orange and T-Mobile presented their approach of extending traditional business intelligence to harness Big Data. They want to go beyond the standard data collected in transactional databases and open up to all the information they have from calls (answered and unanswered), SMS, data transmission logs, etc. Telecom companies consider this kind of information a good source of data about their clients.

But the most interesting sessions were held by companies that openly shared their experience of evolving their Big Data solutions, based mainly on open source software. Adam Kawa from Spotify showed how they built their platform on a Hadoop cluster, starting from a single server and growing to a few hundred nowadays. To me that seems like a good way to grow and adapt easily to changing business needs and altering external conditions.

Nasza Klasa – a Polish Facebook competitor – gave a very good presentation on several dimensions of the challenges in Big Data solutions, which serve well as a summary of this post:

  1. Lack of legal regulations – Currently there are no clear regulations on how the data may be used and how to make money out of it. This is especially important for social portals, where all our personal information might be used for different kinds of analysis and sold in aggregated or non-aggregated form. But the laws might change soon, thus changing the business too.
  2. Big Data is a bit like research – It is hard to predict the return on investment in Big Data, as it is a novelty but also a very powerful tool. For many who are looking into this, the challenge is internal: convincing executives to invest in something that is still rather vague.
  3. Lack of data scientists – Even if there are tools for operating on Big Data, there is a huge lack of skilled people – Big Data operators. These are neither IT people nor developers, but rather open-minded people with a good mathematical background, able to understand and find patterns in a constantly growing stream of various structured and unstructured information.

As I stated at the beginning of this post, Big Data is also a Big Challenge for a number of companies that would like to be ahead of the competition. I truly believe we at Findwise can help a lot in this area; we have both the technical expertise and the experience to put Big Data to use in a business.

Architecture of Search Systems and Measuring the Search Effectiveness

Lecture given on the 19th of April 2012 at the Warsaw University of Technology. This is the 9th lecture in the regular master's-level course “Introduction to text mining”.


Semantic Search Engine – What is the Meaning?

The shortest dictionary definition of semantics is: the study of meaning. A more complex explanation of the term would lead to a relationship that maps words, terms and written expressions onto a common-sense understanding of objects and phenomena in the real world. It is worth mentioning that objects, phenomena and the relationships between them are language independent. This means that the same semantic network of concepts can map to multiple languages, which is useful in automatic translation or cross-lingual search.

The approach

In the proposed approach, semantics will be modeled as a defined ontology, making it possible for the web to “understand” and satisfy the requests and intents of people and machines using web content. The ontology is a model that encapsulates knowledge from a specific domain and consists of a hierarchical structure of classes (a taxonomy) representing concepts of things, phenomena, activities, etc. Each concept has a set of attributes that map that particular concept to the words and phrases representing it in written language (as shown at the top of the figure below). Moreover, the proposed ontology model will have horizontal relationships between concepts, e.g. linguistic relationships (synonymy, homonymy, etc.) or domain-specific relationships (medical, legal, military, biological, chemical, etc.). Such a defined ontology model will be called a Semantic Map and will be used in the proposed search engine. An example fragment of an enriched ontology of beverages is shown in the figure below. The ontology is enriched so that the concepts can be easily identified in text using attributes such as the representation of the concept in written text.
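To make the model concrete, here is a minimal code sketch of such an enriched ontology. The beverage concepts, surface forms and the relation name are invented for illustration; a real Semantic Map would live in a dedicated ontology store.

```python
from dataclasses import dataclass, field

@dataclass
class Concept:
    """A node in the semantic map: a language-independent concept."""
    name: str
    surface_forms: set  # words/phrases that represent the concept in written text
    parents: list = field(default_factory=list)  # taxonomy (is-a) links
    related: list = field(default_factory=list)  # horizontal (domain) relationships

beverage = Concept("Beverage", {"beverage", "drink"})
coffee = Concept("Coffee", {"coffee", "espresso", "latte"}, parents=[beverage])
tea = Concept("Tea", {"tea", "chai"}, parents=[beverage])
coffee.related.append(("served_hot_like", tea))  # hypothetical domain relation
```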

Semantic Map

The Semantic Map is an ontology used for bidirectional mapping between the textual representation of concepts and the space of their meanings and associations. In this manner, it becomes possible to transform user queries into concepts, ideas and intents that can be matched against an indexed set of similar concepts (and their relationships) derived from documents, which are returned in the form of a result set. Moreover, users will be able to specify and describe their intents using visualized facets of the concept taxonomy, concept attributes and horizontal (domain) relationships. The search module will also be able to discover users’ intents based on the history of queries and other relevant factors, e.g. ontological axioms and restrictions. A potentially interesting approach would be to retrieve additional information about the specific user from publicly available information in social portals like Facebook, blog sites, etc., as well as from the user’s own bookmarks and similar private resources, enabling deeper intent discovery.
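A toy illustration of the textual-to-conceptual direction of this mapping, with invented concept names matching the beverage sketch above:

```python
# Inverted map from surface forms to concept identifiers (toy data).
SURFACE_TO_CONCEPT = {
    "espresso": "Coffee", "latte": "Coffee", "coffee": "Coffee",
    "chai": "Tea", "tea": "Tea", "drink": "Beverage",
}

def query_to_concepts(query):
    """Translate a free-text query into the set of concepts it mentions."""
    return {SURFACE_TO_CONCEPT[w] for w in query.lower().split()
            if w in SURFACE_TO_CONCEPT}

print(query_to_concepts("cheap latte near office"))  # {'Coffee'}
```

The reverse direction works the same way at indexing time, so a query about “latte” can match a document that only ever says “espresso”.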

[Figure: Semantic Search Map]

Semantic Search Engine

The search engine will be composed of the following components (a minimal end-to-end sketch follows the list):

  • Connector – This module is responsible for acquiring data from external repositories and passing it to the search engine. The connector also extracts text and relevant metadata from files and external systems and passes it to the downstream processing components.
  • Parser – This module is responsible for text processing, including activities like tokenization (breaking text into lexemes – words or phrases), lemmatization (normalization of grammatical forms), exclusion of stop-words, and paragraph and sentence boundary detection. The result of the parsing stage is structured text with additional annotations, which is passed to the semantic Tagger.
  • Tagger – This module is responsible for adding semantic information to each lexeme extracted from the processed text. Technically, it attaches identifiers of the relevant concepts stored in the Semantic Map to each lexeme. Moreover, phrases consisting of several words are identified, and disambiguation is performed based on the derived context. Consider the example illustrated in the figure.
  • Indexer – This module is responsible for taking all the processed information, transforming it and storing it in the search index. This module will be enriched with methods of semantic indexing using the ontology (Semantic Map) and language tools.
  • Search index – The central storage of processed documents (the document repository), structured to manage the full text of the documents, their metadata and all relevant semantic information (the document index). The structure is optimized for search performance and accuracy.
  • Search – This module is responsible for running queries against the search index and retrieving relevant results. The search algorithms will be enriched to use user intents (in compliance with data privacy) and the prepared Semantic Map to match semantic information stored in the search index.
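Below is the promised sketch, wiring a toy Parser, Tagger and Indexer together. The stop-word list and the semantic map are invented, and the Connector is omitted since the document arrives as a plain string.

```python
import re

SEMANTIC_MAP = {"espresso": "Coffee", "chai": "Tea"}  # toy semantic map

def parse(text):
    """Parser: tokenize and drop stop-words (very simplified)."""
    stop = {"the", "a", "of"}
    return [t for t in re.findall(r"\w+", text.lower()) if t not in stop]

def tag(tokens):
    """Tagger: attach a concept identifier to each lexeme when known."""
    return [(t, SEMANTIC_MAP.get(t)) for t in tokens]

def index(doc_id, tagged, inverted):
    """Indexer: store both the word and its concept in the inverted index."""
    for word, concept in tagged:
        inverted.setdefault(word, set()).add(doc_id)
        if concept:
            inverted.setdefault(concept, set()).add(doc_id)

inverted_index = {}
index("doc1", tag(parse("The espresso of the day")), inverted_index)
print(inverted_index)
# {'espresso': {'doc1'}, 'Coffee': {'doc1'}, 'day': {'doc1'}}
```

Because the concept "Coffee" lands in the index alongside the literal word, a semantic query for any coffee-related term can retrieve the document.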

What do you think? Please let us know by writing a comment.

Enterprise Search Stuffed up with GIS

While browsing through the marketing brochures of GIS (Geographic Information System) vendors, I noticed that their message is quite similar to that of search analytics. In general, it refers to integrating various separate sources into analyses based on geo-visualizations. I have recently seen a quite nice and powerful combination of enterprise search and GIS technologies, so I would like to describe it a little. Let us start with the basics.

Search result visualization

It is quite obvious to use a map instead of a simple list of results to visualize what was returned for a query. This technique is frequently used in online search applications, especially in directory services like yellow pages or real estate web sites. The list of things required to do this is pretty short:

– geolocalization of items – assigning accurate geo-coordinates to location names, addresses, zip codes or whatever else is expected to be shown on the map; geolocalization services are provided more or less for free by Google or Bing maps

– background map – this is a necessity, and is also provided by Google or Bing; there are also plenty of vendors of more specialized mapping applications

– returned results with geo-coordinates as metadata – so they can be placed on the map

Normally this kind of basic GIS visualization delivers basic map operations like zooming, panning and different views, plus some extra data like traffic, parks, shops, etc. Results are usually shown as pins [Bing] or drops [Google].
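As a small sketch of the glue between search results and a map widget, the snippet below wraps geo-tagged hits as GeoJSON point features, a format most mapping APIs can render as pins. The titles and coordinates are invented.

```python
def results_to_geojson(results):
    """Wrap search results carrying lat/lon metadata as GeoJSON point features."""
    return {
        "type": "FeatureCollection",
        "features": [
            {
                "type": "Feature",
                "geometry": {"type": "Point",
                             "coordinates": [r["lon"], r["lat"]]},  # GeoJSON order is lon, lat
                "properties": {"title": r["title"]},
            }
            for r in results
        ],
    }

hits = [{"title": "Office Warsaw", "lat": 52.23, "lon": 21.01}]
print(results_to_geojson(hits))
```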

Querying / filtering with the map

A step further in the integration of search and GIS is to use the map as a tool for defining the search query. One way is to create an area of interest drawn on the map as a circle, rectangle or polygon. In the simplest case, the current map window itself can serve as the query area. In such an approach, the full-text query is refined to include only results belonging to the defined area.
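A minimal sketch of using the current map window as a query filter, assuming each result carries lat/lon metadata; the bounding box and coordinates are invented, and production search engines offer such geo-filters natively.

```python
def in_bounding_box(result, south, west, north, east):
    """Keep a result only if its coordinates fall inside the map window."""
    return south <= result["lat"] <= north and west <= result["lon"] <= east

def refine_by_map_window(results, bbox):
    """Apply the map window (south, west, north, east) as a post-filter."""
    return [r for r in results if in_bounding_box(r, *bbox)]

hits = [{"title": "Office Warsaw", "lat": 52.23, "lon": 21.01},
        {"title": "Office Gdansk", "lat": 54.35, "lon": 18.65}]
print(refine_by_map_window(hits, (52.0, 20.5, 52.5, 21.5)))  # Warsaw only
```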

Apart from the map, all other query refinement tools should remain available as well, like date-time sliders or any kind of navigation and fielded queries.

Simple geo-spatial analysis

Sometimes it is important to sort query results by distance from a reference point, in order to see, say, all the nearest Chinese restaurants in the neighborhood. I would also classify as simple geo-spatial analysis the grouping of search results into GIS layers, e.g. density heatmaps or hot spots, using geographical and other information stored in the result metadata.
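A sketch of the distance-based sort, using the haversine great-circle formula; the restaurants and the reference point are invented.

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 \
        + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))  # 6371 km = mean Earth radius

def sort_by_distance(results, ref_lat, ref_lon):
    """Order search results by distance from the reference point."""
    return sorted(results,
                  key=lambda r: haversine_km(ref_lat, ref_lon, r["lat"], r["lon"]))

restaurants = [{"title": "Golden Dragon", "lat": 52.25, "lon": 21.00},
               {"title": "Jade Palace", "lat": 52.40, "lon": 20.90}]
print(sort_by_distance(restaurants, 52.23, 21.01)[0]["title"])  # Golden Dragon
```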

Advanced geo-spatial analysis

More advanced query definition and refinement would involve geo-spatial computations. Depending on real needs, it could be possible, for example, to refine search results by the line-of-sight area from a picked reference point, or to select filtering areas such as those inside the specific borders of cities, districts, countries, etc.

So the idea is to use the relevant output of advanced GIS analysis as input for query refinement. In this way, all the power of GIS can be used to get to unstructured data through a search process.
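As a sketch of one such computation, the ray-casting test below checks whether a result's coordinates fall inside an arbitrary polygon, e.g. a district border. The polygon here is an invented triangle; a production system would delegate this to a GIS engine with proper geodetic handling.

```python
def point_in_polygon(lat, lon, polygon):
    """Ray-casting test: is (lat, lon) inside the polygon [(lat, lon), ...]?"""
    inside = False
    n = len(polygon)
    for i in range(n):
        lat1, lon1 = polygon[i]
        lat2, lon2 = polygon[(i + 1) % n]
        # Count edge crossings of a ray cast from the point.
        if (lon1 > lon) != (lon2 > lon):
            if lat < (lat2 - lat1) * (lon - lon1) / (lon2 - lon1) + lat1:
                inside = not inside
    return inside

# Rough triangle around central Warsaw (coordinates invented for illustration).
district = [(52.20, 20.95), (52.20, 21.05), (52.28, 21.00)]
print(point_in_polygon(52.23, 21.00, district))  # True
```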

What kinds of applications do you think could take advantage of search stuffed with really advanced GIS? I'm looking forward to your comments on this post.