Under the hood of the search engine

While using a search application we rarely think about what happens inside it. We just type a query, sometime refine details with facets or additional filters and pick one of the returned results. Ideally, the most desired result is on the top of the list. The secret of returning appropriate results and figuring out which fits a query better than others is hidden in the scoring, ranking and similarity functions enclosed in relevancy models. These concepts are crucial for the search application user’s satisfaction.

In this post we will review basic components of the popular TF/IDF model with simple examples. Additionally, we will learn how to ask Elasticsearch for explanation of scoring for a specific document and query.

Document ranking is one of the fundamental problems in information retrieval, a discipline acting as a mathematical foundation of search. The ranking, which is literally assigning a rank to a document matching search query corresponds with a term of relevance. Document relevance is a function which determines how well given document meets the search query. A concept of similarity corresponds, in turn, to the relevance idea, since relevance is a metric of similarity between a candidate result document and a search query. Continue reading