Elastic Stack 5.0 is released

At a first glance, the major Elasticsearch version bump might seem frightening. Going from version 2.4.x to 5.0 is a big jump, but there’s no need to worry. The main reason is to align versions between the different products in the stack. Having all products on the same version will make it a lot easier to handle future upgrades and simplify the overall experience for both new and existing users.

All products in the stack have been updated, some more than others. Here are a few highlights regarding Elasticsearch 5.0 that we recommend you to read before upgrading. Or schedule an appointment with us and we’ll help you out!

New relevance model

Elasticsearch prior version 5 used the default scoring algorithm TF/IDF. From now on the default algorithm is BM25.

Depending on the nature of your indexed information, a re-index operation might give you slightly different results and most likely more relevant.

Re-index from remote

This new feature of the Elasticsearch API is really useful when for example upgrading from old clusters. By specifying a remote cluster in the API call, you can easily transfer old documents to your newly created 5.0 cluster without going through a rolling node upgrade procedure.

Ingest Node

There’s a new node type in town. Starting from version 5.0, Elasticsearch gives you the possibility to do simple data manipulation within a running cluster prior indexing. This is useful if you prefer a more simplistic architecture without Logstash instances, but still require to do some alterations to your data.

Most core processors found in Logstash are available. Often used ones include:

  • Date Processor
  • Convert processor
  • Grok Processor
  • Rename Processor
  • JSON Processor

Search and Aggregations

The search API has been refactored to be more clever regarding which indices are hit, but also if aggregations need to be recalculated or not when issuing range queries. By looking at when indices were last modified, range aggregations can be cached and only recalculated if really needed. This improvement is really useful for the typical log analytic case with time series data. You will notice speed improvements in your Kibana dashboards.

New data structures

Lucence 6.0 introduces a new feature called dimensional points, which uses the k-d tree geo-spatial data structure to enable fast single- and multi-dimensional numeric range and geo-spatial point-in-shape filtering. Elasticsearch 5.0 implements a variant called block k-d tree specifically designed for efficient IO, which gives significant performance boosts when indexing as well as filtering.

Should I upgrade?

If your typical use case involves geo-spatial queries and filtering, we definitely recommend that you upgrade your cluster and re-index your documents to gain the performance boost. Due to the simplicity in upgrading or even migrating data to a completely new cluster, it will be worth the time getting your Elastic Stack up to date and ready for features to come.

In case you need help, don’t hesitate to contact us and we will guide you through the process.

Written by: Joar Svensson, Consultant Findwise

Under the hood of the search engine

While using a search application we rarely think about what happens inside it. We just type a query, sometime refine details with facets or additional filters and pick one of the returned results. Ideally, the most desired result is on the top of the list. The secret of returning appropriate results and figuring out which fits a query better than others is hidden in the scoring, ranking and similarity functions enclosed in relevancy models. These concepts are crucial for the search application user’s satisfaction.

In this post we will review basic components of the popular TF/IDF model with simple examples. Additionally, we will learn how to ask Elasticsearch for explanation of scoring for a specific document and query.

Document ranking is one of the fundamental problems in information retrieval, a discipline acting as a mathematical foundation of search. The ranking, which is literally assigning a rank to a document matching search query corresponds with a term of relevance. Document relevance is a function which determines how well given document meets the search query. A concept of similarity corresponds, in turn, to the relevance idea, since relevance is a metric of similarity between a candidate result document and a search query. Continue reading