Migration from Google Search Appliance (GSA) in 4 easy steps

 

 

Google Search Appliance is being phased out, and renewals will end in 2018. As an existing client, you can buy one-year license renewals throughout 2017. However, if you fancy a change, here are 4 simple steps for switching to Apache Solr or Elasticsearch.

1. Choose your hosting solution or servers


Whereas Google Search Appliance comes ready to plug in, Apache Solr and Elasticsearch need to be deployed and hosted on servers. You can choose to host Solr or Elasticsearch on your own infrastructure or in the cloud. Both platforms are highly scalable and can be massively distributed.

  • Own infrastructure

Server and hardware requirements depend heavily on the number of documents, the document types, the search use cases and the number of users. Memory, CPU, disk and network are the main parameters to consider.

Elasticsearch hardware recommendations: https://www.elastic.co/guide/en/elasticsearch/guide/current/hardware.html

Apache Solr performance: https://wiki.apache.org/solr/SolrPerformanceProblems

Both Elasticsearch and Solr require Java to run. For SolrCloud, you will also need to install ZooKeeper.

  • In the cloud

You can also choose to run Solr or Elasticsearch on a cloud platform.

Elastic official cloud platform: https://www.elastic.co/cloud

2. Define your schema and mapping

In Apache Solr and Elasticsearch, fields can be indexed and processed differently according to type, language, use case and so on. A field and its type can be defined in Elasticsearch using the mapping API, or in Apache Solr with the schema.xml file.

Elasticsearch mapping API: https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping.html

Apache Solr schema: https://wiki.apache.org/solr/SchemaXml
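As an illustration of what such definitions can look like, here is a minimal sketch that builds an Elasticsearch mapping body and an equivalent payload for Solr's Schema API (an alternative to editing schema.xml by hand when a managed schema is used). The field names are made up, and the type names (`text`, `keyword`, `text_general`, `pdate`) are assumptions that depend on your Elasticsearch version and Solr configset. Older Elasticsearch versions also nest `properties` under a document type name.

```python
import json

# Hypothetical field list: (name, kind) pairs describing the documents to index.
FIELDS = [
    ("title", "text"),
    ("body", "text"),
    ("category", "exact"),
    ("published", "date"),
]

# Assumed type names; check them against your Elasticsearch version and Solr configset.
ES_TYPES = {"text": "text", "exact": "keyword", "date": "date"}
SOLR_TYPES = {"text": "text_general", "exact": "string", "date": "pdate"}


def elasticsearch_mapping(fields):
    """Build the body of an index-creation request with explicit field mappings."""
    properties = {name: {"type": ES_TYPES[kind]} for name, kind in fields}
    return {"mappings": {"properties": properties}}


def solr_schema_commands(fields):
    """Build a Schema API payload that adds one field per entry."""
    return {
        "add-field": [
            {"name": name, "type": SOLR_TYPES[kind], "indexed": True, "stored": True}
            for name, kind in fields
        ]
    }


if __name__ == "__main__":
    print(json.dumps(elasticsearch_mapping(FIELDS), indent=2))
    print(json.dumps(solr_schema_commands(FIELDS), indent=2))
```

The first payload would be sent when creating the index, the second posted to the collection's schema endpoint. Neither request is made here, so the sketch can be run without a cluster.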

3. Tune your connectors


Do you need to change all connectors?

The answer is no. Connectors that send GSA feeds can be kept; you only need to refactor their output to match the Elasticsearch or Solr indexing syntax.

However, if you use GSA to crawl websites, you will need either to reconsider crawling as the method for getting your data or to use an external web crawler (like Norconex). Unlike GSA, Apache Solr and Elasticsearch do not come with a web crawler.

Elasticsearch Indexing API: https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-index_.html
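To give an idea of the refactoring involved, the sketch below converts a small GSA-style feed into lines for the Elasticsearch bulk API. The sample feed, the index name and the output field names (`url`, `content`) are assumptions for illustration only. Real feeds may also carry a DOCTYPE, base64-encoded content or ACLs, which are not handled here, and older Elasticsearch versions expect a `_type` in the action line.

```python
import json
import xml.etree.ElementTree as ET

# A small sample in the GSA feed XML layout (URLs and values are made up).
SAMPLE_FEED = """<gsafeed>
  <header>
    <datasource>intranet</datasource>
    <feedtype>incremental</feedtype>
  </header>
  <group>
    <record url="http://example.com/a" mimetype="text/plain" action="add">
      <metadata>
        <meta name="author" content="Alice"/>
      </metadata>
      <content>First document</content>
    </record>
    <record url="http://example.com/b" mimetype="text/plain" action="delete"/>
  </group>
</gsafeed>"""


def feed_to_bulk(feed_xml, index_name):
    """Convert GSA feed records into Elasticsearch bulk API lines (NDJSON)."""
    lines = []
    for record in ET.fromstring(feed_xml).iter("record"):
        url = record.get("url")
        if record.get("action") == "delete":
            # A delete action has no document line after it.
            lines.append(json.dumps({"delete": {"_index": index_name, "_id": url}}))
            continue
        doc = {"url": url, "content": record.findtext("content", default="")}
        # Copy each <meta name=... content=...> into a document field.
        for meta in record.iter("meta"):
            doc[meta.get("name")] = meta.get("content")
        lines.append(json.dumps({"index": {"_index": index_name, "_id": url}}))
        lines.append(json.dumps(doc))
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(feed_to_bulk(SAMPLE_FEED, "docs"), end="")
```

For Solr, the same `doc` dictionaries could instead be collected into a JSON array and posted to the collection's update handler. Using the record URL as the document ID keeps re-feeds idempotent.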

4. Rewrite your queries and fetch new output

All common query functions such as filtering, sorting and dynamic navigation are standard in both Apache Solr and Elasticsearch. However, query parameters and output (XML or JSON) are different, which means your queries and front end need to be adapted to your new search engine.

If you are using Jellyfish by Findwise, queries and output will remain roughly the same.

Elasticsearch response body: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-body.html

Apache Solr response: https://cwiki.apache.org/confluence/display/solr/Response+Writers
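As a rough sketch of the query rewriting, the function below translates the basic GSA request parameters `q`, `start` and `num` into an Elasticsearch request body and a set of Solr request parameters, optionally adding a facet to stand in for dynamic navigation. The function name, the default text field `content` and the facet field are hypothetical. Other GSA parameters (such as `sort`, `site` or `requiredfields`) would need their own handling.

```python
from urllib.parse import parse_qs


def translate_gsa_query(query_string, text_field="content", facet_field=None):
    """Map q/start/num from a GSA query string to Elasticsearch and Solr requests."""
    params = {key: values[0] for key, values in parse_qs(query_string).items()}
    q = params.get("q", "")
    start = int(params.get("start", 0))
    num = int(params.get("num", 10))

    # Elasticsearch: paging uses from/size, full-text search uses a match query.
    es_body = {
        "from": start,
        "size": num,
        "query": {"match": {text_field: q}},
    }
    # Solr: paging uses start/rows, df sets the default search field.
    solr_params = {"q": q, "df": text_field, "start": start, "rows": num, "wt": "json"}

    if facet_field:
        # Dynamic navigation maps to a terms aggregation or to field faceting.
        es_body["aggs"] = {facet_field: {"terms": {"field": facet_field}}}
        solr_params.update({"facet": "true", "facet.field": facet_field})
    return es_body, solr_params


if __name__ == "__main__":
    es, solr = translate_gsa_query("q=annual+report&start=20&num=5", facet_field="category")
    print(es)
    print(solr)
```

Keeping this translation in one place means the front end can continue to send GSA-style parameters while the backend is switched, at the cost of maintaining the mapping.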

Google Search Appliance features equivalence

GSA feature            | Elasticsearch         | Apache Solr
-----------------------|-----------------------|------------------
Web crawling           | X                     | X
Language Bundles       | Languages             | Language Analysis
Synonyms               | Synonyms              | Synonyms
Stopwords              | Stopwords             | Stopwords
Result Biasing         | Controlling relevance | Query elevation
Suggestions            | Search-suggesters     | Suggester
Dynamic navigation     | Aggregations          | Faceting
Document preview       | X                     | X
User result            | X                     | X
Expert search          | X                     | X
Keymatch               | X                     | X
Related Queries        | X                     | X
Secure search          | Shield                | Solr Security
Search reports         | Logstash+Kibana       | X
Mirroring/Distributed  | Scale Elastic         | Solr Cloud
System alert           | Watcher               | X
Email update/Alert     | Watcher               | X

X = not available out of the box

One thought on “Migration from Google Search Appliance (GSA) in 4 easy steps”

  1. Very interesting article. How many work hours would you estimate for a migration of 1.000.000 documents from Google Search Appliance to Elasticsearch?
