ExternalFileField in Solr

Sometimes we want to update the values of one field more often than the values of other fields. A good solution is the field type ExternalFileField, which gets its values from an external file instead of the index. Such a file can easily be changed, and the new values take effect after a commit, so no documents need to be re-indexed. A field that has ExternalFileField as its type is not searchable. It can currently only be used as a ValueSource in a FunctionQuery.

The external file contains keys and values:

key1=value1
key2=value2

The keys don’t need to be unique.
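
As a minimal sketch of how such a file could be generated, the Python snippet below renders a mapping into key=value lines. The function name render_external_file and the sample keys are assumptions made for illustration; they are not part of Solr.

```python
def render_external_file(values):
    """Render a {key: value} mapping as key=value lines, one per entry."""
    # Sorting is a choice made for this sketch so the output is deterministic.
    lines = [f"{key}={value}" for key, value in sorted(values.items())]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_external_file({"key2": 2.0, "key1": 1.0}), end="")
```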

The name of the external file must be external_<fieldname> or external_<fieldname>.* and must be placed in the index directory.
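
The naming rule can be sketched in Python as follows. The helper names external_file_path and write_external_file, and the idea of writing to a temporary file before moving it into place, are assumptions of this sketch rather than anything Solr requires; index_dir stands for whatever index directory your installation uses.

```python
import os
import tempfile
from pathlib import Path


def external_file_path(index_dir, field_name, suffix=""):
    """Build external_<fieldname> or external_<fieldname>.<suffix> in index_dir."""
    name = f"external_{field_name}"
    if suffix:
        name += f".{suffix}"
    return Path(index_dir) / name


def write_external_file(index_dir, field_name, content, suffix=""):
    """Write content to a temporary file, then move it into place in one step."""
    target = external_file_path(index_dir, field_name, suffix)
    fd, tmp_name = tempfile.mkstemp(dir=index_dir, prefix=".tmp_external_")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(content)
    # os.replace swaps the file in, so a reader should not see a half-written file.
    os.replace(tmp_name, target)
    return target
```

For example, write_external_file(index_dir, "page_visited", "page1=5\n", suffix="txt") would produce a file named external_page_visited.txt.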

A new field type based on ExternalFileField, and a field of that type, must be added to schema.xml.

<fieldType name="file" class="solr.ExternalFileField"
           keyField="keyField" defVal="1" indexed="false"
           stored="false" valType="float" />

<field name="<fieldname>" type="file" />

keyField is the field that contains the keys and <fieldname> contains the values from the external file.

valType defines the value type of the field.
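
One way to picture how these attributes work together: for each document, the value of its keyField is looked up in the external file, and defVal is used when the file has no entry for that key. The Python sketch below only models that lookup. The function external_value and the sample documents are hypothetical, and this is not how Solr implements it internally.

```python
def external_value(doc, file_values, key_field="keyField", def_val=1.0):
    """Return the external value for doc, or def_val when its key is absent."""
    key = doc[key_field]
    # valType="float" in the schema above, so the result is a float here.
    return float(file_values.get(key, def_val))


if __name__ == "__main__":
    file_values = {"key1": 3.5}
    print(external_value({"keyField": "key1"}, file_values))
    print(external_value({"keyField": "unknown"}, file_values))
```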

At Findwise we have used this method for a customer where we wanted to show the most visited pages higher up in the search results. These statistics change daily for a lot of pages, and we don’t want to re-index all of these pages every day.

4 thoughts on “ExternalFileField in Solr”

  1. Can you please show the code sample you are using for this? I am having trouble defining the ExternalFileField.

    thanks

  2. You need to add class="solr.ExternalFileField" to the fieldType. The correct definition of the field type in schema.xml should be:

    <fieldType name="file" class="solr.ExternalFileField" keyField="keyField" defVal="1" indexed="false" stored="false" valType="float" />

    • Let’s say we have a field, page_visited, of the type ExternalFileField. We have the following values in the external_page_visited.txt file:
      page1=5
      page2=2
      page3=15
      Let’s assume these three pages have the same ranking score in a regular search. But if you add the parameter bf=page_visited^100 or something similar to the query, page3 should get a higher score and appear first in the result.
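
The effect described in this example can be approximated with a small Python sketch. It assumes the boost function value, multiplied by the boost factor, is simply added to each page's base score, which is a simplification of the real scoring. The function boosted_order, the base scores, and the query term are hypothetical.

```python
from urllib.parse import urlencode


def boosted_order(base_scores, visits, boost=100.0, def_val=1.0):
    """Order pages by base score plus boost * external value (simplified model)."""
    totals = {
        page: score + boost * visits.get(page, def_val)
        for page, score in base_scores.items()
    }
    return sorted(totals, key=totals.get, reverse=True)


# Query parameters as they might be sent to a dismax handler (assumed setup).
params = urlencode({"q": "solr", "defType": "dismax", "bf": "page_visited^100"})

if __name__ == "__main__":
    visits = {"page1": 5, "page2": 2, "page3": 15}
    base = {"page1": 1.0, "page2": 1.0, "page3": 1.0}
    print(boosted_order(base, visits))
    print(params)
```

With equal base scores, the page with the most visits sorts first in this model.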
