Published

Wed, 21 June 2017

Kinto and ElasticSearch

ElasticSearch integration

Kinto is a generic API on top of a storage backend. ElasticSearch is a generic API on top of an indexing backend. When combining both, your data is stored in a persistent database, and you can search, aggregate and filter it efficiently.

This integration has been around for a long time, since it was the subject of our Kinto plugin tutorial. But recently we had a concrete use case at Mozilla, so we published it as an official plugin!

Use case

Mozilla publishes its products for several platforms, in a lot of different locales (70-100 regions/languages), and at different stages of maturity (nightly, beta, release, ESR). Currently, there is no comprehensive or official database of every released version and its associated metadata (revision, build date, ...).

Our task was to build an API to allow third-party scripts and tools to publish or query metadata about released products, as well as a nice UI to explore and browse the catalog.

Kinto gives us a ready-to-use JSON storage API, which is rather intuitive to use with curl, and for which we can manage authentication and permissions using simple tokens or LDAP. ElasticSearch gives us a rocket-fast index that powers the faceted search of our catalog.

Prototype UI using SearchKit

How does it work?

Basically, once the plugin is enabled, a new endpoint becomes available: /buckets/{bid}/collections/{cid}/search

When the collection is created, a dedicated index is created on the underlying ElasticSearch cluster/instance. When a record is created, it is duplicated in the collection index. The /search endpoint is just a pass-through to that index: you post a query and obtain the results. You can only access it if you have the permission to read the whole collection.
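As a rough sketch of what calling this endpoint can look like from Python, the snippet below posts an ElasticSearch query body to the search URL using only the standard library. The helpers `search_url` and `search` are hypothetical names written for this illustration (they are not part of any Kinto client), and the Basic authentication branch assumes the server accepts it.

```python
import base64
import json
import urllib.request


def search_url(server, bucket, collection):
    """Build the plugin's search endpoint URL for a collection."""
    return f"{server.rstrip('/')}/buckets/{bucket}/collections/{collection}/search"


def search(server, bucket, collection, query, auth=None):
    """POST an ElasticSearch query body and return the decoded JSON response."""
    headers = {"Content-Type": "application/json"}
    if auth is not None:
        # Basic authentication, assuming the server has it enabled.
        token = base64.b64encode(":".join(auth).encode("utf-8")).decode("ascii")
        headers["Authorization"] = f"Basic {token}"
    request = urllib.request.Request(
        search_url(server, bucket, collection),
        data=json.dumps(query).encode("utf-8"),
        headers=headers,
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Against a running server, a call such as `search("http://localhost:8888/v1", "my-bucket", "my-collection", {"query": {"match_all": {}}})` would return the raw ElasticSearch response; the bucket and collection names here are placeholders.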

By default, ElasticSearch will infer the field types from the posted data. But it is possible to define and force the mapping schema in the collection metadata.

The installation is straightforward:

pip install kinto-elasticsearch

And activation is done as usual in the settings:

kinto.includes = kinto_elasticsearch
kinto.elasticsearch.hosts = localhost:9200
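One way to check that the plugin was picked up is to look at the JSON document Kinto serves at its root URL, which lists the enabled capabilities. The sketch below assumes the plugin registers a capability named `elasticsearch`; that name is an assumption not stated in this article, and both helper functions are hypothetical.

```python
import json
import urllib.request


def has_capability(root_info, name):
    """Return True if the server's root response lists the given capability."""
    return name in root_info.get("capabilities", {})


def fetch_root_info(server):
    """Fetch the JSON document served at the API root (e.g. http://localhost:8888/v1/)."""
    with urllib.request.urlopen(server.rstrip("/") + "/") as response:
        return json.load(response)
```

With a server running locally, `has_capability(fetch_root_info("http://localhost:8888/v1"), "elasticsearch")` should then tell whether the plugin is active, provided the capability name assumed above is the right one.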

Example with geographical data

ElasticSearch supports geospatial indices, which make it possible, for example, to filter points contained in a rectangle (a.k.a. bounding box).
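To make the idea concrete, here is a plain-Python sketch of the test that a bounding box filter conceptually applies to each point. It is only an illustration: ElasticSearch relies on its index rather than checking points one by one, and the function name and arguments are made up for this example.

```python
def in_bounding_box(point, top, left, bottom, right):
    """Return True if a (lon, lat) point lies inside the given rectangle.

    Assumes the box does not cross the antimeridian (left <= right).
    """
    lon, lat = point
    return left <= lon <= right and bottom <= lat <= top
```

For instance, with a rough box around Italy (`top=47, left=6, bottom=36, right=19`), a point at `(12.5, 41.9)` is inside and a point at `(2.35, 48.85)` is not.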

For the rest of this section, we'll import some OpenStreetMap data into a Kinto collection. A simple Leaflet map will then display all records as dots, and filter the list of results based on the current map extent using the search endpoint.

Loading the data

We can obtain an OSM export on <http://overpass-turbo.eu>. For example, the following query will return every pizzeria in Italy:

area[name="Italia"];(node["cuisine"="pizza"](area););out;

We export and save the result as export.geojson.

Screenshot of query on overpass-turbo.eu

We load it into Kinto using kinto-http.py:

import kinto_http

server = "http://localhost:8888/v1"
bucket = "restaurants"
collection = "pizzerias"
auth = ("user", "pass")

# Create the destination bucket.
client = kinto_http.Client(server_url=server, auth=auth)
client.create_bucket(id=bucket, if_not_exists=True)

We define the index mapping and the permissions. Each pizzeria will have a name and a location (long, lat).

# Define the ElasticSearch mapping in the collection metadata.
collection_metadata = {
    "index:schema": {
        "properties": {
            "name": {
                "type": "text"
            },
            "location": {
                "type": "geo_point"
            }
        }
    }
}
# Let anonymous users read the records (online map demo).
collection_permissions = {
    "read": ["system.Everyone"]
}
client.create_collection(id=collection, bucket=bucket, data=collection_metadata,
                         permissions=collection_permissions, if_not_exists=True)

We create a record for each "feature" in the GeoJSON file, using batched requests (one HTTP request includes many creations):

import json

# Read the GeoJSON export saved from overpass-turbo.
with open("export.geojson", encoding="utf-8") as f:
    export = json.load(f)

with client.batch() as batch:
    for pizzeria in export["features"]:
        record = {
            "name": pizzeria["properties"].get("name"),
            "location": pizzeria["geometry"]["coordinates"],
        }
        batch.create_record(data=record, bucket=bucket, collection=collection)
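If the export might contain anything other than points, the conversion can be made a little more defensive. The following is a minimal sketch, assuming standard GeoJSON features; `feature_to_record` is a hypothetical helper, and since the Overpass query above only returns nodes, the geometry check is a precaution rather than a necessity.

```python
def feature_to_record(feature):
    """Convert a GeoJSON Point feature to a record, or None if it is unusable."""
    geometry = feature.get("geometry") or {}
    if geometry.get("type") != "Point":
        return None
    lon, lat = geometry["coordinates"][:2]
    return {
        "name": (feature.get("properties") or {}).get("name"),
        # GeoJSON stores [longitude, latitude], the same order that
        # ElasticSearch expects for a geo_point given as an array.
        "location": [lon, lat],
    }
```

In the import loop, a feature for which this returns `None` would simply be skipped before calling `batch.create_record()`.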

Web mapping

Now that the data is loaded in Kinto, indexed in ElasticSearch, and publicly available, we'll render a map.

Leaflet.js is a nice piece of software that makes web mapping really pleasant.

// A map with OpenStreetMap background.
const map = L.map('map');
L.tileLayer('http://{s}.tile.osm.org/{z}/{x}/{y}.png', {
    attribution: '&copy; <a href="http://osm.org/copyright">OpenStreetMap</a> contributors'
}).addTo(map);

Using kinto-http.js, it is also quite intuitive to retrieve the whole list of records:

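// Fetch all records from Kinto to populate the map layer.
// (Note: this also could have been done using ES.)
// Same server, bucket and collection as in the Python import script.
const server = "http://localhost:8888/v1";
const bucket = "restaurants";
const collection = "pizzerias";
const kinto = new KintoClient(server);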

kinto.bucket(bucket).collection(collection).listRecords()
  .then(({data: pizzerias}) => {
    // ...
  });

Showing records on the map is done like this. (Note: it would have been even simpler if Leaflet had used [lng, lat] like GeoJSON, instead of following the Google Maps [lat, lng] convention.)

// Add a circle on the map for each record.
pizzerias.forEach((pizzeria) => {
  // Leaflet maps use [lat, lng] and GeoJSON uses [lng, lat].
  const latlng = [pizzeria.location[1], pizzeria.location[0]];

  L.circleMarker(latlng, {
    color: 'purple',
    fillOpacity: 0.7
  }).setRadius(4).addTo(map);
});

In order to query the search endpoint, we use the fetch API and post a query that defines a filter using a rectangle:

async function searchExtent(bbox) {
  const query = {
    query: {
      bool: {
        must: {
          match_all: {},
        },
        filter: {
          geo_bounding_box: {
            location: {
              top: bbox.getNorthWest().lat,
              left: bbox.getNorthWest().lng,
              bottom: bbox.getSouthEast().lat,
              right: bbox.getSouthEast().lng
            }
          }
        }
      }
    }
  };
  const url = `${server}/buckets/${bucket}/collections/${collection}/search`;
  const {hits: {hits}} = await postJSON(url, query);
  return hits;
}

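async function postJSON(url, body) {
  const response = await fetch(url, {
    body: JSON.stringify(body),
    method: "POST",
    headers: {"Content-Type": "application/json"}
  });
  // Fail loudly on HTTP errors (e.g. missing read permission).
  if (!response.ok) {
    throw new Error(`Request failed with HTTP ${response.status}`);
  }
  return response.json();
}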
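For scripts that are not running in a browser, the same query body can be built in Python. This is a sketch that mirrors the structure of the JavaScript query above; `bounding_box_query` and `extract_records` are hypothetical helper names, and the response layout (`hits.hits[]._source`) is the standard ElasticSearch search response that the pass-through endpoint returns.

```python
def bounding_box_query(north, west, south, east):
    """Build a query body that keeps only points inside the rectangle."""
    return {
        "query": {
            "bool": {
                "must": {"match_all": {}},
                "filter": {
                    "geo_bounding_box": {
                        "location": {
                            "top": north,
                            "left": west,
                            "bottom": south,
                            "right": east,
                        }
                    }
                },
            }
        }
    }


def extract_records(response):
    """Return the original records from an ElasticSearch search response."""
    return [hit["_source"] for hit in response["hits"]["hits"]]
```

The dictionary returned by `bounding_box_query()` can be serialized with `json.dumps()` and posted to the search endpoint with any HTTP client.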

Now that we know how to render the map and obtain the list of records within a rectangle, we just need to listen to the map move event and refresh the list of results:

map.on('load moveend', async (e) => {
  // Query ElasticSearch with the current map bounds as search extent.
  const bbox = e.target.getBounds();
  const hits = await searchExtent(bbox);

  // Render the results as a list of bullet points.
  const listing = document.getElementById("listing");
  listing.innerHTML = "";
  hits.forEach(({_source: pizzeria}) => {
    const node = document.createElement("li");
    const textnode = document.createTextNode(pizzeria.name || "(No name)");
    node.appendChild(textnode);
    listing.appendChild(node);
  });
});

Here you go!

Online demo

kinto-elasticsearch map search extent