Merino

Merino is a service that provides address bar suggestions and curated recommendations to Firefox. Some of this content comes from third-party providers. In this case, Merino serves as a privacy-preserving buffer: user input in the address bar is handled by Merino, and any clicked impression is delegated to a Mozilla-controlled service, which then sends an interaction ping if one is defined in the request, rather than going to a provider directly. See the API documentation for more details.

Table of Contents

api.md - API Documentation describes endpoints, query parameters, request and response headers, response objects and details on the suggestion objects.

firefox.md - Firefox and Merino Environments describes how to enable Merino in Firefox and lists the endpoints for the service in Production, Stage, and Dev.

data.md - Data, Metrics, Logging describes all metrics and logs.

dev/index.md - Basic Developer Docs describes basics of working on Merino.

dev/dependencies.md - Development Dependencies describes the development dependencies required for Merino.

dev/logging-and-metrics.md - Logging and Metrics describes metrics, logging, and telemetry.

dev/release-process.md - Release Process describes the release process of Merino in detail.

dev/testing.md - Testing describes unit, integration and load tests for Merino.

dev/profiling.md - Profiling describes how to profile Merino to address performance issues.

operations/configs.md - Configuring Merino describes configuration management of the project, Dynaconf setup, and the configuration of the HTTP server, logging, metrics, Remote Settings, and Sentry.

operations/elasticsearch.md - Elasticsearch Operations describes some functionality and operations that we do on the Elasticsearch cluster.

operations/jobs.md - Merino Jobs describes the jobs that are configured in Merino. It indicates where the jobs exist and links to details on how the jobs are run.

About the Name

This project drives an important part of Firefox's "felt experience". That is, the feeling of using Firefox, hopefully in a delightful way. The word "felt" in this phrase refers to feeling, but it can be punned to refer to the textile. Felt is often made of wool, and Merino wool (from Merino sheep) produces exceptionally smooth felt.

Architecture

flowchart TD
    User[\fa:fa-user User/]

    subgraph Firefox [fa:fa-firefox Firefox]
        online(Online Search and Suggest)
        offline(Offline Search and Suggest<br/>fetches adMarketplace, static Wikipedia, <br/>and other suggestions.<br/> Offline mode is fallback if Merino times out.)
    end

    User --> |Accessing the Firefox URL bar| Firefox

    subgraph Merino [fa:fa-leaf Merino]
        srh(fa:fa-gears Suggest Request Handler)

        subgraph middleware [fa:fa-paperclip Middleware]
            Geolocation
            Logging
            UserAgent
            Metrics
        end

        maxmind[(MaxmindDB)]
        Geolocation --> maxmind

        srh -..- middleware

        subgraph providers [fa:fa-truck Providers]
            adm(adm)
            amo(amo)
            geolocation(geolocation)
            toppicks(top-picks)
            weather(weather)
            wikipedia(wikipedia)
        end

        srh --> adm
        srh --> amo
        srh --> geolocation
        srh --> toppicks
        srh --> weather
        srh --> wikipedia

        subgraph backends [fa:fa-server Backends]
            rsb(remote settings)
            accuweather(accuweather)
            elastic(elastic)
            toppicks_back(top picks)
            dynamic_amo(dynamic addons)

        end

        adm --> rsb
        amo --> dynamic_amo
        toppicks --> toppicks_back
        weather --> accuweather
        wikipedia --> elastic
    end


    subgraph "Airflow (Merino Jobs)"
        wikipedia_sync(Wikipedia Sync)
        toppicks_sync(Top Picks Sync)
        addons_sync(Addons Remote Settings Upload)
    end

    addons_api(Addons API)
    dynamic_amo --> addons_api

    elastico[(Elasticsearch)]
    elastic --> elastico
    wikipedia_sync ..- |Syncs Wikipedia entries weekly| elastico

    accuweather_api(Accuweather API)
    accuweather ..-> accuweather_api

    redis[(Redis Cache)]
    accuweather ..-> |tries to query cache first| redis

    kinto[(Remote Settings)]
    rsb --- kinto
    addons_sync ..- |Add Addons Suggestions to Remote Settings| kinto

    toppicks_data[(GCS Top Picks Data,<br/>a list of Mozilla curated popular sites and metadata to be <br/>displayed on browser)]
    toppicks_sync ..-> toppicks_data

    online --> |/api/v1/suggest| srh
    offline ..- kinto

Merino API documentation

This page describes the API endpoints available on Merino.

The autogenerated API documentation exists here.

Configuring Firefox and Merino Environments

Merino is enabled by default in Firefox. However, you will need to enable data sharing for Firefox Suggest to fully enable the feature. To do so, type about:config in the URL bar and set the Firefox preference browser.urlbar.quicksuggest.dataCollection.enabled to true. By default, Merino connects to the production environment. This is controlled with the browser.urlbar.merino.endpointURL preference. See below for other options.

You can also query any of the endpoint URLs below with something like:

curl 'https://stage.merino.nonprod.cloudops.mozgcp.net/api/v1/suggest?q=your+query'
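
If you prefer Python, the same request can be made with an HTTP client library. This is a minimal sketch, assuming the httpx package is installed:

import httpx

response = httpx.get(
    "https://stage.merino.nonprod.cloudops.mozgcp.net/api/v1/suggest",
    params={"q": "your query"},
)
response.raise_for_status()
print(response.json())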

Environments

Production

Endpoint URL: https://merino.services.mozilla.com/api/v1/suggest

The primary environment for end users. Firefox is configured to use this by default.

Stage

Endpoint URL: https://stage.merino.nonprod.cloudops.mozgcp.net/api/v1/suggest

This environment is used for manual and load testing of the server. It is not guaranteed to be stable or available. It is used as a part of the deploy process to verify new releases before they go to production.

Data collection

This page should list all metrics and logs that Merino is expected to emit in production, including what should be done about them, if anything.

Logs

This list does not include any DEBUG level events, since those are not logged by default in production. The level and type of each log are listed.

Any log containing sensitive data must include a boolean field sensitive that is set to true to exempt it from flowing to the generally accessible log inspection interfaces.

Merino APIs

  • INFO web.suggest.request - A suggestion request is being processed. This event will include fields for all relevant details of the request (an illustrative record is shown after this list). Fields:

    • sensitive - Always set to true to ensure proper routing.
    • query - If query logging is enabled, the text the user typed. Otherwise an empty string.
    • country - The country the request came from.
    • region - The first country subdivision the request came from.
    • city - The city the request came from.
    • dma - A US-only location description that is larger than a city and smaller than a state, but does not align with political borders.
    • agent - The original user agent.
    • os_family - Parsed from the user agent. One of "windows", "macos", "linux", "ios", "android", "chrome os", "blackberry", or "other".
    • form_factor - Parsed from the user agent. One of "desktop", "phone", "tablet", or "other"
    • browser - The browser and possibly version detected. Either "Firefox(XX)" where XX is the version, or "Other".
    • rid - The request ID.
    • WIP accepts_english - True if the user's Accept-Language header includes an English locale, false otherwise.
    • requested_providers - A comma separated list of providers requested via the query string, or an empty string if none were requested (in which case the default values would be used).
    • client_variants - Any client variants sent to Merino in the query string.
    • session_id - A UUID generated by the client for each search session.
    • sequence_no - A client-side event counter (0-based) that records the query sequence within each search session.
  • INFO request.summary - The application request summary that follows the MozLog convention. This log is recorded for all incoming HTTP requests except for the suggest API endpoint.

  • ERROR dockerflow.error_endpoint - The __error__ endpoint of the server was called. This is used to test our error reporting system. It is not a cause for concern, unless we receive a large amount of these records, in which case some outside service is likely malicious or misconfigured.
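
For illustration only, the extra fields attached to a web.suggest.request record might look roughly like the following. The field names come from the list above; all values are made up and the exact shape may differ:

# Illustrative example only; values are invented.
example_suggest_request_fields = {
    "sensitive": True,
    "query": "",  # empty unless query logging is enabled
    "country": "US",
    "region": "CA",
    "city": "San Francisco",
    "dma": "807",
    "agent": "Mozilla/5.0 (Macintosh) Gecko/20100101 Firefox/130.0",
    "os_family": "macos",
    "form_factor": "desktop",
    "browser": "Firefox(130)",
    "rid": "00f0a1b2c3d4e5f6",
    "accepts_english": True,
    "requested_providers": "",
    "client_variants": "",
    "session_id": "5b3e1b6f-0000-0000-0000-000000000000",
    "sequence_no": 0,
}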

Merino Middleware Logs

Geolocation

  • WARNING merino.middleware.geolocation - There was an error with a geolocation lookup.

Merino Cron Tasks

  • WARNING merino.cron - There was an error while executing a cron task.

Merino Feature Flags

  • ERROR merino.featureflags - There was an error while attempting to assign a feature flag for a suggest API request.

Curated Recommendations

  • ERROR merino.curated_recommendations.corpus_backends.corpus_api_backend - Failed to get timezone for scheduled surface.
  • WARNING merino.curated_recommendations.corpus_backends.corpus_api_backend - Retrying CorpusApiBackend after an http client exception was raised.
  • ERROR GcsEngagement failed to update cache: {e} - unexpected exception when updating engagement.
  • ERROR Curated recommendations engagement size {blob.size} > {self.max_size} - Max engagement blob size is exceeded. The backend will gracefully fall back to cached data or 0's.
  • INFO Curated recommendations engagement unchanged since {self.last_updated}. - The engagement blob was not updated since the last check. last_updated is expected to be between 0 and 30 minutes.

Metrics

A note on timers: Statsd timers are measured in milliseconds, and are reported as integers (at least in Cadence). Milliseconds are often not precise enough for the tasks we want to measure in Merino. Instead, we use generic histograms to record microsecond times. Metrics recorded in this way should have -us appended to their name, to mark the units used (since we shouldn't put the proper unit μs in metric names).
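
As an illustration of this convention (not Merino's actual metrics code), a microsecond timing could be recorded as a generic histogram roughly like this; FakeMetricsClient stands in for whatever statsd-style client is configured:

import time


class FakeMetricsClient:
    """Stand-in for a statsd-style client; Merino's real client may differ."""

    def histogram(self, name: str, value: int) -> None:
        print(f"{name}:{value}|h")  # statsd histogram wire format


metrics = FakeMetricsClient()

start = time.perf_counter()
sum(range(100_000))  # stand-in for the work being measured
elapsed_us = int((time.perf_counter() - start) * 1_000_000)

# Mark the unit in the metric name, per the convention above.
metrics.histogram("providers.example.query-us", elapsed_us)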

  • merino.providers.initialize - A timer to measure the overall initialization duration (in ms) for all providers.

  • merino.providers.initialize.<provider> - A timer to measure the initialization duration (in ms) for the given <provider>.

    Example: merino.providers.initialize.adm

  • merino.<http_method>.<url_path>.status_codes.<status_code> - A counter to measure the status codes of an HTTP method for the <url_path>.

    Example: merino.get.api.v1.suggest.status_codes.200

  • merino.<http_method>.<url_path>.timing - A timer to measure the duration (in ms) of an HTTP method for a URL path.

    Example: merino.get.api.v1.suggest.timing

  • merino.<provider_module>.query - A timer to measure the query duration (in ms) of a certain suggestion provider.

    Example: merino.providers.adm.query

  • merino.<provider_module>.query.timeout - A counter to measure the query timeouts of a certain suggestion provider.

    Example: merino.providers.wikipedia.query.timeout

  • merino.suggestions-per.request - A histogram metric to get the distribution of suggestions per request.

  • merino.suggestions-per.provider.<provider_module> - A histogram metric to get the distribution of suggestions returned per provider (per request).

    Example: merino.suggestions-per.provider.wikipedia

AccuWeather

The weather provider records additional metrics.

  • accuweather.upstream.request.<request_type>.get - A counter to measure the number of times an upstream request to Accuweather was made.
  • accuweather.request.location.not_provided - A counter to measure the number of times a query was sent without a location being provided, making it impossible to process a weather request. Sampled at 75%.
  • merino.providers.accuweather.query.cache.fetch - A timer to measure the duration (in ms) of looking up a weather report in the cache. Sampled at 75%.
  • merino.providers.accuweather.query.cache.fetch.miss.locations - A counter to measure the number of times weather location was not in the cache. Sampled at 75%.
  • merino.providers.accuweather.query.cache.fetch.miss.currentconditions - A counter to measure the number of times current conditions were not in the cache. Sampled at 75%.
  • merino.providers.accuweather.query.cache.fetch.miss.forecasts - A counter to measure the number of times a forecast for a location was not in the cache. Sampled at 75%.
  • merino.providers.accuweather.query.cache.fetch.hit.{locations | currentconditions | forecasts} - A counter to measure the number of times a requested value like a location or forecast is in the cache. We don't count TTL hits explicitly, just misses. Sampled at 75%.
  • merino.providers.accuweather.query.backend.get - A timer to measure the duration (in ms) of a request for a weather report from the backend. This metric isn't recorded for cache hits. Sampled at 75%.
  • merino.providers.accuweather.query.cache.store - A timer to measure the duration (in ms) of saving a weather report from the backend to the cache. This metric isn't recorded for cache hits. Sampled at 75%.
  • merino.providers.accuweather.query.cache.error - A counter to measure the number of times the cache store returned an error when fetching or storing a weather report. This should be 0 in normal operation. In case of an error, the logs will include a WARNING with the full error message.

Curated Recommendations

The following additional metrics are recorded when curated recommendations are requested.

  • corpus_api.request.timing - A timer to measure the duration (in ms) of looking up a list of scheduled corpus items.
  • corpus_api.request.status_codes.{res.status_code} - A counter to measure the status codes of an HTTP request to the curated-corpus-api.
  • corpus_api.request.graphql_error - A counter to measure the number of GraphQL errors from the curated-corpus-api.
  • recommendation.engagement.update.timing - A timer to measure the duration (in ms) of updating the engagement data from GCS.
  • recommendation.engagement.size - A gauge to track the size of the engagement blob on GCS.
  • recommendation.engagement.count - A gauge to measure the total number of engagement records.
  • recommendation.engagement.{country}.count - A gauge to measure the number of scheduled corpus items with engagement data per country.
  • recommendation.engagement.{country}.clicks - A gauge to measure the number of clicks per country in our GCS engagement blob.
  • recommendation.engagement.{country}.impressions - A gauge to measure the number of impressions per country in our GCS engagement blob.
  • recommendation.engagement.last_updated - A gauge for the staleness (in seconds) of the engagement data, measured between when the data was updated in GCS and the current time.
  • recommendation.prior.update.timing - A timer to measure the duration (in ms) of updating the prior data from GCS.
  • recommendation.prior.size - A gauge to track the size of the Thompson sampling priors blob on GCS.
  • recommendation.prior.last_updated - A gauge for the staleness (in seconds) of the prior data, measured between when the data was updated in GCS and the current time.

Merino Developer Guidelines and Social Contract

This is an additional contractual document on top of CONTRIBUTING.

Foster a Shared Ownership

Not only do Merino developers build the service together, they also share the ownership of the service. That ownership is embodied in the following responsibilities:

  • Be responsible for the entire lifecycle of each change landed in the code base: from writing the PR and getting it merged, to ensuring it goes through CI/CD and is eventually deployed to production, to setting up monitoring on metrics and ensuring its healthy status and the overall health of Merino.
  • Be familiar with Merino’s operation. Conduct operational reviews on a regular basis. Identify and track operational issues. Coordinate with the team(s) to close action items and resolve the identified issues.
  • Documentation. Make sure the code meets the documentation requirements (no linting errors). If a change adds/updates the API, logs or metrics, ensure the associated documentation is up to date.

We commit to sharing knowledge about Merino across the team, with the long-term goal that each team member is capable of resolving incidents of Merino. Merino developers should familiarize themselves with the Mozilla Incident Response Process and the Merino Runbooks. Each individual should be able to initiate an incident response, serve as the incident handling manager, and drive it to its resolution along with other incident responders. Any issues associated with an incident should be tracked in Jira in a way the team agrees upon. For example, assigned with an ‘incident-action-items’ label.

  • Be aware of the infrastructure costs associated with new functionality. The team should have a good understanding of the cost to run the service including logging, computing, networking, and storage costs.
  • Be mindful of work hours and the time zones of your fellow developers when scheduling meetings, deploying code, pairing on code, or collaborating in other ways. Set your work hours in Google Calendar and configure Slack to receive notifications only during those times. We encourage code deployments when there are fellow developers online to support. If you must deploy off-hours, ensure you have a peer available to approve any potential rollbacks.

We do not want to grow individual specialists for Merino's deployment, operation, documentation, or incident response. Rather, we'd like to foster shared ownership, with shared knowledge in every aspect of the day-to-day work on Merino.

Use ADRs to Record Architectural Decisions

ADRs (Architectural Decision Record) are widely adopted by teams at Mozilla to capture important architecture decisions, including their context and consequences. Developers are encouraged to exercise the ADR process to facilitate the decision making on important subjects of the project. ADRs should be made easy to access and reference and therefore are normally checked into the source control and rendered as part of the project documentation.

Use SLO and Error Budget to Manage Service Risks

We strive to build highly available and reliable services while also emphasizing rapid iteration and continuous deployment as key aspects of product development. We opt to use SLOs (Service Level Objectives) and an error budget for risk management. SLOs can be co-determined by the product owner(s) and the service builders & operators. The error budget should be monitored and enforced by the monitoring infrastructure. Once the budget is exhausted, the service owners should be more reluctant to accept, or even reject, risky code changes until the budget resets.

Request RRA for New Content Integrations

RRA (Rapid Risk Assessment) is the recommended process for service builders to perform a standardized lightweight risk assessment for the service or the feature of interest. Since Merino is a user-facing consumer service, we shall take extra caution for user security and the related risks. We have agreed with the Security Assurance team that we’d request an RRA (by following the RRA instructions) for every new content integration (e.g. AccuWeather) or content storage (e.g. Elasticsearch) for Merino.

Testing for Productivity & Reliability

We value testing as a mechanism for establishing feedback loops in service development, design, and release. As developers add new changes to the project, thorough and effective testing reduces uncertainty and generates short feedback loops, accelerating development, release, and regression resolution. Testing also helps reduce the potential decrease in reliability from each change. To realize those benefits for Merino, we have designed the Merino Test Strategy and fulfilled it with adequate tests. We expect the cross-functional team to adhere to the strategy and evolve it to better support the project over time.

Aim for Simplicity

We prioritize simple and conventional solutions in all aspects of development, from system design, to API specs, to code. We prefer mature, battle-tested technologies over complex, cutting-edge alternatives. At the same time, we know that Merino can always get better, and we welcome ideas from everyone. If you’ve got a new approach in mind, share it with the team or propose an Architectural Decision Record (ADR).

Blame-free Culture

While we strive to make Merino a highly reliable service, things will still go wrong regardless of how much care we take: code errors, misconfigurations, and operational glitches, to name a few. We opt for a blame-free culture to ease the mental stress on individuals who are encouraged to take on more activities and responsibilities, especially before they gain familiarity with the tasks. We believe that learning from mistakes, and incorporating what we learn into our processes so that the same mistakes are not repeated, is more constructive and useful than putting someone on center stage. With a blame-free culture and proper risk management processes in place, the average cost of failures should be tolerable within the error budget boundary. Who would be afraid of making mistakes?

Have Fun

Last but not least. Let’s make Merino a fun project to work with!

Developer documentation for working on Merino

tl;dr

Here are some useful commands when working on Merino.

Run the main app

This project uses Poetry for dependency management. See dependencies for how to install Poetry on your machine.

Install all the dependencies:

$ poetry install

Run Merino:

$ poetry run uvicorn merino.main:app --reload

# Or you can use a shortcut
$ make run

General commands

# List all available make commands with descriptions
$ make help

# Just like `poetry install`
$ make install

# Run linter
$ make ruff-lint

# Run format checker
$ make ruff-fmt

# Run formatter
$ make ruff-format

# Run black
$ make black

# Run bandit
$ make bandit

# Run mypy
$ make mypy

# Run all linting checks
$ make -k lint

# Run all formatters
$ make format

# Run merino-py with the auto code reloading
$ make dev

# Run merino-py without the auto code reloading
$ make run

# Run unit and integration tests and evaluate combined coverage
$ make test

# Evaluate combined unit and integration test coverage
$ make test-coverage-check

# Run unit tests
$ make unit-tests

# List fixtures in use per unit test
$ make unit-test-fixtures

# Run integration tests
$ make integration-tests

# List fixtures in use per integration test
$ make integration-test-fixtures

# Build the docker image for Merino named "app:build"
$ make docker-build

# Run local execution of (Locust) load tests
$ make load-tests

# Stop and remove containers and networks for load tests
$ make load-tests-clean

# Generate documents
$ make doc

# Preview the generated documents
$ make doc-preview

# Profile Merino with Scalene
$ make profile

# Run the Wikipedia CLI job
$ make wikipedia-indexer job=$JOB

Documentation

You can generate documentation, both code level and book level, for Merino and all related components by running ./dev/make-all-docs.sh. You'll need mdbook and mdbook-mermaid, which you can install via:

make doc-install-deps

If you haven't installed Rust and Cargo, you can reference the official Rust document.

Local configuration

The default configuration of Merino is development, which has human-oriented pretty-print logging and debugging enabled. For settings that you wish to change in the development configuration, you have two options, listed below.

For full details, make sure to check out the documentation for Merino's setting system (operations/configs.md).

Update the defaults

Dynaconf is used for all configuration management in Merino; values are specified in .toml files in the merino/configs/ directory. Environment variables are set for each environment as well and can be set when using the CLI to launch the Merino service. Environment variables take precedence over the values set in the .toml files, so any environment variable that is set will automatically override the defaults. By the same token, any config file that is pointed to will override the merino/configs/default.toml file.

If the change you want to make makes the system better for most development tasks, consider adding it to merino/configs/development.toml so that other developers can take advantage of it. If you do so, you likely want to add validation for those settings in merino/config.py, where the Dynaconf instance and its validators live. For examples of the various config settings, look at configs/default.toml and merino/config.py to see the structure.
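
As a rough sketch of what a validated setting can look like with Dynaconf (the actual instance construction and setting names in merino/config.py may differ; my_feature.timeout_ms is hypothetical):

from dynaconf import Dynaconf, Validator

settings = Dynaconf(
    envvar_prefix="MERINO",  # MERINO_* environment variables override file values
    settings_files=["merino/configs/default.toml", "merino/configs/development.toml"],
    environments=True,
    validators=[
        # Hypothetical setting: fall back to a default and require a sane type/range.
        Validator("my_feature.timeout_ms", default=200, is_type_of=int, gte=0),
    ],
)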

It is not advisable to put secrets in configs/secrets.toml.

Create a local override

Dynaconf will use the values specified in the merino/configs/default.toml file along with environment variables. You can change the environment you want to use as mentioned above, but for local changes that adapt Merino to your machine or tastes, you can put the configuration in merino/configs/development.local.toml. This file doesn't exist by default, so you will have to create it. Then simply copy from the other config files and make the adjustments you require. These files should not be checked into source control; they are ignored as long as they follow the *.local.toml naming convention. Please follow this convention, take extra care not to check them in, and only use them locally.

See the Dynaconf Documentation for more details.

Content Moderation and Blocklists

This summarizes the mechanisms that block sensitive or questionable content in Merino. Because Merino supports several providers that have a broad range of potential suggestions, often from different sources, we require the ability to remove certain suggestions from being displayed.

Blocklists in Merino filter content at two distinct phases:

  1. Content that is filtered at the data creation and indexing phase. Provider backends serve suggestions to the client based on matching against searched terms. This ensures that data that could be sensitive is not available to search against since it is not indexed. For instance, the Wikipedia provider filters categories of articles that are tagged with a matching category term in the blocklist.

  2. Content that is filtered at application runtime. There are instances where we want to quickly and dynamically add to block lists without re-indexing or running a job. In this case, suggestions are compared to a static list in the code that blocks out these suggestions.

In the Navigational Suggestions provider, a blocklist is used during data creation to block specific domains of websites that we do not want to suggest.

The blocklist, domain_blocklist.json, is referenced during data generation of the top_picks.json file, which is ingested by the provider backend. This ensures specific domains are not indexed for suggestions. The blocklist is loaded and an exact string comparison is made between all second-level domains and the second-level domains defined in the blocklist.
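
A simplified sketch of that comparison is shown below. The actual loading code and the structure of domain_blocklist.json may differ; this assumes a plain JSON array of second-level domains:

import json


def is_domain_blocked(domain: str, blocklist_path: str = "domain_blocklist.json") -> bool:
    """Return True if the domain's second-level label appears in the blocklist."""
    with open(blocklist_path) as f:
        blocked = {entry.lower() for entry in json.load(f)}
    # "www.example.com" -> "example"; assumes a simple "<label>.<sld>.<tld>" shape.
    second_level = domain.lower().split(".")[-2]
    return second_level in blocked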

See nav-suggestions blocklist runbook for more information.

Wikipedia

The Wikipedia Provider does both title filtering and category filtering at the data indexing level.

Since the indexing jobs run periodically, we also implement title filtering in the provider so that new blocks take effect sooner.

Indexer

The Wikipedia Indexer Job references a remote blocklist which contains sensitive categories. At job runtime, the indexer reads the remote blocklist and creates a set of article categories that are to be excluded from indexing.

The article categories in the blocklist are chosen based on analysis and best guesses of what could be considered objectionable content, in light of Mozilla's values and brand image. Any modifications to the file should be made with careful consideration.

The indexer also blocks titles that are defined in the WIKIPEDIA_TITLE_BLOCKLIST in the application, which is referenced below. Any title that matches this blocklist is excluded from indexing.

Provider

When queried, the Wikipedia provider reads the WIKIPEDIA_TITLE_BLOCKLIST when creating a WikipediaSuggestion and if the query matches a blocked title, the suggestion is not shown to the client.

We have this feature because the indexing job does not run daily. Therefore, we want an option to rapidly add to this list should we need to block a specific article.
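
Conceptually, the runtime check is a simple membership test. A hedged sketch follows; the real WIKIPEDIA_TITLE_BLOCKLIST contents and any normalization live in the application code:

# Hypothetical blocklist contents, for illustration only.
WIKIPEDIA_TITLE_BLOCKLIST = {"Some Blocked Title"}


def is_title_blocked(title: str) -> bool:
    """Return True when a candidate article title matches the blocklist."""
    return title.casefold() in {t.casefold() for t in WIKIPEDIA_TITLE_BLOCKLIST}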

See wikipedia blocklist runbook for more information.

Development Dependencies

Package Dependencies

This project uses Poetry for dependency management. While you can use vanilla virtualenv to set up the dev environment, we highly recommend checking out pyenv and pyenv-virtualenv, as they work nicely with Poetry. Follow the instructions to install pyenv, pyenv-virtualenv, and Poetry.

Feel free to browse the pyproject.toml file for a listing of dependencies and their versions.

Once Poetry is installed, install all the dependencies:

$ poetry install

Add packages to the project via Poetry:

$ poetry add <package_name>

After that you should be able to run Merino as follows:

$ poetry run uvicorn merino.main:app --reload

# Or you can fire up a poetry shell to make it shorter
$ poetry shell
$ uvicorn merino.main:app --reload

Service Dependencies

Merino uses a Redis-based caching system, and so requires a Redis instance to connect to.

To make things simple, Redis (and any future service dependencies) can be started with Docker Compose, using the docker-compose.yaml file in the dev/ directory. Notably, this does not run any Merino components that have source code in this repository.

$ cd dev
$ docker-compose up

# Or run services in daemon mode
$ docker-compose up -d

# Stop it
$ docker-compose down


# Shortcuts are also provided
$ make docker-compose-up
$ make docker-compose-up-daemon
$ make docker-compose-down

Redis is listening on port 6397 and can be connected via redis://localhost:6397.
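
To verify connectivity from Python, a quick check might look like this, assuming the redis-py package is installed and the Docker Compose services above are running:

import redis

client = redis.Redis.from_url("redis://localhost:6397")
print(client.ping())  # True if the dev Redis container is reachable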

This Dockerized setup is optional. Feel free to run the dependent services by any other means as well.

Dev Helpers

The docker-compose setup also includes some services that can help during development.

  • Redis Commander, http://localhost:8081 - Explore the Redis database started above.

Logging and Metrics

To get data out of Merino and into observable systems, we use metrics and logging. Each has a unique use case. Note that, in general, because of the scale we work at, adding a metric or log event in production is not free; if we are careless, it can end up costing quite a bit. Record what is needed, but don't go overboard.

All data collection that happens in production (logging at INFO, WARN, or ERROR levels; and metrics) should be documented in docs/data.md.

Logging

Merino uses MozLog for structured logging. Logs can be recorded through the standard Python logging module. Merino can output logs in various formats, including a JSON format (MozLog) for production. A pretty, human readable format is also provided for development and other use cases.

Types

MozLog requires that all messages have a type value. By convention, we use the name of the Python module where the log record is issued to populate this field. For example:

import logging

logger = logging.getLogger(__name__)

# The `type` field of the log record will be the same as `__name__`.
# Extra fields are passed through the standard `extra` parameter.
extra_fields = {"custom_field": "custom_value"}
logger.info("A new log message", extra=extra_fields)

In general, the log message ("An empty MultiProvider was created") and the log type should both tell the reader what has happened. The difference is that the message is for humans and the type is for machines.

Levels

Python's logging module provides five log levels that should be familiar. This is what we mean by them in Merino:

  • CRITICAL - There was a serious error indicating that the program itself may be unable to continue running.

  • ERROR - There was a problem, and the task was not completable. This usually results in a 500 being sent to the user. All error logs encountered in production are reported to Sentry and should be considered a bug. If it isn't a bug, it shouldn't be logged as an error.

  • WARNING - There was a problem, but the task was able to recover. This doesn't usually affect what the user sees. Warnings are suitable for unexpected but "in-spec" issues, like a sync job returning an empty set or the use of a deprecated function. These are not reported to Sentry.

  • INFO - This is the default level of the production service. Use it to log that something happened that isn't a problem and that we care about in production. This is the level that Merino uses for its one-per-request logs and sync status messages. Be careful adding new per-request logs at this level, as they can be expensive.

  • DEBUG - This is the default level for developers running code locally. Use this to give insight into how the system is working, but keep in mind that this will be on by default, so don't be too noisy. Generally this should summarize what's happening, but not give the small details like a log line for every iteration of a loop. Since this is off in production, there are no cost concerns.

Metrics

Merino metrics are reported as Statsd metrics.

Unlike logging, the primary way that metrics reporting can become costly is cardinality: the number of metric IDs we have and the combinations of tag values that we supply. The number of individual events often doesn't matter as much, since multiple events are aggregated together.

Middleware

Merino leverages middleware for various functionalities such as logging, metrics, geolocation and user-agent parsing, feature flags, etc. Middleware is defined in the merino/middleware directory.

Caveat

We currently don't implement middleware using the middleware facilities provided by FastAPI/Starlette as they've shown significant performance overhead, preventing Merino from achieving the SLOs required by Firefox Suggest.

Until those performance issues are resolved upstream, we implement middleware for Merino directly against the ASGI protocol. You can reference this tutorial to learn more about ASGI, and see Starlette's middleware documentation for more details about how to write pure ASGI middleware. In particular, we can reuse Starlette's data structures (Request, Headers, QueryParams, etc.) to facilitate the implementation.
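
For illustration, a pure ASGI middleware in this style might look like the sketch below; the class name and the detail it records are hypothetical, not Merino's actual middleware:

from starlette.datastructures import Headers
from starlette.types import ASGIApp, Receive, Scope, Send


class UserAgentMiddleware:
    """Parse the User-Agent header and stash it on the ASGI scope."""

    def __init__(self, app: ASGIApp) -> None:
        self.app = app

    async def __call__(self, scope: Scope, receive: Receive, send: Send) -> None:
        if scope["type"] == "http":
            headers = Headers(scope=scope)
            scope["user_agent"] = headers.get("User-Agent", "")
        await self.app(scope, receive, send)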

Feature Flags

Usage

Do you plan to release code behind a feature flag? Great! 😃

Your feature flag needs to be defined first. If it's already defined, go ahead. Otherwise check the configuration section below before you continue.

Use the following line in API endpoint code to gain access to the feature flags object:

feature_flags: FeatureFlags = request.scope[ScopeKey.FEATURE_FLAGS]

Then check whether a certain feature flag, such as example, is enabled by calling:

if feature_flags.is_enabled("example"):
    print("feature flag 'example' is enabled! 🚀")

When you do that, the decision (whether the feature flag is enabled or not) is recorded and stored in a dict on the decisions attribute of the feature flags object.

Implementation

The feature flags system in Merino consists of three components:

  • merino/middleware/featureflags.py - A FastAPI middleware that reads the query parameter sid sent by the client application and sets a session ID for the current request based on it.
  • merino/featureflags.py - A FeatureFlags class which you can use to check if a certain feature flag is enabled.
  • merino/configs/flags/ - A local directory containing static files that define and configure feature flags for Merino.

Configuration

Currently two bucketing schemes are supported: random and session.

Random

Random does what it says on the tin. It generates a random bucketing ID for every flag check.

Session

Session bucketing uses the session ID of the request as the bucketing key so that feature checks within a given search session would be consistent.
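
As a rough illustration of the idea (Merino's actual hashing and bucketing details may differ), a session-keyed check could look like:

import hashlib


def is_enabled_for_session(flag_name: str, enabled: float, session_id: str) -> bool:
    """Map (flag, session) to a stable bucket in [0, 1) and compare it to the enabled rate."""
    digest = hashlib.sha256(f"{flag_name}:{session_id}".encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < enabled

Because the bucket depends only on the flag name and session ID, repeated checks within the same search session return the same decision.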

Fields

Each flag defines the following fields:

[default.flags.<flag_name>]
scheme = 'session'
enabled = 0.5

  • scheme - The bucketing scheme for the flag. Allowed values are 'random' and 'session'.
  • enabled - The % enabled for the flag; must be a float between 0 and 1.

Metrics

When submitting application metrics, feature flag decisions that were made while processing the current request up to this point are automatically added as tags to the emitted metrics.

The format of these tags is:

feature_flag.<feature_flag_name>

For more information about this see the ClientMeta meta class and the add_feature_flags decorator in merino/metrics.py.

Monitoring in Grafana

Because feature flag decisions are automatically added as tags to emitted metrics, you can use them in your queries in Grafana. 📈

For example, if you want to group by decisions for a feature flag with name hello_world, you can use tag(feature_flag.hello_world) in GROUP BY in Grafana. You can also use [[tag_feature_flag.hello_world]] in the ALIAS for panel legends.

The Release Process

This project currently follows a Continuous Deployment process.

Whenever a commit is pushed to this repository's main branch, a CircleCI workflow is triggered which performs code checks and runs automated tests. The workflow additionally builds a new Docker image of the service and pushes that Docker image to the Docker Hub registry (this requires all previous jobs to pass).

Pushing a new Docker image to the Docker Hub registry triggers a webhook that starts the Jenkins deployment pipeline (the Docker image tag determines the target environment). The deployment pipeline first deploys to the stage environment and subsequently to the production environment.

Activity diagram of CircleCI main-workflow

After the deployment is complete, accessing the __version__ endpoint will show the commit hash of the deployed version, which will eventually match that of the latest commit on the main branch (a node with an older version might still serve the request before it is shut down).

Release Best Practices

The expectation is that the author of the change will:

  • merge pull requests during hours when the majority of contributors are online
  • monitor the Merino Application & Infrastructure dashboard for any anomaly

Versioning

The commit hash of the deployed code is considered its version identifier. The commit hash can be retrieved locally via git rev-parse HEAD.

Load Testing

Load testing can be performed either locally or during the deployment process. During deployment, load tests are run against the staging environment before Merino-py is promoted to production.

Load tests in continuous deployment are controlled by adding a specific label to the commit message being deployed. The format for the label is [load test: (abort|skip|warn)]. Typically, this label is added to the merge commit created when a GitHub pull request is integrated.

  • abort: Stops the deployment if the load test fails.
  • skip: Skips load testing entirely during deployment.
  • warn: Proceeds with the deployment even if the load test fails, but sends a warning notification through Slack.

If no label is included in the commit message, the default behavior is to run the load test and issue a warning if it fails.

For more detailed information about load testing procedures and conventions, please refer to the Load Test README.

Logs from load tests executed in continuous deployment are available in the /data volume of the Locust master kubernetes pod.

What to do if production breaks?

If your latest release causes problems and needs to be rolled back: don't panic and follow the instructions in the Rollback Runbook.

What to do if tests fail during deployment?

Please refer to What to do with Test Failures in CI?

Profiling

Since Merino runs as a single-threaded application on an asyncio-based framework, it is useful for engineers to understand how Merino performs, and where it spends time and memory on which tasks, while serving requests. Local profiling offers a way to look into those low-level details.

We use Scalene as the profiler for Merino. It is easy to use and offers extremely detailed (line-level) insights with much lower overhead than other profilers.

Usage

To start profiling, run the following to launch Merino with Scalene:

$ make profile

# or you can run it directly

$ python -m scalene merino/main.py

Then you can send requests to Merino manually or by using load testing tools. Once that's done, terminate the Merino application. Scalene will automatically collect the profiling output (CPU & memory) and open it in your browser.

Understand the outputs

Out of the box, Scalene provides a very intuitive web interface to display the profiling outputs. It's organized at the file (module) level. For each file, it shows the CPU time and average memory usage for both the line profile and the function profile of that module. You can also click on specific columns to sort the lines or functions accordingly.

For more details on how to read the output, refer to Scalene's documentation.

Equipped with those insights, you can build a good understanding of the application and identify hotspots, bottlenecks, or other issues that are not easy to uncover by reading the source code alone. You can then tweak or fix those issues and profile again to verify that the fix works.

Merino Testing

Test Strategy

Merino is tested using a combination of functional and performance tests.

Test code resides in the tests directory.

Merino's test strategy requires that we do not go below a minimum test coverage percentage for unit and integration tests. Load tests cannot go below a minimum performance threshold.

Test documentation resides in the /docs/testing/ directory.

The functional test strategy is four-tiered, composed of:

See documentation and repositories in each given test area for specific details on running and maintaining tests.

Unit Tests

The unit layer is suitable for testing complex behavior at a small scale, with fine-grained control over the inputs. Due to their narrow scope, unit tests are fundamental to thorough test coverage.

To execute unit tests, use: make unit-tests

Unit tests are written and executed with pytest and are located in the tests/unit directory, using the same organizational structure as the source code of the merino service. Type aliases dedicated for test should be stored in the types.py module. The conftest.py modules contain common utilities in fixtures.

For a breakdown of fixtures in use per test, use: make unit-test-fixtures

Fixtures

Available fixtures include:

FilterCaplogFixture

Useful when verifying log messages, this fixture filters log records captured with pytest's caplog by a given logger_name.

Usage:

def test_with_filter_caplog(
    caplog: LogCaptureFixture, filter_caplog: FilterCaplogFixture
) -> None:
    records: list[LogRecord] = filter_caplog(caplog.records, "merino.providers.adm")

Note: This fixture is shared with integration tests.

SuggestionRequestFixture

For use when querying providers, this fixture creates a SuggestionRequest object with a given query

Usage:

async def test_with_suggestion_request(srequest: SuggestionRequestFixture) -> None:
    request: SuggestionRequest = srequest("example")
    result: list[BaseSuggestion] = await provider.query(request)

ScopeFixture, ReceiveMockFixture & SendMockFixture

For use when testing middleware, these fixtures initialize or mock the common Scope, Receive and Send object dependencies.

Usage:

def test_middleware(scope: Scope, receive_mock: Receive, send_mock: Send) -> None:
    pass

Integration Tests

The integration layer of testing allows for verification of interactions between service components, with lower development, maintenance and execution costs compared with higher level tests.

To execute integration tests, make sure you have Docker installed and a docker daemon running. Then use: make integration-tests

Integration tests are located in the tests/integration directory. They use pytest and the FastAPI TestClient to send requests to specific merino endpoints and verify responses as well as other outputs, such as logs. Tests are organized according to the API path under test. Type aliases dedicated for test should be stored in the types.py module. Fake providers created for test should be stored in the fake_providers.py module. The conftest.py modules contain common utilities in fixtures.

We have also added integration tests that use Docker via the testcontainers library. See fixture example below.

For a breakdown of fixtures in use per test, use: make integration-test-fixtures

Fixtures

Available fixtures include:

FilterCaplogFixture

Details available in Unit Tests section

TestClientFixture

This fixture creates an instance of the TestClient to be used in testing API calls.

Usage:

def test_with_test_client(client: TestClient):
    response: Response = client.get("/api/v1/endpoint")

TestClientWithEventsFixture

This fixture creates an instance of the TestClient, that will trigger event handlers (i.e. startup and shutdown) to be used in testing API calls.

Usage:

def test_with_test_client_with_event(client_with_events: TestClient):
    response: Response = client_with_events.get("/api/v1/endpoint")

RequestSummaryLogDataFixture

This fixture will extract the extra log data from a captured 'request.summary' LogRecord for verification

Usage:

def test_with_log_data(
    caplog: LogCaptureFixture,
    filter_caplog: FilterCaplogFixture,
    extract_request_summary_log_data: LogDataFixture
):
    records: list[LogRecord] = filter_caplog(caplog.records, "request.summary")
    assert len(records) == 1

    record: LogRecord = records[0]
    log_data: dict[str, Any] = extract_request_summary_log_data(record)
    assert log_data == expected_log_data

InjectProvidersFixture & ProvidersFixture

These fixtures set up and tear down the given providers.

Usage:

If specifying providers for a module:

@pytest.fixture(name="providers")
def fixture_providers() -> Providers:
    return {"test-provider": TestProvider()}

If specifying providers for a test:

@pytest.mark.parametrize("providers", [{"test-provider": TestProvider()}])
def test_with_provider() -> None:
    pass

SetupProvidersFixture

This fixture sets application provider dependency overrides.

Usage:

def test_with_setup_providers(setup_providers: SetupProvidersFixture):
    providers: dict[str, BaseProvider] = {"test-provider": TestProvider()}
    setup_providers(providers)

TeardownProvidersFixture

This fixture resets application provider dependency overrides and is often used in teardown fixtures.

Usage:

@pytest.fixture(autouse=True)
def teardown(teardown_providers: TeardownProvidersFixture):
    yield
    teardown_providers()

TestcontainersFixture

See tests/integration/jobs/navigational_suggestions/test_domain_metadata_uploader.py for a detailed example.

This is a lightweight example of how to set up a Docker container for your integration tests.

Usage:

@pytest.fixture(scope="module")
def your_docker_container() -> DockerContainer:
    os.environ.setdefault("STORAGE_EMULATOR_HOST", "http://localhost:4443")

    container = (
        DockerContainer("your-docker-image")
        .with_command("-scheme http")
        .with_bind_ports(4443, 4443)
    ).start()

    # wait for the container to start and emit logs
    delay = wait_for_logs(container, "server started at")
    port = container.get_exposed_port(4443)

    yield container

    container.stop()

Merino Load (Locust) Tests

This documentation describes the load tests for Merino. This test framework uses IP2Location LITE data available from https://lite.ip2location.com

Overview

The tests in the tests/load directory spawn multiple HTTP clients that consume Merino's API, in order to simulate real-world load on the Merino infrastructure. These tests use the Locust framework and are triggered at the discretion of the Merino Engineering Team.

Local Execution

Note that if you make changes to the load test code, you must stop and remove the Docker containers and networks for the changes to take effect. Do this by running make load-tests-clean.

Follow the steps below to execute the load tests locally:

Setup Environment

1. Configure Environment Variables

The following environment variables, as well as Locust environment variables, can be set in tests/load/docker-compose.yml. Make sure any required API key is added, but do not check it into source control.

WARNING: if the WIKIPEDIA__ES_API_KEY is missing, the load tests will not execute.

  • LOAD_TESTS__LOGGING_LEVEL (master & worker) - Level for the logger in the load tests as an int (10 for DEBUG, 20 for INFO, etc.)
  • MERINO_REMOTE_SETTINGS__SERVER (master & worker) - Server URL of the Kinto instance containing suggestions
  • MERINO_REMOTE_SETTINGS__BUCKET (master & worker) - Kinto bucket with the suggestions
  • MERINO_REMOTE_SETTINGS__COLLECTION (master & worker) - Kinto collection with the suggestions
  • MERINO_PROVIDERS__TOP_PICKS__TOP_PICKS_FILE_PATH (master & worker) - File path to the json file of domains
  • MERINO_PROVIDERS__TOP_PICKS__QUERY_CHAR_LIMIT (master & worker) - The minimum character limit set for short domain suggestion indexing
  • MERINO_PROVIDERS__TOP_PICKS__FIREFOX_CHAR_LIMIT (master & worker) - The minimum character limit set for long domain suggestion indexing
  • MERINO_PROVIDERS__WIKIPEDIA__ES_API_KEY (master & worker) - The base64 key used to authenticate on the Elasticsearch cluster specified by es_cloud_id
  • MERINO_PROVIDERS__WIKIPEDIA__ES_URL (master & worker) - The Cloud ID of the Elasticsearch cluster
  • MERINO_PROVIDERS__WIKIPEDIA__ES_INDEX (master & worker) - The index identifier of Wikipedia in Elasticsearch

2. Host Locust via Docker

Execute the following from the repository root:

make load-tests

3. (Optional) Host Merino Locally

Use one of the following commands to host Merino locally. Execute the following from the repository root:

  • Option 1: Use the local development instance
    make dev
    
  • Option 2: Use the profiler instance
    make profile
    
  • Option 3: Use the Docker instance
    make docker-build && docker run -p 8000:8000 app:build
    

Run Test Session

1. Start Load Test

  • In a browser navigate to http://localhost:8089/
  • Set up the load test parameters:
    • Option 1: Select the MerinoSmokeLoadTestShape or MerinoAverageLoadTestShape
      • These options have pre-defined settings
    • Option 2: Select the Default load test shape with the following recommended settings:
      • Number of users: 25
      • Spawn rate: 1
      • Host: 'https://stagepy.merino.nonprod.cloudops.mozgcp.net'
        • Set host to 'http://host.docker.internal:8000' to test against a local instance of Merino
      • Duration (Optional): 10m
  • Select "Start Swarming"

2. Stop Load Test

Select the 'Stop' button in the top right hand corner of the Locust UI, after the desired test duration has elapsed. If the 'Run time' is set in step 1, the load test will stop automatically.

3. Analyse Results

Clean-up Environment

1. Remove Load Test Docker Containers

Execute the following from the repository root:

make load-tests-clean

Distributed GCP Execution - Manual Trigger

Follow the steps below to execute the distributed load tests on GCP with a manual trigger:

Setup Environment

1. Start a GCP Cloud Shell

The load tests can be executed from the contextual-services-test-eng cloud shell.

2. Configure the Bash Script

  • The setup_k8s.sh file, located in the tests/load directory, contains shell commands to create a GKE cluster, set up an existing GKE cluster, or delete a GKE cluster
    • Modify the script to include the MERINO_PROVIDERS__WIKIPEDIA__ES_API_KEY environment variables
    • Execute the following from the root directory, to make the file executable:
      chmod +x tests/load/setup_k8s.sh
      

3. Create the GCP Cluster

  • Execute the setup_k8s.sh file and select the create option, in order to initiate the process of creating a cluster, setting up the env variables and building the docker image. Choose smoke or average depending on the type of load test required.
    ./tests/load/setup_k8s.sh create [smoke|average]
    
    • Smoke - The smoke load test verifies the system's performance under minimal load. The test is run for a short period, possibly in CD, to ensure the system is working correctly.
    • Average - The average load test measures the system's performance under standard operational conditions. The test is meant to reflect an ordinary day in production.
  • The cluster creation process will take some time. It is considered complete once an external IP is assigned to the locust-master node. Monitor the assignment via a watch loop:
    kubectl get svc locust-master --watch
    
  • The number of workers is defaulted to 5, but can be modified with the kubectl scale command. Example (10 workers):
    kubectl scale deployment/locust-worker --replicas=10
    
  • To apply new changes to an existing GCP Cluster, execute the setup_k8s.sh file and select the setup option.
    • This option will consider the local commit history, creating new containers and deploying them (see Artifact Registry)

Run Test Session

1. Start Load Test

  • In a browser navigate to http://$EXTERNAL_IP:8089

    This URL can be generated via the following command:

    EXTERNAL_IP=$(kubectl get svc locust-master -o jsonpath="{.status.loadBalancer.ingress[0].ip}")
    echo http://$EXTERNAL_IP:8089
    
  • Select the MerinoSmokeLoadTestShape; this option has pre-defined settings and will last 5 minutes

  • Select "Start Swarming"

2. Stop Load Test

Select the 'Stop' button in the top right hand corner of the Locust UI, after the desired test duration has elapsed. If the 'Run time' is set in step 1, the load test will stop automatically.

3. Analyse Results

RPS

  • The requests-per-second (RPS) load target for Merino is 1500
  • Locust reports client-side RPS via the "merino_stats.csv" file and the UI (under the "Statistics" tab or the "Charts" tab)
  • Grafana reports the server-side RPS via the "HTTP requests per second per country" chart

HTTP Request Failures

  • The number of responses with errors (5xx response codes) should be 0
  • Locust reports Failures via the "merino_failures.csv" file and the UI (under the "Failures" tab or the "Charts" tab)
  • Grafana reports Failures via the "HTTP Response codes" chart and the "HTTP 5xx error rate" chart

Exceptions

  • The number of exceptions raised by the test framework should be 0
  • Locust reports Exceptions via the "merino_exceptions.csv" file and the UI (under the "Exceptions" tab)

Latency

  • The HTTP client-side response time (aka request duration) for 95 percent of requests is required to be 200ms or less (p95 <= 200ms), excluding weather requests
  • Locust reports client-side latency via the "merino_stats.csv" file and the UI (under the "Statistics" tab or the "Charts" tab)
    • Warning! A Locust worker with too many users will bottleneck RPS and inflate client-side latency measures. Locust reports worker CPU and memory usage metrics via the UI (under the "Workers" tab)
  • Grafana reports server-side latency via the "p95 latency" chart

Resource Consumption

  • To conserve costs, resource allocation must be kept to a minimum. It is expected that container, CPU and memory usage should trend consistently between load test runs.
  • Grafana reports metrics on resources via the "Container Count", "CPU usage time sum" and "Memory usage sum" charts

4. Report Results

  • Results should be recorded in the Merino Load Test Spreadsheet
  • Optionally, the Locust reports can be saved and linked in the spreadsheet:
    • Download the results via the Locust UI or via command:
      kubectl cp <master-pod-name>:/home/locust/merino_stats.csv merino_stats.csv
      kubectl cp <master-pod-name>:/home/locust/merino_exceptions.csv merino_exceptions.csv
      kubectl cp <master-pod-name>:/home/locust/merino_failures.csv merino_failures.csv
      
      The master-pod-name can be found at the top of the pod list:
      kubectl get pods -o wide
      
    • Upload the files to the ConServ drive and record the links in the spreadsheet

Clean-up Environment

1. Delete the GCP Cluster

Execute the setup_k8s.sh file and select the delete option

./tests/load/setup_k8s.sh

Distributed GCP Execution - CI Trigger

The load tests are triggered in CI via Jenkins, which has a command overriding the load test Dockerfile entrypoint.

Follow the steps below to execute the distributed load tests on GCP with a CI trigger:

Run Test Session

1. Execute Load Test

To modify the load testing behavior, you must include a label in your Git commit. This must be the merge commit on the main branch, since only the most recent commit is checked for the label. The label format is: [load test: (abort|skip|warn)]. Take careful note of correct syntax and spacing within the label. There are three options for load tests: abort, skip, and warn:

  • The abort label will prevent a prod deployment if the load test fails
    Ex. feat: Add feature ABC [load test: abort].
  • The skip label will bypass load testing entirely during deployment
    Ex. feat: Add feature LMN [load test: skip].
  • The warn label will output a Slack warning if the load test fails but still allow for the production deployment
    Ex. feat: Add feature XYZ [load test: warn].

If no label is included in the commit message, the load test will be executed with the warn action.

The commit label signals load test instructions to Jenkins by modifying the Docker image tag. The Jenkins deployment workflow first deploys to stage and then runs load tests if requested. The Docker image tag passed to Jenkins matches the following pattern: ^(?P<environment>stage|prod)(?:-(?P<task>\w+)-(?P<action>abort|skip|warn))?-(?P<commit>[a-z0-9]+)$
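For illustration, here is a minimal sketch (using a hypothetical image tag) of how that pattern parses a tag with Python's re module:

import re

# Pattern copied from the Jenkins workflow description above.
TAG_PATTERN = re.compile(
    r"^(?P<environment>stage|prod)"
    r"(?:-(?P<task>\w+)-(?P<action>abort|skip|warn))?"
    r"-(?P<commit>[a-z0-9]+)$"
)

# Hypothetical tag for a stage deployment with a "warn" load test action.
match = TAG_PATTERN.match("stage-loadtest-warn-abc123")
if match:
    print(match.groupdict())
    # {'environment': 'stage', 'task': 'loadtest', 'action': 'warn', 'commit': 'abc123'}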

2. Analyse Results

See Distributed GCP Execution (Manual Trigger) - Analyse Results

3. Report Results

  • Optionally, results can be recorded in the Merino Load Test Spreadsheet. It is recommended to do so if unusual behavior is observed during load test execution or if the load tests fail.
  • The Locust reports can be saved and linked in the spreadsheet. The results are persisted in the /data directory of the locust-master-0 pod in the locust-master k8s cluster in the GCP project of merino-nonprod. To access the Locust logs:
    • Open a cloud shell in the Merino stage environment
    • Authenticate by executing the following command:
        gcloud container clusters get-credentials merino-nonprod-v1 \
          --region us-west1 --project moz-fx-merino-nonprod-ee93
      
    • Identify the log files in the Kubernetes pod by executing the following command, which lists the log files along with the file creation timestamps from when the tests were performed. The {run-id} uniquely identifies each load test run:
        kubectl exec -n locust-merino locust-master-0 -- ls -al /data/
      
    • Download the results via the Locust UI or via command:
      kubectl -n locust-merino cp locust-master-0:/data/{run-id}-merino_stats.csv merino_stats.csv
      kubectl -n locust-merino cp locust-master-0:/data/{run-id}-merino_exceptions.csv merino_exceptions.csv
      kubectl -n locust-merino cp locust-master-0:/data/{run-id}-merino_failures.csv merino_failures.csv
      
    • Upload the files to the ConServ drive and record the links in the spreadsheet

Calibration

Following the addition of new features, such as a Locust Task or Locust User, or environmental changes, such as a change in node size or the upgrade of a major dependency like the Python base image, it may be necessary to re-establish the recommended parameters of the performance test.

  • WAIT TIME: Changing this cadence will increase or decrease the number of requests sent by a MerinoUser. The default is currently in use for the MerinoUser class.
  • TASK WEIGHT: Changing this weight impacts the probability of a task being chosen for execution. This value is hardcoded in the task decorators of the MerinoUser class.
  • USERS_PER_WORKER: This value should be set to the maximum number of users a Locust worker can support given CPU and memory constraints. This value is hardcoded in the LoadTestShape classes.
  • WORKER_COUNT: This value is derived by dividing the total number of users needed for the performance test by USERS_PER_WORKER. This value is hardcoded in the LoadTestShape classes and the setup_k8s.sh script.
  • Locust documentation is available for [WAIT TIME][13] and [TASK WEIGHT][14]

Calibrating for USERS_PER_WORKER

This process is used to determine the number of users that a Locust worker can support.

Setup Environment

1. Start a GCP Cloud Shell

The load tests can be executed from the contextual-services-test-eng cloud shell. If executing a load test for the first time, the merino-py Git repository will need to be cloned locally.

2. Configure the Bash Script

  • The setup_k8s.sh file, located in the tests/load directory, contains shell commands to create a GKE cluster, set up an existing GKE cluster, or delete a GKE cluster
    • Execute the following from the root directory, to make the file executable:
      chmod +x tests/load/setup_k8s.sh
      

3. Create the GCP Cluster

  • In the setup_k8s.sh script, modify the WORKER_COUNT variable to equal 1
  • Execute the setup_k8s.sh file from the root directory and select the create option to create the cluster, set up the environment variables, and build the Docker image. Choose smoke or average depending on the type of load test required.
    ./tests/load/setup_k8s.sh create [smoke|average]
    
  • The cluster creation process will take some time. It is considered complete once an external IP is assigned to the locust-master service. Monitor the assignment via a watch loop:
    kubectl get svc locust-master --watch
    

Calibrate

Repeat steps 1 to 3, using a process of elimination, such as the bisection method, to determine the maximum USERS_PER_WORKER, that is, the maximum user count a node can accommodate. Do this by observing CPU and memory usage while steadily increasing or decreasing the user count. The load tests are considered optimized when CPU and memory resources are maximally utilized. You can monitor the CPU percentage in the Locust UI as well as in the Kubernetes Engine Workloads tab, where both memory and CPU are visualized on charts.

1. Start Load Test

  • In a browser, navigate to http://$EXTERNAL_IP:8089. This URL can be generated via the following commands:
    EXTERNAL_IP=$(kubectl get svc locust-master -o jsonpath="{.status.loadBalancer.ingress[0].ip}")
    echo http://$EXTERNAL_IP:8089
    
  • Set up the load test parameters:
    • ShapeClass: Default
    • UserClasses: MerinoUser
    • Number of users: USERS_PER_WORKER (consult the Merino Load Test Spreadsheet to determine a starting point)
    • Ramp up: RAMP_UP (RAMP_UP = 5/USERS_PER_WORKER)
    • Host: 'https://stagepy.merino.nonprod.cloudops.mozgcp.net'
    • Duration (Optional): 600s
  • Select "Start Swarm"

2. Stop Load Test

After the desired test duration has elapsed, select the 'Stop' button in the top right-hand corner of the Locust UI. If the 'Run time' or 'Duration' is set in step 1, the load test will stop automatically.

3. Analyse Results

CPU and Memory Resource Graphs

Log Errors or Warnings

  • Locust will emit errors or warnings if high CPU or memory usage occurs during the execution of a load test. The presence of these logs is a strong indication that the USERS_PER_WORKER count is too high

4. Report Results

See Distributed GCP Execution (Manual Trigger) - Report Results

5. Update Shape and Script Values

  • WORKER_COUNT = MAX_USERS/USERS_PER_WORKER (a worked example follows this list)
    • If MAX_USERS is unknown, calibrate for WORKER_COUNT directly (see Calibrating for WORKER_COUNT below)
  • Update the USERS_PER_WORKER and WORKER_COUNT values in the following files:
    • tests/load/locustfiles/smoke_load.py or tests/load/locustfiles/average_load.py
    • tests/load/setup_k8s.sh
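As a worked example with hypothetical numbers: if the performance test requires MAX_USERS = 6000 and calibration determined USERS_PER_WORKER = 300, then WORKER_COUNT = 6000 / 300 = 20.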

Clean-up Environment

See Distributed GCP Execution (Manual Trigger) - Clean-up Environment

Calibrating for WORKER_COUNT

This process is used to determine the number of Locust workers required in order to generate sufficient load for a test given a SHAPE_CLASS.

Setup Environment

Calibrate

Repeat steps 1 to 4, using a process of elimination, such as the bisection method, to determine the maximum WORKER_COUNT. The tests are considered optimized when they generate the minimum load required to cause node scaling in the Merino-py Stage environment. You can monitor the Merino-py pod counts on Grafana.

1. Update Shape and Script Values

  • Update the WORKER_COUNT values in the following files:
    • tests/load/locustfiles/smoke_load.py or tests/load/locustfiles/average_load.py
    • tests/load/setup_k8s.sh
  • Using Git, commit the changes locally

2. Start Load Test

  • In a browser, navigate to http://$EXTERNAL_IP:8089. This URL can be generated via the following commands:
    EXTERNAL_IP=$(kubectl get svc locust-master -o jsonpath="{.status.loadBalancer.ingress[0].ip}")
    echo http://$EXTERNAL_IP:8089
    
  • Set up the load test parameters:
    • ShapeClass: SHAPE_CLASS
    • Host: 'https://stagepy.merino.nonprod.cloudops.mozgcp.net'
  • Select "Start Swarm"

3. Stop Load Test

After the desired test duration has elapsed, select the 'Stop' button in the top right-hand corner of the Locust UI. If the 'Run time', 'Duration', or 'ShapeClass' is set in step 2, the load test will stop automatically.

4. Analyse Results

Stage Environment Pod Counts

  • The 'Merino-py Pod Count' should demonstrate scaling during the execution of the load test
    • The pod counts can be observed in Grafana

CPU and Memory Resources

5. Report Results

Clean-up Environment

Maintenance

The load test maintenance cadence is once a quarter and should include updating the following:

  1. poetry version and python dependencies
  2. Docker artifacts
  3. Distributed GCP execution scripts and Kubernetes configurations
  4. Documentation

Operations

This is where we put operational documentation for Merino.

How to Rollback Changes

Note: We use a "roll-forward" strategy for rolling back changes in production.

  1. Depending on the severity of the problem, decide if this warrants kicking off an incident.
  2. Identify the problematic commit (it may not be the latest commit). If it is the latest commit, you can revert it with:
    git revert HEAD
    
  3. Create a revert PR and go through the normal review process to merge it.

Navigational Suggestions Job Blocklist

The Navigational Suggestions Job blocklist is contained in merino/utils/blocklists.py. The TOP_PICKS_BLOCKLIST variable is used when running the indexing job and prevents the included domains from being added.

Add to Blocklist

  1. Go to merino/utils/blocklists.py.
  2. Add the second-level domain to the TOP_PICKS_BLOCKLIST set (see the sketch below).
  3. Open a PR and merge in the changes to block this domain from being indexed.
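For illustration, a minimal sketch of the edit in merino/utils/blocklists.py (the entries shown here are hypothetical, not the real contents of the set):

TOP_PICKS_BLOCKLIST: set[str] = {
    "exampledomain",    # hypothetical existing entry
    "domaintoblock",    # newly added second-level domain to block
}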

Remove from Blocklist

Repeat the steps above, but remove the domain from the TOP_PICKS_BLOCKLIST set.

  • Note: a domain on the blocklist was likely excluded during the most recent Airflow job, so if you wish to see it re-added, supposing it is still in the top 1000 domains, you have to re-run the Airflow job. See the instructions for this in the jobs/navigational_suggestions docs.

How to Add to the Wikipedia Indexer and Provider Blocklist

Provider - Rapid Blocklist Addition

These steps define how to rapidly add and therefore block a Wikipedia article by its title.

  1. In /merino/utils/blocklists.py, add the matching title to TITLE_BLOCK_LIST (see the sketch after these steps).

NOTE: Enter the title as it appears in Wikipedia, with correct spacing between the words. Membership checks of the block list are not case sensitive, and any underscores in the title should instead be spaces.

  2. Check in the changes to source control, merge a pull request with the new block list, and deploy Merino.
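For illustration, a minimal sketch of the addition in /merino/utils/blocklists.py, using the list variable named in step 1 (the titles shown are hypothetical); note that the title is written with spaces, not underscores:

TITLE_BLOCK_LIST: set[str] = {
    "Some Existing Title",      # hypothetical existing entry
    "Newly Blocked Article",    # added as it appears in Wikipedia, with spaces
}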

Indexer Job

Since the indexer runs at a regular cadence, you do not need to re-run the Airflow job. Adding to the blocklist using the steps above is sufficient to rapidly block a title. The next time the Wikipedia indexer job runs, this title will be excluded during the indexer job.

NOTE: There are two blocklists referenced by the Wikipedia Indexer Job:

  1. blocklist_file_url: a key contained in the merino/configs/default.toml file that points to a remote block list which encapsulates blocked categories.
  2. WIKIPEDIA_TITLE_BLOCKLIST: an application-level list of titles found at /merino/utils/blocklists.py as explained above.

What to do with test failures in CI?

  1. Investigate the cause of the test failure

  2. Fix or mitigate the failure

    • If a fix can be identified in a relatively short time, then submit a fix
    • If the failure is caused by a flaky or intermittent functional test and the risk to the end-user experience is low, then the test can be "skipped" using the pytest.mark.xfail decorator during continued investigation. Example:
      @pytest.mark.xfail(reason="Test Flake Detected (ref: DISCO-####)")
      
  3. Re-Deploy

    • A fix or mitigation will most likely require a PR merge to the main branch that will automatically trigger the deployment process. If this is not possible, a re-deployment can be initiated manually by triggering the CI pipeline in CircleCI.

Configuring Merino (Operations)

To manage configurations and view all documentation for individual config values, please view the default.toml file.

Settings

Merino's settings are managed via Dynaconf and can be specified in two ways:

  1. a TOML file in the merino/configs/ directory.
  2. via environment variables. Environment variables take precedence over the values set in the TOML files. Production environment variables are managed by SRE and defined in the relevant merino-py repo. TOML files set with the same environment name that is currently activated also automatically override defaults. Any config file that is pointed to will override the merino/configs/default.toml file.

File organization

These are the settings sources, with later sources overriding earlier ones.

  • A config.py file establishes a Dynaconf instance and environment-specific values are pulled in from the corresponding TOML files and environment variables. Other configurations are established by files that are prefixed with config_*.py, such as config_sentry.py or config_logging.py.

  • Per-environment configuration files are in the configs directory. The environment is selected using the environment variable MERINO_ENV. The settings for that environment are then loaded from configs/${env}.toml, if the file/env exists. The default environment is "development". A "production" environment is also provided.

  • Local configuration files are not checked into the repository, but if created should be named configs/development.local.toml, following the format of <environment>.local.toml. This file is listed in the .gitignore file and is safe to use for local configuration. One may add secrets here if desired, though it is advised to exercise great caution.

General

  • All environment variables are prefixed with MERINO_. This is established in the config.py file by setting envvar_prefix="MERINO" for the Dynaconf instance. The first level following MERINO_ is accessed with a single underscore _ and any subsequent levels require two underscores __. For example, the logging format can be controlled from the environment variable MERINO_LOGGING__FORMAT.

  • Production environment variables are set by SRE and stored in the cloudops project in the configmap.yml file. Contact SRE if you require information or access on this file, or request access to the cloudops infra repo.

  • You can set configuration values in your local setup by modifying the .toml files. Alternatively, when using make, you can prefix the make command with overrides for the desired environment variables:

    Example: MERINO_ENV=production MERINO_LOGGING__FORMAT=pretty make dev

  • env (MERINO_ENV) - Only settable from environment variables. Controls which environment configuration is loaded, as described above.

  • debug (MERINO_DEBUG) - Boolean that enables additional features to debug the application. This should not be set to true in public environments, as it reveals all configuration, including any configured secrets.

  • format (MERINO_LOGGING__FORMAT) - Controls the format of outputted logs in either pretty or mozlog format. See config_logging.py.

Caveat

Be extra careful whenever you need to reference deeply nested settings (e.g. settings.foo.bar.baz) in the hot paths of the code base, such as middlewares or route handlers. Under the hood, Dynaconf will perform a dictionary lookup for each level of the configuration hierarchy. While it's harmless to do those lookups once or twice, the overhead is surprisingly high when they are repeated in hot paths. You can cache those settings somewhere to mitigate this issue, as sketched below.
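As a minimal sketch (the import path and names below are assumptions, not Merino's actual code), resolve the nested value once at import or provider-initialization time rather than on every request:

from merino.config import settings  # assumed location of the Dynaconf instance

# Hypothetical nested setting, looked up once instead of per request.
CACHED_BAZ = settings.foo.bar.baz

async def hot_path_handler(query: str) -> str:
    # Uses the cached value; avoids repeated Dynaconf dictionary lookups.
    return f"{query}:{CACHED_BAZ}"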

Elasticsearch Operations

We use Elasticsearch as a source of data for one of our providers. This page documents some of the commands that we want to run on the cluster.

Elasticsearch Index Policy

We want to ensure that indexes expire after 30 days, so we need to add a lifecycle policy for this deletion to happen.

The command to run in Kibana to add this policy:

PUT _ilm/policy/enwiki_policy
{
  "policy": {
    "phases": {
      "delete": {
        "min_age": "30d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}

Closed Index Recovery

The indexing job currently closes the index after it migrates the alias to point to the new index. Closing the index removes the ability to query from the index but also reduces the heap memory usage when the index is not actively being queried.

If there is a situation where we need to recover a closed index to be the main index, we will need to do the following (example Kibana commands are sketched after the steps):

  1. Re-open the index
  2. Point the index alias to the recovered index
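For example, assuming a hypothetical index named enwiki-v2 behind an alias named enwiki, the Kibana commands might look like:

POST enwiki-v2/_open

POST _aliases
{
  "actions": [
    { "add": { "index": "enwiki-v2", "alias": "enwiki" } }
  ]
}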

Jobs

Merino Jobs Operations

This document provides instructions and documentation on the navigational suggestions job. The job indexes a collection of the top 1000 searched domains and generates the top_picks.json file, which is ingested by the Top Picks/Navigational Suggestions provider. The provider backend can then serve suggestions that match query terms searched by the client to second-level domains.

If you need to run the navigational suggestions job ad hoc, the quickest recommended solution is to run it in Airflow, download the newly generated top_picks.json file sent by email, and then merge that file into the Merino repo.

If needing to update the blocklist to avoid certain domains and suggestions from being displayed, please see the navigational suggestions blocklist runbook.

Running the job in Airflow

Normally, the job is set as a cron to run at set intervals as a DAG in Airflow. There may be instances you need to manually re-run the job from the Airflow dashboard.

Grid View Tab (Airflow UI)

  1. Visit the Airflow dashboard for merino_jobs.
  2. In the Grid View Tab, select the task you want to re-run.
  3. Click on 'Clear Task' and the executor will re-run the job (screenshot: merino_jobs UI Diagram).

Graph View Tab (Airflow UI) - Alternative

  1. Visit the Airflow dashboard for merino_jobs.
  2. From the Graph View Tab, click on the nav_suggestions_prepare_domain_metadata_prod task (screenshot: merino_jobs Nav Suggest Graph View).
  3. Click on 'Clear' and the job will re-run (screenshot: merino_jobs UI Task Instance Clear).

At the conclusion of the job, you should receive an email with a link to the newly generated file. Ensure you are a member of the disco-team email distro group to receive the email.

Note: You can also re-run the stage job, but the changes won't be reflected in production. In the event of an error, stage should be re-run before running in prod to verify the correction.

See Airflow's documentation on re-running DAGs for more information and implementation details.

To see the code for the merino_jobs DAG, visit the telemetry-airflow repo. The source for the job is also in the 'code' tab in the airflow console.

To see the navigational suggestions code that is run when the job is invoked, visit Merino jobs/navigational_suggestions.

Merino Jobs Operations

Dynamic Wikipedia Indexer Job

Merino currently builds the Elasticsearch indexing job that runs in Airflow. Airflow takes the latest image built as the base image. The reasons to keep the job code close to the application code are:

  1. Data models can be shared between the indexing job and application more easily. This means that data migrations will be simpler.
  2. All the logic regarding Merino functionality can be found in one place.
  3. Eliminates unintended differences in functionality due to dependency mismatch.

If your reason for re-running the job is needing to update the blocklist to avoid certain suggestions from being displayed, please see the wikipedia blocklist runbook.

Running the job in Airflow

Normally, the job is set as a cron to run at set intervals as a DAG in Airflow. There may be instances you need to manually re-run the job from the Airflow dashboard.

Grid View Tab (Airflow UI)

  1. Visit the Airflow dashboard for merino_jobs.
  2. In the Grid View Tab, select the task you want to re-run.
  3. Click on 'Clear Task' and the executor will re-run the job (screenshot: merino_jobs UI Diagram).

Graph View Tab (Airflow UI) - Alternative

  1. Visit the Airflow dashboard for merino_jobs.
  2. From the Graph View Tab, click on the wikipedia_indexer_build_index_production task (screenshot: merino_jobs Wikipedia Indexer Graph View).
  3. Click on 'Clear' and the job will re-run (screenshot: merino_jobs UI Task Instance Clear).

Note: You can also re-run the stage job, but the changes won't be reflected in production. In the event of an error, stage should be re-run before running in prod to verify the correction.

See Airflow's documentation on re-running DAGs for more information and implementation details.

To see the code for the merino_jobs DAG, visit the telemetry-airflow repo. The source for the job is also in the 'code' tab in the airflow console.

To see the Wikipedia Indexer code that is run when the job is invoked, visit Merino jobs/wikipedia_indexer.

Merino Jobs Operations

CSV Remote Settings Uploader Job

The CSV remote settings uploader is a job that uploads suggestions data in a CSV file to remote settings. It takes two inputs:

  1. A CSV file. The first row in the file is assumed to be a header that names the fields (columns) in the data.
  2. A Python module that validates the CSV contents and describes how to convert it into suggestions JSON.

If you're uploading suggestions from a Google sheet, you can export a CSV file from File > Download > Comma Separated Values (.csv). Make sure the first row in the sheet is a header that names the columns.
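For example, a hypothetical CSV file for a "foo" suggestion type might look like this, with the header row naming the columns:

Title,URL,Keywords
Example One,https://example.com/one,"keyword one,keyword two"
Example Two,https://example.com/two,"keyword three,keyword four"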

Uploading suggestions (Step by step)

If you're uploading a type of suggestion that the uploader already supports, skip to Running the uploader below. If you're not sure whether it's supported, check in the merino/jobs/csv_rs_uploader/ directory for a file named similarly to the type.

To upload a new type of suggestion, follow the steps below. In summary, first you'll create a Python module that implements a model for the suggestion type, and then you'll run the uploader.

1. Create a Python model module for the new suggestion type

Add a Python module to merino/jobs/csv_rs_uploader/. It's probably easiest to copy an existing model module like mdn.py, follow along with the steps here, and modify it for the new suggestion type. Name the file according to the suggestion type.

This file will define the model of the new suggestion type as it will be serialized in the output JSON, perform validation and conversion of the input data in the CSV, and define how the input data should map to the output JSON.

2. Add the Suggestion class

In the module, implement a class called Suggestion that derives from BaseSuggestion in merino.jobs.csv_rs_uploader.base or from RowMajorBaseSuggestion in merino.jobs.csv_rs_uploader.row_major_base. This class will be the model of the new suggestion type. BaseSuggestion itself derives from Pydantic's BaseModel, so the validation the class performs will be based on Pydantic, which is used throughout Merino. BaseSuggestion is implemented in base.py. If the CSV data is row-major, please use RowMajorBaseSuggestion.

3. Add suggestion fields to the class

Add a field to the class for each property that should appear in the output JSON (except score, which the uploader will add automatically). Name each field as you would like it to be named in the JSON. Give each field a type so that Pydantic can validate it. For URL fields, use HttpUrl from the pydantic module.

4. Add validator methods to the class

Add a method annotated with Pydantic's @field_validator decorator for each field. Each validator method should transform its field's input value into an appropriate output value and raise a ValueError if the input value is invalid. Pydantic will call these methods automatically as it performs validation. Their return values will be used as the values in the output JSON.

BaseSuggestion implements two helpers you should use:

  • _validate_str() - Validates a string value. Leading and trailing whitespace is stripped, and all whitespace is replaced with spaces and collapsed. Returns the validated value.
  • _validate_keywords() - The uploader assumes that lists of keywords are serialized in the input data as comma-delimited strings. This helper method takes a comma-delimited string and splits it into individual keyword strings. Each keyword is converted to lowercase, some non-ASCII characters are replaced with ASCII equivalents that users are more likely to type, leading and trailing whitespace is stripped, all whitespace is replaced with spaces and collapsed, and duplicate keywords are removed. Returns the list of keyword strings.

5. Implement the class methods

For suggestions created from row-major CSV data, add a @classmethod to Suggestion called row_major_field_map(). It should return a dict that maps field (column) names in the input CSV to property names in the output JSON. Otherwise, add a @classmethod to Suggestion called csv_to_suggestions(). It should return an array of suggestions created from the passed CSV reader. A sketch combining steps 2 through 5 follows.
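Here is a minimal sketch of a hypothetical foo.py model module for row-major CSV data. The field names, CSV column names, and exact helper signatures are assumptions; consult an existing module such as mdn.py for the real conventions.

from pydantic import HttpUrl, field_validator

from merino.jobs.csv_rs_uploader.row_major_base import RowMajorBaseSuggestion


class Suggestion(RowMajorBaseSuggestion):
    """Hypothetical model for the "foo" suggestion type."""

    title: str
    url: HttpUrl
    keywords: list[str]

    @field_validator("title", mode="before")
    @classmethod
    def validate_title(cls, value):
        # _validate_str() strips and collapses whitespace (see step 4).
        return cls._validate_str(value)

    @field_validator("keywords", mode="before")
    @classmethod
    def validate_keywords(cls, value):
        # _validate_keywords() splits a comma-delimited string into a
        # lowercased, de-duplicated list of keywords (see step 4).
        return cls._validate_keywords(value)

    @classmethod
    def row_major_field_map(cls) -> dict[str, str]:
        # Maps CSV column names (left) to output JSON property names (right).
        return {
            "Title": "title",
            "URL": "url",
            "Keywords": "keywords",
        }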

6. Add a test

Add a test file to tests/unit/jobs/csv_rs_uploader/. See test_mdn.py as an example. The test should perform a successful upload as well as uploads that fail due to validation errors and missing fields (columns) in the input CSV.

utils.py in the same directory implements helpers that your test should use:

  • do_csv_test() - Makes sure the uploader works correctly during a successful upload. It takes either a path to a CSV file or a list[dict] that will be used to create a file object (StringIO) for an in-memory CSV file. Prefer passing in a list[dict] instead of creating a file and passing a path, since it's simpler.
  • do_error_test() - Makes sure a given error is raised when expected. Use ValidationError from the pydantic module to check validation errors and MissingFieldError from merino.jobs.csv_rs_uploader to check input CSV that is missing an expected field (column).

7. Run the test

$ MERINO_ENV=testing poetry run pytest tests/unit/jobs/csv_rs_uploader/test_foo.py

See also the main Merino development documentation for running unit tests.

8. Submit a PR

Once your test is passing, submit a PR with your changes so that the new suggestion type is committed to the repo. This step isn't necessary to run the uploader and upload your suggestions, so you can come back to it later.

9. Upload!

See Running the uploader.

Running the uploader

Run the following from the repo's root directory to see documentation for all options and their defaults. Note that the upload command is the only command in the csv-rs-uploader job.

poetry run merino-jobs csv-rs-uploader upload --help

The uploader takes a CSV file as input, so you'll need to download or create one first.

Here's an example that uploads suggestions in foo.csv to the remote settings dev server:

poetry run merino-jobs csv-rs-uploader upload \
  --server "https://remote-settings-dev.allizom.org/v1" \
  --bucket main-workspace \
  --csv-path foo.csv \
  --model-name foo \
  --record-type foo-suggestions \
  --auth "Bearer ..."

Let's break down each command-line option in this example:

  • --server - Suggestions will be uploaded to the remote settings dev server
  • --bucket - The main-workspace bucket will be used
  • --csv-path - The CSV input file is foo.csv
  • --model-name - The model module is named foo. Its path within the repo would be merino/jobs/csv_rs_uploader/foo.py
  • --record-type - The type in the remote settings records created for these suggestions will be set to foo-suggestions. This argument is optional and defaults to "{model_name}-suggestions"
  • --auth - Your authentication header string from the server. To get a header, log in to the server dashboard (don't forget to log in to the Mozilla VPN first) and click the small clipboard icon near the top-right of the page, after the text that shows your username and server URL. The page will show a "Header copied to clipboard" toast notification if successful.

Setting suggestion scores

By default all uploaded suggestions will have a score property whose value is defined in the remote_settings section of the Merino config. This default can be overridden using --score <number>. The number should be a float between 0 and 1 inclusive.
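For example, to upload foo.csv with a score of 0.25, add --score to the command shown above:

poetry run merino-jobs csv-rs-uploader upload \
  --server "https://remote-settings-dev.allizom.org/v1" \
  --bucket main-workspace \
  --csv-path foo.csv \
  --model-name foo \
  --record-type foo-suggestions \
  --auth "Bearer ..." \
  --score 0.25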

Other useful options

  • --dry-run - Log the output suggestions but don't upload them. The uploader will still authenticate with the server, so --auth must still be given.

Structure of the remote settings data

The uploader uses merino/jobs/utils/chunked_rs_uploader.py to upload the output suggestions. In short, suggestions will be chunked, and each chunk will have a corresponding remote settings record with an attachment. The record's ID will be generated from the --record-type option, and its type will be set to --record-type exactly. The attachment will contain a JSON array of suggestion objects in the chunk.

Merino ADRs

This directory archives all the Architectural Decision Records (ADRs) for Merino.

Locust vs k6; Merino-py Performance Test Framework

  • Status: Accepted
  • Deciders: Nan Jiang, Raphael Pierzina & Katrina Anderson
  • Date: 2023-02-21

Context and Problem Statement

Performance testing for the Rust version of Merino was conducted with the Locust test framework and focused on the detection of HTTP request failures. During the migration of Merino from Rust to Python, performance testing was conducted with k6 and focused on the evaluation of request latency. Going forward, a unified performance testing solution is preferred: should the test framework be Locust or k6?

Decision Drivers

  1. The test framework supports the current load test design, a 10-minute test run with an average load of 1500 RPS (see Merino Load Test Plan)
  2. The test framework measures HTTP request failure and client-side latency metrics
  3. The test framework is compatible with the Rapid Release Model for Firefox Services initiative, meaning:
    • It can execute through command line
    • It can signal failures given check or threshold criteria
    • It can be integrated into a CD pipeline
    • It can report metrics to Grafana
  4. The members of the DISCO and ETE teams are able to contribute to and maintain load tests written with the test framework

Considered Options

  • A. Locust
  • B. k6

Decision Outcome

Chosen option:

  • A. Locust

Both k6 and Locust are able to execute the current load test design, report required metrics, and fulfill the Rapid Release Model for Firefox Services initiative; however, Locust's Python tech stack ultimately makes it the better fit for the Merino-py project. In line with the team's single repository direction (see PR), using Locust will:

  • Leverage existing testing, linting and formatting infrastructure
  • Promote dependency sharing and code re-use (models & backends)

Pros and Cons of the Options

A. Locust

Locust can be viewed as the status quo option, since it is the framework that is currently integrated into the Merino-py repository and is the basis for the CD load test integration currently underway (see DISCO-2113).

Pros

  • Locust has a mature distributed load generation feature and can easily support a 1500 RPS load
  • Locust has built-in RPS, HTTP request failure and time metrics with customizable URL break-down
  • Locust scripting is in Python
  • Locust supports direct command line usage
  • Locust is used for load testing in other Mozilla projects and is recommended by the ETE team

Cons

  • Locust is 100% community driven (no commercial business), which means its contribution level can wane
  • Preliminary research indicates that reporting metrics from Locust to Grafana requires the creation of custom code, a plugin or a third party integration

B. k6

For the launch of Merino-py, performance bench-marking was conducted using a k6 load test script (see Merino Explorations). This script was reused from the Merino rewrite exploration effort and has proven successful in assessing if Merino-py performance achieves the target p95 latency threshold, effecting preventative change (See PR). k6's effectiveness and popularity amongst team members is an incentive to pause and evaluate if it is a more suitable framework going forward.

Pros

  • k6 is an open-source commercially backed framework with a high contribution rate
  • k6 is built by Grafana Labs, which suggests easy integration with Grafana dashboards
  • k6 has built-in RPS, HTTP request failure and time metrics with customizable URL break-down
  • k6 supports direct command line usage
  • k6 is feature rich, including built-in functions to generate pass/fail results and create custom metrics

Cons

  • The k6 development stack is in JavaScript/TypeScript. This means:
    • Modeling and backend layer code would need to be duplicated and maintained
    • Linting, formatting and dependency infrastructure would need to be added and maintained
  • k6 has an immature distributed load generation feature, with documented limitations
    • k6 runs more efficiently than other frameworks, so it may be possible to achieve 1500 RPS without distribution

Merino Suggest API Response Structure

  • Status: accepted
  • Deciders: Michelle Tran, Lina Butler, Nan Jiang, Wil Stuckey, Drew Willcoxon, Taddes Korris, Tiffany Tran
  • Date: 2023-04-20

Context and Problem Statement

As Merino continues to add more suggestions, suggestion providers are going to have to return all sorts of data to the clients that are bespoke to the particular suggestion. For instance, the weather suggestion returns a temperature. Currently, we do not have a strategy to manage these bespoke pieces of data, which results in them being returned at the top level of the suggestion object. However, this will pose a problem when

  1. names of fields are shared between providers, but have different semantics (i.e. rating may be a decimal value between 0-1 in one type, and a "star" integer rating between 1-5 in another)
  2. the API is unclear about what will necessarily exist, and what is optional, which leads to client confusion about the contract

So, this ADR is to make a decision on how we want to handle provider specific fields going forward.

Decision Drivers

In rough order of importance:

  1. Explicitness of Ownership - i.e. the rating field belongs to the addons provider
  2. Compatibility with [JSON] Schema Validation
  3. Adherence to the Fx Suggest Design Framework
  4. Backwards Compatibility with Current Schema

Considered Options

  • A. Continue to add to Top Level with Optional Fields
  • B. Custom Details Field for Bespoke Provider Fields
  • B.5 Custom Details Field without the Provider Nesting
  • C. Custom Details Field for a "Type"
  • D. Component Driven custom_details

Decision Outcome

Chosen option: B

We will also not increase the version number of the API for this ADR. So, going forward, we will encode option B into the response design without changing the existing providers. This means that the following providers will not have their bespoke fields removed from top level:

  • AdM Provider
  • Top Picks Provider
  • Weather Provider
  • Wikipedia Provider
  • WikiFruit Provider

However, this does not preclude these providers from duplicating the fields to custom_details in the v1 API.

Positive Consequences of Option B

  • Clear isolation of fields that belong together (i.e. grouped by provider).
  • Clear ownership of fields through the structure.
  • Simpler validation logic than other options due to less need for conditionals.

Negative Consequences of Option B

  • Potentially some redundancy caused by extra nesting.
  • Might not be as flexible with a provider that returns different fields based on what type of suggestion it is.

Positive Consequences of not Increasing API Version

  • We do not have to worry about migrating Firefox (and other clients) to the new format. The migration is going to be quite a lot of extra work that adds little benefit (other than consistency of design, it doesn't add more features nor improve any known time sinks with development).
  • Do not have to support 2 versions of the API.

Negative Consequences of not Increasing API Version

  • Some inconsistencies with how providers add fields to the response. We will likely want to resolve this as we migrate to v2, but it's a known issue at the moment.
  • Might be missing an opportune time to migrate, as features are not out yet, which means the flexibility for change is higher.

Pros and Cons of the Options

A. Continue to add to Top Level with Optional Fields

This is the status quo option. We will continue to append bespoke values to the top level suggestion, and ensure that they're optional. We can continue to use the provider to signal what fields exist and how they should be parsed. For example, we can specify 2 different types of rating, and hence 2 validation strategies, based on which provider is specified.

Example:

{
  "suggestions": [
    {
      ...
      "provider": "addons",
      "rating": "4.123",
      ...
    },
    {
      ...
      "provider": "movies",
      "rating": 0.123,
      ...
    },
    ...
  ],
  ...
}

The partial JSON Schema validation will look something like:

{
  "type": "object",
  "properties": {
    "provider": {
      "type": "string"
    }
  },
  "required": ["provider"],
  "allOf": [
    {
      "if": {
        "properties": {
          "provider": {
            "const": "addons"
          }
        }
      },
      "then": {
        "properties": {
          "rating": {
            "type": "string"
          }
        },
        "required": [
          "rating"
        ]
      }
    },
    {
      "if": {
        "properties": {
          "provider": {
            "const": "movies"
          }
        }
      },
      "then": {
        "properties": {
          "rating": {
            "type": "number"
          }
        },
        "required": [
          "rating"
        ]
      }
    }
  ]
}

Pros

  • Can specify specific validation per provider.
  • Merino is still kind of immature, so it still might be too early to think about design.
  • Less nesting in the models (resulting in less complexity).
  • Currently, backwards compatible as we don't have to do anything to existing providers, as this follows the existing patterns.

Cons

  • Lack of isolation for bespoke fields; ratings is coupled with 2 specific providers, and by just looking at the response, it's not clear that they are related.
  • Not clear what is shared between all suggestions, vs. what is bespoke to specific provider.
  • It is not obvious that the provider field should signal how you should perform validation. In other words, there is a contextual dependency on the JSON structure of suggestion based on provider.

B. Custom Details Field for Bespoke Provider Fields

We introduce a custom_details field that uses a provider name as key to an object with the bespoke values to that provider.

Example:

{
  "suggestions": [
    {
      ...
      "provider": "addons",
      "custom_details": {
        "addons": {
          "rating": "4.7459"
        }
      }
    },
    ...
  ],
  ...
}

The specific fields in custom_details will all be optional (i.e. addons will be an optional key) but the shape of what goes in addons can be more strict (i.e. addons require a rating field).

A partial schema specification for the above might look like this¹:

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "Suggest API Response v1",
  "description": "Response for /api/v1/suggest",
  "type": "object",
  "properties": {
    "provider": {
      "description": "id for the provider type",
      "type": "string"
    },
    "custom_details": {
      "type": "object",
      "properties": {
        "addons": {
          "type": "object",
          "description": "Custom Addon Fields",
          "properties": {
            "rating": {
              "type": "number"
            }
          },
          "required": ["rating"]
        }
      }
    }
  },
  "required": ["provider"]
}
¹ You can play with JSON Schema at https://www.jsonschemavalidator.net/

Pros

  • Can specify specific validation per provider.
  • Clear ownership of rating to addons via structure.
  • Fields outside of custom_details can be fields that are more universal across suggestions. These fields can potentially be correlated directly to the Fx Suggest Design Framework (i.e. context_label, url, title, description, etc.).
  • Having a clear distinction for Fx Suggest Design Framework fields vs. bespoke fields makes this more backwards compatible, as the fields in the Design Framework can render the default suggestion case for clients who haven't upgraded their clients.

Cons

  • We'll likely need to migrate existing providers at some point. But in the meantime, some fields will not follow convention to maintain backwards compatibility.
  • Extra nesting inside of custom_details.

B.5 Custom Details Field without the Provider Nesting

This is exactly like B, except that we remove the extra nesting.

So, in the example above, we can remove the extra addons object to get:

{
  "suggestions": [
    {
      ...
      "provider": "addons",
      "custom_details": {
        "rating": "4.7459"
      }
    },
    ...
  ],
  ...
}

The validation of the contents of custom_details will look more like A.

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "Suggest API Response v1",
  "description": "Response for /api/v1/suggest",
  "type": "object",
  "properties": {
    "provider": {
      "description": "id for the provider type",
      "type": "string"
    }
  },
  "required": [
    "provider"
  ],
  "if": {
    "properties": {
      "provider": {
        "const": "addons"
      }
    }
  },
  "then": {
    "properties": {
      "custom_details": {
        "description": "Custom Details Specific for Addons",
        "type": "object",
        "properties": {
          "rating": {
            "type": "string"
          }
        },
        "required": [
          "rating"
        ]
      }
    },
    "required": ["custom_details"]
  }
}

Pros

  • Can specify specific validation per provider.
  • Fields outside of custom_details can be fields that are more universal across suggestions. These fields can potentially be correlated directly to the Fx Suggest Design Framework (i.e. context_label, url, title, description, etc.).
  • Having a clear distinction for Fx Suggest Design Framework fields vs. bespoke fields makes this more backwards compatible, as the fields in the Design Framework can render the default suggestion case for clients who haven't upgraded their clients.
  • Less nesting in the response than B

Cons

  • We'll likely need to migrate existing providers at some point. But in the meantime, some fields will not follow convention to maintain backwards compatibility.
  • The relationship between provider and custom_details is more implicit than explicit.
  • This has a lot of the same cons as Option A because validation is done similarly.

C. Custom Details Field for a "Type"

This is similar to option B, except that we want to introduce a new type field to differentiate it from the provider. The custom_details will be keyed by this type, rather than the provider name. These types are kind of analogous to a rendering component, as they will likely be used to specify a specific rendering path in the client.

Example:

{
  "suggestions": [
    {
      ...
      "provider": "addons",
      "type": "addons_type",
      "custom_details": {
        "addons_type": {
          "rating": "4.7459"
        }
      }
    },
    ...
  ],
  ...
}

Pros

  • All the pros for B applies here
  • Can decouple the custom_details from provider. This will be helpful for potentially sharing the type with other suggestions produced by different providers. For instance, we may want this to specify different rendering paths in the client (i.e. a "top picks" type to be shared between addons and top_picks providers, as there are many shared fields because they're rendered similarly).

Cons

  • All the cons for B applies here
  • Potentially over-engineering for type, as its use is currently hypothetical.

D. Component Driven custom_details

This solution will model distinct UI components in the custom_details section. For example, if the addons provider has specific UI components to render a ratings component and a highlight_context_label, then we can specify these directly in the custom_details section. This assumes that the client side has these specific rendering types.

Example:

{
  "suggestions": [
    {
      ...
      "provider": "addons",
      "custom_details": {
        "ratings": {
          "value": "4.7459",
          "unit": "stars"
        },
        "highlight_context_label": {
          "text": "Special Limited Time Offer!"
        }
      }
    },
    ...
  ],
  ...
}

Pros

  • Can share custom components with schema validation.
  • Backwards compatible with clients who don't have the necessary components to render. It will just use the default renderer via the Fx Suggest Design Framework

Cons

  • We currently don't have a sophisticated Component Design Framework, so this is probably overengineering.
  • This tightly couples the API to the design framework of Desktop Firefox, which makes the fields potentially less relevant to other clients.

Streamline Test Coverage of Third-Party Integrations

  • Status: Accepted
  • Deciders: Nan Jiang & Katrina Anderson
  • Date: 2024-01-24

Context and Problem Statement

In 2024, it is anticipated that Merino will expand to be consumed by a greater set of Firefox surfaces and to include more content providers. This will challenge the current feature test strategy, which has shown weakness in detecting incompatibilities with third-party integrations. Examples:

  1. Weather Incident with Redis Integration
  2. Navigation Suggestions Error with GCP Integration

The current test approach uses a combination of unit, integration, and contract feature tests, where third-party integrations such as cloud services, data storage services, and external API integrations are test doubled in the unit and integration tests and emulated with Docker containers in contract tests. While test doubles might be easier to work with, they lack the accuracy of working with real dependencies in terms of matching the production environment and covering all the integration surfaces and concerns in tests.

Despite the potential to test with third-party integrations in contract tests, developers have refrained due to their lack of familiarity with Docker and CI tooling, as well as their belief in a poor return on investment for the time and effort required to create and maintain contract tests for experimental features.

Given the Merino service context, which has a rapid development pace and a high risk tolerance, is there a way to streamline the test strategy while ensuring robustness against third-party integrations?

Decision Drivers

1. Usability & Skill Transferability

Ease-of-use is the key criterion when we assess a tool for testing. The test strategy should prefer tools that require less effort and time to acquire proficiency. It should be easy to learn and work with. Ideally, any skills or knowledge acquired should be applicable across current contexts or for future scenarios.

2. Maturity & Expandability

The test strategy and tooling should be able to handle known third-party Merino dependencies in tests with a reasonable belief that it will cover future growth. Known third-party dependencies include: REST APIs, Remote Settings, GCP Cloud Storage, Redis, and Elasticsearch. Future dependencies include: relational DBMS such as PostgreSQL and other GCP cloud services such as Pub/Sub.

3. Cost Efficiency

The worker hours and tooling expenditures associated with the implementation and execution of the test strategy should ensure the profitability of Merino.

Considered Options

  • A. Yes. Expand the Scope of Integration Tests Using Dependency Docker Containers (Testcontainers)
  • B. Yes. Reduce the Dependency Overhead in Tests Using Development and Stage Environments
  • C. No. Fulfill the Current Test Strategy with Contract Test Coverage (Status quo)

Decision Outcome

Chosen option: A

Overall, we believe that increasing the scope of integration tests to verify third-party integrations with Testcontainers will be the most effective and sustainable way forward. Testcontainers' "Test dependencies as code" approach best fulfills the Usability & Skill Transferability and Maturity & Expandability decision drivers and long-term would prove to be the most Cost Efficient option.

We expect there to be initial labour costs to integrating Testcontainers, but anticipate that moving more verification responsibility to the integration test layer will be more accessible for developers and will reduce bugs found between Merino and third-party integrations.

Testcontainers is a widely adopted container-based test platform that supports a wide range of programming languages including Python and Rust, which are popular at Mozilla and there is indication that Testcontainers would have applicability across services in PXI. Given the success of Rapid Release and other experiments in Merino, it's a good candidate to use in Merino first as a pilot test. Should we find any issues or unsoundness about it, we can always revert the decision in the future.

Pros and Cons of the Options

A. Yes. Expand the Scope of Integration Tests Using Dependency Docker Containers (Testcontainers)

A preference for the unit and integration feature test layers in Merino has emerged over time. These test layers are white-box, which means developers can more easily set up program environments to test either happy paths or edge cases. In addition, tooling for debugging and measuring code coverage is readily available in these layers. Testcontainers can be used to increase the scope of integration tests, covering the backend layer logic and network communication with third-party integrations, the current test strategy's point of weakness.
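As a minimal sketch of the approach (assuming the testcontainers-python package, a Redis-backed feature, and pytest; the names below are illustrative, not Merino's actual test code):

import pytest
import redis
from testcontainers.core.container import DockerContainer


@pytest.fixture(scope="session")
def redis_client():
    # Start a real Redis container for the test session instead of a test double.
    with DockerContainer("redis:7").with_exposed_ports(6379) as container:
        host = container.get_container_host_ip()
        port = int(container.get_exposed_port(6379))
        yield redis.Redis(host=host, port=port)


def test_cache_round_trip(redis_client):
    # Exercise the real network path to the containerized dependency.
    redis_client.set("suggestion:key", "value")
    assert redis_client.get("suggestion:key") == b"value"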

Pros

  • Testcontainers works with any Docker image and has numerous pre-built modules. Almost all the existing dependencies (or their close emulators) of Merino can be run as Docker containers. As a result, we can use real dependencies in Merino's tests as opposed to test doubles
  • Testcontainers allows developers to programmatically manage the lifecycle of containers in the test code. This simplifies its usage for developers, who will not need to run any Docker commands separately for testing
  • Testcontainers, which has recently been acquired by Docker, is fairly mature and supports many popular programming languages. There are also a large number of community maintained clients available for popular services such as PostgreSQL, Redis, Elasticsearch, etc.
  • Testcontainers is lightweight and sandboxed, meaning service resources aren't shared and are cleaned up automatically, promoting test isolation and parallelization
  • Docker-compose is also supported by Testcontainers, facilitating use of multiple dependency containers for more complex test cases
  • Testcontainers supports Python, Rust and Javascript languages and works well with their respective test frameworks PyTest, Cargo-Test and Jest

Cons

  • A Docker runtime is required to run all the tests that depend on Testcontainers. Docker is already setup in CI, but developers may need to install a Docker runtime locally
  • Integration tests cannot be run completely offline as Docker images need to be downloaded first
  • Developers will need to understand more about how to configure and work with dependency containers. The development community has many popular services out of the box, but developers would still need to know and do more than what's required when using test doubles
  • It could be challenging to provision test fixtures for the underlying containers. Feeding the fixture data into the containers could be complex
  • Developers need to ensure version consistency across the development, testing, and production environments in the integration test layer
  • For third-party API integrations, if the provider doesn't provide a Docker image for their API, Testcontainers alone will not help us much. It's possible to use fake API container generators, such as Wiremock, but it comes with its own complexities
  • Implementation of Testcontainers would require refactoring of integration tests, including the removal of mocks and fixtures

B. Yes. Reduce the Dependency Overhead in Tests Using Development and Stage Environments

Using Merino's staging environment and third-party development resources in tests has been considered. This would effectively cover the current test strategy's weakness with third-party integrations without the cost and complexity involved with setting up test doubles or dependency containers. However, this approach has a key challenge in how to share the stage environment across all the test consumers (devs & CI) as most of the services do not support multi-tenant usage and would require a significant amount of effort to support resource isolation.

Pros

  • Best matches the production environment
  • Doesn't need extra effort to create test doubles or dependencies for testing

Cons

  • Tests cannot be run offline since they would require a network connection to interact with development and stage environments
  • This option breaks the Testing Guidelines & Best Practices for Merino, which require tests to be isolated and repeatable. A dependency on shared network resources will almost certainly lead to test flake, reducing the confidence in the test suite
  • Test execution speeds would be negatively impacted, due to the lack of sandboxing, which enables parallel test runs

C. No. Fulfill the Current Test Strategy with Contract Test Coverage (Status quo)

The current test strategy, which relies on the contract tests to verify the interface between Merino and third-party dependencies, has not been fully implemented as designed. The missing coverage explains the current test strategy's weakness. Examples:

  1. DISCO-2032: Weather Contract Tests
  2. DISCO-2324: Add a merino-py contract test that interacts with a real Redis instance
  3. DISCO-2055: Dynamic Wikipedia Contract Tests

Pros

  • The most cost-effective solution, at least in the short term, since the test framework and Docker dependencies are set up and integrated into CI
  • The unit and integration feature test layers remain simple by using test doubles

Cons

  • The black-box nature of contract tests makes it harder to set up the environmental conditions required to enable testing edge cases
  • Adding dependency containers is complex, often requiring developers to have advanced knowledge of Docker and CI vendors (e.g. CircleCI)
  • There is a high level of redundancy between unit, integration, and contract tests that negatively impacts development pace

Open Questions

  • How to test 3rd party API integrations? We have two options for consideration: either use generic API mocking frameworks, or keep the status quo and rely on other means (e.g. observability) to capture API breakages. They both have pros and cons and warrant a separate ADR to discuss in detail