Merino

Merino is a service that provides address bar suggestions and curated recommendations to Firefox. Some of this content comes from third-party providers. In this case, Merino serves as a privacy-preserving buffer: user input in the address bar is handled by Merino, and any clicked impression is delegated to a Mozilla-controlled service, which then sends an interaction ping if one is defined in the request. Nothing is sent to a provider directly. See the API documentation for more details.

Table of Contents

api.md - API Documentation describes endpoints, query parameters, request and response headers, response objects and details on the suggestion objects.

firefox.md - Firefox and Merino Environments describes how to enable Merino in Firefox and lists the endpoints for the service in Production, Stage and Dev.

data.md - Data, Metrics, Logging describes all metrics and logs.

dev/index.md - Basic Developer Docs describes basics of working on Merino.

dev/dependencies.md - Development Dependencies describes the development dependencies required for Merino.

dev/logging-and-metrics.md - Logging and Metrics describes metrics, logging, and telemetry.

dev/release-process.md - Release Process describes the release process of Merino in detail.

dev/testing.md - Testing describes unit, integration and load tests for Merino.

dev/profiling.md - Profiling describes how to profile Merino to address performance issues.

operations/configs.md - Configuring Merino describes configuration management of the project, Dynaconf setup, and the configuration of the HTTP server, logging, metrics, Remote Settings, and Sentry.

operations/elasticsearch.md - Elasticsearch Operations describes some functionality and operations that we do on the Elasticsearch cluster.

operations/jobs.md - Merino Jobs describes the jobs that are configured in Merino, indicates where the jobs are defined, and links to details on how they are run.

About the Name

This project drives an important part of Firefox's "felt experience". That is, the feeling of using Firefox, hopefully in a delightful way. The word "felt" in this phrase refers to feeling, but it can be punned to refer to the textile. Felt is often made of wool, and Merino wool (from Merino sheep) produces exceptionally smooth felt.

Architecture

flowchart TD
subgraph Firefox["fa:fa-firefox-browser Firefox"]
        NewTab
        UrlBar
end
subgraph NewTab["fa:fa-plus New Tab"]
        CuratedRecommendations("Curated Recommendations")
        WeatherWidget("Weather Widget")
end
subgraph UrlBar["fa:fa-magnifying-glass Url Bar"]
        online("Online Search and Suggest")
        offline("Offline Search and Suggest<br>fetches adMarketplace, static Wikipedia, <br>and other suggestions.<br> Offline mode is fallback if Merino times out.")
end
subgraph middleware["fa:fa-layer-group Middleware"]
        Geolocation["Geolocation"]
        Logging["Logging"]
        UserAgent["UserAgent"]
        Metrics["Metrics"]
end
subgraph suggestProviders["fa:fa-truck Suggest Providers"]
        admProvider("adm")
        amoProvider("amo")
        geolocationProvider("geolocation")
        toppicksProvider("top-picks")
        weatherProvider("weather")
        wikipediaProvider("wikipedia")
end
subgraph suggestBackends["fa:fa-microchip Suggest Backends"]
        remoteSettingsBackend("remote settings")
        accuweatherBackend("accuweather")
        elasticBackend("elastic")
        toppicksBackend("top picks")
        dynamicAmoBackend("dynamic addons")
end
subgraph curatedRecommendationsBackends["fa:fa-microchip Curated Recommendations Backends"]
        corpusBackend("corpus")
        extendedExpirationCorpusBackend("corpus extended expiration")
        gcsEngagementBackend("gcs engagement")
        fakespotBackend("fakespot")
        gcsPriorBackend("gcs prior")
end
subgraph Merino["fa:fa-server Merino"]
        srh("fa:fa-gears Suggest Request Handler")
        crh("fa:fa-gears Curated Recommendations Handler")
        mrh("fa:fa-gears Manifest Handler")
        middleware
        maxmind[("fa:fa-database MaxmindDB")]
        suggestProviders
        curatedRecommendationsProvider["fa:fa-truck Curated Recommendations Provider"]
        manifestProvider["fa:fa-truck Manifest Provider"]
        suggestBackends
        curatedRecommendationsBackends
        manifestBackend["Manifest Backend"]
end
subgraph Jobs["fa:fa-rotate Airflow (Merino Jobs)"]
        wikipediaSyncJob("Wikipedia Sync")
        toppicksSyncJob("Top Picks Sync")
end
    User[\"fa:fa-user User"/] -- Accessing the Firefox URL bar --> Firefox
    online -- /api/v1/suggest --> srh
    CuratedRecommendations -- "/api/v1/curated-recommendations" --> crh
    manifest["manifest"] -- /api/v1/manifest --> mrh
    WeatherWidget --> srh
    srh -..- middleware
    crh -..- middleware
    mrh -..- middleware
    srh --> suggestProviders
    crh --> curatedRecommendationsProvider
    mrh --> manifestProvider
    curatedRecommendationsProvider --> curatedRecommendationsBackends
    manifestProvider --> manifestBackend
    admProvider --> remoteSettingsBackend
    amoProvider --> dynamicAmoBackend
    toppicksProvider --> toppicksBackend
    weatherProvider --> accuweatherBackend
    wikipediaProvider --> elasticBackend
    Geolocation --> maxmind
    dynamicAmoBackend --> addonsAPI("fa:fa-globe Addons API")
    elasticBackend --> elasticSearch[("Elasticsearch")]
    manifestBackend -..-> toppicksData[("fa:fa-database GCS Top Picks Data,<br>a list of Mozilla curated popular sites and metadata to be <br>displayed on browser")]
    toppicksSyncJob -..-> toppicksData
    accuweatherBackend -..-> accuweatherAPI("fa:fa-globe Accuweather API")
    accuweatherAPI -. tries to query cache first ..-> redis[("fa:fa-memory Redis Cache")]
    gcsEngagementBackend --> gcsMerinoAirflowData[("fa:fa-database GCS Merino Airflow Data")]
    gcsPriorBackend --> gcsMerinoAirflowData
    fakespotBackend --> gcsFakespotNewTabProducts[("fa:fa-database GCS Fakespot NewTab Products")]
    corpusBackend -..-> curatedCorpusAPI("fa:fa-globe Curated Corpus API")
    offline -..- kinto[("Remote Settings")]
    remoteSettingsBackend --- kinto
    wikipediaSyncJob -. Syncs Wikipedia entries weekly ..- elasticSearch

Merino API documentation

This page describes the API endpoints available on Merino.

The autogenerated API documentation exists here.

Working with the Manifest endpoint

Overview

The /manifest endpoint returns a curated list of websites with associated metadata. This endpoint is designed to be used as part of your development process to maintain an up-to-date copy of website favicons.

Endpoint details

  • URL: https://merino.services.mozilla.com/api/v1/manifest
  • Method: GET
  • Response: JSON
{
  "domains": [
    {
      "rank": 1,
      "domain": "google",
      "categories": [
        "Search Engines"
      ],
      "serp_categories": [
        0
      ],
      "url": "https://www.google.com/",
      "title": "Google",
      "icon": ""
    },
    {
      "rank": 2,
      "domain": "microsoft",
      "categories": [
        "Business",
        "Information Technology"
      ],
      "serp_categories": [
        0
      ],
      "url": "https://www.microsoft.com/",
      "title": "Microsoft – AI, Cloud, Productivity, Computing, Gaming & Apps",
      "icon": "https://merino-images.services.mozilla.com/favicons/90cdaf487716184e4034000935c605d1633926d348116d198f355a98b8c6cd21_17174.oct"
    }
  ]
}

The icon field contains the URL of the Mozilla-hosted favicon for the website.

Usage

  • You can save the JSON response as a manifest.json file:
curl https://merino.services.mozilla.com/api/v1/manifest -o manifest.json

Or, if you have jq installed on your system, you can pretty-print it:

curl -s https://merino.services.mozilla.com/api/v1/manifest | jq '.' > manifest.json
  • Check it into your repository and ship it with the application you are building.
  • Whenever you need to display a favicon for a website or URL, you can check the Manifest file and use the icon field as a link to the favicon.
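For example, here is a minimal Python sketch of that lookup; it assumes you saved the response as manifest.json as shown above and uses the field names from the sample response.

import json

# Load the saved manifest and build a domain -> icon URL lookup table.
with open("manifest.json") as f:
    manifest = json.load(f)

icons = {entry["domain"]: entry["icon"] for entry in manifest["domains"]}

# An empty string means Merino does not host a favicon for that domain.
print(icons.get("microsoft", ""))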

Add custom domains

You are also able to add custom domains to this endpoint. We currently run a weekly cron job to collect favicons from the Top 2000 websites. Adding custom domains is handled via this Python file in the Merino codebase: https://github.com/mozilla-services/merino-py/blob/main/merino/jobs/navigational_suggestions/custom_domains.py

To add yours:

  1. git clone git@github.com:mozilla-services/merino-py.git
  2. Add a new entry to the CUSTOM_DOMAINS list with url and at least one category: https://github.com/mozilla-services/merino-py/blob/main/merino/jobs/navigational_suggestions/custom_domains.py
  3. Create a PR against the merino-py repo with your changes

The custom domains will be picked up during the next run (every Wednesday). This job can also be run manually by following instructions here.
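As an illustration only, an entry might look like the following; the exact structure expected by the job is defined in custom_domains.py, so check that file before copying this.

# Hypothetical example entry; see custom_domains.py for the actual schema.
CUSTOM_DOMAINS = [
    {"url": "https://example.org", "categories": ["Education"]},
]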

Configuring Firefox and Merino Environments

Merino has been enabled by default in Firefox. However, you will need to enable data sharing for Firefox Suggest to fully enable the feature. To do so, type about:config in the URL bar and set the Firefox preference browser.urlbar.quicksuggest.dataCollection.enabled to true. By default, Merino will connect to the production environment. This is controlled with the browser.urlbar.merino.endpointURL preference. See below for other options.

You can also query any of the endpoint URLs below with something like:

curl 'https://merino.services.allizom.org/api/v1/suggest?q=your+query'

Environments

Production

Endpoint URL: https://merino.services.mozilla.com/api/v1/suggest

The primary environment for end users. Firefox is configured to use this by default.

Stage

Endpoint URL: https://merino.services.allizom.org/api/v1/suggest

This environment is used for manual and load testing of the server. It is not guaranteed to be stable or available. It is used as a part of the deploy process to verify new releases before they go to production.

Data collection

This page should list all metrics and logs that Merino is expected to emit in production, including what should be done about them, if anything.

Logs

This list does not include any DEBUG level events, since those are not logged by default in production. The level and type of each log are listed.

Any log containing sensitive data must include a boolean field sensitive that is set to true to exempt it from flowing to the generally accessible log inspection interfaces.

Merino APIs

  • INFO web.suggest.request - A suggestion request is being processed. This event will include fields for all relevant details of the request. Fields:

    • sensitive - Always set to true to ensure proper routing.
    • query - If query logging is enabled, the text the user typed. Otherwise an empty string.
    • country - The country the request came from.
    • region - The first country subdivision the request came from.
    • city - The city the request came from.
    • dma - A US-only location designation that is larger than a city and smaller than a state, and does not align with political borders.
    • agent - The original user agent.
    • os_family - Parsed from the user agent. One of "windows", "macos", "linux", "ios", "android", "chrome os", "blackberry", or "other".
    • form_factor - Parsed from the user agent. One of "desktop", "phone", "tablet", or "other"
    • browser - The browser and possibly version detected. Either "Firefox(XX)" where XX is the version, or "Other".
    • rid - The request ID.
    • WIP accepts_english - True if the user's Accept-Language header includes an English locale, false otherwise.
    • requested_providers - A comma separated list of providers requested via the query string, or an empty string if none were requested (in which case the default values would be used).
    • client_variants - Any client variants sent to Merino in the query string.
    • session_id - A UUID generated by the client for each search session.
    • sequence_no - A client-side event counter (0-based) that records the query sequence within each search session.
  • ERROR dockerflow.error_endpoint - The __error__ endpoint of the server was called. This is used to test our error reporting system. It is not a cause for concern, unless we receive a large amount of these records, in which case some outside service is likely malicious or misconfigured.

Merino Middleware Logs

Geolocation

  • WARNING merino.middleware.geolocation - There was an error with a geolocation lookup.

Merino Cron Tasks

  • WARNING merino.cron - There was an error while executing a cron task.

Merino Feature Flags

  • ERROR merino.featureflags - There was an error while attempting to assign a feature flag for a suggest API request.

Curated Recommendations

  • ERROR merino.curated_recommendations.corpus_backends.corpus_api_backend - Failed to get timezone for scheduled surface.
  • WARNING merino.curated_recommendations.corpus_backends.corpus_api_backend - Retrying CorpusApiBackend after an http client exception was raised.
  • ERROR GcsEngagement failed to update cache: {e} - unexpected exception when updating engagement.
  • ERROR Curated recommendations engagement size {blob.size} > {self.max_size} - Max engagement blob size is exceeded. The backend will gracefully fall back to cached data or 0's.
  • INFO Curated recommendations engagement unchanged since {self.last_updated}. - The engagement blob was not updated since the last check. last_updated is expected to be between 0 and 30 minutes.

Metrics

A note on timers: Statsd timers are measured in milliseconds and are reported as integers (at least in Cadence). Milliseconds are often not precise enough for the tasks we want to measure in Merino. Instead, we use generic histograms to record microsecond times. Metrics recorded in this way should have -us appended to their name to mark the units used (since we shouldn't put the proper unit μs in metric names).
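As a minimal sketch of that convention (metrics_client and run_task are hypothetical names, not Merino's actual objects), a microsecond timing could be recorded like this:

import time

start = time.perf_counter()
run_task()  # hypothetical unit of work being measured
elapsed_us = int((time.perf_counter() - start) * 1_000_000)

# A generic histogram with the "-us" suffix marks the value as microseconds.
metrics_client.histogram("merino.some_task.duration-us", elapsed_us)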

  • merino.providers.initialize - A timer to measure the overall initialization duration (in ms) for all providers.

  • merino.providers.initialize.<provider> - A timer to measure the initialization duration (in ms) for the given <provider>.

    Example: merino.providers.initialize.adm

  • merino.<http_method>.<url_path>.status_codes.<status_code> - A counter to measure the status codes of an HTTP method for the <url_path>.

    Example: merino.get.api.v1.suggest.status_codes.200

  • merino.<http_method>.<url_path>.timing - A timer to measure the duration (in ms) of an HTTP method for a URL path.

    Example: merino.get.api.v1.suggest.timing

  • merino.<provider_module>.query - A timer to measure the query duration (in ms) of a certain suggestion provider.

    Example: merino.providers.suggest.adm.query

  • merino.<provider_module>.query.timeout - A counter to measure the query timeouts of a certain suggestion provider.

    Example: merino.providers.suggest.wikipedia.query.timeout

  • merino.suggestions-per.request - A histogram metric to get the distribution of suggestions per request.

  • merino.suggestions-per.provider.<provider_module> - A histogram metric to get the distribution of suggestions returned per provider (per request).

    Example: merino.suggestions-per.provider.wikipedia

AccuWeather

The weather provider records additional metrics.

  • accuweather.upstream.request.<request_type>.get - A counter to measure the number of times an upstream request to Accuweather was made.
  • accuweather.request.location.not_provided - A counter to measure the number of times a query was sent without a location being provided, making it impossible to process a weather request. Sampled at 75%.
  • accuweather.request.location.dist_calculated.success - A counter to measure the number of successful lat long distance calculations used to find location.
  • accuweather.request.location.dist_calculated.fail - A counter to measure the number of failed lat long distance calculations used to find location.
  • merino.providers.accuweather.query.cache.fetch - A timer to measure the duration (in ms) of looking up a weather report in the cache. Sampled at 75%.
  • merino.providers.accuweather.query.cache.fetch.miss.locations - A counter to measure the number of times weather location was not in the cache. Sampled at 75%.
  • merino.providers.accuweather.query.cache.fetch.miss.currentconditions - A counter to measure the number of times current conditions for a location were not in the cache. Sampled at 75%.
  • merino.providers.accuweather.query.cache.fetch.miss.forecasts - A counter to measure the number of times a forecast for a location was not in the cache. Sampled at 75%.
  • merino.providers.accuweather.query.cache.fetch.hit.{locations | currentconditions | forecasts} - A counter to measure the number of times a requested value like a location or forecast is in the cache. We don't count TTL hits explicitly, just misses. Sampled at 75%.
  • merino.providers.accuweather.query.backend.get - A timer to measure the duration (in ms) of a request for a weather report from the backend. This metric isn't recorded for cache hits. Sampled at 75%.
  • merino.providers.accuweather.query.cache.store - A timer to measure the duration (in ms) of saving a weather report from the backend to the cache. This metric isn't recorded for cache hits. Sampled at 75%.
  • merino.providers.accuweather.query.cache.error - A counter to measure the number of times the cache store returned an error when fetching or storing a weather report. This should be 0 in normal operation. In case of an error, the logs will include a WARNING with the full error message.
  • merino.providers.accuweather.query.weather_report - A counter to measure the number of queries that are for weather; the origin of the request is recorded in tags.
  • merino.providers.accuweather.skip_cities_mapping.total.size - A counter to measure the total number of occurrences where cities were skipped due to missing location data.
  • merino.providers.accuweather.skip_cities_mapping.unique.size - A counter to measure the number of unique cities skipped due to missing location data.

Curated Recommendations

The following additional metrics are recorded when curated recommendations are requested.

  • corpus_api.{get_sections | scheduled_surface}.timing - A timer to measure the duration (in ms) of making a request to the Corpus API.
  • corpus_api.{get_sections | scheduled_surface}.status_codes.{res.status_code} - A counter to measure the status codes of an HTTP request to the curated-corpus-api.
  • corpus_api.{get_sections | scheduled_surface}.graphql_error - A counter to measure the number of GraphQL errors from the curated-corpus-api.
  • recommendation.engagement.update.timing - A timer to measure the duration (in ms) of updating the engagement data from GCS.
  • recommendation.engagement.size - A gauge to track the size of the engagement blob on GCS.
  • recommendation.engagement.count - A gauge to measure the total number of engagement records.
  • recommendation.engagement.{country}.count - A gauge to measure the number of scheduled corpus items with engagement data per country.
  • recommendation.engagement.{country}.clicks - A gauge to measure the number of clicks per country in our GCS engagement blob.
  • recommendation.engagement.{country}.impressions - A gauge to measure the number of impressions per country in our GCS engagement blob.
  • recommendation.engagement.last_updated - A gauge for the staleness (in seconds) of the engagement data, measured between when the data was updated in GCS and the current time.
  • recommendation.prior.update.timing - A timer to measure the duration (in ms) of updating the prior data from GCS.
  • recommendation.prior.size - A gauge to track the size of the Thompson sampling priors blob on GCS.
  • recommendation.prior.last_updated - A gauge for the staleness (in seconds) of the prior data, measured between when the data was updated in GCS and the current time.

Manifest

When requesting a manifest file, we record the following metrics.

  • manifest.request.get - A counter for how many requests against the /manifest endpoint were made.
  • manifest.request.timing - A timer for how long it took the endpoint to fulfill the request.
  • manifest.gcs.fetch_time - A timer for how long it took to download the latest manifest file from the Google Cloud bucket.
  • manifest.request.no_manifest - A counter to measure how many times we didn't find the latest manifest file.
  • manifest.request.error - A counter to measure how many times we could not provide a valid JSON manifest file.
  • manifest.invalid_icon_url - A counter to measure how many requests the Manifest provider gets for domains where we don't have a valid icon URL (with tags).

Service Governance

The following metrics are recorded for service governance monitoring.

  • governance.circuits.<circuit-breaker-name> - A gauge to instrument the failure count for each "open" circuit breaker.

Merino Developer Guidelines and Social Contract

This is an additional contractual document on top of CONTRIBUTING.

Foster a Shared Ownership

Not only do Merino developers build the service together, they also share the ownership of the service. That ownership is embodied in the following responsibilities:

  • Be responsible for the entire lifecycle of each change landed in the code base: from writing the PR and getting it merged, to ensuring it goes through CI/CD and is eventually deployed to production, to setting up monitoring on metrics and ensuring the healthy status of the change and the overall health of Merino.
  • Be familiar with Merino’s operation. Conduct operational reviews on a regular basis. Identify and track operational issues. Coordinate with the team(s) to close action items and resolve the identified issues.
  • Documentation. Make sure the code meets the documentation requirements (no linting errors). If a change adds/updates the API, logs or metrics, ensure the associated documentation is up to date.

We commit to sharing knowledge about Merino across the team, with the long-term goal that each team member is capable of resolving incidents of Merino. Merino developers should familiarize themselves with the Mozilla Incident Response Process and the Merino Runbooks. Each individual should be able to initiate an incident response, serve as the incident handling manager, and drive it to its resolution along with other incident responders. Any issues associated with an incident should be tracked in Jira in a way the team agrees upon. For example, assigned with an ‘incident-action-items’ label.

  • Be aware of the infrastructure costs associated with new functionality. The team should have a good understanding of the cost to run the service including logging, computing, networking, and storage costs.
  • Be mindful of work hours and the time zones of your fellow developers when scheduling meetings, deploying code, pairing on code, or collaborating in other ways. Set your work hours in Google Calendar and configure Slack to receive notifications only during those times. We encourage code deployments when there are fellow developers online to support. If you must deploy off-hours, ensure you have a peer available to approve any potential rollbacks.

We are not going to grow individual Merino developers who specialize only in deployment, operation, documentation, or incident response for Merino. Rather, we'd like to foster a shared ownership with shared knowledge in every aspect of the day-to-day work on Merino.

Use ADRs to Record Architectural Decisions

ADRs (Architectural Decision Record) are widely adopted by teams at Mozilla to capture important architecture decisions, including their context and consequences. Developers are encouraged to exercise the ADR process to facilitate the decision making on important subjects of the project. ADRs should be made easy to access and reference and therefore are normally checked into the source control and rendered as part of the project documentation.

Use SLO and Error Budget to Manage Service Risks

We strive to build highly available and reliable services while also emphasizing rapid iteration and continuous deployment as key aspects of product development. We opt to use SLOs (Service Level Objectives) and an error budget for risk management. SLOs can be co-determined by the product owner(s) and the service builders & operators. The error budget should be monitored and enforced by the monitoring infrastructure. Once the budget is exhausted, the service owners should be more reluctant to accept, or should even reject, risky code artifacts until the budget resets.

Request RRA for New Content Integrations

RRA (Rapid Risk Assessment) is the recommended process for service builders to perform a standardized lightweight risk assessment for the service or the feature of interest. Since Merino is a user-facing consumer service, we shall take extra caution for user security and the related risks. We have agreed with the Security Assurance team that we’d request an RRA (by following the RRA instructions) for every new content integration (e.g. AccuWeather) or content storage (e.g. Elasticsearch) for Merino.

Testing for Productivity & Reliability

We value testing as a mechanism for establishing feedback loops for service development, design, and release. As developers add new changes to the project, thorough and effective testing reduces uncertainty and generates short feedback loops, accelerating development, release, and regression resolution. Testing also helps reduce the potential decrease in reliability from each change. To materialize those merits for Merino, we have designed the Merino Test Strategy and fulfilled it with adequate tests. We expect the cross-functional team to adhere to the strategy and evolve it to better support the project over time.

Aim for Simplicity

We prioritize simple and conventional solutions in all aspects of development, from system design, to API specs, to code. We prefer mature, battle-tested technologies over complex, cutting-edge alternatives. At the same time, we know that Merino can always get better, and we welcome ideas from everyone. If you’ve got a new approach in mind, share it with the team or propose an Architectural Decision Record (ADR).

Blame-free Culture

While we strive to make Merino a highly reliable service, things will still go wrong regardless of how much care we take: code errors, misconfigurations, and operational glitches, to name a few. We opt for a blame-free culture to ease the mental stress on individuals as they are encouraged to take on more activities and responsibilities, especially before they gain familiarity with the tasks. We believe that learning from mistakes, and incorporating the lessons learned into processes so the same mistakes are not repeated, is more constructive and useful than putting someone on center stage. With a blame-free culture and proper risk management processes in place, the average cost of failures should be tolerable within the error budget boundary. Who would be afraid of making mistakes?

Have Fun

Last but not least. Let’s make Merino a fun project to work with!

Developer documentation for working on Merino

tl;dr

Here are some useful commands when working on Merino.

Run the main app

This project uses uv for dependency management. See dependencies for how to install uv on your machine.

Install all the dependencies:

$ uv sync --all-groups

Run Merino:

$ uv run fastapi run merino/main.py --reload

# Or you can use a shortcut
$ make run
# To run in hot reload mode
$ make dev

General commands

# List all available make commands with descriptions
$ make help

$ make install

# Run linter
$ make ruff-lint

# Run format checker
$ make ruff-fmt

# Run formatter
$ make ruff-format

# Run black
$ make black

# Run bandit
$ make bandit

# Run mypy
$ make mypy

# Run all linting checks
$ make -k lint

# Run all formatters
$ make format

# Run merino-py with the auto code reloading
$ make dev

# Run merino-py without the auto code reloading
$ make run

# Run unit and integration tests and evaluate combined coverage
$ make test

# Evaluate combined unit and integration test coverage
$ make test-coverage-check

# Run unit tests
$ make unit-tests

# List fixtures in use per unit test
$ make unit-test-fixtures

# Run integration tests
$ make integration-tests

# List fixtures in use per integration test
$ make integration-test-fixtures

# Build the docker image for Merino named "app:build"
$ make docker-build

# Run local execution of (Locust) load tests
$ make load-tests

# Stop and remove containers and networks for load tests
$ make load-tests-clean

# Generate documents
$ make doc

# Preview the generated documents
$ make doc-preview

# Profile Merino with Scalene
$ make profile

# Run the Wikipedia CLI job
$ make wikipedia-indexer job=$JOB

Documentation

You can generate documentation, both code level and book level, for Merino by running ./dev/make-all-docs.sh. You'll need mdbook and mdbook-mermaid, which you can install via:

make doc-install-deps

If you haven't installed Rust and Cargo, you can reference the official Rust documentation.

Local configuration

The default configuration of Merino is development, which has human-oriented pretty-print logging and debugging enabled. For settings that you wish to change in the development configuration, you have two options, listed below.

For full details, make sure to check out the documentation for Merino's setting system (operations/configs.md).

Update the defaults

Dynaconf is used for all configuration management in Merino; values are specified in .toml files in the merino/configs/ directory. Environment variables are also set for each environment and can be set when using the CLI to launch the Merino service. Environment variables take precedence over the values set in the .toml files, so any environment variable set will automatically override the defaults. By the same token, any config file that is pointed to will override the merino/configs/default.toml file.

If the change you want to make improves the system for most development tasks, consider adding it to merino/configs/development.toml so that other developers can take advantage of it. If you do so, you will likely want to add validation for those settings in merino/config.py, where the Dynaconf instance exists along with its validators. For examples of the structure of the various config settings, look at configs/default.toml and merino/config.py.
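For illustration, a validated setting might be wired up roughly like the sketch below; this is a sketch of the Dynaconf API under assumed file paths, not a copy of merino/config.py, and the setting name is hypothetical.

from dynaconf import Dynaconf, Validator

settings = Dynaconf(
    envvar_prefix="MERINO",
    settings_files=["configs/default.toml", "configs/development.toml"],
    validators=[
        # Hypothetical setting: must be an integer >= 1 when present.
        Validator("my_feature.timeout_ms", is_type_of=int, gte=1, must_exist=False),
    ],
)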

It is not advisable to put secrets in configs/secrets.toml.

Create a local override

Dynaconf will use the specified values and environment variables in the merino/configs/default.toml file. You can change the environment you want to use as mentioned above, but for local changes to adapt to your machine or tastes, you can put the configuration in merino/configs/development.local.toml. This file doesn't exist by default, so you will have to create it. Then simply copy from the other config files and make the adjustments that you require. These files should, however, not be checked into source control; they are configured to be ignored as long as they follow the *.local.toml format. Please follow this convention, take extra care not to check them in, and only use them locally.

See the Dynaconf Documentation for more details.

Content Moderation and Blocklists

This summarizes the mechanisms that block sensitive or questionable content in Merino. Because Merino supports several providers that have a broad range of potential suggestions, often from different sources, we require the ability to remove certain suggestions from being displayed.

Blocklists in Merino filter content at two distinct phases:

  1. Content that is filtered at the data creation and indexing phase. Provider backends serve suggestions to the client based on matching against searched terms. This ensures that data that could be sensitive is not available to search against since it is not indexed. For instance, the Wikipedia provider filters categories of articles that are tagged with a matching category term in the blocklist.

  2. Content that is filtered at application runtime. There are instances where we want to quickly and dynamically add to block lists without re-indexing or running a job. In this case, suggestions are compared to a static list in the code that blocks out these suggestions.

In the Navigational Suggestions provider, a blocklist is used during data creation to block specific domains of websites that we do not want to suggest.

The blocklist, domain_blocklist.json, is referenced during data generation of the top_picks.json file, which is ingested by the provider backend. This ensures specific domains are not indexed for suggestions. The blocklist is loaded and an exact string comparison is made between all second-level domains and the second-level domains defined in the blocklist.
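A rough sketch of that comparison is shown below; it is not the actual implementation, and it assumes domain_blocklist.json is a flat list of second-level domains.

import json

with open("domain_blocklist.json") as f:
    blocked = set(json.load(f))  # assumed: a flat list of second-level domains

def second_level_domain(hostname: str) -> str:
    # "www.example.com" -> "example"; naive split that ignores multi-part TLDs
    parts = hostname.split(".")
    return parts[-2] if len(parts) >= 2 else parts[0]

def is_blocked(hostname: str) -> bool:
    return second_level_domain(hostname) in blocked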

See nav-suggestions blocklist runbook for more information.

Wikipedia

The Wikipedia Provider does both title filtering and category filtering at the data indexing level.

Since the indexing jobs run periodically, we also implement title filtering in the provider so that blocks take effect sooner.

Indexer

The Wikipedia Indexer Job references a remote blocklist which contains sensitive categories. At job runtime, the indexer reads the remote blocklist and creates a set of article categories to be excluded from indexing.

The article categories in the blocklist are chosen based on analysis and best guesses of what could be considered objectionable content, in line with Mozilla's values and brand image. Any modifications to the file should be done with careful consideration.

The indexer also blocks titles that are defined in the WIKIPEDIA_TITLE_BLOCKLIST in the application, which is referenced below. Any title that matches this blocklist is excluded from indexing.

Provider

When queried, the Wikipedia provider reads the WIKIPEDIA_TITLE_BLOCKLIST when creating a WikipediaSuggestion and if the query matches a blocked title, the suggestion is not shown to the client.

We have this feature because the indexing job is not run daily. Therefore, we want an option to rapidly add to this list should we need to block a specific article.
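Conceptually, the runtime check amounts to something like the sketch below; this is illustrative only, and the provider's actual code and the exact shape of WIKIPEDIA_TITLE_BLOCKLIST may differ.

def is_blocked_title(title: str, blocklist: set[str]) -> bool:
    # Assumes the blocklist entries are normalized to lowercase.
    return title.lower() in blocklist

# A WikipediaSuggestion whose title is blocked is dropped before reaching the client.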

See wikipedia blocklist runbook for more information.

Development Dependencies

Package Dependencies

This project uses uv for dependency management, virtual environment management, and running scripts and commands. While you can use vanilla virtualenv to set up the dev environment, we highly recommend checking out uv.

To install uv, run:

$ pipx install uv

Or install via your preferred method.

Feel free to browse the pyproject.toml file for a listing of dependencies and their versions.

First, let's make sure you have a virtual environment set up with the right Python version. This service uses Python 3.13.

$ uv venv

See more about setting up virtual envs and Python version with uv.

Once uv is installed, and a virtual environment is created with the correct Python version, install all the dependencies:

$ uv sync --all-groups

Add packages to project via uv

$ uv add <package_name>

After that you should be able to run Merino as follows:

$ uv run fastapi run merino/main.py --reload

Moving from the Poetry & Pyenv Set up

If you had your environment previously set up via poetry and pyenv, here are the steps to move to uv. This is also a nice opportunity for a clean-up and reset.

# Remove your previous virtual environment. If you set up using pyenv, then:
rm .python-version
pyenv uninstall merino-py

# Uninstall pyenv
rm -rf $(pyenv root)
# or if you installed it via your OS package manager
brew uninstall pyenv

Service Dependencies

Merino uses a Redis-based caching system, and so requires a Redis instance to connect to. In addition, a GCS (Google Cloud Storage) emulator, fake-gcs-server, is also provided to facilitate local development and testing.

To make things simple, all these service dependencies can be started with Docker Compose, using the docker-compose.yaml file in the dev/ directory. Notably, this does not run any Merino components that have source code in this repository.

# Run this at the Merino's project root
$ docker compose -f dev/docker-compose.yaml up

# Or run services in daemon mode
$ docker compose -f dev/docker-compose.yaml up -d

# Stop it
$ docker compose -f dev/docker-compose.yaml down

# Shortcuts are also provided
$ make docker-compose-up
$ make docker-compose-up-daemon
$ make docker-compose-down

Redis

Two Redis servers (primary & replica) are listening on ports 6379 and 6380, and can be connected via redis://localhost:6379 and redis://localhost:6380, respectively.
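A quick way to confirm both servers are reachable from Python (assuming the redis package is installed) is a simple ping:

import redis

primary = redis.Redis.from_url("redis://localhost:6379")
replica = redis.Redis.from_url("redis://localhost:6380")

# Both calls should return True once docker compose is up.
print(primary.ping(), replica.ping())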

This Dockerized set up is optional. Feel free to run the dependent services by any other means as well.

GCS Emulator

The GCS emulator is listening on port 4443 and ready for both read and write operations. Make sure you set an environment variable STORAGE_EMULATOR_HOST=http://localhost:4443 so that Merino's GCS clients can connect to it. For example,

$ STORAGE_EMULATOR_HOST=http://localhost:4443 make run

Optionally, you can create a GCS bucket and preload data into it. The preloaded data is located in dev/local_data/gcs_emulator/. For example, to preload a JSON file top_picks_latest.json into a bucket merino-images-prodpy, create a new sub-directory merino-images-prodpy in gcs_emulator and then create or copy top_picks_latest.json into it. Then you can set Merino's configurations to use those artifacts in the GCS emulator.

# File layout of the preloaded GCS data

dev/local_data
└── gcs_emulator
    └── merino-images-prodpy  <- GCS Bucket ID
        └── top_picks_latest.json  <- A preloaded GCS blob

Note that dev/local_data will not be checked into source control nor included in the Docker image of Merino.
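To sanity-check the preloaded data from Python (assuming google-cloud-storage is installed and the emulator is running), something like the following sketch should work:

import os
from google.cloud import storage

# Point the client at the emulator rather than real GCS.
os.environ.setdefault("STORAGE_EMULATOR_HOST", "http://localhost:4443")

client = storage.Client.create_anonymous_client()
bucket = client.bucket("merino-images-prodpy")
print(bucket.blob("top_picks_latest.json").download_as_text())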

Dev Helpers

The docker-compose setup also includes some services that can help during development.

  • Redis Commander, http://localhost:8081 - Explore the Redis database started above.

Logging and Metrics

To get data out of Merino and into observable systems, we use metrics and logging. Each has a unique use case. Note that, in general, because of the scale we work at, adding a metric or log event in production is not free, and if we are careless it can end up costing quite a bit. Record what is needed, but don't go overboard.

All data collection that happens in production (logging at INFO, WARN, or ERROR levels; and metrics) should be documented in docs/data.md.

Logging

Merino uses MozLog for structured logging. Logs can be recorded through the standard Python logging module. Merino can output logs in various formats, including a JSON format (MozLog) for production. A pretty, human readable format is also provided for development and other use cases.

Types

MozLog requires that all messages have a type value. By convention, we use the name of the Python module where the log record is issued to populate this field. For example:

import logging

logger = logging.getLogger(__name__)

# The `type` field of the log record will be the same as `__name__`.
# Extra fields are attached via the standard `extra` keyword argument.
logger.info("A new log message", extra={"provider": "example"})

In general, the log message ("An empty MultiProvider was created") and the log type should both tell the reader what has happened. The difference is that the message is for humans and the type is for machines.

Levels

Python's standard logging module provides five log levels that should be familiar. This is what we mean by them in Merino:

  • CRITICAL - There was a serious error indicating that the program itself may be unable to continue running.

  • ERROR - There was a problem, and the task was not completable. This usually results in a 500 being sent to the user. All error logs encountered in production are reported to Sentry and should be considered a bug. If it isn't a bug, it shouldn't be logged as an error.

  • WARNING - There was a problem, but the task was able to recover. This doesn't usually affect what the user sees. Warnings are suitable for unexpected but "in-spec" issues, like a sync job not returning an empty set or using a deprecated function. These are not reported to Sentry.

  • INFO - This is the default level of the production service. Use it for logging that something happened that isn't a problem and that we care about in production. This is the level that Merino uses for its one-per-request logs and sync status messages. Be careful adding new per-request logs at this level, as they can be expensive.

  • DEBUG - This is the default level for developers running code locally. Use this to give insight into how the system is working, but keep in mind that this will be on by default, so don't be too noisy. Generally this should summarize what's happening, but not give the small details like a log line for every iteration of a loop. Since this is off in production, there are no cost concerns.

Metrics

Merino metrics are reported as Statsd metrics.

Unlike logging, the primary way that metrics reporting can cost a lot is in cardinality: the number of metric IDs we have and the combinations of tag values that we supply. Often the number of individual events doesn't matter as much, since multiple events are aggregated together.
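To make the cardinality point concrete, consider a hypothetical client call (metrics_client is an illustrative name, not Merino's actual object):

# Each distinct combination of metric name and tag values becomes its own series.
metrics_client.increment(
    "merino.providers.suggest.query",
    tags={"provider": "adm", "status": "ok"},
)
# With 10 providers and 5 statuses, this single metric name can produce up to
# 10 * 5 = 50 series, so add tags (and tag values) sparingly.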

Middleware

Merino leverages middleware for various functionalities such as logging, metrics, parsing for geolocation & user agent, feature flags etc. Middleware is defined in the merino/middleware directory.

Caveat

We currently don't implement middleware using the middleware facilities provided by FastAPI/Starlette as they've shown significant performance overhead, preventing Merino from achieving the SLOs required by Firefox Suggest.

Until those performance issues are resolved upstream, we implement middleware for Merino directly through the ASGI protocol. You can reference this tutorial to learn more about ASGI. See Starlette's middleware documentation for more details about how to write pure ASGI middleware. Specifically, we can reuse Starlette's data structures (Request, Headers, QueryParams, etc.) to facilitate the implementation.
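For orientation, a pure ASGI middleware looks roughly like the sketch below; the class is illustrative and not one of Merino's actual middlewares.

from starlette.types import ASGIApp, Receive, Scope, Send


class ExampleMiddleware:
    def __init__(self, app: ASGIApp) -> None:
        self.app = app

    async def __call__(self, scope: Scope, receive: Receive, send: Send) -> None:
        if scope["type"] == "http":
            # Inspect or annotate the request scope here (e.g. attach parsed data).
            scope["example"] = "value"
        await self.app(scope, receive, send)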

Feature Flags

Usage

Do you plan to release code behind a feature flag? Great! 😃

Your feature flag needs to be defined first. If it's already defined, go ahead. Otherwise check the configuration section below before you continue.

Use the following line in API endpoint code to gain access to the feature flags object:

feature_flags: FeatureFlags = request.scope[ScopeKey.FEATURE_FLAGS]

Then check whether a certain feature flag, such as example, is enabled by calling:

if feature_flags.is_enabled("example"):
    print("feature flag 'example' is enabled! 🚀")

When you do that, the decision (whether the feature flag is enabled or not) is recorded and stored in a dict on the decisions attribute of the feature flags object.
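For example (with illustrative values), the recorded decisions can be inspected after a check:

if feature_flags.is_enabled("example"):
    print("feature flag 'example' is enabled! 🚀")

# The decision made above is now recorded, e.g. {"example": True}.
print(feature_flags.decisions)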

Implementation

The feature flags system in Merino consists of three components:

  • merino/middleware/featureflags.py - A FastAPI middleware that reads the query parameter sid sent by the client application and sets a session ID for the current request based on that.
  • merino/featureflags.py - A FeatureFlags class which you can use to check if a certain feature flag is enabled.
  • merino/configs/flags/ - A local directory containing static files that define and configure feature flags for Merino.

Configuration

Currently two bucketing schemes are supported: random and session.

Random

Random does what it says on the tin. It generates a random bucketing ID for every flag check.

Session

Session bucketing uses the session ID of the request as the bucketing key so that feature checks within a given search session would be consistent.

Fields

Each flag defines the following fields:

[default.flags.<flag_name>]
scheme = 'session'
enabled = 0.5
  • scheme - The bucketing scheme for the flag. Allowed values are 'random' and 'session'.
  • enabled - The % enabled for the flag; must be a float between 0 and 1.

Metrics

When submitting application metrics, feature flag decisions that were made while processing the current request up to this point are automatically added as tags to the emitted metrics.

The format of these tags is:

feature_flag.<feature_flag_name>

For more information about this see the ClientMeta meta class and the add_feature_flags decorator in merino/metrics.py.

Monitoring in Grafana

Because feature flag decisions are automatically added as tags to emitted metrics, you can use them in your queries in Grafana. 📈

For example, if you want to group by decisions for a feature flag with name hello_world, you can use tag(feature_flag.hello_world) in GROUP BY in Grafana. You can also use [[tag_feature_flag.hello_world]] in the ALIAS for panel legends.

The Release Process

This project currently follows a Continuous Deployment process.

Whenever a commit is pushed to this repository's main branch, a CircleCI workflow is triggered which performs code checks and runs automated tests. The workflow additionally builds a new Docker image of the service and pushes that Docker image to the Docker Hub registry (this requires all previous jobs to pass).

Pushing a new Docker image to the Docker Hub registry triggers a webhook that starts the Jenkins deployment pipeline (the Docker image tag determines the target environment). The deployment pipeline first deploys to the stage environment and subsequently to the production environment.

Activity diagram of CircleCI main-workflow

After the deployment is complete, accessing the __version__ endpoint will show the commit hash of the deployed version, which will eventually match that of the latest commit on the main branch (a node running an older version might still serve requests before it is shut down).

Release Best Practices

The expectation is that the author of the change will:

  • merge pull requests during hours when the majority of contributors are online
  • monitor the Merino Application & Infrastructure dashboard for any anomaly

Versioning

The commit hash of the deployed code is considered its version identifier. The commit hash can be retrieved locally via git rev-parse HEAD.

Load Testing

Load testing can be performed either locally or during the deployment process. During deployment, load tests are run against the staging environment before Merino-py is promoted to production.

Load tests in continuous deployment are controlled by adding a specific label to the commit message being deployed. The format for the label is [load test: (abort|skip|warn)]. Typically, this label is added to the merge commit created when a GitHub pull request is integrated.

  • abort: Stops the deployment if the load test fails.
  • skip: Skips load testing entirely during deployment.
  • warn: Proceeds with the deployment even if the load test fails, but sends a warning notification through Slack.

If no label is included in the commit message, the default behavior is to run the load test and issue a warning if it fails.

For more detailed information about load testing procedures and conventions, please refer to the Load Test README.

Logs from load tests executed in continuous deployment are available in the /data volume of the Locust master kubernetes pod.

What to do if production breaks?

If your latest release causes problems and needs to be rolled back: don't panic and follow the instructions in the Rollback Runbook.

What to do if tests fail during deployment?

Please refer to What to do with Test Failures in CI?

Profiling

Since Merino runs as a single-threaded application on an asyncio-based framework, it is useful for engineers to get a good understanding of how Merino performs and where it spends time and memory on the tasks it carries out to serve requests. Local profiling offers us a way to look into those low-level details.

We use Scalene as the profiler for Merino. It's very easy to use and offers extremely detailed (line-level) insights with much lower overhead compared to other profilers.

Usage

To start the profiling, you can run the following to start Merino with Scalene:

$ make profile

# or you can run it directly

$ python -m scalene merino/main.py

Then you can send requests to Merino manually or by using other load testing tools. Once that's done, you can terminate the Merino application. Scalene will automatically collect the profiling outputs (CPU & memory) and open them in your browser.

Understand the outputs

Out of the box, Scalene provides a very intuitive web interface to display the profiling outputs. It's organized at the file (module) level. For each file, it shows the CPU time and average memory usage for both the line profile and the function profile of that module. You can also click on specific columns to sort the lines or functions accordingly.

For more details of how to read the outputs, you can reference Scalene's documents.

Equipped with those insights, you can have a good understanding about the application, identify hotspots, bottlenecks, or other findings that are not easy to uncover by only reading the source code. And then, you can tweak or fix those issues, test or profile it again to verify if the fix is working.

Merino Testing

Test Strategy

Merino is tested using a combination of functional and performance tests.

Test code resides in the tests directory.

Merino's test strategy requires that we do not go below a minimum test coverage percentage for unit and integration tests. Load tests cannot go below a minimum performance threshold.

Functional tests push test artifacts, in the form of JUnit XMLs and Coverage JSONs to the ETE Test Metric Pipeline for trending and monitoring. Results can be found on the Merino-py Test Metrics Looker Dashboard.

Test documentation resides in the /docs/testing/ directory.

The test strategy is three-tiered, composed of:

  • unit tests
  • integration tests
  • load (Locust) tests

See documentation and repositories in each given test area for specific details on running and maintaining tests.

Unit Tests

The unit layer is suitable for testing complex behavior at a small scale, with fine-grained control over the inputs. Due to their narrow scope, unit tests are fundamental to thorough test coverage.

To execute unit tests, use: make unit-tests

Unit tests are written and executed with pytest and are located in the tests/unit directory, using the same organizational structure as the source code of the merino service. Type aliases dedicated for test should be stored in the types.py module. The conftest.py modules contain common utilities in fixtures.

For a breakdown of fixtures in use per test, use: make unit-test-fixtures

Fixtures

Available fixtures include:

FilterCaplogFixture

Useful when verifying log messages, this fixture filters log records captured with pytest's caplog by a given logger_name.

Usage:

def test_with_filter_caplog(
    caplog: LogCaptureFixture, filter_caplog: FilterCaplogFixture
) -> None:
    records: list[LogRecord] = filter_caplog(caplog.records, "merino.providers.suggest.adm")

Note: This fixture is shared with integration tests.

SuggestionRequestFixture

For use when querying providers, this fixture creates a SuggestionRequest object with a given query

Usage:

async def test_with_suggestion_request(srequest: SuggestionRequestFixture) -> None:
    request: SuggestionRequest = srequest("example")
    result: list[BaseSuggestion] = await provider.query(request)

ScopeFixture, ReceiveMockFixture & SendMockFixture

For use when testing middleware, these fixtures initialize or mock the common Scope, Receive and Send object dependencies.

Usage:

def test_middleware(scope: Scope, receive_mock: Receive, send_mock: Send) -> None:
    pass

Integration Tests

The integration layer of testing allows for verification of interactions between service components, with lower development, maintenance and execution costs compared with higher level tests.

To execute integration tests, make sure you have Docker installed and a docker daemon running. Then use: make integration-tests

Integration tests are located in the tests/integration directory. They use pytest and the FastAPI TestClient to send requests to specific merino endpoints and verify responses as well as other outputs, such as logs. Tests are organized according to the API path under test. Type aliases dedicated for test should be stored in the types.py module. Fake providers created for test should be stored in the fake_providers.py module. The conftest.py modules contain common utilities in fixtures.

We have also added integration tests that use Docker via the testcontainers library. See fixture example below.

For a breakdown of fixtures in use per test, use: make integration-test-fixtures

Fixtures

Available fixtures include:

FilterCaplogFixture

Details available in Unit Tests section

TestClientFixture

This fixture creates an instance of the TestClient to be used in testing API calls.

Usage:

def test_with_test_client(client: TestClient):
    response: Response = client.get("/api/v1/endpoint")

TestClientWithEventsFixture

This fixture creates an instance of the TestClient, that will trigger event handlers (i.e. startup and shutdown) to be used in testing API calls.

Usage:

def test_with_test_client_with_event(client_with_events: TestClient):
    response: Response = client_with_events.get("/api/v1/endpoint")

InjectProvidersFixture & ProvidersFixture

These fixtures will set up and tear down the given providers.

Usage:

If specifying providers for a module:

@pytest.fixture(name="providers")
def fixture_providers() -> Providers:
    return {"test-provider": TestProvider()}

If specifying providers for a test:

@pytest.mark.parametrize("providers", [{"test-provider": TestProvider()}])
def test_with_provider() -> None:
    pass

SetupProvidersFixture

This fixture sets application provider dependency overrides.

Usage:

def test_with_setup_providers(setup_providers: SetupProvidersFixture):
    providers: dict[str, BaseProvider] = {"test-provider": TestProvider()}
    setup_providers(providers)

TeardownProvidersFixture

This fixture resets application provider dependency overrides and is often used in teardown fixtures.

Usage:

@pytest.fixture(autouse=True)
def teardown(teardown_providers: TeardownProvidersFixture):
    yield
    teardown_providers()

TestcontainersFixture

See tests/integration/jobs/navigational_suggestions/test_domain_metadata_uploader.py for a detailed example.

This is a lightweight example on how to set up a docker container for your integration tests.

Usage:

@pytest.fixture(scope="module")
def your_docker_container() -> DockerContainer:
    os.environ.setdefault("STORAGE_EMULATOR_HOST", "http://localhost:4443")

    container = (
        DockerContainer("your-docker-image")
        .with_command("-scheme http")
        .with_bind_ports(4443, 4443)
    ).start()

    # wait for the container to start and emit logs
    delay = wait_for_logs(container, "server started at")
    port = container.get_exposed_port(4443)

    yield container

    container.stop()

Merino Load (Locust) Tests

This documentation describes the load tests for Merino. This test framework uses IP2Location LITE data available from https://lite.ip2location.com

Overview

The tests in the tests/load directory spawn multiple HTTP clients that consume Merino's API, in order to simulate real-world load on the Merino infrastructure. These tests use the Locust framework and are triggered at the discretion of the Merino Engineering Team.

Local Execution

Note that if you make changes to the load test code, you must stop and remove the Docker containers and networks for changes to reflect. Do this by running make load-tests-clean.

Follow the steps below to execute the load tests locally:

Setup Environment

1. Configure Environment Variables

The following environment variables, as well as Locust environment variables, can be set in tests/load/docker-compose.yml. Make sure any required API key is added, but then not checked into source control.

WARNING: if the MERINO_PROVIDERS__WIKIPEDIA__ES_API_KEY is missing, the load tests will not execute.

Each of the following environment variables applies to both the master and worker nodes:

  • LOAD_TESTS__LOGGING_LEVEL - Level for the logger in the load tests as an int (10 for DEBUG, 20 for INFO, etc.)
  • MERINO_REMOTE_SETTINGS__SERVER - Server URL of the Kinto instance containing suggestions
  • MERINO_REMOTE_SETTINGS__BUCKET - Kinto bucket with the suggestions
  • MERINO_REMOTE_SETTINGS__COLLECTION - Kinto collection with the suggestions
  • MERINO_PROVIDERS__TOP_PICKS__TOP_PICKS_FILE_PATH - File path to the JSON file of domains
  • MERINO_PROVIDERS__TOP_PICKS__QUERY_CHAR_LIMIT - The minimum character limit set for long domain suggestion indexing
  • MERINO_PROVIDERS__TOP_PICKS__FIREFOX_CHAR_LIMIT - The minimum character limit set for short domain suggestion indexing
  • MERINO_PROVIDERS__WIKIPEDIA__ES_API_KEY - The base64 key used to authenticate on the Elasticsearch cluster specified by es_cloud_id
  • MERINO_PROVIDERS__WIKIPEDIA__ES_URL - The Cloud ID of the Elasticsearch cluster
  • MERINO_PROVIDERS__WIKIPEDIA__ES_INDEX - The index identifier of Wikipedia in Elasticsearch

2. Host Locust via Docker

Execute the following from the repository root:

make load-tests

3. (Optional) Host Merino Locally

Use one of the following commands to host Merino locally. Execute the following from the repository root:

  • Option 1: Use the local development instance
    make dev
    
  • Option 2: Use the profiler instance
    make profile
    
  • Option 3: Use the Docker instance
    make docker-build && docker run -p 8000:8000 app:build
    

Run Test Session

1. Start Load Test

  • In a browser navigate to http://localhost:8089/
  • Set up the load test parameters:
    • Option 1: Select the MerinoSmokeLoadTestShape or MerinoAverageLoadTestShape
      • These options have pre-defined settings
    • Option 2: Select the Default load test shape with the following recommended settings:
      • Number of users: 25
      • Spawn rate: 1
      • Host: 'https://merino.services.allizom.org'
        • Set host to 'http://host.docker.internal:8000' to test against a local instance of Merino
      • Duration (Optional): 10m
  • Select "Start Swarming"

2. Stop Load Test

Select the 'Stop' button in the top right hand corner of the Locust UI, after the desired test duration has elapsed. If the 'Run time' is set in step 1, the load test will stop automatically.

3. Analyse Results

Clean-up Environment

1. Remove Load Test Docker Containers

Execute the following from the repository root:

make load-tests-clean

Distributed GCP Execution - Manual Trigger

Follow the steps below to execute the distributed load tests on GCP with a manual trigger:

Setup Environment

1. Start a GCP Cloud Shell

The load tests can be executed from the contextual-services-test-eng cloud shell.

2. Configure the Bash Script

  • The setup_k8s.sh file, located in the tests/load directory, contains shell commands to create a GKE cluster, set up an existing GKE cluster or delete a GKE cluster
    • Modify the script to include the MERINO_PROVIDERS__WIKIPEDIA__ES_API_KEY environment variable
    • Execute the following from the root directory, to make the file executable:
      chmod +x tests/load/setup_k8s.sh
      

3. Create the GCP Cluster

  • Execute the setup_k8s.sh file and select the create option, in order to initiate the process of creating a cluster, setting up the env variables and building the docker image. Choose smoke or average depending on the type of load test required.
    ./tests/load/setup_k8s.sh create [smoke|average]
    
    • Smoke - The smoke load test verifies the system's performance under minimal load. The test is run for a short period, possibly in CD, to ensure the system is working correctly.
    • Average - The average load test measures the system's performance under standard operational conditions. The test is meant to reflect an ordinary day in production.
  • The cluster creation process will take some time. It is considered complete once an external IP is assigned to the locust-master node. Monitor the assignment via a watch loop:
    kubectl get svc locust-master --watch
    
  • The number of workers is defaulted to 5, but can be modified with the kubectl scale command. Example (10 workers):
    kubectl scale deployment/locust-worker --replicas=10
    
  • To apply new changes to an existing GCP Cluster, execute the setup_k8s.sh file and select the setup option.
    • This option will consider the local commit history, creating new containers and deploying them (see Artifact Registry)

Run Test Session

1. Start Load Test

  • In a browser navigate to http://$EXTERNAL_IP:8089

    This URL can be generated via the command:

    EXTERNAL_IP=$(kubectl get svc locust-master -o jsonpath="{.status.loadBalancer.ingress[0].ip}")
    echo http://$EXTERNAL_IP:8089
    
  • Select the MerinoSmokeLoadTestShape, this option has pre-defined settings and will last 5 minutes

  • Select "Start Swarming"

2. Stop Load Test

Select the 'Stop' button in the top right hand corner of the Locust UI, after the desired test duration has elapsed. If the 'Run time' is set in step 1, the load test will stop automatically.

3. Analyse Results

RPS

  • The request-per-second load target for Merino is 1500
  • Locust reports client-side RPS via the "merino_stats.csv" file and the UI (under the "Statistics" tab or the "Charts" tab)
  • Grafana reports the server-side RPS via the "HTTP requests per second per country" chart

HTTP Request Failures

  • The number of responses with errors (5xx response codes) should be 0
  • Locust reports Failures via the "merino_failures.csv" file and the UI (under the "Failures" tab or the "Charts" tab)
  • Grafana reports Failures via the "HTTP Response codes" chart and the "HTTP 5xx error rate" chart

Exceptions

  • The number of exceptions raised by the test framework should be 0
  • Locust reports Exceptions via the "merino_exceptions.csv" file and the UI (under the "Exceptions" tab)

Latency

  • The HTTP client-side response time (aka request duration) for 95 percent of users is required to be 200ms or less (p95 <= 200ms), excluding weather requests
  • Locust reports client-side latency via the "merino_stats.csv" file and the UI (under the "Statistics" tab or the "Charts" tab)
    • Warning! A Locust worker with too many users will bottleneck RPS and inflate client-side latency measures. Locust reports worker CPU and memory usage metrics via the UI (under the "Workers" tab)
  • Grafana reports server-side latency via the "p95 latency" chart

Resource Consumption

  • To conserve costs, resource allocation must be kept to a minimum. It is expected that container, CPU and memory usage should trend consistently between load test runs.
  • Grafana reports metrics on resources via the "Container Count", "CPU usage time sum" and "Memory usage sum" charts

4. Report Results

  • Results should be recorded in the Merino Load Test Spreadsheet
  • Optionally, the Locust reports can be saved and linked in the spreadsheet:
    • Download the results via the Locust UI or via command:
      kubectl cp <master-pod-name>:/home/locust/merino_stats.csv merino_stats.csv
      kubectl cp <master-pod-name>:/home/locust/merino_exceptions.csv merino_exceptions.csv
      kubectl cp <master-pod-name>:/home/locust/merino_failures.csv merino_failures.csv
      
      The master-pod-name can be found at the top of the pod list:
      kubectl get pods -o wide
      
    • Upload the files to the ConServ drive and record the links in the spreadsheet

Clean-up Environment

1. Delete the GCP Cluster

Execute the setup_k8s.sh file and select the delete option

./tests/load/setup_k8s.sh

Distributed GCP Execution - CI Trigger

The load tests are triggered in CI via Jenkins, which has a command overriding the load test Dockerfile entrypoint.

Follow the steps below to execute the distributed load tests on GCP with a CI trigger:

Run Test Session

1. Execute Load Test

To modify the load testing behavior, you must include a label in your Git commit. This must be the merge commit on the main branch, since only the most recent commit is checked for the label. The label format is: [load test: (abort|skip|warn)]. Take careful note of correct syntax and spacing within the label. There are three options for load tests: abort, skip, and warn:

  • The abort label will prevent a prod deployment if the load test fails
    Ex. feat: Add feature ABC [load test: abort].
  • The skip label will bypass load testing entirely during deployment
    Ex. feat: Add feature LMN [load test: skip].
  • The warn label will output a Slack warning if the load test fails but still allow for the production deployment
    Ex. feat: Add feature XYZ [load test: warn].

If no label is included in the commit message, the load test will be executed with the warn action.

The commit label signals load test instructions to Jenkins by modifying the Docker image tag. The Jenkins deployment workflow first deploys to stage and then runs load tests if requested. The Docker image tag passed to Jenkins appears as follows: ^(?P<environment>stage|prod)(?:-(?P<task>\w+)-(?P<action>abort|skip|warn))?-(?P<commit>[a-z0-9]+)$
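As a hedged illustration, the following Python snippet parses a hypothetical stage tag with the regular expression above (the task name and commit hash are made up):

import re

TAG_PATTERN = re.compile(
    r"^(?P<environment>stage|prod)"
    r"(?:-(?P<task>\w+)-(?P<action>abort|skip|warn))?"
    r"-(?P<commit>[a-z0-9]+)$"
)

# Hypothetical tag produced for a stage deployment with a "warn" load test action
match = TAG_PATTERN.match("stage-loadtest-warn-a1b2c3d")
if match:
    print(match.groupdict())
    # {'environment': 'stage', 'task': 'loadtest', 'action': 'warn', 'commit': 'a1b2c3d'}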

2. Analyse Results

See Distributed GCP Execution (Manual Trigger) - Analyse Results

3. Report Results

  • Optionally, results can be recorded in the Merino Load Test Spreadsheet. It is recommended to do so if unusual behavior is observed during load test execution or if the load tests fail.
  • The Locust reports can be saved and linked in the spreadsheet. The results are persisted in the /data directory of the locust-master-0 pod in the locust-master k8s cluster in the GCP project of merino-nonprod. To access the Locust logs:
    • Open a cloud shell in the Merino stage environment
    • Authenticate by executing the following command:
        gcloud container clusters get-credentials merino-nonprod-v1 \
          --region us-west1 --project moz-fx-merino-nonprod-ee93
      
    • Identify the log files needed in the Kubernetes pod by executing the following command, which lists the log files along with file creation timestamp when the test was performed. The {run-id} uniquely identifies each load test run:
        kubectl exec -n locust-merino locust-master-0 -- ls -al /data/
      
    • Download the results via the Locust UI or via command:
      kubectl -n locust-merino cp locust-master-0:/data/{run-id}-merino_stats.csv merino_stats.csv
      kubectl -n locust-merino cp locust-master-0:/data/{run-id}-merino_exceptions.csv merino_exceptions.csv
      kubectl -n locust-merino cp locust-master-0:/data/{run-id}-merino_failures.csv merino_failures.csv
      
    • Upload the files to the ConServ drive and record the links in the spreadsheet

Calibration

Following the addition of new features, such as a Locust Task or Locust User, or environmental changes, such as node size or the upgrade of a major dependency like the python version image, it may be necessary to re-establish the recommended parameters of the performance test.

  • WAIT TIME: Changing this cadence will increase or decrease the number of requests sent by a MerinoUser. The default is currently in use for the MerinoUser class.
  • TASK WEIGHT: Changing this weight impacts the probability of a task being chosen for execution. This value is hardcoded in the task decorators of the MerinoUser class.
  • USERS_PER_WORKER: This value should be set to the maximum number of users a Locust worker can support given CPU and memory constraints. This value is hardcoded in the LoadTestShape classes.
  • WORKER_COUNT: This value is derived by dividing the total number of users needed for the performance test by USERS_PER_WORKER. This value is hardcoded in the LoadTestShape classes and the setup_k8s.sh script.
  • Locust documentation is available for WAIT TIME and TASK WEIGHT

Calibrating for USERS_PER_WORKER

This process is used to determine the number of users that a Locust worker can support.

Setup Environment

1. Start a GCP Cloud Shell

The load tests can be executed from the contextual-services-test-eng cloud shell. If executing a load test for the first time, the merino-py Git repository will need to be cloned locally.

2. Configure the Bash Script

  • The setup_k8s.sh file, located in the tests/load directory, contains shell commands to create a GKE cluster, set up an existing GKE cluster or delete a GKE cluster
    • Execute the following from the root directory, to make the file executable:
      chmod +x tests/load/setup_k8s.sh
      

3. Create the GCP Cluster

  • In the setup_k8s.sh script, modify the WORKER_COUNT variable to equal 1
  • Execute the setup_k8s.sh file from the root directory and select the create option, in order to initiate the process of creating a cluster, setting up the env variables and building the docker image. Choose smoke or average depending on the type of load test required.
    ./tests/load/setup_k8s.sh create [smoke|average]
    
  • The cluster creation process will take some time. It is considered complete once an external IP is assigned to the locust-master node. Monitor the assignment via a watch loop:
    kubectl get svc locust-master --watch
    

Calibrate

Repeat steps 1 to 3, using a process of elimination, such as the bisection method, to determine the maximum USERS_PER_WORKER. The load tests are considered optimized when CPU and memory resources are maximally utilized. This step is meant to determine the maximum user count that a node can accommodate by observing CPU and memory usage while steadily increasing or decreasing the user count. You can monitor the CPU percentage in the Locust UI but also in the Kubernetes engine Workloads tab where both memory and CPU are visualized on charts.

1. Start Load Test

  • In a browser navigate to http://$EXTERNAL_IP:8089. This URL can be generated via the command:
    EXTERNAL_IP=$(kubectl get svc locust-master -o jsonpath="{.status.loadBalancer.ingress[0].ip}")
    echo http://$EXTERNAL_IP:8089
    
  • Set up the load test parameters:
    • ShapeClass: Default
    • UserClasses: MerinoUser
    • Number of users: USERS_PER_WORKER (Consult the Merino_spreadsheet to determine a starting point)
    • Ramp up: RAMP_UP (RAMP_UP = 5/USERS_PER_WORKER)
    • Host: 'https://merino.services.allizom.org'
    • Duration (Optional): 600s
  • Select "Start Swarm"

2. Stop Load Test

Select the 'Stop' button in the top right hand corner of the Locust UI, after the desired test duration has elapsed. If the 'Run time' or 'Duration' is set in step 1, the load test will stop automatically.

3. Analyse Results

CPU and Memory Resource Graphs

Log Errors or Warnings

  • Locust will emit errors or warnings if high CPU or memory usage occurs during the execution of a load test. The presence of these logs is a strong indication that the USERS_PER_WORKER count is too high

4. Report Results

See Distributed GCP Execution (Manual Trigger) - Report Results

5. Update Shape and Script Values

  • WORKER_COUNT = MAX_USERS/USERS_PER_WORKER (see the worked example after this list)
    • If MAX_USERS is unknown, calibrate to determine WORKER_COUNT
  • Update the USERS_PER_WORKER and WORKER_COUNT values in the following files:
    • tests/load/locustfiles/smoke_load.py or tests/load/locustfiles/average_load.py
    • tests/load/setup_k8s.sh
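A short worked example with hypothetical numbers:

USERS_PER_WORKER = 25                           # maximum users one worker can support (hypothetical)
MAX_USERS = 250                                 # total users needed for the performance test (hypothetical)
WORKER_COUNT = MAX_USERS // USERS_PER_WORKER    # = 10 workers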

Clean-up Environment

See Distributed GCP Execution (Manual Trigger) - Clean-up Environment

Calibrating for WORKER_COUNT

This process is used to determine the number of Locust workers required in order to generate sufficient load for a test given a SHAPE_CLASS.

Setup Environment

Calibrate

Repeat steps 1 to 4, using a process of elimination, such as the bisection method, to determine the maximum WORKER_COUNT. The tests are considered optimized when they generate the minimum load required to cause node scaling in the Merino-py Stage environment. You can monitor the Merino-py pod counts on Grafana.

1. Update Shape and Script Values

  • Update the WORKER_COUNT values in the following files:
    • tests/load/locustfiles/smoke_load.py or tests/load/locustfiles/average_load.py
    • tests/load/setup_k8s.sh
  • Using Git, commit the changes locally

2. Start Load Test

  • In a browser navigate to http://$EXTERNAL_IP:8089. This URL can be generated via the command:
    EXTERNAL_IP=$(kubectl get svc locust-master -o jsonpath="{.status.loadBalancer.ingress[0].ip}")
    echo http://$EXTERNAL_IP:8089
    
  • Set up the load test parameters:
    • ShapeClass: SHAPE_CLASS
    • Host: 'https://merino.services.allizom.org'
  • Select "Start Swarming"

3. Stop Load Test

Select the 'Stop' button in the top right hand corner of the Locust UI, after the desired test duration has elapsed. If the 'Run time', 'Duration' or 'ShapeClass' are set in step 1, the load test will stop automatically.

4. Analyse Results

Stage Environment Pod Counts

  • The 'Merino-py Pod Count' should demonstrate scaling during the execution of the load test
    • The pod counts can be observed in Grafana

CPU and Memory Resources

5. Report Results

Clean-up Environment

Maintenance

The load test maintenance cadence is once a quarter and should include updating the following:

  1. uv version and python dependencies
  2. Docker artifacts
  3. Distributed GCP execution scripts and Kubernetes configurations
  4. Documentation

Operations

This is where we put operational documentation for Merino.

How to Rollback Changes

Note: We use a "roll-forward" strategy for rolling back changes in production.

  1. Depending on the severity of the problem, decide if this warrants kicking off an incident;
  2. Identify the problematic commit (it may not be the latest commit) and create a revert PR. If it is the latest commit, you can revert the change with:
    git revert HEAD
    
  3. Go through the normal review process to merge the revert PR.

Navigational Suggestions Job Blocklist

The Navigational Suggestions Job blocklist is contained in merino/utils/blocklists.py. The TOP_PICKS_BLOCKLIST variable is used when running the indexing job and prevents the included domains from being added.

Add to Blocklist

  1. Go to merino/utils/blocklists.py.
  2. Add the second-level-domain to the TOP_PICKS_BLOCKLIST set.
  3. Open a PR and merge in the changes to block this domain from being indexed.
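For illustration, the change is a one-line addition to the set; the entries shown here are hypothetical:

# merino/utils/blocklists.py (illustrative excerpt)
TOP_PICKS_BLOCKLIST: set[str] = {
    "exampledomain",   # hypothetical existing entry
    "anotherdomain",   # hypothetical existing entry
    "domaintoblock",   # the second-level domain being added
}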

Remove from Blocklist

Repeat as above, just remove the domain from the TOP_PICKS_BLOCKLIST set.

  • Note: removing a domain from the blocklist does not re-add it automatically, since the domain was likely not indexed during the last Airflow job run. If you wish to see it re-added (supposing it is still in the top 1000 domains), you have to re-run the Airflow job. See the instructions for this in the jobs/navigational_suggestions docs.

How to Add to the Wikipedia Indexer and Provider Blocklist

Provider - Rapid Blocklist Addition

These steps define how to rapidly add and therefore block a Wikipedia article by its title.

  1. In /merino/utils/blocklists.py, add the matching title to TITLE_BLOCK_LIST.

NOTE: Enter the title exactly as it appears in Wikipedia, with correct spacing between the words. Membership checks of the block list are not case sensitive, and any underscores in the title should instead be spaces.

  2. Check in the changes to source control, merge a pull request with the new block list and deploy Merino.

Indexer Job

Since the indexer runs at a regular cadence, you do not need to re-run the Airflow job. Adding to the blocklist using the steps above is sufficient to rapidly block a title. The next time the Wikipedia indexer job runs, this title will be excluded during the indexer job.

NOTE: There are two blocklists referenced by the Wikipedia Indexer Job:

  1. blocklist_file_url: a key contained in the merino/configs/default.toml file that points to a remote block list which encapsulates blocked categories.
  2. WIKIPEDIA_TITLE_BLOCKLIST: an application-level list of titles found at /merino/utils/blocklists.py as explained above.

What to do with test failures in CI?

  1. Investigate the cause of the test failure

  2. Fix or mitigate the failure

    • If a fix can be identified in a relatively short time, then submit a fix
    • If the failure is caused by a flaky or intermittent functional test and the risk to the end-user experience is low, then the test can be "skipped", using the pytest xfail marker during continued investigation. Example:
      @pytest.mark.xfail(reason="Test Flake Detected (ref: DISCO-####)")
      
  3. Re-Deploy

    • A fix or mitigation will most likely require a PR merge to the main branch that will automatically trigger the deployment process. If this is not possible, a re-deployment can be initiated manually by triggering the CI pipeline in CircleCI.

Configuring Merino (Operations)

To manage configurations and view all documentation for individual config values, please view the default.toml file.

Settings

Merino's settings are managed via Dynaconf and can be specified in two ways:

  1. a TOML file in the merino/configs/ directory.
  2. via environment variables. Environment variables take precedence over the values set in the TOML files. Production environment variables are managed by SRE and defined in the relevant merino-py repo. A TOML file matching the currently activated environment name also automatically overrides the defaults, and any config file that is pointed to will override the merino/configs/default.toml file.

File organization

These are the settings sources, with later sources overriding earlier ones.

  • A config.py file establishes a Dynaconf instance and environment-specific values are pulled in from the corresponding TOML files and environment variables. Other configurations are established by files that are prefixed with config_*.py, such as config_sentry.py or config_logging.py.

  • Per-environment configuration files are in the configs directory. The environment is selected using the environment variable MERINO_ENV. The settings for that environment are then loaded from configs/${env}.toml, if the file/env exists. The default environment is "development". A "production" environment is also provided.

  • Local configuration files are not checked into the repository, but if created should be named configs/development.local.toml, following the format of <environment>.local.toml. This file is listed in the .gitignore file and is safe to use for local configuration. One may add secrets here if desired, though it is advised to exercise great caution.

General

  • All environment variables are prefixed with MERINO_. This is established in the config.py file by setting envvar_prefix="MERINO" for the Dynaconf instance. The first level following MERINO_ is accessed with a single underscore _ and any subsequent levels require two underscores __. For example, the logging format can be controlled from the environment variable MERINO_LOGGING__FORMAT (see the sketch after this list).

  • Production environment variables are set by SRE and stored in the cloudops project in the configmap.yml file. Contact SRE if you require information about or access to this file, or to request access to the cloudops infra repo.

  • You can set these environment variables in your setup by modifying the .toml files. Conversely, when using make, you can prefix the make command with overrides for the desired environment variables.

    Example: MERINO_ENV=production MERINO_LOGGING__FORMAT=pretty make dev

  • env (MERINO_ENV) - Only settable from environment variables. Controls which environment configuration is loaded, as described above.

  • debug (MERINO_DEBUG) - Boolean that enables additional features to debug the application. This should not be set to true in public environments, as it reveals all configuration, including any configured secrets.

  • format (MERINO_LOGGING__FORMAT) - Controls the format of outputted logs in either pretty or mozlog format. See config_logging.py.
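A minimal sketch of how the MERINO_ prefix and double-underscore nesting map onto Dynaconf, assuming a simplified config.py (the real file wires in additional loaders and the config_*.py modules):

from dynaconf import Dynaconf

settings = Dynaconf(
    envvar_prefix="MERINO",                      # environment variables must start with MERINO_
    settings_files=["merino/configs/default.toml"],
    environments=True,                           # per-environment sections in the TOML files
    env_switcher="MERINO_ENV",                   # MERINO_ENV selects the active environment
)

# With MERINO_LOGGING__FORMAT=pretty set in the environment:
#   settings.logging.format == "pretty"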

Caveat

Be extra careful whenever you need to reference deeply nested settings (e.g. settings.foo.bar.baz) in the hot paths of the code base, such as middlewares or route handlers. Under the hood, Dynaconf performs a dictionary lookup for each level of the configuration hierarchy. While it's harmless to do those lookups once or twice, it incurs a surprisingly high overhead when they are accessed repeatedly in hot paths. You can cache those settings somewhere to mitigate this issue.
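A minimal sketch of that mitigation; the import path and setting shown here are illustrative:

from merino.config import settings  # illustrative import path

# Perform the nested lookup once at import time
LOGGING_FORMAT: str = settings.logging.format

def handle_request() -> None:
    # Use the cached LOGGING_FORMAT in the hot path instead of settings.logging.format
    ...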

Elasticsearch Operations

We use Elasticsearch as a source of data for one of our providers. This page documents some of the commands that we want to run on the cluster.

Elasticsearch Index Policy

We want to ensure that indexes expire after 30 days, so we need to add a lifecycle policy for this deletion to happen.

The command to run in Kibana to add this policy:

PUT _ilm/policy/enwiki_policy
{
  "policy": {
    "phases": {
      "delete": {
        "min_age": "30d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}

Closed Index Recovery

The indexing job currently closes the index after it migrates the alias to point to the new index. Closing the index removes the ability to query from the index but also reduces the heap memory usage when the index is not actively being queried.

If there is a situation where we need to recover a closed index to be the main index, we will need to do the following:

  1. Re-open the index
  2. Point the index alias to the recovered index
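A hedged sketch of those two steps using the Elasticsearch Python client (8.x); the cluster address, index names, and alias name are hypothetical:

from elasticsearch import Elasticsearch

es = Elasticsearch("https://example-es-cluster.example.com:9243", api_key="...")

# 1. Re-open the closed index
es.indices.open(index="enwiki-20240101")

# 2. Point the alias at the recovered index, atomically swapping it off the current one
es.indices.update_aliases(
    actions=[
        {"remove": {"index": "enwiki-20240201", "alias": "enwiki-index"}},
        {"add": {"index": "enwiki-20240101", "alias": "enwiki-index"}},
    ]
)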

Jobs

Merino Jobs Operations

This document provides instructions and documentation on the navigational suggestions job. The job indexes a collection of the top 1000 searched domains and generates the top_picks.json file, which is ingested by the Top Picks/Navigational Suggestions provider. The provider backend can then serve suggestions that match the query terms searched by the client against second-level domains.

If you need to run the navigational suggestions job ad hoc, the quickest recommended solution is to run it in Airflow, download the newly generated top_picks.json file sent by email, and then merge it into the Merino repo, replacing the existing file.

If needing to update the blocklist to avoid certain domains and suggestions from being displayed, please see the navigational suggestions blocklist runbook.

Running the job in Airflow

Normally, the job is scheduled as a DAG in Airflow and runs at set intervals. There may be instances where you need to manually re-run the job from the Airflow dashboard.

Grid View Tab (Airflow UI)

  1. Visit the Airflow dashboard for merino_jobs.
  2. In the Grid View Tab, select the task you want to re-run.
  3. Click on 'Clear Task' and the executor will re-run the job. (Screenshot: merino_jobs UI Diagram)

Graph View Tab (Airflow UI) - Alternative

  1. Visit the Airflow dashboard for merino_jobs.
  2. From the Graph View Tab, click on the nav_suggestions_prepare_domain_metadata_prod task. (Screenshot: merino_jobs Nav Suggest Graph View)
  3. Click on 'Clear' and the job will re-run. (Screenshot: merino_jobs UI Task Instance Clear)

At the conclusion of the job, you should receive an email with a link to the newly generated file. Ensure you are a member of the disco-team email distro group to receive the email.

Note: You can also re-run the stage job, but the changes won't reflect in production. Stage should be re-run in the event of an error before running in prod to verify the correction of an error.

See Airflow's documentation on re-running DAGs for more information and implementation details.

To see the code for the merino_jobs DAG, visit the telemetry-airflow repo. The source for the job is also in the 'Code' tab in the Airflow console.

To see the navigational suggestions code that is run when the job is invoked, visit Merino jobs/navigational_suggestions.

Running the favicon extractor locally

$ uv run probe-images mozilla.org wikipedia.org

There is a Python script (domain_tester.py) which imports the DomainMetadataExtractor, Scraper and FaviconDownloader and runs them locally, without saving the results to the cloud.

This is meant to troubleshoot domains locally and iterate over the functionality in a contained environment.

Example output:

$ uv run probe-images mozilla.org wikipedia.org

Testing domain: mozilla.org
✅ Success!
 Title           Mozilla - Internet for people, not profit (UK)
 Best Icon       https://dummy-cdn.example.com/favicons/bd67680f7da3564bace91b2be87feab16d5e9e6266355b8f082e21f8159…
 Total Favicons  4

All favicons found:
- https://www.mozilla.org/media/img/favicons/mozilla/apple-touch-icon.05aa000f6748.png (rel=apple-touch-icon
size=180x180 type=image/png)
- https://www.mozilla.org/media/img/favicons/mozilla/favicon-196x196.e143075360ea.png (rel=icon size=196x196
type=image/png)
- https://www.mozilla.org/media/img/favicons/mozilla/favicon.d0be64e474b1.ico (rel=shortcut,icon)

Testing domain: wikipedia.org
✅ Success!
 Title           Wikipedia
 Best Icon       https://dummy-cdn.example.com/favicons/4c8bf96d667fa2e9f072bdd8e9f25c8ba6ba2ad55df1af7d9ea0dd575c1…
 Total Favicons  3

All favicons found:
- https://www.wikipedia.org/static/apple-touch/wikipedia.png (rel=apple-touch-icon)
- https://www.wikipedia.org/static/favicon/wikipedia.ico (rel=shortcut,icon)

Summary: 2/2 domains processed successfully

Adding new domains

  • You can add new domains to the job by adding them to the /merino/jobs/navigational_suggestions/custom_domains.py file.
  • Either manually, or you can use a script inside the /scripts folder.
  • The script is called import_domains.sh and it works with a CSV file with a REGISTERED_DOMAIN header; starting from the second row, the first column is the domain name (e.g. getpocket.com)
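For reference, a minimal input file in that format might look like this (the domains are illustrative):

REGISTERED_DOMAIN
getpocket.com
example.org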

This step was introduced to give the HNT team an easy way of importing their updated domains.

Execute the script like this:

$ ./scripts/import_domains.sh DOWNLOADED_FILE.csv

This will check whether each domain already exists in the custom_domains.py file and, if not, add it. Afterwards, all domains are sorted alphabetically.

Note

  • Subdomains are supported and treated as distinct domains. For example, sub.example.com is different from example.com and can be added separately.
  • Duplicate checks are done by exact domain string, not by apex/normalized form. If a custom domain exactly matches an existing domain, it will be skipped and logged as: Skipped duplicate domains: <domain>.

Running the Navigational Suggestions job locally

The Navigational Suggestions job can be run locally for development and testing purposes without requiring access to Google Cloud. This is useful for testing changes to the favicon extraction and domain processing logic.

Prerequisites

  • Docker installed and running
  • Merino repository cloned locally

Running the job locally

There is a single Make command which starts the Docker container for the GCS emulator, processes 20 sample domains, and stops the container afterwards.

$ make nav-suggestions

This will:

  1. Use a sample of domains from custom_domains.py instead of querying BigQuery
  2. Process these domains through the same extraction pipeline used in production
  3. Upload favicons and domain metadata to the local GCS emulator
  4. Generate a local metrics report in the local_data directory

Examples

# With monitoring enabled
make nav-suggestions ENABLE_MONITORING=true

# With custom sample size
make nav-suggestions SAMPLE_SIZE=50

# With custom metrics directory
make nav-suggestions METRICS_DIR=./test_data

# With all options combined
make nav-suggestions SAMPLE_SIZE=30 METRICS_DIR=./test_data ENABLE_MONITORING=true

# Add any other options as needed
make nav-suggestions NAV_OPTS="--min-favicon-width=32"

Merino Jobs Operations

Dynamic Wikipedia Indexer Job

Merino builds the Elasticsearch indexing job that runs in Airflow, and Airflow uses the latest built image as the base image. The reasons to keep the job code close to the application code are:

  1. Data models can be shared between the indexing job and application more easily. This means that data migrations will be simpler.
  2. All the logic regarding Merino functionality can be found in one place.
  3. Eliminates unintended differences in functionality due to dependency mismatch.

If your reason for re-running the job is needing to update the blocklist to avoid certain suggestions from being displayed, please see the wikipedia blocklist runbook.

Running the job in Airflow

Normally, the job is scheduled as a DAG in Airflow and runs at set intervals. There may be instances where you need to manually re-run the job from the Airflow dashboard.

Grid View Tab (Airflow UI)

  1. Visit the Airflow dashboard for merino_jobs.
  2. In the Grid View Tab, select the task you want to re-run.
  3. Click on 'Clear Task' and the executor will re-run the job. (Screenshot: merino_jobs UI Diagram)

Graph View Tab (Airflow UI) - Alternative

  1. Visit the Airflow dashboard for merino_jobs.
  2. From the Graph View Tab, click on the wikipedia_indexer_build_index_production task. (Screenshot: merino_jobs Wikipedia Indexer Graph View)
  3. Click on 'Clear' and the job will re-run. (Screenshot: merino_jobs UI Task Instance Clear)

Note: You can also re-run the stage job, but the changes won't reflect in production. Stage should be re-run in the event of an error before running in prod to verify the correction of an error.

See Airflow's documentation on re-running DAGs for more information and implementation details.

To see the code for the merino_jobs DAG, visit the telemetry-airflow repo. The source for the job is also in the 'Code' tab in the Airflow console.

To see the Wikipedia Indexer code that is run when the job is invoked, visit Merino jobs/wikipedia_indexer.

Merino Jobs Operations

CSV Remote Settings Uploader Job

The CSV remote settings uploader is a job that uploads suggestions data in a CSV file to remote settings. It takes two inputs:

  1. A CSV file. The first row in the file is assumed to be a header that names the fields (columns) in the data.
  2. A Python module that validates the CSV contents and describes how to convert it into suggestions JSON.

If you're uploading suggestions from a Google sheet, you can export a CSV file from File > Download > Comma Separated Values (.csv). Make sure the first row in the sheet is a header that names the columns.

Uploading suggestions (Step by step)

If you're uploading a type of suggestion that the uploader already supports, skip to Running the uploader below. If you're not sure whether it's supported, check in the merino/jobs/csv_rs_uploader/ directory for a file named similarly to the type.

To upload a new type of suggestion, follow the steps below. In summary, first you'll create a Python module that implements a model for the suggestion type, and then you'll run the uploader.

1. Create a Python model module for the new suggestion type

Add a Python module to merino/jobs/csv_rs_uploader/. It's probably easiest to copy an existing model module like mdn.py, follow along with the steps here, and modify it for the new suggestion type. Name the file according to the suggestion type.

This file will define the model of the new suggestion type as it will be serialized in the output JSON, perform validation and conversion of the input data in the CSV, and define how the input data should map to the output JSON.

2. Add the Suggestion class

In the module, implement a class called Suggestion that derives from either BaseSuggestion in merino.jobs.csv_rs_uploader.base or RowMajorBaseSuggestion in merino.jobs.csv_rs_uploader.row_major_base. The Suggestion class will be the model of the new suggestion type. BaseSuggestion itself derives from Pydantic's BaseModel, so the validation the class performs will be based on Pydantic, which is used throughout Merino. BaseSuggestion is implemented in base.py. If the CSV data is row-major, use RowMajorBaseSuggestion.

3. Add suggestion fields to the class

Add a field to the class for each property that should appear in the output JSON (except score, which the uploader will add automatically). Name each field as you would like it to be named in the JSON. Give each field a type so that Pydantic can validate it. For URL fields, use HttpUrl from the pydantic module.

4. Add validator methods to the class

Add a method annotated with Pydantic's @field_validator decorator for each field. Each validator method should transform its field's input value into an appropriate output value and raise a ValueError if the input value is invalid. Pydantic will call these methods automatically as it performs validation. Their return values will be used as the values in the output JSON.

BaseSuggestion implements two helpers you should use:

  • _validate_str() - Validates a string value: leading and trailing whitespace is stripped, and all whitespace is replaced with spaces and collapsed. Returns the validated value.
  • _validate_keywords() - The uploader assumes that lists of keywords are serialized in the input data as comma-delimited strings. This helper method takes a comma-delimited string and splits it into individual keyword strings. Each keyword is converted to lowercase, some non-ASCII characters are replaced with ASCII equivalents that users are more likely to type, leading and trailing whitespace is stripped, all whitespace is replaced with spaces and collapsed, and duplicate keywords are removed. Returns the list of keyword strings.

5. Implement the class methods

For suggestions created from row-major CSV data, add a @classmethod to Suggestion called row_major_field_map(). It should return a dict that maps from field (column) names in the input CSV to property names in the output JSON. Otherwise, add a @classmethod to Suggestion called csv_to_suggestions(). It should return an array of suggestions created from the passed CSV reader.
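Putting steps 2 through 5 together, a hedged sketch of a row-major model module might look like the following. The "foo" suggestion type, its fields, and the CSV column names are hypothetical, and the exact signatures of the _validate_* helpers should be checked against base.py:

from typing import Any

from pydantic import HttpUrl, field_validator

from merino.jobs.csv_rs_uploader.row_major_base import RowMajorBaseSuggestion


class Suggestion(RowMajorBaseSuggestion):
    """Model for the hypothetical "foo" suggestion type."""

    url: HttpUrl
    title: str
    keywords: list[str]

    @classmethod
    def row_major_field_map(cls) -> dict[str, str]:
        # CSV column name -> output JSON property name
        return {
            "URL": "url",
            "Title": "title",
            "Keywords": "keywords",
        }

    @field_validator("title", mode="before")
    @classmethod
    def validate_title(cls, value: Any) -> str:
        # BaseSuggestion helper described above; signature shown approximately
        return cls._validate_str(value)

    @field_validator("keywords", mode="before")
    @classmethod
    def validate_keywords(cls, value: Any) -> list[str]:
        # BaseSuggestion helper described above; signature shown approximately
        return cls._validate_keywords(value)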

6. Add a test

Add a test file to tests/unit/jobs/csv_rs_uploader/. See test_mdn.py as an example. The test should perform a successful upload as well as uploads that fail due to validation errors and missing fields (columns) in the input CSV.

utils.py in the same directory implements helpers that your test should use:

  • do_csv_test() - Makes sure the uploader works correctly during a successful upload. It takes either a path to a CSV file or a list[dict] that will be used to create a file object (StringIO) for an in-memory CSV file. Prefer passing in a list[dict] instead of creating a file and passing a path, since it's simpler.
  • do_error_test() - Makes sure a given error is raised when expected. Use ValidationError from the pydantic module to check validation errors and MissingFieldError from merino.jobs.csv_rs_uploader to check input CSV that is missing an expected field (column).

7. Run the test

$ MERINO_ENV=testing uv run pytest tests/unit/jobs/csv_rs_uploader/test_foo.py

See also the main Merino development documentation for running unit tests.

8. Submit a PR

Once your test is passing, submit a PR with your changes so that the new suggestion type is committed to the repo. This step isn't necessary to run the uploader and upload your suggestions, so you can come back to it later.

9. Upload!

See Running the uploader.

Running the uploader

Run the following from the repo's root directory to see documentation for all options and their defaults. Note that the upload command is the only command in the csv-rs-uploader job.

uv run merino-jobs csv-rs-uploader upload --help

The uploader takes a CSV file as input, so you'll need to download or create one first.

Here's an example that uploads suggestions in foo.csv to the remote settings dev server:

uv run merino-jobs csv-rs-uploader upload \
  --server "https://remote-settings-dev.allizom.org/v1" \
  --bucket main-workspace \
  --csv-path foo.csv \
  --model-name foo \
  --record-type foo-suggestions \
  --auth "Bearer ..."

Let's break down each command-line option in this example:

  • --server - Suggestions will be uploaded to the remote settings dev server
  • --bucket - The main-workspace bucket will be used
  • --csv-path - The CSV input file is foo.csv
  • --model-name - The model module is named foo. Its path within the repo would be merino/jobs/csv_rs_uploader/foo.py
  • --record-type - The type in the remote settings records created for these suggestions will be set to foo-suggestions. This argument is optional and defaults to "{model_name}-suggestions"
  • --auth - Your authentication header string from the server. To get a header, log in to the server dashboard (don't forget to log in to the Mozilla VPN first) and click the small clipboard icon near the top-right of the page, after the text that shows your username and server URL. The page will show a "Header copied to clipboard" toast notification if successful.

Setting suggestion scores

By default all uploaded suggestions will have a score property whose value is defined in the remote_settings section of the Merino config. This default can be overridden using --score <number>. The number should be a float between 0 and 1 inclusive.

Other useful options

  • --dry-run - Log the output suggestions but don't upload them. The uploader will still authenticate with the server, so --auth must still be given.

Structure of the remote settings data

The uploader uses merino/jobs/utils/chunked_rs_uploader.py to upload the output suggestions. In short, suggestions will be chunked, and each chunk will have a corresponding remote settings record with an attachment. The record's ID will be generated from the --record-type option, and its type will be set to --record-type exactly. The attachment will contain a JSON array of suggestion objects in the chunk.

Merino Jobs Operations

Geonames Uploader Job

The geonames uploader is a job that uploads geographical place data from geonames.org to remote settings. This data is used by the Suggest client to recognize place names and relationships for certain suggestion types like weather suggestions.

The job consists of a single command called upload. It uploads two types of records:

Core geonames data includes places' primary names, numeric IDs, their countries and administrative divisions, geographic coordinates, population sizes, etc. This data is derived from the main geoname table described in the geonames documentation.

A single place and its data is referred to as a geoname.

Alternate names are the different names associated with a geoname. A single geoname can have many alternate names since a place can have many different variations of its name. For example, New York City can be referred to as "New York City," "New York," "NYC," "NY", etc. Alternate names also include translations of the geoname's name into different languages. In Spanish, New York City is "Nueva York."

Alternate names are referred to simply as alternates.

Usage

uv run merino-jobs geonames-uploader upload \
    --rs-server 'https://remote-settings-dev.allizom.org/v1' \
    --rs-bucket main-workspace \
    --rs-collection quicksuggest-other \
    --rs-auth 'Bearer ...'

This will upload data for the countries and client locales that are hardcoded by the job.

Geonames records

Each geonames record corresponds to a partition of geonames within a given country. A partition has a lower population threshold and an optional upper population threshold, and the geonames in the partition are the geonames in the partition's country with population sizes that fall within that range. The lower threshold is inclusive and the upper threshold is exclusive.

If a partition has an upper threshold, its record's attachment contains its country's geonames with populations in the range [lower, upper), and the record's ID is geonames-{country}-{lower}-{upper}.

If a partition does not have an upper threshold, its attachment contains geonames with populations in the range [lower, infinity), and the record's ID is geonames-{country}-{lower}.

country is an ISO 3166-1 alpha-2 code like US, GB, and CA. lower and upper are in thousands and zero-padded to four places.
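As a hedged illustration of that naming scheme (not the job's actual implementation), a record ID could be derived from a partition like this:

def geonames_record_id(country: str, lower: int, upper: int | None = None) -> str:
    """Build a geonames record ID from a partition's country and population thresholds."""
    record_id = f"geonames-{country}-{lower // 1_000:04d}"
    if upper is not None:
        record_id += f"-{upper // 1_000:04d}"
    return record_id

# geonames_record_id("US", 50_000, 100_000)  ->  "geonames-US-0050-0100"
# geonames_record_id("US", 1_000_000)        ->  "geonames-US-1000"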

A partition can have a list of client countries, which are added to its record's filter expression so that only clients in those countries will ingest the partition's record.

Partitions serve a couple of purposes. First, they help keep geonames attachment sizes small. Second, they give us control over the clients that ingest a set of geonames. For example, we might want clients outside a country to ingest only its large, well known geonames, while clients within the country should ingest its smaller geonames.

If there are no geonames with population sizes in a partition's range, no record will be created for the partition.

Types of geonames

Three types of geonames can be included in each attachment: cities, administrative divisions, and countries. Administrative divisions correspond to things like states, provinces, territories, and boroughs. A geoname can have up to four administrative divisions, and the meaning and number of divisions depends on the country and can even vary within a country.

Example geonames record IDs

  • geonames-US-0050-0100
    • US geonames with populations in the range [50k, 100k)
  • geonames-US-1000
    • US geonames with populations in the range [1m, infinity)

Alternates records

Each alternates record corresponds to a single geonames record and language. Since a geonames record corresponds to a country and partition, that means each alternates record corresponds to a country, partition, and language. The alternates record contains alternates in the language for the geonames in the geonames record.

The ID of an alternates record is the ID of its corresponding geonames record with the language code appended:

  • geonames-{country}-{lower}-{upper}-{language}
  • geonames-{country}-{lower}-{language} (for geonames records without an upper threshold)

language is a language code as defined in the geonames alternates data. There are generally three types of language codes in the data:

  • A two-letter ISO 639 language code, like en, es, pt, de, and fr
  • A locale code combining an ISO 639 language code with an ISO 3166-1 alpha-2 country code, like en-GB, es-MX, and pt-BR
  • A geonames-specific pseudo-code:
    • abbr - Abbreviations, like "NYC" for New York City
    • iata - Airport codes, like "PDX" for Portland Oregon USA
    • Others that we generally don't use

The input to the geonames uploader job takes Firefox locale codes, and the job automatically converts each locale code to a set of appropriate geonames language codes. Alternates record IDs always include the geonames language code, not the Firefox locale code (although sometimes they're the same).

If a geonames record includes client countries (or in other words has a filter expression limiting ingest to clients in certain countries), the corresponding alternates record for a given language will have a filter expression limiting ingest to clients using a locale that is both valid for the language and supported within the country.

If a geonames record does not include any client countries, then the corresponding alternates record will have a filter expression limiting ingest to clients using a locale that is valid for the language.

The supported locales of each country are defined in CONFIGS_BY_COUNTRY.

Alternates records for the abbr (abbreviations) and iata (airport codes) pseudo-language codes are automatically created for each geonames partition, when abbr and iata alternates exist for geonames in the partition.

Excluded alternates

The job may exclude selected alternates in certain cases, or in other words it may not include some alternates you expect it to. To save space in remote settings, alternates that are the same as a geoname's primary name or ASCII name are usually excluded.

Also, it's often the case that a partition does not have any alternates at all, or any alternates in a given language.

Example alternates record IDs

  • geonames-US-0050-0100-en
    • English-language alternates for US geonames with populations in the range [50k, 100k)
  • geonames-US-0050-0100-en-GB
    • British-English-language alternates for US geonames with populations in the range [50k, 100k)
  • geonames-US-1000-de
    • German-language alternates for US geonames with populations in the range [1m, infinity)
  • geonames-US-1000-abbr
    • Abbreviations for US geonames with populations in the range [1m, infinity)
  • geonames-US-1000-iata
    • Airport codes for US geonames with populations in the range [1m, infinity)

Country and locale selection

Because the geonames uploader is a complex job and typically uploads a lot of data at once, it hardcodes the selection of countries and Firefox locales. This means that, if you want to make any changes to the records that are uploaded, you'll need to modify the code, but the tradeoff is that all supported countries and locales are listed in one place, you don't need to run the job more than once per upload, and there's no chance of making mistakes on the command line.

The job does not re-upload unchanged records by default.

The selection of countries and locales is defined in the CONFIGS_BY_COUNTRY dict in the job's __init__.py. Here are example entries for Canada and the US:

CONFIGS_BY_COUNTRY = {
    "CA": CountryConfig(
        geonames_partitions=[
            Partition(threshold=50_000, client_countries=["CA"]),
            Partition(threshold=100_000, client_countries=["CA", "US"]),
            Partition(threshold=500_000),
        ],
        supported_client_locales=EN_CLIENT_LOCALES + ["fr"],
    ),
    "US": CountryConfig(
        geonames_partitions=[
            Partition(threshold=50_000, client_countries=["US"]),
            Partition(threshold=100_000, client_countries=["CA", "US"]),
            Partition(threshold=500_000),
        ],
        supported_client_locales=EN_CLIENT_LOCALES,
    ),
}

Each entry maps an ISO 3166-1 alpha-2 country code to data for the country. The data includes two properties:

  • geonames_partitions determines the geonames records that will be created for the country
  • supported_client_locales contributes to the set of languages for which alternates records will be created, not only for the country but for all countries in CONFIGS_BY_COUNTRY

geonames_partitions

geonames_partitions is a list of one or more partitions. Each partition defines its lower population threshold and client countries. The upper threshold is automatically calculated from the partition with the next-largest threshold.

Client countries should be defined for all partitions except possibly the last. If the last partition doesn't include client_countries, its record won't have a filter expression, so it will be ingested by all clients regardless of country.

In the example CONFIGS_BY_COUNTRY above, US geonames will be partitioned into three records:

  • geonames-US-0050-0100
    • US geonames with populations in the range [50k, 100k) that will be ingested only by US clients. Its filter expression will be env.country in ['US']
  • geonames-US-0100-0500
    • US geonames with populations in the range [100k, 500k) that will be ingested by US and Canadian clients. Its filter expression will be env.country in ['CA', 'US']
  • geonames-US-0500
    • US geonames with populations in the range [500k, infinity) that will be ingested by all clients. It won't have a filter expression.

supported_client_locales

supported_client_locales is a list of Firefox locales. The job will convert the locales to geonames alternates languages and create one alternates record per geoname record per country per language (generally -- see the caveat about excluded alternates).

Note that supported_client_locales is not necessarily a list of all conceivable locales for a country. It's only a list of locales that need to be supported in the country. In the example CONFIGS_BY_COUNTRY above, the entry for Canada includes both English and French locales. If you didn't need to support Canadian clients using the fr locale, you could leave out fr. If you did leave out fr but then added a CONFIGS_BY_COUNTRY entry for France, which presumably would include support for the fr locale, then French-language alternates for all countries in CONFIGS_BY_COUNTRY would be uploaded anyway, and Canadian clients using the fr locale would ingest them even though fr wasn't listed as a supported Canadian locale.

The example CONFIGS_BY_COUNTRY uses EN_CLIENT_LOCALES, which is all English locales supported by Firefox: en-CA, en-GB, en-US, and en-ZA. Up to 15 alternates records will be created for the three US geonames records due to the following math:

3 US geonames records * (
    `en` language
    + `en-CA` language
    + `en-GB` language
    + `en-US` language
    + `en-ZA` language
)

In reality, most of the US geonames records won't have geonames with alternates in the en-* languages, only the en language, so it's more likely that only the following alternates records will be created:

  • geonames-US-0050-0100-en
    • en language alternates for the geonames in the geonames-US-0050-0100 record. Its filter expression will be env.locale in ['en-CA', 'en-GB', 'en-US', 'en-ZA']
  • geonames-US-0100-0500-en
    • en language alternates for the geonames in the geonames-US-0100-0500 record. Its filter expression will be env.locale in ['en-CA', 'en-GB', 'en-US', 'en-ZA']
  • geonames-US-0500-en
    • en language alternates for the geonames in the geonames-US-0500 record. Its filter expression will be env.locale in ['en-CA', 'en-GB', 'en-US', 'en-ZA']
  • Plus maybe one or two en-GB and/or en-CA records

Operation

For each country in CONFIGS_BY_COUNTRY, the job performs two steps corresponding to the two types of records:

Step 1:

  1. Download the country's geonames from geonames.org
  2. Upload the country's geonames records
  3. Delete unused geonames records for the country

Step 2:

  1. Download the country's alternates from geonames.org
  2. For each alternates language, upload the country's alternates records
  3. Delete unused alternates records for the country

The job does not re-create or re-upload records and attachments that haven't changed.

Command-line options

As with all Merino jobs, options can be defined in Merino's config files in addition to being passed on the command line.

--alternates-url-format

Format string for alternates zip files on the geonames server. Should contain a reference to a country variable. Default value: https://download.geonames.org/export/dump/alternatenames/{country}.zip

--force-reupload

Recreate records and attachments even when they haven't changed.

--geonames-url-format

Format string for geonames zip files on the geonames server. Should contain a reference to a country variable. Default value: https://download.geonames.org/export/dump/{country}.zip

--rs-dry-run

Don't perform any mutable remote settings operations.

--rs-auth auth

Your authentication header string from the server. To get a header, log in to the server dashboard (don't forget to log in to the Mozilla VPN first) and click the small clipboard icon near the top-right of the page, after the text that shows your username and server URL. The page will show a "Header copied to clipboard" toast notification if successful.

--rs-bucket bucket

The remote settings bucket to upload to.

--rs-collection collection

The remote settings collection to upload to.

--rs-server url

The remote settings server to upload to.

Tips

Use attachment sizes to help decide population thresholds

Attachment sizes for geonames and alternates records can be quite large since this job makes it easy to select a large number of geonames. As you decide on population thresholds, you can check potential attachment sizes without making any modifications by using --rs-dry-run with a log level of INFO like this:

MERINO_LOGGING__LEVEL=INFO \
    uv run merino-jobs geonames-uploader upload \
    --rs-server 'https://remote-settings-dev.allizom.org/v1' \
    --rs-bucket main-workspace \
    --rs-collection quicksuggest-other \
    --rs-auth 'Bearer ...' \
    --rs-dry-run

Look for "Uploading attachment" in the output.

You can make the log easier to read if you have jq installed. Use the mozlog format and pipe the output to jq ".Fields.msg" like this:

MERINO_LOGGING__LEVEL=INFO MERINO_LOGGING__FORMAT=mozlog \
    uv run merino-jobs geonames-uploader upload \
    --rs-server 'https://remote-settings-dev.allizom.org/v1' \
    --rs-bucket main-workspace \
    --rs-collection quicksuggest-other \
    --rs-auth 'Bearer ...' \
    --rs-dry-run \
    | jq ".Fields.msg"

Merino ADRs

This directory archives all the Architectural Decision Records (ADRs) for Merino.

Locust vs k6; Merino-py Performance Test Framework

  • Status: Accepted
  • Deciders: Nan Jiang, Raphael Pierzina & Katrina Anderson
  • Date: 2023-02-21

Context and Problem Statement

Performance testing for the Rust version of Merino was conducted with the Locust test framework and focused on the detection of HTTP request failures. During the migration of Merino from Rust to Python, performance testing was conducted with k6 and focused on the evaluation of request latency. Going forward a unified performance testing solution is preferred, should the test framework be Locust or k6?

Decision Drivers

  1. The test framework supports the current load test design, a 10-minute test run with an average load of 1500RPS (see Merino Load Test Plan)
  2. The test framework measures HTTP request failure and client-side latency metrics
  3. The test framework is compatible with the Rapid Release Model for Firefox Services initiative, meaning:
    • It can execute through command line
    • It can signal failures given check or threshold criteria
    • It can be integrated into a CD pipeline
    • It can report metrics to Grafana
  4. The members of the DISCO and ETE teams are able to contribute to and maintain load tests written with the test framework

Considered Options

  • A. Locust
  • B. k6

Decision Outcome

Chosen option:

  • A. Locust

Both k6 and Locust are able to execute the current load test design, report required metrics and fulfill the Rapid Release Model for Firefox Services initiative; however, Locust's Python tech stack ultimately makes it the better fit for the Merino-py project. In line with the team's single repository direction (see PR), using Locust will:

  • Leverage existing testing, linting and formatting infrastructure
  • Promote dependency sharing and code re-use (models & backends)

Pros and Cons of the Options

A. Locust

Locust can be viewed as the status quo option, since it is the framework that is currently integrated into the Merino-py repository and is the basis for the CD load test integration currently underway (see DISCO-2113).
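
For reference, a minimal Locust user exercising Merino's suggest endpoint might look like the following sketch; the query value and wait times are illustrative assumptions, not the actual Merino load test configuration.

from locust import HttpUser, task, between

class SuggestUser(HttpUser):
    # Pause briefly between simulated requests; real tests tune pacing and user
    # counts to reach the target load (roughly 1500 RPS in the current design).
    wait_time = between(0.1, 0.5)

    @task
    def suggest(self) -> None:
        # Locust records latency and HTTP failures for this request automatically.
        self.client.get("/api/v1/suggest", params={"q": "wea"})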

Pros

  • Locust has a mature distributed load generation feature and can easily support a 1500 RPS load
  • Locust has built-in RPS, HTTP request failure and time metrics with customizable URL break-down
  • Locust scripting is in Python
  • Locust supports direct command line usage
  • Locust is used for load testing in other Mozilla projects and is recommended by the ETE team

Cons

  • Locust is 100% community driven (no commercial business), which means its contribution level can wane
  • Preliminary research indicates that reporting metrics from Locust to Grafana requires the creation of custom code, a plugin or a third party integration

B. k6

For the launch of Merino-py, performance benchmarking was conducted using a k6 load test script (see Merino Explorations). This script was reused from the Merino rewrite exploration effort and has proven successful in assessing whether Merino-py performance achieves the target p95 latency threshold, effecting preventative change (see PR). k6's effectiveness and popularity amongst team members is an incentive to pause and evaluate if it is a more suitable framework going forward.

Pros

  • k6 is an open-source commercially backed framework with a high contribution rate
  • k6 is built by Grafana Labs, implying easy integration with dashboards
  • k6 has built-in RPS, HTTP request failure and time metrics with customizable URL break-down
  • k6 supports direct command line usage
  • k6 is feature rich, including built-in functions to generate pass/fail results and create custom metrics

Cons

  • The k6 development stack is in JavaScript/TypeScript. This means:
    • Modeling and backend layer code would need to be duplicated and maintained
    • Linting, formatting and dependency infrastructure would need to be added and maintained
  • k6 has an immature distributed load generation feature, with documented limitations
    • k6 runs more efficiently than other frameworks, so it may be possible to achieve 1500 RPS without distribution

Merino Suggest API Response Structure

  • Status: accepted
  • Deciders: Michelle Tran, Lina Butler, Nan Jiang, Wil Stuckey, Drew Willcoxon, Taddes Korris, Tiffany Tran
  • Date: 2023-04-20

Context and Problem Statement

As Merino continues to add more suggestions, suggestion providers are going to have to return all sorts of data to the clients that are bespoke to the particular suggestion. For instance, the weather suggestion returns a temperature. Currently, we do not have a strategy to manage these bespoke pieces of data, which results in them being returned at the top level of the suggestion object. However, this will pose a problem when

  1. names of fields are shared between providers, but have different semantics (i.e. rating may be a decimal value between 0-1 in one type, and a "star" integer rating between 1-5 in another)
  2. the API is unclear about what will necessarily exist, and what is optional, which leads to client confusion about the contract

So, this ADR is to make a decision on how we want to handle provider specific fields going forward.

Decision Drivers

In rough order of importance:

  1. Explicitness of Ownership - i.e. the rating field belongs to the addons provider
  2. Compatibility with [JSON] Schema Validation
  3. Adherence to the Fx Suggest Design Framework
  4. Backwards Compatibility with Current Schema

Considered Options

  • A. Continue to add to Top Level with Optional Fields
  • B. Custom Details Field for Bespoke Provider Fields
  • B.5 Custom Details Field without the Provider Nesting
  • C. Custom Details Field for a "Type"
  • D. Component Driven custom_details

Decision Outcome

Chosen option: B

We will also not increase the version number of the API for this ADR. So, going forward, we will encode option B into the response design without changing the existing providers. This means that the following providers will not have their bespoke fields removed from top level:

  • AdM Provider
  • Top Picks Provider
  • Weather Provider
  • Wikipedia Provider
  • WikiFruit Provider

However, this does not preclude these providers from duplicating the fields to custom_details in the v1 API.

Positive Consequences of Option B

  • Clear isolation of fields that belong together (i.e. grouped by provider).
  • Clear ownership of fields through the structure.
  • Simpler validation logic than other options due to less need for conditionals.

Negative Consequences of Option B

  • Potentially some redundancy caused by extra nesting.
  • Might not be as flexible with a provider that returns different fields based on what type of suggestion it is.

Positive Consequences of not Increasing API Version

  • We do not have to worry about migrating Firefox (and other clients) into the new format. The migration is going to be quite a lot of extra work that adds little benefit (other than consistency of design, it doesn't add more features or improve any known time sinks in development).
  • Do not have to support 2 versions of the API.

Negative Consequences of not Increasing API Version

  • Some inconsistencies with how providers add fields to the response. We will likely want to resolve this as we migrate to v2, but it's a known issue at the moment.
  • Might be missing an opportune time to migrate, as these features are not out yet, which means the flexibility for change is higher.

Pros and Cons of the Options

A. Continue to add to Top Level with Optional Fields

This is the status quo option. We will continue to append bespoke values to the top level suggestion, and ensure that they're optional. We can continue to use the provider to signal what fields exist and how they should be parsed. For example, we can specify 2 different types of rating, and hence 2 validation strategies for them, based on which provider is specified.

Example:

{
  "suggestions": [
    {
      ...
      "provider": "addons",
      "rating": "4.123",
      ...
    },
    {
      ...
      "provider": "movies",
      "rating": 0.123,
      ...
    },
    ...
  ],
  ...
}

The partial JSON Schema validation will look something like:

{
  "type": "object",
  "properties": {
    "provider": {
      "type": "string"
    }
  },
  "required": ["provider"],
  "allOf": [
    {
      "if": {
        "properties": {
          "provider": {
            "const": "addons"
          }
        }
      },
      "then": {
        "properties": {
          "rating": {
            "type": "string"
          }
        },
        "required": [
          "rating"
        ]
      }
    },
    {
      "if": {
        "properties": {
          "provider": {
            "const": "movies"
          }
        }
      },
      "then": {
        "properties": {
          "rating": {
            "type": "number"
          }
        },
        "required": [
          "rating"
        ]
      }
    }
  ]
}

Pros

  • Can specify specific validation per provider.
  • Merino is still fairly immature, so it might be too early to think about design.
  • Less nesting in the models (resulting in less complexity).
  • Currently, backwards compatible as we don't have to do anything to existing providers, as this follows the existing patterns.

Cons

  • Lack of isolation for bespoke fields; rating is coupled with 2 specific providers, and by just looking at the response, it's not clear that they are related.
  • Not clear what is shared between all suggestions, vs. what is bespoke to specific provider.
  • It is not obvious that the provider field should signal how you should perform validation. In other words, there is a contextual dependency on the JSON structure of suggestion based on provider.

B. Custom Details Field for Bespoke Provider Fields

We introduce a custom_details field that uses a provider name as key to an object with the bespoke values to that provider.

Example:

{
  "suggestions": [
    {
      ...
      "provider": "addons",
      "custom_details": {
        "addons": {
          "rating": "4.7459"
        }
      }
    },
    ...
  ],
  ...
}

The specific fields in custom_details will all be optional (i.e. addons will be an optional key) but the shape of what goes in addons can be more strict (i.e. addons require a rating field).

A partial schema specification for the above might look like [1]:

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "Suggest API Response v1",
  "description": "Response for /api/v1/suggest",
  "type": "object",
  "properties": {
    "provider": {
      "description": "id for the provider type",
      "type": "string"
    },
    "custom_details": {
      "type": "object",
      "properties": {
        "addons": {
          "type": "object",
          "description": "Custom Addon Fields",
          "properties": {
            "rating": {
              "type": "number"
            }
          },
          "required": ["rating"]
        }
      }
    }
  },
  "required": ["provider"]
}
[1] You can experiment with JSON Schema at https://www.jsonschemavalidator.net/
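
To give a sense of how option B could be modeled on the server side, here is a hedged Pydantic sketch; the class and field names are assumptions for illustration only and are not Merino's actual models.

from typing import Optional

from pydantic import BaseModel

class AddonsDetails(BaseModel):
    # Bespoke addons field; required whenever the "addons" key is present.
    rating: str

class CustomDetails(BaseModel):
    # Each provider owns its own optional key under custom_details.
    addons: Optional[AddonsDetails] = None

class Suggestion(BaseModel):
    provider: str
    title: Optional[str] = None
    url: Optional[str] = None
    custom_details: Optional[CustomDetails] = None

suggestion = Suggestion(
    provider="addons",
    custom_details=CustomDetails(addons=AddonsDetails(rating="4.7459")),
)
print(suggestion.dict(exclude_none=True))

The nesting makes ownership explicit: fields outside custom_details are universal, while fields under a provider key belong to that provider.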

Pros

  • Can specify specific validation per provider.
  • Clear ownership of rating to addons via structure.
  • Fields outside of custom_details can be fields that are more universal across suggestions. These fields can potentially be correlated directly to the Fx Suggest Design Framework (i.e. context_label, url, title, description, etc.).
  • Having a clear distinction for Fx Suggest Design Framework fields vs. bespoke fields makes this more backwards compatible, as the fields in the Design Framework can render the default suggestion case for clients who haven't upgraded.

Cons

  • We'll likely need to migrate existing providers at some point. But in the meantime, some fields will not follow convention to maintain backwards compatibility.
  • Extra nesting inside of custom_details.

B.5 Custom Details Field without the Provider Nesting

This is exactly like B, except that we remove the extra nesting.

So, in the example above, we can remove the extra addons object to get:

{
  "suggestions": [
    {
      ...
      "provider": "addons",
      "custom_details": {
        "rating": "4.7459"
      }
    },
    ...
  ],
  ...
}

The validation of the contents of custom_details will look more like A.

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "Suggest API Response v1",
  "description": "Response for /api/v1/suggest",
  "type": "object",
  "properties": {
    "provider": {
      "description": "id for the provider type",
      "type": "string"
    }
  },
  "required": [
    "provider"
  ],
  "if": {
    "properties": {
      "provider": {
        "const": "addons"
      }
    }
  },
  "then": {
    "properties": {
      "custom_details": {
        "description": "Custom Details Specific for Addons",
        "type": "object",
        "properties": {
          "rating": {
            "type": "string"
          }
        },
        "required": [
          "rating"
        ]
      }
    },
    "required": ["custom_details"]
  }
}

Pros

  • Can specify specific validation per provider.
  • Fields outside of custom_details can be fields that are more universal across suggestions. These fields can potentially be correlated directly to the Fx Suggest Design Framework (i.e. context_label, url, title, description, etc.).
  • Having a clear distinction for Fx Suggest Design Framework fields vs. bespoke fields makes this more backwards compatible, as the fields in the Design Framework can render the default suggestion case for clients who haven't upgraded.
  • Less nesting in the response than B

Cons

  • We'll likely need to migrate existing providers at some point. But in the meantime, some fields will not follow convention to maintain backwards compatibility.
  • The relationship between provider and custom_details is more implicit than explicit.
  • This has a lot of the same cons as Option A because validation is done similarly.

C. Custom Details Field for a "Type"

This is similar to option B, except that we want to introduce a new type field to differentiate it from the provider. The custom_details will be keyed by this type, rather than the provider name. These types are kind of analogous to a rendering component, as they will likely be used to specify a specific rendering path in the client.

Example:

{
  "suggestions": [
    {
      ...
      "provider": "addons",
      "type": "addons_type",
      "custom_details": {
        "addons_type": {
          "rating": "4.7459"
        }
      }
    },
    ...
  ],
  ...
}

Pros

  • All the pros for B apply here
  • Can decouple the custom_details from provider. This will be helpful for potentially sharing the type with other suggestions produced by different providers. For instance, we may want this to specify different rendering paths in the client (i.e. a "top picks" type to be shared between addons and top_picks providers, as there are many shared fields because they're rendered similarly).

Cons

  • All the cons for B apply here
  • Potentially over-engineering for type, as its use is currently hypothetical.

D. Component Driven custom_details

This solution will model distinct UI components in the custom_details section. For example, if the addons provider has specific UI components to render a ratings component and a highlight_context_label, then we can specify these directly in the custom_details section. This assumes that the client side has these specific rendering types.

Example:

{
  "suggestions": [
    {
      ...
      "provider": "addons",
      "custom_details": {
        "ratings": {
          "value": "4.7459",
          "unit": "stars"
        },
        "highlight_context_label": {
          "text": "Special Limited Time Offer!"
        }
      }
    },
    ...
  ],
  ...
}

Pros

  • Can share custom components with schema validation.
  • Backwards compatible with clients who don't have the necessary components to render. It will just use the default renderer via the Fx Suggest Design Framework

Cons

  • We currently don't have a sophisticated Component Design Framework, so this is probably overengineering.
  • This tightly couples the API to the design framework of Desktop Firefox, which makes the fields potentially less relevant to other clients.

Streamline Test Coverage of Third-Party Integrations

  • Status: Accepted
  • Deciders: Nan Jiang & Katrina Anderson
  • Date: 2024-01-24

Context and Problem Statement

In 2024, it is anticipated that Merino will expand to be consumed by a greater set of Firefox surfaces and to include more content providers. This will challenge the current feature test strategy, which has shown weakness in detecting incompatibilities with third-party integrations. Examples:

  1. Weather Incident with Redis Integration
  2. Navigation Suggestions Error with GCP Integration

The current test approach uses a combination of unit, integration, and contract feature tests, where third-party integrations such as cloud services, data storage services, and external API integrations are test doubled in the unit and integration tests and emulated with Docker containers in contract tests. While test doubles might be easier to work with, they lack the accuracy of working with real dependencies in terms of matching the production environment and covering all the integration surfaces and concerns in tests.

Despite the potential to test with third-party integrations in contract tests, developers have refrained due to their lack of familiarity with Docker and CI tooling, as well as their belief in a poor return on investment for the time and effort required to create and maintain contract tests for experimental features.

Given the Merino service context, which has a rapid development pace and a high risk tolerance, is there a way to streamline the test strategy while ensuring robustness against third-party integrations?

Decision Drivers

1. Usability & Skill Transferability

Ease-of-use is the key criterion when we assess a tool for testing. The test strategy should prefer tools that require less effort and time to acquire proficiency. It should be easy to learn and work with. Ideally, any skills or knowledge acquired should be applicable across current contexts or for future scenarios.

2. Maturity & Expandability

The test strategy and tooling should be able to handle known third-party Merino dependencies in tests with a reasonable belief that it will cover future growth. Known third-party dependencies include: REST APIs, Remote Settings, GCP Cloud Storage, Redis, and Elasticsearch. Future dependencies include: relational DBMS such as PostgreSQL and other GCP cloud services such as Pub/Sub.

3. Cost Efficiency

The worker hours and tooling expenditures associated with the implementation and execution of the test strategy should ensure the profitability of Merino.

Considered Options

  • A. Yes. Expand the Scope of Integration Tests Using Dependency Docker Containers (Testcontainers)
  • B. Yes. Reduce the Dependency Overhead in Tests Using Development and Stage Environments
  • C. No. Fulfill the Current Test Strategy with Contract Test Coverage (Status quo)

Decision Outcome

Chosen option: A

Overall, we believe that increasing the scope of integration tests to verify third-party integrations with Testcontainers will be the most effective and sustainable way forward. Testcontainers' "Test dependencies as code" approach best fulfills the Usability & Skill Transferability and Maturity & Expandability decision drivers and long-term would prove to be the most Cost Efficient option.

We expect there to be initial labour costs to integrating Testcontainers, but anticipate that moving more verification responsibility to the integration test layer will be more accessible for developers and will reduce bugs found between Merino and third-party integrations.

Testcontainers is a widely adopted container-based test platform that supports a wide range of programming languages including Python and Rust, which are popular at Mozilla and there is indication that Testcontainers would have applicability across services in PXI. Given the success of Rapid Release and other experiments in Merino, it's a good candidate to use in Merino first as a pilot test. Should we find any issues or unsoundness about it, we can always revert the decision in the future.

Pros and Cons of the Options

A. Yes. Expand the Scope of Integration Tests Using Dependency Docker Containers (Testcontainers)

A preference for the unit and integration feature test layers in Merino has emerged over time. These test layers are white-box, which means developers can more easily set up program environments to test either happy paths or edge cases. In addition, tooling for debugging and measuring code coverage is readily available in these layers. Testcontainers can be used to increase the scope of integration tests, covering the backend layer logic and network communication with third-party integrations, the current test strategy's point of weakness.
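
As a rough illustration, an integration test backed by Testcontainers might look like the sketch below; it assumes the testcontainers and redis Python packages and uses made-up key names, so treat it as a shape rather than Merino's actual test code.

import pytest
import redis
from testcontainers.redis import RedisContainer

@pytest.fixture(scope="session")
def redis_client():
    # Start a throwaway Redis container for the test session; it is torn down
    # automatically when the context manager exits.
    with RedisContainer() as container:
        client = redis.Redis(
            host=container.get_container_host_ip(),
            port=int(container.get_exposed_port(6379)),
        )
        yield client

def test_cache_roundtrip(redis_client):
    # Exercise the real dependency instead of a test double.
    redis_client.set("suggestion:weather", "cached-value")
    assert redis_client.get("suggestion:weather") == b"cached-value"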

Pros

  • Testcontainers works with any Docker image and has numerous pre-built modules. Almost all the existing dependencies (or their close emulators) of Merino can be run as Docker containers. As a result, we can use real dependencies in Merino's tests as opposed to test doubles
  • Testcontainers allows developers to programmatically manage the lifecycle of containers in the test code. This simplifies its usage for developers, who will not need to run any Docker commands separately for testing
  • Testcontainers, which has recently been acquired by Docker, is fairly mature and supports many popular programming languages. There are also a large number of community maintained clients available for popular services such as PostgreSQL, Redis, Elasticsearch, etc.
  • Testcontainers is lightweight and sandboxed, meaning service resources aren't shared and are cleaned up automatically, promoting test isolation and parallelization
  • Docker-compose is also supported by Testcontainers, facilitating use of multiple dependency containers for more complex test cases
  • Testcontainers supports Python, Rust and Javascript languages and works well with their respective test frameworks PyTest, Cargo-Test and Jest

Cons

  • A Docker runtime is required to run all the tests that depend on Testcontainers. Docker is already set up in CI, but developers may need to install a Docker runtime locally
  • Integration tests cannot be run completely offline as Docker images need to be downloaded first
  • Developers will need to understand more about how to configure and work with dependency containers. The development community has many popular services out of the box, but developers would still need to know and do more than what's required when using test doubles
  • It could be challenging to provision test fixtures for the underlying containers. Feeding the fixture data into the containers could be complex
  • Developers need to ensure version consistency across the development, testing, and production environments in the integration test layer
  • For third-party API integrations, if the provider doesn't provide a Docker image for their API, Testcontainers alone will not help us much. It's possible to use fake API container generators, such as Wiremock, but it comes with its own complexities
  • Implementation of Testcontainers would require refactoring of integration tests, including the removal of mocks and fixtures

B. Yes. Reduce the Dependency Overhead in Tests Using Development and Stage Environments

Using Merino's staging environment and third-party development resources in tests has been considered. This would effectively cover the current test strategy's weakness with third-party integrations without the cost and complexity involved with setting up test doubles or dependency containers. However, this approach has a key challenge in how to share the stage environment across all the test consumers (devs & CI) as most of the services do not support multi-tenant usage and would require a significant amount of effort to support resource isolation.

Pros

  • Best matches the production environment
  • Doesn't need extra effort to create test doubles or dependencies for testing

Cons

  • Tests cannot be run offline since they would require a network connection to interact with development and stage environments
  • This option breaks the Testing Guidelines & Best Practices for Merino, which require tests to be isolated and repeatable. A dependency on shared network resources will almost certainly lead to test flake, reducing the confidence in the test suite
  • Test execution speeds would be negatively impacted, due to the lack of sandboxing, which enables parallel test runs

C. No. Fulfill the Current Test Strategy with Contract Test Coverage (Status quo)

The current test strategy, which relies on the contract tests to verify the interface between Merino and third-party dependencies, has not been fully implemented as designed. The missing coverage explains the current test strategy's weakness. Examples:

  1. DISCO-2032: Weather Contract Tests
  2. DISCO-2324: Add a merino-py contract test that interacts with a real Redis instance
  3. DISCO-2055: Dynamic Wikipedia Contract Tests

Pros

  • The most cost-effective solution, at least in the short term, since the test framework and Docker dependencies are set up and integrated into CI
  • The unit and integration feature test layers remain simple by using test doubles

Cons

  • The black-box nature of contract tests makes it harder to set up the environmental conditions required to enable testing edge cases
  • Adding dependency containers is complex, often requiring developers to have advanced knowledge of Docker and CI vendors (e.g. CircleCI)
  • There is a high level of redundancy between unit, integration, and contract tests that negatively impacts development pace

Open Questions

  • How to test 3rd party API integrations? We have two options for consideration: either use generic API mocking frameworks or keep the status quo and rely on other means (e.g. observability) to capture API breakages. They both have pros and cons and warrant a separate ADR to discuss in detail

Assure Endpoint Functionality and Load Test Suite Integrity with Default Load Tests

  • Status: Accepted
  • Deciders: Katrina Anderson & Nan Jiang
  • Date: 2024-11-04

Context and Problem Statement

Currently, load tests for the Merino service are executed on an opt-in basis, requiring contributors to use their judgement to execute load tests as part of a deployment to production. Contributors opt-in to load testing by including the [load test: (abort|warn)] substring in their commit message. The abort option prevents a production deployment if the load testing fails, while the warn option provides a warning via Slack and allows the deployment to continue in the event of failure.

This strategy has several drawbacks:

  • Load tests are run infrequently, making it difficult to establish performance trends or trace regressions to specific changes
  • Relying on contributors to decide when to run load tests has proven unreliable. Developers occasionally introduce changes that silently break the load testing suite, particularly when new dependencies are added (Example: DISCO-3026)
  • The SRE team currently lacks the capacity to implement a weekly load test build (Example: SVCSE-2236)
    • On a related note, due to the same capacity issues, the SRE team has indicated that a smoke test suite can't be integrated into the CD pipeline until Merino moves from GCP v1 to GCP v2, leaving a gap in coverage (Example DISCO-2861)

Given these drawbacks, is there a way to provide greater consistency and more reliable feedback on the performance of Merino's API endpoints and the health of its load test suite?

Decision Drivers

Resource Consumption
The solution should ensure API quotas with third-party providers, such as AccuWeather, are respected.

Load Test Break Detection
The solution should notify contributors when they introduce changes that break the load tests.

Performance Trending
The solution should enable the establishment of consistent and reliable performance trends for Merino-py endpoints, allowing contributors to quickly identify regressions.

Deployment Efficiency
The solution should minimize delays in the deployment process while ensuring that critical issues are flagged promptly.

Considered Options

  • A. Turn on [load test: warn] by default with opt-out option
  • B. Turn on [load test: abort] by default with opt-out option
  • C. Weekly manual execution of load tests
  • D. Status quo: Keep current strategy

Decision Outcome

Chosen option: A. Turn on [load test: warn] by default with opt-out option

Until a weekly load test run and smoke tests can be incorporated into the CD pipeline, the decision is to turn on [load test: warn] by default and add an opt-out option, [load test: skip]. This will provide much-needed insight into the performance and health of Merino’s API endpoints, while giving contributors early feedback on the integrity of the load test suite. Additionally, this approach will pave the way for the deprecation of Contract Tests, reducing overall test maintenance.

Note: The policy for documenting load test results in the Merino Load Test spreadsheet will remain unchanged. Contributors may decide when it's necessary to do so, for example when a load test fails.

Pros and Cons of the Options

A. Turn on [load test: warn] by default with opt-out option

This option would ensure that load tests run automatically during deployments, with failures generating warnings but not blocking the deployment. Contributors would have the ability to opt-out of load tests using a new option, [load test: skip].

Pros

  • Load tests would run more frequently, providing consistent feedback on Merino API endpoints and acting as a lightweight smoke test
  • Contributors would receive early warnings if their changes break the load test suite, allowing issues to be traced back to specific pull requests
  • The work required to implement this change is minimal and includes:
    • Modifying the smoke load test curve to minimize runtime and API resource consumption
    • Updating the CircleCI configuration
    • Updating documentation

Cons

  • This approach would increase deployment time by approximately 10 minutes and could worsen an existing issue where concurrent merges to the main branch do not queue as expected, resulting in simultaneous deployments that may invalidate load tests
  • If production deployments were to increase dramatically, there is potential to exceed 3rd party API quotas

B. Turn on [load test: abort] by default with opt-out option

This option would also ensure that load tests run automatically during deployments, but production deployments would be blocked if the load tests fail. Contributors would have the option to opt-out of load tests with a new option, [load test: skip].

Pros

Includes the Pros from Option A, plus:

  • Ensures that broken API endpoints are not deployed to users, maintaining the integrity of the service

Cons

Includes the Cons from Option A, plus:

  • Critical features and fixes may be delayed if the load tests themselves are broken, leading to unnecessary deployment blockages

C. Weekly manual execution of load tests

This option involves a member of the DISCO team manually triggering a load test on a weekly basis. The load test could be triggered via PR or manually via a bash script.

Pros

  • Regular load testing would allow the team to establish meaningful performance trends
  • Breaks in the load test suite would be detected within a reasonable timeframe, making them easier to trace

Cons

  • This approach does not address the coverage gap for API endpoint verification during deployment
  • It is time-consuming for the DISCO team, and depending on the trigger technique, it may be error-prone
    • For example, if a DISCO team member triggers the load test via bash script and forgets to tear down the GCP cluster after use, unnecessary costs will be incurred

D. Status quo: Keep current strategy

This option involves continuing with the current opt-in approach, where load tests are only run if contributors explicitly include them in their deployment process, until the SRE team can prioritize test strategy changes.

Pros

  • Requires no additional work or changes to the current setup.

Cons

  • Breakages in the load testing suite due to environmental, configuration, or dependency changes will continue to go undetected
  • The lack of regular load tests prevents contributors from gathering sufficient data to establish meaningful performance trends

Asynchronous Python Google Cloud Storage Client

  • Status: Accepted
  • Deciders: Nan Jiang, Herraj Luhano
  • Date: 2025-02-04

Context and Problem Statement

The Merino application has expanded to include the /manifest endpoint that interacts with a Google Cloud Storage (GCS) bucket. Currently, Merino relies on the official Google Cloud Python client (google-cloud-storage) for interacting with GCS for weekly job runs, but this client is synchronous.

Since the /manifest endpoint handles requests in an asynchronous web environment, using a synchronous client would block the main thread, leading to performance issues. To work around this, we currently offload GCS operations to a thread pool dedicated to running synchronous workloads, but this adds unnecessary complexity.

To simplify the implementation and fully leverage asynchronous capabilities, we are considering adopting talkiq/gcloud-aio-storage, a community-supported asynchronous Python client for Google Cloud Storage. This would allow us to perform GCS operations without blocking the main thread, leading to cleaner and more efficient code.
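
To make the trade-off concrete, here is a hedged sketch of both approaches; the function names are assumptions, and the exact client construction and cleanup details for gcloud-aio-storage may differ from this simplified form.

import asyncio

from gcloud.aio.storage import Storage
from google.cloud import storage

def _read_manifest_blob(bucket_name: str, blob_name: str) -> bytes:
    # Official synchronous client: this call blocks until the download completes.
    client = storage.Client()
    return client.bucket(bucket_name).blob(blob_name).download_as_bytes()

async def get_manifest_sync_client(bucket_name: str, blob_name: str) -> bytes:
    # Current workaround: offload the blocking call to a worker thread so the
    # event loop stays responsive, at the cost of extra complexity.
    return await asyncio.to_thread(_read_manifest_blob, bucket_name, blob_name)

async def get_manifest_async_client(bucket_name: str, blob_name: str) -> bytes:
    # Proposed approach: the community async client awaits the download directly,
    # keeping the /manifest handler non-blocking without a thread hop.
    client = Storage()
    try:
        return await client.download(bucket_name, blob_name)
    finally:
        await client.close()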

Decision Drivers

  1. Deteriorated performance due to /manifest requests blocking the main thread.
  2. Additional complexity due to implementing custom logic for background tasks.

Considered Options

  • A. gcloud-aio-storage.
  • B. google-cloud-storage (Existing official synchronous Python client).

Decision Outcome

Chosen option:

A. gcloud-aio-storage

gcloud-aio-storage appears to be the most widely used community-supported async client for Google Cloud Storage. It has fairly decent documentation, is easy to set up and use, and aligns well with Merino’s asynchronous architecture. Adopting it will simplify integration while ensuring non-blocking GCS interactions in the /manifest endpoint.

Positive Consequences

  • Seamless integration with existing implementation and logic. As an async client, it comes with native async APIs to GCS, which substantially simplifies the usage of GCS in Merino. Particularly, no more offloading synchronous calls over to the thread pool.
  • Easy authentication -- No extra steps needed for authentication. Uses the same logic as the existing sync client.
  • Provides other async clients as well -- The gcloud-aio library has modules for other Google Cloud entities such as BigQuery, Pub/Sub, etc., which will be useful in the future.

Negative Consequences

  • The SDK API is slightly different from the official one -- when it comes to wrapper classes and return types. Although it supports the basic wrapper classes for entities such as Blob and Bucket, some of the types are more raw / basic. This could be seen as allowing for implementation flexibility; however, it does introduce some verbosity.
  • Not officially supported by Google -- Relying on community contributors for support and updates. Will have to migrate to the official async one if/when Google releases one.
  • Two GCS clients -- Merino will use both the async client for the web app mode, and the official sync client for Merino jobs, which might cause confusion.

Pros and Cons of the Synchronous Client

Pros

  • Officially Supported by Google – Maintained and supported by Google, ensuring long-term reliability, security updates, and compatibility with GCS features.

  • Official Documentation & Large User Base – Extensive official documentation and a large user base, making it easier to find solutions to issues.

  • Consistent with Existing Usage – Already used in Merino’s jobs component, reducing the need to maintain multiple clients for the same service.

  • No Additional Dependencies – Avoids adding a third-party dependency, reducing potential maintenance overhead.

Cons

  • Blocks the Main Thread – The client is synchronous, which can lead to performance issues in Merino’s /manifest endpoint by blocking request handling.

  • Workarounds Add Complexity – Using background tasks to offload GCS operations introduces unnecessary complexity and potential race conditions.

  • Inconsistent with Merino’s Async Architecture – Merino is built to be asynchronous, and using a sync client requires special handling, breaking architectural consistency.

  • Potential Scalability Issues – Blocking I/O operations can slow down request processing under high load, reducing overall efficiency.

  • Misses Out on Async Benefits – Async clients improve responsiveness and throughput by allowing other tasks to execute while waiting for network responses.

Suggest New Provider Integrations Endpoint Split

  • Status: Rejected
  • Deciders: Herraj Luhano, Nan Jiang, Drew Willcoxon, Chris Bellini, Temisan Iwere, Bastian Gruber
  • Date: 2025-06-08

Context and Problem Statement

We want to review the current implementation of the suggest endpoint and determine whether to continue expanding it or to introduce new endpoints to support the upcoming new suggestion providers that are "non-standard" providers.

Standard suggest providers return results based on full or partial search queries—essentially suggesting content as the user types. Examples include adm, amo, top_picks, and wikipedia.

Non-standard suggest providers return a specific result triggered by an exact keyword match. For instance, accuweather provides weather details when the user enters a city name followed by the keyword weather.

Current Implementation

Currently, all suggest providers—including third-party ones (Accuweather)—use the /suggest/ endpoint. Here's a high-level overview of the request flow for a weather suggestion:

  1. A request hits the /suggest endpoint.
  2. The following query parameters are accepted:
    • request
    • q
    • country
    • region
    • city
    • providers
    • client_variants
    • sources
    • request_type

The suggest() method processes the request as follows (a simplified sketch appears after this list):

  1. Extracting metrics_client and user_agent from middleware.
  2. Removing duplicate provider names (if passed via the providers query param).
  3. Creating local variables such as languages and geolocation to be passed to the providers' query() methods.
  4. Looping through each provider to:
    • Construct a SuggestionRequest object from the query params.
    • Call the provider’s validate() method.
    • Call the provider’s query() method (which does all the actual processing).
    • Add each successful async task to a list.
  5. Performing additional logic and emitting per-request and per-suggestion metrics.
  6. Building a SuggestResponse object with a list of suggestions and other metadata.
  7. Adding TTL and other headers to the final response.
  8. Returning an ORJSONResponse.
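
The sketch below condenses this flow into a heavily simplified, self-contained form; all class and function names are stand-ins, and Merino's actual implementation differs in detail (middleware, metrics, headers, and ORJSONResponse are omitted).

import asyncio
from dataclasses import dataclass, field

@dataclass
class SuggestionRequest:
    query: str
    geolocation: dict
    languages: list[str]

@dataclass
class SuggestResponse:
    suggestions: list[dict]
    client_variants: list[str] = field(default_factory=list)

class EchoProvider:
    """Toy provider that validates the request and returns a single suggestion."""

    def validate(self, request: SuggestionRequest) -> None:
        if not request.query:
            raise ValueError("empty query")

    async def query(self, request: SuggestionRequest) -> list[dict]:
        return [{"title": request.query, "provider": "echo"}]

async def suggest(q: str, providers: list[EchoProvider]) -> SuggestResponse:
    request = SuggestionRequest(query=q, geolocation={}, languages=["en-US"])
    tasks = []
    for provider in providers:
        provider.validate(request)             # per-provider validation
        tasks.append(provider.query(request))  # kick off the async queries
    results = await asyncio.gather(*tasks)     # await all provider tasks
    suggestions = [s for result in results for s in result]
    return SuggestResponse(suggestions=suggestions)

print(asyncio.run(suggest("weather seattle", [EchoProvider()])))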

Limitations of the Current Implementation

This implementation highlights how the /suggest endpoint is built to support a flexible, provider-agnostic flow. However, it comes with significant overhead—shared parsing logic, dynamic provider resolution, and assumptions like multi-suggestion responses—that don't align well with the needs of upcoming providers. The problem statement asks whether we should continue extending this shared machinery or introduce new, purpose-built endpoints. Understanding the complexity and coupling in the current flow helps clarify why a new endpoint may offer a cleaner, more maintainable path forward for future provider integrations. See the Accuweather provider example below.

Accuweather Provider

The Accuweather provider currently uses this same endpoint to serve both weather suggestions and widget data. However, it's tightly coupled to all the suggest-related types, protocols, and abstractions. This coupling became especially apparent when implementing custom TTL logic for weather responses, which had to be awkwardly threaded through unrelated suggest components.

Moreover, the SuggestResponse type requires a suggestions list. But for weather—and likely for many new providers—we only return a single suggestion per request.

Future Considerations

Now that we’re planning to add 5+ new providers for the Firefox search and suggest feature, we should reconsider whether this shared approach is still appropriate. These new providers will each have their own query parameters, request/response shapes, and logic for upstream API calls and formatting.

The only requirement is that the final API response must conform to the SuggestResponse format expected by Firefox.

Decision Drivers

  1. Separation of entities and mental model
  2. Addressing the growing complex custom logic
  3. Ergonomics for the client-side integration

Considered Options

  • A. Continue using the existing /suggest endpoint and extend it to support new providers.
  • B. Create a separate endpoint for each provider, each with its own request/response handling logic.
  • C. Create a single new endpoint for all non-standard providers (i.e., those that don’t follow the typical suggest flow or response shape).

Pros & Cons of Each Option

Option A

Pros

  1. Consistent client interface -- No need to change frontend code or contracts; clients already know how to use /suggest.
  2. Shared logic and infrastructure -- Leverages existing abstractions like middleware, metrics, and response formatting.

Cons

  1. Overgeneralized interface -- Forces all providers to conform to a common structure, even when their needs (params, shape, TTL) are different.
  2. Hard to scale and maintain -- Adding each new provider increases complexity and coupling, making the suggest logic harder to reason about.

Option B

Pros

  1. Full flexibility per provider -- Each provider can define its own request/response model and internal flow, with no need to conform to shared logic.
  2. Clear separation of concerns -- Isolates logic and failures per provider, making debugging and ownership more straightforward.

Cons

  1. Client complexity -- The frontend would need to know which endpoint to call per provider, increasing client-side branching or routing logic.
  2. Maintenance overhead -- More endpoints to monitor, document, test, and version over time.

Option C

Pros

  1. Clean separation from the legacy /suggest logic -- Avoids polluting the current flow with special cases while still avoiding endpoint proliferation.
  2. Balance of structure and flexibility -- A shared endpoint can still dispatch to internal handlers, allowing each provider to have tailored logic behind a unified interface.

Cons

  1. Yet another endpoint to manage -- Slight increase in complexity at the infra/API gateway level.
  2. Internal dispatching still requires careful design -- You still need to decide how to route requests internally (e.g., by provider param) and validate inputs correctly without repeating /suggest-style logic.

Case for Option C

1. Encapsulation of Divergent Logic

The new providers will likely have custom logic around query parameters, upstream requests, and response formatting. Trying to shoehorn this into the existing /suggest flow would introduce complexity and conditionals that hurt maintainability.

A new endpoint provides a clean separation between "standard" suggest logic and custom workflows.

2. Avoids Tight Coupling

The existing implementation is tightly coupled to SuggestResponse, middleware-derived state, and other shared abstractions.

Decoupling non-standard providers from that machinery avoids repeating the friction you experienced with Accuweather (e.g., threading TTL logic and handling one-item responses in a list-based structure).

3. Simplifies Onboarding of Future Providers

With a flexible endpoint, you can tailor the request/response contract to match each provider's needs while maintaining a consistent response format for Firefox.

This reduces the amount of edge-case handling required and lowers the cognitive load for developers onboarding new providers.

4. Maintains Backward Compatibility

Keeping /suggest intact for legacy or conforming providers avoids breaking existing consumers.

You can gradually migrate providers to the new endpoint as needed.

Decision Outcome

Chosen option:

  • A. Option A -- Continue using the existing /suggest endpoint and extend it to support new providers.

Based on the discussion and feedback from the DISCO and Search & Suggest team engineers, we'll proceed with the current implementation using the existing /suggest endpoint for the new provider integrations. Since there's no pressing need to introduce a new endpoint and this approach aligns better with the client’s expectations, it makes sense to avoid unnecessary complexity for now. Down the line, we can revisit the endpoint design if needed and have a broader conversation around evolving the request/response structure to better support both legacy and new providers.

Load GCS Data via SyncedGcsBlob in Merino

  • Status: Accepted
  • Deciders: Bastian Gruber, Herraj Luhano, Mathijs Miermans, Nan Jiang
  • Date: 2025-07-24

Context and Problem Statement

The following Merino providers load data from Google Cloud Storage (GCS):

  • TopPicks - reloads suggestion data every 12 hours
  • Manifest - reloads domain metadata (icons, titles, categories) every hour
  • Curated Recommendations - uses SyncedGcsBlob to check every minute and load data only when the blob is updated

TopPicks and Manifest are loaded periodically without checking whether the data was updated. This forces a long reload interval, leading to stale data, unnecessary resource use, and duplicated maintenance effort.

The existing SyncedGcsBlob class, used by the curated recommendations endpoint, periodically checks the updated timestamp of GCS blobs and reloads data only when blobs are updated, calling a callback function to parse and cache the data. However, SyncedGcsBlob currently uses Google's synchronous GCS client (google-cloud-storage), offloading synchronous calls to a thread pool to avoid blocking the main event loop (as previously addressed in ADR 0005).
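
The core of that check-then-reload pattern can be sketched as follows; the class below is a hedged illustration with made-up names and is not Merino's actual SyncedGcsBlob implementation.

import asyncio
from datetime import datetime
from typing import Awaitable, Callable, Optional

class BlobWatcher:
    """Illustrative loop: download the blob only when its metadata says it changed."""

    def __init__(
        self,
        fetch_updated: Callable[[], Awaitable[datetime]],  # reads the blob's "updated" metadata
        fetch_content: Callable[[], Awaitable[str]],        # downloads the full blob contents
        on_update: Callable[[str], None],                   # callback that parses and caches the data
        interval_seconds: int = 60,
    ) -> None:
        self._fetch_updated = fetch_updated
        self._fetch_content = fetch_content
        self._on_update = on_update
        self._interval = interval_seconds
        self._last_updated: Optional[datetime] = None

    async def run(self) -> None:
        while True:
            updated = await self._fetch_updated()
            if self._last_updated is None or updated > self._last_updated:
                data = await self._fetch_content()  # full download only on change
                self._on_update(data)
                self._last_updated = updated
            await asyncio.sleep(self._interval)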

Should we standardize on SyncedGcsBlob for all providers, and further, should we enhance it to use the asynchronous gcloud-aio-storage client to simplify our implementation?

Decision Drivers

  1. Minimize latency from GCS updates to Merino data availability.
  2. Minimize performance overhead by using frequent metadata checks (not full blob downloads).
  3. Simplify maintenance with a consistent solution.
  4. Continue to avoid blocking Merino's async event loop (ADR 0005).

Considered Options

| Option | Summary | Pros | Cons |
| --- | --- | --- | --- |
| A | SyncedGcsBlob + gcloud-aio-storage (preferred): adopt the async GCS client within SyncedGcsBlob | Low latency, minimal memory usage, non-blocking I/O, consistent implementation | Requires refactor |
| B | SyncedGcsBlob with the current sync GCS client (status quo) | Low latency, existing solution | Memory overhead from threads |
| C | Provider-specific loaders: keep per-provider loading logic | No immediate refactor, flexible per provider | Duplicate logic, higher maintenance |
| D | GCS → Pub/Sub push: event-driven notifications | Instant updates | Increased complexity, more infrastructure |

  • Option A: Adopt SyncedGcsBlob enhanced with gcloud-aio-storage.

This approach efficiently reduces latency, simplifies asynchronous handling by removing the thread-pool complexity in the current implementation of SyncedGcsBlob, and maintains the decision outcome of ADR-0005. By integrating gcloud-aio-storage into SyncedGcsBlob, it will natively use asynchronous I/O, and stop offloading synchronous calls to threads.

Positive Consequences

  • Minimal latency: Rapid propagation of GCS updates.
  • Low memory overhead: Leveraging async avoids creating a separate thread for each job.
  • Low network overhead: Frequent metadata checks instead of full downloads.
  • Maintenance ease: Standard implementation across providers.

Negative Consequences

  • Initial refactor: Existing providers require updating.

Implementation Impact

Adopting this approach will require the following changes:

  • SyncedGcsBlob - Refactor to use the async gcloud-aio-storage client instead of the synchronous client
  • TopPicks - Replace periodic reload with SyncedGcsBlob to check for updates
  • Manifest - Replace periodic reload with SyncedGcsBlob to check for updates

Curated Recommendations already uses SyncedGcsBlob and will benefit from the async refactor without requiring significant changes.

Usage Example

Here's how providers will initialize and use the enhanced asynchronous SyncedGcsBlob:

async def initialize_provider_backend() -> ProviderBackend:
    try:
        storage_client = AsyncStorageClient(...)  # from gcloud-aio-storage
        synced_blob = SyncedGcsBlob(
            storage_client=storage_client,
            metrics_client=get_metrics_client(),
            metrics_namespace="provider.data",
            bucket_name=settings.provider.gcs.bucket_name,
            blob_name=settings.provider.gcs.data.blob_name,
            max_size=settings.provider.gcs.data.max_size,
            cron_interval_seconds=settings.provider.gcs.data.cron_interval_seconds,  # How often 'updated' timestamp is checked
            cron_job_name="fetch_provider_data",
        )
        synced_blob.set_fetch_callback(parse_and_cache_provider_data)
        await synced_blob.initialize()  # starts async background task
        return GcsProviderBackend(synced_gcs_blob=synced_blob)
    except Exception as e:
        logger.error(f"Failed to initialize provider backend: {e}")
        return DefaultProviderBackend()

def parse_and_cache_provider_data(data: str) -> None:
    provider_data = json.loads(data)
    cache_provider_data(provider_data)

The callback (parse_and_cache_provider_data) is called automatically by SyncedGcsBlob whenever the blob is updated. The callback implementation will vary; typically it decodes JSON, sometimes converts it to Pydantic models, and caches the result in memory.

Extend Merino via Rust Extensions

  • Status: Proposed
  • Deciders: All Merino engineers
  • Date: 2025-07-28

Context and Problem Statement

As Merino continues to expand with an ever-growing user base, performance hot-spots and resource-intensive code paths have emerged in the code base, which pose new challenges for service scalability and drive up operational costs.

As a common solution, Python extensions can be developed for performance-critical modules, but they also bring their own challenges, such as familiarity with a low-level language (e.g. C/C++ or Rust), additional tooling, and potential issues such as memory safety.

In this ADR, we explore various options for developing Python extensions for Merino and aim to identify a reasonable approach for extending Merino to meet the performance needs while maintaining the overall developer experience that we equally value for the project.

Note that:

Instead of re-writing performance critical parts as language-level extensions, we could also carve certain functionalities out and tackle them separately outside of Merino. For instance, a new service can be added to handle a computationally intensive task for Merino. Or a dedicated external storage system can be used to replace an in-memory dataset in Merino.

That approach is out of scope for this ADR as it normally requires a wider discussion on service architecture or system design changes. This ADR only focuses on extensions on the language level.

Decision Drivers

  1. The ability to meet the desirable performance requirements and to get fine-grained control over compute resources.
  2. Developer experience. Developing Merino extensions should not have negative impact on the overall developer experience of Merino.
  3. System safety. Performance boost should not be achieved at the cost of system safety regressions.

Considered Options

  • A. Extend Merino via Rust extensions through PyO3/Maturin ecosystem.
  • B. Extend Merino via C/C++ extensions.
  • C. Maintain status quo – build Merino in pure Python.

Decision Outcome

Chosen option:

  • A. "Extend Merino via Rust extensions through PyO3/Maturin ecosystem".

Positive Consequences

  • Rust is a systems programming language suitable for developing performance-critical software. PyO3/Maturin is a mature ecosystem for building Python extensions.
  • Rust has been widely adopted at Mozilla for both server- and client-side development. Its learning curve for Python extension development is gentler than that of counterparts such as C/C++.
  • Rust's strong memory safety guarantees are superior to those of other competitors.

Negative Consequences

  • Rust would be a requirement for Merino's extension development, which comes with its own learning curve.
  • Requires familiarity with PyO3/Maturin.

Mitigations

  • To minimize disruption, Merino extensions will be developed as a separate Python package using a dedicated git repo. The extensions will be added to Merino as a PyPI dependency. For Merino developers who do not work on extensions, their development experience will remain unchanged.
  • While basic familiarity with PyO3/Maturin is still required for Merino extension developers, common development actions can be provided via uv and Makefile tasks. Package building and publishing will be automated in CI.
  • The DISCO team will host "Office Hours" regularly to help Merino developers with questions about Rust & extension development.

Pros and Cons of the Options

Option A: Extend Merino via Rust Extensions through PyO3/Maturin

This approach allows Merino developers to identify the performance critical code in Merino and re-implement it as Python extensions in Rust via PyO3 to boost system performance or resolve bottlenecks.

Pros

  • Using Python extensions is a common way to achieve higher performance and lower resource footprint in Python.
  • Rust has gained popularity for building Python extensions lately. Many popular Python extensions, including ones used by Merino such as pydantic and orjson, are built in Rust via PyO3.
  • Building Python extensions normally requires manual management of compute resources using a low-level language, hence extensions are prone to memory safety bugs. Rust is superior to its competitors with respect to avoiding memory safety issues, as it is a memory-safe language.

Cons

  • Rust has a steep learning curve.
  • Introducing Rust and PyO3 into the Merino project would bring many disruptive changes to Merino's build, test, and release processes, which could negatively affect Merino's developer experience, especially for folks who do not work on extensions.

Option B: Extend Merino via C/C++ Extensions

C/C++ and Cython are the most popular languages for developing Python extensions. While this is the most mature solution, it requires the use of C/C++, which is even more alien than Rust to most Merino developers.

Pros

  • The most mature ecosystem as it's the standard way to build Python extensions.
  • Best performance.

Cons

  • C/C++ has a learning curve as steep as Rust's.
  • C/C++ is memory unsafe and more likely to introduce safety issues to Merino.

Option C: Maintain Status Quo – Build Merino in pure Python

We could continue to build everything in Python for Merino. For performance critical code, we could either optimize it in pure Python or resort to third party packages if any.

Pros

  • Python is all we need for Merino development.
  • No changes required for build, package, and release processes.

Cons

  • Could be difficult to optimize things if bare-metal or fine-grained resource control is needed.
  • Off-the-shelf solutions are not always available, especially for business-logic code paths.