Data collection

This page should list all metrics and logs that Merino is expected to emit in production, including what should be done about them, if anything.

Logs

This list does not include any DEBUG-level events, since those are not logged by default in production. The level and type of each log are listed.

Any log containing sensitive data must include a boolean field, sensitive, set to true. This prevents the log from flowing to the generally accessible log inspection interfaces.
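A minimal sketch of how the sensitive flag can be attached to a record and honored by a sink. It assumes Python's standard logging module, where keys passed through extra become attributes on the LogRecord. The ExcludeSensitive filter and the in-memory CollectingHandler are hypothetical stand-ins for whatever routing keeps sensitive records out of the general log interfaces. Merino's real logging configuration may differ.

```python
import logging
from typing import List


class ExcludeSensitive(logging.Filter):
    """Drop records flagged as sensitive (hypothetical routing rule)."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Keys passed via `extra` become attributes on the LogRecord.
        return not getattr(record, "sensitive", False)


class CollectingHandler(logging.Handler):
    """Stand-in for a generally accessible log sink; keeps messages in memory."""

    def __init__(self) -> None:
        super().__init__()
        self.messages: List[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append(record.getMessage())


general_sink = CollectingHandler()
general_sink.addFilter(ExcludeSensitive())

logger = logging.getLogger("web.suggest.request")
logger.setLevel(logging.INFO)  # DEBUG events are not emitted at this level
logger.addHandler(general_sink)
logger.propagate = False

logger.info("suggest request", extra={"sensitive": True, "query": "weather"})
logger.info("health check")
logger.debug("not emitted")
print(general_sink.messages)  # ['health check']
```

The sensitive record and the DEBUG record never reach the general sink. A separate, restricted handler without the filter would still receive the sensitive record.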

Merino APIs

  • INFO web.suggest.request - A suggestion request is being processed. This event will include fields for all relevant details of the request. Fields:

    • sensitive - Always set to true to ensure proper routing.
    • query - If query logging is enabled, the text the user typed. Otherwise an empty string.
    • country - The country the request came from.
    • region - The first country subdivision the request came from.
    • city - The city the request came from.
    • dma - A US-only location description (Designated Market Area) that is larger than a city and smaller than a state, but does not align to political borders.
    • agent - The original user agent.
    • os_family - Parsed from the user agent. One of "windows", "macos", "linux", "ios", "android", "chrome os", "blackberry", or "other".
    • form_factor - Parsed from the user agent. One of "desktop", "phone", "tablet", or "other".
    • browser - The browser and possibly version detected. Either "Firefox(XX)" where XX is the version, or "Other".
    • rid - The request ID.
    • WIP accepts_english - True if the user's Accept-Language header includes an English locale, false otherwise.
    • requested_providers - A comma-separated list of providers requested via the query string, or an empty string if none were requested (in which case the default providers are used).
    • client_variants - Any client variants sent to Merino in the query string.
    • session_id - A UUID generated by the client for each search session.
    • sequence_no - A client-side event counter (0-based) that records the query sequence within each search session.
  • INFO request.summary - The application request summary that follows the MozLog convention. This log is recorded for all incoming HTTP requests except for the suggest API endpoint.

  • ERROR dockerflow.error_endpoint - The __error__ endpoint of the server was called. This is used to test our error reporting system. It is not a cause for concern unless we receive a large number of these records, in which case some outside service is likely malicious or misconfigured.
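To make the web.suggest.request fields concrete, here is a sketch that assembles a subset of them from request data. The helper names (has_english_locale, build_suggest_log) and the SuggestRequestLog class are hypothetical, and the accepts_english check is a simplified reading of "includes an English locale". It shows three documented rules: sensitive is always true, query is blanked unless query logging is enabled, and requested_providers is an empty string when no providers were requested.

```python
from dataclasses import asdict, dataclass
from typing import List, Optional


def has_english_locale(accept_language: str) -> bool:
    """Return True if any language range in the header has the primary subtag "en".

    Simplification: quality values are ignored, so "en;q=0" still counts.
    """
    for item in accept_language.split(","):
        # Strip parameters such as ";q=0.8" and surrounding whitespace.
        tag = item.split(";")[0].strip().lower()
        if tag.split("-")[0] == "en":
            return True
    return False


@dataclass
class SuggestRequestLog:
    """A subset of the web.suggest.request fields (illustrative, not exhaustive)."""

    sensitive: bool
    query: str
    country: str
    accepts_english: bool
    requested_providers: str
    session_id: Optional[str]
    sequence_no: Optional[int]


def build_suggest_log(
    query: str,
    country: str,
    accept_language: str,
    providers: List[str],
    session_id: Optional[str],
    sequence_no: Optional[int],
    log_queries: bool,
) -> SuggestRequestLog:
    return SuggestRequestLog(
        sensitive=True,  # always true so the record is routed away from general interfaces
        query=query if log_queries else "",  # empty unless query logging is enabled
        country=country,
        accepts_english=has_english_locale(accept_language),
        requested_providers=",".join(providers),  # "" when no providers were requested
        session_id=session_id,
        sequence_no=sequence_no,
    )


record = build_suggest_log(
    query="weather in berlin",
    country="DE",
    accept_language="de-DE,de;q=0.9,en-US;q=0.8",
    providers=["adm", "wikipedia"],
    session_id="3f6c9d2e-8a41-4b7e-9c1d-2f5a7e0b4c13",
    sequence_no=0,
    log_queries=False,
)
print(asdict(record))
```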

Merino Middleware Logs

Geolocation

  • WARNING merino.middleware.geolocation - There was an error with a geolocation lookup.

Merino Cron Tasks

  • WARNING merino.cron - There was an error while executing a cron task.

Merino Feature Flags

  • ERROR merino.featureflags - There was an error while attempting to assign a feature flag for a suggest API request.

Curated Recommendations

  • ERROR merino.curated_recommendations.corpus_backends.corpus_api_backend - Failed to get timezone for scheduled surface.
  • WARNING merino.curated_recommendations.corpus_backends.corpus_api_backend - Retrying CorpusApiBackend after an http client exception was raised.
  • ERROR GcsEngagement failed to update cache: {e} - An unexpected exception was raised while updating the engagement data.
  • ERROR Curated recommendations engagement size {blob.size} > {self.max_size} - The maximum engagement blob size was exceeded. The backend gracefully falls back to cached data or zeros.
  • INFO Curated recommendations engagement unchanged since {self.last_updated}. - The engagement blob has not been updated since the last check. last_updated is expected to be between 0 and 30 minutes in the past.
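The two engagement log messages above describe a guard that runs before new engagement data is loaded: skip blobs that are too large, and skip blobs that have not changed. The sketch below shows that guard under stated assumptions. BlobMeta is a hypothetical stand-in for the GCS blob metadata, EngagementRefresher and its methods are hypothetical names, and the logger name is illustrative. Only the message formats are taken from the list above.

```python
import logging
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

logger = logging.getLogger("merino.curated_recommendations.engagement_sketch")


@dataclass
class BlobMeta:
    """Hypothetical stand-in for the GCS blob metadata the backend inspects."""

    size: int  # assumed to be bytes
    updated: datetime


class EngagementRefresher:
    """Decides whether a newly listed engagement blob should be loaded."""

    def __init__(self, max_size: int) -> None:
        self.max_size = max_size
        self.last_updated: Optional[datetime] = None

    def should_load(self, blob: BlobMeta) -> bool:
        if blob.size > self.max_size:
            # Too large: keep serving cached data (or zeros if nothing is cached).
            logger.error(f"Curated recommendations engagement size {blob.size} > {self.max_size}")
            return False
        if self.last_updated is not None and blob.updated <= self.last_updated:
            logger.info(f"Curated recommendations engagement unchanged since {self.last_updated}.")
            return False
        return True

    def mark_loaded(self, blob: BlobMeta) -> None:
        # Remember the update time of the data currently being served.
        self.last_updated = blob.updated


refresher = EngagementRefresher(max_size=1_000_000)
blob = BlobMeta(size=2048, updated=datetime(2024, 5, 1, 12, 0, tzinfo=timezone.utc))
print(refresher.should_load(blob))  # True
refresher.mark_loaded(blob)
print(refresher.should_load(blob))  # False: unchanged since the last load
too_big = BlobMeta(size=5_000_000, updated=datetime(2024, 5, 1, 12, 30, tzinfo=timezone.utc))
print(refresher.should_load(too_big))  # False: exceeds max_size
```

Keeping the decision (should_load) separate from the state change (mark_loaded) means a failed download does not advance last_updated.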

Metrics

A note on timers: StatsD timers are measured in milliseconds and are reported as integers (at least in Cadence). Milliseconds are often not precise enough for the tasks we want to measure in Merino, so we use generic histograms to record times in microseconds instead. Metrics recorded this way should have -us appended to their names to mark the unit, since the proper unit symbol μs should not be used in metric names.
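A sketch of the microsecond-histogram convention: time a block with a nanosecond clock, convert the result to whole microseconds, and record it under a name ending in -us. The record callable is a hypothetical hook standing in for a StatsD client's histogram method, and the metric name merino.example.lookup-us is made up for illustration.

```python
import time
from contextlib import contextmanager
from typing import Callable, Iterator, List, Tuple


@contextmanager
def histogram_timer_us(name: str, record: Callable[[str, int], None]) -> Iterator[None]:
    """Time the enclosed block and pass the duration, in whole microseconds, to `record`.

    `record` stands in for a StatsD client's histogram call (hypothetical hook).
    """
    if not name.endswith("-us"):
        raise ValueError(f"microsecond metric {name!r} should end with -us")
    start = time.perf_counter_ns()
    try:
        yield
    finally:
        # Integer division turns nanoseconds into whole microseconds.
        record(name, (time.perf_counter_ns() - start) // 1_000)


recorded: List[Tuple[str, int]] = []
with histogram_timer_us("merino.example.lookup-us", lambda n, v: recorded.append((n, v))):
    time.sleep(0.002)
print(recorded)
```

A 2 ms sleep is recorded as roughly 2000, a value that an integer millisecond timer would have reduced to 2.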

  • merino.providers.initialize - A timer to measure the overall initialization duration (in ms) for all providers.

  • merino.providers.initialize.<provider> - A timer to measure the initialization duration (in ms) for the given <provider>.

    Example: merino.providers.initialize.adm

  • merino.<http_method>.<url_path>.status_codes.<status_code> - A counter to measure the status codes of an HTTP method for the <url_path>.

    Example: merino.get.api.v1.suggest.status_codes.200

  • merino.<http_method>.<url_path>.timing - A timer to measure the duration (in ms) of an HTTP method for a URL path.

    Example: merino.get.api.v1.suggest.timing

  • merino.<provider_module>.query - A timer to measure the query duration (in ms) of a certain suggestion provider.

    Example: merino.providers.adm.query

  • merino.<provider_module>.query.timeout - A counter to measure the query timeouts of a certain suggestion provider.

    Example: merino.providers.wikipedia.query.timeout

  • merino.suggestions-per.request - A histogram metric to get the distribution of suggestions per request.

  • merino.suggestions-per.provider.<provider_module> - A histogram metric to get the distribution of suggestions returned per provider (per request).

    Example: merino.suggestions-per.provider.wikipedia
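The templated names above can be produced by small helpers. In this sketch the rules (lowercase HTTP method, URL path segments joined with dots, dotted provider module name) are inferred from the documented examples and are assumptions. Merino may derive names differently, for example for paths with parameters. All function names are hypothetical.

```python
from typing import List


def http_metric_prefix(method: str, path: str) -> str:
    """Build merino.<http_method>.<url_path> with a lowercased method and dotted path."""
    segments: List[str] = [s for s in path.strip("/").split("/") if s]
    return ".".join(["merino", method.lower(), *segments])


def status_code_metric(method: str, path: str, status: int) -> str:
    return f"{http_metric_prefix(method, path)}.status_codes.{status}"


def timing_metric(method: str, path: str) -> str:
    return f"{http_metric_prefix(method, path)}.timing"


def provider_query_metric(provider_module: str, timed_out: bool = False) -> str:
    # provider_module is the dotted module name, e.g. "providers.adm".
    name = f"merino.{provider_module}.query"
    return f"{name}.timeout" if timed_out else name


print(status_code_metric("GET", "/api/v1/suggest", 200))  # merino.get.api.v1.suggest.status_codes.200
print(timing_metric("GET", "/api/v1/suggest"))  # merino.get.api.v1.suggest.timing
print(provider_query_metric("providers.wikipedia", timed_out=True))  # merino.providers.wikipedia.query.timeout
```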

AccuWeather

The weather provider records additional metrics.

  • accuweather.upstream.request.<request_type>.get - A counter to measure the number of times an upstream request to AccuWeather was made.
  • accuweather.request.location.not_provided - A counter to measure the number of times a query was sent without a location, so the weather request could not be processed. Sampled at 75%.
  • merino.providers.accuweather.query.cache.fetch - A timer to measure the duration (in ms) of looking up a weather report in the cache. Sampled at 75%.
  • merino.providers.accuweather.query.cache.fetch.miss.locations - A counter to measure the number of times a weather location was not in the cache. Sampled at 75%.
  • merino.providers.accuweather.query.cache.fetch.miss.currentconditions - A counter to measure the number of times the current conditions for a location were not in the cache. Sampled at 75%.
  • merino.providers.accuweather.query.cache.fetch.miss.forecasts - A counter to measure the number of times a forecast for a location was not in the cache. Sampled at 75%.
  • merino.providers.accuweather.query.cache.fetch.hit.{locations | currentconditions | forecasts} - A counter to measure the number of times a requested value like a location or forecast is in the cache. We don't count TTL hits explicitly, just misses. Sampled at 75%.
  • merino.providers.accuweather.query.backend.get - A timer to measure the duration (in ms) of a request for a weather report from the backend. This metric isn't recorded for cache hits. Sampled at 75%.
  • merino.providers.accuweather.query.cache.store - A timer to measure the duration (in ms) of saving a weather report from the backend to the cache. This metric isn't recorded for cache hits. Sampled at 75%.
  • merino.providers.accuweather.query.cache.error - A counter to measure the number of times the cache store returned an error when fetching or storing a weather report. This should be 0 in normal operation. In case of an error, the logs will include a WARNING with the full error message.
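Several of the AccuWeather metrics are sampled at 75%. In the StatsD line protocol, a sampled counter is sent only for a random fraction of events and carries an @rate suffix, so the server can scale the received count back up by 1/rate. The helper below shows that behavior on the wire. Its name is hypothetical, and in practice the StatsD client library applies sampling itself.

```python
import random
from typing import Optional


def sampled_counter_line(
    name: str, rate: float = 0.75, rng: Optional[random.Random] = None
) -> Optional[str]:
    """Return a StatsD counter line for one event, or None if the event is sampled out."""
    if not 0.0 < rate <= 1.0:
        raise ValueError("rate must be in (0, 1]")
    if rng is None:
        rng = random.Random()
    if rate < 1.0 and rng.random() >= rate:
        return None  # not sent; the server compensates using the @rate suffix
    suffix = f"|@{rate}" if rate < 1.0 else ""
    return f"{name}:1|c{suffix}"


# Simulate 10,000 cache misses with a seeded RNG and estimate the true count.
rng = random.Random(42)
name = "merino.providers.accuweather.query.cache.fetch.miss.locations"
lines = [sampled_counter_line(name, 0.75, rng) for _ in range(10_000)]
sent = [line for line in lines if line is not None]
estimate = len(sent) / 0.75  # what the StatsD server reports after scaling
print(sent[0], len(sent), round(estimate))
```

About three quarters of the events produce a line, and scaling by 1/0.75 brings the estimate back to roughly 10,000. This is why sampled counters remain comparable to unsampled ones on dashboards.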

Curated Recommendations

The following additional metrics are recorded when curated recommendations are requested.

  • corpus_api.request.timing - A timer to measure the duration (in ms) of looking up a list of scheduled corpus items.
  • corpus_api.request.status_codes.{res.status_code} - A counter to measure the status codes of an HTTP request to the curated-corpus-api.
  • corpus_api.request.graphql_error - A counter to measure the number of GraphQL errors from the curated-corpus-api.
  • recommendation.engagement.update.timing - A timer to measure the duration (in ms) of updating the engagement data from GCS.
  • recommendation.engagement.size - A gauge to track the size of the engagement blob on GCS.
  • recommendation.engagement.count - A gauge to measure the total number of engagement records.
  • recommendation.engagement.{country}.count - A gauge to measure the number of scheduled corpus items with engagement data per country.
  • recommendation.engagement.{country}.clicks - A gauge to measure the number of clicks per country in our GCS engagement blob.
  • recommendation.engagement.{country}.impressions - A gauge to measure the number of impressions per country in our GCS engagement blob.
  • recommendation.engagement.last_updated - A gauge for the staleness (in seconds) of the engagement data, measured between when the data was updated in GCS and the current time.
  • recommendation.prior.update.timing - A timer to measure the duration (in ms) of updating the prior data from GCS.
  • recommendation.prior.size - A gauge to track the size of the Thompson sampling priors blob on GCS.
  • recommendation.prior.last_updated - A gauge for the staleness (in seconds) of the prior data, measured between when the data was updated in GCS and the current time.
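A sketch of how the recommendation.engagement.* gauge values could be derived from the engagement data. EngagementRecord is a hypothetical row shape, and the per-country count assumes one record per scheduled corpus item per country. The last_updated value follows the definition above: seconds between the GCS update time and the current time. Merino's actual computation may differ.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, Iterable, Optional


@dataclass
class EngagementRecord:
    """Hypothetical row of the engagement blob: one scheduled corpus item in one country."""

    scheduled_corpus_item_id: str
    country: str
    clicks: int
    impressions: int


def engagement_gauges(
    records: Iterable[EngagementRecord],
    updated: datetime,
    now: Optional[datetime] = None,
) -> Dict[str, float]:
    """Compute values for the recommendation.engagement.* gauges."""
    if now is None:
        now = datetime.now(timezone.utc)
    gauges: Dict[str, float] = defaultdict(float)
    total = 0
    for r in records:
        total += 1
        prefix = f"recommendation.engagement.{r.country}"
        gauges[f"{prefix}.count"] += 1
        gauges[f"{prefix}.clicks"] += r.clicks
        gauges[f"{prefix}.impressions"] += r.impressions
    gauges["recommendation.engagement.count"] = total
    # Staleness: seconds between the GCS update time and now.
    gauges["recommendation.engagement.last_updated"] = (now - updated).total_seconds()
    return dict(gauges)


records = [
    EngagementRecord("item-a", "US", clicks=10, impressions=100),
    EngagementRecord("item-b", "US", clicks=5, impressions=50),
    EngagementRecord("item-c", "CA", clicks=1, impressions=20),
]
gauges = engagement_gauges(
    records,
    updated=datetime(2024, 5, 1, 12, 0, tzinfo=timezone.utc),
    now=datetime(2024, 5, 1, 12, 20, tzinfo=timezone.utc),
)
print(gauges["recommendation.engagement.US.clicks"])  # 15.0
print(gauges["recommendation.engagement.last_updated"])  # 1200.0
```

A last_updated value of 1200 seconds (20 minutes) is within the 0 to 30 minute window described for the engagement data in the logs section.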