Data collection
This page should list all metrics and logs that Merino is expected to emit in production, including what should be done about them, if anything.
Logs
This list does not include any DEBUG level events, since those are not logged by default in production. The level and type of each log are listed.
Any log containing sensitive data must include a boolean field sensitive that is set to true, which exempts it from flowing to the generally accessible log inspection interfaces.
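As an illustration of the sensitive field rule, the following is a minimal sketch using Python's standard logging module. The DropSensitive filter and ListHandler classes and the logger name are hypothetical and are not taken from Merino's code; they only show how a record tagged with sensitive set to true could be kept out of a general-purpose handler.

```python
import logging


class DropSensitive(logging.Filter):
    """Reject records whose `sensitive` field is true (hypothetical routing rule)."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Fields passed through `extra=` become attributes on the record.
        return getattr(record, "sensitive", False) is not True


class ListHandler(logging.Handler):
    """Collects messages in memory; stands in for a generally accessible log sink."""

    def __init__(self) -> None:
        super().__init__()
        self.messages = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append(record.getMessage())


general = ListHandler()
general.addFilter(DropSensitive())

logger = logging.getLogger("example.sensitive")  # hypothetical logger name
logger.setLevel(logging.INFO)
logger.propagate = False
logger.addHandler(general)

# Tagged as sensitive: the filter keeps it away from the general handler.
logger.info("web.suggest.request", extra={"sensitive": True, "query": "hello"})
# Not tagged: reaches the general handler.
logger.info("request.summary")
```

After running the sketch, general.messages holds only the untagged event. In a real deployment the routing would more likely happen in the log pipeline than in the application process.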
Merino APIs
INFO web.suggest.request
- A suggestion request is being processed. This event will include fields for all relevant details of the request. Fields:
  sensitive
  - Always set to true to ensure proper routing.
  query
  - If query logging is enabled, the text the user typed. Otherwise an empty string.
  country
  - The country the request came from.
  region
  - The first country subdivision the request came from.
  city
  - The city the request came from.
  dma
  - A US-only location description that is larger than a city and smaller than a state, but does not align to political borders.
  agent
  - The original user agent.
  os_family
  - Parsed from the user agent. One of "windows", "macos", "linux", "ios", "android", "chrome os", "blackberry", or "other".
  form_factor
  - Parsed from the user agent. One of "desktop", "phone", "tablet", or "other".
  browser
  - The browser and possibly version detected. Either "Firefox(XX)" where XX is the version, or "Other".
  rid
  - The request ID.
  accepts_english (WIP)
  - True if the user's Accept-Language header includes an English locale, false otherwise.
  requested_providers
  - A comma-separated list of providers requested via the query string, or an empty string if none were requested (in which case the default values would be used).
  client_variants
  - Any client variants sent to Merino in the query string.
  session_id
  - A UUID generated by the client for each search session.
  sequence_no
  - A client-side event counter (0-based) that records the query sequence within each search session.
INFO request.summary
- The application request summary that follows the MozLog convention. This log is recorded for all incoming HTTP requests except for the suggest API endpoint.
ERROR dockerflow.error_endpoint
- The __error__ endpoint of the server was called. This is used to test our error reporting system. It is not a cause for concern, unless we receive a large number of these records, in which case some outside service is likely malicious or misconfigured.
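For the web.suggest.request event described above, a few of the field rules can be sketched in code. The function name suggest_request_fields and the log_query flag are hypothetical and only illustrate the documented behaviour: sensitive is always true, query is blank unless query logging is enabled, and requested_providers is a comma-separated string.

```python
def suggest_request_fields(query, requested_providers, log_query):
    """Build a subset of the fields for a web.suggest.request event (illustrative only)."""
    return {
        # Always true so the event is routed as sensitive.
        "sensitive": True,
        # Keep what the user typed only when query logging is enabled.
        "query": query if log_query else "",
        # Comma-separated list, or an empty string when none were requested.
        "requested_providers": ",".join(requested_providers),
    }


# Query logging disabled: the typed text is replaced by an empty string.
redacted = suggest_request_fields("fire", ["adm", "wikipedia"], log_query=False)
```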
Merino Middleware Logs
Geolocation
WARNING merino.middleware.geolocation
- There was an error with a geolocation lookup.
Merino Cron Tasks
WARNING merino.cron
- There was an error while executing a cron task.
Merino Feature Flags
ERROR merino.featureflags
- There was an error while attempting to assign a feature flag for a suggest API request.
Curated Recommendations
ERROR merino.curated_recommendations.corpus_backends.corpus_api_backend
- Failed to get timezone for scheduled surface.
WARNING merino.curated_recommendations.corpus_backends.corpus_api_backend
- Retrying CorpusApiBackend after an HTTP client exception was raised.
ERROR GcsEngagement failed to update cache: {e}
- Unexpected exception when updating engagement.
ERROR Curated recommendations engagement size {blob.size} > {self.max_size}
- Max engagement blob size is exceeded. The backend will gracefully fall back to cached data or 0's.
INFO Curated recommendations engagement unchanged since {self.last_updated}.
- The engagement blob was not updated since the last check. last_updated is expected to be between 0 and 30 minutes.
Metrics
A note on timers: Statsd timers are measured in milliseconds, and are reported as integers (at least in Cadence). Milliseconds are often not precise enough for the tasks we want to measure in Merino. Instead, we use generic histograms to record microsecond times. Metrics recorded in this way should have -us appended to their name, to mark the units used (since we shouldn't put the proper unit μs in metric names).
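The microsecond convention can be illustrated with a short sketch. The helper names us_metric_name and timed_us, the record callback, and the metric name example.lookup are all hypothetical; the callback stands in for whatever histogram call the metrics client provides.

```python
import time


def us_metric_name(name):
    """Append the -us suffix that marks a microsecond measurement."""
    return name if name.endswith("-us") else f"{name}-us"


def timed_us(func, record, name):
    """Run func and report its duration in whole microseconds via record(name, value)."""
    start = time.perf_counter_ns()
    try:
        return func()
    finally:
        # Nanoseconds to microseconds, reported as an integer.
        elapsed_us = (time.perf_counter_ns() - start) // 1_000
        record(us_metric_name(name), elapsed_us)


recorded = []
result = timed_us(
    lambda: sum(range(1000)),
    lambda metric, value: recorded.append((metric, value)),
    "example.lookup",  # hypothetical metric name
)
```

Here recorded ends up with one entry named example.lookup-us whose value is the elapsed time in microseconds.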
merino.providers.initialize
- A timer to measure the overall initialization duration (in ms) for all providers.
merino.providers.initialize.<provider>
- A timer to measure the initialization duration (in ms) for the given <provider>. Example: merino.providers.initialize.adm
merino.<http_method>.<url_path>.status_codes.<status_code>
- A counter to measure the status codes of an HTTP method for the <url_path>. Example: merino.get.api.v1.suggest.status_codes.200
merino.<http_method>.<url_path>.timing
- A timer to measure the duration (in ms) of an HTTP method for a URL path. Example: merino.get.api.v1.suggest.timing
merino.<provider_module>.query
- A timer to measure the query duration (in ms) of a certain suggestion provider. Example: merino.providers.suggest.adm.query
merino.<provider_module>.query.timeout
- A counter to measure the query timeouts of a certain suggestion provider. Example: merino.providers.suggest.wikipedia.query.timeout
merino.suggestions-per.request
- A histogram metric to get the distribution of suggestions per request.
merino.suggestions-per.provider.<provider_module>
- A histogram metric to get the distribution of suggestions returned per provider (per request). Example: merino.suggestions-per.provider.wikipedia
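The HTTP metric examples above suggest that the method is lowercased and that the slashes in the URL path become dots. The sketch below reproduces those example names under that assumption; the function names are hypothetical and the actual name-building logic in Merino may differ.

```python
def http_metric_prefix(method, url_path):
    """Turn ("GET", "/api/v1/suggest") into "merino.get.api.v1.suggest"."""
    # Drop empty segments so leading or trailing slashes do not produce empty parts.
    dotted = ".".join(part for part in url_path.split("/") if part)
    return f"merino.{method.lower()}.{dotted}"


def status_code_metric(method, url_path, status_code):
    """Name of the counter for one status code of an HTTP method and path."""
    return f"{http_metric_prefix(method, url_path)}.status_codes.{status_code}"


def timing_metric(method, url_path):
    """Name of the timer for an HTTP method and path."""
    return f"{http_metric_prefix(method, url_path)}.timing"


counter_name = status_code_metric("GET", "/api/v1/suggest", 200)
timer_name = timing_metric("GET", "/api/v1/suggest")
```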
AccuWeather
The weather provider records additional metrics.
accuweather.upstream.request.<request_type>.get
- A counter to measure the number of times an upstream request to AccuWeather was made.
accuweather.request.location.not_provided
- A counter to measure the number of times a query was sent without a location, so the weather request could not be processed. Sampled at 75%.
accuweather.request.location.dist_calculated.success
- A counter to measure the number of successful lat long distance calculations used to find location.
accuweather.request.location.dist_calculated.fail
- A counter to measure the number of failed lat long distance calculations used to find location.
merino.providers.accuweather.query.cache.fetch
- A timer to measure the duration (in ms) of looking up a weather report in the cache. Sampled at 75%.
merino.providers.accuweather.query.cache.fetch.miss.locations
- A counter to measure the number of times a weather location was not in the cache. Sampled at 75%.
merino.providers.accuweather.query.cache.fetch.miss.currentconditions
- A counter to measure the number of times the current conditions were not in the cache. Sampled at 75%.
merino.providers.accuweather.query.cache.fetch.miss.forecasts
- A counter to measure the number of times a forecast for a location was not in the cache. Sampled at 75%.
merino.providers.accuweather.query.cache.fetch.hit.{locations | currentconditions | forecasts}
- A counter to measure the number of times a requested value like a location or forecast is in the cache. We don't count TTL hits explicitly, just misses. Sampled at 75%.
merino.providers.accuweather.query.backend.get
- A timer to measure the duration (in ms) of a request for a weather report from the backend. This metric isn't recorded for cache hits. Sampled at 75%.
merino.providers.accuweather.query.cache.store
- A timer to measure the duration (in ms) of saving a weather report from the backend to the cache. This metric isn't recorded for cache hits. Sampled at 75%.
merino.providers.accuweather.query.cache.error
- A counter to measure the number of times the cache store returned an error when fetching or storing a weather report. This should be 0 in normal operation. In case of an error, the logs will include a WARNING with the full error message.
merino.providers.accuweather.skip_cities_mapping.total.size
- A counter to measure the total number of times cities were skipped because of a missing location.
merino.providers.accuweather.skip_cities_mapping.unique.size
- A counter to measure the number of unique cities that were skipped because of a missing location.
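Several of the metrics above are sampled at 75%. As a rough sketch of what client-side sampling means in the StatsD line protocol, the hypothetical helper below emits a counter increment for about three out of four events and annotates it with the sample rate. The function name and the injectable draw parameter are assumptions made for illustration, not part of Merino or of any particular client library.

```python
import random


def sampled_counter_line(name, sample_rate, draw=random.random):
    """Return a StatsD counter line for a sampled event, or None if the event is skipped."""
    # draw() returns a float in [0, 1); the event is sent with probability sample_rate.
    if draw() >= sample_rate:
        return None
    # "@<rate>" tells the receiver which sample rate the client used.
    return f"{name}:1|c|@{sample_rate}"


# Deterministic draws make the two outcomes visible.
sent = sampled_counter_line("accuweather.request.location.not_provided", 0.75, draw=lambda: 0.10)
skipped = sampled_counter_line("accuweather.request.location.not_provided", 0.75, draw=lambda: 0.90)
```

A StatsD-compatible receiver typically scales sampled counters by the inverse of the rate, so dashboards still approximate the true totals.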
Curated Recommendations
The following additional metrics are recorded when curated recommendations are requested.
corpus_api.request.timing
- A timer to measure the duration (in ms) of looking up a list of scheduled corpus items.
corpus_api.request.status_codes.{res.status_code}
- A counter to measure the status codes of an HTTP request to the curated-corpus-api.
corpus_api.request.graphql_error
- A counter to measure the number of GraphQL errors from the curated-corpus-api.
recommendation.engagement.update.timing
- A timer to measure the duration (in ms) of updating the engagement data from GCS.
recommendation.engagement.size
- A gauge to track the size of the engagement blob on GCS.
recommendation.engagement.count
- A gauge to measure the total number of engagement records.
recommendation.engagement.{country}.count
- A gauge to measure the number of scheduled corpus items with engagement data per country.
recommendation.engagement.{country}.clicks
- A gauge to measure the number of clicks per country in our GCS engagement blob.
recommendation.engagement.{country}.impressions
- A gauge to measure the number of impressions per country in our GCS engagement blob.
recommendation.engagement.last_updated
- A gauge for the staleness (in seconds) of the engagement data, measured between when the data was updated in GCS and the current time.
recommendation.prior.update.timing
- A timer to measure the duration (in ms) of updating the prior data from GCS.
recommendation.prior.size
- A gauge to track the size of the Thompson sampling priors blob on GCS.
recommendation.prior.last_updated
- A gauge for the staleness (in seconds) of the prior data, measured between when the data was updated in GCS and the current time.
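The two last_updated gauges report staleness as the number of seconds between the blob's update time in GCS and the current time. A minimal sketch of that calculation follows, assuming timezone-aware datetimes; the function name staleness_seconds and the sample timestamps are hypothetical.

```python
from datetime import datetime, timezone
from typing import Optional


def staleness_seconds(updated: datetime, now: Optional[datetime] = None) -> int:
    """Whole seconds elapsed between the blob update time and now."""
    if now is None:
        now = datetime.now(timezone.utc)
    return int((now - updated).total_seconds())


# A blob updated five minutes before the check is 300 seconds stale.
blob_updated = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
checked_at = datetime(2024, 1, 1, 12, 5, tzinfo=timezone.utc)
staleness = staleness_seconds(blob_updated, checked_at)
```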
Manifest
When requesting a manifest file, we record the following metrics.
manifest.request.get
- A counter for how many requests against the /manifest endpoint were made.
manifest.request.timing
- A timer for how long it took the endpoint to fulfill the request.
manifest.gcs.fetch_time
- A timer for how long it took to download the latest manifest file from the Google Cloud bucket.
manifest.request.no_manifest
- A counter to measure how many times we didn't find the latest manifest file.
manifest.request.error
- A counter to measure how many times we could not provide a valid JSON manifest file.