# Merino Load (Locust) Tests
This documentation describes the load tests for Merino. This test framework uses IP2Location LITE data available from https://lite.ip2location.com
## Overview

The tests in the `tests/load` directory spawn multiple HTTP clients that consume Merino's API in order to simulate real-world load on the Merino infrastructure. These tests use the Locust framework and are triggered at the discretion of the Merino Engineering Team.
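For orientation, below is a minimal sketch of what such a client looks like in Locust. The class name, query value, and wait interval are illustrative assumptions, not the contents of the actual locustfiles in `tests/load`:

```python
# Minimal sketch of a Locust HTTP client for Merino's suggest API.
# ExampleMerinoUser and the query value are illustrative assumptions.
from locust import HttpUser, between, task


class ExampleMerinoUser(HttpUser):
    # Pause 1-2 seconds between tasks, roughly simulating typing cadence
    wait_time = between(1, 2)

    @task
    def suggest(self) -> None:
        # Merino's suggest endpoint; "apple" is an arbitrary sample query
        self.client.get("/api/v1/suggest", params={"q": "apple"})
```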
## Related Documentation

## Local Execution
Note that if you make changes to the load test code, you must stop and remove the Docker containers and networks for the changes to take effect. Do this by running `make load-tests-clean`.

Follow the steps below to execute the load tests locally:
### Setup Environment

#### 1. Configure Environment Variables

The following environment variables, as well as Locust environment variables, can be set in `tests/load/docker-compose.yml`. Make sure any required API key is added, but never check API keys into source control.

**WARNING**: if the `MERINO_PROVIDERS__WIKIPEDIA__ES_API_KEY` is missing, the load tests will not execute.
| Environment Variable | Node(s) | Description |
|---|---|---|
| `LOAD_TESTS__LOGGING_LEVEL` | master & worker | Level for the logger in the load tests as an int (`10` for `DEBUG`, `20` for `INFO`, etc.) |
| `MERINO_REMOTE_SETTINGS__SERVER` | master & worker | Server URL of the Kinto instance containing suggestions |
| `MERINO_REMOTE_SETTINGS__BUCKET` | master & worker | Kinto bucket with the suggestions |
| `MERINO_REMOTE_SETTINGS__COLLECTION` | master & worker | Kinto collection with the suggestions |
| `MERINO_PROVIDERS__TOP_PICKS__TOP_PICKS_FILE_PATH` | master & worker | File path to the JSON file of domains |
| `MERINO_PROVIDERS__TOP_PICKS__QUERY_CHAR_LIMIT` | master & worker | The minimum character limit set for long domain suggestion indexing |
| `MERINO_PROVIDERS__TOP_PICKS__FIREFOX_CHAR_LIMIT` | master & worker | The minimum character limit set for short domain suggestion indexing |
| `MERINO_PROVIDERS__WIKIPEDIA__ES_API_KEY` | master & worker | The base64 key used to authenticate on the Elasticsearch cluster specified by `ES_URL` |
| `MERINO_PROVIDERS__WIKIPEDIA__ES_URL` | master & worker | The URL of the Elasticsearch cluster |
| `MERINO_PROVIDERS__WIKIPEDIA__ES_INDEX` | master & worker | The index identifier of Wikipedia in Elasticsearch |
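As an illustration of how the integer logging level maps onto Python's `logging` module, the following sketch reads `LOAD_TESTS__LOGGING_LEVEL` from the environment; the default value and exact handling here are assumptions, not the actual load test code:

```python
# Sketch: map LOAD_TESTS__LOGGING_LEVEL onto Python's logging module.
# The fallback of 20 (INFO) is an assumption for illustration.
import logging
import os

level = int(os.environ.get("LOAD_TESTS__LOGGING_LEVEL", "20"))
logging.basicConfig(level=level)
logging.getLogger("load_tests").info("Logger configured at level %d", level)
```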
#### 2. Host Locust via Docker

Execute the following from the repository root:

```shell
make load-tests
```
#### 3. (Optional) Host Merino Locally

Use one of the following commands to host Merino locally, executing from the repository root:

- Option 1: Use the local development instance: `make dev`
- Option 2: Use the profiler instance: `make profile`
- Option 3: Use the Docker instance: `make docker-build && docker run -p 8000:8000 app:build`
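Before pointing Locust at a locally hosted Merino, a quick request can confirm the instance is serving suggestions. This is an optional sketch; the port and the `suggestions` response key are assumptions to verify against your setup:

```python
# Optional sanity check against a locally hosted Merino instance.
import requests

resp = requests.get(
    "http://localhost:8000/api/v1/suggest",
    params={"q": "apple"},  # arbitrary sample query
    timeout=5,
)
resp.raise_for_status()
print("suggestions returned:", len(resp.json().get("suggestions", [])))
```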
### Run Test Session

#### 1. Start Load Test

- In a browser navigate to http://localhost:8089/
- Set up the load test parameters:
  - Option 1: Select the `MerinoSmokeLoadTestShape` or `MerinoAverageLoadTestShape`
    - These options have pre-defined settings
  - Option 2: Select the `Default` load test shape with the following recommended settings:
    - Number of users: 25
    - Spawn rate: 1
    - Host: 'https://stagepy.merino.nonprod.cloudops.mozgcp.net'
      - Set the host to 'http://host.docker.internal:8000' to test against a local instance of Merino
    - Duration (Optional): 10m
- Select "Start Swarming"
#### 2. Stop Load Test

Select the 'Stop' button in the top right-hand corner of the Locust UI after the desired test duration has elapsed. If the 'Run time' is set in step 1, the load test will stop automatically.
#### 3. Analyse Results

- See Distributed GCP Execution (Manual Trigger) - Analyse Results
- Only client-side measures, provided by Locust, are available when executing against a local instance of Merino
### Clean-up Environment

#### 1. Remove Load Test Docker Containers

Execute the following from the repository root:

```shell
make load-tests-clean
```
## Distributed GCP Execution - Manual Trigger

Follow the steps below to execute the distributed load tests on GCP with a manual trigger:
### Setup Environment

#### 1. Start a GCP Cloud Shell

The load tests can be executed from the contextual-services-test-eng cloud shell.

#### 2. Configure the Bash Script

- The `setup_k8s.sh` file, located in the `tests/load` directory, contains shell commands to create a GKE cluster, set up an existing GKE cluster, or delete a GKE cluster
  - Modify the script to include the `MERINO_PROVIDERS__WIKIPEDIA__ES_API_KEY` environment variable
- Execute the following from the root directory to make the file executable:

  ```shell
  chmod +x tests/load/setup_k8s.sh
  ```
#### 3. Create the GCP Cluster

- Execute the `setup_k8s.sh` file and select the create option to initiate the process of creating a cluster, setting up the env variables, and building the docker image. Choose smoke or average depending on the type of load test required (see the load shape sketch after this list):

  ```shell
  ./tests/load/setup_k8s.sh create [smoke|average]
  ```

  - Smoke - The smoke load test verifies the system's performance under minimal load. The test is run for a short period, possibly in CD, to ensure the system is working correctly.
  - Average - The average load test measures the system's performance under standard operational conditions. The test is meant to reflect an ordinary day in production.
- The cluster creation process will take some time. It is considered complete once an external IP is assigned to the `locust-master` node. Monitor the assignment via a watch loop:

  ```shell
  kubectl get svc locust-master --watch
  ```

- The number of workers defaults to 5, but it can be modified with the `kubectl scale` command. Example (10 workers):

  ```shell
  kubectl scale deployment/locust-worker --replicas=10
  ```

- To apply new changes to an existing GCP Cluster, execute the `setup_k8s.sh` file and select the setup option.
  - This option will consider the local commit history, creating new containers and deploying them (see Artifact Registry)
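The smoke and average options correspond to `LoadTestShape` classes in the locustfiles. As a rough sketch of what a fixed-duration shape looks like (the user count, spawn rate, and duration here are illustrative assumptions, not the hardcoded values):

```python
# Sketch of a fixed-duration load test shape, in the spirit of
# MerinoSmokeLoadTestShape. All values are illustrative assumptions.
from locust import LoadTestShape


class ExampleSmokeShape(LoadTestShape):
    USERS = 25        # assumed total user count
    SPAWN_RATE = 5    # assumed users started per second
    DURATION = 300    # run for 5 minutes, then stop

    def tick(self):
        # Locust polls tick() about once per second; returning None ends the test
        if self.get_run_time() < self.DURATION:
            return (self.USERS, self.SPAWN_RATE)
        return None
```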
### Run Test Session

#### 1. Start Load Test

- In a browser navigate to `http://$EXTERNAL_IP:8089`. This URL can be generated via command:

  ```shell
  EXTERNAL_IP=$(kubectl get svc locust-master -o jsonpath="{.status.loadBalancer.ingress[0].ip}")
  echo http://$EXTERNAL_IP:8089
  ```

- Select the `MerinoSmokeLoadTestShape`; this option has pre-defined settings and will last 5 minutes
- Select "Start Swarming"
#### 2. Stop Load Test

Select the 'Stop' button in the top right-hand corner of the Locust UI after the desired test duration has elapsed. If the 'Run time' is set in step 1, the load test will stop automatically.
#### 3. Analyse Results

**RPS**

- The request-per-second load target for Merino is `1500`
- Locust reports client-side RPS via the "merino_stats.csv" file and the UI (under the "Statistics" tab or the "Charts" tab)
- Grafana reports server-side RPS via the "HTTP requests per second per country" chart

**HTTP Request Failures**

- The number of responses with errors (5xx response codes) should be `0`
- Locust reports failures via the "merino_failures.csv" file and the UI (under the "Failures" tab or the "Charts" tab)
- Grafana reports failures via the "HTTP Response codes" chart and the "HTTP 5xx error rate" chart

**Exceptions**

- The number of exceptions raised by the test framework should be `0`
- Locust reports exceptions via the "merino_exceptions.csv" file and the UI (under the "Exceptions" tab)

**Latency**

- The HTTP client-side response time (aka request duration) for 95 percent of users is required to be 200ms or less (`p95 <= 200ms`), excluding weather requests
- Locust reports client-side latency via the "merino_stats.csv" file and the UI (under the "Statistics" tab or the "Charts" tab)
  - Warning! A Locust worker with too many users will bottleneck RPS and inflate client-side latency measures. Locust reports worker CPU and memory usage metrics via the UI (under the "Workers" tab)
- Grafana reports server-side latency via the "p95 latency" chart

**Resource Consumption**

- To conserve costs, resource allocation must be kept to a minimum. Container, CPU, and memory usage is expected to trend consistently between load test runs.
- Grafana reports metrics on resources via the "Container Count", "CPU usage time sum", and "Memory usage sum" charts
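As a convenience, the RPS and latency criteria above can be checked from a downloaded `merino_stats.csv`. The column names in this sketch follow Locust's standard CSV stats output and should be treated as assumptions to verify against the actual file:

```python
# Sketch: check the RPS and p95 latency targets from merino_stats.csv.
# Column names follow Locust's standard CSV stats output.
import csv

with open("merino_stats.csv", newline="") as f:
    for row in csv.DictReader(f):
        if row["Name"] == "Aggregated":
            rps = float(row["Requests/s"])
            p95 = float(row["95%"])  # milliseconds
            print(f"RPS: {rps:.0f} (target 1500)")
            print(f"p95: {p95:.0f} ms (limit 200 ms)")
```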
#### 4. Report Results

- Results should be recorded in the Merino Load Test Spreadsheet
- Optionally, the Locust reports can be saved and linked in the spreadsheet:
  - Download the results via the Locust UI or via command:

    ```shell
    kubectl cp <master-pod-name>:/home/locust/merino_stats.csv merino_stats.csv
    kubectl cp <master-pod-name>:/home/locust/merino_exceptions.csv merino_exceptions.csv
    kubectl cp <master-pod-name>:/home/locust/merino_failures.csv merino_failures.csv
    ```

    The `master-pod-name` can be found at the top of the pod list:

    ```shell
    kubectl get pods -o wide
    ```

  - Upload the files to the ConServ drive and record the links in the spreadsheet
### Clean-up Environment

#### 1. Delete the GCP Cluster

Execute the `setup_k8s.sh` file and select the delete option:

```shell
./tests/load/setup_k8s.sh
```
## Distributed GCP Execution - CI Trigger

The load tests are triggered in CI via Jenkins, which has a command overriding the load test Dockerfile entrypoint.

Follow the steps below to execute the distributed load tests on GCP with a CI trigger:
### Run Test Session

#### 1. Execute Load Test

To modify the load testing behavior, you must include a label in your Git commit. This must be the merge commit on the main branch, since only the most recent commit is checked for the label. The label format is: `[load test: (abort|skip|warn)]`. Take careful note of correct syntax and spacing within the label. There are three options for load tests: `abort`, `skip`, and `warn`:

- The `abort` label will prevent a prod deployment if the load test fails
  - Ex. `feat: Add feature ABC [load test: abort]`
- The `skip` label will bypass load testing entirely during deployment
  - Ex. `feat: Add feature LMN [load test: skip]`
- The `warn` label will output a Slack warning if the load test fails but still allow for the production deployment
  - Ex. `feat: Add feature XYZ [load test: warn]`

If no label is included in the commit message, the load test will be executed with the `warn` action.
The commit label signals load test instructions to Jenkins by modifying the Docker image tag. The Jenkins deployment workflow first deploys to `stage` and then runs load tests if requested. The Docker image tag passed to Jenkins matches the following pattern:

```
^(?P<environment>stage|prod)(?:-(?P<task>\w+)-(?P<action>abort|skip|warn))?-(?P<commit>[a-z0-9]+)$
```
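For example, the pattern decomposes a tag as follows; the sample tag value below is made up for illustration:

```python
# Demonstration of the Docker image tag pattern; the sample tag is made up.
import re

TAG_PATTERN = re.compile(
    r"^(?P<environment>stage|prod)"
    r"(?:-(?P<task>\w+)-(?P<action>abort|skip|warn))?"
    r"-(?P<commit>[a-z0-9]+)$"
)

match = TAG_PATTERN.match("stage-loadtest-warn-a1b2c3d")
assert match is not None
print(match.groupdict())
# {'environment': 'stage', 'task': 'loadtest', 'action': 'warn', 'commit': 'a1b2c3d'}
```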
#### 2. Analyse Results

See Distributed GCP Execution (Manual Trigger) - Analyse Results
#### 3. Report Results

- Optionally, results can be recorded in the Merino Load Test Spreadsheet. It is recommended to do so if unusual behavior is observed during load test execution or if the load tests fail.
- The Locust reports can be saved and linked in the spreadsheet. The results are persisted in the `/data` directory of the `locust-master-0` pod in the `locust-master` k8s cluster in the GCP project of `merino-nonprod`. To access the Locust logs:
  - Open a cloud shell in the Merino stage environment
  - Authenticate by executing the following command:

    ```shell
    gcloud container clusters get-credentials merino-nonprod-v1 \
      --region us-west1 --project moz-fx-merino-nonprod-ee93
    ```

  - Identify the log files needed in the Kubernetes pod by executing the following command, which lists the log files along with the creation timestamps from when the tests were performed. The `{run-id}` uniquely identifies each load test run:

    ```shell
    kubectl exec -n locust-merino locust-master-0 -- ls -al /data/
    ```

  - Download the results via the Locust UI or via command:

    ```shell
    kubectl -n locust-merino cp locust-master-0:/data/{run-id}-merino_stats.csv merino_stats.csv
    kubectl -n locust-merino cp locust-master-0:/data/{run-id}-merino_exceptions.csv merino_exceptions.csv
    kubectl -n locust-merino cp locust-master-0:/data/{run-id}-merino_failures.csv merino_failures.csv
    ```

  - Upload the files to the ConServ drive and record the links in the spreadsheet
## Calibration

Following the addition of new features, such as a Locust Task or Locust User, or environmental changes, such as node size or the upgrade of a major dependency like the Python version image, it may be necessary to re-establish the recommended parameters of the performance test.

| Parameter | Description |
|---|---|
| `WAIT TIME` | Changing this cadence will increase or decrease the number of requests sent by a `MerinoUser`. The default is currently in use for the `MerinoUser` class. |
| `TASK WEIGHT` | Changing this weight impacts the probability of a task being chosen for execution. This value is hardcoded in the task decorators of the `MerinoUser` class. |
| `USERS_PER_WORKER` | This value should be set to the maximum number of users a Locust worker can support given CPU and memory constraints. This value is hardcoded in the `LoadTestShape` classes. |
| `WORKER_COUNT` | This value is derived by dividing the total number of users needed for the performance test by `USERS_PER_WORKER`. This value is hardcoded in the `LoadTestShape` classes and the `setup_k8s.sh` script. |
- Locust documentation is available for [WAIT TIME][13] and [TASK WEIGHT][14]
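To illustrate where these two parameters live in Locust code (the weights, wait interval, and task bodies here are examples, not the `MerinoUser` implementation):

```python
# Illustration of WAIT TIME and TASK WEIGHT in a Locust user class.
# The values and queries are examples, not Merino's tuned settings.
from locust import HttpUser, between, task


class ExampleUser(HttpUser):
    wait_time = between(1, 2)  # WAIT TIME: seconds to pause between tasks

    @task(10)  # TASK WEIGHT: selected 10x as often as a weight-1 task
    def common_query(self) -> None:
        self.client.get("/api/v1/suggest", params={"q": "news"})

    @task(1)
    def rare_query(self) -> None:
        self.client.get("/api/v1/suggest", params={"q": "zyzzyva"})
```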
### Calibrating for USERS_PER_WORKER

This process is used to determine the number of users that a Locust worker can support.
#### Setup Environment

##### 1. Start a GCP Cloud Shell

The load tests can be executed from the contextual-services-test-eng cloud shell. If executing a load test for the first time, the merino-py Git repository will need to be cloned locally.

##### 2. Configure the Bash Script

- The `setup_k8s.sh` file, located in the `tests/load` directory, contains shell commands to create a GKE cluster, set up an existing GKE cluster, or delete a GKE cluster
- Execute the following from the root directory to make the file executable:

  ```shell
  chmod +x tests/load/setup_k8s.sh
  ```
##### 3. Create the GCP Cluster

- In the `setup_k8s.sh` script, modify the `WORKER_COUNT` variable to equal `1`
- Execute the `setup_k8s.sh` file from the root directory and select the create option to initiate the process of creating a cluster, setting up the env variables, and building the docker image. Choose smoke or average depending on the type of load test required:

  ```shell
  ./tests/load/setup_k8s.sh create [smoke|average]
  ```

- The cluster creation process will take some time. It is considered complete once an external IP is assigned to the `locust-master` node. Monitor the assignment via a watch loop:

  ```shell
  kubectl get svc locust-master --watch
  ```
#### Calibrate

Repeat steps 1 to 3, using a process of elimination, such as the bisection method, to determine the maximum `USERS_PER_WORKER`. The load tests are considered optimized when CPU and memory resources are maximally utilized. This step is meant to determine the maximum user count that a node can accommodate by observing CPU and memory usage while steadily increasing or decreasing the user count. You can monitor the CPU percentage in the Locust UI as well as in the Kubernetes Engine Workloads tab, where both memory and CPU are visualized on charts. A sketch of this search follows.
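The bisection itself is a manual loop of test runs, but its logic can be sketched as follows; `holds_up` stands in for a human judgment that CPU and memory stayed under capacity and Locust emitted no overload warnings, and the bounds are assumed:

```python
# Sketch of the bisection search for the maximum sustainable user count.
# holds_up(n) stands in for a manual calibration run with n users.
def find_max_users(low: int, high: int, holds_up) -> int:
    while low < high:
        mid = (low + high + 1) // 2
        if holds_up(mid):
            low = mid        # worker coped; try a higher user count
        else:
            high = mid - 1   # worker overloaded; try a lower count
    return low


# Example: search between 50 and 800 users per worker (bounds assumed)
print(find_max_users(50, 800, holds_up=lambda n: n <= 300))  # 300
```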
##### 1. Start Load Test

- In a browser navigate to `http://$EXTERNAL_IP:8089`. This URL can be generated via command:

  ```shell
  EXTERNAL_IP=$(kubectl get svc locust-master -o jsonpath="{.status.loadBalancer.ingress[0].ip}")
  echo http://$EXTERNAL_IP:8089
  ```

- Set up the load test parameters:
  - ShapeClass: Default
  - UserClasses: MerinoUser
  - Number of users: USERS_PER_WORKER (consult the Merino Load Test Spreadsheet to determine a starting point)
  - Ramp up: RAMP_UP (RAMP_UP = 5/USERS_PER_WORKER)
  - Host: 'https://stagepy.merino.nonprod.cloudops.mozgcp.net'
  - Duration (Optional): 600s
- Select "Start Swarming"
##### 2. Stop Load Test

Select the 'Stop' button in the top right-hand corner of the Locust UI after the desired test duration has elapsed. If the 'Run time' or 'Duration' is set in step 1, the load test will stop automatically.
##### 3. Analyse Results

**CPU and Memory Resource Graphs**

- CPU and memory usage should be less than 90% of the available capacity
- CPU and memory resources can be observed in Google Cloud > Kubernetes Engine > Workloads

**Log Errors or Warnings**

- Locust will emit errors or warnings if high CPU or memory usage occurs during the execution of a load test. The presence of these logs is a strong indication that the `USERS_PER_WORKER` count is too high
##### 4. Report Results

See Distributed GCP Execution (Manual Trigger) - Report Results
5. Update Shape and Script Values
WORKER_COUNT = MAX_USERS/USERS_PER_WORKER
- If
MAX_USERS
is unknown, calibrate to determineWORKER_COUNT
- If
- Update the
USERS_PER_WORKER
andWORKER_COUNT
values in the following files:\tests\load\locustfiles\smoke_load.py
or\tests\load\locustfiles\average_load.py
- \tests\load\setup_k8s.sh
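A worked example of the formula, with assumed numbers for both inputs:

```python
# Worked example of WORKER_COUNT = MAX_USERS / USERS_PER_WORKER.
# Both inputs are assumed values for illustration.
import math

MAX_USERS = 1000         # assumed total users needed for the test
USERS_PER_WORKER = 200   # assumed calibrated per-worker capacity

WORKER_COUNT = math.ceil(MAX_USERS / USERS_PER_WORKER)
print(WORKER_COUNT)  # 5
```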
#### Clean-up Environment

See Distributed GCP Execution (Manual Trigger) - Clean-up Environment
### Calibrating for WORKER_COUNT

This process is used to determine the number of Locust workers required to generate sufficient load for a test given a SHAPE_CLASS.

#### Setup Environment

- See Distributed GCP Execution (Manual Trigger) - Setup Environment
- Note that in `setup_k8s.sh` the maximum number of nodes is set using the `total-max-nodes` Google Cloud option. It may need to be increased if the number of workers can't be supported by the cluster.
#### Calibrate

Repeat steps 1 to 4, using a process of elimination, such as the bisection method, to determine the maximum `WORKER_COUNT`. The tests are considered optimized when they generate the minimum load required to cause node scaling in the Merino-py stage environment. You can monitor the Merino-py pod counts on Grafana.
##### 1. Update Shape and Script Values

- Update the `WORKER_COUNT` values in the following files:
  - `tests/load/locustfiles/smoke_load.py` or `tests/load/locustfiles/average_load.py`
  - `tests/load/setup_k8s.sh`
- Using Git, commit the changes locally
##### 2. Start Load Test

- In a browser navigate to `http://$EXTERNAL_IP:8089`. This URL can be generated via command:

  ```shell
  EXTERNAL_IP=$(kubectl get svc locust-master -o jsonpath="{.status.loadBalancer.ingress[0].ip}")
  echo http://$EXTERNAL_IP:8089
  ```

- Set up the load test parameters:
  - ShapeClass: SHAPE_CLASS
  - Host: 'https://stagepy.merino.nonprod.cloudops.mozgcp.net'
- Select "Start Swarming"
##### 3. Stop Load Test

Select the 'Stop' button in the top right-hand corner of the Locust UI after the desired test duration has elapsed. If the 'Run time', 'Duration', or 'ShapeClass' is set in step 2, the load test will stop automatically.
##### 4. Analyse Results

**Stage Environment Pod Counts**

- The 'Merino-py Pod Count' should demonstrate scaling during the execution of the load test
- The pod counts can be observed in Grafana

**CPU and Memory Resources**

- CPU and memory usage should be less than 90% of the available capacity in the cluster
- CPU and memory resources can be observed in Google Cloud > Kubernetes Engine > Workloads
##### 5. Report Results

See Distributed GCP Execution (Manual Trigger) - Report Results

#### Clean-up Environment

See Distributed GCP Execution (Manual Trigger) - Clean-up Environment
## Maintenance

The load test maintenance schedule cadence is once a quarter and should include updating the following: