What to do with test failures in CI?

  1. Investigate the cause of the test failure

    • For functional tests (unit, integration or contract), logs can be found on CircleCI
    • For performance tests (load), insights can be found on Grafana and in the Locust logs. To access the Locust logs see the Distributed GCP Exection - CI Trigger section of the load test documentation.
  2. Fix or mitigate the failure

    • If a fix can be identified in a relatively short time, then submit a fix
    • If the failure is caused by a flaky or intermittent functional test and the risk to the end-user experience is low, then the test can be "skipped", using the pytestxfail decorator during continued investigation. Example:
      @pytest.mark.xfail(reason="Test Flake Detected (ref: DISCO-####)")
      
  3. Re-Deploy

    • A fix or mitigation will most likely require a PR merge to the main branch that will automatically trigger the deployment process. If this is not possible, a re-deployment can be initiated manually by triggering the CI pipeline in CircleCI.