Documentation for purge_ttl.py
Summary
The purge_ttl.py script is a utility for purging expired Time-To-Live (TTL) records from a Google Spanner database. This script is designed to manage and clean up old data from specific database tables, ensuring efficient use of storage and maintaining database performance. It offers flexible options for targeting specific collections, user ID prefixes, and modes of operation, with optional dry-run functionality for testing changes without affecting the database.
Status
- Running as Kubernetes Workload cron job in
sync-prod. - Runs at 10 minutes past every 2nd hour.
- Runs per-collection and is configured for each of the following:
- batches
- clients
- crypto
- forms
- meta
- tabs
- See YAML configuration when editing each job.
- See Kubernetes Engine Workload Panel in sync-prod for more information.
Specifics
- Database: Google Spanner.
- Tables:
batches: Contains batch entries, with cascading deletes for childbatch_bsos.bsos: Stores Sync Basic Storage Objects (BSO).
- Supported Modes:
batches: Purges expired entries in thebatchestable.bsos: Purges expired entries in thebsostable.both: Performs purges on both tables.
- Expiry Modes:
now: Purges entries withexpiry < CURRENT_TIMESTAMP().midnight: Purges entries withexpiry < TIMESTAMP_TRUNC(CURRENT_TIMESTAMP(), DAY, "UTC").
The script uses parameters like collection IDs, user ID prefixes, and auto-splitting for fine-grained control over the purging process. It tracks execution duration and results using StatsD metrics for performance monitoring.
Notes
- Ensure proper access to the Spanner instance and database through IAM permissions.
- Use the
--dryrunoption to verify query logic before actual purging. - Consider setting up automated monitoring for long-running operations or performance issues.
Instructions for Running the Script
Prerequisites
- Python Environment: Ensure Python 3.7+ is installed.
- Google Cloud SDK: Install and authenticate with Google Cloud.
- Dependencies: Install required Python packages:
pip install google-cloud-spanner statsd - Environment Variables:
INSTANCE_ID: Spanner instance ID (default: spanner-test).DATABASE_ID: Database ID (default: sync_schema3).SYNC_SYNCSTORAGE__DATABASE_URL: Database connection URL (e.g., spanner://instance/database).
Usage
Run the script using the following command:
python purge_ttl.py [options]
Options
| Option | Description | Default |
|---|---|---|
-i, --instance_id | Spanner instance ID. | spanner-test |
-d, --database_id | Spanner database ID. | sync_schema3 |
-u, --sync_database_url | Spanner DSN connection URL (overrides instance_id and database_id). | SYNC_SYNCSTORAGE__DATABASE_URL |
--collection_ids, --ids | Comma-separated list of collection IDs to purge. | [] |
--uid_prefixes, --prefix | Comma-separated list of UID prefixes to filter purges. | [] |
--auto_split | Automatically generate UID prefixes for the specified number of hexadecimal digits. | None |
--mode | Purge mode: batches, bsos, or both. | both |
--expiry_mode | Expiry mode: now (current timestamp) or midnight (start of current day, UTC). | midnight |
--dryrun | Perform a dry run without making changes to the database. | False |
Examples
Example 1: Basic Purge
Purge expired entries from both batches and bsos tables using default configurations:
python purge_ttl.py
Example 2: Specify Instance and Database
Purge expired entries in a specific instance and database:
python purge_ttl.py -i my-instance -d my-database
Example 3: Filter by Collection IDs
Purge only for specific collection IDs:
python purge_ttl.py --collection_ids [123,456,789]
Example 4: Filter by UID Prefixes
Limit purging to specific UID prefixes:
python purge_ttl.py --uid_prefixes [abc,def,123]
Example 5: Auto-Generated Prefixes
Generate prefixes automatically for a 2-digit hexadecimal range:
python purge_ttl.py --auto_split 2
Example 6: Perform a Dry Run
Test the script without making actual changes:
python purge_ttl.py --dryrun
Detailed Usage
-
Connecting to Spanner:
- The script connects to Google Spanner using either explicitly provided
instance_idanddatabase_idor a DSN URL.
- The script connects to Google Spanner using either explicitly provided
-
Purge Modes:
batches: Deletes expired entries from thebatchestable, which cascades deletions forbatch_bsosvia Spanner'sON DELETE CASCADE.bsos: Deletes expired Binary Sync Objects (BSOs).both: Executes purges on bothbatchesandbsos.
-
Expiry Conditions:
now: Purge entries that have already expired at the current timestamp.midnight: Purge entries that expired at or before the start of the current UTC day.
-
Query Customization:
- Filters can be added based on collection IDs or UID prefixes.
- Queries are dynamically constructed using helper functions (
add_conditions,get_expiry_condition).
-
Performance Monitoring:
- Metrics for execution duration and rows affected are logged and sent to StatsD for monitoring.
-
Error Handling:
- The script validates input parameters, raises exceptions for invalid configurations, and logs details for troubleshooting.
-
Dry Run:
- Enabling the
--dryrunflag ensures that the queries are constructed and logged without executing them on the database.
- Enabling the