DynamoDB Message Table Rotation (legacy)
Note: this section does not apply to our BigTable database backend. All documentation below is deprecated and left for historical purposes.
As of version 1.45.0, message table rotation can be disabled, because DynamoDB now provides automatic entry expiration. Expiration is controlled in our data by the "expiry" field. (Note: field expiration is only available in full DynamoDB, and is not replicated by the mock DynamoDB API provided for development.) The table rotation feature described below is disabled by setting the no_table_rotation flag in the autopush_shared.ini configuration file.
If table rotation is disabled, the last message table used will become 'frozen' and will be used for all future messages. While this may not be aesthetically pleasing, it's more efficient than copying data to a new, generic table. If it's preferred, service can be shut down, previous tables dropped, the current table renamed, and service brought up again.
Message Table Rotation information (legacy)
Note: this section does not apply to our BigTable database backend. All documentation below is deprecated and left for historical purposes.
To avoid costly table scans, autopush used a rotating message and router table. Clients that hadn't connected in 30-60 days would have their router and message table entries dropped and needed to re-register. Tables were suffixed with the year and month they were meant for, e.g. messages_2015_02. Tables had to be created, with their read/write units properly allocated, by a separate process in advance of the month switch-over, because autopush nodes assumed the tables already existed. Scripts [were provided](https://github.com/mozilla-services/autopush/blob/master/maintenance.py) that could be run weekly to ensure all necessary tables were present and that sufficiently old tables were dropped.
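The naming scheme above can be illustrated with a minimal sketch. The helper names and the default "messages" prefix are assumptions made for illustration; they are not taken from the autopush code base.

```python
from datetime import date


def message_table_name(day: date, prefix: str = "messages") -> str:
    """Return the month-suffixed table name for a date, e.g. messages_2015_02."""
    return f"{prefix}_{day.year}_{day.month:02d}"


def next_month_table_name(day: date, prefix: str = "messages") -> str:
    """Return the name of the table that must exist before the next switch-over."""
    if day.month == 12:
        year, month = day.year + 1, 1  # roll over into January of the next year
    else:
        year, month = day.year, day.month + 1
    return message_table_name(date(year, month, 1), prefix)
```

A maintenance job run weekly could use the second helper to decide which table to create ahead of the switch-over.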
Within a few days of the new month, the load on the prior month's table would fall as clients transitioned to the new table. The read/write units on the prior month's table could then be lowered.
DynamoDB Rotating Message Table Interaction Rules (legacy)
Due to the complexity of having notifications spread across two tables, several rules were used to avoid losing messages during the month transition.
The logic for connection nodes is more complex, since only the connection node knows when the client connects, and how many messages it has read through.
When table rotation was allowed, the router table used the curmonth
field to indicate the last month the client had read notifications
through. This was independent of the last_connect since it was possible
for a client to connect, fail to read its notifications, then reconnect.
This field was updated for a new month when the client connected after
it had ack'd all the notifications out of the last month.
To avoid issues with time synchronization, the node the client is connected to acts as the source of truth for when the month has flipped over. Clients are only moved to the new table on connect, and only after reading/acking all the notifications for the prior month.
Rules for Endpoints
1. Check the router table to see the current_month the client is on.
2. Read the chan list entry from the appropriate month message table to see if it's a valid channel. If it's valid, move to step 3.
3. Store the notification in the current month's table if valid. (Note that this step does not copy the blank entry of valid channels.)
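The endpoint rules can be sketched with plain dictionaries standing in for the DynamoDB tables. Everything here is hypothetical: the data layout, the function name, and the reading that "the current month's table" means the table named by the client's current_month entry.

```python
def store_notification(router, message_tables, uaid, chid, payload):
    """Store a notification following the three endpoint rules.

    router:         {uaid: {"current_month": table_name}}
    message_tables: {table_name: {uaid: {"channels": set, "notifications": list}}}
    Returns True if the notification was stored, False otherwise.
    """
    # Step 1: look up which month table the client is currently on.
    month = router[uaid]["current_month"]

    # Step 2: read the channel list from that month's table and validate.
    row = message_tables[month].get(uaid)
    if row is None or chid not in row["channels"]:
        return False

    # Step 3: store the notification. The channel list entry is not copied
    # anywhere here; that is left to the connection node.
    row["notifications"].append((chid, payload))
    return True
```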
Rules for Connection Nodes
After Identification:
1. Check to see if the current_month matches the current month. If it does, proceed normally using the current month's message table. If the connection node month does not match the stored current_month in the client's router table entry, proceed to step 2.
2. Read notifications from the prior month and send them to the client. Once ACKs are received for all the notifications for that month, proceed to step 3.
3. Copy the blank message entry of valid channels to the new month message table.
4. Update the router table for the current_month.
During switchover, only after the router table update are new commands from the client accepted.
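The connection node rules above can be sketched as follows, again with dictionaries standing in for the tables. The function name, the data layout, and the deliver_and_wait_for_acks callback are assumptions for illustration only; a real node would do this asynchronously against DynamoDB.

```python
def migrate_on_connect(router, message_tables, uaid, node_month,
                       deliver_and_wait_for_acks):
    """Apply the post-identification rules and return the table to use.

    node_month is the table name the connection node considers current.
    deliver_and_wait_for_acks(notifications) is a hypothetical callback that
    sends notifications and returns once the client has ACKed all of them.
    """
    entry = router[uaid]
    stored_month = entry["current_month"]

    # Step 1: months match, so proceed normally on the current table.
    if stored_month == node_month:
        return node_month

    # Step 2: drain the prior month's notifications and wait for ACKs.
    empty = {"channels": set(), "notifications": []}
    old_row = message_tables[stored_month].get(uaid, empty)
    deliver_and_wait_for_acks(list(old_row["notifications"]))
    old_row["notifications"].clear()

    # Step 3: copy the (blank) channel list entry into the new month's table.
    # The table itself is assumed to exist already, as the nodes assumed.
    new_row = message_tables[node_month].setdefault(
        uaid, {"channels": set(), "notifications": []})
    new_row["channels"] |= old_row["channels"]

    # Step 4: record the new month in the router table. Only after this
    # point would new commands from the client be accepted.
    entry["current_month"] = node_month
    return node_month
```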
Handling of Edge Cases:
- Connection node gets more notifications during step 3, enough to buffer, such that the endpoint starts storing them in the previous current_month. In this case the connection node will check the old table, then the new table, to ensure it doesn't lose messages during the switch.
- Connection node dies, or client disconnects during step 3/4. Not a problem as the reconnect will pick it up at the right spot.
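The first edge case amounts to reading from both tables in a fixed order. A minimal sketch, assuming the same hypothetical dictionary layout ({table_name: {uaid: {"notifications": [...]}}}) and a made-up function name:

```python
def pending_during_switchover(message_tables, uaid, old_month, new_month):
    """Collect notifications from the old table first, then the new one.

    Checking the old table first picks up anything an endpoint stored there
    while the client's current_month still pointed at the previous month.
    """
    pending = []
    for month in (old_month, new_month):
        row = message_tables.get(month, {}).get(uaid)
        if row:
            pending.extend(row["notifications"])
    return pending
```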