Context
They have recs for our db
- Kevin: head of customer success
- Haydn: engineer support
We currently have logical backups, but our db is too big: we should probably switch over to point in time backups.
- They will send over some info about PIT backups
- Logical are extracted data via a basic sql query. This is actually really taxing on the DB. It extracts ALL of our data from the db every night at midnight.
- PITR - copies the disk, and the write ahead log (WAL) so we can roll back transactions.
- $100/month for 7-day restore.
- TODO: turn on PITR, will turn off logical backups
Metrics to track on Grafana
- Originally for grafana they showed 6 default dashboards. They recently added more dashboards. Especially saw that our IOPS was a big deal red flag.
What happened to our db when it went down?
- we couldn’t get the db up fast enough in 90s bc of all our iops
- He saw our user_contract_metrics had the gin index. When postgres updates a row, it inserts a new row and marks the old one as bad. Autovacuum couldn’t ever get a lock on that table to drop the old rows. Just got more and more bloated.
- We can query postgres to see if autovacuum is successfully running on a table.
- Also one of the features they’re going to release during launch week is something to help with this. They’ll have lots of extra postgresql commands to inspect the db usage.
What’s left to do to switch over from firebase?
- Write our own locking mechanism
- Kevin saw a lot of locks on the user_contract_metrics from the incoming writes table.
- Kevin: We’re already seeing lock contention as we’re doing it. Suggests doing double inserts. Keep all infrastructure as it is. Point the insert to another table as well to try out our new strategy so that we can compare the results of our method to the old method to make sure our new strategy is working.