The system continues to perform normally, so we're marking this issue as resolved.
We're now reviewing the incident to learn more about what went wrong and how we can avoid similar problems in the future.
So far, we've determined that a new disk partition reached capacity at around 12:30 am EDT this morning, which caused the system to become unavailable. We freed up additional disk space and were back online by 6:15 am EDT.
We have multiple redundant safeguards in place to monitor disk space, and all of them failed in this situation. Both our DBA provider and our secure hosting provider should have systems that notify us when disk space is close to capacity. It appears that either they didn't enable monitoring on this particular disk partition or an issue with their systems prevented notifications from being sent.
We also have a database redundancy layer in place that should automatically fail over to a different database if a primary database fails. In this case, the system didn't recognize that the primary database had failed, so the failover was never triggered.
To prevent this issue from happening again, we're working with our DBA and hosting providers to confirm that they have monitors in place for this server partition. We're also working to determine why the system didn't fail over to the backup database servers, and we will make adjustments to ensure that failover is triggered successfully if a similar issue occurs in the future.
We apologize to our merchants for the inconvenience this issue caused. We know that you and your customers rely on Cheddar being available for essential billing and subscription activity. Rest assured that our team will take every possible step to make sure that Cheddar's uptime isn't affected by disk capacity again.
Please get in touch with our support team at support.getcheddar.com if you have any questions.
Posted Sep 02, 2019 - 12:12 EDT
Monitoring
A fix for the database capacity issue has been implemented and the website, API, and dashboard are back online. We're continuing to monitor the system at this time.
Posted Sep 02, 2019 - 06:31 EDT
Identified
We've identified the issue as an unexpected capacity problem in one of our databases. We've engaged our database management provider and are currently working toward a solution.
Posted Sep 02, 2019 - 06:11 EDT
Investigating
Some customers have reported that they are unable to log in to Cheddar or to send API calls successfully. When they attempt to access Cheddar via the dashboard or API, they receive a "Service Unavailable" error. We're currently investigating, and our team is working to resolve this issue as quickly as possible.
Posted Sep 02, 2019 - 02:10 EDT
This incident affected: API, Dashboard, and Website.