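Ask HN: Supabase PG upgrade caused production database loss, PITR backup failing
We are currently experiencing a total production outage and severe data loss on Supabase, and we cannot get a response from support. We are hoping someone from their team sees this here.
The Timeline of Failure:
1. We performed a Postgres version upgrade on our instance.
2. For an unknown reason, this upgrade triggered an unexpected downgrade of our disk size.
3. We ran a standard REINDEX: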
REINDEX DATABASE postgres;
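Because the disk space was severely limited by the bug in step 2, the disk ran out of space entirely.
4. This out-of-space event caused the entire database to be wiped.
5. We immediately attempted a Point-in-Time Recovery (PITR), but the restore process is failing on Supabase's end.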
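For readers planning a similar maintenance run, step 3 suggests comparing free disk space against index sizes before rebuilding anything. The sketch below is a rough heuristic in Python, not Supabase's or Postgres's own tooling. It assumes a rebuild needs spare room for a fresh copy of the indexes on whichever table carries the most index data, padded by a safety factor for sort files and WAL. The function name `reindex_headroom`, the 1.5 factor, and the example sizes are all hypothetical. Per-table index sizes could be read with Postgres's `pg_indexes_size()`, and the free-space figure would have to come from the hosting provider.

```python
from typing import Dict, Tuple


def reindex_headroom(
    index_bytes_by_table: Dict[str, int],
    free_bytes: int,
    safety_factor: float = 1.5,  # hypothetical padding for sort files and WAL
) -> Tuple[bool, int]:
    """Return (ok, required_bytes) for a planned index rebuild.

    Assumption: peak extra space is about the total index size of the
    table with the most index data, because the old index files are
    only removed once the rebuilt ones are in place.
    """
    if not index_bytes_by_table:
        # Nothing to rebuild, so no extra space is needed.
        return True, 0
    largest = max(index_bytes_by_table.values())
    required = int(largest * safety_factor)
    return free_bytes >= required, required


if __name__ == "__main__":
    GIB = 2**30
    # Made-up sizes, for illustration only.
    sizes = {"events": 40 * GIB, "users": 2 * GIB}
    ok, required = reindex_headroom(sizes, free_bytes=30 * GIB)
    print(f"safe to rebuild: {ok}, estimated headroom needed: {required / GIB:.0f} GiB")
```

A check like this only flags an obvious shortfall. It says nothing about why the disk shrank in step 2, and it cannot replace a verified backup.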
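Our project is now completely inaccessible.
We have an open critical support ticket (#SU-342355), have posted on GitHub Discussions, and have reached out on X, but have received zero response from a human.
If @kiwicopple, @antwilson, or any Supabase infra engineers are reading this: please do not delete the underlying AWS EBS volume. We need an engineer to manually mount the volume and extract the WAL or raw data pages before the blocks are overwritten.
Any advice from the community on escalating this further is appreciated.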