什么是排序规则,为什么我的数据会损坏?——与肖恩·托马斯的PG Phridays
Postgres 在其大部分历史中依赖操作系统来处理文本排序。当 2018 年 glibc 2.28 发布时,进行了重大 Unicode 排序规则的 overhaul,所有在旧规则下构建的文本索引都变得无效……但这一切都是悄无声息的。没有警告,没有错误。只是错误的查询结果和遗漏的行。
Postgres 17 增加了一个内置的区域设置提供程序,完全消除了外部依赖:
```
initdb --locale-provider=builtin --locale=C.UTF-8
```
这一变化有助于在操作系统升级时保持排序的稳定性。glibc 仍然是 Postgres 18 的默认选项,因此在创建新集群时必须指定这一点。
对于已经在运行的集群:Postgres 13 及以上版本将在排序规则版本更改时记录警告。该警告是重建受影响索引的指示。
想了解更多详细信息,请查看本周 Shaun Thomas 的 PG Phriday 博客文章:[https://www.pgedge.com/blog/what-is-a-collation-and-why-is-my-data-corrupt](https://www.pgedge.com/blog/what-is-a-collation-and-why-is-my-data-corrupt)
查看原文
Postgres has relied on the OS to handle text sorting for most of its history. When glibc 2.28 shipped in 2018 with a major Unicode collation overhaul, every existing text index built under the old rules became invalid... but silently. No warnings, no errors. Just wrong query results and missed rows.<p>Postgres 17 added a builtin locale provider that removes the external dependency entirely:<p>initdb --locale-provider=builtin --locale=C.UTF-8<p>This change helps sorting to become stable across OS upgrades. glibc is still the default in Postgres 18, so this must be specified when creating a new cluster.<p>For clusters already running: Postgres 13+ will log a warning when a collation version changes. That warning is an instruction to rebuild affected indexes.<p>Get more details here in this week's PG Phriday blog post from Shaun Thomas: https://www.pgedge.com/blog/what-is-a-collation-and-why-is-my-data-corrupt