My experience using Apache Pulsar to solve PostgreSQL multi-tenancy problems

1 point by rudderdev, 5 days ago
Background: At RudderStack, I had been successfully using Postgres for the event streaming use case, scaled to 100k events/sec (note: there were good reasons to choose Postgres over Kafka). Nevertheless, we keep exploring opportunities to optimize, so my team and I started experimenting with Pulsar for one part of our system: data ingestion. We evaluated Apache Pulsar for ingesting data against dedicated Postgres databases per customer (one customer can have one or more Postgres databases; they are all master nodes with no ability to share data, so data has to be migrated manually every time a scaling operation happens).

Now that we've been using Pulsar for quite some time, I feel I can share some notes on replacing Postgres-based streaming solutions with Pulsar, and hopefully learn from your opinions/insights.

----

What I liked about Pulsar:

1. Tenant isolation is solid and auto load balancing works well: so far we haven't seen a chatty tenant affect the others. We use one shared cluster per region (one in the US, one in the EU) to ingest data for all our customers. Multi-tenancy combined with cluster auto-scaling has let us contain costs.

2. No more single points of failure (data replicated across bookies): data is now replicated to at least two bookies, which has made us much more resilient against data loss.

3. Maintenance is easier: there is no single-master constraint anymore, which has simplified a lot of our infra maintenance (imagine having to move a Postgres pod to a different EC2 node; that could mean downtime).

----

What's painful about Pulsar:

1. StreamNative licensing costs were significant.

2. Network costs increased considerably with multi-AZ deployment plus replication.

3. The learning curve was steeper than expected, and debugging was more complex.

----

Would love to hear about your experience with Postgres/Pulsar, and any opinions or insights on the approach and its challenges. I hope this dialogue helps others in the community; feel free to ask me anything.
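For readers new to Pulsar's storage layer: the "replicated in at least two bookies" point comes down to three BookKeeper settings, ensemble size (E), write quorum (Qw) and ack quorum (Qa), which Pulsar exposes in broker.conf as `managedLedgerDefaultEnsembleSize`, `managedLedgerDefaultWriteQuorum` and `managedLedgerDefaultAckQuorum`. The post doesn't say which values RudderStack runs, so the numbers below are hypothetical. This is a toy Python model (not Pulsar or BookKeeper code) of roughly how entries are striped round-robin across bookies, how many bookie failures an acknowledged write can survive, and why broker-to-bookie write traffic, and therefore cross-AZ network cost, scales with the write quorum.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class QuorumConfig:
    """Toy model of BookKeeper-style replication settings (hypothetical values)."""

    ensemble: int      # E: number of bookies a ledger is striped across
    write_quorum: int  # Qw: number of bookies each entry is written to
    ack_quorum: int    # Qa: acks required before a write is confirmed

    def __post_init__(self) -> None:
        # BookKeeper requires E >= Qw >= Qa >= 1.
        if not (self.ensemble >= self.write_quorum >= self.ack_quorum >= 1):
            raise ValueError("expected ensemble >= write_quorum >= ack_quorum >= 1")

    def write_set(self, entry_id: int) -> list[int]:
        """Ensemble indices of the bookies that store this entry.

        Round-robin striping: entry i goes to bookies i, i+1, ..., i+Qw-1 (mod E).
        """
        return [(entry_id + k) % self.ensemble for k in range(self.write_quorum)]

    def tolerated_bookie_losses(self) -> int:
        """Bookie failures an acknowledged entry survives (it exists on >= Qa bookies)."""
        return self.ack_quorum - 1

    def bytes_to_bookies_per_sec(self, events_per_sec: int, avg_event_bytes: int) -> int:
        """Payload bytes/sec the broker sends to bookies (ignores headers and reads)."""
        return events_per_sec * avg_event_bytes * self.write_quorum


if __name__ == "__main__":
    # Hypothetical setting: stripe across 3 bookies, keep 2 copies, wait for 2 acks.
    cfg = QuorumConfig(ensemble=3, write_quorum=2, ack_quorum=2)
    for entry in range(4):
        print(entry, cfg.write_set(entry))
    print("tolerated bookie losses:", cfg.tolerated_bookie_losses())
```

With these example values, every acknowledged entry lives on two bookies and survives the loss of one. Each extra copy is also another payload sent over the network, so if bookies are spread across availability zones, raising Qw for durability directly raises cross-AZ traffic. That is one plausible contributor to the network-cost pain point, though the post doesn't break those costs down.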