Hacker News (Chinese edition)



Error: Cycle: aws_security_group.app -> aws_security_group.db -> aws_security_group.app

If you've ever seen this error while importing AWS infrastructure into Terraform, you know the pain.

Terraform's core engine relies on a Directed Acyclic Graph (DAG). It needs to know: "Create A first, then B."

But AWS is eventually consistent and happily allows cycles.

The Deadlock

The most common culprit is Security Groups. Imagine two microservices:

- SG-App allows outbound traffic to SG-DB
- SG-DB allows inbound traffic from SG-App

If you write this with inline rules (which is what terraform import defaults to), you create a cycle:

```hcl
# Ports and protocol omitted for brevity; only the cross-references matter here.
resource "aws_security_group" "app" {
  egress {
    security_groups = [aws_security_group.db.id]
  }
}

resource "aws_security_group" "db" {
  ingress {
    security_groups = [aws_security_group.app.id]
  }
}
```

Terraform cannot apply this. It can't create app without db's ID, and vice versa.

The Graph Theory View

When building an infrastructure reverse-engineering tool, I realized I couldn't just dump API responses to HCL. We model AWS as a graph: Nodes are Resources, Edges are Dependencies.

In a healthy config, dependencies are a DAG:

[VPC] --> [Subnet] --> [EC2]

But Security Groups often form cycles:

```
    ┌──────────────┐
    ▼              │
[SG-App]       [SG-DB]
    │              ▲
    └──────────────┘
```

Finding the Knots

To solve this for thousands of resources, we use Tarjan's algorithm to find Strongly Connected Components (SCCs). It identifies "knots" (clusters of nodes that are circularly dependent) and flags them for surgery.

In our testing, a typical enterprise AWS account with 500+ SGs contains 3-7 of these clusters.

The Fix: "Shell & Fill"

We use a strategy to break the cycle:

1. Create Empty Shells: Generate SGs with no rules. Terraform creates these instantly.
2. Fill with Rules: Extract rules into separate aws_security_group_rule resources that reference the shells.

```
Step 1: Create Shells
  [SG-App (empty)]       [SG-DB (empty)]

Step 2: Create Rules
         ▲                      ▲
         │                      │
  [Rule: egress->DB]     [Rule: ingress<-App]
```

The graph is now acyclic.

"Why not just always use separate rules?"

Fair question. The problem is:

1. terraform import often generates inline rules.
2. Many existing codebases prefer inline rules for readability.
3. The AWS API presents the "logical" view (rules bundled inside).

The tool needs to detect cycles and surgically convert only the problematic ones.

Why terraform import isn't enough

Standard import reads state as-is. It doesn't build a global dependency graph or perform topological sorting before generating code. It places the burden of refactoring on the human. For brownfield migrations with 2,000+ resources, that's not feasible.

---

I've implemented this graph engine in a tool called RepliMap. I've open-sourced the documentation and IAM policies needed to run read-only scans safely.

If you're interested in edge cases like this (or the root_block_device trap), the repo is here:

https://github.com/RepliMap/replimap-community

Happy to answer questions.
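For readers who want something concrete, here is a minimal Python sketch of the "Finding the Knots" and "Shell & Fill" steps described above. It is not RepliMap's actual code. The function names (`find_knots`, `render_hcl`), the `(owner, direction, peer, port)` rule tuples, and the TCP-only rules are all assumptions made for illustration. One design choice worth noting: once a group is in a knot, the sketch moves all of its rules out, because the AWS provider documentation warns against mixing inline rules and aws_security_group_rule resources on the same group.

```python
def find_knots(graph):
    """Tarjan's algorithm: return every SCC that has more than one node.

    graph maps a node name to the set of node names it references.
    Recursive, so very deep graphs would need an iterative variant.
    """
    index, low = {}, {}            # discovery order / lowest reachable index
    stack, on_stack, knots = [], set(), []

    def visit(node):
        index[node] = low[node] = len(index)
        stack.append(node)
        on_stack.add(node)
        for dep in sorted(graph.get(node, ())):
            if dep not in index:
                visit(dep)
                low[node] = min(low[node], low[dep])
            elif dep in on_stack:
                low[node] = min(low[node], index[dep])
        if low[node] == index[node]:      # node is the root of an SCC
            component = set()
            while True:
                member = stack.pop()
                on_stack.discard(member)
                component.add(member)
                if member == node:
                    break
            if len(component) > 1:        # singletons are not treated as knots
                knots.append(component)

    for node in sorted(graph):
        if node not in index:
            visit(node)
    return knots


def render_hcl(groups, rules):
    """Emit HCL text: inline rules for acyclic groups, shell + fill for knots.

    groups: list of SG names (hypothetical input format).
    rules:  list of (owner, direction, peer, port) tuples, TCP only.
    """
    graph = {name: set() for name in groups}
    for owner, _direction, peer, _port in rules:
        graph[owner].add(peer)
    knotted = set().union(*find_knots(graph))

    blocks = []
    for name in groups:
        body = [f'  name = "{name}"']
        if name not in knotted:           # acyclic: keep the rules inline
            for owner, direction, peer, port in rules:
                if owner == name:
                    body += [
                        f"  {direction} {{",
                        f"    from_port       = {port}",
                        f"    to_port         = {port}",
                        '    protocol        = "tcp"',
                        f"    security_groups = [aws_security_group.{peer}.id]",
                        "  }",
                    ]
        blocks.append(
            f'resource "aws_security_group" "{name}" {{\n' + "\n".join(body) + "\n}"
        )

    for owner, direction, peer, port in rules:
        if owner in knotted:              # fill: one standalone rule per edge
            blocks.append(
                f'resource "aws_security_group_rule" "{owner}_{direction}_{peer}_{port}" {{\n'
                f'  type                     = "{direction}"\n'
                f"  from_port                = {port}\n"
                f"  to_port                  = {port}\n"
                '  protocol                 = "tcp"\n'
                f"  security_group_id        = aws_security_group.{owner}.id\n"
                f"  source_security_group_id = aws_security_group.{peer}.id\n"
                "}"
            )
    return "\n\n".join(blocks)


# Example: web -> app is acyclic, app <-> db is the knot from the post.
groups = ["web", "app", "db"]
rules = [
    ("web", "egress", "app", 8080),
    ("app", "egress", "db", 5432),
    ("db", "ingress", "app", 5432),
]
hcl = render_hcl(groups, rules)
print(hcl)
```

In this example only app and db come out as empty shells with separate aws_security_group_rule resources. web keeps its inline egress block, which is the "convert only the problematic ones" behaviour.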

Terraform needs a DAG. AWS allows cycles. Here's how I map the difference between the two.