Terraform needs a DAG. AWS allows cycles. Here is how I map between the two.

3 points | by davidlu1001 | 12 days ago
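Error: Cycle: aws_security_group.app -> aws_security_group.db -> aws_security_group.app

If you've ever seen this error while importing AWS infrastructure into Terraform, you know the pain.

Terraform's core engine relies on a Directed Acyclic Graph (DAG). It needs to know: "Create A first, then B."

But AWS is eventually consistent and happily allows cycles.

The Deadlock

The most common culprit is Security Groups. Imagine two microservices:

- SG-App allows outbound traffic to SG-DB
- SG-DB allows inbound traffic from SG-App

If you write this with inline rules (which is what terraform import defaults to), you create a cycle:

```hcl
# from_port, to_port, and protocol omitted for brevity
resource "aws_security_group" "app" {
  egress {
    security_groups = [aws_security_group.db.id]
  }
}

resource "aws_security_group" "db" {
  ingress {
    security_groups = [aws_security_group.app.id]
  }
}
```

Terraform cannot apply this. It can't create app without db's ID, and vice versa.

The Graph Theory View

When building an infrastructure reverse-engineering tool, I realized I couldn't just dump API responses to HCL. We model AWS as a graph: nodes are resources, edges are dependencies.

In a healthy config, dependencies form a DAG:

```
[VPC] --> [Subnet] --> [EC2]
```

But Security Groups often form cycles:

```
    ┌──────────────┐
    ▼              │
[SG-App]       [SG-DB]
    │              ▲
    └──────────────┘
```

Finding the Knots

To solve this for thousands of resources, we use Tarjan's algorithm to find Strongly Connected Components (SCCs). It identifies "knots" (clusters of nodes that are circularly dependent) and flags them for surgery.

In our testing, a typical enterprise AWS account with 500+ SGs contains 3 to 7 of these clusters.

The Fix: "Shell & Fill"

We use a two-step strategy to break the cycle:

1. Create Empty Shells: Generate SGs with no rules. Terraform can create these immediately because they depend on nothing else in the cycle.
2. Fill with Rules: Extract the rules into separate aws_security_group_rule resources that reference the shells.

```
Step 1: Create Shells
  [SG-App (empty)]       [SG-DB (empty)]

Step 2: Create Rules
         ▲                      ▲
         │                      │
  [Rule: egress->DB]     [Rule: ingress<-App]
```

The graph is now acyclic.

"Why not just always use separate rules?"

Fair question. The problem is:

1. terraform import often generates inline rules.
2. Many existing codebases prefer inline rules for readability.
3. The AWS API presents the "logical" view (rules bundled inside the group).

The tool needs to detect cycles and surgically convert only the problematic groups.

Why terraform import isn't enough

Standard import reads state as-is. It doesn't build a global dependency graph or perform topological sorting before generating code. It places the burden of refactoring on the human. For brownfield migrations with 2,000+ resources, that's not feasible.

---

I've implemented this graph engine in a tool called RepliMap. I've open-sourced the documentation and IAM policies needed to run read-only scans safely.

If you're interested in edge cases like this (or the root_block_device trap), the repo is here:

https://github.com/RepliMap/replimap-community

Happy to answer questions.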
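
For reference, here is a minimal sketch of what "Shell & Fill" output might look like for the two-group example above. The resource names, the vpc_id variable, and port 5432 are assumptions for illustration only, not RepliMap's actual output:

```hcl
# Hypothetical input: the VPC both groups live in.
variable "vpc_id" {
  type = string
}

# Step 1: empty shells. With no inline ingress/egress blocks,
# neither group references the other.
resource "aws_security_group" "app" {
  name   = "app"
  vpc_id = var.vpc_id
}

resource "aws_security_group" "db" {
  name   = "db"
  vpc_id = var.vpc_id
}

# Step 2: standalone rules that point at both shells.
# Port 5432 is an assumed example value.
resource "aws_security_group_rule" "app_egress_to_db" {
  type                     = "egress"
  security_group_id        = aws_security_group.app.id
  source_security_group_id = aws_security_group.db.id # destination, since type is egress
  protocol                 = "tcp"
  from_port                = 5432
  to_port                  = 5432
}

resource "aws_security_group_rule" "db_ingress_from_app" {
  type                     = "ingress"
  security_group_id        = aws_security_group.db.id
  source_security_group_id = aws_security_group.app.id
  protocol                 = "tcp"
  from_port                = 5432
  to_port                  = 5432
}
```

Each rule depends on both shells, but no shell depends on a rule or on the other shell, so Terraform can order the plan as shells first, rules second. Two caveats with this layout: a Terraform-managed aws_security_group with no egress block loses the allow-all egress rule AWS adds by default, and inline rules should not be mixed with standalone aws_security_group_rule resources on the same group.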