Terraform needs a directed acyclic graph (DAG). AWS allows cycles. Here is how I map between the two.
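Error: Cycle: aws_security_group.app -> aws_security_group.db -> aws_security_group.app
If you've ever seen this error while importing AWS infrastructure into Terraform, you know the pain.
Terraform's core engine relies on a directed acyclic graph (DAG). It needs to know: "Create A first, then B."
But AWS is eventually consistent and happily allows cycles: it lets you create groups first and attach rules that reference each other afterwards.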
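The Deadlock
The most common culprit is security groups. Imagine two microservices:
- SG-App allows outbound traffic to SG-DB
- SG-DB allows inbound traffic from SG-App
If you write this with inline rules (which is what terraform import defaults to), you create a cycle: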
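```hcl
# Ports are illustrative; what matters is that each group references the other.
resource "aws_security_group" "app" {
  egress {
    from_port       = 5432
    to_port         = 5432
    protocol        = "tcp"
    security_groups = [aws_security_group.db.id]
  }
}

resource "aws_security_group" "db" {
  ingress {
    from_port       = 5432
    to_port         = 5432
    protocol        = "tcp"
    security_groups = [aws_security_group.app.id]
  }
}
```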
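Terraform cannot apply this. It can't create app without db's ID, and vice versa.
The Graph Theory View
While building an infrastructure reverse-engineering tool, I realized I couldn't just dump API responses to HCL. We model AWS as a graph: nodes are resources, edges are dependencies.
In a healthy config, the dependencies form a DAG: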
[VPC] --> [Subnet] --> [EC2]
But security groups often form cycles:
```
   ┌─────────────────┐
   ▼                 │
[SG-App]          [SG-DB]
   │                 ▲
   └─────────────────┘
```
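A rough sketch of how such a graph can be derived from API data. The Python below reads security groups shaped like the SecurityGroups entries of the EC2 DescribeSecurityGroups response (IpPermissions, IpPermissionsEgress, UserIdGroupPairs) and records an edge whenever one group's rule names another group. The function name build_sg_graph and the group IDs are hypothetical; this is an illustration under those assumptions, not RepliMap's implementation.

```python
def build_sg_graph(security_groups):
    """Map each group ID to the set of group IDs its rules reference.

    `security_groups` is a list of dicts shaped like the "SecurityGroups"
    entries of an EC2 DescribeSecurityGroups response.
    """
    graph = {}
    for sg in security_groups:
        group_id = sg["GroupId"]
        refs = graph.setdefault(group_id, set())
        for key in ("IpPermissions", "IpPermissionsEgress"):
            for permission in sg.get(key, []):
                for pair in permission.get("UserIdGroupPairs", []):
                    # A group referencing itself becomes `self = true` in HCL,
                    # which needs no ordering, so it is not recorded as an edge.
                    if pair["GroupId"] != group_id:
                        refs.add(pair["GroupId"])
    return graph


# Sample data for the two-microservice example (IDs and port are made up).
sgs = [
    {"GroupId": "sg-app",
     "IpPermissionsEgress": [{"IpProtocol": "tcp", "FromPort": 5432, "ToPort": 5432,
                              "UserIdGroupPairs": [{"GroupId": "sg-db"}]}]},
    {"GroupId": "sg-db",
     "IpPermissions": [{"IpProtocol": "tcp", "FromPort": 5432, "ToPort": 5432,
                        "UserIdGroupPairs": [{"GroupId": "sg-app"}]}]},
]

print(build_sg_graph(sgs))  # {'sg-app': {'sg-db'}, 'sg-db': {'sg-app'}}
```

Each group points at the other, which is exactly the two-node loop in the diagram.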
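Finding the Knots
To solve this across thousands of resources, we use Tarjan's algorithm to find strongly connected components (SCCs). It identifies "knots" (clusters of nodes that depend on one another in a loop) and flags them for surgery.
In our testing, a typical enterprise AWS account with 500+ security groups contains 3 to 7 of these clusters.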
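For readers who want to see the knot-finding step, here is a compact recursive version of Tarjan's algorithm in Python. The input format (each node mapped to the nodes it depends on), the function name find_knots, and the node names are assumptions for illustration. A single-node component only counts as a knot if the node references itself.

```python
def find_knots(graph):
    """Return the strongly connected components that contain a cycle.

    `graph` maps each node to an iterable of nodes it depends on.
    """
    index_of, lowlink = {}, {}
    stack, on_stack = [], set()
    components = []
    counter = 0

    def strongconnect(node):
        nonlocal counter
        index_of[node] = lowlink[node] = counter
        counter += 1
        stack.append(node)
        on_stack.add(node)
        for succ in graph.get(node, ()):
            if succ not in index_of:
                strongconnect(succ)
                lowlink[node] = min(lowlink[node], lowlink[succ])
            elif succ in on_stack:
                lowlink[node] = min(lowlink[node], index_of[succ])
        if lowlink[node] == index_of[node]:
            # `node` is the root of an SCC: pop its members off the stack.
            component = []
            while True:
                member = stack.pop()
                on_stack.discard(member)
                component.append(member)
                if member == node:
                    break
            components.append(component)

    for node in graph:
        if node not in index_of:
            strongconnect(node)

    # A knot is an SCC with 2+ members, or one node that points at itself.
    return [c for c in components if len(c) > 1 or c[0] in graph.get(c[0], ())]


# Edges point from a resource to what it depends on (names are made up).
graph = {
    "vpc": [],
    "subnet": ["vpc"],
    "ec2": ["subnet"],
    "sg_app": ["sg_db"],
    "sg_db": ["sg_app"],
}
print([sorted(knot) for knot in find_knots(graph)])  # [['sg_app', 'sg_db']]
```

The VPC, subnet, and EC2 chain comes out as three trivial components and is ignored; only the security-group pair is flagged. The recursion is fine for a sketch, but very deep dependency chains would need an iterative version to stay under Python's default recursion limit.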
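The Fix: "Shell & Fill"
We break the cycle in two steps:
1. Create empty shells: generate the security groups with no rules. Terraform can create these right away, because they reference nothing else.
2. Fill with rules: extract the rules into separate aws_security_group_rule resources that reference the shells.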
```
Step 1: Create shells
 [SG-App (empty)]         [SG-DB (empty)]
Step 2: Create rules
         ▲                       ▲
         │                       │
[Rule: egress->DB]      [Rule: ingress<-App]
```
The rules depend on the shells and the shells depend on nothing, so the graph is now acyclic.
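A minimal sketch of the "fill" step as code generation, assuming each inline rule has already been reduced to a direction, a single TCP port, and the Terraform name of the peer group. The function name shell_and_fill, the rule dict format, and var.vpc_id are hypothetical; the HCL attributes (type, from_port, to_port, protocol, security_group_id, source_security_group_id) are those of the AWS provider's aws_security_group_rule resource.

```python
def shell_and_fill(name, rules):
    """Render one security group as an empty shell plus standalone rules.

    `name` is the group's Terraform resource name. Each rule is a dict with
    "type" ("ingress" or "egress"), "port", and "peer" (the resource name of
    the security group on the other side). `var.vpc_id` is an assumed variable.
    """
    # Step 1: the shell has no inline ingress/egress blocks, so it references
    # no other security group and can be created right away.
    blocks = [
        f'resource "aws_security_group" "{name}" {{\n'
        f'  name   = "{name}"\n'
        f'  vpc_id = var.vpc_id\n'
        f'}}\n'
    ]
    # Step 2: each rule references both shells; nothing references a rule,
    # so these edges cannot close a loop.
    for i, rule in enumerate(rules):
        blocks.append(
            f'resource "aws_security_group_rule" "{name}_{rule["type"]}_{i}" {{\n'
            f'  type                     = "{rule["type"]}"\n'
            f'  from_port                = {rule["port"]}\n'
            f'  to_port                  = {rule["port"]}\n'
            f'  protocol                 = "tcp"\n'
            f'  security_group_id        = aws_security_group.{name}.id\n'
            f'  source_security_group_id = aws_security_group.{rule["peer"]}.id\n'
            f'}}\n'
        )
    return "\n".join(blocks)


# The two groups from the example; port 5432 is only an illustration.
print(shell_and_fill("app", [{"type": "egress", "port": 5432, "peer": "db"}]))
print(shell_and_fill("db", [{"type": "ingress", "port": 5432, "peer": "app"}]))
```

The shell deliberately carries no inline rules at all: the AWS provider documentation warns that combining inline rules with aws_security_group_rule resources on the same group causes the two to overwrite each other.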
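"Why not just always use separate rules?"
Fair question. The problem is:
1. terraform import often generates inline rules.
2. Many existing codebases prefer inline rules for readability.
3. The AWS API presents the "logical" view, with the rules bundled inside each group.
So the tool needs to detect cycles and surgically convert only the problematic groups.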
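A minimal sketch of that selective decision, assuming a "group to referenced groups" map like the one a scan would produce. To keep it standalone it uses a plain reachability check (a group is on a cycle if it can reach itself) rather than Tarjan's algorithm; that is quadratic in the worst case, which is why SCCs are the better fit for thousands of resources. The function name plan_conversion and the web group are hypothetical.

```python
def plan_conversion(refs):
    """Decide per security group whether to keep inline rules or split it.

    `refs` maps each group to the groups its inline rules reference. A group
    is converted to shell + standalone rules only if it can reach itself,
    i.e. it sits on a cycle; every other group keeps its inline rules.
    """

    def reaches(start, target):
        # Iterative depth-first search over the reference edges.
        seen = set()
        todo = list(refs.get(start, ()))
        while todo:
            node = todo.pop()
            if node == target:
                return True
            if node not in seen:
                seen.add(node)
                todo.extend(refs.get(node, ()))
        return False

    return {sg: ("shell_and_fill" if reaches(sg, sg) else "inline") for sg in refs}


# Hypothetical account: "web" talks to "app", while "app" and "db" reference each other.
refs = {"web": ["app"], "app": ["db"], "db": ["app"]}
print(plan_conversion(refs))
# {'web': 'inline', 'app': 'shell_and_fill', 'db': 'shell_and_fill'}
```

Here web keeps its inline rules. Because app keeps the same resource name after conversion, web's reference to aws_security_group.app.id still resolves.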
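Why terraform import isn't enough
Standard import reads state as-is. It doesn't build a global dependency graph or perform a topological sort before generating code, so the burden of refactoring falls on a human. For brownfield migrations with 2,000+ resources, that isn't feasible.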
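To illustrate what "topological sorting before generating code" means, here is Kahn's algorithm in Python. It is a sketch under assumptions: the resource keys such as sg.app and rule.app_egress are hypothetical labels, and this is not a description of how Terraform or RepliMap order resources internally. On the split graph it returns an order that creates the shells before the rules; on the inline graph it stops, which is the situation Terraform reports as Error: Cycle.

```python
from collections import deque


def emission_order(deps):
    """Kahn's algorithm: order resources so their dependencies come first.

    `deps` maps each resource to the resources it references. Raises
    ValueError naming the stuck resources if a cycle blocks the sort.
    """
    nodes = set(deps) | {d for targets in deps.values() for d in targets}
    remaining = {n: len(set(deps.get(n, ()))) for n in nodes}  # unmet dependencies
    dependents = {n: [] for n in nodes}
    for node, targets in deps.items():
        for target in set(targets):
            dependents[target].append(node)

    # Sorting keeps the output deterministic for equal-priority resources.
    ready = deque(sorted(n for n, count in remaining.items() if count == 0))
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for dependent in sorted(dependents[node]):
            remaining[dependent] -= 1
            if remaining[dependent] == 0:
                ready.append(dependent)

    if len(order) != len(nodes):
        stuck = sorted(n for n in nodes if remaining[n] > 0)
        raise ValueError(f"cycle among: {stuck}")
    return order


inline = {"sg.app": ["sg.db"], "sg.db": ["sg.app"]}
split = {
    "sg.app": [],
    "sg.db": [],
    "rule.app_egress": ["sg.app", "sg.db"],
    "rule.db_ingress": ["sg.db", "sg.app"],
}

print(emission_order(split))
# ['sg.app', 'sg.db', 'rule.app_egress', 'rule.db_ingress']
try:
    emission_order(inline)
except ValueError as err:
    print(err)  # cycle among: ['sg.app', 'sg.db']
```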
---
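I've implemented this graph engine in a tool called RepliMap. I've open-sourced the documentation and the IAM policies needed to run read-only scans safely.
If you're interested in edge cases like this (or the root_block_device trap), the repo is here: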
https://github.com/RepliMap/replimap-community
Happy to answer questions.