ADR-022: Dual BFS Engine (Go + Rust FFI)
Status: Accepted
Date: 2026-04-06
Deciders: Liem Vo-Nguyen
Supersedes: None
Extends: ADR-008 (Attack Path Computation), ADR-020 (Security Graph Architecture)
Context
CloudForge's attack path engine (ADR-008) was originally implemented as an in-memory Go BFS traversal over finding data. This works well at demo scale (1-5K findings, sub-millisecond computation) but benchmark profiling at enterprise scale revealed severe limitations:
| Metric | 200 findings | 20K findings | 300K findings (target) |
|---|---|---|---|
| Go BFS wall time | 2.4ms | 119.5s | projected ~136.5s |
| Go BFS allocations | 18K | 184M | projected >1B |
| Go BFS memory | 12MB | 42.5GB | OOM on 16GB machines |
The root cause is Go's garbage collector — the BFS creates millions of transient []Finding slices during path enumeration, and GC pause time dominates at scale. Additionally, Go's single-threaded BFS cannot exploit the natural parallelism of per-account-partition path computation.
A secondary hot path is JSON loading: the server startup deserializes 42MB of findings JSON (300K corpus) in ~120s using encoding/json, which does not parallelize.
Decision
Maintain two BFS engine implementations — Go (pure, portable) and Rust (FFI, high-performance) — selectable at build time via Go build tags. This is not a migration; both engines are permanent.
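A minimal sketch of how build-tag selection of this kind is typically laid out (file and function names here are illustrative, not the actual CloudForge sources — the real build tag per this ADR is `cgo && rust`):

```go
// File: cmd/server/attackpath_rust.go (hypothetical name)
//go:build cgo && rust

package server

// Compiled in only when built with: go build -tags rust
func rustEngineCompiledIn() bool { return true }
```

```go
// File: cmd/server/attackpath_default.go (hypothetical name)
//go:build !(cgo && rust)

package server

// Default build: pure Go, no Rust toolchain required.
func rustEngineCompiledIn() bool { return false }
```

Because the two files carry mutually exclusive build constraints, exactly one definition of the function exists in any given build, and callers never need a runtime nil-check for the Rust path.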
Implementation Status
As of 2026-04-06, this ADR is accepted and partially implemented:
- the Rust crate, CGo bridge, build targets, and benchmarks exist on disk
- the Go BFS engine remains the active server runtime path in `cmd/server/attackpath.go`
- the documented `AEGIS_RUST_PATHS` activation path is not yet wired into the live request/runtime bootstrap
Treat this ADR as the accepted dual-engine design and code path, not as evidence that production requests already execute through Rust by default.
Architecture
```
cmd/server/attackpath.go   ← Go BFS engine (default, always available)
        ↕ (feature-flagged)
rust/bridge.go             ← CGo FFI bridge (build tag: cgo && rust)
        ↓
rust/libaegispath/         ← Rust cdylib crate
├── src/attackpath.rs      ← Rayon-parallelized BFS (port of attackpath.go)
├── src/loader.rs          ← Serde JSON deser + filter + reserialize
├── src/types.rs           ← Finding struct (11 of 56 fields for BFS)
└── src/lib.rs             ← C-ABI exports (aegis_compute_attack_paths, etc.)
```
Selection Logic
| Condition | Engine | Rationale |
|---|---|---|
| `AEGIS_RUST_PATHS=true` + binary built with `-tags rust` | Rust FFI | Production/demo with large corpus |
| Default (no env var, or no Rust build tag) | Go BFS | CI, local dev, zero-dependency deploys |
| `PUPPYGRAPH_URL` set | PuppyGraph Gremlin (ADR-020) | Graph-native traversal when available |
The three engines form a precedence chain: PuppyGraph > Rust FFI > Go BFS. Each reads from the same graph_edges table (ADR-020) or falls back to heuristic co-location when edges are unavailable.
FFI Boundary Design
The FFI contract is JSON-in/JSON-out across the CGo boundary:
- Go serializes `[]Finding` to JSON bytes
- Go passes a pointer + length to the C-ABI function
- Rust deserializes with `serde_json`, computes BFS, serializes the result
- Rust returns a pointer + length to a Rust-allocated buffer
- Go copies the result into Go-managed memory via `unsafe.Slice`
- Go calls `aegis_free()` to release the Rust buffer
This design eliminates shared-pointer hazards between Go's GC and Rust's ownership model. The serialization overhead (~3-5ms for 20K findings) is negligible against the projected computation savings at that scale (119.5s → 15-25s).
Safety constraints:
- 64MB input cap (prevents OOM on malformed input)
- `C.size_t` for lengths (avoids `C.int` truncation on buffers >2GB)
- `staticlib` linkage option for deployment without `.dylib` distribution
Rust Performance Characteristics
Criterion benchmarks (M2 Max, 200 findings baseline):
| Operation | Time | Notes |
|---|---|---|
| BFS compute | 2.43ms | Rayon parallel across account partitions |
| JSON deserialize | 397μs | Serde, 11 of 56 fields (rest ignored) |
| Full pipeline (deser + BFS + serialize) | 3.71ms | End-to-end FFI round trip |
Projected at 20K findings: 15-25s (vs Go's 119.5s) — a 5-8x improvement driven by:
- `rayon::par_iter` parallelizes BFS across account partitions (each account's findings are independent)
- Zero-copy graph construction (Rust `Vec` vs Go `[]Finding` slice copies)
- No GC pauses (ownership model, deterministic deallocation)
Output Equivalence
Both engines must produce byte-identical JSON output for the same input. This is enforced by:
- 17 Rust unit tests mirroring Go test cases (graph construction, BFS traversal, edge classification, empty/degenerate inputs)
- Integration test: run both engines on the same 500-finding subset, diff outputs
- Same `AttackPath`/`AttackPathStats` response types (defined in Go, mirrored in Rust)
Consequences
Positive
- Portability preserved — CI pipelines, Docker alpine images, and `go run` all work without Rust toolchain
- 5-8x throughput at enterprise scale with Rust enabled — critical for 300K finding corpus demo
- Incremental adoption — enable Rust per-environment, zero risk to existing deploys
- Clean separation — Rust crate is a self-contained library with its own test suite and benchmarks
- AdjacencySet integration — both engines consume `secgraph.AdjacencySet` for evidence-based edges (ADR-020), replacing heuristic co-location inference
Negative
- Build complexity — Rust toolchain required for FFI builds (`cargo build --release` → `make rust-build`)
- Two codebases — BFS logic is duplicated across Go and Rust. Changes to path computation must be applied to both.
- CGo overhead — enabling CGo forfeits some Go conveniences (easy cross-compilation, fully static binaries without `staticlib`)
- Debugging — stack traces cross the FFI boundary; Rust panics become Go crashes unless caught
Risks
| Risk | Impact | Mitigation |
|---|---|---|
| Rust/Go output divergence after BFS change | Silent correctness regression | Integration test compares both engines on identical input |
| CGo FFI memory leak | Gradual memory growth in production | copyAndFree pattern: every Rust allocation is freed immediately after Go copy |
| Rust toolchain unavailable in CI | Cannot build FFI variant | Go engine is always the default; Rust is opt-in via build tag |
| `rayon` thread pool contention with Go runtime | Degraded throughput under high concurrency | Rayon defaults to `num_cpus` threads; Go runtime separately manages goroutines on remaining cores |
Alternatives Considered
1. Rust-Only Engine (Drop Go BFS)
Require Rust toolchain for all builds, remove Go BFS entirely.
Rejected because: Breaks zero-dependency go run workflow. CI pipelines, Docker alpine images, and contributor onboarding all depend on pure-Go builds. The Go engine is also the PuppyGraph fallback when graph infrastructure is unavailable.
2. Go-Only with Optimizations (sync.Pool, Arena)
Optimize Go BFS using sync.Pool for slice reuse and Go 1.22 arena allocations.
Evaluated and insufficient: Profiling showed GC pause time (not allocation rate) is the bottleneck at 20K+ findings. sync.Pool reduces allocation count but does not eliminate GC scanning of live []Finding references during BFS. Arena is experimental and does not support slice-of-struct patterns.
3. Separate Microservice (gRPC)
Run Rust BFS as a sidecar service, communicate via gRPC.
Rejected because: Adds network hop latency, container orchestration dependency, and operational complexity. The FFI approach has <5ms overhead vs ~20-50ms for gRPC round trip, and ships as a single binary.
References
- ADR-008: Attack Path Computation Strategy (original Go BFS)
- ADR-020: Security Graph Architecture (AdjacencySet, graph_edges)
- `cmd/server/attackpath.go` — Go BFS engine (computeAttackPaths, isEntryPoint, isTarget, canConnect, buildChain)
- `rust/bridge.go` — CGo FFI bridge (ComputeAttackPaths, LoadAndSerializeFindings)
- `rust/libaegispath/src/attackpath.rs` — Rust BFS engine (rayon-parallelized)
- `rust/libaegispath/src/loader.rs` — Serde JSON loading pipeline
- `internal/secgraph/adjacency.go` — AdjacencySet for evidence-based edge lookup