| Metric | Value | Detail |
|---|---|---|
| Outages · Gen 1 | 0 | 6+ years FI production |
| Microservices · Gen 2 | 10 | NATS transport |
| Arch Pillars · Gen 3 | 3 | Event · Crypto · Determinism |
| Risk Items | 6 | Pre-submission review |
Three-Generation Platform Heritage
From FI-Proven Node.js to Rust Cloud-Native Doctrine
Production-proven architecture across three generations. EveLedger ran 6+ years in live FI production with zero outages — the most credible data point in any engagement with a Tier-1 bank.
Generation 1 · EveLedger · 2018–2024+
Hardened Production Node.js
Generation 2 · SNR Core · 2026
TypeScript Microservices
Generation 3 · GCP Core 4.1.26a
Rust / Cloud-Native Doctrine
Event Sourcing as Legal Foundation
Kafka append-only log as source of truth. Signed events. Deterministic replay. Snapshot optimization. Sequence enforcement. State is a projection of events — never mutated directly.
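A minimal sketch of "state is a projection of events", assuming a hypothetical two-variant `Event` type and balances held as integer cents. It is not the GCP Core implementation: Kafka, signatures, and persistence are left out, and the names `Event`, `Balance`, `apply`, and `replay` are illustrative only. It shows the three properties named above: state is only ever produced by folding events, sequence numbers are enforced, and replaying from a snapshot gives the same result as replaying from the start.

```rust
/// Hypothetical ledger event; amounts are integer minor units (cents).
#[derive(Debug, Clone, PartialEq)]
enum Event {
    Credited { seq: u64, cents: i64 },
    Debited { seq: u64, cents: i64 },
}

impl Event {
    fn seq(&self) -> u64 {
        match self {
            Event::Credited { seq, .. } | Event::Debited { seq, .. } => *seq,
        }
    }
}

/// Projected state. `last_seq` records how far the log has been applied,
/// so a stored `Balance` doubles as a snapshot.
#[derive(Debug, Clone, Copy, PartialEq, Default)]
struct Balance {
    cents: i64,
    last_seq: u64,
}

/// Pure fold step: the only way to obtain a new state is to apply an event.
/// Any event that is not exactly `last_seq + 1` is rejected.
fn apply(state: Balance, event: &Event) -> Result<Balance, String> {
    let expected = state.last_seq + 1;
    if event.seq() != expected {
        return Err(format!("expected seq {}, got {}", expected, event.seq()));
    }
    let cents = match event {
        Event::Credited { cents, .. } => state.cents + cents,
        Event::Debited { cents, .. } => state.cents - cents,
    };
    Ok(Balance { cents, last_seq: event.seq() })
}

/// Deterministic replay: the same starting state and log always give the same result.
fn replay(from: Balance, log: &[Event]) -> Result<Balance, String> {
    log.iter().try_fold(from, apply)
}
```

Replaying the tail of the log on top of a snapshot (`replay(snapshot, &log[n..])`) is the snapshot optimization in miniature: it must equal a full replay from `Balance::default()`.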
Cryptographic Tamper Evidence
Every mutation is signed. GDPR erasure via envelope encryption — key destruction, not data deletion. Audit trail is cryptographically sealed and independently verifiable.
Regulatory-Grade Determinism
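Monetary amounts are typed as rust_decimal::Decimal, so IEEE 754 floats cannot enter monetary arithmetic without an explicit conversion. Reg CC hold logic is code, not config. ACH return-rate thresholds (3% administrative / 0.5% unauthorized) are hardcoded. AVAILABLE and CURRENT balances are tracked separately.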
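As an illustration of float-free threshold logic, the sketch below checks the two ACH return-rate limits using integer cross-multiplication. It uses only the standard library rather than rust_decimal, and the constant and function names are hypothetical. Mapping 3% to administrative returns and 0.5% to unauthorized returns is an assumption about what the two figures refer to.

```rust
/// Limits in basis points (1 bp = 0.01%), taken from the figures above.
/// Which return category each limit applies to is an assumption.
const ADMIN_RETURN_LIMIT_BPS: u64 = 300; // 3%
const UNAUTHORIZED_RETURN_LIMIT_BPS: u64 = 50; // 0.5%

/// True when `returns / originated` is strictly greater than `limit_bps`.
/// Cross-multiplying avoids division and floating point entirely:
/// returns / originated > limit / 10_000  <=>  returns * 10_000 > limit * originated.
/// u128 intermediates cannot overflow for any pair of u64 inputs.
fn exceeds_limit(returns: u64, originated: u64, limit_bps: u64) -> bool {
    (returns as u128) * 10_000 > (limit_bps as u128) * (originated as u128)
}

/// Convenience wrappers so call sites cannot pass the wrong limit by accident.
fn breaches_admin_limit(returns: u64, originated: u64) -> bool {
    exceeds_limit(returns, originated, ADMIN_RETURN_LIMIT_BPS)
}

fn breaches_unauthorized_limit(returns: u64, originated: u64) -> bool {
    exceeds_limit(returns, originated, UNAUTHORIZED_RETURN_LIMIT_BPS)
}
```

Because the limits are constants in code, changing them requires a reviewed code change rather than a configuration edit, which is the point of "code, not config".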
Full Stack Comparison — Three Generations
| Dimension | Gen 1 · EveLedger | Gen 2 · SNR Core (TS) | Gen 3 · GCP Core (Rust) |
|---|---|---|---|
| Language | JavaScript / Node.js | TypeScript 5.9 strict | Rust (memory-safe, no GC) |
| Async | Event loop (single-thread) | Event loop + NATS pub/sub | tokio async runtime (multi-thread) |
| Service Mesh | Custom / Traefik | Moleculer + NATS | gRPC (tonic) + protobuf (prost) |
| Message Bus | None / custom | NATS 2.29 (pub/sub) | Kafka (Confluent) — durable event log |
| Primary DB | MongoDB | MongoDB + Mongoose 9 | Cloud Spanner (external consistency) |
| Cache | Redis (implied) | Redis Alpine + ioredis | Redis active-active CRDT |
| API Layer | GraphQL | Apollo Server 5 / GraphQL 16 | gRPC internal + REST/GraphQL external |
| Decimal Math | BigNumber.js scalar | BigNumber (string-stored) | rust_decimal — type-level f64 ban |
| State Model | Mutable documents | Mutable + MongoDB sessions | Event sourcing — append-only |
| Auth | Production | Not implemented | Cryptographic signatures |
| Compliance | Proven 6yr FI | Schema-ready | Reg CC · GDPR |
| Outage Record | 0 / 6+ Years | Not deployed | Proposed |
SNR Core — What's Already Right
Production-Grade Patterns Present
- Mod10 / Luhn account generation (IBM/Fidelity IFS standard)
- Double-entry bookkeeping — DEBIT/CREDIT with balance tracking
- MongoDB atomic sessions — ACID across 4 document types
- Aggregator → Platform → Customer entity hierarchy
- CURRENT / AVAILABLE / DAILY balance kinds (correct model)
- Federal Reserve BAI codes on all ledger entries
- gluId UUID for cross-system reconciliation
- Traefik routing — battle-tested from Gen 1
- NATS transport — zero SPOF in cluster mode
- Circuit breaker (50%) + bulkhead (10 concurrent / 100 queue)
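For reference, the Mod10 / Luhn check used in account generation can be sketched as below. This is the generic Luhn algorithm only; the exact account layout of the IBM/Fidelity IFS standard is not described here, so the function names and the choice to append a single trailing check digit are assumptions.

```rust
/// Computes the Luhn (mod 10) check digit for a string of decimal digits.
/// Returns None if the payload is empty or contains a non-digit.
fn luhn_check_digit(payload: &str) -> Option<u32> {
    if payload.is_empty() {
        return None;
    }
    let mut sum = 0;
    // Walk right to left; the rightmost payload digit is doubled first.
    for (i, ch) in payload.chars().rev().enumerate() {
        let d = ch.to_digit(10)?;
        sum += if i % 2 == 0 {
            let doubled = d * 2;
            // Summing the digits of a two-digit product equals subtracting 9.
            if doubled > 9 { doubled - 9 } else { doubled }
        } else {
            d
        };
    }
    Some((10 - sum % 10) % 10)
}

/// Validates a full number whose last character is the check digit.
fn luhn_is_valid(number: &str) -> bool {
    if !number.is_ascii() || number.len() < 2 {
        return false;
    }
    let (payload, check) = number.split_at(number.len() - 1);
    let given = check.chars().next().and_then(|c| c.to_digit(10));
    given.is_some() && luhn_check_digit(payload) == given
}
```

The check catches any single mistyped digit and most adjacent transpositions, which is why it suits account numbers that are keyed by hand.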
Gaps vs Tier-1 Requirements
Must Be Added for Full FI Deployment
- Authentication not implemented — JWT planned but absent
- No event sourcing — all mutations are destructive
- No cryptographic tamper evidence on any records
- Sanctions screening is mock — L1/L3 are stubs
- No audit log — state corruption undetectable
- BigNumber as string scalar — not compile-time enforced
- External rails (FedWire, SWIFT, TCH) are placeholders
- Reg CC hold logic not implemented
- No CTR/SAR reporting
- No load test results documented
NATS · SNR Core / Moleculer
- Sub-millisecond latency, minimal resource footprint
- Simple ops — single binary, no ZooKeeper/KRaft
- Zero SPOF in cluster mode
- Moleculer circuit breaker works natively
- At-least-once delivery with NATS JetStream
- State lives in MongoDB — NATS is the routing bus only
- No offset-based replay — not retained indefinitely
- Cannot replay entire financial history from log
- No consumer group lag monitoring
- Schema governance is application-level only
- Not designed for event sourcing pattern
Kafka · GCP Core 4.1.26a
- Durable, ordered, replayable log — the legal record
- Consumer group lag monitoring — observable processing
- Schema registry enforces message contracts at publish
- Exactly-once semantics (EOS) with transactions
- Full state reconstruction via event replay
- Audit trail is the log itself — regulatory gold standard
- Confluent Cloud vendor dependency
- Consumer lag = projection lag under high load
- Significantly higher ops complexity
- Requires schema governance discipline
- Partition split failure modes need explicit handling
- Sequence gap handling critical for financial integrity
Kafka Partition Split
Consumers may receive events out of sequence. Must detect gap and halt — not auto-recover. Auto-recovery without human sign-off is unacceptable.
Spanner Stall
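Projection lag increases. Reads that opt into bounded or exact staleness can return older data without raising an error. During regional events Spanner preserves external consistency, but write latency and availability degrade, so timeouts need explicit handling.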
Redis CRDT Mis-merge
Active-active CRDT can produce incorrect available balance values via concurrent writes — silently.
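The hazard can be shown with a toy grow-only counter. This is a sketch of the general CRDT behaviour, not Redis's actual data types or merge rules; `DebitCounter` and its methods are hypothetical. Each region approves a debit against the balance it can see locally, and the merge converges correctly on a total that no single region ever approved.

```rust
use std::collections::BTreeMap;

/// Hypothetical per-region debit totals, merged by element-wise max
/// (the standard grow-only counter construction).
#[derive(Debug, Clone, Default)]
struct DebitCounter {
    debited_cents: BTreeMap<&'static str, u64>,
}

impl DebitCounter {
    fn total(&self) -> u64 {
        self.debited_cents.values().sum()
    }

    /// Available balance as seen by this replica; may go negative after a merge.
    fn available_cents(&self, opening_cents: u64) -> i64 {
        opening_cents as i64 - self.total() as i64
    }

    /// Approves a debit only if the locally visible balance covers it.
    fn try_debit(&mut self, region: &'static str, opening_cents: u64, amount: u64) -> bool {
        if self.total() + amount > opening_cents {
            return false;
        }
        *self.debited_cents.entry(region).or_insert(0) += amount;
        true
    }

    /// Convergent merge: keep the larger count seen for each region.
    fn merge(&mut self, other: &DebitCounter) {
        for (region, &theirs) in &other.debited_cents {
            let mine = self.debited_cents.entry(*region).or_insert(0);
            *mine = (*mine).max(theirs);
        }
    }
}
```

With an opening balance of 10,000 cents, two regions can each approve an 8,000 cent debit before replication. After `merge`, both replicas agree on 16,000 cents debited and an available balance of -6,000, and nothing in the merge itself reports a problem.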
Rust
- Highest throughput for auth / settlement hot path
- rust_decimal enforces monetary safety at compile time
- No GC pauses — critical for P99 card network SLAs (~100ms)
- Ownership model prevents data races at compile time
- 6–18 month onboarding for engineers new to Rust
- FI internal teams cannot maintain without retraining
- Slow compile times reduce iteration velocity
- async Rust + tokio lifetime complexity is non-trivial
Go
- Largest cloud-native / Kubernetes ecosystem
- grpc-go is an official gRPC implementation (gRPC originated at Google)
- Massive talent pool — FI can hire and own independently
- Fast compile, fast iteration velocity
- GC pauses tunable via GOGC — manageable but present
- shopspring/decimal requires convention enforcement
- No compile-time monetary type safety like rust_decimal
- Data races remain possible; the race detector (-race) only catches races that actually occur during an instrumented run
Moleculer → Rust / Go Capability Mapping
| Moleculer Feature | Moleculer (NATS) | Rust Equivalent | Go Equivalent | Fit |
|---|---|---|---|---|
| Service registry | Built-in auto-discover | Consul / etcd / k8s | Consul / etcd / k8s | All viable |
| Load balancing | Round-robin local-pref | Envoy sidecar | Envoy sidecar | Envoy preferred |
| Circuit breaker | Built-in (50% / 60s) | failsafe crate (tower::retry and governor cover retry and rate limiting, not breaking) | sony/gobreaker or eapache/go-resiliency | More config |
| Bulkhead | Built-in (10 / 100) | tower::limit::ConcurrencyLimit | x/sync/semaphore | Both solid |
| Message transport | NATS pub/sub | rdkafka | confluent-kafka-go | Kafka for event source |
| Decimal math | BigNumber (strings) | rust_decimal | shopspring/decimal | Rust wins |
| Request tracing | gluId UUID | opentelemetry-rust | go.opentelemetry.io/otel | Both excellent |
| Inter-service calls | ctx.call() via broker | tonic gRPC | grpc-go | gRPC preferred |
| Ops complexity | Low (single broker) | High (Kafka + Spanner) | Medium-High | NATS simpler |
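To give a sense of what "More config" means when the built-in breaker goes away, here is a minimal hand-rolled circuit breaker using only the standard library. It is a sketch, not a replacement for a maintained crate. The 50% threshold comes from the table; reading "60s" as the counting window, and the values of `MIN_REQUESTS` and `HALF_OPEN_AFTER_SECS`, are assumptions. Time is passed in as seconds so the logic stays deterministic.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum State {
    Closed,
    Open { since_secs: u64 },
    HalfOpen,
}

const THRESHOLD_PCT: u32 = 50; // from the table
const WINDOW_SECS: u64 = 60; // from the table, read as the counting window (assumption)
const MIN_REQUESTS: u32 = 20; // assumption
const HALF_OPEN_AFTER_SECS: u64 = 10; // assumption

struct Breaker {
    state: State,
    window_start_secs: u64,
    failures: u32,
    total: u32,
}

impl Breaker {
    fn new(now_secs: u64) -> Self {
        Breaker { state: State::Closed, window_start_secs: now_secs, failures: 0, total: 0 }
    }

    /// Whether a call may proceed. An open breaker lets one probe through
    /// once the half-open delay has elapsed.
    fn allow(&mut self, now_secs: u64) -> bool {
        if let State::Open { since_secs } = self.state {
            if now_secs.saturating_sub(since_secs) < HALF_OPEN_AFTER_SECS {
                return false;
            }
            self.state = State::HalfOpen;
        }
        true
    }

    /// Records the outcome of a call that `allow` let through.
    fn record(&mut self, ok: bool, now_secs: u64) {
        match self.state {
            State::Open { .. } => {}
            State::HalfOpen if ok => self.close(now_secs),
            State::HalfOpen => self.state = State::Open { since_secs: now_secs },
            State::Closed => {
                // Start a fresh window once the current one has expired.
                if now_secs.saturating_sub(self.window_start_secs) >= WINDOW_SECS {
                    self.close(now_secs);
                }
                self.total += 1;
                if !ok {
                    self.failures += 1;
                }
                // Integer comparison of failures / total >= THRESHOLD_PCT / 100.
                if self.total >= MIN_REQUESTS && self.failures * 100 >= THRESHOLD_PCT * self.total {
                    self.state = State::Open { since_secs: now_secs };
                }
            }
        }
    }

    fn close(&mut self, now_secs: u64) {
        self.state = State::Closed;
        self.window_start_secs = now_secs;
        self.failures = 0;
        self.total = 0;
    }
}
```

Every constant above is a decision Moleculer currently makes by default, which is the configuration burden the Fit column refers to.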
High Severity
Document Is Extremely Opinionated
Mandates Rust, Spanner, specific AI vendors, Confluent Cloud. Target FI has existing vendor contracts and internal architecture standards. "We selected this" framing triggers conflicts with the FI's Architecture Review Board (ARB).
AI Layer Triggers Model Risk Management
Tool-use AI for financial mutations will alarm the FI's MRM team. SR 11-7 compliance analysis required. Any AI initiating financial mutations requires Model Risk Validation before sign-off.
Document Length — 100+ Page Engineering Doctrine
No executive reads this in full. Without a proper executive summary it reads as an academic exercise, not a production proposal.
Medium Severity
Rust Talent Gap at Target FI
Large FI technologist pools are predominantly Java/Python/C++. Full Rust platform creates maintenance dependency on your team for incident response and evolution.
Kafka Ops Discipline Requirement
Requires consumer lag management, schema registry, key rotation, partition split recovery, cross-region replication. If FI ops is not Kafka-fluent, GCP Core is a liability.
SNR Core (Gen 2) — Critical Gaps vs Tier-1
Auth not implemented. Sanctions screening is mock. External rails are stubs. No audit log. No event sourcing. Strong ledger core — not yet a complete Tier-1 platform.
Det. Risk 1
Snapshot Rebuild Safety
Snapshot version must be cryptographically tied to the Kafka offset at creation. Snapshots must be validated against projected state before serving. Never serve an unvalidated snapshot.
Det. Risk 2
Sequence Gap Handling
Missing event sequence numbers are a legal problem. System must detect gaps, halt projection, alert ops, and refuse to serve stale state. Auto-recovery without human sign-off is unacceptable.
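A minimal sketch of the detect, halt, and sign-off rule, assuming events carry a monotonically increasing `u64` sequence number. The `Projector` type and its method names are hypothetical, alerting is omitted, and treating an already-seen sequence number as a harmless redelivery is an assumption about the delivery guarantees in use.

```rust
#[derive(Debug, PartialEq)]
enum Projector {
    Running { next_seq: u64 },
    /// Stays halted until an operator explicitly signs off.
    Halted { expected: u64, received: u64 },
}

#[derive(Debug, PartialEq)]
enum Outcome {
    Applied,
    Duplicate,
    Halted,
}

impl Projector {
    /// Feeds one event's sequence number to the projector.
    fn on_event(&mut self, seq: u64) -> Outcome {
        let next_seq = match *self {
            // Once halted, nothing is applied, even if the missing event shows up.
            Projector::Halted { .. } => return Outcome::Halted,
            Projector::Running { next_seq } => next_seq,
        };
        if seq == next_seq {
            *self = Projector::Running { next_seq: next_seq + 1 };
            Outcome::Applied
        } else if seq < next_seq {
            // Assumption: an old sequence number is a redelivery and is skipped.
            Outcome::Duplicate
        } else {
            // Gap detected: record what was expected and stop projecting.
            *self = Projector::Halted { expected: next_seq, received: seq };
            Outcome::Halted
        }
    }

    /// Reads are refused while halted, so stale state is never served.
    fn can_serve_reads(&self) -> bool {
        matches!(self, Projector::Running { .. })
    }

    /// There is no automatic path out of Halted: a named operator must resume.
    fn resume_after_sign_off(&mut self, operator: &str, resume_at: u64) -> bool {
        if operator.trim().is_empty() || self.can_serve_reads() {
            return false;
        }
        *self = Projector::Running { next_seq: resume_at };
        true
    }
}
```

Making `Halted` a distinct state, rather than a flag, means every code path that touches the projection has to handle it explicitly.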
Det. Risk 3
Projection Lag
For card-present auth, lag must stay below card network timeout (~100ms). Define lag SLOs per operation type. Circuit break on breach for real-time operations.
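One way to express per-operation lag SLOs is a small lookup plus a decision function, sketched below. Only the roughly 100 ms figure for card-present auth comes from the text; the other operation types, their SLO values, and the `Decision` variants are hypothetical placeholders. A real budget for card-present auth would be lower than the full network timeout, since lag is only one part of the request path.

```rust
use std::time::Duration;

#[derive(Debug, Clone, Copy, PartialEq)]
enum Operation {
    CardPresentAuth,
    BalanceInquiry,
    StatementExport,
}

#[derive(Debug, PartialEq)]
enum Decision {
    Serve,
    ServeWithStalenessWarning,
    Reject,
}

/// Maximum tolerated projection lag per operation type.
fn lag_slo(op: Operation) -> Duration {
    match op {
        Operation::CardPresentAuth => Duration::from_millis(100), // from the text
        Operation::BalanceInquiry => Duration::from_secs(1), // assumption
        Operation::StatementExport => Duration::from_secs(60), // assumption
    }
}

/// Real-time operations fail closed on breach; others degrade with a warning.
fn is_real_time(op: Operation) -> bool {
    matches!(op, Operation::CardPresentAuth)
}

fn decide(op: Operation, lag: Duration) -> Decision {
    if lag < lag_slo(op) {
        Decision::Serve
    } else if is_real_time(op) {
        // Circuit break: refuse rather than authorize against stale state.
        Decision::Reject
    } else {
        Decision::ServeWithStalenessWarning
    }
}
```

Keeping the SLO table in one `match` makes the compiler flag any newly added operation type that lacks a defined lag budget.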
How to Win the Room
Conversation Strategy
- Lead with Gen 1's 6-year zero-outage FI production record
- Position SNR Core as the typed evolution of that architecture
- Present GCP Core as phased target state — not day-one mandate
- Offer Go as FI-maintainable alternative to Rust
- Scope Rust to card authorization hot path only
- Move the AI section to Appendix A, with human-approval gate language
- Make infra flexibility explicit: GCP, AWS, Azure all supportable
Document Package