SNR Core Platform — FI Assessment

Outages · Gen 1

0

6+ years FI production

Microservices · Gen 2

10

NATS transport

Arch Pillars · Gen 3

3

Event · Crypto · Determinism

Risk Items

6

Pre-submission review

Three-Generation Platform Heritage

From FI-Proven Node.js to Rust Cloud-Native Doctrine

Production-proven architecture across three generations. EveLedger ran 6+ years in live FI production with zero outages — the most credible data point in any engagement with a Tier-1 bank.

Gen 1 — Live FI Production · Gen 2 — Deployment Pending · Gen 3 — Proposed GCP

Generation 1 · EveLedger · 2018–2024+

Hardened Production Node.js

runtime

Node.js (legacy)

api

GraphQL

database

MongoDB

routing

Traefik

multi-cloud

Azure + AWS (site-to-site)

uptime

0 Outages · 6+ Yrs

env

FI Production (live)

Generation 2 · SNR Core · 2026

TypeScript Microservices

runtime

Node.js 24 + TypeScript 5.9

framework

Moleculer 0.14 + NATS

api

Apollo Server 5 / GraphQL 16

database

MongoDB + Mongoose 9

routing

Traefik Alpine

status

Deployment Pending

services

10 microservices

Generation 3 · GCP Core 4.1.26a

Rust / Cloud-Native Doctrine

runtime

Rust + tokio async

transport

gRPC / tonic + protobuf

events

Kafka (Confluent) — event sourcing

database

Cloud Spanner (global)

cache

Redis active-active CRDT

status

Proposed · GCP

decimal

rust_decimal (fixed-point Decimal; no IEEE 754 floats for money)

📜

Event Sourcing as Legal Foundation

Kafka append-only log as source of truth. Signed events. Deterministic replay. Snapshot optimization. Sequence enforcement. State is a projection of events — never mutated directly.
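A minimal sketch of "state is a projection of events", assuming a simple in-memory list stands in for the Kafka log. The `Event` fields and the `CREDIT` / `DEBIT` kinds are hypothetical names chosen for illustration, not taken from the GCP Core design.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Event:
    seq: int          # position in the append-only log
    kind: str         # "CREDIT" or "DEBIT" (hypothetical event kinds)
    amount: Decimal


def project_balance(events):
    """Fold the log into a balance; the balance is derived, never stored as truth."""
    balance = Decimal("0")
    for event in events:
        if event.kind == "CREDIT":
            balance += event.amount
        elif event.kind == "DEBIT":
            balance -= event.amount
        else:
            raise ValueError(f"unknown event kind: {event.kind}")
    return balance


# Stand-in for the durable log (assumption: ordered, immutable).
log = [
    Event(1, "CREDIT", Decimal("100.00")),
    Event(2, "DEBIT", Decimal("30.25")),
    Event(3, "CREDIT", Decimal("5.00")),
]
```

Replaying the same log always yields the same balance, and replaying a prefix of the log yields the balance as of that point in history, which is the property a regulatory replay relies on.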

🔐

Cryptographic Tamper Evidence

Every mutation is signed. GDPR erasure via envelope encryption — key destruction, not data deletion. Audit trail is cryptographically sealed and independently verifiable.
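One way to picture a sealed audit trail is a hash chain in which each record's seal covers the previous seal. The sketch below uses HMAC-SHA256 from the Python standard library as a stand-in; an independently verifiable trail would more likely use asymmetric signatures with keys held in a KMS or HSM. The key, field names, and helper functions are all hypothetical.

```python
import hashlib
import hmac
import json

# Assumption: a demo key. Real signing keys would live in a KMS/HSM.
SIGNING_KEY = b"demo-key-not-for-production"


def seal(prev_seal: str, payload: dict) -> str:
    """HMAC over the previous seal plus the canonical payload, forming a chain."""
    body = prev_seal.encode() + json.dumps(payload, sort_keys=True).encode()
    return hmac.new(SIGNING_KEY, body, hashlib.sha256).hexdigest()


def append(chain: list, payload: dict) -> None:
    """Add a record whose seal depends on every record before it."""
    prev = chain[-1]["seal"] if chain else ""
    chain.append({"payload": payload, "seal": seal(prev, payload)})


def verify(chain: list) -> bool:
    """Recompute every seal in order; any edited record breaks the chain."""
    prev = ""
    for record in chain:
        expected = seal(prev, record["payload"])
        if not hmac.compare_digest(expected, record["seal"]):
            return False
        prev = record["seal"]
    return True


chain = []
append(chain, {"seq": 1, "kind": "CREDIT", "amount": "100.00"})
append(chain, {"seq": 2, "kind": "DEBIT", "amount": "30.25"})
```

Changing any field of an earlier record makes `verify` return `False`, because every later seal was computed over the original content.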

⚖️

Regulatory-Grade Determinism

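Monetary values are typed as rust_decimal::Decimal, so an IEEE 754 float cannot be passed where money is expected. Reg CC hold logic is code, not config. ACH return thresholds (3% administrative / 0.5% unauthorized) are hardcoded. AVAILABLE and CURRENT balances are kept separate.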
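To make the determinism argument concrete, here is a small sketch using Python's `decimal` module in place of rust_decimal. It shows the float drift that decimal types avoid, and a return-rate check against the two thresholds named above. Mapping 3% to administrative returns and 0.5% to unauthorized returns is an assumption about which threshold is which. The function and constant names are hypothetical.

```python
from decimal import Decimal

# Binary floats cannot represent 0.1 or 0.2 exactly, so the sum drifts.
float_sum = 0.1 + 0.2
# Decimal keeps exact base-10 values.
decimal_sum = Decimal("0.10") + Decimal("0.20")

# Thresholds as constants in code rather than configuration (assumed mapping).
ADMIN_RETURN_LIMIT = Decimal("0.03")          # 3%
UNAUTHORIZED_RETURN_LIMIT = Decimal("0.005")  # 0.5%


def breaches_threshold(returns: int, originated: int, limit: Decimal) -> bool:
    """True when the return rate is strictly above the limit."""
    if originated <= 0:
        raise ValueError("originated must be positive")
    return Decimal(returns) / Decimal(originated) > limit
```

With 1,000 originated entries, 5 unauthorized returns sit exactly at the 0.5% limit and do not breach it, while 6 do. The comparison is exact because no float is involved at any step.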

Core premise: "The primary risk is not scale — it is state corruption." Event sourcing addresses auditability and regulatory replay requirements that MongoDB mutation-based systems cannot provide natively.

Full Stack Comparison — Three Generations

Dimension	Gen 1 · EveLedger	Gen 2 · SNR Core (TS)	Gen 3 · GCP Core (Rust)
Language	JavaScript / Node.js	TypeScript 5.9 strict	Rust (memory-safe, no GC)
Async	Event loop (single-thread)	Event loop + NATS pub/sub	tokio async runtime (multi-thread)
Service Mesh	Custom / Traefik	Moleculer + NATS	gRPC (tonic) + protobuf (prost)
Message Bus	None / custom	NATS 2.29 (pub/sub)	Kafka (Confluent) — durable event log
Primary DB	MongoDB	MongoDB + Mongoose 9	Cloud Spanner (external consistency)
Cache	Redis (implied)	Redis Alpine + ioredis	Redis active-active CRDT
API Layer	GraphQL	Apollo Server 5 / GraphQL 16	gRPC internal + REST/GraphQL external
Decimal Math	BigNumber.js scalar	BigNumber (string-stored)	rust_decimal (money typed as Decimal, never f64)
State Model	Mutable documents	Mutable + MongoDB sessions	Event sourcing — append-only
Auth	Production	Not implemented	Cryptographic signatures
Compliance	Proven 6yr FI	Schema-ready	Reg CC · GDPR
Outage Record	0 / 6+ Years	Not deployed	Proposed

SNR Core — What's Already Right

Production-Grade Patterns Present

Mod10 / Luhn account generation (IBM/Fidelity IFS standard)
Double-entry bookkeeping — DEBIT/CREDIT with balance tracking
MongoDB atomic sessions — ACID across 4 document types
Aggregator → Platform → Customer entity hierarchy
CURRENT / AVAILABLE / DAILY balance kinds (correct model)
BAI transaction type codes on all ledger entries
gluId UUID for cross-system reconciliation
Traefik routing — battle-tested from Gen 1
NATS transport — zero SPOF in cluster mode
Circuit breaker (50%) + bulkhead (10 concurrent / 100 queue)
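The Mod10 / Luhn item in the list above can be sketched as follows. This is the generic Luhn algorithm only; how SNR Core composes the account number prefix is not shown in the source, so the inputs here are purely illustrative.

```python
def luhn_check_digit(partial: str) -> int:
    """Check digit that makes partial + digit pass the Luhn (mod 10) test."""
    if not partial.isdigit():
        raise ValueError("partial number must contain digits only")
    total = 0
    # Walk right to left; the rightmost digit of the partial number is doubled first.
    for index, char in enumerate(reversed(partial)):
        digit = int(char)
        if index % 2 == 0:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return (10 - total % 10) % 10


def luhn_is_valid(number: str) -> bool:
    """Validate a full number by recomputing its final digit."""
    return luhn_check_digit(number[:-1]) == int(number[-1])
```

For example, `luhn_check_digit("7992739871")` returns 3, so `79927398713` is a valid number and `79927398710` is not.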

Gaps vs Tier-1 Requirements

Must Be Added for Full FI Deployment

Authentication not implemented — JWT planned but absent
No event sourcing — all mutations are destructive
No cryptographic tamper evidence on any records
Sanctions screening is mock — L1/L3 are stubs
No audit log — state corruption undetectable
BigNumber as string scalar — not compile-time enforced
External rails (FedWire, SWIFT, TCH) are placeholders
Reg CC hold logic not implemented
No CTR/SAR reporting
No load test results documented

Critical distinction: NATS in SNR Core is a message transport — service-to-service RPC routing. Kafka in GCP Core 4.1.26a is a durable event log — the legal source of truth for event sourcing, replay, and audit. These are fundamentally different patterns, not competing technologies.

NATS · SNR Core / Moleculer

Strengths
Sub-millisecond latency, minimal resource footprint
Simple ops — single binary, no ZooKeeper/KRaft
Zero SPOF in cluster mode
Moleculer circuit breaker works natively
At-least-once delivery with NATS JetStream (core NATS alone is at-most-once)
Limitations
State lives in MongoDB — NATS is the routing bus only
No offset-based replay — not retained indefinitely
Cannot replay entire financial history from log
No consumer group lag monitoring
Schema governance is application-level only
Not designed for event sourcing pattern

Kafka · GCP Core 4.1.26a

Strengths
Durable, ordered, replayable log — the legal record
Consumer group lag monitoring — observable processing
Schema registry enforces message contracts at publish
Exactly-once semantics (EOS) with transactions, within Kafka; writes to external stores still need idempotency
Full state reconstruction via event replay
Audit trail is the log itself — regulatory gold standard
Limitations
Confluent Cloud vendor dependency
Consumer lag = projection lag under high load
Significantly higher ops complexity
Requires schema governance discipline
Partition split failure modes need explicit handling
Sequence gap handling critical for financial integrity

Kafka Partition Split

Consumers may receive events out of sequence. Must detect gap and halt — not auto-recover. Auto-recovery without human sign-off is unacceptable.

Mitigation: dead-letter queue + manual reconciliation gate before replay resumes.

Spanner Stall

Projection lag increases. Stale (bounded-staleness) reads may return old data without raising an error. Strong reads stay externally consistent, but commit and read latency rises during regional events.

Mitigation: timeout budgets per operation type. Circuit breaker on Spanner client.

Redis CRDT Mis-merge

Active-active CRDT can produce incorrect available balance values via concurrent writes — silently.

Mitigation: do NOT use CRDT for balance state. Spanner is the sole authoritative source.
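The mis-merge risk can be illustrated with a toy counter-style CRDT. Two regions each approve a debit against their local view, and the merge converges correctly by CRDT rules yet leaves the balance negative. The `Replica` class is a hypothetical model written for this sketch and does not reflect the Redis active-active API.

```python
from decimal import Decimal


class Replica:
    """One region's view of a counter-style balance (per-replica debit totals)."""

    def __init__(self, name: str, opening: Decimal):
        self.name = name
        self.opening = opening
        self.debits = {}  # replica name -> total debited at that replica

    def available(self) -> Decimal:
        return self.opening - sum(self.debits.values(), Decimal("0"))

    def try_debit(self, amount: Decimal) -> bool:
        """Approve only if the LOCAL view has enough funds."""
        if amount > self.available():
            return False
        self.debits[self.name] = self.debits.get(self.name, Decimal("0")) + amount
        return True

    def merge(self, other: "Replica") -> None:
        """Standard grow-only merge: take the max total seen per replica."""
        for name, total in other.debits.items():
            self.debits[name] = max(self.debits.get(name, Decimal("0")), total)


east = Replica("east", Decimal("100"))
west = Replica("west", Decimal("100"))
ok_east = east.try_debit(Decimal("80"))   # approved against east's local view
ok_west = west.try_debit(Decimal("80"))   # approved concurrently in west
east.merge(west)
west.merge(east)
```

Both debits are approved, both replicas converge on the same value, and that value is -60. Nothing in the merge raises an error, which is why a single authoritative store has to make the approval decision.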

Gen 1 advantage: No Kafka meant no projection lag, no consumer group ops, no schema governance surface. Simplicity was the resilience. The tradeoff: a mutable MongoDB store, with or without NATS in front of it, cannot prove regulatory state reconstruction from first principles, which Tier-1 legal and audit teams will require.

Rust GCP Core 4.1.26a

async

tokio — multi-threaded, zero-cost

gRPC

tonic (built on tokio)

proto

prost

decimal

rust_decimal: money is typed as Decimal, so passing f64 fails to compile

db client

sqlx: async, Spanner via its PostgreSQL interface (PGAdapter)

serde

serde (JSON external) + prost (internal)

memory

No GC — deterministic latency

throughput

~500k–1M req/sec (indicative; workload- and hardware-dependent)

Strengths
Highest throughput for auth / settlement hot path
Money typed as rust_decimal::Decimal, so float misuse in monetary code is caught at compile time
No GC pauses — critical for P99 card network SLAs (~100ms)
Ownership model prevents data races at compile time
Limitations
6–18 month onboarding for engineers new to Rust
FI internal teams cannot maintain without retraining
Slow compile times reduce iteration velocity
async Rust + tokio lifetime complexity is non-trivial

Go Alternative Option

async

goroutines — M:N threading, built-in scheduler

gRPC

grpc-go (google.golang.org/grpc), the official Go implementation

proto

google.golang.org/protobuf

decimal

shopspring/decimal (convention-based)

db client

pgx or Cloud Spanner Go SDK

serde

encoding/json

memory

GC (tunable via GOGC and GOMEMLIMIT; arenas experimental since 1.20)

throughput

~200k–400k req/sec (indicative; workload- and hardware-dependent)

Strengths
Largest cloud-native / Kubernetes ecosystem
Google's own gRPC reference implementation
Massive talent pool — FI can hire and own independently
Fast compile, fast iteration velocity
Limitations
GC pauses tunable via GOGC — manageable but present
shopspring/decimal requires convention enforcement
No compile-time monetary type safety like rust_decimal
Race conditions possible (the race detector is a runtime tool, not a compile-time guarantee)

Recommendation — Hybrid: Go for platform services, Rust scoped to the card authorization hot path where GC pauses breach card network timeout budgets (~100ms). Preserves the FI's ability to own and maintain the platform independently.

Moleculer → Rust / Go Capability Mapping

Moleculer Feature	Moleculer (NATS)	Rust Equivalent	Go Equivalent	Fit
Service registry	Built-in auto-discover	Consul / etcd / k8s	Consul / etcd / k8s	All viable
Load balancing	Round-robin local-pref	Envoy sidecar	Envoy sidecar	Envoy preferred
Circuit breaker	Built-in (50% / 60s)	failsafe crate (tower::retry + governor cover retry / rate limiting only)	sony/gobreaker or eapache/go-resiliency	More config
Bulkhead	Built-in (10 / 100)	tower::limit::ConcurrencyLimit	x/sync/semaphore (+ x/time/rate)	Both solid
Message transport	NATS pub/sub	rdkafka	confluent-kafka-go	Kafka for event source
Decimal math	BigNumber (strings)	rust_decimal	shopspring/decimal	Rust wins
Request tracing	gluId UUID	opentelemetry-rust	go.opentelemetry.io/otel	Both excellent
Inter-service calls	ctx.call() via broker	tonic gRPC	grpc-go	gRPC preferred
Ops complexity	Low (single broker)	High (Kafka + Spanner)	Medium-High	NATS simpler
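When the built-in Moleculer circuit breaker is replaced, its behaviour has to be configured explicitly. The sketch below shows the core rule behind the "50% / 60s" setting in the table above: open the circuit once the failure ratio inside a rolling window reaches the threshold. The class, the `min_requests` parameter, and the injectable clock are hypothetical, and half-open recovery is left out for brevity.

```python
import time


class CircuitBreaker:
    """Opens when failures / calls within the window reaches the threshold."""

    def __init__(self, threshold=0.5, window_s=60.0, min_requests=20,
                 clock=time.monotonic):
        self.threshold = threshold
        self.window_s = window_s
        self.min_requests = min_requests  # avoid tripping on tiny samples
        self.clock = clock                # injectable for tests
        self.calls = []                   # (timestamp, succeeded)
        self.opened_at = None

    def record(self, succeeded: bool) -> None:
        now = self.clock()
        self.calls.append((now, succeeded))
        # Keep only calls that are still inside the rolling window.
        self.calls = [(t, ok) for t, ok in self.calls if now - t <= self.window_s]
        failures = sum(1 for _, ok in self.calls if not ok)
        enough = len(self.calls) >= self.min_requests
        if enough and failures / len(self.calls) >= self.threshold:
            self.opened_at = now

    def allows_request(self) -> bool:
        return self.opened_at is None
```

A caller checks `allows_request()` before each downstream call and reports the outcome with `record(...)`. A production breaker would also need a half-open state to probe for recovery.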

High Severity

R1

Document Is Extremely Opinionated

Mandates Rust, Spanner, specific AI vendors, Confluent Cloud. Target FI has existing vendor contracts and internal architecture standards. "We selected this" framing triggers ARB conflicts.

High · Reframe: "We align to your preferences"

R2

AI Layer Triggers Model Risk Management

Tool-use AI for financial mutations will alarm the FI's MRM team. SR 11-7 compliance analysis required. Any AI initiating financial mutations requires Model Risk Validation before sign-off.

High · Extract to Appendix A — human approval gate

R3

Document Length — 100+ Page Engineering Doctrine

No executive reads this in full. Without a proper executive summary it reads as an academic exercise, not a production proposal.

Medium · 10-pg exec · 30-pg technical · 3-pg compliance

Medium Severity

R4

Rust Talent Gap at Target FI

Large FI technologist pools are predominantly Java/Python/C++. Full Rust platform creates maintenance dependency on your team for incident response and evolution.

Medium · Go as primary, Rust for hot path only

R5

Kafka Ops Discipline Requirement

Requires consumer lag management, schema registry, key rotation, partition split recovery, cross-region replication. If FI ops is not Kafka-fluent, GCP Core is a liability.

Medium · Include runbook + SRE staffing estimates

R6

SNR Core (Gen 2) — Critical Gaps vs Tier-1

Auth not implemented. Sanctions screening is mock. External rails are stubs. No audit log. No event sourcing. Strong ledger core — not yet a complete Tier-1 platform.

Medium · Honest: "Production ledger engine, ops pending"

Det. Risk 1

Snapshot Rebuild Safety

Snapshot version must be cryptographically tied to the Kafka offset at creation. Snapshots must be validated against projected state before serving. Never serve an unvalidated snapshot.
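A rough sketch of binding a snapshot to its log offset: the digest covers both the offset and the state, and a snapshot is served only if the digest still matches and the state equals an independent replay to the same offset. A plain SHA-256 digest stands in for a real signature here, and the function and field names are hypothetical.

```python
import hashlib
import json


def snapshot_digest(offset: int, state: dict) -> str:
    """Digest over offset and state together, so neither can change alone."""
    canonical = json.dumps({"offset": offset, "state": state}, sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()


def take_snapshot(offset: int, state: dict) -> dict:
    return {
        "offset": offset,
        "state": dict(state),  # copy, so later mutation cannot leak in
        "digest": snapshot_digest(offset, state),
    }


def validate_snapshot(snapshot: dict, replayed_state: dict) -> bool:
    """Serve only if the digest matches AND the state equals a fresh replay."""
    digest_ok = snapshot["digest"] == snapshot_digest(
        snapshot["offset"], snapshot["state"]
    )
    return digest_ok and snapshot["state"] == replayed_state
```

Balances are kept as strings in this sketch so the state serializes to JSON without any float conversion.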

Det. Risk 2

Sequence Gap Handling

Missing event sequence numbers are a legal problem. System must detect gaps, halt projection, alert ops, and refuse to serve stale state. Auto-recovery without human sign-off is unacceptable.
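The detect, halt, and sign-off rule can be sketched as a small projector. Treating an already-applied sequence number as a harmless redelivery is an assumption of this sketch, as are the class and method names. Alerting and the dead-letter queue are out of scope here.

```python
class SequenceGap(Exception):
    """Raised when the projector sees a gap or is halted awaiting sign-off."""


class Projector:
    def __init__(self):
        self.last_seq = 0
        self.halted = False
        self.applied = []

    def apply(self, seq: int, payload) -> None:
        if self.halted:
            raise SequenceGap("projection halted; operator sign-off required")
        if seq <= self.last_seq:
            return  # assumption: duplicate redelivery, safe to ignore
        if seq != self.last_seq + 1:
            self.halted = True  # refuse to serve or advance past the gap
            raise SequenceGap(f"expected {self.last_seq + 1}, got {seq}")
        self.applied.append(payload)
        self.last_seq = seq

    def resume(self, approved_by: str) -> None:
        """Only an explicit, named approval clears the halt."""
        if not approved_by:
            raise ValueError("a named approver is required")
        self.halted = False
```

Once a gap is seen, every later `apply` call fails until `resume` is called with a named approver, so the projector cannot recover on its own.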

Det. Risk 3

Projection Lag

For card-present auth, lag must stay below card network timeout (~100ms). Define lag SLOs per operation type. Circuit break on breach for real-time operations.
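A per-operation lag budget could look like the sketch below. Only the 100 ms card-present figure comes from the text above; the other operation names and budgets are hypothetical placeholders.

```python
# Lag budgets in milliseconds per operation type.
LAG_BUDGET_MS = {
    "card_present_auth": 100,   # from the ~100ms card network timeout
    "balance_inquiry": 500,     # hypothetical
    "statement_export": 5000,   # hypothetical
}


def projection_lag_ms(latest_event_ms: int, last_applied_ms: int) -> int:
    """Lag = timestamp of the newest log event minus the newest applied event."""
    return max(0, latest_event_ms - last_applied_ms)


def may_serve(operation: str, lag_ms: int) -> bool:
    """Circuit-break real-time operations once lag reaches their budget."""
    return lag_ms < LAG_BUDGET_MS[operation]
```

With a measured lag of 150 ms, a card-present authorization would be refused while a balance inquiry could still be served from the projection.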

A

EveLedger — Gen 1

Production Credibility

6+ years, zero outages, live FI deployment. Your strongest credential. Lead every conversation with this — demonstrated reliability outweighs any architectural argument.

Use as Primary Proof Point

B+

SNR Core — Gen 2

Bridge Platform

Excellent ledger core. Correct financial patterns, strong type safety, proven infra stack. Complete auth, event sourcing overlay, and live ops record before presenting to FI.

Complete Critical Gaps First

A−

GCP Core 4.1.26a

Architectural Vision

Intellectually correct. Event sourcing, rust_decimal, Reg CC as code, GDPR envelope encryption are genuinely senior-level. Risk is presentation, not substance.

Right-Size for Target FI

How to Win the Room

Conversation Strategy

Lead with Gen 1's 6-year zero-outage FI production record
Position SNR Core as the typed evolution of that architecture
Present GCP Core as phased target state — not day-one mandate
Offer Go as FI-maintainable alternative to Rust
Scope Rust to card authorization hot path only
AI section → Appendix A with human-approval gate language
Make infra flexibility explicit: GCP, AWS, Azure all supportable

Document Package

What to Deliver

Doc 1

10-page Executive Summary

Doc 2

30-page Technical Architecture (edited GCP Core)

Doc 3

3-page Risk & Compliance Summary

Appendix A

AI/ML Layer — SR 11-7, human approval gate

Appendix B

Infrastructure flexibility (GCP / AWS / Azure)

Appendix C

Rust vs Go technical comparison

Bottom line: GCP Core 4.1.26a is architecturally correct. The risk is presentation, not substance. Present as a recommendation, not a mandate. Lead with the zero-outage heritage and let the architecture speak for itself.

Platform Assessment — GCP Core 4.1.26a
