What is Certificate Transparency and why does it matter for AI security?

Certificate Transparency (CT) is a set of public, append-only logs that record the TLS certificates issued by publicly trusted Certificate Authorities. It was created to catch mis-issued certificates and rogue CAs, but it has a side effect: every hostname your engineers protect with a publicly trusted HTTPS certificate, including internal-sounding ones, becomes publicly discoverable shortly after issuance. For AI teams shipping RAG pipelines, LLM proxies, and vector databases under their own subdomains, CT logs are the loudest leak in modern infrastructure.

Does the Shadow AI Radar use real Certificate Transparency data?

Yes. EchelonGraph runs a server-side poller that queries crt.sh — the public Certificate Transparency search index — every 60 seconds for hostnames matching the AI subdomain taxonomy. The marketing page renders aggregate stats server-side every 15 seconds, with a live ticker that re-polls client-side at the same cadence. We never display synthetic or mock entries.

How often is the Shadow AI Radar data refreshed?

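The Certificate Transparency poller queries crt.sh every 60 seconds, and Shodan banner-grab enrichment runs every 6 hours. The marketing page caches aggregates for 15 seconds and the live ticker re-polls client-side every 15 seconds, so the figures on screen trail the radar's own database by at most about 30 seconds (a 15-second server cache read by a 15-second client poll). A brand-new crt.sh entry can take up to another 60 seconds to be picked up by the poller, plus whatever lag exists between the CT logs and crt.sh itself.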

How can I access the Shadow AI Radar data programmatically?

Three public REST endpoints, no authentication required: GET https://app.echelongraph.io/api/v1/public/shadow-ai-radar returns the paginated observation feed (page, page_size, category, include_resolved filters); GET .../stats returns aggregate counters including top products / countries / issuers / 30-day trend; POST .../recheck/:id triggers an on-demand verification of a single row (5-minute cooldown per row). Responses cache at 15-second granularity. For higher request volumes please email data@echelongraph.io.
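A minimal Python sketch of calling the feed endpoint with the standard library. The base URL and the page, page_size, category and include_resolved parameters come from the answer above; the default page size, the example category values, the true/false encoding of include_resolved, and the assumption that /stats sits directly under the same base path are all guesses, and the response schema is not documented here.

```python
import json
import urllib.parse
import urllib.request

# Base URL from the FAQ; ".../stats" is assumed to hang off the same path.
BASE_URL = "https://app.echelongraph.io/api/v1/public/shadow-ai-radar"


def build_feed_url(page=1, page_size=50, category=None, include_resolved=False):
    """Build the paginated feed URL with the documented filter parameters."""
    params = {"page": page, "page_size": page_size}
    if category is not None:
        params["category"] = category  # category slugs are hypothetical
    # Assumed boolean encoding; check the live API before relying on it.
    params["include_resolved"] = "true" if include_resolved else "false"
    return f"{BASE_URL}?{urllib.parse.urlencode(params)}"


def fetch_json(url, timeout=10.0):
    """GET a URL and decode the JSON body (network call)."""
    req = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    print(build_feed_url(page=1, page_size=25, category="vector_database"))
    # stats = fetch_json(f"{BASE_URL}/stats")  # responses are cached for 15 s anyway
```

Because responses are cached at 15-second granularity, polling faster than that only returns the same payload.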

What is your responsible-disclosure policy for exposed infrastructure surfaced on the radar?

Every observation comes from public sources (Certificate Transparency, Shodan banner-grab) — we surface what is already broadcast. If you are the owner of an exposed asset and would like a private 24-hour notification window before public takedown verification, contact disclosure@echelongraph.io with the hostname; we will verify ownership via TXT-record challenge and remove the record from the public radar.

How can I tell if our company appears in CT logs with shadow AI subdomains?

Search crt.sh for every name under your apex domain (e.g. crt.sh/?q=%.acme.com, where % is crt.sh's wildcard) and look for hostnames matching rag-*, *-gpt-*, mlflow-*, ray-*, milvus-*, qdrant-*, jupyter-*, ollama-*, mcp-*. EchelonGraph automates this continuously and ties each finding to a category-specific attacker playbook and remediation. Contact our security engineering team for a free sweep.
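The manual check can be scripted. This sketch assumes crt.sh's JSON output mode (output=json) and its name_value field, which packs every SAN of one certificate into a newline-separated string; crt.sh is a shared free service, so query it sparingly. The pattern list is only the subset named above, and each glob is matched against the full hostname, so rag-* catches rag-prod.acme.com but not team-rag.acme.com.

```python
import fnmatch
import json
import urllib.parse
import urllib.request

# Subset of the taxonomy quoted in the FAQ answer; the full radar uses more patterns.
AI_PATTERNS = ["rag-*", "*-gpt-*", "mlflow-*", "ray-*", "milvus-*",
               "qdrant-*", "jupyter-*", "ollama-*", "mcp-*"]


def extract_names(rows):
    """crt.sh packs all SANs of one certificate into name_value, newline-separated."""
    names = set()
    for row in rows:
        for name in row.get("name_value", "").splitlines():
            names.add(name.strip().lower())
    return names


def crtsh_names(apex, timeout=60.0):
    """Fetch every hostname crt.sh has logged under *.apex (network call)."""
    query = urllib.parse.urlencode({"q": f"%.{apex}", "output": "json"})  # % -> %25
    with urllib.request.urlopen(f"https://crt.sh/?{query}", timeout=timeout) as resp:
        return extract_names(json.load(resp))


def match_ai_hostnames(names, patterns=AI_PATTERNS):
    """Return the hostnames matching any glob pattern, sorted."""
    return sorted(n for n in names
                  if any(fnmatch.fnmatchcase(n, p) for p in patterns))


if __name__ == "__main__":
    for host in match_ai_hostnames(crtsh_names("acme.com")):
        print(host)
```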

What can attackers actually do with a leaked AI subdomain?

It depends on the category. RAG pipelines and vector databases often ship with authentication disabled, so attackers can exfiltrate the embedded corpus. LLM proxies hold long-lived API tokens, so attackers burn the budget and steal system prompts. Exposed MLflow and Ray endpoints have known CVEs that lead to arbitrary file read or remote code execution. Jupyter notebooks give interactive Python execution against the host. The Shadow AI Radar's category cards on this page give the full per-category attacker playbook.

How does the Shadow AI Radar compare to Shodan or Censys for AI infrastructure discovery?

Shodan and Censys index every internet-reachable service across all ports — broad but unspecialised. The Shadow AI Radar narrows that surface to a curated taxonomy of 80 AI-specific subdomain patterns (plus 52 Shodan banner dorks) across 14 risk categories, then verifies each candidate with category-specific deep service probes (Jupyter /api/kernels, Ollama /api/tags, Weaviate /v1/meta, MLflow /api/2.0/mlflow/experiments/list, MinIO /api/v1/buckets, MCP tools/list, etc.). EchelonGraph cross-enriches its Certificate-Transparency observations with Shodan banner-grab data — so the radar is complementary to Shodan, not a replacement. Think of it as Shodan + AI-aware classification + verification.
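A rough sketch of what a deep service probe looks like, using probe paths listed above. The classify heuristic (a 2xx JSON answer on an API path that ought to need credentials counts as unauthenticated) is an assumption for illustration, not EchelonGraph's verifier. MinIO's console API normally expects a session, and anything like this should only be run against hosts you own or are authorised to test.

```python
import json
import urllib.error
import urllib.request

# Probe paths named in the FAQ answer (subset).
PROBE_PATHS = {
    "jupyter": "/api/kernels",
    "ollama": "/api/tags",
    "weaviate": "/v1/meta",
    "mlflow": "/api/2.0/mlflow/experiments/list",
    "minio": "/api/v1/buckets",
}


def classify(status, body):
    """Heuristic verdict for one probe response."""
    if status in (401, 403):
        return "auth-required"
    if 200 <= status < 300:
        try:
            json.loads(body)
        except ValueError:
            return "inconclusive"  # e.g. an HTML login page served with 200
        return "unauthenticated"
    return "inconclusive"


def probe(base_url, category, timeout=5.0):
    """Fetch one category-specific path and classify the answer (network call)."""
    url = base_url.rstrip("/") + PROBE_PATHS[category]
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return classify(resp.status, resp.read().decode("utf-8", "replace"))
    except urllib.error.HTTPError as err:  # 4xx/5xx still carry a status code
        return classify(err.code, "")
    except OSError:  # DNS failure, refused connection, timeout
        return "unreachable"
```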

Can I use Shadow AI Radar data as audit evidence for MITRE ATLAS or EU AI Act compliance?

Yes. The dataset is published under the CC-BY-4.0 license with attribution to EchelonGraph, and the page provides a stable citation block (APA + BibTeX) for inclusion in audit reports. Each observation maps to one of 14 risk categories with documented references to MITRE ATLAS AML.T0011 (Shadow AI Detection), the OWASP Top 10 for LLM Applications (LLM03:2025, Supply Chain), NIST AI-RMF MS-2.1 (Risk Monitoring), EU AI Act Articles 9 and 15 (risk management; accuracy, robustness and cybersecurity), and ISO/IEC 42001 §A.8 (interested-parties disclosure). Compliance officers commonly screenshot these mappings for audit evidence. The radar surfaces public-source exposures; final risk determination for your assets remains your responsibility.

Is the Shadow AI Radar dataset free to use commercially?

Yes. The radar dataset and the underlying public API are free for both non-commercial and commercial use under the Creative Commons Attribution 4.0 (CC-BY-4.0) license. Attribution to EchelonGraph (https://echelongraph.io/shadow-ai-radar) is required. No API key, no rate-limit signup, no paywall. For higher request volumes (>60 req/min sustained) or a commercial support agreement, email data@echelongraph.io.

Live · Certificate Transparency Stream · public dataset · CC-BY-4.0

The Shadow AI Radar: Live Exposed AI Infrastructure

Right now, tens of thousands of AI services — Jupyter notebooks, model stores, LLM proxies, vector databases — are reachable on the public internet. Thousands answer with no authentication at all — one HTTP request from full data theft or remote code execution. Most owners do not know they are listed below.

Every publicly trusted TLS certificate issued for an internal AI service is written to public Certificate Transparency logs as part of issuance. The Shadow AI Radar is the public, citable, continuously refreshed dataset of that exposure: 14 categories, 130+ detection patterns, Shodan-verified liveness, refreshed every 60 seconds.

Is your AI infrastructure exposed? Scan it free → (external scan · no signup · results in ~30 seconds)

Observations · last 24 h

858

fresh certificates in the AI taxonomy

Verified unauthenticated

720

confirmed reachable without auth

Top exposed category

LLM Proxy

418 active observations

Total observations tracked

30,808

30-day rolling window

Last data refresh


poller cadence: 60s · status: healthy

Data sources

crt.sh · Shodan · CT logs

multi-source aggregation

Today on the radar — daily intelligence brief

2026-07-27 UTC

Past 24 hours: 858 new observations across LLM Proxy (418), Vector Database (221), AI Workflow Builder (43), Inference Server (26). 24 verified reachable without authentication. Most exposed product: MinIO Server (1,747). Most active region: United States (7,349).

Top exposed categories

Confirmed reachable

LLM Proxy: 418
Vector Database: 221
AI Workflow Builder: 43
Inference Server: 26
Model Registry: 10
MCP Server: 1
Notebook Server: 1

Top exposed products

Shodan banner-grab enrichment

MinIO Server: 1,747
Tornado Web Server: 1,298
LiteLLM: 766
Jupyter Hub: 661
LocalAI: 579
Jupyter Server: 576
nginx: 486
Weaviate: 97
Jupyter Notebook: 84
Sophos SSL VPN User Portal: 67

Top regions

Country resolution from Shodan ASN data

United States: 7,349
China: 5,663
Germany: 2,831
Japan: 1,036
Singapore: 1,016
United Kingdom: 924
India: 871
France: 824
Canada: 794
Australia: 712

30-day observation trend

Daily new observations across all 14 categories

Peak: 1,980 on 2026-07-16 · 30-day total: 30,807

Chart range: 2026-06-28 to 2026-07-27

Top issuing networks

Hosting providers / ASN organisations from Shodan enrichment

Aliyun Computing Co., LTD (China): 2,091
Linode (United States): 1,719
Let's Encrypt: 913
Amazon Technologies Inc. (United States): 896
Amazon.com, Inc. (United States): 744
Hetzner Online GmbH (Germany): 663
Tencent cloud computing (Beijing) Co., Ltd. (China): 624
Contabo GmbH (Germany): 500
Google LLC (United States): 499
Amazon Data Services Canada (Canada): 489

Live observation feed

Filterable; click any row for the per-category attacker playbook and remediation.


14 risk categories · 130+ detection patterns

Every category we monitor, with the full attacker playbook on each

Click any category to expand the attacker playbook, blast radius, remediation steps, subdomain patterns, and CVE / OWASP references. Verified counts are confirmed-reachable observations; total counts include unverified rows still in the verification queue.

RAG Pipeline

Retrieval-Augmented Generation endpoints reveal the topology of your knowledge stack.

Unauthenticated endpoints → wholesale document exfiltration

0 verified · 0 total

What an attacker learns

A subdomain like internal-rag-pipeline.acme.com tells an attacker that you run a RAG architecture, often a FastAPI or Flask service left on a framework default port (8000 for FastAPI under uvicorn, 5000 for Flask's development server). It strongly suggests three things at once: (1) you have an embedding store reachable over HTTPS, (2) you have a backing LLM connected to it, and (3) somewhere in that stack there is privileged data (code, contracts, customer records) being indexed for retrieval.

Attacker playbook

Enumerate the discovered hostname for /docs, /openapi.json, /retrieve, /query, and /health.
Confirm whether the endpoint is unauthenticated by sending an empty similarity-search request.
If reachable, exfiltrate top-k documents with crafted high-recall queries.
Pivot: discover associated *-vector-db-* and *-llm-proxy-* subdomains via the same CT log.
Persistence: poison the embedding store with adversarial documents to manipulate downstream LLM output.

Blast radius

Wholesale data exfiltration of every document indexed by the pipeline: typically internal wikis, support tickets, contracts, or source code. A successful poisoning attack lets the attacker steer what the LLM tells every employee who relies on its answers.

Remediation

Put RAG endpoints behind a private VPC + IAP / SSO only — never on the public internet.
Use wildcard certificates (*.internal.acme.com) so individual services do not appear in CT logs.
Subscribe to your own CT logs with crt.sh / Censys and alert on every new internal hostname.
Add an authentication middleware to FastAPI / Flask before any /retrieve route.
Enable mutual TLS between the RAG service and the embedding store + LLM.
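For a Flask service (Flask is WSGI), the authentication middleware from the list above can be sketched with the standard library alone. The protected path prefixes and the RAG_API_TOKEN environment variable name are hypothetical; FastAPI is ASGI and would need its own dependency or middleware instead.

```python
import hmac


class BearerAuthMiddleware:
    """WSGI middleware that rejects unauthenticated calls to protected path prefixes."""

    def __init__(self, app, token,
                 protected_prefixes=("/retrieve", "/query", "/docs", "/openapi.json")):
        if not token:
            raise ValueError("refusing to start with an empty token")
        self.app = app
        self.token = token.encode()
        self.protected = tuple(protected_prefixes)

    def __call__(self, environ, start_response):
        if environ.get("PATH_INFO", "").startswith(self.protected):
            header = environ.get("HTTP_AUTHORIZATION", "")
            supplied = header[len("Bearer "):] if header.startswith("Bearer ") else ""
            # Constant-time comparison so response timing does not leak the token.
            if not hmac.compare_digest(supplied.encode(), self.token):
                start_response("401 Unauthorized", [("Content-Type", "text/plain"),
                                                    ("WWW-Authenticate", "Bearer")])
                return [b"unauthorized\n"]
        return self.app(environ, start_response)


def demo_app(environ, start_response):
    """Stand-in for the real RAG app."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"ok\n"]

# Flask wiring (hypothetical variable name):
# app.wsgi_app = BearerAuthMiddleware(app.wsgi_app, os.environ["RAG_API_TOKEN"])
```

Unlisted paths such as /health stay open so load-balancer checks keep working.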

Subdomain patterns matched

*rag*, *retrieval*, *knowledge*, *kb-*, *eval-*, *rag-customer*, *rag-sales*


Vector Database

Standalone vector stores (Milvus, Qdrant, Chroma, Weaviate) commonly ship with no auth.

Ships with no auth — anonymous read / write / DELETE on your corpus

221 verified · 831 total

What an attacker learns

Hostnames matching milvus-*, qdrant-*, chroma-*, weaviate-* signal a bare vector database rather than a managed service. These products default to open access for developer convenience, and many internet-exposed instances accept anonymous reads and writes on their well-known ports (Milvus 19530, Qdrant 6333, Chroma 8000, Weaviate 8080).

Attacker playbook

Hit /collections (Qdrant, Chroma) or list_collections (Milvus) without auth — confirm enumeration.
Pull top-k embeddings from the largest collection, store the vectors locally.
Use embedding-inversion (vec2text, GEIA) to reconstruct the source text from stolen embeddings.
Inject adversarial vectors that hijack downstream retrieval (data poisoning).
DELETE /collections/{name} drops a whole collection in a single call: a one-request destructive op on the knowledge base.

Blast radius

Embedding-inversion research (vec2text, GEIA) shows that stolen embeddings can often be decoded back into much of their source text, so a successful read attack can exfiltrate years of internal corpus content. A successful write attack lets the attacker dictate what the company's own AI tells employees.

Remediation

Bind the vector DB to a private subnet only (use SecurityGroup / NSG / GCP firewall).
Enable the database's native auth (Milvus RBAC, Qdrant API keys, Weaviate API tokens).
Front the DB with an authenticated proxy (e.g. Envoy + JWT) for any cross-service access.
Rotate API tokens via your secret manager; never hard-code in client code.
Audit every CREATE/DELETE collection event to a SIEM.
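A quick external check for the first remediation item: from a host outside your network, test whether the well-known vector-database ports from this card accept TCP connections on your public addresses. An open port does not prove missing auth, only that the service is reachable from where you ran the check; pair it with an API-level check.

```python
import socket

# Well-known ports quoted in the card above.
VECTOR_DB_PORTS = {"milvus": 19530, "qdrant": 6333, "chroma": 8000, "weaviate": 8080}


def port_open(host, port, timeout=2.0):
    """True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


def exposed_vector_ports(host, ports=VECTOR_DB_PORTS, timeout=2.0):
    """Return {product: port} for every well-known port that accepts connections."""
    return {name: port for name, port in ports.items() if port_open(host, port, timeout)}


if __name__ == "__main__":
    print(exposed_vector_ports("203.0.113.10"))  # documentation-range IP; use your own
```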

Subdomain patterns matched

*milvus*, *qdrant*, *chroma*, *weaviate*, *vector-db*, *pinecone*


LLM Proxy

Internal proxies routing to OpenAI, Anthropic, or self-hosted models — token-bearing surfaces.

Holds your model API keys — token theft + six-figure bill abuse

418 verified · 5,194 total

What an attacker learns

Subdomains like openai-internal.startup.io, claude-bridge.acme.com, anthropic-proxy.fintech.ai are the chokepoints where engineering teams stash their model API keys. Every employee request flows through them, which means the proxy holds long-lived API tokens for the upstream model provider.

Attacker playbook

Probe /v1/chat/completions, /v1/messages, /openai/v1/* — proxies usually mirror the upstream API shape.
If unauthenticated, submit prompt-jailbreak payloads to confirm the upstream is reachable.
Burn the company's token budget at scale (financial DoS — six-figure monthly bills are common).
Pivot: extract system prompts and tool definitions through prompt-injection on the proxy.
Use the proxy as a free LLM for further attacker-side automation (model laundering).

Blast radius

A direct hit on a single LLM proxy can cost a company hundreds of thousands of dollars in unbudgeted model spend, leak the company's system prompts (which encode product strategy), and expose any tool integrations wired into the agent layer.

Remediation

Require an Authorization Bearer header on every proxy route — reject anonymous calls with 401.
Issue per-employee proxy tokens via your IdP (Okta, Azure AD); never share a master key.
Set per-tenant + per-user budget caps and alert on anomalies in token usage.
Strip / sanitise system prompts from server responses; never echo them back.
Put the proxy behind your edge auth (Cloudflare Access, IAP) — not on the open internet.
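The first three remediation items (a bearer token on every route, per-user tokens, per-user budget caps) can be combined into one gate in front of the proxy. This is a sketch with in-memory state: in a real proxy the spend counter would live in shared storage, and requested_tokens is an estimate made before the upstream call, since actual usage is only known afterwards. Only SHA-256 digests of tokens are kept, so a dump of the table does not yield usable keys.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class ProxyGate:
    """Per-user bearer tokens plus per-user token budgets for an LLM proxy (sketch)."""
    token_hashes: dict    # sha256(token) hex digest -> user id; plaintext never stored
    budget_per_user: int  # max model tokens per user per billing window
    spent: dict = field(default_factory=dict)

    @staticmethod
    def hash_token(token):
        return hashlib.sha256(token.encode()).hexdigest()

    def authorize(self, auth_header, requested_tokens):
        """Return (http_status, user_id or None) for one proxied request."""
        if not auth_header.startswith("Bearer "):
            return 401, None  # anonymous call
        user = self.token_hashes.get(self.hash_token(auth_header[len("Bearer "):]))
        if user is None:
            return 401, None  # unknown token
        used = self.spent.get(user, 0)
        if used + requested_tokens > self.budget_per_user:
            return 429, user  # over budget: refuse and raise an alert
        self.spent[user] = used + requested_tokens
        return 200, user
```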

Subdomain patterns matched

*gpt-*, *claude-*, *anthropic-*, *openai-*, *openai.*, *llm-proxy*, *llm-gateway*, *llm-api*, *ai-proxy*, *model-proxy*


Model Registry

MLflow / Ray / SageMaker registries leak model artifacts, prompt templates, and training data.

RCE — Ray CVE-2023-48022 (ShadowRay), exploited in the wild

10 verified · 328 total

What an attacker learns

Hostnames like mlflow-internal.acme.com, ray-dashboard.startup.io, model-registry.fintech.ai expose the build system for AI. These dashboards show every registered model version, every experiment's parameters and metrics, every artifact path, and — critically — the raw prompts and configurations a RAG / agent pipeline shipped to production.

Attacker playbook

MLflow: hit /api/2.0/mlflow/experiments/list (experiments/search on MLflow 2.x) to enumerate all experiments without auth.
Download artifact files via /get-artifact?path=... — known path traversal in older versions (CVE-2024-1483).
Ray: hit /api/jobs to submit arbitrary Python (CVE-2023-48022 — ShadowRay RCE).
Pivot: use leaked artifact paths to fingerprint internal cloud storage (S3 / GCS bucket names).
Persistence: register a malicious model as 'production' so the next deploy ships attacker code.

Blast radius

Full model intellectual-property theft + fine-tuning data exposure + remote code execution on the head node. The Ray exposure alone — confirmed at 2,000+ public clusters — is consistently one of the top AI security incidents of the year.

Remediation

Run MLflow / Ray behind your VPN or IdP; never expose them on a public LB.
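Patch MLflow ≥ 2.10.1 (CVE-2024-1483 path traversal). Do not count on patching Ray: CVE-2023-48022 (ShadowRay) is disputed by the vendor, which treats the unauthenticated Jobs API as intended behaviour, so network isolation is the fix.
Enable MLflow's basic-auth app (mlflow server --app-name basic-auth), put the Ray dashboard behind an authenticating proxy, and rotate credentials via a secrets manager.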
Sign model artifacts (Sigstore, in-toto) so a malicious registration is rejected at deploy time.
Stream registry audit logs to a SIEM and alert on new artifact downloads from unknown IPs.
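Sigstore or in-toto signatures are the real answer to the signing item above. As a much simpler stand-in, the sketch below pins each artifact's SHA-256 digest at release time and refuses to deploy anything that does not match. It catches a swapped artifact but, unlike a signature, relies entirely on the pinned manifest being write-protected; the manifest shape (file name to hex digest) is a hypothetical choice.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_file(path, chunk_size=1 << 20):
    """Stream the file through SHA-256 so large model weights fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_artifact(path, pinned):
    """Return True only if the artifact's digest equals the one pinned at release time.

    `pinned` maps artifact file name -> expected hex digest (hypothetical manifest).
    """
    expected = pinned.get(Path(path).name)
    if expected is None:
        return False  # unknown artifact: reject by default
    return hmac.compare_digest(sha256_file(path), expected)
```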

Subdomain patterns matched

*mlflow*, *ray*, *model-registry*, *registry*, *training*, *fine-tune*


AI Agent

Autonomous agents (LangChain, AutoGen, agent-orchestrator) with tool-execution surfaces.

Tool layer = shell + internal-API access via one prompt injection

0 verified · 9 total

What an attacker learns

Subdomains like agent-orchestrator.acme.com or langserve-staging.startup.io expose an autonomous agent — an LLM with tools. The tools are the dangerous part: shell access, filesystem access, internal API calls, browser automation, even cloud account credentials.

Attacker playbook

Hit /agents/run, /chains/run, /invoke — most LangServe deployments expose these by default.
Send a prompt-injection that triggers the tool layer to run arbitrary commands.
Use the agent as a confused deputy to call internal APIs the attacker cannot reach directly.
Exfiltrate credentials by asking the agent to 'help debug' an environment variable.
Persistence: store a malicious memory entry that re-triggers on every future agent invocation.

Blast radius

Whatever the agent's tools can do, which usually means filesystem read, internal HTTP, and shell execution. A leaked agent endpoint is a direct foothold inside your VPC, no exploit required.

Remediation

Treat the agent as untrusted and apply least-privilege to its tool credentials.
Wrap every tool call in policy-engine middleware (OPA, Cerbos) before execution.
Disable shell / filesystem tools in production; route them via a sandboxed worker only.
Never expose the orchestrator on a public LB — front it with SSO and rate-limit per user.
Log every tool call with full prompt + response to your SIEM for prompt-injection forensics.
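A deny-by-default policy gate in plain Python, standing in for the OPA or Cerbos middleware named above. The tool names, the intranet URL rule and the ToolDenied exception are hypothetical; the point is that a tool absent from the policy (here shell) is refused, and every decision is logged as one JSON line that can be shipped to the SIEM.

```python
import json
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("agent.tools")

# Hypothetical policy: tool name -> predicate over its arguments.
POLICY = {
    "search_docs": lambda args: True,
    "http_get": lambda args: str(args.get("url", "")).startswith("https://intranet.example.com/"),
    # "shell" and "write_file" are absent, so they are denied by default.
}


class ToolDenied(Exception):
    pass


def guarded_call(tool_name, args, tools, user):
    """Check policy, log the decision, then run the tool only if allowed."""
    check = POLICY.get(tool_name)
    allowed = bool(check and check(args))
    log.info(json.dumps({"user": user, "tool": tool_name, "args": args, "allowed": allowed}))
    if not allowed:
        raise ToolDenied(f"{tool_name} blocked by policy")
    return tools[tool_name](**args)
```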

Subdomain patterns matched

*ai-agent*, *langserve*, *langchain*, *autogen*, *agent-orchestrator*, *ai-copilot*, *agentic*, *crewai*


Prompt Cache

Redis / Memcached / disk caches storing prompt + response pairs verbatim.

Redis / Memcached no-auth — every cached prompt readable

0 verified · 0 total

What an attacker learns

Hostnames like llm-prompt-cache.acme.com or embeddings-gateway.startup.io are typically a Redis or Memcached instance fronting an LLM. The cache keys are full user prompts; the values are full model responses. Every cache hit is a transcript of an employee asking a private question.

Attacker playbook

If Redis: connect anonymously, run KEYS * — every cached prompt is now visible.
Pull cached values to reconstruct what employees asked the AI (HR queries, code, contract drafts).
Inject adversarial cache entries so the next employee sees attacker-controlled output.
Pivot: use the cache contents to fingerprint other internal services they reference.

Blast radius

A historical record of every privileged conversation a company's employees had with their AI tooling — typically the most sensitive material in the org, written in plain language. Public exposure is a near-automatic regulator escalation under GDPR / DPDP / HIPAA depending on content.

Remediation

Bind Redis / Memcached to localhost or a private VPC subnet only.
Require AUTH on every cache (rediss:// + ACL users in Redis 6+).
Store only hashed prompt keys + encrypted response values; rotate encryption keys quarterly.
Set short TTLs (≤ 5 minutes) on cached responses to reduce blast radius.
Audit cache contents for PII / PHI before turning the cache on at all.
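An in-memory sketch of two of the items above: keyed (HMAC-SHA256) prompt hashes instead of raw prompts as cache keys, and a TTL of at most 5 minutes. Encryption of the stored responses is deliberately left out; with Redis the same idea maps to writing the HMAC key with an expiry (for example set(key, value, ex=300) in redis-py). The class and parameter names are illustrative.

```python
import hashlib
import hmac
import time


class PromptCache:
    """Prompt cache keyed by HMAC(prompt) with a short TTL (response encryption omitted)."""

    def __init__(self, secret, ttl_seconds=300, clock=time.monotonic):
        self.secret = secret  # bytes, e.g. loaded from a secret manager
        self.ttl = ttl_seconds
        self.clock = clock
        self._store = {}

    def _key(self, prompt):
        # Keyed hash: without `secret`, a dump of the keys reveals no prompt text
        # and cannot be checked against guessed prompts.
        return hmac.new(self.secret, prompt.encode(), hashlib.sha256).hexdigest()

    def set(self, prompt, response):
        self._store[self._key(prompt)] = (self.clock() + self.ttl, response)

    def get(self, prompt):
        key = self._key(prompt)
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, response = entry
        if self.clock() >= expires_at:
            del self._store[key]  # expired: evict and report a miss
            return None
        return response
```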

Subdomain patterns matched

*cache*, *embeddings-gateway*, *prompt-cache*, *redis-llm*

References

Redis security best practices ↗

Notebook Server

Jupyter / JupyterHub notebooks with interactive code execution, shell access, and stored credentials.

Arbitrary code execution — actively crypto-mined (Qubitstrike, TeamTNT)

1 verified · 3,299 total

What an attacker learns

Subdomains like jupyter-internal.acme.com, notebook-staging.startup.io, or lab-ds.fintech.ai expose a full interactive Python execution environment. Jupyter enables token authentication by default, but many deployments disable it (for example by setting an empty token) for convenience. Every running notebook has filesystem access, a built-in terminal, and typically contains API keys, database passwords, and cloud credentials embedded in code cells or .env files in the working directory.

Attacker playbook

Access the notebook UI — if no auth, you have immediate code execution via a new Python cell.
Open the built-in terminal: full shell access as the notebook server user (often root in containers).
Search cells and filesystem for credentials: AWS keys in ~/.aws/, .env files, hardcoded API tokens.
Install crypto miners (Qubitstrike, TeamTNT campaigns actively target exposed Jupyter servers).
Pivot: use the notebook's network position to scan internal services unreachable from the internet.

Blast radius

Arbitrary code execution on the host machine with full filesystem access. Exposed notebooks are actively exploited for cryptocurrency mining and as pivot points into internal networks. Stored credentials in notebook cells provide lateral movement to cloud accounts, databases, and internal APIs.

Remediation

Never expose Jupyter to the public internet — use JupyterHub with SSO/OAuth behind a VPN.
Set c.NotebookApp.token and c.NotebookApp.password in jupyter_notebook_config.py (c.ServerApp.token and c.ServerApp.password in jupyter_server_config.py on Jupyter Server); never run tokenless.
Run notebooks in isolated containers with no-root, read-only filesystem where possible.
Use managed notebook services (Vertex AI Workbench, SageMaker Studio) that handle auth natively.
Strip credentials from notebook cells before committing — use environment variables or secret managers.
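A small scanner for the last item, assuming the standard .ipynb JSON layout (a cells list whose entries carry cell_type and source). The three regexes are illustrative only; dedicated secret scanners ship far larger rule sets, and this sketch ignores cell outputs, which can leak secrets too.

```python
import json
import re
from pathlib import Path

# Illustrative rules only.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "openai_style_key": re.compile(r"\bsk-[A-Za-z0-9_-]{20,}"),
    "password_assignment": re.compile(r"(?i)\bpassword\s*=\s*['\"][^'\"]+['\"]"),
}


def scan_notebook(path):
    """Yield (cell_index, rule_name) for every secret-looking string in code cells."""
    nb = json.loads(Path(path).read_text(encoding="utf-8"))
    for idx, cell in enumerate(nb.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        text = "".join(source) if isinstance(source, list) else source
        for rule, pattern in SECRET_PATTERNS.items():
            if pattern.search(text):
                yield idx, rule


def scan_tree(root):
    """Scan every notebook under root; returns {path: [(cell_index, rule), ...]}."""
    return {str(p): list(scan_notebook(p)) for p in Path(root).rglob("*.ipynb")}
```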

Subdomain patterns matched

*jupyter*, *jupyterhub*, *notebook*, *nb-*, *jupyter-lab*, *lab-ds*, *ipynb*


AI Workflow Builder

No-code AI builders (OpenWebUI, Flowise, Dify) storing API keys for multiple LLM providers and full chat histories.

RCE — Flowise CVE-2025-59528 + multi-provider API-key theft (LLMjacking)

43 verified · 10,716 total

What an attacker learns

Subdomains like chat-ai.acme.com, flowise.startup.io, or dify-internal.corp.ai expose a full AI workflow builder. These platforms store API keys for OpenAI, Anthropic, Azure, and local models in a single dashboard. Chat histories contain every question employees asked the AI — often including PII, financial data, legal discussions, and source code. Tool integrations provide access to internal databases, CRMs, and file systems.

Attacker playbook

Access the UI — many instances use default credentials or have registration open to anyone.
Navigate to Settings/API Keys — extract stored OpenAI, Anthropic, Azure, Bedrock API keys.
Browse chat histories — employees ask AI about sensitive internal matters (HR, legal, code).
Exploit CVE-2025-59528 (Flowise RCE) to execute arbitrary code on the server.
Use tool integrations (database, filesystem, API connectors) to access internal systems.

Blast radius

Multi-provider API key theft enables LLMjacking ($46K-$100K/day in unauthorized inference bills). Chat history exfiltration exposes every sensitive question employees asked the AI. RCE vulnerabilities (CVE-2025-59528) provide full server compromise. Workflow builders are the fastest-growing shadow AI surface.

Remediation

Never expose AI workflow builders to the public internet — place behind VPN + SSO.
Disable open registration — require admin approval for new user accounts.
Rotate all stored API keys immediately if the instance was publicly accessible.
Patch Flowise to the latest version (includes the CVE-2025-59528 RCE fix).
Enable audit logging for all chat sessions and tool invocations.
Use EchelonGraph to monitor for new instances appearing in CT logs.

Subdomain patterns matched

*openwebui*, *open-webui*, *flowise*, *dify*, *n8n-ai*, *ai-workflow*, *ai-builder*, *chat-ui*, *chatbot-*


Inference Server

TorchServe, TF Serving, vLLM, and TGI model-serving endpoints with management APIs exposed.

RCE — ShellTorch CVE-2023-43654 + malicious model registration

26 verified · 245 total

What an attacker learns

Subdomains like inference-api.acme.com, torchserve.startup.io, or vllm-prod.fintech.ai expose production model serving infrastructure. Management APIs allow registering/unregistering models, viewing loaded model metadata, and accessing inference endpoints. TorchServe binds gRPC management ports to 0.0.0.0 by default (CVE-2024-35199), making internal management APIs publicly accessible.

Attacker playbook

Probe management API: GET /models (TorchServe), GET /v1/models (vLLM/TGI) to enumerate loaded models.
Register a malicious model via POST /models (TorchServe management API) — models can execute arbitrary code.
Exploit CVE-2024-35198 (TorchServe path traversal) to load models from attacker-controlled URLs.
Chain ShellTorch (CVE-2023-43654): SSRF + unsafe deserialization → full RCE on the server.
Use inference API to extract model weights via repeated queries (model extraction attack).

Blast radius

Model poisoning via unauthorized model registration causes every downstream consumer to serve attacker-controlled output. ShellTorch RCE chain provides full server compromise. Model extraction through repeated inference queries enables intellectual property theft of proprietary AI models.

Remediation

Bind management APIs to localhost only — never expose gRPC ports 7070/7071 (TorchServe) publicly.
Patch TorchServe ≥ 0.11.1 (CVE-2024-35199, CVE-2024-35198, CVE-2023-43654).
Enable token-based authorization on TorchServe (available since v0.11.1).
Place inference endpoints behind an API gateway with rate limiting and authentication.
Only load model archives (.mar) from trusted, verified artifact registries.
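The minimum versions quoted in the remediation lists on this page can be turned into a simple CI gate. This sketch only handles plain dotted numeric versions; for real version strings (pre-releases, local tags) the packaging library's Version class is the safer comparator. The product keys are illustrative names.

```python
def parse_version(text, width=3):
    """'0.11' -> (0, 11, 0); pre-release suffixes are not handled in this sketch."""
    parts = [int(p) for p in text.strip().split(".")]
    return tuple(parts + [0] * (width - len(parts)))


# Minimum versions taken from the remediation lists on this page.
MIN_SAFE = {
    "torchserve": "0.11.1",
    "mlflow": "2.10.1",
    "airflow": "2.10.3",
    "label-studio": "1.16.0",
}


def is_patched(product, installed):
    """True if the installed version is at or above the quoted minimum."""
    return parse_version(installed) >= parse_version(MIN_SAFE[product])


if __name__ == "__main__":
    for product, version in [("torchserve", "0.10.0"), ("mlflow", "2.12.0")]:
        print(product, version, "ok" if is_patched(product, version) else "UPGRADE")
```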

Subdomain patterns matched

*torchserve*, *tf-serving*, *inference-*, *model-serve*, *serving-*, *prediction-*, *vllm*, *tgi-*, *hf-inference*, *text-gen*


ML Pipeline

Apache Airflow and ML pipeline orchestrators leaking DAG code, credentials, and execution logs.

Every connected DB / cloud credential exposed — Airflow CVE-2024-45784

0 verified · 434 total

What an attacker learns

Subdomains like airflow.acme.com, dag-scheduler.startup.io, or pipeline-internal.corp.ai expose the orchestration control plane for ML training pipelines. Airflow dashboards reveal every DAG definition (pipeline code), connection credentials (database passwords, cloud API keys), and task execution logs — which frequently contain plain-text secrets (CVE-2024-45784). Variable endpoints expose every stored secret in the system.

Attacker playbook

Access the Airflow web UI — many instances ship with no auth or default admin:admin credentials.
Browse DAGs: view full Python pipeline code including hardcoded database passwords and API keys.
Read task logs: CVE-2024-45784 causes credentials to appear in plain text in execution logs.
Access /api/v1/connections to enumerate every stored database and cloud credential.
Trigger DAG runs with modified parameters to exfiltrate data or poison training pipelines.

Blast radius

Full credential exposure for every connected system — databases, cloud accounts, APIs, model registries. Pipeline manipulation enables training data poisoning that corrupts all downstream models. Task logs contain a historical record of every secret that ever transited the pipeline.

Remediation

Never expose Airflow to the public internet — bind to private VPC behind SSO.
Upgrade to Airflow ≥ 2.10.3 to fix CVE-2024-45784 (secret masking in logs).
Use Airflow Secret Backends (Vault, AWS Secrets Manager, GCP Secret Manager) — never store credentials in Connections/Variables directly.
Enable RBAC and restrict DAG-level access permissions.
Audit and rotate all credentials that may have been exposed in task logs.

Subdomain patterns matched

*airflow*, *dag-*, *ml-pipeline*, *data-pipeline*, *pipeline-*, *orchestrator-*


Data Annotation

Label Studio and Argilla annotation tools exposing training data, labels, and project configurations.

Path traversal to server files — Label Studio CVE-2025-25295

0 verified · 1,300 total

What an attacker learns

Subdomains like label-studio.acme.com or annotation-tool.startup.io expose the data labeling platform used to prepare training datasets. These tools contain the raw training data (images, text, audio), annotation guidelines (which reveal what the model is being trained to do), project configurations, and user accounts. CVE-2025-25295 allows reading arbitrary server files via path traversal.

Attacker playbook

Access the Label Studio UI — many instances have open registration enabled.
Browse projects to access raw training data (images, documents, audio recordings).
Exploit CVE-2025-25295 (path traversal) to read /etc/passwd, .env, or cloud credentials.
Exploit CVE-2025-25296 (XSS) to hijack admin sessions and steal authentication cookies.
Modify annotations to poison training data — subtle label changes corrupt downstream models.

Blast radius

Training data theft reveals the content and scope of AI projects. Modified annotations create poisoned training data that degrades or manipulates model behavior. Path traversal vulnerabilities expose server credentials and internal configuration files.

Remediation

Update Label Studio to ≥ 1.16.0 (fixes CVE-2025-25295, CVE-2025-25296).
Disable open registration — require admin approval for new accounts.
Place behind VPN/SSO — never expose annotation tools to the public internet.
Enable Content Security Policy (not report-only mode).
Audit annotation history for unauthorized modifications.

Subdomain patterns matched

*label-studio*, *labelstudio*, *argilla*, *annotation-*, *labeling-*, *annotate-*


Model Storage

MinIO and S3-compatible object stores containing model weights, training datasets, and experiment artifacts.

Default creds minioadmin:minioadmin → full model + training-data theft

0 verified · 6,710 total

What an attacker learns

Subdomains like minio-ml.acme.com, artifacts.startup.io, or model-storage.corp.ai expose S3-compatible object storage — the default artifact backend for MLflow, Kubeflow, and most ML pipelines. These stores contain trained model weights (intellectual property worth millions), training datasets (often containing PII), experiment artifacts, and configuration files with cloud credentials. MinIO ships with default credentials (minioadmin:minioadmin).

Attacker playbook

Try default credentials: minioadmin:minioadmin (MinIO default, rarely changed in dev environments).
List buckets and enumerate contents — look for model-weights/, training-data/, experiments/.
Download model weights: full IP theft of proprietary AI models.
Download training data: PII, financial records, medical data used to train models.
Upload a poisoned model to the production bucket — the next deploy ships attacker code.

Blast radius

Complete intellectual property theft of every model trained by the organization. Training data exfiltration may contain PII subject to GDPR/HIPAA breach notification. Model poisoning via artifact replacement turns the production AI into an attacker-controlled puppet.

Remediation

Change MinIO default credentials immediately — use a strong admin password.
Restrict bucket policies: never set public-read on buckets containing model artifacts.
Enable server-side encryption (SSE-S3 or SSE-KMS) for all stored objects.
Place MinIO behind a private VPC — never expose the console (port 9001) publicly.
Enable audit logging on all bucket operations and alert on downloads from unknown IPs.
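A sketch for the bucket-policy item: flag Allow statements in an S3-style JSON bucket policy (the format MinIO uses) that grant read actions to an anonymous principal. Condition blocks and NotPrincipal are ignored here, so treat a clean result as a first pass, not proof; the set of read actions checked is a deliberately short, illustrative list.

```python
import json

READ_ACTIONS = {"s3:GetObject", "s3:ListBucket", "s3:*", "*"}


def _as_list(value):
    """Policy fields may be a single string or a list; normalise to a list."""
    return value if isinstance(value, list) else [value]


def public_read_statements(policy_json):
    """Return Allow statements that grant read actions to an anonymous principal."""
    policy = json.loads(policy_json)
    findings = []
    for stmt in _as_list(policy.get("Statement", [])):
        if stmt.get("Effect") != "Allow":
            continue
        principal = stmt.get("Principal")
        anonymous = principal == "*" or (
            isinstance(principal, dict) and "*" in _as_list(principal.get("AWS", []))
        )
        actions = set(_as_list(stmt.get("Action", [])))
        if anonymous and actions & READ_ACTIONS:
            findings.append(stmt)
    return findings
```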

Subdomain patterns matched

*minio*, *minio-*, *artifacts-*, *model-artifacts*, *model-storage*, *ml-storage*

References

MinIO security best practices ↗

Experiment Dashboard

TensorBoard and Weights & Biases dashboards revealing model architectures, hyperparameters, and training metrics.

Model architecture, hyperparameters + logged credentials leaked

0 verified · 2 total

What an attacker learns

Subdomains like tensorboard.acme.com, wandb-internal.startup.io, or metrics-ml.corp.ai expose experiment tracking dashboards. These reveal every model's architecture (layer configurations, attention heads, embedding dimensions), hyperparameter search spaces, training data distributions, loss curves, hardware configurations (GPU types, cluster size), and environment variables.

Attacker playbook

Browse experiment runs to reconstruct the full model architecture and training recipe.
Extract hyperparameters to replicate the model with lower cost (IP theft via architecture cloning).
Identify the training data distribution to craft targeted adversarial examples.
View hardware configuration to estimate the organization's AI investment and capabilities.
Access environment variables logged during training — may contain cloud credentials.

Blast radius

Model architecture and hyperparameter theft enables competitors to replicate months of R&D. Training data distribution leaks reveal what data the model was trained on (potential regulatory exposure). Environment variables logged during training may contain cloud credentials.

Remediation

Place TensorBoard behind an authenticating reverse proxy. Do not pass --bind_all: TensorBoard binds to localhost by default, and --host localhost makes that explicit.
Enable Weights & Biases team-level access controls and private projects.
Never log environment variables or credentials during training runs.
Use managed experiment tracking with built-in auth (Vertex AI Experiments, Amazon SageMaker Experiments).
Audit experiment logs for accidentally logged secrets.
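Auditing experiment logs for leaked secrets can start with a few regular expressions run over exported run logs and config dumps. This is a minimal sketch, not a complete scanner. The two rules are assumptions chosen for illustration: AWS-style access key IDs, and upper-case environment-variable names ending in a credential word. Dedicated tools such as gitleaks or trufflehog ship much larger rule sets.

```python
import re

# Illustrative rules only; dedicated scanners ship far larger and better-tuned sets.
SECRET_PATTERNS = {
    # AWS access key IDs: "AKIA" (long-term) or "ASIA" (temporary) plus 16 upper-case alphanumerics.
    "aws_access_key_id": re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b"),
    # Upper-case env-style names that END in a credential word, then "=" or ":" and a value.
    # Requiring the word at the end avoids hits such as MAX_TOKENS=512.
    "credential_assignment": re.compile(
        r"\b[A-Z0-9_]*(?:SECRET|TOKEN|PASSWORD|API_KEY|ACCESS_KEY)\s*[=:]\s*\S+"
    ),
}


def scan_text(text):
    """Return (line_number, rule_name) pairs for lines that look like they hold a secret."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, name))
    return findings
```

Matching only upper-case names is a deliberate trade-off. It misses lower-case keys but keeps ordinary training metrics such as tokens_per_second from flooding the results.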

Subdomain patterns matched

*tensorboard*, *tb-*, *wandb*, *weights-biases*, *experiment-*, *metrics-ml*

References

TensorBoard security considerations ↗

MCP Server

Model Context Protocol servers exposing tool definitions, system prompts, and data connectors without auth.

Auth is optional in the protocol: a direct gateway to every connected tool

1 verified

1,740 total

What an attacker learns

Subdomains like mcp-server.acme.com, mcp-gateway.startup.io, or tool-server.fintech.ai host MCP (Model Context Protocol) endpoints, the emerging standard for connecting LLMs to external tools and data. An exposed MCP server reveals every tool the AI can call (filesystem, database, API, browser), the system prompt governing its behavior, and the authentication tokens it uses to access internal services. Because MCP servers are designed to be plugged into Claude, ChatGPT, Cursor, and other AI clients, they often ship with zero authentication. The specification makes authorization optional: it defines an OAuth-based flow for HTTP transports but does not require it, and many servers simply leave it out.

Attacker playbook

After the 'initialize' handshake, send JSON-RPC 2.0 method 'tools/list' to enumerate every available tool.
Call 'resources/list' to discover all connected data sources (databases, file systems, APIs).
Execute tools directly via 'tools/call'. If the server has filesystem tools, read arbitrary files.
Retrieve prompt templates, which often embed system prompts, via 'prompts/list' and 'prompts/get'.
Pivot: use database or API tools to access internal systems the MCP server is authorized to reach.
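If you run an MCP server, you can check your own endpoint by reproducing the first steps of this playbook. The sketch below only builds the JSON-RPC 2.0 messages. It follows the MCP lifecycle: an 'initialize' request, then the 'notifications/initialized' notification, then the list calls. Sending the messages depends on the transport (stdio, or HTTP with its session handling) and is left out. The protocol-version string and client name are assumptions; use the revision your server advertises.

```python
import json


def rpc_request(req_id, method, params=None):
    """A JSON-RPC 2.0 request: carries an id and expects a response."""
    msg = {"jsonrpc": "2.0", "id": req_id, "method": method}
    if params is not None:
        msg["params"] = params
    return msg


def rpc_notification(method):
    """A JSON-RPC 2.0 notification: no id, so no response is expected."""
    return {"jsonrpc": "2.0", "method": method}


def audit_sequence(protocol_version="2025-03-26", client_name="self-audit"):
    """Messages a client sends, in order, to see what an MCP server discloses."""
    return [
        rpc_request(1, "initialize", {
            "protocolVersion": protocol_version,  # assumed spec revision
            "capabilities": {},
            "clientInfo": {"name": client_name, "version": "0.1.0"},
        }),
        rpc_notification("notifications/initialized"),
        rpc_request(2, "tools/list"),
        rpc_request(3, "resources/list"),
        rpc_request(4, "prompts/list"),
    ]


if __name__ == "__main__":
    for message in audit_sequence():
        print(json.dumps(message))
```

If your server answers these requests without credentials, it is exposed in the sense described above.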

Blast radius

An exposed MCP server is a direct gateway into every system the AI assistant can access. This typically includes internal databases, code repositories, CRM systems, and cloud APIs — all reachable through the tool layer without additional authentication. The attacker inherits the full privilege set of the MCP server's service account.

Remediation

Never expose MCP servers on public networks — bind to localhost or a private VPC only.
Implement authentication middleware (OAuth 2.0 / API key) before the MCP transport layer.
Apply least-privilege to the MCP server's service account.
Audit and restrict the tool catalogue: disable filesystem, shell, and database tools in production.
Log every tool invocation with full input/output to your SIEM for forensic analysis.
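Authentication can sit in front of the MCP transport as middleware. This is a minimal sketch, assuming the server is served as an ASGI application over HTTP, as Starlette-based Python servers are. The class name, the single static token, and the log message are illustrative choices. A production deployment would more likely terminate OAuth 2.0 at a reverse proxy. The middleware rejects requests that lack the expected bearer token before they reach the MCP app, compares tokens in constant time with hmac.compare_digest, and logs each rejection. status_for is a small hypothetical helper for exercising the middleware without a server.

```python
import asyncio
import hmac
import logging

log = logging.getLogger("mcp.auth")


class BearerAuthMiddleware:
    """ASGI middleware: only requests carrying the expected bearer token reach the MCP app."""

    def __init__(self, app, expected_token: str):
        if not expected_token:
            raise ValueError("refusing to start without a token")
        self.app = app
        self._expected = expected_token.encode()

    async def __call__(self, scope, receive, send):
        if scope["type"] != "http":  # e.g. lifespan events pass straight through
            await self.app(scope, receive, send)
            return
        headers = dict(scope.get("headers") or [])  # ASGI header names are lower-case bytes
        scheme, _, token = headers.get(b"authorization", b"").partition(b" ")
        if scheme.lower() == b"bearer" and hmac.compare_digest(token, self._expected):
            await self.app(scope, receive, send)
            return
        client = scope.get("client") or ("?", 0)
        log.warning("rejected unauthenticated MCP request from %s", client[0])
        await send({
            "type": "http.response.start",
            "status": 401,
            "headers": [(b"content-type", b"text/plain"),
                        (b"www-authenticate", b'Bearer realm="mcp"')],
        })
        await send({"type": "http.response.body", "body": b"unauthorized\n"})


def status_for(app, headers):
    """Drive one fake HTTP request through an ASGI app and return the response status."""
    sent = []

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        sent.append(message)

    asyncio.run(app({"type": "http", "headers": headers}, receive, send))
    return sent[0]["status"]
```

Wrapping the existing app as BearerAuthMiddleware(app, token) keeps the MCP code unchanged. Per-tool invocation logging for the SIEM would still belong inside the server, where tool names and arguments are visible.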

Subdomain patterns matched

*mcp-server*, *mcp-proxy*, *mcp-gateway*, *model-context*, *tool-server*, *mcp-bridge*

References

Full detection-pattern catalogue (97 patterns across 14 risk categories)

EchelonGraph monitors Certificate Transparency and Shodan banner-grab data for the following subdomain conventions. Each pattern is a literal substring match against freshly issued certificate hostnames, and a single hostname can match multiple categories. Publishing this list keeps the methodology auditable: anyone with access to crt.sh can reproduce our detection logic.

RAG Pipeline (7)

*rag*, *retrieval*, *knowledge*, *kb-*, *eval-*, *rag-customer*, *rag-sales*

Vector Database (6)

*milvus*, *qdrant*, *chroma*, *weaviate*, *vector-db*, *pinecone*

LLM Proxy (10)

*gpt-*, *claude-*, *anthropic-*, *openai-*, *openai.*, *llm-proxy*, *llm-gateway*, *llm-api*, *ai-proxy*, *model-proxy*

Model Registry (6)

*mlflow*, *ray*, *model-registry*, *registry*, *training*, *fine-tune*

AI Agent (8)

*ai-agent*, *langserve*, *langchain*, *autogen*, *agent-orchestrator*, *ai-copilot*, *agentic*, *crewai*

Prompt Cache (4)

*cache*, *embeddings-gateway*, *prompt-cache*, *redis-llm*

Notebook Server (7)

*jupyter*, *jupyterhub*, *notebook*, *nb-*, *jupyter-lab*, *lab-ds*, *ipynb*

AI Workflow Builder (9)

*openwebui*, *open-webui*, *flowise*, *dify*, *n8n-ai*, *ai-workflow*, *ai-builder*, *chat-ui*, *chatbot-*

Inference Server (10)

*torchserve*, *tf-serving*, *inference-*, *model-serve*, *serving-*, *prediction-*, *vllm*, *tgi-*, *hf-inference*, *text-gen*

ML Pipeline (6)

*airflow*, *dag-*, *ml-pipeline*, *data-pipeline*, *pipeline-*, *orchestrator-*

Data Annotation (6)

*label-studio*, *labelstudio*, *argilla*, *annotation-*, *labeling-*, *annotate-*

Model Storage (6)

*minio*, *minio-*, *artifacts-*, *model-artifacts*, *model-storage*, *ml-storage*

Experiment Dashboard (6)

*tensorboard*, *tb-*, *wandb*, *weights-biases*, *experiment-*, *metrics-ml*

MCP Server (6)

*mcp-server*, *mcp-proxy*, *mcp-gateway*, *model-context*, *tool-server*, *mcp-bridge*
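Because every pattern is a literal substring, the matching step reduces to a nested containment check. The sketch below copies five of the fourteen categories verbatim from the catalogue. Lower-casing and stripping a trailing dot are assumptions about hostname normalization, not documented behavior. The sketch also shows why the verification step exists: plain substring matching flags model-storage.corp.ai as a RAG pipeline as well, because "storage" contains "rag".

```python
# Five of the fourteen published categories, copied verbatim from the catalogue.
CATALOGUE = {
    "RAG Pipeline": ["rag", "retrieval", "knowledge", "kb-", "eval-", "rag-customer", "rag-sales"],
    "Vector Database": ["milvus", "qdrant", "chroma", "weaviate", "vector-db", "pinecone"],
    "Model Storage": ["minio", "minio-", "artifacts-", "model-artifacts", "model-storage", "ml-storage"],
    "Experiment Dashboard": ["tensorboard", "tb-", "wandb", "weights-biases", "experiment-", "metrics-ml"],
    "MCP Server": ["mcp-server", "mcp-proxy", "mcp-gateway", "model-context", "tool-server", "mcp-bridge"],
}


def categorize(hostname):
    """Return every category whose literal substring patterns occur in the hostname."""
    host = hostname.lower().rstrip(".")  # normalization here is an assumption
    return sorted(
        category
        for category, patterns in CATALOGUE.items()
        if any(pattern in host for pattern in patterns)
    )
```

For example, categorize("mcp-server.acme.com") returns ["MCP Server"], while categorize("model-storage.corp.ai") returns ["Model Storage", "RAG Pipeline"].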

This is not theoretical

Documented shadow-AI incidents — 2023-2025

Every incident below is sourced to a primary advisory, vendor disclosure, or published research. The radar surfaces the same class of exposures continuously — these are the public moments where the gap became unmissable.

Apr 2023 · high · LLM-PROXY

Samsung engineers paste source code + meeting notes into ChatGPT (3 incidents)

Three separate incidents within 20 days where Samsung Semiconductor engineers pasted proprietary source code, hardware test scripts, and internal meeting transcripts into ChatGPT — under OpenAI's then-default training-data retention policy. Triggered Samsung's company-wide ban on generative-AI tools.

Read primary source ↗

Nov 2023 · critical · MODEL-REG

ShadowRay — CVE-2023-48022 — thousands of Ray clusters unauthenticated

Anyscale Ray's dashboard endpoint defaults to no authentication. Security researchers found thousands of public Ray clusters via CT-log + Shodan enumeration; the same dashboard accepts job-submission API calls leading to RCE on the head node. Documented attacks deployed cryptominers across enterprise GPU fleets.

Read primary source ↗

Feb 2024 · high · AI-AGENT

Air Canada chatbot hallucinated refund policy — tribunal orders airline to honor it

A grieving passenger relied on the chatbot on Air Canada's website, which described a bereavement-fare refund policy that did not exist. The British Columbia Civil Resolution Tribunal ruled that Air Canada is responsible for what its chatbot says. It became the first major precedent on legal accountability for shadow-AI outputs.

Read primary source ↗

Mar 2024 · critical · MODEL-REG

MLflow path traversal — CVE-2024-1483 — model-artifact exfiltration

Unauthenticated MLflow tracking servers exposed to the internet allowed `GET /model-versions/get-artifact?path=../../etc/passwd` style requests to traverse the host filesystem. CVSS 9.8. Affected every MLflow ≤ 2.10.0 — including ~3000 instances publicly discoverable via Shodan.

Read primary source ↗

May 2024 · high · MODEL-STORE

Hugging Face dataset-poisoning campaigns + token-theft via malicious models

Researchers found dozens of malicious `pickle`-format models on Hugging Face Hub that executed code on load — stealing Hugging Face tokens, AWS credentials, and SSH keys from the data-scientist's machine. Adjacent to ongoing dataset-poisoning research on the same platform.

Read primary source ↗

Aug 2024 · high · INFERENCE-SRV

Ollama / vLLM exposure research — thousands of inference servers without auth

Wiz Research published a sweep of public Ollama and vLLM endpoints; both inference frameworks default to no authentication. Attackers can enumerate loaded models, send arbitrary prompts (burning customer GPU budget), and in some configurations remote-execute via known model-load CVEs.

Read primary source ↗

Jan 2025 · critical · VECTOR-DB

DeepSeek API exposure — production keys + database publicly accessible

Wiz Research discovered a fully open ClickHouse database belonging to DeepSeek, the Chinese LLM provider, exposed on a default port without authentication. Contained API keys, backend logs, and chat-history metadata. Highlighted that even foundation-model labs ship shadow infrastructure.

Read primary source ↗

Mar 2025 · critical · NOTEBOOK

Jupyter notebook servers exposed without auth — recurring discovery pattern

Multiple security teams (Tenable, Censys, GreyNoise) report 4000+ Jupyter notebook servers reachable on the public internet without authentication. Each one provides interactive Python execution on the host, exposing any credentials in environment variables, AWS/GCP tokens, and internal data files.

Read primary source ↗

Apr 2025 · medium · LLM-PROXY

Anthropic API + Claude-app log indexing leak (research write-ups)

Independent security researchers documented multiple cases of Claude.ai sessions and Anthropic API logs surfacing in public search-engine indexes via misconfigured analytics or shared-URL endpoints. Anthropic remediated promptly; the case underscored that LLM-front-end deployments need the same indexing hygiene as any web app.

Read primary source ↗

Continuous · critical · VECTOR-DB

Verified live vector-database exposures on this radar

Banner-grab telemetry from this very platform confirms thousands of Milvus, Qdrant, Chroma, Weaviate, and Ollama instances accepting anonymous queries — discoverable via the same Certificate-Transparency feed the Shadow AI Radar polls. See the live ticker on this page or the AI Threat Map for current counts.

View live data →

Methodology

We publish the methodology so the dataset is independently verifiable. Anyone with access to Certificate Transparency and Shodan can reproduce our detection logic.

1. Data sources

Two public sources, no scraping, no proprietary signals:

crt.sh — the public Certificate Transparency search index. Polled every 60 seconds.
Shodan — banner-grab enrichment (ASN, country, product). Polled every 6 hours.
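A single crt.sh poll can be sketched with the Python standard library. Several details are assumptions to check against the live service: the query form (a q parameter with % as the wildcard, plus output=json) and the name_value field, which lists a certificate's names one per line. Broad substring queries against crt.sh can be slow or time out. The 60-second loop, error backoff, and de-duplication of already-seen entries are left out, and the User-Agent string is a placeholder.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

CRT_SH = "https://crt.sh/"


def crtsh_url(pattern):
    """crt.sh query URL for names containing `pattern`; '%' acts as the wildcard."""
    return CRT_SH + "?" + urlencode({"q": f"%{pattern}%", "output": "json"})


def hostnames_from_crtsh(body):
    """Unique, lower-cased hostnames from a crt.sh JSON response body (str or bytes)."""
    names = set()
    for entry in json.loads(body):
        # name_value holds every name on the certificate, one per line
        for name in entry.get("name_value", "").splitlines():
            name = name.strip().lower()
            if name:
                names.add(name)
    return sorted(names)


def fetch_hostnames(pattern, timeout=60):
    """One poll; a real poller would loop, back off on errors, and skip seen entries."""
    request = Request(crtsh_url(pattern), headers={"User-Agent": "ct-poll-sketch/0.1"})
    with urlopen(request, timeout=timeout) as response:
        return hostnames_from_crtsh(response.read())
```

Keeping parsing separate from fetching lets you test the parser offline against a saved response, and feed its output into a pattern matcher such as the substring check described in the catalogue.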

2. Detection patterns

97 literal substring patterns matched against newly issued certificate hostnames, grouped into the 14 risk categories shown above. See the full detection-pattern catalogue.

3. Verification

Every observation passes through a 6-state liveness probe so we never publish unverified false positives as “exposures”:

unverified — newly observed, probe pending
active — HTTPS 200 + category-specific deep probe matched
authenticated — login wall detected
resolved — DNS resolves but the deep probe didn't match — keeps the row, demotes from “active”
unreachable — DNS / TCP failed; cert was issued but the service was never reachable from a public network
rechecking — user-triggered re-verification in flight

Category-specific deep probes hit endpoints like Jupyter /api/kernels, Ollama /api/tags, Weaviate /v1/meta, MLflow /api/2.0/mlflow/experiments/list, MinIO /api/v1/buckets, MCP tools/list — full list in code.
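The six states can be read as a small decision function. The sketch below shows one plausible ordering of the checks, not EchelonGraph's implementation. ProbeResult and its fields are hypothetical, and treating HTTP 401/403 as a login wall is an assumption. The category-specific deep probe (for example, whether Ollama's /api/tags returned a model list) is reduced to a single boolean.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class State(Enum):
    UNVERIFIED = "unverified"
    ACTIVE = "active"
    AUTHENTICATED = "authenticated"
    RESOLVED = "resolved"
    UNREACHABLE = "unreachable"
    RECHECKING = "rechecking"


@dataclass
class ProbeResult:
    """Hypothetical summary of one liveness probe."""
    dns_ok: bool
    tcp_ok: bool
    http_status: Optional[int]
    login_wall: bool
    deep_probe_matched: bool


def classify(result: Optional[ProbeResult], recheck_in_flight: bool = False) -> State:
    """Map a probe outcome to one of the six published states."""
    if recheck_in_flight:
        return State.RECHECKING
    if result is None:  # observed in CT, not yet probed
        return State.UNVERIFIED
    if not (result.dns_ok and result.tcp_ok):
        return State.UNREACHABLE
    if result.login_wall or result.http_status in (401, 403):
        return State.AUTHENTICATED
    if result.http_status == 200 and result.deep_probe_matched:
        return State.ACTIVE
    return State.RESOLVED  # reachable, but the deep probe did not confirm the service
```

Defaulting to RESOLVED rather than ACTIVE when the deep probe is inconclusive matches the stated goal of never publishing unconfirmed exposures as "active".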

4. What we do not claim

Detection ≠ compromise. The radar surfaces exposed AI services; it does not assert any service was breached. Owners may have intentional public access (research demos, public model playgrounds, educational workshops). Treat observations as starting points for verification, not as confirmed vulnerabilities.

Citable in audit evidence

How shadow-AI exposure maps to published compliance frameworks

Shadow AI Detection

MITRE ATLAS — AML.T0014

MITRE ATLAS catalogues adversary techniques targeting ML systems. AML.T0014, "Discover ML Model Family", covers the reconnaissance step the Shadow AI Radar surfaces.

Read primary source ↗

Supply-Chain Vulnerabilities

OWASP LLM Top 10 — LLM03:2025

LLM03:2025 (Supply Chain) covers third-party and shadow components in the LLM supply chain; the 2023 edition listed the same risk as LLM05. Exposed MLflow registries, vector databases, and Jupyter notebooks are textbook LLM03 instances.

Read primary source ↗

Risk monitoring

NIST AI-RMF — MS-2.1

The MEASURE function (MS-2) of the NIST AI Risk Management Framework calls for ongoing evaluation and monitoring of AI system risks. Shadow-AI exposure is a measurable, external-attack-surface risk.

Read primary source ↗

Risk management + accuracy/robustness

EU AI Act — Articles 9 + 15

High-risk AI systems must implement a risk management system (Art. 9) and achieve appropriate levels of accuracy, robustness, and cybersecurity (Art. 15). Exposed inference services are direct evidence of inadequate controls.

Read primary source ↗

Interested-parties disclosure

ISO/IEC 42001 — §A.8

ISO 42001 Annex A.8 requires organisations to manage information disclosure to interested parties. Subdomain leaks via CT logs are an unmanaged disclosure pathway.

Read primary source ↗

Cite this page

APA

EchelonGraph. (2026). Shadow AI Radar — Live Exposed AI Infrastructure. Retrieved from https://echelongraph.io/shadow-ai-radar

BibTeX

@misc{echelongraph_shadow_ai_radar,
  author = {EchelonGraph},
  title  = {Shadow AI Radar — Live Exposed AI Infrastructure},
  year   = {2026},
  url    = {https://echelongraph.io/shadow-ai-radar}
}

Press contact

[email protected]

Responsible disclosure

disclosure@echelongraph.io

Data license

CC-BY-4.0

Public API

/api/v1/public/shadow-ai-radar

Seeing this scanner in your logs? It's us. Like Googlebot, every genuine EchelonGraph request announces itself with the User-Agent EchelonGraph-<Radar>/1.0 (+echelongraph.io/responsible-disclosure; [email protected]) and a From: [email protected] header. Each request is a single, passive, read-only check: we never log in, exploit, write, or read your stored data. Genuine requests also carry a signed receipt you can validate at /verify-scan. Who we are, how we confirm exposures read-only, and how to opt out are described on our responsible-disclosure page.
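To find these requests in your own access logs, you can filter on the User-Agent. The sketch below assumes the nginx/Apache "combined" log format, where the User-Agent is the last quoted field. It also treats <Radar> as a placeholder for a component name, so the regex accepts any alphanumeric component. A User-Agent header is trivially spoofed, so a match only tells you a request claims to be EchelonGraph. The signed receipt checked at /verify-scan is the stronger signal.

```python
import re

# nginx/Apache "combined" format: ip ident user [time] "request" status size "referer" "agent"
COMBINED = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)
# "EchelonGraph-<component>/<version>"; treating <Radar> as a component name is an assumption.
ECHELON_UA = re.compile(r"^EchelonGraph-(?P<component>[A-Za-z0-9_-]+)/(?P<version>\d+(?:\.\d+)*)")


def echelon_hits(lines):
    """Return (client_ip, request_line, component) for log lines whose UA claims EchelonGraph."""
    hits = []
    for line in lines:
        entry = COMBINED.match(line)
        if not entry:
            continue  # not combined format; skip rather than guess
        ua = ECHELON_UA.match(entry.group("agent"))
        if ua:
            hits.append((entry.group("ip"), entry.group("request"), ua.group("component")))
    return hits
```

Running echelon_hits over an access log gives a short list of endpoints that were probed, which you can compare against what the radar reports for your hostnames.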