Graph Data Model
AgentHound builds a directed trust graph in Neo4j. The core principle: edges represent exploitable relationships and direction follows the flow of access and control.
The fundamental traversal pattern: an attacker (or a compromised agent) moves along edge direction to escalate access. Shortest-path and weighted-path queries over this graph surface attack paths that cross protocol boundaries, including MCP-to-A2A paths that single-protocol scanners cannot see.
1. Node Types
Collector-Produced (22 kinds)
These are the node kinds accepted in ingest input (`sdk/ingest.AllowedNodeKinds`).
| Label | Source | Key Properties |
|---|---|---|
| `MCPServer` | Config + MCP | name, endpoint, transport (stdio/http), auth_method, protocol_version, instructions, capabilities, is_pinned, has_tasks_capability |
| `MCPTool` | MCP | name, description, input_schema, output_schema, annotations, description_hash (SHA-256), capability_surface[], has_injection_patterns, has_cross_references |
| `MCPResource` | MCP | uri, name, mime_type, size, uri_scheme, sensitivity (auto-classified) |
| `MCPPrompt` | MCP | name, description, arguments |
| `A2AAgent` | A2A | name, description, url, provider, version, protocol_versions, capabilities, security_schemes, auth_method, is_signed, signature_valid, card_hash |
| `A2ASkill` | A2A | id, name, description, input_modes, output_modes, description_hash, has_injection_patterns |
| `AgentInstance` | Config | name, framework, config_path |
| `Identity` | Config + MCP | type (none/apiKey/oauth/bearer/mtls), scope, is_static |
| `Credential` | Config + LiteLLM Looter | type (envVar/hardcoded/vaultRef/inputPrompt/master_key/apiKey/virtual_key), name, source, is_exposed, high_entropy, value_hash (SHA-256) |
| `Host` | Config + A2A | hostname, ip, is_local, is_private, is_public |
| `ConfigFile` | Config | path, client, server_count |
| `InstructionFile` | Config | path, type (agents.md/claude.md/cursorrules/copilot-instructions/memory.md), hash, is_suspicious |
| `OllamaInstance` | Network scan + Ollama fingerprinter | endpoint, version, auth_method, is_anonymous_loot, discovered_via |
| `VLLMInstance` | Network scan + vLLM fingerprinter | endpoint, version, auth_method, is_anonymous_loot |
| `QdrantInstance` | Network scan + Qdrant fingerprinter | endpoint, version, collection_count |
| `MLflowServer` | Network scan + MLflow fingerprinter | endpoint, version, experiment_count |
| `LiteLLMGateway` | Network scan + LiteLLM fingerprinter | endpoint, auth_method, is_anonymous_loot, docs_enabled |
| `JupyterServer` | Network scan + Jupyter fingerprinter | endpoint, version, token_required |
| `LangServeApp` | Network scan + LangServe fingerprinter | endpoint, chains |
| `OpenWebUIInstance` | Network scan + Open WebUI fingerprinter | endpoint, version, webui_auth_enabled |
| `AIService` | Multi-label umbrella (see below) | (no unique properties; carried as a companion label) |
| `AIModel` | Ollama Looter | name, size, digest, family, parameter_size, quantization |
Synthetic (2 kinds, post-processor created)
These labels exist in `AllNodeLabels` but NOT in `AllowedNodeKinds`; collectors cannot emit them.
| Label | Source | Key Properties |
|---|---|---|
| `ResourceGroup` | Post-processor | type, sensitivity |
| `TrustZone` | Post-processor | name, level, node_count |
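As a rough illustration of the split between the two label sets, the Go sketch below hard-codes both tables and rejects any kind a collector is not allowed to emit. The map names `allowedNodeKinds` and `syntheticNodeLabels` and both helper functions are hypothetical; the real `sdk/ingest` definitions may be shaped differently.

```go
package main

// allowedNodeKinds is a hypothetical copy of the 22 collector-produced kinds
// (the analogue of sdk/ingest.AllowedNodeKinds).
var allowedNodeKinds = map[string]bool{
	"MCPServer": true, "MCPTool": true, "MCPResource": true, "MCPPrompt": true,
	"A2AAgent": true, "A2ASkill": true, "AgentInstance": true, "Identity": true,
	"Credential": true, "Host": true, "ConfigFile": true, "InstructionFile": true,
	"OllamaInstance": true, "VLLMInstance": true, "QdrantInstance": true,
	"MLflowServer": true, "LiteLLMGateway": true, "JupyterServer": true,
	"LangServeApp": true, "OpenWebUIInstance": true, "AIService": true,
	"AIModel": true,
}

// syntheticNodeLabels are created only by post-processors.
var syntheticNodeLabels = map[string]bool{"ResourceGroup": true, "TrustZone": true}

// allNodeLabels returns the union of both sets (the analogue of AllNodeLabels).
func allNodeLabels() map[string]bool {
	all := make(map[string]bool, len(allowedNodeKinds)+len(syntheticNodeLabels))
	for k := range allowedNodeKinds {
		all[k] = true
	}
	for k := range syntheticNodeLabels {
		all[k] = true
	}
	return all
}

// firstDisallowedKind returns the first kind a collector may not emit,
// or "" if every kind is allowed.
func firstDisallowedKind(kinds []string) string {
	for _, k := range kinds {
		if !allowedNodeKinds[k] {
			return k
		}
	}
	return ""
}
```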
2. Umbrella Labels
The AIService label is a multi-label companion, not a standalone node kind. Every per-service node (OllamaInstance, VLLMInstance, QdrantInstance, MLflowServer, LiteLLMGateway, JupyterServer, LangServeApp, OpenWebUIInstance) also carries :AIService as a secondary Neo4j label.
This enables queries like MATCH (n:AIService) to find all AI infrastructure regardless of specific service type, while per-kind queries (MATCH (n:OllamaInstance)) still work.
Schema constraint implication: The schema-init loop skips labels in UmbrellaLabels when creating objectid IS UNIQUE constraints. A uniqueness constraint on :AIService would falsely collide between distinct service kinds.
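A minimal sketch of both behaviors, assuming the umbrella set contains only `AIService`: per-service nodes receive the companion label, and the constraint loop skips it. `serviceKinds`, `umbrellaLabels`, `nodeLabels`, and `constraintLabels` are illustrative names, not SDK APIs.

```go
package main

import "sort"

// serviceKinds are the per-service kinds that also carry :AIService.
var serviceKinds = map[string]bool{
	"OllamaInstance": true, "VLLMInstance": true, "QdrantInstance": true,
	"MLflowServer": true, "LiteLLMGateway": true, "JupyterServer": true,
	"LangServeApp": true, "OpenWebUIInstance": true,
}

// umbrellaLabels stands in for UmbrellaLabels; assumed to hold only AIService.
var umbrellaLabels = map[string]bool{"AIService": true}

// nodeLabels returns the Neo4j labels for a node of the given kind.
func nodeLabels(kind string) []string {
	if serviceKinds[kind] {
		return []string{kind, "AIService"}
	}
	return []string{kind}
}

// constraintLabels returns, sorted, the labels that get an objectid
// uniqueness constraint: every label except the umbrella labels.
func constraintLabels(labels []string) []string {
	out := make([]string, 0, len(labels))
	for _, l := range labels {
		if !umbrellaLabels[l] {
			out = append(out, l)
		}
	}
	sort.Strings(out)
	return out
}
```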
3. Edge Types
Raw Edges (16 collector-produced)
| Edge | Source | Target | Collector | Meaning |
|---|---|---|---|---|
| `TRUSTS_SERVER` | AgentInstance | MCPServer | Config | Agent trusts this server to provide tools |
| `PROVIDES_TOOL` | MCPServer | MCPTool | MCP | Server exposes this tool |
| `PROVIDES_RESOURCE` | MCPServer / JupyterServer | MCPResource | MCP / Jupyter Looter | Server exposes this resource |
| `PROVIDES_PROMPT` | MCPServer | MCPPrompt | MCP | Server exposes this prompt template |
| `ADVERTISES_SKILL` | A2AAgent | A2ASkill | A2A | Agent advertises this skill |
| `DELEGATES_TO` | A2AAgent | A2AAgent | A2A | Agent delegates tasks to another agent |
| `AUTHENTICATES_WITH` | MCPServer / A2AAgent | Identity | Config / A2A | Entity uses this auth identity |
| `USES_CREDENTIAL` | Identity | Credential | Config | Identity backed by this credential material |
| `RUNS_ON` | MCPServer / A2AAgent | Host | Config / A2A | Entity runs on this host |
| `CONFIGURED_IN` | MCPServer | ConfigFile | Config | Server defined in this config file |
| `HAS_ENV_VAR` | MCPServer | Credential | Config | Server has access to this env var |
| `LOADS_INSTRUCTIONS` | AgentInstance | InstructionFile | Config | Agent loads this instruction file |
| `SAME_AUTH_DOMAIN` | A2AAgent | A2AAgent | A2A | Agents share an authentication domain |
| `EXPOSES` | AIService | AIService | Fingerprinters | Service exposes another service (e.g., Open WebUI → Ollama backend) |
| `EXPOSES_CREDENTIAL` | AIService | Credential | LiteLLM Looter | Service exposes credential material (master keys, upstream provider keys, virtual keys) |
| `PROVIDES_MODEL` | OllamaInstance | AIModel | Ollama Looter | Instance serves this model |
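When an edge kind accepts more than one source or target kind, the ingest side needs a way to check endpoints. A hypothetical excerpt of such an endpoint map, transcribed from a few rows of the table above, could look like this; the `endpoint` type and the helpers are assumptions, not the SDK's data structure.

```go
package main

// endpoint lists the node kinds allowed at each end of a raw edge kind.
type endpoint struct {
	sources []string
	targets []string
}

// rawEdgeEndpoints is a hypothetical excerpt transcribed from the table.
var rawEdgeEndpoints = map[string]endpoint{
	"TRUSTS_SERVER":      {[]string{"AgentInstance"}, []string{"MCPServer"}},
	"PROVIDES_RESOURCE":  {[]string{"MCPServer", "JupyterServer"}, []string{"MCPResource"}},
	"AUTHENTICATES_WITH": {[]string{"MCPServer", "A2AAgent"}, []string{"Identity"}},
	"RUNS_ON":            {[]string{"MCPServer", "A2AAgent"}, []string{"Host"}},
	"PROVIDES_MODEL":     {[]string{"OllamaInstance"}, []string{"AIModel"}},
}

func contains(list []string, s string) bool {
	for _, v := range list {
		if v == s {
			return true
		}
	}
	return false
}

// validEdge checks an edge's endpoint kinds against the map.
func validEdge(kind, sourceKind, targetKind string) bool {
	ep, ok := rawEdgeEndpoints[kind]
	return ok && contains(ep.sources, sourceKind) && contains(ep.targets, targetKind)
}

// needsKindHints reports whether source_kind/target_kind should be set
// because either end of the edge kind admits more than one node kind.
func needsKindHints(kind string) bool {
	ep := rawEdgeEndpoints[kind]
	return len(ep.sources) > 1 || len(ep.targets) > 1
}
```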
Composite Edges (8 post-processor computed)
| Edge | Source | Target | Depends On | Meaning |
|---|---|---|---|---|
| `HAS_ACCESS_TO` | MCPTool | MCPResource | Raw edges | Capability surface matches resource URI scheme |
| `CAN_EXECUTE` | MCPTool | Host | Raw edges | Tool has shell_access or code_execution capability |
| `SHADOWS` | MCPTool | MCPTool | Raw edges | Tool on another server references this tool's name/description |
| `POISONED_DESCRIPTION` | MCPTool | MCPTool (self-edge) | Raw edges | Tool description contains injection patterns |
| `POISONED_INSTRUCTIONS` | InstructionFile | InstructionFile (self-edge) | Raw edges | Suspicious patterns: imperative overrides, exfiltration commands, hidden Unicode |
| `CAN_REACH` | AgentInstance / A2AAgent | MCPResource | HAS_ACCESS_TO | Transitive access through trust chain (includes cross-protocol and credential chain variants up to 6 hops) |
| `CAN_EXFILTRATE_VIA` | AgentInstance | MCPTool | CAN_REACH | Agent reaches sensitive data AND has outbound exfiltration channel |
| `CAN_IMPERSONATE` | A2AAgent | A2AAgent | Raw edges | TF-IDF cosine similarity > 0.8 on skill descriptions |
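The `CAN_IMPERSONATE` rule can be sketched with a small TF-IDF implementation. This version assumes each agent's skill descriptions are concatenated into one document, tokenizes on whitespace, and uses a smoothed idf, ln((1+N)/(1+df)) + 1, so terms shared by every agent still carry weight. The function names are hypothetical, and AgentHound's actual tokenizer and weighting are not specified here.

```go
package main

import (
	"math"
	"strings"
)

// tfidf builds one sparse TF-IDF vector per document using raw term counts
// and a smoothed idf = ln((1+N)/(1+df)) + 1.
func tfidf(docs []string) []map[string]float64 {
	n := float64(len(docs))
	vecs := make([]map[string]float64, len(docs))
	df := map[string]float64{}
	for i, d := range docs {
		tf := map[string]float64{}
		for _, tok := range strings.Fields(strings.ToLower(d)) {
			tf[tok]++
		}
		for tok := range tf {
			df[tok]++ // count each term once per document
		}
		vecs[i] = tf
	}
	for _, v := range vecs {
		for tok, count := range v {
			v[tok] = count * (math.Log((1+n)/(1+df[tok])) + 1)
		}
	}
	return vecs
}

// cosine returns the cosine similarity of two sparse vectors (0 if either is zero).
func cosine(a, b map[string]float64) float64 {
	var dot, na, nb float64
	for k, v := range a {
		dot += v * b[k]
		na += v * v
	}
	for _, v := range b {
		nb += v * v
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// impersonationPairs returns index pairs (i < j) whose similarity exceeds threshold.
func impersonationPairs(descriptions []string, threshold float64) [][2]int {
	vecs := tfidf(descriptions)
	var pairs [][2]int
	for i := range vecs {
		for j := i + 1; j < len(vecs); j++ {
			if cosine(vecs[i], vecs[j]) > threshold {
				pairs = append(pairs, [2]int{i, j})
			}
		}
	}
	return pairs
}
```

Calling `impersonationPairs(descs, 0.8)` applies the documented threshold.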
Edge Struct (Go SDK)
```go
type Edge struct {
	Source     string         `json:"source"`
	Target     string         `json:"target"`
	Kind       string         `json:"kind"`
	SourceKind string         `json:"source_kind,omitempty"`
	TargetKind string         `json:"target_kind,omitempty"`
	Properties map[string]any `json:"properties"`
}
```
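For reference, this is how the struct's tags behave when an edge is serialized with `encoding/json`: empty `SourceKind`/`TargetKind` fields are dropped because of `omitempty`, while `properties` is always present. The `encodeEdge` helper is hypothetical.

```go
package main

import "encoding/json"

// Edge mirrors the SDK struct shown above.
type Edge struct {
	Source     string         `json:"source"`
	Target     string         `json:"target"`
	Kind       string         `json:"kind"`
	SourceKind string         `json:"source_kind,omitempty"`
	TargetKind string         `json:"target_kind,omitempty"`
	Properties map[string]any `json:"properties"`
}

// encodeEdge serializes an edge; empty SourceKind/TargetKind are omitted.
func encodeEdge(e Edge) (string, error) {
	b, err := json.Marshal(e)
	return string(b), err
}
```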
Edge Properties (all edges carry these)
| Property | Type | Description |
|---|---|---|
| `scan_id` | string | Scan that created/updated this edge |
| `last_seen` | ISO 8601 | Timestamp of last observation |
| `confidence` | float64 | 0.0–1.0 confidence score |
| `risk_weight` | float64 | Lower = easier to exploit (used by Dijkstra) |
| `is_composite` | bool | True for post-processed edges |
| `evidence` | string | Human-readable explanation |
Composite edges additionally carry source_collector (mcp or a2a) for scoped stale-edge cleanup.
4. Node ID Strategy
All node IDs are deterministic, content-based SHA-256 hashes. This ensures identical entities from different collectors merge on the same Neo4j node.
| Node Kind | ID Computation |
|---|---|
| `MCPServer` | `SHA-256("MCPServer:" + transport + ":" + endpoint + ":" + sorted_args)` |
| `MCPTool` | `SHA-256("MCPTool:" + server_id + ":" + tool_name)` |
| `MCPResource` | `SHA-256("MCPResource:" + server_id + ":" + resource_uri)` |
| `MCPPrompt` | `SHA-256("MCPPrompt:" + server_id + ":" + prompt_name)` |
| `A2AAgent` | `SHA-256("A2AAgent:" + normalized_agent_base_url)` |
| `A2ASkill` | `SHA-256("A2ASkill:" + agent_id + ":" + skill_id)` |
| `AgentInstance` | `SHA-256("AgentInstance:" + config_file_id + ":" + client_name)` |
| `ConfigFile` | `SHA-256("ConfigFile:" + absolute_path)` |
| `Host` | `SHA-256("Host:" + hostname_or_ip)` |
| `Identity` | `SHA-256("Identity:" + parent_id + ":" + type)` |
| `Credential` | `SHA-256("Credential:" + source + ":" + name)` |
| `InstructionFile` | `SHA-256("InstructionFile:" + absolute_path)` |
| `AIModel` | `SHA-256("AIModel:" + instance_id + ":" + model_name)` |
Critical invariant: The MCPServer ID MUST match between Config Collector and MCP Collector outputs. This is the merge point connecting trust relationships (who trusts what) to capabilities (what a server exposes).
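A minimal Go sketch of the scheme for two kinds follows. It assumes colon-joined inputs, hex-encoded digests, a `sha256:` prefix as in the wire-format example, and comma-joined sorted arguments (the separator for `sorted_args` is not stated). Sorting presumably keeps the Config and MCP collectors in agreement on the `MCPServer` ID even if they list arguments in different orders. All function names are hypothetical.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"sort"
	"strings"
)

// sha256ID hashes the colon-joined parts and adds an assumed "sha256:" prefix.
func sha256ID(parts ...string) string {
	sum := sha256.Sum256([]byte(strings.Join(parts, ":")))
	return "sha256:" + hex.EncodeToString(sum[:])
}

// mcpServerID follows SHA-256("MCPServer:" + transport + ":" + endpoint + ":" + sorted_args).
// Joining the sorted args with "," is a hypothetical choice.
func mcpServerID(transport, endpoint string, args []string) string {
	sorted := append([]string(nil), args...) // copy so the caller's slice is untouched
	sort.Strings(sorted)
	return sha256ID("MCPServer", transport, endpoint, strings.Join(sorted, ","))
}

// mcpToolID follows SHA-256("MCPTool:" + server_id + ":" + tool_name).
func mcpToolID(serverID, toolName string) string {
	return sha256ID("MCPTool", serverID, toolName)
}
```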
5. Cross-Collector Merge via value_hash
The `value_hash` property on `Credential` nodes is the cross-collector merge primitive. It enables the `cross_service_credential_chain` post-processor to join credentials discovered independently by different collectors.
How it works:
- Config Collector emits a Credential node via `HAS_ENV_VAR` (MCP server → credential)
- LiteLLM Looter emits a Credential node via `EXPOSES_CREDENTIAL` (gateway → master/upstream/virtual keys)
- Both compute `value_hash = SHA-256(credential_value)` via `sdk/common.HashCredentialValue`
- Same secret value → same `value_hash`, so the two nodes can be joined regardless of how each collector derives its `objectid`
This is what enables attack paths like: AgentInstance → MCPServer → Credential ← LiteLLMGateway → upstream provider — proving that a local agent's environment variable is the same key that a LiteLLM gateway uses to reach an upstream LLM provider.
Requirement: Every v0.3+ Looter MUST populate value_hash on every emitted Credential node.
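The join can be pictured with the sketch below. `hashCredentialValue` stands in for `sdk/common.HashCredentialValue` and assumes a hex-encoded SHA-256 of the raw value; `Credential` and `joinByValueHash` are illustrative names. Only hashes are compared, so the matching step itself never needs the secret.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
)

// hashCredentialValue is a hypothetical stand-in for
// sdk/common.HashCredentialValue: hex-encoded SHA-256 of the raw secret.
func hashCredentialValue(value string) string {
	sum := sha256.Sum256([]byte(value))
	return hex.EncodeToString(sum[:])
}

// Credential is a minimal view of a Credential node.
type Credential struct {
	ObjectID  string
	Collector string
	ValueHash string
}

// joinByValueHash groups credentials that share a value_hash and keeps only
// groups with at least two members, i.e. the cross-collector matches.
func joinByValueHash(creds []Credential) map[string][]Credential {
	groups := map[string][]Credential{}
	for _, c := range creds {
		groups[c.ValueHash] = append(groups[c.ValueHash], c)
	}
	for h, g := range groups {
		if len(g) < 2 {
			delete(groups, h) // deleting during range is safe in Go
		}
	}
	return groups
}
```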
6. Post-Processor Execution Order
Processors run in strict dependency order. A processor may only read edges produced by earlier processors.
| Order | Processor | Produces | Dependencies |
|---|---|---|---|
| 1 | `has_access_to` | `HAS_ACCESS_TO` | Raw edges only |
| 2 | `can_execute` | `CAN_EXECUTE` | Raw edges only |
| 3 | `shadows` | `SHADOWS` | Raw edges only |
| 4 | `poisoned_description` | `POISONED_DESCRIPTION` | Raw edges only |
| 5 | `poisoned_instructions` | `POISONED_INSTRUCTIONS` | Raw edges only |
| 6 | `can_reach` | `CAN_REACH` | 1 (`HAS_ACCESS_TO`) |
| 7 | `cross_service_credential_chain` | `CAN_REACH` (credential variant) | 1, 6 (joins on `Credential.value_hash`) |
| 8 | `can_exfiltrate_via` | `CAN_EXFILTRATE_VIA` | 6 (`CAN_REACH`) |
| 9 | `can_impersonate` | `CAN_IMPERSONATE` | Raw edges only |
| 10 | `cross_protocol` | `CAN_REACH` (cross-protocol) | 1 (`HAS_ACCESS_TO`) + `DELEGATES_TO` |
| 11 | `risk_score` | Node property updates | 1–10 (all prior processors) |
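The ordering rule can be checked mechanically. The sketch below encodes the table as a slice and confirms that every dependency runs earlier; `processor`, `pipeline`, and `checkOrder` are hypothetical names. `DELEGATES_TO` is a raw edge rather than a processor output, so it does not appear as a dependency.

```go
package main

import "fmt"

// processor names one post-processor and the processors whose output it reads.
type processor struct {
	name string
	deps []string
}

// pipeline reproduces the order in the table above.
var pipeline = []processor{
	{"has_access_to", nil},
	{"can_execute", nil},
	{"shadows", nil},
	{"poisoned_description", nil},
	{"poisoned_instructions", nil},
	{"can_reach", []string{"has_access_to"}},
	{"cross_service_credential_chain", []string{"has_access_to", "can_reach"}},
	{"can_exfiltrate_via", []string{"can_reach"}},
	{"can_impersonate", nil},
	{"cross_protocol", []string{"has_access_to"}},
	{"risk_score", []string{
		"has_access_to", "can_execute", "shadows", "poisoned_description",
		"poisoned_instructions", "can_reach", "cross_service_credential_chain",
		"can_exfiltrate_via", "can_impersonate", "cross_protocol",
	}},
}

// checkOrder verifies that every dependency has run strictly earlier.
func checkOrder(ps []processor) error {
	seen := map[string]bool{}
	for _, p := range ps {
		for _, d := range p.deps {
			if !seen[d] {
				return fmt.Errorf("%s reads %s before it has run", p.name, d)
			}
		}
		seen[p.name] = true
	}
	return nil
}
```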
7. Risk Scoring
Edge Risk Weights
Lower weight = easier to exploit = higher risk. Used by Dijkstra weighted-path queries.
| Edge | Weight | Condition |
|---|---|---|
| `TRUSTS_SERVER` | 0.1 | auth_method = none |
| `TRUSTS_SERVER` | 0.3 | auth_method = static_key |
| `TRUSTS_SERVER` | 0.7 | auth_method = oauth |
| `TRUSTS_SERVER` | 0.9 | auth_method = mtls |
| `PROVIDES_TOOL` | 0.1 | Always (tools are always available once trusted) |
| `HAS_ACCESS_TO` | 0.2 | — |
| `CAN_EXECUTE` | 0.1 | — |
| `DELEGATES_TO` | 0.1 | auth_method = none |
| `DELEGATES_TO` | 0.5 | authenticated |
| `SHADOWS` | 0.4 | — |
| `CAN_IMPERSONATE` | 0.6 | — |
Node Risk Scores (0–100)
- Agent: `0.30 * credential + 0.25 * blast_radius + 0.20 * auth_posture + 0.15 * tool_surface + 0.10 * poisoning`
- Server: `0.35 * auth_strength + 0.25 * tool_risk + 0.20 * exposure + 0.20 * credential_handling`
- Tool: `0.30 * capability_class + 0.25 * poisoning + 0.25 * access_sensitivity + 0.20 * input_validation`
Resource Sensitivity Auto-Classification
| Pattern | Sensitivity |
|---|---|
| postgres/mysql/mongodb + prod | critical |
| `file:///etc/` | critical |
| `*.env`, `*.key`, `*.pem` | critical |
| redis + prod | critical |
| Database (non-prod) | high |
| `file:///` (general) | medium |
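One way to apply the table in order is shown below. Detecting `prod` by substring, treating redis as a database for the non-prod row, and returning `low` when nothing matches are all assumptions; the real classifier may use different rules, and `classifySensitivity` is a hypothetical name.

```go
package main

import "strings"

// dbSchemes are URI prefixes treated as databases (redis included by assumption).
var dbSchemes = []string{"postgres", "mysql", "mongodb", "redis"}

// classifySensitivity applies the table's patterns in order.
func classifySensitivity(uri string) string {
	u := strings.ToLower(uri)
	isDB := false
	for _, s := range dbSchemes {
		if strings.HasPrefix(u, s) {
			isDB = true
			break
		}
	}
	switch {
	case isDB && strings.Contains(u, "prod"):
		return "critical" // production database or redis
	case strings.HasPrefix(u, "file:///etc/"):
		return "critical"
	case strings.HasSuffix(u, ".env"), strings.HasSuffix(u, ".key"), strings.HasSuffix(u, ".pem"):
		return "critical" // secret-bearing file extensions
	case isDB:
		return "high" // non-production database
	case strings.HasPrefix(u, "file:///"):
		return "medium"
	default:
		return "low" // hypothetical fallback
	}
}
```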
8. Merge Semantics
Node Merge
Nodes merge by `objectid` using Cypher `MERGE`. When the same entity appears from multiple collectors:
- Properties use last-write-wins semantics
- `ON MATCH SET n.previous_description_hash = n.description_hash` preserves the old hash for rug-pull detection
- Edges accumulate (different collectors contribute different edge types to the same node)
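The merge behavior can be modeled in memory to make the ordering explicit: the old `description_hash` is copied before the incoming properties overwrite it. `mergeCypher` is an illustrative query in the shape the bullets describe, not necessarily the one AgentHound issues, and `mergeNode`/`rugPulled` are hypothetical helpers.

```go
package main

// mergeCypher is an illustrative query: ON MATCH SET runs as part of MERGE,
// before the trailing SET applies the incoming properties.
const mergeCypher = `MERGE (n:MCPTool {objectid: $objectid})
ON MATCH SET n.previous_description_hash = n.description_hash
SET n += $props`

// node is an in-memory stand-in for a Neo4j node's properties.
type node map[string]any

// mergeNode creates the node on first sight; on match it first preserves the
// old description_hash, then applies props with last-write-wins semantics.
func mergeNode(store map[string]node, objectID string, props map[string]any) {
	n, ok := store[objectID]
	if !ok {
		n = node{}
		store[objectID] = n
	} else if h, has := n["description_hash"]; has {
		n["previous_description_hash"] = h
	}
	for k, v := range props {
		n[k] = v
	}
}

// rugPulled reports whether a tool's description changed between observations.
func rugPulled(n node) bool {
	prev, ok := n["previous_description_hash"]
	return ok && prev != n["description_hash"]
}
```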
Stale Edge Cleanup
On partial scans (e.g., only the MCP collector ran), only composite edges whose source_collector matches the current scan's collector are deleted and recomputed. This prevents ping-pong deletion when collectors run independently on different schedules.
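In effect, cleanup filters composite edges by owner. In the hypothetical sketch below, an MCP-only scan removes the MCP-owned composite edges (so the post-processors can recompute them) while edges whose `source_collector` is `a2a` survive. `compositeEdge` and `keepAfterPartialScan` are illustrative names.

```go
package main

// compositeEdge is a minimal view of a post-processed edge.
type compositeEdge struct {
	Kind            string
	SourceCollector string
}

// keepAfterPartialScan drops only the composite edges owned by collectors that
// ran in this scan; edges owned by other collectors are left untouched.
func keepAfterPartialScan(edges []compositeEdge, ran map[string]bool) []compositeEdge {
	var kept []compositeEdge
	for _, e := range edges {
		if !ran[e.SourceCollector] {
			kept = append(kept, e)
		}
	}
	return kept
}
```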
Neo4j Version Compatibility
Schema init detects the Neo4j version via `CALL dbms.components()`:
- 4.4: Uses `CREATE CONSTRAINT ... ON (n:Label) ASSERT n.objectid IS UNIQUE`
- 5.x: Uses `CREATE CONSTRAINT ... FOR (n:Label) REQUIRE n.objectid IS UNIQUE`
9. Emitting Nodes and Edges (Module Author Guide)
New modules emit nodes and edges via the `sdk/ingest` wire format:
```json
{
  "meta": {
    "version": 1,
    "type": "agenthound-ingest",
    "collector": "mcp|a2a|config|scan",
    "collector_version": "0.1.0",
    "timestamp": "2025-01-15T10:30:00Z",
    "scan_id": "scan-abc123"
  },
  "graph": {
    "nodes": [{"id": "sha256:...", "kinds": ["MCPServer"], "properties": {...}}],
    "edges": [{"source": "sha256:...", "target": "sha256:...", "kind": "PROVIDES_TOOL", "properties": {...}}]
  }
}
```
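On the consuming side, the envelope maps naturally onto Go structs. The types below, and the check that `type` is `agenthound-ingest` with `version` 1, are assumptions drawn from the example above, not the SDK's published types.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Meta, Node, WireEdge, and IngestDoc are hypothetical mirrors of the wire format.
type Meta struct {
	Version          int    `json:"version"`
	Type             string `json:"type"`
	Collector        string `json:"collector"`
	CollectorVersion string `json:"collector_version"`
	Timestamp        string `json:"timestamp"`
	ScanID           string `json:"scan_id"`
}

type Node struct {
	ID         string         `json:"id"`
	Kinds      []string       `json:"kinds"`
	Properties map[string]any `json:"properties"`
}

type WireEdge struct {
	Source     string         `json:"source"`
	Target     string         `json:"target"`
	Kind       string         `json:"kind"`
	SourceKind string         `json:"source_kind,omitempty"`
	TargetKind string         `json:"target_kind,omitempty"`
	Properties map[string]any `json:"properties"`
}

type IngestDoc struct {
	Meta  Meta `json:"meta"`
	Graph struct {
		Nodes []Node     `json:"nodes"`
		Edges []WireEdge `json:"edges"`
	} `json:"graph"`
}

// parseIngest decodes a document and checks the envelope fields.
func parseIngest(data []byte) (*IngestDoc, error) {
	var doc IngestDoc
	if err := json.Unmarshal(data, &doc); err != nil {
		return nil, err
	}
	if doc.Meta.Type != "agenthound-ingest" || doc.Meta.Version != 1 {
		return nil, fmt.Errorf("unsupported ingest envelope: type=%q version=%d", doc.Meta.Type, doc.Meta.Version)
	}
	return &doc, nil
}
```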
Rules for module authors:
- Only emit node kinds in `AllowedNodeKinds` and edge kinds in `RawEdgeKinds`
- Compute deterministic `id` values per the Node ID Strategy above
- Populate `value_hash` on all `Credential` nodes (mandatory since v0.3)
- Set `source_kind`/`target_kind` on edges when the endpoint map has multiple valid sources/targets
- Use snake_case for all property keys (the normalizer converts camelCase, but emit clean data)
- Valid collectors: `mcp`, `a2a`, `config`, `scan`
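Several of these rules are easy to enforce before writing output. The validator below is a hypothetical sketch covering the collector name, snake_case keys, and `value_hash` on credentials; in particular, the snake_case regular expression is one reasonable definition, not the normalizer's.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
)

var (
	validCollectors = map[string]bool{"mcp": true, "a2a": true, "config": true, "scan": true}
	snakeCase       = regexp.MustCompile(`^[a-z][a-z0-9]*(_[a-z0-9]+)*$`)
)

// checkModuleOutput applies three of the rules above to one emitted node.
func checkModuleOutput(collector string, kinds []string, props map[string]any) error {
	if !validCollectors[collector] {
		return fmt.Errorf("invalid collector %q", collector)
	}
	for k := range props {
		if !snakeCase.MatchString(k) {
			return fmt.Errorf("property key %q is not snake_case", k)
		}
	}
	for _, kind := range kinds {
		if kind == "Credential" {
			if _, ok := props["value_hash"]; !ok {
				return errors.New("credential node is missing value_hash")
			}
		}
	}
	return nil
}
```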