# Graph Data Model

AgentHound builds a directed trust graph in Neo4j. The core principle: edges represent exploitable relationships, and edge direction follows the flow of access and control.

The fundamental traversal pattern:

Agent → Server → Tool → Resource

An attacker (or a compromised agent) moves along edge direction to escalate access. Shortest-path and weighted-path queries over this graph surface attack paths that cross protocol boundaries — including MCP-to-A2A paths that single-protocol scanners cannot see.

## 1. Node Types

### Collector-Produced (22 kinds)

These are the node kinds accepted in ingest input (`sdk/ingest.AllowedNodeKinds`).

| Label | Source | Key Properties |
|---|---|---|
| `MCPServer` | Config + MCP | `name`, `endpoint`, `transport` (stdio/http), `auth_method`, `protocol_version`, `instructions`, `capabilities`, `is_pinned`, `has_tasks_capability` |
| `MCPTool` | MCP | `name`, `description`, `input_schema`, `output_schema`, `annotations`, `description_hash` (SHA-256), `capability_surface[]`, `has_injection_patterns`, `has_cross_references` |
| `MCPResource` | MCP | `uri`, `name`, `mime_type`, `size`, `uri_scheme`, `sensitivity` (auto-classified) |
| `MCPPrompt` | MCP | `name`, `description`, `arguments` |
| `A2AAgent` | A2A | `name`, `description`, `url`, `provider`, `version`, `protocol_versions`, `capabilities`, `security_schemes`, `auth_method`, `is_signed`, `signature_valid`, `card_hash` |
| `A2ASkill` | A2A | `id`, `name`, `description`, `input_modes`, `output_modes`, `description_hash`, `has_injection_patterns` |
| `AgentInstance` | Config | `name`, `framework`, `config_path` |
| `Identity` | Config + MCP | `type` (none/apiKey/oauth/bearer/mtls), `scope`, `is_static` |
| `Credential` | Config + LiteLLM Looter | `type` (envVar/hardcoded/vaultRef/inputPrompt/master_key/apiKey/virtual_key), `name`, `source`, `is_exposed`, `high_entropy`, `value_hash` (SHA-256) |
| `Host` | Config + A2A | `hostname`, `ip`, `is_local`, `is_private`, `is_public` |
| `ConfigFile` | Config | `path`, `client`, `server_count` |
| `InstructionFile` | Config | `path`, `type` (agents.md/claude.md/cursorrules/copilot-instructions/memory.md), `hash`, `is_suspicious` |
| `OllamaInstance` | Network scan + Ollama fingerprinter | `endpoint`, `version`, `auth_method`, `is_anonymous_loot`, `discovered_via` |
| `VLLMInstance` | Network scan + vLLM fingerprinter | `endpoint`, `version`, `auth_method`, `is_anonymous_loot` |
| `QdrantInstance` | Network scan + Qdrant fingerprinter | `endpoint`, `version`, `collection_count` |
| `MLflowServer` | Network scan + MLflow fingerprinter | `endpoint`, `version`, `experiment_count` |
| `LiteLLMGateway` | Network scan + LiteLLM fingerprinter | `endpoint`, `auth_method`, `is_anonymous_loot`, `docs_enabled` |
| `JupyterServer` | Network scan + Jupyter fingerprinter | `endpoint`, `version`, `token_required` |
| `LangServeApp` | Network scan + LangServe fingerprinter | `endpoint`, `chains` |
| `OpenWebUIInstance` | Network scan + Open WebUI fingerprinter | `endpoint`, `version`, `webui_auth_enabled` |
| `AIService` | Multi-label umbrella (see below) | (no unique properties; carried as companion label) |
| `AIModel` | Ollama Looter | `name`, `size`, `digest`, `family`, `parameter_size`, `quantization` |

### Synthetic (2 kinds, post-processor created)

These labels exist in `AllNodeLabels` but not in `AllowedNodeKinds`, so collectors cannot emit them.

| Label | Source | Key Properties |
|---|---|---|
| `ResourceGroup` | Post-processor | `type`, `sensitivity` |
| `TrustZone` | Post-processor | `name`, `level`, `node_count` |

## 2. Umbrella Labels

The `AIService` label is a multi-label companion, not a standalone node kind. Every per-service node (`OllamaInstance`, `VLLMInstance`, `QdrantInstance`, `MLflowServer`, `LiteLLMGateway`, `JupyterServer`, `LangServeApp`, `OpenWebUIInstance`) also carries `:AIService` as a secondary Neo4j label.

This enables queries like `MATCH (n:AIService)` to find all AI infrastructure regardless of the specific service type, while per-kind queries (`MATCH (n:OllamaInstance)`) still work.

Schema constraint implication: the schema-init loop skips labels in `UmbrellaLabels` when creating `objectid IS UNIQUE` constraints. A uniqueness constraint on `:AIService` would falsely collide between distinct service kinds.
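A minimal sketch of that skip, assuming the umbrella labels are kept in a set. `umbrellaLabels` and `constrainedLabels` are hypothetical names, not the actual schema-init code; only the idea (filter umbrella labels out before creating `objectid` constraints) comes from the text above.

```go
package main

import "fmt"

// umbrellaLabels is a hypothetical stand-in for the real UmbrellaLabels set.
var umbrellaLabels = map[string]bool{"AIService": true}

// constrainedLabels returns the labels that should get an
// objectid uniqueness constraint, skipping umbrella labels.
func constrainedLabels(all []string) []string {
	out := make([]string, 0, len(all))
	for _, label := range all {
		if umbrellaLabels[label] {
			continue // companion label spans many node kinds: no constraint
		}
		out = append(out, label)
	}
	return out
}

func main() {
	labels := []string{"MCPServer", "OllamaInstance", "AIService", "VLLMInstance"}
	fmt.Println(constrainedLabels(labels)) // [MCPServer OllamaInstance VLLMInstance]
}
```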

## 3. Edge Types

### Raw Edges (16 collector-produced)

| Edge | Source | Target | Collector | Meaning |
|---|---|---|---|---|
| `TRUSTS_SERVER` | AgentInstance | MCPServer | Config | Agent trusts this server to provide tools |
| `PROVIDES_TOOL` | MCPServer | MCPTool | MCP | Server exposes this tool |
| `PROVIDES_RESOURCE` | MCPServer / JupyterServer | MCPResource | MCP / Jupyter Looter | Server exposes this resource |
| `PROVIDES_PROMPT` | MCPServer | MCPPrompt | MCP | Server exposes this prompt template |
| `ADVERTISES_SKILL` | A2AAgent | A2ASkill | A2A | Agent advertises this skill |
| `DELEGATES_TO` | A2AAgent | A2AAgent | A2A | Agent delegates tasks to another agent |
| `AUTHENTICATES_WITH` | MCPServer / A2AAgent | Identity | Config / A2A | Entity uses this auth identity |
| `USES_CREDENTIAL` | Identity | Credential | Config | Identity backed by this credential material |
| `RUNS_ON` | MCPServer / A2AAgent | Host | Config / A2A | Entity runs on this host |
| `CONFIGURED_IN` | MCPServer | ConfigFile | Config | Server defined in this config file |
| `HAS_ENV_VAR` | MCPServer | Credential | Config | Server has access to this env var |
| `LOADS_INSTRUCTIONS` | AgentInstance | InstructionFile | Config | Agent loads this instruction file |
| `SAME_AUTH_DOMAIN` | A2AAgent | A2AAgent | A2A | Agents share an authentication domain |
| `EXPOSES` | AIService | AIService | Fingerprinters | Service exposes another service (e.g., Open WebUI → Ollama backend) |
| `EXPOSES_CREDENTIAL` | AIService | Credential | LiteLLM Looter | Service exposes credential material (master keys, upstream provider keys, virtual keys) |
| `PROVIDES_MODEL` | OllamaInstance | AIModel | Ollama Looter | Instance serves this model |

### Composite Edges (8 post-processor computed)

| Edge | Source | Target | Depends On | Meaning |
|---|---|---|---|---|
| `HAS_ACCESS_TO` | MCPTool | MCPResource | Raw edges | Capability surface matches resource URI scheme |
| `CAN_EXECUTE` | MCPTool | Host | Raw edges | Tool has shell_access or code_execution capability |
| `SHADOWS` | MCPTool | MCPTool | Raw edges | Tool on another server references this tool's name/description |
| `POISONED_DESCRIPTION` | MCPTool | MCPTool (self-edge) | Raw edges | Tool description contains injection patterns |
| `POISONED_INSTRUCTIONS` | InstructionFile | InstructionFile (self-edge) | Raw edges | Suspicious patterns: imperative overrides, exfiltration commands, hidden Unicode |
| `CAN_REACH` | AgentInstance / A2AAgent | MCPResource | HAS_ACCESS_TO | Transitive access through trust chain (includes cross-protocol and credential chain variants up to 6 hops) |
| `CAN_EXFILTRATE_VIA` | AgentInstance | MCPTool | CAN_REACH | Agent reaches sensitive data AND has outbound exfiltration channel |
| `CAN_IMPERSONATE` | A2AAgent | A2AAgent | Raw edges | TF-IDF cosine similarity > 0.8 on skill descriptions |
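The `CAN_IMPERSONATE` rule can be sketched as TF-IDF vectors compared by cosine similarity. This is not the actual processor: the whitespace tokenizer, the IDF variant (smoothed `ln((1+N)/(1+df)) + 1` here), and treating each agent's concatenated skill descriptions as one document are all assumptions of the sketch.

```go
package main

import (
	"fmt"
	"math"
	"strings"
)

// tokenize lowercases and splits on whitespace (deliberately naive).
func tokenize(s string) []string { return strings.Fields(strings.ToLower(s)) }

// tfidfVectors builds one sparse TF-IDF vector per document using raw term
// counts and smoothed IDF: idf(t) = ln((1+N)/(1+df(t))) + 1.
func tfidfVectors(docs []string) []map[string]float64 {
	n := float64(len(docs))
	df := map[string]int{}
	counts := make([]map[string]float64, len(docs))
	for i, d := range docs {
		counts[i] = map[string]float64{}
		for _, t := range tokenize(d) {
			if counts[i][t] == 0 {
				df[t]++ // first occurrence of t in this document
			}
			counts[i][t]++
		}
	}
	for _, v := range counts {
		for t, c := range v {
			v[t] = c * (math.Log((1+n)/(1+float64(df[t]))) + 1)
		}
	}
	return counts
}

// cosine returns the cosine similarity of two sparse vectors (0 if either is empty).
func cosine(a, b map[string]float64) float64 {
	var dot, na, nb float64
	for t, x := range a {
		dot += x * b[t]
		na += x * x
	}
	for _, y := range b {
		nb += y * y
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

func main() {
	// One document per agent: its skill descriptions joined (assumption).
	docs := []string{
		"transfer funds between customer accounts",
		"Transfer funds between customer accounts",
		"summarize daily news articles",
	}
	v := tfidfVectors(docs)
	for _, j := range []int{1, 2} {
		sim := cosine(v[0], v[j])
		fmt.Printf("agent0 vs agent%d: %.2f impersonation=%v\n", j, sim, sim > 0.8)
	}
}
```

Agent 1 copies agent 0's skill text almost verbatim, scores about 1.0, and crosses the 0.8 threshold; agent 2 shares no terms and scores 0.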

### Edge Struct (Go SDK)

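```go
type Edge struct {
	Source     string         `json:"source"`                // objectid of the source node
	Target     string         `json:"target"`                // objectid of the target node
	Kind       string         `json:"kind"`                  // edge type, e.g. PROVIDES_TOOL
	SourceKind string         `json:"source_kind,omitempty"` // set when the source kind is ambiguous
	TargetKind string         `json:"target_kind,omitempty"` // set when the target kind is ambiguous
	Properties map[string]any `json:"properties"`
}
```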

### Edge Properties (all edges carry these)

| Property | Type | Description |
|---|---|---|
| `scan_id` | string | Scan that created/updated this edge |
| `last_seen` | ISO 8601 | Timestamp of last observation |
| `confidence` | float64 | 0.0–1.0 confidence score |
| `risk_weight` | float64 | Lower = easier to exploit (used by Dijkstra) |
| `is_composite` | bool | True for post-processed edges |
| `evidence` | string | Human-readable explanation |

Composite edges additionally carry `source_collector` (`mcp` or `a2a`) for scoped stale-edge cleanup.

## 4. Node ID Strategy

All node IDs are deterministic, content-based SHA-256 hashes. This ensures identical entities from different collectors merge on the same Neo4j node.

| Node Kind | ID Computation |
|---|---|
| `MCPServer` | `SHA-256("MCPServer:" + transport + ":" + endpoint + ":" + sorted_args)` |
| `MCPTool` | `SHA-256("MCPTool:" + server_id + ":" + tool_name)` |
| `MCPResource` | `SHA-256("MCPResource:" + server_id + ":" + resource_uri)` |
| `MCPPrompt` | `SHA-256("MCPPrompt:" + server_id + ":" + prompt_name)` |
| `A2AAgent` | `SHA-256("A2AAgent:" + normalized_agent_base_url)` |
| `A2ASkill` | `SHA-256("A2ASkill:" + agent_id + ":" + skill_id)` |
| `AgentInstance` | `SHA-256("AgentInstance:" + config_file_id + ":" + client_name)` |
| `ConfigFile` | `SHA-256("ConfigFile:" + absolute_path)` |
| `Host` | `SHA-256("Host:" + hostname_or_ip)` |
| `Identity` | `SHA-256("Identity:" + parent_id + ":" + type)` |
| `Credential` | `SHA-256("Credential:" + source + ":" + name)` |
| `InstructionFile` | `SHA-256("InstructionFile:" + absolute_path)` |
| `AIModel` | `SHA-256("AIModel:" + instance_id + ":" + model_name)` |

Critical invariant: the `MCPServer` ID MUST match between Config Collector and MCP Collector outputs. This is the merge point connecting trust relationships (who trusts what) to capabilities (what a server exposes).
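A sketch of the ID scheme for `MCPServer` and `MCPTool`. Only the field order and the SHA-256 hash come from the table above; the lowercase hex encoding and the `,` separator used to join `sorted_args` are assumptions. Sorting a copy of the arguments is what lets two collectors that list the same server's args in different orders land on the same node.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
	"strings"
)

// hashID hex-encodes SHA-256 of the input (hex encoding is an assumption).
func hashID(s string) string {
	sum := sha256.Sum256([]byte(s))
	return hex.EncodeToString(sum[:])
}

// mcpServerID sorts a copy of args so argument order cannot change the ID.
// Joining with "," is an assumption; the real separator is not documented here.
func mcpServerID(transport, endpoint string, args []string) string {
	sorted := append([]string(nil), args...) // copy: don't mutate the caller's slice
	sort.Strings(sorted)
	return hashID("MCPServer:" + transport + ":" + endpoint + ":" + strings.Join(sorted, ","))
}

// mcpToolID scopes the tool name to its server's ID.
func mcpToolID(serverID, toolName string) string {
	return hashID("MCPTool:" + serverID + ":" + toolName)
}

func main() {
	// Config Collector and MCP Collector may see args in different orders.
	a := mcpServerID("stdio", "npx", []string{"-y", "@scope/server"})
	b := mcpServerID("stdio", "npx", []string{"@scope/server", "-y"})
	fmt.Println(a == b) // true: both collectors land on the same node
	fmt.Println(mcpToolID(a, "read_file"))
}
```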

## 5. Cross-Collector Merge via `value_hash`

The `value_hash` property on `Credential` nodes is the cross-collector merge primitive. It enables the `cross_service_credential_chain` post-processor to join credentials discovered independently by different collectors.

How it works:

1. The Config Collector emits a `Credential` node via `HAS_ENV_VAR` (MCP server → credential).
2. The LiteLLM Looter emits `Credential` nodes via `EXPOSES_CREDENTIAL` (gateway → master, upstream, and virtual keys).
3. Both compute `value_hash = SHA-256(credential_value)` via `sdk/common.HashCredentialValue`.
4. The same secret value yields the same `value_hash`, so the two `Credential` nodes can be joined on it even though their `objectid`s differ (each collector derives `objectid` from its own `source` and `name`, per the Node ID Strategy).

This is what enables attack paths such as `AgentInstance → MCPServer → Credential ← LiteLLMGateway → upstream provider`, which show that a local agent's environment variable holds the same key a LiteLLM gateway uses to reach an upstream LLM provider.

Requirement: every v0.3+ Looter MUST populate `value_hash` on every emitted `Credential` node.
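A sketch of the join the `cross_service_credential_chain` processor performs. `hashCredentialValue` is a hypothetical stand-in for `sdk/common.HashCredentialValue` (assumed here to return hex-encoded SHA-256), the collector names and object IDs in `main` are illustrative, and keeping only groups that span more than one collector is an assumption about what counts as a useful match.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// Credential keeps only the fields the join needs.
type Credential struct {
	ObjectID  string
	Collector string
	ValueHash string
}

// hashCredentialValue is a hypothetical stand-in for sdk/common.HashCredentialValue.
func hashCredentialValue(secret string) string {
	sum := sha256.Sum256([]byte(secret))
	return hex.EncodeToString(sum[:])
}

// joinByValueHash groups credentials sharing a value_hash and keeps only
// groups reported by more than one collector.
func joinByValueHash(creds []Credential) map[string][]Credential {
	groups := map[string][]Credential{}
	for _, c := range creds {
		groups[c.ValueHash] = append(groups[c.ValueHash], c)
	}
	out := map[string][]Credential{}
	for h, g := range groups {
		collectors := map[string]bool{}
		for _, c := range g {
			collectors[c.Collector] = true
		}
		if len(collectors) > 1 {
			out[h] = g
		}
	}
	return out
}

func main() {
	secret := "sk-example" // placeholder secret
	creds := []Credential{
		{ObjectID: "cred-config-env", Collector: "config", ValueHash: hashCredentialValue(secret)},
		{ObjectID: "cred-litellm-upstream", Collector: "scan", ValueHash: hashCredentialValue(secret)},
		{ObjectID: "cred-config-other", Collector: "config", ValueHash: hashCredentialValue("unrelated")},
	}
	for _, g := range joinByValueHash(creds) {
		fmt.Println(g[0].ObjectID, "<->", g[1].ObjectID)
	}
}
```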

## 6. Post-Processor Execution Order

Processors run in strict dependency order. A processor may only read edges produced by earlier processors.

| Order | Processor | Produces | Dependencies |
|---|---|---|---|
| 1 | has_access_to | `HAS_ACCESS_TO` | Raw edges only |
| 2 | can_execute | `CAN_EXECUTE` | Raw edges only |
| 3 | shadows | `SHADOWS` | Raw edges only |
| 4 | poisoned_description | `POISONED_DESCRIPTION` | Raw edges only |
| 5 | poisoned_instructions | `POISONED_INSTRUCTIONS` | Raw edges only |
| 6 | can_reach | `CAN_REACH` | 1 (HAS_ACCESS_TO) |
| 7 | cross_service_credential_chain | `CAN_REACH` (credential variant) | 1, 6 (joins on `Credential.value_hash`) |
| 8 | can_exfiltrate_via | `CAN_EXFILTRATE_VIA` | 6 (CAN_REACH) |
| 9 | can_impersonate | `CAN_IMPERSONATE` | Raw edges only |
| 10 | cross_protocol | `CAN_REACH` (cross-protocol) | 1 (HAS_ACCESS_TO) + DELEGATES_TO |
| 11 | risk_score | Node property updates | 1–10 (all prior processors) |
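The ordering rule (a processor reads only output from earlier processors) can be checked mechanically. This sketch encodes the table as data; the `processor` type and `checkOrder` function are hypothetical. `DELEGATES_TO` is a raw edge, not a processor, so it is noted in a comment rather than listed as a dependency of `cross_protocol`.

```go
package main

import "fmt"

// processor names a post-processor and the earlier processors whose output it reads.
type processor struct {
	name string
	deps []string
}

// checkOrder reports the first dependency that is not strictly earlier in the pipeline.
func checkOrder(pipeline []processor) error {
	pos := map[string]int{}
	for i, p := range pipeline {
		pos[p.name] = i
	}
	for i, p := range pipeline {
		for _, d := range p.deps {
			j, ok := pos[d]
			switch {
			case !ok:
				return fmt.Errorf("%s depends on unknown processor %s", p.name, d)
			case j >= i:
				return fmt.Errorf("%s (step %d) reads output of %s (step %d)", p.name, i+1, d, j+1)
			}
		}
	}
	return nil
}

func main() {
	pipeline := []processor{
		{"has_access_to", nil},
		{"can_execute", nil},
		{"shadows", nil},
		{"poisoned_description", nil},
		{"poisoned_instructions", nil},
		{"can_reach", []string{"has_access_to"}},
		{"cross_service_credential_chain", []string{"has_access_to", "can_reach"}},
		{"can_exfiltrate_via", []string{"can_reach"}},
		{"can_impersonate", nil},
		{"cross_protocol", []string{"has_access_to"}}, // also reads raw DELEGATES_TO edges
	}
	// risk_score (step 11) reads the output of steps 1-10.
	var prior []string
	for _, p := range pipeline {
		prior = append(prior, p.name)
	}
	pipeline = append(pipeline, processor{"risk_score", prior})
	fmt.Println(checkOrder(pipeline)) // <nil>
}
```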

## 7. Risk Scoring

### Edge Risk Weights

Lower weight = easier to exploit = higher risk. Used by Dijkstra weighted-path queries.

| Edge | Weight | Condition |
|---|---|---|
| `TRUSTS_SERVER` | 0.1 | auth_method = none |
| `TRUSTS_SERVER` | 0.3 | auth_method = static_key |
| `TRUSTS_SERVER` | 0.7 | auth_method = oauth |
| `TRUSTS_SERVER` | 0.9 | auth_method = mtls |
| `PROVIDES_TOOL` | 0.1 | Always (tools are always available once trusted) |
| `HAS_ACCESS_TO` | 0.2 | Always |
| `CAN_EXECUTE` | 0.1 | Always |
| `DELEGATES_TO` | 0.1 | auth_method = none |
| `DELEGATES_TO` | 0.5 | authenticated |
| `SHADOWS` | 0.4 | Always |
| `CAN_IMPERSONATE` | 0.6 | Always |

### Node Risk Scores (0–100)

- Agent: `0.30 * credential + 0.25 * blast_radius + 0.20 * auth_posture + 0.15 * tool_surface + 0.10 * poisoning`
- Server: `0.35 * auth_strength + 0.25 * tool_risk + 0.20 * exposure + 0.20 * credential_handling`
- Tool: `0.30 * capability_class + 0.25 * poisoning + 0.25 * access_sensitivity + 0.20 * input_validation`
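All three formulas are weighted sums whose weights total 1.0, so if each component is already scored on 0–100 the result stays on 0–100. How the components are derived and scaled is not described here, so that normalization is an assumption of this sketch; the component names match the formulas above, and `weightedScore` is a hypothetical helper.

```go
package main

import "fmt"

// weightedScore sums weight*component for every weighted component.
// Components are assumed to be pre-normalized to 0-100.
func weightedScore(weights, components map[string]float64) float64 {
	total := 0.0
	for name, w := range weights {
		total += w * components[name] // a missing component counts as 0
	}
	return total
}

var (
	agentWeights = map[string]float64{
		"credential": 0.30, "blast_radius": 0.25, "auth_posture": 0.20,
		"tool_surface": 0.15, "poisoning": 0.10,
	}
	serverWeights = map[string]float64{
		"auth_strength": 0.35, "tool_risk": 0.25, "exposure": 0.20,
		"credential_handling": 0.20,
	}
	toolWeights = map[string]float64{
		"capability_class": 0.30, "poisoning": 0.25, "access_sensitivity": 0.25,
		"input_validation": 0.20,
	}
)

func main() {
	agent := map[string]float64{
		"credential": 100, "blast_radius": 40, "auth_posture": 0,
		"tool_surface": 60, "poisoning": 0,
	}
	// 0.30*100 + 0.25*40 + 0.20*0 + 0.15*60 + 0.10*0 = 30 + 10 + 9 = 49
	fmt.Printf("agent risk: %.1f\n", weightedScore(agentWeights, agent))
}
```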

### Resource Sensitivity Auto-Classification

| Pattern | Sensitivity |
|---|---|
| postgres/mysql/mongodb + prod | critical |
| `file:///etc/` | critical |
| `.env`, `.key`, `*.pem` | critical |
| redis + prod | critical |
| Database (non-prod) | high |
| `file:///` (general) | medium |
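A sketch of the table as first-match-wins rules. The URI scheme list, the substring test for `prod`, treating Redis as a database in the non-prod case, and the `unclassified` fallback are all assumptions; the real classifier's matching logic is not documented here.

```go
package main

import (
	"fmt"
	"strings"
)

var (
	dbSchemes      = []string{"postgres://", "postgresql://", "mysql://", "mongodb://", "redis://"}
	secretSuffixes = []string{".env", ".key", ".pem"}
)

func hasAnyPrefix(s string, prefixes []string) bool {
	for _, p := range prefixes {
		if strings.HasPrefix(s, p) {
			return true
		}
	}
	return false
}

func hasAnySuffix(s string, suffixes []string) bool {
	for _, x := range suffixes {
		if strings.HasSuffix(s, x) {
			return true
		}
	}
	return false
}

// classifySensitivity walks the table top-down; the first matching rule wins.
func classifySensitivity(uri string) string {
	u := strings.ToLower(uri)
	isDB := hasAnyPrefix(u, dbSchemes)
	isProd := strings.Contains(u, "prod") // naive: also matches "product"
	switch {
	case isDB && isProd:
		return "critical"
	case strings.HasPrefix(u, "file:///etc/"):
		return "critical"
	case hasAnySuffix(u, secretSuffixes):
		return "critical"
	case isDB:
		return "high"
	case strings.HasPrefix(u, "file:///"):
		return "medium"
	default:
		return "unclassified" // the table defines no fallback
	}
}

func main() {
	for _, uri := range []string{
		"postgres://db.prod.internal/app",
		"file:///etc/shadow",
		"file:///home/dev/project/.env",
		"mysql://localhost/dev",
		"file:///home/dev/notes.txt",
	} {
		fmt.Println(uri, "->", classifySensitivity(uri))
	}
}
```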

## 8. Merge Semantics

### Node Merge

Nodes merge by `objectid` using Cypher `MERGE`. When the same entity appears from multiple collectors:

- Properties use last-write-wins semantics.
- `ON MATCH SET n.previous_description_hash = n.description_hash` preserves the old hash for rug-pull detection.
- Edges accumulate (different collectors contribute different edge types to the same node).
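An in-memory simulation of these merge rules, useful for reasoning about rug-pull detection. It is not the Cypher AgentHound runs; the `node` map type, `mergeNode`, and `rugPulled` are hypothetical, and the ordering (copy the old hash, then apply the new properties) mirrors an `ON MATCH SET` followed by a property update.

```go
package main

import "fmt"

type node map[string]any

// mergeNode simulates MERGE on objectid: create if absent; on match, copy
// description_hash into previous_description_hash, then apply last-write-wins.
func mergeNode(store map[string]node, id string, props node) {
	n, exists := store[id]
	if !exists {
		n = node{}
		store[id] = n
	} else if h, ok := n["description_hash"]; ok {
		n["previous_description_hash"] = h // ON MATCH SET ... = n.description_hash
	}
	for k, v := range props {
		n[k] = v // last write wins
	}
}

// rugPulled reports whether the description changed between the last two merges.
func rugPulled(n node) bool {
	prev, ok := n["previous_description_hash"]
	return ok && prev != n["description_hash"]
}

func main() {
	store := map[string]node{}
	mergeNode(store, "tool-1", node{"name": "read_file", "description_hash": "aaa"})
	mergeNode(store, "tool-1", node{"description_hash": "bbb"}) // later scan, new description
	fmt.Println(store["tool-1"]["name"], rugPulled(store["tool-1"])) // read_file true
}
```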

### Stale Edge Cleanup

On partial scans (e.g., only the MCP collector ran), only composite edges whose `source_collector` matches the current scan's collector are deleted and recomputed. This prevents ping-pong deletion, where collectors running independently on different schedules would each delete the composite edges the other had just computed.
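A sketch of the scoped deletion step, assuming composite edges are held in memory with their `source_collector`. Recomputing the deleted edges is left out, and `compositeEdge` and `pruneForScan` are hypothetical names; the edge list in `main` is illustrative.

```go
package main

import "fmt"

// compositeEdge keeps only the fields this step needs.
type compositeEdge struct {
	kind            string
	sourceCollector string // "mcp" or "a2a"
}

// pruneForScan deletes only the composite edges owned by the collector that
// just ran; edges owned by other collectors survive until their own rescan.
func pruneForScan(edges []compositeEdge, collector string) []compositeEdge {
	var kept []compositeEdge
	for _, e := range edges {
		if e.sourceCollector != collector {
			kept = append(kept, e)
		}
	}
	return kept
}

func main() {
	edges := []compositeEdge{
		{"HAS_ACCESS_TO", "mcp"},
		{"CAN_IMPERSONATE", "a2a"},
		{"SHADOWS", "mcp"},
	}
	// An MCP-only scan must not wipe the A2A-derived edge.
	fmt.Println(pruneForScan(edges, "mcp")) // [{CAN_IMPERSONATE a2a}]
}
```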

### Neo4j Version Compatibility

Schema init detects the Neo4j version via `CALL dbms.components()`:

- 4.4: uses `CREATE CONSTRAINT ... ON (n:Label) ASSERT n.objectid IS UNIQUE`
- 5.x: uses `CREATE CONSTRAINT ... FOR (n:Label) REQUIRE n.objectid IS UNIQUE`

## 9. Emitting Nodes and Edges (Module Author Guide)

New modules emit nodes and edges via the `sdk/ingest` wire format:

```json
{
  "meta": {
    "version": 1,
    "type": "agenthound-ingest",
    "collector": "mcp|a2a|config|scan",
    "collector_version": "0.1.0",
    "timestamp": "2025-01-15T10:30:00Z",
    "scan_id": "scan-abc123"
  },
  "graph": {
    "nodes": [{"id": "sha256:...", "kinds": ["MCPServer"], "properties": {...}}],
    "edges": [{"source": "sha256:...", "target": "sha256:...", "kind": "PROVIDES_TOOL", "properties": {...}}]
  }
}
```

Rules for module authors:

- Only emit node kinds in `AllowedNodeKinds` and edge kinds in `RawEdgeKinds`.
- Compute deterministic `id` values per the Node ID Strategy above.
- Populate `value_hash` on all `Credential` nodes (mandatory since v0.3).
- Set `source_kind` / `target_kind` on edges when the endpoint map has multiple valid sources/targets (for example, `PROVIDES_RESOURCE`, whose source may be an `MCPServer` or a `JupyterServer`).
- Use snake_case for all property keys (the normalizer converts camelCase, but emit clean data).
- Valid collectors: `mcp`, `a2a`, `config`, `scan`.
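A pre-flight check for these rules might look like the following. The kind sets are deliberately partial stand-ins for `AllowedNodeKinds` and `RawEdgeKinds`, the `node`/`edge` types and `validate` function are hypothetical, and the snake_case regular expression is an assumption about what counts as a clean key.

```go
package main

import (
	"fmt"
	"regexp"
)

// Illustrative subsets only; the authoritative lists live in sdk/ingest.
var (
	allowedNodeKinds = map[string]bool{"MCPServer": true, "MCPTool": true, "Credential": true, "OllamaInstance": true, "AIService": true}
	rawEdgeKinds     = map[string]bool{"PROVIDES_TOOL": true, "HAS_ENV_VAR": true, "EXPOSES_CREDENTIAL": true}
	validCollectors  = map[string]bool{"mcp": true, "a2a": true, "config": true, "scan": true}
	snakeCase        = regexp.MustCompile(`^[a-z][a-z0-9_]*$`)
)

type node struct {
	Kinds []string
	Props map[string]any
}

type edge struct{ Kind string }

// validate returns one message per rule violation.
func validate(collector string, nodes []node, edges []edge) []string {
	var errs []string
	if !validCollectors[collector] {
		errs = append(errs, fmt.Sprintf("invalid collector %q", collector))
	}
	for _, n := range nodes {
		for _, k := range n.Kinds {
			if !allowedNodeKinds[k] {
				errs = append(errs, fmt.Sprintf("node kind %s not allowed", k))
			}
			if _, ok := n.Props["value_hash"]; k == "Credential" && !ok {
				errs = append(errs, "Credential missing value_hash")
			}
		}
		for key := range n.Props {
			if !snakeCase.MatchString(key) {
				errs = append(errs, fmt.Sprintf("property %q is not snake_case", key))
			}
		}
	}
	for _, e := range edges {
		if !rawEdgeKinds[e.Kind] {
			errs = append(errs, fmt.Sprintf("edge kind %s not allowed (composite or unknown)", e.Kind))
		}
	}
	return errs
}

func main() {
	errs := validate("mcp",
		[]node{{Kinds: []string{"Credential"}, Props: map[string]any{"name": "OPENAI_API_KEY", "isExposed": true}}},
		[]edge{{Kind: "CAN_REACH"}})
	for _, e := range errs {
		fmt.Println(e)
	}
}
```

Running it flags the missing `value_hash`, the camelCase `isExposed` key, and the composite `CAN_REACH` edge, which only post-processors may create.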