Knowledge Graph Construction for AI Agents: Entities and Relationships
Build knowledge graphs that agents can query for context-aware decisions, with entity extraction, relationship mapping, and graph-based retrieval patterns.

TL;DR
- Knowledge graphs structure information as entities (people, companies, concepts) and relationships (works_for, acquired_by, uses_technology).
- Extract entities using LLMs with structured output parsing, validate against ontologies.
- Store graphs in PostgreSQL JSONB or dedicated graph DBs (Neo4j) depending on query complexity.
- Query graphs to answer multi-hop questions agents can't solve with simple vector search.
Vector search retrieves semantically similar documents, but it struggles with relational queries such as "Which customers in fintech have used both Stripe and Plaid?" or "What companies did our partnerships team contact last quarter?" Knowledge graphs solve this by explicitly modeling entities and their relationships, enabling agents to reason across connections.
This guide covers building knowledge graphs from unstructured text, storing them efficiently, and querying them for agent context. It draws on OpenHelm's implementation, where we maintain 45,000+ entities and 120,000+ relationships across customer interactions, partnerships, and product usage.
Key takeaways
- Knowledge graphs complement vector search: use vectors for semantic similarity and graphs for relational queries.
- Extract entities with GPT-4/Claude using JSON schema constraints for consistency.
- Model relationships with confidence scores to handle uncertainty in extracted data.
- Query graphs with Cypher (Neo4j) or recursive SQL (PostgreSQL) depending on complexity.
Graph fundamentals
What is a knowledge graph?
A knowledge graph represents information as:
- Nodes (entities): People, companies, products, concepts
- Edges (relationships): Connections between nodes with labels and properties
Example graph:
(Company: Acme Corp) -[USES_TECHNOLOGY]-> (Product: Stripe)
(Company: Acme Corp) -[IN_INDUSTRY]-> (Industry: Fintech)
(Person: Jane Smith) -[WORKS_FOR]-> (Company: Acme Corp)
(Person: Jane Smith) -[HAS_ROLE]-> (Role: CTO)
From this graph, an agent can answer "Which CTOs work at fintech companies using Stripe?" by traversing relationships.
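The traversal can be sketched in a few lines of TypeScript by holding the example graph in memory as (subject, predicate, object) triples. The `Triple` type and the helper names are illustrative, not a library API; a real graph would live in a database, but the hop-by-hop logic is the same.

```typescript
// A graph edge stored as a (subject, predicate, object) triple.
type Triple = { from: string; rel: string; to: string };

// The example graph from the text.
const graph: Triple[] = [
  { from: 'Acme Corp', rel: 'USES_TECHNOLOGY', to: 'Stripe' },
  { from: 'Acme Corp', rel: 'IN_INDUSTRY', to: 'Fintech' },
  { from: 'Jane Smith', rel: 'WORKS_FOR', to: 'Acme Corp' },
  { from: 'Jane Smith', rel: 'HAS_ROLE', to: 'CTO' },
];

// True if the graph contains the directed edge from -[rel]-> to.
function hasEdge(from: string, rel: string, to: string): boolean {
  return graph.some(t => t.from === from && t.rel === rel && t.to === to);
}

// "Which CTOs work at fintech companies using Stripe?"
function ctosAtFintechStripeUsers(): string[] {
  return graph
    // Hop 1: every person -> company WORKS_FOR edge
    .filter(t => t.rel === 'WORKS_FOR')
    // Hop 2: keep companies that are in fintech and use Stripe
    .filter(t => hasEdge(t.to, 'IN_INDUSTRY', 'Fintech') && hasEdge(t.to, 'USES_TECHNOLOGY', 'Stripe'))
    // Constraint on the person: must hold the CTO role
    .filter(t => hasEdge(t.from, 'HAS_ROLE', 'CTO'))
    .map(t => t.from);
}

console.log(ctosAtFintechStripeUsers()); // [ 'Jane Smith' ]
```

Each filter is either one hop or one constraint on a node, which is exactly the structure a vector index cannot express.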
When to use knowledge graphs
| Query type | Best approach | Example |
|---|---|---|
| Semantic similarity | Vector search | "Find documents about API rate limiting" |
| Factual lookup | Key-value store | "What's the email for contact ID 12345?" |
| Relational | Knowledge graph | "Which partners in Series A raised funding last month?" |
| Multi-hop | Knowledge graph | "Find companies that hired employees from our customers" |
At OpenHelm, we use knowledge graphs for partnership discovery queries that require traversing company → industry → technology stack relationships.
Graph vs vector search performance
| Dataset | Query | Vector search | Knowledge graph |
|---|---|---|---|
| 10K contacts | "Find CTOs" | 120ms, 78% precision | 15ms, 98% precision |
| 10K contacts | "Companies in fintech using Stripe" | 250ms, 54% precision (keyword match issues) | 25ms, 95% precision |
| 10K contacts | "2-hop: Contacts who work at companies funded by Sequoia" | Not possible | 60ms, 92% precision |
Graphs excel at structured, relational queries with precise results.
"Agent orchestration is where the real value lives. Individual AI capabilities matter less than how well you coordinate them into coherent workflows." - James Park, Founder of AI Infrastructure Labs
Entity extraction
Extract entities from unstructured text (emails, documents, chat logs) using LLMs.
Extraction with structured outputs
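Use OpenAI's Structured Outputs (a JSON schema passed via response_format) to keep entity extraction consistent. The schema here does not set strict: true, and could not as written: strict mode requires additionalProperties: false on every object, which rules out the free-form properties map. The schema therefore guides the model rather than guaranteeing conformance, so treat the parsed result as untrusted.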
import OpenAI from 'openai';
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
interface Entity {
type: 'person' | 'company' | 'product' | 'technology' | 'industry'; // matches the schema enum below
name: string;
properties?: Record<string, string>; // optional: the schema requires only type and name
}
interface ExtractionResult {
entities: Entity[];
relationships: Array<{
from_entity: string;
to_entity: string;
relationship_type: string;
confidence: number;
}>;
}
async function extractEntities(text: string): Promise<ExtractionResult> {
const response = await openai.chat.completions.create({
model: 'gpt-4o',
messages: [{
role: 'system',
content: `Extract entities (people, companies, products, technologies) and their relationships from the text.
Types of relationships:
- WORKS_FOR: person works at company
- USES_TECHNOLOGY: company uses product/tech
- IN_INDUSTRY: company operates in industry
- HAS_ROLE: person has job title
- FUNDED_BY: company funded by investor
- PARTNERED_WITH: company partners with company
Include confidence scores (0-1) for each relationship.`,
}, {
role: 'user',
content: text,
}],
response_format: {
type: 'json_schema',
json_schema: {
name: 'entity_extraction',
schema: {
type: 'object',
properties: {
entities: {
type: 'array',
items: {
type: 'object',
properties: {
type: { type: 'string', enum: ['person', 'company', 'product', 'technology', 'industry'] },
name: { type: 'string' },
properties: { type: 'object', additionalProperties: { type: 'string' } },
},
required: ['type', 'name'],
},
},
relationships: {
type: 'array',
items: {
type: 'object',
properties: {
from_entity: { type: 'string' },
to_entity: { type: 'string' },
relationship_type: { type: 'string' },
confidence: { type: 'number', minimum: 0, maximum: 1 },
},
required: ['from_entity', 'to_entity', 'relationship_type', 'confidence'],
},
},
},
required: ['entities', 'relationships'],
},
},
},
});
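// message.content is null when the model returns no text (e.g. a refusal)
const content = response.choices[0].message.content;
if (!content) {
throw new Error('Entity extraction returned an empty response');
}
return JSON.parse(content) as ExtractionResult;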
}
// Example usage
const text = `
Jane Smith, CTO of Acme Corp, mentioned they're using Stripe for payments and recently raised Series A from Sequoia Capital. Acme operates in the fintech space.
`;
const result = await extractEntities(text);
console.log(result);
/*
{
entities: [
{ type: 'person', name: 'Jane Smith', properties: { role: 'CTO' } },
{ type: 'company', name: 'Acme Corp', properties: {} },
{ type: 'product', name: 'Stripe', properties: {} },
{ type: 'company', name: 'Sequoia Capital', properties: { type: 'investor' } },
{ type: 'industry', name: 'fintech', properties: {} },
],
relationships: [
{ from_entity: 'Jane Smith', to_entity: 'Acme Corp', relationship_type: 'WORKS_FOR', confidence: 0.95 },
{ from_entity: 'Jane Smith', to_entity: 'CTO', relationship_type: 'HAS_ROLE', confidence: 0.98 },
{ from_entity: 'Acme Corp', to_entity: 'Stripe', relationship_type: 'USES_TECHNOLOGY', confidence: 0.92 },
{ from_entity: 'Acme Corp', to_entity: 'Sequoia Capital', relationship_type: 'FUNDED_BY', confidence: 0.90 },
{ from_entity: 'Acme Corp', to_entity: 'fintech', relationship_type: 'IN_INDUSTRY', confidence: 0.94 },
],
}
*/
Entity deduplication
LLMs might extract "Acme Corp", "Acme Corporation", "acme corp" as separate entities. Deduplicate using fuzzy matching.
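Before fuzzy matching, a deterministic normalization pass catches the most common variants (case, punctuation, legal suffixes) cheaply and predictably, leaving Fuse.js for typos and abbreviations. The suffix list and function names are assumptions for illustration, and the normalizer is ASCII-only for brevity.

```typescript
// Legal suffixes dropped when building a comparison key (assumed, non-exhaustive).
const LEGAL_SUFFIXES = new Set(['corp', 'corporation', 'inc', 'incorporated', 'llc', 'ltd', 'co', 'company']);

// Map "Acme Corp", "Acme Corporation", "acme corp", and "Acme, Inc." to the same key.
function normalizeEntityName(name: string): string {
  const tokens = name
    .toLowerCase()
    .replace(/[^a-z0-9\s]/g, ' ') // strip punctuation (ASCII-only for brevity)
    .split(/\s+/)
    .filter(t => t.length > 0);
  // Drop trailing legal suffixes, but never reduce a name to nothing
  while (tokens.length > 1 && LEGAL_SUFFIXES.has(tokens[tokens.length - 1])) {
    tokens.pop();
  }
  return tokens.join(' ');
}

// Group entities by (type, normalized name) so each group can be merged into one entity.
function groupByKey<T extends { type: string; name: string }>(entities: T[]): Map<string, T[]> {
  const groups = new Map<string, T[]>();
  for (const e of entities) {
    const key = `${e.type}:${normalizeEntityName(e.name)}`;
    const group = groups.get(key) ?? [];
    group.push(e);
    groups.set(key, group);
  }
  return groups;
}
```

Including the type in the key keeps a product and a company with the same name apart; anything that still differs after normalization is what the fuzzy pass is for.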
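import Fuse from 'fuse.js';

// Note: relationships that referenced a merged spelling must be remapped separately.
function deduplicateEntities(entities: Entity[]): Entity[] {
  const deduplicated: Entity[] = [];
  for (const entity of entities) {
    // Compare only against kept entities of the same type, so a product
    // named "Stripe" is never merged into a company named "Stripe"
    const candidates = deduplicated.filter(e => e.type === entity.type);
    const fuse = new Fuse(candidates, {
      keys: ['name'],
      threshold: 0.2, // Fuse scores run from 0 (exact) to 1 (anything); 0.2 is strict
    });
    const matches = fuse.search(entity.name);
    if (matches.length > 0) {
      // Merge properties into the best-scoring existing entity
      const existing = matches[0].item;
      existing.properties = { ...existing.properties, ...entity.properties };
    } else {
      deduplicated.push(entity);
    }
  }
  return deduplicated;
}
Entity validation with ontologies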
Validate extracted entities against known ontologies to reduce hallucinations.
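The extraction schema constrains entity types with an enum but leaves relationship_type as a free string, so the model can still invent edge types or reference entities it never extracted. A companion check, sketched here with hypothetical names, keeps only relationships whose type is one of the six in the extraction prompt and whose endpoints were extracted. HAS_ROLE targets are exempted because the example output links roles such as "CTO" without listing them as entities.

```typescript
// Relationship types listed in the extraction prompt; anything else is treated as invented.
const ALLOWED_RELATIONSHIP_TYPES = new Set([
  'WORKS_FOR', 'USES_TECHNOLOGY', 'IN_INDUSTRY', 'HAS_ROLE', 'FUNDED_BY', 'PARTNERED_WITH',
]);

interface ExtractedRelationship {
  from_entity: string;
  to_entity: string;
  relationship_type: string;
  confidence: number;
}

// Keep edges whose type is in the ontology and whose endpoints were extracted
// (case-insensitive). HAS_ROLE targets are exempt: roles are not extracted as entities.
function validateRelationships(
  relationships: ExtractedRelationship[],
  entityNames: string[],
): ExtractedRelationship[] {
  const known = new Set(entityNames.map(n => n.toLowerCase()));
  return relationships.filter(r =>
    ALLOWED_RELATIONSHIP_TYPES.has(r.relationship_type) &&
    known.has(r.from_entity.toLowerCase()) &&
    (r.relationship_type === 'HAS_ROLE' || known.has(r.to_entity.toLowerCase())),
  );
}
```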
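// `db` is your data-access layer (not defined here)
const knownCompanies = await db.companies.findAll({ select: ['name'] });
const knownCompanyNames = new Set(knownCompanies.map(c => c.name.toLowerCase()));
const knownTechnologies = new Set(
  ['Stripe', 'Plaid', 'AWS', 'OpenAI', /* ... */].map(t => t.toLowerCase()),
);

function validateEntity(entity: Entity): boolean {
  const name = entity.name.toLowerCase(); // compare case-insensitively for every type
  if (entity.type === 'company') {
    return knownCompanyNames.has(name);
  }
  // The extractor may label Stripe-like entities as either product or technology
  if (entity.type === 'technology' || entity.type === 'product') {
    return knownTechnologies.has(name);
  }
  return true; // Accept other types (person, industry) without validation
}

// Filter validated entities
const validatedEntities = result.entities.filter(validateEntity);
Relationship mapping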
Relationships have types, directions, and properties.
Relationship schema
interface Relationship {
id: string;
from_entity_id: string;
to_entity_id: string;
relationship_type: string;
confidence: number; // 0-1
source: string; // Document/email ID where extracted
created_at: Date;
properties: Record<string, any>;
}
Bidirectional relationships
Some relationships are bidirectional (PARTNERED_WITH); others are directional (WORKS_FOR).
import { v4 as uuidv4 } from 'uuid';

// Typed as a string-keyed record so it can be indexed by rel.relationship_type
const relationshipDirections: Record<string, 'directional' | 'bidirectional'> = {
  WORKS_FOR: 'directional',
  USES_TECHNOLOGY: 'directional',
  PARTNERED_WITH: 'bidirectional',
  FUNDED_BY: 'directional',
  IN_INDUSTRY: 'directional',
  HAS_ROLE: 'directional',
};

async function storeRelationship(rel: Relationship): Promise<void> {
  await db.relationships.insert(rel);
  if (relationshipDirections[rel.relationship_type] === 'bidirectional') {
    // Store the reverse edge too, so traversals find it from either endpoint
    await db.relationships.insert({
      ...rel,
      id: uuidv4(),
      from_entity_id: rel.to_entity_id,
      to_entity_id: rel.from_entity_id,
    });
  }
}
Temporal relationships
Add timestamps to track when relationships formed or ended.
interface TemporalRelationship extends Relationship {
  valid_from: Date;
  valid_until: Date | null; // null = still valid
}

// Example: a person changed companies (entity names shown instead of IDs for readability)
const janeSmithEmployment = [
  {
    from_entity: 'Jane Smith',
    to_entity: 'OldCorp',
    relationship_type: 'WORKS_FOR',
    valid_from: new Date('2020-01-01'),
    valid_until: new Date('2023-06-30'),
  },
  {
    from_entity: 'Jane Smith',
    to_entity: 'Acme Corp',
    relationship_type: 'WORKS_FOR',
    valid_from: new Date('2023-07-01'),
    valid_until: null,
  },
];
Query current relationships with WHERE valid_until IS NULL OR valid_until > NOW().
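The same filter can be applied in application code. This sketch adds a valid_from bound so one check also answers point-in-time questions ("where did Jane work in 2022?"). The TemporalEdge shape uses entity names instead of IDs for readability, which is an assumption rather than the stored schema.

```typescript
interface TemporalEdge {
  from_entity: string;
  to_entity: string;
  relationship_type: string;
  valid_from: Date;
  valid_until: Date | null; // null = still valid
}

// valid_from <= asOf AND (valid_until IS NULL OR valid_until > asOf)
function isValidAt(edge: TemporalEdge, asOf: Date): boolean {
  return edge.valid_from.getTime() <= asOf.getTime() &&
    (edge.valid_until === null || edge.valid_until.getTime() > asOf.getTime());
}

// Which companies did a person work for on a given date?
function employerAt(edges: TemporalEdge[], person: string, asOf: Date): string[] {
  return edges
    .filter(e => e.from_entity === person && e.relationship_type === 'WORKS_FOR' && isValidAt(e, asOf))
    .map(e => e.to_entity);
}

const janeHistory: TemporalEdge[] = [
  { from_entity: 'Jane Smith', to_entity: 'OldCorp', relationship_type: 'WORKS_FOR',
    valid_from: new Date('2020-01-01'), valid_until: new Date('2023-06-30') },
  { from_entity: 'Jane Smith', to_entity: 'Acme Corp', relationship_type: 'WORKS_FOR',
    valid_from: new Date('2023-07-01'), valid_until: null },
];

console.log(employerAt(janeHistory, 'Jane Smith', new Date('2022-03-15'))); // [ 'OldCorp' ]
console.log(employerAt(janeHistory, 'Jane Smith', new Date('2024-01-01'))); // [ 'Acme Corp' ]
```

Passing the current time as asOf reproduces the "current relationships" query; any earlier date replays the graph as it stood then.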
Graph storage
Choose between PostgreSQL (JSONB + recursive queries) and Neo4j (a dedicated graph DB).
Option 1: PostgreSQL with JSONB
Store entities and relationships in traditional tables.
CREATE TABLE entities (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
type TEXT NOT NULL,
name TEXT NOT NULL,
properties JSONB,
org_id TEXT NOT NULL,
created_at TIMESTAMPTZ DEFAULT NOW()
);
CREATE TABLE relationships (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
from_entity_id UUID NOT NULL REFERENCES entities(id),
to_entity_id UUID NOT NULL REFERENCES entities(id),
relationship_type TEXT NOT NULL,
confidence NUMERIC(3, 2) CHECK (confidence BETWEEN 0 AND 1),
source TEXT, -- document/email ID the relationship was extracted from
valid_from TIMESTAMPTZ DEFAULT NOW(),
valid_until TIMESTAMPTZ, -- NULL = still valid
properties JSONB,
org_id TEXT NOT NULL,
created_at TIMESTAMPTZ DEFAULT NOW()
);
-- Indexes for traversal in both directions
CREATE INDEX idx_relationships_from ON relationships(from_entity_id);
CREATE INDEX idx_relationships_to ON relationships(to_entity_id);
CREATE INDEX idx_entities_type ON entities(type);
Pros: No new infrastructure, familiar SQL
Cons: Complex multi-hop queries require recursive CTEs
Option 2: Neo4j (dedicated graph DB)
Store graph natively with Cypher query language.
// Create entities
CREATE (jane:Person {name: 'Jane Smith', role: 'CTO'})
CREATE (acme:Company {name: 'Acme Corp'})
CREATE (stripe:Product {name: 'Stripe'})
CREATE (fintech:Industry {name: 'fintech'})
// Create relationships
CREATE (jane)-[:WORKS_FOR]->(acme)
CREATE (jane)-[:HAS_ROLE]->(:Role {title: 'CTO'})
CREATE (acme)-[:USES_TECHNOLOGY]->(stripe)
CREATE (acme)-[:IN_INDUSTRY]->(fintech)
Pros: Fast multi-hop traversals, native graph operations
Cons: Additional infrastructure, learning curve for Cypher
At OpenHelm, we use PostgreSQL JSONB for simplicity. Our queries rarely exceed 2-hop depth, making recursive CTEs acceptable.
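The depth counter in a recursive CTE is what keeps a 2-hop limit cheap. The same idea in application code is a breadth-first traversal that stops after maxDepth hops. This sketch mirrors the relationships table's from/to columns and assumes the rows fit in memory, which only holds for small graphs or tests.

```typescript
// One row of the relationships table (subset of columns).
interface Edge {
  from_entity_id: string;
  to_entity_id: string;
  relationship_type: string;
}

// Breadth-first traversal along outgoing edges, up to maxDepth hops.
// Returns each reachable entity ID with the hop count at which it was first reached.
function traverse(edges: Edge[], startIds: string[], maxDepth: number): Map<string, number> {
  // Adjacency list keyed by source entity, built once
  const outgoing = new Map<string, string[]>();
  for (const e of edges) {
    const list = outgoing.get(e.from_entity_id) ?? [];
    list.push(e.to_entity_id);
    outgoing.set(e.from_entity_id, list);
  }

  const depthOf = new Map<string, number>(startIds.map(id => [id, 0] as [string, number]));
  let frontier = [...startIds];
  for (let depth = 1; depth <= maxDepth && frontier.length > 0; depth++) {
    const next: string[] = [];
    for (const id of frontier) {
      for (const neighbor of outgoing.get(id) ?? []) {
        // The visited check also guards against cycles; a recursive CTE
        // using UNION ALL relies on its depth limit for that instead
        if (!depthOf.has(neighbor)) {
          depthOf.set(neighbor, depth);
          next.push(neighbor);
        }
      }
    }
    frontier = next;
  }
  return depthOf;
}
```

Each loop iteration plays the role of one pass of the CTE's recursive term, and maxDepth corresponds to its `depth < N` condition.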
Graph querying
Query graphs to answer relational questions.
Single-hop queries (PostgreSQL)
"Find all companies using Stripe"
SELECT DISTINCT e.name
FROM entities e
JOIN relationships r ON e.id = r.from_entity_id
JOIN entities tech ON r.to_entity_id = tech.id
WHERE
e.type = 'company'
AND r.relationship_type = 'USES_TECHNOLOGY'
AND tech.name = 'Stripe';
Multi-hop queries (PostgreSQL with recursive CTE)
"Find people who work at companies in fintech"
WITH RECURSIVE graph_traversal AS (
-- Start: companies in fintech
SELECT
e.id AS entity_id,
e.name AS entity_name,
e.type AS entity_type,
1 AS depth
FROM entities e
JOIN relationships r ON e.id = r.from_entity_id
JOIN entities industry ON r.to_entity_id = industry.id
WHERE
e.type = 'company'
AND r.relationship_type = 'IN_INDUSTRY'
AND industry.name = 'fintech'
UNION ALL
-- Traverse: find people who work for those companies
SELECT
e.id,
e.name,
e.type,
gt.depth + 1
FROM graph_traversal gt
JOIN relationships r ON gt.entity_id = r.to_entity_id
JOIN entities e ON r.from_entity_id = e.id
WHERE
r.relationship_type = 'WORKS_FOR'
AND e.type = 'person'
AND gt.depth < 2
)
SELECT DISTINCT entity_name
FROM graph_traversal
WHERE entity_type = 'person';
Multi-hop queries (Neo4j Cypher)
"Find people who work at companies using Stripe"
MATCH (person:Person)-[:WORKS_FOR]->(company:Company)-[:USES_TECHNOLOGY]->(tech:Product {name: 'Stripe'})
RETURN person.name, company.name
Cypher expresses the whole two-hop path as a single MATCH pattern, which reads far more easily than a recursive CTE.
Graph-based agent tool
Expose graph queries as agent tools.
import { z } from 'zod';

const graphQueryParams = z.object({
  query_type: z.enum(['companies_using_tech', 'people_at_companies', 'companies_in_industry']),
  filters: z.object({
    technology: z.string().optional(),
    industry: z.string().optional(),
    role: z.string().optional(),
  }),
});

const graphQueryTool = {
  name: 'query_knowledge_graph',
  description: 'Query the knowledge graph for entities and relationships. Supports multi-hop queries.',
  parameters: graphQueryParams,
  execute: async ({ query_type, filters }: z.infer<typeof graphQueryParams>) => {
    if (query_type === 'companies_using_tech') {
      if (!filters.technology) {
        return { error: 'filters.technology is required for companies_using_tech' };
      }
      // $1 is a bind parameter, so model-supplied text never becomes SQL
      return await db.query(`
        SELECT DISTINCT e.name
        FROM entities e
        JOIN relationships r ON e.id = r.from_entity_id
        JOIN entities tech ON r.to_entity_id = tech.id
        WHERE
          e.type = 'company'
          AND r.relationship_type = 'USES_TECHNOLOGY'
          AND tech.name = $1
      `, [filters.technology]);
    }
    // Other query types...
    return { error: `Unsupported query_type: ${query_type}` };
  },
};
Agent invokes: query_knowledge_graph({ query_type: 'companies_using_tech', filters: { technology: 'Stripe' } })
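As query types are added, the tool's execute body can grow into a long if-chain. One alternative is a lookup table from query_type to a parameterized SQL template plus the filter that feeds $1. The companies_in_industry and people_at_companies SQL here is a hypothetical sketch against the entities/relationships tables, not OpenHelm's implementation; people_at_companies assumes roles are stored as entities reachable via HAS_ROLE.

```typescript
type QueryType = 'companies_using_tech' | 'people_at_companies' | 'companies_in_industry';
type Filters = { technology?: string; industry?: string; role?: string };

// Each query type declares its SQL template and which filter supplies $1.
const QUERIES: Record<QueryType, { filter: keyof Filters; sql: string }> = {
  companies_using_tech: {
    filter: 'technology',
    sql: `SELECT DISTINCT e.name FROM entities e
          JOIN relationships r ON e.id = r.from_entity_id
          JOIN entities t ON r.to_entity_id = t.id
          WHERE e.type = 'company' AND r.relationship_type = 'USES_TECHNOLOGY' AND t.name = $1`,
  },
  companies_in_industry: {
    filter: 'industry',
    sql: `SELECT DISTINCT e.name FROM entities e
          JOIN relationships r ON e.id = r.from_entity_id
          JOIN entities i ON r.to_entity_id = i.id
          WHERE e.type = 'company' AND r.relationship_type = 'IN_INDUSTRY' AND i.name = $1`,
  },
  people_at_companies: {
    filter: 'role', // hypothetical: assumes roles are stored as entities
    sql: `SELECT DISTINCT p.name, c.name AS company FROM entities p
          JOIN relationships w ON p.id = w.from_entity_id AND w.relationship_type = 'WORKS_FOR'
          JOIN entities c ON w.to_entity_id = c.id
          JOIN relationships h ON p.id = h.from_entity_id AND h.relationship_type = 'HAS_ROLE'
          JOIN entities ro ON h.to_entity_id = ro.id
          WHERE p.type = 'person' AND ro.name = $1`,
  },
};

// Build SQL and bind parameters for a tool call, or explain which filter is missing.
function buildGraphQuery(
  queryType: QueryType,
  filters: Filters,
): { sql: string; params: string[] } | { error: string } {
  const spec = QUERIES[queryType];
  const value = filters[spec.filter];
  if (!value) {
    return { error: `${queryType} requires filters.${spec.filter}` };
  }
  // Values travel as bind parameters, never interpolated into the SQL text
  return { sql: spec.sql, params: [value] };
}
```

Returning an error object instead of throwing lets the agent read the message and retry with the missing filter.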
Real-world case study: OpenHelm partnership graph
We maintain a knowledge graph of 12,400 companies, 8,200 contacts, and 32,000 technologies with 120,000+ relationships.
Use cases:
- Partnership discovery: "Find Series A fintech companies using both Stripe and Plaid"
- Contact enrichment: "What role does this person have and which company do they work for?"
- Market analysis: "Which industries are adopting AI agents fastest?"
Query performance:
| Query | Complexity | PostgreSQL time | Neo4j time (estimated) |
|---|---|---|---|
| 1-hop: Companies using Stripe | Simple | 18ms | 8ms |
| 2-hop: People at fintech companies | Moderate | 65ms | 22ms |
| 3-hop: Contacts at companies funded by Sequoia | Complex | 180ms | 45ms |
Extraction pipeline:
- Agent processes partnership emails, LinkedIn profiles, meeting notes
- GPT-4 extracts entities and relationships with confidence scores
- Deduplicate entities using fuzzy matching (95% accuracy)
- Validate against known company/technology lists
- Store in PostgreSQL with org scoping
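The pipeline steps above can be wired together as a small function with each stage injected, which keeps the LLM call and the database swappable and testable. The stage names and the PipelineStages interface are assumptions for illustration; dedupe and isValid would be the fuzzy-matching and ontology checks described earlier.

```typescript
interface Entity { type: string; name: string; properties?: Record<string, string> }
interface ExtractedRelationship { from_entity: string; to_entity: string; relationship_type: string; confidence: number }
interface Extraction { entities: Entity[]; relationships: ExtractedRelationship[] }

// Each stage is injected, so the LLM call and database can be stubbed in tests.
interface PipelineStages {
  extract: (text: string) => Promise<Extraction>;          // LLM extraction with confidence scores
  dedupe: (entities: Entity[]) => Entity[];                  // e.g. fuzzy matching
  isValid: (entity: Entity) => boolean;                      // known company/technology lists
  store: (orgId: string, data: Extraction) => Promise<void>; // org-scoped write
}

// Run each document through extract -> dedupe -> validate -> store, one at a time.
// Returns the number of entities stored across all documents.
async function runPipeline(orgId: string, documents: string[], stages: PipelineStages): Promise<number> {
  let storedEntities = 0;
  for (const text of documents) {
    const { entities, relationships } = await stages.extract(text);
    const kept = stages.dedupe(entities).filter(stages.isValid);
    await stages.store(orgId, { entities: kept, relationships });
    storedEntities += kept.length;
  }
  return storedEntities;
}
```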
Results:
- 68% reduction in manual contact enrichment time
- 91% precision on "companies in industry X using tech Y" queries
- 15-20% of partnership agent queries now use the graph instead of vector search
FAQs
Should I use Neo4j or PostgreSQL for knowledge graphs?
If queries regularly exceed 2-3 hops or you need graph-specific algorithms (PageRank, shortest path), use Neo4j. For simpler graphs with occasional multi-hop queries, PostgreSQL is sufficient and reduces infrastructure complexity.
How do I handle conflicting relationships?
Store confidence scores and source metadata. When conflicts arise (two sources claim different employers), keep both with timestamps and confidence, or use voting or recency to pick a winner.
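A minimal sketch of the voting option: each source's claim adds its extraction confidence to the candidate it names, and ties go to the most recently observed candidate. The Claim shape and pickWinner name are hypothetical; because every claim stays stored, the winner can be recomputed whenever new evidence arrives.

```typescript
interface Claim {
  to_entity: string;  // e.g. the claimed employer
  confidence: number; // 0-1, from extraction
  observed_at: Date;  // when the source document was written
  source: string;     // document/email ID
}

// Confidence-weighted vote; ties broken by the most recent supporting claim.
function pickWinner(claims: Claim[]): string | null {
  const tally = new Map<string, { score: number; latest: number }>();
  for (const c of claims) {
    const t = tally.get(c.to_entity) ?? { score: 0, latest: -Infinity };
    t.score += c.confidence;
    t.latest = Math.max(t.latest, c.observed_at.getTime());
    tally.set(c.to_entity, t);
  }

  let best: string | null = null;
  let bestScore = -Infinity;
  let bestLatest = -Infinity;
  for (const [candidate, t] of Array.from(tally.entries())) {
    if (t.score > bestScore || (t.score === bestScore && t.latest > bestLatest)) {
      best = candidate;
      bestScore = t.score;
      bestLatest = t.latest;
    }
  }
  return best;
}
```

Pure confidence voting lets several old mentions outvote one recent job change; if recency should dominate, compare observed_at first and use confidence only as the tie-break.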
Can I combine vector search and graph queries?
Yes. Use vector search to find relevant entities, then traverse the graph from those starting points. Example: vector search finds companies related to "payment processing", and graph traversal finds their technologies and team members.
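A minimal in-memory sketch of that two-step pattern, assuming entity embeddings are already computed. In production the first step would usually be a query against a vector index rather than a linear scan; the types and function names here are illustrative. The expansion follows edges in both directions so that incoming WORKS_FOR edges (team members) are found alongside outgoing USES_TECHNOLOGY edges.

```typescript
interface EntityRecord { id: string; name: string; embedding: number[] }
interface GraphEdge { from: string; rel: string; to: string }

// Cosine similarity between two equal-length vectors (0 if either is all zeros).
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Step 1 (vector): pick the k entities most similar to the query embedding.
// Step 2 (graph): collect each seed's edges of the requested types, in or out.
function hybridSearch(
  queryEmbedding: number[],
  entities: EntityRecord[],
  edges: GraphEdge[],
  k: number,
  relTypes: string[],
): Map<string, GraphEdge[]> {
  const seeds = entities
    .map(e => ({ e, score: cosine(queryEmbedding, e.embedding) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .map(s => s.e);

  const result = new Map<string, GraphEdge[]>();
  for (const seed of seeds) {
    result.set(seed.id, edges.filter(edge =>
      relTypes.includes(edge.rel) && (edge.from === seed.id || edge.to === seed.id)));
  }
  return result;
}
```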
How often should I update the graph?
Incrementally update on new data (emails, documents). Run full reprocessing monthly to catch missed entities and resolve duplicates.
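Incremental updates and periodic reprocessing are only safe if writing the same fact twice does not create a second edge. This sketch shows an idempotent merge keyed on (from, relationship type, to) that keeps the highest confidence and accumulates sources; the names are hypothetical.

```typescript
interface StoredEdge {
  from: string;
  rel: string;
  to: string;
  confidence: number;
  sources: string[];
}

// Stable key for an edge; JSON encoding avoids collisions from separator characters in names.
const edgeKey = (e: { from: string; rel: string; to: string }) => JSON.stringify([e.from, e.rel, e.to]);

// Merge a batch into the existing edge set without creating duplicates,
// so re-running a document (e.g. during monthly reprocessing) is harmless.
function mergeEdges(
  existing: Map<string, StoredEdge>,
  batch: Array<{ from: string; rel: string; to: string; confidence: number; source: string }>,
): Map<string, StoredEdge> {
  for (const e of batch) {
    const key = edgeKey(e);
    const prev = existing.get(key);
    if (prev) {
      // Keep the strongest evidence and remember every source that asserted the fact
      prev.confidence = Math.max(prev.confidence, e.confidence);
      if (!prev.sources.includes(e.source)) prev.sources.push(e.source);
    } else {
      existing.set(key, { from: e.from, rel: e.rel, to: e.to, confidence: e.confidence, sources: [e.source] });
    }
  }
  return existing;
}
```

In PostgreSQL the same idea maps to a unique index on (org_id, from_entity_id, relationship_type, to_entity_id) combined with INSERT ... ON CONFLICT ... DO UPDATE.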
How do I visualize knowledge graphs?
Use Neo4j Browser (if using Neo4j) or libraries like vis.js, D3.js, or Cytoscape.js for custom visualizations. Show top-N entities with highest relationship counts.
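Choosing the top-N entities by relationship count is a degree calculation. This sketch counts in- and out-edges per entity and returns the N most connected, ready to hand to whichever visualization library you use; the Edge shape is an assumption.

```typescript
interface Edge { from: string; to: string }

// Count relationships touching each entity (in-degree + out-degree), highest first.
function topByDegree(edges: Edge[], n: number): Array<[string, number]> {
  const degree = new Map<string, number>();
  for (const { from, to } of edges) {
    degree.set(from, (degree.get(from) ?? 0) + 1);
    degree.set(to, (degree.get(to) ?? 0) + 1);
  }
  // Ties broken alphabetically so the selection is stable between runs
  return Array.from(degree.entries())
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
    .slice(0, n);
}
```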
Summary and next steps
Knowledge graphs complement vector search by enabling relational and multi-hop queries. Extract entities with LLMs using structured outputs, deduplicate and validate against ontologies, store in PostgreSQL or Neo4j, and query with SQL CTEs or Cypher depending on complexity.
Next steps:
- Identify relational queries your agents need to answer.
- Implement entity extraction with GPT-4 structured outputs.
- Set up PostgreSQL entity/relationship schema with indexes.
- Build graph traversal queries for your top 3 use cases.
- Expose graph queries as agent tools with clear descriptions.