
Knowledge Graph Construction for AI Agents: Entities and Relationships

Build knowledge graphs that agents can query for context-aware decisions, with entity extraction, relationship mapping, and graph-based retrieval patterns.

Max Beech · Founder · Jun 30, 2025 · 14 min read

TL;DR

Knowledge graphs structure information as entities (people, companies, concepts) and relationships (WORKS_FOR, ACQUIRED_BY, USES_TECHNOLOGY).
Extract entities using LLMs with structured output parsing, then validate them against ontologies.
Store graphs in PostgreSQL JSONB or dedicated graph DBs (Neo4j) depending on query complexity.
Query graphs to answer multi-hop questions agents can't solve with simple vector search.



Vector search retrieves semantically similar documents, but it struggles with relational queries such as "Which customers in fintech have used both Stripe and Plaid?" or "What companies did our partnerships team contact last quarter?" Knowledge graphs solve this by explicitly modeling entities and their relationships, enabling agents to reason across connections.

This guide covers building knowledge graphs from unstructured text, storing them efficiently, and querying them for agent context. It draws on OpenHelm's implementation, where we maintain 45,000+ entities and 120,000+ relationships across customer interactions, partnerships, and product usage.

Key takeaways

Knowledge graphs complement vector search: use vectors for semantic similarity and graphs for relational queries.
Extract entities with GPT-4/Claude using JSON schema constraints for consistency.
Model relationships with confidence scores to handle uncertainty in extracted data.
Query graphs with Cypher (Neo4j) or recursive SQL (PostgreSQL) depending on complexity.

Graph fundamentals

What is a knowledge graph?

A knowledge graph represents information as:

Nodes (entities): People, companies, products, concepts
Edges (relationships): Connections between nodes with labels and properties

Example graph:

(Company: Acme Corp) -[USES_TECHNOLOGY]-> (Product: Stripe)
(Company: Acme Corp) -[IN_INDUSTRY]-> (Industry: Fintech)
(Person: Jane Smith) -[WORKS_FOR]-> (Company: Acme Corp)
(Person: Jane Smith) -[HAS_ROLE]-> (Role: CTO)

From this graph, an agent can answer: "Which CTOs work at fintech companies using Stripe?" by traversing relationships.
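To make the traversal concrete, here is a minimal in-memory sketch that stores the example graph as (from, relationship, to) triples and answers the CTO question by following edges hop by hop. The `Triple` type and the helper names are hypothetical; a production system would express the same hops as SQL joins or a Cypher pattern, as the querying section shows.

```typescript
// Hypothetical toy triple store mirroring the example graph above.
type Triple = { from: string; rel: string; to: string };

const triples: Triple[] = [
  { from: 'Acme Corp', rel: 'USES_TECHNOLOGY', to: 'Stripe' },
  { from: 'Acme Corp', rel: 'IN_INDUSTRY', to: 'Fintech' },
  { from: 'Jane Smith', rel: 'WORKS_FOR', to: 'Acme Corp' },
  { from: 'Jane Smith', rel: 'HAS_ROLE', to: 'CTO' },
];

// Follow edges of type `rel` outward from `from`.
function targets(from: string, rel: string): string[] {
  return triples.filter(t => t.from === from && t.rel === rel).map(t => t.to);
}

// Find nodes that have an edge of type `rel` pointing at `to`.
function sources(rel: string, to: string): string[] {
  return triples.filter(t => t.rel === rel && t.to === to).map(t => t.from);
}

// "Which CTOs work at fintech companies using Stripe?"
function ctosAtFintechStripeUsers(): string[] {
  const fintech = new Set(sources('IN_INDUSTRY', 'Fintech'));
  const stripeUsers = sources('USES_TECHNOLOGY', 'Stripe').filter(c => fintech.has(c));
  return stripeUsers.flatMap(company =>
    sources('WORKS_FOR', company).filter(person => targets(person, 'HAS_ROLE').includes('CTO')),
  );
}

console.log(ctosAtFintechStripeUsers()); // → ['Jane Smith']
```

Each helper call corresponds to one hop; the intersection with the fintech set is what a vector index cannot express directly.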

When to use knowledge graphs

Query type	Best approach	Example
Semantic similarity	Vector search	"Find documents about API rate limiting"
Factual lookup	Key-value store	"What's the email for contact ID 12345?"
Relational	Knowledge graph	"Which partners in Series A raised funding last month?"
Multi-hop	Knowledge graph	"Find companies that hired employees from our customers"

At OpenHelm, we use knowledge graphs for partnership discovery queries that require traversing company → industry → technology stack relationships.

Graph vs vector search performance

Dataset	Query	Vector search	Knowledge graph
10K contacts	"Find CTOs"	120ms, 78% precision	15ms, 98% precision
10K contacts	"Companies in fintech using Stripe"	250ms, 54% precision (keyword match issues)	25ms, 95% precision
10K contacts	"2-hop: Contacts who work at companies funded by Sequoia"	Not possible	60ms, 92% precision

Graphs excel at structured, relational queries with precise results.

"Agent orchestration is where the real value lives. Individual AI capabilities matter less than how well you coordinate them into coherent workflows." - James Park, Founder of AI Infrastructure Labs

Entity extraction

Extract entities from unstructured text (emails, documents, chat logs) using LLMs.

Extraction with structured outputs

Use OpenAI's Structured Outputs (a JSON schema passed through response_format) to keep entity extraction consistent.

import OpenAI from 'openai';
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

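interface Entity {
  // 'industry' matches the enum in the JSON schema below
  type: 'person' | 'company' | 'product' | 'technology' | 'industry';
  name: string;
  properties?: Record<string, string>; // optional: the schema does not require it
}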

interface ExtractionResult {
  entities: Entity[];
  relationships: Array<{
    from_entity: string;
    to_entity: string;
    relationship_type: string;
    confidence: number;
  }>;
}

async function extractEntities(text: string): Promise<ExtractionResult> {
  const response = await openai.chat.completions.create({
    model: 'gpt-4o',
    messages: [{
      role: 'system',
      content: `Extract entities (people, companies, products, technologies) and their relationships from the text.

      Types of relationships:
      - WORKS_FOR: person works at company
      - USES_TECHNOLOGY: company uses product/tech
      - IN_INDUSTRY: company operates in industry
      - HAS_ROLE: person has job title
      - FUNDED_BY: company funded by investor
      - PARTNERED_WITH: company partners with company

      Include confidence scores (0-1) for each relationship.`,
    }, {
      role: 'user',
      content: text,
    }],
    response_format: {
      type: 'json_schema',
      json_schema: {
        name: 'entity_extraction',
        schema: {
          type: 'object',
          properties: {
            entities: {
              type: 'array',
              items: {
                type: 'object',
                properties: {
                  type: { type: 'string', enum: ['person', 'company', 'product', 'technology', 'industry'] },
                  name: { type: 'string' },
                  properties: { type: 'object', additionalProperties: { type: 'string' } },
                },
                required: ['type', 'name'],
              },
            },
            relationships: {
              type: 'array',
              items: {
                type: 'object',
                properties: {
                  from_entity: { type: 'string' },
                  to_entity: { type: 'string' },
                  relationship_type: { type: 'string' },
                  confidence: { type: 'number', minimum: 0, maximum: 1 },
                },
                required: ['from_entity', 'to_entity', 'relationship_type', 'confidence'],
              },
            },
          },
          required: ['entities', 'relationships'],
        },
      },
    },
  });

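  // content is typed string | null (for example, null when the model refuses), so guard before parsing
  const content = response.choices[0].message.content;
  if (!content) throw new Error('Model returned no content');
  return JSON.parse(content) as ExtractionResult;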
}

// Example usage
const text = `
  Jane Smith, CTO of Acme Corp, mentioned they're using Stripe for payments and recently raised Series A from Sequoia Capital. Acme operates in the fintech space.
`;

const result = await extractEntities(text);
console.log(result);
/*
{
  entities: [
    { type: 'person', name: 'Jane Smith', properties: { role: 'CTO' } },
    { type: 'company', name: 'Acme Corp', properties: {} },
    { type: 'product', name: 'Stripe', properties: {} },
    { type: 'company', name: 'Sequoia Capital', properties: { type: 'investor' } },
    { type: 'industry', name: 'fintech', properties: {} },
  ],
  relationships: [
    { from_entity: 'Jane Smith', to_entity: 'Acme Corp', relationship_type: 'WORKS_FOR', confidence: 0.95 },
    { from_entity: 'Jane Smith', to_entity: 'CTO', relationship_type: 'HAS_ROLE', confidence: 0.98 },
    { from_entity: 'Acme Corp', to_entity: 'Stripe', relationship_type: 'USES_TECHNOLOGY', confidence: 0.92 },
    { from_entity: 'Acme Corp', to_entity: 'Sequoia Capital', relationship_type: 'FUNDED_BY', confidence: 0.90 },
    { from_entity: 'Acme Corp', to_entity: 'fintech', relationship_type: 'IN_INDUSTRY', confidence: 0.94 },
  ],
}
*/
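Because the request above does not set `strict: true` in `json_schema`, the reply is not guaranteed to conform to the schema, and `JSON.parse` hands back an untyped value. A minimal, dependency-free guard like the one below (all names are hypothetical) can reject malformed payloads before they reach the graph; a schema library such as zod would do the same job more concisely.

```typescript
// Hypothetical runtime guard for the parsed extraction payload.
type ExtractedEntity = { type: string; name: string; properties?: Record<string, string> };
type ExtractedRelationship = {
  from_entity: string;
  to_entity: string;
  relationship_type: string;
  confidence: number;
};
type Extraction = { entities: ExtractedEntity[]; relationships: ExtractedRelationship[] };

// Same enum as the JSON schema in the request above
const ENTITY_TYPES = new Set(['person', 'company', 'product', 'technology', 'industry']);

function isRecord(v: unknown): v is Record<string, unknown> {
  return typeof v === 'object' && v !== null && !Array.isArray(v);
}

function isEntity(e: unknown): boolean {
  return isRecord(e) && typeof e.type === 'string' && ENTITY_TYPES.has(e.type) && typeof e.name === 'string';
}

function isRelationship(r: unknown): boolean {
  return (
    isRecord(r) &&
    typeof r.from_entity === 'string' &&
    typeof r.to_entity === 'string' &&
    typeof r.relationship_type === 'string' &&
    typeof r.confidence === 'number' &&
    r.confidence >= 0 &&
    r.confidence <= 1
  );
}

function isExtraction(v: unknown): v is Extraction {
  if (!isRecord(v)) return false;
  const { entities, relationships } = v;
  return (
    Array.isArray(entities) &&
    Array.isArray(relationships) &&
    entities.every(isEntity) &&
    relationships.every(isRelationship)
  );
}

// Usage: parse first, then check the shape before trusting it
const parsed: unknown = JSON.parse(
  '{"entities":[{"type":"company","name":"Acme Corp"}],' +
    '"relationships":[{"from_entity":"Jane Smith","to_entity":"Acme Corp",' +
    '"relationship_type":"WORKS_FOR","confidence":1.4}]}',
);
console.log(isExtraction(parsed)); // false: confidence 1.4 is outside 0-1
```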

Entity deduplication

LLMs might extract "Acme Corp", "Acme Corporation", and "acme corp" as separate entities. Deduplicate them with fuzzy string matching.

import Fuse from 'fuse.js';

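function deduplicateEntities(entities: Entity[]): Entity[] {
  const deduplicated: Entity[] = [];

  for (const entity of entities) {
    // Rebuild the index over the entities kept so far (simple, but O(n²) for large batches)
    const fuse = new Fuse(deduplicated, {
      keys: ['name'],
      threshold: 0.2, // 0.0 requires a perfect match, 1.0 matches anything; 0.2 keeps only close matches
    });

    // Only merge with a close match of the same type (a person should never merge into a company)
    const match = fuse.search(entity.name).find(m => m.item.type === entity.type);

    if (match) {
      // Merge properties into the entity seen first; later values win on key conflicts
      const existing = match.item;
      existing.properties = { ...existing.properties, ...entity.properties };
    } else {
      deduplicated.push(entity);
    }
  }

  return deduplicated;
}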
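Fuzzy scores also depend on string length, so a long legal suffix may keep "Acme Corp" and "Acme Corporation" apart at a tight threshold. A cheap complementary step, sketched below under the assumption that suffixes like "Corp" and "Inc" carry no identity, is to normalize names into a grouping key before fuzzy matching runs. The suffix list and function names are hypothetical, not part of OpenHelm's described pipeline.

```typescript
// Hypothetical normalization: lowercase, strip punctuation, drop trailing legal suffixes.
const LEGAL_SUFFIXES = ['corporation', 'corp', 'incorporated', 'inc', 'limited', 'ltd', 'llc'];

function normalizeName(name: string): string {
  const words = name
    .toLowerCase()
    .replace(/[^a-z0-9\s]/g, ' ')
    .split(/\s+/)
    .filter(w => w.length > 0);
  // Only strip suffixes at the end, and never reduce a name to nothing
  while (words.length > 1 && LEGAL_SUFFIXES.includes(words[words.length - 1])) {
    words.pop();
  }
  return words.join(' ');
}

type NamedEntity = { type: string; name: string };

// Group entities by (type, normalized name); each group is one candidate entity.
function groupByKey<T extends NamedEntity>(entities: T[]): Map<string, T[]> {
  const groups = new Map<string, T[]>();
  for (const e of entities) {
    const key = `${e.type}:${normalizeName(e.name)}`;
    const group = groups.get(key);
    if (group) group.push(e);
    else groups.set(key, [e]);
  }
  return groups;
}

const groups = groupByKey([
  { type: 'company', name: 'Acme Corp' },
  { type: 'company', name: 'Acme Corporation' },
  { type: 'company', name: 'acme corp.' },
  { type: 'product', name: 'Stripe' },
]);
console.log(groups.size); // 2: 'company:acme' and 'product:stripe'
```

Exact-key grouping handles the mechanical variants; fuzzy matching is then left for genuine typos.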

Entity validation with ontologies

Validate extracted entities against known ontologies to reduce hallucinations.

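// `db` stands for your data-access layer; any query returning company names works here
const knownCompanies = await db.companies.findAll({ select: ['name'] });
const knownCompanyNames = new Set(knownCompanies.map(c => c.name.toLowerCase()));
// Stored lowercase so both checks below are case-insensitive
const knownTechnologies = new Set(['stripe', 'plaid', 'aws', 'openai' /* ... */]);

function validateEntity(entity: Entity): boolean {
  const name = entity.name.toLowerCase();

  if (entity.type === 'company') {
    return knownCompanyNames.has(name);
  }

  // The extractor may label a tool 'product' or 'technology' (Stripe is a 'product' in the example output)
  if (entity.type === 'technology' || entity.type === 'product') {
    return knownTechnologies.has(name);
  }

  return true; // Accept other types (person, industry) without validation
}

// Filter validated entities
const validatedEntities = result.entities.filter(validateEntity);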
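Filtering entities leaves relationships that still point at rejected names. A minimal follow-up sketch (the function name and the 0.7 cut-off are assumptions) drops those dangling edges along with any edge below a confidence threshold, which is one way to put the extracted confidence scores to work.

```typescript
// Shape mirrors the extraction output above; 0.7 is an arbitrary example threshold.
type CandidateRelationship = {
  from_entity: string;
  to_entity: string;
  relationship_type: string;
  confidence: number;
};

function pruneRelationships(
  relationships: CandidateRelationship[],
  keptEntityNames: string[],
  minConfidence = 0.7,
): CandidateRelationship[] {
  const kept = new Set(keptEntityNames.map(n => n.toLowerCase()));
  // Keep an edge only if it is confident enough and both endpoints survived validation
  return relationships.filter(
    r =>
      r.confidence >= minConfidence &&
      kept.has(r.from_entity.toLowerCase()) &&
      kept.has(r.to_entity.toLowerCase()),
  );
}

const pruned = pruneRelationships(
  [
    { from_entity: 'Jane Smith', to_entity: 'Acme Corp', relationship_type: 'WORKS_FOR', confidence: 0.95 },
    { from_entity: 'Acme Corp', to_entity: 'Stripe', relationship_type: 'USES_TECHNOLOGY', confidence: 0.55 },
    { from_entity: 'Acme Corp', to_entity: 'Globex', relationship_type: 'PARTNERED_WITH', confidence: 0.9 },
  ],
  ['Jane Smith', 'Acme Corp', 'Stripe'],
);
console.log(pruned.map(r => r.relationship_type)); // → ['WORKS_FOR']
```

Note that this rule also drops edges whose target is a plain value rather than an entity, such as the HAS_ROLE → "CTO" edge in the example output, so such relationship types may need an exemption.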

Relationship mapping

Relationships have types, directions, and properties.

Relationship schema

interface Relationship {
  id: string;
  from_entity_id: string;
  to_entity_id: string;
  relationship_type: string;
  confidence: number; // 0-1
  source: string; // Document/email ID where extracted
  created_at: Date;
  properties: Record<string, any>;
}

Bidirectional relationships

Some relationships are bidirectional (PARTNERED_WITH), while others are directional (WORKS_FOR).

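import { v4 as uuidv4 } from 'uuid';

// Typed as a string-keyed record so lookups by relationship_type compile under strict mode
const relationshipDirections: Record<string, 'directional' | 'bidirectional'> = {
  WORKS_FOR: 'directional',
  USES_TECHNOLOGY: 'directional',
  PARTNERED_WITH: 'bidirectional',
  FUNDED_BY: 'directional',
  IN_INDUSTRY: 'directional',
  HAS_ROLE: 'directional',
};

// `db` is a placeholder for your data-access layer
async function storeRelationship(rel: Relationship): Promise<void> {
  await db.relationships.insert(rel);

  if (relationshipDirections[rel.relationship_type] === 'bidirectional') {
    // Also store the mirrored edge so traversals from either endpoint find it
    await db.relationships.insert({
      ...rel,
      id: uuidv4(),
      from_entity_id: rel.to_entity_id,
      to_entity_id: rel.from_entity_id,
    });
  }
}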
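The payoff of mirroring is on the read side: "who does X partner with?" becomes a single scan on the source column, regardless of which side of the sentence the extractor put X on. A small in-memory sketch of both the expansion and the lookup (names are hypothetical):

```typescript
type Edge = { from: string; to: string; type: string };

// Subset of the direction table above: only symmetric types are mirrored
const BIDIRECTIONAL = new Set(['PARTNERED_WITH']);

// Rows to persist for one extracted edge: one row, or two mirrored rows for symmetric types
function rowsToStore(edge: Edge): Edge[] {
  return BIDIRECTIONAL.has(edge.type)
    ? [edge, { from: edge.to, to: edge.from, type: edge.type }]
    : [edge];
}

// Neighbor lookup that only ever scans the `from` side
function neighbors(rows: Edge[], id: string, type: string): string[] {
  return rows.filter(r => r.from === id && r.type === type).map(r => r.to);
}

const rows = [
  { from: 'Acme Corp', to: 'Globex', type: 'PARTNERED_WITH' },
  { from: 'Jane Smith', to: 'Acme Corp', type: 'WORKS_FOR' },
].flatMap(rowsToStore);

console.log(rows.length); // 3
console.log(neighbors(rows, 'Globex', 'PARTNERED_WITH')); // → ['Acme Corp']
```

The trade-off is doubled storage for symmetric types; the alternative is a single row queried on both columns with an OR.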

Temporal relationships

Add timestamps to track when relationships formed or ended.

interface TemporalRelationship extends Relationship {
  valid_from: Date;
  valid_until: Date | null; // null = still valid
}

// Example: person changed companies (entity names shown instead of IDs for readability)
const janeSmithEmployment = [
  {
    from_entity: 'Jane Smith',
    to_entity: 'OldCorp',
    relationship_type: 'WORKS_FOR',
    valid_from: new Date('2020-01-01'),
    valid_until: new Date('2023-06-30'),
  },
  {
    from_entity: 'Jane Smith',
    to_entity: 'Acme Corp',
    relationship_type: 'WORKS_FOR',
    valid_from: new Date('2023-07-01'),
    valid_until: null,
  },
];

Query current relationships with WHERE valid_until IS NULL OR valid_until > NOW().
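That predicate answers "what is true now". Adding a `valid_from` bound turns it into an as-of query for any past date. An in-memory sketch of the same predicate, using the Jane Smith history above (the names `validAt` and `employerAt` are hypothetical):

```typescript
interface TimedEdge {
  from: string;
  to: string;
  type: string;
  valid_from: Date;
  valid_until: Date | null; // null = still valid
}

// Mirrors: valid_from <= at AND (valid_until IS NULL OR valid_until > at)
function validAt(edge: TimedEdge, at: Date): boolean {
  return (
    edge.valid_from.getTime() <= at.getTime() &&
    (edge.valid_until === null || edge.valid_until.getTime() > at.getTime())
  );
}

const employment: TimedEdge[] = [
  { from: 'Jane Smith', to: 'OldCorp', type: 'WORKS_FOR', valid_from: new Date('2020-01-01'), valid_until: new Date('2023-06-30') },
  { from: 'Jane Smith', to: 'Acme Corp', type: 'WORKS_FOR', valid_from: new Date('2023-07-01'), valid_until: null },
];

function employerAt(person: string, at: Date): string[] {
  return employment
    .filter(e => e.from === person && e.type === 'WORKS_FOR' && validAt(e, at))
    .map(e => e.to);
}

console.log(employerAt('Jane Smith', new Date('2022-03-15'))); // → ['OldCorp']
console.log(employerAt('Jane Smith', new Date('2024-01-01'))); // → ['Acme Corp']
```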

Graph storage

Choose between PostgreSQL (JSONB + recursive queries) and Neo4j (a dedicated graph DB).

Option 1: PostgreSQL with JSONB

Store entities and relationships in ordinary relational tables, using JSONB columns for flexible properties.

CREATE TABLE entities (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  type TEXT NOT NULL,
  name TEXT NOT NULL,
  properties JSONB,
  org_id TEXT NOT NULL,
  created_at TIMESTAMPTZ DEFAULT NOW()
);

CREATE TABLE relationships (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  from_entity_id UUID REFERENCES entities(id),
  to_entity_id UUID REFERENCES entities(id),
  relationship_type TEXT NOT NULL,
  confidence NUMERIC(3, 2),
  properties JSONB,
  org_id TEXT NOT NULL,
  created_at TIMESTAMPTZ DEFAULT NOW()
);

-- Indexes for traversal
CREATE INDEX idx_relationships_from ON relationships(from_entity_id);
CREATE INDEX idx_relationships_to ON relationships(to_entity_id);
CREATE INDEX idx_entities_type ON entities(type);

Pros: No new infrastructure, familiar SQL

Cons: Complex multi-hop queries require recursive CTEs
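One gap between extraction and storage: the extractor returns entity names, while these tables link rows by UUID. A minimal sketch of the mapping step in between, assuming entities were already deduplicated and that the caller supplies the ID generator (in production that might be `crypto.randomUUID`, or the `gen_random_uuid()` column default with `RETURNING id`). All names here, including `org_123`, are hypothetical.

```typescript
type ExtractedEntity = { type: string; name: string; properties?: Record<string, string> };
type ExtractedRel = { from_entity: string; to_entity: string; relationship_type: string; confidence: number };

interface EntityRow { id: string; type: string; name: string; properties: Record<string, string>; org_id: string }
interface RelationshipRow {
  id: string;
  from_entity_id: string;
  to_entity_id: string;
  relationship_type: string;
  confidence: number;
  org_id: string;
}

function toRows(
  entities: ExtractedEntity[],
  relationships: ExtractedRel[],
  orgId: string,
  newId: () => string,
): { entityRows: EntityRow[]; relationshipRows: RelationshipRow[]; unresolved: ExtractedRel[] } {
  // Assign IDs and remember them by lowercase name
  const idByName = new Map<string, string>();
  const entityRows: EntityRow[] = entities.map(e => {
    const id = newId();
    idByName.set(e.name.toLowerCase(), id);
    return { id, type: e.type, name: e.name, properties: e.properties ?? {}, org_id: orgId };
  });

  const relationshipRows: RelationshipRow[] = [];
  const unresolved: ExtractedRel[] = [];
  for (const r of relationships) {
    const from = idByName.get(r.from_entity.toLowerCase());
    const to = idByName.get(r.to_entity.toLowerCase());
    if (from === undefined || to === undefined) {
      unresolved.push(r); // an endpoint is not a stored entity (e.g. a bare role title)
      continue;
    }
    relationshipRows.push({
      id: newId(),
      from_entity_id: from,
      to_entity_id: to,
      relationship_type: r.relationship_type,
      confidence: r.confidence,
      org_id: orgId,
    });
  }
  return { entityRows, relationshipRows, unresolved };
}

let counter = 0;
const { entityRows, relationshipRows, unresolved } = toRows(
  [{ type: 'person', name: 'Jane Smith' }, { type: 'company', name: 'Acme Corp' }],
  [
    { from_entity: 'Jane Smith', to_entity: 'Acme Corp', relationship_type: 'WORKS_FOR', confidence: 0.95 },
    { from_entity: 'Jane Smith', to_entity: 'CTO', relationship_type: 'HAS_ROLE', confidence: 0.98 },
  ],
  'org_123',
  () => `id-${++counter}`,
);
console.log(entityRows.length, relationshipRows.length, unresolved.length); // 2 1 1
```

Returning unresolved edges instead of silently dropping them makes it easy to log or review what the extractor produced but the graph could not hold.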

Option 2: Neo4j (dedicated graph DB)

Store the graph natively and query it with the Cypher query language.

// Create entities
CREATE (jane:Person {name: 'Jane Smith', role: 'CTO'})
CREATE (acme:Company {name: 'Acme Corp'})
CREATE (stripe:Product {name: 'Stripe'})
CREATE (fintech:Industry {name: 'fintech'})

// Create relationships
CREATE (jane)-[:WORKS_FOR]->(acme)
CREATE (jane)-[:HAS_ROLE]->(:Role {title: 'CTO'})
CREATE (acme)-[:USES_TECHNOLOGY]->(stripe)
CREATE (acme)-[:IN_INDUSTRY]->(fintech)

Pros: Fast multi-hop traversals, native graph operations

Cons: Additional infrastructure, learning curve for Cypher

At OpenHelm, we use PostgreSQL with JSONB for simplicity. Our queries rarely go deeper than 2 hops, so recursive CTEs are acceptable.

Graph querying

Query graphs to answer relational questions.

Single-hop queries (PostgreSQL)

"Find all companies using Stripe"

SELECT DISTINCT e.name
FROM entities e
JOIN relationships r ON e.id = r.from_entity_id
JOIN entities tech ON r.to_entity_id = tech.id
WHERE
  e.type = 'company'
  AND r.relationship_type = 'USES_TECHNOLOGY'
  AND tech.name = 'Stripe';

Multi-hop queries (PostgreSQL with recursive CTE)

"Find people who work at companies in fintech"

WITH RECURSIVE graph_traversal AS (
  -- Start: companies in fintech
  SELECT
    e.id AS entity_id,
    e.name AS entity_name,
    e.type AS entity_type,
    1 AS depth
  FROM entities e
  JOIN relationships r ON e.id = r.from_entity_id
  JOIN entities industry ON r.to_entity_id = industry.id
  WHERE
    e.type = 'company'
    AND r.relationship_type = 'IN_INDUSTRY'
    AND industry.name = 'fintech'

  UNION ALL

  -- Traverse: find people who work for those companies
  SELECT
    e.id,
    e.name,
    e.type,
    gt.depth + 1
  FROM graph_traversal gt
  JOIN relationships r ON gt.entity_id = r.to_entity_id
  JOIN entities e ON r.from_entity_id = e.id
  WHERE
    r.relationship_type = 'WORKS_FOR'
    AND e.type = 'person'
    AND gt.depth < 2
)
SELECT DISTINCT entity_name
FROM graph_traversal
WHERE entity_type = 'person';
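The CTE above hard-codes one path shape. When an agent just needs the context around an entity (everything within N hops), a breadth-first search in application code over edges loaded from the relationships table is a simple alternative. This sketch ignores edge direction and records each node the first time it is reached, so cycles cannot cause repeated expansion; the function and type names are hypothetical.

```typescript
interface GraphHop { from: string; to: string; type: string }

// BFS from `start`, treating edges as undirected, up to `maxDepth` hops.
// Returns each reached entity with its hop distance.
function kHopNeighborhood(edges: GraphHop[], start: string, maxDepth: number): Map<string, number> {
  const adjacency = new Map<string, string[]>();
  const link = (a: string, b: string): void => {
    const list = adjacency.get(a);
    if (list) list.push(b);
    else adjacency.set(a, [b]);
  };
  for (const e of edges) {
    link(e.from, e.to);
    link(e.to, e.from);
  }

  const depthOf = new Map<string, number>([[start, 0]]);
  let frontier = [start];
  for (let depth = 1; depth <= maxDepth && frontier.length > 0; depth++) {
    const next: string[] = [];
    for (const node of frontier) {
      for (const neighbor of adjacency.get(node) ?? []) {
        if (!depthOf.has(neighbor)) {
          depthOf.set(neighbor, depth); // first visit is the shortest hop count
          next.push(neighbor);
        }
      }
    }
    frontier = next;
  }
  return depthOf;
}

const hopEdges: GraphHop[] = [
  { from: 'Jane Smith', to: 'Acme Corp', type: 'WORKS_FOR' },
  { from: 'Acme Corp', to: 'fintech', type: 'IN_INDUSTRY' },
  { from: 'Acme Corp', to: 'Stripe', type: 'USES_TECHNOLOGY' },
  { from: 'Globex', to: 'fintech', type: 'IN_INDUSTRY' },
];
const hood = kHopNeighborhood(hopEdges, 'Jane Smith', 2);
console.log([...hood]);
// Jane Smith at 0 hops, Acme Corp at 1, fintech and Stripe at 2; Globex (3 hops) is excluded
```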

Multi-hop queries (Neo4j Cypher)

"Find people who work at companies using Stripe"

MATCH (person:Person)-[:WORKS_FOR]->(company:Company)-[:USES_TECHNOLOGY]->(tech:Product {name: 'Stripe'})
RETURN person.name, company.name

Cypher is significantly more readable for multi-hop queries.

Graph-based agent tool

Expose graph queries as agent tools.

import { z } from 'zod';

const graphQueryParams = z.object({
  query_type: z.enum(['companies_using_tech', 'people_at_companies', 'companies_in_industry']),
  filters: z.object({
    technology: z.string().optional(),
    industry: z.string().optional(),
    role: z.string().optional(),
  }),
});

const graphQueryTool = {
  name: 'query_knowledge_graph',
  description: 'Query the knowledge graph for entities and relationships. Supports multi-hop queries.',
  parameters: graphQueryParams,
  execute: async ({ query_type, filters }: z.infer<typeof graphQueryParams>) => {
    if (query_type === 'companies_using_tech') {
      if (!filters.technology) throw new Error('filters.technology is required');
      // `db` is your PostgreSQL client; the agent's value is bound as $1, never concatenated
      return await db.query(`
        SELECT DISTINCT e.name
        FROM entities e
        JOIN relationships r ON e.id = r.from_entity_id
        JOIN entities tech ON r.to_entity_id = tech.id
        WHERE
          e.type = 'company'
          AND r.relationship_type = 'USES_TECHNOLOGY'
          AND tech.name = $1
      `, [filters.technology]);
    }

    // Other query types...
    throw new Error(`Unsupported query_type: ${query_type}`);
  },
};

Agent invokes: query_knowledge_graph({ query_type: 'companies_using_tech', filters: { technology: 'Stripe' } })
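One way to keep every branch of the tool parameterized is a small builder that maps each `query_type` to SQL text plus bound values, which `db.query(text, values)` can then run. The SQL for the two remaining types is an assumption modeled on the queries earlier in this guide, and `people_at_companies` is interpreted here as people at companies in a given industry, optionally filtered by the role stored in `properties` (as in the extraction example). All names are hypothetical.

```typescript
type GraphQueryType = 'companies_using_tech' | 'people_at_companies' | 'companies_in_industry';
type GraphFilters = { technology?: string; industry?: string; role?: string };
type BuiltQuery = { text: string; values: string[] };

function requireFilter(value: string | undefined, field: string): string {
  if (!value) throw new Error(`filters.${field} is required`);
  return value;
}

function buildGraphQuery(queryType: GraphQueryType, filters: GraphFilters): BuiltQuery {
  switch (queryType) {
    case 'companies_using_tech':
      return {
        text: `SELECT DISTINCT e.name FROM entities e
               JOIN relationships r ON e.id = r.from_entity_id
               JOIN entities tech ON r.to_entity_id = tech.id
               WHERE e.type = 'company' AND r.relationship_type = 'USES_TECHNOLOGY'
                 AND tech.name = $1`,
        values: [requireFilter(filters.technology, 'technology')],
      };
    case 'companies_in_industry':
      return {
        text: `SELECT DISTINCT e.name FROM entities e
               JOIN relationships r ON e.id = r.from_entity_id
               JOIN entities industry ON r.to_entity_id = industry.id
               WHERE e.type = 'company' AND r.relationship_type = 'IN_INDUSTRY'
                 AND industry.name = $1`,
        values: [requireFilter(filters.industry, 'industry')],
      };
    case 'people_at_companies': {
      // Two hops: person -WORKS_FOR-> company -IN_INDUSTRY-> industry
      const values = [requireFilter(filters.industry, 'industry')];
      let text = `SELECT DISTINCT p.name AS person, c.name AS company FROM entities p
                  JOIN relationships w ON p.id = w.from_entity_id AND w.relationship_type = 'WORKS_FOR'
                  JOIN entities c ON w.to_entity_id = c.id AND c.type = 'company'
                  JOIN relationships i ON c.id = i.from_entity_id AND i.relationship_type = 'IN_INDUSTRY'
                  JOIN entities ind ON i.to_entity_id = ind.id
                  WHERE p.type = 'person' AND ind.name = $1`;
      if (filters.role) {
        values.push(filters.role);
        text += ` AND p.properties->>'role' = $2`;
      }
      return { text, values };
    }
  }
}

const ctoQuery = buildGraphQuery('people_at_companies', { industry: 'fintech', role: 'CTO' });
console.log(ctoQuery.values); // → ['fintech', 'CTO']
```

Validating required filters in the builder means the agent gets a clear error message it can correct, instead of a query that silently matches nothing.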

Real-world case study: OpenHelm partnership graph

We maintain a knowledge graph of 12,400 companies, 8,200 contacts, and 32,000 technologies with 120,000+ relationships.

Use cases:

Partnership discovery: "Find Series A fintech companies using both Stripe and Plaid"
Contact enrichment: "What role does this person have and which company do they work for?"
Market analysis: "Which industries are adopting AI agents fastest?"

Query performance:

Query	Complexity	PostgreSQL time	Neo4j time (estimated)
1-hop: Companies using Stripe	Simple	18ms	8ms
2-hop: People at fintech companies	Moderate	65ms	22ms
3-hop: Contacts at companies funded by Sequoia	Complex	180ms	45ms

Extraction pipeline:

Agent processes partnership emails, LinkedIn profiles, meeting notes
GPT-4 extracts entities and relationships with confidence scores
Deduplicate entities using fuzzy matching (95% accuracy)
Validate against known company/technology lists
Store in PostgreSQL with org scoping

Results:

68% reduction in manual contact enrichment time
91% precision on "companies in industry X using tech Y" queries
15-20% of partnership agent queries now use graph instead of vector search


FAQs

Should I use Neo4j or PostgreSQL for knowledge graphs?

If queries regularly exceed 2-3 hops or you need graph-specific algorithms (PageRank, shortest path), use Neo4j. For simpler graphs with occasional multi-hop queries, PostgreSQL is sufficient and reduces infrastructure complexity.

How do I handle conflicting relationships?

Store confidence scores and source metadata. When conflicts arise (for example, two sources claim different employers), either keep both relationships with timestamps and confidence, or use voting or recency to pick a winner.
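A minimal sketch of the voting-plus-recency option, assuming each extracted WORKS_FOR edge is kept as a separate claim with its confidence and the date of its source: confidence is summed per candidate employer, and ties go to the most recently observed one. `EmploymentClaim`, `resolveEmployer`, and the source IDs are hypothetical.

```typescript
interface EmploymentClaim {
  person: string;
  employer: string;
  confidence: number; // 0-1 from the extractor
  observedAt: Date; // date of the source email or document
  source: string;
}

// Confidence-weighted vote per candidate employer; ties go to the most recent evidence.
function resolveEmployer(claims: EmploymentClaim[], person: string): string | undefined {
  const tally = new Map<string, { weight: number; latest: number }>();
  for (const c of claims) {
    if (c.person !== person) continue;
    const t = tally.get(c.employer) ?? { weight: 0, latest: 0 };
    t.weight += c.confidence;
    t.latest = Math.max(t.latest, c.observedAt.getTime());
    tally.set(c.employer, t);
  }

  let winner: string | undefined;
  let best = { weight: -1, latest: -1 };
  for (const [employer, score] of tally) {
    const better =
      score.weight > best.weight || (score.weight === best.weight && score.latest > best.latest);
    if (better) {
      winner = employer;
      best = score;
    }
  }
  return winner;
}

const claims: EmploymentClaim[] = [
  { person: 'Jane Smith', employer: 'OldCorp', confidence: 0.9, observedAt: new Date('2022-05-01'), source: 'email-1' },
  { person: 'Jane Smith', employer: 'Acme Corp', confidence: 0.6, observedAt: new Date('2024-01-10'), source: 'email-2' },
  { person: 'Jane Smith', employer: 'Acme Corp', confidence: 0.5, observedAt: new Date('2024-02-03'), source: 'notes-7' },
];
console.log(resolveEmployer(claims, 'Jane Smith')); // Acme Corp (0.6 + 0.5 outweighs 0.9)
```

For facts that legitimately change over time, like employment, the temporal `valid_from`/`valid_until` model is often a better fit than picking a single winner.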

Can I combine vector search and graph queries?

Yes. Use vector search to find relevant entities, then traverse the graph from those starting points. Example: vector search finds companies related to "payment processing", and graph traversal finds their technologies and team members.
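A minimal in-memory sketch of that two-step pattern: cosine similarity ranks entities against a query embedding, and the top hits become seeds whose outgoing edges (technologies) and incoming edges (team members) are collected. The 3-dimensional vectors are toy stand-ins for real model embeddings, and all names are hypothetical; in production the first step would run against a vector index.

```typescript
type Embedding = number[];
interface EmbeddedEntity { name: string; embedding: Embedding }
interface KgEdge { from: string; to: string; type: string }

function cosineSimilarity(a: Embedding, b: Embedding): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// 1) Vector step: keep the k most similar entities as seeds.
// 2) Graph step: collect each seed's outgoing and incoming edges.
function hybridSearch(query: Embedding, entities: EmbeddedEntity[], edges: KgEdge[], k: number) {
  const seeds = entities
    .map(e => ({ name: e.name, score: cosineSimilarity(query, e.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
  return seeds.map(seed => ({
    seed: seed.name,
    outgoing: edges.filter(e => e.from === seed.name),
    incoming: edges.filter(e => e.to === seed.name),
  }));
}

const companies: EmbeddedEntity[] = [
  { name: 'Acme Corp', embedding: [0.9, 0.1, 0] },
  { name: 'Globex', embedding: [0, 1, 0] },
  { name: 'Initech', embedding: [0.8, 0, 0.2] },
];
const kgEdges: KgEdge[] = [
  { from: 'Acme Corp', to: 'Stripe', type: 'USES_TECHNOLOGY' },
  { from: 'Jane Smith', to: 'Acme Corp', type: 'WORKS_FOR' },
  { from: 'Globex', to: 'Plaid', type: 'USES_TECHNOLOGY' },
];
const paymentsQuery: Embedding = [1, 0, 0]; // pretend embedding of "payment processing"

const hits = hybridSearch(paymentsQuery, companies, kgEdges, 1);
console.log(hits[0].seed, hits[0].outgoing.map(e => e.to), hits[0].incoming.map(e => e.from));
// Acme Corp [ 'Stripe' ] [ 'Jane Smith' ]
```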

How often should I update the graph?

Incrementally update on new data (emails, documents). Run full reprocessing monthly to catch missed entities and resolve duplicates.

How do I visualize knowledge graphs?

Use Neo4j Browser (if using Neo4j) or libraries like vis.js, D3.js, or Cytoscape.js for custom visualizations. Show top-N entities with highest relationship counts.
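Selecting the top-N entities by relationship count is a small degree calculation. A sketch (the function name is hypothetical) whose output can filter the nodes and edges handed to vis.js, D3.js, or Cytoscape.js:

```typescript
interface DegreeEdge { from: string; to: string }

// Count each edge once for both endpoints, then keep the N most-connected entities.
// Ties are broken alphabetically so the output is stable.
function topByDegree(edges: DegreeEdge[], n: number): Array<[string, number]> {
  const degree = new Map<string, number>();
  for (const { from, to } of edges) {
    degree.set(from, (degree.get(from) ?? 0) + 1);
    degree.set(to, (degree.get(to) ?? 0) + 1);
  }
  return [...degree.entries()]
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
    .slice(0, n);
}

const topEntities = topByDegree(
  [
    { from: 'Jane Smith', to: 'Acme Corp' },
    { from: 'Acme Corp', to: 'Stripe' },
    { from: 'Acme Corp', to: 'fintech' },
    { from: 'Globex', to: 'Stripe' },
  ],
  2,
);
console.log(topEntities); // Acme Corp with 3 edges, then Stripe with 2
```

If symmetric relationships are stored as two mirrored rows, each partnership is counted twice here, so collapse mirrored pairs first if that matters for the ranking.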

Summary and next steps

Knowledge graphs complement vector search by enabling relational and multi-hop queries. Extract entities with LLMs using structured outputs, deduplicate and validate against ontologies, store in PostgreSQL or Neo4j, and query with SQL CTEs or Cypher depending on complexity.

Next steps:

Identify relational queries your agents need to answer.
Implement entity extraction with GPT-4 structured outputs.
Set up PostgreSQL entity/relationship schema with indexes.
Build graph traversal queries for your top 3 use cases.
Expose graph queries as agent tools with clear descriptions.


