Text to Knowledge Graph
See how easy it is to implement a method to convert natural language text to nodes and edges in a knowledge graph.
All the cool kids are converting natural language text to knowledge graphs, so why not me!? 😊
In this article I show how a “1-liner” with Concerto Graph can create nodes and edges in a knowledge graph from text, whilst ensuring that the nodes and edges conform to a Concerto data model (schema).
I’ve extended Concerto Graph to add a new class, TextToGraph, with a new method, mergeText, which makes converting a block of text to nodes and edges in a knowledge graph very simple.
Pass a block of text to the mergeText method and it extracts the nodes and edges from the text and adds them to the knowledge graph, ensuring that their structure conforms to the Concerto data model associated with the graph model.
const text = `Kingdom of the Planet of the Apes
Maze Runner director Wes Ball returns to the Planet of the Apes franchise, with a story set 300 years after the events of
War for the Planet of the Apes. Pitched by Ball as Apocalypto but with apes and focusing more on action and adventure,
the story follows a new protagonist Noa (Owen Teague) as he tries to steer the apes away from the totalitarian future
they are headed towards. Freya Allan, Kevin Durand, Dichen Lachman, and William H. Macy star.`;
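// graphModel and logger are assumed to have been created earlier in the program
const textToGraph = new TextToGraph(graphModel, {logger, maxContextSize: 64000});
// Extract nodes and edges from the text and merge them into the graph
const results = await textToGraph.mergeText(text);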
Figure: Graph nodes and edges created from the Kingdom of the Planet of the Apes text snippet
The debug output should be similar to that below. In this case a single Movie node and six Person nodes are created: one for the director and five for the actors. Relationships (directed and actedIn) are then created to link each Person to the Movie.
created node Movie with identifier Kingdom of the Planet of the Apes []
created node Person with identifier Wes Ball []
created node Person with identifier Owen Teague []
created node Person with identifier Freya Allan []
created node Person with identifier Kevin Durand []
created node Person with identifier Dichen Lachman []
created node Person with identifier William H. Macy []
created relationship directed between source node Wes Ball and target node Kingdom of the Planet of the Apes []
created relationship actedIn between source node Owen Teague and target node Kingdom of the Planet of the Apes []
created relationship actedIn between source node Freya Allan and target node Kingdom of the Planet of the Apes []
created relationship actedIn between source node Kevin Durand and target node Kingdom of the Planet of the Apes []
created relationship actedIn between source node Dichen Lachman and target node Kingdom of the Planet of the Apes []
created relationship actedIn between source node William H. Macy and target node Kingdom of the Planet of the Apes []
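To make the idea behind the log above more concrete, here is a minimal in-memory sketch of merging nodes and relationships while checking them against a schema. It is not the Concerto Graph implementation: the schema object, the InMemoryGraph class and its mergeNode and mergeRelationship methods are all hypothetical names invented for illustration, and it assumes that "merge" means "create the node if it does not already exist".

```javascript
// Hypothetical schema (not a Concerto model): allowed node types and,
// for each relationship, the node types it may connect.
const schema = {
  nodeTypes: new Set(['Movie', 'Person']),
  relationships: new Map([
    ['directed', { source: 'Person', target: 'Movie' }],
    ['actedIn', { source: 'Person', target: 'Movie' }],
  ]),
};

// Hypothetical in-memory graph that rejects anything the schema does not allow.
class InMemoryGraph {
  constructor(schema) {
    this.schema = schema;
    this.nodes = new Map(); // key: "Type:identifier"
    this.edges = new Set(); // key: "source-[name]->target"
  }

  // Create the node only if it is not already present.
  mergeNode(type, identifier) {
    if (!this.schema.nodeTypes.has(type)) {
      throw new Error(`Unknown node type: ${type}`);
    }
    const key = `${type}:${identifier}`;
    if (!this.nodes.has(key)) {
      this.nodes.set(key, { type, identifier });
    }
    return this.nodes.get(key);
  }

  // Create the relationship only if the schema defines it and both endpoints exist.
  mergeRelationship(name, sourceId, targetId) {
    const rule = this.schema.relationships.get(name);
    if (!rule) {
      throw new Error(`Unknown relationship: ${name}`);
    }
    const sourceKey = `${rule.source}:${sourceId}`;
    const targetKey = `${rule.target}:${targetId}`;
    if (!this.nodes.has(sourceKey) || !this.nodes.has(targetKey)) {
      throw new Error(`${name} requires existing ${rule.source} and ${rule.target} nodes`);
    }
    this.edges.add(`${sourceKey}-[${name}]->${targetKey}`);
  }
}

const graph = new InMemoryGraph(schema);
const movie = 'Kingdom of the Planet of the Apes';
graph.mergeNode('Movie', movie);
for (const name of ['Wes Ball', 'Owen Teague', 'Wes Ball']) {
  graph.mergeNode('Person', name); // the repeated 'Wes Ball' is merged, not duplicated
}
graph.mergeRelationship('directed', 'Wes Ball', movie);
graph.mergeRelationship('actedIn', 'Owen Teague', movie);
console.log(graph.nodes.size, graph.edges.size);
```

Keying nodes by type and identifier is what makes repeated merges idempotent in this sketch; a real graph database would typically enforce the same thing with a uniqueness constraint on the identifier.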
The implementation uses a pretty simple prompt, which is parameterised using the Concerto model.
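As a rough illustration of what parameterising a prompt with a model can look like, the sketch below interpolates a model definition and the input text into a single prompt string. The buildExtractionPrompt helper, the prompt wording and the requested JSON shape are all assumptions made for this example, and the Concerto-style model fragment is illustrative only; none of it is taken from the actual Concerto Graph prompt or demo model.

```javascript
// Illustrative Concerto-style model fragment; not the model used by the demo.
const modelText = `
concept Movie identified by title {
  o String title
}
concept Person identified by name {
  o String name
  --> Movie[] actedIn optional
  --> Movie[] directed optional
}`;

// Hypothetical helper: builds the prompt by inserting the model and the text
// between fixed instructions, so the same template works for any data model.
function buildExtractionPrompt(model, text) {
  return [
    'Extract nodes and relationships from the text below.',
    'Only use the types and relationships defined in this data model:',
    model.trim(),
    'Respond with JSON of the form {"nodes": [...], "relationships": [...]}.',
    'Text:',
    text.trim(),
  ].join('\n\n');
}

const prompt = buildExtractionPrompt(
  modelText,
  'Maze Runner director Wes Ball returns to the Planet of the Apes franchise.'
);
console.log(prompt);
```

Because the model is just a parameter, swapping in a different data model changes what the language model is asked to extract without touching the template.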
The code is committed and the demo code has been updated. I’d be grateful for testing with other data models and any feedback.
Dan Selman has over 25 years of experience in IT, creating software products for BEA Systems, ILOG, IBM, Docusign and more. He was a co-founder and CTO of Clause, Inc., acquired by Docusign in June 2021.