A knowledge graph that was accurate when it was built but has not been updated in six months is worse than no knowledge graph at all: it looks like a reliable source while silently presenting outdated information as current. Incremental knowledge mining, the design of systems that continuously update the graph as source data changes, must be addressed at the architecture stage, not retrofitted after deployment.
The Sources of Change
Changes that require graph updates come from multiple sources: new records entering source databases, revisions to existing records, updates to the ontologies and terminologies that the graph uses as its reference layer, new publications and regulatory documents that add or modify evidence, and formal corrections or retractions of previously extracted assertions. Each of these change types requires a different detection and processing mechanism.
Change Detection Strategies
For structured database sources, change detection is most reliable when source systems emit change events, whether through database triggers, change data capture (CDC) mechanisms, or message queues. Where change events are not available, comparing a hash of each record's key fields against the hash stored during the previous extraction provides a workable alternative. For document sources such as publications and regulatory filings, a combination of date-based filtering and document fingerprinting identifies new and revised documents for re-processing. Ontology version updates require a systematic comparison of the new and previous ontology versions to identify concept additions, deprecations, and relationship changes, followed by a re-mapping process for the affected graph nodes.
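As an illustration of the hash-based fallback and the ontology comparison, the following is a minimal Python sketch, not a reference implementation. The function names (record_fingerprint, detect_changes, diff_ontology), the dictionary layouts, and the choice of SHA-256 are assumptions made for the example. The ontology is reduced to a mapping from concept ID to parent IDs, and a concept missing from the new version is treated as deprecated, which is a simplification, since many ontologies flag deprecated concepts rather than removing them.

```python
import hashlib
import json


def record_fingerprint(record, key_fields):
    """Hash only the fields that matter for extraction, in a stable order."""
    payload = json.dumps(
        {field: record.get(field) for field in key_fields},
        sort_keys=True,
        default=str,  # render dates and other non-JSON types as strings
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def detect_changes(records, previous_state, key_fields):
    """Compare a current snapshot against fingerprints from the previous run.

    records:        dict of record_id -> record dict (hypothetical layout)
    previous_state: dict of record_id -> fingerprint saved by the last run
    Returns (changes, new_state); persist new_state for the next run.
    """
    new_state = {
        rid: record_fingerprint(rec, key_fields) for rid, rec in records.items()
    }
    changes = {
        "added": sorted(rid for rid in new_state if rid not in previous_state),
        "modified": sorted(
            rid
            for rid in new_state
            if rid in previous_state and new_state[rid] != previous_state[rid]
        ),
        "removed": sorted(rid for rid in previous_state if rid not in new_state),
    }
    return changes, new_state


def diff_ontology(old_parents, new_parents):
    """Compare two ontology versions given as concept_id -> iterable of parent IDs.

    Assumption: a concept absent from the new version counts as deprecated.
    """
    old_ids, new_ids = set(old_parents), set(new_parents)
    return {
        "added": sorted(new_ids - old_ids),
        "deprecated": sorted(old_ids - new_ids),
        "reparented": sorted(
            cid
            for cid in old_ids & new_ids
            if set(old_parents[cid]) != set(new_parents[cid])
        ),
    }
```

Hashing only the key fields means that edits to fields the graph never uses do not trigger re-extraction. The concept IDs returned by diff_ontology under "deprecated" and "reparented" are the ones whose graph nodes would need re-mapping.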
Versioning the Knowledge Graph
Incremental updates should be versioned. Every assertion in the knowledge graph should carry timestamps recording when it was created, when it was last confirmed, and, for deprecated assertions, when it was invalidated. These timestamps allow downstream applications to query the state of the knowledge graph as of a specific point in time, which is essential for regulatory applications where the information available at the time of a decision must be reproducible. Graph versioning also provides an audit trail that helps satisfy the traceability requirements of pharmacovigilance and clinical trial oversight processes.
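The three timestamps and the point-in-time query can be sketched in a few lines of Python. This is an in-memory illustration under stated assumptions, not a storage design: the Assertion and VersionedGraph classes, their field names, and the subject/predicate/object triple shape are all hypothetical. The key property is that invalidation never deletes an assertion; it only stamps it, so earlier states remain queryable.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Assertion:
    subject: str
    predicate: str
    obj: str
    created_at: datetime
    last_confirmed_at: datetime
    invalidated_at: Optional[datetime] = None  # None while the assertion is live

    def is_valid_at(self, when):
        """True if the assertion existed and was not yet invalidated at `when`."""
        return self.created_at <= when and (
            self.invalidated_at is None or when < self.invalidated_at
        )


class VersionedGraph:
    """Append-only store: assertions are stamped as invalid, never deleted."""

    def __init__(self):
        self.assertions = []

    def _live(self, subject, predicate, obj):
        for a in self.assertions:
            if (a.subject, a.predicate, a.obj) == (subject, predicate, obj) \
                    and a.invalidated_at is None:
                return a
        return None

    def assert_fact(self, subject, predicate, obj, when):
        """Create the assertion, or refresh last_confirmed_at if it is live."""
        existing = self._live(subject, predicate, obj)
        if existing is not None:
            existing.last_confirmed_at = when
            return existing
        a = Assertion(subject, predicate, obj, created_at=when, last_confirmed_at=when)
        self.assertions.append(a)
        return a

    def invalidate(self, subject, predicate, obj, when):
        """Mark a live assertion as invalid from `when` onward."""
        existing = self._live(subject, predicate, obj)
        if existing is None:
            return False
        existing.invalidated_at = when
        return True

    def as_of(self, when):
        """Return the assertions that were valid at the given point in time."""
        return [a for a in self.assertions if a.is_valid_at(when)]
```

Calling as_of with the timestamp of a past decision returns the assertions that were live at that moment, including ones invalidated since. A production system would typically also record which source and extraction run produced each timestamp, which this sketch omits.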