A useful biomarker must satisfy three conditions: it must be technically measurable with sufficient precision and reproducibility; it must be biologically linked to the disease process or treatment mechanism it is proposed to reflect; and it must be clinically validated — demonstrated to predict the relevant outcome in appropriate patient populations. Finding candidates that satisfy all three conditions requires integrating evidence from molecular biology, disease pathology, and clinical medicine — precisely the kind of cross-domain knowledge synthesis that knowledge graphs are designed to support.
Structuring the Biomarker Evidence Space
A biomarker knowledge graph organises the evidence space around four key node types: molecular entities (genes, proteins, metabolites, and their measured features, such as expression levels, variant status, and phosphorylation state); biological processes (pathways, cellular mechanisms, disease-relevant biological functions); disease concepts (conditions, subtypes, stages, and the ontological relationships between them); and clinical outcomes (endpoints, survival measures, response criteria). The edges connecting nodes of these types represent the evidence relationships: this protein is upregulated in this disease stage; this gene variant is associated with this outcome in this patient population; this pathway is activated by this treatment mechanism.
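The four node types and the evidence-bearing edges described above can be sketched as a small in-memory data structure. This is a minimal illustration, not a production schema: the class names, relation names, identifiers, and the `evidence` field are all hypothetical choices made for the example.

```python
from dataclasses import dataclass, field
from enum import Enum


class NodeType(Enum):
    """The four node types named in the text."""
    MOLECULAR_ENTITY = "molecular_entity"
    BIOLOGICAL_PROCESS = "biological_process"
    DISEASE_CONCEPT = "disease_concept"
    CLINICAL_OUTCOME = "clinical_outcome"


@dataclass(frozen=True)
class Node:
    node_id: str
    node_type: NodeType
    label: str


@dataclass(frozen=True)
class Edge:
    source: str    # node_id of the subject
    relation: str  # hypothetical relation name, e.g. "upregulated_in"
    target: str    # node_id of the object
    evidence: str  # free-text provenance placeholder (assumption)


@dataclass
class BiomarkerGraph:
    nodes: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)

    def add_node(self, node: Node) -> None:
        self.nodes[node.node_id] = node

    def add_edge(self, edge: Edge) -> None:
        # Reject edges whose endpoints are not registered nodes.
        for endpoint in (edge.source, edge.target):
            if endpoint not in self.nodes:
                raise KeyError(f"unknown node: {endpoint}")
        self.edges.append(edge)

    def neighbours(self, node_id: str, relation: str) -> list:
        """Targets reachable from node_id over edges with the given relation."""
        return [e.target for e in self.edges
                if e.source == node_id and e.relation == relation]


# Placeholder content only; none of these identifiers refer to real entities.
graph = BiomarkerGraph()
graph.add_node(Node("protein:P1", NodeType.MOLECULAR_ENTITY, "Protein P1"))
graph.add_node(Node("pathway:PW1", NodeType.BIOLOGICAL_PROCESS, "Pathway PW1"))
graph.add_node(Node("disease:X:stage2", NodeType.DISEASE_CONCEPT, "Disease X, stage 2"))
graph.add_node(Node("outcome:Y", NodeType.CLINICAL_OUTCOME, "Outcome Y"))
graph.add_edge(Edge("protein:P1", "upregulated_in", "disease:X:stage2", "proteomics study (placeholder)"))
graph.add_edge(Edge("protein:P1", "member_of", "pathway:PW1", "pathway database (placeholder)"))
graph.add_edge(Edge("protein:P1", "variant_associated_with", "outcome:Y", "association study (placeholder)"))
```

Keeping provenance on the edge rather than on the node is one possible design choice: the same protein can then carry several independent pieces of evidence, each traceable to its source.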
Hypothesis Generation Queries
Given this structure, biomarker discovery queries can be expressed as graph traversals: "identify proteins that are differentially expressed in disease stage X, that are members of pathways known to be dysregulated in disease X, and for which genetic variants are associated with clinical outcome Y in published GWAS." This query draws on evidence from proteomics, pathway, and genetic association databases. Its result is a ranked list of biologically plausible biomarker candidates that would take weeks to assemble manually but can be retrieved in minutes from a well-maintained knowledge graph.
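As a rough illustration, the example query can be written as a three-condition traversal over a list of weighted evidence edges. Everything here is assumed for the sketch: the relation names, the placeholder identifiers, the evidence weights, and in particular the ranking rule (score each candidate by its weakest line of evidence), which is just one simple option among many.

```python
from collections import defaultdict

# Hypothetical evidence edges: (source, relation, target, weight in [0, 1]).
EDGES = [
    ("protein:A", "differentially_expressed_in", "stage:X2", 0.9),
    ("protein:B", "differentially_expressed_in", "stage:X2", 0.7),
    ("protein:C", "differentially_expressed_in", "stage:X2", 0.8),
    ("protein:A", "member_of", "pathway:P1", 1.0),
    ("protein:B", "member_of", "pathway:P1", 1.0),
    ("protein:C", "member_of", "pathway:P2", 1.0),
    ("pathway:P1", "dysregulated_in", "disease:X", 0.8),
    ("protein:A", "variant_associated_with", "outcome:Y", 0.6),
    ("protein:B", "variant_associated_with", "outcome:Y", 0.75),
    ("protein:C", "variant_associated_with", "outcome:Y", 0.9),
]


def build_index(edges):
    """Map (source, relation) to {target: weight} for fast lookups."""
    index = defaultdict(dict)
    for source, relation, target, weight in edges:
        index[(source, relation)][target] = weight
    return index


def find_candidates(edges, stage, disease, outcome):
    """Return (protein, score) pairs satisfying all three conditions, best first."""
    index = build_index(edges)
    proteins = sorted({s for s, r, t, _ in edges
                       if r == "differentially_expressed_in" and t == stage})
    ranked = []
    for protein in proteins:
        # Condition 1: differential expression in the requested stage.
        expression = index[(protein, "differentially_expressed_in")][stage]
        # Condition 2: membership of at least one pathway dysregulated in the disease.
        pathway_scores = [
            index[(pathway, "dysregulated_in")][disease]
            for pathway in index.get((protein, "member_of"), {})
            if disease in index.get((pathway, "dysregulated_in"), {})
        ]
        # Condition 3: a genetic association with the requested outcome.
        association = index.get((protein, "variant_associated_with"), {}).get(outcome)
        if not pathway_scores or association is None:
            continue
        # Assumed ranking rule: a candidate is only as strong as its weakest evidence.
        score = min(expression, max(pathway_scores), association)
        ranked.append((protein, score))
    return sorted(ranked, key=lambda item: (-item[1], item[0]))


candidates = find_candidates(EDGES, "stage:X2", "disease:X", "outcome:Y")
```

With the placeholder data, `protein:C` is excluded because its only pathway is not recorded as dysregulated in the disease, and `protein:B` ranks above `protein:A` because its weakest evidence weight is higher. In a real deployment the same pattern would more likely be expressed in the graph store's own query language.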
From Hypothesis to Validation Plan
The knowledge graph also supports the transition from discovery hypothesis to validation plan. For each candidate biomarker, the graph can retrieve information about existing measurement assays, existing clinical cohorts where the biomarker has been measured, existing clinical trials where it could be assessed as an exploratory endpoint, and regulatory guidance on the evidence required to qualify it as an accepted endpoint. This context transforms a list of molecular hypotheses into actionable development plans — connecting the discovery phase directly to the clinical validation programme.
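The lookup described above can be sketched as a second traversal that collects, for a single candidate, the four kinds of validation context and reports which are still missing. The relation names, field names, and identifiers below are hypothetical placeholders, not references to real assays, cohorts, or trials.

```python
from dataclasses import dataclass, field


@dataclass
class ValidationContext:
    assays: list = field(default_factory=list)
    cohorts: list = field(default_factory=list)
    trials: list = field(default_factory=list)
    guidance: list = field(default_factory=list)


# Assumed mapping from edge relation to the context field it populates.
RELATION_TO_FIELD = {
    "measured_by_assay": "assays",
    "measured_in_cohort": "cohorts",
    "assessable_in_trial": "trials",
    "covered_by_guidance": "guidance",
}

# Placeholder edges: (candidate, relation, target).
CONTEXT_EDGES = [
    ("protein:B", "measured_by_assay", "assay:placeholder-immunoassay"),
    ("protein:B", "measured_in_cohort", "cohort:placeholder-1"),
    ("protein:B", "measured_in_cohort", "cohort:placeholder-2"),
    ("protein:A", "measured_by_assay", "assay:placeholder-mass-spec"),
]


def validation_context(candidate, edges):
    """Collect the validation-relevant neighbours of one candidate biomarker."""
    context = ValidationContext()
    for source, relation, target in edges:
        if source == candidate and relation in RELATION_TO_FIELD:
            getattr(context, RELATION_TO_FIELD[relation]).append(target)
    return context


def missing_evidence(context):
    """Names of context fields that are still empty for this candidate."""
    return [name for name in RELATION_TO_FIELD.values()
            if not getattr(context, name)]


context_b = validation_context("protein:B", CONTEXT_EDGES)
gaps_b = missing_evidence(context_b)
```

For the placeholder candidate `protein:B`, the gaps are `trials` and `guidance`, which is the kind of output that could seed a validation plan: the assay and cohorts already exist, while a trial slot and the regulatory evidence requirements remain to be identified.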