Target identification — selecting the molecular target most likely to yield a safe and effective drug for a specific disease — is one of the highest-stakes and most uncertain decisions in pharmaceutical development. Most drugs that fail in late-stage clinical development fail not because of chemistry or formulation problems, but because the target was not as central to the disease as preclinical evidence suggested. Reducing this uncertainty requires integrating evidence from multiple biological and clinical domains in a way that highlights which targets have the strongest, most consistent support across different evidence types.

The Multi-Evidence Integration Problem

Target identification evidence comes from sources that are heterogeneous in type, quality, and relevance: genetic association studies (GWAS, rare variant analyses, Mendelian randomisation), proteomic studies, pathway analyses, preclinical model data, human tissue expression patterns, and, crucially, evidence from existing drugs that modulate the target. Integrating this evidence manually is slow, and its coverage is limited to the literature and datasets that each reviewing scientist happens to know. A knowledge graph that formally represents all of these evidence types, using a shared ontological vocabulary for genes, proteins, diseases, pathways, and drug mechanisms, enables systematic multi-evidence queries that no individual could perform manually at scale.
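A multi-evidence query of this kind can be sketched in a few lines of Python. This is a minimal illustration rather than a production graph store: the `Evidence` record, the `rank_targets` function, the evidence-type labels, the scores, and the identifiers (`GENE_A`, `DISEASE_X`) are all hypothetical placeholders. It assumes each edge has already been normalised to a shared vocabulary and a common 0 to 1 score.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    """One target-disease edge in a toy knowledge graph (all fields hypothetical)."""
    target: str         # gene or protein identifier
    disease: str        # disease ontology term
    evidence_type: str  # e.g. "gwas", "expression", "known_drug"
    score: float        # strength on an assumed 0..1 scale


def rank_targets(edges, disease):
    """Rank targets by number of distinct evidence types, then by mean score."""
    by_target = defaultdict(dict)
    for e in edges:
        if e.disease != disease:
            continue
        # Keep only the strongest edge per evidence type, so that repeated
        # studies of one type do not masquerade as independent support.
        best = by_target[e.target].get(e.evidence_type, 0.0)
        by_target[e.target][e.evidence_type] = max(best, e.score)

    ranked = []
    for target, types in by_target.items():
        mean_score = sum(types.values()) / len(types)
        ranked.append((target, len(types), round(mean_score, 3)))
    # More evidence types first, then higher mean score, then name for stability.
    ranked.sort(key=lambda r: (-r[1], -r[2], r[0]))
    return ranked


EDGES = [
    Evidence("GENE_A", "DISEASE_X", "gwas", 0.9),
    Evidence("GENE_A", "DISEASE_X", "expression", 0.6),
    Evidence("GENE_A", "DISEASE_X", "known_drug", 0.8),
    Evidence("GENE_B", "DISEASE_X", "preclinical_model", 0.95),
    Evidence("GENE_B", "DISEASE_X", "preclinical_model", 0.7),
    Evidence("GENE_C", "DISEASE_Y", "gwas", 0.9),
]

ranking = rank_targets(EDGES, "DISEASE_X")
for target, n_types, mean_score in ranking:
    print(target, n_types, mean_score)
```

In this toy data, `GENE_A` ranks above `GENE_B` despite a lower mean score, because it is supported by three independent evidence types rather than one. Ranking by breadth of support first is one possible design choice, not the only reasonable one.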

Genetic Evidence as the Anchor

Genetic evidence, particularly human genetic evidence linking target perturbation to disease phenotype, has emerged as the most predictive indicator of clinical success for novel mechanisms. A knowledge graph that integrates GWAS summary statistics, rare variant clinical phenotypes, and expression quantitative trait locus (eQTL) data with a disease ontology can identify targets where the genetic evidence is strong, directional, and disease-specific. Candidates whose target is supported by strong human genetic evidence have roughly twice the clinical success rate of those supported only by preclinical biology, making this evidence layer a critical input to target prioritisation decisions.
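The three criteria named above (strong, directional, disease-specific) can be expressed as simple checks. The sketch below is illustrative only: the `GeneticEvidence` record, the `assess` function, the direction encoding, and the `max_diseases` cut-off are assumptions made for this example. The one conventional value is the genome-wide significance threshold of 5e-8.

```python
from dataclasses import dataclass

GENOME_WIDE_P = 5e-8  # conventional genome-wide significance threshold for GWAS


@dataclass
class GeneticEvidence:
    """Summary of genetic evidence for one target-disease pair (hypothetical schema)."""
    target: str
    gwas_p: float               # smallest GWAS p-value at the locus for the disease
    eqtl_direction: int         # +1 risk allele raises expression, -1 lowers, 0 unknown
    rare_lof_direction: int     # +1 loss of function raises risk, -1 protective, 0 unknown
    n_associated_diseases: int  # number of disease terms linked to the locus


def assess(ev, max_diseases=3):
    """Check whether the evidence is strong, directional, and disease-specific."""
    strong = ev.gwas_p < GENOME_WIDE_P
    # Consistent direction: if the risk allele raises expression (+1), loss of
    # function should be protective (-1), and the reverse for a -1 eQTL.
    directional = (
        ev.eqtl_direction != 0
        and ev.eqtl_direction == -ev.rare_lof_direction
    )
    # Assumed cut-off: loci linked to many diseases count as non-specific.
    specific = ev.n_associated_diseases <= max_diseases

    modulation = None
    if directional:
        # Protective loss of function suggests inhibiting the target.
        modulation = "inhibit" if ev.rare_lof_direction == -1 else "activate"
    return {
        "strong": strong,
        "directional": directional,
        "specific": specific,
        "prioritise": strong and directional and specific,
        "suggested_modulation": modulation,
    }


good = GeneticEvidence("GENE_A", 1e-12, +1, -1, 1)
weak = GeneticEvidence("GENE_B", 1e-5, +1, -1, 1)
unclear = GeneticEvidence("GENE_C", 1e-12, +1, 0, 8)

for ev in (good, weak, unclear):
    print(ev.target, assess(ev))
```

A real pipeline would replace these boolean checks with calibrated scores (for example colocalisation probabilities), but the structure of the query stays the same: each criterion is a separate, inspectable test over graph attributes.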

Safety Evidence Integration

Target identification decisions must also integrate safety evidence: what adverse effects have been associated with existing drugs that modulate the target, what phenotypes are produced by loss-of-function mutations in humans, and what on-target toxicities are predicted by the target's known biology. A knowledge graph that links pharmacological targets to safety ontology concepts — adverse drug reactions, safety-relevant phenotypes, human genetic loss-of-function phenotypes — provides an early warning of target-related liabilities before chemical synthesis begins, enabling programmes to incorporate safety considerations from the earliest stages of target selection.
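As a rough sketch of such a safety lookup, the example below collects safety concepts for a target from the three sources described above and records which source contributed each one. All names and data here (`DRUG_TARGETS`, `DRUG_ADRS`, `LOF_PHENOTYPES`, `PREDICTED_TOXICITIES`, `safety_profile`, and the drug, gene, and phenotype labels) are hypothetical stand-ins for graph edges.

```python
from collections import defaultdict

# Toy graph edges; every identifier below is a placeholder.
DRUG_TARGETS = {"drug_1": {"GENE_A"}, "drug_2": {"GENE_A", "GENE_B"}}
DRUG_ADRS = {"drug_1": {"hepatotoxicity"}, "drug_2": {"qt_prolongation"}}
LOF_PHENOTYPES = {"GENE_B": {"immunodeficiency"}}
PREDICTED_TOXICITIES = {"GENE_A": {"hepatotoxicity"}}


def safety_profile(target):
    """Map each safety concept linked to a target to the sources that report it."""
    sources = defaultdict(set)
    # Adverse reactions of existing drugs that modulate the target.
    for drug, targets in DRUG_TARGETS.items():
        if target in targets:
            for adr in DRUG_ADRS.get(drug, ()):
                sources[adr].add("existing_drug_adr")
    # Phenotypes seen in humans carrying loss-of-function variants.
    for phenotype in LOF_PHENOTYPES.get(target, ()):
        sources[phenotype].add("human_lof_phenotype")
    # Toxicities predicted from the target's known biology.
    for toxicity in PREDICTED_TOXICITIES.get(target, ()):
        sources[toxicity].add("predicted_on_target")
    return {concept: sorted(found) for concept, found in sources.items()}


profile_a = safety_profile("GENE_A")
profile_b = safety_profile("GENE_B")
print(profile_a)
print(profile_b)
```

Concepts reported by more than one source, such as `hepatotoxicity` for `GENE_A` in this toy data, are natural candidates for early attention. An adverse reaction from a multi-target drug such as `drug_2` cannot be attributed to one target on this evidence alone, so it is best read as a flag for follow-up rather than a confirmed on-target liability.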