The debate between fully automated knowledge extraction and manual curation in biomedical NLP rests on a false dichotomy, and treating it as a binary choice wastes resources and delays value delivery. Fully automated systems are fast and scalable but make errors that are dangerous in regulated domains. Fully manual curation is accurate but cannot scale to the volume of biomedical text published and generated every year. The productive question is not which approach to choose, but how to allocate human expert attention where it generates the most value, and how to design automation that handles everything else reliably.
Where Automation Works Well
Automation excels at high-volume, well-defined extraction tasks where the target relation types are linguistically regular and the consequences of individual errors are bounded by the downstream validation process. Extracting drug names from structured drug labels, coding adverse event reports to MedDRA terms from structured CRF fields, and mapping ICD codes to SNOMED CT concepts are all tasks where modern NLP achieves accuracy levels that justify automation with light sampling-based monitoring rather than per-record review.
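To make "light sampling-based monitoring" concrete, here is a minimal sketch in Python. It assumes that a random sample of automated outputs is sent to an expert, who counts the errors in it, and that the batch is escalated when the upper bound of a Wilson score interval on the error rate exceeds a tolerance. The function names, the 95% confidence level, and the 5% tolerance are illustrative assumptions, not values taken from any particular production system.

```python
import math
import random
from typing import Sequence, TypeVar

T = TypeVar("T")


def draw_review_sample(records: Sequence[T], sample_size: int, seed: int = 0) -> list[T]:
    """Pick a simple random sample of automated outputs for expert review."""
    rng = random.Random(seed)  # fixed seed so the audit sample is reproducible
    return rng.sample(list(records), min(sample_size, len(records)))


def wilson_interval(errors: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for the true error rate, given `errors` in `n` reviewed records.

    z = 1.96 corresponds to roughly 95% confidence (an assumed default).
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0 <= errors <= n:
        raise ValueError("errors must be between 0 and n")
    p = errors / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


def needs_escalation(errors: int, n: int, max_error_rate: float = 0.05) -> bool:
    """Escalate when we cannot rule out an error rate above the tolerance.

    max_error_rate is a hypothetical tolerance; each task would set its own.
    """
    _, upper = wilson_interval(errors, n)
    return upper > max_error_rate
```

Comparing the upper bound rather than the observed rate is a deliberately conservative choice: a small sample with few errors does not by itself justify continued unattended automation, so the sample size has to grow before the tolerance can be tightened.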
Where Human Expertise Remains Essential
Human expert attention is irreplaceable for three categories of task. First, novel concept identification: when a new drug, disease subtype, or clinical finding is not yet in any ontology, automated systems will either miss it or map it incorrectly to the nearest existing concept. Expert curators identify these gaps and initiate ontology extension requests. Second, ambiguous relation interpretation: clinical language is rich with hedging, negation, temporal qualification, and implicit context that automated systems handle poorly in edge cases. Third, quality sampling and calibration: periodic expert review of automated outputs is essential for detecting systematic errors before they propagate through the knowledge graph at scale.
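The first two categories can be turned into a routing rule that decides which automated extractions reach a curator. The sketch below is one possible shape for such a rule, under stated assumptions: the `Extraction` fields, the cue phrases, the queue names, and both thresholds are hypothetical, and a real system would replace the naive substring check with a dedicated negation and uncertainty detector.

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative cue phrases only. Substring matching is deliberately naive
# and will over-trigger; it stands in for a proper hedging/negation detector.
AMBIGUITY_CUES = (
    "no evidence of",
    "cannot rule out",
    "suspected",
    "possible",
    "denies",
    "history of",
)


@dataclass(frozen=True)
class Extraction:
    sentence: str               # source sentence the relation was extracted from
    relation: str               # proposed relation label
    confidence: float           # extractor score in [0, 1]
    concept_id: Optional[str]   # None when no ontology concept matched
    mapping_score: float = 1.0  # similarity to the matched concept, in [0, 1]


def route(ex: Extraction, min_confidence: float = 0.9, min_mapping_score: float = 0.8) -> str:
    """Return the name of the queue this extraction should go to."""
    # Category 1: no concept, or only a weak nearest-neighbour match,
    # suggests a term the ontology does not cover yet.
    if ex.concept_id is None or ex.mapping_score < min_mapping_score:
        return "novel_concept_review"
    # Category 2: hedging, negation, or temporal cues, or a low extractor score.
    text = ex.sentence.lower()
    if any(cue in text for cue in AMBIGUITY_CUES) or ex.confidence < min_confidence:
        return "ambiguity_review"
    return "auto_accept"
```

For example, a high-confidence extraction from "Patient denies chest pain after starting the study drug." would still land in `ambiguity_review` because of the negation cue. Items routed to `auto_accept` are not exempt from oversight: they remain the population from which the periodic quality samples in the third category are drawn.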
Designing the Human-in-the-Loop Interface
The efficiency of expert curation depends heavily on the interface through which curators interact with automated outputs. Curators who must review raw NLP outputs against source text in separate windows spend most of their time on navigation rather than judgement. A well-designed curation interface presents the source context, the proposed extraction, the ontology concept that the extraction maps to, and alternative interpretations ranked by confidence, all in a single view with one-click accept, reject, and modify actions. This interface investment typically improves curator throughput by a factor of three to five, making the curation process sustainable at the volumes required for production knowledge graph maintenance.
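The single-view requirement is largely a data-modelling question: everything the curator needs should travel together in one record. The following Python sketch shows one way such a record might look. The class and field names are hypothetical, and the concept identifiers a caller would pass in are placeholders rather than real ontology codes.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Action(Enum):
    ACCEPT = "accept"
    REJECT = "reject"
    MODIFY = "modify"


@dataclass(frozen=True)
class Candidate:
    relation: str      # proposed relation label
    concept_id: str    # ontology concept the extraction maps to
    confidence: float  # model score in [0, 1]


@dataclass
class ReviewItem:
    """Everything a curator needs for one decision, bundled for a single view."""
    source_context: str
    candidates: list[Candidate]
    decision: Optional[Action] = None
    final: Optional[Candidate] = None

    def __post_init__(self) -> None:
        if not self.candidates:
            raise ValueError("a review item needs at least one candidate")
        # Keep candidates ranked by confidence, highest first.
        self.candidates = sorted(self.candidates, key=lambda c: c.confidence, reverse=True)

    @property
    def proposed(self) -> Candidate:
        return self.candidates[0]

    @property
    def alternatives(self) -> list[Candidate]:
        return self.candidates[1:]

    def resolve(self, action: Action, replacement: Optional[Candidate] = None) -> None:
        """Record a one-click decision; MODIFY requires the curator's replacement."""
        if action is Action.MODIFY:
            if replacement is None:
                raise ValueError("MODIFY requires a replacement candidate")
            self.final = replacement
        elif action is Action.ACCEPT:
            self.final = self.proposed
        else:  # REJECT: nothing is written to the knowledge graph
            self.final = None
        self.decision = action
```

Because the alternatives are already attached and ranked, a modify action can usually be a single click on one of them rather than a free-text correction, which is where much of the navigation time would otherwise go.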