The term "semantic search" now covers two technically distinct approaches that are frequently confused: dense vector retrieval using transformer-generated embeddings, and ontology-driven concept expansion using structured knowledge. Both go beyond literal keyword matching, but they have fundamentally different strengths, failure modes, and suitability for regulated pharmaceutical applications, and understanding the difference is essential for making the right architectural choice.
Vector Embeddings: Strengths and Limitations
Dense vector retrieval represents both queries and documents as points in a high-dimensional embedding space, then retrieves the documents whose vectors are closest to the query vector. Its strength is capturing semantic similarity without explicit concept definitions: it can find documents about concepts related to the query even when neither the query term nor any of its ontological synonyms appears in the document. This is particularly useful for exploratory research queries where the user does not know the precise terminology. Its limitation is explainability and control: a vector similarity score cannot be traced to a specific ontological relationship, which makes a result difficult to audit, reproduce, or defend in a regulated context. It also tends to be unstable near decision boundaries: two queries that differ in one word can produce dramatically different result sets.
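As a minimal sketch of the retrieval step described above, the following ranks documents by cosine similarity to a query vector. The three-dimensional vectors and document names are invented for illustration; in practice the vectors would come from an embedding model and have hundreds of dimensions, and the function names here are hypothetical rather than taken from any particular library.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def dense_retrieve(query_vec, doc_vecs, k=2):
    """Return the k documents whose vectors are closest to the query."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Toy, hand-written embeddings (assumption: a real system would compute these).
DOC_VECS = {
    "doc_hepatotoxicity": [0.9, 0.1, 0.0],
    "doc_liver_injury": [0.8, 0.3, 0.1],
    "doc_packaging": [0.0, 0.1, 0.9],
}
QUERY_VEC = [1.0, 0.2, 0.0]

top = dense_retrieve(QUERY_VEC, DOC_VECS)
for doc_id, score in top:
    print(f"{doc_id}: {score:.3f}")
```

Note that the output is only a document identifier and a number. Nothing in the score says which relationship between query and document produced it, which is the auditability gap described above.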
Ontology-Driven Search: Strengths and Limitations
Ontology-driven search expands queries using the explicit relationships defined in a biomedical ontology: synonyms, broader terms, narrower terms, and related concepts. Its strength is determinism and traceability: against a given version of the ontology, the same query always produces the same expansion, and the reason a document was retrieved can be traced to a specific ontological relationship. This is essential for pharmacovigilance signal detection, regulatory submission support, and any application where a missed document has safety or compliance consequences. Its limitation is that it only retrieves documents about concepts that are explicitly present in the ontology; novel concepts, informal terminology, and cross-domain analogies are not captured.
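The expansion and its audit trail can be sketched in a few lines. The ontology fragment below is a hand-written toy, not an excerpt from any real terminology, and the function names and the simple substring matching are assumptions made for illustration only.

```python
# Toy ontology fragment (hypothetical entries, for illustration only).
ONTOLOGY = {
    "hepatotoxicity": {
        "synonym": ["liver toxicity"],
        "narrower": ["drug-induced liver injury"],
        "broader": ["adverse drug reaction"],
    },
}

def expand(term, relations=("synonym", "narrower")):
    """Map each expanded term to the relationship that introduced it."""
    expansion = {term: "exact"}
    entry = ONTOLOGY.get(term, {})
    for relation in relations:
        for related_term in entry.get(relation, []):
            expansion.setdefault(related_term, relation)
    return expansion

def search(term, docs):
    """Return (doc_id, matched_term, relation) for each matching document."""
    expansion = expand(term)
    hits = []
    for doc_id in sorted(docs):
        text = docs[doc_id].lower()
        for expanded_term, relation in expansion.items():
            if expanded_term in text:
                hits.append((doc_id, expanded_term, relation))
                break  # record the first matching term as the audit reason
    return hits

DOCS = {
    "d1": "Case report of drug-induced liver injury after treatment.",
    "d2": "Liver toxicity observed in two patients.",
    "d3": "Elevated transaminases noted at week four.",
}

hits = search("hepatotoxicity", DOCS)
for hit in hits:
    print(hit)
```

Each hit carries the term and relationship that caused it, so the retrieval can be explained after the fact. The document about elevated transaminases is missed because that phrase is not in the toy ontology, which illustrates the coverage limitation.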
The Hybrid Architecture
Production search systems for pharmaceutical document repositories increasingly use a hybrid architecture: ontology-driven concept expansion for precision and traceability on well-defined clinical and pharmacological concepts, supplemented by vector similarity for relevance ranking and for the discovery of conceptually related documents that do not match any expanded query term. For the auditable result set, the two components are applied in sequence: the ontology layer defines the candidate set, and the vector layer ranks that set by estimated relevance to the specific query context. Documents that only the vector layer finds fall outside that candidate set by construction, so discovery has to run as a separate, clearly labeled pass rather than as part of the ranking step. This combination provides both the recall and the auditability that regulated applications require.
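The sequential arrangement, in which the ontology layer selects candidates and the vector layer ranks them, might look like the sketch below. The expanded term list, the two-dimensional vectors, and the names `hybrid_search`, `EXPANDED_TERMS`, and `DOCS` are all hypothetical; a real system would obtain the terms from its ontology service and the vectors from an embedding model.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def hybrid_search(expanded_terms, query_vec, docs):
    """Stage 1: ontology terms define candidates. Stage 2: vectors rank them."""
    candidates = []
    for doc_id, doc in docs.items():
        text = doc["text"].lower()
        matched = [term for term in expanded_terms if term in text]
        if matched:  # only ontology-matched documents become candidates
            candidates.append({
                "doc_id": doc_id,
                "matched_terms": matched,  # audit trail for inclusion
                "score": cosine(query_vec, doc["vec"]),  # used for ordering only
            })
    return sorted(candidates, key=lambda c: c["score"], reverse=True)

# Assumed output of an ontology expansion step for "hepatotoxicity".
EXPANDED_TERMS = ["hepatotoxicity", "liver toxicity", "drug-induced liver injury"]

# Toy documents with invented vectors.
DOCS = {
    "d1": {"text": "Drug-induced liver injury in elderly patients.", "vec": [0.7, 0.6]},
    "d2": {"text": "Liver toxicity signal in post-marketing reports.", "vec": [0.9, 0.2]},
    "d3": {"text": "Elevated transaminases at week four.", "vec": [1.0, 0.1]},
}
QUERY_VEC = [1.0, 0.1]

results = hybrid_search(EXPANDED_TERMS, QUERY_VEC, DOCS)
for result in results:
    print(result["doc_id"], result["matched_terms"], round(result["score"], 3))
```

In this toy data the third document has the vector closest to the query, yet it never appears in the results because no expanded term matches its text. Inclusion is decided by the ontology alone, and the similarity score affects only the order.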