PedRAG: Retrieval-Augmented Generation for Pediatric Medical QA

Overview

Built a retrieval-augmented generation (RAG) system for pediatric medical question answering as a Graduate Research Assistant with Prof. Cornelia Paulik at UC Berkeley. The project extends the MedRAG benchmark to evaluate age-appropriate retrieval and answer generation across four pediatric cohorts.

Target venue: ICML 2026 (Poster).

Technical Approach

Dual-retrieval architecture: Combines dense retrieval (embedding-based semantic similarity) with sparse retrieval (BM25 exact term matching), followed by cross-encoder reranking. Dense retrieval catches conceptually related content; sparse retrieval catches precise medical terminology.
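As a rough illustration of how a sparse and a dense ranking can be combined, the sketch below scores a toy corpus with a small pure-Python BM25 and merges it with a second ranking using reciprocal rank fusion (RRF). The fusion method is an assumption; the summary does not say how the two retrievers are merged. The documents, the `dense_rank` list (standing in for the output of a dense encoder), and all function names are hypothetical, and the cross-encoder reranking stage is left out.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (a simplification for the sketch)."""
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with Okapi BM25."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def ranking(scores):
    """Return document indices sorted from highest to lowest score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse rankings: each document earns 1 / (k + rank) from every list."""
    fused = Counter()
    for ranked in rankings:
        for rank, doc_id in enumerate(ranked, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    return sorted(fused, key=lambda d: fused[d], reverse=True)


# Hypothetical toy corpus, not taken from the project.
docs = [
    "amoxicillin dosing for otitis media in toddlers",
    "ear infection antibiotic treatment in young children",
    "weight-based antibiotic dosing in infants",
    "adolescent sports injury rehabilitation",
]
sparse_rank = ranking(bm25_scores("amoxicillin dosing", docs))
dense_rank = [1, 0, 2, 3]  # assumed output of a dense encoder
fused_rank = reciprocal_rank_fusion([sparse_rank, dense_rank])
print(fused_rank)
```

RRF works on ranks rather than raw scores, so the BM25 and embedding scores never need to be put on a common scale; a weighted score sum would be an equally plausible choice.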

Age-group classification: A PyTorch multi-class classifier assigns queries and documents to four pediatric age groups (0–2, 3–5, 6–12, and 13–18 years) with 94% accuracy. This keeps retrieval age-appropriate: a dosing recommendation for a 2-year-old is not the same as one for a 15-year-old.
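One way the cohort labels could be used at retrieval time is sketched below. It assumes each query and document already carries a cohort label (here derived from an explicit age, standing in for the classifier's prediction) and that retrieval is restricted to documents in the query's cohort. The hard filter, the `Doc` type, and the function names are assumptions made for illustration, not the project's actual code.

```python
from dataclasses import dataclass

# Cohort boundaries in whole years, matching the four groups listed above.
COHORTS = [("0-2", 0, 2), ("3-5", 3, 5), ("6-12", 6, 12), ("13-18", 13, 18)]


def age_to_cohort(age_years: int) -> str:
    """Map an age in whole years to its pediatric cohort label."""
    for label, low, high in COHORTS:
        if low <= age_years <= high:
            return label
    raise ValueError(f"age {age_years} is outside the pediatric range 0-18")


@dataclass
class Doc:
    text: str
    cohort: str  # in the real system this would come from the classifier


def filter_by_cohort(docs, query_cohort):
    """Keep only documents labeled with the same cohort as the query."""
    return [d for d in docs if d.cohort == query_cohort]


# Hypothetical documents for illustration only.
corpus = [
    Doc("dosing guidance for toddlers", age_to_cohort(2)),
    Doc("dosing guidance for adolescents", age_to_cohort(15)),
]
matches = filter_by_cohort(corpus, age_to_cohort(1))
print([d.text for d in matches])
```

A softer variant would keep all documents and boost same-cohort ones during reranking, which avoids empty result sets when a cohort has little coverage.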

Hallucination evaluation: Compared the system against the MedRAG baseline on answer accuracy and hallucination rate, using the PediatricsMQA dataset.
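A comparison of this kind can be reduced to two per-question flags (answer correct, answer contains a hallucination) for each system. The sketch below assumes such flags are available and that changes are reported relative to the baseline; both are assumptions, and the flag values are placeholders rather than the project's data or results.

```python
def rate(flags):
    """Fraction of items for which the flag is True."""
    return sum(flags) / len(flags)


def relative_change(new, baseline):
    """Signed change of `new` relative to `baseline` (0.5 means +50%)."""
    return (new - baseline) / baseline


# Placeholder per-question flags, NOT the project's evaluation data.
baseline_correct = [True, False, False, True]
system_correct = [True, True, False, True]
baseline_hallucinated = [True, True, False, False]
system_hallucinated = [True, False, False, False]

accuracy_gain = relative_change(rate(system_correct), rate(baseline_correct))
hallucination_change = relative_change(
    rate(system_hallucinated), rate(baseline_hallucinated)
)
print(f"accuracy: {accuracy_gain:+.0%}, hallucination: {hallucination_change:+.0%}")
```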

Results

  • 34% improvement in answer accuracy over the MedRAG baseline
  • 42% reduction in hallucination rate relative to the MedRAG baseline
  • 94% age-group classification accuracy