CV
Education
University of California, Berkeley — M.S. Information and Data Science (MIDS), GPA: 4.0. Expected August 2026. Relevant coursework: Applied Machine Learning, Natural Language Processing, Computer Vision.
University of California, Riverside — B.S. Computer Science, GPA: 4.0. June 2025.
Research Experience
Graduate Research Assistant — UC Berkeley (Advisor: Prof. Tanya Roosta) January 2026 – Present
- Co-authored a systems-level survey on hallucination in medical LLMs, synthesizing 50+ papers on structural mechanisms, detection, mitigation, and benchmarking through a risk management lens. Published on MetaArXiv, March 2026.
- Developing novel detection and mitigation methods for LLM hallucination, building on the survey's findings; targeting submission to a top-tier ML venue.
Graduate Research Assistant — UC Berkeley (Advisor: Prof. Cornelia Paulik) August 2025 – Present
- Built a RAG pipeline for pediatric medical QA combining dense encoders and sparse retrieval (BM25) with cross-encoder reranking, improving answer accuracy by 34% and reducing hallucinations by 42% over baseline.
- Developed PyTorch classifiers for age-group segmentation (0–2, 3–5, 6–12, 13–18), achieving 94% accuracy for targeted QA.
Machine Learning Researcher — DASION (Research collaboration with Prof. Weiqing Gu, Harvey Mudd College) September 2021 – Present, Claremont, CA (Remote)
- Developed multimodal depression detection using Wav2Vec 2.0 audio and mT5/RoBERTa text transformers with CTC temporal alignment and multi-instance learning, achieving F1 = 0.94 on DAIC-WOZ.
- Secured $2.5M in NSF Phase I and II funding for clinical ML research. Led team of 8 engineers building production ML infrastructure on AWS, achieving 99.9% uptime and deploying to 5+ institutions.
- Built interpretable CNN models for clinical imaging using PyTorch and SHAP, achieving 93% diagnostic accuracy. Developed BERT-based NLP for clinical documentation, reducing manual processing by 60%.
Machine Learning Research Engineer (Contract) — Angel Technologies September 2023 – February 2024, Brea, CA
- Built a transformer-based NLP system for automated regulatory compliance detection using fine-tuned BERT models, reducing processing time by 40% and error rates by 85%.
- Developed predictive models using time-series forecasting for customer behavior analysis, improving conversion by 30%.
Publications
Rethinking Medical LLM Hallucinations: A System-Level Survey. Matthews, Vankadaru, Roosta, Passban. MetaArXiv, March 2026. Survey arguing that hallucination in medical LLMs is a structural property of probabilistic generation; synthesizes the detection, mitigation, and benchmark literature through a systems and risk management lens.
Multimodal Multi-Instance Learning for Depression Detection (Target: NeurIPS 2026). First multimodal MIL framework for depression detection, combining mT5/RoBERTa text with Wav2Vec 2.0 audio via CTC temporal alignment. Achieves F1 > 0.90 on DAIC-WOZ, surpassing a text-only MIL baseline (F1 = 0.88). Directly addresses interviewer bias via strict prompt exclusion.
PedRAG: Retrieval-Augmented Generation for Pediatric Medical QA (Target: ICML Poster 2026). RAG framework combining dense and sparse retrieval with age-specific classification, achieving a 34% accuracy improvement and a 42% hallucination reduction over baselines.
Professional Experience
Chief Technology Officer — AGMNT February 2024 – Present, San Ramon, CA (Hybrid, Part-time)
- Built a TensorFlow recommendation engine and full-stack platform (Node.js, React, MongoDB) serving 10K+ users and 15+ brands, improving conversion rates by 28%.
- Implemented an A/B testing framework processing 1M+ daily events to drive product decisions.
Machine Learning Intern — Ambassadore Healthcare Inc. May 2023 – August 2023, Artesia, CA
- Engineered a distributed data pipeline using Kafka and MongoDB processing 100GB+ of clinical data daily, with statistical modeling in Scala/Spark; reduced storage footprint by 75% and achieved sub-100ms query latency.
Software Engineer Intern — Royal Majesty Home Care June 2022 – August 2022, Long Beach, CA
- Built an AI scheduling system using decision trees and linear programming, reducing overtime costs by $200K annually.
Technical Skills
Languages: Python, R, C++, SQL, JavaScript
ML/AI: PyTorch, TensorFlow, Hugging Face, SHAP, MLflow, Pandas, NumPy, Scikit-learn
Infrastructure: AWS (SageMaker, Lambda, S3, EC2), Docker, PostgreSQL, MongoDB
Certifications: AWS Certified Machine Learning — Specialty (2025), Linear Algebra with Applications in ML (Harvey Mudd College)
Grants and Funding
NSF Phase I and II Grants — $2.5M for clinical ML research (2021–2026), in collaboration with Prof. Weiqing Gu, Harvey Mudd College.
