A HIPAA-Aware Benchmark and Evaluation Harness for Clinical LLMs to Quantify Hallucination, Bias, and PHI Leakage

Valentina Palama


Authors

Valentina Palama, MSc in Computer Information Systems (Prairie View A&M University), USA

Keywords:

Clinical large language models, HIPAA compliance, hallucination detection, algorithmic bias, protected health information leakage, healthcare AI evaluation

Abstract

The growing use of large language models (LLMs) in clinical practice has raised serious concerns about reliability, fairness, and regulatory compliance. In healthcare settings, model hallucinations may undermine clinical decision-making, algorithmic bias may create health inequities, and unintended disclosure of protected health information (PHI) may violate privacy rules. Although clinical LLM assessment is receiving more attention, current benchmarks focus on overall performance and do not address these safety and compliance risks holistically within a HIPAA-aware framework. This paper proposes a unified benchmark and evaluation harness, developed specifically for clinical LLMs, to systematically quantify hallucination, bias, and PHI leakage. The framework includes clinically grounded tasks, de-identified and synthetic datasets, and automated detection tools aligned with HIPAA privacy categories. By combining several evaluation dimensions into a single, reproducible harness, the benchmark enables comparative evaluation of clinical LLMs on safety, fairness, and privacy measures. The findings show significant variability in model behavior, indicating trade-offs between clinical capability and risk exposure. The work contributes a practical evaluation system to support the responsible deployment, regulation, and monitoring of LLMs in healthcare settings.
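To make the idea of quantifying PHI leakage concrete, the following is a minimal sketch and not the paper's actual tooling, which is not described on this page. It assumes a handful of regular expressions standing in for a few HIPAA Safe Harbor identifier categories (Social Security numbers, telephone numbers, email addresses, and full dates) and reports the fraction of model outputs containing at least one match. The names `PHI_PATTERNS`, `find_phi`, and `leakage_rate` are hypothetical, and a real detector would need far broader coverage than these patterns provide.

```python
import re

# Hypothetical, deliberately narrow patterns for a few HIPAA Safe Harbor
# identifier categories. These are illustrative assumptions, not the
# benchmark's actual detectors.
PHI_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "date": re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"),
}


def find_phi(text):
    """Return a dict mapping each matched category to its matched strings."""
    hits = {}
    for category, pattern in PHI_PATTERNS.items():
        matches = pattern.findall(text)
        if matches:
            hits[category] = matches
    return hits


def leakage_rate(outputs):
    """Fraction of outputs that contain at least one suspected identifier."""
    if not outputs:
        return 0.0
    leaked = sum(1 for text in outputs if find_phi(text))
    return leaked / len(outputs)


# Synthetic model outputs, invented for illustration only.
sample_outputs = [
    "Patient reports mild headache; advise hydration.",
    "Call the patient at 555-123-4567 to follow up.",
    "SSN on file: 123-45-6789, seen on 03/14/2024.",
    "No identifiers here.",
]

print(leakage_rate(sample_outputs))  # 0.5
print(find_phi(sample_outputs[2]))
```

A pattern-based rate like this only flags surface forms; names, addresses, and free-text identifiers would require additional methods, so the number is best read as a lower bound on leakage under these assumptions.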


Published

18-08-2025

How to Cite

Palama, V. (2025). A HIPAA-Aware Benchmark and Evaluation Harness for Clinical LLMs to Quantify Hallucination, Bias, and PHI Leakage. Well Testing Journal, 34(S3), 830–849. Retrieved from https://welltestingjournal.com/index.php/WT/article/view/274


Issue

Vol. 34 No. S3 (2025)

Section

Research Articles

License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

This license requires that re-users give credit to the creator. It allows re-users to distribute, remix, adapt, and build upon the material in any medium or format, for noncommercial purposes only.
