Trust but Verify: Benchmarks for Hallucination, Vulnerability, and Style Drift in AI-Generated Code Reviews
Keywords:
AI code reviews, hallucination, vulnerability, style drift, AI verification, benchmarks, software development, coding standards, AI reliability, system security
Abstract
The growing adoption of AI-based code review in software development calls for a thorough understanding of its shortcomings and risks. This paper addresses three key problems that can jeopardize the quality and security of AI-generated code reviews: hallucination, vulnerability, and style drift. Hallucination refers to instances where the AI produces incorrect or irrelevant recommendations; vulnerability concerns the risk of misuse of, or attacks on, the AI system; and style drift denotes a shift away from the coding standards the AI is expected to follow. The primary objective of this study is to establish clear benchmarks for identifying and verifying these issues, thereby improving the accuracy and reliability of AI-mediated code assessments. The central finding is that, in the absence of proper verification, AI-produced code reviews can introduce significant variance in quality. The paper also offers recommendations for improving the reliability of AI systems so that they meet industry requirements.
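To make the style-drift notion concrete, below is a minimal illustrative sketch in Python (not taken from the paper): it compares an AI-suggested snippet against a hypothetical project baseline using crude style signals such as indent width, line length, and naming convention. The BASELINE values, helper names, and thresholds are all assumptions introduced for illustration.

```python
# Minimal sketch (illustrative only): a toy style-drift check comparing
# an AI-suggested snippet against assumed project conventions.
import re

# Hypothetical project baseline; real benchmarks would derive this
# from the existing codebase rather than hard-coding it.
BASELINE = {"indent": 4, "max_line_length": 88, "naming": "snake_case"}

def measure_style(code: str) -> dict:
    """Extract crude style signals from a code snippet."""
    lines = code.splitlines()
    indents = [len(l) - len(l.lstrip(" ")) for l in lines if l.startswith(" ")]
    names = re.findall(r"def\s+(\w+)", code)
    camel = [n for n in names if re.search(r"[a-z][A-Z]", n)]
    return {
        "indent": min(indents) if indents else BASELINE["indent"],
        "max_line_length": max((len(l) for l in lines), default=0),
        "naming": "camelCase" if camel else "snake_case",
    }

def style_drift(snippet: str) -> list[str]:
    """Return the baseline conventions the snippet drifts from."""
    observed = measure_style(snippet)
    drifts = []
    if observed["indent"] != BASELINE["indent"]:
        drifts.append(f"indent {observed['indent']} != {BASELINE['indent']}")
    if observed["max_line_length"] > BASELINE["max_line_length"]:
        drifts.append("line length exceeds project maximum")
    if observed["naming"] != BASELINE["naming"]:
        drifts.append(f"naming {observed['naming']} != {BASELINE['naming']}")
    return drifts

if __name__ == "__main__":
    suggestion = "def computeTotal(x):\n  return x + 1\n"
    # Flags both the 2-space indent and the camelCase function name.
    print(style_drift(suggestion))
```

A production benchmark would replace these heuristics with a linter or formatter diff against the repository's configuration, but the sketch shows the shape of the verification step: measure the suggestion, compare to the project baseline, and report divergences.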
License
Copyright (c) 2023 Well Testing Journal

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
This license requires that re-users give credit to the creator. It allows re-users to distribute, remix, adapt, and build upon the material in any medium or format, for noncommercial purposes only.