LLMs and prompting for unit test generation: a large-scale evaluation

buir.contributor.author: Koyuncu, Anıl
dc.citation.epage: 2465
dc.citation.spage: 2464
dc.contributor.author: Koyuncu, Anıl
dc.contributor.author: Ouedraogo, Wendkuuni C.
dc.contributor.author: Kabore, Kader
dc.contributor.author: Tian, Haoye
dc.contributor.author: Song, Yewei
dc.contributor.author: Klein, Jacques
dc.contributor.author: Lo, David
dc.contributor.author: Bissyandé, Tegawendé F.
dc.coverage.spatial: Sacramento, California, United States
dc.date.accessioned: 2025-02-21T10:59:20Z
dc.date.available: 2025-02-21T10:59:20Z
dc.date.issued: 2024-11-01
dc.department: Department of Computer Engineering
dc.description: Conference Name: Proceedings - 39th ACM/IEEE International Conference on Automated Software Engineering, ASE 2024
dc.description: Date of Conference: 28 October 2024 - 1 November 2024
dc.description.abstract: Unit testing is essential for identifying bugs, yet it is often neglected due to time constraints. Automated test-generation tools exist, but the tests they produce typically lack readability and require developer intervention. Large Language Models (LLMs) such as GPT and Mistral show potential for test generation, but their effectiveness remains unclear. This study evaluates four LLMs and five prompt engineering techniques, analyzing 216,300 generated tests for 690 Java classes drawn from diverse datasets. We assess correctness, readability, coverage, and bug detection, comparing LLM-generated tests against EvoSuite. While LLMs show promise, improvements in correctness are needed. The study highlights both the strengths and limitations of LLMs, offering insights for future research. © 2024 Copyright is held by the owner/author(s). Publication rights licensed to ACM.
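
For illustration, the sketch below shows one plausible shape of a zero-shot prompt for LLM-based unit test generation of the kind the paper evaluates. The paper's exact prompt templates and tooling are not reproduced in this record, so the wording, class name, and helper method here are assumptions, not the authors' setup.

// Illustrative sketch only: the paper's actual prompt templates are not
// part of this record. This models the general shape of a zero-shot
// prompt that asks an LLM for a JUnit 5 test class for a given Java class.
public class UnitTestPromptSketch {

    // Hypothetical helper: wraps the source of the class under test in a
    // plain natural-language instruction.
    static String zeroShotPrompt(String classUnderTestSource) {
        return "You are an expert Java developer.\n"
             + "Write a JUnit 5 test class covering the public methods of the\n"
             + "following class. Return only compilable Java code.\n\n"
             + classUnderTestSource;
    }

    public static void main(String[] args) {
        String classUnderTest =
              "public class Calculator {\n"
            + "    public int add(int a, int b) { return a + b; }\n"
            + "}";
        // In an evaluation like the paper's, this prompt would be sent to a
        // model (e.g., GPT or Mistral) and the returned test compiled, run,
        // and scored for correctness, readability, coverage, and bug detection.
        System.out.println(zeroShotPrompt(classUnderTest));
    }
}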
dc.identifier.doi: 10.1145/3691620.3695330
dc.identifier.isbn: 979-840071248-7
dc.identifier.uri: https://hdl.handle.net/11693/116554
dc.language.iso: English
dc.publisher: Association for Computing Machinery, Inc.
dc.relation.isversionof: https://dx.doi.org/10.1145/3691620.3695330
dc.rights: CC BY 4.0 DEED (Attribution 4.0 International)
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/
dc.source.title: Proceedings - 39th ACM/IEEE International Conference on Automated Software Engineering, ASE 2024
dc.subject: Automatic test generation
dc.subject: Empirical evaluation
dc.subject: Large language models
dc.subject: Prompt engineering
dc.subject: Unit tests
dc.title: LLMs and prompting for unit test generation: a large-scale evaluation
dc.type: Conference Paper

Files

Original bundle

Name: LLMs_and_Prompting_for_Unit_Test_Generation_A_Large_Scale_Evaluation.pdf
Size: 759.72 KB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.71 KB
Description: Item-specific license agreed upon at submission