LLMs and prompting for unit test generation: a large-scale evaluation

buir.contributor.author: Koyuncu, Anıl
dc.citation.epage: 2465
dc.citation.spage: 2464
dc.contributor.author: Koyuncu, Anıl
dc.contributor.author: Ouedraogo, Wendkuuni C.
dc.contributor.author: Kabore, Kader
dc.contributor.author: Tian, Haoye
dc.contributor.author: Song, Yewei
dc.contributor.author: Klein, Jacques
dc.contributor.author: Lo, David
dc.contributor.author: Bissyandé, Tegawendé F.
dc.coverage.spatial: Sacramento, California, United States
dc.date.accessioned: 2025-02-21T10:59:20Z
dc.date.available: 2025-02-21T10:59:20Z
dc.date.issued: 2024-11-01
dc.department: Department of Computer Engineering
dc.description: Conference Name: Proceedings - 39th ACM/IEEE International Conference on Automated Software Engineering, ASE 2024
dc.description: Date of Conference: 28 October 2024 - 1 November 2024
dc.description.abstract: Unit testing is essential for identifying bugs, yet it is often neglected due to time constraints. Automated test-generation tools exist, but the tests they produce typically lack readability and require developer intervention. Large Language Models (LLMs) such as GPT and Mistral show potential for test generation, but their effectiveness remains unclear. This study evaluates four LLMs and five prompt engineering techniques, analyzing 216,300 generated tests for 690 Java classes drawn from diverse datasets. We assess correctness, readability, coverage, and bug detection, comparing LLM-generated tests against EvoSuite. While LLMs show promise, improvements in correctness are needed. The study highlights both the strengths and limitations of LLMs, offering insights for future research. © 2024 Copyright is held by the owner/author(s). Publication rights licensed to ACM.
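
For illustration, the sketch below shows one plausible shape of a zero-shot prompt for LLM-based unit test generation of the kind the paper evaluates. The paper's exact prompt templates and tooling are not reproduced in this record, so the wording, class name, and helper method here are assumptions, not the authors' setup.

// Illustrative sketch only: the paper's actual prompt templates are not
// part of this record. This models the general shape of a zero-shot
// prompt that asks an LLM for a JUnit 5 test class for a given Java class.
public class UnitTestPromptSketch {

    // Hypothetical helper: wraps the source of the class under test in a
    // plain natural-language instruction.
    static String zeroShotPrompt(String classUnderTestSource) {
        return "You are an expert Java developer.\n"
             + "Write a JUnit 5 test class covering the public methods of the\n"
             + "following class. Return only compilable Java code.\n\n"
             + classUnderTestSource;
    }

    public static void main(String[] args) {
        String classUnderTest =
              "public class Calculator {\n"
            + "    public int add(int a, int b) { return a + b; }\n"
            + "}";
        // In an evaluation like the paper's, this prompt would be sent to a
        // model (e.g., GPT or Mistral) and the returned test compiled, run,
        // and scored for correctness, readability, coverage, and bug detection.
        System.out.println(zeroShotPrompt(classUnderTest));
    }
}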
dc.identifier.doi: 10.1145/3691620.3695330
dc.identifier.isbn: 979-840071248-7
dc.identifier.uri: https://hdl.handle.net/11693/116554
dc.language.iso: English
dc.publisher: Association for Computing Machinery, Inc.
dc.relation.isversionof: https://dx.doi.org/10.1145/3691620.3695330
dc.rights: CC BY 4.0 DEED (Attribution 4.0 International)
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/
dc.source.title: Proceedings - 39th ACM/IEEE International Conference on Automated Software Engineering, ASE 2024
dc.subject: Automatic test generation
dc.subject: Empirical evaluation
dc.subject: Large language models
dc.subject: Prompt engineering
dc.subject: Unit tests
dc.title: LLMs and prompting for unit test generation: a large-scale evaluation
dc.type: Conference Paper

Files

Original bundle

Name: LLMs_and_Prompting_for_Unit_Test_Generation_A_Large_Scale_Evaluation.pdf
Size: 759.72 KB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.71 KB
Description: Item-specific license agreed upon at submission