This is the official repository for the FSE 2026 paper: **Generalizing Test Cases for Comprehensive Test Scenario Coverage**
For Python:
- python 3.12.11
- pytorch 2.7.1 (with CUDA support)
- transformers 4.52.4
- openai 2.8.1
- tree-sitter 0.20.1
- tree-sitter-java 0.23.5
- tqdm 4.67.1
- numpy 2.3.0
- beautifulsoup4 4.13.4
- A CUDA-capable GPU (required for knowledge base construction and retrieval)
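To check that the pinned Python dependencies are installed at the expected versions, a small helper like the following can be run; this is a minimal sketch (the version pins are from the list above; the helper itself is not part of the repository, and the distribution names `torch`, `tree-sitter`, and `tree-sitter-java` are assumed):

```python
import importlib.metadata as md

# Version pins from the requirements list above. Distribution names are
# assumptions: PyTorch installs as "torch", the tree-sitter bindings as
# "tree-sitter" and "tree-sitter-java".
PINNED = {
    "torch": "2.7.1",
    "transformers": "4.52.4",
    "openai": "2.8.1",
    "tree-sitter": "0.20.1",
    "tree-sitter-java": "0.23.5",
    "tqdm": "4.67.1",
    "numpy": "2.3.0",
    "beautifulsoup4": "4.13.4",
}

def version_mismatches(pins):
    """Map each package whose installed version differs from its pin
    to the installed version (None if the package is absent)."""
    result = {}
    for pkg, want in pins.items():
        try:
            have = md.version(pkg)
        except md.PackageNotFoundError:
            have = None
        if have != want:
            result[pkg] = have
    return result

if __name__ == "__main__":
    print(version_mismatches(PINNED) or "all pins satisfied")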
For Java:
- CodeQL CLI 2.18.3 (Download Link)
- Apache Maven 3.6.3
- JDK 1.8.0_311 (for building and running the subject Java projects via Maven)
- JDK 17.0.12 (for running JDTLS; must be accessible as `java17` on PATH)
- JDTLS 1.9.0
- PIT (Pitest) 1.17.0 (downloaded automatically by Maven; pitest-junit5-plugin 1.2.1 for JUnit 5 projects)
**Note:** Make sure JDK 1.8.0_311, JDK 17.0.12, and CodeQL are installed correctly:

```shell
$ java -version
java version "1.8.0_311"
Java(TM) SE Runtime Environment (build 1.8.0_311-b11)

$ java17 --version
java 17.0.12 2024-07-16 LTS
Java(TM) SE Runtime Environment (build 17.0.12+8-LTS-286)

$ mvn --version
Apache Maven 3.6.3
Maven home: /usr/share/maven
Java version: 1.8.0_311, vendor: Oracle Corporation, runtime: ...

$ codeql --version
CodeQL command-line toolchain release 2.18.3.
```
Also, run `mvn clean test` in each repository under `data/repos` to ensure that every repo compiles and passes its tests.
In `configs.py`, set the API key (`self.openai_key`) and base URL (`self.openai_base_url`) to real values for the LLM you intend to use.
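For reference, the fields to fill in might look like the following. This is a minimal sketch: only the attribute names `self.openai_key` and `self.openai_base_url` come from this README; the class name, placeholder values, and surrounding structure are hypothetical.

```python
# Hypothetical sketch of the relevant part of configs.py.
# Only the attribute names openai_key and openai_base_url are from this
# README; the class name and placeholder values are illustrative.
class Config:
    def __init__(self):
        # Credentials for the OpenAI-compatible endpoint of your provider.
        self.openai_key = "sk-..."                          # replace with a real API key
        self.openai_base_url = "https://api.openai.com/v1"  # or your provider's base URL
```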
- Download the dataset into the root folder.
- Run `tar -xzvf data.tar.gz` to get `data/`, including the `collected_coverages/`, `dataset/`, `prompt_optimization/`, and `repos/` folders.
- Generate exams

  ```shell
  cd ./offline_collection
  python -u collect_examination.py --project_name spark_mini --llm_name gpt-o4-mini --stage exam
  ```
- Generate answers with facts

  ```shell
  python -u collect_examination.py --project_name spark_mini --llm_name gpt-o4-mini --stage answer
  ```
- Collect project knowledge offline

  **Note:** `Salesforce/codet5p-110m-embedding` (~440 MB) is downloaded automatically from HuggingFace on first run.

  ```shell
  cd ../knowledge_base/
  python -u constructor.py --project_name spark_mini
  ```
- Generate test scenario templates

  ```shell
  cd ../offline_collection
  python -u generalize_scenario.py --project_name spark_mini --llm_name gpt-o4-mini --stage template
  ```
- Generate test scenario instances

  ```shell
  python -u generalize_scenario.py --project_name spark_mini --llm_name gpt-o4-mini --stage instance
  ```
- Generate tests for the scenario instances

  ```shell
  cd ..
  python -u main.py --project_name spark_mini --llm_name gpt-o4-mini --junit_version 4
  ```
- Auto-tune a prompt from scratch

  ```shell
  cd prompt_optimization/
  python -u optimizer.py --strategy auto-tuning --batch_size 5
  ```
All commands below are run from the `mutation_test/` directory.
- Prepare the ground-truth test case file

  ```shell
  cd ./mutation_test
  python -u prepare_ground_truth.py --project_name spark_mini
  ```
- Generate mutation reports for the ground-truth test cases

  ```shell
  python -u generate_pit_reports.py --project_name spark_mini --tool_name ground-truth
  ```
- Generate mutation reports for the generated test cases. Use the folder name under `data/generated_test_cases/` as `--tool_name` and, for our approach (where the path includes an `llm_name` subfolder), also pass `--llm_name`.

  ```shell
  python -u generate_pit_reports.py --project_name spark_mini --tool_name gpt-o4-mini --llm_name gpt-o4-mini
  ```

  Pitest CSV reports will be saved to `./data/mutation_data/`.
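For context on what those CSV reports contain: each row of a Pitest `mutations.csv` describes one mutant, including its status (e.g. `KILLED`, `SURVIVED`, `NO_COVERAGE`). A minimal sketch of deriving a mutation score from such a file is shown below; it assumes Pitest's classic headerless 7-column CSV layout, and both the helper and the choice of "detected" statuses are illustrative, not the scoring used by `calculate_scores.py`:

```python
import csv
import io

# Statuses treated as "detected" here (an assumption; adjust to match
# whichever metric the evaluation actually uses).
DETECTED = {"KILLED", "TIMED_OUT", "MEMORY_ERROR", "RUN_ERROR"}

def mutation_score(csv_text):
    """Fraction of mutants detected, from a headerless Pitest mutations.csv.

    Assumes the classic 7-column layout:
    fileName, mutatedClass, mutator, method, lineNumber, status, killingTest
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return 0.0
    detected = sum(1 for row in rows if row[5] in DETECTED)
    return detected / len(rows)
```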
- Calculate mutation-based scenario coverage scores

  ```shell
  python -u calculate_scores.py --project_name spark_mini --tool_name gpt-o4-mini --llm_name gpt-o4-mini
  ```

  Score tables will be generated in `./data/collected_mutation_scores/`.
Run from the project root:

```shell
python -u llm_judger.py --project_name spark_mini --approach ours-gpt-o4-mini
```

Supported `--approach` values: `ours-gpt-o4-mini`, `ours-deepseek-v3.1`, `evosuite`, `chattester`, `vanilla-chattester`, `ablate-rule`, `ablate-fact`.

Results will be saved to `./data/llm_judge_results/<approach>/<project_name>.json`.
```bibtex
@article{qi2026generalizing,
  title   = {Generalizing Test Cases for Comprehensive Test Scenario Coverage},
  author  = {Qi, Binhang and Lin, Yun and Weng, Xinyi and Liu, Chenyan and Sun, Hailong and Fraser, Gordon and Dong, Jin Song},
  journal = {Proceedings of the ACM on Software Engineering},
  volume  = {3},
  number  = {FSE},
  year    = {2026},
  doi     = {10.1145/3808216}
}
```