This is the official repository for the FSE 2026 paper: **Generalizing Test Cases for Comprehensive Test Scenario Coverage**
For Python:
- python 3.12.11
- pytorch 2.7.1 (with CUDA support)
- transformers 4.52.4
- openai 2.8.1
- tree-sitter 0.20.1
- tree-sitter-java 0.23.5
- tqdm 4.67.1
- numpy 2.3.0
- beautifulsoup4 4.13.4
- A CUDA-capable GPU (required for knowledge base construction and retrieval)
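To check that the pinned Python dependencies are installed at the expected versions, a small helper like the following can be run; this is a minimal sketch (the version pins are from the list above; the helper itself is not part of the repository, and the distribution names `torch`, `tree-sitter`, and `tree-sitter-java` are assumed):

```python
import importlib.metadata as md

# Version pins from the requirements list above. Distribution names are
# assumptions: PyTorch installs as "torch", the tree-sitter bindings as
# "tree-sitter" and "tree-sitter-java".
PINNED = {
    "torch": "2.7.1",
    "transformers": "4.52.4",
    "openai": "2.8.1",
    "tree-sitter": "0.20.1",
    "tree-sitter-java": "0.23.5",
    "tqdm": "4.67.1",
    "numpy": "2.3.0",
    "beautifulsoup4": "4.13.4",
}

def version_mismatches(pins):
    """Map each package whose installed version differs from its pin
    to the installed version (None if the package is absent)."""
    result = {}
    for pkg, want in pins.items():
        try:
            have = md.version(pkg)
        except md.PackageNotFoundError:
            have = None
        if have != want:
            result[pkg] = have
    return result

if __name__ == "__main__":
    print(version_mismatches(PINNED) or "all pins satisfied")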
For Java:
- CodeQL CLI 2.18.3 (Download Link)
- Apache Maven 3.6.3
- JDK 1.8.0_311 (for building and running the subject Java projects via Maven)
- JDK 17.0.12 (for running JDTLS; must be accessible as `java17` on PATH)
- JDTLS 1.9.0
- PIT (Pitest) 1.17.0 (downloaded automatically by Maven; pitest-junit5-plugin 1.2.1 for JUnit 5 projects)
**Note:** Make sure JDK 1.8.0_311, JDK 17.0.12, and CodeQL are installed correctly:

```shell
$ java -version
java version "1.8.0_311"
Java(TM) SE Runtime Environment (build 1.8.0_311-b11)

$ java17 --version
java 17.0.12 2024-07-16 LTS
Java(TM) SE Runtime Environment (build 17.0.12+8-LTS-286)

$ mvn --version
Apache Maven 3.6.3
Maven home: /usr/share/maven
Java version: 1.8.0_311, vendor: Oracle Corporation, runtime: ...

$ codeql --version
CodeQL command-line toolchain release 2.18.3.
```
Also, run `mvn clean test` in each repository under `data/repos` to ensure that every repo compiles and passes its tests.
In `configs.py`, set the API key (`self.openai_key`) and base URL (`self.openai_base_url`) to real values for the LLM you intend to use.
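For reference, the fields to fill in might look like the following. This is a minimal sketch: only the attribute names `self.openai_key` and `self.openai_base_url` come from this README; the class name, placeholder values, and surrounding structure are hypothetical.

```python
# Hypothetical sketch of the relevant part of configs.py.
# Only the attribute names openai_key and openai_base_url are from this
# README; the class name and placeholder values are illustrative.
class Config:
    def __init__(self):
        # Credentials for the OpenAI-compatible endpoint of your provider.
        self.openai_key = "sk-..."                          # replace with a real API key
        self.openai_base_url = "https://api.openai.com/v1"  # or your provider's base URL
```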
- Download the dataset into the root folder.
- Run `tar -xzvf data.tar.gz` to get `data/`, including the `collected_coverages/`, `dataset/`, `prompt_optimization/`, and `repos/` folders.
- Generate exams

  ```shell
  cd ./offline_collection
  python -u collect_examination.py --project_name spark_mini --llm_name gpt-o4-mini --stage exam
  ```
- Generate answers with facts

  ```shell
  python -u collect_examination.py --project_name spark_mini --llm_name gpt-o4-mini --stage answer
  ```
- Collect project knowledge offline

  **Note:** `Salesforce/codet5p-110m-embedding` (~440 MB) is downloaded automatically from HuggingFace on first run.

  ```shell
  cd ../knowledge_base/
  python -u constructor.py --project_name spark_mini
  ```
- Generate test scenario templates

  ```shell
  cd ../offline_collection
  python -u generalize_scenario.py --project_name spark_mini --llm_name gpt-o4-mini --stage template
  ```
- Generate test scenario instances

  ```shell
  python -u generalize_scenario.py --project_name spark_mini --llm_name gpt-o4-mini --stage instance
  ```
- Generate tests for the scenario instances

  ```shell
  cd ..
  python -u main.py --project_name spark_mini --llm_name gpt-o4-mini --junit_version 4
  ```
- Auto-tune a prompt from scratch

  ```shell
  cd prompt_optimization/
  python -u optimizer.py --strategy auto-tuning --batch_size 5
  ```
All commands below are run from the `mutation_test/` directory.
- Prepare the ground-truth test case file

  ```shell
  cd ./mutation_test
  python -u prepare_ground_truth.py --project_name spark_mini
  ```
- Generate mutation reports for the ground-truth test cases

  ```shell
  python -u generate_pit_reports.py --project_name spark_mini --tool_name ground-truth
  ```
- Generate mutation reports for the generated test cases. Use the folder name under `data/generated_test_cases/` as `--tool_name` and, for our approach (where the path includes an `llm_name` subfolder), also pass `--llm_name`.

  ```shell
  python -u generate_pit_reports.py --project_name spark_mini --tool_name gpt-o4-mini --llm_name gpt-o4-mini
  ```

  Pitest CSV reports will be saved to `./data/mutation_data/`.
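For context on what those CSV reports contain: each row of a Pitest `mutations.csv` describes one mutant, including its status (e.g. `KILLED`, `SURVIVED`, `NO_COVERAGE`). A minimal sketch of deriving a mutation score from such a file is shown below; it assumes Pitest's classic headerless 7-column CSV layout, and both the helper and the choice of "detected" statuses are illustrative, not the scoring used by `calculate_scores.py`:

```python
import csv
import io

# Statuses treated as "detected" here (an assumption; adjust to match
# whichever metric the evaluation actually uses).
DETECTED = {"KILLED", "TIMED_OUT", "MEMORY_ERROR", "RUN_ERROR"}

def mutation_score(csv_text):
    """Fraction of mutants detected, from a headerless Pitest mutations.csv.

    Assumes the classic 7-column layout:
    fileName, mutatedClass, mutator, method, lineNumber, status, killingTest
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return 0.0
    detected = sum(1 for row in rows if row[5] in DETECTED)
    return detected / len(rows)
```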
- Calculate mutation-based scenario coverage scores

  ```shell
  python -u calculate_scores.py --project_name spark_mini --tool_name gpt-o4-mini --llm_name gpt-o4-mini
  ```

  Score tables will be generated in `./data/collected_mutation_scores/`.
Run from the project root:

```shell
python -u llm_judger.py --project_name spark_mini --approach ours-gpt-o4-mini
```

Supported `--approach` values: `ours-gpt-o4-mini`, `ours-deepseek-v3.1`, `evosuite`, `chattester`, `vanilla-chattester`, `ablate-rule`, `ablate-fact`.

Results will be saved to `./data/llm_judge_results/<approach>/<project_name>.json`.
```bibtex
@article{qi2026generalizing,
  title   = {Generalizing Test Cases for Comprehensive Test Scenario Coverage},
  author  = {Qi, Binhang and Lin, Yun and Weng, Xinyi and Liu, Chenyan and Sun, Hailong and Fraser, Gordon and Dong, Jin Song},
  journal = {Proceedings of the ACM on Software Engineering},
  volume  = {3},
  number  = {FSE},
  year    = {2026},
  doi     = {10.1145/3808216}
}
```