TestGeneralizer

This is the official repository for the FSE 2026 paper: Generalizing Test Cases for Comprehensive Test Scenario Coverage

🕹️ Setup

  • For Python:

    • python 3.12.11
    • pytorch 2.7.1 (with CUDA support)
    • transformers 4.52.4
    • openai 2.8.1
    • tree-sitter 0.20.1
    • tree-sitter-java 0.23.5
    • tqdm 4.67.1
    • numpy 2.3.0
    • beautifulsoup4 4.13.4
    • A CUDA-capable GPU (required for knowledge base construction and retrieval)
  • For Java:

    • CodeQL CLI 2.18.3
    • Apache Maven 3.6.3
    • JDK 1.8.0_311 (for building and running the subject Java projects via Maven)
    • JDK 17.0.12 (for running JDTLS; must be accessible as java17 on PATH)
    • JDTLS 1.9.0
    • PIT (Pitest) 1.17.0 (downloaded automatically by Maven; pitest-junit5-plugin 1.2.1 for JUnit 5 projects)
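The subject projects are expected to declare Pitest in their POMs already. For reference, a typical pitest-maven plugin declaration matching the versions above looks like the sketch below; the subject projects may configure it differently, so treat this as illustrative rather than the exact configuration used.

```xml
<!-- Illustrative pitest-maven setup; adjust to each project's pom.xml -->
<plugin>
  <groupId>org.pitest</groupId>
  <artifactId>pitest-maven</artifactId>
  <version>1.17.0</version>
  <dependencies>
    <!-- Needed only for JUnit 5 projects -->
    <dependency>
      <groupId>org.pitest</groupId>
      <artifactId>pitest-junit5-plugin</artifactId>
      <version>1.2.1</version>
    </dependency>
  </dependencies>
</plugin>
```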

Note

Make sure JDK 1.8.0_311, JDK 17.0.12, and CodeQL are installed correctly.

$ java -version
java version "1.8.0_311"
Java(TM) SE Runtime Environment (build 1.8.0_311-b11)
$ java17 --version
java 17.0.12 2024-07-16 LTS
Java(TM) SE Runtime Environment (build 17.0.12+8-LTS-286)
$ mvn --version
Apache Maven 3.6.3
Maven home: /usr/share/maven
Java version: 1.8.0_311, vendor: Oracle Corporation, runtime: ...
$ codeql --version
CodeQL command-line toolchain release 2.18.3.

Also, run mvn clean test in each repository under data/repos/ to confirm that every repo compiles and its tests pass.
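The check above can be scripted. Below is a minimal Python sketch; the data/repos layout is taken from the dataset description, and mvn must be on PATH:

```python
import subprocess
from pathlib import Path

def check_repos(root="data/repos"):
    """Run `mvn clean test` in each subject repo; return the names that fail."""
    failures = []
    for repo in sorted(Path(root).iterdir()):
        if not repo.is_dir():
            continue
        # Non-zero exit code means the build or tests failed for this repo.
        result = subprocess.run(["mvn", "clean", "test"], cwd=repo)
        if result.returncode != 0:
            failures.append(repo.name)
    return failures
```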

🚀 Running Experiments

Set API Credentials

In configs.py, set the API key self.openai_key and base URL self.openai_base_url to real values for the LLM you intend to use.
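Only the two attribute names (self.openai_key, self.openai_base_url) come from the repository's description; the surrounding class structure below is a hypothetical sketch, so edit the existing configs.py rather than recreating it:

```python
# Hypothetical shape of the credential fields in configs.py; only the two
# attribute names are from the README -- the class name is an assumption.
class Configs:
    def __init__(self):
        self.openai_key = "sk-REPLACE_ME"                    # your real API key
        self.openai_base_url = "https://api.openai.com/v1"   # your provider's endpoint
```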

Prepare Datasets

  1. Download the dataset into the root folder.
  2. Run tar -xzvf data.tar.gz to obtain data/, which contains the collected_coverages/, dataset/, prompt_optimization/, and repos/ folders.

Stage 1: Collect Exams and Answers

  1. Generate exams

    cd ./offline_collection
    python -u collect_examination.py --project_name spark_mini --llm_name gpt-o4-mini --stage exam
    
  2. Generate answers with facts

    python -u collect_examination.py --project_name spark_mini --llm_name gpt-o4-mini --stage answer
    

Stage 2: Generate Test Scenario Templates and Instances

  1. Collect project knowledge offline

    Note: Salesforce/codet5p-110m-embedding (~440 MB) is downloaded automatically from HuggingFace on first run.

    cd ../knowledge_base/
    python -u constructor.py --project_name spark_mini
    
  2. Generate test scenario templates

    cd ../offline_collection
    python -u generalize_scenario.py --project_name spark_mini --llm_name gpt-o4-mini --stage template
    
  3. Generate test scenario instances

    python -u generalize_scenario.py --project_name spark_mini --llm_name gpt-o4-mini --stage instance
    
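Template and instance generation retrieve from the knowledge base built in step 1 via embedding similarity (the codet5p embedding model noted above). Below is a minimal sketch of cosine-similarity retrieval over precomputed embedding vectors; the storage format and function name here are assumptions for illustration, not the actual constructor.py API:

```python
import numpy as np

def top_k(query_vec, kb_vecs, k=5):
    """Indices of the k knowledge-base entries most similar to the query,
    by cosine similarity over L2-normalized embedding vectors."""
    q = query_vec / np.linalg.norm(query_vec)
    kb = kb_vecs / np.linalg.norm(kb_vecs, axis=1, keepdims=True)
    # Higher dot product = more similar; sort descending and keep the top k.
    return np.argsort(-(kb @ q))[:k]
```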

Stage 3: Generate Test Cases

  1. Generate tests for the scenario instances

    cd ..
    python -u main.py --project_name spark_mini --llm_name gpt-o4-mini --junit_version 4
    

Prompt Auto-Tuning (Optional)

  1. Auto-tune a prompt from scratch

    cd prompt_optimization/
    python -u optimizer.py --strategy auto-tuning --batch_size 5
    

Calculation of Mutation-based Scenario Coverage

All commands below are run from the mutation_test/ directory.

  1. Prepare the ground-truth test case file

    cd ./mutation_test
    python -u prepare_ground_truth.py --project_name spark_mini
    
  2. Generate mutation reports for the ground-truth test cases

    python -u generate_pit_reports.py --project_name spark_mini --tool_name ground-truth
    
  3. Generate mutation reports for the generated test cases. Use the folder name under data/generated_test_cases/ as --tool_name and, for our approach (where the path includes an llm_name subfolder), also pass --llm_name.

    python -u generate_pit_reports.py --project_name spark_mini --tool_name gpt-o4-mini --llm_name gpt-o4-mini
    

    Pitest CSV reports will be saved to ./data/mutation_data/.

  4. Calculate mutation-based scenario coverage scores

    python -u calculate_scores.py --project_name spark_mini --tool_name gpt-o4-mini --llm_name gpt-o4-mini
    

    Score tables will be generated in ./data/collected_mutation_scores/.
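For reference, a plain (non-scenario) mutation score can be computed directly from a Pitest mutations CSV. The sketch below assumes Pitest's default CSV column order; it is not the scenario-based scoring logic implemented in calculate_scores.py:

```python
import csv

def mutation_score(csv_path):
    """Fraction of mutants killed, from a Pitest CSV report.
    Assumed row layout: file, class, mutator, method, line, status[, killing test]."""
    killed = total = 0
    with open(csv_path, newline="") as f:
        for row in csv.reader(f):
            if not row:
                continue
            total += 1
            # Pitest marks detected mutants as KILLED or TIMED_OUT.
            if row[5] in ("KILLED", "TIMED_OUT"):
                killed += 1
    return killed / total if total else 0.0
```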

Calculation of LLM-Assessed Scenario Coverage

Run from the project root:

python -u llm_judger.py --project_name spark_mini --approach ours-gpt-o4-mini

Supported --approach values: ours-gpt-o4-mini, ours-deepseek-v3.1, evosuite, chattester, vanilla-chattester, ablate-rule, ablate-fact.

Results will be saved to ./data/llm_judge_results/<approach>/<project_name>.json.

📝 Citation

@article{qi2026generalizing,
    title   = {Generalizing Test Cases for Comprehensive Test Scenario Coverage},
    author  = {Qi, Binhang and Lin, Yun and Weng, Xinyi and Liu, Chenyan and Sun, Hailong and Fraser, Gordon and Dong, Jin Song},
    journal = {Proceedings of the ACM on Software Engineering},
    volume  = {3},
    number  = {FSE},
    year    = {2026},
    doi     = {10.1145/3808216}
}
