📢 [Project Page] [Datasets] [Models]
This is the official repository for the paper "GUIPilot: A Consistency-based Mobile GUI Testing Approach for Detecting Application-specific Bugs", published at ISSTA 2025.
GUIPilot detects inconsistencies between mobile app designs and their implementations. It addresses two main types of inconsistencies: screen and process inconsistencies, using a combination of widget alignment and vision-language models. We’re continuously updating this repository. Stay tuned for more developments!
- Screen Inconsistency Detection:
  - Detects differences between the actual and expected UI appearance.
  - Converts the screen-matching problem into an optimizable widget alignment task.
- Process Inconsistency Detection:
  - Detects discrepancies between the actual and expected UI transitions after an action.
  - Translates natural-language descriptions of transitions in mockups into stepwise actions (e.g., clicks, long-presses, text inputs).
  - Utilizes a vision-language model to infer actions on the real screen, ensuring that the expected transitions occur in the app.
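To make the stepwise-action idea concrete, here is an illustrative sketch of how such actions might be represented and how a transition description could be mapped onto them. The class names and the toy keyword heuristic are assumptions for illustration only, not GUIPilot's actual entities:

```python
from dataclasses import dataclass

# Hypothetical action types mirroring the stepwise actions described
# above (clicks, long-presses, text inputs). Not GUIPilot's real API.

@dataclass
class Click:
    x: int
    y: int

@dataclass
class LongPress:
    x: int
    y: int
    duration_ms: int = 800

@dataclass
class InputText:
    text: str

def parse_step(step: str):
    """Toy heuristic: map a natural-language step to an action type.
    (GUIPilot uses a VLM for this; keyword matching is a stand-in.)"""
    s = step.lower()
    if "long-press" in s or "long press" in s:
        return LongPress(0, 0)
    if "click" in s or "tap" in s:
        return Click(0, 0)
    if "type" in s or "enter" in s or "input" in s:
        return InputText(step)
    raise ValueError(f"unrecognized step: {step}")
```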
This repository contains three components:
- The core module (`/guipilot`).
- The datasets module (`/dataset`), which records the dataset repositories.
- The experiments module (`/experiments`), which supports research questions 1-4 as presented in the paper.
The core GUIPilot module is organized as follows:
- `/agent`: Handles action completion using a Vision-Language Model (VLM) agent
- `/matcher`: Pairs widgets across two different screens for comparison
- `/checker`: Detects bounding box, color, and text inconsistencies between widget pairs
- `/entities`: Defines Process, Screen, Widget, and Inconsistency entities used throughout the module
- `/models`: Contains OCR and widget detection models
- Python 3.12+
- uv package manager
- NVIDIA GPU with CUDA 12.6-compatible driver (driver ≥ 525.85)
```shell
git clone https://github.com/code-philia/GUIPilot.git
cd GUIPilot
uv sync
```

`uv sync` creates `.venv/` and installs all dependencies, including GPU-enabled builds of PyTorch (cu126) and PaddlePaddle (cu126) from their respective wheel indexes configured in `pyproject.toml`.
GPU setup is fully automatic. `guipilot/__init__.py` pre-loads all NVIDIA CUDA shared libraries from the venv at import time, so `torch` and `paddle` find them without any `LD_LIBRARY_PATH` configuration.
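The preload mechanism can be approximated with the sketch below. This is an illustration of the idea, not the actual contents of `guipilot/__init__.py`; the function name and the glob pattern over the venv's `nvidia/*/lib` directories are assumptions:

```python
import ctypes
import site
from pathlib import Path

def preload_cuda_libs() -> list[str]:
    """Load every NVIDIA CUDA shared library shipped inside the venv
    with RTLD_GLOBAL, so later imports of torch/paddle resolve their
    symbols without LD_LIBRARY_PATH. Returns the paths that loaded
    successfully (empty on CPU-only machines)."""
    loaded = []
    for pkg_dir in site.getsitepackages():
        # pip-installed CUDA wheels place .so files under nvidia/<pkg>/lib/
        for lib in sorted(Path(pkg_dir).glob("nvidia/*/lib/*.so*")):
            try:
                ctypes.CDLL(str(lib), mode=ctypes.RTLD_GLOBAL)
                loaded.append(str(lib))
            except OSError:
                pass  # skip libraries with unmet dependencies
    return loaded
```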
Each directory within /experiments includes a README.md file with instructions for setting up datasets and running the experiment.
- Place your mock/real screenshot pairs in `output_images/`, named `<name>_mock.<ext>` and `<name>_real.<ext>` (e.g. `login_mock.jpg` / `login_real.jpg`).
- Run:

  ```shell
  uv run run_checks.py --input output_images/ --output results/
  ```

- Results are written to `<output>/`:
  - `results.csv`: per-pair inconsistency table
  - `visualizations/<name>.jpg`: side-by-side annotated images (mock | real)
    - Green boxes: matched widget pairs with no inconsistency
    - Yellow boxes: matched pairs with a detected inconsistency
    - Red boxes: unmatched widgets (missing in real / excess in real)
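The `<name>_mock.<ext>` / `<name>_real.<ext>` naming convention can be sketched as a small pairing helper. This is a hypothetical illustration of the convention, not code from `run_checks.py`:

```python
from pathlib import Path

def collect_pairs(input_dir: str) -> dict[str, tuple[Path, Path]]:
    """Group files in input_dir into (mock, real) pairs by the
    <name>_mock.<ext> / <name>_real.<ext> naming convention.
    Files without a matching counterpart are ignored."""
    mocks, reals = {}, {}
    for p in Path(input_dir).iterdir():
        if p.stem.endswith("_mock"):
            mocks[p.stem[: -len("_mock")]] = p
        elif p.stem.endswith("_real"):
            reals[p.stem[: -len("_real")]] = p
    return {name: (mocks[name], reals[name]) for name in mocks if name in reals}
```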
The YOLO widget detector weights are downloaded automatically from HuggingFace on first run.
The OCR language defaults to `ch` (Chinese & English). To change it, set `OCR_LANG` in `.env` or the shell before running. See the PaddleOCR documentation for supported languages.
Refer to `/experiments/rq1_screen_inconsistency/main.py` for a complete working example.
Each `Screen` instance requires an RGB screenshot (`numpy.ndarray`). Widgets can be loaded from a JSON annotation file or auto-detected.
Option A — auto-detect with GUIPilot’s built-in models:
```python
import cv2
from guipilot.entities import Screen

screenA = Screen(cv2.imread(pathA))
screenB = Screen(cv2.imread(pathB))
screenA.detect(); screenA.ocr()
screenB.detect(); screenB.ocr()
```

Option B — load widgets from a JSON annotation file:
```python
import cv2, json
from guipilot.entities import Bbox, Widget, WidgetType, Screen

def load_widgets(path):
    with open(path, encoding="utf-8") as f:
        raw = json.load(f)
    return {
        i: Widget(type=WidgetType(item["type"]), bbox=Bbox(*item["bbox"]))
        for i, item in enumerate(raw)
    }

screenA = Screen(cv2.imread(pathA), load_widgets(widgetsA_path))
screenB = Screen(cv2.imread(pathB), load_widgets(widgetsB_path))
```

Then match the widgets across the two screens and check the pairs for inconsistencies:

```python
from guipilot.matcher import GUIPilotV2 as Matcher
from guipilot.checker import GVT as Checker

pairs, _, match_time = Matcher().match(screenA, screenB)
inconsistencies, check_time = Checker().check(screenA, screenB, pairs)
```

If you find our work useful, please consider citing it:
```bibtex
@article{liu2025guipilot,
  title={GUIPilot: A Consistency-Based Mobile GUI Testing Approach for Detecting Application-Specific Bugs},
  author={Liu, Ruofan and Teoh, Xiwen and Lin, Yun and Chen, Guanjie and Ren, Ruofei and Poshyvanyk, Denys and Dong, Jin Song},
  journal={Proceedings of the ACM on Software Engineering},
  volume={2},
  number={ISSTA},
  pages={753--776},
  year={2025},
  publisher={ACM New York, NY, USA}
}
```