We use uv for dependency management.
On macOS and Linux:
```
curl -LsSf https://astral.sh/uv/install.sh | sh
```

See the uv installation docs for other options.

Install the project dependencies:

```
uv sync
```

To include the optional data collection dependencies:

```
uv sync --extra datacollection
```

Pre-collected data and trained checkpoints are available on Google Drive.
Download and place them at the repo root so the layout looks like:
```
MirrorGaze/
├── data/
│   └── collected/
└── checkpoints/
```
With these in place you can skip data collection and start at Pre-process Data.
Data collection requires the optional datacollection extras (macOS only; it uses AppKit to place the stimulus window on a second display):

```
uv sync --extra datacollection
```

Then run:

```
cd scripts
uv run "1. Data Collection.py"
```

What this does:
- Prompts for a participant ID (auto-increments based on existing folders in `data/collected/`).
- Opens a fullscreen stimulus window on a secondary display (rotated 90° CW for a portrait monitor) showing a gray dot, and a live camera feed in a second window.
- Press `r` in the terminal to start a 30-second recording. The dot moves smoothly between random screen positions (~2 seconds per segment) while the webcam records.
- Press `r` again to begin another recording; press `q` to quit.
- Each recording saves a `video_N.mp4` and `log_N.csv` (columns: `frame_id, dot_x, dot_y, timestamp_ms`) into `data/collected/p<ID>/`.
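The smooth dot motion described above can be sketched as per-frame linear interpolation between random canvas positions. This is an illustrative sketch, not the script's exact implementation; the canvas size matches the one used for the training targets, while the frame rate is an assumption:

```python
import random

CANVAS_W, CANVAS_H = 806, 1194   # device canvas size used by the training targets
SEGMENT_MS = 2000                # ~2 s per movement segment
FPS = 30                         # assumed webcam/stimulus frame rate

def dot_trajectory(duration_ms=30_000, seed=0):
    """Return (timestamp_ms, dot_x, dot_y) samples: the dot glides linearly
    between random canvas positions, one segment every SEGMENT_MS."""
    rng = random.Random(seed)
    pos = (rng.uniform(0, CANVAS_W), rng.uniform(0, CANVAS_H))
    target = (rng.uniform(0, CANVAS_W), rng.uniform(0, CANVAS_H))
    steps = SEGMENT_MS * FPS // 1000                 # frames per segment
    frames = []
    for f in range(int(duration_ms * FPS / 1000)):
        a = (f % steps) / steps                      # progress within segment
        frames.append((f * 1000 / FPS,
                       pos[0] + (target[0] - pos[0]) * a,
                       pos[1] + (target[1] - pos[1]) * a))
        if (f + 1) % steps == 0:                     # segment finished: new target
            pos, target = target, (rng.uniform(0, CANVAS_W), rng.uniform(0, CANVAS_H))
    return frames
```

Each returned sample corresponds to one row of the `log_N.csv` schema (`frame_id` implied by the index, plus `dot_x`, `dot_y`, `timestamp_ms`).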
Example collected data is included under data/collected/.
```
cd scripts
uv run "3. Visualize Collected Data.py"
```

The input to the MirrorGaze model is a pair of left/right eye images that have been cropped, scaled, and rotated to a canonical orientation, then concatenated side-by-side. This script creates these images from the collected data.
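The side-by-side concatenation can be illustrated with NumPy; the crop dimensions here are hypothetical, and the real sizes come from the crop script:

```python
import numpy as np

# hypothetical crop size; the real dimensions come from the crop script
left = np.zeros((64, 128, 3), dtype=np.uint8)    # canonical left-eye crop
right = np.zeros((64, 128, 3), dtype=np.uint8)   # canonical right-eye crop

pair = np.hstack([left, right])                  # model input: shape (64, 256, 3)
```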
```
cd scripts
uv run "2. Make Crops.py"
```

What this does:
- Scans `data/collected/` for every `p<ID>/video_N.mp4` + `log_N.csv` pair.
- For each frame, runs MediaPipe Face Landmarker to locate the eyes, then crops, scales (to a fixed inter-corner eye width), and rotates each eye to a canonical orientation.
- Writes the concatenated left|right crop as `data/crops/p<ID>_<N>/<frame_id>.png`, matching the `frame_id` column in the log CSV for downstream linkage.
- Also writes a per-recording `blinks.csv` with eye-openness metrics and blink flags; blink frames are logged but no PNG is saved.
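The crop/scale/rotate step can be sketched as a single similarity transform built from the two eye-corner landmarks. This is a sketch: the function name, crop size, and eye width are illustrative, and the real script derives the corner points from MediaPipe landmarks:

```python
import math
import numpy as np

CROP_W, CROP_H = 128, 64       # hypothetical canonical crop size
EYE_WIDTH_PX = 96              # hypothetical fixed inter-corner eye width

def eye_alignment_matrix(inner_corner, outer_corner):
    """Build a 2x3 affine matrix that rotates the corner-to-corner line
    horizontal, scales it to EYE_WIDTH_PX, and centers the eye in a
    CROP_W x CROP_H window."""
    (x0, y0), (x1, y1) = inner_corner, outer_corner
    dx, dy = x1 - x0, y1 - y0
    scale = EYE_WIDTH_PX / max(math.hypot(dx, dy), 1e-6)
    theta = math.atan2(dy, dx)
    c, s = math.cos(theta), math.sin(theta)
    R = scale * np.array([[c, s], [-s, c]])      # rotate by -theta, then scale
    center = np.array([(x0 + x1) / 2, (y0 + y1) / 2])
    t = np.array([CROP_W / 2, CROP_H / 2]) - R @ center
    return np.hstack([R, t[:, None]])            # 2x3 affine matrix
```

Applying the matrix with `cv2.warpAffine(frame, M, (CROP_W, CROP_H))` would then produce the canonical eye crop.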
For a live demonstration of this preprocessing pipeline, see scripts/0. Live Eye Crop Demo.py:
```
cd scripts
uv run "0. Live Eye Crop Demo.py"
```

```
cd scripts
uv run "5. Train MirrorGaze.py"
```

- Dataset: `dataset.py` loads the concatenated eye crops from `data/crops/` and their ground-truth dot positions from the matching `log_N.csv`, normalizing the `(dot_x, dot_y)` targets by the device canvas size (806 × 1194).
- Model: `model.py` is a `fastvit_t8` backbone (pretrained on ImageNet, via `timm`) with a 2-layer MLP head that regresses normalized gaze `(x, y)`. Trained with MSE loss; validation also reports error in cm (using the on-screen pixel→cm scale).
- Train/test split is done by participant (the lists at the top of the train script), so the model is evaluated cross-user rather than cross-frame.
- Logging is via Weights & Biases (project `MirrorGaze`). Log in with `wandb login` before the first run.
- Checkpoints are written to `checkpoints/MirrorGaze-<timestamp>/`, with the best-by-`val_loss` paths recorded in `best_model.txt`.
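The target normalization and its inverse amount to dividing and multiplying by the canvas size; the function names here are illustrative, not those in `dataset.py`:

```python
CANVAS_W, CANVAS_H = 806, 1194   # device canvas size

def normalize_target(dot_x, dot_y):
    """Map canvas-pixel dot coordinates to the [0, 1] range the model regresses."""
    return dot_x / CANVAS_W, dot_y / CANVAS_H

def denormalize_prediction(nx, ny):
    """Map a normalized model output back to canvas pixels (as the demo does)."""
    return nx * CANVAS_W, ny * CANVAS_H
```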
Edit the `all_pids` / `_test_pids` lists at the top of the script to change which participants are held out.
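The cross-user split described above boils down to holding out whole participants; a minimal sketch, assuming samples carry a participant ID (the actual script keeps its pid lists inline):

```python
def split_by_participant(samples, test_pids):
    """Hold out whole participants: every frame from a pid in test_pids goes
    to the test set, so evaluation is cross-user rather than cross-frame."""
    train = [s for s in samples if s["pid"] not in test_pids]
    test = [s for s in samples if s["pid"] in test_pids]
    return train, test
```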
Render a side-by-side video comparing ground-truth vs. predicted gaze for a held-out participant's recordings:
```
cd scripts
uv run "6. Demo.py" <checkpoint_path> <pid>
```

Example:

```
uv run 6.\ Demo.py ../checkpoints/MirrorGaze-03172026-233857/epoch=19-val_loss=0.01027-val_error_cm=1.750.ckpt 22
```

For each recording belonging to participant `<pid>`, the script:
- Loads the trained checkpoint (uses MPS on Apple Silicon, CUDA if available, else CPU).
- Runs MediaPipe + the same crop preprocessing as training, feeds the concatenated eye crop to the model, and converts the normalized output back to canvas pixels.
- Writes an MP4 to `results/demo_p<pid>_<rec>_<ckpt>.mp4` showing the canvas with the GT dot (green) and predicted dot (red) on the left, and the eye crops on the right.
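The canvas overlay on the left half of each demo frame can be sketched with NumPy; the dot positions and radius here are illustrative, and colors are BGR as OpenCV video writers expect:

```python
import numpy as np

def draw_canvas(gt, pred, canvas_w=806, canvas_h=1194, radius=10):
    """Render the ground-truth dot (green) and predicted dot (red) on a blank
    canvas, as drawn on the left half of each demo frame. Colors are BGR."""
    frame = np.zeros((canvas_h, canvas_w, 3), dtype=np.uint8)
    yy, xx = np.ogrid[:canvas_h, :canvas_w]
    for (cx, cy), color in ((gt, (0, 255, 0)), (pred, (0, 0, 255))):
        mask = (xx - cx) ** 2 + (yy - cy) ** 2 <= radius ** 2
        frame[mask] = color
    return frame
```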