
MirrorGaze

Setup

We use uv for dependency management.

Install uv

On macOS and Linux:

curl -LsSf https://astral.sh/uv/install.sh | sh

See the uv installation docs for other options.

Install dependencies

uv sync

To include the optional data collection dependencies:

uv sync --extra datacollection

Download Data and Checkpoints

Pre-collected data and trained checkpoints are available on Google Drive.

Download and place them at the repo root so the layout looks like:

MirrorGaze/
├── data/
│   └── collected/
└── checkpoints/

With these in place you can skip data collection and start at Pre-process Data.

1. Collect Data

Data collection requires the optional datacollection extras (macOS only — uses AppKit to place the stimulus window on a second display):

uv sync --extra datacollection

Then run:

cd scripts
uv run "1. Data Collection.py"

What this does:

  • Prompts for a participant ID (auto-increments based on existing folders in data/collected/).
  • Opens a fullscreen stimulus window on a secondary display (rotated 90° CW for a portrait monitor) showing a gray dot, and a live camera feed in a second window.
  • Press r in the terminal to start a 30-second recording. The dot moves smoothly between random screen positions (~2 seconds per segment) while the webcam records.
  • Press r again to begin another recording; press q to quit.
  • Each recording saves a video_N.mp4 and log_N.csv (columns: frame_id, dot_x, dot_y, timestamp_ms) into data/collected/p<ID>/.

Example collected data is included under data/collected/.
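
As a quick sanity check on a recording, the log CSV can be inspected directly; below is a minimal sketch using pandas (the participant ID and recording number in the path are placeholders):

import pandas as pd

# Hypothetical example path: participant 1, recording 0.
log = pd.read_csv("data/collected/p1/log_0.csv")

# Columns written by the collection script: frame_id, dot_x, dot_y, timestamp_ms.
print(log.columns.tolist())
print(log.head())

# Rough duration check: timestamps are in milliseconds.
duration_s = (log["timestamp_ms"].iloc[-1] - log["timestamp_ms"].iloc[0]) / 1000.0
print(f"{len(log)} frames over {duration_s:.1f} s")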

Visualize collected data

cd scripts
uv run "3. Visualize Collected Data.py"

2. Pre-process Data

The input to the MirrorGaze model is a pair of left/right eye images that have been cropped, scaled, and rotated to a canonical orientation, then concatenated side-by-side. This script creates these images from the collected data.

cd scripts
uv run "2. Make Crops.py"

What this does:

  • Scans data/collected/ for every p<ID>/video_N.mp4 + log_N.csv pair.
  • For each frame, runs MediaPipe Face Landmarker to locate the eyes, then crops, scales (to a fixed inter-corner eye width), and rotates each eye to a canonical orientation.
  • Writes the concatenated left|right crop as data/crops/p<ID>_<N>/<frame_id>.png, matching the frame_id column in the log CSV for downstream linkage.
  • Also writes a per-recording blinks.csv with eye-openness metrics and blink flags; blink frames are logged but no PNG is saved.
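
To inspect an individual crop, the concatenated image can be split back into its two halves. A minimal sketch, assuming the left and right eye crops occupy equal-width halves of the PNG (the path and frame ID are placeholders):

from PIL import Image

# Hypothetical example path: participant 1, recording 0, frame 42.
crop = Image.open("data/crops/p1_0/42.png")

# Assumption: the left|right concatenation places each eye in one half of the width.
w, h = crop.size
left_eye = crop.crop((0, 0, w // 2, h))
right_eye = crop.crop((w // 2, 0, w, h))
left_eye.show()
right_eye.show()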

Visualize model input


For a live demonstration of this preprocessing pipeline, see scripts/0. Live Eye Crop Demo.py:

cd scripts
uv run "0. Live Eye Crop Demo.py"

3. Train MirrorGaze Model

cd scripts
uv run "5. Train MirrorGaze.py"
  • Dataset: dataset.py — loads the concatenated eye crops from data/crops/ and their ground-truth dot positions from the matching log_N.csv, normalizing the (dot_x, dot_y) targets by the device canvas size (806 × 1194).
  • Model: model.py — a fastvit_t8 backbone (pretrained on ImageNet, via timm) with a 2-layer MLP head that regresses normalized gaze (x, y). Trained with MSE loss; validation also reports error in cm (using the on-screen pixel→cm scale).
  • Train/test split is done by participant (the lists at the top of the train script) so the model is evaluated cross-user rather than cross-frame.
  • Logging is via Weights & Biases (project MirrorGaze). Log in with wandb login before the first run.
  • Checkpoints are written to checkpoints/MirrorGaze-<timestamp>/, with the best-by-val_loss paths recorded in best_model.txt.

Edit the all_pids / _test_pids lists at the top of the script to change which participants are held out.
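
For orientation, here is a minimal sketch of the architecture described above: a timm fastvit_t8 backbone with a small MLP head regressing normalized (x, y). The hidden size and input resolution are illustrative, not the exact values in model.py:

import timm
import torch
import torch.nn as nn

class GazeRegressor(nn.Module):
    def __init__(self):
        super().__init__()
        # ImageNet-pretrained FastViT-T8 backbone, used as a pooled feature extractor.
        self.backbone = timm.create_model("fastvit_t8", pretrained=True, num_classes=0)
        feat_dim = self.backbone.num_features
        # 2-layer MLP head regressing normalized gaze (x, y); hidden size is illustrative.
        self.head = nn.Sequential(
            nn.Linear(feat_dim, 128),
            nn.ReLU(),
            nn.Linear(128, 2),
        )

    def forward(self, x):
        return self.head(self.backbone(x))

model = GazeRegressor()
loss_fn = nn.MSELoss()  # targets are dot positions normalized by the 806 x 1194 canvas
pred = model(torch.randn(1, 3, 256, 256))  # input resolution here is illustrative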

4. Demo

Render a side-by-side video comparing ground-truth vs. predicted gaze for a held-out participant's recordings:

cd scripts
uv run "6. Demo.py" <checkpoint_path> <pid>

Example:

uv run 6.\ Demo.py ../checkpoints/MirrorGaze-03172026-233857/epoch=19-val_loss=0.01027-val_error_cm=1.750.ckpt 22

For each recording belonging to participant <pid>, the script:

  • Loads the trained checkpoint (uses MPS on Apple Silicon, CUDA if available, else CPU).
  • Runs MediaPipe + the same crop preprocessing as training, feeds the concatenated eye crop to the model, and converts the normalized output back to canvas pixels.
  • Writes an MP4 to results/demo_p<pid>_<rec>_<ckpt>.mp4 showing the canvas with the GT dot (green) and predicted dot (red) on the left, and the eye crops on the right.
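
The conversion back to canvas pixels is a simple rescale of the normalized model output. A minimal sketch, assuming the same 806 × 1194 canvas used to normalize the targets during training (the function name is illustrative):

CANVAS_W, CANVAS_H = 806, 1194  # device canvas size used to normalize targets

def to_canvas_pixels(pred_xy):
    """Map a normalized (x, y) prediction in [0, 1] back to canvas pixel coordinates."""
    x_norm, y_norm = pred_xy
    return x_norm * CANVAS_W, y_norm * CANVAS_H

# e.g. a prediction of (0.5, 0.5) maps to the canvas center.
print(to_canvas_pixels((0.5, 0.5)))  # (403.0, 597.0)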
