workflow-tracker

Automatic experiment & workflow tracker for ML/AI projects — no manual logging needed. Detects experiments in real time, stores structured records, generates progress reports on demand.

License: MIT Skills.sh

Chinese documentation


Why?

Common pains in ML experimentation:

  • Ran dozens of experiments, forgot the parameters and conclusions two weeks later
  • Notes scattered across chat logs, terminal output, and memory — impossible to aggregate
  • Writing weekly reports means digging through all experiment records from scratch
  • Paper CHANGELOGs and experiment logs have inconsistent formats

workflow-tracker addresses all of these by detecting experiment activity and recording it silently, with no extra effort on your part.


Install

npx skills add workflow-tracker -g

Or from a local clone:

git clone https://github.com/MarkD1Zzz/workflow-tracker.git
npx skills add ./workflow-tracker -g

Requires: Node.js ≥ 18 and Claude Code or a compatible agent.


Features

Auto-Detect & Silent Logging

Triggers automatically on these signals, without interrupting your workflow:

  • Running training/evaluation scripts
  • Metric changes (accuracy, F1, loss, etc.)
  • Parameter changes ("change lr to 0.001")
  • Verbal experiment conclusions ("tried X, result was Y")

Dual-Mode Output

| Project Type | Detection Signal | Output Files |
|--------------|------------------|--------------|
| Engineering | data/train/, train.py, pipeline | workflow.json + workflow.md |
| Paper | tex/, manuscript, figures/ | CHANGELOG.md + experiment_log.md |
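
As a rough illustration of how directory-based detection like this could work, here is a minimal Python sketch. It is not the skill's actual implementation: the function name `detect_project_type`, the subset of signals checked, and the rule that paper signals take precedence over engineering signals are all assumptions made for the example.

```python
from pathlib import Path

# Signals taken from the table above. Checking only these paths, and checking
# paper signals first, are assumptions made for this sketch.
PAPER_SIGNALS = ("tex", "figures")
ENGINEERING_SIGNALS = ("data/train", "train.py")


def detect_project_type(root: str) -> str:
    """Return "paper", "engineering", or "unknown" for a project directory."""
    base = Path(root)
    if any((base / name).exists() for name in PAPER_SIGNALS):
        return "paper"
    if any((base / name).exists() for name in ENGINEERING_SIGNALS):
        return "engineering"
    return "unknown"


if __name__ == "__main__":
    # Classify the current working directory.
    print(detect_project_type("."))
```

A real detector would probably weigh several signals together, since a `figures/` folder can also appear in an engineering project.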

Three-Level Structure

Phase → Task → Experiment

Each experiment auto-extracts: Hypothesis / Method / Parameters / Results (with delta) / Conclusion / Tags
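
One way to picture the three levels is as nested records. The Python sketch below is illustrative only: the class and field names are assumptions loosely modelled on the workflow.json example later in this README, not types shipped with the skill.

```python
from dataclasses import dataclass, field


@dataclass
class Experiment:
    title: str
    method: str
    params: dict
    baseline: float
    new: float
    conclusion: str
    hypothesis: str = ""  # left empty when no hypothesis was stated
    tags: list = field(default_factory=list)

    @property
    def delta(self) -> float:
        # Rounded to two decimals to hide floating-point noise.
        return round(self.new - self.baseline, 2)


@dataclass
class Task:
    name: str
    experiments: list = field(default_factory=list)


@dataclass
class Phase:
    name: str
    tasks: list = field(default_factory=list)


exp = Experiment(
    title="SVM(linear) replaces MLP",
    method="SVC(kernel='linear', C=1, class_weight='balanced')",
    params={"kernel": "linear", "C": 1},
    baseline=93.75,
    new=94.79,
    conclusion="SUCCESS",
    tags=["classifier", "svm", "breakthrough"],
)
phase = Phase(
    "Phase 1: Accuracy Optimization",
    [Task("Task 1.1: Classifier Replacement", [exp])],
)
print(exp.delta)  # prints 1.04
```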

Report Generation

Say "generate report" to produce:

  • Paper project: Update CHANGELOG.md + experiment_log.md
  • Engineering project: Generate .docx.json + .pptx.json intermediate format (render with any tool later)

Examples

Scenario 1: Engineering — Classifier Swap

You: Swapped Stage 2 MLP for SVM(linear, C=1). Accuracy: 93.75% → 94.79%. SVM is deterministic.

Claude: Recorded. SVM(linear) → SUCCESS, delta +1.04pp.
       → workflow.json + workflow.md updated

Scenario 2: Paper — Ablation Study

You: Finished attention module ablation. SE 94.2%, CBAM 94.8%, FAA 96.1%.

Claude: Paper project detected.
       → CHANGELOG.md appended with timeline entry
       → experiment_log.md appended with detailed record

Scenario 3: Report Generation

You: Generate a progress report for the last two weeks.

Claude: Generated report_20260614.docx.json + report_20260614.pptx.json
        Run node render.js or python render.py to produce final files.

Output Formats

workflow.json (Engineering)

{
  "project": "Welding Defect Classification",
  "updated": "2026-06-14T14:30",
  "phases": [{
    "name": "Phase 1: Accuracy Optimization",
    "status": "in_progress",
    "tasks": [{
      "name": "Task 1.1: Classifier Replacement",
      "status": "completed",
      "experiments": [{
        "date": "2026-06-14",
        "title": "SVM(linear) replaces MLP",
        "method": "SVC(kernel='linear', C=1, class_weight='balanced')",
        "params": {"kernel": "linear", "C": 1},
        "results": {"baseline": 93.75, "new": 94.79, "delta": 1.04},
        "conclusion": "SUCCESS",
        "tags": ["classifier", "svm", "breakthrough"]
      }]
    }]
  }]
}
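
Because workflow.json is plain JSON, it can be consumed by any script. The following Python sketch walks the Phase → Task → Experiment nesting shown above and prints one summary line per experiment. The helper names `iter_experiments` and `summarize` are hypothetical, and the inline sample is a trimmed copy of the example record.

```python
import json

# Trimmed copy of the example record above, inlined so the sketch is self-contained.
SAMPLE = """
{
  "project": "Welding Defect Classification",
  "phases": [{
    "name": "Phase 1: Accuracy Optimization",
    "tasks": [{
      "name": "Task 1.1: Classifier Replacement",
      "experiments": [{
        "date": "2026-06-14",
        "title": "SVM(linear) replaces MLP",
        "results": {"baseline": 93.75, "new": 94.79, "delta": 1.04},
        "conclusion": "SUCCESS"
      }]
    }]
  }]
}
"""


def iter_experiments(workflow: dict):
    """Yield (phase name, task name, experiment dict) for every experiment."""
    for phase in workflow.get("phases", []):
        for task in phase.get("tasks", []):
            for exp in task.get("experiments", []):
                yield phase["name"], task["name"], exp


def summarize(workflow: dict) -> list:
    """Return one human-readable line per experiment."""
    lines = []
    for _phase, task, exp in iter_experiments(workflow):
        delta = exp["results"]["delta"]
        lines.append(
            f'{exp["date"]} | {task} | {exp["title"]} | {delta:+.2f} | {exp["conclusion"]}'
        )
    return lines


for line in summarize(json.loads(SAMPLE)):
    print(line)
```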

CHANGELOG.md (Paper)

## 2026-06-14 — Attention Module Ablation

### Background
Comparing SE / CBAM / FAA attention modules on NEU-DET.

### Results
| Module | Accuracy | Delta vs SE |
|--------|----------|-------------|
| SE     | 94.2%    | baseline    |
| CBAM   | 94.8%    | +0.6pp      |
| FAA    | 96.1%    | +1.9pp      |

### Conclusion
FAA significantly outperforms SE and CBAM. Ablation validates the attention redundancy hypothesis.
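
The "Delta vs SE" column in the example above is simple arithmetic against the baseline row. A minimal Python sketch of how such a table could be produced from raw accuracies follows; the function name `ablation_table` is hypothetical and the layout just mirrors the example.

```python
def ablation_table(results: dict, baseline: str) -> str:
    """Render a Markdown table with each entry's delta against the baseline entry."""
    base = results[baseline]
    rows = [
        f"| Module | Accuracy | Delta vs {baseline} |",
        "|--------|----------|-------------|",
    ]
    for name, acc in results.items():
        # Deltas are reported in percentage points (pp), one decimal place.
        delta = "baseline" if name == baseline else f"{acc - base:+.1f}pp"
        rows.append(f"| {name} | {acc:.1f}% | {delta} |")
    return "\n".join(rows)


print(ablation_table({"SE": 94.2, "CBAM": 94.8, "FAA": 96.1}, baseline="SE"))
```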

How It Works

  1. Project Type Detection: Scans directory structure (tex/→paper, data/train/→engineering)
  2. Signal Detection: Matches experiment keywords + numeric change patterns in conversation
  3. Batch Writing: Accumulates experiments, writes once per round (avoids excessive IO)
  4. Delta Auto-Calculation: Computes difference whenever old and new values appear
  5. Tag Auto-Classification: Assigns tags like architecture, hyperparameter-tuning, classifier, data-augmentation based on method type
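
Steps 2 and 4 can be illustrated with a small pattern matcher. How the skill matches patterns internally is not documented here, so the regular expression and the `extract_deltas` helper in this Python sketch are assumptions that only show the idea of spotting an "old → new" pair and computing the difference.

```python
import re

# Matches pairs such as "93.75% → 94.79%" or "0.91 -> 0.93".
# The pattern is illustrative and far from exhaustive.
CHANGE = re.compile(
    r"(\d+(?:\.\d+)?)\s*%?\s*(?:→|->)\s*(\d+(?:\.\d+)?)\s*%?"
)


def extract_deltas(message: str) -> list:
    """Return (old, new, delta) tuples for each "old → new" pair in the message."""
    found = []
    for old, new in CHANGE.findall(message):
        old_value, new_value = float(old), float(new)
        # Round to two decimals to hide floating-point noise.
        found.append((old_value, new_value, round(new_value - old_value, 2)))
    return found


message = (
    "Swapped Stage 2 MLP for SVM(linear, C=1). "
    "Accuracy: 93.75% → 94.79%. SVM is deterministic."
)
print(extract_deltas(message))  # prints [(93.75, 94.79, 1.04)]
```

Note that the standalone numbers in "Stage 2" and "C=1" are ignored because they are not followed by an arrow.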

Use Cases

  • Deep learning model training & tuning
  • Academic paper ablation study management
  • GAN/VAE/Diffusion model iteration
  • Computer vision classification/detection/segmentation
  • Any ML workflow that needs "what was tried → what happened → what it means" tracking

Sub-Skills

manuscript-check — Academic Manuscript Integrity Verifier

A structured six-step audit pipeline for academic manuscripts. Designed to catch the errors that survive multiple rounds of self-editing: stale counts after table row deletion, figure scripts holding outdated metric values, loss-function mismatches in comparison charts, and narrative claims that drifted out of sync with the data.

What it checks:

| Audit | Target | Example |
|-------|--------|---------|
| Data provenance | Table rows traceable to actual experiments | "Was every ablation row actually run?" |
| Architecture attribution | Component naming matches original source | "Is this module ours or a cited baseline?" |
| Cross-file consistency | Manuscript ↔ figure scripts ↔ experiment logs | F1 values in fig1_scatter() matching Table 2 |
| Benchmark fairness | Same loss function across compared methods | "CE Loss baselines vs. CB Focal Loss ours in one chart?" |
| Stale counts | "N comparisons/runs/variants" after row deletion | \multirow{N} still correct after removing rows |
| Narrative drift | Claims between sections don't contradict | "Backbone-dependent" vs. "universally redundant" |
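
To make the "stale counts" audit concrete, here is a small Python sketch that compares each `\multirow{N}` declaration with the number of rows that actually follow it. It is not the sub-skill's code: the function name `stale_multirows`, the made-up sample table, and the simplifying assumption that every row belongs to a `\multirow` group are all for illustration.

```python
import re

MULTIROW = re.compile(r"\\multirow\{(\d+)\}")
# Horizontal rule commands carry no row data, so they are stripped first.
RULES = re.compile(r"\\(?:hline|toprule|midrule|bottomrule|cline\{[^}]*\})")


def stale_multirows(table_body: str) -> list:
    """Return (declared, actual) pairs for each multirow group whose count is stale."""
    cleaned = RULES.sub("", table_body)
    # LaTeX table rows end with a double backslash.
    rows = [row.strip() for row in cleaned.split("\\\\") if row.strip()]
    groups = []  # each entry is [declared_count, actual_count]
    for row in rows:
        match = MULTIROW.search(row)
        if match:
            groups.append([int(match.group(1)), 1])
        elif groups:
            groups[-1][1] += 1
    return [(declared, actual) for declared, actual in groups if declared != actual]


# Made-up table: the first group still declares 3 rows after one row was deleted.
TABLE = r"""
\hline
\multirow{3}{*}{Ablation} & variant A & 94.2 \\
 & variant B & 94.8 \\
\hline
\multirow{1}{*}{Full} & variant C & 96.1 \\
\hline
"""

print(stale_multirows(TABLE))  # prints [(3, 2)]
```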

Workflow: Source verification → impact-surface grep analysis → one-pass batch editing (tex + tables + figure scripts) → residue sweep → script-to-table data cross-audit → memory persistence.

Trigger signals: "verify X", "was this experiment actually run?", "X never existed in this architecture", "do the figures need updating?", "check consistency".

Scope: Auto-detects the active paper project directory. Architecture facts are extracted and cached per project on first run, not hardcoded.


Repo Structure

workflow-tracker/
├── SKILL.md               # Main skill file (Claude Code entry point)
├── SKILL_EN.md            # English skill definition
├── README.md              # This file (English)
├── README_zh.md           # Chinese documentation
├── LICENSE                # MIT
├── evals.json             # 6 test cases, 25 assertions
├── .gitignore
└── manuscript-check/      # Sub-skill: paper manuscript integrity checker
    └── SKILL.md           # Six-step verification workflow

Development

Running Tests

cd workspace/iteration-2
python grade_all.py

Benchmark (v2)

| Metric | Value |
|--------|-------|
| Avg Response Time | 131s |
| Avg Tokens | 27k |
| Pass Rate (6 evals) | 100% |
| Paper Mode | |
| JSON Intermediate Format | |

License

MIT © 2026


Credits

Built on the Claude Code Skills framework. Inspired by real-world experiment management needs across computer vision, defect detection, and generative model research.


Changelog

v1.2.0 (2026-06-17)

  • Enhanced: manuscript-check — script-to-table data cross-audit, benchmark fairness checks, known-pitfalls reference table
  • Fixed: Removed all hardcoded local paths and project-specific facts; replaced with auto-detection and per-project caching
  • Improved: README sub-skill documentation with audit-dimension table and concrete examples

v1.1.0 (2026-06-16)

  • New: manuscript-check sub-skill — six-step paper manuscript integrity verification
    • Source-to-manuscript cross-referencing with grep impact analysis
    • Multi-section batch editing (tex + tables + figure scripts)
    • Post-edit residue checking + narrative consistency audit
    • Automatic memory file persistence
  • Improved: Bootstrap CHANGELOG.md + experiment_log.md on first paper project load

v1.0.0 (2026-06-14)

  • Initial release
  • Auto-detect & silent logging for ML experiments
  • Dual-mode output: engineering (workflow.json + workflow.md) / paper (CHANGELOG.md + experiment_log.md)
  • Three-level structure: Phase → Task → Experiment
  • Report generation: .docx.json + .pptx.json intermediate format
