Skip to content

ND-SaNDwichLAB/coding-agent-misalignment

Repository files navigation

Coding Agent Misalignment

Replication package for the paper "How Coding Agents Fail Their Users: A Large-Scale Analysis of Developer-Agent Misalignment in 20,574 Real-World Sessions". Read the full paper on arXiv.

Due to copyright considerations, raw chat traces are not redistributed; only episodes from repositories whose licenses explicitly permit redistribution (e.g., MIT, Apache-2.0) are released, while those from non-permissively licensed repositories are used only in aggregate analysis.

We provide an interactive viewer for the identified misalignment cases, as well as the annotation labels, available at Coding Agent Misalignment Atlas. The viewer includes only cases from permissively licensed repositories, with all personally identifiable information removed.

Modules

  • session-formatting/: preprocessing step. Formats parsed sessions into LLM-ready text files for extraction.
  • batch-runner/: reusable OpenAI Batch toolkit (build, submit, check, download, retry, postprocess).
  • misalignment-extraction/: extracts candidate misalignment episodes from formatted sessions.
  • misalignment-validation/: validates extracted episodes and filters unsupported cases.
  • misalignment-annotation/: multi-axial annotation of validated episodes.
  • data-aggregation/: aggregates intermediate outputs into downstream analysis tables; see data specs in this folder.
  • distribution-analysis/: notebooks and utilities for paper figures/tables and analysis-ready outputs.
  • misalignment-viewer/: static viewer for browsing the misalignment corpus.
  • workspace/: expected data layout (not distributed here) for repository/session-level inputs.
  • misalignments.json: aggregated list of all identified misalignment episodes with metadata and annotations, filtered to include only those from permissively licensed repositories.

Minimal Pipeline Order

  1. Session preprocessing: session-formatting/
  2. Extraction: misalignment-extraction/
  3. Validation: misalignment-validation/
  4. Annotation: misalignment-annotation/
  5. Aggregation: data-aggregation/
  6. Distribution analysis: distribution-analysis/
  7. Misalignment viewer: misalignment-viewer/

Data Layout (Expected)

Typical structure under workspace/:

workspace/
└── {repo_id}/
    ├── session_parsed/    # Per-session parsed chat records
    │   ├── session_001.json
    │   └── ...
    ├── session_formatted/ # Per-session formatted chat records for LLM analysis
    │   ├── session_001.txt
    │   └── ...
    └── meta.json          # Repository metadata (e.g., name, language, session count)

Citation

@article{tang2026coding,
  title={How Coding Agents Fail Their Users: A Large-Scale Analysis of Developer-Agent Misalignment in 20,574 Real-World Sessions},
  author={Tang, Ningzhi and Chen, Chaoran and Xu, Gelei and Shi, Yiyu and Huang, Yu and McMillan, Collin and Dong, Tao and Li, Toby Jia-Jun},
  journal={arXiv preprint arXiv:2605.29442},
  year={2026}
}

About

Replication package for How Coding Agents Fail Their Users: A Large-Scale Analysis of Developer-Agent Misalignment in 20,574 Real-World Sessions

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors