resaid-lab/what_breaks_when_LLMs_code

Replication Package: What Breaks When LLMs Code? Characterizing Operational Safety Failures of Agentic Code Assistants

This repository contains the official replication package, including source code and datasets, for the empirical study "What Breaks When LLMs Code? Characterizing Operational Safety Failures of Agentic Code Assistants." It provides the scripts, raw data, and qualitative coding guidelines needed to replicate our data collection, filtering, and taxonomy construction for both real-world GitHub incidents and prior academic literature.

🗂️ Project Structure

The repository is organized into the following main directories and files:

📦 replication-package
 ┣ 📜 README.md                         # This documentation file
 ┣ 📜 annotation-guideline.pdf          # The coding manual used by the authors
 ┣ 📂 Figures/                          # Contains all figures presented in the paper
 ┗ 📂 Code/                             # Contains all automated collection and filtering scripts
   ┣ 📂 Issue-collection/
   ┃ ┗ 📜 issue_llm_filter.py           # Uses LLMs to filter for genuine operational safety failures
   ┗ 📂 Paper-collection/
     ┣ 📜 keyword_paper_filter.py       # Filters academic literature based on predefined SE safety keywords
     ┗ 📜 paper_llm_filter.py           # Uses an LLM to assess paper relevance to code generation

Components Overview

1. Annotation Guideline (annotation-guideline.pdf)

This document contains the coding procedures used to map the 547 in-the-wild GitHub incidents and 185 academic papers into our 33-node, 7-dimension safety taxonomy. It includes definitions, inclusion/exclusion criteria, and examples to support inter-rater reliability.
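Inter-rater reliability is usually checked with an agreement statistic. This README does not say which statistic the authors report, so the following is only a minimal Python sketch of one common choice, Cohen's kappa for two annotators, which corrects raw agreement for the agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e). The function name and the category labels in the example are hypothetical and are not taxonomy nodes from the paper.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(labels_a: Sequence[Hashable], labels_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two annotators who labelled the same items (illustrative sketch)."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equally long, non-empty label sequences")
    n = len(labels_a)
    # Observed agreement: fraction of items where both annotators chose the same category.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's marginal frequency per category.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1.0:
        # Both annotators used one identical category for every item.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Hypothetical labels for four incidents coded independently by two annotators.
    annotator_1 = ["cat_A", "cat_A", "cat_B", "cat_C"]
    annotator_2 = ["cat_A", "cat_B", "cat_B", "cat_C"]
    print(round(cohens_kappa(annotator_1, annotator_2), 3))  # 0.636
```

With more than two annotators, or when categories are not mutually exclusive (an incident mapped to several taxonomy nodes), a different statistic such as Fleiss' kappa or Krippendorff's alpha may be more appropriate.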

2. Figures Directory (Figures/)

This folder contains all six figures presented in the paper.

3. Issue Collection Scripts (Code/Issue-collection/)

Scripts to extract and refine real-world operational failures caused by autonomous coding agents.

  • issue_llm_filter.py: Feeds the parsed issues through LLMs to automatically filter out standard bugs and isolate genuine autonomous safety and execution failures.
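As a rough illustration of LLM-based screening, here is a minimal Python sketch under stated assumptions; it is not the code in issue_llm_filter.py. The `Issue` record, the prompt wording, and the YES/NO answer format are all hypothetical, and the model call is passed in as a plain function (`ask_llm`) so that no particular client library's API is assumed.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Issue:
    # Hypothetical minimal record; the real script's issue schema is not documented here.
    number: int
    title: str
    body: str


# Hypothetical prompt; the wording used by issue_llm_filter.py is not given in this README.
PROMPT_TEMPLATE = (
    "You are screening GitHub issues about AI coding agents.\n"
    "Answer YES if the issue describes an operational safety or execution failure "
    "caused by the agent acting autonomously. Answer NO if it is an ordinary bug "
    "in the tool itself.\n\n"
    "Title: {title}\n\nBody:\n{body}\n\nAnswer (YES or NO):"
)


def build_prompt(issue: Issue, max_body_chars: int = 4000) -> str:
    # Truncate long bodies so the prompt stays within a model's context budget.
    return PROMPT_TEMPLATE.format(title=issue.title, body=issue.body[:max_body_chars])


def parse_verdict(reply: str) -> bool:
    # Only a reply that starts with YES (case-insensitive) counts as a positive.
    return reply.strip().upper().startswith("YES")


def filter_issues(issues: Iterable[Issue], ask_llm: Callable[[str], str]) -> List[Issue]:
    """Keep the issues the model judges to be genuine agent safety failures."""
    return [issue for issue in issues if parse_verdict(ask_llm(build_prompt(issue)))]


if __name__ == "__main__":
    # Stand-in for a real model call (hypothetical): flags prompts mentioning "rm -rf".
    def fake_llm(prompt: str) -> str:
        return "YES" if "rm -rf" in prompt else "NO"

    issues = [
        Issue(1, "Agent ran rm -rf on my home directory", "It did not ask first."),
        Issue(2, "Extension crashes on startup", "Stack trace attached."),
    ]
    print([i.number for i in filter_issues(issues, fake_llm)])  # [1]
```

Keeping the model behind a callable makes the filter testable offline with a stub, as above; a real run would likely also want retries, rate limiting, and a log of each verdict so that a sample can be audited by hand.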

4. Paper Collection Scripts (Code/Paper-collection/)

Scripts to collect and filter the academic literature dataset.

  • keyword_paper_filter.py: Applies a keyword filter to isolate papers mentioning LLM safety, code generation, and agentic workflows.
  • paper_llm_filter.py: Uses an LLM to evaluate the titles and abstracts of the collected papers and determine whether they focus on code generation.
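A keyword pass like the one keyword_paper_filter.py performs can be sketched in a few lines of Python. This is an illustrative sketch only: the keyword groups below are hypothetical stand-ins for the authors' predefined list, and requiring at least one hit from every group (rather than from any single group) is an assumption about how the terms are combined.

```python
import re
from typing import Dict, List

# Hypothetical keyword groups; the actual predefined keywords used by
# keyword_paper_filter.py are not reproduced in this README.
KEYWORD_GROUPS: Dict[str, List[str]] = {
    "llm": ["large language model", "llm", "gpt"],
    "code": ["code generation", "program synthesis", "coding agent", "coding assistant"],
    "safety": ["safety", "security", "vulnerabilit"],
}


def _compile(terms: List[str]):
    # Match each term at a word start, case-insensitively, so "llm" also matches "LLMs".
    return re.compile(r"\b(?:" + "|".join(re.escape(t) for t in terms) + ")", re.IGNORECASE)


PATTERNS = {group: _compile(terms) for group, terms in KEYWORD_GROUPS.items()}


def matches_all_groups(text: str) -> bool:
    """True if the text contains at least one term from every keyword group."""
    return all(pattern.search(text) for pattern in PATTERNS.values())


def keyword_filter(papers: List[Dict[str, str]]) -> List[Dict[str, str]]:
    # Search title and abstract together; a missing field is treated as empty text.
    return [
        paper
        for paper in papers
        if matches_all_groups(paper.get("title", "") + " " + paper.get("abstract", ""))
    ]


if __name__ == "__main__":
    papers = [
        {"title": "Security of LLM-based code generation", "abstract": "We study vulnerabilities."},
        {"title": "A survey of image segmentation", "abstract": "Convolutional networks."},
    ]
    print([p["title"] for p in keyword_filter(papers)])
```

Anchoring terms at word starts with `\b` lets `llm` catch plurals such as "LLMs" while avoiding some mid-word false positives; the trade-off is that compounds such as "ChatGPT" are missed by `gpt`, so any keyword list is worth checking against a hand-labelled sample before relying on it.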
