mustafamm072/radreport_parser

radreport-parser

Parse radiology free-text reports into structured data. No ML. No GPU. No dependencies.


Radiology reports come out as free-text PDFs. Downstream systems — EMRs, telehealth portals, billing platforms, research pipelines — need structured data. This library bridges that gap.

Three things it does well:

  1. Parse — splits any free-text report into labeled sections, extracts measurements, links findings to anatomy
  2. Detect — flags critical/urgent findings with negation awareness (no false alerts for "no pneumothorax")
  3. Export — outputs FHIR R4 DiagnosticReport resources ready for any EMR

Install

pip install radreport-parser

Zero required dependencies. Works on Python 3.9+.


Quick Start

from radreport_parser import ReportParser, CriticalFindingsDetector, FHIRExporter
import json

report_text = """
INDICATION: Chest pain, rule out PE.

FINDINGS:
Lungs: Filling defect in the right main pulmonary artery consistent with
pulmonary embolism. No pneumothorax.

IMPRESSION:
Pulmonary embolism, right main pulmonary artery. Urgent correlation recommended.
"""

# 1. Parse
parser = ReportParser()
report = parser.parse(report_text, modality="CT")

print(report.impression)
# → "Pulmonary embolism, right main pulmonary artery. Urgent correlation recommended."

# 2. Detect critical findings
detector = CriticalFindingsDetector()
report = detector.detect(report)

for cf in report.critical_findings:
    if not cf.negated:
        print(f"[{cf.severity.upper()}] {cf.term} ({cf.category})")
        print(f"  Context: {cf.context}")
# → [CRITICAL] pulmonary embolism (pulmonary)
#     Context: Filling defect in the right main pulmonary artery consistent with pulmonary embolism.

# 3. Export to FHIR
exporter = FHIRExporter()
fhir = exporter.export(report, patient_id="pt-001")
print(json.dumps(fhir, indent=2))

CLI

After installation, the radreport command is available for single-file and batch processing:

# Parse a single report to JSON
radreport report.txt

# Parse with critical findings detection
radreport report.txt --critical

# Export as FHIR DiagnosticReport
radreport report.txt --fhir --patient-id pt-001 --modality CT

# Batch process multiple files → JSON array
radreport reports/*.txt --critical -o batch.json

# Specify modality for all files
radreport *.txt --modality MRI --fhir -o fhir_batch.json

Flags:

| Flag | Short | Description |
| --- | --- | --- |
| --modality MOD | -m | CT, MRI, XR, US, NM, PET … |
| --critical | -c | Run critical findings detection |
| --fhir | -f | Export as FHIR R4 DiagnosticReport (implies --critical) |
| --patient-id ID | | FHIR Patient resource ID |
| --output FILE | -o | Write output to file instead of stdout |
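
To make the flag semantics concrete, here is a minimal sketch of how a command-line interface with these flags could be declared with the standard-library argparse module. It is not the library's actual CLI code; the function names `build_arg_parser` and `parse_cli` are hypothetical, and only the flag names and the "--fhir implies --critical" rule come from the table above.

```python
import argparse


def build_arg_parser() -> argparse.ArgumentParser:
    """Declare the flags listed in the table (hypothetical re-creation)."""
    p = argparse.ArgumentParser(prog="radreport")
    p.add_argument("files", nargs="+", help="one or more report text files")
    p.add_argument("-m", "--modality", metavar="MOD", help="CT, MRI, XR, US, NM, PET")
    p.add_argument("-c", "--critical", action="store_true",
                   help="run critical findings detection")
    p.add_argument("-f", "--fhir", action="store_true",
                   help="export as FHIR R4 DiagnosticReport")
    p.add_argument("--patient-id", metavar="ID", help="FHIR Patient resource ID")
    p.add_argument("-o", "--output", metavar="FILE",
                   help="write output to file instead of stdout")
    return p


def parse_cli(argv: list[str]) -> argparse.Namespace:
    """Parse argv and apply the rule that --fhir implies --critical."""
    args = build_arg_parser().parse_args(argv)
    if args.fhir:
        args.critical = True
    return args


if __name__ == "__main__":
    args = parse_cli(["report.txt", "--fhir", "--patient-id", "pt-001", "-m", "CT"])
    print(args.files, args.modality, args.critical, args.patient_id)
```

Resolving the implication once, right after parsing, means the rest of a program only has to check `args.critical`.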

Parsing

Sections

The parser recognizes standard radiology report sections regardless of formatting style:

| Section key | Matched headers |
| --- | --- |
| indication | Indication, Clinical Indication, History, Reason for Exam |
| technique | Technique, Procedure, Protocol |
| comparison | Comparison, Prior Study, Previous |
| findings | Findings, Observations |
| impression | Impression, Conclusion, Assessment, Diagnosis |
| recommendation | Recommendation, Follow-up, Advised |

report = parser.parse(text, modality="MRI")

findings = report.get_section("findings")
print(findings.raw_text)

impression = report.get_section("impression")
print(impression.raw_text)
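
For readers curious how header-based section splitting can work, the following is a standalone sketch that uses the header variants from the table above. It is an illustration under simplifying assumptions (headers start a line and end with a colon), not the library's implementation; `SECTION_HEADERS` and `split_sections` are hypothetical names.

```python
import re

# Header variants copied from the table above; the library's real rules may differ.
SECTION_HEADERS = {
    "indication": ["indication", "clinical indication", "history", "reason for exam"],
    "technique": ["technique", "procedure", "protocol"],
    "comparison": ["comparison", "prior study", "previous"],
    "findings": ["findings", "observations"],
    "impression": ["impression", "conclusion", "assessment", "diagnosis"],
    "recommendation": ["recommendation", "follow-up", "advised"],
}

# Map every header variant to its section key.
_ALIASES = {alias: key for key, aliases in SECTION_HEADERS.items() for alias in aliases}

# A header is a known alias at the start of a line, followed by a colon.
_HEADER_RE = re.compile(
    r"^[ \t]*("
    + "|".join(re.escape(a) for a in sorted(_ALIASES, key=len, reverse=True))
    + r")[ \t]*:",
    re.IGNORECASE | re.MULTILINE,
)


def split_sections(text: str) -> dict[str, str]:
    """Return {section_key: body_text} for each recognized header."""
    matches = list(_HEADER_RE.finditer(text))
    sections: dict[str, str] = {}
    for i, m in enumerate(matches):
        key = _ALIASES[m.group(1).lower()]
        # A section body runs until the next header or the end of the text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        sections[key] = text[m.end():end].strip()
    return sections


if __name__ == "__main__":
    sample = "INDICATION: Chest pain.\n\nFINDINGS:\nLungs: Clear.\n\nCONCLUSION:\nNormal study.\n"
    print(split_sections(sample))
```

Because "Lungs" is not in the alias table, the line "Lungs: Clear." stays inside the findings body instead of starting a new section.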

Measurements

All measurements are extracted and normalized to millimeters:

for m in report.all_measurements:
    print(f"  Raw: {m.raw}")
    print(f"  Normalized (mm): {m.dimensions_mm}")
    print(f"  Largest dimension: {m.largest_dimension_mm} mm")

# Raw: 2.3 x 1.8 cm
# Normalized (mm): [23.0, 18.0]
# Largest dimension: 23.0 mm

Handles: 1.2 x 0.8 cm, 12mm, 1.2cm, 12 x 8 x 5 mm, 1.2 x 0.8 x 0.5 cm
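
The normalization described above can be approximated with a single regular expression. The sketch below is a standalone illustration, not the library's code: the `Measurement` class here is a hypothetical stand-in that only mirrors the attribute names shown above (`raw`, `dimensions_mm`, `largest_dimension_mm`), and it assumes measurements have one to three dimensions in mm or cm.

```python
import re
from dataclasses import dataclass

# Conversion factors to millimeters.
_TO_MM = {"mm": 1.0, "cm": 10.0}

# One to three numbers separated by "x", followed by a unit (mm or cm).
_MEASUREMENT_RE = re.compile(
    r"(\d+(?:\.\d+)?(?:\s*x\s*\d+(?:\.\d+)?){0,2})\s*(mm|cm)\b",
    re.IGNORECASE,
)


@dataclass
class Measurement:
    raw: str
    dimensions_mm: list[float]

    @property
    def largest_dimension_mm(self) -> float:
        return max(self.dimensions_mm)


def extract_measurements(text: str) -> list[Measurement]:
    """Find measurements in text and convert every dimension to millimeters."""
    results = []
    for m in _MEASUREMENT_RE.finditer(text):
        factor = _TO_MM[m.group(2).lower()]
        numbers = re.findall(r"\d+(?:\.\d+)?", m.group(1))
        # Round to avoid floating-point noise after unit conversion.
        dims = [round(float(n) * factor, 3) for n in numbers]
        results.append(Measurement(raw=m.group(0), dimensions_mm=dims))
    return results


if __name__ == "__main__":
    for item in extract_measurements("A 2.3 x 1.8 cm nodule and a 12mm cyst."):
        print(item.raw, item.dimensions_mm, item.largest_dimension_mm)
```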

Findings by anatomy

findings_section = report.get_section("findings")
for finding in findings_section.findings:
    print(f"Anatomy: {finding.anatomy or 'unspecified'}")
    print(f"Text: {finding.text}")

Batch processing

reports = parser.parse_batch(list_of_texts, modality="CT")
# Returns list[ParsedReport | None] — None for empty/unparseable inputs
active = [r for r in reports if r is not None]

JSON serialization

report = parser.parse(text, modality="CT")

# As dict
d = report.to_dict()

# As JSON string (shorthand)
json_str = report.to_json()
json_str = report.to_json(indent=4)

Critical Findings Detection

Rule-based. Fully auditable. No black boxes.

Covers 45+ terms across 8 categories, including:

| Category | Examples |
| --- | --- |
| vascular | aortic dissection, DVT, aortic aneurysm |
| pulmonary | pulmonary embolism, PE, pneumothorax, hemothorax |
| neuro | subdural hematoma, midline shift, intracranial hemorrhage |
| abdominal | free air, bowel perforation, appendicitis |
| cardiac | cardiac tamponade, pericardial effusion |
| spinal | cord compression, cervical fracture |
| oncologic | malignancy, metastasis, carcinoma |

Negation awareness

# "No pneumothorax identified" → negated=True, won't trigger alert
# "Pneumothorax present" → negated=False, triggers alert

active = [cf for cf in report.critical_findings if not cf.negated]
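
As a rough illustration of rule-based negation detection, the sketch below looks for a negation trigger in the few words preceding a term within one sentence. This is a simplified, NegEx-style approach chosen for illustration; it is not claimed to be the rule set the library uses, and `NEGATION_TRIGGERS`, `is_negated`, and the five-word window are assumptions.

```python
import re

# Small illustrative trigger list; a production rule set would be longer.
NEGATION_TRIGGERS = ("no", "not", "without", "negative for", "free of", "absence of")


def is_negated(sentence: str, term: str, window: int = 5) -> bool:
    """True if a negation trigger appears within `window` words before `term`."""
    lowered = sentence.lower()
    idx = lowered.find(term.lower())
    if idx == -1:
        return False
    # Keep only the last `window` words before the term.
    preceding_words = re.findall(r"[a-z]+", lowered[:idx])[-window:]
    preceding = " ".join(preceding_words)
    return any(
        re.search(rf"\b{re.escape(trigger)}\b", preceding)
        for trigger in NEGATION_TRIGGERS
    )


if __name__ == "__main__":
    print(is_negated("No pneumothorax identified.", "pneumothorax"))
    print(is_negated("Pneumothorax present.", "pneumothorax"))
```

The word-boundary match keeps "no" from firing inside words such as "known". Limiting the search to one sentence and a short window reduces the chance that a negation aimed at a different finding suppresses an alert.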

Severity levels

  • critical — requires immediate action (PE, subdural hematoma, pneumothorax)
  • urgent — requires same-day follow-up (DVT, bowel obstruction, appendicitis)
  • significant — requires follow-up (malignancy, metastasis)

Extending the term list

from radreport_parser.critical_findings import CRITICAL_TERMS

# Each entry maps a term to a (category, severity) tuple.
CRITICAL_TERMS["tension pneumothorax"] = ("pulmonary", "critical")
CRITICAL_TERMS["septic emboli"] = ("vascular", "urgent")
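
To show how a term table of (category, severity) tuples can drive detection, here is a standalone sketch that matches terms on word boundaries and orders hits by the three severity levels listed above. The `TERMS` dict is a small hypothetical stand-in for the library's table, and `find_terms` and `SEVERITY_RANK` are illustrative names, not part of the library.

```python
import re

# Lower rank means more severe; levels taken from the severity list above.
SEVERITY_RANK = {"critical": 0, "urgent": 1, "significant": 2}

# Hypothetical stand-in table using the same (category, severity) tuple shape.
TERMS = {
    "pulmonary embolism": ("pulmonary", "critical"),
    "dvt": ("vascular", "urgent"),
    "metastasis": ("oncologic", "significant"),
}


def find_terms(text: str, terms: dict[str, tuple[str, str]]) -> list[tuple[str, str, str]]:
    """Return (term, category, severity) hits, most severe first."""
    hits = []
    for term, (category, severity) in terms.items():
        if re.search(rf"\b{re.escape(term)}\b", text, re.IGNORECASE):
            hits.append((term, category, severity))
    return sorted(hits, key=lambda hit: SEVERITY_RANK[hit[2]])


if __name__ == "__main__":
    # Adding an entry extends detection without touching the matching code.
    TERMS["septic emboli"] = ("vascular", "urgent")
    report = "Metastasis to liver. DVT in left femoral vein. Pulmonary embolism."
    for term, category, severity in find_terms(report, TERMS):
        print(f"[{severity.upper()}] {term} ({category})")
```

Keeping the rules in a plain dict is what makes this style auditable: every alert traces back to one entry.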

FHIR Export

Outputs a valid FHIR R4 DiagnosticReport resource.

from datetime import datetime, timezone

fhir = exporter.export(
    report,
    patient_id="pt-001",                   # Optional: links to FHIR Patient resource
    report_id="rpt-20240315",              # Optional: custom resource ID
    issued_dt=datetime.now(timezone.utc),  # Optional: defaults to UTC now
)

What's included

  • resourceType: DiagnosticReport
  • status: final
  • code: LOINC code matched to modality (CT, MRI, US, etc.)
  • conclusion: impression text
  • presentedForm: full report text as base64 attachment
  • contained: FHIR Observations for each active (non-negated) critical finding
  • extension: structured sections for downstream parsing
  • subject: patient reference (when patient_id provided)
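
For orientation, the sketch below builds a minimal dict with several of the fields listed above, using only the standard library. It is not the exporter's implementation: `build_diagnostic_report` and its parameters are hypothetical, the `contained` and `extension` fields are omitted, and the LOINC code in the demo (18748-4, the generic "Diagnostic imaging study" code) is an assumption that should be verified against LOINC before use.

```python
import base64
import json
from datetime import datetime, timezone
from typing import Optional


def build_diagnostic_report(
    report_text: str,
    impression: str,
    loinc_code: str,
    loinc_display: str,
    patient_id: Optional[str] = None,
    issued: Optional[datetime] = None,
) -> dict:
    """Assemble a minimal FHIR R4 DiagnosticReport as a plain dict."""
    issued = issued or datetime.now(timezone.utc)
    resource = {
        "resourceType": "DiagnosticReport",
        "status": "final",
        "code": {
            "coding": [
                {"system": "http://loinc.org", "code": loinc_code, "display": loinc_display}
            ]
        },
        "issued": issued.isoformat(),
        "conclusion": impression,
        # Full report text travels as a base64-encoded attachment.
        "presentedForm": [
            {
                "contentType": "text/plain",
                "data": base64.b64encode(report_text.encode("utf-8")).decode("ascii"),
            }
        ],
    }
    if patient_id is not None:
        resource["subject"] = {"reference": f"Patient/{patient_id}"}
    return resource


if __name__ == "__main__":
    fhir = build_diagnostic_report(
        report_text="IMPRESSION: Pulmonary embolism.",
        impression="Pulmonary embolism.",
        loinc_code="18748-4",  # assumed generic code; verify against LOINC
        loinc_display="Diagnostic imaging study",
        patient_id="pt-001",
    )
    print(json.dumps(fhir, indent=2))
```

Passing a timezone-aware datetime matters here because the FHIR `issued` element is an instant, which requires a time zone offset.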

Full Pipeline Example

import json
from radreport_parser import ReportParser, CriticalFindingsDetector, FHIRExporter

parser   = ReportParser()
detector = CriticalFindingsDetector()
exporter = FHIRExporter()

def process_report(text: str, modality: str, patient_id: str) -> dict:
    report = parser.parse(text, modality=modality)
    report = detector.detect(report)

    active_criticals = [cf for cf in report.critical_findings if not cf.negated]
    if active_criticals:
        print(f"WARNING: {len(active_criticals)} critical finding(s) detected")

    return exporter.export(report, patient_id=patient_id)

fhir_json = process_report(report_text, modality="CT", patient_id="pt-001")
print(json.dumps(fhir_json, indent=2))

See full_pipeline.py for a runnable end-to-end example.


Design Principles

No dependencies. The library installs with no third-party packages. This matters in hospital environments where every dependency goes through security review.

Rule-based, not ML-based. Every decision the library makes is traceable to a specific rule. No model weights, no GPU, no probabilistic outputs. Clinical teams can audit exactly why a finding was flagged.

Negation-aware. A library that can't distinguish "no pneumothorax" from "pneumothorax" is dangerous in clinical contexts. Negation detection is built into the core.

FHIR-first output. Most modern EMRs expose FHIR interfaces. The export format is designed to drop into existing FHIR integrations with little or no transformation.


Running Tests

pip install "radreport-parser[dev]"
pytest tests/ -v

Roadmap

Shipped:

  • CLI tool for single-file and batch processing (radreport command)
  • parse_batch() API for processing lists of reports
  • to_json() convenience method on ParsedReport

Planned:

  • Template matching for common report types (Chest XR, CT Abdomen, MRI Brain)
  • Structured output for follow-up recommendations
  • Additional FHIR resource types (ImagingStudy, Condition)
  • CSV export mode for research/analytics workflows

Disclaimer

This library is a developer tool for structuring report text. It is not a medical device and is not intended for direct clinical decision-making. Critical findings detection is designed to assist human review workflows, not replace radiologist judgment.


License

MIT
