CarbonOps-Parser

Auditable public carbon emission factor ingestion and validation for climate-tech data infrastructure.

CarbonOps-Parser is a public, reviewable climate-tech data ingestion project for carbon accounting source data. It focuses on auditable ingestion, parsing, validation, diagnostics, and PostgreSQL operation for public emission factors from GHG Protocol, DEFRA/DESNZ, and IPCC EFDB, with parallel Python and .NET runtime evidence. The Python package includes a configured PostgreSQL ingestion runtime for operator-managed deployments.

CarbonOps-Parser is production-ready at the project level only within the narrow supported scope documented in the final verdict: operator-run or scheduled Python ingestion, PostgreSQL-backed source-specific persistence, and .NET parity evidence covering the service entrypoint, config/redaction, schema/year-state, source-cycle orchestration, persistence, Docker PostgreSQL E2E, and persisted parity validation.

The repository is intentionally conservative: default examples are deterministic and local-only, production runs require explicit configuration and credentials, and the project makes no claims about production carbon-accounting, legal, compliance, source-owner, or factor correctness.

The project is independent from carbonops-assistant. It is not a continuation, module, plugin, or dependency of that project.

Start Here

Production operator runbook - supported Python operator path, PostgreSQL readiness, cron scheduling, validation, and troubleshooting.
Final project production-ready verdict - narrow production-ready scope and explicit non-claims.
Production parity contract - Python production path and .NET parity evidence.
Python runtime docs - local Docker PostgreSQL ingestion runbook for the packaged Python path.
.NET runtime docs - .NET Worker Service path and parity-oriented runtime notes.
Database model and PostgreSQL startup - shared metadata plus source-specific table groups.
Documentation index - curated documentation map for operators, contributors, and reviewers.
Maintainer release/sync checklist - develop-to-main, stale PR/issue cleanup, and first alpha/review readiness.
Contribution guide - issues, features, forks, branches, pull requests, validation, secrets, artifacts, and maintainer-only merge policy.
Issue templates - bug reports, feature requests, documentation requests, and production-readiness questions.
Pull request guide - PR checklist for scope, validation, runtime impact, PostgreSQL impact, docs, secrets, artifacts, and production-ready claims.

Problem Statement

Public carbon emissions workflows often depend on emission factor spreadsheets, databases, and reference documents that change over time and vary by source family. CarbonOps-Parser exists to make carbon factor ingestion reviewable: source identity, version or checksum evidence, parser output, validation issues, persistence readiness, and diagnostics should be visible before any operational use. The project is infrastructure for data ingestion and validation, not an emissions calculator or compliance decision engine.

Current Status

CarbonOps-Parser is in Phase 1 and has a narrow project-level production-ready status for the documented operator path. The repository contains an active Python ingestion runtime, .NET parity evidence, PostgreSQL schema/runtime boundaries, deterministic examples, local dry-run validation, and production operator documentation. The production-ready claim applies only to the scope documented in Final Project Production-Ready Verdict and Production Parity Contract. It is not a published package release.

Explicit Non-Claims

CarbonOps-Parser does not claim to be:

A production carbon-accounting calculator or emissions reporting engine.
Legal, compliance, audit, or regulatory advice.
A source-owner correctness guarantee for GHG Protocol, DEFRA/DESNZ, IPCC EFDB, or any source document.
A universal carbon factor model across all source families.
A published package, unless release/package files and repository releases prove otherwise.

Area	Phase 1 completed capabilities	Phase 2 roadmap
Source families	Local fixture and contract coverage for GHG Protocol, DEFRA/DESNZ, and IPCC EFDB boundaries.	Broader source onboarding rules, fixture policy, and source-family hardening slices.
Python	Source acquisition contracts, parser contracts, DEFRA/DESNZ fixture parser path, normalization handoff, persistence previews, diagnostics, and local dry-run CLI.	Runtime hardening, richer validation, controlled source expansion, and opt-in execution boundaries.
.NET	Service entrypoint, config/redaction, PostgreSQL schema/year-state, source-cycle orchestration, source-specific persistence, Docker PostgreSQL E2E, and persisted parity validation baselines.	Runtime parity review where shared behavior changes; package/service promotion remains separately scoped.
PostgreSQL	Schema descriptors, DDL preview, additive runtime bootstrap, configured Python source-family writes, idempotent duplicate skipping, and opt-in integration boundaries.	Broader migration, rollback, recovery, and operational hardening slices.
Safety posture	Local-only examples, non-destructive dry runs, preview-only SQL, no default network calls, and no production credentials.	Release-gate expansion and production-readiness reviews before live source or write-path promotion.

Users who clone or fork the repository should be able to inspect either implementation path without relying on production infrastructure.

Phase 1 Scope

Phase 1 focuses on scheduled ingestion and parsing for:

Source family	Public discovery value	Phase 1 posture
GHG Protocol	Greenhouse Gas Protocol tools and factor workbooks used in carbon accounting workflows.	Source discovery/download contracts, parser contracts, normalized content parser boundaries, and parity tests.
DEFRA/DESNZ	UK government conversion factors used for carbon emissions and greenhouse gas reporting workflows.	Deterministic local fixture parser and normalization path plus source discovery/download contracts.
IPCC EFDB	IPCC Emission Factor Database source family with heterogeneous emission factor records.	Source discovery/download contracts, parser contracts, normalized content parser boundaries, and parity tests.

The intended Phase 1 workflow is:

Read configuration.
Validate the database provider.
Connect to PostgreSQL.
Check whether required tables exist.
Create missing tables if needed.
Initialize source schedules.
Check source version and file hash.
Download a source document when a new version or hash is detected.
Archive the raw source file.
Parse source-specific structures.
Validate parsed records.
Persist shared ingestion metadata and source-specific records.
Store import summaries and validation issues.
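
The version/hash check and the conditional download in the workflow above can be sketched in a few lines. This is a minimal illustration only: SourceState, sha256_hex, and source_changed are hypothetical names invented for this example and are not part of the carbonfactor_parser API, and the rule "any differing version or checksum counts as a change" is an assumption about how the check might behave.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SourceState:
    """Hypothetical record of what is known about one source document."""

    version: Optional[str]
    sha256_hex: Optional[str]


def sha256_hex(content: bytes) -> str:
    """Return the SHA-256 hex digest of already-loaded content."""
    return hashlib.sha256(content).hexdigest()


def source_changed(recorded: Optional[SourceState], observed: SourceState) -> bool:
    """Decide whether a download is warranted (assumed rule, see lead-in)."""
    if recorded is None:
        # Nothing recorded yet, so the first observation always counts as new.
        return True
    if observed.version is not None and observed.version != recorded.version:
        return True
    if observed.sha256_hex is not None and observed.sha256_hex != recorded.sha256_hex:
        return True
    return False


# Artificial content, not real emission factor data.
recorded = SourceState(version="2024", sha256_hex=sha256_hex(b"factor,value\nA,1\n"))
same = SourceState(version="2024", sha256_hex=sha256_hex(b"factor,value\nA,1\n"))
revised = SourceState(version="2024", sha256_hex=sha256_hex(b"factor,value\nA,2\n"))

print(source_changed(recorded, same))     # False
print(source_changed(recorded, revised))  # True
```

Keeping the decision in a pure function means it can be reviewed and tested without network access or a database, which matches the local-only posture of the default examples.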

Architecture At A Glance

source schedule
  -> version/hash check
  -> download when changed
  -> raw file archive
  -> source-specific parser
  -> validation
  -> PostgreSQL persistence
  -> import summary and validation issues

Phase 1 uses shared ingestion metadata tables plus source-specific master/detail tables. It does not force GHG Protocol, DEFRA/DESNZ, and IPCC EFDB into one canonical factor table. A normalized or search-oriented projection may be considered in a later phase.

The Python path under src/carbonfactor_parser holds the current implementation boundaries for source acquisition, parser execution, normalization, PostgreSQL persistence previews, configured PostgreSQL ingestion, local dry-run composition, and diagnostics. The .NET path under src/dotnet holds shared contract records and parity tests for the same public concepts. PostgreSQL support includes schema descriptors, bootstrap/readiness checks, DDL previews, opt-in integration boundaries, and the Python configured cycle runner. Parity, validation, diagnostics, and non-destructive dry-run behavior are part of the public architecture so reviewers can inspect the handoff from source artifact to parser output to persistence input without connecting to a database or making network calls.

Implementation Options

Python

The Python implementation is the active Phase 1 path for source discovery contracts, parser mapping, validation, normalization handoff, persistence previews, and data engineering workflows.

The active Python runtime path lives under src/carbonfactor_parser and exposes the local dry-run CLI plus the configured carbonops-parser run-ingestion operator command for PostgreSQL-backed source-family ingestion. The initial Python source adapter contracts and in-memory registry live under src/carbonfactor_parser/source_adapters.

.NET

The .NET implementation is an independent Worker Service path that follows the same conceptual workflow with .NET-oriented application structure. The reviewed production scope treats .NET as parity-validated through its service entrypoint, configuration/redaction, PostgreSQL schema/year-state, source-cycle orchestration, source-specific persistence, Docker PostgreSQL E2E, and persisted parity baselines.

See src/dotnet/README.md.

Install And Local Dry-Run Quickstart

From a fresh checkout or local working copy:

git clone <REPOSITORY_URL> CarbonOps-Parser
cd CarbonOps-Parser
python -m pip install -e .

Run the test suite if you want a quick local smoke check:

python -m pytest

Run the checked-in DEFRA/DESNZ fixture through the local dry-run CLI:

carbonops-parser local-dry-run \
  --local-path examples/fixtures/defra_desnz_minimal.csv \
  --source-family defra_desnz \
  --source-id defra-desnz-minimal-fixture \
  --content-type text/csv \
  --format-hint csv

Expected summary:

status=success
parsed_record_count=2
normalization_record_count=2
persistence_input_record_count=2
ddl_preview_present=True
issue_count=0
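
If the text summary needs to be checked from a script, the key=value lines above are simple to parse. The sketch below assumes the summary is emitted exactly as line-oriented key=value pairs, as shown; parse_summary is a hypothetical helper written for this example, not a function shipped with the package.

```python
from typing import Dict


def parse_summary(text: str) -> Dict[str, str]:
    """Parse line-oriented key=value output into a dictionary of strings."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue  # skip blank lines and anything that is not key=value
        key, _, value = line.partition("=")  # split on the first "=" only
        fields[key] = value
    return fields


# The expected summary from the quickstart, copied verbatim.
EXPECTED_SUMMARY = """\
status=success
parsed_record_count=2
normalization_record_count=2
persistence_input_record_count=2
ddl_preview_present=True
issue_count=0
"""

summary = parse_summary(EXPECTED_SUMMARY)
print(summary["status"])                    # success
print(int(summary["parsed_record_count"]))  # 2
```

Values are kept as strings on purpose; callers convert only the fields they need, so an unexpected value fails loudly at the point of use.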

Run the JSON variant:

carbonops-parser local-dry-run \
  --local-path examples/fixtures/defra_desnz_minimal.csv \
  --source-family defra_desnz \
  --source-id defra-desnz-minimal-fixture \
  --content-type text/csv \
  --format-hint csv \
  --json

Key output fields:

status: dry-run outcome such as success, failed, unsupported, or no_records
parsed_record_count: records parsed by the minimal local DEFRA/DESNZ fixture parser
normalization_record_count: records produced by the minimal fixture normalization mapper
persistence_input_record_count: records prepared as PersistenceInput
ddl_preview_present: whether review-only PostgreSQL DDL preview text is attached
issues: structured local loader, parser, normalization, or persistence-input issues

Optionally include PostgreSQL insert preview data in text output:

carbonops-parser local-dry-run \
  --local-path examples/fixtures/defra_desnz_minimal.csv \
  --source-family defra_desnz \
  --source-id defra-desnz-minimal-fixture \
  --content-type text/csv \
  --format-hint csv \
  --include-postgresql-preview

Trimmed expected preview lines:

postgresql_preview_included=True
postgresql_preview_status=ready
postgresql_preview_only=True
postgresql_preview_sql_execution=False
postgresql_preview_database_connection=False
postgresql_preview_target_table=normalized_records
postgresql_preview_record_count=2
postgresql_preview_sql=INSERT INTO normalized_records (source_family, source_id, record_id, record_index, row_number, normalized_fields, source_reference, source_artifact_reference, source_checksum_sha256, parser_metadata, normalization_metadata, created_at, updated_at) VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s)
postgresql_preview_issue_count=0

Run the JSON PostgreSQL preview variant:

carbonops-parser local-dry-run \
  --local-path examples/fixtures/defra_desnz_minimal.csv \
  --source-family defra_desnz \
  --source-id defra-desnz-minimal-fixture \
  --content-type text/csv \
  --format-hint csv \
  --json \
  --include-postgresql-preview

Trimmed expected JSON preview section:

{
  "postgresql_persistence_preview": {
    "included": true,
    "preview_only": true,
    "sql_execution": false,
    "database_connection": false,
    "status": "ready",
    "target_table": "normalized_records",
    "record_count": 2,
    "ordered_columns": [
      "source_family",
      "source_id",
      "record_id",
      "record_index",
      "row_number",
      "normalized_fields",
      "source_reference",
      "source_artifact_reference",
      "source_checksum_sha256",
      "parser_metadata",
      "normalization_metadata",
      "created_at",
      "updated_at"
    ],
    "idempotency_key_fields": [
      "source_family",
      "source_id",
      "record_id",
      "source_artifact_reference",
      "source_checksum_sha256"
    ],
    "issues": []
  }
}

The postgresql_persistence_preview section is preview-only. It includes the target table, ordered columns, parameter rows, record count, SQL text with placeholders, and idempotency metadata, but it does not execute SQL or persist records. No PostgreSQL server, database configuration, or credentials are required.
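
A reviewer can assert the preview-only guarantees directly against the JSON output. The following sketch checks the flags shown in the trimmed section above and confirms that every idempotency key field is also an ordered column. check_preview is a hypothetical helper for illustration, and the embedded JSON is an abbreviated copy of the trimmed example rather than captured CLI output.

```python
import json
from typing import Any, Dict, List


def check_preview(preview: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the invariants hold."""
    problems: List[str] = []
    if preview.get("preview_only") is not True:
        problems.append("preview_only is not true")
    if preview.get("sql_execution") is not False:
        problems.append("sql_execution is not false")
    if preview.get("database_connection") is not False:
        problems.append("database_connection is not false")
    columns = preview.get("ordered_columns", [])
    for field in preview.get("idempotency_key_fields", []):
        if field not in columns:
            problems.append(f"idempotency field {field!r} is not an ordered column")
    return problems


# Abbreviated copy of the trimmed JSON preview section shown above.
SAMPLE = json.loads("""
{
  "postgresql_persistence_preview": {
    "included": true,
    "preview_only": true,
    "sql_execution": false,
    "database_connection": false,
    "status": "ready",
    "target_table": "normalized_records",
    "record_count": 2,
    "ordered_columns": [
      "source_family", "source_id", "record_id", "record_index", "row_number",
      "normalized_fields", "source_reference", "source_artifact_reference",
      "source_checksum_sha256", "parser_metadata", "normalization_metadata",
      "created_at", "updated_at"
    ],
    "idempotency_key_fields": [
      "source_family", "source_id", "record_id",
      "source_artifact_reference", "source_checksum_sha256"
    ],
    "issues": []
  }
}
""")

print(check_preview(SAMPLE["postgresql_persistence_preview"]))  # []
```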

This quickstart is local dry-run only. It does not connect to PostgreSQL, write records, execute SQL, run migrations, perform network calls, trigger source acquisition, load config files, or require credentials. It does not make production DEFRA/DESNZ correctness claims.

Production Operator Command

The supported Python production entrypoint is:

carbonops-parser run-ingestion \
  --config /etc/carbonops-parser/ingestion.production.json \
  --cycles 1

Before running it, operators must provide explicit CARBONOPS_POSTGRESQL_* environment values, including the password through an external secret boundary, and validate the config:

carbonops-parser validate-ingestion-config \
  --config /etc/carbonops-parser/ingestion.production.json \
  --cycles 1
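
For scheduled runs, the two commands above can be chained so that ingestion starts only after validation passes. The wrapper below is a sketch under one explicit assumption that this README does not confirm: that validate-ingestion-config exits with a non-zero status when the configuration is invalid. build_commands and validate_then_run are hypothetical helper names; only the CLI subcommands and the --config and --cycles flags come from the commands shown above.

```python
import subprocess
from typing import Callable, List, Sequence


def build_commands(config_path: str, cycles: int = 1) -> List[List[str]]:
    """Build the validate and run command lines with identical arguments."""
    common = ["--config", config_path, "--cycles", str(cycles)]
    return [
        ["carbonops-parser", "validate-ingestion-config", *common],
        ["carbonops-parser", "run-ingestion", *common],
    ]


def validate_then_run(
    config_path: str,
    cycles: int = 1,
    run: Callable[[Sequence[str]], int] = subprocess.call,
) -> int:
    """Run validation first; start ingestion only if validation returns 0.

    Assumption: a non-zero exit status signals an invalid configuration.
    """
    validate_cmd, ingest_cmd = build_commands(config_path, cycles)
    exit_code = run(validate_cmd)
    if exit_code != 0:
        return exit_code  # stop before any ingestion work is attempted
    return run(ingest_cmd)


# Example call (not executed here because it needs an installed CLI and
# operator-provided configuration and credentials):
# raise SystemExit(validate_then_run("/etc/carbonops-parser/ingestion.production.json"))
```

The run parameter is injectable so the ordering logic can be tested without invoking the real CLI.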

See Production Packaging And Operator Runbook for install, configuration, PostgreSQL readiness, cron scheduling, verification SQL, rerun/idempotency checks, and troubleshooting.

This is the supported Python runtime production operator path. Project-level production-ready is limited to the scope in Final Project Production-Ready Verdict and Production Parity Contract.

For boundary details, see Local Dry-Run CLI Boundary, Local File Normalized Persistence Dry-Run Boundary, PostgreSQL Persistence Preview Boundary, and Local Dry-Run Troubleshooting.

To run the packaged Python ingestion cycle against local Docker PostgreSQL with the three checked-in source fixture families, see Python Ingestion Local Runbook.

Developer Tests

Run the lightweight Python test suite from the repository root:

python -m pytest

Pytest configuration is kept in pyproject.toml, including the src package import path used by the tests.

Public API Examples

The carbonfactor_parser.source_adapters package exposes source adapter contracts and lightweight helpers for tests, prototypes, and implementation slices.

Hash source content without reading or downloading files:

from carbonfactor_parser.source_adapters import (
    sha256_hex_from_bytes,
    sha256_hex_from_text,
)

content_hash = sha256_hex_from_bytes(b"sample source content")
note_hash = sha256_hex_from_text("sample metadata note")

Create and validate metadata for an existing local file:

from pathlib import Path

from carbonfactor_parser.source_adapters import (
    SourceFamily,
    build_source_document_from_file,
    validate_source_document_metadata,
)

document = build_source_document_from_file(
    source_family=SourceFamily.DEFRA_DESNZ,
    source_name="Example local factor file",
    file_path=Path("data/raw/example/source.csv"),
)

metadata_issues = validate_source_document_metadata(document)

Create and validate an ingestion summary contract:

from carbonfactor_parser.source_adapters import (
    SourceFamily,
    create_ingestion_run_summary,
    validate_ingestion_run_summary,
)

summary = create_ingestion_run_summary(
    ingestion_id="example-run-001",
    source_family=SourceFamily.DEFRA_DESNZ,
    source_name="Example local factor file",
)

summary_issues = validate_ingestion_run_summary(summary)

Use the artificial-only source acquisition validation pipeline with in-memory metadata:

from carbonfactor_parser import (
    create_artificial_source_acquisition_metadata,
    validate_and_summarize_artificial_source_acquisition_metadata,
)

metadata = create_artificial_source_acquisition_metadata(
    source_family="artificial_source_acquisition",
    logical_source_name="artificial-in-memory-source",
    declared_content_type="text/csv",
    checksum_sha256="a" * 64,
    acquired_at_label="static-artificial-acquisition-label",
)

pipeline_result = validate_and_summarize_artificial_source_acquisition_metadata(
    metadata,
)
issue_count = pipeline_result.summary.total_issue_count

This pipeline is limited to artificial metadata shape checks and deterministic summaries. It does not acquire real sources, read files, validate real source URLs, run parsers or normalization, check factor correctness, or provide compliance/legal or carbon accounting correctness. See docs/artificial-source-acquisition-validation-pipeline.md, docs/artificial-source-acquisition-module-recap.md, and examples/example_artificial_source_acquisition_validation_pipeline.py.

Source acquisition CLI quickstart

Use the carbonops-source-acquisition CLI for local source descriptor checks and acquisition flow previews.

Default run mode is noop and offline.
HTTP mode is opt-in with --client http.
validate checks local descriptor metadata only; it does not verify live URLs.
run --dry-run plans targets only and does not acquire content or write files/manifests.
Parser execution and database persistence are outside this CLI boundary at this phase.

carbonops-source-acquisition validate
carbonops-source-acquisition list
carbonops-source-acquisition list --source-id defra_desnz
carbonops-source-acquisition run --dry-run --base-directory ./data/source-acquisition
carbonops-source-acquisition run --output-format json
carbonops-source-acquisition run --client http --source-id ghg_protocol
carbonops-source-acquisition run --client http --source-id ghg_protocol --persist-content --base-directory ./data/source-acquisition

For boundary details, see:

See examples/example_acquisition_artifact_parser_input_mapping.py for a deterministic in-memory example of mapping acquisition artifact metadata into a future parser input boundary without executing a parser.

The parser package exposes ParserInputContract, create_parser_input_contract(), validate_parser_input_contract(), ParserFileContentInput, local parser file content loading helpers, parser file content validation helpers, parse_defra_desnz_file_content(), raw parsed record payload contracts, the ParserAdapter protocol, NoopParserAdapter, ArtificialParserAdapter, DefraDesnzParserAdapter, parser adapter registry helpers, parser execution planning and runner helpers, and parser execution result contracts for future parser adapter input handoff.

The normalization package exposes parser execution handoff helpers, normalization input helpers for successful parser results with raw payloads, and a minimal DEFRA/DESNZ fixture normalization mapper.

The persistence package exposes normalized result persistence input contracts, a logical PostgreSQL schema descriptor, a review-only DDL preview helper, a deterministic insert SQL builder, PostgreSQL persistence preview helpers, repository protocol/result contracts, an explicit caller-provided PostgreSQL options contract, a default-disabled PostgreSQL integration test boundary, and a PostgreSQL repository skeleton that returns unsupported results without database runtime behavior.

The pipeline package exposes a local DEFRA/DESNZ fixture dry-run helper that composes those boundaries to produce PersistenceInput plus DDL preview metadata without DB or network behavior.

These contracts keep acquisition metadata, already-loaded content, raw parser output, parser output metadata, normalization input, normalization handoff metadata, persistence input metadata, schema metadata, repository options metadata, integration test metadata, preview metadata, and repository result metadata separate; they do not include database connection behavior or full source-specific correctness claims.


Examples And Fixtures

The examples entry point is examples/README.md. It identifies deterministic local examples, including the checked-in DEFRA/DESNZ fixture used by the local dry-run quickstart, and separates real examples from future placeholders.

Source Support

Each Phase 1 source family will have its own schedule, source version/hash check, parser, validation rules, archive layout, and source-specific tables.

Source family	Phase 1 role	Table group
GHG Protocol	Source-specific parser and workbook/tool mapping	`ghg_*`
DEFRA/DESNZ	Active checked-in fixture and source-specific ingestion slice	`defra_*`
IPCC EFDB	Heterogeneous source discovery and parser mapping	`ipcc_*`

See docs/source-support.md and docs/source-discovery.md.

Configuration Summary

The conceptual configuration model includes:

Database provider and connection settings.
Raw archive path.
Source-specific enabled flags.
Source-specific schedules with day, week, month, year, time, and timezone support.

Phase 1 implements only postgres as the database provider. mysql and mssql are recognized as conceptual provider names but are not implemented in Phase 1.
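
The provider rule above amounts to a three-way classification, sketched here for illustration. The function names and the returned status labels are hypothetical and are not taken from the project's configuration code; only the provider names postgres, mysql, and mssql come from this README.

```python
IMPLEMENTED_PROVIDERS = frozenset({"postgres"})
RECOGNIZED_ONLY_PROVIDERS = frozenset({"mysql", "mssql"})


def classify_provider(name: str) -> str:
    """Classify a configured provider name (status labels are illustrative)."""
    if name in IMPLEMENTED_PROVIDERS:
        return "implemented"
    if name in RECOGNIZED_ONLY_PROVIDERS:
        return "recognized_not_implemented"
    return "unknown"


def require_implemented_provider(name: str) -> str:
    """Return the name unchanged, or raise if Phase 1 cannot use it."""
    status = classify_provider(name)
    if status != "implemented":
        raise ValueError(f"database provider {name!r} is {status} in Phase 1")
    return name


print(classify_provider("postgres"))  # implemented
print(classify_provider("mysql"))     # recognized_not_implemented
print(classify_provider("sqlite"))    # unknown
```

Separating recognized names from unknown ones lets validation report a clearer message for mysql or mssql than for a misspelled provider.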

See docs/configuration-model.md.

The shared conceptual example lives at config/carbonops.config.example.yaml.

Database Model Summary

PostgreSQL is the Phase 1 persistence target. The model includes:

Shared ingestion metadata tables: carbon_sources, carbon_source_versions, carbon_import_runs, carbon_raw_files, carbon_validation_issues, and carbon_job_locks.
DEFRA/DESNZ tables: defra_categories, defra_subcategories, defra_factor_sets, and defra_factor_values.
GHG Protocol tables: ghg_tools, ghg_factor_sheets, ghg_factor_groups, and ghg_factor_values.
IPCC EFDB tables: ipcc_sectors, ipcc_categories, ipcc_references, ipcc_factor_records, and ipcc_factor_values.

See docs/database-model.md, docs/database-startup.md, and database/postgres/README.md.

PostgreSQL persistence uses shared ingestion metadata plus source-specific master/detail table groups for GHG Protocol, DEFRA/DESNZ, and IPCC EFDB. That layout preserves source-family structure for reviewable carbon emission factor ingestion instead of claiming one universal carbon accounting factor model.
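
The table groups listed above can be written down as data, which makes the "check whether required tables exist" step easy to reason about. In this sketch the table names are copied from the lists above, while the group keys (shared, defra_desnz, ghg_protocol, ipcc_efdb) and the missing_tables helper are assumptions made for illustration. How the set of existing table names is obtained from PostgreSQL is deliberately left out.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Table names as listed in this README; the group keys are illustrative.
REQUIRED_TABLES: Dict[str, Tuple[str, ...]] = {
    "shared": (
        "carbon_sources", "carbon_source_versions", "carbon_import_runs",
        "carbon_raw_files", "carbon_validation_issues", "carbon_job_locks",
    ),
    "defra_desnz": (
        "defra_categories", "defra_subcategories",
        "defra_factor_sets", "defra_factor_values",
    ),
    "ghg_protocol": (
        "ghg_tools", "ghg_factor_sheets", "ghg_factor_groups", "ghg_factor_values",
    ),
    "ipcc_efdb": (
        "ipcc_sectors", "ipcc_categories", "ipcc_references",
        "ipcc_factor_records", "ipcc_factor_values",
    ),
}


def missing_tables(existing: Set[str], groups: Iterable[str]) -> List[str]:
    """Return required tables from the chosen groups that are not present."""
    required = [name for group in groups for name in REQUIRED_TABLES[group]]
    return sorted(name for name in required if name not in existing)


# Pretend every shared table exists except the job lock table.
existing = set(REQUIRED_TABLES["shared"]) - {"carbon_job_locks"}

print(missing_tables(existing, ["shared"]))  # ['carbon_job_locks']
```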

Documentation Map

Roadmap Summary

Near-term work keeps the narrow production-ready scope conservative while separating package publication, infrastructure ownership, live-source expansion, and future runtime promotion into separately reviewed tasks.

See docs/roadmap.md and docs/task-breakdown.md.

Governance

Issues and pull requests are welcome for documentation, examples, parser mappings, source discovery, database schema notes, and implementation improvements.

Non-Goals

CarbonOps-Parser does not:

Calculate carbon inventories.
Produce emissions reports.
Replace source-owner documentation or source files.
Guarantee source data correctness.
Provide a deployment platform.
Normalize all source families into one shared factor table during Phase 1.

License

CarbonOps-Parser is licensed under the Apache License 2.0.
