Auditable public carbon emission factor ingestion and validation for climate-tech data infrastructure.
CarbonOps-Parser is a public, reviewable climate-tech data ingestion project for carbon accounting source data. It focuses on auditable ingestion, parsing, validation, diagnostics, and PostgreSQL operation for public emission factors from GHG Protocol, DEFRA/DESNZ, and IPCC EFDB, with parallel Python and .NET runtime evidence. The Python package includes a configured PostgreSQL ingestion runtime for operator-managed deployments. CarbonOps-Parser is project-level production-ready in the narrow supported scope documented in the final verdict: operator-run/scheduled Python ingestion, PostgreSQL-backed source-specific persistence, and .NET parity evidence through service entrypoint, config/redaction, schema/year-state, source-cycle orchestration, persistence, Docker PostgreSQL E2E, and persisted parity validation. The repository is intentionally conservative: default examples are deterministic and local-only, production runs require explicit configuration and credentials, and the project does not claim production carbon-accounting, legal, compliance, source-owner, or factor correctness.
The project is independent from carbonops-assistant. It is not a continuation, module, plugin, or dependency of that project.
- Production operator runbook - supported Python operator path, PostgreSQL readiness, cron scheduling, validation, and troubleshooting.
- Final project production-ready verdict - narrow production-ready scope and explicit non-claims.
- Production parity contract - Python production path and .NET parity evidence.
- Python runtime docs - local Docker PostgreSQL ingestion runbook for the packaged Python path.
- .NET runtime docs - .NET Worker Service path and parity-oriented runtime notes.
- Database model and PostgreSQL startup - shared metadata plus source-specific table groups.
- Documentation index - curated documentation map for operators, contributors, and reviewers.
- Maintainer release/sync checklist - develop-to-main, stale PR/issue cleanup, and first alpha/review readiness.
- Contribution guide - issues, features, forks, branches, pull requests, validation, secrets, artifacts, and maintainer-only merge policy.
- Issue templates - bug reports, feature requests, documentation requests, and production-readiness questions.
- Pull request guide - PR checklist for scope, validation, runtime impact, PostgreSQL impact, docs, secrets, artifacts, and production-ready claims.
Public carbon emissions workflows often depend on emission factor spreadsheets, databases, and reference documents that change over time and vary by source family. CarbonOps-Parser exists to make carbon factor ingestion reviewable: source identity, version or checksum evidence, parser output, validation issues, persistence readiness, and diagnostics should be visible before any operational use. The project is infrastructure for data ingestion and validation, not an emissions calculator or compliance decision engine.
CarbonOps-Parser is in Phase 1 and has a narrow project-level production-ready status for the documented operator path. The repository contains an active Python ingestion runtime, .NET parity evidence, PostgreSQL schema/runtime boundaries, deterministic examples, local dry-run validation, and production operator documentation. The production-ready claim applies only to the scope documented in Final Project Production-Ready Verdict and Production Parity Contract. It is not a published package release.
CarbonOps-Parser does not claim to be:
- A production carbon-accounting calculator or emissions reporting engine.
- Legal, compliance, audit, or regulatory advice.
- A source-owner correctness guarantee for GHG Protocol, DEFRA/DESNZ, IPCC EFDB, or any source document.
- A universal carbon factor model across all source families.
- A published package, unless release/package files and repository releases prove otherwise.
| Area | Phase 1 completed capabilities | Phase 2 roadmap |
|---|---|---|
| Source families | Local fixture and contract coverage for GHG Protocol, DEFRA/DESNZ, and IPCC EFDB boundaries. | Broader source onboarding rules, fixture policy, and source-family hardening slices. |
| Python | Source acquisition contracts, parser contracts, DEFRA/DESNZ fixture parser path, normalization handoff, persistence previews, diagnostics, and local dry-run CLI. | Runtime hardening, richer validation, controlled source expansion, and opt-in execution boundaries. |
| .NET | Service entrypoint, config/redaction, PostgreSQL schema/year-state, source-cycle orchestration, source-specific persistence, Docker PostgreSQL E2E, and persisted parity validation baselines. | Runtime parity review where shared behavior changes; package/service promotion remains separately scoped. |
| PostgreSQL | Schema descriptors, DDL preview, additive runtime bootstrap, configured Python source-family writes, idempotent duplicate skipping, and opt-in integration boundaries. | Broader migration, rollback, recovery, and operational hardening slices. |
| Safety posture | Local-only examples, non-destructive dry runs, preview-only SQL, no default network calls, and no production credentials. | Release-gate expansion and production-readiness reviews before live source or write-path promotion. |
Users who clone or fork the repository should be able to inspect either implementation path without relying on production infrastructure.
Phase 1 focuses on scheduled ingestion and parsing for:
| Source family | Public discovery value | Phase 1 posture |
|---|---|---|
| GHG Protocol | Greenhouse Gas Protocol tools and factor workbooks used in carbon accounting workflows. | Source discovery/download contracts, parser contracts, normalized content parser boundaries, and parity tests. |
| DEFRA/DESNZ | UK government conversion factors used for carbon emissions and greenhouse gas reporting workflows. | Deterministic local fixture parser and normalization path plus source discovery/download contracts. |
| IPCC EFDB | IPCC Emission Factor Database source family with heterogeneous emission factor records. | Source discovery/download contracts, parser contracts, normalized content parser boundaries, and parity tests. |
The intended Phase 1 workflow is:
- Read configuration.
- Validate the database provider.
- Connect to PostgreSQL.
- Check whether required tables exist.
- Create missing tables if needed.
- Initialize source schedules.
- Check source version and file hash.
- Download a source document when a new version or hash is detected.
- Archive the raw source file.
- Parse source-specific structures.
- Validate parsed records.
- Persist shared ingestion metadata and source-specific records.
- Store import summaries and validation issues.
source schedule
-> version/hash check
-> download when changed
-> raw file archive
-> source-specific parser
-> validation
-> PostgreSQL persistence
-> import summary and validation issues
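The version/hash step in the flow above decides whether a download and parse are needed at all. The following is a minimal sketch of that decision, assuming the caller already holds the candidate bytes. `SourceState` and `needs_ingestion` are hypothetical names used for illustration only; they are not part of the carbonfactor_parser API.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SourceState:
    """Last recorded version label and SHA-256 for one source (hypothetical shape)."""

    version: Optional[str]
    sha256: Optional[str]


def sha256_hex(content: bytes) -> str:
    """Return the lowercase hex SHA-256 digest of the given bytes."""
    return hashlib.sha256(content).hexdigest()


def needs_ingestion(previous: SourceState, version: Optional[str], content: bytes) -> bool:
    """Return True when the version label or content hash differs from the recorded state."""
    if previous.sha256 is None:
        return True  # nothing recorded yet, so treat the source as new
    if version is not None and version != previous.version:
        return True  # a new version label was published
    return sha256_hex(content) != previous.sha256


recorded = SourceState(version="2024", sha256=sha256_hex(b"factor,value\nA,1\n"))
unchanged = needs_ingestion(recorded, "2024", b"factor,value\nA,1\n")
changed = needs_ingestion(recorded, "2024", b"factor,value\nA,2\n")
print(unchanged, changed)  # False True
```

Checking the hash in addition to the version label catches silent republication, where a source file changes but its advertised version does not.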
Phase 1 uses shared ingestion metadata tables plus source-specific master/detail tables. It does not force GHG Protocol, DEFRA/DESNZ, and IPCC EFDB into one canonical factor table. A normalized or search-oriented projection may be considered in a later phase.
The Python path under src/carbonfactor_parser holds the current implementation boundaries for source acquisition, parser execution, normalization, PostgreSQL persistence previews, configured PostgreSQL ingestion, local dry-run composition, and diagnostics. The .NET path under src/dotnet holds shared contract records and parity tests for the same public concepts. PostgreSQL support includes schema descriptors, bootstrap/readiness checks, DDL previews, opt-in integration boundaries, and the Python configured cycle runner. Parity, validation, diagnostics, and non-destructive dry-run behavior are part of the public architecture so reviewers can inspect the handoff from source artifact to parser output to persistence input without connecting to a database or making network calls.
The Python implementation is the active Phase 1 path for source discovery contracts, parser mapping, validation, normalization handoff, persistence previews, and data engineering workflows.
The active Python runtime path lives under src/carbonfactor_parser and exposes the local dry-run CLI plus the configured carbonops-parser run-ingestion operator command for PostgreSQL-backed source-family ingestion. The initial Python source adapter contracts and in-memory registry live under src/carbonfactor_parser/source_adapters.
The .NET implementation is an independent Worker Service path that follows the same conceptual workflow with .NET-oriented application structure. The reviewed production scope treats .NET as parity-validated through its service entrypoint, configuration/redaction, PostgreSQL schema/year-state, source-cycle orchestration, source-specific persistence, Docker PostgreSQL E2E, and persisted parity baselines.
See src/dotnet/README.md.
From a fresh checkout or local working copy:
git clone <REPOSITORY_URL> CarbonOps-Parser
cd CarbonOps-Parser
python -m pip install -e .
Run the test suite if you want a quick local smoke check:
python -m pytest
Run the checked-in DEFRA/DESNZ fixture through the local dry-run CLI:
carbonops-parser local-dry-run \
  --local-path examples/fixtures/defra_desnz_minimal.csv \
  --source-family defra_desnz \
  --source-id defra-desnz-minimal-fixture \
  --content-type text/csv \
  --format-hint csv
Expected summary:
status=success
parsed_record_count=2
normalization_record_count=2
persistence_input_record_count=2
ddl_preview_present=True
issue_count=0
Run the JSON variant:
carbonops-parser local-dry-run \
  --local-path examples/fixtures/defra_desnz_minimal.csv \
  --source-family defra_desnz \
  --source-id defra-desnz-minimal-fixture \
  --content-type text/csv \
  --format-hint csv \
  --json
Key output fields:
- status: dry-run outcome such as success, failed, unsupported, or no_records
- parsed_record_count: records parsed by the minimal local DEFRA/DESNZ fixture parser
- normalization_record_count: records produced by the minimal fixture normalization mapper
- persistence_input_record_count: records prepared as PersistenceInput
- ddl_preview_present: whether review-only PostgreSQL DDL preview text is attached
- issues: structured local loader, parser, normalization, or persistence-input issues
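A reviewer or CI job may want to check the dry-run JSON rather than read it by eye. The sketch below assumes the JSON output is a single object whose top-level keys match the field names listed above. That shape is an assumption, and `check_dry_run_payload` is a hypothetical helper, not a function shipped by the project. Treating unequal stage counts as a problem is also an assumption that holds for the minimal fixture but may not hold for every source.

```python
import json

# Count fields named in the dry-run output description.
COUNT_FIELDS = (
    "parsed_record_count",
    "normalization_record_count",
    "persistence_input_record_count",
)


def check_dry_run_payload(payload: dict) -> list:
    """Return a list of human-readable problems; an empty list means the run looks clean."""
    problems = []
    if payload.get("status") != "success":
        problems.append(f"status is {payload.get('status')!r}, expected 'success'")
    counts = {name: payload.get(name) for name in COUNT_FIELDS}
    if len(set(counts.values())) != 1:
        problems.append(f"record counts differ across stages: {counts}")
    if payload.get("issues"):
        problems.append(f"{len(payload['issues'])} issue(s) reported")
    return problems


# Hand-written sample mirroring the expected text summary; not captured CLI output.
sample = json.loads(
    '{"status": "success", "parsed_record_count": 2, '
    '"normalization_record_count": 2, "persistence_input_record_count": 2, '
    '"ddl_preview_present": true, "issues": []}'
)
print(check_dry_run_payload(sample))  # []
```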
Optionally include PostgreSQL insert preview data in text output:
carbonops-parser local-dry-run \
  --local-path examples/fixtures/defra_desnz_minimal.csv \
  --source-family defra_desnz \
  --source-id defra-desnz-minimal-fixture \
  --content-type text/csv \
  --format-hint csv \
  --include-postgresql-preview
Trimmed expected preview lines:
postgresql_preview_included=True
postgresql_preview_status=ready
postgresql_preview_only=True
postgresql_preview_sql_execution=False
postgresql_preview_database_connection=False
postgresql_preview_target_table=normalized_records
postgresql_preview_record_count=2
postgresql_preview_sql=INSERT INTO normalized_records (source_family, source_id, record_id, record_index, row_number, normalized_fields, source_reference, source_artifact_reference, source_checksum_sha256, parser_metadata, normalization_metadata, created_at, updated_at) VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s)
postgresql_preview_issue_count=0
Run the JSON PostgreSQL preview variant:
carbonops-parser local-dry-run \
  --local-path examples/fixtures/defra_desnz_minimal.csv \
  --source-family defra_desnz \
  --source-id defra-desnz-minimal-fixture \
  --content-type text/csv \
  --format-hint csv \
  --json \
  --include-postgresql-preview
Trimmed expected JSON preview section:
{
  "postgresql_persistence_preview": {
    "included": true,
    "preview_only": true,
    "sql_execution": false,
    "database_connection": false,
    "status": "ready",
    "target_table": "normalized_records",
    "record_count": 2,
    "ordered_columns": [
      "source_family",
      "source_id",
      "record_id",
      "record_index",
      "row_number",
      "normalized_fields",
      "source_reference",
      "source_artifact_reference",
      "source_checksum_sha256",
      "parser_metadata",
      "normalization_metadata",
      "created_at",
      "updated_at"
    ],
    "idempotency_key_fields": [
      "source_family",
      "source_id",
      "record_id",
      "source_artifact_reference",
      "source_checksum_sha256"
    ],
    "issues": []
  }
}
The postgresql_persistence_preview section is preview-only. It includes the target table, ordered columns, parameter rows, record count, SQL text with placeholders, and idempotency metadata, but it does not execute SQL or persist records. No PostgreSQL server, database configuration, or credentials are required.
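To illustrate how the ordered columns and idempotency key fields in the preview relate to each record, here is a small in-memory sketch. The column and key names are copied from the preview above. The helper functions and sample records are hypothetical, and the project's actual duplicate-skipping mechanism is not shown in this document.

```python
# Column order copied from the preview's "ordered_columns" list.
ORDERED_COLUMNS = (
    "source_family", "source_id", "record_id", "record_index", "row_number",
    "normalized_fields", "source_reference", "source_artifact_reference",
    "source_checksum_sha256", "parser_metadata", "normalization_metadata",
    "created_at", "updated_at",
)
# Key fields copied from the preview's "idempotency_key_fields" list.
IDEMPOTENCY_KEY_FIELDS = (
    "source_family", "source_id", "record_id",
    "source_artifact_reference", "source_checksum_sha256",
)


def parameter_row(record: dict) -> tuple:
    """Order a record's values to line up with the %s placeholders; missing values become None."""
    return tuple(record.get(column) for column in ORDERED_COLUMNS)


def idempotency_key(record: dict) -> tuple:
    """Build the tuple that identifies a record across reruns of the same artifact."""
    return tuple(record[field] for field in IDEMPOTENCY_KEY_FIELDS)


def drop_duplicates(records: list) -> list:
    """Keep the first record seen for each idempotency key."""
    seen = set()
    unique = []
    for record in records:
        key = idempotency_key(record)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


# Hypothetical sample records; the checksum is a placeholder, not a real digest.
base = {
    "source_family": "defra_desnz",
    "source_id": "defra-desnz-minimal-fixture",
    "source_artifact_reference": "examples/fixtures/defra_desnz_minimal.csv",
    "source_checksum_sha256": "a" * 64,
}
records = [
    {**base, "record_id": "r1", "record_index": 0},
    {**base, "record_id": "r2", "record_index": 1},
    {**base, "record_id": "r1", "record_index": 0},  # rerun of the first record
]
unique = drop_duplicates(records)
print(len(unique), len(parameter_row(unique[0])))  # 2 13
```

Because the checksum is part of the key, the same record_id from a changed artifact yields a different key and is not treated as a duplicate.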
This quickstart is local dry-run only. It does not connect to PostgreSQL, write records, execute SQL, run migrations, perform network calls, trigger source acquisition, load config files, or require credentials. It does not make production DEFRA/DESNZ correctness claims.
The supported Python production entrypoint is:
carbonops-parser run-ingestion \
  --config /etc/carbonops-parser/ingestion.production.json \
  --cycles 1
Before running it, operators must provide explicit CARBONOPS_POSTGRESQL_* environment values, including the password through an external secret boundary, and validate the config:
carbonops-parser validate-ingestion-config \
  --config /etc/carbonops-parser/ingestion.production.json \
  --cycles 1
See Production Packaging And Operator Runbook for install, configuration, PostgreSQL readiness, cron scheduling, verification SQL, rerun/idempotency checks, and troubleshooting.
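Before a scheduled run, an operator may want to confirm which CARBONOPS_POSTGRESQL_* values are present without echoing the secret. The sketch below only relies on the documented prefix. The individual variable names in the example are hypothetical, and the rule of redacting any name containing PASSWORD is an assumption rather than the project's own redaction logic.

```python
PREFIX = "CARBONOPS_POSTGRESQL_"


def redacted_postgresql_settings(environ: dict) -> dict:
    """Return prefixed settings sorted by name, masking anything that looks like a password."""
    settings = {}
    for name in sorted(environ):
        if not name.startswith(PREFIX):
            continue
        looks_secret = "PASSWORD" in name.upper()
        settings[name] = "***" if looks_secret else environ[name]
    return settings


# Hypothetical variable names for illustration; consult the operator runbook for the real ones.
example_environ = {
    "CARBONOPS_POSTGRESQL_HOST": "db.internal.example",
    "CARBONOPS_POSTGRESQL_PASSWORD": "not-a-real-secret",
    "PATH": "/usr/bin",
}
for name, value in redacted_postgresql_settings(example_environ).items():
    print(f"{name}={value}")
```

In real use, `os.environ` can be passed in place of `example_environ`.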
This is the supported Python runtime production operator path. Project-level production-ready is limited to the scope in Final Project Production-Ready Verdict and Production Parity Contract.
For boundary details, see Local Dry-Run CLI Boundary, Local File Normalized Persistence Dry-Run Boundary, PostgreSQL Persistence Preview Boundary, and Local Dry-Run Troubleshooting.
To run the packaged Python ingestion cycle against local Docker PostgreSQL with the three checked-in source fixture families, see Python Ingestion Local Runbook.
Run the lightweight Python test suite from the repository root:
python -m pytest
Pytest configuration is kept in pyproject.toml, including the src package import path used by the tests.
The carbonfactor_parser.source_adapters package exposes source adapter contracts and lightweight helpers for tests, prototypes, and implementation slices.
Hash source content without reading or downloading files:
from carbonfactor_parser.source_adapters import (
    sha256_hex_from_bytes,
    sha256_hex_from_text,
)

content_hash = sha256_hex_from_bytes(b"sample source content")
note_hash = sha256_hex_from_text("sample metadata note")
Create and validate metadata for an existing local file:
from pathlib import Path

from carbonfactor_parser.source_adapters import (
    SourceFamily,
    build_source_document_from_file,
    validate_source_document_metadata,
)

document = build_source_document_from_file(
    source_family=SourceFamily.DEFRA_DESNZ,
    source_name="Example local factor file",
    file_path=Path("data/raw/example/source.csv"),
)
metadata_issues = validate_source_document_metadata(document)
Create and validate an ingestion summary contract:
from carbonfactor_parser.source_adapters import (
    SourceFamily,
    create_ingestion_run_summary,
    validate_ingestion_run_summary,
)

summary = create_ingestion_run_summary(
    ingestion_id="example-run-001",
    source_family=SourceFamily.DEFRA_DESNZ,
    source_name="Example local factor file",
)
summary_issues = validate_ingestion_run_summary(summary)
Use the artificial-only source acquisition validation pipeline with in-memory metadata:
from carbonfactor_parser import (
    create_artificial_source_acquisition_metadata,
    validate_and_summarize_artificial_source_acquisition_metadata,
)

metadata = create_artificial_source_acquisition_metadata(
    source_family="artificial_source_acquisition",
    logical_source_name="artificial-in-memory-source",
    declared_content_type="text/csv",
    checksum_sha256="a" * 64,
    acquired_at_label="static-artificial-acquisition-label",
)
pipeline_result = validate_and_summarize_artificial_source_acquisition_metadata(
    metadata,
)
issue_count = pipeline_result.summary.total_issue_count
This pipeline is limited to artificial metadata shape checks and deterministic summaries. It does not acquire real sources, read files, validate real source URLs, run parsers or normalization, check factor correctness, or provide compliance/legal or carbon accounting correctness. See docs/artificial-source-acquisition-validation-pipeline.md, docs/artificial-source-acquisition-module-recap.md, and examples/example_artificial_source_acquisition_validation_pipeline.py.
Use the carbonops-source-acquisition CLI for local source descriptor checks and acquisition flow previews.
- Default run mode is noop and offline.
- HTTP mode is opt-in with --client http.
- validate checks local descriptor metadata only; it does not verify live URLs.
- run --dry-run plans targets only and does not acquire content or write files/manifests.
- Parser execution and database persistence are outside this CLI boundary at this phase.
carbonops-source-acquisition validate
carbonops-source-acquisition list
carbonops-source-acquisition list --source-id defra_desnz
carbonops-source-acquisition run --dry-run --base-directory ./data/source-acquisition
carbonops-source-acquisition run --output-format json
carbonops-source-acquisition run --client http --source-id ghg_protocol
carbonops-source-acquisition run --client http --source-id ghg_protocol --persist-content --base-directory ./data/source-acquisition
For boundary details, see:
- Source Acquisition CLI Boundary
- Source Acquisition Registry
- Source Acquisition HTTP Client Boundary
- Source Acquisition Parser Handoff Contract
See examples/example_acquisition_artifact_parser_input_mapping.py for a deterministic in-memory example of mapping acquisition artifact metadata into a future parser input boundary without executing a parser.
The parser package exposes ParserInputContract, create_parser_input_contract(), validate_parser_input_contract(), ParserFileContentInput, local parser file content loading helpers, parser file content validation helpers, parse_defra_desnz_file_content(), raw parsed record payload contracts, the ParserAdapter protocol, NoopParserAdapter, ArtificialParserAdapter, DefraDesnzParserAdapter, parser adapter registry helpers, parser execution planning and runner helpers, and parser execution result contracts for future parser adapter input handoff.
The normalization package exposes parser execution handoff helpers, normalization input helpers for successful parser results with raw payloads, and a minimal DEFRA/DESNZ fixture normalization mapper.
The persistence package exposes normalized result persistence input contracts, a logical PostgreSQL schema descriptor, a review-only DDL preview helper, a deterministic insert SQL builder, PostgreSQL persistence preview helpers, repository protocol/result contracts, an explicit caller-provided PostgreSQL options contract, a default-disabled PostgreSQL integration test boundary, and a PostgreSQL repository skeleton that returns unsupported results without database runtime behavior.
The pipeline package exposes a local DEFRA/DESNZ fixture dry-run helper that composes those boundaries to produce PersistenceInput plus DDL preview metadata without DB or network behavior.
These contracts keep acquisition metadata, already-loaded content, raw parser output, parser output metadata, normalization input, normalization handoff metadata, persistence input metadata, schema metadata, repository options metadata, integration test metadata, preview metadata, and repository result metadata separate; they do not include database connection behavior or full source-specific correctness claims.
The examples entry point is examples/README.md. It identifies deterministic local examples, including the checked-in DEFRA/DESNZ fixture used by the local dry-run quickstart, and separates real examples from future placeholders.
Each Phase 1 source family will have its own schedule, source version/hash check, parser, validation rules, archive layout, and source-specific tables.
| Source family | Phase 1 role | Table group |
|---|---|---|
| GHG Protocol | Source-specific parser and workbook/tool mapping | ghg_* |
| DEFRA/DESNZ | Active checked-in fixture and source-specific ingestion slice | defra_* |
| IPCC EFDB | Heterogeneous source discovery and parser mapping | ipcc_* |
See docs/source-support.md and docs/source-discovery.md.
The conceptual configuration model includes:
- Database provider and connection settings.
- Raw archive path.
- Source-specific enabled flags.
- Source-specific schedules with day, week, month, year, time, and timezone support.
Phase 1 implements only postgres as the database provider. mysql and mssql are recognized as conceptual provider names but are not implemented in Phase 1.
See docs/configuration-model.md.
The shared conceptual example lives at config/carbonops.config.example.yaml.
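The provider rule described above (postgres implemented; mysql and mssql recognized but not implemented) can be sketched as a small validation step. `validate_database_provider` is a hypothetical helper, and normalizing case and whitespace is an assumption about how lenient the check should be.

```python
IMPLEMENTED_PROVIDERS = frozenset({"postgres"})
RECOGNIZED_PROVIDERS = frozenset({"postgres", "mysql", "mssql"})


def validate_database_provider(name: str) -> str:
    """Return the normalized provider name, or raise if it cannot be used in Phase 1."""
    normalized = name.strip().lower()
    if normalized in IMPLEMENTED_PROVIDERS:
        return normalized
    if normalized in RECOGNIZED_PROVIDERS:
        # Known conceptual name, but no Phase 1 implementation behind it.
        raise NotImplementedError(f"provider {normalized!r} is not implemented in Phase 1")
    raise ValueError(f"unknown database provider: {name!r}")


print(validate_database_provider("postgres"))  # postgres
try:
    validate_database_provider("mysql")
except NotImplementedError as error:
    print(error)  # provider 'mysql' is not implemented in Phase 1
```

Raising distinct exception types lets a caller tell "planned but unavailable" apart from a typo in the configuration.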
PostgreSQL is the Phase 1 persistence target. The model includes:
- Shared ingestion metadata tables: carbon_sources, carbon_source_versions, carbon_import_runs, carbon_raw_files, carbon_validation_issues, and carbon_job_locks.
- DEFRA/DESNZ tables: defra_categories, defra_subcategories, defra_factor_sets, and defra_factor_values.
- GHG Protocol tables: ghg_tools, ghg_factor_sheets, ghg_factor_groups, and ghg_factor_values.
- IPCC EFDB tables: ipcc_sectors, ipcc_categories, ipcc_references, ipcc_factor_records, and ipcc_factor_values.
See docs/database-model.md, docs/database-startup.md, and database/postgres/README.md.
PostgreSQL persistence uses shared ingestion metadata plus source-specific master/detail table groups for GHG Protocol, DEFRA/DESNZ, and IPCC EFDB. That layout preserves source-family structure for reviewable carbon emission factor ingestion instead of claiming one universal carbon accounting factor model.
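The shared-plus-source-specific layout can be expressed as a lookup used by a readiness check. The table names below are taken from the list above. The helper functions are hypothetical, and the "ipcc_efdb" family key is an assumed identifier chosen by analogy with the defra_desnz and ghg_protocol identifiers that appear in the CLI examples.

```python
SHARED_TABLES = (
    "carbon_sources", "carbon_source_versions", "carbon_import_runs",
    "carbon_raw_files", "carbon_validation_issues", "carbon_job_locks",
)
SOURCE_TABLES = {
    "defra_desnz": (
        "defra_categories", "defra_subcategories",
        "defra_factor_sets", "defra_factor_values",
    ),
    "ghg_protocol": (
        "ghg_tools", "ghg_factor_sheets", "ghg_factor_groups", "ghg_factor_values",
    ),
    # "ipcc_efdb" is an assumed key; the document does not show this identifier.
    "ipcc_efdb": (
        "ipcc_sectors", "ipcc_categories", "ipcc_references",
        "ipcc_factor_records", "ipcc_factor_values",
    ),
}


def required_tables(source_family: str) -> tuple:
    """Return the shared tables followed by the family's own table group."""
    if source_family not in SOURCE_TABLES:
        raise ValueError(f"unknown source family: {source_family!r}")
    return SHARED_TABLES + SOURCE_TABLES[source_family]


def missing_tables(source_family: str, existing: set) -> list:
    """List required tables that are absent from the set of existing table names."""
    return [name for name in required_tables(source_family) if name not in existing]


# Example: only the shared tables exist so far, so the DEFRA/DESNZ group is reported missing.
print(missing_tables("defra_desnz", set(SHARED_TABLES)))
```

Keeping one table group per family mirrors the choice not to force all three sources into a single canonical factor table.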
- Architecture
- Configuration Model
- Configuration Example
- Background Job Model
- Database Model
- Database Startup
- Ingestion Metadata Model
- Codex-Assisted Runs
- Engineering Standards
- Production Packaging And Operator Runbook
- Maintainer Release/Sync Checklist
- Production Parity Contract
- Final Project Production-Ready Verdict
- Legacy Linux Service Planning - not supported production scheduling
- Source Support
- Source Discovery
- Source Ingestion Boundaries
- Source Acquisition Boundary
- Source Acquisition CLI Boundary
- Source Acquisition Sequencing Checklist
- Local Source Acquisition Contract Boundary
- Local Source Acquisition Examples Boundary
- Local Source Manifest Boundary
- Local Source Manifest Examples Boundary
- Source Manifest Adapter Handoff Boundary
- Source Manifest Adapter Handoff Examples Boundary
- Source Acquisition Validation Boundary
- Source Acquisition Validation Examples Boundary
- Source Acquisition Error Taxonomy Boundary
- Source Acquisition Error Taxonomy Examples Boundary
- Source Acquisition Review Gate Boundary
- Source Acquisition Review Gate Examples Boundary
- Source Acquisition Implementation Readiness Boundary
- Source Acquisition Implementation Readiness Examples Boundary
- Source Acquisition Implementation Sequencing Checklist
- Source Acquisition Implementation Sequencing Examples Boundary
- Source Acquisition Parser Handoff Contract
- Artificial Source Acquisition Validation Pipeline
- Artificial Source Acquisition Module Recap
- Artificial Source Acquisition Phase Closure
- Artificial Manifest Metadata Boundaries
- Artificial Manifest Validation Summary
- Artificial Manifest Metadata Collection
- Artificial Manifest Collection Validation Summary
- Artificial Manifest Metadata Phase Recap
- Artificial Manifest Next Phase Option Matrix
- Artificial In-Memory Manifest Usage Example
- Artificial Manifest Usage Example Phase Recap
- Source Adapter Contract
- Source Adapter Execution Flow
- Source Adapter Error And Warning Handling
- Source Adapter Configuration Boundaries
- Source-Specific Adapter Skeleton Guidance
- DEFRA/DESNZ Adapter Skeleton Boundaries
- Parser Adapter Boundary
- Parser Execution Planning Boundary
- Parser Execution Result Boundary
- Parser Execution Runner Boundary
- Source-Specific Parser Adapter Boundary
- Parser File Content Input Boundary
- Local Parser File Content Loader Boundary
- Parser Execution Normalization Handoff Boundary
- Parsed Raw Record Payload Boundary
- Parser Handoff Boundary
- Parser Contract Boundaries
- Source-Specific Parser Skeleton Boundaries
- DEFRA/DESNZ Parser Skeleton Boundaries
- Real Format Parser Boundary
- Normalization Boundary
- Normalization Input Boundary
- DEFRA/DESNZ Minimal Normalization Mapping Boundary
- Local File Normalized Persistence Dry-Run Boundary
- Local Dry-Run CLI Boundary
- Local Dry-Run Troubleshooting
- Normalized Result Persistence Boundary
- PostgreSQL Persistence Schema Boundary
- PostgreSQL DDL Preview Boundary
- PostgreSQL Insert SQL Builder Boundary
- PostgreSQL Persistence Preview Boundary
- Persistence Repository Boundary
- PostgreSQL Implementation Safety Gate
- PostgreSQL Integration Test Boundary
- PostgreSQL Opt-In Integration Runbook
- PostgreSQL Config Contract Boundary
- PostgreSQL Repository Skeleton Boundary
- PostgreSQL Repository Implementation Planning Boundary
- PostgreSQL Runtime Persistence Implementation Plan
- PostgreSQL Driver Dependency Decision
- PostgreSQL Connection Session Contract Boundary
- PostgreSQL Execution Adapter Boundary
- PostgreSQL Transaction Policy Boundary
- PostgreSQL Idempotency Conflict Strategy Boundary
- PostgreSQL psycopg Session Adapter Boundary
- PostgreSQL Disabled Runtime Execution Adapter Boundary
- PostgreSQL Repository Disabled Execution Preview Boundary
- PostgreSQL Runtime Execution Gate Boundary
- PostgreSQL Runtime Readiness Checklist
- Real-Source Smoke Mode
- Parser To Normalization Handoff Boundary
- Parser To Normalization Integration Recap
- Source To Normalization Pipeline Recap
- Normalization Execution Boundary
- Normalization Result Summary Boundary
- Normalization Summary Builder Boundary
- Normalization Pipeline Recap
- Normalization Public API Recap
- Normalization Test Coverage Recap
- Normalization Deferred Implementation Roadmap
- Public Roadmap Checkpoint
- Milestone Checkpoint CO-037 To CO-049
- Governance Smoke Test Checkpoint
- Stabilization Checkpoint
- Production Readiness Gap Analysis
- Production Readiness Sequencing Roadmap
- Repository Navigation Guide
- Review Readiness Checklist
- Documentation Map Consistency Checklist
- Source Adapter Package Recap
- Roadmap
- Task Breakdown
- Limitations
- Public Safety
- PostgreSQL Database Notes
Near-term work keeps the narrow production-ready scope conservative while separating package publication, infrastructure ownership, live-source expansion, and future runtime promotion into separately reviewed tasks.
See docs/roadmap.md and docs/task-breakdown.md.
Issues and pull requests are welcome for documentation, examples, parser mappings, source discovery, database schema notes, and implementation improvements.
CarbonOps-Parser does not:
- Calculate carbon inventories.
- Produce emissions reports.
- Replace source-owner documentation or source files.
- Guarantee source data correctness.
- Provide a deployment platform.
- Normalize all source families into one shared factor table during Phase 1.
CarbonOps-Parser is licensed under the Apache License 2.0.