feat: add bare metal support for Intel TDX and AMD SEV-SNP #73
butler54 wants to merge 18 commits into validatedpatterns:main
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Force-pushed from b4eaf36 to bad2552.
Replace git branch references (repoURL/targetRevision/path) with released Helm chart references (chart/chartVersion) for trustee, sandboxed-containers, and sandboxed-policies in values-baremetal.yaml.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add tdx.enabled flag (default true) to baremetal chart to conditionally set kvm_intel.tdx=1 kernel argument. Without this, the kvm_intel module does not activate TDX and NFD cannot detect it.
Enable intel-dcap application in values-baremetal.yaml for PCCS/QGS attestation services.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
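For anyone verifying this on a node, here is a minimal shell sketch of the check the commit implies: confirm that `kvm_intel.tdx=1` really is on the kernel command line. The helper name `has_karg` is hypothetical and not part of the chart, and the sketch assumes a Linux host where `/proc/cmdline` is readable.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the baremetal chart): succeeds when the
# exact kernel argument $1 appears as a whole word in the command line $2.
has_karg() {
  case " $2 " in
    *" $1 "*) return 0 ;;
    *) return 1 ;;
  esac
}

# Read the running kernel's command line; stays empty if /proc is unavailable.
cmdline=$(cat /proc/cmdline 2>/dev/null || true)

if has_karg "kvm_intel.tdx=1" "$cmdline"; then
  echo "kvm_intel.tdx=1 is set"
else
  echo "kvm_intel.tdx=1 is missing"
fi
```

Matching on whole words keeps a value such as `kvm_intel.tdx=10` from being reported as a hit.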
…mplates
Address PR review feedback:
- Remove detect-runtime-class.yaml (OSC operator manages RuntimeClass)
- Remove bm-kernel-params.yaml and kernel-params-mco.yaml (config should be provided via initdata or pod annotations to avoid inconsistencies)
- Remove commented-out runtimeclass templates for AMD SNP and Intel TDX
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Chris Butler <chris.butler@redhat.com>
Conflicts resolved:
- _helpers.tpl: kept runtimeClassName override support from baremetal
- kbs-access/values.yaml: merged main's structure with runtimeClassName param
- kbs-access/secure-pod.yaml: accepted deletion (replaced by secure-deployment.yaml)
- kbs-access/secure-deployment.yaml: added runtimeClassName values override support
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add Kyverno chart and coco-kyverno-policies to baremetal values
- Update trustee chart to 0.3.* with kbs.admin.format v1.1
- Remove bypassAttestation (proper attestation via init_data)
- Remove explicit runtimeClassName overrides (auto-detected by platform)
- Add syncPolicy prune to hello-openshift and kbs-access
- Reset default clusterGroupName to simple
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The policy only fired on Pod/Deployment CREATE, so pods created before the initdata ConfigMap existed never got the cc_init_data annotation. Adding UPDATE allows Kyverno to inject the annotation when a Deployment is updated (e.g. by ArgoCD sync), triggering a rolling restart with the correct initdata.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…e generation
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Adds RAW_HASH field to both initdata and debug-initdata ConfigMaps.
- PCR8_HASH = SHA256(zeros || SHA256(toml)), used by Azure vTPM attestation
- RAW_HASH = SHA256(toml), used by baremetal TDX/SNP attestation
Both are needed because Azure and baremetal present initdata differently in their attestation evidence. A single Trustee attestation server must accept both formats to support multi-platform deployments.
Future: integrate veritas for comprehensive reference value generation.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
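To make the two formulas concrete, here is a hedged shell sketch that derives both values from one TOML string. It is not the pattern's actual script: the TOML content is a placeholder, `hex_to_bytes` is a hypothetical helper, and "zeros" is assumed to be the 32 zero bytes that a SHA-256 PCR starts from. It relies on bash's builtin `printf` and on `sha256sum` from coreutils.

```shell
#!/usr/bin/env bash
# Placeholder initdata; the real TOML comes from the pattern's ConfigMap.
toml='algorithm = "sha256"
version = "0.1.0"
'

# Hypothetical helper: turn a hex string into raw bytes by rewriting each
# hex pair as a \xHH escape and letting bash's printf emit the bytes.
hex_to_bytes() { printf "$(printf '%s' "$1" | sed 's/../\\x&/g')"; }

# RAW_HASH = SHA256(toml), hex encoded.
raw_hash=$(printf '%s' "$toml" | sha256sum | cut -d' ' -f1)

# Assumption: "zeros" means 32 zero bytes, written here as 64 hex digits.
initial_pcr=$(printf '%064d' 0)

# PCR8_HASH = SHA256(zeros || SHA256(toml)), hashed over bytes, not hex text.
pcr8_hash=$(hex_to_bytes "${initial_pcr}${raw_hash}" | sha256sum | cut -d' ' -f1)

echo "RAW_HASH=${raw_hash}"
echo "PCR8_HASH=${pcr8_hash}"
```

The point of the sketch is that PCR8_HASH is a hash of the binary concatenation, so the hex digest has to be decoded back to bytes before the second SHA-256 pass.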
Temporarily uses butler54/trustee-chart feature/baremetal-attestation branch instead of released chart. This branch includes:
- Baremetal TDX and SNP attestation rules
- Conditional pcr-stash (no error on baremetal without vTPM)
- Raw init_data hash (zero-padded) for baremetal attestation
- TDX QCNL config with use_secure_cert: false for local PCCS
Revert to chartVersion after merging and releasing trustee chart.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The kbs-access-app container image is ~1GB which causes container creation timeouts with the default 2GB kata VM memory.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The autogen Deployment rule causes admission failures when the initdata ConfigMap hasn't been propagated to the workload namespace yet. By targeting Pods only (autogen-controllers: none), Deployments are admitted without ConfigMap resolution. Pods get cc_init_data injected at creation time when the ConfigMap is available. A rollout restart picks up new initdata values.
Also removes UPDATE operation; only CREATE is needed since a rollout restart creates new Pods.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Without braces, bash treats $initial_pcr followed by the hex hash as a single undefined variable name, producing SHA-256 of empty string instead of the correct PCR extend value.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
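The failure mode described here is easy to reproduce in isolation. The sketch below is illustrative only: `initial_pcr` is the name used in the commit message, while the trailing `ab12` stands in for the real hex hash and `sha256sum` is assumed to be available.

```shell
#!/usr/bin/env bash
# 32 zero bytes written as 64 hex digits.
initial_pcr=$(printf '%064d' 0)

# Buggy form: the literal hex digits are absorbed into the variable name,
# so bash expands the unset variable "initial_pcrab12" to an empty string.
buggy="$initial_pcrab12"

# Fixed form: braces end the variable name before the literal digits.
fixed="${initial_pcr}ab12"

echo "buggy length: ${#buggy}"   # prints "buggy length: 0"
echo "fixed length: ${#fixed}"   # prints "fixed length: 68"

# Hashing the buggy value yields the SHA-256 of empty input:
# e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
printf '%s' "$buggy" | sha256sum | cut -d' ' -f1
```

Running the script with `set -u` would turn the silent empty expansion into an "unbound variable" error, which is one way to catch this class of bug earlier.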
Today I have tested this PR with @butler54. Tests performed:
metadata:
  name: sgxdeviceplugin-sample
spec:
  image: registry.connect.redhat.com/intel/intel-sgx-plugin@sha256:f2c77521c6dae6b4db1896a5784ba8b06a5ebb2a01684184fc90143cfcca7bf4
Suggested change:
- image: registry.connect.redhat.com/intel/intel-sgx-plugin@sha256:f2c77521c6dae6b4db1896a5784ba8b06a5ebb2a01684184fc90143cfcca7bf4
+ image: registry.connect.redhat.com/intel/intel-sgx-plugin@sha256:4ac8769c4f0a82b3ea04cf1532f15e9935c71fe390ff5a9dc3ee57f970a65f0b
privileged: true # Required for chcon to work on host files
containers:
- name: pccs
  image: registry.redhat.io/openshift-sandboxed-containers/osc-pccs@sha256:de64fc7b13aaa7e466e825d62207f77e7c63a4f9da98663c3ab06abc45f2334d
This should probably also be updated to the newest one: https://catalog.redhat.com/en/software/containers/openshift-sandboxed-containers/osc-pccs/69173baa2fa58c9e4c54171e
dnsPolicy: ClusterFirstWithHostNet
initContainers:
- name: platform-registration
  image: registry.redhat.io/openshift-sandboxed-containers/osc-tdx-qgs@sha256:86b23461c4eea073f4535a777374a54e934c37ac8c96c6180030f92ebf970524
This can probably be updated to a newer version:
https://catalog.redhat.com/en/software/containers/openshift-sandboxed-containers/osc-tdx-qgs/6917420fa47ef6ba964bb31d
mountPath: /sys/firmware/efi/efivars
containers:
- name: tdx-qgs
  image: registry.redhat.io/openshift-sandboxed-containers/osc-tdx-qgs@sha256:86b23461c4eea073f4535a777374a54e934c37ac8c96c6180030f92ebf970524
This can probably be updated to a newer version:
https://catalog.redhat.com/en/software/containers/openshift-sandboxed-containers/osc-tdx-qgs/6917420fa47ef6ba964bb31d
@@ -0,0 +1,9 @@
apiVersion: v1
This file can be dropped, as its settings are already passed as args in qgs-ds.yaml:
- name: tdx-qgs
  image: registry.redhat.io/openshift-sandboxed-containers/osc-tdx-qgs:latest
  args:
  - -p=4050
  - -n=4
@@ -0,0 +1,16 @@
apiVersion: v1
This file is also obsolete. The values actually used are in qgs-ds.yaml:
qcnl-conf: '{"pccs_url": "https://pccs-service:8042/sgx/certification/v4/", "use_secure_cert": false, "pck_cache_expire_hours": 168}'
2. `bash scripts/get-pcr.sh`: retrieves PCR measurements from the peer-pod VM image and stores them at `~/.coco-pattern/measurements.json` (requires `podman`, `skopeo`, and `~/pull-secret.json`)
3. Review and customise `~/values-secret-coco-pattern.yaml`: this file is loaded into Vault and provides secrets to the pattern
1. `bash scripts/gen-secrets.sh`: generates KBS key pairs, PCCS certificates/tokens (for bare metal), and copies `values-secret.yaml.template` to `~/values-secret-coco-pattern.yaml`
2. `bash scripts/get-pcr.sh`: retrieves PCR measurements from the peer-pod VM image and stores them at `~/.coco-pattern/measurements.json` (requires `podman`, `skopeo`, and `~/pull-secret.json`). **Not required for bare metal deployments.**
In our testing this step was also required for bare metal; without it, the run failed because this file was missing:
- name: pcrStash
  vaultPrefixes:
  - hub
  fields:
  - name: json
    path: ~/.coco-pattern/measurements.json
Validated pattern for deploying confidential containers on OpenShift using the [Validated Patterns](https://validatedpatterns.io/) framework.
Confidential containers use hardware-backed Trusted Execution Environments (TEEs) to isolate workloads from cluster and hypervisor administrators. This pattern deploys and configures the Red Hat CoCo stack, including the sandboxed containers operator, Trustee (Key Broker Service), and peer-pod infrastructure, on Azure.
Confidential containers use hardware-backed Trusted Execution Environments (TEEs) to isolate workloads from cluster and hypervisor administrators. This pattern deploys and configures the Red Hat CoCo stack, including the sandboxed containers operator, Trustee (Key Broker Service), and peer-pod infrastructure, on Azure and bare metal.
Suggested change:
- Confidential containers use hardware-backed Trusted Execution Environments (TEEs) to isolate workloads from cluster and hypervisor administrators. This pattern deploys and configures the Red Hat CoCo stack, including the sandboxed containers operator, Trustee (Key Broker Service), and peer-pod infrastructure, on Azure and bare metal.
+ Confidential containers use hardware-backed Trusted Execution Environments (TEEs) to isolate workloads from cluster and hypervisor administrators. This pattern deploys and configures the Red Hat CoCo stack, including the sandboxed containers operator, Trustee (Key Broker Service) operator, and Kata infrastructure, on Azure cloud instances and bare metal.
I removed "peer-pod infrastructure" because it gave the impression that it applies to both Azure and bare metal.
**Bare metal deployments:**
- OpenShift 4.17+ cluster on bare metal with Intel TDX or AMD SEV-SNP hardware
The supported OCP versions for OSC 1.12 are 4.19.28+ or 4.20.18+:
https://docs.redhat.com/en/documentation/openshift_sandboxed_containers/1.12/html/deploying_confidential_containers_on_bare-metal_servers/cc-discover_metal-cc#compatibility-with-openshift_metal-cc
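If it helps, the version floor could also be enforced by a small preflight check before installing. The sketch below is only an illustration: `ocp_supported` is a hypothetical function, and it assumes the two minimums quoted in this comment apply per z-stream (4.19.28 or later on 4.19, 4.20.18 or later on 4.20). Any other minor stream is reported as unsupported because this comment does not cover it.

```shell
#!/usr/bin/env bash
# Hypothetical preflight helper: succeeds when the OpenShift version in $1
# meets the OSC 1.12 minimums quoted in the review comment.
ocp_supported() {
  local major minor patch
  IFS=. read -r major minor patch <<EOF
$1
EOF
  # Keep only the leading digits of the z-stream (drops suffixes like "-rc.1").
  patch=${patch%%[!0-9]*}
  [ -n "$patch" ] || return 1
  case "${major}.${minor}" in
    4.19) [ "$patch" -ge 28 ] ;;
    4.20) [ "$patch" -ge 18 ] ;;
    *) return 1 ;;   # assumption: streams not named in the comment are rejected
  esac
}

for v in 4.17.5 4.19.27 4.19.28 4.20.18; do
  if ocp_supported "$v"; then
    echo "$v: supported"
  else
    echo "$v: not supported"
  fi
done
```

The version string would normally come from the cluster itself; how it is obtained is left out here on purpose.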
Some minor nits. The rest looks good to me.
…lidatedpatterns#75 documentation
This commit addresses all review comments from bpradipt and pawelpros on PR validatedpatterns#73, merges documentation from PR validatedpatterns#75, and updates container images.
Documentation changes:
- README: Replace "peer-pod infrastructure" wording to clarify Azure vs bare metal
- README: Update OCP version requirements from 4.17+ to 4.19.28+ (OSC 1.12 requirement)
- README: Clarify PCR collection differs for Azure (get-pcr.sh) vs bare metal (manual)
- README: Distinguish Azure (kata-remote) from bare metal (kata-cc) runtime classes
- values-secret.yaml.template: Add missing kbsPrivateKey secret
- values-secret.yaml.template: Reorganize with clear section headers and improved docs
- gen-secrets.sh: Add prominent alert when values-secret file is created
- Merge docs/nfd-matchall-bug.md from PR validatedpatterns#75 (NFD matchAll bug report)
- Merge docs/pcr-reference-values-bare-metal.md from PR validatedpatterns#75 (PCR collection guide)
Code cleanup:
- Delete obsolete qgs-config-cm.yaml (QGS args now inline)
- Delete obsolete qgs-sgx-cm.yaml (QCNL config via downwardAPI)
- Remove commented-out detect-runtime-class reference in values-baremetal.yaml
Image updates:
- intel-dpo-sgx.yaml: Update intel-sgx-plugin to sha256:4ac8769c (v0.35.0)
- pccs-deployment.yaml: Update osc-pccs to sha256:edf57087 (v1.12)
- qgs-ds.yaml: Update osc-tdx-qgs to sha256:308d66da (v1.12)
Resolves review comments from:
- bpradipt: peer-pod wording, OCP versions, PCR clarification
- pawelpros: obsolete ConfigMaps, image digests, PCR requirements
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Summary
- `baremetal` clusterGroup for deploying CoCo on bare metal with Intel TDX or AMD SEV-SNP hardware
- `kata-tdx` and `kata-snp` created automatically
- `gen-secrets.sh`
Test plan
- `baremetal` clusterGroup on Intel TDX hardware
- `baremetal` clusterGroup on AMD SEV-SNP hardware
🤖 Generated with Claude Code