intel_vtd: add Intel VT-d IOMMU emulator #3697
Draft
jstarks wants to merge 16 commits into
Add the intel_vtd crate under vm/devices/iommu/ with all specification-derived types needed for the Intel VT-d IOMMU emulator (legacy mode):

- MMIO register definitions (VER, CAP, ECAP, GCMD, GSTS, RTADDR, CCMD, FSTS, FECTL, FEDATA/FEADDR, IQH/IQT/IQA, ICS, IECTL, IEDATA/IEADDR, IRTA, IOTLB, IVA, FRCD) with bitfield structs
- Root table and context table entry types (128-bit each)
- Second-level page table entry (EPT-like 64-bit format) with helpers for IOVA indexing and large page GPA computation
- Interrupt remapping table entry (128-bit IRTE with source validation)
- Invalidation queue descriptor types (context-cache, IOTLB, interrupt entry cache, invalidation wait)
- Fault reason codes for DMA and interrupt remapping faults

Based on Intel VT-d Specification Rev 4.1. No runtime behavior yet; this is the spec type foundation for the emulator.
… recording

Implement sub-phases 1B, 1C, and 1D of the Intel VT-d IOMMU emulator:

- 1B: ChipsetDevice with full MMIO register file (VER, CAP, ECAP, GCMD/GSTS, RTADDR, CCMD, FSTS, FECTL/FEDATA/FEADDR, IQH/IQT/IQA, ICS, IECTL/IEDATA/IEADDR, IRTA, IOTLB, FRCD). Supports 4-byte and 8-byte naturally aligned accesses via DWORD-granularity dispatch. GCMD processing handles toggle bits (TE, QIE, IRE, CFI) and one-shot bits (SRTP, SIRTP, WBF) with spec-mandated ordering validation.
- 1C: Invalidation queue processing: consumes 128-bit descriptors on IQT write, handles INVALIDATION_WAIT (status write + completion interrupt), and treats context/IOTLB/IEC invalidation as no-ops (no translation cache). Register-based invalidation (CCMD, IOTLB_REG) is also handled as a no-op with correct status echo.
- 1D: Fault recording via FRCD registers with PPF dynamically computed from FRCD[n].F bits. Fault event MSI delivery through FECTL/FEDATA/FEADDR. Invalidation completion MSI through separate IECTL/IEDATA/IEADDR registers. Both support interrupt pending (IP) on masked-then-unmasked transitions.

Also includes VtdSharedState with RwLock for concurrent device access, ChangeDeviceState (start/stop/reset), SaveRestore stub, InspectMut, and 34 unit tests covering register read/write, GCMD sequencing, RW1C behavior, access size validation, and register-based invalidation.
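As a rough illustration of two register behaviors named in this commit (RW1C bits, and a PPF value derived on demand from the per-record F bits), a minimal sketch might look like the following. The function names and the flat `&[bool]` representation are hypothetical and are not taken from the crate.

```rust
/// Applies a guest write to a register that contains RW1C (write-1-to-clear)
/// bits: any RW1C bit written as 1 is cleared, all other bits are unchanged.
/// (Hypothetical helper; ordinary read/write bits are not modeled here.)
fn apply_rw1c_write(current: u32, written: u32, rw1c_mask: u32) -> u32 {
    current & !(written & rw1c_mask)
}

/// Computes a PPF-style summary bit on demand: set if any fault recording
/// register currently has its F bit set. (Hypothetical representation.)
fn primary_pending_fault(frcd_f_bits: &[bool]) -> bool {
    frcd_f_bits.iter().any(|&f| f)
}
```

Deriving the summary bit at read time, rather than storing it, means it cannot drift out of sync with the individual fault records.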
Acquire the read lock once for both DWORD halves of a 64-bit MMIO read, preventing a concurrent writer from producing an inconsistent value where the low DWORD comes from the old state and the high DWORD from the new.
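The idea can be sketched as follows, assuming a simplified register file stored as an array of DWORDs behind a `std::sync::RwLock`. The `Regs` type and `read_u64` function are hypothetical stand-ins, not the crate's actual types.

```rust
use std::sync::RwLock;

/// Hypothetical register file: a flat array of 32-bit registers.
struct Regs {
    dwords: [u32; 4],
}

/// Reads a 64-bit register as two DWORD halves under a single read guard,
/// so a concurrent writer cannot update the state between the two halves.
fn read_u64(state: &RwLock<Regs>, dword_index: usize) -> u64 {
    let guard = state.read().unwrap();
    let lo = guard.dwords[dword_index] as u64;
    let hi = guard.dwords[dword_index + 1] as u64;
    (hi << 32) | lo
}
```

Taking the guard once is what matters here; acquiring and releasing the lock separately for each half would reopen the window for a torn read.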
Refactor signal_msi calls out of write-lock-held paths. Instead of calling signal_msi while holding the VtdState write lock, collect pending MSI address/data pairs into a Vec and deliver them after the lock is dropped. This eliminates calling external code (the partition's MSI delivery) while holding the IOMMU's internal lock, avoiding potential lock ordering issues if the signal_msi implementation ever interacts with IOMMU state.
Revert the deferred MSI delivery pattern -- signal_msi under the write lock is fine (matches AMD IOMMU, and the lock ordering invariant holds since VT-d uses devid=None for its own MSIs). Also remove write_register_dword_locked: 64-bit MMIO writes don't need atomicity across both DWORDs. Every 64-bit VT-d register either has its trigger bit in one specific DWORD (CCMD/ICC, IOTLB/IVT, IQT/tail) or is a config register latched by a separate GCMD write (RTADDR, IQA, IRTA). Two independent write_register_dword calls are correct.
…rappers

Implement sub-phases 1E, 1F, and parts of 1G:

- DMA translation (1E): root entry lookup, context entry lookup, second-level page table walker with 4KB/2MB/1GB page support, AND-accumulated R/W permissions across page table levels, pass-through mode, and FPD (fault processing disable) propagation.
- Interrupt remapping (1F): IRTE lookup with index extraction from remappable-format MSI address/data, source validation (SVT/SID/SQ), compatibility-format interrupt handling (EIME/CFIS), and MSI address/data construction from IRTE fields.
- Per-device wrappers (1G.1-1G.5): VtdFault error enum with fault reason codes and fault recording, VtdTranslator implementing IommuTranslator with closure-based TOCTOU-safe API, VtdSignalMsi implementing SignalMsi for interrupt remapping, record_fault_locked for fault recording register writes with overflow detection, and factory methods on VtdSharedState.
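Two pieces of the page table walk described above can be sketched in isolation: picking the table index for each level, and AND-accumulating permissions. This assumes 4KB base pages with a 9-bit index per level (level 1 is the 4KB leaf level); the function names are hypothetical and the crate's actual walker may be structured differently.

```rust
/// Index into the page table at `level` (1 = 4KB leaf level, 4 = top level
/// of a 4-level walk) for the given IOVA. Each level resolves 9 bits above
/// the 12-bit page offset. (Hypothetical helper.)
fn table_index(iova: u64, level: u32) -> usize {
    let shift = 12 + 9 * (level - 1);
    ((iova >> shift) & 0x1ff) as usize
}

/// Effective (read, write) permission for a walk: an access is allowed only
/// if every entry visited along the way allows it.
fn accumulate_permissions(entries: &[(bool, bool)]) -> (bool, bool) {
    entries
        .iter()
        .fold((true, true), |(r, w), &(er, ew)| (r && er, w && ew))
}
```

With this scheme, a 2MB mapping terminates the walk at level 2 and a 1GB mapping at level 3, with the remaining low IOVA bits used as the offset into the large page.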
…alidation

Three spec compliance fixes:

1. Eliminate PteNotPresent variant. The VT-d spec has no distinct 'not present' fault code for page table entries -- R=0 produces fault 0x06 (read denied) and W=0 produces fault 0x05 (write denied) regardless of whether other permission bits are set. Non-present PTEs (R=W=0) now produce WriteAccessDenied or ReadAccessDenied based on the actual access type.
2. Pass irte_index through validate_irte_source so SourceValidationFailed carries the correct IRTE index for the fault recording register, not a hardcoded 0.
3. Thread the access type (is_write) through VtdFault::record() so FRCD.T reflects the actual DMA request direction for all fault types, not just ReadAccessDenied. MSI remapping faults use is_write=true since MSI is a posted write transaction.
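The first fix can be illustrated with a small sketch that maps PTE permission bits and the access direction to a fault code. The enum and function here are hypothetical simplifications; only the variant names and the codes 0x05 and 0x06 come from the commit message.

```rust
/// Fault reasons for second-level permission failures, using the codes
/// quoted in the commit message. (Simplified, hypothetical enum.)
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum AccessFault {
    WriteAccessDenied = 0x05,
    ReadAccessDenied = 0x06,
}

/// Checks a PTE's R/W bits against the requested access. A non-present PTE
/// (R=W=0) needs no special case: it fails whichever check applies.
fn check_access(r: bool, w: bool, is_write: bool) -> Result<(), AccessFault> {
    match (is_write, r, w) {
        (true, _, false) => Err(AccessFault::WriteAccessDenied),
        (false, false, _) => Err(AccessFault::ReadAccessDenied),
        _ => Ok(()),
    }
}
```

Because the fault is chosen from the access type alone, the same `is_write` value can also drive the FRCD.T field mentioned in the third fix.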
Add DMAR ACPI table types (acpi_spec::dmar), DMAR builder in the ACPI table builder, and full chipset wiring for Intel VT-d IOMMU emulation.

DMAR table generation:
- DMAR header with HAW and INTR_REMAP flag
- One DRHD per VT-d unit with PCI sub-hierarchy device scope
- Integrated into build_acpi_tables_inner() alongside IVRS

Chipset wiring:
- PcieIommuConfig::IntelVtd variant
- intel_vtd_wiring module (parallel to amd_iommu_wiring)
- X86IommuSharedState enum in pcie_wiring for dispatch between AMD/Intel
- Memory layout allocation (4KB per VT-d unit)
- --intel-vtd CLI flag (mutually exclusive with --amd-iommu per RC)
- Per-device DMA translation and MSI remapping via iommu_common
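The DMAR header's HAW field stores the host address width minus one, so a 48-bit width is encoded as 0x2F. A minimal sketch of that encoding is shown below; `encode_haw` is a hypothetical helper, and returning `None` for a zero width is an illustrative choice rather than something the PR is known to do.

```rust
/// Encodes a host address width in bits into the DMAR HAW field value
/// (width - 1). Returns None for a width of 0 instead of underflowing.
/// (Hypothetical helper, not the builder's actual code.)
fn encode_haw(host_address_width: u8) -> Option<u8> {
    host_address_width.checked_sub(1)
}
```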
…status address

Add a comprehensive end-to-end integration test (1J.1) that exercises the full VT-d stack by mimicking a Linux intel-iommu driver init sequence:

1. Read and verify CAP/ECAP capabilities
2. Set up root table, context tables, and 4-level page tables
3. Configure root table pointer via GCMD.SRTP
4. Perform register-based invalidation (CCMD, IOTLB) before QI
5. Enable queued invalidation and submit context/IOTLB/wait descriptors
6. Enable DMA translation and verify IOVA→GPA translation via VtdTranslator
7. Set up interrupt remapping table with source validation
8. Enable interrupt remapping and verify MSI remapping via VtdSignalMsi
9. Verify fault recording for unmapped IOVAs
10. Verify source validation rejects wrong BDFs
11. Verify disable returns to identity mapping

Also fixes a bug in process_invalidation_wait where the status address was extracted incorrectly. The high 64 bits of the invalidation wait descriptor encode the status address in bits 63:2 (address bits 63:2), so the correct extraction is simply masking off the reserved bottom 2 bits.
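The corrected extraction described in the last paragraph amounts to a single mask. A sketch, with a hypothetical function name standing in for whatever process_invalidation_wait actually does internally:

```rust
/// Extracts the status write address from the high 64 bits of an
/// invalidation wait descriptor. The address occupies bits 63:2 in place,
/// so only the reserved bottom 2 bits are masked off (no shifting).
/// (Hypothetical helper.)
fn wait_status_address(descriptor_high: u64) -> u64 {
    descriptor_high & !0x3
}
```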
Add intel_vtd_mixed_topology VMM test that boots a Linux guest with two PCIe root complexes — one with VT-d enabled (segment 0) and one without (segment 1). The test verifies that Linux discovers the IOMMU via the DMAR ACPI table, creates IOMMU groups for devices behind the VT-d unit, and that DMA and MSI interrupts work correctly through the IOMMU (NVMe I/O, virtio-net). Also adds with_intel_vtd() to petri's OpenVmmModify builder, following the same pattern as with_amd_iommu() and with_smmu().
Fix three issues with the Intel VT-d VMM test:

1. DMAR DRHD device scope: emit one PCI sub-hierarchy (type 0x02) scope entry per root port instead of a single entry for device 0 function 0. Linux's intel-iommu driver matches devices to DRHDs by walking up the PCI hierarchy to find a scope entry ancestor, so each root port needs its own entry to cover its downstream devices (including switches).
2. Kernel command line: append intel_iommu=on since Linux's Intel IOMMU driver is off by default unless CONFIG_INTEL_IOMMU_DEFAULT_ON is set.
3. ACS capabilities: set real ACS capability bits (0x5D) on root ports so Linux creates per-device IOMMU groups, matching SMMU/AMD IOMMU tests.

The IntelVtdAcpiConfig now takes a Vec<IntelVtdDeviceScope> instead of assuming a fixed single-entry scope, keeping the ACPI builder agnostic to root complex topology.
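The per-root-port scope emission from the first fix might be sketched as below. This assumes the usual DMAR device scope layout of a 6-byte header (type, length, two reserved bytes, enumeration ID, start bus number) followed by one 2-byte (device, function) path element; the function names are hypothetical and do not reflect the IntelVtdDeviceScope type's real shape.

```rust
/// Serializes one DMAR device scope entry of type 0x02 (PCI sub-hierarchy)
/// with a single path element. Layout assumed: type, length, 2 reserved
/// bytes, enumeration ID, start bus, then (device, function).
fn sub_hierarchy_scope(start_bus: u8, device: u8, function: u8) -> [u8; 8] {
    [0x02, 8, 0, 0, 0, start_bus, device, function]
}

/// Emits one scope entry per root port, rather than a single entry for
/// device 0 function 0. `ports` holds hypothetical (device, function) pairs.
fn scopes_for_root_ports(start_bus: u8, ports: &[(u8, u8)]) -> Vec<u8> {
    ports
        .iter()
        .flat_map(|&(dev, func)| sub_hierarchy_scope(start_bus, dev, func))
        .collect()
}
```

Each entry covers the root port and everything beneath it, which is what lets a driver that walks up the PCI hierarchy find a matching ancestor for devices behind switches.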
Pull request overview
Adds an Intel VT-d (DMAR/DRHD-discovered) IOMMU emulator and integrates it into OpenVMM’s x86 PCIe pipeline so devices can use DMA translation and interrupt remapping through the existing iommu_common wiring model.
Changes:
- Introduces a new intel_vtd device crate implementing VT-d MMIO registers, DMA translation, interrupt remapping, invalidation, and fault reporting.
- Extends ACPI generation to emit a DMAR table (new acpi_spec::dmar + vmm_core builder support) and plumbs a new --intel-vtd CLI flag through config, layout, and dispatch wiring.
- Adds/refactors VMM tests to validate mixed-topology IOMMU behavior and includes a new Intel VT-d mixed-topology test.
Reviewed changes
Copilot reviewed 23 out of 25 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| vmm_tests/vmm_tests/tests/tests/multiarch/pcie.rs | Refactors shared IOMMU validation and adds intel_vtd_mixed_topology test coverage. |
| vmm_core/src/acpi_builder.rs | Adds DMAR table construction and replaces x86 IOMMU config plumbing with a unified enum. |
| vm/devices/iommu/intel_vtd/src/lib.rs | Implements the VT-d emulated device (MMIO, translation, IR, invalidation queue, fault recording) plus extensive unit tests. |
| vm/devices/iommu/intel_vtd/src/spec/mod.rs | Exposes VT-d spec-derived submodules. |
| vm/devices/iommu/intel_vtd/src/spec/registers.rs | Defines VT-d MMIO register offsets and bitfield layouts. |
| vm/devices/iommu/intel_vtd/src/spec/root_context.rs | Defines VT-d root/context table entry formats and enums. |
| vm/devices/iommu/intel_vtd/src/spec/pte.rs | Defines second-level page table entry format and helpers. |
| vm/devices/iommu/intel_vtd/src/spec/irte.rs | Defines interrupt remapping table entry format and helpers. |
| vm/devices/iommu/intel_vtd/src/spec/invalidation.rs | Defines queued invalidation descriptor formats and parsing helpers. |
| vm/devices/iommu/intel_vtd/Cargo.toml | Adds the new intel_vtd crate to the workspace. |
| vm/acpi_spec/src/lib.rs | Exposes the new DMAR module. |
| vm/acpi_spec/src/dmar.rs | Adds DMAR/DRHD/device-scope structure definitions for table serialization. |
| petri/src/vm/openvmm/modify.rs | Adds Petri builder helper with_intel_vtd. |
| openvmm/openvmm_entry/src/cli_args.rs | Adds --intel-vtd CLI argument (x86_64). |
| openvmm/openvmm_entry/src/lib.rs | Wires --intel-vtd into VM config and validates exclusivity with --amd-iommu. |
| openvmm/openvmm_defs/src/config.rs | Extends PcieIommuConfig with IntelVtd. |
| openvmm/openvmm_core/src/worker/memory_layout.rs | Allocates per-unit VT-d MMIO ranges in the layout engine. |
| openvmm/openvmm_core/src/worker/dispatch/pcie_wiring.rs | Generalizes x86 IOMMU MSI/DMA wrapping to support both AMD-Vi and Intel VT-d. |
| openvmm/openvmm_core/src/worker/dispatch/intel_vtd_wiring.rs | Adds VT-d device instantiation + DMAR scope config generation. |
| openvmm/openvmm_core/src/worker/dispatch.rs | Integrates VT-d resource resolution, device setup, and ACPI config emission. |
| openvmm/openvmm_core/Cargo.toml | Adds intel_vtd dependency to OpenVMM core. |
| openhcl/underhill_core/src/worker.rs | Updates ACPI arch config field name (amd_iommu → iommu). |
| openhcl/underhill_core/src/loader/mod.rs | Updates ACPI arch config field name (amd_iommu → iommu). |
| Cargo.toml | Adds intel_vtd to [workspace.dependencies]. |
| Cargo.lock | Adds intel_vtd package entries and dependency edges. |
Comment on lines +836 to +838

    // HAW field is width - 1 (e.g. 48-bit → 0x2F).
    let haw = dmar_config.host_address_width - 1;
Comment on lines +1000 to +1004

    /// Enable Intel VT-d IOMMU emulation on specified root complexes.
    /// Repeat for each root complex that should have an IOMMU, e.g.:
    /// --intel-vtd rc0 --intel-vtd rc1
    /// Mutually exclusive with --amd-iommu on the same root complex.
    /// Requires --pcie-root-complex.
This adds a full Intel VT-d (Virtualization Technology for Directed I/O) IOMMU emulator to OpenVMM, bringing Intel IOMMU support on par with the existing AMD IOMMU and ARM SMMU emulators.
Unlike the AMD IOMMU (which is a PCI device), VT-d is a pure MMIO platform device discovered via the ACPI DMAR table and has no PCI config space. The implementation is based on the Intel VT-d Specification Rev 4.1 (legacy mode).
The emulator implements the complete VT-d pipeline: MMIO register emulation with 4-byte and 8-byte naturally aligned access dispatch, GCMD/GSTS command processing with spec-mandated ordering validation, second-level page table walking for DMA address translation (IOVA → GPA) with 4KB/2MB/1GB page support, interrupt remapping with IRTE source validation, queued invalidation with completion interrupts, and fault recording with MSI delivery. Per-device VtdTranslator and VtdSignalMsi wrappers integrate with the existing iommu_common traits so PCI devices use VT-d identically to how they use the AMD IOMMU or SMMU.

Chipset wiring follows the same pattern as the AMD IOMMU: a --intel-vtd CLI flag (mutually exclusive with --amd-iommu per root complex), memory layout allocation, and an intel_vtd_wiring module parallel to amd_iommu_wiring. DMAR ACPI table generation is integrated into the existing ACPI builder, emitting one DRHD per VT-d unit with per-root-port PCI sub-hierarchy device scope entries so Linux's intel-iommu driver correctly associates devices to IOMMU groups.

The crate includes 34 unit tests covering register behavior, GCMD sequencing, RW1C semantics, access size validation, and register-based invalidation, plus a comprehensive end-to-end integration test that exercises the full translation and interrupt remapping stack. A VMM integration test (intel_vtd_mixed_topology) boots a Linux guest with two PCIe root complexes (one with VT-d and one without) and verifies that Linux discovers the IOMMU, creates IOMMU groups, and that DMA and MSI work correctly through the IOMMU.