Skip to content

docs: explain phoneHomeEnabled flag and the phone-home flow#3007

Open
CTOmari360 wants to merge 1 commit into
NVIDIA:mainfrom
CTOmari360:docs/explain-phone-home-enabled
Open

docs: explain phoneHomeEnabled flag and the phone-home flow#3007
CTOmari360 wants to merge 1 commit into
NVIDIA:mainfrom
CTOmari360:docs/explain-phone-home-enabled

Conversation

@CTOmari360

Copy link
Copy Markdown
Contributor

Documents the phoneHomeEnabled flag, which was previously a one-liner ("Enable phone-home on first boot") across the operating-system and instance docs with no explanation of what it does, what NICo injects, or how the callback works.

The explanation is traced from the code, not inferred:

  • Readiness gate. When enabled, NICo holds the instance in a provisioning state and does not report it Ready until the booted OS phones home. This is the carbide tenant-state gate in crates/rpc/src/model/instance/status/tenant.rs (phone_home_pending = phone_home_enrolled && phone_home_last_contact.is_none()), fed by instance_config.os.phone_home_enabled in crates/rpc/src/model/instance/status.rs. The proto note on InstanceOperatingSystemConfig.phone_home_enabled says the same: "the instance will not transition to a Ready state until InstancePhoneHomeLastContact is updated."
  • NICo injects nothing. User-data is served verbatim by the metadata endpoint (USER_DATA_CATEGORY => metadata.user_data.clone() in crates/agent/src/instance_metadata_endpoint.rs); no code mutates user-data based on the flag. The OS must supply a cloud-init phone_home module that POSTs to the link-local metadata endpoint http://169.254.169.254:7777/latest/meta-data/phone_home (the canonical example appears in rest-api/openapi/spec.yaml). The update_phone_home_last_contact handler (crates/api-core/src/handlers/instance.rs) authorizes the calling host/DPU and records the contact timestamp (UPDATE instances SET phone_home_last_contact=now()), which releases the gate.
  • Guidance + caveats. Directly answers the issue's ISV use case (delay Ready until cloud-init finishes applying SSH keys/passwords), and documents the gotchas: enabling the flag without adding a phone_home module leaves the instance never-ready, and a one-time custom-iPXE reboot re-arms the gate (clear_phone_home_last_contact when boot_with_custom_ipxe && phone_home_enabled).

Adds a Phone-home section to docs/configuration/tenant_management.md (the page that already covers the instance lifecycle) and links the operating-system create / operating-system update CLI reference pages to it.

Related issues

Fixes #2790

Type of Change

  • Internal - Internal changes (refactoring, tests, docs, etc.)

Breaking Changes

  • This PR contains breaking changes

Testing

  • No testing required (docs, internal refactor, etc.)

Docs-only change. The added relative link (../../../../configuration/tenant_management.md#phone-home) resolves to the existing page and anchor.

Additional Notes

Scoped to documentation only. The flag descriptions in the two CLI reference pages now point at one authoritative explanation rather than duplicating it.

@CTOmari360 CTOmari360 requested a review from a team as a code owner June 30, 2026 03:33
@copy-pr-bot

copy-pr-bot Bot commented Jun 30, 2026

Copy link
Copy Markdown

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@coderabbitai

coderabbitai Bot commented Jun 30, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

Summary by CodeRabbit

  • Documentation
    • Added a new “Phone-home” subsection explaining phone-home enablement, readiness/status gating, and how it interacts with NICo during provisioning.
    • Documented that NICo does not inject anything into user-data, and that the OS must include a cloud-init phone_home configuration targeting the required endpoint.
    • Expanded help text for operating-system create and operating-system update --phone-home-enabled, including guidance and key caveats (e.g., link-local reachability and re-arming on reboot).

Walkthrough

Three documentation files are updated to explain the phoneHomeEnabled / --phone-home-enabled behavior. The tenant management guide gains a new “Phone-home” section, and the two operating-system CLI references expand their option text with provisioning-state, cloud-init, and endpoint details.

Phone-home documentation expansion

Layer / File(s) Summary
Phone-home reference and CLI option docs
docs/configuration/tenant_management.md, docs/manuals/nico-admin-cli/commands/operating-system/operating-system-create.md, docs/manuals/nico-admin-cli/commands/operating-system/operating-system-update.md
New "Phone-home" subsection in tenant_management.md documents readiness gating behavior, required cloud-init phone_home configuration targeting NICo's link-local metadata endpoint, callback/release semantics, and caveats (indefinite provisioning, link-local reachability, reboot re-arming). Both CLI reference pages replace the single-line --phone-home-enabled description with multi-line provisioning-state semantics, cloud-init module requirements, absence of user-data injection, and a cross-reference link to the new guide section.

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~5 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title is concise and accurately summarizes the main docs change about the phone-home flow.
Description check ✅ Passed The description is directly about the same documentation updates and stays on-topic.
Linked Issues check ✅ Passed The docs address the requested gaps in behavior, injected data, guidance, and affected pages [#2790].
Out of Scope Changes check ✅ Passed No unrelated code or documentation changes appear beyond the requested phone-home documentation updates.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands.

The phoneHomeEnabled flag was documented as a one-liner ("Enable
phone-home on first boot") with no explanation of what it does, what (if
anything) NICo injects, or how the phone-home callback works. Document
the mechanism, sourced from the code:

- It gates instance readiness: when enabled, the instance is held in a
  provisioning state and not reported Ready until the booted OS contacts
  NICo's metadata service (the carbide tenant-state gate in
  crates/rpc/src/model/instance/status/tenant.rs; proto note on
  InstanceOperatingSystemConfig.phone_home_enabled).
- NICo injects nothing: user-data is served verbatim
  (crates/agent/src/instance_metadata_endpoint.rs). The OS must include a
  cloud-init phone_home module posting to the link-local metadata
  endpoint http://169.254.169.254:7777/latest/meta-data/phone_home, which
  records the contact timestamp and releases the gate.
- Add usage guidance (delay readiness until cloud-init applies SSH
  keys/passwords) and caveats (no module => never ready; custom-iPXE
  reboot re-arms the gate).

Adds a Phone-home section to docs/configuration/tenant_management.md and
links the operating-system create/update CLI reference pages to it.

Fixes NVIDIA#2790

Signed-off-by: Omar Refai <omar@refai.org>
@CTOmari360 CTOmari360 force-pushed the docs/explain-phone-home-enabled branch from 07feed4 to 6951cde Compare June 30, 2026 03:37

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/configuration/tenant_management.md`:
- Around line 594-617: Clarify the precedence of phoneHomeEnabled in the tenant
management docs: update the phone-home section to explicitly state that an
instance-level setting overrides the operating-system default, and that the
OS-level value is only inherited when the instance does not specify one. Make
this change in the phoneHomeEnabled description and keep the wording aligned
with the existing instance create/update and operating-system create/update
references.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 2677088f-c21c-4d49-86c0-a82267d33700

📥 Commits

Reviewing files that changed from the base of the PR and between 07feed4 and 6951cde.

📒 Files selected for processing (3)
  • docs/configuration/tenant_management.md
  • docs/manuals/nico-admin-cli/commands/operating-system/operating-system-create.md
  • docs/manuals/nico-admin-cli/commands/operating-system/operating-system-update.md
✅ Files skipped from review due to trivial changes (1)
  • docs/manuals/nico-admin-cli/commands/operating-system/operating-system-create.md
🚧 Files skipped from review as they are similar to previous changes (1)
  • docs/manuals/nico-admin-cli/commands/operating-system/operating-system-update.md

Comment on lines +594 to +617
`phoneHomeEnabled` (`--phone-home-enabled`) controls whether NICo waits for the booted operating system to call back -- to "phone home" -- before it reports the instance as ready. It is a property of the operating-system definition (`operating-system create` / `operating-system update`) and can also be set per instance (`instance create` / `instance update`).

**What it does.** When phone-home is enabled, NICo does not consider the instance fully provisioned until the booted OS contacts NICo's metadata service from inside the guest. The instance is held in a provisioning state -- the transition to `Ready` is gated -- until that callback arrives. When it is disabled, NICo reports the instance ready as soon as provisioning and config sync finish, without waiting for any signal from the OS.

**What NICo injects: nothing.** Enabling the flag does *not* add anything to your `userData`. NICo serves your cloud-init user-data to the booting host verbatim. To make the host actually phone home, your user-data must include a cloud-init [`phone_home`](https://cloudinit.readthedocs.io/en/latest/reference/modules.html#phone-home) module that POSTs to NICo's link-local metadata endpoint:

```yaml
#cloud-config
# ... your first-boot setup (SSH keys, passwords, packages, ...) ...
phone_home:
url: http://169.254.169.254:7777/latest/meta-data/phone_home
post: all
```

cloud-init runs the `phone_home` module in its final stage, after the rest of your configuration has been applied, so the callback fires only once your setup has completed. When NICo receives the POST it records the contact and releases the readiness gate, allowing the instance to become `Ready`.

**When to use it.** Enable phone-home when "ready" must mean "the guest has finished its own first-boot setup" -- for example, to delay readiness until cloud-init has applied SSH keys and passwords, so that automation watching for `Ready` does not connect before the host is fully configured. Leave it disabled when NICo finishing provisioning is a sufficient ready signal and you do not need a callback from inside the guest.

**Notes and caveats.**

- Phone-home only gates *status reporting*. It does not change how the host is provisioned or booted -- the reboot into the provisioned OS happens identically whether or not phone-home is enabled.
- If you enable phone-home but never include a `phone_home` module in your user-data, the instance stays in its provisioning state indefinitely and never reports ready.
- The metadata endpoint (`169.254.169.254:7777`) is reachable only from the provisioned host over its link-local metadata link; it is not a tenant-facing API.
- Rebooting the instance with a one-time custom iPXE override (`instance update --reboot-with-custom-ipxe=true`) re-arms the gate when phone-home is enabled: NICo clears the recorded contact, so the OS must phone home again before the instance is reported ready.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
rg -n -C 3 --hidden -g '!**/target/**' 'phoneHomeEnabled|phone-home-enabled' .

Repository: NVIDIA/infra-controller

Length of output: 50381


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Show the doc section around the reviewed lines.
sed -n '594,617p' docs/configuration/tenant_management.md

echo
echo '---'
echo

# Show the merge logic that resolves phoneHomeEnabled for instance creation/update.
sed -n '2068,2095p' rest-api/api/pkg/api/handler/instance.go

echo
echo '---'
echo

# Show the update-path precedence comment for phoneHomeEnabled.
sed -n '1228,1245p' rest-api/api/pkg/api/model/instance.go

Repository: NVIDIA/infra-controller

Length of output: 4532


🏁 Script executed:

#!/bin/bash
set -euo pipefail

sed -n '594,617p' docs/configuration/tenant_management.md
echo '---'
sed -n '2068,2095p' rest-api/api/pkg/api/handler/instance.go
echo '---'
sed -n '1228,1245p' rest-api/api/pkg/api/model/instance.go

Repository: NVIDIA/infra-controller

Length of output: 4528


State the precedence for phoneHomeEnabled. The text says the flag can be set both on the operating system and per instance, but it should explicitly say that the instance value overrides the OS default, and that the OS value is inherited only when the instance omits it.

🧰 Tools
🪛 LanguageTool

[style] ~614-~614: ‘whether or not’ might be wordy. Consider a shorter alternative.
Context: ... the provisioned OS happens identically whether or not phone-home is enabled. - If you enable ...

(EN_WORDINESS_PREMIUM_WHETHER_OR_NOT)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/configuration/tenant_management.md` around lines 594 - 617, Clarify the
precedence of phoneHomeEnabled in the tenant management docs: update the
phone-home section to explicitly state that an instance-level setting overrides
the operating-system default, and that the OS-level value is only inherited when
the instance does not specify one. Make this change in the phoneHomeEnabled
description and keep the wording aligned with the existing instance
create/update and operating-system create/update references.

Source: Path instructions


**What it does.** When phone-home is enabled, NICo does not consider the instance fully provisioned until the booted OS contacts NICo's metadata service from inside the guest. The instance is held in a provisioning state -- the transition to `Ready` is gated -- until that callback arrives. When it is disabled, NICo reports the instance ready as soon as provisioning and config sync finish, without waiting for any signal from the OS.

**What NICo injects: nothing.** Enabling the flag does *not* add anything to your `userData`. NICo serves your cloud-init user-data to the booting host verbatim. To make the host actually phone home, your user-data must include a cloud-init [`phone_home`](https://cloudinit.readthedocs.io/en/latest/reference/modules.html#phone-home) module that POSTs to NICo's link-local metadata endpoint:

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@thossain-nv I thought we were injecting phone_home into user data?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

docs: phoneHomeEnabled flag lacks explanation of what it injects and how it works

3 participants