
[Bug report] Published images on Docker Hub install Gravitino at /root/gravitino, do not reflect current Dockerfile (PR #10681) #11267

@markhoerth

Version

main branch

Describe what's wrong

A default install of the chart (helm install gravitino dev/charts/gravitino/) against any currently published Gravitino image tag fails: the pod enters CrashLoopBackOff within seconds.

The Helm chart and the Dockerfiles in source both correctly use /opt/gravitino as the install location and run as non-root UID 1000, per PR #10681 (#10633, April 15, 2026). However, the published image artifacts on Docker Hub still install Gravitino at /root/gravitino from a pre-PR-#10681 build. The published artifacts do not reflect the current Dockerfile in source, creating a chart-vs-image mismatch that breaks every default install.

Reproduction

helm install gravitino dev/charts/gravitino/ -n gravitino-test --create-namespace
kubectl get pods -n gravitino-test
# Pod enters CrashLoopBackOff within seconds

Main container crash log:
cp: target '/opt/gravitino/conf' is not a directory
Start the Gravitino Server
/bin/bash: /opt/gravitino/bin/start-gravitino.sh: No such file or directory

Init container log:
cp: cannot stat '/opt/gravitino/scripts/': No such file or directory
ls: cannot access '/opt/gravitino/libs/gravitino-server-': No such file or directory

Root cause

PR #10681 (#10633) implemented a synchronized migration: /root/gravitino* paths moved to /opt/gravitino* across both the Dockerfiles in dev/docker/*/ and the Helm chart in dev/charts/gravitino/, and containers switched from root to non-root UID 1000.

Both the chart-side changes and the Dockerfile changes landed correctly. However, the published image artifacts were never rebuilt from the post-PR-#10681 Dockerfile:
$ docker run --rm --entrypoint=sh apache/gravitino:1.3.0-SNAPSHOT -c \
    "find / -name 'gravitino-server-*.jar' 2>/dev/null | head -3"
/root/gravitino/iceberg-rest-server/libs/gravitino-server-common-1.3.0-SNAPSHOT.jar
/root/gravitino/libs/gravitino-server-common-1.3.0-SNAPSHOT.jar
/root/gravitino/libs/gravitino-server-1.3.0-SNAPSHOT.jar
$ docker run --rm --entrypoint=sh apache/gravitino:1.2.0 -c \
    "find / -name 'gravitino-server-*.jar' 2>/dev/null | head -3"
/root/gravitino/iceberg-rest-server/libs/gravitino-server-common-1.2.0.jar
/root/gravitino/libs/gravitino-server-common-1.2.0.jar
/root/gravitino/libs/gravitino-server-1.2.0.jar
$ docker run --rm --entrypoint=ls apache/gravitino:1.3.0-SNAPSHOT /opt/gravitino
ls: cannot access '/opt/gravitino': No such file or directory
$ docker inspect apache/gravitino:1.3.0-SNAPSHOT | jq '.[0].Config | {Entrypoint, WorkingDir}'
{
  "Entrypoint": ["/bin/bash", "/root/gravitino/bin/start-gravitino.sh"],
  "WorkingDir": "/root/gravitino"
}

So the published artifacts predate PR #10681. The chart correctly assumes the post-#10681 layout (/opt/gravitino + non-root user), but the artifacts customers and CI pull from Docker Hub reflect the pre-#10681 layout (/root/gravitino + root).
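
To repeat the check above across several tags, a small helper along these lines may be useful. It is only a sketch: classify_layout and report_layouts are hypothetical names, the "pre-10681" / "post-10681" labels are shorthand for the two layouts described in this report, and report_layouts assumes the images have already been pulled locally so that docker inspect can read them.

```shell
# Classify an install path by layout generation.
# Prints "post-10681" for paths at or under /opt/gravitino,
# "pre-10681" for paths at or under /root/gravitino,
# and "unknown" for anything else.
classify_layout() {
  case "$1" in
    /opt/gravitino|/opt/gravitino/*)   echo "post-10681" ;;
    /root/gravitino|/root/gravitino/*) echo "pre-10681" ;;
    *)                                 echo "unknown" ;;
  esac
}

# Print one tab-separated line per image: name, WorkingDir, layout label.
# Requires docker and locally available images (assumption).
report_layouts() {
  for image in "$@"; do
    workdir=$(docker inspect --format '{{.Config.WorkingDir}}' "$image") || workdir=""
    printf '%s\t%s\t%s\n' "$image" "${workdir:-?}" "$(classify_layout "$workdir")"
  done
}
```

For example, report_layouts apache/gravitino:1.3.0-SNAPSHOT apache/gravitino:1.2.0 would list both tags tested in this report side by side.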

Permission layer

A customer-side workaround that pointed the chart at the old /root/gravitino path would still fail, because the chart enforces runAsNonRoot: true and runAsUser: 1000 (also from PR #10681). The /root directory has mode 700, accessible only to the root user, so UID 1000 cannot read it:
cp: failed to access '/root/gravitino/conf': Permission denied
/bin/bash: /root/gravitino/bin/start-gravitino.sh: Permission denied

To run the chart against a /root/gravitino image, customers would have to also drop runAsNonRoot: true and set runAsUser: 0, which is a container-security regression unacceptable for enterprise deployments. Both layers (path mismatch + non-root user) are addressed by PR #10681's design. The fix is in source; what is missing is the rebuilt image.
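
The permission reasoning above can be checked mechanically. The following is a minimal sketch, not part of the chart or image: others_can_read_dir is a hypothetical helper, and it assumes UID 1000 is neither the owner of the directory nor a member of its group, so only the "other" permission bits apply. It ignores ACLs and capabilities.

```shell
# Succeeds if the "other" bits of an octal directory mode grant both
# read (4) and search/execute (1), which a non-owner, non-group user
# needs in order to list and enter the directory.
# $1: octal mode such as 700 or 755
others_can_read_dir() {
  mode_other=$(( 0$1 & 7 ))          # keep only the "other" permission digit
  [ $(( mode_other & 5 )) -eq 5 ]    # require r (4) and x (1)
}

# Example: mode 700 grants nothing to "other", so this prints "denied".
if others_can_read_dir 700; then echo "allowed"; else echo "denied"; fi
```

The mode of /root inside an image could be fed into this helper from the output of stat, assuming the image ships a stat binary.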

Affected versions

  • 1.3.0-SNAPSHOT (current)
  • 1.2.0 (verified)
  • Likely all currently published image tags

Suggested resolution

Rebuild and republish images from the current Dockerfile. PR #10681 already implemented the correct image layout in source. The publish pipeline needs to rebuild from HEAD (or the appropriate release branch tip) and push fresh tags to Docker Hub. No design change is required to the Dockerfiles or the chart.

Add a chart-vs-image smoke test to the publish pipeline. A pod-startup check against the freshly built image (for example, helm install --wait followed by a pod status check), run before promoting the image tag, would have caught this. A helm install --dry-run alone would not, because it only renders the manifests and never starts the container. Without such a check, the chart-vs-image contract drifts silently between releases. This recommendation is independent of the immediate rebuild fix.
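
A pod-startup check of the kind recommended above could be sketched as follows. This is an illustration under assumptions, not the project's pipeline: all_pods_running and smoke_test are hypothetical names, the 120s timeout is arbitrary, and the sketch does not show how to point the chart at the freshly built image because that depends on the chart's values. It also treats any STATUS other than Running as a failure, which is a simplification.

```shell
# Reads `kubectl get pods --no-headers` output on stdin.
# Succeeds only if at least one pod is listed and every pod's
# STATUS column (third field) is exactly "Running".
all_pods_running() {
  awk 'BEGIN { n = 0; bad = 0 }
       NF >= 3 { n++; if ($3 != "Running") bad++ }
       END { exit ((n > 0 && bad == 0) ? 0 : 1) }'
}

# Installs the chart and verifies that its pods actually start.
# $1: namespace (hypothetical, e.g. gravitino-test)
# $2: chart path (e.g. dev/charts/gravitino/)
# Requires helm and kubectl with access to a test cluster.
smoke_test() {
  helm install gravitino "$2" -n "$1" --create-namespace \
    --wait --timeout 120s || return 1
  kubectl get pods -n "$1" --no-headers | all_pods_running
}
```

A pipeline step could call smoke_test gravitino-test dev/charts/gravitino/ and refuse to promote the image tag when it returns non-zero.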

Environment

  • Helm 3.20.2
  • Docker Desktop Kubernetes v1.34.1
  • Images tested: apache/gravitino:1.3.0-SNAPSHOT (digest sha256:dae8023c7f3d...) and apache/gravitino:1.2.0
  • Host: WSL2 Ubuntu

Previous occurrence

A previous instance of this bug was filed and closed in favor of republishing the SNAPSHOT image. The republished image still installed at /root/gravitino, so the prior resolution did not actually update the build inputs (the rebuild used the pre-PR-#10681 Dockerfile). This is the second time the same root cause has been masked by an incomplete fix. The chart-vs-image smoke test recommendation above is intended to prevent a third recurrence.

Labels

1.3.0 (Release v1.3.0), bug (Something isn't working)
