pull from upstream #3
Open
fvalle1 wants to merge 1828 commits into
@Paolo-Beci @iliy27 let's rebase this :)
Adds a ChannelsBuilder that emits /etc/kubernetes/manifests/kops-channels.manifest. The pod runs one container per channel URL on a 60s interval; the bootstrap-channel container additionally patches the local node with control-plane labels via --bootstrap-node-labels and the downward API. The pod is system-node-critical because it owns the labels addons target for scheduling, and uses hostNetwork so VFS can reach the cloud metadata service before CNI is up. At this commit the static pod and protokube both apply channels in parallel; that is safe because apply is idempotent via manifest-hash annotations. The protokube side is removed in the next commit.
Now that the kops-channels static pod owns both responsibilities, drop the protokube-side reconciliation: the channels exec wrapper, the --channels and --node-name flags, the labeler call, and the host-side install of /opt/kops/bin/channels in the nodeup builder. The KubeBoot struct sheds Channels and NodeName; the sync loop is now an idle keep-alive for the gossip goroutines and will be removed alongside the legacy gossip code path.
The first apply fails while a control-plane node's apiserver is still starting; retry every 5s until it succeeds rather than waiting a full interval, which delays cluster bootstrap. Also reuse a cached kube client per iteration.
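The retry behaviour described above can be sketched as a small delay function. This is a minimal illustration, not the code in the commit: `nextDelay`, `errNotReady`, and both constant names are hypothetical, and the choice to fall back to the full interval for failures after the first success is an assumption the commit message does not confirm.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

const (
	retryInterval = 5 * time.Second  // fast retry from the commit message
	applyInterval = 60 * time.Second // regular channel interval (60s per the static pod commit)
)

// errNotReady is a hypothetical error standing in for "apiserver still starting".
var errNotReady = errors.New("apiserver not ready")

// nextDelay returns how long to wait before the next apply attempt.
// Until the first apply has succeeded, failures are retried quickly;
// in every other case the loop waits the regular interval (assumption).
func nextDelay(succeededOnce bool, err error) time.Duration {
	if err != nil && !succeededOnce {
		return retryInterval
	}
	return applyInterval
}

func main() {
	fmt.Println(nextDelay(false, errNotReady)) // first apply failed
	fmt.Println(nextDelay(false, nil))         // first apply succeeded
	fmt.Println(nextDelay(true, errNotReady))  // later failure
}
```

Keeping the decision in a pure function makes the bootstrap timing easy to unit test without a running apiserver.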
The kubelet maxPods calculation runs for AmazonVPC and Cilium-ENI networking and falls back to DefaultMachineType when the IMDS instance-type lookup fails. NewConfig only set DefaultMachineType for AmazonVPC, so a Cilium-ENI node would dereference a nil pointer if IMDS was unavailable.
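A reduced sketch of the failure mode and the fix follows. The `config` type, `usesENI`, `newConfig`, `resolveMachineType`, and the networking strings are hypothetical stand-ins rather than the nodeup types; only the shape of the bug (a default that is set for one networking mode but read for two) comes from the commit message.

```go
package main

import (
	"errors"
	"fmt"
)

// config is a hypothetical stand-in for the nodeup config.
type config struct {
	Networking         string
	DefaultMachineType *string
}

// errIMDS is a hypothetical error for a failed instance-type lookup.
var errIMDS = errors.New("IMDS unavailable")

// usesENI reports whether the maxPods calculation runs for this networking mode.
func usesENI(networking string) bool {
	return networking == "amazonvpc" || networking == "cilium-eni"
}

// newConfig sets the default for every mode that may need the fallback,
// not only for AmazonVPC.
func newConfig(networking, defaultType string) *config {
	c := &config{Networking: networking}
	if usesENI(networking) {
		c.DefaultMachineType = &defaultType
	}
	return c
}

// resolveMachineType prefers the IMDS result and falls back to the default.
// It returns an error instead of dereferencing a nil default.
func resolveMachineType(c *config, imdsType string, imdsErr error) (string, error) {
	if imdsErr == nil {
		return imdsType, nil
	}
	if c.DefaultMachineType == nil {
		return "", fmt.Errorf("no default machine type set: %w", imdsErr)
	}
	return *c.DefaultMachineType, nil
}

func main() {
	c := newConfig("cilium-eni", "m5.large")
	fmt.Println(resolveMachineType(c, "", errIMDS))
}
```

The explicit nil check is an extra safeguard in this sketch; the commit itself is described only as populating the default.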
Signed-off-by: Ciprian Hacman <ciprian@hakman.dev>
Use kms:ViaService condition on KMS data actions
Signed-off-by: Moshe Vayner <moshe@vayner.me>
chore(channels): bump k8s versions in alpha channel
Adds the first Linode cloudup infrastructure task: VPC create/update support. This intentionally does not add Linode instances, volumes, load balancers, DNS, or full cluster bring-up support. Those should land in follow-up PRs. Signed-off-by: Moshe Vayner <moshe@vayner.me>
linode: Add VPC cloudup task
nodeup: populate DefaultMachineType for Cilium-ENI clusters
Update coredns to v1.14.3
Signed-off-by: Ciprian Hacman <ciprian@hakman.dev>
Downgrade coredns to v1.14.2
Upgrade containerd to v2.3.0
aws: Use amazonaws.com suffix for kms:ViaService in all partitions
channels: move from protokube to a static pod
chore(channels): promote to stable, bump node images, update recommended kOps versions
Bumps [actions/checkout](https://github.com/actions/checkout) from 6.0.3 to 7.0.0.
- [Release notes](https://github.com/actions/checkout/releases)
- [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
- [Commits](actions/checkout@df4cb1c...9c091bb)

---
updated-dependencies:
- dependency-name: actions/checkout
  dependency-version: 7.0.0
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
…ctions/checkout-7.0.0 build(deps): bump actions/checkout from 6.0.3 to 7.0.0
In e2e, `kops create cluster --channel=alpha` reads the channel from the kops master branch, so a PR's edits to channels/alpha or channels/stable are never exercised by its own e2e jobs. When kops is built from the PR checkout, the deployer now rewrites --channel to a file:// path into that checkout's channels/ directory (defaulting to alpha when --channel is unset), so the build uses the PR's channels. Downloaded release/marker binaries don't match the checkout and keep using master's channels.
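The rewrite described above amounts to mapping a channel name to a path under the checkout. The sketch below is illustrative only: `localChannel` is a hypothetical name, the checkout path is made up, and leaving values that already contain a URL scheme untouched is an assumption, not something the commit message states.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// localChannel maps a channel name to a file:// URL inside the checkout's
// channels/ directory. An empty channel defaults to alpha, as in the commit
// message. Values that already look like a URL are returned unchanged
// (assumption).
func localChannel(checkoutDir, channel string) string {
	if channel == "" {
		channel = "alpha"
	}
	if strings.Contains(channel, "://") {
		return channel
	}
	return "file://" + path.Join(checkoutDir, "channels", channel)
}

func main() {
	fmt.Println(localChannel("/src/kops", ""))
	fmt.Println(localChannel("/src/kops", "stable"))
}
```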
scaletest: bind etcd metrics to all interfaces
e2e: test the PR's own channels, not master's
The externalTrafficPolicy=Local source-IP-preservation tests only fail on Cilium (the client IP is SNATed to a pod IP instead of being preserved), tracked upstream in cilium/cilium#37613. Move the "implement NodePort and HealthCheckNodePort correctly when ExternalTrafficPolicy changes" skip into the Cilium block next to its sibling so other CNIs run the test. The hostNetwork "function for service endpoints" test was fixed in k8s 1.37 by kubernetes/kubernetes#139819 (it now reads spec.nodeName via the Downward API instead of os.Hostname()), so drop its skip gate from < 1.38 to < 1.37. Also clean up stale/incorrect issue references in the surrounding comments (wrong Azure issue, superseded hostname WIP PR, and the unrelated #129221).
tests/e2e: refine externalTrafficPolicy=Local and hostNetwork skips
Grow slices for explicit indexes while processing --set paths, so paths like cluster.spec.addons[0].manifest can create the first element.
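The idea of growing a slice so that an explicit index becomes addressable can be shown on a plain string slice. This is a simplified sketch with a hypothetical `setAt` helper; the real --set handling works on arbitrary struct fields, and padding skipped positions with zero values is an assumption here.

```go
package main

import "fmt"

// setAt stores v at index i, appending zero values until the index exists.
// Negative indexes are rejected.
func setAt(s []string, i int, v string) ([]string, error) {
	if i < 0 {
		return s, fmt.Errorf("negative index %d", i)
	}
	for len(s) <= i {
		s = append(s, "") // grow with zero values up to the requested index
	}
	s[i] = v
	return s, nil
}

func main() {
	// Starting from a nil slice, index 0 creates the first element.
	addons, err := setAt(nil, 0, "gateway-api.yaml")
	fmt.Println(addons, err)
}
```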
Configure the scenario to install Gateway API CRDs via cluster.spec.addons, using the Gateway API version documented by Istio 1.29.
Allow setting missing slice elements from the command line
Signed-off-by: Ciprian Hacman <ciprian@hakman.dev>
Add managed Karpenter EC2NodeClass and NodePool
Preparation for something like #18495. Moves away from direct comparison (== or !=) on the IG role in favor of helper methods such as HasNode() or HasControlPlane(). Also adds a hack test so we don't backtrack. This should help prepare for supporting more control plane roles.
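A small sketch of why helpers are easier to extend than direct comparisons. The `role` and `instanceGroup` types and the role strings below are hypothetical, and the receiver the real HasNode() and HasControlPlane() methods live on is not stated in the commit message.

```go
package main

import "fmt"

// role and its values are hypothetical stand-ins for the IG role type.
type role string

const (
	roleControlPlane role = "ControlPlane"
	roleNode         role = "Node"
)

// instanceGroup is a hypothetical stand-in for an instance group.
type instanceGroup struct {
	Role role
}

// HasControlPlane hides the comparison, so callers keep working if more
// control plane roles are introduced later.
func (g instanceGroup) HasControlPlane() bool { return g.Role == roleControlPlane }

// HasNode reports whether the group runs worker nodes.
func (g instanceGroup) HasNode() bool { return g.Role == roleNode }

func main() {
	ig := instanceGroup{Role: roleControlPlane}
	// Call sites ask a question instead of comparing against a constant.
	fmt.Println(ig.HasControlPlane(), ig.HasNode())
}
```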
The "Services should implement NodePort and HealthCheckNodePort correctly when ExternalTrafficPolicy changes" test was previously gated to Cilium only, but the e2e-kops-aws-cni-* periodic jobs show it also fails on flannel, kopeio and kube-router: the client source IP is SNATed to a pod IP instead of being preserved (kube-router instead times out reaching the local endpoint). It is the sole failure in those three jobs' latest runs. Move it out of the Cilium block into a condition covering cilium, flannel, kopeio and kube-router. amazon-vpc, calico and kindnet preserve the source IP and continue running the test. The sibling "externalTrafficPolicy=Local for type=NodePort" test passes on every non-Cilium CNI, so it stays gated to Cilium only.
…port tests/e2e: skip implement-NodePort ETP=Local test on more CNIs
Azure retired the pinned Ubuntu 24.04 daily images from the uksouth marketplace, so VMSS creation fails with PlatformImageNotFound and every Azure e2e job dies in the Up phase: The platform image 'Canonical:ubuntu-24_04-lts:server:24.04.202606120' is not available. `az vm image show` (the query path a deployment uses) confirms 24.04.202606120 (amd64) and 24.04.202606110 (arm64) are no longer available, while `az vm image list` still lists them from a stale catalog. 24.04.202606060 is the newest version that `az vm image show` confirms deployable for both server and server-arm64, so pin both arches to it.
chore(channels): pin Azure noble image to a deployable version
Switching from comparison on Role to helper.
The "Services should implement NodePort and HealthCheckNodePort correctly when ExternalTrafficPolicy changes" test fails on calico on GCE but passes on calico on AWS. On GCE the VPC drops packets with arbitrary calico pod-CIDR source/dest addresses, so calico must IPIP-encapsulate inter-node pod traffic (routes go via tunl0). The IPIP/masquerade path rewrites the ETP=Local NodePort traffic's source to the node's tunnel address (a pod-CIDR IP) instead of preserving the client IP. On AWS kops disables the EC2 source/dest check, so calico routes pod traffic natively over the VPC (dev ens5, no encapsulation) and the source IP is preserved. Extend the skip to calico when the cloud provider is GCE. amazon-vpc and kindnet continue running the test on both clouds.
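Taken together with the earlier skip for cilium, flannel, kopeio and kube-router, the resulting condition can be sketched as a predicate. The function name and the string identifiers for CNIs and clouds are assumptions; the actual skip is expressed in the e2e test configuration, not in a function like this.

```go
package main

import "fmt"

// skipETPLocalNodePortTest reports whether the "implement NodePort and
// HealthCheckNodePort correctly when ExternalTrafficPolicy changes" test
// should be skipped for a CNI and cloud combination.
func skipETPLocalNodePortTest(cni, cloud string) bool {
	switch cni {
	case "cilium", "flannel", "kopeio", "kube-router":
		return true // fails on these CNIs per the commit messages
	case "calico":
		return cloud == "gce" // IPIP encapsulation on GCE rewrites the source
	default:
		return false // e.g. amazon-vpc and kindnet keep running the test
	}
}

func main() {
	fmt.Println(skipETPLocalNodePortTest("calico", "gce"))
	fmt.Println(skipETPLocalNodePortTest("calico", "aws"))
}
```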
tests/e2e: also skip implement-NodePort ETP=Local test on calico+GCE
Bumps [actions/setup-go](https://github.com/actions/setup-go) from 6.4.0 to 6.5.0.
- [Release notes](https://github.com/actions/setup-go/releases)
- [Commits](actions/setup-go@4a36011...924ae3a)

---
updated-dependencies:
- dependency-name: actions/setup-go
  dependency-version: 6.5.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
…ctions/setup-go-6.5.0 build(deps): bump actions/setup-go from 6.4.0 to 6.5.0
Make cloud-controller-manager pods tolerate all taints
No new functionality yet. Adds four new role placeholders: etcd, scheduler, ccm, and kcm. Sets up the CLI API as well as the accessor functions.
Add an experimental roles feature flag.
No description provided.