Managing Clusters

This guide covers common management tasks for Xelon Kubernetes Service (XKS) clusters on Xelon HQ, including node operations, access configuration, monitoring, and production best practices.

Node Pool Operations

Resizing a Node Pool

To change the compute resources (CPU, RAM, disk) for nodes in a pool, navigate to the cluster details page, select the target node pool, and click Edit. Adjust the resource values and confirm. Nodes are updated via a rolling process to minimize downtime.

Adding a Node Pool

Click Add Node Pool from the cluster details page to create a new pool with different resource specifications. This is useful for running workloads with distinct resource requirements on the same cluster.

Removing a Node Pool

Select the node pool to remove and click Delete. Workloads running on nodes in the pool are evicted and rescheduled to other available nodes before the pool is removed.

Capacity Planning

Before removing a node pool, verify that remaining pools have sufficient capacity to absorb the rescheduled workloads. Otherwise, pods may remain in a pending state.
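
One way to run this check from the command line is sketched below. The helper names `node_allocation_report` and `pending_pods` are hypothetical and not part of XKS or kubectl. The sketch assumes the current kubeconfig context points at the cluster, and that `kubectl describe nodes` prints its usual "Allocated resources" section.

```shell
# Hypothetical helper: print each node's name followed by its
# "Allocated resources" block (requests and limits vs. allocatable).
node_allocation_report() {
  kubectl describe nodes |
    awk '/^Name:/ {print} /^Allocated resources:/ {n=8} n>0 {print; n--}'
}

# Hypothetical helper: list pods the scheduler has not been able to place.
pending_pods() {
  kubectl get pods --all-namespaces --field-selector=status.phase=Pending
}
```

Run `node_allocation_report` before deleting the pool to compare requested resources against what the remaining nodes can offer, and `pending_pods` afterwards to confirm nothing was left unscheduled.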

Kubeconfig

Downloading the Kubeconfig

The kubeconfig file provides the credentials and endpoint information needed to connect to your cluster using kubectl. Download it from the cluster details page by clicking Download config.

Using the Kubeconfig

Set the KUBECONFIG environment variable to point to the downloaded file:

# Point kubectl to your cluster
export KUBECONFIG=~/Downloads/my-cluster-kubeconfig.yaml

# Verify cluster access
kubectl cluster-info

# List nodes
kubectl get nodes -o wide

Alternatively, merge the kubeconfig into your default configuration:

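# Back up the existing default kubeconfig before merging
cp ~/.kube/config ~/.kube/config.bak

# Merge into default kubeconfig
export KUBECONFIG=~/.kube/config:~/Downloads/my-cluster-kubeconfig.yaml
kubectl config view --merge --flatten > ~/.kube/config.merged
mv ~/.kube/config.merged ~/.kube/config

# Restrict the merged file to your user only
chmod 600 ~/.kube/config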

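# List the available contexts, then switch to your cluster's context
# (my-cluster is a placeholder; use the name shown by get-contexts)
kubectl config get-contexts
kubectl config use-context my-cluster

Security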

Treat kubeconfig files as sensitive credentials. Do not commit them to version control or share them over insecure channels.
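
As a small illustration, the downloaded file can be restricted to its owner. The helper name `secure_kubeconfig` is hypothetical, and the path in the usage note is the placeholder used earlier in this guide.

```shell
# Hypothetical helper: make a kubeconfig readable and writable
# by its owner only (mode 600).
secure_kubeconfig() {
  chmod 600 "$1"
}
```

For example: `secure_kubeconfig ~/Downloads/my-cluster-kubeconfig.yaml`.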

Monitoring Cluster Health

Xelon HQ provides cluster health indicators on the cluster details page. Key metrics to watch include:

| Metric | Description | Action Threshold |
| --- | --- | --- |
| Worker Node Status | Each worker node reports a lifecycle status (Created, Deployed, Running, Changing resources, Upgrading, Deleting, Deleted, or Error), updated live as the node progresses. | Any node stuck in Error |
| Cluster Health | Overall cluster health derived from control-plane checks, shown as Checking, Healthy, Unhealthy, or Not checked. | Unhealthy status |
| CPU Usage (%) | Aggregate (cluster-wide) or per-node CPU usage, shown as a time-series chart. | Suggested watch level: sustained >80%. The platform does not enforce or alert on this threshold; the chart shows raw usage only. |
| Memory Usage (%) | Aggregate (cluster-wide) or per-node memory usage, shown as a time-series chart. | Suggested watch level: sustained >85%. The platform does not enforce or alert on this threshold; the chart shows raw usage only. |

The CPU and Memory charts can be viewed for the whole cluster or per node. Xelon HQ does not surface a per-node disk usage metric or pod-count metric, and does not apply built-in usage alerts. For disk capacity, pod scheduling, or any threshold-based alerting, monitor these from inside the cluster with your own Kubernetes tooling. CPU-based alerting on the underlying nodes is also available by creating a custom VM alert rule, where you set your own threshold value.
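
The in-cluster checks mentioned above could look something like the following sketch. The helper names are hypothetical. The sketch relies only on standard kubectl output options and on the `DiskPressure` node condition reported by the kubelet. It is a starting point for manual checks, not a replacement for threshold-based alerting.

```shell
# Hypothetical helper: show each node's DiskPressure condition.
# "True" means the kubelet reports that the node is low on disk.
node_disk_pressure() {
  kubectl get nodes -o custom-columns='NODE:.metadata.name,DISK_PRESSURE:.status.conditions[?(@.type=="DiskPressure")].status'
}

# Hypothetical helper: count scheduled pods per node across all namespaces.
# Pods without a node (for example, Pending pods) are skipped.
pods_per_node() {
  kubectl get pods --all-namespaces \
    -o jsonpath='{range .items[*]}{.spec.nodeName}{"\n"}{end}' |
    sed '/^$/d' | sort | uniq -c
}
```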

Best Practices for Production Clusters

Production Checklist

Follow these recommendations when running production workloads on Xelon HQ Kubernetes clusters.

  • High availability: Enable the Production (Redundant/HA) control plane option to deploy a redundant 3-node control plane instead of the single-node Test control plane. The node count is fixed by this option and cannot be configured individually. Distribute worker nodes across multiple node pools to spread workloads.
  • Resource requests and limits: Define CPU and memory requests and limits for all pods to ensure fair scheduling and prevent resource contention.
  • Namespaces: Isolate workloads into namespaces by team, environment, or application.
  • RBAC: Use Kubernetes RBAC to enforce least-privilege access. Avoid using cluster-admin for application workloads.
  • Network policies: Implement Kubernetes network policies to control traffic between pods and namespaces.
  • Regular upgrades: Keep your Kubernetes version current to receive security patches and feature improvements.
  • Backups: Back up critical workload configurations and persistent data. Use a backup solution such as Velero for cluster-level backups.
  • Monitoring and alerting: XKS clusters ship with a built-in cluster monitoring stack (Rancher's Prometheus/Grafana), and Xelon HQ surfaces node and cluster CPU and memory metrics from it on the cluster details page. For deeper application-level observability or alerting, deploy your own monitoring tools into the cluster or integrate it with your existing observability platform.
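
Several of the checklist items (namespaces, RBAC, resource requests and limits, network policies) map to short kubectl commands. The sketch below is illustrative only: the helper names, the `pod-reader` role, the resource values, and the `default-deny-ingress` policy are all hypothetical choices, not XKS defaults. Network policies only take effect if the cluster's network plugin enforces them, which this guide does not state for XKS.

```shell
# Hypothetical helper: create a namespace and grant a group read-only
# access to its pods, instead of handing out cluster-admin.
# $1 = namespace, $2 = group name (both placeholders)
create_team_namespace() {
  kubectl create namespace "$1" &&
    kubectl create role pod-reader --verb=get,list,watch --resource=pods -n "$1" &&
    kubectl create rolebinding pod-reader-binding --role=pod-reader --group="$2" -n "$1"
}

# Hypothetical helper: set CPU and memory requests and limits on a Deployment.
# $1 = namespace, $2 = deployment name; the values are illustrative.
set_deployment_resources() {
  kubectl set resources deployment "$2" -n "$1" \
    --requests=cpu=100m,memory=128Mi --limits=cpu=500m,memory=256Mi
}

# Hypothetical helper: deny all ingress traffic to pods in a namespace
# unless another NetworkPolicy explicitly allows it.
default_deny_ingress() {
  kubectl apply -n "$1" -f - <<'EOF'
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
spec:
  podSelector: {}
  policyTypes:
    - Ingress
EOF
}
```

For example, `create_team_namespace team-a team-a-devs` followed by `default_deny_ingress team-a` gives a team an isolated namespace with least-privilege read access. You can confirm the binding with `kubectl auth can-i list pods -n team-a --as=someone --as-group=team-a-devs`.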