Managing Clusters

This guide covers common management tasks for Xelon Kubernetes Service (XKS) clusters on Xelon HQ, including node operations, access configuration, monitoring, and production best practices.

Node Pool Operations

Resizing a Node Pool

To change the compute resources (CPU, RAM, disk) for nodes in a pool, navigate to the cluster details page, select the target node pool, and click Edit. Adjust the resource values and confirm. Nodes are updated via a rolling process to minimize downtime.

Adding a Node Pool

Click Add Node Pool from the cluster details page to create a new pool with different resource specifications. This is useful for running workloads with distinct resource requirements on the same cluster.

Removing a Node Pool

Select the node pool to remove and click Delete. Workloads running on nodes in the pool are evicted and rescheduled to other available nodes before the pool is removed.

Capacity Planning

Before removing a node pool, verify that the remaining pools have enough free CPU and memory to absorb the rescheduled workloads. Otherwise, the scheduler cannot place the evicted pods and they remain in the Pending state.
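One way to check this from the command line is sketched below. It assumes kubectl is already configured for the cluster. The helper name check_capacity is hypothetical, and the kubectl top call assumes a metrics API (for example metrics-server) is available in the cluster, which this guide does not confirm for XKS.

```shell
# Sketch: inspect remaining capacity before deleting a node pool.
# check_capacity is a hypothetical helper name, not part of XKS or kubectl.
check_capacity() {
  # Allocatable CPU and memory reported by each node
  kubectl get nodes -o custom-columns='NAME:.metadata.name,CPU:.status.allocatable.cpu,MEMORY:.status.allocatable.memory'

  # Current usage per node (assumes a metrics API such as metrics-server)
  kubectl top nodes

  # Pods that are already Pending, across all namespaces
  kubectl get pods --all-namespaces --field-selector=status.phase=Pending
}

# Usage:
#   check_capacity
```

If pods are already Pending before the pool is removed, the remaining pools are unlikely to absorb additional workloads.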

Kubeconfig

Downloading the Kubeconfig

The kubeconfig file provides the credentials and endpoint information needed to connect to your cluster using kubectl. Download it from the cluster details page by clicking Download config.

Using the Kubeconfig

Set the KUBECONFIG environment variable to point to the downloaded file:

# Point kubectl to your cluster
export KUBECONFIG=~/Downloads/my-cluster-kubeconfig.yaml

# Verify cluster access
kubectl cluster-info

# List nodes
kubectl get nodes -o wide

Alternatively, merge the kubeconfig into your default configuration:

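# Merge into default kubeconfig (back up the existing file first, if one exists)
cp ~/.kube/config ~/.kube/config.bak
export KUBECONFIG="$HOME/.kube/config:$HOME/Downloads/my-cluster-kubeconfig.yaml"
kubectl config view --merge --flatten > ~/.kube/config.merged
mv ~/.kube/config.merged ~/.kube/config
# Return to the default kubeconfig location
unset KUBECONFIG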

# Switch context (my-cluster is a placeholder; list available names with: kubectl config get-contexts)
kubectl config use-context my-cluster

Security

Treat kubeconfig files as sensitive credentials. Do not commit them to version control or share them over insecure channels.
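As a small illustration, file permissions on the downloaded kubeconfig can be restricted to the current user. This is a sketch only: the helper name lock_down_kubeconfig is hypothetical, and the path in the usage comment is the example path used earlier in this guide.

```shell
# Sketch: restrict a kubeconfig file to its owner.
# lock_down_kubeconfig is a hypothetical helper name.
lock_down_kubeconfig() {
  # Owner read/write only; no access for group or others.
  # Print the resulting permissions for a quick visual check.
  chmod 600 "$1" && ls -l "$1"
}

# Usage:
#   lock_down_kubeconfig ~/Downloads/my-cluster-kubeconfig.yaml
```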

Monitoring Cluster Health

Xelon HQ provides cluster health indicators on the cluster details page. Key metrics to watch include:

Metric              Description                                        Action Threshold
Node Status         Ready, NotReady, or Unknown status for each node.  Any node in NotReady for >5 minutes
CPU Utilization     Aggregate CPU usage across all nodes.              Sustained >80% usage
Memory Utilization  Aggregate memory usage across all nodes.           Sustained >85% usage
Disk Usage          Storage consumption per node.                      >90% capacity
Pod Count           Running and pending pods in the cluster.           Pending pods for >5 minutes
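The same indicators can be spot-checked with kubectl. The sketch below flags nodes whose current CPU or memory usage exceeds the 80% and 85% thresholds from the table. It is a point-in-time, per-node check rather than the sustained aggregate the table describes. The helper name flag_hot_nodes is hypothetical, and the input is assumed to be the output of kubectl top nodes --no-headers, which requires a metrics API in the cluster.

```shell
# Sketch: flag nodes above the CPU (80%) or memory (85%) thresholds.
# Reads `kubectl top nodes --no-headers` output on stdin. Columns are assumed to be:
#   NAME  CPU(cores)  CPU%  MEMORY(bytes)  MEMORY%
# flag_hot_nodes is a hypothetical helper name.
flag_hot_nodes() {
  awk -v cpu_max=80 -v mem_max=85 '{
    cpu = $3; mem = $5
    sub(/%/, "", cpu); sub(/%/, "", mem)   # strip the percent sign
    if (cpu + 0 > cpu_max || mem + 0 > mem_max)
      printf "%s cpu=%s%% mem=%s%%\n", $1, cpu, mem
  }'
}

# Usage:
#   kubectl top nodes --no-headers | flag_hot_nodes
#   kubectl get nodes                                                        # node status
#   kubectl get pods --all-namespaces --field-selector=status.phase=Pending  # pending pods
```

Nodes without reported metrics are not flagged by this sketch, so check node status separately.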

Best Practices for Production Clusters

Production Checklist

Follow these recommendations when running production workloads on Xelon HQ Kubernetes clusters.

  • High availability: Run at least 3 control plane nodes and distribute worker nodes across multiple node pools.
  • Resource requests and limits: Define CPU and memory requests and limits for all pods to ensure fair scheduling and prevent resource contention.
  • Namespaces: Isolate workloads into namespaces by team, environment, or application.
  • RBAC: Use Kubernetes RBAC to enforce least-privilege access. Avoid using cluster-admin for application workloads.
  • Network policies: Implement Kubernetes network policies to control traffic between pods and namespaces.
  • Regular upgrades: Keep your Kubernetes version current to receive security patches and feature improvements.
  • Backups: Back up critical workload configurations and persistent data. Use a backup solution such as Velero for cluster-level backups.
  • Monitoring and alerting: Deploy a monitoring stack (Prometheus, Grafana) or integrate with your existing observability platform.
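The namespace, RBAC, and network policy items above can be sketched with standard kubectl commands. Everything named here is illustrative: the helper setup_namespace, the role name pod-reader, and the namespace and service account arguments are hypothetical. Whether network policies are enforced depends on the cluster's network plugin, which this guide does not specify for XKS.

```shell
# Sketch: create an isolated namespace with a least-privilege service account
# and a default-deny ingress policy. All names are hypothetical examples.
setup_namespace() {
  ns="$1"   # namespace name, e.g. team-a
  sa="$2"   # service account name, e.g. app

  kubectl create namespace "$ns"
  kubectl create serviceaccount "$sa" -n "$ns"

  # Read-only access to pods in this namespace only (no cluster-admin)
  kubectl create role pod-reader --verb=get,list,watch --resource=pods -n "$ns"
  kubectl create rolebinding "${sa}-pod-reader" --role=pod-reader \
    --serviceaccount="${ns}:${sa}" -n "$ns"

  # Deny all ingress traffic to pods in the namespace unless another policy allows it
  kubectl apply -f - <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: ${ns}
spec:
  podSelector: {}
  policyTypes:
  - Ingress
EOF
}

# Usage:
#   setup_namespace team-a app
```

Starting from a default-deny policy and adding explicit allow rules keeps traffic between namespaces intentional rather than implicit.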