Kubernetes

EKS cluster architecture and workload management.

Cluster Architecture

The platform runs on an Amazon EKS cluster with managed node groups. The cluster is configured with multiple node groups optimized for different workload types — general purpose nodes for microservices, compute-optimized nodes for CI runners, and GPU-enabled nodes for AI model serving.

Cluster configuration
text
# EKS Cluster
Kubernetes Version: 1.29+
Region: us-east-1
Node Groups:
  - general:    m6i.xlarge   (2-10 nodes, auto-scaling)
  - compute:    c6i.2xlarge  (1-5 nodes, CI workloads)
  - gpu:        g5.2xlarge   (0-3 nodes, model serving)
 
Add-ons:
  - CoreDNS, kube-proxy, VPC CNI
  - AWS Load Balancer Controller
  - External Secrets Operator
  - Metrics Server

Node groups use auto-scaling policies based on CPU and memory utilization, with the GPU node group scaling to zero when no inference workloads are running.
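As a sketch of how the GPU node group's scale-to-zero behavior could be declared, here is an eksctl ClusterConfig fragment. This assumes the cluster is managed with eksctl; the names, labels, and taints below are illustrative, and the actual cluster may be provisioned with a different tool.

```yaml
# eksctl ClusterConfig fragment (illustrative -- actual provisioning may differ)
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: platform          # hypothetical cluster name
  region: us-east-1
managedNodeGroups:
  - name: gpu
    instanceType: g5.2xlarge
    minSize: 0            # allows scale to zero when no inference workloads run
    maxSize: 3
    desiredCapacity: 0
    labels:
      workload: model-serving
    taints:
      - key: nvidia.com/gpu   # keep non-GPU pods off these expensive nodes
        value: "true"
        effect: NoSchedule
```

The taint ensures only pods that explicitly tolerate `nvidia.com/gpu` land on GPU nodes, which is what lets the group drain to zero between inference jobs.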

Namespaces

The cluster uses a namespace-per-concern strategy to isolate workloads and manage resource quotas. Each namespace has its own resource limits, network policies, and RBAC rules.

  • dev-center — All Dev Center backend services, API gateways, and supporting infrastructure.
  • monitoring — Grafana, Prometheus, Loki, Jaeger, and Alloy collection agents.
  • ci — Self-hosted GitHub Actions runners and build caches.
  • ai-platform — Model serving endpoints, inference workers, and model registries.
  • auth — Identity service, authorization service, and token management.
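The per-namespace network isolation mentioned above is typically enforced with NetworkPolicies. A minimal default-deny-ingress policy might look like the following sketch (the cluster's real policies are likely more granular, with per-service allow rules layered on top):

```yaml
# Deny all ingress to pods in the namespace by default; individual
# services then add policies allowing only their known callers.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: dev-center
spec:
  podSelector: {}        # empty selector: applies to every pod in the namespace
  policyTypes:
    - Ingress            # no ingress rules listed, so all ingress is denied
```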

Each namespace has a ResourceQuota and LimitRange to prevent any single service from consuming excessive cluster resources.
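A ResourceQuota caps the namespace's aggregate consumption, while a LimitRange fills in per-container defaults for pods that omit them. A sketch for the ci namespace (the figures are illustrative, not the cluster's actual quotas):

```yaml
# Namespace-wide ceiling on aggregate requests, limits, and pod count
apiVersion: v1
kind: ResourceQuota
metadata:
  name: ci-quota
  namespace: ci
spec:
  hard:
    requests.cpu: "20"
    requests.memory: 64Gi
    limits.cpu: "40"
    limits.memory: 128Gi
    pods: "50"
---
# Defaults injected into containers that do not declare their own resources
apiVersion: v1
kind: LimitRange
metadata:
  name: ci-defaults
  namespace: ci
spec:
  limits:
    - type: Container
      default:              # applied as limits when the container sets none
        cpu: 500m
        memory: 512Mi
      defaultRequest:       # applied as requests when the container sets none
        cpu: 100m
        memory: 256Mi
```

Note that once a ResourceQuota covers CPU and memory, pods without requests and limits are rejected outright, so the LimitRange defaults are what keep ad-hoc CI pods schedulable.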

Workloads

The platform uses several Kubernetes workload types, chosen according to each service's requirements:

  • Deployments — Stateless microservices, APIs, and web frontends. Most services use this workload type.
  • StatefulSets — Databases, message queues, and other stateful services that require stable network identities and persistent storage.
  • CronJobs — Scheduled tasks such as data cleanup, report generation, and model retraining pipelines.
  • DaemonSets — Cluster-wide agents like Alloy (log collection) and node exporters (metrics).
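As an example of the CronJob pattern above, a nightly data-cleanup task might be declared like this (the name, image, and schedule are illustrative, not an actual workload in the cluster):

```yaml
# Illustrative nightly cleanup task
apiVersion: batch/v1
kind: CronJob
metadata:
  name: data-cleanup
  namespace: dev-center
spec:
  schedule: "0 3 * * *"        # every day at 03:00 UTC
  concurrencyPolicy: Forbid    # skip a run if the previous one is still going
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: cleanup
              image: registry.example.com/dev-center/cleanup:latest  # hypothetical
              resources:
                requests: {cpu: 100m, memory: 256Mi}
                limits: {cpu: 500m, memory: 512Mi}
```

`concurrencyPolicy: Forbid` is worth calling out: for cleanup and retraining jobs, overlapping runs usually do more harm than a skipped one.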

Helm Charts

All services are deployed using a shared Helm chart library. The base charts provide standardized templates for common patterns, reducing boilerplate and ensuring consistency across deployments.

Available base charts
text
helm-charts/
├── js-service-base/       # TypeScript/Node.js services
│   ├── templates/
│   │   ├── deployment.yaml
│   │   ├── service.yaml
│   │   ├── ingress.yaml
│   │   └── servicemonitor.yaml
│   └── values.yaml
├── python-service-base/   # Python/FastAPI services
│   ├── templates/
│   └── values.yaml
└── observability/         # Monitoring stack charts
    ├── grafana/
    ├── prometheus/
    └── loki/

Each base chart includes sensible defaults for health checks, resource limits, ServiceMonitors, and security contexts. Services override only the values specific to their needs.

When creating a new service, always start from a base chart rather than writing Kubernetes manifests from scratch. This ensures you get monitoring, health checks, and security defaults for free.
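As a sketch, a new TypeScript service built on js-service-base might carry only a small values file of its own. All keys and names below are illustrative; consult the chart's actual values.yaml for the real schema:

```yaml
# my-service/values.yaml -- overrides on top of js-service-base defaults
# (keys are illustrative; check the base chart's values.yaml for the schema)
image:
  repository: registry.example.com/dev-center/my-service
  tag: "1.4.2"
replicaCount: 3
ingress:
  enabled: true
  host: my-service.dev.example.com
resources:
  requests: {cpu: 100m, memory: 256Mi}
  limits: {cpu: 500m, memory: 512Mi}
```

Deployment would then be something like `helm upgrade --install my-service ./helm-charts/js-service-base -f my-service/values.yaml -n dev-center`; everything not overridden (health checks, ServiceMonitor, security context) comes from the base chart.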

Resource Management

Every workload must define CPU and memory requests and limits. This ensures fair scheduling and prevents resource contention across the cluster.

Resource recommendations
yaml
# Typical TypeScript microservice
resources:
  requests:
    cpu: 100m
    memory: 256Mi
  limits:
    cpu: 500m
    memory: 512Mi
 
# Python/FastAPI service
resources:
  requests:
    cpu: 200m
    memory: 512Mi
  limits:
    cpu: 1000m
    memory: 1Gi
 
# AI model serving (GPU)
resources:
  requests:
    cpu: 2000m
    memory: 8Gi
    nvidia.com/gpu: 1
  limits:
    cpu: 4000m
    memory: 16Gi
    nvidia.com/gpu: 1

Setting memory limits too low causes containers to be OOMKilled under load. Monitor actual memory usage in Grafana before setting production limits, and always keep requests less than or equal to limits.
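Two commands that help with this workflow, assuming the Metrics Server add-on listed above is running (pod names and namespace are placeholders):

```shell
# Observe actual memory usage before tightening limits
kubectl top pod -n dev-center --sort-by=memory

# Check whether a container was OOMKilled: prints the last
# termination reason, e.g. "OOMKilled"
kubectl get pod <pod-name> -n dev-center \
  -o jsonpath='{.status.containerStatuses[*].lastState.terminated.reason}'
```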

Next Steps