Kubernetes
EKS cluster architecture and workload management.
Cluster Architecture
The platform runs on an Amazon EKS cluster with managed node groups. The cluster is configured with multiple node groups optimized for different workload types — general purpose nodes for microservices, compute-optimized nodes for CI runners, and GPU-enabled nodes for AI model serving.
```yaml
# EKS Cluster
Kubernetes Version: 1.29+
Region: us-east-1
Node Groups:
  - general: m6i.xlarge (2-10 nodes, auto-scaling)
  - compute: c6i.2xlarge (1-5 nodes, CI workloads)
  - gpu: g5.2xlarge (0-3 nodes, model serving)
Add-ons:
  - CoreDNS, kube-proxy, VPC CNI
  - AWS Load Balancer Controller
  - External Secrets Operator
  - Metrics Server
```

Node groups use auto-scaling policies based on CPU and memory utilization, with the GPU node group scaling to zero when no inference workloads are running.
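A cluster with this layout could be declared with an eksctl config along the following lines. This is a sketch only: the cluster name, labels, and taints are illustrative assumptions, not the actual configuration.

```yaml
# cluster.yaml -- illustrative eksctl ClusterConfig; not the real cluster definition
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: dev-center        # hypothetical cluster name
  region: us-east-1
  version: "1.29"
managedNodeGroups:
  - name: general
    instanceType: m6i.xlarge
    minSize: 2
    maxSize: 10
  - name: compute
    instanceType: c6i.2xlarge
    minSize: 1
    maxSize: 5
    labels: { workload: ci }          # assumed label scheme
  - name: gpu
    instanceType: g5.2xlarge
    minSize: 0                        # scales to zero when idle
    maxSize: 3
    labels: { workload: inference }
    taints:
      - key: nvidia.com/gpu           # keep non-GPU pods off these nodes
        value: "true"
        effect: NoSchedule
```

Tainting the GPU group is a common pattern so that only inference workloads (which tolerate the taint) land on the expensive instances.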
Namespaces
The cluster uses a namespace-per-concern strategy to isolate workloads and manage resource quotas. Each namespace has its own resource limits, network policies, and RBAC rules.
- dev-center — All Dev Center backend services, API gateways, and supporting infrastructure.
- monitoring — Grafana, Prometheus, Loki, Jaeger, and Alloy collection agents.
- ci — Self-hosted GitHub Actions runners and build caches.
- ai-platform — Model serving endpoints, inference workers, and model registries.
- auth — Identity service, authorization service, and token management.
Each namespace has a ResourceQuota and LimitRange to prevent any single service from consuming excessive cluster resources.
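As a sketch, a namespace's quota and defaults might look like the following. The figures are illustrative assumptions, not the actual values for any namespace.

```yaml
# Illustrative quota and defaults for the dev-center namespace; numbers are assumed
apiVersion: v1
kind: ResourceQuota
metadata:
  name: dev-center-quota
  namespace: dev-center
spec:
  hard:
    requests.cpu: "20"        # total CPU requests across the namespace
    requests.memory: 40Gi
    limits.cpu: "40"
    limits.memory: 80Gi
    pods: "100"
---
apiVersion: v1
kind: LimitRange
metadata:
  name: dev-center-defaults
  namespace: dev-center
spec:
  limits:
    - type: Container
      default:                # applied when a container omits limits
        cpu: 500m
        memory: 512Mi
      defaultRequest:         # applied when a container omits requests
        cpu: 100m
        memory: 256Mi
```

The LimitRange gives every container sane defaults, while the ResourceQuota caps the namespace as a whole.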
Workloads
The platform uses several Kubernetes workload types depending on the service requirements:
- Deployments — Stateless microservices, APIs, and web frontends. Most services use this workload type.
- StatefulSets — Databases, message queues, and other stateful services that require stable network identities and persistent storage.
- CronJobs — Scheduled tasks such as data cleanup, report generation, and model retraining pipelines.
- DaemonSets — Cluster-wide agents like Alloy (log collection) and node exporters (metrics).
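For instance, a scheduled cleanup task would be declared as a CronJob along these lines. The job name, image, and schedule are illustrative, not taken from a real manifest.

```yaml
# Illustrative CronJob for a scheduled cleanup task; names and image are hypothetical
apiVersion: batch/v1
kind: CronJob
metadata:
  name: data-cleanup
  namespace: dev-center
spec:
  schedule: "0 3 * * *"       # daily at 03:00 UTC
  concurrencyPolicy: Forbid   # skip a run if the previous one is still going
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: cleanup
              image: dev-center/data-cleanup:latest   # illustrative image
              resources:
                requests: { cpu: 100m, memory: 256Mi }
                limits: { cpu: 500m, memory: 512Mi }
```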
Helm Charts
All services are deployed using a shared Helm chart library. The base charts provide standardized templates for common patterns, reducing boilerplate and ensuring consistency across deployments.
```
helm-charts/
├── js-service-base/          # TypeScript/Node.js services
│   ├── templates/
│   │   ├── deployment.yaml
│   │   ├── service.yaml
│   │   ├── ingress.yaml
│   │   └── servicemonitor.yaml
│   └── values.yaml
├── python-service-base/      # Python/FastAPI services
│   ├── templates/
│   └── values.yaml
└── observability/            # Monitoring stack charts
    ├── grafana/
    ├── prometheus/
    └── loki/
```

Each base chart includes sensible defaults for health checks, resource limits, ServiceMonitors, and security contexts. Services override only the values specific to their needs.
When creating a new service, always start from a base chart rather than writing Kubernetes manifests from scratch. This ensures you get monitoring, health checks, and security defaults for free.
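In practice, a new service's chart can be little more than a dependency on the base chart plus a small values override, roughly as below. The service name, version constraint, repository path, and value keys are illustrative; the base chart's actual values schema may differ.

```yaml
# Chart.yaml -- depend on the shared base chart (paths and versions are assumed)
apiVersion: v2
name: my-api                  # hypothetical service
version: 0.1.0
dependencies:
  - name: js-service-base
    version: 1.x.x            # illustrative version constraint
    repository: "file://../js-service-base"

# values.yaml -- override only what differs from the base defaults
js-service-base:
  image:
    repository: dev-center/my-api   # illustrative image
    tag: "1.4.2"
  ingress:
    host: my-api.example.internal   # illustrative hostname
  resources:
    requests: { cpu: 100m, memory: 256Mi }
    limits:   { cpu: 500m, memory: 512Mi }
```

Everything not overridden here (health checks, security context, ServiceMonitor) comes from the base chart's defaults.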
Resource Management
Every workload must define CPU and memory requests and limits. This ensures fair scheduling and prevents resource contention across the cluster.
```yaml
# Typical TypeScript microservice
resources:
  requests:
    cpu: 100m
    memory: 256Mi
  limits:
    cpu: 500m
    memory: 512Mi

# Python/FastAPI service
resources:
  requests:
    cpu: 200m
    memory: 512Mi
  limits:
    cpu: 1000m
    memory: 1Gi

# AI model serving (GPU)
resources:
  requests:
    cpu: 2000m
    memory: 8Gi
    nvidia.com/gpu: 1
  limits:
    cpu: 4000m
    memory: 16Gi
    nvidia.com/gpu: 1
```

Setting memory limits too low can cause OOMKill events. Monitor actual memory usage via Grafana before setting production limits, and always set requests equal to or lower than limits.
Next Steps
- CI/CD — Continuous integration and deployment pipelines.
- Observability — Metrics, logs, and traces.
- Infrastructure Overview — Cloud architecture and AWS services.