Infrastructure Overview

Cloud infrastructure powering the Riven platform.

Cloud Architecture

The Riven platform runs on AWS infrastructure in the us-east-1 region. The architecture rests on three choices: Kubernetes as the primary orchestration layer, managed services for databases and storage, and infrastructure as code for all provisioning.

High-level architecture
┌─────────────────────────────────────────────┐
│                  AWS (us-east-1)            │
│  ┌─────────────────────────────────────┐    │
│  │              VPC                    │    │
│  │  ┌───────────┐  ┌──────────────┐    │    │
│  │  │  Public   │  │   Private    │    │    │
│  │  │ Subnets   │  │   Subnets    │    │    │
│  │  │  (ALB)    │  │  (EKS/RDS)   │    │    │
│  │  └───────────┘  └──────────────┘    │    │
│  └─────────────────────────────────────┘    │
│  ┌──────┐ ┌─────┐ ┌─────┐ ┌───────────┐     │
│  │ ECR  │ │ S3  │ │ RDS │ │ DynamoDB  │     │
│  └──────┘ └─────┘ └─────┘ └───────────┘     │
└─────────────────────────────────────────────┘

Infrastructure is provisioned as code with Pulumi (state stored in S3) and Terraform, so every resource definition is version-controlled and reproducible.

AWS Services

The platform leverages the following core AWS services:

  • EKS (Elastic Kubernetes Service) — Managed Kubernetes cluster for running all platform workloads with managed node groups.
  • ECR (Elastic Container Registry) — Private Docker image registry for all service images, scanned for vulnerabilities on push.
  • ALB (Application Load Balancer) — Ingress layer for routing external traffic to Kubernetes services with TLS termination.
  • S3 — Object storage for Pulumi state, build artifacts, model weights, and backup data.
  • RDS — Managed PostgreSQL databases for services requiring relational storage.
  • DynamoDB — NoSQL storage for high-throughput, low-latency workloads such as session management and feature flags.

Kubernetes Cluster

The EKS cluster is the backbone of the platform, running all microservices, observability tools, CI runners, and AI model serving workloads. The cluster uses managed node groups with auto-scaling based on workload demands.

Cluster namespaces
# Core namespaces
dev-center          # Dev Center services
monitoring          # Grafana, Prometheus, Loki, Jaeger
ci                  # Self-hosted GitHub Actions runners
ai-platform         # Model serving and inference
auth                # Identity and authorization services
ingress-nginx       # Ingress controller
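
As a rough illustration of how the namespace list above could be turned into manifests, the following Python sketch builds one Namespace object per name and wraps them in a `List`, which `kubectl apply -f` accepts as JSON. The helper names and the label value `riven` are assumptions made for this example, not part of the platform's actual tooling.

```python
import json

# Namespace names copied from the list above.
NAMESPACES = [
    "dev-center",
    "monitoring",
    "ci",
    "ai-platform",
    "auth",
    "ingress-nginx",
]


def namespace_manifest(name: str) -> dict:
    """Build a minimal Kubernetes Namespace manifest as a plain dict."""
    return {
        "apiVersion": "v1",
        "kind": "Namespace",
        "metadata": {
            "name": name,
            # Hypothetical label value; adjust to the platform's conventions.
            "labels": {"app.kubernetes.io/part-of": "riven"},
        },
    }


def render_namespaces(names: list[str]) -> str:
    """Wrap all Namespace manifests in a v1 List and serialize to JSON."""
    items = [namespace_manifest(n) for n in names]
    return json.dumps({"apiVersion": "v1", "kind": "List", "items": items}, indent=2)


if __name__ == "__main__":
    print(render_namespaces(NAMESPACES))
```

If the script were saved as `namespaces.py` (a hypothetical file name), its output could be applied with `python namespaces.py | kubectl apply -f -`.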

The cluster runs Kubernetes 1.29+ and is upgraded regularly following the EKS release calendar. Node groups are rotated during upgrades for zero-downtime transitions.

Networking

The VPC is configured with public and private subnets across multiple availability zones. Public subnets host the ALB for inbound traffic, while all workloads run in private subnets that are not directly reachable from the internet; their outbound traffic leaves through NAT Gateways.

  • Public subnets — ALB, NAT Gateways for outbound traffic from private subnets.
  • Private subnets — EKS worker nodes, RDS instances, and all application workloads.
  • VPC Endpoints — Private connectivity to S3, ECR, and other AWS services without traversing the public internet.
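
The subnet layout described above can be sketched with Python's standard `ipaddress` module. The VPC CIDR `10.0.0.0/16`, the `/20` subnet size, and the three availability zone names are assumptions chosen for illustration; the source does not state the real address plan.

```python
import ipaddress


def plan_subnets(vpc_cidr: str, azs: list[str], new_prefix: int = 20) -> dict:
    """Assign one public and one private subnet per availability zone.

    Public subnets take the first len(azs) blocks of the VPC range and
    private subnets take the next len(azs) blocks, so no two overlap.
    """
    vpc = ipaddress.ip_network(vpc_cidr)
    blocks = list(vpc.subnets(new_prefix=new_prefix))
    if len(blocks) < 2 * len(azs):
        raise ValueError("VPC range is too small for the requested layout")

    plan = {"public": {}, "private": {}}
    for i, az in enumerate(azs):
        plan["public"][az] = str(blocks[i])
        plan["private"][az] = str(blocks[len(azs) + i])
    return plan


if __name__ == "__main__":
    # Assumed values, not taken from the real environment.
    layout = plan_subnets("10.0.0.0/16", ["us-east-1a", "us-east-1b", "us-east-1c"])
    for tier, subnets in layout.items():
        for az, cidr in subnets.items():
            print(f"{tier:8} {az}  {cidr}")
```

Carving both tiers out of one contiguous range keeps the plan easy to audit, and the unused blocks remain available for additional zones or tiers later.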

Security

Security is layered across the infrastructure with defense-in-depth principles:

  • IAM Roles — Least-privilege IAM roles for all services, with separate roles for CI/CD, application workloads, and infrastructure management.
  • IRSA — IAM Roles for Service Accounts allow Kubernetes pods to assume AWS IAM roles without long-lived credentials.
  • Network Policies — Kubernetes NetworkPolicies restrict pod-to-pod communication, enforcing micro-segmentation within the cluster.
  • Secrets Management — Secrets are stored in AWS Secrets Manager and synced to Kubernetes via the External Secrets Operator.
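
To make the IRSA item above more concrete, here is a minimal sketch of the IAM trust policy that lets a single Kubernetes service account assume a role through the cluster's OIDC provider. The account ID and issuer ID are placeholders, and the service account name `model-server` is hypothetical; only the namespace comes from this document.

```python
import json


def irsa_trust_policy(
    account_id: str, oidc_issuer: str, namespace: str, service_account: str
) -> dict:
    """Trust policy allowing one service account to assume an IAM role.

    oidc_issuer is the cluster's OIDC issuer host and path, without "https://".
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{oidc_issuer}"
                },
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    "StringEquals": {
                        # Restrict the role to exactly one namespace/service account.
                        f"{oidc_issuer}:sub": f"system:serviceaccount:{namespace}:{service_account}",
                        f"{oidc_issuer}:aud": "sts.amazonaws.com",
                    }
                },
            }
        ],
    }


if __name__ == "__main__":
    policy = irsa_trust_policy(
        account_id="111122223333",  # placeholder account ID
        oidc_issuer="oidc.eks.us-east-1.amazonaws.com/id/EXAMPLED539D4633E53DE1B71EXAMPLE",
        namespace="ai-platform",
        service_account="model-server",  # hypothetical service account
    )
    print(json.dumps(policy, indent=2))
```

The service account is then linked to the role through the `eks.amazonaws.com/role-arn` annotation, so pods receive short-lived credentials rather than static access keys.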

Never put secret values directly in Helm values files or in environment variable definitions. Always use the External Secrets Operator to reference secrets from AWS Secrets Manager.
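
As an illustration of referencing a secret instead of embedding it, the sketch below builds an `ExternalSecret` manifest that asks the External Secrets Operator to copy selected properties from AWS Secrets Manager into a Kubernetes Secret. The store name, secret path, and property names are hypothetical, and the `apiVersion` shown may differ depending on the operator version installed.

```python
import json


def external_secret(
    name: str, namespace: str, store_name: str, remote_key: str, properties: list[str]
) -> dict:
    """Build an ExternalSecret that syncs properties of one remote secret."""
    return {
        # Check the installed operator version; newer releases may serve a different API version.
        "apiVersion": "external-secrets.io/v1beta1",
        "kind": "ExternalSecret",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "refreshInterval": "1h",
            "secretStoreRef": {"name": store_name, "kind": "ClusterSecretStore"},
            # Name of the Kubernetes Secret the operator creates and keeps in sync.
            "target": {"name": name},
            "data": [
                {"secretKey": prop, "remoteRef": {"key": remote_key, "property": prop}}
                for prop in properties
            ],
        },
    }


if __name__ == "__main__":
    manifest = external_secret(
        name="db-credentials",            # hypothetical Secret name
        namespace="auth",
        store_name="aws-secrets-manager",  # hypothetical ClusterSecretStore
        remote_key="riven/auth/database",  # hypothetical Secrets Manager path
        properties=["username", "password"],
    )
    print(json.dumps(manifest, indent=2))
```

Only the reference to the secret lives in version control; the values themselves stay in AWS Secrets Manager and reach the cluster through the operator.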

Next Steps

  • Kubernetes — Cluster architecture and workload management.
  • CI/CD — Continuous integration and deployment pipelines.
  • Observability — Metrics, logs, and traces.