AI Platform Overview

Deploy, manage, and scale AI models with Riven's unified ML platform.

What is the AI Platform?

The Riven AI Platform is a unified machine learning infrastructure built on FastAPI and vLLM. It manages the complete model lifecycle — from training and evaluation to deployment and monitoring — all within your existing Kubernetes cluster.

Whether you are serving large language models, running embedding pipelines, or generating images, the AI Platform abstracts away infrastructure complexity and lets your team focus on model quality and iteration speed.

Supported Models

The platform ships with first-class support for a variety of model families:

  • GPT-OSS — Open-source GPT-compatible models optimized for production throughput.
  • Qwen3 — Alibaba's Qwen3 family, supporting both dense and mixture-of-experts variants.
  • Embedding Models — Sentence transformers and custom embedding models for semantic search and RAG pipelines.
  • Image Generation — Stable Diffusion and compatible architectures for text-to-image workflows.

You can bring your own model weights in Hugging Face or GGUF format. The platform handles conversion and optimization automatically.
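
To make the two supported formats concrete, here is a hypothetical sketch of how a registration step might tell them apart. The function name and detection rules are illustrative assumptions, not the platform's documented behavior:

```python
from pathlib import Path

# Hypothetical sketch: distinguish GGUF files from Hugging Face layouts.
# The platform's actual detection logic is not documented here.
def detect_weight_format(path: str) -> str:
    p = Path(path)
    if p.suffix == ".gguf":
        return "gguf"  # single-file GGUF checkpoint
    # HF repos ship .safetensors/.bin shards alongside a config.json
    if p.suffix in {".safetensors", ".bin"} or (p / "config.json").exists():
        return "huggingface"
    raise ValueError(f"unrecognized weight format: {path}")

detect_weight_format("qwen3-8b-q4.gguf")   # "gguf"
detect_weight_format("model.safetensors")  # "huggingface"
```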

Architecture

The AI Platform follows a layered architecture designed for scalability and fault tolerance:

Architecture Overview
text
┌─────────────────────────────────────────────┐
│               API Gateway (FastAPI)          │
│          Authentication, Rate Limiting       │
└──────────────────┬──────────────────────────┘

┌──────────────────▼──────────────────────────┐
│              Model Router                    │
│     Request routing, A/B traffic splitting   │
└──────┬───────────┬───────────┬──────────────┘
       │           │           │
┌──────▼──┐  ┌─────▼───┐  ┌───▼──────┐
│  vLLM   │  │  vLLM   │  │  vLLM    │
│ Worker 1│  │ Worker 2│  │ Worker N │
└─────────┘  └─────────┘  └──────────┘

Requests enter through the API Gateway, which handles authentication and rate limiting. The Model Router directs traffic to the appropriate vLLM worker pool based on model name, version, and any active A/B test configuration. Each vLLM Worker runs an isolated model instance with GPU acceleration.
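
The A/B split in the Model Router can be sketched as a deterministic hash-based routing rule. This is an illustrative sketch, not the router's actual implementation; the version names and weights are assumptions:

```python
import hashlib

# Sketch of hash-based A/B traffic splitting. Hashing the request ID
# keeps routing sticky: the same request always lands on the same version.
def pick_version(request_id: str, splits: dict[str, float]) -> str:
    digest = hashlib.sha256(request_id.encode()).digest()
    point = int.from_bytes(digest[:8], "big") / 2**64  # uniform in [0, 1)
    cumulative = 0.0
    for version, weight in splits.items():
        cumulative += weight
        if point < cumulative:
            return version
    return version  # fallback for floating-point rounding

# Send 90% of traffic to v1, 10% to the candidate v2 (illustrative split):
choice = pick_version("req-42", {"qwen3-8b:v1": 0.9, "qwen3-8b:v2": 0.1})
```

Sticky, deterministic routing matters for A/B tests: retries and multi-turn sessions keep hitting the same model version, so quality comparisons are not contaminated by mid-conversation switches.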

vLLM Deployment Walkthrough

Here is a complete walkthrough for deploying a vLLM-backed model endpoint from scratch:

1. Prepare the model

Terminal
bash
# Register the model from Hugging Face
riven models register \
  --name qwen3-8b \
  --framework vllm \
  --source hf://Qwen/Qwen3-8B
 
# Verify registration
riven models list

2. Create the deployment config

deploy-qwen3.yaml
yaml
endpoint: qwen3-8b
model:
  name: qwen3-8b
  version: latest
  framework: vllm
 
runtime:
  gpu_memory_utilization: 0.9
  max_model_len: 8192
  tensor_parallel_size: 1
  dtype: float16
 
resources:
  replicas: 2
  gpu: 1
  gpu_type: g5.2xlarge
  memory: 24Gi
  cpu: 4
 
autoscaling:
  enabled: true
  min_replicas: 1
  max_replicas: 4
  target_queue_depth: 10
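
The queue-depth rule implied by this config can be sketched as a one-line scaling calculation. The controller's real logic is internal to the platform; this only illustrates the arithmetic:

```python
import math

# Sketch of queue-depth autoscaling: scale so that the per-replica
# queue depth approaches target_queue_depth, clamped to [min, max].
def desired_replicas(queue_depth: int, target_queue_depth: int,
                     min_replicas: int, max_replicas: int) -> int:
    wanted = math.ceil(queue_depth / target_queue_depth)
    return max(min_replicas, min(max_replicas, wanted))

# With the config above (target 10, min 1, max 4):
desired_replicas(35, 10, 1, 4)  # 4 replicas: ceil(35 / 10) = 4
desired_replicas(0, 10, 1, 4)   # 1 replica: never scales below min_replicas
```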

3. Deploy and verify

Terminal
bash
# Deploy the endpoint
riven inference deploy --config deploy-qwen3.yaml
 
# Wait for health checks to pass
riven inference status qwen3-8b
 
# Test with a request
curl -X POST https://api.riven-ai.dev/v1/inference/qwen3-8b/generate \
  -H "Authorization: Bearer <token>" \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Explain Kubernetes in one sentence.", "max_tokens": 50}'
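
The same test request can be issued from Python with only the standard library. The URL and payload fields mirror the curl call above; the token remains a placeholder:

```python
import json
import urllib.request

# Build the POST request shown in the curl example above.
def build_generate_request(endpoint: str, token: str, prompt: str,
                           max_tokens: int) -> urllib.request.Request:
    url = f"https://api.riven-ai.dev/v1/inference/{endpoint}/generate"
    body = json.dumps({"prompt": prompt, "max_tokens": max_tokens}).encode()
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_generate_request("qwen3-8b", "<token>",
                             "Explain Kubernetes in one sentence.", 50)
# urllib.request.urlopen(req) would send it; omitted here.
```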

Key Capabilities

  • Model Registry — Centralized catalog for all model artifacts, versions, and metadata.
  • Auto-scaling Inference — Dynamic scaling of worker replicas based on request queue depth and GPU utilization.
  • A/B Testing — Traffic splitting between model versions to detect quality regressions before full rollout.
  • Monitoring & Observability — Built-in Prometheus metrics, Grafana dashboards, and distributed tracing via Jaeger.
  • Multi-GPU Support — Tensor parallelism across multiple GPUs for large models that exceed single-GPU memory.
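
For the multi-GPU case, a back-of-the-envelope calculation shows when tensor parallelism becomes necessary: the model's weights alone must fit in aggregate GPU memory. The numbers below are illustrative; real memory use also includes the KV cache and activations, so treat this as a lower bound:

```python
import math

# Rough sizing: minimum tensor_parallel_size for the weights to fit,
# given per-GPU memory and the gpu_memory_utilization cap.
def min_tensor_parallel_size(params_billion: float, bytes_per_param: int,
                             gpu_memory_gb: float,
                             gpu_memory_utilization: float = 0.9) -> int:
    weights_gb = params_billion * bytes_per_param
    usable_gb = gpu_memory_gb * gpu_memory_utilization
    return max(1, math.ceil(weights_gb / usable_gb))

# An 8B model in float16 (2 bytes/param) on 24 GB GPUs:
min_tensor_parallel_size(8, 2, 24)   # 16 GB of weights fits on one GPU -> 1
# A 70B model in float16 on the same GPUs:
min_tensor_parallel_size(70, 2, 24)  # 140 GB of weights -> 7 GPUs
```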

Getting Started

To start using the AI Platform, install the Riven CLI and authenticate with your cluster:

Terminal
bash
# Install the Riven CLI
curl -fsSL https://get.riven-ai.dev | bash
 
# Authenticate
riven auth login
 
# List available models
riven models list

From here, explore the Model Management guide to register your first model, or jump to Inference to deploy a pre-configured endpoint.