# Model Management
Register, version, and deploy AI models across environments.
## Model Registry
The Model Registry is the central catalog for all AI models in your organization. It stores model artifacts, metadata, version history, and deployment status in a single source of truth. Every model registered in the platform gets a unique identifier and can be referenced across training jobs, inference endpoints, and A/B tests.
Models are organized by name and version. Each version is immutable once published — this guarantees reproducibility and makes rollbacks straightforward.
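Because published versions are immutable, clients typically resolve the "newest" version by comparing semantic version numbers rather than timestamps. A minimal sketch of that comparison (the helper functions here are illustrative, not part of any Riven SDK):

```python
def parse_semver(version: str) -> tuple[int, int, int]:
    """Parse a 'MAJOR.MINOR.PATCH' string into a comparable tuple."""
    major, minor, patch = version.split(".")
    return (int(major), int(minor), int(patch))

def latest_version(versions: list[str]) -> str:
    """Return the highest semantic version from a list of version strings."""
    return max(versions, key=parse_semver)

# Tuple comparison avoids the lexicographic trap where "1.10.0" < "1.2.3"
print(latest_version(["1.0.0", "0.9.0", "1.2.3", "1.10.0"]))  # → 1.10.0
```

Note that plain string comparison would rank `1.2.3` above `1.10.0`; parsing into integer tuples gives the correct ordering.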
### Registering a Model
Use the Riven CLI to register a new model. The CLI uploads your model artifacts to the platform's object store and creates a registry entry with the specified metadata.
```shell
# Register a new model from a local directory
riven models register \
  --name my-model \
  --framework vllm \
  --version 1.0.0 \
  --path ./model-weights/ \
  --description "Fine-tuned Qwen3 for summarization"

# Register from a Hugging Face repo
riven models register \
  --name qwen3-8b \
  --framework vllm \
  --source hf://Qwen/Qwen3-8B
```

Model uploads are streamed and resumable. If an upload is interrupted, re-running the same command picks up where it left off.
## Model Versions
A model can have many versions. Versions follow semantic versioning and are promoted through three stages: staging, production, and archived.
- staging — The default stage for newly registered versions. Used for testing and validation.
- production — Promoted versions that are actively serving traffic.
- archived — Retired versions retained for audit and reproducibility purposes.
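The stage lifecycle above is effectively a small state machine. A hypothetical validation sketch — the exact set of allowed transitions is an assumption here, not documented platform behavior:

```python
# Assumed transition rules: staging → production, staging → archived,
# production → archived. Archived versions are retired for good.
ALLOWED_TRANSITIONS = {
    "staging": {"production", "archived"},
    "production": {"archived"},
    "archived": set(),
}

def can_promote(current: str, target: str) -> bool:
    """Check whether a version may move from `current` to `target` stage."""
    return target in ALLOWED_TRANSITIONS.get(current, set())
```

A client could run this check before calling `riven models promote` to fail fast on an invalid transition.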
```shell
# List all versions of a model
riven models versions my-model

# Promote a version to production
riven models promote my-model --version 1.0.0 --stage production

# Archive an old version
riven models promote my-model --version 0.9.0 --stage archived
```

## Supported Frameworks
The platform supports multiple inference frameworks. Choose the one that best fits your model architecture and performance requirements:
| Framework | Best For | Notes |
|---|---|---|
| vLLM | Autoregressive text generation | High-throughput with PagedAttention |
| ONNX Runtime | Encoder models, classifiers, embeddings | Cross-platform inference |
| TensorRT | Maximum GPU throughput | Requires model compilation step |
| Custom | Any architecture | Bring your own serving container |
## Model Configuration

Each model version can include a configuration file that specifies runtime parameters, resource requirements, and serving options. Place a `model.yaml` file in your model directory:
```yaml
name: my-model
version: 1.0.0
framework: vllm
description: "Fine-tuned Qwen3-8B for code summarization"

runtime:
  gpu_memory_utilization: 0.9
  max_model_len: 4096
  tensor_parallel_size: 1
  dtype: float16
  enforce_eager: false   # set true to disable CUDA graphs (debugging)
  trust_remote_code: false

resources:
  gpu: 1
  gpu_type: a10g
  memory: 24Gi
  cpu: 4

serving:
  max_batch_size: 64
  timeout_seconds: 30
  health_check_path: /health
  max_concurrent_requests: 128

metadata:
  base_model: "Qwen/Qwen3-8B"
  training_dataset: "s3://datasets/code-summaries-v2"
  eval_loss: 0.342
```

If no `model.yaml` is provided, the platform infers sensible defaults based on the framework and model size. You can override these later via the CLI or dashboard.
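The "inferred defaults, overridable later" behavior can be modeled as a recursive merge where user-supplied values win over framework defaults. A sketch of that pattern — the field names follow the `model.yaml` above, but the default values shown are assumptions, not documented platform defaults:

```python
def deep_merge(defaults: dict, overrides: dict) -> dict:
    """Recursively merge `overrides` on top of `defaults`; overrides win."""
    merged = dict(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged

# Illustrative framework defaults (assumed values, for demonstration only)
VLLM_DEFAULTS = {
    "runtime": {"gpu_memory_utilization": 0.9, "dtype": "float16"},
    "serving": {"max_batch_size": 64, "timeout_seconds": 30},
}

user_config = {
    "runtime": {"max_model_len": 4096},
    "serving": {"timeout_seconds": 60},
}
effective = deep_merge(VLLM_DEFAULTS, user_config)
```

Here `effective` keeps the default `gpu_memory_utilization` and `max_batch_size` while taking the user's `max_model_len` and `timeout_seconds`.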
## Error Handling

### Registration Failures
| Error | Cause | Fix |
|---|---|---|
| `ALREADY_EXISTS` | Version already registered | Bump the version number |
| `INVALID_ARGUMENT: unsupported framework` | Unknown framework string | Use one of: `vllm`, `onnx`, `tensorrt`, `custom` |
| `RESOURCE_EXHAUSTED` | Object store quota exceeded | Archive unused versions or increase quota |
| `PERMISSION_DENIED` | Missing model registry write access | Request the `model-admin` role from your org admin |
### Runtime Errors
If a model fails to load at serving time, check the vLLM worker logs:
```shell
# Get logs from the model worker
kubectl logs -n ai-platform -l model=my-model --tail=100

# Common issues:
# - "CUDA out of memory" → reduce gpu_memory_utilization or use a larger GPU
# - "model not found"    → verify the model path in S3
# - "tokenizer error"    → ensure tokenizer files are included in the model artifacts
```
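The triage steps above can be automated with a simple log scan. A hedged sketch that maps the known failure signatures listed above to their remediation hints (the function and mapping are illustrative, not a platform tool):

```python
# Known failure signatures from vLLM worker logs, mapped to remediation hints.
KNOWN_ISSUES = {
    "CUDA out of memory": "reduce gpu_memory_utilization or use a larger GPU",
    "model not found": "verify the model path in S3",
    "tokenizer error": "ensure tokenizer files are included in the model artifacts",
}

def triage(log_text: str) -> list[str]:
    """Return a remediation hint for every known issue found in the logs."""
    return [hint for pattern, hint in KNOWN_ISSUES.items() if pattern in log_text]
```

You could pipe `kubectl logs` output into a script like this to get an immediate hint for the most common load failures.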