Model Management

Register, version, and deploy AI models across environments.

Model Registry

The Model Registry is the central catalog for all AI models in your organization. It stores model artifacts, metadata, version history, and deployment status in a single source of truth. Every model registered in the platform gets a unique identifier and can be referenced across training jobs, inference endpoints, and A/B tests.

Models are organized by name and version. Each version is immutable once published — this guarantees reproducibility and makes rollbacks straightforward.
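Immutability plus stage pointers is what makes rollback cheap: rolling back is just repointing production at an older version, not mutating anything. A toy in-memory sketch of that idea (the `ModelRegistry` and `ModelVersion` types here are hypothetical illustrations, not the platform's SDK):

```python
from dataclasses import dataclass

@dataclass(frozen=True)  # frozen: a published version can never be mutated
class ModelVersion:
    name: str
    version: str
    artifact_uri: str

class ModelRegistry:
    """Toy registry showing immutable versions and cheap rollback."""

    def __init__(self):
        self._versions = {}    # (name, version) -> ModelVersion
        self._production = {}  # name -> version currently serving traffic

    def register(self, mv: ModelVersion):
        key = (mv.name, mv.version)
        if key in self._versions:
            # published versions are immutable; re-publishing is an error
            raise ValueError("ALREADY_EXISTS: bump the version number")
        self._versions[key] = mv

    def promote(self, name: str, version: str):
        if (name, version) not in self._versions:
            raise KeyError(f"{name}@{version} is not registered")
        self._production[name] = version

reg = ModelRegistry()
reg.register(ModelVersion("my-model", "1.0.0", "s3://models/my-model/1.0.0"))
reg.register(ModelVersion("my-model", "1.1.0", "s3://models/my-model/1.1.0"))
reg.promote("my-model", "1.1.0")
reg.promote("my-model", "1.0.0")  # rollback: repoint production to the old version
```

Because `1.0.0` still exists unchanged in the registry, "rollback" never has to reconstruct an artifact; it only moves the production pointer.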

Registering a Model

Use the Riven CLI to register a new model. The CLI uploads your model artifacts to the platform's object store and creates a registry entry with the specified metadata.

Terminal
bash
# Register a new model from a local directory
riven models register \
  --name my-model \
  --framework vllm \
  --version 1.0.0 \
  --path ./model-weights/ \
  --description "Fine-tuned Qwen3 for summarization"
 
# Register from a Hugging Face repo
riven models register \
  --name qwen3-8b \
  --framework vllm \
  --source hf://Qwen/Qwen3-8B

Model uploads are streamed and resumable. If the upload is interrupted, re-running the same command will pick up where it left off.
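The resume behavior can be pictured as a chunked upload tracked by byte offset. A minimal sketch, where `send_chunk` is a stand-in for the platform's upload API and the chunk size is illustrative:

```python
import os

CHUNK = 8 * 1024 * 1024  # 8 MiB per chunk (illustrative size)

def resumable_upload(path, send_chunk, completed_bytes=0):
    """Stream a file in fixed-size chunks, resuming from completed_bytes.

    On interruption, re-running with the offset already acknowledged by
    the server skips the bytes that have landed and uploads only the rest.
    """
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        f.seek(completed_bytes)       # skip what the server already has
        offset = completed_bytes
        while offset < size:
            data = f.read(CHUNK)
            send_chunk(offset, data)  # upload one chunk at this offset
            offset += len(data)
    return offset                     # total bytes acknowledged
```

Re-running the CLI command works the same way: it asks the object store how many bytes were committed and restarts the stream from that offset.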

Model Versions

A model can have any number of versions. Versions follow semantic versioning (MAJOR.MINOR.PATCH) and are promoted through three stages: staging, production, and archived.

  • staging — The default stage for newly registered versions. Used for testing and validation.
  • production — Promoted versions that are actively serving traffic.
  • archived — Retired versions retained for audit and reproducibility purposes.

Terminal
bash
# List all versions of a model
riven models versions my-model
 
# Promote a version to production
riven models promote my-model --version 1.0.0 --stage production
 
# Archive an old version
riven models promote my-model --version 0.9.0 --stage archived

Supported Frameworks

The platform supports multiple inference frameworks. Choose the one that best fits your model architecture and performance requirements:

Framework      Best For                                   Notes
vLLM           Autoregressive text generation             High-throughput with PagedAttention
ONNX Runtime   Encoder models, classifiers, embeddings    Cross-platform inference
TensorRT       Maximum GPU throughput                     Requires model compilation step
Custom         Any architecture                           Bring your own serving container
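If you script registrations, the table above boils down to a small lookup from workload to the framework string passed via --framework. An illustrative helper (the workload names and mapping are ours, not an official API), falling back to the bring-your-own-container option for anything unrecognized:

```python
# Mirrors the framework table above; purely illustrative, not a platform API.
FRAMEWORK_FOR_WORKLOAD = {
    "text-generation": "vllm",         # autoregressive LLMs, PagedAttention throughput
    "embeddings": "onnx",              # encoder models, cross-platform inference
    "classification": "onnx",
    "max-gpu-throughput": "tensorrt",  # fastest, but requires a compilation step
}

def pick_framework(workload: str) -> str:
    """Return a framework string accepted by `riven models register`."""
    return FRAMEWORK_FOR_WORKLOAD.get(workload, "custom")  # BYO serving container
```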

Model Configuration

Each model version can include a configuration file that specifies runtime parameters, resource requirements, and serving options. Place a model.yaml file in your model directory:

model.yaml
yaml
name: my-model
version: 1.0.0
framework: vllm
description: "Fine-tuned Qwen3-8B for code summarization"
 
runtime:
  gpu_memory_utilization: 0.9
  max_model_len: 4096
  tensor_parallel_size: 1
  dtype: float16
  enforce_eager: false            # Set true to disable CUDA graphs (debugging)
  trust_remote_code: false
 
resources:
  gpu: 1
  gpu_type: a10g
  memory: 24Gi
  cpu: 4
 
serving:
  max_batch_size: 64
  timeout_seconds: 30
  health_check_path: /health
  max_concurrent_requests: 128
 
metadata:
  base_model: "Qwen/Qwen3-8B"
  training_dataset: "s3://datasets/code-summaries-v2"
  eval_loss: 0.342

If no model.yaml is provided, the platform infers sensible defaults based on the framework and model size. You can always override these later via the CLI or dashboard.
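For intuition, default inference might look roughly like the sketch below. The field names mirror the model.yaml example above, but the thresholds and logic are invented for illustration and are not the platform's actual rules:

```python
def infer_defaults(framework: str, param_count_b: float) -> dict:
    """Guess runtime/resource defaults from framework and model size (in
    billions of parameters). Hypothetical logic for illustration only."""
    cfg = {"framework": framework}
    if framework == "vllm":
        # larger models get tensor parallelism across more GPUs
        tp = 1 if param_count_b <= 13 else 2
        cfg["runtime"] = {
            "gpu_memory_utilization": 0.9,
            "max_model_len": 4096,
            "tensor_parallel_size": tp,
            "dtype": "float16",
        }
        cfg["resources"] = {"gpu": tp}
    else:
        cfg["runtime"] = {}
        cfg["resources"] = {"gpu": 1}
    return cfg
```

An 8B model would land on a single GPU here, while a 70B model would be sharded; in practice you would confirm the inferred values in the dashboard and override them where they miss.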

Error Handling

Registration Failures

Error                                      Cause                                 Fix
ALREADY_EXISTS                             Version already registered            Bump the version number
INVALID_ARGUMENT: unsupported framework    Unknown framework string              Use one of: vllm, onnx, tensorrt, custom
RESOURCE_EXHAUSTED                         Object store quota exceeded           Archive unused versions or increase quota
PERMISSION_DENIED                          Missing model registry write access   Request the model-admin role from your org admin

Runtime Errors

If a model fails to load at serving time, check the vLLM worker logs:

Terminal
bash
# Get logs from the model worker
kubectl logs -n ai-platform -l model=my-model --tail=100
 
# Common issues:
# - "CUDA out of memory" → reduce gpu_memory_utilization or use a larger GPU
# - "model not found" → verify the model path in S3
# - "tokenizer error" → ensure tokenizer files are included in the model artifacts