# Model Management
Register, version, and deploy AI models across environments.
## Model Registry
The Model Registry is the central catalog for all AI models in your organization. It stores model artifacts, metadata, version history, and deployment status in a single source of truth. Every model registered in the platform gets a unique identifier and can be referenced across training jobs, inference endpoints, and A/B tests.
Models are organized by name and version. Each version is immutable once published — this guarantees reproducibility and makes rollbacks straightforward.
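Because published versions are immutable, clients typically resolve the "newest" version by comparing semantic version numbers rather than timestamps. A minimal sketch of that comparison (the helper functions here are illustrative, not part of any Riven SDK):

```python
def parse_semver(version: str) -> tuple[int, int, int]:
    """Parse a 'MAJOR.MINOR.PATCH' string into a comparable tuple."""
    major, minor, patch = version.split(".")
    return (int(major), int(minor), int(patch))

def latest_version(versions: list[str]) -> str:
    """Return the highest semantic version from a list of version strings."""
    return max(versions, key=parse_semver)

# Tuple comparison avoids the lexicographic trap where "1.10.0" < "1.2.3"
print(latest_version(["1.0.0", "0.9.0", "1.2.3", "1.10.0"]))  # → 1.10.0
```

Note that plain string comparison would rank `1.2.3` above `1.10.0`; parsing into integer tuples gives the correct ordering.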
### Registering a Model
Use the Riven CLI to register a new model. The CLI uploads your model artifacts to the platform's object store and creates a registry entry with the specified metadata.
```shell
# Register a new model from a local directory
riven models register \
  --name my-model \
  --framework vllm \
  --version 1.0.0 \
  --path ./model-weights/ \
  --description "Fine-tuned Qwen3 for summarization"

# Register from a Hugging Face repo
riven models register \
  --name qwen3-8b \
  --framework vllm \
  --source hf://Qwen/Qwen3-8B
```

Model uploads are streamed and resumable. If an upload is interrupted, re-running the same command picks up where it left off.
## Model Versions
A model can have many versions. Versions follow semantic versioning and are promoted through three stages: staging, production, and archived.
- staging — The default stage for newly registered versions. Used for testing and validation.
- production — Promoted versions that are actively serving traffic.
- archived — Retired versions retained for audit and reproducibility purposes.
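The stage lifecycle above is effectively a small state machine. A hypothetical validation sketch — the exact set of allowed transitions is an assumption here, not documented platform behavior:

```python
# Assumed transition rules: staging → production, staging → archived,
# production → archived. Archived versions are retired for good.
ALLOWED_TRANSITIONS = {
    "staging": {"production", "archived"},
    "production": {"archived"},
    "archived": set(),
}

def can_promote(current: str, target: str) -> bool:
    """Check whether a version may move from `current` to `target` stage."""
    return target in ALLOWED_TRANSITIONS.get(current, set())
```

A client could run this check before calling `riven models promote` to fail fast on an invalid transition.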
```shell
# List all versions of a model
riven models versions my-model

# Promote a version to production
riven models promote my-model --version 1.0.0 --stage production

# Archive an old version
riven models promote my-model --version 0.9.0 --stage archived
```

## Supported Frameworks
The platform supports multiple inference frameworks. Choose the one that best fits your model architecture and performance requirements:
| Framework | Best For | Notes |
|---|---|---|
| vLLM | Autoregressive text generation | High-throughput with PagedAttention |
| ONNX Runtime | Encoder models, classifiers, embeddings | Cross-platform inference |
| TensorRT | Maximum GPU throughput | Requires model compilation step |
| Custom | Any architecture | Bring your own serving container |
## Model Configuration

Each model version can include a configuration file that specifies runtime parameters, resource requirements, and serving options. Place a `model.yaml` file in your model directory:
```yaml
name: my-model
version: 1.0.0
framework: vllm
description: "Fine-tuned Qwen3-8B for code summarization"

runtime:
  gpu_memory_utilization: 0.9
  max_model_len: 4096
  tensor_parallel_size: 1
  dtype: float16
  enforce_eager: false   # set true to disable CUDA graphs (debugging)
  trust_remote_code: false

resources:
  gpu: 1
  gpu_type: a10g
  memory: 24Gi
  cpu: 4

serving:
  max_batch_size: 64
  timeout_seconds: 30
  health_check_path: /health
  max_concurrent_requests: 128

metadata:
  base_model: "Qwen/Qwen3-8B"
  training_dataset: "s3://datasets/code-summaries-v2"
  eval_loss: 0.342
```

If no `model.yaml` is provided, the platform infers sensible defaults based on the framework and model size. You can override these later via the CLI or dashboard.
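The "inferred defaults, overridable later" behavior can be modeled as a recursive merge where user-supplied values win over framework defaults. A sketch of that pattern — the field names follow the `model.yaml` above, but the default values shown are assumptions, not documented platform defaults:

```python
def deep_merge(defaults: dict, overrides: dict) -> dict:
    """Recursively merge `overrides` on top of `defaults`; overrides win."""
    merged = dict(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged

# Illustrative framework defaults (assumed values, for demonstration only)
VLLM_DEFAULTS = {
    "runtime": {"gpu_memory_utilization": 0.9, "dtype": "float16"},
    "serving": {"max_batch_size": 64, "timeout_seconds": 30},
}

user_config = {
    "runtime": {"max_model_len": 4096},
    "serving": {"timeout_seconds": 60},
}
effective = deep_merge(VLLM_DEFAULTS, user_config)
```

Here `effective` keeps the default `gpu_memory_utilization` and `max_batch_size` while taking the user's `max_model_len` and `timeout_seconds`.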
## Error Handling

### Registration Failures
| Error | Cause | Fix |
|---|---|---|
| `ALREADY_EXISTS` | Version already registered | Bump the version number |
| `INVALID_ARGUMENT: unsupported framework` | Unknown framework string | Use one of: `vllm`, `onnx`, `tensorrt`, `custom` |
| `RESOURCE_EXHAUSTED` | Object store quota exceeded | Archive unused versions or increase quota |
| `PERMISSION_DENIED` | Missing model registry write access | Request the `model-admin` role from your org admin |
### Runtime Errors
If a model fails to load at serving time, check the vLLM worker logs:
```shell
# Get logs from the model worker
kubectl logs -n ai-platform -l model=my-model --tail=100

# Common issues:
# - "CUDA out of memory" → reduce gpu_memory_utilization or use a larger GPU
# - "model not found"    → verify the model path in S3
# - "tokenizer error"    → ensure tokenizer files are included in the model artifacts
```
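The triage steps above can be automated with a simple log scan. A hedged sketch that maps the known failure signatures listed above to their remediation hints (the function and mapping are illustrative, not a platform tool):

```python
# Known failure signatures from vLLM worker logs, mapped to remediation hints.
KNOWN_ISSUES = {
    "CUDA out of memory": "reduce gpu_memory_utilization or use a larger GPU",
    "model not found": "verify the model path in S3",
    "tokenizer error": "ensure tokenizer files are included in the model artifacts",
}

def triage(log_text: str) -> list[str]:
    """Return a remediation hint for every known issue found in the logs."""
    return [hint for pattern, hint in KNOWN_ISSUES.items() if pattern in log_text]
```

You could pipe `kubectl logs` output into a script like this to get an immediate hint for the most common load failures.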