# Resourcing Overview
Resource | Minimum | Preferred |
---|---|---|
CPU | 4 vCPU | 8+ vCPU |
RAM | 10 GB | 16+ GB |
Disk | 50 GB + ~2.5x indexed data | 500 GB for organizations <5000 users |
Vespa (the vector database used by Onyx) does not allow writes once disk usage hits 75%.
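As a quick check of the disk rule of thumb from the table above (the helper function name is ours, not part of Onyx):

```python
def min_disk_gb(indexed_data_gb: float) -> float:
    """Minimum disk from the table's rule of thumb:
    50 GB base plus ~2.5x the size of the indexed data."""
    return 50 + 2.5 * indexed_data_gb

print(min_disk_gb(100))  # 300.0 -> 100 GB of data needs ~300 GB of disk
```

Remember to leave headroom: with Vespa refusing writes at 75% disk usage, the usable fraction of that disk is smaller than it looks.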
## Local Deployment (Docker)
You can control the resources available to Docker in the Resources section of the Docker Desktop settings menu.

Old, unused Docker images often take up sizeable disk space. To clean up unused images, run:

```shell
docker system prune --all
```

## Cloud Providers (AWS, GCP, etc.)
For small to mid scale deployments, we recommend deploying Onyx to a single instance in your cloud provider of choice. When sizing your instance, follow the Preferred column in the table above.

Provider | Recommended Instance Type |
---|---|
AWS | m7g.xlarge |
GCP | e2-standard-4 or e2-standard-8 |
Azure | D4s_v3 |
DigitalOcean | Meet the preferred resources in the table above |
## Vespa on older CPUs
Vespa requires a Haswell (2013) or later CPU. For older CPUs, use the `vespaengine/vespa-generic-intel-x86_64` image in your Docker Compose file.
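A minimal Docker Compose override for this swap might look like the following (the service name `index` is an assumption; use the name of the Vespa service in your own compose file):

```yaml
# docker-compose.override.yml
# NOTE: the service name "index" is an assumption -- match it to the
# service that runs Vespa in your existing Docker Compose file.
services:
  index:
    image: vespaengine/vespa-generic-intel-x86_64
```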
This generic image is slower. For more details, see Vespa CPU Support.

## Container-Specific Resourcing
For more efficient scaling, you can dedicate resources to each Onyx container using Kubernetes or AWS EKS. See the Onyx Helm chart's `values.yaml` for our default requests and limits.
Component | CPU | Memory |
---|---|---|
api_server | 1 | 2 Gi |
background | 2 | 8 Gi |
indexing_model_server | 2 | 4 Gi |
inference_model_server | 2 | 4 Gi |
postgres | 2 | 2 Gi |
vespa | >= 4 | >= 8 Gi |
nginx | 250m (1/4) | 128 Mi |
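For Kubernetes deployments, the defaults above become resource requests in a values override. As an illustrative sketch only (the key paths here are assumptions; the authoritative structure is in the chart's `values.yaml`):

```yaml
# Illustrative only -- key paths are assumptions, not the chart's
# actual schema. Check the Onyx Helm chart's values.yaml.
vespa:
  resources:
    requests:
      cpu: "4"
      memory: 8Gi
api_server:
  resources:
    requests:
      cpu: "1"
      memory: 2Gi
```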
The `vespa` recommendation is the bare minimum for a production deployment. With 50 GB of documents, we recommend at least 10 CPU and 20 Gi of memory.

## How Resource Requirements Scale
The main driver of resource requirements is the number of indexed documents. This primarily affects the index component of Onyx (a Vespa vector database), which is responsible for storing the vectorized documents and handling search requests. Vespa's resource requirements scale linearly with the document count:
- ~3GB of memory for each additional 1GB of documents
- ~1 CPU for each additional 2GB of documents
These ratios can vary depending on:
- The embedding model
- Whether you have quantization and dimensional reduction enabled
## Resourcing Example
For a deployment with 10 GB of text content, your `index` component will need:
- CPU: 4 + 10 * 0.5 = 9 cores
- Memory: 4 + 10 * 3 = 34GB
Combined with the rest of the Onyx components, the instance overall should have >= 13 CPU and >= 50 GB of memory. Given these requirements, an `m7g.4xlarge` or a `c5.9xlarge` EC2 instance would be appropriate.
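The arithmetic above can be sketched as a small helper (the base of 4 cores / 4 GB and the per-GB rates are taken from the rules of thumb in this section; the function itself is ours):

```python
def vespa_requirements(doc_gb: float) -> tuple[float, float]:
    """Estimate the index (Vespa) component's CPU cores and memory (GB).

    Rules of thumb from this guide:
      - base: 4 cores, 4 GB of memory
      - +1 core per 2 GB of documents (0.5 core/GB)
      - +3 GB of memory per 1 GB of documents
    """
    cpu = 4 + doc_gb * 0.5
    mem_gb = 4 + doc_gb * 3
    return cpu, mem_gb

cpu, mem = vespa_requirements(10)
print(cpu, mem)  # 9.0 34.0
```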
If deploying with Kubernetes or AWS EKS, this would give a per-component resource allocation of:
Component | CPU | Memory |
---|---|---|
api_server | 1 | 2 Gi |
background | 2 | 8 Gi |
indexing_model_server | 2 | 4 Gi |
inference_model_server | 2 | 4 Gi |
postgres | 2 | 4 Gi |
vespa | 10 | 34 Gi |