Resourcing Overview

To host Onyx, we recommend:
| Resource | Minimum | Preferred |
|---|---|---|
| CPU | 4 vCPU | 8+ vCPU |
| RAM | 10 GB | 16+ GB |
| Disk | 50 GB + ~2.5x the indexed data | 500 GB for organizations with <5000 users |
Vespa does not allow writes once disk usage hits 75%. Make sure to always have some storage headroom.
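As a minimal sketch, you could monitor headroom on the host with a check like the one below. It assumes the 75% write-block threshold above; the function name is illustrative.

```python
import shutil

# Vespa blocks writes once disk usage reaches 75%.
VESPA_WRITE_BLOCK_THRESHOLD = 0.75

def has_headroom(path: str = "/", threshold: float = VESPA_WRITE_BLOCK_THRESHOLD) -> bool:
    """Return True if disk usage at `path` is still below the write-block threshold."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total < threshold

print(f"Headroom OK: {has_headroom('/')}")
```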

Local Deployment (Docker)

You can control the resources available to Docker in the Resources section of the Docker Desktop settings menu. Old, unused Docker images can take up sizeable disk space; to clean them up, run `docker system prune --all`.

Cloud Providers (AWS, GCP, etc.)

For small to mid scale deployments, we recommend deploying Onyx to a single instance in your cloud provider of choice.
For most use cases a single reasonably sized instance is enough for excellent performance!
When evaluating your instance, follow the Preferred resources in the table above.
| Provider | Recommended Instance Type |
|---|---|
| AWS | m7g.xlarge |
| GCP | e2-standard-4 or e2-standard-8 |
| Azure | D4s_v3 |
| DigitalOcean | Meet the preferred resources in the table above |

Container-Specific Resourcing

For more efficient scaling, you can dedicate resources to each Onyx container using Kubernetes or AWS EKS. See the Onyx Helm chart values.yaml for our default requests and limits.
| Component | CPU | Memory |
|---|---|---|
| api_server | 1 | 2 Gi |
| background | 2 | 8 Gi |
| indexing_model_server | 2 | 4 Gi |
| inference_model_server | 2 | 4 Gi |
| postgres | 2 | 2 Gi |
| vespa | >= 4 | >= 8 Gi |
| nginx | 250m (1/4) | 128 Mi |
The vespa recommendation is the bare minimum for a production deployment. With 50 GB of documents, we recommend at least 10 CPU and 20 Gi of memory. All together, this comes out to a total available node size of at least ~14 CPU and ~30 GB of memory.
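As a sketch of how the table above might translate into Helm values, the fragment below uses the standard Kubernetes `resources` schema; the exact key structure depends on the Onyx chart's values.yaml, so treat the layout as illustrative.

```yaml
# Illustrative per-container requests mirroring the table above.
api_server:
  resources:
    requests:
      cpu: "1"
      memory: 2Gi
vespa:
  resources:
    requests:
      cpu: "4"
      memory: 8Gi
nginx:
  resources:
    requests:
      cpu: 250m
      memory: 128Mi
```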

How Resource Requirements Scale

The main driver of resource requirements is the number of indexed documents. This primarily affects the index component of Onyx (a Vespa vector database), which is responsible for storing the vectorized documents and handling search requests.
Vespa’s resource requirements scale linearly with the document count.
Based on our experience with large scale deployments, in addition to the previously mentioned minimums, Vespa needs:
  • ~3GB of memory for each additional 1GB of documents
  • ~1 CPU for each additional 2GB of documents
These are our rough estimates. Other factors that may affect resource requirements include:
  • The embedding model
  • Whether you have quantization and dimensional reduction enabled

Resourcing Example

For a deployment with 10GB of text content, your index component will need:
  • CPU: 4 + 10 * 0.5 = 9 cores
  • Memory: 4 + 10 * 3 = 34GB
If deploying on a single instance, this would be in addition to the base requirements. Overall, that takes us to >= 13 CPU and >= 50 GB of memory. Given these requirements, an m7g.4xlarge or a c5.9xlarge EC2 instance would be appropriate. If deploying with Kubernetes or AWS EKS, this would give a per-component resource allocation of:
| Component | CPU | Memory |
|---|---|---|
| api_server | 1 | 2 Gi |
| background | 2 | 8 Gi |
| indexing_model_server | 2 | 4 Gi |
| inference_model_server | 2 | 4 Gi |
| postgres | 2 | 4 Gi |
| vespa | 10 | 34 Gi |
Total available node size: ~20 CPU and ~60GB of Memory.
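The arithmetic in the example above can be sketched as a small helper. The base figures (4 CPU, 4 GB) and per-GB rates are taken from the worked example and the scaling rules earlier in this page; the function name is illustrative.

```python
def vespa_resource_estimate(doc_gb: float) -> tuple[float, float]:
    """Estimate Vespa's CPU cores and memory (GB) for a given volume of
    indexed documents, using the rough linear scaling rules above."""
    base_cpu, base_mem_gb = 4, 4           # baseline used in the worked example
    cpu = base_cpu + doc_gb * 0.5          # ~1 CPU per additional 2 GB of documents
    mem_gb = base_mem_gb + doc_gb * 3      # ~3 GB of memory per additional 1 GB of documents
    return cpu, mem_gb

cpu, mem = vespa_resource_estimate(10)
print(f"10 GB of documents -> {cpu:g} CPU, {mem:g} GB memory")  # 9 CPU, 34 GB
```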

Next Steps