Resourcing Overview
Onyx Lite
| Resource | Minimum | Preferred |
|---|---|---|
| CPU | 2 vCPU | 4 vCPU |
| RAM | 2 GB | 4 GB |
| Disk | 10 GB | 50 GB |
Onyx Lite uses under 1GB of memory at baseline.
Disk and memory usage scale with the number of files users upload to the system,
since PostgreSQL handles file storage in Lite mode.
Onyx Standard
| Resource | Minimum | Preferred |
|---|---|---|
| CPU | 4 vCPU | 8+ vCPU |
| RAM | 10 GB | 16+ GB |
| Disk | 32 GB + ~2.5x indexed data | 500 GB for organizations <5000 users |
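As a rough illustration of the minimum disk figure in the table above (32 GB plus about 2.5x the indexed data), the following Python sketch computes the estimate. The function name and the example value of 40 GB of indexed data are hypothetical, chosen only for illustration.

```python
def standard_min_disk_gb(indexed_data_gb: float) -> float:
    """Minimum disk for Onyx Standard: 32 GB base plus ~2.5x the indexed data."""
    if indexed_data_gb < 0:
        raise ValueError("indexed_data_gb must be non-negative")
    return 32 + 2.5 * indexed_data_gb


# Hypothetical example: 40 GB of indexed data.
print(standard_min_disk_gb(40))  # 132.0
```

Treat the result as a floor for planning, since the 2.5x multiplier is itself approximate.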
Local Deployment (Docker)
You can control the resources available to Docker in the Resources section of the Docker Desktop settings menu.
Old, unused Docker images often take up sizeable disk space. To clean up dangling images, run `docker image prune`.
Cloud Providers (AWS, GCP, etc.)
For small to mid-scale deployments, we recommend deploying Onyx to a single instance in your cloud provider of choice. When evaluating your instance, follow the Preferred resources in the tables above.
Onyx Lite
| Provider | Recommended Instance Type |
|---|---|
| AWS | t3.medium |
| GCP | e2-medium |
| Azure | B2s |
Onyx Standard
| Provider | Recommended Instance Type |
|---|---|
| AWS | m7g.xlarge |
| GCP | e2-standard-4 or e2-standard-8 |
| Azure | D4s_v3 |
| DigitalOcean | Meet the preferred resources in the table above |
Container-Specific Resourcing (Standard)
For more efficient scaling, you can dedicate resources to each Onyx container using Kubernetes or AWS EKS. See the Onyx Helm chart `values.yaml` for our default requests and limits.
| Component | CPU | Memory |
|---|---|---|
| api_server | 1 | 2 Gi |
| background | 2 | 8 Gi |
| indexing_model_server | 2 | 4 Gi |
| inference_model_server | 2 | 4 Gi |
| postgres | 2 | 2 Gi |
| opensearch | 2 | 4 Gi |
| nginx | 250m (1/4) | 128 Mi |
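To see what the per-component figures above add up to, the sketch below parses them as Kubernetes-style quantity strings (`250m` means 250 millicores, `Gi` and `Mi` are binary units) and sums them. The dictionary and function names are hypothetical, and only the suffixes that appear in the table are handled.

```python
# Per-component values copied from the Standard table above,
# written as Kubernetes-style quantity strings: (CPU, memory).
STANDARD_COMPONENTS = {
    "api_server": ("1", "2Gi"),
    "background": ("2", "8Gi"),
    "indexing_model_server": ("2", "4Gi"),
    "inference_model_server": ("2", "4Gi"),
    "postgres": ("2", "2Gi"),
    "opensearch": ("2", "4Gi"),
    "nginx": ("250m", "128Mi"),
}


def cpu_cores(quantity: str) -> float:
    """Convert a CPU quantity to cores; a trailing 'm' means millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1]) / 1000
    return float(quantity)


def memory_gib(quantity: str) -> float:
    """Convert a memory quantity with a Gi or Mi suffix to GiB."""
    if quantity.endswith("Gi"):
        return float(quantity[:-2])
    if quantity.endswith("Mi"):
        return float(quantity[:-2]) / 1024
    raise ValueError(f"unsupported memory quantity: {quantity}")


def totals(components: dict) -> tuple:
    """Sum CPU cores and memory (GiB) across all components."""
    cpu = sum(cpu_cores(c) for c, _ in components.values())
    mem = sum(memory_gib(m) for _, m in components.values())
    return cpu, mem


print(totals(STANDARD_COMPONENTS))  # (11.25, 24.125)
```

The same helper can be pointed at a different dictionary, for example one built from your own Helm overrides.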
If you are using cloud-based embedding models (e.g. OpenAI, Cohere) instead of locally hosted ones,
the `indexing_model_server` and `inference_model_server` will use significantly less memory.
Container-Specific Resourcing (Lite)
Onyx Lite runs only four services. All storage is consolidated onto PostgreSQL.
| Component | CPU | Memory |
|---|---|---|
| api_server | 1 | 1 Gi |
| web_server | 250m (1/4) | 512 Mi |
| postgres | 1 | 1 Gi |
| nginx | 250m (1/4) | 128 Mi |
Memory usage in Lite mode scales with the number of user-uploaded files, since PostgreSQL handles file storage,
caching, and session management.
How Resource Requirements Scale
The main driver of resource requirements for Standard mode is the number of indexed documents. This primarily affects the search index (OpenSearch), which is responsible for storing documents and handling search requests.
OpenSearch Memory
OpenSearch memory is split roughly 50/50 between the JVM heap and the OS file system cache. Both halves are critical: the heap handles indexing and search operations, while the file system cache keeps frequently accessed index segments in memory for fast reads.
Key rules for JVM heap sizing:
- Set `Xms` and `Xmx` to 50% of available RAM (the other 50% goes to the OS file system cache)
- Never exceed 32 GB of heap. Beyond this, Java disables compressed ordinary object pointers, causing significant performance degradation
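The two heap-sizing rules above can be sketched as a small Python helper. This is an illustration, not Onyx tooling: the function names are hypothetical, `total_ram_gb` is taken to mean the memory available to the OpenSearch container, and the default cap of 31 GB is an assumed safety margin below the 32 GB limit.

```python
def jvm_heap_mb(total_ram_gb: float, max_heap_gb: float = 31.0) -> int:
    """Return a heap size in MiB: half of RAM, capped below the 32 GB threshold.

    max_heap_gb=31 is an assumed safety margin, not a value from this page.
    """
    if total_ram_gb <= 0:
        raise ValueError("total_ram_gb must be positive")
    half_ram_mb = int(total_ram_gb * 1024 / 2)
    return min(half_ram_mb, int(max_heap_gb * 1024))


def jvm_heap_flags(total_ram_gb: float) -> str:
    """Format matching -Xms/-Xmx flags so the heap stays at one fixed size."""
    heap = jvm_heap_mb(total_ram_gb)
    return f"-Xms{heap}m -Xmx{heap}m"


print(jvm_heap_flags(16))   # -Xms8192m -Xmx8192m
print(jvm_heap_flags(128))  # -Xms31744m -Xmx31744m
```

Setting both flags to the same value keeps the heap from resizing at runtime, which is why the helper emits them as a pair.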
Storage Overhead
OpenSearch adds overhead on top of the raw source data. The formula for on-disk storage is:
Storage = Source Data × (1 + Number of Replicas) × 1.45
Onyx defaults to 0 replicas for single-node deployments, so storage is approximately 1.45× the source data size.
Scaling Guidelines
OpenSearch resource requirements scale linearly with the volume of indexed data. The exact ratio depends on deployment size: large distributed clusters are more efficient per GB than single-node deployments due to fixed per-node overhead (cluster management, garbage collection, segment merging).
Industry guidelines for large clusters suggest a memory-to-data ratio of around 1:16 for search-heavy workloads. However, for the single-node deployments typical of self-hosted Onyx, the fixed overhead per node is a much larger fraction of total resources. Based on our experience, we recommend the following for single-node or small-cluster deployments:
| Scale | Memory per 1 GB of source docs | CPU per 1 GB of source docs |
|---|---|---|
| Small (< 5 GB) | ~2 GB | ~0.25 CPU |
| Medium (5–50 GB) | ~1.5 GB | ~0.25 CPU |
| Large (50+ GB) | ~1 GB | ~0.2 CPU |
Actual requirements also depend on:
- The embedding model and vector dimensions
- Whether you have quantization and dimensionality reduction enabled
- Query throughput and concurrency
Resourcing Example
For a deployment with 10GB of text content, your `opensearch` component will need:
- Memory: 4 (base) + 10 × 1.5 = 19 GB
- CPU: 2 (base) + 10 × 0.25 = 4.5 cores
In total, this requires >= 9 CPU and >= 35GB of memory.
Given these requirements, an `m7g.2xlarge` or `c5.4xlarge` EC2 instance would be appropriate.
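The OpenSearch arithmetic in this example can be reproduced with a short Python sketch that combines the per-GB ratios from the Scaling Guidelines table with the base values used above (4 GB of memory, 2 CPU). The names are hypothetical, and the handling of the tier boundaries (exactly 5 GB counts as Medium, exactly 50 GB as Large) is an assumption, since the table does not specify it.

```python
import math

# (upper bound in GB of source docs, memory GB per GB, CPU per GB),
# taken from the Scaling Guidelines table.
TIERS = [
    (5, 2.0, 0.25),             # Small: < 5 GB
    (50, 1.5, 0.25),            # Medium: 5-50 GB
    (float("inf"), 1.0, 0.2),   # Large: 50+ GB
]
BASE_MEMORY_GB = 4  # base values used in the worked example
BASE_CPU = 2


def opensearch_requirements(source_gb: float) -> tuple:
    """Return (memory_gb, cpu_cores) for the opensearch component."""
    if source_gb < 0:
        raise ValueError("source_gb must be non-negative")
    for upper, mem_per_gb, cpu_per_gb in TIERS:
        if source_gb < upper:
            memory = BASE_MEMORY_GB + source_gb * mem_per_gb
            cpu = BASE_CPU + source_gb * cpu_per_gb
            return memory, cpu
    raise ValueError("source_gb must be a finite number")


memory_gb, cpu = opensearch_requirements(10)
print(memory_gb, cpu)   # 19.0 4.5
print(math.ceil(cpu))   # 5
```

Rounding the CPU figure up to a whole core gives 5, which is a convenient value to use when setting a per-container allocation.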
If deploying with Kubernetes or AWS EKS, this would give a per-component resource allocation of:
| Component | CPU | Memory |
|---|---|---|
| api_server | 1 | 2 Gi |
| background | 2 | 8 Gi |
| indexing_model_server | 2 | 4 Gi |
| inference_model_server | 2 | 4 Gi |
| postgres | 2 | 4 Gi |
| opensearch | 5 | 19 Gi |
Next Steps
Guide: Deploy Onyx Locally
Deploy Onyx locally with Docker.
Guide: Deploy on AWS
Deploy Onyx on an EC2 instance.