Indexing Flow
Documents from connectors and user uploads are processed by the Onyx indexing pipeline. With default configurations, no data ever leaves the deployment. The general processing outline is as follows:
- Documents, metadata, and access permissions are pulled in from connectors
- Documents are processed into text through document parsing utilities
- The texts are chunked and passed through deep learning (embedding) models
- These representations are stored in the vector database
- Optionally (default off), an LLM can be used to extract entities and relations from the documents and represent them as a graph within Postgres
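The chunk-embed-store steps above can be sketched as a minimal pipeline. Note this is an illustrative simplification: the `chunk` windowing and the hash-based `embed` function are hypothetical stand-ins for Onyx's tokenizer-aware chunking and its deep-learning embedding models, and a plain dict stands in for the vector database.

```python
import hashlib

def chunk(text: str, size: int = 100, overlap: int = 20) -> list[str]:
    """Split extracted text into overlapping character windows."""
    step = size - overlap
    return [text[start:start + size]
            for start in range(0, max(len(text) - overlap, 1), step)]

def embed(passage: str, dim: int = 8) -> list[float]:
    """Toy deterministic embedding; a real deployment uses a neural model."""
    digest = hashlib.sha256(passage.encode()).digest()
    return [b / 255.0 for b in digest[:dim]]

# Stand-in "vector database": chunk id -> (vector, text, access permissions)
vector_store: dict[int, tuple[list[float], str, set[str]]] = {}

def index_document(text: str, allowed_users: set[str]) -> None:
    """Chunk a parsed document, embed each chunk, and store it with its ACL."""
    for passage in chunk(text):
        vector_store[len(vector_store)] = (embed(passage), passage, allowed_users)

index_document("Some document text " * 20, {"alice"})
```

Everything here runs in-process, which mirrors the default behavior: no document content leaves the deployment.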

Note that overriding the default configurations may mean that documents are sent to your selected third-party services for processing:
- API-based embedding model. Teams may choose this instead of running their own GPUs, using a less capable embedding model, or accepting slower initial indexing.
- Third-party document-to-text service. Some third-party services provide better processing using large vision models and other approaches, which can yield better extraction of text from your documents.
- Connecting an LLM for the generation of the knowledge graph. The knowledge graph provides an additional representation of the connected knowledge and can be used to answer more abstract questions.
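The optional knowledge-graph step amounts to extracting (entity, relation, entity) triples and storing them as a graph. In this sketch a toy pattern matcher stands in for the LLM extractor, and in-memory structures stand in for the Postgres tables; both are hypothetical simplifications of how the real extraction would work.

```python
import re

def extract_triples(text: str) -> list[tuple[str, str, str]]:
    """Toy extractor: find (subject, relation, object) triples such as
    'X works at Y'. A real deployment would prompt an LLM instead."""
    pattern = re.compile(r"(\w+) (works at|manages|reports to) (\w+)")
    return [(m.group(1), m.group(2), m.group(3)) for m in pattern.finditer(text)]

# Graph storage, mirroring entity and relation tables in Postgres.
nodes: set[str] = set()
edges: list[tuple[str, str, str]] = []

for subj, rel, obj in extract_triples("Alice works at Acme. Bob reports to Alice."):
    nodes.update({subj, obj})
    edges.append((subj, rel, obj))
```

Once documents are represented this way, a question like "who is connected to Alice?" becomes a graph traversal rather than a similarity search over chunks.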
Query Flow

Configurable External Services
Admins of the system can configure support for external services to enrich the user experience. It is recommended to enable these functionalities to let your users get the most out of Onyx.