Core Concepts

Actions & MCP

Actions

Actions (also called Tools in the backend) are the functions that your Agents can perform to interact with external systems and services. They extend your agents’ capabilities beyond just the language model.

Built-in Actions:

Name	Description	Requires Config	Provider Choices
Internal Search	Search through your organization’s indexed documents and knowledge base	Yes	Built-in with swappable components
Web Search	Search the internet for real-time information and current events	Yes	Google, Serper, Exa, Firecrawl (optional)
Code Interpreter	Execute Python code, analyze data, and generate visualizations	No	Built-in
Image Generation	Create images from text descriptions using AI models	Yes	OpenAI, Azure OpenAI

SCIM support for common IdPs is coming soon!

Custom Actions:

API Integrations: Connect to external REST APIs
Database Operations: Query and update databases
Workflow Automation: Trigger business processes
File Operations: Read, write, and manipulate files

You can define your own Custom Actions in the Admin Panel using an OpenAPI specification.
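
As a rough illustration of what such a specification can look like, here is a minimal sketch that builds an OpenAPI 3.0 document in Python and serializes it to JSON for pasting into the Admin Panel. The ticket-lookup API, its server URL, and the operationId are all hypothetical, and which OpenAPI features Onyx requires or supports is not stated here, so treat this only as a starting shape.

```python
import json

# Hypothetical API used purely for illustration; none of these names come from Onyx.
openapi_spec = {
    "openapi": "3.0.0",
    "info": {"title": "Ticket Lookup (example)", "version": "1.0.0"},
    "servers": [{"url": "https://tickets.example.com"}],
    "paths": {
        "/tickets/{ticket_id}": {
            "get": {
                "operationId": "getTicket",
                "summary": "Fetch a single support ticket by ID",
                "parameters": [
                    {
                        "name": "ticket_id",
                        "in": "path",
                        "required": True,
                        "schema": {"type": "string"},
                    }
                ],
                "responses": {"200": {"description": "The requested ticket"}},
            }
        }
    },
}

# Serialize to JSON text that can be copied into a specification field.
spec_json = json.dumps(openapi_spec, indent=2)
print(spec_json)
```

Clear summary and parameter descriptions are worth the effort, since they are the only documentation of the endpoint that the specification carries.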

MCP

Model Context Protocol (MCP) is an open standard that enables AI assistants to securely access external data sources and tools. Onyx can be configured as an MCP client to interact with external systems, databases, and APIs in a controlled manner.

Key features of MCP:

External Data Access: Connect to databases, APIs, and file systems
Authentication: Pass through OAuth to ensure secure access to your MCP server.

(Advanced) Custom Built-in Actions

Sometimes, you need more control over your action than what is possible with a Custom Action. Since Onyx is open-source, you can extend the built-in actions to your liking!

To find templates for built-in actions, see backend/onyx/tools/tool_implementations in the Onyx repository.

Extending the codebase is not recommended for most users. Before you start, please reach out to us on Slack or Discord for support!

Agents

Agents are AI assistants with custom instructions, Actions, and data access that extend the base LLM’s capabilities.

The terms Personas, Assistants, and Agents are used interchangeably throughout Onyx and refer to the same concept.

Built-in Agents:

id: 0 Search Agent - Uses the Search Tool to answer questions from your knowledge base
id: -1 General Agent - Basic chat with an LLM and no tools
id: -2 Paraphrase Agent - Uses Search Tool and quotes exact snippets from sources
id: -3 Art Agent - Generates images and visual content

You can create your own Agents in the Admin Panel or by API.

Most Chat endpoints require an Agent ID. To find your Agent ID, you can:

Use the GET /persona API endpoint to list all agents
In the Admin Panel: Click into an agent and check the first number in the URL
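
The first option can be sketched as follows. This is a minimal example under stated assumptions: the base URL, the bearer-token header, and the response shape (a JSON list of objects with "id" and "name" keys) are all hypothetical, so check the API Reference for the actual authentication scheme and schema. The request is only built here, not sent.

```python
import urllib.request


def build_list_agents_request(base_url: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a GET /persona request.

    The bearer-token header is an assumption about how the API authenticates.
    """
    url = base_url.rstrip("/") + "/persona"
    return urllib.request.Request(
        url,
        method="GET",
        headers={"Authorization": f"Bearer {api_key}"},
    )


def agent_ids_by_name(personas: list) -> dict:
    """Map agent name to agent ID, assuming each item has 'id' and 'name' keys."""
    return {persona["name"]: persona["id"] for persona in personas}


# Hypothetical deployment URL and key.
request = build_list_agents_request("https://onyx.example.com/api", "YOUR_API_KEY")

# Hand-written sample standing in for a parsed response body.
sample_response = [
    {"id": 0, "name": "Search Agent"},
    {"id": -1, "name": "General Agent"},
]
print(request.full_url)
print(agent_ids_by_name(sample_response))
```

To actually send the request, pass it to urllib.request.urlopen and parse the body with json.loads.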

Chat

Streamed Chat Responses

The chat response system uses a packet-based architecture to deliver real-time responses to users. Instead of waiting for a complete response, the system breaks down the chat interaction into discrete packets that can be streamed incrementally.

Every packet follows a consistent structure defined by the Packet class:

class Packet(BaseModel):
  ind: int        # Sequential index for ordering
  obj: PacketObj  # The actual content including type of packet

Streaming Flow:

A chat request triggers the streaming process
Various packet types are generated based on the required operations (reasoning, tool calls, AI response, documents, citations, etc.)
Packets are sent with sequential indices to maintain order
The frontend processes packets in real-time to update the UI
An OverallStop packet signals completion
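
The flow above can be sketched from the consumer's side. This is a simplified stand-in, not the Onyx client: the SketchPacket class, the "type" and "content" keys, and the type strings ("message_start", "message_delta", "stop") are assumptions made for illustration, since the wire format of each packet is not specified here.

```python
from dataclasses import dataclass


@dataclass
class SketchPacket:
    """Simplified stand-in for the Packet class; obj is a plain dict here."""
    ind: int
    obj: dict  # assumed shape: {"type": "...", "content": "..."}


def assemble_message(packets: list) -> str:
    """Order packets by index and join message text until the stop packet."""
    text_parts = []
    # sorted() is stable, so packets that share an index keep their arrival order.
    for packet in sorted(packets, key=lambda p: p.ind):
        kind = packet.obj.get("type")
        if kind == "stop":  # hypothetical type string for OverallStop
            break
        if kind in ("message_start", "message_delta"):  # hypothetical type strings
            text_parts.append(packet.obj.get("content", ""))
        # Other packet types (reasoning, tools, citations) are ignored in this sketch.
    return "".join(text_parts)


received = [
    SketchPacket(2, {"type": "message_delta", "content": ", world"}),
    SketchPacket(0, {"type": "message_start", "content": "Hello"}),
    SketchPacket(1, {"type": "reasoning_delta", "content": "thinking"}),
    SketchPacket(3, {"type": "stop"}),
]
print(assemble_message(received))
```

A real frontend would update the UI as each packet arrives rather than collecting them first; the batch form is used here only to keep the example short.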

Basic Message Packets

MessageStart and MessageDelta

These packets form the core of the streaming response system:

MessageStart: Initiates a new message with initial content and final search documents (if any)
MessageDelta: Delivers incremental text content as it’s generated

Control Packets

Session and Section Management

Control packets manage the flow and lifecycle of the streaming process:

OverallStop: Signals the end of the entire streaming session
SectionEnd: Marks the completion of a packet type (reasoning, message, citations, etc.)

Tool Packets

Tool responses are streamed in the same way as the main message response.

Search Tools

SearchToolStart and SearchToolDelta handle document search operations

Image Generation

ImageGenerationToolStart, ImageGenerationToolDelta, and ImageGenerationToolHeartbeat manage AI image creation

Custom Tools

CustomToolStart and CustomToolDelta are used for MCP and custom Actions

The start packet signals the start of the tool response. The delta packets stream the results as they become available.

Reasoning Packets

Any reasoning steps are streamed so the frontend can render them as the system is processing. Reasoning packets are generally the first ones sent.

ReasoningStart: Begins a reasoning section
ReasoningDelta: Streams the AI’s reasoning process

Citation Packets

Citation packets associate citation ids with document ids.

CitationStart: Initiates citation results
CitationDelta: Delivers source citations and references
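
A minimal sketch of how a client might build the citation-to-document lookup from these packets. Every field name below ("type", "citations", "citation_num", "document_id") and the "citation_delta" type string are hypothetical, chosen only to show the idea; the actual payload shape is not given here.

```python
def collect_citations(packet_objs: list) -> dict:
    """Build {citation id: document id} from citation delta payloads.

    All key names are assumptions made for this sketch.
    """
    mapping = {}
    for obj in packet_objs:
        if obj.get("type") != "citation_delta":
            continue  # skip start packets and unrelated packet types
        for citation in obj.get("citations", []):
            mapping[citation["citation_num"]] = citation["document_id"]
    return mapping


sample_objs = [
    {"type": "citation_start"},
    {"type": "citation_delta", "citations": [
        {"citation_num": 1, "document_id": "doc-a"},
    ]},
    {"type": "citation_delta", "citations": [
        {"citation_num": 2, "document_id": "doc-b"},
    ]},
]
print(collect_citations(sample_objs))
```

With a mapping like this, a frontend can turn an inline marker such as [1] into a link to the corresponding source document.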

Connectors

When you see the term Connector in Onyx or elsewhere in this documentation, we’re generally referring to ConnectorCredentialPairs.

Connectors

Connectors in Onyx define the data you would like to index.

name: Not actually displayed in the UI if ConnectorCredentialPairMetadata:name is set
source: Which system to connect to (see DocumentSource accordion below)
input_type: How the Connector retrieves data (see InputType accordion below)
connector_specific_config: Source-specific settings like folder paths or channels. See /backend/onyx/connectors for the configuration each Connector expects.
refresh_freq: How often, in seconds, to check for new or updated content
prune_freq: How often, in seconds, to remove old content from Onyx
indexing_start: Optional datetime to specify when indexing should begin

Python

class ConnectorBase(BaseModel):
  name: str
  source: DocumentSource
  input_type: InputType
  connector_specific_config: dict[str, Any]
  refresh_freq: int | None = None
  prune_freq: int | None = None
  indexing_start: datetime | None = None
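
As an illustration, a Connector definition for a website might look like the following when written as a plain dictionary. The "source" and "input_type" values come from the DocumentSource and InputType enums on this page; the connector name and the "base_url" key inside connector_specific_config are hypothetical, since the expected keys differ per Connector.

```python
import json

SECONDS_PER_DAY = 24 * 60 * 60

connector = {
    "name": "docs-site",  # hypothetical name
    "source": "web",  # DocumentSource.WEB
    "input_type": "load_state",  # InputType.LOAD_STATE
    # Hypothetical key; see the Connector's own code for the real ones.
    "connector_specific_config": {"base_url": "https://docs.example.com"},
    "refresh_freq": SECONDS_PER_DAY,  # check for changes once a day
    "prune_freq": 30 * SECONDS_PER_DAY,  # remove stale content every 30 days
    "indexing_start": None,  # no start-time restriction
}

print(json.dumps(connector, indent=2))
```

Writing the frequencies in terms of a named constant makes it harder to pass minutes or hours where seconds are expected.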

Credentials

Credentials contain the authentication details needed to access data sources. These include API keys, OAuth tokens, personal access tokens (PATs), or service account credentials that allow Onyx to securely connect to your external systems.

Types of Credentials:

API Keys: Simple token-based authentication
OAuth Tokens: Delegated authorization with refresh capabilities
Service Accounts: Machine-to-machine authentication
Personal Access Tokens: User-specific access credentials

ConnectorCredentialPairs

Behind the scenes, Connectors and Credentials are combined into a ConnectorCredentialPair (CC-pair). A CC-pair is an active connection that can sync data from your external sources into Onyx. CC-pairs are what you see and manage on the Admin Connectors page.

CC-pair functionality:

Active Connections: Live data synchronization between source and Onyx
Status Monitoring: Track sync health and performance
Access Control: Manage who can see data from this connection
Configuration Management: Update sync settings and credentials

If you’re creating Connectors through the API, you must associate them with a Credential (CC-pair) to make them active!

ConnectorCredentialPairMetadata

ConnectorCredentialPairMetadata defines the configuration and access settings for a CC-pair.

Configuration options:

name: Optional display name for the CC-pair (overrides the Connector name)
access_type: Who can access data from this CC-pair (see AccessType accordion below)
auto_sync_options: Optional configuration for automatic synchronization settings
groups: List of group IDs that have access to this CC-pair

Python

class ConnectorCredentialPairMetadata(BaseModel):
  name: str | None = None
  access_type: AccessType
  auto_sync_options: dict[str, Any] | None = None
  groups: list[int] = Field(default_factory=list)
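
A small sketch of building this metadata as a dictionary, with a guard on access_type. The helper name build_cc_pair_metadata is hypothetical and not part of Onyx; the allowed values mirror the AccessType enum on this page.

```python
from typing import Optional

# Values of the AccessType enum.
ALLOWED_ACCESS_TYPES = {"public", "private", "sync"}


def build_cc_pair_metadata(
    access_type: str,
    name: Optional[str] = None,
    groups: Optional[list] = None,
    auto_sync_options: Optional[dict] = None,
) -> dict:
    """Return a metadata dict, rejecting unknown access types early."""
    if access_type not in ALLOWED_ACCESS_TYPES:
        raise ValueError(f"access_type must be one of {sorted(ALLOWED_ACCESS_TYPES)}")
    return {
        "name": name,
        "access_type": access_type,
        "auto_sync_options": auto_sync_options,
        "groups": list(groups or []),  # defaults to an empty list, like the model
    }


metadata = build_cc_pair_metadata("private", name="Engineering Wiki", groups=[3, 7])
print(metadata)
```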

Documents

DocumentBase

DocumentBase is a core structure used throughout Onyx for storing and managing document data. Note that the embeddings are stored in Vespa separately.

id: Unique identifier. Generated by Onyx if not provided
sections: List of content sections (see TextSection and ImageSection)
source: The system this document originated from (see DocumentSource)
semantic_identifier: Displayed in the UI as the name of the Document
metadata: Key-value pairs whose values are a string or a list of strings; saved as tags for this Document
doc_updated_at: UTC timestamp when the document was last updated
chunk_count: Number of chunks the document is split into for processing
primary_owners: Metadata about people associated with the Document
secondary_owners: Metadata about people associated with the Document
title: Used for search (defaults to semantic_identifier if not specified)
from_ingestion_api: Whether this document came from the Ingestion API
additional_info: Connector-specific information that other parts of the code may need
external_access: Permission sync data (Enterprise Edition only)

The Ingestion API extends the DocumentBase definition to include cc_pair_id to automatically associate a document with a CC-pair.

Python

class DocumentBase(BaseModel):
  """Used for Onyx ingestion api, the ID is inferred before use if not provided"""

  id: str | None = None
  sections: list[TextSection | ImageSection]
  source: DocumentSource | None = None
  semantic_identifier: str
  metadata: dict[str, str | list[str]]

  doc_updated_at: datetime | None = None
  chunk_count: int | None = None

  primary_owners: list[BasicExpertInfo] | None = None
  secondary_owners: list[BasicExpertInfo] | None = None
  title: str | None = None
  from_ingestion_api: bool = False
  additional_info: Any = None

  external_access: ExternalAccess | None = None
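
To make the field types concrete, here is a sketch of a document payload as a plain dictionary, plus a check that metadata values match the str | list[str] type above. The document contents, the validate_metadata helper, and the cc_pair_id value are hypothetical, and placing cc_pair_id alongside the document fields is an assumption; check the Ingestion API reference for the exact request shape.

```python
import json
from datetime import datetime, timezone


def validate_metadata(metadata: dict) -> None:
    """Raise ValueError unless every value is a str or a list of str."""
    for key, value in metadata.items():
        is_str = isinstance(value, str)
        is_str_list = isinstance(value, list) and all(isinstance(v, str) for v in value)
        if not (is_str or is_str_list):
            raise ValueError(f"metadata[{key!r}] must be a string or a list of strings")


document = {
    # "id" is omitted, so Onyx generates one.
    "sections": [
        {"text": "Welcome to the team.", "link": "https://wiki.example.com/onboarding"},
    ],
    "source": "ingestion_api",  # DocumentSource.INGESTION_API
    "semantic_identifier": "Onboarding Guide",
    "metadata": {"team": "platform", "topics": ["onboarding", "hr"]},
    "doc_updated_at": datetime(2024, 1, 15, tzinfo=timezone.utc).isoformat(),
    "from_ingestion_api": True,
}

validate_metadata(document["metadata"])

# Assumed placement of cc_pair_id; the ID itself is made up.
payload = {**document, "cc_pair_id": 1}
print(json.dumps(payload, indent=2))
```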

DocumentSource

DocumentSource is an enum that defines the valid sources for a document. Uploading files to the Ingestion API and creating Connectors programmatically require specifying a DocumentSource.

Python

class DocumentSource(str, Enum):
  INGESTION_API = "ingestion_api"     # Special case, document passed in via Onyx APIs without specifying a source type
  SLACK = "slack"
  WEB = "web"
  GOOGLE_DRIVE = "google_drive"
  GMAIL = "gmail"
  REQUESTTRACKER = "requesttracker"
  GITHUB = "github"
  GITBOOK = "gitbook"
  GITLAB = "gitlab"
  GURU = "guru"
  BOOKSTACK = "bookstack"
  CONFLUENCE = "confluence"
  JIRA = "jira"
  SLAB = "slab"
  PRODUCTBOARD = "productboard"
  FILE = "file"
  NOTION = "notion"
  ZULIP = "zulip"
  LINEAR = "linear"
  HUBSPOT = "hubspot"
  DOCUMENT360 = "document360"
  GONG = "gong"
  GOOGLE_SITES = "google_sites"
  ZENDESK = "zendesk"
  LOOPIO = "loopio"
  DROPBOX = "dropbox"
  SHAREPOINT = "sharepoint"
  TEAMS = "teams"
  SALESFORCE = "salesforce"
  DISCOURSE = "discourse"
  AXERO = "axero"
  CLICKUP = "clickup"
  MEDIAWIKI = "mediawiki"
  WIKIPEDIA = "wikipedia"
  ASANA = "asana"
  S3 = "s3"
  R2 = "r2"
  GOOGLE_CLOUD_STORAGE = "google_cloud_storage"
  OCI_STORAGE = "oci_storage"
  XENFORO = "xenforo"
  NOT_APPLICABLE = "not_applicable"
  DISCORD = "discord"
  FRESHDESK = "freshdesk"
  FIREFLIES = "fireflies"
  EGNYTE = "egnyte"
  AIRTABLE = "airtable"
  HIGHSPOT = "highspot"

  IMAP = "imap"

  # Special case just for integration tests
  MOCK_CONNECTOR = "mock_connector"

TextSection

TextSection is a portion of a Document in Onyx.

text: The actual text content of the section
link: Optional URL that this text section relates to or was sourced from

Python

class TextSection(Section):
  text: str
  link: str | None = None

ImageSection

ImageSection is an image extracted from a Document in Onyx.

image_file_id: UUID of the image file stored in Onyx’s file store
text: Optional text description or caption for the image
link: Optional URL that this image section relates to or was sourced from

Python

class ImageSection(Section):
  image_file_id: str
  text: str | None = None
  link: str | None = None

AccessType

AccessType defines who can access data from a Connector in Onyx.

PUBLIC: All Onyx users may access data from this Connector
PRIVATE: Only the user who created the Connector and specified Groups may access data from this Connector
SYNC: Only Connectors with permission-sync support can be set to SYNC. The Connector will sync access permissions with the source system.

Python

class AccessType(str, Enum):
  PUBLIC = "public"
  PRIVATE = "private"
  SYNC = "sync"

InputType

InputType defines how a Connector retrieves data from its source system.

LOAD_STATE: Single load of data from the source
POLL: Continuous polling for new data from the source (starts with a full load)
EVENT: Not implemented for most Connectors
SLIM_RETRIEVAL: For permission-syncing Connectors

Python

class InputType(str, Enum):
  LOAD_STATE = "load_state"
  POLL = "poll"
  EVENT = "event"
  SLIM_RETRIEVAL = "slim_retrieval"

Next Steps

Guide: Index files with the Ingestion API

Guide: Send a Chat Message
