Guide

You can configure Onyx to use models served by Ollama.

1. Set Up Ollama and Deploy Your Models

The Ollama GitHub repository details how to download and deploy models on Ollama. By default, Ollama is configured to run on port 11434.
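
Before moving on, it can help to confirm that Ollama is reachable and to see which models it has pulled. The sketch below is one way to do that, assuming a local install at http://localhost:11434 and using Ollama's /api/tags endpoint, which lists locally available models. The helper names are illustrative and are not part of Onyx or Ollama.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # assumption: default local install


def parse_model_names(payload: dict) -> list[str]:
    """Extract model names from the JSON body returned by /api/tags."""
    return [model["name"] for model in payload.get("models", [])]


def list_ollama_models(base_url: str = OLLAMA_URL, timeout: float = 5.0) -> list[str]:
    """Ask a running Ollama server which models it has available locally."""
    url = f"{base_url.rstrip('/')}/api/tags"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_model_names(json.load(resp))
```

Calling `list_ollama_models()` against a running server returns the model names to enter later in the Model Configurations section; if the server is not reachable, `urllib` raises an `OSError` subclass such as `URLError`.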

2. Navigate to the AI Model Configuration Page

Access the Admin Panel from your user profile icon → Admin Panel → LLM

3. Configure Ollama

Select Add Custom LLM Provider from the available providers. Give your provider a Display Name. Enter your model’s Provider Name.
The Provider Name must match LiteLLM’s list of supported providers.
In this example, the provider name is vertex_ai.

4. Configure Optional Fields and Models

Enter the Ollama Base URL. It should look something like http://localhost:11434/v1.
Do not forget the /v1 suffix in the URL!
Fill out the other optional fields if applicable. In the Model Configurations section, enter each of the models you want to use with Ollama.
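
Because a missing /v1 suffix is an easy mistake, a small check like the following can be used to sanity-test the value before pasting it into the Base URL field. This is only a sketch: both helper names are hypothetical, and it assumes the base URL points at Ollama's OpenAI-compatible API, where chat requests go to /v1/chat/completions.

```python
def ensure_v1_suffix(base_url: str) -> str:
    """Return base_url trimmed of whitespace and trailing slashes, ending in /v1."""
    url = base_url.strip().rstrip("/")
    if not url.endswith("/v1"):
        url += "/v1"
    return url


def chat_completions_url(base_url: str) -> str:
    """Build the OpenAI-compatible chat endpoint for a given base URL."""
    return f"{ensure_v1_suffix(base_url)}/chat/completions"
```

For example, `ensure_v1_suffix("http://localhost:11434")` yields `http://localhost:11434/v1`, the form expected in the Base URL field.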

5. Configure Default and Fast Models

The Default Model is selected automatically for new custom Agents and Chat sessions. Designating a Fast Model is optional. The Fast Model is used behind the scenes for quick operations such as classifying the type of message, generating alternative queries (query expansion), and naming the chat session.
If you select a Fast Model, make sure it is a relatively quick and cost-effective model like GPT-4.1-mini or Claude 3.7 Sonnet.

6. Designate Provider Access

Lastly, you may select whether the provider is public to all users in Onyx. If set to private, the provider’s models will be available only to Admins and the User Groups you explicitly assign the provider to.