Multilingual Setup
How to set up Onyx to support languages other than English
For the majority of use cases, Onyx provides an easy way to configure the system for multilingual use. To skip the details and just go to setup, see the Configuration section below.
How it works
Onyx has two main flows that assume English as a default. To use Onyx in multilingual or non-english cases, these must be handled.
- The vector portion of the hybrid-search/reranking. This selects relevant documents to pass to the Large Language Model as context
- The prompts for the Large Language Model to instruct it on how to answer user queries.
To address the first point, the English first embedding/reranking models can be swapped out for multilingual models. Additionally query expansion is applied to rephrase the user query into the target languages.
To address the second point, additional instructions are given to the LLM to respect the input language and to respond in the same language as the user query.
Note: The Onyx LLM prompts are still in English, just with added instructions if the user configures multilingual settings (see below). If all of your documents and queries are in some other language, it will be better to simply translate all of Onyx’s prompts directly to that target language. You can do so here.
Configuration
Unless you are translating the LLM prompts yourself, the configuration for multilingual can be done entirely via
environment variables. For Docker Compose, simply make an .env
file in onyx/deployment/docker_compose
and
populate it with the desired values before deploying.
Update the values below to suit your particular needs, this example shows using Onyx with both French and English. The settings below have descriptions to help you configure values according to your use case.
An up to date template of multilingual settings is also provided directly in the code repository in the same
directory as where the .env
file should be placed.
See here for reference.
Example
If you’ve configured everything correctly, you should now be able to get good answers across the different languages that you’ve set the system up for. See below for an example using French.