For the majority of use cases, Onyx provides an easy way to configure the system for multilingual use. To skip the details and just go to setup, see the Configuration section below.

How it works

Onyx has two main flows that assume English as a default. To use Onyx in multilingual or non-english cases, these must be handled.

  • The vector portion of the hybrid-search/reranking. This selects relevant documents to pass to the Large Language Model as context
  • The prompts for the Large Language Model to instruct it on how to answer user queries.

To address the first point, the English first embedding/reranking models can be swapped out for multilingual models. Additionally query expansion is applied to rephrase the user query into the target languages.

To address the second point, additional instructions are given to the LLM to respect the input language and to respond in the same language as the user query.

Note: The Onyx LLM prompts are still in English, just with added instructions if the user configures multilingual settings (see below). If all of your documents and queries are in some other language, it will be better to simply translate all of Onyx’s prompts directly to that target language. You can do so here.

Configuration

Unless you are translating the LLM prompts yourself, the configuration for multilingual can be done entirely via environment variables. For Docker Compose, simply make an .env file in onyx/deployment/docker_compose and populate it with the desired values before deploying.

Update the values below to suit your particular needs, this example shows using Onyx with both French and English. The settings below have descriptions to help you configure values according to your use case.

# Rephrase the user query in specified languages using LLM, use comma separated values
MULTILINGUAL_QUERY_EXPANSION="English, French"

# A recent MIT license multilingual model: https://huggingface.co/intfloat/multilingual-e5-small
DOCUMENT_ENCODER_MODEL="intfloat/multilingual-e5-small"

# The model above is trained with the following prefix for queries and passages to improve retrieval
# by letting the model know which of the two type is currently being embedded
ASYM_QUERY_PREFIX="query: "
ASYM_PASSAGE_PREFIX="passage: "

# Depends model by model, the one shown above is tuned with this as True
NORMALIZE_EMBEDDINGS="True"

# Use LLM to determine if chunks are relevant to the query
# May not work well for languages that do not have much training data in the LLM training set
# If using a common language like Spanish, French, Chinese, etc. this can be kept turned on
DISABLE_LLM_CHUNK_FILTER="True"

# The default reranking models are English first
# There are no great quality French/English reranking models currently so turning this off
ENABLE_RERANKING_ASYNC_FLOW="False"
ENABLE_RERANKING_REAL_TIME_FLOW="False"

# Enables fine-grained embeddings for better retrieval
# At the cost of indexing speed (~5x slower), query time is same speed
# Since reranking is turned off and multilingual retrieval is generally harder
# it is advised to turn this one on
ENABLE_MINI_CHUNK="True"

# Using a stronger LLM will help with multilingual tasks
# Since documents may be in multiple languages, and there are additional instructions to respond
# in the user query's language, it is advised to use the best model possible
GEN_AI_MODEL_VERSION="gpt-4"

An up to date template of multilingual settings is also provided directly in the code repository in the same directory as where the .env file should be placed. See here for reference.

Example

If you’ve configured everything correctly, you should now be able to get good answers across the different languages that you’ve set the system up for. See below for an example using French.