Backend APIs
Ingestion API
Ingest Arbitrary Documents
Typical uses for the Ingestion API
A set of backend APIs is provided to take in and process arbitrary document data sent directly to the backend API server. These APIs are generally used for:
- Creating arbitrary documents that may not exist in any actual sources but contain useful information.
- Programmatically passing in documents to Onyx. This is sometimes simpler than creating a connector.
- Editing specific documents in Onyx when the Onyx admin either does not want to or does not have permission to update the original document in the source.
- Supplementing existing connector functionality. For example, passing in README file contents and attributing them to the GitHub or GitLab source type.
Example Document Ingestion
This example creates a new Document in Onyx of the “Web” type. This document will now show up in Onyx’s search flows like any other webpage pulled in by a Web connector.
Note: The Bearer auth token is generated on server startup in Onyx MIT. There is better API Key support as part of Onyx EE.
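A minimal sketch of such a request in Python, using only the standard library. The host localhost:8080, the path /onyx-api/ingestion, the request envelope with a top-level "document" key, and every field value shown are assumptions for illustration and should be checked against your deployment. The token is a placeholder.

```python
import json
import urllib.request

# Assumed values: adjust host, path, and token to match your deployment.
ONYX_API_URL = "http://localhost:8080/onyx-api/ingestion"  # assumed endpoint
API_TOKEN = "YOUR_BEARER_TOKEN"  # placeholder, not a real token


def build_ingestion_request(document, cc_pair_id=None, url=ONYX_API_URL, token=API_TOKEN):
    """Build (but do not send) a POST request carrying one document."""
    payload = {"document": document}
    if cc_pair_id is not None:
        payload["cc_pair_id"] = cc_pair_id
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


# Illustrative document of the "Web" type; all values are made up.
example_document = {
    "id": "ingestion_document_1",
    "sections": [
        {"text": "Contents of the first section.", "link": "https://example.com/page#first"},
        {"text": "Contents of the second section.", "link": "https://example.com/page#second"},
    ],
    "source": "web",
    "semantic_identifier": "Ingestion Example",
    "metadata": {"tag": "informational", "topics": ["onyx", "api"]},
    "doc_updated_at": "2024-04-25T08:20:00Z",
}

request = build_ingestion_request(example_document, cc_pair_id=1)

# Sending requires a running Onyx backend, so it is left commented out:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

Building the request separately from sending it makes the payload easy to inspect or log before anything reaches the server.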
See below for a breakdown of the different fields provided:
- id: The unique ID of the document. If a document with this ID already exists, it is updated/replaced. If not provided, a document ID is generated from the semantic_identifier field instead and returned in the response.
- sections: A list of sections, each containing textual content and an optional link. Document chunking tries to avoid splitting within a section and favors splitting at section borders. The link shown for the document at query time is the link of the best-matched section.
- source: The source type. The full list can be checked by searching for DocumentSource in the Onyx codebase.
- semantic_identifier: The “Title” of the document as shown in the UI (see image below).
- metadata: Used for the “Tags” feature displayed in the UI. The values can be either strings or lists of strings.
- doc_updated_at: The time at which the document was last considered updated. By default, a time-based score decay is applied around this value when the document is considered during search.
- cc_pair_id: The “Connector” ID seen on the Connector Status pages. For example, when running locally, the page might be http://localhost:3000/admin/connector/2. This allows attaching the ingested document to an existing connector so that it can be assigned to groups or deleted together with the connector. If not provided or set to 1 explicitly, the document is considered part of the default catch-all connector.
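To illustrate how these fields fit together, here is a hedged sketch that turns a Markdown README into a document with one section per heading, so that chunking can split at section borders. The helper name readme_to_document, the source value "github", the metadata keys, and the ISO 8601 timestamp format are assumptions for illustration, not confirmed parts of the API.

```python
import re
from datetime import datetime, timezone


def readme_to_document(readme_text, title, base_link, updated_at):
    """Split a Markdown README at its headings and build a document dict.

    Hypothetical helper: no "id" is set, so the server is expected to
    derive one from semantic_identifier, as described above.
    """
    sections = []
    # Split before every line that starts with 1-6 '#' characters and a space.
    # Note: '#' comment lines inside fenced code would also trigger a split.
    for block in re.split(r"^(?=#{1,6} )", readme_text, flags=re.MULTILINE):
        block = block.strip()
        if block:
            sections.append({"text": block, "link": base_link})
    return {
        "sections": sections,
        "source": "github",  # assumed DocumentSource value
        "semantic_identifier": title,
        # Metadata values may be strings or lists of strings.
        "metadata": {"file": "README.md", "topics": ["docs", "readme"]},
        # Assumed format: ISO 8601 in UTC.
        "doc_updated_at": updated_at.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    }


readme = "# Project\nShort intro.\n\n## Install\nRun the installer.\n"
doc = readme_to_document(
    readme,
    title="Project README",
    base_link="https://example.com/repo/README.md",
    updated_at=datetime(2024, 4, 25, 8, 20, tzinfo=timezone.utc),
)
```

Passing a timezone-aware datetime avoids ambiguity, since astimezone treats a naive value as local time.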
For more detail, see the code for the relevant object, called “DocumentBase”, in the Onyx codebase.
Checking Ingestion Documents
An API is also provided to fetch all of the documents that have been indexed via the Ingestion API.
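A sketch of what such a fetch request might look like in Python. The host, the path /onyx-api/ingestion, and the use of a GET request with the same Bearer token are assumptions here, and the shape of the response is not shown because it is not specified above.

```python
import json
import urllib.request

# Assumed values: adjust host, path, and token to match your deployment.
ONYX_API_URL = "http://localhost:8080/onyx-api/ingestion"  # assumed endpoint
API_TOKEN = "YOUR_BEARER_TOKEN"  # placeholder, not a real token


def build_list_request(url=ONYX_API_URL, token=API_TOKEN):
    """Build (but do not send) a GET request for the ingested documents."""
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


list_request = build_list_request()

# Sending requires a running Onyx backend, so it is left commented out:
# with urllib.request.urlopen(list_request) as response:
#     documents = json.load(response)
```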