How it works

The Web Connector scrapes sites based on a base URL.
  • It only indexes pages from the same domain that share the same base path.
  • It indexes pages reachable via hyperlinks from the base URL.
  • Text content is cleaned up using heuristics, and metadata such as the page title is extracted.
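
The scoping and link-following rules above can be illustrated with a minimal sketch. This is not the connector's actual implementation: the function names `in_scope` and `crawl`, the `get_links` callback, and the in-memory `SITE` map are all hypothetical, and the base-path check is assumed to be a simple prefix match.

```python
from collections import deque
from urllib.parse import urljoin, urlparse


def in_scope(base_url: str, candidate: str) -> bool:
    """Return True if candidate is on the base URL's domain and under its path.

    Assumption: "same base path" is treated as a plain string-prefix match.
    """
    base, cand = urlparse(base_url), urlparse(candidate)
    if cand.netloc.lower() != base.netloc.lower():
        return False
    return (cand.path or "/").startswith(base.path or "/")


def crawl(base_url, get_links):
    """Breadth-first walk over in-scope pages reachable from base_url.

    get_links is a hypothetical callable mapping a URL to the hrefs found on
    that page; a real crawler would fetch and parse the HTML here.
    """
    seen, queue, order = {base_url}, deque([base_url]), []
    while queue:
        url = queue.popleft()
        order.append(url)
        for href in get_links(url):
            # Resolve relative links against the current page, drop fragments.
            link = urljoin(url, href).split("#", 1)[0]
            if link not in seen and in_scope(base_url, link):
                seen.add(link)
                queue.append(link)
    return order


# Hypothetical in-memory site standing in for real HTTP requests.
SITE = {
    "https://docs.example.com/guide/": [
        "intro",                              # relative link, in scope
        "/guide/setup",                       # same domain and base path
        "/blog/news",                         # same domain, different path
        "https://other.example.com/guide/x",  # different domain
    ],
    "https://docs.example.com/guide/intro": ["/guide/setup", "#top"],
    "https://docs.example.com/guide/setup": [],
}

pages = crawl("https://docs.example.com/guide/", lambda u: SITE.get(u, []))
print(pages)
```

In this sketch, a base URL of `https://docs.example.com/guide/` picks up the `intro` and `setup` pages but skips `/blog/news` and anything on another domain. Choosing the base URL therefore determines how much of a site gets indexed.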

Setting up

Authorization

  • As long as the page is reachable, no additional authorization is necessary.

Indexing

1

Open Web Connector

Navigate to the Admin Panel and select the Web Connector.

2

Enter base URL and index

Input the base URL to index and click Index.

[Figure: Onyx Web connector form to enter base URL for indexing]

To see the status of the indexing, visit the Connectors Status page (top left).

[Figure: Onyx Connectors Status page showing web crawl progress]