Datasets overview

Datasets store and index documents for AI-powered semantic search and knowledge retrieval in Auxx.ai.

A dataset is a collection of documents that Auxx.ai processes and indexes for AI-powered search. Datasets power the Retrieval-Augmented Generation (RAG) pipeline: when an AI workflow needs context, it queries datasets to find relevant information from your uploaded documents.

Access datasets from Resources > Datasets in the sidebar.

[Figure: Datasets overview page showing stats cards, status filter, and dataset grid]

How datasets work

  1. You create a dataset and upload documents (PDFs, text files, Markdown, etc.)
  2. Auxx.ai extracts the text content from each document
  3. The text is split into smaller segments (chunks) for better search accuracy
  4. Each segment is converted into a vector embedding using an AI model
  5. When a workflow or search query runs, Auxx.ai finds the most relevant segments and returns them as context
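
The five steps above can be sketched in a few lines of Python. This is an illustrative toy, not Auxx.ai's implementation: the function names are hypothetical, segments are split by a fixed word count, and a bag-of-words count vector stands in for the AI embedding model, which in practice would return a dense vector.

```python
from __future__ import annotations

import math
import re
from collections import Counter


def split_into_segments(text: str, max_words: int = 40) -> list[str]:
    """Step 3: split text into segments of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text: str) -> Counter:
    """Step 4 (toy stand-in): represent text as word counts, not a real embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_index(documents: dict[str, str], max_words: int = 40) -> list[tuple[str, str, Counter]]:
    """Steps 2-4: take each document's text, split it, and embed every segment."""
    index = []
    for name, text in documents.items():
        for segment in split_into_segments(text, max_words):
            index.append((name, segment, embed(segment)))
    return index


def search(index: list[tuple[str, str, Counter]], query: str, top_k: int = 3) -> list[tuple[float, str, str]]:
    """Step 5: return the top_k segments most similar to the query."""
    query_vec = embed(query)
    scored = [(cosine_similarity(query_vec, vec), name, segment) for name, segment, vec in index]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:top_k]


# Step 1: the "uploaded" documents, already reduced to plain text.
documents = {
    "refunds.md": "Refunds are issued within five business days of an approved return request.",
    "shipping.txt": "Standard shipping takes three to seven days. Express shipping arrives overnight.",
}
index = build_index(documents, max_words=8)
results = search(index, "how long do refunds take", top_k=1)
print(results[0][1])  # refunds.md
```

Smaller segments make each match more specific, at the cost of splitting sentences across segment boundaries; the Settings tab described later is where the real chunking and embedding options are configured.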

Overview page

The datasets overview page shows all your datasets with summary statistics at the top:

Stat               Description
Total Datasets     Number of datasets, with active/processing breakdown
Total Documents    Combined document count across all datasets
Storage Used       Total storage consumed by all documents
Processing Issues  Number of documents with processing errors

Each dataset card displays:

  • Dataset name and status badge (active, inactive, processing, error)
  • Document count
  • Last updated time
  • Total size
  • Creator name

Use the Status dropdown to filter by dataset status and the Search bar to find datasets by name.
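
As a rough model of how the two controls combine, the sketch below applies a status filter and a name search to a list of dataset cards. The `DatasetCard` class and `filter_datasets` function are hypothetical, and treating the search as a case-insensitive substring match is an assumption; the actual matching rules in Auxx.ai may differ.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class DatasetCard:
    """Hypothetical record holding a few of the fields shown on a dataset card."""
    name: str
    status: str  # "active", "inactive", "processing", or "error"
    document_count: int


def filter_datasets(
    datasets: list[DatasetCard],
    status: Optional[str] = None,
    name_query: str = "",
) -> list[DatasetCard]:
    """Keep datasets that match the status (if given) and whose name contains the query."""
    needle = name_query.strip().lower()  # assumption: case-insensitive substring search
    return [
        card
        for card in datasets
        if (status is None or card.status == status) and needle in card.name.lower()
    ]


cards = [
    DatasetCard("Support Articles", "active", 120),
    DatasetCard("Product Manuals", "processing", 45),
    DatasetCard("Archived Support Tickets", "inactive", 900),
]
active_support = filter_datasets(cards, status="active", name_query="support")
print([card.name for card in active_support])  # ['Support Articles']
```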

Dataset statuses

Status      Description
Active      Dataset is available for queries and searches
Inactive    Dataset exists but is excluded from searches
Processing  Documents are being indexed
Error       One or more documents failed to process

Dataset detail page

Click a dataset to open its detail page. The header shows key details: status, document count, storage used, creation date, and the configured embedding model.

[Figure: Dataset detail page showing Documents tab with status, segments, and document list]

The detail page has three tabs:

Tab        Purpose
Documents  Upload, view, and manage documents in the dataset
Search     Test search queries against the dataset
Settings   Configure chunking, embedding, and search options

Supported document types

Type        Extensions
PDF         .pdf
Word        .docx
Plain text  .txt
HTML        .html
Markdown    .md
CSV         .csv
JSON        .json
XML         .xml
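
If you prepare files with a script before uploading them, a small pre-check against the table above can catch unsupported files early. This is a minimal sketch: the `SUPPORTED_EXTENSIONS` mapping simply mirrors the table, `document_type` is a hypothetical helper rather than part of any Auxx.ai API, and matching extensions case-insensitively is an assumption.

```python
from __future__ import annotations

from pathlib import Path
from typing import Optional

# Mirrors the supported document types table.
SUPPORTED_EXTENSIONS = {
    ".pdf": "PDF",
    ".docx": "Word",
    ".txt": "Plain text",
    ".html": "HTML",
    ".md": "Markdown",
    ".csv": "CSV",
    ".json": "JSON",
    ".xml": "XML",
}


def document_type(filename: str) -> Optional[str]:
    """Return the document type for a filename, or None if the extension is not listed."""
    # Assumption: extensions are compared case-insensitively.
    return SUPPORTED_EXTENSIONS.get(Path(filename).suffix.lower())


for name in ["handbook.PDF", "notes.md", "archive.tar.gz"]:
    print(name, "->", document_type(name) or "unsupported")
```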

Next steps