At aiXplain, we currently categorize data assets into two types: Corpora and Datasets.

We plan to rework the aiXplain data asset types soon, introduce new ones, and add support for external data sources.


A corpus is a flexible, context-rich data collection designed for general and exploratory data analysis use cases. aiXplain provides an extensive collection of general-purpose corpora that can be explored, processed, and utilized to create task-specific datasets.


In contrast, a dataset is a collection of data with specified inputs and outputs, tailored to a specific ML task such as Speech Recognition, Machine Translation, or Sentiment Analysis. Datasets are designed to address particular research questions or applications that require fine-tuning or benchmarking an ML model.
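
To make the distinction concrete, here is a minimal Python sketch of the two asset types and of deriving a task-specific dataset from a corpus. It is an illustration only and does not use the aiXplain SDK: the `Corpus` and `Dataset` classes, the `derive_dataset` helper, the column names, and the sample records are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Corpus:
    """Hypothetical stand-in for a corpus: records with no fixed input/output roles."""
    name: str
    records: list[dict] = field(default_factory=list)


@dataclass
class Dataset:
    """Hypothetical stand-in for a dataset: explicit inputs and outputs for one task."""
    name: str
    task: str
    input_columns: list[str]
    output_columns: list[str]
    examples: list[dict] = field(default_factory=list)


def derive_dataset(corpus, task, input_columns, output_columns):
    """Keep only the records that contain every required column for the task."""
    required = input_columns + output_columns
    examples = [
        {col: rec[col] for col in required}
        for rec in corpus.records
        if all(col in rec for col in required)
    ]
    return Dataset(f"{corpus.name}-{task}", task, input_columns, output_columns, examples)


# Made-up sample data: the second record has no transcript, so it cannot
# serve as a speech recognition example.
corpus = Corpus(
    name="support-calls",
    records=[
        {"audio": "call_001.wav", "transcript": "hello", "speaker": "agent"},
        {"audio": "call_002.wav", "speaker": "customer"},
    ],
)
asr = derive_dataset(corpus, "speech-recognition", ["audio"], ["transcript"])
print(asr.name, len(asr.examples))
```

The corpus keeps every record and every field, which suits exploration. The derived dataset keeps only the fields that play an input or output role for the chosen task, and only the records that have all of them.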