Version: 1.0

aixplain.modules.data

__author__

Copyright 2022 The aiXplain SDK authors

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Author: aiXplain team
Date: March 20th 2023
Description: Data Class

Data Objects

class Data()

A class representing a collection of data samples of the same type and genre.

This class provides functionality for managing data in the aiXplain platform, supporting various data types, languages, and storage formats. It can handle both structured (e.g., CSV) and unstructured data files.

Attributes:

  • id Text - ID of the data collection.
  • name Text - Name of the data collection.
  • dtype DataType - Type of data (e.g., text, audio, image).
  • privacy Privacy - Privacy settings for the data.
  • onboard_status OnboardStatus - Current onboarding status.
  • data_column Optional[Any] - Column identifier where data is stored in structured files.
  • start_column Optional[Any] - Column identifier for start indexes in structured files.
  • end_column Optional[Any] - Column identifier for end indexes in structured files.
  • files List[File] - List of files containing the data instances.
  • languages List[Language] - List of languages present in the data.
  • dsubtype DataSubtype - Subtype categorization of the data.
  • length Optional[int] - Number of samples/rows in the data collection.
  • kwargs dict - Additional keyword arguments for extensibility.
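To make the three column attributes concrete, the following is a minimal sketch of how a structured file might be read with them. It uses only the Python standard library and does not call the aiXplain SDK. The CSV contents, the `read_samples` helper, and the reading of start/end indexes as character offsets are all assumptions made for illustration.

```python
import csv
import io

# Hypothetical structured file: one text segment per row, with offsets.
CSV_TEXT = """text,start,end
hello world,0,11
good morning,12,24
"""

# Column identifiers playing the roles of data_column, start_column, end_column.
data_column, start_column, end_column = "text", "start", "end"


def read_samples(csv_text, data_column, start_column=None, end_column=None):
    """Return (data, start, end) tuples; start/end are None if no column is given."""
    samples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        start = int(row[start_column]) if start_column is not None else None
        end = int(row[end_column]) if end_column is not None else None
        samples.append((row[data_column], start, end))
    return samples


samples = read_samples(CSV_TEXT, data_column, start_column, end_column)
length = len(samples)  # plays the role of the length attribute (rows in the collection)
```

When `start_column` and `end_column` are left as None, the helper returns only the data values, which mirrors how both attributes are optional on `Data`.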

__init__

def __init__(id: Text,
             name: Text,
             dtype: DataType,
             privacy: Privacy,
             onboard_status: OnboardStatus,
             data_column: Optional[Any] = None,
             start_column: Optional[Any] = None,
             end_column: Optional[Any] = None,
             files: List[File] = [],
             languages: List[Language] = [],
             dsubtype: DataSubtype = DataSubtype.OTHER,
             length: Optional[int] = None,
             **kwargs) -> None

Initialize a new Data instance.

Arguments:

  • id Text - ID of the data collection.
  • name Text - Name of the data collection.
  • dtype DataType - Type of data (e.g., text, audio, image).
  • privacy Privacy - Privacy settings for the data.
  • onboard_status OnboardStatus - Current onboarding status of the data.
  • data_column Optional[Any], optional - Column identifier where data is stored in structured files (e.g., CSV). If None, defaults to the value of name.
  • start_column Optional[Any], optional - Column identifier where start indexes are stored in structured files. Defaults to None.
  • end_column Optional[Any], optional - Column identifier where end indexes are stored in structured files. Defaults to None.
  • files List[File], optional - List of files containing the data instances. Defaults to empty list.
  • languages List[Language], optional - List of languages present in the data. Can be provided as Language enums or language codes. Defaults to empty list.
  • dsubtype DataSubtype, optional - Subtype categorization of the data (e.g., age, topic, race, split). Defaults to DataSubtype.OTHER.
  • length Optional[int], optional - Number of samples/rows in the data collection. Defaults to None.
  • **kwargs - Additional keyword arguments for extensibility.
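Two of the defaults described above are easy to miss: `data_column` falls back to `name` when it is None, and `languages` accepts either enum members or language codes. The sketch below imitates that behavior with stand-in classes so it can run without the SDK or an API key. `DataSketch`, the two-member `Language` enum, and `to_language` are hypothetical names, not part of aiXplain, and the real `Language` enum may store its values differently.

```python
from enum import Enum
from typing import Any, List, Optional, Union


class Language(Enum):
    """Hypothetical stand-in for the SDK's Language enum."""
    English = "en"
    Spanish = "es"


def to_language(value: Union[Language, str]) -> Language:
    """Accept either an enum member or a language code string."""
    return value if isinstance(value, Language) else Language(value)


class DataSketch:
    """Stand-in that mirrors only the documented defaults of Data.__init__."""

    def __init__(self, id: str, name: str,
                 data_column: Optional[Any] = None,
                 languages: Optional[List[Union[Language, str]]] = None,
                 length: Optional[int] = None,
                 **kwargs) -> None:
        self.id = id
        self.name = name
        # Documented behavior: if data_column is None, use the value of name.
        self.data_column = name if data_column is None else data_column
        # Documented behavior: languages may be enums or language codes.
        self.languages = [to_language(lang) for lang in (languages or [])]
        self.length = length
        self.kwargs = kwargs  # extra keyword arguments kept for extensibility


d = DataSketch("123", "transcripts", languages=["en", Language.Spanish], source="demo")
print(d.data_column)  # transcripts
```

The stand-in uses `None` instead of `[]` as the default for `languages`, which is a choice made for this sketch only; the documented signature of `Data.__init__` is unchanged.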