aixplain.modules.data
__author__
Copyright 2022 The aiXplain SDK authors
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
Author: aiXplain team
Date: March 20th 2023
Description: Data Class
Data Objects
class Data()
A class representing a collection of data samples of the same type and genre.
This class provides functionality for managing data in the aiXplain platform, supporting various data types, languages, and storage formats. It can handle both structured (e.g., CSV) and unstructured data files.
Attributes:
id: Text - ID of the data collection.
name: Text - Name of the data collection.
dtype: DataType - Type of data (e.g., text, audio, image).
privacy: Privacy - Privacy settings for the data.
onboard_status: OnboardStatus - Current onboarding status.
data_column: Optional[Any] - Column identifier where data is stored in structured files.
start_column: Optional[Any] - Column identifier for start indexes in structured files.
end_column: Optional[Any] - Column identifier for end indexes in structured files.
files: List[File] - List of files containing the data instances.
languages: List[Language] - List of languages present in the data.
dsubtype: DataSubtype - Subtype categorization of the data.
length: Optional[int] - Number of samples/rows in the data collection.
kwargs: dict - Additional keyword arguments for extensibility.
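To make the three column attributes concrete, here is a minimal sketch of a structured (CSV) file in which one column holds the data and two others hold start and end indexes. The column names (text, start, end), the sample rows, and the helper read_columns are all hypothetical and are not part of the aiXplain SDK; the sketch only illustrates what a "column identifier" could refer to, using the standard library.

```python
import csv
import io

# Hypothetical CSV content; a real data collection would reference files instead.
CSV_TEXT = """text,start,end
hello world,0,11
good morning,12,24
"""


def read_columns(csv_text, data_column, start_column=None, end_column=None):
    """Return (value, start, end) tuples for each row of a CSV string.

    start and end are None when the matching column identifier is not given.
    This helper is illustrative only and does not exist in the SDK.
    """
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        start = int(row[start_column]) if start_column is not None else None
        end = int(row[end_column]) if end_column is not None else None
        rows.append((row[data_column], start, end))
    return rows


samples = read_columns(
    CSV_TEXT, data_column="text", start_column="start", end_column="end"
)
print(samples)
print(len(samples))  # the row count is what the length attribute describes
```

In this reading, data_column names the column that carries the samples, while start_column and end_column name optional companion columns; when they are left as None, no index information is read.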
__init__
def __init__(id: Text,
name: Text,
dtype: DataType,
privacy: Privacy,
onboard_status: OnboardStatus,
data_column: Optional[Any] = None,
start_column: Optional[Any] = None,
end_column: Optional[Any] = None,
files: List[File] = [],
languages: List[Language] = [],
dsubtype: DataSubtype = DataSubtype.OTHER,
length: Optional[int] = None,
**kwargs) -> None
Initialize a new Data instance.
Arguments:
id: Text - ID of the data collection.
name: Text - Name of the data collection.
dtype: DataType - Type of data (e.g., text, audio, image).
privacy: Privacy - Privacy settings for the data.
onboard_status: OnboardStatus - Current onboarding status of the data.
data_column: Optional[Any], optional - Column identifier where data is stored in structured files (e.g., CSV). If None, defaults to the value of name.
start_column: Optional[Any], optional - Column identifier where start indexes are stored in structured files. Defaults to None.
end_column: Optional[Any], optional - Column identifier where end indexes are stored in structured files. Defaults to None.
files: List[File], optional - List of files containing the data instances. Defaults to empty list.
languages: List[Language], optional - List of languages present in the data. Can be provided as Language enums or language codes. Defaults to empty list.
dsubtype: DataSubtype, optional - Subtype categorization of the data (e.g., age, topic, race, split). Defaults to DataSubtype.OTHER.
length: Optional[int], optional - Number of samples/rows in the data collection. Defaults to None.
**kwargs - Additional keyword arguments for extensibility.
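The defaulting rules described above can be sketched with a small stand-in class. This is not the SDK's Data class: DataSketch and the two-member Language enum below are hypothetical, the dtype, privacy, and onboard_status arguments are omitted for brevity, and converting language codes into enum members is an assumption about what "enums or language codes" means in practice.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, List, Optional


class Language(Enum):
    # Hypothetical stand-in; the SDK's Language enum is not reproduced here.
    English = "en"
    Spanish = "es"


@dataclass
class DataSketch:
    """Illustrative stand-in mirroring a subset of the documented arguments."""

    id: str
    name: str
    data_column: Optional[Any] = None
    languages: List[Any] = field(default_factory=list)
    length: Optional[int] = None

    def __post_init__(self):
        # Documented rule: if data_column is None, fall back to the value of name.
        if self.data_column is None:
            self.data_column = self.name
        # Assumed rule: accept either enum members or plain language codes.
        self.languages = [
            lang if isinstance(lang, Language) else Language(lang)
            for lang in self.languages
        ]


d = DataSketch(id="abc123", name="transcripts", languages=["en", Language.Spanish])
print(d.data_column)
print(d.languages)
```

The sketch uses default_factory rather than a literal [] default so that separate instances never share one list object. The signature above shows [] as the default for files and languages; how the SDK guards against shared state is not stated in this section.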