module aixplain.processes.data_onboarding.process_interval_files
function compress_folder
compress_folder(folder_path: str)
function process_interval
process_interval(
interval: Any,
storage_type: StorageType,
interval_folder: str
) → List[Dict]
Process text files
Args:
interval (Any): content intervals to process
storage_type (StorageType): type of storage: URL, local path or textual content
Returns:
List[Dict]: content interval
function validate_format
validate_format(
index: int,
interval: Dict,
metadata: MetaData
) → ContentInterval
Validate the interval format
Args:
index (int): row index
interval (Dict): interval to be validated
metadata (MetaData): metadata
Returns:
ContentInterval: the validated content interval
function run
run(
metadata: MetaData,
paths: List,
folder: Path,
batch_size: int = 1000
) → Tuple[List[File], int, int]
Process a list of local interval files, then compress and upload them to pre-signed URLs in S3.
Explanation: Each interval file in "paths" is processed. If the interval content is a public link or a local file, it is downloaded and added to an index CSV file. The intervals are processed in batches: every "batch_size" texts, the index CSV file is uploaded to a pre-signed URL in S3 and then reset.
Args:
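metadata (MetaData): metadata of the asset
paths (List): list of paths to local files
folder (Path): local folder where compressed files are saved before being uploaded to S3
batch_size (int): number of texts written to the index CSV file before it is uploaded and reset; defaults to 1000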
Returns:
Tuple[List[File], int, int]: list of S3 links, data column index and number of rows
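The batching behaviour described for run can be illustrated with a minimal sketch. This is not the aiXplain implementation: write_index_batches, the upload callback and the index-N.csv file naming are hypothetical names introduced here for illustration, and the compression step is left out. It only shows the idea of filling an index CSV file, handing it off every batch_size rows, and starting over.

```python
import csv
from pathlib import Path
from typing import Callable, Dict, Iterable, List


def write_index_batches(
    rows: Iterable[Dict[str, str]],
    folder: Path,
    upload: Callable[[Path], str],
    batch_size: int = 1000,
) -> List[str]:
    """Write rows to index CSV files of at most batch_size rows each.

    `upload` is a hypothetical callback standing in for the upload to a
    pre-signed URL; it receives the CSV path and returns a link.
    """
    links: List[str] = []
    batch: List[Dict[str, str]] = []

    def flush() -> None:
        # Write the current batch to its own CSV, upload it, then reset.
        if not batch:
            return
        index_path = folder / f"index-{len(links)}.csv"
        with index_path.open("w", newline="", encoding="utf-8") as f:
            writer = csv.DictWriter(f, fieldnames=list(batch[0].keys()))
            writer.writeheader()
            writer.writerows(batch)
        links.append(upload(index_path))
        batch.clear()

    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            flush()
    flush()  # hand off the final, possibly partial, batch
    return links
```

Called with five rows and batch_size=2, the sketch invokes the upload callback three times (two full batches and one partial batch). Passing the upload step as a callback keeps the batching logic testable without network access.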