module aixplain.processes.data_onboarding.process_interval_files


function compress_folder

compress_folder(folder_path: str)

function process_interval

process_interval(
interval: Any,
storage_type: StorageType,
interval_folder: str
) → List[Dict]

Process text files

Args:

  • interval (Any): content interval to process
  • storage_type (StorageType): type of storage: URL, local path or textual content

Returns:

  • List[Dict]: list of processed content intervals

function validate_format

validate_format(
index: int,
interval: Dict,
metadata: MetaData
) → ContentInterval

Validate the interval format

Args:

  • index (int): row index
  • interval (Dict): interval to be validated
  • metadata (MetaData): metadata

Returns:

  • ContentInterval: the validated content interval

function run

run(
metadata: MetaData,
paths: List,
folder: Path,
batch_size: int = 1000
) → Tuple[List[File], int, int]

Process a list of local interval files, compress them, and upload them to pre-signed URLs in S3.

Explanation: Each interval file in "paths" is processed. If an interval's content is referenced by a public link or a local file, it is downloaded and added to an index CSV file. Intervals are processed in batches: after every "batch_size" entries, the index CSV file is uploaded to a pre-signed URL in S3 and reset.

Args:

  • metadata (MetaData): metadata of the asset
  • paths (List): list of paths to local files
  • folder (Path): local folder where compressed files are saved before they are uploaded to S3
  • batch_size (int, optional): number of entries per batch before the index CSV file is uploaded and reset. Defaults to 1000.

Returns:

  • Tuple[List[File], int, int]: list of S3 links, data column index, and number of rows
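
The batching behaviour described in the explanation of run can be illustrated with a minimal sketch. This is not the aiXplain implementation: the function name batch_to_index_files, the row fields, the index file naming, and the upload callable are all hypothetical stand-ins. The real function uploads to pre-signed S3 URLs, while this sketch only writes local CSV files and returns their paths.

```python
import csv
from pathlib import Path
from typing import Callable, Dict, Iterable, List


def batch_to_index_files(
    rows: Iterable[Dict[str, str]],
    folder: Path,
    batch_size: int = 1000,
    # Hypothetical uploader: here it just returns the local path as the "link".
    upload: Callable[[Path], str] = lambda p: str(p),
) -> List[str]:
    """Write rows into index CSV files holding at most batch_size rows each."""
    links: List[str] = []
    batch: List[Dict[str, str]] = []

    def flush() -> None:
        # Write the pending rows to a new index file, "upload" it, then reset.
        if not batch:
            return
        index_path = folder / f"index-{len(links)}.csv"
        with open(index_path, "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=list(batch[0].keys()))
            writer.writeheader()
            writer.writerows(batch)
        links.append(upload(index_path))
        batch.clear()

    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            flush()
    flush()  # upload the final, possibly partial, batch
    return links
```

Called with five rows and batch_size=2, the sketch produces three index files: two full batches and one final partial batch. Flushing once more after the loop is what keeps a trailing partial batch from being dropped.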