module aixplain.processes.data_onboarding.process_text_files
function process_text
process_text(content: str, storage_type: StorageType) → str
Process text files
Args:
content (str): URL with text, local path with text or textual content
storage_type (StorageType): type of storage: URL, local path or textual content
Returns:
Text: textual content
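The dispatch on storage type described above can be sketched as follows. This is a minimal illustration, not the library's implementation: `StorageKind` is a hypothetical stand-in for `StorageType`, `read_text_sketch` is a hypothetical name, and UTF-8 encoding is an assumption.

```python
from enum import Enum
from pathlib import Path
from urllib.request import urlopen


class StorageKind(Enum):
    """Hypothetical stand-in for the library's StorageType."""

    URL = "url"
    FILE = "file"
    TEXT = "text"


def read_text_sketch(content: str, storage_type: StorageKind) -> str:
    """Return the text behind `content`, depending on where it is stored."""
    if storage_type is StorageKind.URL:
        # Download the text from a public link (assumes a UTF-8 body).
        with urlopen(content) as response:
            return response.read().decode("utf-8")
    if storage_type is StorageKind.FILE:
        # Read the text from a local path (assumes a UTF-8 file).
        return Path(content).read_text(encoding="utf-8")
    # Otherwise `content` already is the text itself.
    return content
```

For example, `read_text_sketch("hello", StorageKind.TEXT)` returns the string unchanged, while a local path with `StorageKind.FILE` returns the contents of the file.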
function run
run(
metadata: MetaData,
paths: List,
folder: Path,
batch_size: int = 1000
) → Tuple[List[File], int, int]
Process a list of local text files, compress them, and upload them to pre-signed URLs in S3.
Explanation: Each text in "paths" is processed. If the text is at a public link or in a local file, it is downloaded and added to an index CSV file. The texts are processed in batches: after every "batch_size" texts, the index CSV file is uploaded to a pre-signed URL in S3 and reset.
Args:
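metadata (MetaData): metadata of the asset
paths (List): list of paths to local files
folder (Path): local folder to save compressed files before uploading them to S3
batch_size (int): number of texts per batch before the index CSV file is uploaded and reset; defaults to 1000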
Returns:
Tuple[List[File], int, int]: list of S3 links, data column index and number of rows
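The batching behaviour described for `run` can be illustrated with a small, self-contained sketch. It is not the library's code: `batch_index_sketch` and the `upload` callback are hypothetical, the single `data` column and the `index-N.csv.gz` file names are assumptions, and the upload step is replaced by a callback that by default just returns the local path instead of contacting S3.

```python
import csv
import gzip
from pathlib import Path
from typing import Callable, List, Tuple


def batch_index_sketch(
    texts: List[str],
    folder: Path,
    batch_size: int = 1000,
    upload: Callable[[Path], str] = str,  # hypothetical upload hook
) -> Tuple[List[str], int, int]:
    """Write texts to compressed index CSVs in batches of `batch_size`."""
    folder.mkdir(parents=True, exist_ok=True)
    links: List[str] = []
    data_column_idx = 0  # assumption: the text sits in the first CSV column
    n_rows = 0
    batch: List[str] = []

    def flush() -> None:
        # Write the current batch to a gzip-compressed CSV, hand it to
        # the upload hook, and reset the batch.
        if not batch:
            return
        index_path = folder / f"index-{len(links)}.csv.gz"
        with gzip.open(index_path, "wt", newline="", encoding="utf-8") as f:
            writer = csv.writer(f)
            writer.writerow(["data"])
            writer.writerows([text] for text in batch)
        links.append(upload(index_path))
        batch.clear()

    for text in texts:
        batch.append(text)
        n_rows += 1
        if len(batch) == batch_size:
            flush()
    flush()  # the final batch may be smaller than batch_size
    return links, data_column_idx, n_rows
```

With five texts and `batch_size=2`, this sketch produces three index files (two full batches and one partial) and reports five rows. Passing a real uploader as `upload` would be the place to send each file to a pre-signed URL.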