Version: 1.0

aixplain.utils.file_utils

Copyright 2022 The aiXplain SDK authors

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

save_file

def save_file(
    download_url: Text,
    download_file_path: Optional[Union[str, Path]] = None
) -> Union[str, Path]

Download and save a file from a given URL.

This function downloads a file from the specified URL and saves it either to a specified path or to a generated path in the 'aiXplain' directory.

Arguments:

  • download_url Text - URL of the file to download.
  • download_file_path Optional[Union[str, Path]], optional - Path where the downloaded file should be saved. If None, the function creates an 'aiXplain' folder in the current working directory and saves the file there under a UUID name. Defaults to None.

Returns:

Union[str, Path]: Path where the file was downloaded.

Notes:

If download_file_path is None, the file will be saved with a UUID name and the original file extension in the 'aiXplain' directory.
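
The default naming described in the note above could look roughly like the sketch below. It is an illustration only, not the SDK's implementation: the helper name `default_download_path`, its `base_dir` parameter, and the way the extension is read from the URL are all assumptions.

```python
import uuid
from pathlib import Path
from typing import Optional
from urllib.parse import urlparse


def default_download_path(download_url: str, base_dir: Optional[Path] = None) -> Path:
    """Hypothetical helper: build '<base_dir>/aiXplain/<uuid><extension>'."""
    # Assumption: the folder lives in the current working directory by default.
    base_dir = Path.cwd() if base_dir is None else Path(base_dir)
    folder = base_dir / "aiXplain"
    folder.mkdir(parents=True, exist_ok=True)

    # Read the extension from the URL path only, ignoring any query string.
    extension = Path(urlparse(download_url).path).suffix
    return folder / f"{uuid.uuid4()}{extension}"
```

Under these assumptions, a URL ending in `report.csv` yields a path such as `aiXplain/<uuid>.csv`, so repeated downloads do not overwrite each other.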

download_data

def download_data(url_link: str, local_filename: Optional[str] = None) -> str

Download a file from a URL with streaming support.

This function downloads a file from the specified URL using streaming to handle large files efficiently. The file is downloaded in chunks to minimize memory usage.

Arguments:

  • url_link str - URL of the file to download.
  • local_filename Optional[str], optional - Local path where the file should be saved. If None, uses the last part of the URL as the filename. Defaults to None.

Returns:

  • str - Path to the downloaded file.

Raises:

  • requests.exceptions.RequestException - If the download fails or the server returns an error status.
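
The chunked download described above can be sketched with the standard library. Note the swap: the Raises entry suggests the SDK uses `requests`, while this sketch uses `urllib.request` so that it runs without third-party packages, which means failures surface as `urllib.error.URLError` rather than `requests.exceptions.RequestException`. The names `filename_from_url`, `download_in_chunks`, and `chunk_size` are hypothetical.

```python
import urllib.request
from pathlib import PurePosixPath
from typing import Optional
from urllib.parse import urlparse


def filename_from_url(url_link: str) -> str:
    """Use the last path segment of the URL as the file name."""
    return PurePosixPath(urlparse(url_link).path).name


def download_in_chunks(
    url_link: str,
    local_filename: Optional[str] = None,
    chunk_size: int = 8192,  # assumed chunk size, not taken from the SDK
) -> str:
    """Stream a URL to disk so the whole file is never held in memory."""
    if local_filename is None:
        local_filename = filename_from_url(url_link)

    with urllib.request.urlopen(url_link) as response, open(local_filename, "wb") as out:
        while True:
            chunk = response.read(chunk_size)
            if not chunk:  # an empty read signals the end of the stream
                break
            out.write(chunk)
    return local_filename
```

Because only one chunk is in memory at a time, peak memory use stays near `chunk_size` regardless of the file size.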

upload_data

def upload_data(
    file_name: Union[Text, Path],
    tags: Optional[List[Text]] = None,
    license: Optional[License] = None,
    is_temp: bool = True,
    content_type: Text = "text/csv",
    content_encoding: Optional[Text] = None,
    nattempts: int = 2,
    return_download_link: bool = False
) -> str

Upload a file to S3 using pre-signed URLs with retry support.

This function handles file uploads to S3 by first obtaining a pre-signed URL from the aiXplain backend and then using it to upload the file. It supports both temporary and permanent storage with optional metadata like tags and license information.

Arguments:

  • file_name Union[Text, Path] - Local path of the file to upload.
  • tags Optional[List[Text]], optional - List of tags to associate with the file. Only used when is_temp is False. Defaults to None.
  • license Optional[License], optional - License to associate with the file. Only used when is_temp is False. Defaults to None.
  • is_temp bool, optional - Whether to upload as a temporary file. Temporary files have different handling and URL generation. Defaults to True.
  • content_type Text, optional - MIME type of the content being uploaded. Defaults to "text/csv".
  • content_encoding Optional[Text], optional - Content encoding of the file (e.g., 'gzip'). Defaults to None.
  • nattempts int, optional - Number of retry attempts for upload failures. Defaults to 2.
  • return_download_link bool, optional - If True, returns a direct download URL instead of the S3 path. Defaults to False.

Returns:

  • str - Either an S3 path (s3://bucket/key) or a download URL, depending on the return_download_link parameter.

Raises:

  • Exception - If the upload fails after all retry attempts.

Notes:

The function will automatically retry failed uploads up to nattempts times before raising an exception.
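
The retry behaviour in the note above can be illustrated with a generic wrapper. This is a sketch under assumptions, not the SDK's code: `with_retries`, `action`, and `delay_seconds` are hypothetical names, and the sketch treats `nattempts` as the total number of tries, which the documentation does not state explicitly.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def with_retries(action: Callable[[], T], nattempts: int = 2, delay_seconds: float = 0.0) -> T:
    """Call `action` until it succeeds or `nattempts` tries are used up."""
    last_error: Optional[Exception] = None
    for attempt in range(1, nattempts + 1):
        try:
            return action()
        except Exception as error:  # broad on purpose: any failure triggers a retry
            last_error = error
            if attempt < nattempts:
                time.sleep(delay_seconds)
    # Every try failed: raise, keeping the last error as the cause.
    raise Exception(f"Upload failed after {nattempts} attempts") from last_error
```

A caller would wrap the single upload step (for example, the PUT to the pre-signed URL) in `action`, so that only that step is repeated.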

s3_to_csv

def s3_to_csv(
    s3_url: Text,
    aws_credentials: Optional[Dict[Text, Text]] = {
        "AWS_ACCESS_KEY_ID": None,
        "AWS_SECRET_ACCESS_KEY": None
    }
) -> str

Convert S3 directory contents to a CSV file with file listings.

This function takes an S3 URL and creates a CSV file containing listings of all files in that location. It handles both single files and directories, with special handling for directory structures.

Arguments:

  • s3_url Text - S3 URL in the format 's3://bucket-name/path'.
  • aws_credentials Optional[Dict[Text, Text]], optional - AWS credentials dictionary with 'AWS_ACCESS_KEY_ID' and 'AWS_SECRET_ACCESS_KEY'. If not provided or values are None, uses environment variables. Defaults to {"AWS_ACCESS_KEY_ID": None, "AWS_SECRET_ACCESS_KEY": None}.
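
The two arguments above can be illustrated with a small sketch: splitting an S3 URL into bucket and path, and falling back to environment variables when a credential value is None. The helper names `split_s3_url` and `resolve_credentials` are hypothetical, and the SDK's actual validation may differ.

```python
import os
from typing import Dict, Optional, Tuple
from urllib.parse import urlparse


def split_s3_url(s3_url: str) -> Tuple[str, str]:
    """Split 's3://bucket-name/path' into (bucket, path)."""
    parsed = urlparse(s3_url)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Invalid S3 URL: {s3_url!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def resolve_credentials(
    aws_credentials: Optional[Dict[str, Optional[str]]] = None,
) -> Dict[str, str]:
    """Prefer explicit values; fall back to environment variables of the same name."""
    aws_credentials = aws_credentials or {}
    resolved = {}
    for name in ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY"):
        value = aws_credentials.get(name) or os.environ.get(name)
        if not value:
            raise ValueError(f"Missing AWS credential: {name}")
        resolved[name] = value
    return resolved
```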

Returns:

  • str - Path to the generated CSV file. The file contains listings of all files found in the S3 location.

Raises:

  • Exception - If any of the following holds:
    • boto3 is not installed
    • the S3 URL format is invalid
    • AWS credentials are missing
    • the bucket doesn't exist
    • no files are found
    • files are at the bucket root
    • the directory structure is invalid (unequal file counts or mismatched names)

Notes:

  • The function requires the boto3 package to be installed
  • The generated CSV will have a UUID as filename
  • For directory structures, all subdirectories must have the same number of files with matching prefixes
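
The directory-structure requirement in the notes above can be made concrete with a sketch of one possible check. It is an interpretation, not the SDK's logic: `check_directory_layout` is a hypothetical helper, and it assumes that "matching prefixes" means file names that agree once the extension is removed.

```python
from collections import defaultdict
from pathlib import PurePosixPath
from typing import Dict, Iterable, List


def check_directory_layout(keys: Iterable[str]) -> Dict[str, List[str]]:
    """Group object keys by parent directory and verify the groups line up."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for key in keys:
        path = PurePosixPath(key)
        if str(path.parent) == ".":
            raise ValueError(f"File at bucket root is not supported: {key}")
        # Assumption: the "prefix" of a file is its name without the extension.
        groups[str(path.parent)].append(path.stem)

    if not groups:
        raise ValueError("No files found")

    stems = {directory: sorted(names) for directory, names in groups.items()}
    reference = next(iter(stems.values()))
    for directory, names in stems.items():
        if len(names) != len(reference):
            raise ValueError(f"Unequal file count in {directory}")
        if names != reference:
            raise ValueError(f"Mismatched file names in {directory}")
    return stems
```

For example, `data/audio/a.wav` and `data/text/a.txt` pass this check, while adding `data/text/c.txt` without a matching audio file does not.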