
module aixplain.modules.pipeline.pipeline


class ObjectDetectionInputs

method __init__

__init__(node=None)

class ObjectDetectionOutputs

method __init__

__init__(node=None)

class ObjectDetection

Object Detection is a computer vision technology that identifies and locates objects within an image, typically by drawing bounding boxes around the detected objects and classifying them into predefined categories.

InputType: video OutputType: text


class LanguageIdentificationInputs

method __init__

__init__(node=None)

class LanguageIdentificationOutputs

method __init__

__init__(node=None)

class LanguageIdentification

Language Identification is the process of automatically determining the language in which a given piece of text is written.

InputType: text OutputType: text


class OcrInputs

method __init__

__init__(node=None)

class OcrOutputs

method __init__

__init__(node=None)

class Ocr

OCR, or Optical Character Recognition, is a technology that converts different types of documents, such as scanned paper documents, PDFs, or images captured by a digital camera, into editable and searchable data by recognizing and extracting text from the images.

InputType: image OutputType: text
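The InputType and OutputType values listed for each class hint at which nodes can be chained. The following is a minimal sketch, not part of the aixplain API: `IO_TYPES` and `can_connect` are hypothetical names, and the rule that an upstream output type must equal the downstream input type is an assumption made for illustration.

```python
# InputType/OutputType pairs copied from the class entries on this page.
IO_TYPES = {
    "ObjectDetection": ("video", "text"),
    "LanguageIdentification": ("text", "text"),
    "Ocr": ("image", "text"),
}


def can_connect(upstream, downstream):
    """Assumed rule: the upstream output type must equal the downstream input type."""
    _, upstream_output = IO_TYPES[upstream]
    downstream_input, _ = IO_TYPES[downstream]
    return upstream_output == downstream_input


# Ocr produces text, which LanguageIdentification accepts.
ocr_to_langid = can_connect("Ocr", "LanguageIdentification")
# LanguageIdentification produces text, but Ocr expects an image.
langid_to_ocr = can_connect("LanguageIdentification", "Ocr")
```
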


class ScriptExecutionInputs

method __init__

__init__(node=None)

class ScriptExecutionOutputs

method __init__

__init__(node=None)

class ScriptExecution

Script Execution refers to the process of running a set of programmed instructions or code within a computing environment, enabling the automated performance of tasks, calculations, or operations as defined by the script.

InputType: text OutputType: text


class ImageLabelDetectionInputs

method __init__

__init__(node=None)

class ImageLabelDetectionOutputs

method __init__

__init__(node=None)

class ImageLabelDetection

Image Label Detection is a function that automatically identifies and assigns descriptive tags or labels to objects, scenes, or elements within an image, enabling easier categorization, search, and analysis of visual content.

InputType: image OutputType: label


class ImageCaptioningInputs

method __init__

__init__(node=None)

class ImageCaptioningOutputs

method __init__

__init__(node=None)

class ImageCaptioning

Image Captioning is a process that involves generating a textual description of an image, typically using machine learning models to analyze the visual content and produce coherent and contextually relevant sentences that describe the objects, actions, and scenes depicted in the image.

InputType: image OutputType: text


class AudioLanguageIdentificationInputs

method __init__

__init__(node=None)

class AudioLanguageIdentificationOutputs

method __init__

__init__(node=None)

class AudioLanguageIdentification

Audio Language Identification is a process that involves analyzing an audio recording to determine the language being spoken.

InputType: audio OutputType: label


class AsrAgeClassificationInputs

method __init__

__init__(node=None)

class AsrAgeClassificationOutputs

method __init__

__init__(node=None)

class AsrAgeClassification

The ASR Age Classification function is designed to analyze audio recordings of speech to determine the speaker's age group by leveraging automatic speech recognition (ASR) technology and machine learning algorithms.

InputType: audio OutputType: label


class BenchmarkScoringMtInputs

method __init__

__init__(node=None)

class BenchmarkScoringMtOutputs

method __init__

__init__(node=None)

class BenchmarkScoringMt

Benchmark Scoring MT is a function designed to evaluate and score machine translation systems by comparing their output against a set of predefined benchmarks, thereby assessing their accuracy and performance.

InputType: text OutputType: label


class AsrGenderClassificationInputs

method __init__

__init__(node=None)

class AsrGenderClassificationOutputs

method __init__

__init__(node=None)

class AsrGenderClassification

The ASR Gender Classification function analyzes audio recordings to determine and classify the speaker's gender based on their voice characteristics.

InputType: audio OutputType: label


class BaseModelInputs

method __init__

__init__(node=None)

class BaseModelOutputs

method __init__

__init__(node=None)

class BaseModel

The Base-Model function serves as a foundational framework designed to provide essential features and capabilities upon which more specialized or advanced models can be built and customized.

InputType: text OutputType: text


class LanguageIdentificationAudioInputs

method __init__

__init__(node=None)

class LanguageIdentificationAudioOutputs

method __init__

__init__(node=None)

class LanguageIdentificationAudio

The Language Identification Audio function analyzes audio input to determine and identify the language being spoken.

InputType: audio OutputType: label


class LoglikelihoodInputs

method __init__

__init__(node=None)

class LoglikelihoodOutputs

method __init__

__init__(node=None)

class Loglikelihood

The Log Likelihood function measures the probability of observing the given data under a specific statistical model by taking the natural logarithm of the likelihood function, thereby transforming the product of probabilities into a sum, which simplifies the process of optimization and parameter estimation.

InputType: text OutputType: number
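As a rough illustration of the definition above (not the aixplain implementation), the log-likelihood of independent observations is the sum of the natural logs of their probabilities. The helper name `log_likelihood` and the sample probabilities are assumptions made for this sketch.

```python
import math


def log_likelihood(probabilities):
    """Sum the natural logs of per-observation probabilities (hypothetical helper)."""
    if any(p <= 0.0 or p > 1.0 for p in probabilities):
        raise ValueError("each probability must be in (0, 1]")
    return sum(math.log(p) for p in probabilities)


# Hypothetical per-token probabilities assigned by some model.
token_probs = [0.5, 0.25, 0.125]
total = log_likelihood(token_probs)

# The sum of logs equals the log of the product of the probabilities.
product_log = math.log(math.prod(token_probs))
```

Working with the sum rather than the product avoids numerical underflow when many small probabilities are multiplied together.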


class VideoEmbeddingInputs

method __init__

__init__(node=None)

class VideoEmbeddingOutputs

method __init__

__init__(node=None)

class VideoEmbedding

Video Embedding is a process that transforms video content into a fixed-dimensional vector representation, capturing essential features and patterns to facilitate tasks such as retrieval, classification, and recommendation.

InputType: video OutputType: embedding


class TextSegmenationInputs

method __init__

__init__(node=None)

class TextSegmenationOutputs

method __init__

__init__(node=None)

class TextSegmenation

Text Segmentation is the process of dividing a continuous text into meaningful units, such as words, sentences, or topics, to facilitate easier analysis and understanding.

InputType: text OutputType: text
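To make the idea concrete, here is a deliberately naive sentence segmenter. It is a sketch under stated assumptions, not how any aixplain model works: `split_sentences` is a hypothetical helper, and the rule (split after `.`, `!`, or `?` followed by whitespace) will mis-split abbreviations such as "Dr.".

```python
import re


def split_sentences(text):
    """Naively split text into sentences at ., !, or ? followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    # Drop empty strings that can appear for empty input.
    return [part for part in parts if part]


sentences = split_sentences(
    "Pipelines chain nodes. Each node has typed inputs! Does this one?"
)
```
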


class ImageEmbeddingInputs

method __init__

__init__(node=None)

class ImageEmbeddingOutputs

method __init__

__init__(node=None)

class ImageEmbedding

Image Embedding is a process that transforms an image into a fixed-dimensional vector representation, capturing its essential features and enabling efficient comparison, retrieval, and analysis in various machine learning and computer vision tasks.

InputType: image OutputType: text
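The "efficient comparison" mentioned above is commonly done with cosine similarity between embedding vectors. The sketch below assumes embeddings are available as plain lists of floats; `cosine_similarity` is a hypothetical helper, and the vectors are made up rather than produced by a real model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("vectors must be non-zero")
    return dot / (norm_a * norm_b)


# Made-up three-dimensional embeddings for illustration only.
same_direction = cosine_similarity([1.0, 0.0, 0.0], [2.0, 0.0, 0.0])
orthogonal = cosine_similarity([1.0, 0.0, 0.0], [0.0, 3.0, 0.0])
```

Vectors pointing in the same direction score 1 regardless of length, while unrelated (orthogonal) vectors score 0.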


class ImageManipulationInputs

method __init__

__init__(node=None)

class ImageManipulationOutputs

method __init__

__init__(node=None)

class ImageManipulation

Image Manipulation refers to the process of altering or enhancing digital images using various techniques and tools to achieve desired visual effects, correct imperfections, or transform the image's appearance.

InputType: image OutputType: image


class ImageToVideoGenerationInputs

method __init__

__init__(node=None)

class ImageToVideoGenerationOutputs

method __init__

__init__(node=None)

class ImageToVideoGeneration

The Image To Video Generation function transforms a series of static images into a cohesive, dynamic video sequence, often incorporating transitions, effects, and synchronization with audio to create a visually engaging narrative.

InputType: image OutputType: video


class AudioForcedAlignmentInputs

method __init__

__init__(node=None)

class AudioForcedAlignmentOutputs

method __init__

__init__(node=None)

class AudioForcedAlignment

Audio Forced Alignment is a process that synchronizes a given audio recording with its corresponding transcript by precisely aligning each spoken word or phoneme to its exact timing within the audio.

InputType: audio OutputType: audio


class BenchmarkScoringAsrInputs

method __init__

__init__(node=None)

class BenchmarkScoringAsrOutputs

method __init__

__init__(node=None)

class BenchmarkScoringAsr

Benchmark Scoring ASR is a function that evaluates and compares the performance of automatic speech recognition systems by analyzing their accuracy, speed, and other relevant metrics against a standardized set of benchmarks.

InputType: audio OutputType: label
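One widely used accuracy metric for speech recognition is word error rate (WER): the word-level edit distance between a reference transcript and the system output, divided by the number of reference words. Whether this function reports WER is an assumption; the sketch below only illustrates the metric, and `word_error_rate` is a hypothetical helper.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the processed reference
    # prefix and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# The hypothesis drops one of six reference words.
wer = word_error_rate("the cat sat on the mat", "the cat sat on mat")
```
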


class VisualQuestionAnsweringInputs

method __init__

__init__(node=None)

class VisualQuestionAnsweringOutputs

method __init__

__init__(node=None)

class VisualQuestionAnswering

Visual Question Answering (VQA) is a task in artificial intelligence that involves analyzing an image and providing accurate, contextually relevant answers to questions posed about the visual content of that image.

InputType: image OutputType: video


class DocumentImageParsingInputs

method __init__

__init__(node=None)

class DocumentImageParsingOutputs

method __init__

__init__(node=None)

class DocumentImageParsing

Document Image Parsing is the process of analyzing and converting scanned or photographed images of documents into structured, machine-readable formats by identifying and extracting text, layout, and other relevant information.

InputType: image OutputType: text


class DocumentInformationExtractionInputs

method __init__

__init__(node=None)

class DocumentInformationExtractionOutputs

method __init__

__init__(node=None)

class DocumentInformationExtraction

Document Information Extraction is the process of automatically identifying, extracting, and structuring relevant data from unstructured or semi-structured documents, such as invoices, receipts, contracts, and forms, to facilitate easier data management and analysis.

InputType: image OutputType: text


class DepthEstimationInputs

method __init__

__init__(node=None)

class DepthEstimationOutputs

method __init__

__init__(node=None)

class DepthEstimation

Depth estimation is a computational process that determines the distance of objects from a viewpoint, typically using visual data from cameras or sensors to create a three-dimensional understanding of a scene.

InputType: image OutputType: text


class VideoGenerationInputs

method __init__

__init__(node=None)

class VideoGenerationOutputs

method __init__

__init__(node=None)

class VideoGeneration

Video Generation is the process of creating video content through automated or semi-automated means, often utilizing algorithms, artificial intelligence, or software tools to produce visual and audio elements that can range from simple animations to complex, realistic scenes.

InputType: text OutputType: video


class ReferencelessAudioGenerationMetricInputs

method __init__

__init__(node=None)

class ReferencelessAudioGenerationMetricOutputs

method __init__

__init__(node=None)

class ReferencelessAudioGenerationMetric

The Referenceless Audio Generation Metric is a tool designed to evaluate the quality of generated audio content without the need for a reference or original audio sample for comparison.

InputType: text OutputType: text


class MultiClassImageClassificationInputs

method __init__

__init__(node=None)

class MultiClassImageClassificationOutputs

method __init__

__init__(node=None)

class MultiClassImageClassification

Multi Class Image Classification is a machine learning task where an algorithm is trained to categorize images into one of several predefined classes or categories based on their visual content.

InputType: image OutputType: label


class SemanticSegmentationInputs

method __init__

__init__(node=None)

class SemanticSegmentationOutputs

method __init__

__init__(node=None)

class SemanticSegmentation

Semantic segmentation is a computer vision process that involves classifying each pixel in an image into a predefined category, effectively partitioning the image into meaningful segments based on the objects or regions they represent.

InputType: image OutputType: label


class InstanceSegmentationInputs

method __init__

__init__(node=None)

class InstanceSegmentationOutputs

method __init__

__init__(node=None)

class InstanceSegmentation

Instance segmentation is a computer vision task that involves detecting and delineating each distinct object within an image, assigning a unique label and precise boundary to every individual instance of objects, even if they belong to the same category.

InputType: image OutputType: label


class ImageColorizationInputs

method __init__

__init__(node=None)

class ImageColorizationOutputs

method __init__

__init__(node=None)

class ImageColorization

Image colorization is a process that involves adding color to grayscale images, transforming them from black-and-white to full-color representations, often using advanced algorithms and machine learning techniques to predict and apply the appropriate hues and shades.

InputType: image OutputType: image


class AudioGenerationMetricInputs

method __init__

__init__(node=None)

class AudioGenerationMetricOutputs

method __init__

__init__(node=None)

class AudioGenerationMetric

The Audio Generation Metric is a quantitative measure used to evaluate the quality, accuracy, and overall performance of audio generated by artificial intelligence systems, often considering factors such as fidelity, intelligibility, and similarity to human-produced audio.

InputType: text OutputType: text


class ImageImpaintingInputs

method __init__

__init__(node=None)

class ImageImpaintingOutputs

method __init__

__init__(node=None)

class ImageImpainting

Image inpainting is a process that involves filling in missing or damaged parts of an image in a way that is visually coherent and seamlessly blends with the surrounding areas, often using advanced algorithms and techniques to restore the image to its original or intended appearance.

InputType: image OutputType: image


class StyleTransferInputs

method __init__

__init__(node=None)

class StyleTransferOutputs

method __init__

__init__(node=None)

class StyleTransfer

Style Transfer is a technique in artificial intelligence that applies the visual style of one image (such as the brushstrokes of a famous painting) to the content of another image, effectively blending the artistic elements of the first image with the subject matter of the second.

InputType: image OutputType: image


class MultiClassTextClassificationInputs

method __init__

__init__(node=None)

class MultiClassTextClassificationOutputs

method __init__

__init__(node=None)

class MultiClassTextClassification

Multi Class Text Classification is a natural language processing task that involves categorizing a given text into one of several predefined classes or categories based on its content.

InputType: text OutputType: label


class TextEmbeddingInputs

method __init__

__init__(node=None)

class TextEmbeddingOutputs

method __init__

__init__(node=None)

class TextEmbedding

Text embedding is a process that converts text into numerical vectors, capturing the semantic meaning and contextual relationships of words or phrases, enabling machines to understand and analyze natural language more effectively.

InputType: text OutputType: text


class MultiLabelTextClassificationInputs

method __init__

__init__(node=None)

class MultiLabelTextClassificationOutputs

method __init__

__init__(node=None)

class MultiLabelTextClassification

Multi Label Text Classification is a natural language processing task where a given text is analyzed and assigned multiple relevant labels or categories from a predefined set, allowing for the text to belong to more than one category simultaneously.

InputType: text OutputType: label
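The difference from multi-class classification shows up in how per-label scores are turned into a prediction. A minimal sketch, assuming a model that returns one score per label: multi-class keeps only the top-scoring label, while multi-label keeps every label above a threshold. The function names, the scores, and the 0.5 threshold are all illustrative assumptions.

```python
def multi_class(scores):
    """Return the single highest-scoring label."""
    return max(scores, key=scores.get)


def multi_label(scores, threshold=0.5):
    """Return every label whose score meets the threshold, sorted by name."""
    return sorted(label for label, score in scores.items() if score >= threshold)


# Made-up scores for one piece of text.
scores = {"sports": 0.75, "politics": 0.625, "travel": 0.125}
single = multi_class(scores)
several = multi_label(scores)
```
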


class TextReconstructionInputs

method __init__

__init__(node=None)

class TextReconstructionOutputs

method __init__

__init__(node=None)

class TextReconstruction

Text Reconstruction is a process that involves piecing together fragmented or incomplete text data to restore it to its original, coherent form.

InputType: text OutputType: text


class FactCheckingInputs

method __init__

__init__(node=None)

class FactCheckingOutputs

method __init__

__init__(node=None)

class FactChecking

Fact Checking is the process of verifying the accuracy and truthfulness of information, statements, or claims by cross-referencing with reliable sources and evidence.

InputType: text OutputType: label


class SpeechClassificationInputs

method __init__

__init__(node=None)

class SpeechClassificationOutputs

method __init__

__init__(node=None)

class SpeechClassification

Speech Classification is a process that involves analyzing and categorizing spoken language into predefined categories or classes based on various features such as tone, pitch, and linguistic content.

InputType: audio OutputType: label


class IntentClassificationInputs

method __init__

__init__(node=None)

class IntentClassificationOutputs

method __init__

__init__(node=None)

class IntentClassification

Intent Classification is a natural language processing task that involves analyzing and categorizing user text input to determine the underlying purpose or goal behind the communication, such as booking a flight, asking for weather information, or setting a reminder.

InputType: text OutputType: label


class PartOfSpeechTaggingInputs

method __init__

__init__(node=None)

class PartOfSpeechTaggingOutputs

method __init__

__init__(node=None)

class PartOfSpeechTagging

Part of Speech Tagging is a natural language processing task that involves assigning each word in a sentence its corresponding part of speech, such as noun, verb, adjective, or adverb, based on its role and context within the sentence.

InputType: text OutputType: label


class MetricAggregationInputs

method __init__

__init__(node=None)

class MetricAggregationOutputs

method __init__

__init__(node=None)

class MetricAggregation

Metric Aggregation is a function that computes and summarizes numerical data by applying statistical operations, such as averaging, summing, or finding the minimum and maximum values, to provide insights and facilitate analysis of large datasets.

InputType: text OutputType: text
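The statistical operations named above can be sketched with the Python standard library. This is an illustration only: `aggregate` is a hypothetical helper, and the set of statistics the actual function returns is not specified here.

```python
import statistics


def aggregate(scores):
    """Return summary statistics for a non-empty list of numeric scores."""
    if not scores:
        raise ValueError("scores must not be empty")
    return {
        "mean": statistics.fmean(scores),
        "sum": sum(scores),
        "min": min(scores),
        "max": max(scores),
    }


# Made-up metric values, e.g. per-sample scores from an evaluation run.
summary = aggregate([0.5, 0.75, 1.0, 0.25])
```
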


class DialectDetectionInputs

method __init__

__init__(node=None)

class DialectDetectionOutputs

method __init__

__init__(node=None)

class DialectDetection

Dialect Detection is a function that identifies and classifies the specific regional or social variations of a language spoken or written by an individual, enabling the recognition of distinct linguistic patterns and nuances associated with different dialects.

InputType: audio OutputType: text


class InverseTextNormalizationInputs

method __init__

__init__(node=None)

class InverseTextNormalizationOutputs

method __init__

__init__(node=None)

class InverseTextNormalization

Inverse Text Normalization is the process of converting text in its spoken form, such as spelled-out numbers, dates, and abbreviations, into its conventional written form (for example, "twenty five dollars" becomes "$25").

InputType: text OutputType: label


class TextToAudioInputs

method __init__

__init__(node=None)

class TextToAudioOutputs

method __init__

__init__(node=None)

class TextToAudio

The Text to Audio function converts written text into spoken words, allowing users to listen to the content instead of reading it.

InputType: text OutputType: audio


class FillTextMaskInputs

method __init__

__init__(node=None)

class FillTextMaskOutputs

method __init__

__init__(node=None)

class FillTextMask

The "Fill Text Mask" function takes a text input with masked or placeholder characters and replaces those placeholders with specified or contextually appropriate characters to generate a complete and coherent text output.

InputType: text OutputType: text
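The "specified" replacement case can be sketched as plain string substitution. This is a minimal illustration, not the aixplain implementation: `fill_masks` is a hypothetical helper and the `[MASK]` token is an assumed placeholder format. Choosing "contextually appropriate" fillers would require a language model and is out of scope here.

```python
def fill_masks(text, fillers, mask_token="[MASK]"):
    """Replace each mask token, left to right, with the matching filler."""
    pieces = text.split(mask_token)
    if len(pieces) - 1 != len(fillers):
        raise ValueError("number of fillers must match number of masks")
    result = [pieces[0]]
    for filler, piece in zip(fillers, pieces[1:]):
        result.append(filler)
        result.append(piece)
    return "".join(result)


filled = fill_masks("The [MASK] sat on the [MASK].", ["cat", "mat"])
```
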


class VideoContentModerationInputs

method __init__

__init__(node=None)

class VideoContentModerationOutputs

method __init__

__init__(node=None)

class VideoContentModeration

Video Content Moderation is the process of reviewing, analyzing, and filtering video content to ensure it adheres to community guidelines, legal standards, and platform policies, thereby preventing the dissemination of inappropriate, harmful, or illegal material.

InputType: video OutputType: label


class ExtractAudioFromVideoInputs

method __init__

__init__(node=None)

class ExtractAudioFromVideoOutputs

method __init__

__init__(node=None)

class ExtractAudioFromVideo

The "Extract Audio From Video" function allows users to separate and save the audio track from a video file, enabling them to obtain just the sound without the accompanying visual content.

InputType: video OutputType: audio


class ImageCompressionInputs

method __init__

__init__(node=None)

class ImageCompressionOutputs

method __init__

__init__(node=None)

class ImageCompression

Image compression is a process that reduces the file size of an image by removing redundant or non-essential data, while maintaining an acceptable level of visual quality.

InputType: image OutputType: image


class MultilingualSpeechRecognitionInputs

method __init__

__init__(node=None)

class MultilingualSpeechRecognitionOutputs

method __init__

__init__(node=None)

class MultilingualSpeechRecognition

Multilingual Speech Recognition is a technology that enables the automatic transcription of spoken language into text across multiple languages, allowing for seamless communication and understanding in diverse linguistic contexts.

InputType: audio OutputType: text


class ReferencelessTextGenerationMetricInputs

method __init__

__init__(node=None)

class ReferencelessTextGenerationMetricOutputs

method __init__

__init__(node=None)

class ReferencelessTextGenerationMetric

The Referenceless Text Generation Metric is a method for evaluating the quality of generated text without requiring a reference text for comparison, often leveraging models or algorithms to assess coherence, relevance, and fluency based on intrinsic properties of the text itself.

InputType: text OutputType: text


class TextGenerationMetricDefaultInputs

method __init__

__init__(node=None)

class TextGenerationMetricDefaultOutputs

method __init__

__init__(node=None)

class TextGenerationMetricDefault

The "Text Generation Metric Default" function provides a standard set of evaluation metrics for assessing the quality and performance of text generation models.

InputType: text OutputType: text


class NoiseRemovalInputs

method __init__

__init__(node=None)

class NoiseRemovalOutputs

method __init__

__init__(node=None)

class NoiseRemoval

Noise Removal is a process that involves identifying and eliminating unwanted random variations or disturbances from an audio signal to enhance the clarity and quality of the underlying information.

InputType: audio OutputType: audio


class AudioReconstructionInputs

method __init__

__init__(node=None)

class AudioReconstructionOutputs

method __init__

__init__(node=None)

class AudioReconstruction

Audio Reconstruction is the process of restoring or recreating audio signals from incomplete, damaged, or degraded recordings to achieve a high-quality, accurate representation of the original sound.

InputType: audio OutputType: audio


class VoiceCloningInputs

method __init__

__init__(node=None)

class VoiceCloningOutputs

method __init__

__init__(node=None)

class VoiceCloning

Voice cloning is a technology that uses artificial intelligence to create a digital replica of a person's voice, allowing for the generation of speech that mimics the tone, pitch, and speaking style of the original speaker.

InputType: text OutputType: audio


class DiacritizationInputs

method __init__

__init__(node=None)

class DiacritizationOutputs

method __init__

__init__(node=None)

class Diacritization

Diacritization is the process of adding diacritical marks to letters in a text to indicate pronunciation, stress, tone, or meaning, often used in languages such as Arabic, Hebrew, and Vietnamese to provide clarity and accuracy in written communication.

InputType: text OutputType: text


class AudioEmotionDetectionInputs

method __init__

__init__(node=None)

class AudioEmotionDetectionOutputs

method __init__

__init__(node=None)

class AudioEmotionDetection

Audio Emotion Detection is a technology that analyzes vocal characteristics and patterns in audio recordings to identify and classify the emotional state of the speaker.

InputType: audio OutputType: label


class TextSummarizationInputs

method __init__

__init__(node=None)

class TextSummarizationOutputs

method __init__

__init__(node=None)

class TextSummarization

Text summarization is the process of condensing a large body of text into a shorter version, capturing the main points and essential information while maintaining coherence and meaning.

InputType: text OutputType: text


class EntityLinkingInputs

method __init__

__init__(node=None)

class EntityLinkingOutputs

method __init__

__init__(node=None)

class EntityLinking

Entity Linking is the process of identifying and connecting mentions of entities within a text to their corresponding entries in a structured knowledge base, thereby enabling the disambiguation of terms and enhancing the understanding of the text's context.

InputType: text OutputType: label


class TextGenerationMetricInputs

method __init__

__init__(node=None)

class TextGenerationMetricOutputs

method __init__

__init__(node=None)

class TextGenerationMetric

A Text Generation Metric is a quantitative measure used to evaluate the quality and effectiveness of text produced by natural language processing models, often assessing aspects such as coherence, relevance, fluency, and adherence to given prompts or instructions.

InputType: text OutputType: text


class SplitOnLinebreakInputs

method __init__

__init__(node=None)

class SplitOnLinebreakOutputs

method __init__

__init__(node=None)

class SplitOnLinebreak

The "Split On Linebreak" function divides a given string into a list of substrings, using linebreaks (newline characters) as the points of separation.

InputType: text OutputType: text
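In Python, the behavior described above corresponds closely to `str.splitlines`. The wrapper below is a hypothetical helper for illustration; how the actual node treats empty lines or unusual line terminators is not stated here.

```python
def split_on_linebreak(text):
    """Split text at line boundaries and drop the line terminators."""
    return text.splitlines()


# Both "\n" and "\r\n" are treated as line boundaries.
parts = split_on_linebreak("first line\nsecond line\r\nthird line")
```
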


class SentimentAnalysisInputs

method __init__

__init__(node=None)

class SentimentAnalysisOutputs

method __init__

__init__(node=None)

class SentimentAnalysis

Sentiment Analysis is a natural language processing technique used to determine and classify the emotional tone or subjective information expressed in a piece of text, such as identifying whether the sentiment is positive, negative, or neutral.

InputType: text OutputType: label


class KeywordSpottingInputs

method __init__

__init__(node=None)

class KeywordSpottingOutputs

method __init__

__init__(node=None)

class KeywordSpotting

Keyword Spotting is a function that enables the detection and identification of specific words or phrases within a stream of audio, often used in voice-activated systems to trigger actions or commands based on recognized keywords.

InputType: audio OutputType: label


class TextClassificationInputs

method __init__

__init__(node=None)

class TextClassificationOutputs

method __init__

__init__(node=None)

class TextClassification

Text Classification is a natural language processing task that involves categorizing text into predefined labels or classes based on its content, enabling automated organization, filtering, and analysis of large volumes of textual data.

InputType: text OutputType: label


class OtherMultipurposeInputs

method __init__

__init__(node=None)

class OtherMultipurposeOutputs

method __init__

__init__(node=None)

class OtherMultipurpose

The "Other (Multipurpose)" function serves as a versatile category designed to accommodate a wide range of tasks and activities that do not fit neatly into predefined classifications, offering flexibility and adaptability for various needs.

InputType: text OutputType: text


class SpeechSynthesisInputs

method __init__

__init__(node=None)

class SpeechSynthesisOutputs

method __init__

__init__(node=None)

class SpeechSynthesis

Speech synthesis is the artificial production of human speech, typically achieved through software or hardware systems that convert text into spoken words, enabling machines to communicate verbally with users.

InputType: text OutputType: audio


class AudioIntentDetectionInputs

method __init__

__init__(node=None)

class AudioIntentDetectionOutputs

method __init__

__init__(node=None)

class AudioIntentDetection

Audio Intent Detection is a process that involves analyzing audio signals to identify and interpret the underlying intentions or purposes behind spoken words, enabling systems to understand and respond appropriately to human speech.

InputType: audio OutputType: label


class VideoLabelDetectionInputs

method __init__

__init__(node=None)

class VideoLabelDetectionOutputs

method __init__

__init__(node=None)

class VideoLabelDetection

Video Label Detection is a function that automatically identifies and tags various objects, scenes, activities, and other relevant elements within a video, providing descriptive labels that enhance searchability and content organization.

InputType: video OutputType: label


class AsrQualityEstimationInputs

method __init__

__init__(node=None)

class AsrQualityEstimationOutputs

method __init__

__init__(node=None)

class AsrQualityEstimation

ASR Quality Estimation is a process that evaluates the accuracy and reliability of automatic speech recognition systems by analyzing their performance in transcribing spoken language into text.

InputType: text OutputType: label


class AudioTranscriptAnalysisInputs

method __init__

__init__(node=None)

class AudioTranscriptAnalysisOutputs

method __init__

__init__(node=None)

class AudioTranscriptAnalysis

Audio Transcript Analysis is a process that involves converting spoken language from audio recordings into written text, followed by examining and interpreting the transcribed content to extract meaningful insights, identify patterns, and derive actionable information.

InputType: audio OutputType: text


class SearchInputs

method __init__

__init__(node=None)

class SearchOutputs

method __init__

__init__(node=None)

class Search

The "Search" function allows users to input keywords or phrases to quickly locate specific information, files, or content within a database, website, or application.

InputType: text OutputType: text


class VideoForcedAlignmentInputs

method __init__

__init__(node=None)

class VideoForcedAlignmentOutputs

method __init__

__init__(node=None)

class VideoForcedAlignment

Video Forced Alignment is a process that synchronizes video footage with corresponding audio tracks by precisely aligning the visual and auditory elements, ensuring that the movements of speakers' lips match the spoken words.

InputType: video OutputType: video


class VisemeGenerationInputs

method __init__

__init__(node=None)

class VisemeGenerationOutputs

method __init__

__init__(node=None)

class VisemeGeneration

Viseme Generation is the process of creating visual representations of phonemes, which are the distinct units of sound in speech, to synchronize lip movements with spoken words in animations or virtual avatars.

InputType: text OutputType: label


class TopicClassificationInputs

method __init__

__init__(node=None)

class TopicClassificationOutputs

method __init__

__init__(node=None)

class TopicClassification

Topic Classification is a natural language processing function that categorizes text into predefined topics or subjects based on its content, enabling efficient organization and retrieval of information.

InputType: text OutputType: label


class OffensiveLanguageIdentificationInputs

method __init__

__init__(node=None)

class OffensiveLanguageIdentificationOutputs

method __init__

__init__(node=None)

class OffensiveLanguageIdentification

Offensive Language Identification is a function that analyzes text to detect and flag language that is abusive, harmful, or inappropriate, helping to maintain a respectful and safe communication environment.

InputType: text OutputType: label


class SpeechTranslationInputs

method __init__

__init__(node=None)

class SpeechTranslationOutputs

method __init__

__init__(node=None)

class SpeechTranslation

Speech Translation is a technology that converts spoken language in real-time from one language to another, enabling seamless communication between speakers of different languages.

InputType: audio OutputType: text


class SpeakerDiarizationAudioInputs

method __init__

__init__(node=None)

class SpeakerDiarizationAudioOutputs

method __init__

__init__(node=None)

class SpeakerDiarizationAudio

Speaker Diarization Audio is a process that involves segmenting an audio recording into distinct sections, each corresponding to a different speaker, in order to identify and differentiate between multiple speakers within the same audio stream.

InputType: audio OutputType: label


class AudioTranscriptImprovementInputs

method __init__

__init__(node=None)

class AudioTranscriptImprovementOutputs

method __init__

__init__(node=None)

class AudioTranscriptImprovement

Audio Transcript Improvement is a function that enhances the accuracy and clarity of transcribed audio recordings by correcting errors, refining language, and ensuring the text faithfully represents the original spoken content.

InputType: audio OutputType: text


class SpeechNonSpeechClassificationInputs

method __init__

__init__(node=None)

class SpeechNonSpeechClassificationOutputs

method __init__

__init__(node=None)

class SpeechNonSpeechClassification

Speech or Non-Speech Classification analyzes audio input to determine whether a sound is human speech or non-speech noise, enabling applications such as voice recognition systems to filter out irrelevant background sounds.

InputType: audio OutputType: label


class TextDenormalizationInputs

method __init__

__init__(node=None)

class TextDenormalizationOutputs

method __init__

__init__(node=None)

class TextDenormalization

Text Denormalization is the process of converting abbreviated, contracted, or otherwise simplified text into its full, standard form, often to improve readability and ensure consistency in natural language processing tasks.

InputType: text OutputType: label


class ImageContentModerationInputs

method __init__

__init__(node=None)

class ImageContentModerationOutputs

method __init__

__init__(node=None)

class ImageContentModeration

Image Content Moderation is a process that involves analyzing and filtering images to detect and manage inappropriate, harmful, or sensitive content, ensuring compliance with community guidelines and legal standards.

InputType: image OutputType: label


class ReferencelessTextGenerationMetricDefaultInputs

method __init__

__init__(node=None)

class ReferencelessTextGenerationMetricDefaultOutputs

method __init__

__init__(node=None)

class ReferencelessTextGenerationMetricDefault

The Referenceless Text Generation Metric Default is a function designed to evaluate the quality of generated text without relying on reference texts for comparison.

InputType: text OutputType: text


class NamedEntityRecognitionInputs

method __init__

__init__(node=None)

class NamedEntityRecognitionOutputs

method __init__

__init__(node=None)

class NamedEntityRecognition

Named Entity Recognition (NER) is a natural language processing task that involves identifying named entities in text and classifying them into predefined categories such as people, organizations, locations, and dates.

InputType: text OutputType: label


class TextContentModerationInputs

method __init__

__init__(node=None)

class TextContentModerationOutputs

method __init__

__init__(node=None)

class TextContentModeration

Text Content Moderation is the process of reviewing, filtering, and managing user-generated content to ensure it adheres to community guidelines, legal standards, and platform policies, thereby maintaining a safe and respectful online environment.

InputType: text OutputType: label


class SpeakerDiarizationVideoInputs

method __init__

__init__(node=None)

class SpeakerDiarizationVideoOutputs

method __init__

__init__(node=None)

class SpeakerDiarizationVideo

The Speaker Diarization Video function identifies and segments different speakers in a video, attributing portions of the audio to individual speakers to facilitate analysis and understanding of multi-speaker conversations.

InputType: video OutputType: label


class SplitOnSilenceInputs

method __init__

__init__(node=None)

class SplitOnSilenceOutputs

method __init__

__init__(node=None)

class SplitOnSilence

The "Split On Silence" function divides an audio recording into separate segments based on periods of silence, allowing for easier editing and analysis of individual sections.
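As a rough illustration of the idea (not the aiXplain implementation, which runs as a hosted function), the sketch below splits a list of amplitude samples wherever a long enough run of quiet samples occurs. The function name, `threshold`, and `min_silence_len` are assumptions chosen for this example.

```python
def split_on_silence(samples, threshold=0.05, min_silence_len=3):
    """Split amplitude samples wherever at least `min_silence_len`
    consecutive samples have an absolute value below `threshold`."""
    segments = []
    current = []    # samples of the segment being built
    quiet_run = []  # trailing quiet samples not yet classified

    for sample in samples:
        if abs(sample) < threshold:
            quiet_run.append(sample)
            continue
        # A loud sample: decide what the preceding quiet run was.
        if len(quiet_run) >= min_silence_len:
            # Long silence: close the current segment and drop the silence.
            if current:
                segments.append(current)
            current = []
        else:
            # Short pause: keep it inside the current segment.
            current.extend(quiet_run)
        quiet_run = []
        current.append(sample)

    if current:
        segments.append(current)
    return segments


audio = [0.5, 0.6, 0.0, 0.0, 0.0, 0.7, 0.01, 0.8]
print(split_on_silence(audio))  # two segments, split at the run of three zeros
```

Trailing silence is discarded in this sketch; a real splitter would usually work on frames of audio rather than single samples.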

InputType: audio OutputType: audio


class EmotionDetectionInputs

method __init__

__init__(node=None)

class EmotionDetectionOutputs

method __init__

__init__(node=None)

class EmotionDetection

Emotion Detection is a process that involves analyzing text to identify and categorize the emotional states or sentiments expressed by individuals, such as happiness, sadness, anger, or fear.

InputType: text OutputType: label


class TextSpamDetectionInputs

method __init__

__init__(node=None)

class TextSpamDetectionOutputs

method __init__

__init__(node=None)

class TextSpamDetection

Text Spam Detection is a process that involves analyzing and identifying unsolicited or irrelevant messages within text communications, typically using algorithms and machine learning techniques to filter out spam and ensure the integrity of the communication platform.

InputType: text OutputType: label


class TranslationInputs

method __init__

__init__(node=None)

class TranslationOutputs

method __init__

__init__(node=None)

class Translation

Translation is the process of converting text from one language into an equivalent text in another language, preserving the original meaning and context.

InputType: text OutputType: text


class VoiceActivityDetectionInputs

method __init__

__init__(node=None)

class VoiceActivityDetectionOutputs

method __init__

__init__(node=None)

class VoiceActivityDetection

Voice Activity Detection (VAD) is a technology that identifies the presence or absence of human speech within an audio signal, enabling systems to distinguish between spoken words and background noise.

InputType: audio OutputType: audio


class SpeechEmbeddingInputs

method __init__

__init__(node=None)

class SpeechEmbeddingOutputs

method __init__

__init__(node=None)

class SpeechEmbedding

Speech Embedding is a process that transforms spoken language into a fixed-dimensional vector representation, capturing essential features and characteristics of the speech for tasks such as recognition, classification, and analysis.

InputType: audio OutputType: text


class SubtitlingTranslationInputs

method __init__

__init__(node=None)

class SubtitlingTranslationOutputs

method __init__

__init__(node=None)

class SubtitlingTranslation

Subtitling Translation is the process of converting spoken dialogue from one language into written text in another language, which is then displayed on-screen to aid viewers in understanding the content.

InputType: text OutputType: text


class TextGenerationInputs

method __init__

__init__(node=None)

class TextGenerationOutputs

method __init__

__init__(node=None)

class TextGeneration

Text Generation is a process in which artificial intelligence models, such as neural networks, produce coherent and contextually relevant text based on a given input or prompt, often mimicking human writing styles and patterns.

InputType: text OutputType: text


class VideoUnderstandingInputs

method __init__

__init__(node=None)

class VideoUnderstandingOutputs

method __init__

__init__(node=None)

class VideoUnderstanding

Video Understanding is the process of analyzing and interpreting video content to extract meaningful information, such as identifying objects, actions, events, and contextual relationships within the footage.

InputType: video OutputType: text


class TextToVideoGenerationInputs

method __init__

__init__(node=None)

class TextToVideoGenerationOutputs

method __init__

__init__(node=None)

class TextToVideoGeneration

Text To Video Generation is a process that converts written descriptions or scripts into dynamic, visual video content using advanced algorithms and artificial intelligence.

InputType: text OutputType: video


class TextNormalizationInputs

method __init__

__init__(node=None)

class TextNormalizationOutputs

method __init__

__init__(node=None)

class TextNormalization

Text normalization is the process of transforming text into a standard, consistent format by correcting spelling errors, converting all characters to a uniform case, removing punctuation, and expanding abbreviations to improve the text's readability and usability for further processing or analysis.
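A minimal sketch of three of the steps named above (uniform case, punctuation removal, abbreviation expansion) might look like the following. It is an illustration only, not the behaviour of any aiXplain model; the `ABBREVIATIONS` table and the `normalize` helper are hypothetical, and spelling correction is left out.

```python
import string

# Hypothetical abbreviation table, for illustration only.
ABBREVIATIONS = {"dr.": "doctor", "st.": "street"}

# Translation table that deletes every ASCII punctuation character.
_STRIP_PUNCTUATION = str.maketrans("", "", string.punctuation)


def normalize(text):
    """Lowercase the text, expand known abbreviations, strip punctuation."""
    tokens = text.lower().split()
    # Expand abbreviations before stripping punctuation, so "dr." still matches.
    expanded = [ABBREVIATIONS.get(token, token) for token in tokens]
    cleaned = [token.translate(_STRIP_PUNCTUATION) for token in expanded]
    # Drop tokens that were punctuation only.
    return " ".join(token for token in cleaned if token)


print(normalize("Dr. Smith lives on Main St."))
```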

InputType: text OutputType: label


class SpeechRecognitionInputs

method __init__

__init__(node=None)

class SpeechRecognitionOutputs

method __init__

__init__(node=None)

class SpeechRecognition

Speech recognition is a technology that enables a computer or device to identify and process spoken language, converting it into text.

InputType: audio OutputType: text


class SubtitlingInputs

method __init__

__init__(node=None)

class SubtitlingOutputs

method __init__

__init__(node=None)

class Subtitling

Subtitling is the process of displaying written text on a screen to represent the spoken dialogue, narration, or other audio elements in a video, typically to aid viewers who are deaf or hard of hearing, or to provide translations for audiences who speak different languages.

InputType: audio OutputType: text


class ClassificationMetricInputs

method __init__

__init__(node=None)

class ClassificationMetricOutputs

method __init__

__init__(node=None)

class ClassificationMetric

A Classification Metric is a quantitative measure used to evaluate the quality and effectiveness of classification models.
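Which metrics this function computes is not stated here. As a sketch under that caveat, the three most common classification metrics (accuracy, precision, recall) can be computed from true and predicted labels as follows; `classification_metrics` and its parameters are hypothetical names for this example.

```python
def classification_metrics(y_true, y_pred, positive):
    """Return accuracy, plus precision and recall for the `positive` label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    return {
        "accuracy": correct / len(pairs) if pairs else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


truth = ["spam", "ham", "spam", "ham"]
predicted = ["spam", "spam", "ham", "ham"]
print(classification_metrics(truth, predicted, positive="spam"))
```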

InputType: text OutputType: text


class TextToImageGenerationInputs

method __init__

__init__(node=None)

class TextToImageGenerationOutputs

method __init__

__init__(node=None)

class TextToImageGeneration

Text To Image Generation is a process where a system creates visual images based on descriptive text input, translating written language into corresponding graphical representations.

InputType: text OutputType: image


class Pipeline


method asr_age_classification

asr_age_classification(
asset_id: Union[str, Asset],
*args,
**kwargs
) → AsrAgeClassification

The ASR Age Classification function analyzes audio recordings of speech to estimate the speaker's age group, using automatic speech recognition (ASR) technology and machine learning algorithms.


method asr_gender_classification

asr_gender_classification(
asset_id: Union[str, Asset],
*args,
**kwargs
) → AsrGenderClassification

The ASR Gender Classification function analyzes audio recordings to determine and classify the speaker's gender based on their voice characteristics.


method asr_quality_estimation

asr_quality_estimation(
asset_id: Union[str, Asset],
*args,
**kwargs
) → AsrQualityEstimation

ASR Quality Estimation is a process that evaluates the accuracy and reliability of automatic speech recognition systems by analyzing their performance in transcribing spoken language into text.


method audio_emotion_detection

audio_emotion_detection(
asset_id: Union[str, Asset],
*args,
**kwargs
) → AudioEmotionDetection

Audio Emotion Detection is a technology that analyzes vocal characteristics and patterns in audio recordings to identify and classify the emotional state of the speaker.


method audio_forced_alignment

audio_forced_alignment(
asset_id: Union[str, Asset],
*args,
**kwargs
) → AudioForcedAlignment

Audio Forced Alignment is a process that synchronizes a given audio recording with its corresponding transcript by precisely aligning each spoken word or phoneme to its exact timing within the audio.


method audio_generation_metric

audio_generation_metric(
asset_id: Union[str, Asset],
*args,
**kwargs
) → AudioGenerationMetric

The Audio Generation Metric is a quantitative measure used to evaluate the quality, accuracy, and overall performance of audio generated by artificial intelligence systems, often considering factors such as fidelity, intelligibility, and similarity to human-produced audio.


method audio_intent_detection

audio_intent_detection(
asset_id: Union[str, Asset],
*args,
**kwargs
) → AudioIntentDetection

Audio Intent Detection is a process that involves analyzing audio signals to identify and interpret the underlying intentions or purposes behind spoken words, enabling systems to understand and respond appropriately to human speech.


method audio_language_identification

audio_language_identification(
asset_id: Union[str, Asset],
*args,
**kwargs
) → AudioLanguageIdentification

Audio Language Identification is a process that involves analyzing an audio recording to determine the language being spoken.


method audio_reconstruction

audio_reconstruction(
asset_id: Union[str, Asset],
*args,
**kwargs
) → AudioReconstruction

Audio Reconstruction is the process of restoring or recreating audio signals from incomplete, damaged, or degraded recordings to achieve a high-quality, accurate representation of the original sound.


method audio_transcript_analysis

audio_transcript_analysis(
asset_id: Union[str, Asset],
*args,
**kwargs
) → AudioTranscriptAnalysis

Audio Transcript Analysis is a process that involves converting spoken language from audio recordings into written text, followed by examining and interpreting the transcribed content to extract meaningful insights, identify patterns, and derive actionable information.


method audio_transcript_improvement

audio_transcript_improvement(
asset_id: Union[str, Asset],
*args,
**kwargs
) → AudioTranscriptImprovement

Audio Transcript Improvement is a function that enhances the accuracy and clarity of transcribed audio recordings by correcting errors, refining language, and ensuring the text faithfully represents the original spoken content.


method base_model

base_model(asset_id: Union[str, Asset], *args, **kwargs) → BaseModel

The Base-Model function serves as a foundational framework designed to provide essential features and capabilities upon which more specialized or advanced models can be built and customized.


method benchmark_scoring_asr

benchmark_scoring_asr(
asset_id: Union[str, Asset],
*args,
**kwargs
) → BenchmarkScoringAsr

Benchmark Scoring ASR is a function that evaluates and compares the performance of automatic speech recognition systems by analyzing their accuracy, speed, and other relevant metrics against a standardized set of benchmarks.


method benchmark_scoring_mt

benchmark_scoring_mt(
asset_id: Union[str, Asset],
*args,
**kwargs
) → BenchmarkScoringMt

Benchmark Scoring MT is a function designed to evaluate and score machine translation systems by comparing their output against a set of predefined benchmarks, thereby assessing their accuracy and performance.


method classification_metric

classification_metric(
asset_id: Union[str, Asset],
*args,
**kwargs
) → ClassificationMetric

A Classification Metric is a quantitative measure used to evaluate the quality and effectiveness of classification models.


method depth_estimation

depth_estimation(asset_id: Union[str, Asset], *args, **kwargs) → DepthEstimation

Depth estimation is a computational process that determines the distance of objects from a viewpoint, typically using visual data from cameras or sensors to create a three-dimensional understanding of a scene.


method diacritization

diacritization(asset_id: Union[str, Asset], *args, **kwargs) → Diacritization

Diacritization is the process of adding diacritical marks to letters in a text to indicate pronunciation, stress, tone, or meaning, often used in languages such as Arabic, Hebrew, and Vietnamese to provide clarity and accuracy in written communication.


method dialect_detection

dialect_detection(
asset_id: Union[str, Asset],
*args,
**kwargs
) → DialectDetection

Dialect Detection is a function that identifies and classifies the specific regional or social variations of a language spoken or written by an individual, enabling the recognition of distinct linguistic patterns and nuances associated with different dialects.


method document_image_parsing

document_image_parsing(
asset_id: Union[str, Asset],
*args,
**kwargs
) → DocumentImageParsing

Document Image Parsing is the process of analyzing and converting scanned or photographed images of documents into structured, machine-readable formats by identifying and extracting text, layout, and other relevant information.


method document_information_extraction

document_information_extraction(
asset_id: Union[str, Asset],
*args,
**kwargs
) → DocumentInformationExtraction

Document Information Extraction is the process of automatically identifying, extracting, and structuring relevant data from unstructured or semi-structured documents, such as invoices, receipts, contracts, and forms, to facilitate easier data management and analysis.


method emotion_detection

emotion_detection(
asset_id: Union[str, Asset],
*args,
**kwargs
) → EmotionDetection

Emotion Detection is a process that involves analyzing text to identify and categorize the emotional states or sentiments expressed by individuals, such as happiness, sadness, anger, or fear.


method entity_linking

entity_linking(asset_id: Union[str, Asset], *args, **kwargs) → EntityLinking

Entity Linking is the process of identifying and connecting mentions of entities within a text to their corresponding entries in a structured knowledge base, thereby enabling the disambiguation of terms and enhancing the understanding of the text's context.


method extract_audio_from_video

extract_audio_from_video(
asset_id: Union[str, Asset],
*args,
**kwargs
) → ExtractAudioFromVideo

The "Extract Audio From Video" function allows users to separate and save the audio track from a video file, enabling them to obtain just the sound without the accompanying visual content.


method fact_checking

fact_checking(asset_id: Union[str, Asset], *args, **kwargs) → FactChecking

Fact Checking is the process of verifying the accuracy and truthfulness of information, statements, or claims by cross-referencing with reliable sources and evidence.


method fill_text_mask

fill_text_mask(asset_id: Union[str, Asset], *args, **kwargs) → FillTextMask

The "Fill Text Mask" function takes a text input with masked or placeholder characters and replaces those placeholders with specified or contextually appropriate characters to generate a complete and coherent text output.


method image_captioning

image_captioning(asset_id: Union[str, Asset], *args, **kwargs) → ImageCaptioning

Image Captioning is a process that involves generating a textual description of an image, typically using machine learning models to analyze the visual content and produce coherent and contextually relevant sentences that describe the objects, actions, and scenes depicted in the image.


method image_colorization

image_colorization(
asset_id: Union[str, Asset],
*args,
**kwargs
) → ImageColorization

Image colorization is a process that involves adding color to grayscale images, transforming them from black-and-white to full-color representations, often using advanced algorithms and machine learning techniques to predict and apply the appropriate hues and shades.


method image_compression

image_compression(
asset_id: Union[str, Asset],
*args,
**kwargs
) → ImageCompression

Image compression is a process that reduces the file size of an image by removing redundant or non-essential data, while maintaining an acceptable level of visual quality.


method image_content_moderation

image_content_moderation(
asset_id: Union[str, Asset],
*args,
**kwargs
) → ImageContentModeration

Image Content Moderation is a process that involves analyzing and filtering images to detect and manage inappropriate, harmful, or sensitive content, ensuring compliance with community guidelines and legal standards.


method image_embedding

image_embedding(asset_id: Union[str, Asset], *args, **kwargs) → ImageEmbedding

Image Embedding is a process that transforms an image into a fixed-dimensional vector representation, capturing its essential features and enabling efficient comparison, retrieval, and analysis in various machine learning and computer vision tasks.
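To show why fixed-dimensional vectors make comparison efficient, the sketch below scores two embeddings with cosine similarity. The vectors are made-up values rather than output of any real embedding model, and `cosine_similarity` is a helper written for this example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


# Made-up 3-dimensional "embeddings" for illustration.
query = [1.0, 2.0, 3.0]
candidates = {"image_a": [2.0, 4.0, 6.0], "image_b": [3.0, 0.0, -1.0]}

# Rank the candidates by similarity to the query, best first.
ranked = sorted(candidates, key=lambda name: cosine_similarity(query, candidates[name]), reverse=True)
print(ranked)
```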


method image_impainting

image_impainting(asset_id: Union[str, Asset], *args, **kwargs) → ImageImpainting

Image inpainting is a process that involves filling in missing or damaged parts of an image in a way that is visually coherent and seamlessly blends with the surrounding areas, often using advanced algorithms and techniques to restore the image to its original or intended appearance.


method image_label_detection

image_label_detection(
asset_id: Union[str, Asset],
*args,
**kwargs
) → ImageLabelDetection

Image Label Detection is a function that automatically identifies and assigns descriptive tags or labels to objects, scenes, or elements within an image, enabling easier categorization, search, and analysis of visual content.


method image_manipulation

image_manipulation(
asset_id: Union[str, Asset],
*args,
**kwargs
) → ImageManipulation

Image Manipulation refers to the process of altering or enhancing digital images using various techniques and tools to achieve desired visual effects, correct imperfections, or transform the image's appearance.


method image_to_video_generation

image_to_video_generation(
asset_id: Union[str, Asset],
*args,
**kwargs
) → ImageToVideoGeneration

The Image To Video Generation function transforms a series of static images into a cohesive, dynamic video sequence, often incorporating transitions, effects, and synchronization with audio to create a visually engaging narrative.


method instance_segmentation

instance_segmentation(
asset_id: Union[str, Asset],
*args,
**kwargs
) → InstanceSegmentation

Instance segmentation is a computer vision task that involves detecting and delineating each distinct object within an image, assigning a unique label and precise boundary to every individual instance of objects, even if they belong to the same category.


method intent_classification

intent_classification(
asset_id: Union[str, Asset],
*args,
**kwargs
) → IntentClassification

Intent Classification is a natural language processing task that involves analyzing and categorizing user text input to determine the underlying purpose or goal behind the communication, such as booking a flight, asking for weather information, or setting a reminder.


method inverse_text_normalization

inverse_text_normalization(
asset_id: Union[str, Asset],
*args,
**kwargs
) → InverseTextNormalization

Inverse Text Normalization is the process of converting spoken or written language in its normalized form, such as numbers, dates, and abbreviations, back into their original, more complex or detailed textual representations.


method keyword_spotting

keyword_spotting(asset_id: Union[str, Asset], *args, **kwargs) → KeywordSpotting

Keyword Spotting is a function that enables the detection and identification of specific words or phrases within a stream of audio, often used in voice-activated systems to trigger actions or commands based on recognized keywords.


method language_identification

language_identification(
asset_id: Union[str, Asset],
*args,
**kwargs
) → LanguageIdentification

Language Identification is the process of automatically determining the language in which a given piece of text is written.


method language_identification_audio

language_identification_audio(
asset_id: Union[str, Asset],
*args,
**kwargs
) → LanguageIdentificationAudio

The Language Identification Audio function analyzes audio input to determine and identify the language being spoken.


method loglikelihood

loglikelihood(asset_id: Union[str, Asset], *args, **kwargs) → Loglikelihood

The Log Likelihood function measures the probability of observing the given data under a specific statistical model by taking the natural logarithm of the likelihood function, thereby transforming the product of probabilities into a sum, which simplifies the process of optimization and parameter estimation.
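The product-to-sum transformation described above can be checked in a few lines. This is a generic sketch for independent observations, not the aiXplain function itself; `log_likelihood` is a name chosen for this example.

```python
import math


def log_likelihood(probabilities):
    """Sum of natural logs of the per-observation probabilities."""
    if any(p <= 0 for p in probabilities):
        raise ValueError("probabilities must be strictly positive")
    return sum(math.log(p) for p in probabilities)


probs = [0.5, 0.25, 0.8]

# The log of the product equals the sum of the logs.
product = math.prod(probs)
print(math.isclose(log_likelihood(probs), math.log(product)))
```

Working with the sum avoids the numerical underflow that a long product of small probabilities would cause.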


method metric_aggregation

metric_aggregation(
asset_id: Union[str, Asset],
*args,
**kwargs
) → MetricAggregation

Metric Aggregation is a function that computes and summarizes numerical data by applying statistical operations, such as averaging, summing, or finding the minimum and maximum values, to provide insights and facilitate analysis of large datasets.
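The statistical operations listed above can be sketched with the standard library. The `aggregate` helper and the keys it returns are assumptions for this example, not the output schema of the aiXplain function.

```python
from statistics import mean


def aggregate(scores):
    """Summarize a non-empty list of numeric scores."""
    if not scores:
        raise ValueError("scores must not be empty")
    return {
        "mean": mean(scores),
        "sum": sum(scores),
        "min": min(scores),
        "max": max(scores),
    }


print(aggregate([1, 2, 3, 6]))
```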


method multi_class_image_classification

multi_class_image_classification(
asset_id: Union[str, Asset],
*args,
**kwargs
) → MultiClassImageClassification

Multi Class Image Classification is a machine learning task where an algorithm is trained to categorize images into one of several predefined classes or categories based on their visual content.


method multi_class_text_classification

multi_class_text_classification(
asset_id: Union[str, Asset],
*args,
**kwargs
) → MultiClassTextClassification

Multi Class Text Classification is a natural language processing task that involves categorizing a given text into one of several predefined classes or categories based on its content.


method multi_label_text_classification

multi_label_text_classification(
asset_id: Union[str, Asset],
*args,
**kwargs
) → MultiLabelTextClassification

Multi Label Text Classification is a natural language processing task where a given text is analyzed and assigned multiple relevant labels or categories from a predefined set, allowing for the text to belong to more than one category simultaneously.
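The difference from multi-class classification comes down to how per-label scores are turned into a decision. The sketch below assumes a model has already produced a score per label; the scores, the `threshold` value, and both helper names are hypothetical.

```python
def multi_class(scores):
    """Pick the single label with the highest score."""
    return max(scores, key=scores.get)


def multi_label(scores, threshold=0.5):
    """Keep every label whose score reaches the threshold."""
    return sorted(label for label, score in scores.items() if score >= threshold)


# Made-up scores for one document.
scores = {"sports": 0.7, "politics": 0.6, "tech": 0.1}

print(multi_class(scores))  # exactly one label
print(multi_label(scores))  # zero or more labels
```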


method multilingual_speech_recognition

multilingual_speech_recognition(
asset_id: Union[str, Asset],
*args,
**kwargs
) → MultilingualSpeechRecognition

Multilingual Speech Recognition is a technology that enables the automatic transcription of spoken language into text across multiple languages, allowing for seamless communication and understanding in diverse linguistic contexts.


method named_entity_recognition

named_entity_recognition(
asset_id: Union[str, Asset],
*args,
**kwargs
) → NamedEntityRecognition

Named Entity Recognition (NER) is a natural language processing task that involves identifying named entities in text and classifying them into predefined categories such as people, organizations, locations, and dates.


method noise_removal

noise_removal(asset_id: Union[str, Asset], *args, **kwargs) → NoiseRemoval

Noise Removal is a process that involves identifying and eliminating unwanted random variations or disturbances from an audio signal to enhance the clarity and quality of the underlying information.


method object_detection

object_detection(asset_id: Union[str, Asset], *args, **kwargs) → ObjectDetection

Object Detection is a computer vision technology that identifies and locates objects within an image, typically by drawing bounding boxes around the detected objects and classifying them into predefined categories.


method ocr

ocr(asset_id: Union[str, Asset], *args, **kwargs) → Ocr

OCR, or Optical Character Recognition, is a technology that converts different types of documents, such as scanned paper documents, PDFs, or images captured by a digital camera, into editable and searchable data by recognizing and extracting text from the images.


method offensive_language_identification

offensive_language_identification(
asset_id: Union[str, Asset],
*args,
**kwargs
) → OffensiveLanguageIdentification

Offensive Language Identification is a function that analyzes text to detect and flag language that is abusive, harmful, or inappropriate, helping to maintain a respectful and safe communication environment.


method other__multipurpose_

other__multipurpose_(
asset_id: Union[str, Asset],
*args,
**kwargs
) → OtherMultipurpose

The "Other (Multipurpose)" function serves as a versatile category designed to accommodate a wide range of tasks and activities that do not fit neatly into predefined classifications, offering flexibility and adaptability for various needs.
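Judging from the signatures in this listing, each method name appears to map to its return class by dropping underscores and capitalizing each part, which is how `other__multipurpose_` ends up paired with `OtherMultipurpose`. The helper below is a hypothetical sketch of that pattern and is not part of the aixplain package.

```python
def method_to_class_name(method_name):
    """Convert a snake_case method name to the CamelCase class name
    used in this listing; empty parts from doubled or trailing
    underscores are skipped."""
    return "".join(part.capitalize() for part in method_name.split("_") if part)


for name in ("asr_age_classification", "ocr", "other__multipurpose_"):
    print(name, "->", method_to_class_name(name))
```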


method part_of_speech_tagging

part_of_speech_tagging(
asset_id: Union[str, Asset],
*args,
**kwargs
) → PartOfSpeechTagging

Part of Speech Tagging is a natural language processing task that involves assigning each word in a sentence its corresponding part of speech, such as noun, verb, adjective, or adverb, based on its role and context within the sentence.


method referenceless_audio_generation_metric

referenceless_audio_generation_metric(
asset_id: Union[str, Asset],
*args,
**kwargs
) → ReferencelessAudioGenerationMetric

The Referenceless Audio Generation Metric is a tool designed to evaluate the quality of generated audio content without the need for a reference or original audio sample for comparison.


method referenceless_text_generation_metric

referenceless_text_generation_metric(
asset_id: Union[str, Asset],
*args,
**kwargs
) → ReferencelessTextGenerationMetric

The Referenceless Text Generation Metric is a method for evaluating the quality of generated text without requiring a reference text for comparison, often leveraging models or algorithms to assess coherence, relevance, and fluency based on intrinsic properties of the text itself.


method referenceless_text_generation_metric_default

referenceless_text_generation_metric_default(
asset_id: Union[str, Asset],
*args,
**kwargs
) → ReferencelessTextGenerationMetricDefault

The Referenceless Text Generation Metric Default is a function designed to evaluate the quality of generated text without relying on reference texts for comparison.


method script_execution

script_execution(asset_id: Union[str, Asset], *args, **kwargs) → ScriptExecution

Script Execution refers to the process of running a set of programmed instructions or code within a computing environment, enabling the automated performance of tasks, calculations, or operations as defined by the script.


method search

search(asset_id: Union[str, Asset], *args, **kwargs) → Search

The "Search" function allows users to input keywords or phrases to quickly locate specific information, files, or content within a database, website, or application.


method semantic_segmentation

semantic_segmentation(
asset_id: Union[str, Asset],
*args,
**kwargs
) → SemanticSegmentation

Semantic segmentation is a computer vision process that involves classifying each pixel in an image into a predefined category, effectively partitioning the image into meaningful segments based on the objects or regions they represent.


method sentiment_analysis

sentiment_analysis(
asset_id: Union[str, Asset],
*args,
**kwargs
) → SentimentAnalysis

Sentiment Analysis is a natural language processing technique used to determine and classify the emotional tone or subjective information expressed in a piece of text, such as identifying whether the sentiment is positive, negative, or neutral.


method speaker_diarization_audio

speaker_diarization_audio(
asset_id: Union[str, Asset],
*args,
**kwargs
) → SpeakerDiarizationAudio

Speaker Diarization Audio is a process that involves segmenting an audio recording into distinct sections, each corresponding to a different speaker, in order to identify and differentiate between multiple speakers within the same audio stream.


method speaker_diarization_video

speaker_diarization_video(
asset_id: Union[str, Asset],
*args,
**kwargs
) → SpeakerDiarizationVideo

The Speaker Diarization Video function identifies and segments different speakers in a video, attributing portions of the audio to individual speakers to facilitate analysis and understanding of multi-speaker conversations.


method speech_classification

speech_classification(
asset_id: Union[str, Asset],
*args,
**kwargs
) → SpeechClassification

Speech Classification is a process that involves analyzing and categorizing spoken language into predefined categories or classes based on various features such as tone, pitch, and linguistic content.


method speech_embedding

speech_embedding(asset_id: Union[str, Asset], *args, **kwargs) → SpeechEmbedding

Speech Embedding is a process that transforms spoken language into a fixed-dimensional vector representation, capturing essential features and characteristics of the speech for tasks such as recognition, classification, and analysis.


method speech_non_speech_classification

speech_non_speech_classification(
asset_id: Union[str, Asset],
*args,
**kwargs
) → SpeechNonSpeechClassification

Speech or Non-Speech Classification analyzes audio input and determines whether the sound is human speech or non-speech noise, enabling applications such as voice recognition systems to filter out irrelevant background sounds.


method speech_recognition

speech_recognition(
asset_id: Union[str, Asset],
*args,
**kwargs
) → SpeechRecognition

Speech recognition is a technology that enables a computer or device to identify and process spoken language, converting it into text.


method speech_synthesis

speech_synthesis(asset_id: Union[str, Asset], *args, **kwargs) → SpeechSynthesis

Speech synthesis is the artificial production of human speech, typically achieved through software or hardware systems that convert text into spoken words, enabling machines to communicate verbally with users.


method speech_translation

speech_translation(
asset_id: Union[str, Asset],
*args,
**kwargs
) → SpeechTranslation

Speech Translation is a technology that converts spoken language from one language to another in real time, enabling seamless communication between speakers of different languages.


method split_on_linebreak

split_on_linebreak(
asset_id: Union[str, Asset],
*args,
**kwargs
) → SplitOnLinebreak

The "Split On Linebreak" function divides a given string into a list of substrings, using linebreaks (newline characters) as the points of separation.
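
The behaviour described above corresponds closely to Python's built-in `str.splitlines()`. The sketch below is only a local illustration of the idea; the standalone function name is reused for readability and is not the SDK node itself.

```python
from typing import List


def split_on_linebreak(text: str) -> List[str]:
    """Split text into a list of lines, dropping the line terminators."""
    # str.splitlines() recognises LF, CRLF and CR (plus a few other
    # Unicode line boundaries) and removes them from the results.
    return text.splitlines()


segments = split_on_linebreak("first line\nsecond line\r\nthird line")
```

Using `splitlines()` rather than `text.split("\n")` avoids stray carriage returns when the input mixes line-ending conventions.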


method split_on_silence

split_on_silence(asset_id: Union[str, Asset], *args, **kwargs) → SplitOnSilence

The "Split On Silence" function divides an audio recording into separate segments based on periods of silence, allowing for easier editing and analysis of individual sections.
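
One simple way to picture splitting on silence is an amplitude threshold over raw samples. The sketch below is an assumption-laden illustration, not the algorithm behind the SDK node: the `threshold` and `min_silence_len` parameters are hypothetical, and real implementations usually work on frames of audio rather than single samples.

```python
from typing import List, Optional, Tuple


def split_on_silence(
    samples: List[float],
    threshold: float = 0.05,
    min_silence_len: int = 3,
) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs of non-silent segments; end is exclusive."""
    segments: List[Tuple[int, int]] = []
    start: Optional[int] = None  # start index of the segment being built
    silent_run = 0               # consecutive quiet samples seen so far
    for i, sample in enumerate(samples):
        if abs(sample) >= threshold:
            if start is None:
                start = i
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run >= min_silence_len:
                # Close the segment at the first sample of the silent run.
                segments.append((start, i - silent_run + 1))
                start = None
                silent_run = 0
    if start is not None:
        # Audio ended mid-segment; trim any trailing quiet samples.
        segments.append((start, len(samples) - silent_run))
    return segments


audio = [0.0, 0.5, 0.6, 0.0, 0.0, 0.0, 0.4, 0.3, 0.0]
segments = split_on_silence(audio)
```

Pauses shorter than `min_silence_len` do not split a segment, so brief gaps inside a phrase stay in one piece.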


method style_transfer

style_transfer(asset_id: Union[str, Asset], *args, **kwargs) → StyleTransfer

Style Transfer is a technique in artificial intelligence that applies the visual style of one image (such as the brushstrokes of a famous painting) to the content of another image, effectively blending the artistic elements of the first image with the subject matter of the second.


method subtitling

subtitling(asset_id: Union[str, Asset], *args, **kwargs) → Subtitling

Subtitling is the process of displaying written text on a screen to represent the spoken dialogue, narration, or other audio elements in a video, typically to aid viewers who are deaf or hard of hearing, or to provide translations for audiences who speak different languages.


method subtitling_translation

subtitling_translation(
asset_id: Union[str, Asset],
*args,
**kwargs
) → SubtitlingTranslation

Subtitling Translation is the process of converting spoken dialogue from one language into written text in another language, which is then displayed on-screen to aid viewers in understanding the content.


method text_classification

text_classification(
asset_id: Union[str, Asset],
*args,
**kwargs
) → TextClassification

Text Classification is a natural language processing task that involves categorizing text into predefined labels or classes based on its content, enabling automated organization, filtering, and analysis of large volumes of textual data.


method text_content_moderation

text_content_moderation(
asset_id: Union[str, Asset],
*args,
**kwargs
) → TextContentModeration

Text Content Moderation is the process of reviewing, filtering, and managing user-generated content to ensure it adheres to community guidelines, legal standards, and platform policies, thereby maintaining a safe and respectful online environment.


method text_denormalization

text_denormalization(
asset_id: Union[str, Asset],
*args,
**kwargs
) → TextDenormalization

Text Denormalization is the process of converting abbreviated, contracted, or otherwise simplified text into its full, standard form, often to improve readability and ensure consistency in natural language processing tasks.


method text_embedding

text_embedding(asset_id: Union[str, Asset], *args, **kwargs) → TextEmbedding

Text embedding is a process that converts text into numerical vectors, capturing the semantic meaning and contextual relationships of words or phrases, enabling machines to understand and analyze natural language more effectively.
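
To make "text as a numerical vector" concrete, the sketch below uses a bag-of-words count vector, a far simpler stand-in for the learned embeddings a real model would return. The toy `VOCAB` and both helper functions are hypothetical, and word counts capture no semantics beyond shared vocabulary.

```python
import math
from typing import List

# Hypothetical toy vocabulary; a learned model has no such fixed word list.
VOCAB = ["cat", "dog", "sat", "ran", "mat"]


def embed(text: str) -> List[float]:
    """Return one count per vocabulary word (a fixed-length vector)."""
    tokens = text.lower().split()
    return [float(tokens.count(word)) for word in VOCAB]


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


similar = cosine_similarity(embed("the cat sat"), embed("a cat sat down"))
unrelated = cosine_similarity(embed("the cat"), embed("the dog"))
```

Comparing vectors with cosine similarity is the usual way embeddings are put to work for search and clustering, whatever model produced them.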


method text_generation

text_generation(asset_id: Union[str, Asset], *args, **kwargs) → TextGeneration

Text Generation is a process in which artificial intelligence models, such as neural networks, produce coherent and contextually relevant text based on a given input or prompt, often mimicking human writing styles and patterns.


method text_generation_metric

text_generation_metric(
asset_id: Union[str, Asset],
*args,
**kwargs
) → TextGenerationMetric

A Text Generation Metric is a quantitative measure used to evaluate the quality and effectiveness of text produced by natural language processing models, often assessing aspects such as coherence, relevance, fluency, and adherence to given prompts or instructions.
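
As one concrete example of such a measure, the sketch below computes a token-overlap F1 score between a generated text and a reference. It is an illustrative choice, not the metric any particular aiXplain asset implements, and the `token_f1` name is hypothetical.

```python
from collections import Counter


def token_f1(candidate: str, reference: str) -> float:
    """Harmonic mean of token precision and recall against a reference."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    # Multiset intersection counts each shared token at most as often
    # as it appears in both texts.
    overlap = sum((Counter(cand) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


score = token_f1("the cat", "the cat sat down")
```

Overlap scores like this reward shared wording only; aspects such as coherence or fluency need other metrics or human judgment.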


method text_generation_metric_default

text_generation_metric_default(
asset_id: Union[str, Asset],
*args,
**kwargs
) → TextGenerationMetricDefault

The "Text Generation Metric Default" function provides a standard set of evaluation metrics for assessing the quality and performance of text generation models.


method text_normalization

text_normalization(
asset_id: Union[str, Asset],
*args,
**kwargs
) → TextNormalization

Text normalization is the process of transforming text into a standard, consistent format by correcting spelling errors, converting all characters to a uniform case, removing punctuation, and expanding abbreviations to improve the text's readability and usability for further processing or analysis.
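
Three of the steps named above (uniform case, punctuation removal, abbreviation expansion) can be sketched with the standard library. The `ABBREVIATIONS` table is a hypothetical example, spelling correction is left out, and a production normalizer would be language-aware and far more careful about context.

```python
import string

# Hypothetical abbreviation table, for illustration only.
ABBREVIATIONS = {"dr": "doctor", "st": "street", "apt": "apartment"}

# Translation table that deletes every ASCII punctuation character.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def normalize(text: str) -> str:
    """Lowercase, strip ASCII punctuation, and expand known abbreviations."""
    lowered = text.lower()
    no_punct = lowered.translate(_PUNCT_TABLE)
    tokens = [ABBREVIATIONS.get(tok, tok) for tok in no_punct.split()]
    return " ".join(tokens)


result = normalize("Dr. Smith lives on Main St., Apt. 4!")
```

Punctuation is removed before the lookup so that "St.," and "st" resolve to the same table entry.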


method text_reconstruction

text_reconstruction(
asset_id: Union[str, Asset],
*args,
**kwargs
) → TextReconstruction

Text Reconstruction is a process that involves piecing together fragmented or incomplete text data to restore it to its original, coherent form.


method text_segmenation

text_segmenation(asset_id: Union[str, Asset], *args, **kwargs) → TextSegmenation

Text Segmentation is the process of dividing a continuous text into meaningful units, such as words, sentences, or topics, to facilitate easier analysis and understanding.
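
For the sentence-level case, a rough sketch is to split after sentence-ending punctuation. This regex heuristic is an assumption for illustration only: it will mis-split abbreviations such as "e.g." and says nothing about how the SDK node segments text.

```python
import re
from typing import List


def split_sentences(text: str) -> List[str]:
    """Split text into sentences after '.', '!' or '?' followed by whitespace."""
    # The lookbehind keeps the punctuation attached to its sentence.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


sentences = split_sentences("Hello world. How are you? Fine!")
```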


method text_spam_detection

text_spam_detection(
asset_id: Union[str, Asset],
*args,
**kwargs
) → TextSpamDetection

Text Spam Detection is a process that involves analyzing and identifying unsolicited or irrelevant messages within text communications, typically using algorithms and machine learning techniques to filter out spam and ensure the integrity of the communication platform.


method text_summarization

text_summarization(
asset_id: Union[str, Asset],
*args,
**kwargs
) → TextSummarization

Text summarization is the process of condensing a large body of text into a shorter version, capturing the main points and essential information while maintaining coherence and meaning.


method text_to_audio

text_to_audio(asset_id: Union[str, Asset], *args, **kwargs) → TextToAudio

The Text to Audio function converts written text into spoken words, allowing users to listen to the content instead of reading it.


method text_to_image_generation

text_to_image_generation(
asset_id: Union[str, Asset],
*args,
**kwargs
) → TextToImageGeneration

Text To Image Generation is a process where a system creates visual images based on descriptive text input, translating written language into corresponding graphical representations.


method text_to_video_generation

text_to_video_generation(
asset_id: Union[str, Asset],
*args,
**kwargs
) → TextToVideoGeneration

Text To Video Generation is a process that converts written descriptions or scripts into dynamic, visual video content using advanced algorithms and artificial intelligence.


method topic_classification

topic_classification(
asset_id: Union[str, Asset],
*args,
**kwargs
) → TopicClassification

Topic Classification is a natural language processing function that categorizes text into predefined topics or subjects based on its content, enabling efficient organization and retrieval of information.


method translation

translation(asset_id: Union[str, Asset], *args, **kwargs) → Translation

Translation is the process of converting text from one language into an equivalent text in another language, preserving the original meaning and context.


method video_content_moderation

video_content_moderation(
asset_id: Union[str, Asset],
*args,
**kwargs
) → VideoContentModeration

Video Content Moderation is the process of reviewing, analyzing, and filtering video content to ensure it adheres to community guidelines, legal standards, and platform policies, thereby preventing the dissemination of inappropriate, harmful, or illegal material.


method video_embedding

video_embedding(asset_id: Union[str, Asset], *args, **kwargs) → VideoEmbedding

Video Embedding is a process that transforms video content into a fixed-dimensional vector representation, capturing essential features and patterns to facilitate tasks such as retrieval, classification, and recommendation.


method video_forced_alignment

video_forced_alignment(
asset_id: Union[str, Asset],
*args,
**kwargs
) → VideoForcedAlignment

Video Forced Alignment is a process that synchronizes video footage with corresponding audio tracks by precisely aligning the visual and auditory elements, ensuring that the movements of speakers' lips match the spoken words.


method video_generation

video_generation(asset_id: Union[str, Asset], *args, **kwargs) → VideoGeneration

Video Generation is the process of creating video content through automated or semi-automated means, often utilizing algorithms, artificial intelligence, or software tools to produce visual and audio elements that can range from simple animations to complex, realistic scenes.


method video_label_detection

video_label_detection(
asset_id: Union[str, Asset],
*args,
**kwargs
) → VideoLabelDetection

Video Label Detection is a function that automatically identifies and tags various objects, scenes, activities, and other relevant elements within a video, providing descriptive labels that enhance searchability and content organization.


method video_understanding

video_understanding(
asset_id: Union[str, Asset],
*args,
**kwargs
) → VideoUnderstanding

Video Understanding is the process of analyzing and interpreting video content to extract meaningful information, such as identifying objects, actions, events, and contextual relationships within the footage.


method viseme_generation

viseme_generation(
asset_id: Union[str, Asset],
*args,
**kwargs
) → VisemeGeneration

Viseme Generation is the process of creating visual representations of phonemes, which are the distinct units of sound in speech, to synchronize lip movements with spoken words in animations or virtual avatars.


method visual_question_answering

visual_question_answering(
asset_id: Union[str, Asset],
*args,
**kwargs
) → VisualQuestionAnswering

Visual Question Answering (VQA) is a task in artificial intelligence that involves analyzing an image and providing accurate, contextually relevant answers to questions posed about the visual content of that image.


method voice_activity_detection

voice_activity_detection(
asset_id: Union[str, Asset],
*args,
**kwargs
) → VoiceActivityDetection

Voice Activity Detection (VAD) is a technology that identifies the presence or absence of human speech within an audio signal, enabling systems to distinguish between spoken words and background noise.
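
A classic baseline for this presence-or-absence decision is a per-frame energy threshold. The sketch below is only that baseline under stated assumptions: `frame_size` and `threshold` are hypothetical parameters, and an energy test cannot tell speech from any other loud sound, which is why practical detectors use trained models.

```python
from typing import List


def detect_voice(
    samples: List[float],
    frame_size: int = 4,
    threshold: float = 0.01,
) -> List[bool]:
    """Return one flag per frame: True when mean energy reaches the threshold."""
    flags = []
    for start in range(0, len(samples), frame_size):
        frame = samples[start:start + frame_size]
        # Mean squared amplitude of the frame (the last frame may be shorter).
        energy = sum(s * s for s in frame) / len(frame)
        flags.append(energy >= threshold)
    return flags


audio = [0.0] * 4 + [0.5, -0.5, 0.5, -0.5] + [0.0] * 4
flags = detect_voice(audio)
```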


method voice_cloning

voice_cloning(asset_id: Union[str, Asset], *args, **kwargs) → VoiceCloning

Voice cloning is a technology that uses artificial intelligence to create a digital replica of a person's voice, allowing for the generation of speech that mimics the tone, pitch, and speaking style of the original speaker.