Version: 1.0

aixplain.modules.pipeline.pipeline

ObjectDetection Objects

class ObjectDetection(AssetNode[ObjectDetectionInputs,
ObjectDetectionOutputs])

[view_source]

Object Detection is a computer vision technology that identifies and locates objects within an image, typically by drawing bounding boxes around the detected objects and classifying them into predefined categories.

InputType: video OutputType: text

TextEmbedding Objects

class TextEmbedding(AssetNode[TextEmbeddingInputs, TextEmbeddingOutputs])

[view_source]

Text embedding is a process that converts text into numerical vectors, capturing the semantic meaning and contextual relationships of words or phrases, enabling machines to understand and analyze natural language more effectively.

InputType: text OutputType: text
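
As a rough illustration of why numerical vectors are useful, the sketch below compares embedding vectors with cosine similarity. The vectors and the `cosine_similarity` helper are hypothetical and hand-written for this example; they are not produced by, or part of, the aiXplain SDK.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical 3-dimensional embeddings (real models use far more dimensions).
cat = [0.9, 0.1, 0.0]
kitten = [0.8, 0.2, 0.1]
invoice = [0.0, 0.1, 0.9]

# Semantically related texts are expected to score higher than unrelated ones.
related = cosine_similarity(cat, kitten)
unrelated = cosine_similarity(cat, invoice)
```

With these made-up vectors, `related` is close to 1 and `unrelated` is close to 0, which is the property that retrieval and clustering rely on.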

SemanticSegmentation Objects

class SemanticSegmentation(AssetNode[SemanticSegmentationInputs,
SemanticSegmentationOutputs])

[view_source]

Semantic segmentation is a computer vision process that involves classifying each pixel in an image into a predefined category, effectively partitioning the image into meaningful segments based on the objects or regions they represent.

InputType: image OutputType: label
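
A minimal sketch of the per-pixel classification idea, assuming a model has already produced a list of class scores for every pixel. The `segment` helper, the class names, and the scores are hypothetical and only illustrate the concept; they do not reflect how any particular aiXplain model works.

```python
def segment(scores):
    """Return a label map: for each pixel, the index of its highest-scoring class.

    `scores[row][col]` is a list with one score per class for that pixel.
    """
    return [
        [max(range(len(pixel)), key=pixel.__getitem__) for pixel in row]
        for row in scores
    ]


# Hypothetical 2x2 image with scores for 3 classes (0=background, 1=road, 2=car).
scores = [
    [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]],
    [[0.2, 0.3, 0.5], [0.6, 0.3, 0.1]],
]
label_map = segment(scores)  # [[0, 1], [2, 0]]
```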

ReferencelessAudioGenerationMetric Objects

class ReferencelessAudioGenerationMetric(
BaseMetric[ReferencelessAudioGenerationMetricInputs,
ReferencelessAudioGenerationMetricOutputs])

[view_source]

The Referenceless Audio Generation Metric is a tool designed to evaluate the quality of generated audio content without the need for a reference or original audio sample for comparison.

InputType: text OutputType: text

ScriptExecution Objects

class ScriptExecution(AssetNode[ScriptExecutionInputs,
ScriptExecutionOutputs])

[view_source]

Script Execution refers to the process of running a set of programmed instructions or code within a computing environment, enabling the automated performance of tasks, calculations, or operations as defined by the script.

InputType: text OutputType: text

ImageImpainting Objects

class ImageImpainting(AssetNode[ImageImpaintingInputs,
ImageImpaintingOutputs])

[view_source]

Image inpainting fills in missing or damaged parts of an image so that the result is visually coherent and blends seamlessly with the surrounding areas, restoring the image to its original or intended appearance.

InputType: image OutputType: image

ImageEmbedding Objects

class ImageEmbedding(AssetNode[ImageEmbeddingInputs, ImageEmbeddingOutputs])

[view_source]

Image Embedding is a process that transforms an image into a fixed-dimensional vector representation, capturing its essential features and enabling efficient comparison, retrieval, and analysis in various machine learning and computer vision tasks.

InputType: image OutputType: text

MetricAggregation Objects

class MetricAggregation(BaseMetric[MetricAggregationInputs,
MetricAggregationOutputs])

[view_source]

Metric Aggregation is a function that computes and summarizes numerical data by applying statistical operations, such as averaging, summing, or finding the minimum and maximum values, to provide insights and facilitate analysis of large datasets.

InputType: text OutputType: text
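
The statistical operations named above can be sketched in a few lines. The `aggregate` helper and the sample scores are hypothetical; the actual aggregation performed by a given metric asset may differ.

```python
from statistics import mean


def aggregate(scores):
    """Summarize a non-empty list of numeric scores."""
    if not scores:
        raise ValueError("cannot aggregate an empty list of scores")
    return {
        "count": len(scores),
        "sum": sum(scores),
        "mean": mean(scores),
        "min": min(scores),
        "max": max(scores),
    }


# Hypothetical per-sample metric scores.
summary = aggregate([0.5, 0.75, 1.0, 0.25])
# summary == {"count": 4, "sum": 2.5, "mean": 0.625, "min": 0.25, "max": 1.0}
```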

SpeechTranslation Objects

class SpeechTranslation(AssetNode[SpeechTranslationInputs,
SpeechTranslationOutputs])

[view_source]

Speech Translation is a technology that converts spoken language from one language to another in real time, enabling seamless communication between speakers of different languages.

InputType: audio OutputType: text

DepthEstimation Objects

class DepthEstimation(AssetNode[DepthEstimationInputs,
DepthEstimationOutputs])

[view_source]

Depth estimation is a computational process that determines the distance of objects from a viewpoint, typically using visual data from cameras or sensors to create a three-dimensional understanding of a scene.

InputType: image OutputType: text

NoiseRemoval Objects

class NoiseRemoval(AssetNode[NoiseRemovalInputs, NoiseRemovalOutputs])

[view_source]

Noise Removal is a process that involves identifying and eliminating unwanted random variations or disturbances from an audio signal to enhance the clarity and quality of the underlying information.

InputType: audio OutputType: audio

Diacritization Objects

class Diacritization(AssetNode[DiacritizationInputs, DiacritizationOutputs])

[view_source]

Adds diacritical marks to text, essential for languages where meaning can change based on diacritics.

InputType: text OutputType: text

AudioTranscriptAnalysis Objects

class AudioTranscriptAnalysis(AssetNode[AudioTranscriptAnalysisInputs,
AudioTranscriptAnalysisOutputs])

[view_source]

Analyzes transcribed audio data for insights, patterns, or specific information extraction.

InputType: audio OutputType: text

ExtractAudioFromVideo Objects

class ExtractAudioFromVideo(AssetNode[ExtractAudioFromVideoInputs,
ExtractAudioFromVideoOutputs])

[view_source]

Isolates and extracts audio tracks from video files, aiding in audio analysis or transcription tasks.

InputType: video OutputType: audio

AudioReconstruction Objects

class AudioReconstruction(BaseReconstructor[AudioReconstructionInputs,
AudioReconstructionOutputs])

[view_source]

Audio Reconstruction is the process of restoring or recreating audio signals from incomplete, damaged, or degraded recordings to achieve a high-quality, accurate representation of the original sound.

InputType: audio OutputType: audio

ClassificationMetric Objects

class ClassificationMetric(BaseMetric[ClassificationMetricInputs,
ClassificationMetricOutputs])

[view_source]

A Classification Metric is a quantitative measure used to evaluate the quality and effectiveness of classification models.

InputType: text OutputType: text
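
For illustration, three common classification measures (accuracy, precision, and recall) can be computed as below. The `classification_scores` helper and the labels are hypothetical; which measures a specific metric asset reports is not stated here.

```python
def classification_scores(y_true, y_pred, positive):
    """Accuracy, precision, and recall for one class treated as positive."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    return {
        "accuracy": correct / len(pairs) if pairs else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Hypothetical reference labels and model predictions.
y_true = ["spam", "ham", "spam", "ham", "spam"]
y_pred = ["spam", "spam", "ham", "ham", "spam"]
scores = classification_scores(y_true, y_pred, positive="spam")
# accuracy = 3/5, precision = 2/3, recall = 2/3
```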

TextGenerationMetric Objects

class TextGenerationMetric(BaseMetric[TextGenerationMetricInputs,
TextGenerationMetricOutputs])

[view_source]

A Text Generation Metric is a quantitative measure used to evaluate the quality and effectiveness of text produced by natural language processing models, often assessing aspects such as coherence, relevance, fluency, and adherence to given prompts or instructions.

InputType: text OutputType: text

TextSpamDetection Objects

class TextSpamDetection(AssetNode[TextSpamDetectionInputs,
TextSpamDetectionOutputs])

[view_source]

Identifies and filters out unwanted or irrelevant text content, ideal for moderating user-generated content or ensuring quality in communication platforms.

InputType: text OutputType: label

TextToImageGeneration Objects

class TextToImageGeneration(AssetNode[TextToImageGenerationInputs,
TextToImageGenerationOutputs])

[view_source]

Creates a visual representation based on textual input, turning descriptions into pictorial forms. Used in creative processes and content generation.

InputType: text OutputType: image

VoiceCloning Objects

class VoiceCloning(AssetNode[VoiceCloningInputs, VoiceCloningOutputs])

[view_source]

Replicates a person's voice based on a sample, allowing for the generation of speech in that person's tone and style. Use with caution because of the ethical considerations involved.

InputType: text OutputType: audio

TextSegmenation Objects

class TextSegmenation(AssetNode[TextSegmenationInputs,
TextSegmenationOutputs])

[view_source]

Text Segmentation is the process of dividing a continuous text into meaningful units, such as words, sentences, or topics, to facilitate easier analysis and understanding.

InputType: text OutputType: text
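
One simple form of segmentation, splitting text into sentences, can be sketched with a regular expression. This is a naive rule (it will mis-split abbreviations such as "e.g."), and `split_sentences` is a hypothetical helper rather than the method used by any segmentation model.

```python
import re


def split_sentences(text):
    """Split text after '.', '!' or '?' when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


sentences = split_sentences(
    "Pipelines chain models. Each node has inputs! Does it scale?"
)
# ["Pipelines chain models.", "Each node has inputs!", "Does it scale?"]
```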

BenchmarkScoringMt Objects

class BenchmarkScoringMt(AssetNode[BenchmarkScoringMtInputs,
BenchmarkScoringMtOutputs])

[view_source]

Benchmark Scoring MT is a function designed to evaluate and score machine translation systems by comparing their output against a set of predefined benchmarks, thereby assessing their accuracy and performance.

InputType: text OutputType: label

ImageManipulation Objects

class ImageManipulation(AssetNode[ImageManipulationInputs,
ImageManipulationOutputs])

[view_source]

Image Manipulation refers to the process of altering or enhancing digital images using various techniques and tools to achieve desired visual effects, correct imperfections, or transform the image's appearance.

InputType: image OutputType: image

NamedEntityRecognition Objects

class NamedEntityRecognition(AssetNode[NamedEntityRecognitionInputs,
NamedEntityRecognitionOutputs])

[view_source]

Identifies and classifies named entities (e.g., persons, organizations, locations) within text. Useful for information extraction, content tagging, and search enhancements.

InputType: text OutputType: label

OffensiveLanguageIdentification Objects

class OffensiveLanguageIdentification(
AssetNode[OffensiveLanguageIdentificationInputs,
OffensiveLanguageIdentificationOutputs])

[view_source]

Detects language or phrases that might be considered offensive, aiding in content moderation and creating respectful user interactions.

InputType: text OutputType: label

Search Objects

class Search(AssetNode[SearchInputs, SearchOutputs])

[view_source]

An algorithm that identifies and returns data or items that match particular keywords or conditions from a dataset. A fundamental tool for databases and websites.

InputType: text OutputType: text

SentimentAnalysis Objects

class SentimentAnalysis(AssetNode[SentimentAnalysisInputs,
SentimentAnalysisOutputs])

[view_source]

Determines the sentiment or emotion (e.g., positive, negative, neutral) of a piece of text, aiding in understanding user feedback or market sentiment.

InputType: text OutputType: label

ImageColorization Objects

class ImageColorization(AssetNode[ImageColorizationInputs,
ImageColorizationOutputs])

[view_source]

Image colorization adds color to grayscale images, transforming them from black-and-white to full-color representations, often using machine learning techniques to predict and apply the appropriate hues and shades.

InputType: image OutputType: image

SpeechClassification Objects

class SpeechClassification(AssetNode[SpeechClassificationInputs,
SpeechClassificationOutputs])

[view_source]

Categorizes audio clips based on their content, aiding in content organization and targeted actions.

InputType: audio OutputType: label

DialectDetection Objects

class DialectDetection(AssetNode[DialectDetectionInputs,
DialectDetectionOutputs])

[view_source]

Identifies specific dialects within a language, aiding in localized content creation or user experience personalization.

InputType: audio OutputType: text

VideoLabelDetection Objects

class VideoLabelDetection(AssetNode[VideoLabelDetectionInputs,
VideoLabelDetectionOutputs])

[view_source]

Identifies and tags objects, scenes, or activities within a video. Useful for content indexing and recommendation systems.

InputType: video OutputType: label

SpeechSynthesis Objects

class SpeechSynthesis(AssetNode[SpeechSynthesisInputs,
SpeechSynthesisOutputs])

[view_source]

Generates human-like speech from written text. Ideal for text-to-speech applications, audiobooks, and voice assistants.

InputType: text OutputType: audio

SplitOnSilence Objects

class SplitOnSilence(AssetNode[SplitOnSilenceInputs, SplitOnSilenceOutputs])

[view_source]

The "Split On Silence" function divides an audio recording into separate segments based on periods of silence, allowing for easier editing and analysis of individual sections.

InputType: audio OutputType: audio
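
The idea can be sketched on a plain list of amplitude samples. The `split_on_silence` helper below, its `threshold` and `min_silence` parameters, and the sample values are all assumptions made for illustration; they are not the parameters of the aiXplain node.

```python
def split_on_silence(samples, threshold=0.05, min_silence=3):
    """Return (start, end) index pairs of non-silent segments; end is exclusive.

    A run of at least `min_silence` consecutive samples whose absolute value
    is below `threshold` is treated as a separator between segments.
    """
    segments = []
    start = None  # start index of the segment currently being built
    quiet = 0     # length of the current run of quiet samples
    for i, sample in enumerate(samples):
        if abs(sample) >= threshold:
            if start is None:
                start = i
            quiet = 0
        elif start is not None:
            quiet += 1
            if quiet >= min_silence:
                # Close the segment just after its last loud sample.
                segments.append((start, i - quiet + 1))
                start, quiet = None, 0
    if start is not None:
        # Drop any trailing quiet samples from the final segment.
        segments.append((start, len(samples) - quiet))
    return segments


samples = [0.0, 0.4, 0.5, 0.0, 0.3, 0.0, 0.0, 0.0, 0.6, 0.2, 0.0]
segments = split_on_silence(samples)  # [(1, 5), (8, 10)]
```

The single quiet sample at index 3 is shorter than `min_silence`, so it stays inside the first segment instead of splitting it.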

ExpressionDetection Objects

class ExpressionDetection(AssetNode[ExpressionDetectionInputs,
ExpressionDetectionOutputs])

[view_source]

Expression Detection is the process of identifying and analyzing facial expressions to interpret emotions or intentions using AI and computer vision techniques.

InputType: text OutputType: label

AutoMaskGeneration Objects

class AutoMaskGeneration(AssetNode[AutoMaskGenerationInputs,
AutoMaskGenerationOutputs])

[view_source]

Auto-mask generation refers to the automated process of creating masks in image processing or computer vision, typically for segmentation tasks. A mask is a binary or multi-class image that labels different parts of an image, usually separating the foreground (objects of interest) from the background, or identifying specific object classes in an image.

InputType: image OutputType: label

DocumentImageParsing Objects

class DocumentImageParsing(AssetNode[DocumentImageParsingInputs,
DocumentImageParsingOutputs])

[view_source]

Document Image Parsing is the process of analyzing and converting scanned or photographed images of documents into structured, machine-readable formats by identifying and extracting text, layout, and other relevant information.

InputType: image OutputType: text

EntityLinking Objects

class EntityLinking(AssetNode[EntityLinkingInputs, EntityLinkingOutputs])

[view_source]

Associates identified entities in the text with specific entries in a knowledge base or database.

InputType: text OutputType: label

ReferencelessTextGenerationMetricDefault Objects

class ReferencelessTextGenerationMetricDefault(
BaseMetric[ReferencelessTextGenerationMetricDefaultInputs,
ReferencelessTextGenerationMetricDefaultOutputs])

[view_source]

The Referenceless Text Generation Metric Default is a function designed to evaluate the quality of generated text without relying on reference texts for comparison.

InputType: text OutputType: text

FillTextMask Objects

class FillTextMask(AssetNode[FillTextMaskInputs, FillTextMaskOutputs])

[view_source]

Completes missing parts of a text based on the context, ideal for content generation or data augmentation tasks.

InputType: text OutputType: text

SubtitlingTranslation Objects

class SubtitlingTranslation(AssetNode[SubtitlingTranslationInputs,
SubtitlingTranslationOutputs])

[view_source]

Converts the text of subtitles from one language to another, ensuring context and cultural nuances are maintained. Essential for global content distribution.

InputType: text OutputType: text

InstanceSegmentation Objects

class InstanceSegmentation(AssetNode[InstanceSegmentationInputs,
InstanceSegmentationOutputs])

[view_source]

Instance segmentation is a computer vision task that involves detecting and delineating each distinct object within an image, assigning a unique label and precise boundary to every individual instance of objects, even if they belong to the same category.

InputType: image OutputType: label

VisemeGeneration Objects

class VisemeGeneration(AssetNode[VisemeGenerationInputs,
VisemeGenerationOutputs])

[view_source]

Viseme Generation is the process of creating visual representations of phonemes, which are the distinct units of sound in speech, to synchronize lip movements with spoken words in animations or virtual avatars.

InputType: text OutputType: label

AudioGenerationMetric Objects

class AudioGenerationMetric(BaseMetric[AudioGenerationMetricInputs,
AudioGenerationMetricOutputs])

[view_source]

The Audio Generation Metric is a quantitative measure used to evaluate the quality, accuracy, and overall performance of audio generated by artificial intelligence systems, often considering factors such as fidelity, intelligibility, and similarity to human-produced audio.

InputType: text OutputType: text

VideoUnderstanding Objects

class VideoUnderstanding(AssetNode[VideoUnderstandingInputs,
VideoUnderstandingOutputs])

[view_source]

Video Understanding is the process of analyzing and interpreting video content to extract meaningful information, such as identifying objects, actions, events, and contextual relationships within the footage.

InputType: video OutputType: text

TextNormalization Objects

class TextNormalization(AssetNode[TextNormalizationInputs,
TextNormalizationOutputs])

[view_source]

Converts unstructured or non-standard textual data into a more readable and uniform format, dealing with abbreviations, numerals, and other non-standard words.

InputType: text OutputType: label
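
A toy sketch of the idea, assuming a small lookup table of abbreviations and digit-by-digit reading of numerals. The `ABBREVIATIONS` table, the `normalize` helper, and the lower-casing step are hypothetical choices for this example; real normalizers are language-specific and considerably more elaborate.

```python
# Hypothetical lookup tables for illustration only.
ABBREVIATIONS = {"dr.": "doctor", "st.": "street", "approx.": "approximately"}
DIGITS = ["zero", "one", "two", "three", "four",
          "five", "six", "seven", "eight", "nine"]


def normalize(text):
    """Expand known abbreviations and spell out ASCII digits one by one."""
    words = []
    for token in text.lower().split():
        if token in ABBREVIATIONS:
            words.append(ABBREVIATIONS[token])
        elif token.isascii() and token.isdigit():
            words.extend(DIGITS[int(digit)] for digit in token)
        else:
            words.append(token)
    return " ".join(words)


normalized = normalize("Dr. Smith lives at 42 Main St.")
# "doctor smith lives at four two main street"
```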

AsrQualityEstimation Objects

class AsrQualityEstimation(AssetNode[AsrQualityEstimationInputs,
AsrQualityEstimationOutputs])

[view_source]

ASR Quality Estimation is a process that evaluates the accuracy and reliability of automatic speech recognition systems by analyzing their performance in transcribing spoken language into text.

InputType: text OutputType: label

VoiceActivityDetection Objects

class VoiceActivityDetection(BaseSegmentor[VoiceActivityDetectionInputs,
VoiceActivityDetectionOutputs])

[view_source]

Determines when a person is speaking in an audio clip. It's an essential preprocessing step for other audio-related tasks.

InputType: audio OutputType: audio

SpeechNonSpeechClassification Objects

class SpeechNonSpeechClassification(
AssetNode[SpeechNonSpeechClassificationInputs,
SpeechNonSpeechClassificationOutputs])

[view_source]

Differentiates between speech and non-speech audio segments. Great for editing software and transcription services to exclude irrelevant audio.

InputType: audio OutputType: label

AudioTranscriptImprovement Objects

class AudioTranscriptImprovement(AssetNode[AudioTranscriptImprovementInputs,
AudioTranscriptImprovementOutputs])

[view_source]

Refines and corrects transcriptions generated from audio data, improving readability and accuracy.

InputType: audio OutputType: text

TextContentModeration Objects

class TextContentModeration(AssetNode[TextContentModerationInputs,
TextContentModerationOutputs])

[view_source]

Scans and identifies potentially harmful, offensive, or inappropriate textual content, ensuring safer user environments.

InputType: text OutputType: label

EmotionDetection Objects

class EmotionDetection(AssetNode[EmotionDetectionInputs,
EmotionDetectionOutputs])

[view_source]

Identifies human emotions from text or audio, enhancing user experience in chatbots or customer feedback analysis.

InputType: text OutputType: label

AudioForcedAlignment Objects

class AudioForcedAlignment(AssetNode[AudioForcedAlignmentInputs,
AudioForcedAlignmentOutputs])

[view_source]

Synchronizes phonetic and phonological text with the corresponding segments in an audio file. Useful in linguistic research and detailed transcription tasks.

InputType: audio OutputType: audio

VideoContentModeration Objects

class VideoContentModeration(AssetNode[VideoContentModerationInputs,
VideoContentModerationOutputs])

[view_source]

Automatically reviews video content to detect and possibly remove inappropriate or harmful material. Essential for user-generated content platforms.

InputType: video OutputType: label

ImageLabelDetection Objects

class ImageLabelDetection(AssetNode[ImageLabelDetectionInputs,
ImageLabelDetectionOutputs])

[view_source]

Identifies objects, themes, or topics within images, useful for image categorization, search, and recommendation systems.

InputType: image OutputType: label

VideoForcedAlignment Objects

class VideoForcedAlignment(AssetNode[VideoForcedAlignmentInputs,
VideoForcedAlignmentOutputs])

[view_source]

Aligns the transcription of spoken content in a video with its corresponding timecodes, facilitating subtitle creation.

InputType: video OutputType: video

TextGeneration Objects

class TextGeneration(AssetNode[TextGenerationInputs, TextGenerationOutputs])

[view_source]

Creates coherent and contextually relevant textual content based on prompts or certain parameters. Useful for chatbots, content creation, and data augmentation.

InputType: text OutputType: text

TextClassification Objects

class TextClassification(AssetNode[TextClassificationInputs,
TextClassificationOutputs])

[view_source]

Categorizes text into predefined groups or topics, facilitating content organization and targeted actions.

InputType: text OutputType: label

SpeechEmbedding Objects

class SpeechEmbedding(AssetNode[SpeechEmbeddingInputs,
SpeechEmbeddingOutputs])

[view_source]

Transforms spoken content into a fixed-size vector in a high-dimensional space that captures the content's essence. Facilitates tasks like speech recognition and speaker verification.

InputType: audio OutputType: text

TopicClassification Objects

class TopicClassification(AssetNode[TopicClassificationInputs,
TopicClassificationOutputs])

[view_source]

Assigns categories or topics to a piece of text based on its content, facilitating content organization and retrieval.

InputType: text OutputType: label

Translation Objects

class Translation(AssetNode[TranslationInputs, TranslationOutputs])

[view_source]

Converts text from one language to another while maintaining the original message's essence and context. Crucial for global communication.

InputType: text OutputType: text

SpeechRecognition Objects

class SpeechRecognition(AssetNode[SpeechRecognitionInputs,
SpeechRecognitionOutputs])

[view_source]

Converts spoken language into written text. Useful for transcription services, voice assistants, and applications requiring voice-to-text capabilities.

InputType: audio OutputType: text

Subtitling Objects

class Subtitling(AssetNode[SubtitlingInputs, SubtitlingOutputs])

[view_source]

Generates accurate subtitles for videos, enhancing accessibility for diverse audiences.

InputType: audio OutputType: text

ImageCaptioning Objects

class ImageCaptioning(AssetNode[ImageCaptioningInputs,
ImageCaptioningOutputs])

[view_source]

Image Captioning is a process that involves generating a textual description of an image, typically using machine learning models to analyze the visual content and produce coherent and contextually relevant sentences that describe the objects, actions, and scenes depicted in the image.

InputType: image OutputType: text

AudioLanguageIdentification Objects

class AudioLanguageIdentification(AssetNode[AudioLanguageIdentificationInputs,
AudioLanguageIdentificationOutputs])

[view_source]

Audio Language Identification is a process that involves analyzing an audio recording to determine the language being spoken.

InputType: audio OutputType: label

VideoEmbedding Objects

class VideoEmbedding(AssetNode[VideoEmbeddingInputs, VideoEmbeddingOutputs])

[view_source]

Video Embedding is a process that transforms video content into a fixed-dimensional vector representation, capturing essential features and patterns to facilitate tasks such as retrieval, classification, and recommendation.

InputType: video OutputType: embedding

AsrAgeClassification Objects

class AsrAgeClassification(AssetNode[AsrAgeClassificationInputs,
AsrAgeClassificationOutputs])

[view_source]

The ASR Age Classification function is designed to analyze audio recordings of speech to determine the speaker's age group by leveraging automatic speech recognition (ASR) technology and machine learning algorithms.

InputType: audio OutputType: label

AudioIntentDetection Objects

class AudioIntentDetection(AssetNode[AudioIntentDetectionInputs,
AudioIntentDetectionOutputs])

[view_source]

Audio Intent Detection is a process that involves analyzing audio signals to identify and interpret the underlying intentions or purposes behind spoken words, enabling systems to understand and respond appropriately to human speech.

InputType: audio OutputType: label

LanguageIdentification Objects

class LanguageIdentification(AssetNode[LanguageIdentificationInputs,
LanguageIdentificationOutputs])

[view_source]

Detects the language in which a given text is written, aiding in multilingual platforms or content localization.

InputType: text OutputType: text

Ocr Objects

class Ocr(AssetNode[OcrInputs, OcrOutputs])

[view_source]

Converts images of typed, handwritten, or printed text into machine-encoded text. Used in digitizing printed texts for data retrieval.

InputType: image OutputType: text

AsrGenderClassification Objects

class AsrGenderClassification(AssetNode[AsrGenderClassificationInputs,
AsrGenderClassificationOutputs])

[view_source]

The ASR Gender Classification function analyzes audio recordings to determine and classify the speaker's gender based on their voice characteristics.

InputType: audio OutputType: label

LanguageIdentificationAudio Objects

class LanguageIdentificationAudio(AssetNode[LanguageIdentificationAudioInputs,
LanguageIdentificationAudioOutputs])

[view_source]

The Language Identification Audio function analyzes audio input to determine and identify the language being spoken.

InputType: audio OutputType: label

BaseModel Objects

class BaseModel(AssetNode[BaseModelInputs, BaseModelOutputs])

[view_source]

The Base-Model function serves as a foundational framework designed to provide essential features and capabilities upon which more specialized or advanced models can be built and customized.

InputType: text OutputType: text

Loglikelihood Objects

class Loglikelihood(AssetNode[LoglikelihoodInputs, LoglikelihoodOutputs])

[view_source]

The Log Likelihood function measures the probability of observing the given data under a specific statistical model by taking the natural logarithm of the likelihood function, thereby transforming the product of probabilities into a sum, which simplifies the process of optimization and parameter estimation.

InputType: text OutputType: number
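
To make the "product becomes a sum" point concrete, the sketch below computes the log-likelihood of coin-flip data under a Bernoulli model and checks it against the logarithm of the plain likelihood. The Bernoulli model, the data, and the helper name are assumptions chosen for illustration; the aiXplain function typically scores text under a language model instead.

```python
import math


def bernoulli_log_likelihood(data, p):
    """Log-likelihood of 0/1 observations under a Bernoulli(p) model."""
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    # Sum of log-probabilities instead of a product of probabilities.
    return sum(math.log(p if x == 1 else 1 - p) for x in data)


data = [1, 1, 0, 1]
p = 0.75

likelihood = math.prod(p if x == 1 else 1 - p for x in data)  # product form
log_likelihood = bernoulli_log_likelihood(data, p)            # sum form

# Parameter estimation: pick the candidate with the highest log-likelihood.
candidates = [0.25, 0.5, 0.75, 0.9]
best_p = max(candidates, key=lambda q: bernoulli_log_likelihood(data, q))
```

Here `log_likelihood` equals `math.log(likelihood)` up to floating-point error, and `best_p` is 0.75, the fraction of ones in the data. Working with sums also avoids the numerical underflow that long products of small probabilities cause.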

ImageToVideoGeneration Objects

class ImageToVideoGeneration(AssetNode[ImageToVideoGenerationInputs,
ImageToVideoGenerationOutputs])

[view_source]

The Image To Video Generation function transforms a series of static images into a cohesive, dynamic video sequence, often incorporating transitions, effects, and synchronization with audio to create a visually engaging narrative.

InputType: image OutputType: video

PartOfSpeechTagging Objects

class PartOfSpeechTagging(AssetNode[PartOfSpeechTaggingInputs,
PartOfSpeechTaggingOutputs])

[view_source]

Part of Speech Tagging is a natural language processing task that involves assigning each word in a sentence its corresponding part of speech, such as noun, verb, adjective, or adverb, based on its role and context within the sentence.

InputType: text OutputType: label

BenchmarkScoringAsr Objects

class BenchmarkScoringAsr(AssetNode[BenchmarkScoringAsrInputs,
BenchmarkScoringAsrOutputs])

[view_source]

Benchmark Scoring ASR is a function that evaluates and compares the performance of automatic speech recognition systems by analyzing their accuracy, speed, and other relevant metrics against a standardized set of benchmarks.

InputType: audio OutputType: label

VisualQuestionAnswering Objects

class VisualQuestionAnswering(AssetNode[VisualQuestionAnsweringInputs,
VisualQuestionAnsweringOutputs])

[view_source]

Visual Question Answering (VQA) is a task in artificial intelligence that involves analyzing an image and providing accurate, contextually relevant answers to questions posed about the visual content of that image.

InputType: image OutputType: video

DocumentInformationExtraction Objects

class DocumentInformationExtraction(
AssetNode[DocumentInformationExtractionInputs,
DocumentInformationExtractionOutputs])

[view_source]

Document Information Extraction is the process of automatically identifying, extracting, and structuring relevant data from unstructured or semi-structured documents, such as invoices, receipts, contracts, and forms, to facilitate easier data management and analysis.

InputType: image OutputType: text

VideoGeneration Objects

class VideoGeneration(AssetNode[VideoGenerationInputs,
VideoGenerationOutputs])

[view_source]

Produces video content based on specific inputs or datasets. Can be used for simulations, animations, or even deepfake detection.

InputType: text OutputType: video

MultiClassImageClassification Objects

class MultiClassImageClassification(
AssetNode[MultiClassImageClassificationInputs,
MultiClassImageClassificationOutputs])

[view_source]

Multi Class Image Classification is a machine learning task where an algorithm is trained to categorize images into one of several predefined classes or categories based on their visual content.

InputType: image OutputType: label

StyleTransfer Objects

class StyleTransfer(AssetNode[StyleTransferInputs, StyleTransferOutputs])

[view_source]

Style Transfer is a technique in artificial intelligence that applies the visual style of one image (such as the brushstrokes of a famous painting) to the content of another image, effectively blending the artistic elements of the first image with the subject matter of the second.

InputType: image OutputType: image

MultiClassTextClassification Objects

class MultiClassTextClassification(
AssetNode[MultiClassTextClassificationInputs,
MultiClassTextClassificationOutputs])

[view_source]

Multi Class Text Classification is a natural language processing task that involves categorizing a given text into one of several predefined classes or categories based on its content.

InputType: text OutputType: label

IntentClassification Objects

class IntentClassification(AssetNode[IntentClassificationInputs,
IntentClassificationOutputs])

[view_source]

Intent Classification is a natural language processing task that involves analyzing and categorizing user text input to determine the underlying purpose or goal behind the communication, such as booking a flight, asking for weather information, or setting a reminder.

InputType: text OutputType: label

MultiLabelTextClassification Objects

class MultiLabelTextClassification(
AssetNode[MultiLabelTextClassificationInputs,
MultiLabelTextClassificationOutputs])

[view_source]

Multi Label Text Classification is a natural language processing task where a given text is analyzed and assigned multiple relevant labels or categories from a predefined set, allowing for the text to belong to more than one category simultaneously.

InputType: text OutputType: label
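
The difference between multi-class and multi-label classification comes down to how per-label scores are turned into a decision. The sketch below assumes a model that returns one score per label; the helper names, the scores, and the 0.5 threshold are hypothetical.

```python
def multi_class(scores):
    """Multi-class: exactly one label, the highest-scoring one."""
    return max(scores, key=scores.get)


def multi_label(scores, threshold=0.5):
    """Multi-label: every label whose score reaches the threshold."""
    return sorted(label for label, score in scores.items() if score >= threshold)


# Hypothetical per-label scores for one text.
scores = {"sports": 0.81, "politics": 0.64, "cooking": 0.05}

single = multi_class(scores)    # "sports"
several = multi_label(scores)   # ["politics", "sports"]
```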

TextReconstruction Objects

class TextReconstruction(BaseReconstructor[TextReconstructionInputs,
TextReconstructionOutputs])

[view_source]

Text Reconstruction is a process that involves piecing together fragmented or incomplete text data to restore it to its original, coherent form.

InputType: text OutputType: text

FactChecking Objects

class FactChecking(AssetNode[FactCheckingInputs, FactCheckingOutputs])

[view_source]

Fact Checking is the process of verifying the accuracy and truthfulness of information, statements, or claims by cross-referencing with reliable sources and evidence.

InputType: text OutputType: label

InverseTextNormalization Objects

class InverseTextNormalization(AssetNode[InverseTextNormalizationInputs,
InverseTextNormalizationOutputs])

[view_source]

Inverse Text Normalization is the process of converting text in its normalized, spoken-style form back into conventional written form, restoring items such as numbers, dates, and abbreviations (for example, "twenty five dollars" becomes "$25").

InputType: text OutputType: label

TextToAudio Objects

class TextToAudio(AssetNode[TextToAudioInputs, TextToAudioOutputs])

[view_source]

The Text to Audio function converts written text into spoken words, allowing users to listen to the content instead of reading it.

InputType: text OutputType: audio

ImageCompression Objects

class ImageCompression(AssetNode[ImageCompressionInputs,
ImageCompressionOutputs])

[view_source]

Reduces the size of image files without significantly compromising their visual quality. Useful for optimizing storage and improving webpage load times.

InputType: image OutputType: image

MultilingualSpeechRecognition Objects

class MultilingualSpeechRecognition(
AssetNode[MultilingualSpeechRecognitionInputs,
MultilingualSpeechRecognitionOutputs])

[view_source]

Multilingual Speech Recognition is a technology that enables the automatic transcription of spoken language into text across multiple languages, allowing for seamless communication and understanding in diverse linguistic contexts.

InputType: audio OutputType: text

TextGenerationMetricDefault Objects

class TextGenerationMetricDefault(
BaseMetric[TextGenerationMetricDefaultInputs,
TextGenerationMetricDefaultOutputs])

[view_source]

The "Text Generation Metric Default" function provides a standard set of evaluation metrics for assessing the quality and performance of text generation models.

InputType: text OutputType: text

ReferencelessTextGenerationMetric Objects

class ReferencelessTextGenerationMetric(
BaseMetric[ReferencelessTextGenerationMetricInputs,
ReferencelessTextGenerationMetricOutputs])

[view_source]

The Referenceless Text Generation Metric is a method for evaluating the quality of generated text without requiring a reference text for comparison, often leveraging models or algorithms to assess coherence, relevance, and fluency based on intrinsic properties of the text itself.

InputType: text OutputType: text

AudioEmotionDetection Objects

class AudioEmotionDetection(AssetNode[AudioEmotionDetectionInputs,
AudioEmotionDetectionOutputs])

[view_source]

Audio Emotion Detection is a technology that analyzes vocal characteristics and patterns in audio recordings to identify and classify the emotional state of the speaker.

InputType: audio OutputType: label

KeywordSpotting Objects

class KeywordSpotting(AssetNode[KeywordSpottingInputs,
KeywordSpottingOutputs])

[view_source]

Keyword Spotting is a function that enables the detection and identification of specific words or phrases within a stream of audio, often used in voice-activated systems to trigger actions or commands based on recognized keywords.

InputType: audio OutputType: label

TextSummarization Objects

class TextSummarization(AssetNode[TextSummarizationInputs,
TextSummarizationOutputs])

[view_source]

Extracts the main points from a larger body of text, producing a concise summary without losing the primary message.

InputType: text OutputType: text

SplitOnLinebreak Objects

class SplitOnLinebreak(BaseSegmentor[SplitOnLinebreakInputs,
SplitOnLinebreakOutputs])

[view_source]

The "Split On Linebreak" function divides a given string into a list of substrings, using linebreaks (newline characters) as the points of separation.

InputType: text OutputType: text
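
The behaviour described above can be illustrated in plain Python. This is a minimal sketch, not the aiXplain implementation: the helper name `split_on_linebreak` is hypothetical, and dropping empty segments is an assumption made for the example.

```python
from typing import List


def split_on_linebreak(text: str) -> List[str]:
    """Split text at newline characters and drop empty segments."""
    # str.splitlines() recognises \n, \r\n and \r and removes the separators.
    return [line for line in text.splitlines() if line.strip()]


segments = split_on_linebreak("first line\nsecond line\r\n\nthird line")
print(segments)
```

Each returned segment could then be passed to a downstream node independently.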

OtherMultipurpose Objects

class OtherMultipurpose(AssetNode[OtherMultipurposeInputs,
OtherMultipurposeOutputs])

[view_source]

The "Other (Multipurpose)" function serves as a versatile category designed to accommodate a wide range of tasks and activities that do not fit neatly into predefined classifications, offering flexibility and adaptability for various needs.

InputType: text OutputType: text

SpeakerDiarizationAudio Objects

class SpeakerDiarizationAudio(BaseSegmentor[SpeakerDiarizationAudioInputs,
SpeakerDiarizationAudioOutputs])

[view_source]

Identifies individual speakers and their respective speech segments within an audio clip. Ideal for multi-speaker recordings or conference calls.

InputType: audio OutputType: label

ImageContentModeration Objects

class ImageContentModeration(AssetNode[ImageContentModerationInputs,
ImageContentModerationOutputs])

[view_source]

Detects and filters out inappropriate or harmful images, essential for platforms with user-generated visual content.

InputType: image OutputType: label

TextDenormalization Objects

class TextDenormalization(AssetNode[TextDenormalizationInputs,
TextDenormalizationOutputs])

[view_source]

Converts standardized or normalized text into its original, often more readable, form. Useful in natural language generation tasks.

InputType: text OutputType: label

SpeakerDiarizationVideo Objects

class SpeakerDiarizationVideo(AssetNode[SpeakerDiarizationVideoInputs,
SpeakerDiarizationVideoOutputs])

[view_source]

Segments a video based on different speakers, identifying when each individual speaks. Useful for transcriptions and understanding multi-person conversations.

InputType: video OutputType: label

TextToVideoGeneration Objects

class TextToVideoGeneration(AssetNode[TextToVideoGenerationInputs,
TextToVideoGenerationOutputs])

[view_source]

Text To Video Generation is a process that converts written descriptions or scripts into dynamic, visual video content using advanced algorithms and artificial intelligence.

InputType: text OutputType: video

Pipeline Objects

class Pipeline(DefaultPipeline)

[view_source]

object_detection

def object_detection(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> ObjectDetection

[view_source]

Object Detection is a computer vision technology that identifies and locates objects within an image, typically by drawing bounding boxes around the detected objects and classifying them into predefined categories.
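
Every method on `Pipeline` in this reference shares the same signature shape: it takes an `asset_id` that is either a string or an asset object and returns a typed node. The sketch below only illustrates that shape with self-contained stand-ins. `SketchAsset`, `SketchNode`, and `SketchPipeline` are hypothetical names and the asset ids are placeholders; how the real methods resolve assets and attach nodes is not shown in this reference.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple, Union


@dataclass
class SketchAsset:
    """Hypothetical stand-in for asset.Asset; it only carries an id."""
    id: str


@dataclass
class SketchNode:
    """Hypothetical stand-in for a typed node such as ObjectDetection."""
    function: str
    asset_id: str
    args: Tuple[Any, ...] = ()
    kwargs: Dict[str, Any] = field(default_factory=dict)


class SketchPipeline:
    """Hypothetical pipeline that records every node created on it."""

    def __init__(self) -> None:
        self.nodes: List[SketchNode] = []

    def _create(self, function: str, asset_id: Union[str, SketchAsset],
                *args: Any, **kwargs: Any) -> SketchNode:
        # Accept either a plain id string or an asset object.
        resolved = asset_id.id if isinstance(asset_id, SketchAsset) else asset_id
        node = SketchNode(function, resolved, args, kwargs)
        self.nodes.append(node)
        return node

    def object_detection(self, asset_id: Union[str, SketchAsset],
                         *args: Any, **kwargs: Any) -> SketchNode:
        return self._create("object_detection", asset_id, *args, **kwargs)

    def text_embedding(self, asset_id: Union[str, SketchAsset],
                       *args: Any, **kwargs: Any) -> SketchNode:
        return self._create("text_embedding", asset_id, *args, **kwargs)


pipeline = SketchPipeline()
detector = pipeline.object_detection("placeholder-asset-id")
embedder = pipeline.text_embedding(SketchAsset(id="another-placeholder-id"))
print([node.function for node in pipeline.nodes])
```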

text_embedding

def text_embedding(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> TextEmbedding

[view_source]

Text embedding is a process that converts text into numerical vectors, capturing the semantic meaning and contextual relationships of words or phrases, enabling machines to understand and analyze natural language more effectively.

semantic_segmentation

def semantic_segmentation(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> SemanticSegmentation

[view_source]

Semantic segmentation is a computer vision process that involves classifying each pixel in an image into a predefined category, effectively partitioning the image into meaningful segments based on the objects or regions they represent.
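
As a rough illustration of per-pixel classification, the sketch below assigns each pixel the label with the highest score. The score grid, the label names, and the `segment` helper are all made up for the example; a real model would produce the scores.

```python
from typing import List


def segment(scores: List[List[List[float]]], labels: List[str]) -> List[List[str]]:
    """Assign each pixel the label whose score is highest."""
    label_map = []
    for row in scores:
        label_row = []
        for pixel_scores in row:
            # Index of the largest score for this pixel.
            best = max(range(len(pixel_scores)), key=lambda k: pixel_scores[k])
            label_row.append(labels[best])
        label_map.append(label_row)
    return label_map


labels = ["background", "road", "car"]
scores = [
    [[0.8, 0.1, 0.1], [0.2, 0.7, 0.1]],
    [[0.1, 0.3, 0.6], [0.1, 0.8, 0.1]],
]
label_map = segment(scores, labels)
print(label_map)
```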

referenceless_audio_generation_metric

def referenceless_audio_generation_metric(
asset_id: Union[str, asset.Asset], *args,
**kwargs) -> ReferencelessAudioGenerationMetric

[view_source]

The Referenceless Audio Generation Metric is a tool designed to evaluate the quality of generated audio content without the need for a reference or original audio sample for comparison.

script_execution

def script_execution(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> ScriptExecution

[view_source]

Script Execution refers to the process of running a set of programmed instructions or code within a computing environment, enabling the automated performance of tasks, calculations, or operations as defined by the script.

image_impainting

def image_impainting(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> ImageImpainting

[view_source]

Image inpainting is a process that involves filling in missing or damaged parts of an image in a way that is visually coherent and seamlessly blends with the surrounding areas, often using advanced algorithms and techniques to restore the image to its original or intended appearance.

image_embedding

def image_embedding(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> ImageEmbedding

[view_source]

Image Embedding is a process that transforms an image into a fixed-dimensional vector representation, capturing its essential features and enabling efficient comparison, retrieval, and analysis in various machine learning and computer vision tasks.
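
One common way to compare fixed-dimensional vectors is cosine similarity. The sketch below assumes the embeddings are already available as lists of floats; the three-dimensional vectors are invented for the example and far smaller than real embeddings.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


query = [1.0, 0.0, 1.0]
near = [2.0, 0.0, 2.0]   # same direction as the query
far = [0.0, 1.0, 0.0]    # orthogonal to the query

print(cosine_similarity(query, near), cosine_similarity(query, far))
```

A retrieval step would rank candidates by this score and keep the highest ones.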

metric_aggregation

def metric_aggregation(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> MetricAggregation

[view_source]

Metric Aggregation is a function that computes and summarizes numerical data by applying statistical operations, such as averaging, summing, or finding the minimum and maximum values, to provide insights and facilitate analysis of large datasets.
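
The statistical operations named above can be sketched with the Python standard library. The `aggregate` helper and the sample scores are hypothetical and only illustrate the idea.

```python
from statistics import mean
from typing import Dict, List


def aggregate(scores: List[float]) -> Dict[str, float]:
    """Summarize a non-empty list of metric scores."""
    return {
        "mean": mean(scores),
        "sum": sum(scores),
        "min": min(scores),
        "max": max(scores),
    }


summary = aggregate([0.5, 0.75, 1.0, 0.25])
print(summary)
```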

speech_translation

def speech_translation(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> SpeechTranslation

[view_source]

Speech Translation is a technology that converts spoken language in real-time from one language to another, enabling seamless communication between speakers of different languages.

depth_estimation

def depth_estimation(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> DepthEstimation

[view_source]

Depth estimation is a computational process that determines the distance of objects from a viewpoint, typically using visual data from cameras or sensors to create a three-dimensional understanding of a scene.

noise_removal

def noise_removal(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> NoiseRemoval

[view_source]

Noise Removal is a process that involves identifying and eliminating unwanted random variations or disturbances from an audio signal to enhance the clarity and quality of the underlying information.

diacritization

def diacritization(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> Diacritization

[view_source]

Adds diacritical marks to text, essential for languages where meaning can change based on diacritics.

audio_transcript_analysis

def audio_transcript_analysis(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> AudioTranscriptAnalysis

[view_source]

Analyzes transcribed audio data for insights, patterns, or specific information extraction.

extract_audio_from_video

def extract_audio_from_video(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> ExtractAudioFromVideo

[view_source]

Isolates and extracts audio tracks from video files, aiding in audio analysis or transcription tasks.

audio_reconstruction

def audio_reconstruction(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> AudioReconstruction

[view_source]

Audio Reconstruction is the process of restoring or recreating audio signals from incomplete, damaged, or degraded recordings to achieve a high-quality, accurate representation of the original sound.

classification_metric

def classification_metric(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> ClassificationMetric

[view_source]

A Classification Metric is a quantitative measure used to evaluate the quality and effectiveness of classification models.
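
Accuracy, precision, and recall are typical examples of such measures. The sketch below computes them for binary labels; which metrics a given asset actually reports is not stated in this reference, so treat the selection as an assumption.

```python
from typing import Dict, List


def binary_classification_metrics(y_true: List[int], y_pred: List[int]) -> Dict[str, float]:
    """Accuracy, precision and recall for 0/1 labels of equal, non-zero length."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    correct = sum(1 for t, p in pairs if t == p)
    return {
        "accuracy": correct / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


metrics = binary_classification_metrics([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 1, 0])
print(metrics)
```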

text_generation_metric

def text_generation_metric(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> TextGenerationMetric

[view_source]

A Text Generation Metric is a quantitative measure used to evaluate the quality and effectiveness of text produced by natural language processing models, often assessing aspects such as coherence, relevance, fluency, and adherence to given prompts or instructions.

text_spam_detection

def text_spam_detection(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> TextSpamDetection

[view_source]

Identifies and filters out unwanted or irrelevant text content, ideal for moderating user-generated content or ensuring quality in communication platforms.

text_to_image_generation

def text_to_image_generation(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> TextToImageGeneration

[view_source]

Creates a visual representation based on textual input, turning descriptions into pictorial forms. Used in creative processes and content generation.

voice_cloning

def voice_cloning(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> VoiceCloning

[view_source]

Replicates a person's voice based on a sample, allowing for the generation of speech in that person's tone and style. Use with caution because of the ethical considerations involved.

text_segmenation

def text_segmenation(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> TextSegmenation

[view_source]

Text Segmentation is the process of dividing a continuous text into meaningful units, such as words, sentences, or topics, to facilitate easier analysis and understanding.
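
For the sentence-level case, a very small rule-based sketch is shown below. It splits after sentence-ending punctuation followed by whitespace, which is an assumption that fails on abbreviations such as "e.g."; model-based segmenters handle those cases better.

```python
import re
from typing import List


def split_sentences(text: str) -> List[str]:
    """Split text after '.', '!' or '?' when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


sentences = split_sentences("Pipelines chain models. Each node has inputs! Does it scale?")
print(sentences)
```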

benchmark_scoring_mt

def benchmark_scoring_mt(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> BenchmarkScoringMt

[view_source]

Benchmark Scoring MT is a function designed to evaluate and score machine translation systems by comparing their output against a set of predefined benchmarks, thereby assessing their accuracy and performance.

image_manipulation

def image_manipulation(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> ImageManipulation

[view_source]

Image Manipulation refers to the process of altering or enhancing digital images using various techniques and tools to achieve desired visual effects, correct imperfections, or transform the image's appearance.

named_entity_recognition

def named_entity_recognition(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> NamedEntityRecognition

[view_source]

Identifies and classifies named entities (e.g., persons, organizations, locations) within text. Useful for information extraction, content tagging, and search enhancements.

offensive_language_identification

def offensive_language_identification(
asset_id: Union[str, asset.Asset], *args,
**kwargs) -> OffensiveLanguageIdentification

[view_source]

Detects language or phrases that might be considered offensive, aiding in content moderation and creating respectful user interactions.

search

def search(asset_id: Union[str, asset.Asset], *args, **kwargs) -> Search

[view_source]

An algorithm that identifies and returns data or items that match particular keywords or conditions from a dataset. A fundamental tool for databases and websites.

sentiment_analysis

def sentiment_analysis(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> SentimentAnalysis

[view_source]

Determines the sentiment or emotion (e.g., positive, negative, neutral) of a piece of text, aiding in understanding user feedback or market sentiment.

image_colorization

def image_colorization(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> ImageColorization

[view_source]

Image colorization is a process that involves adding color to grayscale images, transforming them from black-and-white to full-color representations, often using advanced algorithms and machine learning techniques to predict and apply the appropriate hues and shades.

speech_classification

def speech_classification(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> SpeechClassification

[view_source]

Categorizes audio clips based on their content, aiding in content organization and targeted actions.

dialect_detection

def dialect_detection(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> DialectDetection

[view_source]

Identifies specific dialects within a language, aiding in localized content creation or user experience personalization.

video_label_detection

def video_label_detection(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> VideoLabelDetection

[view_source]

Identifies and tags objects, scenes, or activities within a video. Useful for content indexing and recommendation systems.

speech_synthesis

def speech_synthesis(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> SpeechSynthesis

[view_source]

Generates human-like speech from written text. Ideal for text-to-speech applications, audiobooks, and voice assistants.

split_on_silence

def split_on_silence(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> SplitOnSilence

[view_source]

The "Split On Silence" function divides an audio recording into separate segments based on periods of silence, allowing for easier editing and analysis of individual sections.
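
The idea can be sketched on a list of amplitude samples. This is an illustrative approach, not the method used by any particular asset: the `threshold` and `min_silence` parameters and the sample values are assumptions chosen for the example.

```python
from typing import List, Optional, Tuple


def split_on_silence(samples: List[float], threshold: float = 0.1,
                     min_silence: int = 3) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs of non-silent segments; end is exclusive.

    A run of at least `min_silence` consecutive samples whose absolute value
    is below `threshold` separates two segments.
    """
    segments: List[Tuple[int, int]] = []
    start: Optional[int] = None  # start index of the segment being built
    quiet_run = 0                # length of the current run of quiet samples
    for i, sample in enumerate(samples):
        if abs(sample) >= threshold:
            if start is None:
                start = i
            quiet_run = 0
        elif start is not None:
            quiet_run += 1
            if quiet_run >= min_silence:
                # Close the segment at the first quiet sample of the run.
                segments.append((start, i - quiet_run + 1))
                start = None
                quiet_run = 0
    if start is not None:
        # Trim trailing quiet samples from the final segment.
        segments.append((start, len(samples) - quiet_run))
    return segments


audio = [0.0, 0.5, 0.6, 0.0, 0.0, 0.0, 0.4, 0.0, 0.3, 0.0]
print(split_on_silence(audio))
```

Short pauses (here, a single quiet sample between 0.4 and 0.3) stay inside a segment because they are shorter than `min_silence`.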

expression_detection

def expression_detection(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> ExpressionDetection

[view_source]

Expression Detection is the process of identifying and analyzing facial expressions to interpret emotions or intentions using AI and computer vision techniques.

auto_mask_generation

def auto_mask_generation(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> AutoMaskGeneration

[view_source]

Auto-mask generation refers to the automated process of creating masks in image processing or computer vision, typically for segmentation tasks. A mask is a binary or multi-class image that labels different parts of an image, usually separating the foreground (objects of interest) from the background, or identifying specific object classes in an image.

document_image_parsing

def document_image_parsing(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> DocumentImageParsing

[view_source]

Document Image Parsing is the process of analyzing and converting scanned or photographed images of documents into structured, machine-readable formats by identifying and extracting text, layout, and other relevant information.

entity_linking

def entity_linking(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> EntityLinking

[view_source]

Associates identified entities in the text with specific entries in a knowledge base or database.

referenceless_text_generation_metric_default

def referenceless_text_generation_metric_default(
asset_id: Union[str, asset.Asset], *args,
**kwargs) -> ReferencelessTextGenerationMetricDefault

[view_source]

The Referenceless Text Generation Metric Default is a function designed to evaluate the quality of generated text without relying on reference texts for comparison.

fill_text_mask

def fill_text_mask(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> FillTextMask

[view_source]

Completes missing parts of a text based on the context, ideal for content generation or data augmentation tasks.

subtitling_translation

def subtitling_translation(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> SubtitlingTranslation

[view_source]

Converts the text of subtitles from one language to another, ensuring context and cultural nuances are maintained. Essential for global content distribution.

instance_segmentation

def instance_segmentation(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> InstanceSegmentation

[view_source]

Instance segmentation is a computer vision task that involves detecting and delineating each distinct object within an image, assigning a unique label and precise boundary to every individual instance of objects, even if they belong to the same category.

viseme_generation

def viseme_generation(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> VisemeGeneration

[view_source]

Viseme Generation is the process of creating visual representations of phonemes, which are the distinct units of sound in speech, to synchronize lip movements with spoken words in animations or virtual avatars.

audio_generation_metric

def audio_generation_metric(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> AudioGenerationMetric

[view_source]

The Audio Generation Metric is a quantitative measure used to evaluate the quality, accuracy, and overall performance of audio generated by artificial intelligence systems, often considering factors such as fidelity, intelligibility, and similarity to human-produced audio.

video_understanding

def video_understanding(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> VideoUnderstanding

[view_source]

Video Understanding is the process of analyzing and interpreting video content to extract meaningful information, such as identifying objects, actions, events, and contextual relationships within the footage.

text_normalization

def text_normalization(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> TextNormalization

[view_source]

Converts unstructured or non-standard textual data into a more readable and uniform format, dealing with abbreviations, numerals, and other non-standard words.
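
A toy, dictionary-based sketch of this kind of conversion follows. The abbreviation table and the digit-by-digit reading of numerals are simplifying assumptions for illustration; production normalizers use much richer rules or learned models.

```python
import re

# Hypothetical lookup table; real systems cover far more cases.
ABBREVIATIONS = {"dr.": "doctor", "st.": "street"}
DIGIT_WORDS = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"]


def normalize_text(text: str) -> str:
    """Lowercase text, expand known abbreviations, and spell out digits."""
    words = []
    for token in text.lower().split():
        if token in ABBREVIATIONS:
            words.append(ABBREVIATIONS[token])
        elif re.fullmatch(r"[0-9]+", token):
            # Read numerals digit by digit to keep the sketch small.
            words.extend(DIGIT_WORDS[int(digit)] for digit in token)
        else:
            words.append(token)
    return " ".join(words)


normalized = normalize_text("Dr. Smith lives at 42 Main St.")
print(normalized)
```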

asr_quality_estimation

def asr_quality_estimation(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> AsrQualityEstimation

[view_source]

ASR Quality Estimation is a process that evaluates the accuracy and reliability of automatic speech recognition systems by analyzing their performance in transcribing spoken language into text.

voice_activity_detection

def voice_activity_detection(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> VoiceActivityDetection

[view_source]

Determines when a person is speaking in an audio clip. It's an essential preprocessing step for other audio-related tasks.

speech_non_speech_classification

def speech_non_speech_classification(
asset_id: Union[str, asset.Asset], *args,
**kwargs) -> SpeechNonSpeechClassification

[view_source]

Differentiates between speech and non-speech audio segments. Great for editing software and transcription services to exclude irrelevant audio.

audio_transcript_improvement

def audio_transcript_improvement(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> AudioTranscriptImprovement

[view_source]

Refines and corrects transcriptions generated from audio data, improving readability and accuracy.

text_content_moderation

def text_content_moderation(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> TextContentModeration

[view_source]

Scans and identifies potentially harmful, offensive, or inappropriate textual content, ensuring safer user environments.

emotion_detection

def emotion_detection(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> EmotionDetection

[view_source]

Identifies human emotions from text or audio, enhancing user experience in chatbots or customer feedback analysis.

audio_forced_alignment

def audio_forced_alignment(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> AudioForcedAlignment

[view_source]

Synchronizes phonetic and phonological text with the corresponding segments in an audio file. Useful in linguistic research and detailed transcription tasks.

video_content_moderation

def video_content_moderation(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> VideoContentModeration

[view_source]

Automatically reviews video content to detect and possibly remove inappropriate or harmful material. Essential for user-generated content platforms.

image_label_detection

def image_label_detection(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> ImageLabelDetection

[view_source]

Identifies objects, themes, or topics within images, useful for image categorization, search, and recommendation systems.

video_forced_alignment

def video_forced_alignment(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> VideoForcedAlignment

[view_source]

Aligns the transcription of spoken content in a video with its corresponding timecodes, facilitating subtitle creation.
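
Once each piece of text has start and end times, writing subtitles is mostly a formatting step. The sketch below renders aligned segments in the SubRip (SRT) layout. The `(start, end, text)` tuple format is an assumption about what an alignment result might look like, not the documented output of this node.

```python
from typing import List, Tuple


def to_srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as HH:MM:SS,mmm."""
    millis = int(round(seconds * 1000))
    hours, rem = divmod(millis, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: List[Tuple[float, float, str]]) -> str:
    """Render (start, end, text) segments as numbered SRT blocks."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n{to_srt_timestamp(start)} --> {to_srt_timestamp(end)}\n{text}"
        )
    return "\n\n".join(blocks) + "\n"


aligned = [(0.0, 1.25, "Hello there."), (1.5, 3.0, "Welcome back.")]
srt_text = to_srt(aligned)
print(srt_text)
```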

text_generation

def text_generation(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> TextGeneration

[view_source]

Creates coherent and contextually relevant textual content based on prompts or certain parameters. Useful for chatbots, content creation, and data augmentation.

text_classification

def text_classification(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> TextClassification

[view_source]

Categorizes text into predefined groups or topics, facilitating content organization and targeted actions.

speech_embedding

def speech_embedding(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> SpeechEmbedding

[view_source]

Transforms spoken content into a fixed-size vector in a high-dimensional space that captures the content's essence. Facilitates tasks like speech recognition and speaker verification.

topic_classification

def topic_classification(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> TopicClassification

[view_source]

Assigns categories or topics to a piece of text based on its content, facilitating content organization and retrieval.

translation

def translation(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> Translation

[view_source]

Converts text from one language to another while maintaining the original message's essence and context. Crucial for global communication.

speech_recognition

def speech_recognition(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> SpeechRecognition

[view_source]

Converts spoken language into written text. Useful for transcription services, voice assistants, and applications requiring voice-to-text capabilities.

subtitling

def subtitling(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> Subtitling

[view_source]

Generates accurate subtitles for videos, enhancing accessibility for diverse audiences.

image_captioning

def image_captioning(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> ImageCaptioning

[view_source]

Image Captioning is a process that involves generating a textual description of an image, typically using machine learning models to analyze the visual content and produce coherent and contextually relevant sentences that describe the objects, actions, and scenes depicted in the image.

audio_language_identification

def audio_language_identification(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> AudioLanguageIdentification

[view_source]

Audio Language Identification is a process that involves analyzing an audio recording to determine the language being spoken.

video_embedding

def video_embedding(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> VideoEmbedding

[view_source]

Video Embedding is a process that transforms video content into a fixed-dimensional vector representation, capturing essential features and patterns to facilitate tasks such as retrieval, classification, and recommendation.

asr_age_classification

def asr_age_classification(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> AsrAgeClassification

[view_source]

The ASR Age Classification function is designed to analyze audio recordings of speech to determine the speaker's age group by leveraging automatic speech recognition (ASR) technology and machine learning algorithms.

audio_intent_detection

def audio_intent_detection(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> AudioIntentDetection

[view_source]

Audio Intent Detection is a process that involves analyzing audio signals to identify and interpret the underlying intentions or purposes behind spoken words, enabling systems to understand and respond appropriately to human speech.

language_identification

def language_identification(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> LanguageIdentification

[view_source]

Detects the language in which a given text is written, aiding in multilingual platforms or content localization.

ocr

def ocr(asset_id: Union[str, asset.Asset], *args, **kwargs) -> Ocr

[view_source]

Converts images of typed, handwritten, or printed text into machine-encoded text. Used in digitizing printed texts for data retrieval.

asr_gender_classification

def asr_gender_classification(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> AsrGenderClassification

[view_source]

The ASR Gender Classification function analyzes audio recordings to determine and classify the speaker's gender based on their voice characteristics.

language_identification_audio

def language_identification_audio(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> LanguageIdentificationAudio

[view_source]

The Language Identification Audio function analyzes audio input to determine and identify the language being spoken.

base_model

def base_model(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> BaseModel

[view_source]

The Base-Model function serves as a foundational framework designed to provide essential features and capabilities upon which more specialized or advanced models can be built and customized.

loglikelihood

def loglikelihood(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> Loglikelihood

[view_source]

The Log Likelihood function measures the probability of observing the given data under a specific statistical model by taking the natural logarithm of the likelihood function, thereby transforming the product of probabilities into a sum, which simplifies the process of optimization and parameter estimation.
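
The product-to-sum transformation can be seen in a small worked example. The sketch assumes independent 0/1 observations under a Bernoulli model with parameter `p`; the model choice and the data are illustrative only.

```python
import math
from typing import List


def bernoulli_log_likelihood(observations: List[int], p: float) -> float:
    """Log-likelihood of independent 0/1 observations under Bernoulli(p), 0 < p < 1."""
    # Summing log-probabilities is equivalent to taking the log of their product.
    return sum(math.log(p) if x == 1 else math.log(1.0 - p) for x in observations)


data = [1, 1, 0, 1]
log_lik = bernoulli_log_likelihood(data, p=0.75)

# The same quantity computed from the product of probabilities.
likelihood = 0.75 * 0.75 * 0.25 * 0.75
print(log_lik, math.log(likelihood))
```

Comparing `bernoulli_log_likelihood(data, 0.75)` with `bernoulli_log_likelihood(data, 0.5)` shows the higher value at 0.75, the sample mean of `data`, which is the kind of comparison parameter estimation relies on.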

image_to_video_generation

def image_to_video_generation(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> ImageToVideoGeneration

[view_source]

The Image To Video Generation function transforms a series of static images into a cohesive, dynamic video sequence, often incorporating transitions, effects, and synchronization with audio to create a visually engaging narrative.

part_of_speech_tagging

def part_of_speech_tagging(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> PartOfSpeechTagging

[view_source]

Part of Speech Tagging is a natural language processing task that involves assigning each word in a sentence its corresponding part of speech, such as noun, verb, adjective, or adverb, based on its role and context within the sentence.

benchmark_scoring_asr

def benchmark_scoring_asr(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> BenchmarkScoringAsr

[view_source]

Benchmark Scoring ASR is a function that evaluates and compares the performance of automatic speech recognition systems by analyzing their accuracy, speed, and other relevant metrics against a standardized set of benchmarks.
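
Word error rate (WER) is a widely used accuracy measure for speech recognition output. Whether this function reports WER specifically is an assumption; the sketch below only shows how the measure is computed, using word-level edit distance.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    The reference must contain at least one word.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    # prev[j] holds the edit distance between the processed reference
    # prefix and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


wer = word_error_rate("the cat sat on the mat", "the cat sat on mat")
print(wer)
```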

visual_question_answering

def visual_question_answering(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> VisualQuestionAnswering

[view_source]

Visual Question Answering (VQA) is a task in artificial intelligence that involves analyzing an image and providing accurate, contextually relevant answers to questions posed about the visual content of that image.

document_information_extraction

def document_information_extraction(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> DocumentInformationExtraction

[view_source]

Document Information Extraction is the process of automatically identifying, extracting, and structuring relevant data from unstructured or semi-structured documents, such as invoices, receipts, contracts, and forms, to facilitate easier data management and analysis.

video_generation

def video_generation(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> VideoGeneration

[view_source]

Produces video content based on specific inputs or datasets. Can be used for simulations, animations, or even deepfake detection.

multi_class_image_classification

def multi_class_image_classification(
asset_id: Union[str, asset.Asset], *args,
**kwargs) -> MultiClassImageClassification

[view_source]

Multi Class Image Classification is a machine learning task where an algorithm is trained to categorize images into one of several predefined classes or categories based on their visual content.

style_transfer

def style_transfer(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> StyleTransfer

[view_source]

Style Transfer is a technique in artificial intelligence that applies the visual style of one image (such as the brushstrokes of a famous painting) to the content of another image, effectively blending the artistic elements of the first image with the subject matter of the second.

multi_class_text_classification

def multi_class_text_classification(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> MultiClassTextClassification

[view_source]

Multi Class Text Classification is a natural language processing task that involves categorizing a given text into one of several predefined classes or categories based on its content.

intent_classification

def intent_classification(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> IntentClassification

[view_source]

Intent Classification is a natural language processing task that involves analyzing and categorizing user text input to determine the underlying purpose or goal behind the communication, such as booking a flight, asking for weather information, or setting a reminder.

multi_label_text_classification

def multi_label_text_classification(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> MultiLabelTextClassification

[view_source]

Multi Label Text Classification is a natural language processing task where a given text is analyzed and assigned multiple relevant labels or categories from a predefined set, allowing for the text to belong to more than one category simultaneously.
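
The difference between the multi-class and multi-label tasks comes down to how per-label scores are turned into a decision. In the sketch below the scores, the label names, and the 0.5 threshold are invented for illustration: multi-class keeps the single best label, while multi-label keeps every label above the threshold.

```python
from typing import Dict, List


def multi_class(scores: Dict[str, float]) -> str:
    """Pick exactly one label: the one with the highest score."""
    return max(scores, key=lambda label: scores[label])


def multi_label(scores: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Keep every label whose score reaches the threshold."""
    return [label for label, score in scores.items() if score >= threshold]


scores = {"sports": 0.7, "politics": 0.6, "cooking": 0.1}
print(multi_class(scores))
print(multi_label(scores))
```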

text_reconstruction

def text_reconstruction(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> TextReconstruction

[view_source]

Text Reconstruction is a process that involves piecing together fragmented or incomplete text data to restore it to its original, coherent form.

fact_checking

def fact_checking(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> FactChecking

[view_source]

Fact Checking is the process of verifying the accuracy and truthfulness of information, statements, or claims by cross-referencing with reliable sources and evidence.

inverse_text_normalization

def inverse_text_normalization(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> InverseTextNormalization

[view_source]

Inverse Text Normalization is the process of converting text in its spoken or normalized form, such as spelled-out numbers, dates, and abbreviations, into the conventional written representation (for example, "twenty three dollars" becomes "$23").

text_to_audio

def text_to_audio(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> TextToAudio

[view_source]

The Text to Audio function converts written text into spoken words, allowing users to listen to the content instead of reading it.

image_compression

def image_compression(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> ImageCompression

[view_source]

Reduces the size of image files without significantly compromising their visual quality. Useful for optimizing storage and improving webpage load times.

multilingual_speech_recognition

def multilingual_speech_recognition(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> MultilingualSpeechRecognition

[view_source]

Multilingual Speech Recognition is a technology that automatically transcribes spoken language into text across multiple languages.

text_generation_metric_default

def text_generation_metric_default(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> TextGenerationMetricDefault

[view_source]

The "Text Generation Metric Default" function provides a standard set of evaluation metrics for assessing the quality and performance of text generation models.

referenceless_text_generation_metric

def referenceless_text_generation_metric(
asset_id: Union[str, asset.Asset], *args,
**kwargs) -> ReferencelessTextGenerationMetric

[view_source]

The Referenceless Text Generation Metric is a method for evaluating the quality of generated text without requiring a reference text for comparison, often leveraging models or algorithms to assess coherence, relevance, and fluency based on intrinsic properties of the text itself.

audio_emotion_detection

def audio_emotion_detection(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> AudioEmotionDetection

[view_source]

Audio Emotion Detection is a technology that analyzes vocal characteristics and patterns in audio recordings to identify and classify the emotional state of the speaker.

keyword_spotting

def keyword_spotting(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> KeywordSpotting

[view_source]

Keyword Spotting is a function that enables the detection and identification of specific words or phrases within a stream of audio, often used in voice-activated systems to trigger actions or commands based on recognized keywords.

text_summarization

def text_summarization(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> TextSummarization

[view_source]

Extracts the main points from a larger body of text, producing a concise summary without losing the primary message.

split_on_linebreak

def split_on_linebreak(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> SplitOnLinebreak

[view_source]

The "Split On Linebreak" function divides a given string into a list of substrings, using linebreaks (newline characters) as the points of separation.
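The behavior described above corresponds closely to Python's built-in `str.splitlines`. The sketch below is a local stand-in, not the node's implementation: the helper name `split_lines_locally` is hypothetical, and how the actual node treats empty lines or trailing newlines is not stated here.

```python
from typing import List


def split_lines_locally(text: str) -> List[str]:
    """Split text into substrings at line boundaries.

    str.splitlines treats "\n", "\r\n", and "\r" (among others) as
    separators and does not return a trailing empty string.
    """
    return text.splitlines()


print(split_lines_locally("first line\nsecond line\r\nthird line"))
```

Consecutive linebreaks produce empty strings in the result, so callers that want only non-empty lines would need to filter them out.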

other__multipurpose_

def other__multipurpose_(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> OtherMultipurpose

[view_source]

The "Other (Multipurpose)" function is a catch-all category for tasks that do not fit any of the predefined function types.

speaker_diarization_audio

def speaker_diarization_audio(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> SpeakerDiarizationAudio

[view_source]

Identifies individual speakers and their respective speech segments within an audio clip. Ideal for multi-speaker recordings or conference calls.

image_content_moderation

def image_content_moderation(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> ImageContentModeration

[view_source]

Detects and filters out inappropriate or harmful images, essential for platforms with user-generated visual content.

text_denormalization

def text_denormalization(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> TextDenormalization

[view_source]

Converts standardized or normalized text into its original, often more readable, form. Useful in natural language generation tasks.

speaker_diarization_video

def speaker_diarization_video(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> SpeakerDiarizationVideo

[view_source]

Segments a video based on different speakers, identifying when each individual speaks. Useful for transcriptions and understanding multi-person conversations.

text_to_video_generation

def text_to_video_generation(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> TextToVideoGeneration

[view_source]

Text To Video Generation is a process that converts written descriptions or scripts into video content using generative AI models.