aixplain.modules.pipeline.pipeline
ObjectDetection Objects
class ObjectDetection(AssetNode[ObjectDetectionInputs,
ObjectDetectionOutputs])
Object Detection is a computer vision technology that identifies and locates objects within an image, typically by drawing bounding boxes around the detected objects and classifying them into predefined categories.
InputType: video OutputType: text
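Each class in this reference is declared as a base class parametrized by an inputs type and an outputs type. The sketch below illustrates that pattern in isolation. `DemoNode`, `DemoInputs`, `DemoOutputs`, and `DemoDetection` are hypothetical stand-ins invented for this example; they are not the aiXplain classes and make no claim about how the SDK implements `AssetNode`.

```python
from dataclasses import dataclass
from typing import Generic, TypeVar

TIn = TypeVar("TIn")
TOut = TypeVar("TOut")


@dataclass
class DemoInputs:
    """Hypothetical container for a node's input parameters."""
    video: str


@dataclass
class DemoOutputs:
    """Hypothetical container for a node's output parameters."""
    data: str


class DemoNode(Generic[TIn, TOut]):
    """Hypothetical base node, generic over its inputs and outputs types."""
    input_type = "unknown"
    output_type = "unknown"

    def describe(self) -> str:
        # Summarize the node in the same terms as the InputType/OutputType lines.
        return f"{type(self).__name__}: {self.input_type} -> {self.output_type}"


class DemoDetection(DemoNode[DemoInputs, DemoOutputs]):
    """Mirrors the shape of a declaration such as ObjectDetection above."""
    input_type = "video"
    output_type = "text"


description = DemoDetection().describe()
print(description)
```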
TextEmbedding Objects
class TextEmbedding(AssetNode[TextEmbeddingInputs, TextEmbeddingOutputs])
Text embedding is a process that converts text into numerical vectors, capturing the semantic meaning and contextual relationships of words or phrases, enabling machines to understand and analyze natural language more effectively.
InputType: text OutputType: text
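To make the idea of "text as a numerical vector" concrete, here is a toy sketch that uses word counts over a fixed vocabulary. This is an assumption made for illustration only: a real embedding model is learned and captures meaning far better than counts do, and the helper names `embed` and `cosine_similarity` are not part of the SDK.

```python
import math
from collections import Counter


def embed(text: str, vocabulary: list[str]) -> list[float]:
    """Toy embedding: count how often each vocabulary word occurs in the text."""
    counts = Counter(text.lower().split())
    return [float(counts[word]) for word in vocabulary]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


vocabulary = ["cat", "dog", "sat", "ran"]
v1 = embed("the cat sat", vocabulary)
v2 = embed("a cat sat down", vocabulary)
v3 = embed("the dog ran", vocabulary)

# Texts sharing vocabulary words score higher than texts that share none.
print(cosine_similarity(v1, v2), cosine_similarity(v1, v3))
```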
SemanticSegmentation Objects
class SemanticSegmentation(AssetNode[SemanticSegmentationInputs,
SemanticSegmentationOutputs])
Semantic segmentation is a computer vision process that involves classifying each pixel in an image into a predefined category, effectively partitioning the image into meaningful segments based on the objects or regions they represent.
InputType: image OutputType: label
ReferencelessAudioGenerationMetric Objects
class ReferencelessAudioGenerationMetric(
BaseMetric[ReferencelessAudioGenerationMetricInputs,
ReferencelessAudioGenerationMetricOutputs])
The Referenceless Audio Generation Metric is a tool designed to evaluate the quality of generated audio content without the need for a reference or original audio sample for comparison.
InputType: text OutputType: text
ScriptExecution Objects
class ScriptExecution(AssetNode[ScriptExecutionInputs,
ScriptExecutionOutputs])
Script Execution refers to the process of running a set of programmed instructions or code within a computing environment, enabling the automated performance of tasks, calculations, or operations as defined by the script.
InputType: text OutputType: text
ImageImpainting Objects
class ImageImpainting(AssetNode[ImageImpaintingInputs,
ImageImpaintingOutputs])
Image inpainting is a process that involves filling in missing or damaged parts of an image in a way that is visually coherent and seamlessly blends with the surrounding areas, often using advanced algorithms and techniques to restore the image to its original or intended appearance.
InputType: image OutputType: image
ImageEmbedding Objects
class ImageEmbedding(AssetNode[ImageEmbeddingInputs, ImageEmbeddingOutputs])
Image Embedding is a process that transforms an image into a fixed-dimensional vector representation, capturing its essential features and enabling efficient comparison, retrieval, and analysis in various machine learning and computer vision tasks.
InputType: image OutputType: text
MetricAggregation Objects
class MetricAggregation(BaseMetric[MetricAggregationInputs,
MetricAggregationOutputs])
Metric Aggregation is a function that computes and summarizes numerical data by applying statistical operations, such as averaging, summing, or finding the minimum and maximum values, to provide insights and facilitate analysis of large datasets.
InputType: text OutputType: text
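The statistical operations named above can be sketched with the Python standard library. The function name `aggregate` and the shape of its result are assumptions for illustration, not the node's actual interface.

```python
from statistics import mean


def aggregate(scores: list[float]) -> dict[str, float]:
    """Summarize a list of metric scores with a few basic statistics."""
    return {
        "mean": mean(scores),
        "sum": sum(scores),
        "min": min(scores),
        "max": max(scores),
    }


summary = aggregate([0.5, 0.75, 1.0])
print(summary)
```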
SpeechTranslation Objects
class SpeechTranslation(AssetNode[SpeechTranslationInputs,
SpeechTranslationOutputs])
Speech Translation is a technology that converts spoken language from one language to another in real time, enabling seamless communication between speakers of different languages.
InputType: audio OutputType: text
DepthEstimation Objects
class DepthEstimation(AssetNode[DepthEstimationInputs,
DepthEstimationOutputs])
Depth estimation is a computational process that determines the distance of objects from a viewpoint, typically using visual data from cameras or sensors to create a three-dimensional understanding of a scene.
InputType: image OutputType: text
NoiseRemoval Objects
class NoiseRemoval(AssetNode[NoiseRemovalInputs, NoiseRemovalOutputs])
Noise Removal is a process that involves identifying and eliminating unwanted random variations or disturbances from an audio signal to enhance the clarity and quality of the underlying information.
InputType: audio OutputType: audio
Diacritization Objects
class Diacritization(AssetNode[DiacritizationInputs, DiacritizationOutputs])
Adds diacritical marks to text, essential for languages where meaning can change based on diacritics.
InputType: text OutputType: text
AudioTranscriptAnalysis Objects
class AudioTranscriptAnalysis(AssetNode[AudioTranscriptAnalysisInputs,
AudioTranscriptAnalysisOutputs])
Analyzes transcribed audio data for insights, patterns, or specific information extraction.
InputType: audio OutputType: text
ExtractAudioFromVideo Objects
class ExtractAudioFromVideo(AssetNode[ExtractAudioFromVideoInputs,
ExtractAudioFromVideoOutputs])
Isolates and extracts audio tracks from video files, aiding in audio analysis or transcription tasks.
InputType: video OutputType: audio
AudioReconstruction Objects
class AudioReconstruction(BaseReconstructor[AudioReconstructionInputs,
AudioReconstructionOutputs])
Audio Reconstruction is the process of restoring or recreating audio signals from incomplete, damaged, or degraded recordings to achieve a high-quality, accurate representation of the original sound.
InputType: audio OutputType: audio
ClassificationMetric Objects
class ClassificationMetric(BaseMetric[ClassificationMetricInputs,
ClassificationMetricOutputs])
A Classification Metric is a quantitative measure used to evaluate the quality and effectiveness of classification models.
InputType: text OutputType: text
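As an example of what such a measure can look like, the sketch below computes accuracy, precision, and recall for one positive class. Which metrics the node actually provides is not stated here; the function `classification_metrics` is a hypothetical helper.

```python
def classification_metrics(y_true: list[str], y_pred: list[str], positive: str) -> dict[str, float]:
    """Accuracy, precision, and recall with respect to one positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives
    correct = sum(1 for t, p in pairs if t == p)
    return {
        "accuracy": correct / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


y_true = ["spam", "spam", "ham", "ham"]
y_pred = ["spam", "ham", "ham", "spam"]
metrics = classification_metrics(y_true, y_pred, positive="spam")
print(metrics)
```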
TextGenerationMetric Objects
class TextGenerationMetric(BaseMetric[TextGenerationMetricInputs,
TextGenerationMetricOutputs])
A Text Generation Metric is a quantitative measure used to evaluate the quality and effectiveness of text produced by natural language processing models, often assessing aspects such as coherence, relevance, fluency, and adherence to given prompts or instructions.
InputType: text OutputType: text
TextSpamDetection Objects
class TextSpamDetection(AssetNode[TextSpamDetectionInputs,
TextSpamDetectionOutputs])
Identifies and filters out unwanted or irrelevant text content, ideal for moderating user-generated content or ensuring quality in communication platforms.
InputType: text OutputType: label
TextToImageGeneration Objects
class TextToImageGeneration(AssetNode[TextToImageGenerationInputs,
TextToImageGenerationOutputs])
Creates a visual representation based on textual input, turning descriptions into pictorial forms. Used in creative processes and content generation.
InputType: text OutputType: image
VoiceCloning Objects
class VoiceCloning(AssetNode[VoiceCloningInputs, VoiceCloningOutputs])
Replicates a person's voice based on a sample, allowing for the generation of speech in that person's tone and style. Use with caution because of ethical considerations.
InputType: text OutputType: audio
TextSegmenation Objects
class TextSegmenation(AssetNode[TextSegmenationInputs,
TextSegmenationOutputs])
Text Segmentation is the process of dividing a continuous text into meaningful units, such as words, sentences, or topics, to facilitate easier analysis and understanding.
InputType: text OutputType: text
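One simple form of segmentation is splitting text into sentences. The sketch below does this with a regular expression that breaks after sentence-ending punctuation. It is a naive rule (it would split after abbreviations such as "e.g."), and `split_sentences` is a hypothetical helper rather than the node's implementation.

```python
import re


def split_sentences(text: str) -> list[str]:
    """Split after '.', '!' or '?' when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


sentences = split_sentences("Hello world. How are you? Fine!")
print(sentences)
```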
BenchmarkScoringMt Objects
class BenchmarkScoringMt(AssetNode[BenchmarkScoringMtInputs,
BenchmarkScoringMtOutputs])
Benchmark Scoring MT is a function designed to evaluate and score machine translation systems by comparing their output against a set of predefined benchmarks, thereby assessing their accuracy and performance.
InputType: text OutputType: label
ImageManipulation Objects
class ImageManipulation(AssetNode[ImageManipulationInputs,
ImageManipulationOutputs])
Image Manipulation refers to the process of altering or enhancing digital images using various techniques and tools to achieve desired visual effects, correct imperfections, or transform the image's appearance.
InputType: image OutputType: image
NamedEntityRecognition Objects
class NamedEntityRecognition(AssetNode[NamedEntityRecognitionInputs,
NamedEntityRecognitionOutputs])
Identifies and classifies named entities (e.g., persons, organizations, locations) within text. Useful for information extraction, content tagging, and search enhancements.
InputType: text OutputType: label
OffensiveLanguageIdentification Objects
class OffensiveLanguageIdentification(
AssetNode[OffensiveLanguageIdentificationInputs,
OffensiveLanguageIdentificationOutputs])
Detects language or phrases that might be considered offensive, aiding in content moderation and creating respectful user interactions.
InputType: text OutputType: label
Search Objects
class Search(AssetNode[SearchInputs, SearchOutputs])
An algorithm that identifies and returns data or items that match particular keywords or conditions from a dataset. A fundamental tool for databases and websites.
InputType: text OutputType: text
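In its simplest form, keyword matching over a small dataset can be sketched as a case-insensitive substring filter. The function `search` below is a hypothetical illustration and says nothing about how the node ranks or retrieves results.

```python
def search(items: list[str], keyword: str) -> list[str]:
    """Return the items that contain the keyword, ignoring case."""
    needle = keyword.lower()
    return [item for item in items if needle in item.lower()]


catalog = ["Speech Recognition", "Speech Synthesis", "Image Captioning"]
matches = search(catalog, "speech")
print(matches)
```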
SentimentAnalysis Objects
class SentimentAnalysis(AssetNode[SentimentAnalysisInputs,
SentimentAnalysisOutputs])
Determines the sentiment or emotion (e.g., positive, negative, neutral) of a piece of text, aiding in understanding user feedback or market sentiment.
InputType: text OutputType: label
ImageColorization Objects
class ImageColorization(AssetNode[ImageColorizationInputs,
ImageColorizationOutputs])
Image colorization is a process that involves adding color to grayscale images, transforming them from black-and-white to full-color representations, often using advanced algorithms and machine learning techniques to predict and apply the appropriate hues and shades.
InputType: image OutputType: image
SpeechClassification Objects
class SpeechClassification(AssetNode[SpeechClassificationInputs,
SpeechClassificationOutputs])
Categorizes audio clips based on their content, aiding in content organization and targeted actions.
InputType: audio OutputType: label
DialectDetection Objects
class DialectDetection(AssetNode[DialectDetectionInputs,
DialectDetectionOutputs])
Identifies specific dialects within a language, aiding in localized content creation or user experience personalization.
InputType: audio OutputType: text
VideoLabelDetection Objects
class VideoLabelDetection(AssetNode[VideoLabelDetectionInputs,
VideoLabelDetectionOutputs])
Identifies and tags objects, scenes, or activities within a video. Useful for content indexing and recommendation systems.
InputType: video OutputType: label
SpeechSynthesis Objects
class SpeechSynthesis(AssetNode[SpeechSynthesisInputs,
SpeechSynthesisOutputs])
Generates human-like speech from written text. Ideal for text-to-speech applications, audiobooks, and voice assistants.
InputType: text OutputType: audio
SplitOnSilence Objects
class SplitOnSilence(AssetNode[SplitOnSilenceInputs, SplitOnSilenceOutputs])
The "Split On Silence" function divides an audio recording into separate segments based on periods of silence, allowing for easier editing and analysis of individual sections.
InputType: audio OutputType: audio
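The idea of cutting a recording at silent stretches can be sketched on a plain list of amplitude samples. The parameters `threshold` and `min_silence` are assumptions chosen for this example; the node's real parameters are not documented here.

```python
def split_on_silence(samples: list[float], threshold: float = 0.05, min_silence: int = 3) -> list[tuple[int, int]]:
    """Return (start, end) index pairs of non-silent segments; end is exclusive.

    A segment ends once at least `min_silence` consecutive samples fall
    below `threshold` in absolute value.
    """
    segments = []
    start = None    # start index of the segment currently being collected
    quiet_run = 0   # number of consecutive quiet samples seen so far
    for i, sample in enumerate(samples):
        if abs(sample) >= threshold:
            if start is None:
                start = i
            quiet_run = 0
        elif start is not None:
            quiet_run += 1
            if quiet_run >= min_silence:
                # Close the segment at the first quiet sample of this run.
                segments.append((start, i - quiet_run + 1))
                start = None
                quiet_run = 0
    if start is not None:
        # The recording ended inside a segment; trim trailing quiet samples.
        segments.append((start, len(samples) - quiet_run))
    return segments


samples = [0.0, 0.5, 0.6, 0.0, 0.0, 0.0, 0.4, 0.3, 0.0]
segments = split_on_silence(samples)
print(segments)
```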
ExpressionDetection Objects
class ExpressionDetection(AssetNode[ExpressionDetectionInputs,
ExpressionDetectionOutputs])
Expression Detection is the process of identifying and analyzing facial expressions to interpret emotions or intentions using AI and computer vision techniques.
InputType: text OutputType: label
AutoMaskGeneration Objects
class AutoMaskGeneration(AssetNode[AutoMaskGenerationInputs,
AutoMaskGenerationOutputs])
Auto-mask generation refers to the automated process of creating masks in image processing or computer vision, typically for segmentation tasks. A mask is a binary or multi-class image that labels different parts of an image, usually separating the foreground (objects of interest) from the background, or identifying specific object classes in an image.
InputType: image OutputType: label
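A binary mask of the kind described above can be illustrated with a fixed intensity threshold on a tiny grayscale image. Real auto-mask generation relies on trained models; the thresholding rule and the name `binary_mask` are assumptions for this sketch only.

```python
def binary_mask(image: list[list[int]], threshold: int) -> list[list[int]]:
    """Label each pixel 1 (foreground) if brighter than the threshold, else 0."""
    return [[1 if pixel > threshold else 0 for pixel in row] for row in image]


image = [
    [10, 200],
    [180, 30],
]
mask = binary_mask(image, threshold=128)
print(mask)
```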
DocumentImageParsing Objects
class DocumentImageParsing(AssetNode[DocumentImageParsingInputs,
DocumentImageParsingOutputs])
Document Image Parsing is the process of analyzing and converting scanned or photographed images of documents into structured, machine-readable formats by identifying and extracting text, layout, and other relevant information.
InputType: image OutputType: text
EntityLinking Objects
class EntityLinking(AssetNode[EntityLinkingInputs, EntityLinkingOutputs])
Associates identified entities in the text with specific entries in a knowledge base or database.
InputType: text OutputType: label
ReferencelessTextGenerationMetricDefault Objects
class ReferencelessTextGenerationMetricDefault(
BaseMetric[ReferencelessTextGenerationMetricDefaultInputs,
ReferencelessTextGenerationMetricDefaultOutputs])
The Referenceless Text Generation Metric Default is a function designed to evaluate the quality of generated text without relying on reference texts for comparison.
InputType: text OutputType: text
FillTextMask Objects
class FillTextMask(AssetNode[FillTextMaskInputs, FillTextMaskOutputs])
Completes missing parts of a text based on the context, ideal for content generation or data augmentation tasks.
InputType: text OutputType: text
SubtitlingTranslation Objects
class SubtitlingTranslation(AssetNode[SubtitlingTranslationInputs,
SubtitlingTranslationOutputs])
Converts the text of subtitles from one language to another, ensuring context and cultural nuances are maintained. Essential for global content distribution.
InputType: text OutputType: text
InstanceSegmentation Objects
class InstanceSegmentation(AssetNode[InstanceSegmentationInputs,
InstanceSegmentationOutputs])
Instance segmentation is a computer vision task that involves detecting and delineating each distinct object within an image, assigning a unique label and precise boundary to every individual instance of objects, even if they belong to the same category.
InputType: image OutputType: label
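To show what "a unique label for every individual instance" means, the sketch below takes a binary foreground mask and gives each connected region its own integer label, using a flood fill over 4-connected neighbors. This is a classical stand-in, not how a learned instance segmentation model works, and `label_instances` is a hypothetical helper.

```python
def label_instances(mask: list[list[int]]) -> list[list[int]]:
    """Assign 1, 2, 3, ... to each 4-connected foreground region; 0 is background."""
    height, width = len(mask), len(mask[0])
    labels = [[0] * width for _ in range(height)]
    next_label = 0
    for row in range(height):
        for col in range(width):
            if mask[row][col] == 1 and labels[row][col] == 0:
                # Found an unlabeled foreground pixel: start a new instance.
                next_label += 1
                labels[row][col] = next_label
                stack = [(row, col)]
                while stack:
                    y, x = stack.pop()
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        inside = 0 <= ny < height and 0 <= nx < width
                        if inside and mask[ny][nx] == 1 and labels[ny][nx] == 0:
                            labels[ny][nx] = next_label
                            stack.append((ny, nx))
    return labels


mask = [
    [1, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 1],
]
instances = label_instances(mask)
print(instances)
```

Both regions belong to the same foreground class in the mask, yet each receives a distinct label, which is the difference from semantic segmentation.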
VisemeGeneration Objects
class VisemeGeneration(AssetNode[VisemeGenerationInputs,
VisemeGenerationOutputs])
Viseme Generation is the process of creating visual representations of phonemes, which are the distinct units of sound in speech, to synchronize lip movements with spoken words in animations or virtual avatars.
InputType: text OutputType: label
AudioGenerationMetric Objects
class AudioGenerationMetric(BaseMetric[AudioGenerationMetricInputs,
AudioGenerationMetricOutputs])
The Audio Generation Metric is a quantitative measure used to evaluate the quality, accuracy, and overall performance of audio generated by artificial intelligence systems, often considering factors such as fidelity, intelligibility, and similarity to human-produced audio.
InputType: text OutputType: text
VideoUnderstanding Objects
class VideoUnderstanding(AssetNode[VideoUnderstandingInputs,
VideoUnderstandingOutputs])
Video Understanding is the process of analyzing and interpreting video content to extract meaningful information, such as identifying objects, actions, events, and contextual relationships within the footage.
InputType: video OutputType: text
TextNormalization Objects
class TextNormalization(AssetNode[TextNormalizationInputs,
TextNormalizationOutputs])
Converts unstructured or non-standard textual data into a more readable and uniform format, dealing with abbreviations, numerals, and other non-standard words.
InputType: text OutputType: label
AsrQualityEstimation Objects
class AsrQualityEstimation(AssetNode[AsrQualityEstimationInputs,
AsrQualityEstimationOutputs])
ASR Quality Estimation is a process that evaluates the accuracy and reliability of automatic speech recognition systems by analyzing their performance in transcribing spoken language into text.
InputType: text OutputType: label
VoiceActivityDetection Objects
class VoiceActivityDetection(BaseSegmentor[VoiceActivityDetectionInputs,
VoiceActivityDetectionOutputs])
Determines when a person is speaking in an audio clip. It's an essential preprocessing step for other audio-related tasks.
InputType: audio OutputType: audio
SpeechNonSpeechClassification Objects
class SpeechNonSpeechClassification(
AssetNode[SpeechNonSpeechClassificationInputs,
SpeechNonSpeechClassificationOutputs])
Differentiates between speech and non-speech audio segments. Great for editing software and transcription services to exclude irrelevant audio.
InputType: audio OutputType: label
AudioTranscriptImprovement Objects
class AudioTranscriptImprovement(AssetNode[AudioTranscriptImprovementInputs,
AudioTranscriptImprovementOutputs])
Refines and corrects transcriptions generated from audio data, improving readability and accuracy.
InputType: audio OutputType: text
TextContentModeration Objects
class TextContentModeration(AssetNode[TextContentModerationInputs,
TextContentModerationOutputs])
Scans and identifies potentially harmful, offensive, or inappropriate textual content, ensuring safer user environments.
InputType: text OutputType: label
EmotionDetection Objects
class EmotionDetection(AssetNode[EmotionDetectionInputs,
EmotionDetectionOutputs])
Identifies human emotions from text or audio, enhancing user experience in chatbots or customer feedback analysis.
InputType: text OutputType: label
AudioForcedAlignment Objects
class AudioForcedAlignment(AssetNode[AudioForcedAlignmentInputs,
AudioForcedAlignmentOutputs])
Synchronizes phonetic and phonological text with the corresponding segments in an audio file. Useful in linguistic research and detailed transcription tasks.
InputType: audio OutputType: audio
VideoContentModeration Objects
class VideoContentModeration(AssetNode[VideoContentModerationInputs,
VideoContentModerationOutputs])
Automatically reviews video content to detect and possibly remove inappropriate or harmful material. Essential for user-generated content platforms.
InputType: video OutputType: label
ImageLabelDetection Objects
class ImageLabelDetection(AssetNode[ImageLabelDetectionInputs,
ImageLabelDetectionOutputs])
Identifies objects, themes, or topics within images, useful for image categorization, search, and recommendation systems.
InputType: image OutputType: label
VideoForcedAlignment Objects
class VideoForcedAlignment(AssetNode[VideoForcedAlignmentInputs,
VideoForcedAlignmentOutputs])
Aligns the transcription of spoken content in a video with its corresponding timecodes, facilitating subtitle creation.
InputType: video OutputType: video
TextGeneration Objects
class TextGeneration(AssetNode[TextGenerationInputs, TextGenerationOutputs])
Creates coherent and contextually relevant textual content based on prompts or certain parameters. Useful for chatbots, content creation, and data augmentation.
InputType: text OutputType: text
TextClassification Objects
class TextClassification(AssetNode[TextClassificationInputs,
TextClassificationOutputs])
Categorizes text into predefined groups or topics, facilitating content organization and targeted actions.
InputType: text OutputType: label
SpeechEmbedding Objects
class SpeechEmbedding(AssetNode[SpeechEmbeddingInputs,
SpeechEmbeddingOutputs])
Transforms spoken content into a fixed-size vector in a high-dimensional space that captures the content's essence. Facilitates tasks like speech recognition and speaker verification.
InputType: audio OutputType: text
TopicClassification Objects
class TopicClassification(AssetNode[TopicClassificationInputs,
TopicClassificationOutputs])
Assigns categories or topics to a piece of text based on its content, facilitating content organization and retrieval.
InputType: text OutputType: label
Translation Objects
class Translation(AssetNode[TranslationInputs, TranslationOutputs])
Converts text from one language to another while maintaining the original message's essence and context. Crucial for global communication.
InputType: text OutputType: text
SpeechRecognition Objects
class SpeechRecognition(AssetNode[SpeechRecognitionInputs,
SpeechRecognitionOutputs])
Converts spoken language into written text. Useful for transcription services, voice assistants, and applications requiring voice-to-text capabilities.
InputType: audio OutputType: text
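The InputType and OutputType lines in this reference suggest which nodes can feed one another. The sketch below copies a few of those pairs into a dictionary and checks whether one node's output type matches the next node's input type. It is purely illustrative: `NODE_TYPES` and `can_chain` are hypothetical and do not represent the SDK's own linking or validation logic.

```python
# (InputType, OutputType) pairs copied from the entries in this reference.
NODE_TYPES = {
    "SpeechRecognition": ("audio", "text"),
    "Translation": ("text", "text"),
    "SpeechSynthesis": ("text", "audio"),
    "ImageCaptioning": ("image", "text"),
}


def can_chain(upstream: str, downstream: str) -> bool:
    """True if the upstream output type equals the downstream input type."""
    return NODE_TYPES[upstream][1] == NODE_TYPES[downstream][0]


# Speech-to-speech translation: audio -> text -> text -> audio.
chain = ["SpeechRecognition", "Translation", "SpeechSynthesis"]
chain_is_valid = all(can_chain(a, b) for a, b in zip(chain, chain[1:]))
print(chain_is_valid)
```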
Subtitling Objects
class Subtitling(AssetNode[SubtitlingInputs, SubtitlingOutputs])
Generates accurate subtitles for videos, enhancing accessibility for diverse audiences.
InputType: audio OutputType: text
ImageCaptioning Objects
class ImageCaptioning(AssetNode[ImageCaptioningInputs,
ImageCaptioningOutputs])
Image Captioning is a process that involves generating a textual description of an image, typically using machine learning models to analyze the visual content and produce coherent and contextually relevant sentences that describe the objects, actions, and scenes depicted in the image.
InputType: image OutputType: text
AudioLanguageIdentification Objects
class AudioLanguageIdentification(
AssetNode[AudioLanguageIdentificationInputs,
AudioLanguageIdentificationOutputs])
Audio Language Identification is a process that involves analyzing an audio recording to determine the language being spoken.
InputType: audio OutputType: label
VideoEmbedding Objects
class VideoEmbedding(AssetNode[VideoEmbeddingInputs, VideoEmbeddingOutputs])
Video Embedding is a process that transforms video content into a fixed-dimensional vector representation, capturing essential features and patterns to facilitate tasks such as retrieval, classification, and recommendation.
InputType: video OutputType: embedding
AsrAgeClassification Objects
class AsrAgeClassification(AssetNode[AsrAgeClassificationInputs,
AsrAgeClassificationOutputs])
The ASR Age Classification function is designed to analyze audio recordings of speech to determine the speaker's age group by leveraging automatic speech recognition (ASR) technology and machine learning algorithms.
InputType: audio OutputType: label
AudioIntentDetection Objects
class AudioIntentDetection(AssetNode[AudioIntentDetectionInputs,
AudioIntentDetectionOutputs])
Audio Intent Detection is a process that involves analyzing audio signals to identify and interpret the underlying intentions or purposes behind spoken words, enabling systems to understand and respond appropriately to human speech.
InputType: audio OutputType: label
LanguageIdentification Objects
class LanguageIdentification(AssetNode[LanguageIdentificationInputs,
LanguageIdentificationOutputs])
Detects the language in which a given text is written, aiding in multilingual platforms or content localization.
InputType: text OutputType: text
Ocr Objects
class Ocr(AssetNode[OcrInputs, OcrOutputs])
Converts images of typed, handwritten, or printed text into machine-encoded text. Used in digitizing printed texts for data retrieval.
InputType: image OutputType: text
AsrGenderClassification Objects
class AsrGenderClassification(AssetNode[AsrGenderClassificationInputs,
AsrGenderClassificationOutputs])
The ASR Gender Classification function analyzes audio recordings to determine and classify the speaker's gender based on their voice characteristics.
InputType: audio OutputType: label
LanguageIdentificationAudio Objects
class LanguageIdentificationAudio(
AssetNode[LanguageIdentificationAudioInputs,
LanguageIdentificationAudioOutputs])
The Language Identification Audio function analyzes audio input to determine and identify the language being spoken.
InputType: audio OutputType: label
BaseModel Objects
class BaseModel(AssetNode[BaseModelInputs, BaseModelOutputs])
The Base-Model function serves as a foundational framework designed to provide essential features and capabilities upon which more specialized or advanced models can be built and customized.
InputType: text OutputType: text
Loglikelihood Objects
class Loglikelihood(AssetNode[LoglikelihoodInputs, LoglikelihoodOutputs])
The Log Likelihood function measures the probability of observing the given data under a specific statistical model by taking the natural logarithm of the likelihood function, thereby transforming the product of probabilities into a sum, which simplifies the process of optimization and parameter estimation.
InputType: text OutputType: number
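The statement that the logarithm turns a product of probabilities into a sum can be checked numerically. The sketch assumes independent observations with known probabilities; `log_likelihood` is a hypothetical helper, not the node's interface.

```python
import math


def log_likelihood(probabilities: list[float]) -> float:
    """Sum of natural logs of the per-observation probabilities."""
    return sum(math.log(p) for p in probabilities)


probabilities = [0.5, 0.25, 0.125]
total = log_likelihood(probabilities)

# The same value computed as the log of the product of probabilities.
log_of_product = math.log(math.prod(probabilities))
print(total, log_of_product)
```

Summing logs also avoids the numerical underflow that multiplying many small probabilities would cause.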
ImageToVideoGeneration Objects
class ImageToVideoGeneration(AssetNode[ImageToVideoGenerationInputs,
ImageToVideoGenerationOutputs])
The Image To Video Generation function transforms a series of static images into a cohesive, dynamic video sequence, often incorporating transitions, effects, and synchronization with audio to create a visually engaging narrative.
InputType: image OutputType: video
PartOfSpeechTagging Objects
class PartOfSpeechTagging(AssetNode[PartOfSpeechTaggingInputs,
PartOfSpeechTaggingOutputs])
Part of Speech Tagging is a natural language processing task that involves assigning each word in a sentence its corresponding part of speech, such as noun, verb, adjective, or adverb, based on its role and context within the sentence.
InputType: text OutputType: label
BenchmarkScoringAsr Objects
class BenchmarkScoringAsr(AssetNode[BenchmarkScoringAsrInputs,
BenchmarkScoringAsrOutputs])
Benchmark Scoring ASR is a function that evaluates and compares the performance of automatic speech recognition systems by analyzing their accuracy, speed, and other relevant metrics against a standardized set of benchmarks.
InputType: audio OutputType: label
VisualQuestionAnswering Objects
class VisualQuestionAnswering(AssetNode[VisualQuestionAnsweringInputs,
VisualQuestionAnsweringOutputs])
Visual Question Answering (VQA) is a task in artificial intelligence that involves analyzing an image and providing accurate, contextually relevant answers to questions posed about the visual content of that image.
InputType: image OutputType: video
DocumentInformationExtraction Objects
class DocumentInformationExtraction(
AssetNode[DocumentInformationExtractionInputs,
DocumentInformationExtractionOutputs])
Document Information Extraction is the process of automatically identifying, extracting, and structuring relevant data from unstructured or semi-structured documents, such as invoices, receipts, contracts, and forms, to facilitate easier data management and analysis.
InputType: image OutputType: text
VideoGeneration Objects
class VideoGeneration(AssetNode[VideoGenerationInputs,
VideoGenerationOutputs])
Produces video content based on specific inputs or datasets. Can be used for simulations, animations, or even deepfake detection.
InputType: text OutputType: video
MultiClassImageClassification Objects
class MultiClassImageClassification(
AssetNode[MultiClassImageClassificationInputs,
MultiClassImageClassificationOutputs])
Multi Class Image Classification is a machine learning task where an algorithm is trained to categorize images into one of several predefined classes or categories based on their visual content.
InputType: image OutputType: label
StyleTransfer Objects
class StyleTransfer(AssetNode[StyleTransferInputs, StyleTransferOutputs])
Style Transfer is a technique in artificial intelligence that applies the visual style of one image (such as the brushstrokes of a famous painting) to the content of another image, effectively blending the artistic elements of the first image with the subject matter of the second.
InputType: image OutputType: image
MultiClassTextClassification Objects
class MultiClassTextClassification(
AssetNode[MultiClassTextClassificationInputs,
MultiClassTextClassificationOutputs])
Multi Class Text Classification is a natural language processing task that involves categorizing a given text into one of several predefined classes or categories based on its content.
InputType: text OutputType: label
IntentClassification Objects
class IntentClassification(AssetNode[IntentClassificationInputs,
IntentClassificationOutputs])
Intent Classification is a natural language processing task that involves analyzing and categorizing user text input to determine the underlying purpose or goal behind the communication, such as booking a flight, asking for weather information, or setting a reminder.
InputType: text OutputType: label
MultiLabelTextClassification Objects
class MultiLabelTextClassification(
AssetNode[MultiLabelTextClassificationInputs,
MultiLabelTextClassificationOutputs])
Multi Label Text Classification is a natural language processing task where a given text is analyzed and assigned multiple relevant labels or categories from a predefined set, allowing for the text to belong to more than one category simultaneously.
InputType: text OutputType: label
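The difference between multi-class and multi-label classification can be sketched on a dictionary of per-label scores: multi-class keeps only the single best label, while multi-label keeps every label above a cutoff. The scores, the `threshold` value, and both function names are assumptions for illustration.

```python
def multi_class(scores: dict[str, float]) -> str:
    """Pick exactly one label: the one with the highest score."""
    return max(scores, key=scores.get)


def multi_label(scores: dict[str, float], threshold: float = 0.5) -> list[str]:
    """Pick every label whose score reaches the threshold."""
    return [label for label, score in scores.items() if score >= threshold]


scores = {"sports": 0.7, "politics": 0.6, "weather": 0.1}
single = multi_class(scores)
several = multi_label(scores)
print(single, several)
```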
TextReconstruction Objects
class TextReconstruction(BaseReconstructor[TextReconstructionInputs,
TextReconstructionOutputs])
Text Reconstruction is a process that involves piecing together fragmented or incomplete text data to restore it to its original, coherent form.
InputType: text OutputType: text
FactChecking Objects
class FactChecking(AssetNode[FactCheckingInputs, FactCheckingOutputs])
Fact Checking is the process of verifying the accuracy and truthfulness of information, statements, or claims by cross-referencing with reliable sources and evidence.
InputType: text OutputType: label
InverseTextNormalization Objects
class InverseTextNormalization(AssetNode[InverseTextNormalizationInputs,
InverseTextNormalizationOutputs])
Inverse Text Normalization is the process of converting normalized spoken or written language, such as numbers, dates, and abbreviations, back into its original, more detailed textual representation.
InputType: text OutputType: label
TextToAudio Objects
class TextToAudio(AssetNode[TextToAudioInputs, TextToAudioOutputs])
The Text to Audio function converts written text into spoken words, allowing users to listen to the content instead of reading it.
InputType: text OutputType: audio
ImageCompression Objects
class ImageCompression(AssetNode[ImageCompressionInputs,
ImageCompressionOutputs])
Reduces the size of image files without significantly compromising their visual quality. Useful for optimizing storage and improving webpage load times.
InputType: image OutputType: image
MultilingualSpeechRecognition Objects
class MultilingualSpeechRecognition(
AssetNode[MultilingualSpeechRecognitionInputs,
MultilingualSpeechRecognitionOutputs])
Multilingual Speech Recognition is a technology that enables the automatic transcription of spoken language into text across multiple languages, allowing for seamless communication and understanding in diverse linguistic contexts.
InputType: audio OutputType: text
TextGenerationMetricDefault Objects
class TextGenerationMetricDefault(
BaseMetric[TextGenerationMetricDefaultInputs,
TextGenerationMetricDefaultOutputs])
The "Text Generation Metric Default" function provides a standard set of evaluation metrics for assessing the quality and performance of text generation models.
InputType: text OutputType: text
ReferencelessTextGenerationMetric Objects
class ReferencelessTextGenerationMetric(
BaseMetric[ReferencelessTextGenerationMetricInputs,
ReferencelessTextGenerationMetricOutputs])
The Referenceless Text Generation Metric is a method for evaluating the quality of generated text without requiring a reference text for comparison, often leveraging models or algorithms to assess coherence, relevance, and fluency based on intrinsic properties of the text itself.
InputType: text OutputType: text
AudioEmotionDetection Objects
class AudioEmotionDetection(AssetNode[AudioEmotionDetectionInputs,
AudioEmotionDetectionOutputs])
Audio Emotion Detection is a technology that analyzes vocal characteristics and patterns in audio recordings to identify and classify the emotional state of the speaker.
InputType: audio OutputType: label
KeywordSpotting Objects
class KeywordSpotting(AssetNode[KeywordSpottingInputs,
KeywordSpottingOutputs])
Keyword Spotting is a function that enables the detection and identification of specific words or phrases within a stream of audio, often used in voice-activated systems to trigger actions or commands based on recognized keywords.
InputType: audio OutputType: label
TextSummarization Objects
class TextSummarization(AssetNode[TextSummarizationInputs,
TextSummarizationOutputs])
Extracts the main points from a larger body of text, producing a concise summary without losing the primary message.
InputType: text OutputType: text
SplitOnLinebreak Objects
class SplitOnLinebreak(BaseSegmentor[SplitOnLinebreakInputs,
SplitOnLinebreakOutputs])
The "Split On Linebreak" function divides a given string into a list of substrings, using linebreaks (newline characters) as the points of separation.
InputType: text OutputType: text
OtherMultipurpose Objects
class OtherMultipurpose(AssetNode[OtherMultipurposeInputs,
OtherMultipurposeOutputs])
The "Other (Multipurpose)" function serves as a versatile category designed to accommodate a wide range of tasks and activities that do not fit neatly into predefined classifications, offering flexibility and adaptability for various needs.
InputType: text OutputType: text
SpeakerDiarizationAudio Objects
class SpeakerDiarizationAudio(BaseSegmentor[SpeakerDiarizationAudioInputs,
SpeakerDiarizationAudioOutputs])
Identifies individual speakers and their respective speech segments within an audio clip. Ideal for multi-speaker recordings or conference calls.
InputType: audio OutputType: label
ImageContentModeration Objects
class ImageContentModeration(AssetNode[ImageContentModerationInputs,
ImageContentModerationOutputs])
Detects and filters out inappropriate or harmful images, essential for platforms with user-generated visual content.
InputType: image OutputType: label
TextDenormalization Objects
class TextDenormalization(AssetNode[TextDenormalizationInputs,
TextDenormalizationOutputs])
Converts standardized or normalized text into its original, often more readable, form. Useful in natural language generation tasks.
InputType: text OutputType: label
SpeakerDiarizationVideo Objects
class SpeakerDiarizationVideo(AssetNode[SpeakerDiarizationVideoInputs,
SpeakerDiarizationVideoOutputs])
Segments a video based on different speakers, identifying when each individual speaks. Useful for transcriptions and understanding multi-person conversations.
InputType: video OutputType: label
TextToVideoGeneration Objects
class TextToVideoGeneration(AssetNode[TextToVideoGenerationInputs,
TextToVideoGenerationOutputs])
Text To Video Generation is a process that converts written descriptions or scripts into dynamic, visual video content using advanced algorithms and artificial intelligence.
InputType: text OutputType: video
Pipeline Objects
class Pipeline(DefaultPipeline)
object_detection
def object_detection(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> ObjectDetection
Object Detection is a computer vision technology that identifies and locates objects within an image, typically by drawing bounding boxes around the detected objects and classifying them into predefined categories.
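Every factory method listed for Pipeline shares this signature: it takes an asset identifier, given either as a string or as an asset object, plus pass-through arguments, and returns a typed node. The sketch below illustrates that calling convention only. StubAsset, StubNode, and StubPipeline are hypothetical stand-ins, not aixplain classes, and resolving an asset object to its id is an assumption about how such a method might behave.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Union


@dataclass
class StubAsset:
    """Hypothetical stand-in for asset.Asset; it only carries an id."""
    id: str


@dataclass
class StubNode:
    """Hypothetical stand-in for a typed node such as ObjectDetection."""
    function: str
    asset_id: str
    options: Dict[str, Any] = field(default_factory=dict)


class StubPipeline:
    """Hypothetical container that mimics the factory-method signature."""

    def __init__(self) -> None:
        self.nodes: List[StubNode] = []

    def object_detection(
        self, asset_id: Union[str, StubAsset], *args: Any, **kwargs: Any
    ) -> StubNode:
        # Assumed behaviour: accept a raw ID string or an asset object.
        resolved = asset_id.id if isinstance(asset_id, StubAsset) else asset_id
        node = StubNode("object_detection", resolved, dict(kwargs))
        self.nodes.append(node)  # the pipeline keeps track of its nodes
        return node


pipeline = StubPipeline()
by_id = pipeline.object_detection("asset-123")
by_asset = pipeline.object_detection(StubAsset(id="asset-123"))
```

Both calls produce a node bound to the same asset ID, which is why the signature is typed as a Union.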
text_embedding
def text_embedding(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> TextEmbedding
Text embedding is a process that converts text into numerical vectors, capturing the semantic meaning and contextual relationships of words or phrases, enabling machines to understand and analyze natural language more effectively.
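To make "capturing the semantic meaning" concrete, embedding vectors are commonly compared with cosine similarity. The following is a minimal sketch using made-up three-dimensional vectors; it does not call any aixplain model, and the numbers are illustrative assumptions rather than real embeddings.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Made-up "embeddings" for illustration only.
cat = [0.9, 0.1, 0.0]
kitten = [0.8, 0.2, 0.1]
invoice = [0.0, 0.1, 0.9]

sim_related = cosine_similarity(cat, kitten)
sim_unrelated = cosine_similarity(cat, invoice)
```

With vectors like these, texts with related meaning score close to 1 and unrelated texts score close to 0.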
semantic_segmentation
def semantic_segmentation(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> SemanticSegmentation
Semantic segmentation is a computer vision process that involves classifying each pixel in an image into a predefined category, effectively partitioning the image into meaningful segments based on the objects or regions they represent.
referenceless_audio_generation_metric
def referenceless_audio_generation_metric(
asset_id: Union[str, asset.Asset], *args,
**kwargs) -> ReferencelessAudioGenerationMetric
The Referenceless Audio Generation Metric is a tool designed to evaluate the quality of generated audio content without the need for a reference or original audio sample for comparison.
script_execution
def script_execution(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> ScriptExecution
Script Execution refers to the process of running a set of programmed instructions or code within a computing environment, enabling the automated performance of tasks, calculations, or operations as defined by the script.
image_impainting
def image_impainting(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> ImageImpainting
Image inpainting is a process that involves filling in missing or damaged parts of an image in a way that is visually coherent and seamlessly blends with the surrounding areas, often using advanced algorithms and techniques to restore the image to its original or intended appearance.
image_embedding
def image_embedding(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> ImageEmbedding
Image Embedding is a process that transforms an image into a fixed-dimensional vector representation, capturing its essential features and enabling efficient comparison, retrieval, and analysis in various machine learning and computer vision tasks.
metric_aggregation
def metric_aggregation(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> MetricAggregation
Metric Aggregation is a function that computes and summarizes numerical data by applying statistical operations, such as averaging, summing, or finding the minimum and maximum values, to provide insights and facilitate analysis of large datasets.
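As a rough illustration of the statistical operations named above, the sketch below summarizes a list of per-sample scores with the Python standard library. The aggregate helper and its output keys are hypothetical and do not reflect the node's actual output format.

```python
import math
import statistics
from typing import Dict, Sequence


def aggregate(scores: Sequence[float]) -> Dict[str, float]:
    """Summarize scores with mean, sum, minimum, and maximum."""
    if not scores:
        raise ValueError("need at least one score")
    return {
        "mean": statistics.fmean(scores),
        "sum": math.fsum(scores),  # fsum limits floating-point rounding error
        "min": min(scores),
        "max": max(scores),
    }


summary = aggregate([0.5, 0.75, 1.0, 0.25])
```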
speech_translation
def speech_translation(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> SpeechTranslation
Speech Translation is a technology that converts spoken language from one language to another in real time, enabling seamless communication between speakers of different languages.
depth_estimation
def depth_estimation(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> DepthEstimation
Depth estimation is a computational process that determines the distance of objects from a viewpoint, typically using visual data from cameras or sensors to create a three-dimensional understanding of a scene.
noise_removal
def noise_removal(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> NoiseRemoval
Noise Removal is a process that involves identifying and eliminating unwanted random variations or disturbances from an audio signal to enhance the clarity and quality of the underlying information.
diacritization
def diacritization(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> Diacritization
Adds diacritical marks to text, essential for languages where meaning can change based on diacritics.
audio_transcript_analysis
def audio_transcript_analysis(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> AudioTranscriptAnalysis
Analyzes transcribed audio data for insights, patterns, or specific information extraction.
extract_audio_from_video
def extract_audio_from_video(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> ExtractAudioFromVideo
Isolates and extracts audio tracks from video files, aiding in audio analysis or transcription tasks.
audio_reconstruction
def audio_reconstruction(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> AudioReconstruction
Audio Reconstruction is the process of restoring or recreating audio signals from incomplete, damaged, or degraded recordings to achieve a high-quality, accurate representation of the original sound.
classification_metric
def classification_metric(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> ClassificationMetric
A Classification Metric is a quantitative measure used to evaluate the quality and effectiveness of classification models.
text_generation_metric
def text_generation_metric(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> TextGenerationMetric
A Text Generation Metric is a quantitative measure used to evaluate the quality and effectiveness of text produced by natural language processing models, often assessing aspects such as coherence, relevance, fluency, and adherence to given prompts or instructions.
text_spam_detection
def text_spam_detection(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> TextSpamDetection
Identifies and filters out unwanted or irrelevant text content, ideal for moderating user-generated content or ensuring quality in communication platforms.
text_to_image_generation
def text_to_image_generation(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> TextToImageGeneration
Creates a visual representation based on textual input, turning descriptions into pictorial forms. Used in creative processes and content generation.
voice_cloning
def voice_cloning(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> VoiceCloning
Replicates a person's voice based on a sample, allowing for the generation of speech in that person's tone and style. Use with caution because of the ethical considerations involved.
text_segmenation
def text_segmenation(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> TextSegmenation
Text Segmentation is the process of dividing a continuous text into meaningful units, such as words, sentences, or topics, to facilitate easier analysis and understanding.
benchmark_scoring_mt
def benchmark_scoring_mt(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> BenchmarkScoringMt
Benchmark Scoring MT is a function designed to evaluate and score machine translation systems by comparing their output against a set of predefined benchmarks, thereby assessing their accuracy and performance.
image_manipulation
def image_manipulation(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> ImageManipulation
Image Manipulation refers to the process of altering or enhancing digital images using various techniques and tools to achieve desired visual effects, correct imperfections, or transform the image's appearance.
named_entity_recognition
def named_entity_recognition(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> NamedEntityRecognition
Identifies and classifies named entities (e.g., persons, organizations, locations) within text. Useful for information extraction, content tagging, and search enhancements.
offensive_language_identification
def offensive_language_identification(
asset_id: Union[str, asset.Asset], *args,
**kwargs) -> OffensiveLanguageIdentification
Detects language or phrases that might be considered offensive, aiding in content moderation and creating respectful user interactions.
search
def search(asset_id: Union[str, asset.Asset], *args, **kwargs) -> Search
An algorithm that identifies and returns data or items that match particular keywords or conditions from a dataset. A fundamental tool for databases and websites.
sentiment_analysis
def sentiment_analysis(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> SentimentAnalysis
Determines the sentiment or emotion (e.g., positive, negative, neutral) of a piece of text, aiding in understanding user feedback or market sentiment.
image_colorization
def image_colorization(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> ImageColorization
Image colorization is a process that involves adding color to grayscale images, transforming them from black-and-white to full-color representations, often using advanced algorithms and machine learning techniques to predict and apply the appropriate hues and shades.
speech_classification
def speech_classification(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> SpeechClassification
Categorizes audio clips based on their content, aiding in content organization and targeted actions.
dialect_detection
def dialect_detection(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> DialectDetection
Identifies specific dialects within a language, aiding in localized content creation or user experience personalization.
video_label_detection
def video_label_detection(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> VideoLabelDetection
Identifies and tags objects, scenes, or activities within a video. Useful for content indexing and recommendation systems.
speech_synthesis
def speech_synthesis(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> SpeechSynthesis
Generates human-like speech from written text. Ideal for text-to-speech applications, audiobooks, and voice assistants.
split_on_silence
def split_on_silence(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> SplitOnSilence
The "Split On Silence" function divides an audio recording into separate segments based on periods of silence, allowing for easier editing and analysis of individual sections.
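One simple way to split on silence is to threshold the sample amplitude and close a segment once enough consecutive quiet samples have been seen. The sketch below takes that approach on a plain list of amplitudes. The threshold and min_silence_len parameters are illustrative assumptions, and the models behind this node may work quite differently.

```python
from typing import List, Optional, Sequence, Tuple


def split_on_silence(
    samples: Sequence[float], threshold: float = 0.05, min_silence_len: int = 3
) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs of non-silent segments, end exclusive.

    A run of at least min_silence_len consecutive samples whose absolute
    amplitude is below threshold is treated as silence.
    """
    segments: List[Tuple[int, int]] = []
    start: Optional[int] = None  # start index of the segment being built
    quiet_run = 0                # consecutive quiet samples inside a segment
    for i, sample in enumerate(samples):
        if abs(sample) >= threshold:
            if start is None:
                start = i
            quiet_run = 0
        elif start is not None:
            quiet_run += 1
            if quiet_run >= min_silence_len:
                # Close the segment at the first quiet sample of the run.
                segments.append((start, i - quiet_run + 1))
                start = None
                quiet_run = 0
    if start is not None:
        # Trim any trailing quiet samples from the final segment.
        segments.append((start, len(samples) - quiet_run))
    return segments


signal = [0.0, 0.0, 0.4, 0.5, 0.0, 0.3, 0.0, 0.0, 0.0, 0.6, 0.7, 0.0]
segments = split_on_silence(signal)
```

The single quiet sample at index 4 is shorter than min_silence_len, so it stays inside the first segment instead of splitting it.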
expression_detection
def expression_detection(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> ExpressionDetection
Expression Detection is the process of identifying and analyzing facial expressions to interpret emotions or intentions using AI and computer vision techniques.
auto_mask_generation
def auto_mask_generation(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> AutoMaskGeneration
Auto-mask generation refers to the automated process of creating masks in image processing or computer vision, typically for segmentation tasks. A mask is a binary or multi-class image that labels different parts of an image, usually separating the foreground (objects of interest) from the background, or identifying specific object classes in an image.
document_image_parsing
def document_image_parsing(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> DocumentImageParsing
Document Image Parsing is the process of analyzing and converting scanned or photographed images of documents into structured, machine-readable formats by identifying and extracting text, layout, and other relevant information.
entity_linking
def entity_linking(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> EntityLinking
Associates identified entities in the text with specific entries in a knowledge base or database.
referenceless_text_generation_metric_default
def referenceless_text_generation_metric_default(
asset_id: Union[str, asset.Asset], *args,
**kwargs) -> ReferencelessTextGenerationMetricDefault
The Referenceless Text Generation Metric Default is a function designed to evaluate the quality of generated text without relying on reference texts for comparison.
fill_text_mask
def fill_text_mask(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> FillTextMask
Completes missing parts of a text based on the context, ideal for content generation or data augmentation tasks.
subtitling_translation
def subtitling_translation(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> SubtitlingTranslation
Converts the text of subtitles from one language to another, ensuring context and cultural nuances are maintained. Essential for global content distribution.
instance_segmentation
def instance_segmentation(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> InstanceSegmentation
Instance segmentation is a computer vision task that involves detecting and delineating each distinct object within an image, assigning a unique label and precise boundary to every individual instance of objects, even if they belong to the same category.
viseme_generation
def viseme_generation(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> VisemeGeneration
Viseme Generation is the process of creating visual representations of phonemes, which are the distinct units of sound in speech, to synchronize lip movements with spoken words in animations or virtual avatars.
audio_generation_metric
def audio_generation_metric(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> AudioGenerationMetric
The Audio Generation Metric is a quantitative measure used to evaluate the quality, accuracy, and overall performance of audio generated by artificial intelligence systems, often considering factors such as fidelity, intelligibility, and similarity to human-produced audio.
video_understanding
def video_understanding(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> VideoUnderstanding
Video Understanding is the process of analyzing and interpreting video content to extract meaningful information, such as identifying objects, actions, events, and contextual relationships within the footage.
text_normalization
def text_normalization(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> TextNormalization
Converts unstructured or non-standard textual data into a more readable and uniform format, dealing with abbreviations, numerals, and other non-standard words.
asr_quality_estimation
def asr_quality_estimation(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> AsrQualityEstimation
ASR Quality Estimation is a process that evaluates the accuracy and reliability of automatic speech recognition systems by analyzing their performance in transcribing spoken language into text.
voice_activity_detection
def voice_activity_detection(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> VoiceActivityDetection
Determines when a person is speaking in an audio clip. It's an essential preprocessing step for other audio-related tasks.
speech_non_speech_classification
def speech_non_speech_classification(
asset_id: Union[str, asset.Asset], *args,
**kwargs) -> SpeechNonSpeechClassification
Differentiates between speech and non-speech audio segments. Great for editing software and transcription services to exclude irrelevant audio.
audio_transcript_improvement
def audio_transcript_improvement(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> AudioTranscriptImprovement
Refines and corrects transcriptions generated from audio data, improving readability and accuracy.
text_content_moderation
def text_content_moderation(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> TextContentModeration
Scans and identifies potentially harmful, offensive, or inappropriate textual content, ensuring safer user environments.
emotion_detection
def emotion_detection(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> EmotionDetection
Identifies human emotions from text or audio, enhancing user experience in chatbots or customer feedback analysis.
audio_forced_alignment
def audio_forced_alignment(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> AudioForcedAlignment
Synchronizes phonetic and phonological text with the corresponding segments in an audio file. Useful in linguistic research and detailed transcription tasks.
video_content_moderation
def video_content_moderation(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> VideoContentModeration
Automatically reviews video content to detect and possibly remove inappropriate or harmful material. Essential for user-generated content platforms.
image_label_detection
def image_label_detection(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> ImageLabelDetection
Identifies objects, themes, or topics within images, useful for image categorization, search, and recommendation systems.
video_forced_alignment
def video_forced_alignment(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> VideoForcedAlignment
Aligns the transcription of spoken content in a video with its corresponding timecodes, facilitating subtitle creation.
text_generation
def text_generation(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> TextGeneration
Creates coherent and contextually relevant textual content based on prompts or certain parameters. Useful for chatbots, content creation, and data augmentation.
text_classification
def text_classification(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> TextClassification
Categorizes text into predefined groups or topics, facilitating content organization and targeted actions.
speech_embedding
def speech_embedding(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> SpeechEmbedding
Transforms spoken content into a fixed-size vector in a high-dimensional space that captures the content's essence. Facilitates tasks like speech recognition and speaker verification.
topic_classification
def topic_classification(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> TopicClassification
Assigns categories or topics to a piece of text based on its content, facilitating content organization and retrieval.
translation
def translation(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> Translation
Converts text from one language to another while maintaining the original message's essence and context. Crucial for global communication.
speech_recognition
def speech_recognition(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> SpeechRecognition
Converts spoken language into written text. Useful for transcription services, voice assistants, and applications requiring voice-to-text capabilities.
subtitling
def subtitling(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> Subtitling
Generates accurate subtitles for videos, enhancing accessibility for diverse audiences.
image_captioning
def image_captioning(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> ImageCaptioning
Image Captioning is a process that involves generating a textual description of an image, typically using machine learning models to analyze the visual content and produce coherent and contextually relevant sentences that describe the objects, actions, and scenes depicted in the image.
audio_language_identification
def audio_language_identification(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> AudioLanguageIdentification
Audio Language Identification is a process that involves analyzing an audio recording to determine the language being spoken.
video_embedding
def video_embedding(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> VideoEmbedding
Video Embedding is a process that transforms video content into a fixed-dimensional vector representation, capturing essential features and patterns to facilitate tasks such as retrieval, classification, and recommendation.
asr_age_classification
def asr_age_classification(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> AsrAgeClassification
The ASR Age Classification function is designed to analyze audio recordings of speech to determine the speaker's age group by leveraging automatic speech recognition (ASR) technology and machine learning algorithms.
audio_intent_detection
def audio_intent_detection(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> AudioIntentDetection
Audio Intent Detection is a process that involves analyzing audio signals to identify and interpret the underlying intentions or purposes behind spoken words, enabling systems to understand and respond appropriately to human speech.
language_identification
def language_identification(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> LanguageIdentification
Detects the language in which a given text is written, aiding in multilingual platforms or content localization.
ocr
def ocr(asset_id: Union[str, asset.Asset], *args, **kwargs) -> Ocr
Converts images of typed, handwritten, or printed text into machine-encoded text. Used in digitizing printed texts for data retrieval.
asr_gender_classification
def asr_gender_classification(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> AsrGenderClassification
The ASR Gender Classification function analyzes audio recordings to determine and classify the speaker's gender based on their voice characteristics.
language_identification_audio
def language_identification_audio(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> LanguageIdentificationAudio
The Language Identification Audio function analyzes audio input to determine and identify the language being spoken.
base_model
def base_model(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> BaseModel
The Base-Model function serves as a foundational framework designed to provide essential features and capabilities upon which more specialized or advanced models can be built and customized.
loglikelihood
def loglikelihood(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> Loglikelihood
The Log Likelihood function measures the probability of observing the given data under a specific statistical model by taking the natural logarithm of the likelihood function, thereby transforming the product of probabilities into a sum, which simplifies the process of optimization and parameter estimation.
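The transformation of a product into a sum described above can be checked numerically. This is a minimal sketch assuming independent observations whose individual likelihoods are already known; the log_likelihood helper is hypothetical and is not the node's implementation.

```python
import math
from typing import Sequence


def log_likelihood(likelihoods: Sequence[float]) -> float:
    """Sum of natural logs of per-observation likelihoods."""
    if any(p <= 0.0 for p in likelihoods):
        raise ValueError("likelihoods must be strictly positive")
    return math.fsum(math.log(p) for p in likelihoods)


probs = [0.5, 0.25, 0.5]
ll = log_likelihood(probs)

# The same quantity computed as the log of the product.
log_of_product = math.log(math.prod(probs))
```

Summing logs avoids the numerical underflow that multiplying many small probabilities would cause, which is one reason the log form is preferred for optimization.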
image_to_video_generation
def image_to_video_generation(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> ImageToVideoGeneration
The Image To Video Generation function transforms a series of static images into a cohesive, dynamic video sequence, often incorporating transitions, effects, and synchronization with audio to create a visually engaging narrative.
part_of_speech_tagging
def part_of_speech_tagging(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> PartOfSpeechTagging
Part of Speech Tagging is a natural language processing task that involves assigning each word in a sentence its corresponding part of speech, such as noun, verb, adjective, or adverb, based on its role and context within the sentence.
benchmark_scoring_asr
def benchmark_scoring_asr(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> BenchmarkScoringAsr
Benchmark Scoring ASR is a function that evaluates and compares the performance of automatic speech recognition systems by analyzing their accuracy, speed, and other relevant metrics against a standardized set of benchmarks.
visual_question_answering
def visual_question_answering(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> VisualQuestionAnswering
Visual Question Answering (VQA) is a task in artificial intelligence that involves analyzing an image and providing accurate, contextually relevant answers to questions posed about the visual content of that image.
document_information_extraction
def document_information_extraction(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> DocumentInformationExtraction
Document Information Extraction is the process of automatically identifying, extracting, and structuring relevant data from unstructured or semi-structured documents, such as invoices, receipts, contracts, and forms, to facilitate easier data management and analysis.
video_generation
def video_generation(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> VideoGeneration
Produces video content based on specific inputs or datasets. Can be used for simulations, animations, or even deepfake detection.
multi_class_image_classification
def multi_class_image_classification(
asset_id: Union[str, asset.Asset], *args,
**kwargs) -> MultiClassImageClassification
Multi Class Image Classification is a machine learning task where an algorithm is trained to categorize images into one of several predefined classes or categories based on their visual content.
style_transfer
def style_transfer(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> StyleTransfer
Style Transfer is a technique in artificial intelligence that applies the visual style of one image (such as the brushstrokes of a famous painting) to the content of another image, effectively blending the artistic elements of the first image with the subject matter of the second.
multi_class_text_classification
def multi_class_text_classification(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> MultiClassTextClassification
Multi Class Text Classification is a natural language processing task that involves categorizing a given text into one of several predefined classes or categories based on its content.
intent_classification
def intent_classification(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> IntentClassification
Intent Classification is a natural language processing task that involves analyzing and categorizing user text input to determine the underlying purpose or goal behind the communication, such as booking a flight, asking for weather information, or setting a reminder.
multi_label_text_classification
def multi_label_text_classification(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> MultiLabelTextClassification
Multi Label Text Classification is a natural language processing task where a given text is analyzed and assigned multiple relevant labels or categories from a predefined set, allowing for the text to belong to more than one category simultaneously.
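The difference between the multi-class and multi-label tasks comes down to how per-label scores become a decision: multi-class picks the single best label, while multi-label keeps every label above a cutoff. The sketch below shows both on hypothetical scores. The 0.5 threshold and the score values are assumptions for illustration.

```python
from typing import Dict, List


def multi_class(scores: Dict[str, float]) -> str:
    """Pick exactly one label: the one with the highest score."""
    return max(scores, key=lambda label: scores[label])


def multi_label(scores: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Pick every label whose score reaches the threshold (possibly none)."""
    return sorted(label for label, score in scores.items() if score >= threshold)


# Hypothetical per-label scores for one piece of text.
scores = {"sports": 0.7, "politics": 0.6, "cooking": 0.1}

single = multi_class(scores)
several = multi_label(scores)
```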
text_reconstruction
def text_reconstruction(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> TextReconstruction
Text Reconstruction is a process that involves piecing together fragmented or incomplete text data to restore it to its original, coherent form.
fact_checking
def fact_checking(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> FactChecking
Fact Checking is the process of verifying the accuracy and truthfulness of information, statements, or claims by cross-referencing with reliable sources and evidence.
inverse_text_normalization
def inverse_text_normalization(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> InverseTextNormalization
Inverse Text Normalization is the process of converting text in its spoken (normalized) form, such as spelled-out numbers, dates, and abbreviations, back into its conventional written form, for example turning "twenty three dollars" into "$23".
text_to_audio
def text_to_audio(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> TextToAudio
The Text to Audio function converts written text into spoken words, allowing users to listen to the content instead of reading it.
image_compression
def image_compression(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> ImageCompression
Reduces the size of image files without significantly compromising their visual quality. Useful for optimizing storage and improving webpage load times.
multilingual_speech_recognition
def multilingual_speech_recognition(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> MultilingualSpeechRecognition
Multilingual Speech Recognition is a technology that enables the automatic transcription of spoken language into text across multiple languages, allowing for seamless communication and understanding in diverse linguistic contexts.
text_generation_metric_default
def text_generation_metric_default(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> TextGenerationMetricDefault
The "Text Generation Metric Default" function provides a standard set of evaluation metrics for assessing the quality and performance of text generation models.
referenceless_text_generation_metric
def referenceless_text_generation_metric(
asset_id: Union[str, asset.Asset], *args,
**kwargs) -> ReferencelessTextGenerationMetric
The Referenceless Text Generation Metric is a method for evaluating the quality of generated text without requiring a reference text for comparison, often leveraging models or algorithms to assess coherence, relevance, and fluency based on intrinsic properties of the text itself.
audio_emotion_detection
def audio_emotion_detection(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> AudioEmotionDetection
Audio Emotion Detection is a technology that analyzes vocal characteristics and patterns in audio recordings to identify and classify the emotional state of the speaker.
keyword_spotting
def keyword_spotting(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> KeywordSpotting
Keyword Spotting is a function that enables the detection and identification of specific words or phrases within a stream of audio, often used in voice-activated systems to trigger actions or commands based on recognized keywords.
text_summarization
def text_summarization(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> TextSummarization
Extracts the main points from a larger body of text, producing a concise summary without losing the primary message.
split_on_linebreak
def split_on_linebreak(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> SplitOnLinebreak
The "Split On Linebreak" function divides a given string into a list of substrings, using linebreaks (newline characters) as the points of separation.
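In plain Python the same operation can be approximated with str.splitlines(), which handles both Unix and Windows line endings. The helper below is a hypothetical sketch, and the keep_empty flag is an assumption rather than a documented parameter of this node.

```python
from typing import List


def split_on_linebreak(text: str, keep_empty: bool = False) -> List[str]:
    """Split text into lines, optionally dropping blank lines."""
    # splitlines() splits on \n, \r\n, \r and a few other line boundaries,
    # and does not include the line-ending characters in the result.
    lines = text.splitlines()
    return lines if keep_empty else [line for line in lines if line.strip()]


parts = split_on_linebreak("first\n\nsecond\r\nthird")
parts_with_blanks = split_on_linebreak("first\n\nsecond\r\nthird", keep_empty=True)
```

Using str.split("\n") instead would leave a trailing "\r" on lines that end with Windows line endings.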
other__multipurpose_
def other__multipurpose_(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> OtherMultipurpose
The "Other (Multipurpose)" function serves as a versatile category designed to accommodate a wide range of tasks and activities that do not fit neatly into predefined classifications, offering flexibility and adaptability for various needs.
speaker_diarization_audio
def speaker_diarization_audio(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> SpeakerDiarizationAudio
Identifies individual speakers and their respective speech segments within an audio clip. Ideal for multi-speaker recordings or conference calls.
image_content_moderation
def image_content_moderation(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> ImageContentModeration
Detects and filters out inappropriate or harmful images, essential for platforms with user-generated visual content.
text_denormalization
def text_denormalization(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> TextDenormalization
Converts standardized or normalized text into its original, often more readable, form. Useful in natural language generation tasks.
speaker_diarization_video
def speaker_diarization_video(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> SpeakerDiarizationVideo
Segments a video based on different speakers, identifying when each individual speaks. Useful for transcriptions and understanding multi-person conversations.
text_to_video_generation
def text_to_video_generation(asset_id: Union[str, asset.Asset], *args,
**kwargs) -> TextToVideoGeneration
Text To Video Generation is a process that converts written descriptions or scripts into dynamic, visual video content using advanced algorithms and artificial intelligence.