
Script Nodes

Script Nodes allow users to integrate custom Python scripts into aiXplain pipelines. They are particularly useful when:

  • Modifying or transforming data between pipeline nodes.
  • Implementing custom logic not covered by built-in nodes.
  • Formatting outputs to meet specific requirements.

This guide walks through creating a Retrieval-Augmented Generation (RAG) pipeline using Script Nodes.

Creating a RAG Pipeline with a Script Node

A RAG pipeline enhances a generative model's responses by including relevant contextual information from external sources. We'll build a pipeline consisting of the following:

  1. Input Node (to receive the query)

  2. Search Asset Node (using Google Search)

  3. Script Node (to structure search results into a prompt)

  4. Text Generation (LLM) Node (OpenAI's GPT-4o Mini for text generation)

  5. Output Node

For more information about pipelines, please refer to this guide.

Intermediate Representations

Before building the script, it's crucial to understand the expected input and output formats for aiXplain nodes.

Example Node Input Format

Each node input or output follows a standardized dictionary structure:

{
    "index": 0,                # index of the segment in the list
    "success": True,           # whether the segment was processed successfully
    "input_type": "audio",     # input type
    "is_url": True,            # is the data stored in a URL?
    "details": {},             # detailed information
    "input_segment_info": [],  # information on what the input to the segment looks like
    "attributes": {
        "data": "NODE_IO_DATA_HERE",
    }                          # output parameters of the node
}
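Segments in this format can be navigated with plain dictionary access. As a quick illustration, here is a small hypothetical helper (not part of the aiXplain SDK) that pulls the attributes payload out of the first successful segment:

```python
def get_segment_data(node_output):
    """Return the attributes payload of the first successful segment."""
    for segment in node_output:
        if segment.get("success"):
            return segment["attributes"]
    raise ValueError("no successful segment in node output")

# A sample payload following the structure above
example = [{
    "index": 0,
    "success": True,
    "input_type": "text",
    "is_url": False,
    "details": {},
    "input_segment_info": [],
    "attributes": {"data": "NODE_IO_DATA_HERE"},
}]

print(get_segment_data(example)["data"])  # → NODE_IO_DATA_HERE
```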

Example 1: Question Input to the Pipeline

This is how a typical input node might receive a user query:

question_data = [{
    "index": 0,
    "success": True,
    "input_type": "text",
    "is_url": False,
    "details": {},
    "input_segment_info": [],
    "attributes": {
        "input": "Who is the author of 'Sargento Getúlio'"
    }
}]
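Note that this input segment stores the query under the input key of attributes, while other nodes use data. A minimal sketch (hypothetical helper, mirroring the fallback the script later in this guide uses) that tolerates both layouts:

```python
def extract_question(question_data):
    """Read the user query from an input-node segment, trying data first."""
    attrs = question_data[0]["attributes"]
    return attrs.get("data") or attrs.get("input")

question_data = [{
    "index": 0,
    "success": True,
    "input_type": "text",
    "is_url": False,
    "details": {},
    "input_segment_info": [],
    "attributes": {"input": "Who is the author of 'Sargento Getúlio'"},
}]

print(extract_question(question_data))  # → Who is the author of 'Sargento Getúlio'
```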

Example 2: Google Search Output to Script Node

After querying a search model, the output contains a list of passages (web results) that are similar to the original query:

context_data = [{
    "index": 0,
    "success": True,
    "input_type": "text",
    "is_url": False,
    "details": [
        {"score": 0, "document": ""},
        {
            "score": 1,
            "data": "Sergeant Getulio. Theatrical release poster. Directed by, Hermanno Penna. Written by, Flávio Porto Hermanno Penna. Based on, Sargento Getúlio by João Ubaldo ...",
            "document": "https://en.wikipedia.org/wiki/Sergeant_Getulio"
        },
        {
            "score": 2,
            "data": "Amazon.com: Sargento Getulio: 9783518394359: ribeiro-jo-o-ubaldo: Books. ... just a fantastic read, an Author worth following...next time in Portuguise ...",
            "document": "https://www.amazon.com/Sargento-Getulio-ribeiro-jo-ubaldo/dp/3518394355"
        },
        {
            "score": 3,
            "data": "Amazon.com: Sargento Getúlio: 9783803127068: Ribeiro, Joao Ubaldo: Books.",
            "document": "https://www.amazon.com/Sargento-Get%C3%BAlio/dp/3803127068"
        },
        {
            "score": 4,
            "data": "Top cast13 · Director. Hermanno Penna · Writers · Hermanno Penna · Flávio Porto · João Ubaldo Ribeiro · All cast & crew · Production, box office & more at IMDbPro ...",
            "document": "https://www.imdb.com/title/tt0130995/"
        },
        {
            "score": 5,
            "data": "João Ubaldo Ribeiro (January 23, 1941 – July 18, 2014) was a Brazilian writer, journalist, screenwriter and professor. Several of his books and short ...",
            "document": "https://en.wikipedia.org/wiki/Jo%C3%A3o_Ubaldo_Ribeiro"
        },
        {
            "score": 6,
            "data": "Sargento Getúlio – Edição especial de 50 anos João Ubaldo Ribeiro in Portuguese ; Author. João Ubaldo Ribeiro ; Book Title. Sargento Getúlio – Edição especial de ...",
            "document": "https://www.ebay.com/itm/394707990573"
        },
        {
            "score": 7,
            "data": "Sargento Getúlio by Ribeiro, João Ubaldo and a great selection of related books, art and collectibles available now at AbeBooks.com.",
            "document": "https://www.abebooks.com/book-search/title/sargento-getulio/first-edition/"
        },
        {
            "score": 8,
            "data": "Sergeant Getúlio by João Ubaldo Ribeiro. Sergeant Getúlio. Published January 1st 1984 by Avon Books ; Sargento Getúlio by João Ubaldo Ribeiro. Sargento Getúlio.",
            "document": "https://www.goodreads.com/work/editions/300244-sargento-get-lio"
        },
        {
            "score": 9,
            "data": "Details. Bookseller: Turtle Creek Books CA (CA); Bookseller's Inventory #: 095188; Title: Sargento Getulio; Author: Ribeiro, Joao Ubaldo; Format/Binding ...",
            "document": "https://www.biblio.com/book/sargento-getulio-ribeiro-joao-ubaldo/d/781088222?srsltid=AfmBOoo_aOde6ULsSIW0SF7PnNYssTZVRLP431mX1nmnIHqeRxP8f6OB"
        }
    ],
    "input_segment_info": [],
    "attributes": {"data": "Sergeant Getulio. Theatrical release poster. Directed by, Hermanno Penna. Written by, Flávio Porto Hermanno Penna. Based on, Sargento Getúlio by João Ubaldo ..."},
}]

The details field contains the top-k web results, sorted by relevance (score). This is the part of the output the Script Node uses when generating the prompt.
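To illustrate how a script might consume this structure, here is a minimal sketch (hypothetical helper with toy data, not part of the SDK) that drops entries without passage text and joins the top-k passages into a numbered context block:

```python
def build_context(details, top_k=3):
    """Join the top-k search passages into a numbered context string."""
    passages = [d for d in details if d.get("data")]  # drop empty entries
    passages.sort(key=lambda d: d["score"])           # lower score = higher rank here
    return "\n".join(
        f"Document {i + 1}: {d['data']}" for i, d in enumerate(passages[:top_k])
    )

# Toy details list following the search-output shape above
details = [
    {"score": 0, "document": ""},
    {"score": 1, "data": "Passage one.", "document": "https://example.com/1"},
    {"score": 2, "data": "Passage two.", "document": "https://example.com/2"},
]
print(build_context(details, top_k=2))
```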

Developing the Script Node Logic

Now that we understand the intermediate data structures, let's create a Script Node that prepares a prompt for a language generation model.

This script will take in:

  • A user question (from the Input Node)

  • A list of related documents (from the Google Search Node)

It will then format these into a structured prompt that can be passed to a text generation (LLM) node.

PROMPT = """Based on the context, answer the question.

Context:
<<CONTEXT>>

Question:
<<QUESTION>>

Answer:"""

def main(question_data, context_data):
    # prepare question
    try:
        question = question_data[0]["attributes"]["data"]
    except Exception:
        question = question_data[0]["attributes"]["input"]

    # prepare context
    context = "\n".join(
        [f"Document {i+1}: {d['data']}" for i, d in enumerate(context_data[0]["details"][1:])]
    )

    # prepare prompt
    prompt = PROMPT.replace("<<QUESTION>>", question).replace("<<CONTEXT>>", context)

    # prepare response
    output_response = [{
        "index": 0,
        "success": True,
        "input_type": "text",
        "is_url": False,
        "details": {},
        "input_segment_info": [],
        "attributes": {"data": prompt},
    }]
    return output_response

Before adding this to the pipeline, you can test the script with mocked inputs:

response = main(question_data=question_data, context_data=context_data)
response

Now that the script is ready, let’s write it to a file so it can be attached to your pipeline.

script = """def main(question_data, context_data):
PROMPT = \"\"\"Based on the context, answer the question.

Context:
<<CONTEXT>>

Question:
<<QUESTION>>

Answer:\"\"\"
# prepare question
try:
question = question_data[0]["attributes"]["data"]
except Exception:
question = question_data[0]["attributes"]["input"]

# prepare context
context = "\\n".join([f"Document {i+1}: {d['data']}" for i, d in enumerate(context_data[0]["details"][1:])])

# prepare prompt
prompt = PROMPT.replace("<<QUESTION>>", question).replace("<<CONTEXT>>", context)

# prepare response
output_response = [{
"index": 0,
"success": True,
"input_type": "text",
"is_url": False,
"details": {},
"input_segment_info": [],
"attributes": { "data": prompt },
}]
return output_response"""
with open("script.py", "w") as f:
f.write(script)

Creating the Pipeline

Now that we have the Script Node logic, let's create the pipeline.

Initialize the Pipeline

from aixplain.enums import DataType
from aixplain.factories import PipelineFactory
from aixplain.modules import Pipeline

pipeline = PipelineFactory.init(name="RAG Pipeline")

Add Pipeline Nodes

Input Node

question_input = pipeline.input()
question_input.label = "QuestionInput"

Google Search Node

Add the Google Search Utility as an asset node to retrieve contextually relevant web content based on the user's query.

GOOGLE_SERP_ASSET = "65c51c556eb563350f6e1bb1"
search_node = pipeline.search(asset_id=GOOGLE_SERP_ASSET)

Script Node

note

When designing the Script Node, make sure the input parameters have the same names as those used in the Python script, i.e., question_data and context_data.

script = pipeline.script(script_path="script.py")
script.label = "ContextGeneratorScript"
script.inputs.create_param(code="question_data", data_type=DataType.TEXT)
script.inputs.create_param(code="context_data", data_type=DataType.TEXT)
script.outputs.create_param(code="data", data_type=DataType.TEXT)

Text Generation (LLM) Node

Use OpenAI's GPT-4o Mini as the text generation model to produce responses based on the formatted prompt.

OPENAI_GPT4O_MINI_ASSET = "669a63646eb56306647e1091"
llm_node = pipeline.text_generation(asset_id=OPENAI_GPT4O_MINI_ASSET)

Link the Nodes

Connect the nodes so data flows from the input, through search and the script, to the LLM and the output.

# Question Input -> Search
question_input.outputs.input.link(search_node.inputs.text)
# Question Input -> Script
question_input.outputs.input.link(script.inputs.question_data)
# Search -> Script
search_node.outputs.data.link(script.inputs.context_data)
# Script -> LLM
script.outputs.data.link(llm_node.inputs.text)
# LLM -> Output
llm_node.use_output("data")

Save the Pipeline

pipeline.save(save_as_asset=True)

Run the Pipeline

response = pipeline.run("Who is the author of 'Sargento Getulio'?", **{ "version": "3.0" })
response

Delete the Pipeline (Optional)

If the pipeline is no longer needed, it can be deleted.

pipeline.delete()

Script Nodes provide a flexible way to incorporate custom processing within aiXplain pipelines. They allow developers to:

  • Modify intermediate outputs.
  • Format and structure data before passing it to the next node.
  • Implement custom logic within AI-driven workflows.

By leveraging Script Nodes, you can enhance the capabilities of aiXplain pipelines for a variety of use cases.