Script Nodes
Script Nodes allow users to integrate custom Python scripts into aiXplain pipelines. They are particularly useful when:
- Modifying or transforming data between pipeline nodes.
- Implementing custom logic not covered by built-in nodes.
- Formatting outputs to meet specific requirements.
This guide walks through creating a Retrieval Augmented Generation (RAG) based pipeline using Script Nodes.
Creating a RAG Pipeline with a Script Node
A RAG pipeline enhances a generative model's responses by including relevant contextual information from external sources. We'll build a pipeline consisting of the following:
- Input Node (to receive the query)
- Search Asset Node (using Google Search)
- Script Node (to structure search results into a prompt)
- Text Generation (LLM) Node (OpenAI's GPT-4o Mini)
- Output Node
For more information about pipelines, please refer to this guide.
Intermediate Representations
Before building the script, it's crucial to understand the expected input and output formats for aiXplain nodes.
Example Node Input Format
Each node input or output follows a standardized dictionary structure:
{
    "index": 0,                  # index in the list
    "success": True,             # whether the segment was processed successfully or not
    "input_type": "audio",       # input type
    "is_url": True,              # is the data stored in a URL?
    "details": {},               # detailed information
    "input_segment_info": [],    # information on what the input to the segment looks like
    "attributes": {
        "data": "NODE_IO_DATA_HERE",
    }                            # output parameters of the node
}
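As a rough sketch (the field names mirror the example above, not an official schema), helper functions for reading and producing segments in this format might look like:

```python
# Sketch: helpers for the intermediate segment format shown above.
# Field names mirror the example; they are not an official schema.
def get_payload(segment: dict):
    """Return the main data carried by a node I/O segment."""
    return segment["attributes"]["data"]

def make_segment(data, index: int = 0, input_type: str = "text") -> dict:
    """Wrap a value in the standard segment structure."""
    return {
        "index": index,
        "success": True,
        "input_type": input_type,
        "is_url": False,
        "details": {},
        "input_segment_info": [],
        "attributes": {"data": data},
    }
```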
Example 1: Question Input to the Pipeline
This is how a typical input node might receive a user query:
question_data = [{
    "index": 0,
    "success": True,
    "input_type": "text",
    "is_url": False,
    "details": {},
    "input_segment_info": [],
    "attributes": {
        "input": "Who is the author of 'Sargento Getúlio'"
    }
}]
Example 2: Google Search Output to Script Node
After querying a search model, the output contains a list of passages (web results) that are similar to the original query:
context_data = [{
    "index": 0,
    "success": True,
    "input_type": "text",
    "is_url": False,
    "details": [
        {"score": 0, "document": ""},
        {
            "score": 1,
            "data": "Sergeant Getulio. Theatrical release poster. Directed by, Hermanno Penna. Written by, Flávio Porto Hermanno Penna. Based on, Sargento Getúlio by João Ubaldo ...",
            "document": "https://en.wikipedia.org/wiki/Sergeant_Getulio"
        },
        {
            "score": 2,
            "data": "Amazon.com: Sargento Getulio: 9783518394359: ribeiro-jo-o-ubaldo: Books. ... just a fantastic read, an Author worth following...next time in Portuguise ...",
            "document": "https://www.amazon.com/Sargento-Getulio-ribeiro-jo-ubaldo/dp/3518394355"
        },
        {
            "score": 3,
            "data": "Amazon.com: Sargento Getúlio: 9783803127068: Ribeiro, Joao Ubaldo: Books.",
            "document": "https://www.amazon.com/Sargento-Get%C3%BAlio/dp/3803127068"
        },
        {
            "score": 4,
            "data": "Top cast13 · Director. Hermanno Penna · Writers · Hermanno Penna · Flávio Porto · João Ubaldo Ribeiro · All cast & crew · Production, box office & more at IMDbPro ...",
            "document": "https://www.imdb.com/title/tt0130995/"
        },
        {
            "score": 5,
            "data": "João Ubaldo Ribeiro (January 23, 1941 – July 18, 2014) was a Brazilian writer, journalist, screenwriter and professor. Several of his books and short ...",
            "document": "https://en.wikipedia.org/wiki/Jo%C3%A3o_Ubaldo_Ribeiro"
        },
        {
            "score": 6,
            "data": "Sargento Getúlio – Edição especial de 50 anos João Ubaldo Ribeiro in Portuguese ; Author. João Ubaldo Ribeiro ; Book Title. Sargento Getúlio – Edição especial de ...",
            "document": "https://www.ebay.com/itm/394707990573"
        },
        {
            "score": 7,
            "data": "Sargento Getúlio by Ribeiro, João Ubaldo and a great selection of related books, art and collectibles available now at AbeBooks.com.",
            "document": "https://www.abebooks.com/book-search/title/sargento-getulio/first-edition/"
        },
        {
            "score": 8,
            "data": "Sergeant Getúlio by João Ubaldo Ribeiro. Sergeant Getúlio. Published January 1st 1984 by Avon Books ; Sargento Getúlio by João Ubaldo Ribeiro. Sargento Getúlio.",
            "document": "https://www.goodreads.com/work/editions/300244-sargento-get-lio"
        },
        {
            "score": 9,
            "data": "Details. Bookseller: Turtle Creek Books CA (CA); Bookseller's Inventory #: 095188; Title: Sargento Getulio; Author: Ribeiro, Joao Ubaldo; Format/Binding ...",
            "document": "https://www.biblio.com/book/sargento-getulio-ribeiro-joao-ubaldo/d/781088222?srsltid=AfmBOoo_aOde6ULsSIW0SF7PnNYssTZVRLP431mX1nmnIHqeRxP8f6OB"
        }
    ],
    "input_segment_info": [],
    "attributes": {"data": "Sergeant Getulio. Theatrical release poster. Directed by, Hermanno Penna. Written by, Flávio Porto Hermanno Penna. Based on, Sargento Getúlio by João Ubaldo ..."},
}]
The details field here contains the top-k web results, sorted by relevance score. This is where the Script Node focuses when generating a prompt.
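To make the structure concrete, here is a small sketch (using hypothetical sample data, not real search output) of how a script might pull the ranked passages out of the details list while skipping the empty placeholder entry:

```python
# Sketch: extract ranked passages from a search node's "details" field.
# The sample data below is hypothetical; real entries look like the example above.
def extract_passages(context_data):
    details = context_data[0]["details"]
    # Skip entries without a "data" field (e.g. the empty first placeholder).
    passages = [d for d in details if d.get("data")]
    # Results normally arrive ordered by relevance score; sort defensively anyway.
    return sorted(passages, key=lambda d: d["score"])

sample = [{"details": [
    {"score": 0, "document": ""},
    {"score": 2, "data": "Second passage", "document": "https://example.com/b"},
    {"score": 1, "data": "First passage", "document": "https://example.com/a"},
]}]
print([p["data"] for p in extract_passages(sample)])  # → ['First passage', 'Second passage']
```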
Developing the Script Node Logic
Now that we understand the intermediate data structure used, let’s create a Script Node that prepares a prompt for a language generation model.
This script will take in:
- A user question (from the Input Node)
- A list of related documents (from the Google Search Node)
It will then format these into a structured prompt that can be passed to a text generation (LLM) node.
PROMPT = """Based on the context, answer the question.
Context:
<<CONTEXT>>
Question:
<<QUESTION>>
Answer:"""
def main(question_data, context_data):
    # prepare question
    try:
        question = question_data[0]["attributes"]["data"]
    except Exception:
        question = question_data[0]["attributes"]["input"]

    # prepare context
    context = "\n".join([f"Document {i+1}: {d['data']}" for i, d in enumerate(context_data[0]["details"][1:])])

    # prepare prompt
    prompt = PROMPT.replace("<<QUESTION>>", question).replace("<<CONTEXT>>", context)

    # prepare response
    output_response = [{
        "index": 0,
        "success": True,
        "input_type": "text",
        "is_url": False,
        "details": {},
        "input_segment_info": [],
        "attributes": {"data": prompt},
    }]
    return output_response
Before adding this to the pipeline, you can test the script with mocked inputs:
response = main(question_data=question_data, context_data=context_data)
response
Now that the script is ready, let’s write it to a file so it can be attached to your pipeline.
script = """def main(question_data, context_data):
    PROMPT = \"\"\"Based on the context, answer the question.
Context:
<<CONTEXT>>
Question:
<<QUESTION>>
Answer:\"\"\"

    # prepare question
    try:
        question = question_data[0]["attributes"]["data"]
    except Exception:
        question = question_data[0]["attributes"]["input"]

    # prepare context
    context = "\\n".join([f"Document {i+1}: {d['data']}" for i, d in enumerate(context_data[0]["details"][1:])])

    # prepare prompt
    prompt = PROMPT.replace("<<QUESTION>>", question).replace("<<CONTEXT>>", context)

    # prepare response
    output_response = [{
        "index": 0,
        "success": True,
        "input_type": "text",
        "is_url": False,
        "details": {},
        "input_segment_info": [],
        "attributes": {"data": prompt},
    }]
    return output_response"""

with open("script.py", "w") as f:
    f.write(script)
Creating the Pipeline
Now that we have the Script Node logic, let's create the pipeline.
Initialize the Pipeline
from aixplain.enums import DataType
from aixplain.factories import PipelineFactory
from aixplain.modules import Pipeline
pipeline = PipelineFactory.init(name="RAG Pipeline")
Add Pipeline Nodes
Input Node
question_input = pipeline.input()
question_input.label = "QuestionInput"
Google Search Node
Add the Google Search Utility as an asset node to retrieve contextually relevant web content based on the user's query.
GOOGLE_SERP_ASSET = "65c51c556eb563350f6e1bb1"
search_node = pipeline.search(asset_id=GOOGLE_SERP_ASSET)
Script Node
When designing the script node, make sure the input parameters have the same names as those used in the Python script, i.e. question_data and context_data.
script = pipeline.script(script_path="script.py")
script.label = "ContextGeneratorScript"
script.inputs.create_param(code="question_data", data_type=DataType.TEXT)
script.inputs.create_param(code="context_data", data_type=DataType.TEXT)
script.outputs.create_param(code="data", data_type=DataType.TEXT)
Text Generation (LLM) Node
Use OpenAI's GPT-4o Mini as the text generation model to produce responses based on the formatted prompt.
OPENAI_GPT4O_MINI_ASSET = "669a63646eb56306647e1091"
llm_node = pipeline.text_generation(asset_id=OPENAI_GPT4O_MINI_ASSET)
Link the Nodes
# Question Input -> Search
question_input.outputs.input.link(search_node.inputs.text)
# Question Input -> Script
question_input.outputs.input.link(script.inputs.question_data)
# Search -> Script
search_node.outputs.data.link(script.inputs.context_data)
# Script -> LLM
script.outputs.data.link(llm_node.inputs.text)
# LLM -> Output
llm_node.use_output("data")
Save the Pipeline
pipeline.save(save_as_asset=True)
Run the Pipeline
response = pipeline.run("Who is the author of 'Sargento Getulio'?", **{ "version": "3.0" })
response
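The exact response schema depends on the SDK version. As a rough sketch, assuming the run returns a dictionary with status and data keys (an assumption worth checking against your actual response), you can guard the result like this:

```python
# Sketch: defensively read a pipeline run response.
# "status" and "data" are assumed keys; inspect your actual response to confirm.
def extract_answer(response: dict):
    status = str(response.get("status", "")).upper()
    if status != "SUCCESS":
        raise RuntimeError(f"Pipeline run did not succeed: {status or 'unknown'}")
    return response.get("data")

# Mocked response for illustration only:
mock_response = {"status": "SUCCESS", "data": "João Ubaldo Ribeiro"}
print(extract_answer(mock_response))  # → João Ubaldo Ribeiro
```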
Delete the Pipeline (Optional)
If the pipeline is no longer needed, it can be deleted.
pipeline.delete()
Script Nodes provide a flexible way to incorporate custom processing within aiXplain pipelines. They allow developers to:
- Modify intermediate outputs.
- Format and structure data before passing it to the next node.
- Implement custom logic within AI-driven workflows.
By leveraging Script Nodes, you can enhance the capabilities of aiXplain pipelines for a variety of use cases.