Custom models - step 1 (structure)
Onboarding a custom model to aiXplain requires structuring the model according to the aiXplain standard and uploading it to the platform. This guide details how to organize your custom model to match the aiXplain standard. If you have already completed the model implementation and are ready to upload it for hosting on our platform, proceed to Onboarding: Custom Models - Upload (2).
1.1 Model Directory Structure
All your implementations should take place in a model directory containing a model.py file and optional bash, requirements, model artifact, and additional dependency files:
src
│   model.py
│   bash.sh [Optional]
│   requirements.txt [Optional]
│   model_artifacts [Optional]
│   Additional files [Optional]
1.1.1 Model Artifacts (optional, depending on the model)
Most non-trivial models contain files holding weights and other metadata that define the model's state. These should all be placed in a single directory named MODEL_NAME, which is the unique name you will use to refer to your model in the model.py file.
For example, a possible MODEL_NAME for onboarding Hugging Face's Meta-Llama2 7B would be llama-2-7b-hf:
src
│   model.py
│   requirements.txt
│   llama-2-7b-hf
│       weights1.safetensor
│       weights2.safetensor
│       ...
The contents of this directory should be loaded into machine memory via the model.py file's load function.
1.1.2 Implementing model.py
The steps are organised as follows:
- 1.1.2.1 Imports
- 1.1.2.2 The load function
- 1.1.2.3 The run_model function
- 1.1.2.4 Additional functions
- 1.1.2.5 Starting the Model Server
The model.py file should contain an implementation of your model as an instance of an aiXplain function-based model class, as listed in function_models.py in the model-interfaces repository. Use the model class that matches your model's function (e.g. TextGenerationModel for a Text Generation model).
1.1.2.1 Imports
The first step is to import all the necessary interfaces and input/output schemas associated with your model class. For example, if your model is a text generation model, this step may look like the following:
# Interface and schemas imports
from aixplain.model_interfaces.interfaces.function_models import (
    TextGenerationChatModel,
    TextGenerationPredictInput,
    TextGenerationRunModelOutput,
    TextGenerationTokenizeOutput,
    TextGenerationChatTemplatizeInput
)
from aixplain.model_interfaces.schemas.function.function_input import TextGenerationInput
from aixplain.model_interfaces.schemas.function.function_output import TextGenerationOutput
from aixplain.model_interfaces.schemas.modality.modality_input import TextListInput
from aixplain.model_interfaces.schemas.modality.modality_output import TextListOutput
# MISCELLANEOUS ADDITIONAL IMPORTS
# MISCELLANEOUS ENVIRONMENT VARIABLES
All interfaces and schemas are available via the aixplain.model_interfaces package, which can be installed as an extra dependency to the main aiXplain SDK via pip:
pip install aixplain[model-builder]
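With these imports in place, the model itself is declared as a subclass of the matching interface class. As a sketch for the GPT-2 chat example used throughout this guide (the class name GPT2_Model_Chat is simply this example's own name, and subclassing TextGenerationChatModel is an assumption consistent with the imports above):
class GPT2_Model_Chat(TextGenerationChatModel):
    # The load, run_model, tokenize, and templatize methods described in the
    # following subsections are implemented on this class.
    ...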
1.1.2.2 The load function
The load function is one of two functions that must be implemented in every model class. The other is the run_model function.
Implement the load function to load all model artifacts from the model directory specified in MODEL_NAME. The model artifacts loaded here can be used by the model at prediction time, i.e. when executing run_model. Importantly, two instance variables must be set to correctly implement the function: self.model and self.ready.
- self.model must be set to an instance of your model.
- self.ready must be set to True once loading is finished.
You may use any number of helper functions to implement this.
Here is an example for a text generation model:
def load(self):
    model_file = os.path.join(MODEL_DIR, "openai-community--gpt2")
    self.model = self.load_model(model_file)  # self.model instantiated.
    self.tokenizer = AutoTokenizer.from_pretrained(model_file)
    torch_dtype = TORCH_DTYPE_MAP[MODEL_TORCH_DTYPE]
    self.pipeline = transformers.pipeline(
        "text-generation",
        model=self.model,
        tokenizer=self.tokenizer,
        torch_dtype=torch_dtype,
        trust_remote_code=True
    )
    self.ready = True  # The model is now ready.

def load_model(self, model_file):
    return AutoModelForCausalLM.from_pretrained(model_file, device_map='auto')
1.1.2.3 The run_model function
The run_model function contains all the logic for running the instantiated model on the list of inputs. Most importantly, all input and output schemas for the specific model's class must be followed. For a text generation model, this means implementing a function that takes in a list of TextGenerationInput values and outputs a list of TextGenerationOutput values:
def run_model(self, api_input: List[TextGenerationInput], headers: Dict[str, str] = None) -> List[TextGenerationOutput]:
    generated_instances = []
    for instance in api_input:
        generation_config = {
            "max_new_tokens": instance.max_new_tokens,
            "do_sample": True,
            "top_p": instance.top_p,
            "top_k": instance.top_k,
            "num_return_sequences": instance.num_return_sequences
        }
        sequences = self.pipeline(
            instance.data,
            eos_token_id=self.tokenizer.eos_token_id,
            **generation_config
        )
        output = {
            "data": str(sequences[0]["generated_text"])
        }
        generated_instances.append(TextGenerationOutput(**output))
    return generated_instances
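Before starting the server, run_model can also be exercised directly from a short script. The snippet below is a sketch only: it assumes the model instance has already been constructed and load() has been called, and it assumes TextGenerationInput accepts the same keyword fields that run_model reads above (the exact schema is defined in model-interfaces).
# Sketch: call run_model directly on an already-loaded model instance.
# The keyword fields mirror the attributes accessed in run_model above and
# are otherwise an assumption about the TextGenerationInput schema.
inputs = [
    TextGenerationInput(
        data="Once upon a time",
        max_new_tokens=50,
        top_p=0.9,
        top_k=50,
        num_return_sequences=1,
    )
]
outputs = model.run_model(inputs)
print(outputs[0].data)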
1.1.2.4 Additional functions
Some model classes may require additional functions in the model.py file.
Example: the tokenize and templatize functions
An additional tokenize function is required for text generation models to calculate the input size correctly. Chat-specific text generation models must also implement templatize, which takes all inputs and formats them into the correct template before model inference. Both functions must follow their specific interfaces as specified in the model-interfaces repository. A complete model.py for the GPT-2 example (see the end of this section) includes implementations of both functions.
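To give a sense of what these functions typically compute, the snippets below use the Hugging Face tokenizer to count tokens and to apply a chat template. They illustrate the general idea only; the exact tokenize and templatize signatures are defined in the model-interfaces repository.
# Illustrative only; the real tokenize/templatize methods must follow the
# interfaces defined in the aiXplain model-interfaces repository.

def count_tokens(tokenizer, text: str) -> int:
    # Number of tokens the model will see for this input.
    return len(tokenizer(text)["input_ids"])

def build_chat_prompt(tokenizer, messages) -> str:
    # messages is a list of dicts such as [{"role": "user", "content": "Hi"}].
    # Requires a tokenizer that defines a chat template.
    return tokenizer.apply_chat_template(
        messages,
        tokenize=False,              # return a prompt string, not token IDs
        add_generation_prompt=True   # append the assistant turn marker
    )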
1.1.2.5 Starting the Model Server
Finally, add a main block to the end of the model file to start the server. This script will call the model's load function before starting the KServe ModelServer. Below is what the main block can look like for our GPT-2 model example:
if __name__ == "__main__":
    model = GPT2_Model_Chat(MODEL_NAME, USE_PEFT_LORA)
    model.load()
    kserve.ModelServer().start([model])
A complete model.py file for the GPT-2 example combines all of the pieces above, namely the imports, the load and run_model functions, the tokenize and templatize functions, and the main block, into a single file.
1.1.3 The System and Python Requirements Files
Include any scripts in a bash.sh file for running system-level installations, and specify the necessary Python packages in a requirements.txt file.
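For example, a model that needs a system-level package might ship a bash.sh like the following (ffmpeg is used here purely as an illustration):
#!/bin/bash
# Illustrative system-level setup; replace with whatever your model needs.
apt-get update && apt-get install -y ffmpeg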
If your local environment already contains all the necessary model packages, you can run the following command to generate requirements.txt:
pip freeze >> requirements.txt
Otherwise, install all the requirements in your environment first, then run the command.
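As an illustration, a requirements.txt for the GPT-2 example above might contain entries along these lines; the package list is an assumption, so include whatever your model actually imports and pin the versions you tested with:
# Illustrative requirements.txt for the GPT-2 example
torch
transformers
accelerate
kserve
aixplain[model-builder]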
1.2 Testing the Model Locally
Run your model with the following command:
MODEL_DIR=<path/to/model_artifacts_dir> MODEL_NAME=<model_name> python -m model
This command should spawn a local server that can handle inference requests. Send inference requests to the server, using either the command line or a Python script, to verify that the model works as expected. Once the model is verified, it is ready for upload.
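For example, assuming the default KServe HTTP port (8080) and V1 predict endpoint, and assuming the request fields mirror the TextGenerationInput schema used in run_model above (the exact payload shape is defined by the model-interfaces schemas), a quick smoke test from Python might look like this:
import requests

# Assumptions: default KServe HTTP port 8080, V1 ":predict" endpoint, and a
# payload whose fields mirror the TextGenerationInput schema shown earlier.
url = "http://localhost:8080/v1/models/<model_name>:predict"
payload = {
    "instances": [
        {
            "data": "Once upon a time",
            "max_new_tokens": 50,
            "top_p": 0.9,
            "top_k": 50,
            "num_return_sequences": 1,
        }
    ]
}

response = requests.post(url, json=payload)
print(response.status_code, response.json())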