Custom Model Onboarding
Overview
Onboarding a custom model to aiXplain involves two major steps: structuring the model according to aiXplain standards and uploading it to the platform. This guide combines both steps into a streamlined process.
Step 1: Structure the Model
1.1 Model Directory Structure
Your implementation should live in a model directory containing a model.py file and, optionally, a bash script, a requirements file, model artifacts, and any additional dependency files:
src
│ model.py
│ bash.sh [Optional]
│ requirements.txt [Optional]
│ model_artifacts [Optional]
│ Additional files [Optional]
1.1.1 Model Artifacts (optional, depending on the model)
Most non-trivial models contain files holding weights and other metadata that define the model's state. These should all be placed in a single directory named MODEL_NAME, which is the unique name you will use to refer to your model in the model.py file.
For example, a possible MODEL_NAME for onboarding Hugging Face's Meta Llama 2 7B would be llama-2-7b-hf:
src
│ model.py
│ requirements.txt
│ llama-2-7b-hf
│   weights1.safetensors
│   weights2.safetensors
│   ...
The contents of this directory should be loaded into machine memory via the model.py file's load function.
1.1.2 Implementing model.py
The steps are organised as follows:
- 1.1.2.1 Imports
- 1.1.2.2 The load function
- 1.1.2.3 The run_model function
- 1.1.2.4 Additional functions
- 1.1.2.5 Starting the Model Server
The model.py file should contain an implementation of your model as an instance of an aiXplain function-based model class, as listed in function_models.py in the model-interfaces repository. Use the model class that matches your model's function (e.g. TextGenerationModel for a Text Generation model).
1.1.2.1 Imports
The first step is to import all the necessary interfaces and input/output schemas associated with your model class. For example, if your model is a text generation model, this step may look like the following:
# Interface and schemas imports
from aixplain.model_interfaces.interfaces.function_models import (
TextGenerationChatModel,
TextGenerationPredictInput,
TextGenerationRunModelOutput,
TextGenerationTokenizeOutput,
TextGenerationChatTemplatizeInput
)
from aixplain.model_interfaces.schemas.function.function_input import TextGenerationInput
from aixplain.model_interfaces.schemas.function.function_output import TextGenerationOutput
from aixplain.model_interfaces.schemas.modality.modality_input import TextListInput
from aixplain.model_interfaces.schemas.modality.modality_output import TextListOutput
# MISCELLANEOUS ADDITIONAL IMPORTS
# MISCELLANEOUS ENVIRONMENT VARIABLES
All interfaces and schemas are available via the aixplain.model_interfaces package, which can be installed as an extra dependency to the main aiXplain SDK via pip:
pip install aixplain[model-builder]
1.1.2.2 The load function
The load function is one of two functions that must be implemented in every model class; the other is the run_model function.
Implement the load function to load all model artifacts from the model directory specified by MODEL_NAME. The artifacts loaded here can be used by the model at prediction time, i.e. when run_model executes. Importantly, two instance variables must be set to correctly implement the function: self.model and self.ready.
- self.model must be set to an instance of your model.
- self.ready must be set to True once loading is finished.
You may use any number of helper functions to implement this.
Here is an example for a text generation model:
def load(self):
    model_file = os.path.join(MODEL_DIR, "openai-community--gpt2")
    self.model = self.load_model(model_file)  # self.model instantiated.
    self.tokenizer = AutoTokenizer.from_pretrained(model_file)
    torch_dtype = TORCH_DTYPE_MAP[MODEL_TORCH_DTYPE]
    self.pipeline = transformers.pipeline(
        "text-generation",
        model=self.model,
        tokenizer=self.tokenizer,
        torch_dtype=torch_dtype,
        trust_remote_code=True
    )
    self.ready = True  # The model is now ready.

def load_model(self, model_file):
    return AutoModelForCausalLM.from_pretrained(model_file, device_map='auto')
1.1.2.3 The run_model function
The run_model function contains all the logic for running the instantiated model on the list of inputs. Most importantly, all input and output schemas for the specific model's class must be followed. For a text generation model, this means implementing a function that takes in a list of TextGenerationInput values and outputs a list of TextGenerationOutput values:
def run_model(self, api_input: List[TextGenerationInput], headers: Dict[str, str] = None) -> List[TextGenerationOutput]:
    generated_instances = []
    for instance in api_input:
        generation_config = {
            "max_new_tokens": instance.max_new_tokens,
            "do_sample": True,
            "top_p": instance.top_p,
            "top_k": instance.top_k,
            "num_return_sequences": instance.num_return_sequences
        }
        sequences = self.pipeline(
            instance.data,
            eos_token_id=self.tokenizer.eos_token_id,
            **generation_config
        )
        output = {
            "data": str(sequences[0]["generated_text"])
        }
        generated_instances.append(TextGenerationOutput(**output))
    return generated_instances
1.1.2.4 Additional functions
Some model classes may require additional functions in the model.py file.
Example: The tokenize and templatize functions
An additional tokenize function is required for text generation models to calculate the input size correctly. Chat-specific text generation models must also implement templatize, which takes all inputs and formats them into the correct template before model inference. Both functions must follow their specific interfaces as specified in the model-interfaces repository. The example at the end of this section shows where the tokenize and templatize functions fit.
1.1.2.5 Starting the Model Server
Finally, add a main block to the end of the model file to start the server. This script calls the model's load function before starting the KServe ModelServer. Below is what a main block can look like for our GPT-2 model example.
if __name__ == "__main__":
    model = GPT2_Model_Chat(MODEL_NAME, USE_PEFT_LORA)
    model.load()
    kserve.ModelServer().start([model])
A complete example model.py file for a GPT-2 model assembles all of the pieces above (imports, load, run_model, tokenize, templatize, and the main block) into a single model class.
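As a rough, non-authoritative sketch of that layout (the class name GPT2_Model_Chat mirrors the main block above, extra constructor arguments such as USE_PEFT_LORA would come from your own __init__, and the method bodies are elided in favour of the examples already shown):

import os
from typing import Dict, List

import kserve
import transformers  # used by the load/run_model bodies shown earlier
from transformers import AutoModelForCausalLM, AutoTokenizer

from aixplain.model_interfaces.interfaces.function_models import TextGenerationChatModel
from aixplain.model_interfaces.schemas.function.function_input import TextGenerationInput
from aixplain.model_interfaces.schemas.function.function_output import TextGenerationOutput

MODEL_DIR = os.getenv("MODEL_DIR", ".")
MODEL_NAME = os.getenv("MODEL_NAME", "openai-community--gpt2")


class GPT2_Model_Chat(TextGenerationChatModel):
    def load(self):
        # Instantiate the model, tokenizer, and pipeline, then set
        # self.model and self.ready = True (see the example in 1.1.2.2).
        ...

    def run_model(self, api_input: List[TextGenerationInput], headers: Dict[str, str] = None) -> List[TextGenerationOutput]:
        # Run the pipeline over each input and wrap the results in
        # TextGenerationOutput objects (see the example in 1.1.2.3).
        ...

    # Chat models also implement tokenize and templatize here; their exact
    # signatures are defined in the model-interfaces repository (see 1.1.2.4).


if __name__ == "__main__":
    model = GPT2_Model_Chat(MODEL_NAME)
    model.load()
    kserve.ModelServer().start([model])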
1.1.3 The System and Python Requirements Files
Include any scripts for running system-level installations in a bash.sh file, and specify the necessary Python packages in a requirements.txt file.
If your local environment already contains all the necessary model packages, you can generate requirements.txt with the following command:
pip freeze > requirements.txt
Otherwise, first install all the requirements in your environment, then run the command.
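For the GPT-2 example above, the two files might look roughly like this; the package list and the system dependency are illustrative only, so pin whatever your own model actually needs.

# requirements.txt (illustrative)
aixplain[model-builder]
torch
transformers

#!/bin/bash
# bash.sh (illustrative): system-level installations run at image build time
apt-get update && apt-get install -y git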
1.2 Testing the Model Locally
Run your model with the following command:
MODEL_DIR=<path/to/model_artifacts_dir> MODEL_NAME=<model_name> python -m model
This command should spawn a local server that can run inference requests. Run inference requests to the server using either the command line or a Python script to ensure that the model is working as expected. Once the model is verified as working, it is ready for uploading.
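For example, assuming the default KServe HTTP port (8080) and the v1 predict route, an inference request against the local server might look like the following; the exact request schema is defined by the TextGenerationPredictInput interface imported earlier, so treat the field names below as illustrative:

curl -X POST "http://localhost:8080/v1/models/<model_name>:predict" \
  -H "Content-Type: application/json" \
  -d '{
        "instances": [
          {
            "data": "My name is Teven and I am",
            "max_new_tokens": 50,
            "top_p": 0.9,
            "top_k": 50,
            "num_return_sequences": 1
          }
        ]
      }'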
Step 2: Upload the Model
Onboarding a custom model to aiXplain requires structuring the model according to the aiXplain standard and uploading it to the platform. The following guide details how to upload your already implemented model using a Docker image. If you have not yet implemented your model according to the aiXplain standard, begin with Step 1: Structure the Model.
Model uploading also requires an aiXplain account and a TEAM_API_KEY, which should be set either as an environment variable or passed into each of the CLI commands below.
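For example, on Linux or macOS the key can be exported once per shell session instead of being passed to every command (the value is a placeholder):

export TEAM_API_KEY=<your_team_api_key>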
2.1 Preparing the Repository
For any of the CLI commands, running aixplain [verb] [resource] --help will display a description of each argument that should be passed into that command. Use the --api-key parameter if the TEAM_API_KEY environment variable isn't set or you would like to override the existing environment variable.
2.1.1 Register Model and Create Image Repository
Register your model and create an image repository using the command below, which returns a model ID and a repository name.
aixplain create image-repo --name <model_name> --description <model_description> --function <function_name> --source-language <source_language> --input-modality <input_type> --output-modality <output_type> --documentation-url <information_url> [--api-key <TEAM_API_KEY>]
- name: Your model's name
- description: A short summary of your model's purpose
- function: The function name corresponding to the model class used in your model implementation
- source-language: Your model's source language
- input-modality: The input type for your model (e.g., text, audio, video, image)
- output-modality: The output type for your model (e.g., text, audio, video, image)
- documentation-url: The URL to any additional documentation stored on a different webpage (if applicable)
- api-key: Optional
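A hypothetical invocation for the GPT-2 text generation example might look like the following; every value here is illustrative, and in particular the exact function name should be taken from the aixplain list functions command described next:

aixplain create image-repo \
  --name gpt2-chat \
  --description "GPT-2 based text generation chat model" \
  --function text-generation \
  --source-language en \
  --input-modality text \
  --output-modality text \
  --documentation-url https://example.com/gpt2-chat-docs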
Find the appropriate function name using the following command:
aixplain list functions [--verbose] [--api-key <TEAM_API_KEY>]
- verbose: Optional, set to False by default
- api-key: Optional
2.1.2 Obtain Repository Login Credentials
Obtain login credentials for the newly created repository:
aixplain get image-repo-login [--api-key <TEAM_API_KEY>]
These credentials are valid for 12 hours, after which you must log in again for a fresh set of credentials.
2.1.3 Log In
You can use your credentials to log in using the following Docker command:
docker login --username $USERNAME --password $PASSWORD 535945872701.dkr.ecr.us-east-1.amazonaws.com
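The $REGISTRY and $REPO_NAME variables used in the build and push commands below can, for instance, be set from the registry host above and the repository name returned in step 2.1.1 (the repository name shown is a placeholder):

export REGISTRY=535945872701.dkr.ecr.us-east-1.amazonaws.com
export REPO_NAME=<repository_name_from_step_2.1.1>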
2.2 Image Building and Repository Push
Here is an example Dockerfile script you will need for building an image:
FROM python:3.8.10
RUN mkdir /code
WORKDIR /code
COPY . /code/
# Optional: Run only if your implementation has a requirements file.
RUN pip install --no-cache-dir -r requirements.txt
# Optional: Run only if your implementation has a bash file.
RUN chmod +x /code/bash.sh
RUN ./bash.sh
CMD python -m model
You can adjust the file according to your model's specific requirements. More Dockerfile writing guidelines can be found in Docker's official documentation.
2.2.1 Build an Image
Build your image.
docker build . -t $REGISTRY/$REPO_NAME:<your-choice-of-tag>
- tag: A descriptor for your specific model, usually a version tag like v0.0.1
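For example, with $REGISTRY and $REPO_NAME set as above and a v0.0.1 tag (the tag value is illustrative):

docker build . -t $REGISTRY/$REPO_NAME:v0.0.1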
2.2.2 Push Image to Repository
Push the newly tagged image to the corresponding repository.
docker push $REGISTRY/$REPO_NAME:<the-tag-you-chose>
2.3 Onboard the Model
Onboard the model once the image has been pushed to its repository. The following command will send an email to an aiXplain associate to finalize the onboarding process.
aixplain onboard model --model-id <model_id> --image-tag <model_image_tag> --image-hash <model_image_hash> --host-machine <host_machine_code> [--api-key <TEAM_API_KEY>]
- model-id: The model ID returned by the create image-repo command used earlier
- image-tag: The string used to tag your model image
- image-hash: The image's sha256 hash, obtained by running docker images --digests
- host-machine: The machine code on which to host the model
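For instance, after looking up the image digest, the onboarding call might look like the following; the digest, tag, model ID, and machine code are all placeholders:

docker images --digests   # copy the sha256:... digest for your image
aixplain onboard model \
  --model-id <model_id> \
  --image-tag v0.0.1 \
  --image-hash sha256:<digest> \
  --host-machine <host_machine_code>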
The following command lists the codes of all the available machines on which you can host your model:
aixplain list gpus [--api-key <TEAM_API_KEY>]
By following this guide, you can successfully structure and onboard your custom model to aiXplain.