
Custom Model Onboarding

Overview

Onboarding a custom model to aiXplain involves two major steps: structuring the model according to aiXplain standards and uploading it to the platform. This guide combines both steps into a streamlined process.


Step 1: Structure the Model

1.1 Model Directory Structure

Your implementation should live in a model directory containing a model.py file and optional bash, requirements, model artifact, and additional dependency files:

src
│ model.py
│ bash.sh [Optional]
│ requirements.txt [Optional]
│ model_artifacts [Optional]
│ Additional files [Optional]

1.1.1 Model Artifacts (optional, depending on the model)

Most non-trivial models contain files holding weights and other metadata that define the model's state. These should all be placed in a single directory with the name MODEL_NAME, which is the unique name you will use to refer to your model in the model.py file.

info

For example, a possible MODEL_NAME for onboarding Meta's Llama 2 7B from Hugging Face would be llama-2-7b-hf:

src
│ model.py
│ requirements.txt
│ llama-2-7b-hf
│   weights1.safetensor
│   weights2.safetensor
│   ...
note

The contents of this directory should be loaded into machine memory by the load function in model.py.

1.1.2 Implementing model.py

The steps are organised as follows:

  • 1.1.2.1 Imports
  • 1.1.2.2 The load function
  • 1.1.2.3 The run_model function
  • 1.1.2.4 Additional functions
  • 1.1.2.5 Starting the Model Server

The model.py file should contain an implementation of your model as an instance of an aiXplain function-based model class, as listed in function_models.py in the model-interfaces repository. Use the model class that matches your model's function (e.g. TextGenerationModel for a Text Generation model).

1.1.2.1 Imports

The first step is to import all the necessary interfaces and input/output schemas associated with your model class. For example, if your model is a text generation model, this step may look like the following:

# Interface and schemas imports
from aixplain.model_interfaces.interfaces.function_models import (
    TextGenerationChatModel,
    TextGenerationPredictInput,
    TextGenerationRunModelOutput,
    TextGenerationTokenizeOutput,
    TextGenerationChatTemplatizeInput
)
from aixplain.model_interfaces.schemas.function.function_input import TextGenerationInput
from aixplain.model_interfaces.schemas.function.function_output import TextGenerationOutput
from aixplain.model_interfaces.schemas.modality.modality_input import TextListInput
from aixplain.model_interfaces.schemas.modality.modality_output import TextListOutput

# MISCELLANEOUS ADDITIONAL IMPORTS
# MISCELLANEOUS ENVIRONMENT VARIABLES
note

All interfaces and schemas are available via the aixplain.model_interfaces package, which can be installed as an extra dependency to the main aiXplain SDK via pip.

pip install aixplain[model-builder]
1.1.2.2 The load function

The load function is one of two functions that must be implemented in every model class. The other is the run_model function.

Implement the load function to load all model artifacts from the model directory specified in MODEL_NAME. The model artifacts loaded here can be used by the model during prediction time, i.e. by executing run_model. Importantly, two instance variables must be set to correctly implement the function: self.model and self.ready.

  • self.model must be set to an instantiated instance of your model.
  • self.ready must be set to True once loading is finished. You may use any number of helper functions to implement this.

Here is an example for a text generation model:

def load(self):
    model_file = os.path.join(MODEL_DIR, "openai-community--gpt2")
    self.model = self.load_model(model_file)  # self.model instantiated.
    self.tokenizer = AutoTokenizer.from_pretrained(model_file)
    torch_dtype = TORCH_DTYPE_MAP[MODEL_TORCH_DTYPE]
    self.pipeline = transformers.pipeline(
        "text-generation",
        model=self.model,
        tokenizer=self.tokenizer,
        torch_dtype=torch_dtype,
        trust_remote_code=True
    )
    self.ready = True  # The model is now ready.

def load_model(self, model_file):
    return AutoModelForCausalLM.from_pretrained(model_file, device_map='auto')
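
The example above references a few module-level names (MODEL_DIR, MODEL_TORCH_DTYPE, TORCH_DTYPE_MAP) that are not shown. A minimal sketch of how they might be defined, assuming they are read from environment variables (the variable names, defaults, and dtype map below are illustrative, not part of the aiXplain interfaces):

import os
import torch

# Illustrative module-level configuration, read from environment variables.
MODEL_DIR = os.environ.get("MODEL_DIR", "/code")                       # where model artifacts live
MODEL_NAME = os.environ.get("MODEL_NAME", "openai-community--gpt2")    # artifacts directory name
MODEL_TORCH_DTYPE = os.environ.get("MODEL_TORCH_DTYPE", "float16")

# Map a human-readable dtype string to the corresponding torch dtype.
TORCH_DTYPE_MAP = {
    "float16": torch.float16,
    "bfloat16": torch.bfloat16,
    "float32": torch.float32,
}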
1.1.2.3 The run_model function

The run_model function contains all the logic for running the instantiated model on the list of inputs. Most importantly, all input and output schemas for the specific model's class must be followed. For a text generation model, this means implementing a function that takes in a list of TextGenerationInput values and outputs a list of TextGenerationOutput values:

def run_model(self, api_input: List[TextGenerationInput], headers: Dict[str, str] = None) -> List[TextGenerationOutput]:
    generated_instances = []
    for instance in api_input:
        generation_config = {
            "max_new_tokens": instance.max_new_tokens,
            "do_sample": True,
            "top_p": instance.top_p,
            "top_k": instance.top_k,
            "num_return_sequences": instance.num_return_sequences
        }
        sequences = self.pipeline(
            instance.data,
            eos_token_id=self.tokenizer.eos_token_id,
            **generation_config
        )
        output = {
            "data": str(sequences[0]["generated_text"])
        }
        generated_instances.append(TextGenerationOutput(**output))
    return generated_instances
1.1.2.4 Additional functions

Some model classes may require additional functions in the model.py file.

Example: The tokenize and templatize functions

An additional tokenize function is required for text generation models to calculate the input size correctly. Chat-specific text generation models must also implement templatize, which takes all inputs and formats them to a correct template before model inference. Both functions must follow their specific interfaces as specified in the model-interfaces repository. The sample implementation at the end of this section includes an example of the tokenize and templatize functions.
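
As a rough illustration only, the two functions might look something like the sketch below. The exact signatures and schema fields are defined in the model-interfaces repository (for example TextGenerationTokenizeOutput and TextGenerationChatTemplatizeInput imported earlier); the field names and input shapes used here are assumptions.

def tokenize(self, api_input, headers=None):
    # Hypothetical sketch: count tokens for each input so the platform can
    # validate input size. The output field name (token_ids) is assumed.
    outputs = []
    for instance in api_input:
        token_ids = self.tokenizer(instance.data)["input_ids"]
        outputs.append(TextGenerationTokenizeOutput(token_ids=token_ids))
    return outputs

def templatize(self, api_input, headers=None):
    # Hypothetical sketch: format chat messages with the tokenizer's chat
    # template before inference. Assumes instance.data is a list of
    # {"role": ..., "content": ...} messages.
    for instance in api_input:
        instance.data = self.tokenizer.apply_chat_template(
            instance.data, tokenize=False, add_generation_prompt=True
        )
    return api_input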

1.1.2.5 Starting the Model Server

Finally, add a main method to the end of the model file to start the server. This script will call the model's load function before starting the KServe ModelServer. Below is what a main method can look like for our GPT2 model example.

if __name__ == "__main__":
    model = GPT2_Model_Chat(MODEL_NAME, USE_PEFT_LORA)
    model.load()
    kserve.ModelServer().start([model])

Putting these pieces together, a complete model.py for the GPT-2 example combines the snippets above into a single file.

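A condensed skeleton of such a file might look like the following. The class name (GPT2_Model_Chat), the constructor arguments, and the configuration variables follow the examples earlier in this section; the __init__ signature and environment-variable defaults are assumptions, and each stubbed method corresponds to a snippet shown above.

import os
from typing import Dict, List

import kserve

from aixplain.model_interfaces.interfaces.function_models import TextGenerationChatModel
from aixplain.model_interfaces.schemas.function.function_input import TextGenerationInput
from aixplain.model_interfaces.schemas.function.function_output import TextGenerationOutput

# Module-level configuration (MODEL_DIR, MODEL_TORCH_DTYPE, etc.) would be
# defined here, e.g. read from environment variables as sketched in 1.1.2.2.
MODEL_NAME = os.environ.get("MODEL_NAME", "openai-community--gpt2")
USE_PEFT_LORA = os.environ.get("USE_PEFT_LORA", "false").lower() == "true"

class GPT2_Model_Chat(TextGenerationChatModel):
    def __init__(self, name, use_peft_lora=False):
        # Assumed constructor: the base class is initialized with the model name.
        super().__init__(name)
        self.use_peft_lora = use_peft_lora

    def load(self):
        # Load the tokenizer, weights, and pipeline, then set self.model and
        # self.ready (see 1.1.2.2).
        ...

    def run_model(self, api_input: List[TextGenerationInput], headers: Dict[str, str] = None) -> List[TextGenerationOutput]:
        # Run the pipeline over each input and return TextGenerationOutput
        # instances (see 1.1.2.3).
        ...

    def tokenize(self, api_input, headers=None):
        # Return token counts for each input (see 1.1.2.4).
        ...

    def templatize(self, api_input, headers=None):
        # Apply the chat template to each input before inference (see 1.1.2.4).
        ...

if __name__ == "__main__":
    model = GPT2_Model_Chat(MODEL_NAME, USE_PEFT_LORA)
    model.load()
    kserve.ModelServer().start([model])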

1.1.3 The System and Python Requirements Files

Include any scripts in a bash.sh file for running system-level installations, and specify necessary Python packages in a requirements.txt file.

tip

If your local environment already contains all the necessary model packages, you can run the following command to generate requirements.txt:

pip freeze >> requirements.txt

Otherwise, install all the requirements in your environment first, then run the command.

1.2 Testing the Model Locally

Run your model with the following command:

MODEL_DIR=<path/to/model_artifacts_dir> MODEL_NAME=<model_name> python -m model

This command should spawn a local server that can run inference requests. Run inference requests to the server using either the command line or a Python script to ensure that the model is working as expected. Once the model is verified as working, it is ready for uploading.
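
For example, a minimal Python sketch of an inference request, assuming the local server exposes KServe's default V1 REST interface on port 8080 (replace <model_name> with your model's name; the payload fields shown are those used by the TextGenerationInput example above):

import requests

# Assumes KServe's default port and V1 predict route; adjust to your setup.
url = "http://localhost:8080/v1/models/<model_name>:predict"
payload = {
    "instances": [
        {"data": "Once upon a time", "max_new_tokens": 50}
    ]
}

response = requests.post(url, json=payload)
print(response.status_code)
print(response.json())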

Step 2: Upload the Model

Onboarding a custom model to aiXplain requires structuring the model according to the aiXplain standard and uploading it to the platform. The following guide details how to upload your already implemented model using a Docker image. If you have yet to implement your model according to the aiXplain standard, begin with Step 1: Structure the Model.

Model uploading also requires an aiXplain account and a TEAM_API_KEY, which should be set either as an environment variable or passed into each of the CLI commands below.

2.1 Preparing the Repository

tip

For any of the CLI commands, running aixplain [verb] [resource] --help will display a description of each argument that should be passed into that command. Use the --api-key parameter if the TEAM_API_KEY environment variable isn't set or if you would like to override it.

2.1.1 Register Model and Create Image Repository

Register your model and create an image repository using the command below, which returns a model ID and a repository name.

aixplain create image-repo --name <model_name> --description <model_description> --function <function_name> --source-language <source_language> --input-modality <input_type> --output-modality <output_type> --documentation-url <information_url> [--api-key <TEAM_API_KEY>]
  • name: Your model's name
  • description: A short summary of your model's purpose
  • function: The function name corresponding to the model class used in your model implementation
  • source-language: Your model's source language
  • input-modality: The input type for your model (e.g., text, audio, video, image)
  • output-modality: The output type for your model (e.g., text, audio, video, image)
  • documentation-url: The URL to any additional documentation stored on a different webpage (if applicable)
  • api-key: Optional
tip

Find the appropriate function name using the following command:

aixplain list functions [--verbose] [--api-key <TEAM_API_KEY>]
  • verbose: Optional, set to False by default
  • api-key: Optional

2.1.2 Obtain Repository Login Credentials

Obtain login credentials for the newly created repository:

aixplain get image-repo-login [--api-key <TEAM_API_KEY>]

These credentials are valid for 12 hours, after which you must log in again for a fresh set of credentials.

2.1.3 Log In

You can use your credentials to log in using the following Docker command:

docker login --username $USERNAME --password $PASSWORD 535945872701.dkr.ecr.us-east-1.amazonaws.com

2.2 Image Building and Repository Push

Here is an example Dockerfile for building an image:

FROM python:3.8.10

RUN mkdir /code
WORKDIR /code
COPY . /code/

# Optional: Run only if your implementation has a requirements file.
RUN pip install --no-cache-dir -r requirements.txt

# Optional: Run only if your implementation has a bash file.
RUN chmod +x /code/bash.sh
RUN ./bash.sh

CMD python -m model

You can adjust the file according to your model's specific requirements.

info

More Dockerfile writing guidelines can be found in Docker's official documentation.

2.2.1 Build an Image

Build your image.

docker build . -t $REGISTRY/$REPO_NAME:<your-choice-of-tag>
  • tag: A descriptor for your specific model, usually a version tag like v0.0.1

2.2.2 Push Image to Repository

Push the newly tagged image to the corresponding repository.

docker push $REGISTRY/$REPO_NAME:<the-tag-you-chose>

2.3 Onboard the Model

Onboard the model once the image has been pushed to its repository. The following command will send an email to an aiXplain associate to finalize the onboarding process.

aixplain onboard model --model-id <model_id> --image-tag <model_image_tag> --image-hash <model_image_hash> --host-machine <host_machine_code> [--api-key <TEAM_API_KEY>]
  • model-id: The model ID returned by the create image-repo command used earlier
  • image-tag: The string used to tag your model image
  • image-hash: The image's sha256 hash, obtained by running docker images --digests
  • host-machine: The machine code on which to host the model
tip

The following command lists the codes of all the available machines on which you can host your model:

aixplain list gpus [--api-key <TEAM_API_KEY>]

By following this guide, you can successfully structure and onboard your custom model to aiXplain.