Custom models - step 1 (structure)
Onboarding a custom model to aiXplain requires structuring the model according to the aiXplain standard and uploading it to the platform. This guide details how to organize your custom model to match the aiXplain standard. If you have already completed the model implementation and are ready to upload it for hosting on our platform, proceed to Onboarding: Custom Models - Upload (2).
1.1 Model Directory Structure
All your implementations should take place in a model directory containing a model.py file and optional bash, requirements, model artifact, and additional dependency files:
src
│   model.py
│   bash.sh [Optional]
│   requirements.txt [Optional]
│   model_artifacts [Optional]
│   Additional files [Optional]
1.1.1 Model Artifacts (optional, depending on the model)
Most non-trivial models contain files holding weights and other metadata that define the model's state. These should all be placed in a single directory named MODEL_NAME, which is the unique name you will use to refer to your model in the model.py file.
For example, a possible MODEL_NAME for onboarding Hugging Face's Meta-Llama2 7B would be llama-2-7b-hf:
src
│   model.py
│   requirements.txt
│   llama-2-7b-hf
│       weights1.safetensor
│       weights2.safetensor
│       ...
The contents of this directory should be loaded into machine memory via the model.py file's load function.
1.1.2 Implementing model.py
The steps are organised as follows:
- 1.1.2.1 Imports
- 1.1.2.2 The load function
- 1.1.2.3 The run_model function
- 1.1.2.4 Additional functions
- 1.1.2.5 Starting the Model Server
The model.py file should contain an implementation of your model as an instance of an aiXplain function-based model class, as listed in function_models.py in the model-interfaces repository. Use the model class that matches your model's function (e.g. TextGenerationModel for a Text Generation model).
1.1.2.1 Imports
The first step is to import all the necessary interfaces and input/output schemas associated with your model class. For example, if your model is a text generation model, this step may look like the following:
# Interface and schemas imports
from aixplain.model_interfaces.interfaces.function_models import (
    TextGenerationChatModel,
    TextGenerationPredictInput,
    TextGenerationRunModelOutput,
    TextGenerationTokenizeOutput,
    TextGenerationChatTemplatizeInput
)
from aixplain.model_interfaces.schemas.function.function_input import TextGenerationInput
from aixplain.model_interfaces.schemas.function.function_output import TextGenerationOutput
from aixplain.model_interfaces.schemas.modality.modality_input import TextListInput
from aixplain.model_interfaces.schemas.modality.modality_output import TextListOutput
# MISCELLANEOUS ADDITIONAL IMPORTS
# MISCELLANEOUS ENVIRONMENT VARIABLES
All interfaces and schemas are available via the aixplain.model_interfaces package, which can be installed as an extra dependency to the main aiXplain SDK via pip:
pip install aixplain[model-builder]
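With these imports in place, the model itself is declared as a subclass of the matching interface class. As a sketch for the GPT-2 chat example used throughout this guide (the class name GPT2_Model_Chat is simply this example's own name, and subclassing TextGenerationChatModel is an assumption consistent with the imports above):
class GPT2_Model_Chat(TextGenerationChatModel):
    # The load, run_model, tokenize, and templatize methods described in the
    # following subsections are implemented on this class.
    ...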
1.1.2.2 The load function
The load function is one of two functions that must be implemented in every model class. The other is the run_model function.
Implement the load function to load all model artifacts from the model directory specified in MODEL_NAME. The model artifacts loaded here can be used by the model at prediction time, i.e. when executing run_model. Importantly, two instance variables must be set to correctly implement the function: self.model and self.ready.
- self.model must be set to an instance of your model.
- self.ready must be set to True once loading is finished.
You may use any number of helper functions to implement this.
Here is an example for a text generation model:
def load(self):
    model_file = os.path.join(MODEL_DIR, "openai-community--gpt2")
    self.model = self.load_model(model_file)  # self.model instantiated.
    self.tokenizer = AutoTokenizer.from_pretrained(model_file)
    torch_dtype = TORCH_DTYPE_MAP[MODEL_TORCH_DTYPE]
    self.pipeline = transformers.pipeline(
        "text-generation",
        model=self.model,
        tokenizer=self.tokenizer,
        torch_dtype=torch_dtype,
        trust_remote_code=True
    )
    self.ready = True  # The model is now ready.

def load_model(self, model_file):
    return AutoModelForCausalLM.from_pretrained(model_file, device_map='auto')
1.1.2.3 The run_model function
The run_model function contains all the logic for running the instantiated model on the list of inputs. Most importantly, all input and output schemas for the specific model's class must be followed. For a text generation model, this means implementing a function that takes in a list of TextGenerationInput values and outputs a list of TextGenerationOutput values:
def run_model(self, api_input: List[TextGenerationInput], headers: Dict[str, str] = None) -> List[TextGenerationOutput]:
    generated_instances = []
    for instance in api_input:
        generation_config = {
            "max_new_tokens": instance.max_new_tokens,
            "do_sample": True,
            "top_p": instance.top_p,
            "top_k": instance.top_k,
            "num_return_sequences": instance.num_return_sequences
        }
        sequences = self.pipeline(
            instance.data,
            eos_token_id=self.tokenizer.eos_token_id,
            **generation_config
        )
        output = {
            "data": str(sequences[0]["generated_text"])
        }
        generated_instances.append(TextGenerationOutput(**output))
    return generated_instances
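Before starting the server, run_model can also be exercised directly from a short script. The snippet below is a sketch only: it assumes the model instance has already been constructed and load() has been called, and it assumes TextGenerationInput accepts the same keyword fields that run_model reads above (the exact schema is defined in model-interfaces).
# Sketch: call run_model directly on an already-loaded model instance.
# The keyword fields mirror the attributes accessed in run_model above and
# are otherwise an assumption about the TextGenerationInput schema.
inputs = [
    TextGenerationInput(
        data="Once upon a time",
        max_new_tokens=50,
        top_p=0.9,
        top_k=50,
        num_return_sequences=1,
    )
]
outputs = model.run_model(inputs)
print(outputs[0].data)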
1.1.2.4 Additional functions
Some model classes may require additional functions in the model.py file.
Example: the tokenize and templatize functions
An additional tokenize function is required for text generation models to calculate the input size correctly. Chat-specific text generation models must also implement templatize, which takes all inputs and formats them into the correct template before model inference. Both functions must follow their specific interfaces as specified in the model-interfaces repository. A complete model.py for the GPT-2 example (see the end of this section) includes implementations of both functions.
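To give a sense of what these functions typically compute, the snippets below use the Hugging Face tokenizer to count tokens and to apply a chat template. They illustrate the general idea only; the exact tokenize and templatize signatures are defined in the model-interfaces repository.
# Illustrative only; the real tokenize/templatize methods must follow the
# interfaces defined in the aiXplain model-interfaces repository.

def count_tokens(tokenizer, text: str) -> int:
    # Number of tokens the model will see for this input.
    return len(tokenizer(text)["input_ids"])

def build_chat_prompt(tokenizer, messages) -> str:
    # messages is a list of dicts such as [{"role": "user", "content": "Hi"}].
    # Requires a tokenizer that defines a chat template.
    return tokenizer.apply_chat_template(
        messages,
        tokenize=False,              # return a prompt string, not token IDs
        add_generation_prompt=True   # append the assistant turn marker
    )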
1.1.2.5 Starting the Model Server
Finally, add a main block to the end of the model file to start the server. This script will call the model's load function before starting the KServe ModelServer. Below is what the main block can look like for our GPT-2 model example:
if __name__ == "__main__":
    model = GPT2_Model_Chat(MODEL_NAME, USE_PEFT_LORA)
    model.load()
    kserve.ModelServer().start([model])
A complete model.py file for the GPT-2 example combines all of the pieces above, namely the imports, the load and run_model functions, the tokenize and templatize functions, and the main block, into a single file.
1.1.3 The System and Python Requirements Files
Include any scripts in a bash.sh file for running system-level installations, and specify the necessary Python packages in a requirements.txt file.
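For example, a model that needs a system-level package might ship a bash.sh like the following (ffmpeg is used here purely as an illustration):
#!/bin/bash
# Illustrative system-level setup; replace with whatever your model needs.
apt-get update && apt-get install -y ffmpeg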
If your local environment already contains all the necessary model packages, you can run the following command to generate requirements.txt:
pip freeze >> requirements.txt
Otherwise, install all the requirements in your environment first, then run the command.
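As an illustration, a requirements.txt for the GPT-2 example above might contain entries along these lines; the package list is an assumption, so include whatever your model actually imports and pin the versions you tested with:
# Illustrative requirements.txt for the GPT-2 example
torch
transformers
accelerate
kserve
aixplain[model-builder]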
1.2 Testing the Model Locally
Run your model with the following command:
MODEL_DIR=<path/to/model_artifacts_dir> MODEL_NAME=<model_name> python -m model
This command should spawn a local server that can handle inference requests. Send inference requests to the server, using either the command line or a Python script, to verify that the model works as expected. Once the model is verified, it is ready for upload.
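For example, assuming the default KServe HTTP port (8080) and V1 predict endpoint, and assuming the request fields mirror the TextGenerationInput schema used in run_model above (the exact payload shape is defined by the model-interfaces schemas), a quick smoke test from Python might look like this:
import requests

# Assumptions: default KServe HTTP port 8080, V1 ":predict" endpoint, and a
# payload whose fields mirror the TextGenerationInput schema shown earlier.
url = "http://localhost:8080/v1/models/<model_name>:predict"
payload = {
    "instances": [
        {
            "data": "Once upon a time",
            "max_new_tokens": 50,
            "top_p": 0.9,
            "top_k": 50,
            "num_return_sequences": 1,
        }
    ]
}

response = requests.post(url, json=payload)
print(response.status_code, response.json())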