Data scientists often train models in Workspaces, in Jupyter notebooks, or locally, and need a seamless process to save, register, and manage model versions for production use.
JFrog ML provides the Build SDK to address this need, simplifying the transition from model training to deployment, whether from your local machine or a Jupyter notebook.
Key Features
1. Build Models from Workspaces
The JFrog ML Build SDK simplifies the registration of models trained locally or in Jupyter notebooks. It ensures precise versioning and tracking, and an effortless transition from research to production deployment.
2. Python-driven Model Builds
Automate model builds using Python and seamlessly integrate with continuous integration/continuous deployment (CI/CD) pipelines and various automation workflows.
3. Streamlined Versioning for Pre-built Models
Build and register pre-trained models from any Python environment by supplying existing trained model instances, skipping remote build phases.
Using Build SDK
This document will explore the various options available when using the Build SDK.
In general, there are two choices when working with the Build SDK:
- Providing a pre-trained model artifact to the `build_model` method.
- Omitting a pre-built model, in which case the SDK uploads your local model files and builds the model remotely on JFrog ML.
```python
from frogml.sdk.frogml_client.client import FrogMLClient
from frogml.sdk.model.tools import run_local

# Creating an instance of the FrogML client
client = FrogMLClient()

# Option 1: Triggering a build with model files from the local `main` directory.
# No pre-built model is provided, so the model is built remotely on the JFrog platform.
client.build_model(
    model_id='my_example_model',
)

# Option 2: Triggering a build with model files from the local `main` directory,
# providing a pre-built model so the build method is not called remotely.
model = MyFrogmlModel()
model.build()
client.build_model(
    model_id='my_example_model',
    prebuilt_frogml_model=model
)
```
Folder Structure
Warning
File Structure Requirements
When using the Build SDK, your file and folder structure is preserved when uploading to JFrogML.
Please ensure to:
- Avoid Python files with top-level executable statements in the build directory.
- Guard all code with side effects in `if __name__ == "__main__"` blocks.
- Place shared functionality in properly encapsulated classes and functions.
Failure to follow these guidelines may cause unintended code execution during the import process.
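As an illustration of these guidelines, a file placed in the build directory might look as follows. The file name and function below are hypothetical examples, not part of the FrogML SDK:

```python
# train_helpers.py -- an illustrative file placed in the build directory.
# The file name and function below are hypothetical, not FrogML APIs.

def prepare_features(rows):
    """Shared functionality lives in functions, not in top-level statements."""
    return [r * 2 for r in rows]

# Code with side effects is guarded, so importing this module during
# the FrogML build process executes nothing.
if __name__ == "__main__":
    print(prepare_features([1, 2, 3]))
```

Because the side-effecting code is behind the `__main__` guard, the upload and import steps of the build remain safe.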
The Build SDK uploads local model files together with the trained model object. By default, the Build SDK uploads the main folder under the current file location.
Make sure to place your model files in the main directory.
```
your-model-directory
├── build_sdk_runner.py
└── main
    └── model.py
```
It is possible to change the uploaded directory by providing an explicit path, as described later in this document.
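Before triggering a build, it can be useful to sanity-check that the directory you are about to upload matches the expected layout. A minimal sketch, using only the standard library (the helper below is our own, not part of the FrogML SDK):

```python
from pathlib import Path

def check_build_dir(root: str, main_dir: str = "main") -> bool:
    """Hypothetical helper: verify the upload directory matches the layout above."""
    main_path = Path(root) / main_dir
    # The uploaded folder must exist and contain the model file
    return main_path.is_dir() and (main_path / "model.py").is_file()
```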
Building Pre-trained Models
Use the Build SDK to build models with an existing instance of a trained model to upload the pre-trained model artifact. This flexibility empowers data scientists to train or fine-tune models within notebooks, effortlessly incorporate the trained versions into the model registry, and deploy them to production environments.
Creating a Model Instance
In this example, we'll use the Titanic model, which can be found on the FrogML Examples repository.
Our folder structure will look as follows:
```
titanic
├── run_build.py
└── main
    ├── __init__.py
    ├── model.py
    └── requirements.txt
```
Note
Make sure to import `from frogml.sdk.model.tools import run_local` when using the Build SDK. The build command cannot complete without it.
requirements.txt
```
pandas
scikit-learn
catboost
```
titanic/main/__init__.py
```python
from .model import TitanicSurvivalPrediction

def load_model():
    return TitanicSurvivalPrediction()
```
titanic/main/model.py
```python
import os

import frogml
import numpy as np
import pandas as pd
from catboost import CatBoostClassifier, Pool, cv
from catboost.datasets import titanic
from frogml import FrogMlModel
from frogml.sdk.model.schema import ExplicitFeature, InferenceOutput, ModelSchema
from sklearn.model_selection import train_test_split


class TitanicSurvivalPrediction(FrogMlModel):

    def __init__(self):
        # Hyperparameters are read from environment variables, with defaults
        loss_function = os.getenv("loss_fn", "Logloss")
        learning_rate = os.getenv("learning_rate", None)
        if learning_rate:
            learning_rate = float(learning_rate)
        iterations = int(os.getenv("iterations", 1000))
        custom_loss = "Accuracy"

        self.model = CatBoostClassifier(
            iterations=iterations,
            custom_loss=[custom_loss],
            loss_function=loss_function,
            learning_rate=learning_rate,
        )

    def build(self):
        titanic_train, _ = titanic()
        titanic_train.fillna(-999, inplace=True)

        x = titanic_train.drop(["Survived", "PassengerId"], axis=1)
        y = titanic_train.Survived

        x_train, x_test, y_train, y_test = train_test_split(
            x, y, train_size=0.85, random_state=42
        )

        # Mark categorical features
        cate_features_index = np.where(x_train.dtypes != float)[0]

        self.model.fit(
            x_train,
            y_train,
            cat_features=cate_features_index,
            eval_set=(x_test, y_test),
        )

        # Cross-validating the model (5-fold)
        cv_data = cv(
            Pool(x, y, cat_features=cate_features_index),
            self.model.get_params(),
            fold_count=5,
        )

    @frogml.api()
    def predict(self, df: pd.DataFrame) -> pd.DataFrame:
        df = df.drop(["PassengerId"], axis=1)
        return pd.DataFrame(
            self.model.predict_proba(df)[:, 1], columns=["Survived_Probability"]
        )
```

Training the Model
Let's create a new model instance and run the build method to train it.
titanic/run_build.py
```python
from titanic.main import TitanicSurvivalPrediction

# Create a new model instance
model = TitanicSurvivalPrediction()

# Run the build function, which trains the model
model.build()
```
Output
```
Learning rate set to 0.029583
0:  learn: 0.6756870  test: 0.6751626  best: 0.6751626 (0)  total: 66.5ms  remaining: 1m 6s
1:  learn: 0.6578988  test: 0.6579213  best: 0.6579213 (1)  total: 69.8ms  remaining: 34.8s
2:  learn: 0.6427410  test: 0.6427901  best: 0.6427901 (2)  total: 72.5ms  remaining: 24.1s
```
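Since the model's constructor reads its hyperparameters from environment variables, you can override them before creating the instance. A minimal sketch of that pattern, mirroring the parsing logic in `model.py` above:

```python
import os

# Override hyperparameters before instantiating the model
os.environ["loss_fn"] = "Logloss"
os.environ["iterations"] = "500"
os.environ["learning_rate"] = "0.05"

# The same parsing the model's __init__ performs (learning_rate is a float)
loss_function = os.getenv("loss_fn", "Logloss")
iterations = int(os.getenv("iterations", 1000))
learning_rate = os.getenv("learning_rate", None)
if learning_rate:
    learning_rate = float(learning_rate)
```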
Registering the Trained Model
Now that we have trained a model locally, we want to register this model version and save it in the FrogML model registry as a new build, so we can later deploy it to production.
The code below registers a new build under the titanic_survival_prediction model, using the trained Titanic model we just created and the tag `prebuilt`.
titanic/run_build.py
```python
from frogml.sdk.frogml_client.client import FrogMLClient
from frogml.sdk.model.tools import run_local

# Creating an instance of the FrogML client
client = FrogMLClient()

# Triggering a build with model files from the local `main` directory
client.build_model(
    model_id='titanic_survival_prediction',
    prebuilt_frogml_model=model,  # Providing a trained instance to skip the remote build
    tags=['prebuilt']
)
```
Output
```
Fetching model code - Using given build ID - 116a6385-8bbf-41bb-b30f-d6528869fac9
Fetching model code - Found dependency type: PIP by file: main/requirements.txt
Fetching model code - Successfully fetched model code
Registering frogml build - 10%
Registering frogml build - 20%
Registering frogml build - 30%
Registering frogml build - 40%
Registering frogml build - 48%
Registering frogml build - 50%
Registering frogml build - 60%
Registering frogml build - 70%
Registering frogml build - 80%
Registering frogml build - 90%
Registering frogml build - 96%
Registering frogml build - 96%
Registering frogml build - 100%
Registering frogml build - Start remote build - 116a6385-8bbf-41bb-b30f-d6528869fac9
Registering frogml build - Remote build started successfully
Build ID 116a6385-8bbf-41bb-b30f-d6528869fac9 was triggered remotely
To follow build logs using frogml platform: https://mydemo.jfrog.io/ui/ml/models/credit_risk_frogml/build/116a6385-8bbf-41bb-b30f-d6528869fac9
```
Build SDK configuration
The Build SDK supports a number of parameters that users can configure:
| Parameter | Required | Default Value | Description |
|---|---|---|---|
| `model_id` | Yes | | Model ID on the JFrog platform |
| `main_module_path` | No | `"main"` | Path to the local folder where the model files exist |
| `dependencies_file` | No | | Path to a Python dependencies file, in pip, poetry, or conda format |
| | No | | List of strict Python dependencies |
| `tags` | No | | List of tags saved on the remote model build |
| `instance` | No | `"small"` | Instance type used during the model build |
| `gpu_compatible` | No | | Build the model using a GPU-compatible image |
| | No | `True` | Run tests during the model build |
| | No | `True` | Validate model deployment during the build phase |
| | No | | Model validation timeout |
| `prebuilt_frogml_model` | No | | A prebuilt FrogML model instance; providing one skips the build phase and uses a pre-existing trained model version |
For example, the code below uses the advanced features of the Build SDK: it builds the model on a `medium` instance, provides build tags, and builds a GPU-compatible image.
```python
from frogml.sdk.frogml_client.client import FrogMLClient
from frogml.sdk.model.tools import run_local
from main import TitanicSurvivalPrediction

model = TitanicSurvivalPrediction()
model.build()

# Creating an instance of the FrogML client
client = FrogMLClient()

client.build_model(
    model_id='titanic_survival_prediction',
    main_module_path='main',
    dependencies_file="requirements.txt",
    prebuilt_frogml_model=model,
    tags=['prebuilt', 'local'],
    instance="medium",
    gpu_compatible=True
)
```
Unsupported Parameters in Build SDK
The Build SDK supports most of the parameters that are available in the FrogML CLI under `frogml models build`.
The following parameters are not supported:
| Parameter | Description |
|---|---|
| | JFrog ML environment |
| | Receive only the build ID and any exception as return values (depends on `--programmatic` in order to avoid UI output) |
| | The instance size used to automatically deploy the build after completion |
| | Automatically deploy the build after completion |
| | Return the live build logs as JSON |
| | Provide a list of parameters to the build |
| | Change the name of the |
| | Provide a list of environment variables |
| | Change the base image of the model build |
| | Enable or disable the Docker cache |
| | Provide a git credentials token |
| | The git credentials secret |
| | Use a different git branch |