Backfill

The feature set backfill process replaces a feature set's data, either entirely or within a specific time interval, and updates both the online and offline data accordingly.

Backfill can be triggered via the JFrog platform UI or via CLI command options.

There are three types of backfill:

  • Initial Backfill: Triggered only once, on feature set creation, if a backfill_spec is defined. It fills the empty feature set.

  • Interval Backfill: Replaces data within a time interval that you specify. Use it, for example, after changing a transformation or updating a data source.

    Data outside backfill boundaries remains unaffected and available at all times.

  • Reset Backfill: Replaces all existing data in the feature set. Use it, for example, if the feature set definition or the data source has changed.
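The difference between an interval backfill and a reset backfill can be pictured with a small in-memory model. This is only a conceptual sketch, not how the feature store is implemented: the functions `interval_backfill` and `reset_backfill`, the row layout, and the sample values are all hypothetical, and the sketch assumes a half-open interval (start included, stop excluded), which this guide does not specify.

```python
from datetime import datetime
from typing import List, Tuple

Row = Tuple[datetime, str]  # (event timestamp, feature value); hypothetical layout


def interval_backfill(
    existing: List[Row], recomputed: List[Row], start: datetime, stop: datetime
) -> List[Row]:
    """Replace rows inside the interval and keep every row outside it.

    Assumption: the interval is half-open, start <= timestamp < stop.
    """
    kept = [row for row in existing if not (start <= row[0] < stop)]
    fresh = [row for row in recomputed if start <= row[0] < stop]
    return sorted(kept + fresh)


def reset_backfill(existing: List[Row], recomputed: List[Row]) -> List[Row]:
    """Discard all existing rows and keep only the recomputed ones."""
    return sorted(recomputed)


existing = [
    (datetime(2024, 1, 1), "old"),
    (datetime(2024, 1, 10), "old"),
    (datetime(2024, 1, 20), "old"),
]
recomputed = [(ts, "new") for ts, _ in existing]

after_interval = interval_backfill(
    existing, recomputed, datetime(2024, 1, 5), datetime(2024, 1, 15)
)
after_reset = reset_backfill(existing, recomputed)

print([value for _, value in after_interval])
print([value for _, value in after_reset])
```

In this model only the row dated January 10 is replaced by the interval backfill, while the reset backfill replaces every row.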

Every backfill type supports different data sources and transformations, and these can vary from one backfill process to another.

You can run a backfill from the UI or from the CLI, as described below.

Backfill via the UI

To run backfill:

  1. In the JFrog platform, navigate to AI/ML > Feature Sets.

  2. Select an existing feature set, and click the three-dot button in the top-right corner.

  3. From the drop-down menu that appears, select Run backfill.

  4. Enter the details in the backfill window and run the backfill.

Note

You can select a different cluster-template size for each backfill execution. For large backfills, we recommend selecting a cluster-template size larger than the one defined for the feature set, to handle the increased processing load.

Backfill via a CLI Command

The CLI command that triggers a backfill depends on the type of backfill required:

Initial Backfill

See Creating a Feature Set in the Feature Store Quickstart guide.

Interval Backfill

frogml features backfill --start-time <start_time> --stop-time <stop_time> [--cluster-template <cluster_template>] [--comment <comment>] --environment <environment> --feature-set <feature_set_name>

Reset Backfill

frogml features backfill --reset-backfill [--cluster-template <cluster_template>] [--comment <comment>] --environment <environment> --feature-set <feature_set_name>

Note

For reset backfills, you can use either --reset-backfill or --reset.
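When backfills are launched from scripts, it can help to assemble the command from named parameters rather than by string concatenation. The following is a minimal sketch that only builds and prints the argument list and does not execute anything. The helper `build_backfill_command` and the sample values (`user_purchases`, `staging`, the dates, the comment) are hypothetical; the flags are the ones shown in the commands above.

```python
import shlex
from typing import List, Optional


def build_backfill_command(
    feature_set: str,
    environment: str,
    start_time: Optional[str] = None,
    stop_time: Optional[str] = None,
    reset: bool = False,
    cluster_template: Optional[str] = None,
    comment: Optional[str] = None,
) -> List[str]:
    """Assemble the argument list for `frogml features backfill`."""
    # Design choice of this helper: keep the two command forms separate.
    # Whether the CLI itself rejects the combination is not stated in this guide.
    if reset and (start_time or stop_time):
        raise ValueError("use either reset=True or a time interval, not both")

    args = ["frogml", "features", "backfill"]
    if reset:
        args.append("--reset-backfill")
    else:
        if start_time:
            args += ["--start-time", start_time]
        if stop_time:
            args += ["--stop-time", stop_time]
    if cluster_template:
        args += ["--cluster-template", cluster_template]
    if comment:
        args += ["--comment", comment]
    args += ["--environment", environment, "--feature-set", feature_set]
    return args


# Hypothetical feature set and environment names, for illustration only.
interval_cmd = build_backfill_command(
    feature_set="user_purchases",
    environment="staging",
    start_time="2024-01-01",
    stop_time="2024-02-01",
    comment="recompute after source fix",
)
reset_cmd = build_backfill_command(
    feature_set="user_purchases", environment="staging", reset=True
)

print(shlex.join(interval_cmd))
print(shlex.join(reset_cmd))
```

The resulting list can be passed to `subprocess.run` as-is, which avoids shell quoting problems with comments that contain spaces.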

Command Options Summary:

  • --reset-backfill, --reset: Perform a complete reset of the feature set's data. This option deletes the existing data.

  • --start-time [%Y-%m-%d|%Y-%m-%dT%H:%M:%S|%Y-%m-%d %H:%M:%S]: The start time, in UTC, from which the feature set's data should be backfilled. Defaults to the feature set's configured backfill start time.

  • --stop-time [%Y-%m-%d|%Y-%m-%dT%H:%M:%S|%Y-%m-%d %H:%M:%S]: The stop time, in UTC, up to which the feature set's data should be backfilled. Defaults to the current timestamp. If the time provided is in the future, the stop time is rounded down to the current time.

  • --cluster-template TEXT: Resource configuration for the backfill; expects a ClusterType size. Optional; defaults to the feature set's resource configuration.

  • --comment TEXT: Optional comment tag line for the backfill job.

  • --environment ENVIRONMENT: JFrog ML environment.

  • --feature-set TEXT or --name TEXT: The name of the feature set for which the backfill process is to be performed. This option is required.
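The accepted time formats and the handling of a future stop time can be checked on the client side before a backfill is submitted. The sketch below is an illustration of the rules listed above, not the CLI's own implementation; the function names `parse_backfill_time` and `effective_stop_time` are hypothetical.

```python
from datetime import datetime, timezone
from typing import Optional

# The three formats listed for --start-time and --stop-time.
ACCEPTED_FORMATS = ("%Y-%m-%d", "%Y-%m-%dT%H:%M:%S", "%Y-%m-%d %H:%M:%S")


def parse_backfill_time(value: str) -> datetime:
    """Parse a timestamp in one of the accepted formats and treat it as UTC."""
    for fmt in ACCEPTED_FORMATS:
        try:
            return datetime.strptime(value, fmt).replace(tzinfo=timezone.utc)
        except ValueError:
            continue
    raise ValueError(f"unsupported time format: {value!r}")


def effective_stop_time(value: str, now: Optional[datetime] = None) -> datetime:
    """Return the stop time, capped at the current time if it lies in the future."""
    now = now or datetime.now(timezone.utc)
    return min(parse_backfill_time(value), now)


# A fixed "now" keeps the example deterministic.
fixed_now = datetime(2024, 6, 1, 12, 0, 0, tzinfo=timezone.utc)

print(parse_backfill_time("2024-01-15T08:30:00"))
print(effective_stop_time("2030-01-01", now=fixed_now))
```

A date-only value such as 2024-01-15 parses to midnight UTC on that day, and a stop time of 2030-01-01 is capped at the supplied current time.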