Feature Store Overview

JFrog ML's Feature Store is a centralized service that facilitates the discoverability, reuse and accuracy of features.

It provides a centralized method for developing features using batch or streaming data, and for serving those features instantly or retrieving them as training data. It also allows the discovery and reuse of available features, instead of recreating identical or similar ones.

The Feature Store serves the following main purposes:

  • Source of truth: A single and discoverable source of truth for features to be used by machine learning models.

  • Feature collaboration: A mechanism that enables data scientists and machine learning engineers to share features between projects.

  • Consistency (prevents training/serving skew): Systematically ensures that features generated for training (offline) and inference (online) are identical.

Feature Store Concepts

The JFrog ML Feature Store is built around three main concepts:

Entity Keys

The identifiers (for example, user_id, transaction_id, merchant_id) for which feature values are calculated and retrieved.

Data Sources

The external systems (for example, databases, event streams) from which raw data is ingested to create features.

Feature Sets

The operational unit of the Feature Store. This computational definition (schema and logic) takes raw data as input and outputs a logical group of related features.
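The relationship between the three concepts above can be sketched in plain Python. This is a minimal illustration, not the JFrog ML SDK: `DataSource`, `FeatureSet`, `spend_features`, and the sample rows are all hypothetical names and data chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# All names below are hypothetical illustrations, not the JFrog ML SDK.
Row = Dict[str, object]


@dataclass(frozen=True)
class DataSource:
    """An external system that supplies raw rows (here, an in-memory list)."""
    name: str
    rows: List[Row]


@dataclass(frozen=True)
class FeatureSet:
    """Schema plus logic: turns raw rows into a group of features per entity key."""
    name: str
    entity_key: str  # column that identifies the entity, e.g. "user_id"
    source: DataSource
    transform: Callable[[List[Row]], Dict[str, float]]

    def compute(self) -> Dict[object, Dict[str, float]]:
        # Group raw rows by entity key, then apply the transform to each group.
        groups: Dict[object, List[Row]] = {}
        for row in self.source.rows:
            groups.setdefault(row[self.entity_key], []).append(row)
        return {key: self.transform(rows) for key, rows in groups.items()}


def spend_features(rows: List[Row]) -> Dict[str, float]:
    """Feature logic: transaction count and total spend for one entity."""
    amounts = [float(r["amount"]) for r in rows]
    return {"txn_count": float(len(amounts)), "total_spend": sum(amounts)}


transactions = DataSource(
    name="transactions",
    rows=[
        {"user_id": "u1", "amount": 20.0},
        {"user_id": "u1", "amount": 5.0},
        {"user_id": "u2", "amount": 7.5},
    ],
)

user_spend = FeatureSet("user_spend", "user_id", transactions, spend_features)
features = user_spend.compute()
```

Here the entity key selects which rows belong together, the data source supplies the raw rows, and the feature set holds the logic that produces the related features.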

Feature Set Types:

  • Batch Feature Sets: Features defined from static or historical batch data sources (for example, Snowflake, BigQuery).

  • Streaming Feature Sets: Features defined from streaming data sources (for example, Kafka) for continuous, near real-time updates.

  • Real Time Feature Sets: Features computed from data provided directly at the time of an inference request, rather than pre-computed.

Note

JFrog ML Cloud (SaaS) supports Batch Feature Sets only. To use real-time and streaming features, please opt for JFrog ML hybrid deployments.
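The practical difference between pre-computed features (batch or streaming) and real-time features can be sketched as follows. This is an illustrative assumption about how the two kinds might be combined at inference time, not JFrog ML's implementation; `PRECOMPUTED`, `real_time_features`, and `build_feature_vector` are hypothetical names.

```python
from typing import Dict

# Hypothetical names throughout; this is not the JFrog ML SDK.

# Pre-computed features (batch or streaming) are stored ahead of time
# and looked up by entity key.
PRECOMPUTED: Dict[str, Dict[str, float]] = {
    "u1": {"avg_txn_amount": 12.5},
}


def real_time_features(request: Dict[str, float]) -> Dict[str, float]:
    """Derived only from the inference request payload; nothing is stored in advance."""
    return {"amount_is_large": float(request["amount"] > 100.0)}


def build_feature_vector(user_id: str, request: Dict[str, float]) -> Dict[str, float]:
    vector = dict(PRECOMPUTED.get(user_id, {}))  # lookup of stored values
    vector.update(real_time_features(request))   # computed at request time
    return vector


vector = build_feature_vector("u1", {"amount": 250.0})
```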

Feature Consumption

With the JFrog ML Feature Store you can define features once, calculate them once, and reuse them at any time. 

JFrog ML's Feature Store systematically ensures that there are no discrepancies between data generated for training and data generated for serving. Both the offline and the online stores are populated from the same single feature extraction process.

  • Inference: Serve the most up-to-date feature values for a given entity key from one centralized location.

  • Training: Keep a log of all feature values, and retrieve them for training as of any point in time.
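The two consumption paths above can be illustrated with a toy in-memory store. This is a sketch under stated assumptions, not the JFrog ML implementation: `ToyFeatureStore` and its methods are hypothetical, timestamps are plain integers, and writes for each key are assumed to arrive in increasing timestamp order.

```python
from bisect import bisect_right
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Hypothetical in-memory stand-in for a feature store; not the JFrog ML SDK.
Features = Dict[str, float]


class ToyFeatureStore:
    def __init__(self) -> None:
        # Online store: latest value per entity key (used for inference).
        self.online: Dict[str, Features] = {}
        # Offline store: timestamped history per entity key (used for training).
        self.offline: Dict[str, List[Tuple[int, Features]]] = defaultdict(list)

    def materialize(self, key: str, timestamp: int, features: Features) -> None:
        """Single write path: one computed value feeds both stores."""
        # Assumes calls arrive in increasing timestamp order for each key.
        self.offline[key].append((timestamp, features))
        self.online[key] = features

    def get_online(self, key: str) -> Optional[Features]:
        """Inference path: the most recent values for an entity key."""
        return self.online.get(key)

    def get_as_of(self, key: str, timestamp: int) -> Optional[Features]:
        """Training path: the values that were current at the given time."""
        history = self.offline.get(key, [])
        times = [t for t, _ in history]
        idx = bisect_right(times, timestamp)  # number of entries at or before timestamp
        return history[idx - 1][1] if idx else None


store = ToyFeatureStore()
store.materialize("u1", 100, {"total_spend": 20.0})
store.materialize("u1", 200, {"total_spend": 25.0})

latest = store.get_online("u1")          # inference: newest values
as_of_150 = store.get_as_of("u1", 150)   # training: values as of t=150
before_any = store.get_as_of("u1", 50)   # no values existed yet at t=50
```

Because both stores are written by the same `materialize` call, a point-in-time read at the latest timestamp returns exactly what the online store serves, which is the property that prevents training/serving skew.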

[Figure: Feature Store materialization (feature-store-materialization.png)]