JFrog ML's Feature Store is a centralized service that facilitates the discoverability, reuse and accuracy of features.
It provides one place to develop features from batch or streaming data, and to serve those features for inference or retrieve them as training data. It also lets teams discover and reuse existing features instead of recreating identical or similar ones.
The Feature Store serves the following main purposes:
Source of truth: A single and discoverable source of truth for features to be used by machine learning models.
Feature collaboration: A mechanism that enables data scientists and machine learning engineers to share features between projects.
Consistency (prevents training/serving skew): Systematically ensures that features generated for training (offline) and for inference (online) are identical.
Feature Store Concepts
The JFrog ML Feature Store is built around three main concepts:
Entity Keys | The specific identifier (for example: | |
Data Sources | The external systems (for example, databases, event streams) from which raw data is ingested to create features. | |
Feature Sets | The operational unit of the Feature Store. This computational definition (schema and logic) takes raw data as input and outputs a logical group of related features. Feature Set Types:
Note: JFrog ML Cloud (SaaS) supports Batch Feature Sets only. To use real-time and streaming features, please opt for JFrog ML hybrid deployments. | |
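To make the three concepts concrete, here is a minimal, library-free sketch of how they relate. It does not use the JFrog ML SDK: the `FeatureSet` class, the `purchase_stats` function, and the `RAW_PURCHASES` rows are all hypothetical names chosen for illustration, and the in-memory list stands in for a real data source.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical raw rows, standing in for what a data source
# (a database table or an event stream) would provide.
RAW_PURCHASES = [
    {"user_id": "u1", "amount": 20.0},
    {"user_id": "u1", "amount": 30.0},
    {"user_id": "u2", "amount": 5.0},
]


@dataclass
class FeatureSet:
    """Toy feature set: a name, an entity key, and the transformation logic."""
    name: str
    entity_key: str
    transform: Callable[[List[dict]], Dict[str, float]]

    def compute(self, rows: List[dict]) -> Dict[str, Dict[str, float]]:
        # Group the raw rows by entity key, then derive features per entity.
        grouped = defaultdict(list)
        for row in rows:
            grouped[row[self.entity_key]].append(row)
        return {key: self.transform(group) for key, group in grouped.items()}


def purchase_stats(rows: List[dict]) -> Dict[str, float]:
    # The feature logic: a logical group of related features for one entity.
    amounts = [r["amount"] for r in rows]
    return {"purchase_count": float(len(amounts)), "total_spend": sum(amounts)}


user_purchases = FeatureSet("user_purchases", "user_id", purchase_stats)
features = user_purchases.compute(RAW_PURCHASES)
```

In this sketch `user_id` plays the role of the entity key, `RAW_PURCHASES` the data source, and `user_purchases` the feature set that turns raw rows into a group of related features per entity.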
Feature Consumption
With the JFrog ML Feature Store you can define features once, calculate them once, and reuse them at any time.
JFrog ML's Feature Store systematically ensures that there are no discrepancies between the data generated for training and the data generated for serving. Both the offline and the online stores are populated by the same feature extraction process.
Inference: Serve the most up-to-date feature values for a given Key from one centralized location.
Training: Keep a log of all feature values, and retrieve them for training as of any point in time.
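The two consumption paths can be illustrated with a small in-memory model. This is a conceptual sketch only, not the JFrog ML API: `ToyFeatureStore` and its methods are hypothetical, and the sketch assumes that values for a given key are ingested in increasing timestamp order. The point it shows is that a single write path feeds both stores, so the online value and the offline history cannot drift apart.

```python
from bisect import bisect_right
from collections import defaultdict


class ToyFeatureStore:
    """Illustrative model of an offline log plus an online latest-value store."""

    def __init__(self):
        # Offline store: full history per key, as (timestamp, features) pairs.
        self._offline = defaultdict(list)
        # Online store: only the most recent features per key.
        self._online = {}

    def ingest(self, key, timestamp, features):
        # Single write path: the same values go to both stores.
        # Assumes timestamps arrive in increasing order for each key.
        self._offline[key].append((timestamp, dict(features)))
        self._online[key] = dict(features)

    def get_online(self, key):
        # Inference path: latest known values for the key, or None.
        return self._online.get(key)

    def get_as_of(self, key, timestamp):
        # Training path: values that were current at the given timestamp.
        history = self._offline.get(key, [])
        timestamps = [ts for ts, _ in history]
        idx = bisect_right(timestamps, timestamp)
        return history[idx - 1][1] if idx else None


store = ToyFeatureStore()
store.ingest("u1", 1, {"total_spend": 20.0})
store.ingest("u1", 5, {"total_spend": 50.0})

latest = store.get_online("u1")       # latest values, for inference
as_of_3 = store.get_as_of("u1", 3)    # values as they were at time 3
before_any = store.get_as_of("u1", 0) # nothing had been ingested yet
```

Looking up values as of a past timestamp is what keeps training data free of information that was not yet available at that time, while the online lookup always returns the newest values.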