JFrog ML data sources configure connections to your data and are used to create feature sets.
There are two main types of data sources:
To connect to a data source:
Enable network connectivity between the data sources and the JFrog ML cluster if the data sources are not publicly accessible.
Grant JFrog ML access to your data lake components by creating read-only service accounts and/or IAM roles.
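As an illustration of the read-only access step, the following minimal sketch builds an AWS IAM policy document that allows only listing and reading objects in an S3 bucket. It assumes the data lake is S3-based. The bucket name, the prefix, and the helper read_only_s3_policy are hypothetical, and the exact permissions JFrog ML requires are not stated here, so treat this as a starting point rather than a definitive policy.

```python
import json


def read_only_s3_policy(bucket: str, prefix: str = "") -> dict:
    """Build an IAM policy document granting read-only access to an S3 bucket.

    Hypothetical helper for illustration; it is not part of the FrogML SDK.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing is granted on the bucket itself
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}"],
            },
            {
                # Reading is granted on the objects under the prefix
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/{prefix}*"],
            },
        ],
    }


# Assumed bucket and prefix names, replace with your own
policy = read_only_s3_policy("my-data-lake-bucket", prefix="feature_data/")
policy_json = json.dumps(policy, indent=2)
print(policy_json)
```

The printed JSON can be attached to the service account or IAM role that JFrog ML uses. No write or delete actions are included.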
Defining Data Sources
Data sources can be defined and registered programmatically with the JFrog ML SDK and CLI, or created entirely in the JFrog ML UI.
Via the JFrog ML SDK/CLI:
JFrog ML provides Python classes to define any data source type using the frogml.feature_store.data_sources package.
For example, you can define a CsvSource to read from an S3-based CSV file as follows:
from frogml.feature_store.data_sources import CsvSource
# The S3 anonymous config class is required for public S3 buckets
from frogml.feature_store.data_sources import AnonymousS3Configuration

# Create a CsvSource object to represent a CSV data source
# This example uses a CSV file from a public S3 bucket
csv_source = CsvSource(
    name='credit_risk_data',  # Name of the data source
    description='A dataset of personal credit details',  # Description of the data source
    date_created_column='date_created',  # Column name of the column that holds the creation date
    path='s3://qwak-public/example_data/data_credit_risk.csv',  # S3 path to the CSV file
    filesystem_configuration=AnonymousS3Configuration(),  # Configuration for anonymous access to S3
    quote_character='"',  # Character used for quoting in the CSV file
    escape_character='"'  # Character used for escaping in the CSV file
)
Note
Data sources defined with the FrogML SDK are registered in the cloud platform only when the frogml features register command is run for that object.
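Before registering a CSV-based data source, it can help to confirm locally that the file contains the column named in date_created_column. The sketch below uses only the Python standard library and an in-memory sample. The helper missing_columns and the sample column names are hypothetical, and the check does not involve the FrogML SDK.

```python
import csv
import io


def missing_columns(csv_text, required, quote_character='"'):
    """Return the required column names that are absent from the CSV header."""
    reader = csv.reader(io.StringIO(csv_text), quotechar=quote_character)
    header = next(reader, [])  # An empty input yields an empty header
    return [column for column in required if column not in header]


# Small in-memory sample with assumed column names
sample = 'age,job,"credit_amount",date_created\n35,"skilled",1200,2023-01-01\n'

print(missing_columns(sample, ["date_created"]))  # prints []
```

An empty list means every required column was found in the header. Any names returned should be fixed in the file or in the data source definition before registration.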
Via the UI:
Select AI/ML > Data Sources from the JFrog side menu.
Click Create New Data Source.
Select the required data source type from the list.
Fill in the form (mandatory fields are marked with an asterisk).
Click Test connection to verify that the connection to the data source works.
Click Save. The data source is created.
Registering Data Sources
To register a data source defined with the SDK, run the JFrog ML CLI features register command as follows:
frogml features register -p data_source.py
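If registration needs to run from a script, for example in a CI job, the same command can be invoked from Python. The following is a minimal sketch that assumes the frogml CLI is installed and on the PATH. The helpers register_command and register are hypothetical wrappers, and the dry-run default only prints the command rather than executing it.

```python
import shlex
import subprocess


def register_command(definition_path):
    """Build the CLI invocation shown above for a given definition file."""
    return ["frogml", "features", "register", "-p", str(definition_path)]


def register(definition_path, dry_run=True):
    """Print the command (dry run) or execute it and return its exit code."""
    cmd = register_command(definition_path)
    if dry_run:
        print(shlex.join(cmd))
        return 0
    # Assumes the frogml CLI is installed and already authenticated
    return subprocess.run(cmd, check=False).returncode


exit_code = register("data_source.py")  # Dry run: prints the command only
```

Calling register("data_source.py", dry_run=False) would execute the command, and a non-zero return value would indicate that the CLI reported a failure.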
Deleting Data Sources
To delete a data source, execute the following frogml command in the terminal:
frogml features delete --data-source <data-source-name>
Warning
Deleting Data Sources In Use
Before you can delete a data source that is linked to one or more Feature Sets, you must either remove those Feature Sets or reassign them to a different data source.
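The rule above can be sketched as a guard that refuses to build the delete command while any feature set still references the data source. The mapping of feature sets to data sources below is a hand-written, hypothetical inventory, since this section does not describe how to list feature sets. The helper names are likewise assumptions.

```python
def feature_sets_using(data_source, feature_set_sources):
    """Return the names of feature sets that reference the given data source."""
    return sorted(
        name
        for name, sources in feature_set_sources.items()
        if data_source in sources
    )


def delete_command(data_source, feature_set_sources):
    """Build the delete command, or raise if the data source is still in use."""
    blockers = feature_sets_using(data_source, feature_set_sources)
    if blockers:
        raise ValueError(
            f"'{data_source}' is still used by feature sets: {', '.join(blockers)}"
        )
    return ["frogml", "features", "delete", "--data-source", data_source]


# Hypothetical inventory: feature set name -> data sources it reads from
inventory = {
    "user-credit-risk-features": ["credit_risk_data"],
    "user-activity-features": ["activity_events"],
}

print(feature_sets_using("credit_risk_data", inventory))
print(delete_command("unused_source", inventory))
```

In this inventory, credit_risk_data is still linked to a feature set, so delete_command raises for it until that feature set is removed or reassigned.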