Streaming deployments enable you to connect Kafka streams with your models for real-time inference. This architecture simplifies processing large volumes of distributed data by eliminating the need for complex triggering and scheduling as fresh data arrives.
Streaming Use Cases
Consider using streaming deployments in the following scenarios:
Event-driven applications: Respond to specific events or triggers in real-time.
Real-time decision-making: Process data as it arrives for time-sensitive systems.
Scalable processing: Efficiently handle large data volumes from multiple sources.
Deploying Streaming Models
To deploy a streaming model on JFrog ML, use one of the following options:
Option 1: Deploying via UI
Build a new model.
Click Deploy and choose Streaming.
Choose the instance type and configure the Kafka server and topics.
Note
Choose the number of replicas and instance type to balance against the number of partitions configured on your Kafka topic. The partitions are divided across all replicas in your deployment.
Click Deploy to make your model live.
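To see why replicas and partitions should be balanced, note that Kafka assigns each partition to exactly one consumer in a group, so partitions are split across replicas roughly as follows. This is an illustrative sketch with a hypothetical helper, not part of the frogml CLI or the JFrog ML scheduler:

```python
# Illustrative sketch: how a range-style assignment splits topic
# partitions across deployment replicas.
def partitions_per_replica(num_partitions: int, num_replicas: int) -> list[int]:
    base, extra = divmod(num_partitions, num_replicas)
    # The first `extra` replicas receive one additional partition.
    return [base + (1 if i < extra else 0) for i in range(num_replicas)]

# 6 partitions over 2 replicas -> each replica consumes 3 partitions.
print(partitions_per_replica(6, 2))   # [3, 3]
# 6 partitions over 4 replicas -> uneven load; a 7th replica would sit idle.
print(partitions_per_replica(6, 4))   # [2, 2, 1, 1]
```

Replicas beyond the partition count receive no partitions at all, which is why over-provisioning replicas wastes instances.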
Option 2: Deploying via CLI
You can also deploy your streaming model using the command line interface (CLI). Here’s a general command structure:
frogml models deploy stream \
--model-id <model-id> \
--build-id <build-id> \
--instance <instance-type> \
--replicas <number-of-replicas> \
--consumer-bootstrap-server <bootstrap-server> \
--consumer-topic <consumer-topic-name> \
--producer-bootstrap-server <bootstrap-server> \
--producer-topic <producer-topic-name>
Example Command 1 (using the above structure): Deploying a FLAN-T5 streaming model with 2 replicas on 2 A10.XL on-demand GPU instances
# Deploy a FLAN-T5 streaming model on 2 A10.XL on-demand instances
frogml models deploy stream \
--model-id "flan-t5" \
--build-id "0511ffa9-986e-4fd9-ae29-515269af5bac" \
--instance "gpu.a10.xl" \
--replicas 2 \
--purchase-option "on-demand" \
--consumer-bootstrap-server <bootstrap-server> \
--consumer-topic <consumer-topic-name> \
--producer-bootstrap-server <bootstrap-server> \
--producer-topic <producer-topic-name>
Example Command 2 (using the above structure): Deploying a churn streaming model on 2 small instances
# Deploy a churn streaming model on 2 small instances
frogml models deploy stream \
--model-id "stream-churn-model" \
--build-id "d4d56c1a-f326-40a0-88f3-f8e59c691a3e" \
--instance "small" \
--replicas 2 \
--consumer-bootstrap-server "10.0.0.8" \
--consumer-topic "model-input-topic" \
--producer-bootstrap-server "10.0.0.9" \
--producer-topic "model-output-topic"
Note
Before running these examples, replace the parameter values with your own.
Option 3: Configuring Streaming Models via YAML
You can also configure your streaming model with a YAML file.
Example
build_id: "d4d56c1a-f326-40a0-88f3-f8e59c691a3e"
model_id: "stream-churn-model"
resources:
  instance_size: "small"
  replicas: 2
stream:
  consumer_bootstrap_server: "10.0.0.8"
  consumer_topic: "model-input-topic"
  producer_bootstrap_server: "10.0.0.9"
  producer_topic: "model-output-topic"
Then run the deployment command:
frogml models deploy stream --from-file stream_deploy_config.yaml
Mixing Command Line and Configuration File
Additionally, you can mix command line arguments with a configuration file. In this case, command line arguments take precedence and override the values specified in the YAML file.
Example Command
frogml models deploy stream --replicas 5 --from-file stream_deploy_config.yaml
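The precedence rule can be pictured as a simple dictionary merge. This is a sketch of the behavior, not the frogml CLI's actual implementation:

```python
# Sketch of the precedence rule: CLI flags override YAML values.
# Keys mirror the YAML example above; the merge logic is illustrative.
def effective_config(yaml_config: dict, cli_args: dict) -> dict:
    merged = dict(yaml_config)
    # Only flags actually passed on the command line override the file.
    merged.update({k: v for k, v in cli_args.items() if v is not None})
    return merged

yaml_config = {"replicas": 2, "instance_size": "small"}
cli_args = {"replicas": 5}   # corresponds to --replicas 5 on the command line
print(effective_config(yaml_config, cli_args))
# {'replicas': 5, 'instance_size': 'small'}
```

Here the deployment runs with 5 replicas while every other value still comes from the YAML file.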
Monitoring Your Streaming Deployment
JFrog ML models run on Kubernetes, where we automatically install advanced production-grade monitoring tools. This setup provides a robust platform for deploying and managing your models efficiently.
Under the Model Overview tab, you can view the following metrics for your deployed build:
Average message throughput: Evaluates how many messages are being processed over a specific period.
Total consumed messages: Tracks the total number of messages your model has consumed over a given minute.
Total produced messages: Measures the total number of output messages.
Consumer lag: Measures the number of messages (requests) that have been produced but remain unconsumed in a consumer group at a given time. High consumer lag indicates that the rate at which messages are being consumed is slower than the rate at which new messages are being generated. This can signal performance issues.
Error rate over time: Monitors any issues encountered processing messages.
Memory, CPU and GPU utilization: Analyzes resource usage to optimize performance and scaling.
Processing lag: Shows the difference between the number of responses produced and the number of requests consumed by the model runtime in a given minute.
A positive value indicates that more messages are produced than consumed.
0 indicates that messages are consumed at the same rate as, or faster than, they are produced.
A stable or decreasing chart indicates stable performance.
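The two lag metrics above reduce to simple arithmetic. The following formulas are illustrative and may not match the platform's exact implementation:

```python
# Consumer lag: messages produced to the topic but not yet consumed by the
# consumer group (per partition: latest offset minus committed offset).
def consumer_lag(latest_offsets: dict, committed_offsets: dict) -> int:
    return sum(latest_offsets[p] - committed_offsets.get(p, 0)
               for p in latest_offsets)

# Processing lag for one minute: responses produced minus requests consumed,
# floored at 0 (the chart shows 0 when consumption keeps up with production).
def processing_lag(produced: int, consumed: int) -> int:
    return max(produced - consumed, 0)

latest = {0: 1200, 1: 950}        # hypothetical end-of-topic offsets
committed = {0: 1100, 1: 900}     # hypothetical committed offsets
print(consumer_lag(latest, committed))   # 150 messages waiting
print(processing_lag(480, 500))          # 0 -> consumption keeps up
```

A consumer lag that grows over time is the signal to add replicas or move to a larger instance type.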
Configuring Streaming Models Overview
JFrog ML streaming deployments enable granular control through multiple consumer and producer specifications. Below is an overview of the key parameters and configurations you can set to tailor your streaming deployment.
| CONFIGURATION | DESCRIPTION |
|---|---|
| Consumer Configuration | Settings for how your model consumes messages from Kafka. |
| Producer Configuration | Settings for how your model produces messages to Kafka. |
| Custom Configurations | Use environment variables to customize consumer and producer settings for advanced control. |
Consumer Configuration
| PARAMETER | DESCRIPTION |
|---|---|
| Consumer Bootstrap Servers* | A list of Kafka bootstrap servers to consume messages for model inference. |
| Consumer Topic* | The Kafka topic from which the model will consume messages for inference. |
| Consumer Group* | The name of the consumer group the model inference is attached to. |
| Group ID | Identifies a group of consumers that belong to the same consumer group. Each message in a partition is delivered to only one consumer within the same group, enabling load balancing and parallel processing. |
| Auto offset reset | Defines behavior when a consumer first joins a group or cannot find a valid offset (for example, when the offset does not exist or has been deleted): start reading from the earliest available offset ("earliest") or from the latest offset ("latest") in the topic. Default: earliest |
| Consumer timeout | Maximum time a consumer can be inactive before being considered dead and no longer part of the consumer group. If a consumer does not send a heartbeat to Kafka within this timeout, it may be removed from the group and its partitions reassigned to other consumers. Default: 60000 ms |
| Max batch | Maximum number of records the consumer will attempt to fetch in a single poll request to Kafka. This can affect throughput and latency. Default: 1 |
| Max polling latency | Maximum time the consumer will wait for new records to become available in a poll request. If no new records arrive within this limit, the poll request returns empty. Default: 1000 ms |
Offset Behavior in Redeployments
When you redeploy a streaming model using the same consumer group, the new deployment resumes consumption from the last committed offset for that group.
It does not start consuming messages based on the deployment timestamp. This ensures message continuity—no data is skipped or reprocessed unnecessarily.
If the consumer group does not have a previously committed offset (for example, on the first deployment), the consumer starts from the offset defined by auto.offset.reset:
earliest: consumes all available messages from the beginning of the topic.
latest: consumes only new messages that arrive after deployment.
* Required
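The resume logic above can be sketched per partition as follows. This illustrates standard Kafka consumer-group behavior, not JFrog ML internals:

```python
# Sketch of where a redeployed consumer starts reading, per partition.
# `committed` is the consumer group's last committed offset, or None if
# the group has never committed one (e.g. on a first deployment).
def starting_offset(committed, earliest, latest, auto_offset_reset="earliest"):
    if committed is not None:
        return committed   # resume exactly where the group left off
    return earliest if auto_offset_reset == "earliest" else latest

print(starting_offset(4200, earliest=0, latest=5000))   # 4200: resumes mid-topic
print(starting_offset(None, earliest=0, latest=5000))   # 0: reads from the beginning
print(starting_offset(None, 0, 5000, "latest"))         # 5000: only new messages
```

This is why reusing the same consumer group across redeployments avoids both skipped and reprocessed messages.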
Producer Configuration
| PARAMETER | DESCRIPTION |
|---|---|
| Producer Bootstrap Servers* | A list of Kafka bootstrap servers to produce model inference outputs. |
| Producer Topic* | The Kafka topic to which the model will produce messages. |
| Compression type | Determines how messages are compressed before being stored in Kafka. Compressing messages can significantly reduce the amount of data transmitted over the network and storage requirements in Kafka. Options: uncompressed, gzip, snappy, lz4, zstd. Default: uncompressed |
* Required
Custom Producer and Consumer Configurations
In a streaming inference deployment, you can customize Kafka consumer and producer configurations via environment variables. Use the following prefixes:
kafka.consumer for consumer attributes
kafka.producer for producer attributes
For example, to modify the bootstrap.servers configuration for a producer, use the environment variable:
kafka.producer.bootstrap.servers
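The prefix convention maps each environment variable onto the corresponding client's configuration. The following sketch illustrates the mapping, it is not frogml's actual code:

```python
# Sketch: split prefixed environment variables into separate consumer
# and producer Kafka config dicts by stripping the documented prefixes.
def split_kafka_env(env: dict) -> tuple[dict, dict]:
    consumer, producer = {}, {}
    for key, value in env.items():
        if key.startswith("kafka.consumer."):
            consumer[key[len("kafka.consumer."):]] = value
        elif key.startswith("kafka.producer."):
            producer[key[len("kafka.producer."):]] = value
    return consumer, producer

env = {
    "kafka.producer.bootstrap.servers": "10.0.0.9:9092",  # hypothetical values
    "kafka.consumer.sasl.mechanism": "PLAIN",
}
consumer_cfg, producer_cfg = split_kafka_env(env)
print(producer_cfg)   # {'bootstrap.servers': '10.0.0.9:9092'}
print(consumer_cfg)   # {'sasl.mechanism': 'PLAIN'}
```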
Example Command for SASL Authentication
If your Kafka topics require SASL authentication, you can provide the necessary username and password as environment variables when deploying the streaming model. The following example shows how to configure both the consumer and producer connections with SASL:
frogml models deploy stream --env-vars \
kafka.consumer.sasl.mechanism=PLAIN \
kafka.consumer.security_protocol=SASL_SSL \
kafka.consumer.sasl_plain_username=<username> \
kafka.consumer.sasl_plain_password=<password> \
kafka.producer.sasl.mechanism=PLAIN \
kafka.producer.security_protocol=SASL_SSL \
kafka.producer.sasl_plain_username=<username> \
kafka.producer.sasl_plain_password=<password>
Note
These environment variables are only required if your Kafka server is configured to use SASL authentication. Replace the placeholders <username> and <password> with your actual credentials.
Summary
Configuring your streaming deployments with an understanding of replicas, partitions, and their parameters will help you achieve efficient and scalable real-time inference for your applications. Tailor these settings to meet your specific needs for optimal performance and reliability.